-i, --input <path> Input file of AA sequences in FASTA format
(protein sequences or predicted CDS)
-N, --name <str> Dataset/job name
MetaCLADE OPTIONS:
...
...
(e.g., use --time-limit 2:30:00 for setting a limit of 2h and 30m)
```
### 1. MetaCLADE configuration
First of all, it is advised to add MetaCLADE's main directory to your `PATH` environment variable (if it is not already included) by adding the following line to your `~/.bashrc`:
```
export PATH="[MetaCLADE_DIR]:${PATH}"
```
where `[MetaCLADE_DIR]` is MetaCLADE's installation directory.
Then, in order to create MetaCLADE jobs you must first create a *Run configuration file* (see below) and run the following command:
```
metaclade --run-cfg [Run configuration file]
```
#### Input file preprocessing
Before running MetaCLADE on the input FASTA file you should build a BLAST database.
You can either set the `CREATE_BLASTDB` parameter to `True` in the Run configuration file (see below) or build the database manually.
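For a protein input file, a database can be built with the standard BLAST+ `makeblastdb` tool; a minimal sketch (the input path is a placeholder, and the exact options MetaCLADE expects are not stated in this document):
```
makeblastdb -in [INPUT_FASTA] -dbtype prot
```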
A custom working directory (where jobs and results are saved) can be set with the `WORKING_DIR` parameter (by default it is the directory from which the `metaclade` command has been called).
A custom temporary directory can be set with the `TMP_DIR` parameter (the default is a temporary subdirectory of the working directory).
If you want to restrict MetaCLADE's annotation to a subset of domains, you can provide a file containing one domain identifier per line via the `DOMAINS_DIR` parameter.
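Taken together, a Run configuration file might look like the following sketch; the key/value syntax is an assumption, only the parameter names come from the text above, and all values are placeholders:
```
; Run configuration file (sketch -- syntax assumed, values are placeholders)
CREATE_BLASTDB=True
WORKING_DIR=[WORKING_DIR]
TMP_DIR=[TMP_DIR]
NUMBER_OF_JOBS=[N]
DOMAINS_DIR=[DOMAIN_LIST_FILE]
```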
#### MetaCLADE configuration file example (optional)
Optionally, a MetaCLADE configuration file can be passed to `metaclade` via the `--metaclade-cfg` parameter.
This file can be used to set custom paths to the PSI-BLAST/HMMER/Python executables or to the MetaCLADE model library.
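A sketch of such a file is below; the key names here are hypothetical (only the kinds of paths are taken from the text above), so check the shipped example configuration for the actual names:
```
; MetaCLADE configuration file (sketch -- key names are hypothetical)
PSIBLAST=[PATH_TO_PSIBLAST]
HMMER=[PATH_TO_HMMER]
PYTHON=[PATH_TO_PYTHON]
MODEL_LIBRARY=[PATH_TO_MODEL_LIBRARY]
```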
...
...
Lines starting with a semicolon are treated as comments and are not taken into account.
```
### 2. MetaCLADE jobs
By default, jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`.
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value assigned to the `NUMBER_OF_JOBS` parameter).
In each of the first three directories you can find a `submit.sh` file that contains the `qsub` commands to submit each job to the queue system of an SGE environment.
This file can be used as-is (or adapted for other HPC environments) in order to submit all jobs of that step.
Alternatively, each `*.sh` file in a given directory can be submitted independently to the HPC environment.
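For example, the jobs of the first step could be submitted in either way as sketched below (the directory layout is taken from above, an SGE environment is assumed, and `[JOB_FILE]` is a placeholder):
```
# submit all jobs of the step via the provided script (SGE)
cd [WORKING_DIR]/[DATASET_NAME]/jobs/1
bash submit.sh

# ...or submit a single job file independently
qsub [JOB_FILE].sh
```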
### 3. MetaCLADE results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results after each step of the pipeline.
After all steps have completed, the final annotation is saved in the file