  -V, --version             Print version

SGE OPTIONS:
  --sge                     Run MetaCLADE2 jobs on an SGE HPC environment
  --pe <name>               Parallel environment to use (mandatory)
  --queue <name>            Name of a specific queue where jobs are submitted
  --time-limit <hh:mm:ss>   Time limit for submitted jobs formatted as hh:mm:ss
  ...
                            (e.g., use --time-limit 2:30:00 for setting a limit of 2h and 30m)
```
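For instance, an SGE-enabled invocation could look like the sketch below; the dataset-specific arguments are elided as `...`, and the parallel environment name `smp` and queue name `long.q` are site-specific placeholders:
```
metaclade2 ... --sge --pe smp --queue long.q --time-limit 2:30:00
```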
Scripts and computation results are stored in `[WORKING_DIR]/[DATASET_NAME]`. By default, `[WORKING_DIR]` is the current directory (the one from which the `metaclade2` command is run).
It is possible to change this path with the `-W|--work-dir` argument.
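For example (again with the dataset-specific arguments elided as `...`, and an illustrative scratch path), the working directory can be redirected as follows:
```
metaclade2 ... --work-dir /scratch/$USER/metaclade2_runs
```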
Finally, intermediate files can be deleted (after a successful execution) with a dedicated command-line option.
MetaCLADE2 optionally accepts a configuration file that allows the user to set custom paths to the MetaCLADE2 model library.
Lines starting with a semicolon are treated as comments and ignored.
All paths **must** be absolute.
```
...
;hmms_path = /absolute/path/to/data/models/HMMs
```
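Removing the leading semicolon activates a setting; for example, with a real (absolute) path in place of the placeholder:
```
hmms_path = /absolute/path/to/data/models/HMMs
```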
# MetaCLADE2 jobs
Jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`, where `[WORKING_DIR]` is, by default, the directory from which the `metaclade2` command has been run.
Using the `--sge` parameter, the MetaCLADE2 pipeline can be handled automatically on an SGE-based cluster (see the [MetaCLADE2 usage](#metaclade2-usage) section).
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (their number depends on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
```
[DATASET_NAME]_1.sh
[DATASET_NAME]_2.sh
...
[DATASET_NAME]_[NUMBER_OF_JOBS].sh
```
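Each of these files is a regular shell script; on an SGE cluster, the scripts of a single step could be submitted manually with the standard `qsub` command. A minimal sketch for the first step (resource options depend on your site configuration):
```
cd [WORKING_DIR]/[DATASET_NAME]/jobs/1_search/
for job in [DATASET_NAME]_*.sh; do
    qsub "$job"
done
```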
Jobs **must** be run in the following order:
```
[WORKING_DIR]/[DATASET_NAME]/jobs/1_search/
[WORKING_DIR]/[DATASET_NAME]/jobs/2_filter/
[WORKING_DIR]/[DATASET_NAME]/jobs/3_arch/
```
Each file in a given directory can be submitted independently to the HPC environment.
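Putting these constraints together, a minimal sketch of a sequential local run (steps strictly in order, scripts within a step launched in parallel) could look like:
```
for step in 1_search 2_filter 3_arch; do
    for job in [WORKING_DIR]/[DATASET_NAME]/jobs/$step/*.sh; do
        bash "$job" &    # scripts within a step are independent
    done
    wait                 # finish the whole step before starting the next one
done
```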
# MetaCLADE2 results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results of the corresponding step of the pipeline.
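Assuming the numbered folders mirror the step names used under `jobs/`, the layout would be:
```
[WORKING_DIR]/[DATASET_NAME]/results/1_search/
[WORKING_DIR]/[DATASET_NAME]/results/2_filter/
[WORKING_DIR]/[DATASET_NAME]/results/3_arch/
```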
# MetaCLADE2 output architecture
The domain architecture of the input sequences is saved as a tab-separated values (TSV) file to `[WORKING_DIR]/[DATASET_NAME]/[DATASET_NAME].arch.tsv` (or to the path specified with the `-o|--output` argument).
Each line represents a domain annotation and has the following fields/columns (a short parsing sketch follows the list):
* Sequence identifier
* Sequence start
* Sequence end
* ...
* E-value of the prediction
* Bitscore of the prediction
* Accuracy value in the interval [0,1]
* Species of the template used to build the model
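As the first column is the sequence identifier, a quick per-sequence domain count can, for instance, be obtained from the architecture file with standard command-line tools:
```
cut -f1 [DATASET_NAME].arch.tsv | sort | uniq -c | sort -rn
```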
# Example
A test dataset is available in the `test` directory and can be run, using 4 threads, with the following command:
This will create at most two scripts (jobs) in each directory of the pipeline.
Alternatively, on an SGE-based cluster, the following command will run MetaCLADE2, submitting at most 2 jobs, each using 4 CPUs, for each step of the pipeline:
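Once submitted through `--sge`, the jobs can be monitored with the standard SGE utilities, e.g.:
```
qstat -u "$USER"
```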