@@ -8,35 +8,35 @@ Biochemical and regulatory pathways have until recently been thought and modelle
...
@@ -8,35 +8,35 @@ Biochemical and regulatory pathways have until recently been thought and modelle
We introduce MetaCLADE2, an improved profile-based domain annotation pipeline based on the multi-source domain annotation strategy. It annotates domains directly from reads and improves the identification of the catalog of functions in a microbiome. MetaCLADE2 can be applied to metagenomic or metatranscriptomic datasets as well as proteomes.
We introduce MetaCLADE2, an improved profile-based domain annotation pipeline based on the multi-source domain annotation strategy. It annotates domains directly from reads and improves the identification of the catalog of functions in a microbiome. MetaCLADE2 can be applied to metagenomic or metatranscriptomic datasets as well as proteomes.
System requirements
# System requirements
-------------------
+ MetaCLADE has been developed under a Unix environment.
+ MetaCLADE2 has been developed under a Linux environment.
+ The bash environment should be installed.
+ The bash environment should be installed.
+ Python 3 is required for this package.
+ Python 3 is required for this package.
Software requirements
# Software requirements
---------------------
+ HMMer-3
+ HMMer-3
+ DAMA
+ DAMA
+ GNU parallel (optional but recommended for running jobs on multiple threads)
+ GNU parallel (optional but recommended for running jobs on multiple threads)
Installation
# Installation
------------
The latest development version of MetaCLADE2 can be obtained by running the following command:
The latest development version of MetaCLADE2 can be obtained by running the following command:
Optionally, a MetaCLADE configuration file could be provided to metaclade with the parameter `--metaclade-cfg`.
MetaCLADE2 optionally accepts a configuration file that lets the user set custom paths to the MetaCLADE model library.
This file could be used to set custom paths to PSI-BLAST/HMMER/Python executables or to the MetaCLADE model library.
Lines starting with a semicolon are treated as comments and ignored.
Lines starting with a semicolon are not taken into account. Also, you should provide absolute paths.
You **must** also provide absolute paths.
```
```
[Programs]
[metaclade]
;PSIBLAST_DIR = /home/ncbi-blast-2.7.1+/bin/
;ccms_path = /absolute/path/to/data/models/CCMs
;HMMER_DIR = /home/hmmer-3.2.1/bin/
;hmms_path = /absolute/path/to/data/models/HMMs
;PYTHON_DIR = /home/python-2.7.15/bin
[Models]
;PSSMS_DIR = /home/MetaCLADE/data/models/pssms
;HMMS_DIR = /home/MetaCLADE/data/models/hmms
```
```
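As a minimal sketch of how such a file could be checked before a run, the snippet below parses a configuration in the format shown above with Python's standard `configparser` and verifies that every path that is set is absolute. The `[metaclade]` section and the `ccms_path`/`hmms_path` keys come from the example; the helper name `load_model_paths` is hypothetical, and this is not necessarily how MetaCLADE2 itself reads the file.

```python
import configparser
import os


def load_model_paths(cfg_text):
    """Return the paths set in the [metaclade] section of a config string.

    Hypothetical helper for illustration only; not part of MetaCLADE2.
    """
    # Only ';' starts a comment, mirroring the rule described above.
    parser = configparser.ConfigParser(comment_prefixes=(";",))
    parser.read_string(cfg_text)
    if not parser.has_section("metaclade"):
        return {}
    paths = dict(parser.items("metaclade"))
    for key, value in paths.items():
        # The README requires absolute paths, so reject anything else.
        if not os.path.isabs(value):
            raise ValueError(f"{key} must be an absolute path, got {value!r}")
    return paths


# One key is left commented out, the other is set.
example = """
[metaclade]
;ccms_path = /absolute/path/to/data/models/CCMs
hmms_path = /absolute/path/to/data/models/HMMs
"""
print(load_model_paths(example))
```

Commented-out keys are simply absent from the returned dictionary, so a caller could fall back to default locations for them.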
### MetaCLADE jobs
# MetaCLADE jobs
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`.
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`. By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
With the `--sge` parameter, the MetaCLADE2 pipeline can be handled automatically on an SGE-based cluster (see the [MetaCLADE2 usage](#metaclade2-usage) section).
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
```
```
[DATASET_NAME]_1.sh
[DATASET_NAME]_1.sh
...
@@ -108,7 +104,7 @@ Jobs **must** be run in the following order:
...
@@ -108,7 +104,7 @@ Jobs **must** be run in the following order:
Each file in a given directory can be submitted independently to the HPC environment.
Each file in a given directory can be submitted independently to the HPC environment.
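
To illustrate the layout described above, the following sketch lists the job scripts step by step so that they can be submitted in order. It assumes that the step folders under `jobs/` have names beginning with a number and that job scripts end in `.sh`; the function name `list_job_scripts` and the `my_dataset` path are hypothetical, and the actual submission command depends on your HPC environment.

```python
from pathlib import Path
import re


def list_job_scripts(jobs_dir):
    """Return [(step_folder_name, [script paths])] ordered by step number.

    Hypothetical helper; assumes step folder names start with a number.
    """
    steps = []
    for entry in Path(jobs_dir).iterdir():
        match = re.match(r"(\d+)", entry.name)
        if entry.is_dir() and match:
            steps.append((int(match.group(1)), entry))
    # Sort numerically so that step 10 comes after step 2.
    steps.sort(key=lambda item: item[0])
    # Scripts within one step are independent, so their order is not important.
    return [(folder.name, sorted(folder.glob("*.sh"))) for _, folder in steps]


if __name__ == "__main__":
    # Stands in for [WORKING_DIR]/[DATASET_NAME]/jobs/ (assumed path).
    jobs_dir = Path("my_dataset") / "jobs"
    if jobs_dir.is_dir():
        for step, scripts in list_job_scripts(jobs_dir):
            print(step)
            for script in scripts:
                print("  ", script)
```

All scripts of one step should finish before those of the next step are submitted, since the steps must be run in order.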
### MetaCLADE2 results
# MetaCLADE2 results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results after each step of the pipeline.
Each (numbered) folder in this directory contains the results after each step of the pipeline.
After running each step, the final annotation is saved in the file
After running each step, the final annotation is saved in the file
...
@@ -131,7 +127,16 @@ Each annotation has the following fields:
...
@@ -131,7 +127,16 @@ Each annotation has the following fields:
* Accuracy value in the interval [0,1]
* Accuracy value in the interval [0,1]
Example
# Example
-------
A test dataset is available in the `test` directory and can be run with the following command:
Alternatively, if you are running MetaCLADE2 on an SGE cluster, the following script will run at most 2 jobs, each using 2 CPUs, for each step of the pipeline: