@@ -8,35 +8,35 @@ Biochemical and regulatory pathways have until recently been thought and modelle
We introduce MetaCLADE2, an improved profile-based domain annotation pipeline based on the multi-source domain annotation strategy. It annotates domains directly from reads and improves the identification of the catalog of functions in a microbiome. MetaCLADE2 can be applied to metagenomic or metatranscriptomic datasets as well as proteomes.
System requirements
-------------------
+ MetaCLADE has been developed under a Unix environment.
# System requirements
+ MetaCLADE2 has been developed under a Linux environment.
+ The bash environment should be installed.
+ Python 3 is required for this package.
Software requirements
---------------------
# Software requirements
+ HMMer-3
+ DAMA
+ GNU parallel (optional but recommended for running jobs on multiple threads)
Installation
------------
# Installation
The latest development version of MetaCLADE2 can be obtained by running the following command:
MetaCLADE2 optionally accepts a configuration file that allows the user to set custom paths to the MetaCLADE model library.
Lines starting with a semicolon are treated as comments and ignored.
You **must** also provide absolute paths.
```
[Programs]
;PSIBLAST_DIR = /home/ncbi-blast-2.7.1+/bin/
;HMMER_DIR = /home/hmmer-3.2.1/bin/
;PYTHON_DIR = /home/python-2.7.15/bin
[Models]
;PSSMS_DIR = /home/MetaCLADE/data/models/pssms
;HMMS_DIR = /home/MetaCLADE/data/models/hmms
[metaclade]
;ccms_path = /absolute/path/to/data/models/CCMs
;hmms_path = /absolute/path/to/data/models/HMMs
```
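
The comment and absolute-path rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of MetaCLADE2 itself: it assumes the file follows the INI conventions understood by Python's standard `configparser` module, and the helper name `read_model_paths` and the sample values are hypothetical.

```python
import configparser
import os

# Hypothetical sample file: one path is commented out, one is set.
EXAMPLE_CONFIG = """\
[metaclade]
;ccms_path = /absolute/path/to/data/models/CCMs
hmms_path = /absolute/path/to/data/models/HMMs
"""


def read_model_paths(text):
    """Return the paths set in the [metaclade] section of an INI string.

    Lines starting with a semicolon are skipped by configparser,
    which treats ';' as a comment prefix by default.
    Raises ValueError if a configured path is not absolute.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)

    paths = {}
    if parser.has_section("metaclade"):
        for key, value in parser.items("metaclade"):
            if not os.path.isabs(value):
                raise ValueError(f"{key} must be an absolute path, got: {value}")
            paths[key] = value
    return paths


paths = read_model_paths(EXAMPLE_CONFIG)
print(paths)
```

Only uncommented keys appear in the returned dictionary, so a fully commented-out file yields an empty result rather than an error.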
### MetaCLADE jobs
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`.
By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
# MetaCLADE jobs
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`. By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
Using the `--sge` parameter, the MetaCLADE2 pipeline can be run automatically on an SGE-based cluster (see the [MetaCLADE2 usage](#metaclade2-usage) section).
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
```
[DATASET_NAME]_1.sh
...
...
@@ -108,7 +104,7 @@ Jobs **must** be run in the following order:
Each file in a given directory can be submitted independently to the HPC environment.
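
Without a scheduler, the same ordering can be reproduced on a single machine. The sketch below is an illustration under stated assumptions, not the MetaCLADE2 launcher: it assumes each step folder name begins with its step number, that the scripts run under `bash`, and that a step should finish before the next one starts. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def ordered_steps(jobs_dir):
    """Return step folders sorted by the integer at the start of their name.

    Assumption: folder names begin with the step number (e.g. "1_...").
    Folders without a leading number are ignored.
    """
    steps = []
    for entry in Path(jobs_dir).iterdir():
        match = re.match(r"\d+", entry.name)
        if entry.is_dir() and match:
            steps.append((int(match.group()), entry))
    steps.sort(key=lambda item: item[0])
    return [path for _, path in steps]


def plan_pipeline(jobs_dir):
    """List (step folder name, script names) in the order they would run."""
    return [
        (step.name, [script.name for script in sorted(step.glob("*.sh"))])
        for step in ordered_steps(jobs_dir)
    ]


def run_pipeline(jobs_dir):
    """Run every script of a step concurrently, one step after another."""
    for step in ordered_steps(jobs_dir):
        processes = [
            subprocess.Popen(["bash", str(script)])
            for script in sorted(step.glob("*.sh"))
        ]
        # Wait for the whole step before moving on to the next one.
        for process in processes:
            if process.wait() != 0:
                raise RuntimeError(f"A job failed in step {step.name}")
```

Calling `plan_pipeline` on the jobs directory first shows the order without executing anything. Sorting on the integer prefix rather than the raw name keeps a step numbered 10 after step 2.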
### MetaCLADE2 results
# MetaCLADE2 results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results after each step of the pipeline.
After running each step, the final annotation is saved in the file
...
...
@@ -131,7 +127,16 @@ Each annotation has the following fields:
* Accuracy value in the interval [0,1]
Example
-------
# Example
A test dataset is available in the `test` directory and can be run with the following command:
Alternatively, if you are running MetaCLADE2 on an SGE cluster, the following script will run at most 2 jobs, each one using 2 CPUs, for each step of the pipeline: