Commit 1b6d600b by Riccardo Vicedomini

Update README.md

parent 41b9da71
...@@ -8,35 +8,35 @@ Biochemical and regulatory pathways have until recently been thought and modelle ...@@ -8,35 +8,35 @@ Biochemical and regulatory pathways have until recently been thought and modelle
We introduce MetaCLADE2, an improved profile-based domain annotation pipeline based on the multi-source domain annotation strategy. It annotates domains directly from reads and improves the identification of the catalog of functions in a microbiome. MetaCLADE2 can be applied to metagenomic or metatranscriptomic datasets as well as to proteomes. We introduce MetaCLADE2, an improved profile-based domain annotation pipeline based on the multi-source domain annotation strategy. It annotates domains directly from reads and improves the identification of the catalog of functions in a microbiome. MetaCLADE2 can be applied to metagenomic or metatranscriptomic datasets as well as to proteomes.
System requirements # System requirements
-------------------
+ MetaCLADE has been developed under a Unix environment. + MetaCLADE2 has been developed under a Linux environment.
+ The bash environment should be installed. + The bash environment should be installed.
+ Python 3 is required for this package. + Python 3 is required for this package.
Software requirements # Software requirements
---------------------
+ HMMer-3 + HMMer-3
+ DAMA + DAMA
+ GNU parallel (optional but recommended for running jobs on multiple threads) + GNU parallel (optional but recommended for running jobs on multiple threads)
Installation # Installation
------------
The latest development version of MetaCLADE2 can be obtained by running the following command: The latest development version of MetaCLADE2 can be obtained by running the following command:
``` ```
git clone http://gitlab.lcqb.upmc.fr/vicedomini/metaclade2.git git clone http://gitlab.lcqb.upmc.fr/vicedomini/metaclade2.git
``` ```
Then, it is advised to include the MetaCLADE2 directory in your PATH environment variable by adding the following line to your `~/.bashrc` file: Then, it is advised to include the MetaCLADE2 directory in your PATH environment variable by adding the following line to your `~/.bashrc` file:
``` ```
export PATH="[MetaCLADE_DIR]:${PATH}" export PATH="[MetaCLADE2_DIR]:${PATH}"
``` ```
where `[MetaCLADE_DIR]` is MetaCLADE's installation directory. where `[MetaCLADE2_DIR]` is the MetaCLADE2 installation directory.
MetaCLADE usage # MetaCLADE2 usage
---------------
``` ```
USAGE: metaclade2 -i <input_fasta> -N <name> [options] USAGE: metaclade2 -i <input_fasta> -N <name> [options]
...@@ -72,25 +72,21 @@ MetaCLADE usage ...@@ -72,25 +72,21 @@ MetaCLADE usage
(e.g., use --time-limit 2:30:00 for setting a limit of 2h and 30m) (e.g., use --time-limit 2:30:00 for setting a limit of 2h and 30m)
``` ```
#### MetaCLADE configuration file example (optional) #### Optional MetaCLADE2 configuration file (available soon)
Optionally, a MetaCLADE configuration file could be provided to metaclade with the parameter `--metaclade-cfg`. MetaCLADE2 optionally accepts a configuration file that allows the user to set custom paths to the MetaCLADE model library.
This file could be used to set custom paths to PSI-BLAST/HMMER/Python executables or to the MetaCLADE model library. Lines starting with a semicolon are not taken into account and are considered as comments.
Lines starting with a semicolon are not taken into account. Also, you should provide absolute paths. You **must** also provide absolute paths.
``` ```
[Programs] [metaclade]
;PSIBLAST_DIR = /home/ncbi-blast-2.7.1+/bin/ ;ccms_path = /absolute/path/to/data/models/CCMs
;HMMER_DIR = /home/hmmer-3.2.1/bin/ ;hmms_path = /absolute/path/to/data/models/HMMs
;PYTHON_DIR = /home/python-2.7.15/bin
[Models]
;PSSMS_DIR = /home/MetaCLADE/data/models/pssms
;HMMS_DIR = /home/MetaCLADE/data/models/hmms
``` ```
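As an illustration of the format described above, the following minimal Python sketch reads such a file with the standard `configparser` module. The `[metaclade]` section and the `ccms_path`/`hmms_path` keys come from the example; the helper name `load_model_paths` and the absolute-path check are assumptions made for illustration and do not reflect MetaCLADE2's actual parser.

```python
import configparser
import os


def load_model_paths(cfg_text):
    """Return the model paths set in the [metaclade] section of cfg_text.

    Hypothetical helper: MetaCLADE2's real parser may behave differently.
    """
    # Only lines starting with a semicolon are treated as comments.
    parser = configparser.ConfigParser(comment_prefixes=(";",))
    parser.read_string(cfg_text)
    paths = {}
    if not parser.has_section("metaclade"):
        return paths
    for key in ("ccms_path", "hmms_path"):
        value = parser.get("metaclade", key, fallback=None)
        if value is None:
            continue  # key is commented out or missing
        if not os.path.isabs(value):
            raise ValueError(f"{key} must be an absolute path: {value!r}")
        paths[key] = value
    return paths


example = """
[metaclade]
;ccms_path = /absolute/path/to/data/models/CCMs
hmms_path = /absolute/path/to/data/models/HMMs
"""
print(load_model_paths(example))
```

In this sketch a commented-out key is simply absent from the returned dictionary, so the caller can fall back to its own default location.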
### MetaCLADE jobs # MetaCLADE jobs
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`. By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`. By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run. Using the `--sge` parameter, the MetaCLADE2 pipeline can be handled automatically on an SGE-based cluster (see the [MetaCLADE2 usage](#metaclade2-usage) section).
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter): Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
``` ```
[DATASET_NAME]_1.sh [DATASET_NAME]_1.sh
...@@ -108,7 +104,7 @@ Jobs **must** be run in the following order: ...@@ -108,7 +104,7 @@ Jobs **must** be run in the following order:
Each file in a given directory can be submitted independently to the HPC environment. Each file in a given directory can be submitted independently to the HPC environment.
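To make the layout described above concrete, here is a minimal Python sketch that lists the job scripts step by step. It assumes that step folder names begin with a number and that steps are processed in ascending numeric order; the function name `list_job_scripts` is hypothetical, and the sketch does not check the order required by the README.

```python
import re
from pathlib import Path


def list_job_scripts(jobs_dir):
    """Group *.sh files by numbered step folder, in ascending step order.

    Assumption: step folder names start with digits (e.g. "1_search");
    the actual folder names used by MetaCLADE2 may differ.
    """
    steps = []
    for folder in Path(jobs_dir).iterdir():
        match = re.match(r"(\d+)", folder.name)
        if folder.is_dir() and match:
            steps.append((int(match.group(1)), folder))
    steps.sort(key=lambda item: item[0])
    # Scripts within one step are independent, so their order is arbitrary;
    # sort them only to make the listing reproducible.
    return [(folder.name, sorted(p.name for p in folder.glob("*.sh")))
            for _, folder in steps]
```

Calling `list_job_scripts` on the `jobs/` directory returns one `(folder, scripts)` pair per step, which a submission wrapper could walk through one step at a time.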
### MetaCLADE2 results # MetaCLADE2 results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory. By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results after each step of the pipeline. Each (numbered) folder in this directory contains the results after each step of the pipeline.
After running each step, the final annotation is saved in the file After running each step, the final annotation is saved in the file
...@@ -131,7 +127,16 @@ Each annotation has the following fields: ...@@ -131,7 +127,16 @@ Each annotation has the following fields:
* Accuracy value in the interval [0,1] * Accuracy value in the interval [0,1]
Example # Example
------- A test dataset is available in the `test` directory and can be run with the following command:
```
cd [MetaCLADE2_DIR]
metaclade2 -i ./test/test.fa -N testDataSet -d PF00875,PF03441,PF03167,PF12546 -W ./ -j 2
```
This will create at most two scripts (jobs) in each directory of the pipeline.
metaclade2 -i ./test/test.fa -N pippo -d PF00875,PF03441,PF03167,PF12546 -W ./test/ --arch --sge --pe smp -j 2 -t 2 Alternatively, if you are running MetaCLADE2 on an SGE cluster, the following command will run at most 2 jobs, each one using 2 CPUs, for each step of the pipeline:
```
cd [MetaCLADE2_DIR]
metaclade2 -i ./test/test.fa -N testDataSet -d PF00875,PF03441,PF03167,PF12546 -W ./ --sge --pe smp -j 2 -t 2
```