Commit 41b9da71 by Riccardo Vicedomini

Update README.md

parent 0d2fcdea
Software requirements
---------------------
+ PSI-BLAST
+ HMMer-3
+ DAMA
+ GNU parallel (optional but recommended for running jobs on multiple threads)
Installation
------------
The latest development version of MetaCLADE2 can be obtained by running the following command:
```
git clone http://gitlab.lcqb.upmc.fr/vicedomini/metaclade2.git
```
CLADE's model library
---------------------
In order to run MetaCLADE, CLADE's library must be downloaded from [here](http://134.157.11.245/CLADE/deploy/models/).
Let `[MetaCLADE_DIR]` be the directory of MetaCLADE. The library should be extracted into the following two directories:
```
[MetaCLADE_DIR]/data/models/pssms/
[MetaCLADE_DIR]/data/models/hmms/
```
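A download-and-extract sketch is given below for reference; the archive names are placeholders (the actual file names and formats are those listed at the URL above):
```
# Sketch only: <pssm_archive> and <hmm_archive> are placeholders for the
# archives actually provided at the URL above.
mkdir -p [MetaCLADE_DIR]/data/models/pssms [MetaCLADE_DIR]/data/models/hmms
wget http://134.157.11.245/CLADE/deploy/models/<pssm_archive>.tar.gz
tar -xzf <pssm_archive>.tar.gz -C [MetaCLADE_DIR]/data/models/pssms/
wget http://134.157.11.245/CLADE/deploy/models/<hmm_archive>.tar.gz
tar -xzf <hmm_archive>.tar.gz -C [MetaCLADE_DIR]/data/models/hmms/
```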
Then, it is advised to include the MetaCLADE2 directory in your PATH environment variable by adding the following line to your `~/.bashrc` file:
```
export PATH=[MetaCLADE_DIR]:${PATH}"
```
where `[MetaCLADE_DIR]` is MetaCLADE's installation directory.
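After editing `~/.bashrc`, you can reload it and check that the executable is found (a quick sanity check, not a MetaCLADE requirement):
```
# Reload the shell configuration and verify that metaclade2 is on the PATH
source ~/.bashrc
command -v metaclade2
```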
MetaCLADE usage
---------------
```
USAGE: metaclade2 -i <input_fasta> -N <name> [options]

MANDATORY OPTIONS:
  -i, --input <path>    Input file of AA sequences in FASTA format
                        (protein sequences or predicted CDS)
  -N, --name <str>      Dataset/job name

MetaCLADE OPTIONS:
  ...
  (e.g., use --time-limit 2:30:00 for setting a limit of 2h and 30m)
```
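As a quick reference, a minimal invocation could look like the sketch below (the FASTA path and job name are placeholders; see also the Example section at the end of this README):
```
# Minimal sketch: annotate a protein FASTA file, splitting the work into 4 jobs
metaclade2 -i /path/to/CDS.faa -N CDS -j 4
```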
### 1. MetaCLADE configuration
In order to create MetaCLADE jobs, you must first create a *Run configuration file* (see below) and run the following command:
```
metaclade --run-cfg [Run configuration file]
```
#### Input file preprocessing
Before running MetaCLADE on the input FASTA file you should build a BLAST database.
You can either set the `CREATE_BLASTDB` parameter to `True` in the Run configuration file (see below) or run the following command manually:
```
makeblastdb -dbtype prot -in /path/to/sequence/database/CDS.faa
```
#### Run configuration file example (mandatory)
Lines starting with a semicolon are treated as comments and are ignored. Also, you should provide absolute paths.
```
[Parameters]
DATASET_NAME = CDS
FASTA_FILE = /path/to/sequence/database/CDS.faa
NUMBER_OF_JOBS = 32
;CREATE_BLASTDB = True
;WORKING_DIR = /path/to/a/custom/working/directory
;TMP_DIR = /path/to/a/custom/temporary/directory
;DOMAINS_LIST = /path/to/a/custom/model.list
```
A custom working directory (where jobs and results are saved) can be set with the `WORKING_DIR` parameter (the default value is the directory from which the metaclade command has been called).
A custom temporary directory can be set with the `TMP_DIR` parameter (the default is a temp subdirectory in the working directory).
If you want to restrict MetaCLADE's annotation to a subset of domains, you can provide a file containing one domain identifier per line through the `DOMAINS_LIST` parameter.
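For example, a custom `model.list` restricting the annotation to a few Pfam domains could be created as follows (the Pfam accessions below are only illustrative):
```
# Each line of the file contains a single domain identifier (Pfam accession)
cat > /path/to/a/custom/model.list << 'EOF'
PF00875
PF03441
PF03167
EOF
```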
#### MetaCLADE configuration file example (optional)
Optionally, a MetaCLADE configuration file can be provided to metaclade with the `--metaclade-cfg` parameter.
This file can be used to set custom paths to the PSI-BLAST/HMMER/Python executables or to the MetaCLADE model library.
### 2. MetaCLADE jobs
By default jobs are created in `[WORKING_DIR]/[DATASET_NAME]/jobs/`.
By default `[WORKING_DIR]` is the current directory where the `metaclade2` command has been run.
Each (numbered) folder in this directory represents a step of the pipeline and contains several `*.sh` files (depending on the value provided with the `-j [NUMBER_OF_JOBS]` parameter):
```
[DATASET_NAME]_0.sh
[DATASET_NAME]_1.sh
[DATASET_NAME]_2.sh
...
[DATASET_NAME]_[NUMBER_OF_JOBS].sh
```
Jobs **must** be run in the following order:
```
[WORKING_DIR]/[DATASET_NAME]/jobs/1_search/
[WORKING_DIR]/[DATASET_NAME]/jobs/2_filter/
[WORKING_DIR]/[DATASET_NAME]/jobs/3_arch/
```
In each of these directories you can find a `submit.sh` file that contains the `qsub` command to submit each job to the queue system of an SGE environment.
This file can be used (or adapted for other HPC environments) in order to submit all the jobs of a given step.
Each file in a given directory can be submitted independently to the HPC environment.
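On a single multi-core machine without a queue system, one way to run all the scripts of a step is with GNU parallel, which is listed among the optional software requirements. This is only a sketch; the paths and the number of simultaneous jobs are placeholders:
```
# Run every job script of the first step, at most 4 at a time (sketch only)
cd [WORKING_DIR]/[DATASET_NAME]/jobs/1_search/
parallel -j 4 bash ::: [DATASET_NAME]_*.sh
```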
### 3. MetaCLADE results
By default results are stored in the `[WORKING_DIR]/[DATASET_NAME]/results/` directory.
Each (numbered) folder in this directory contains the results after each step of the pipeline.
After all the steps have been run, the final annotation is saved in the following directory:
```
[WORKING_DIR]/[DATASET_NAME]/results/3_arch/
```
The final annotation is a tab-separated values (TSV) file whose lines represent annotations.
Each annotation has the following fields:
* Sequence identifier
* Sequence start
* Sequence end
* Sequence length
* Domain identifier (_i.e._, Pfam accession number)
* Model identifier
* Model start
* Model end
* Model size
* E-value of the prediction
* Bitscore of the prediction
* Accuracy value in the interval [0,1]
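Since the output is plain TSV, standard command-line tools can be used to post-process it. The sketch below keeps only high-confidence annotations; the input file name is a placeholder for the annotation file produced in the `3_arch` results directory, and the column numbers assume the field order listed above:
```
# Keep annotations with E-value <= 1e-5 (field 10) and accuracy >= 0.9 (field 12)
awk -F'\t' '$10 <= 1e-5 && $12 >= 0.9' final_annotation.tsv > filtered_annotation.tsv
```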
Example
-------
Example command:
```
metaclade2 -i ./test/test.fa -N pippo -d PF00875,PF03441,PF03167,PF12546 -W ./test/ --arch --sge --pe smp -j 2 -t 2
```