Commit 2403b757 by Mustafa Tekpinar

Updated README file.

parent e7a7207f
############################################################################################################
# #
# GEMME: a tool to predict mutational outcomes using evolutionary conservation and global epistasis #
# #
############################################################################################################
#
#
# GEMME is implemented in Python and R.
# https://www.python.org/
# https://cran.r-project.org/
#
#
##################
# Dependencies: #
##################
#
# Joint Evolutionary Trees: http://www.lcqb.upmc.fr/JET2/
# seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html
# These tools must be installed before using GEMME.
#
#
##################
# Installation: #
##################
#
# Download the GEMME.tgz archive from http://www.lcqb.upmc.fr/GEMME/.
# Uncompress the archive in the directory of your choice.
# Define and export the environment variable GEMME_PATH=/path-to-GEMME-directory/
# Run the program by typing "python $GEMME_PATH/gemme.py inputAli.fasta".
# Help can be accessed by typing "python $GEMME_PATH/gemme.py --help".
#
#
#################
# Usage notes: #
#################
#
# The inputAli.fasta is a mandatory argument that corresponds to the input multiple sequence
# alignment file, in FASTA format. The query sequence is taken as the first sequence in the alignment.
#
# By default, GEMME will predict the effect of all possible single mutations at all positions in the
# query sequence. Alternatively, a set of single or multiple mutations can be given with the option -m.
# Each line of the file should contain a mutation (e.g. D136R) or combination of mutations separated
# by commas and ordered according to their positions in the sequence (e.g. D136R,V271A).
#
# GEMME calls JET2 to compute evolutionary conservation levels. By default, JET2 will retrieve a set
# of sequences related to the query, independently of the input set, according to specific criteria.
# The retrieval method used in JET2 is PSI-BLAST, which can perform the search either locally (by
# default) or remotely (-r server). Alternatively, the user can provide her/his own PSI-BLAST file
# (-r input -b pFile) or her/his own multiple sequence alignment in FASTA format (-r input -f fFile).
# JET is run in its iterative mode, iJET, 10 times, and the final conservation levels are the maximum
# values obtained over the 10 iterations.
# JET2 configuration file is: default.conf.
# JET2 output file is: myProt_jet.res.
#
# By default, GEMME will output mutational effects predictions obtained from the global epistatic model,
# the independent model, and a combination of those two using a reduced alphabet (alphabets/lw-i.11.txt):
# myProt_pred_evolEpi.txt
# myProt_normPred_evolEpi.txt
# myProt_pred_evolInd.txt
# myProt_normPred_evolInd.txt
# myProt_normPred_evolCombi.txt
# The values of interest are the normalized predictions (normPred). Each file contains a 20 x n matrix,
# where n is the number of positions in the query sequence.
# If the user provides her/his own list of mutations, then only the global epistatic model will be run
# and the output file will contain 2 columns, the first one with the mutations, the second one with the
# normalized predicted effects.
#
#
####################
# Main reference: #
####################
#
# Laine E, Karami Y, Carbone A. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects.
# Molecular Biology and Evolution, Volume 36, Issue 11, November 2019, Pages 2604–2619
#
#
#
############################################################################################################
# ESGEMME: Evolutionary and Structural Global Epistatic Model for Mutational Effects
## Installation
ESGEMME is implemented in Python 3 and R. It has been tested only on Linux. Since ESGEMME has many dependencies, we recommend using our website or our Docker image.
### Installation from the source:
#### Getting the source code and preparing the environment:
Download the ESGEMME source code from http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME.
Define and export the environment variable ESGEMME_PATH=/path-to-ESGEMME-directory/
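
As an illustration, here is a minimal sketch of how a wrapper script might locate esgemme.py from the exported ESGEMME_PATH variable. It assumes that esgemme.py sits directly inside that directory, as the command "python $ESGEMME_PATH/esgemme.py" implies. The function name resolve_esgemme_script is hypothetical and is not part of ESGEMME.

```python
import os
from pathlib import Path


def resolve_esgemme_script(env=None):
    """Return the path of esgemme.py inside $ESGEMME_PATH (hypothetical helper).

    Assumes esgemme.py sits directly in that directory, as implied by
    the command "python $ESGEMME_PATH/esgemme.py".
    """
    env = os.environ if env is None else env
    root = env.get("ESGEMME_PATH")
    if not root:
        raise RuntimeError(
            "ESGEMME_PATH is not set; export ESGEMME_PATH=/path-to-ESGEMME-directory/"
        )
    return Path(root) / "esgemme.py"
```

Calling `.is_file()` on the returned path is a quick way to confirm that the variable points at the right directory before launching a run.
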
#### Installing the dependencies:
ESGEMME has the following external dependencies:
* Joint Evolutionary Trees: http://www.lcqb.upmc.fr/JET2/
* seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html

These tools must be installed before using ESGEMME.
## Usage
### Required input files
ESGEMME takes two input files:
* a multiple sequence alignment (MSA) file in fasta format (mandatory):
Your query protein must be the first sequence in the fasta file. In addition, the query sequence should not contain any gaps.
* a structure file in PDB format (optional but recommended)
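
The MSA requirements above can be checked before a run with a short pre-flight sketch. It reads the FASTA text, takes the first record as the query, and rejects it if it contains gap characters. Treating "-" and "." as the gap symbols is an assumption, and read_fasta and check_query are hypothetical helpers, not ESGEMME functions.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query(text, gap_chars="-."):
    """Return the query (first) sequence; raise ValueError if it contains gaps.

    gap_chars is an assumption: "-" and "." are common gap symbols in FASTA MSAs.
    """
    records = read_fasta(text)
    if not records:
        raise ValueError("no sequences found in the alignment")
    header, query = records[0]
    gaps = [i + 1 for i, c in enumerate(query) if c in gap_chars]
    if gaps:
        raise ValueError(f"query {header!r} has gaps at positions {gaps}")
    return query
```

Only the first record is inspected, because the README only constrains the query. Gaps in the other aligned sequences are expected.
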

Run the program by typing "python $ESGEMME_PATH/esgemme.py inputAli.fasta --pdbfile input.pdb".
Help can be accessed by typing "python $ESGEMME_PATH/esgemme.py --help".
### Running the program
Let's assume that our input MSA is inputAli.fasta and input.pdb is our structure file in PDB format.

By default, GEMME will predict the effect of all possible single mutations at all positions in the
query sequence. Alternatively, a set of single or multiple mutations can be given with the option -m.
Each line of the file should contain a mutation (e.g. D136R) or a combination of mutations separated
by commas (or colons) and ordered according to their positions in the sequence (e.g. D136R,V271A).
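
The mutation-file format described above can be validated with a sketch like the one below. It parses one line into (wild type, position, mutant) tuples, accepts comma or colon separators, checks the ordering by position, and optionally compares each wild-type residue to the query. It assumes 1-based positions and one-letter uppercase residue codes. The function parse_mutation_line is hypothetical and is not the parser ESGEMME uses internally.

```python
import re

# One substitution: wild-type residue, 1-based position, mutant residue (e.g. D136R).
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation_line(line, query=None):
    """Parse one line of a -m file into a list of (wild_type, position, mutant).

    Mutations may be separated by commas or colons and must be ordered by position.
    If a query sequence is given, each wild-type residue is checked against it
    (assumption: positions are 1-based).
    """
    parts = [p.strip() for p in re.split(r"[,:]", line) if p.strip()]
    if not parts:
        raise ValueError("empty mutation line")
    mutations = []
    for part in parts:
        match = MUTATION.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed mutation: {part!r}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if query is not None and (pos < 1 or pos > len(query) or query[pos - 1] != wt):
            raise ValueError(f"{part} does not match the query sequence")
        mutations.append((wt, pos, mut))
    positions = [pos for _, pos, _ in mutations]
    if positions != sorted(positions):
        raise ValueError(f"mutations are not ordered by position: {line.strip()!r}")
    return mutations
```

For example, "D136R,V271A" and "D136R:V271A" both parse to the same two tuples. "V271A,D136R" is rejected because its positions are out of order.
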

GEMME calls JET2 to compute evolutionary conservation levels. By default, JET2 retrieves a set
of sequences related to the query, independently of the input set, according to specific criteria.
The retrieval method used in JET2 is PSI-BLAST, which can perform the search either locally (by
default) or remotely (-r server). Alternatively, the user can provide her/his own PSI-BLAST file
(-r input -b pFile) or her/his own multiple sequence alignment in FASTA format (-r input -f fFile).
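
The retrieval options above can be assembled into an argument list from a script, as in the sketch below. The options used (--pdbfile, -m, -r server, -r input -b, -r input -f) are the ones named in this README. Whether esgemme.py accepts -r/-b/-f exactly as documented for gemme.py is an assumption, so check the --help output first. The function build_esgemme_command is hypothetical.

```python
import os


def build_esgemme_command(alignment, pdbfile=None, mutations=None,
                          retrieval=None, blast_file=None, fasta_file=None,
                          script=None):
    """Assemble an esgemme.py argument list from the options in this README.

    retrieval=None keeps the default (local PSI-BLAST search),
    retrieval="server" adds "-r server", and retrieval="input" adds
    "-r input" plus exactly one of "-b blast_file" or "-f fasta_file".
    """
    if script is None:
        script = os.path.join(os.environ.get("ESGEMME_PATH", ""), "esgemme.py")
    cmd = ["python", script, alignment]
    if pdbfile:
        cmd += ["--pdbfile", pdbfile]
    if mutations:
        cmd += ["-m", mutations]
    if retrieval == "server":
        cmd += ["-r", "server"]
    elif retrieval == "input":
        if (blast_file is None) == (fasta_file is None):
            raise ValueError("-r input needs exactly one of blast_file or fasta_file")
        cmd += ["-r", "input"]
        cmd += ["-b", blast_file] if blast_file is not None else ["-f", fasta_file]
    elif retrieval is not None:
        raise ValueError(f"unknown retrieval mode: {retrieval!r}")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with file paths.
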

JET is run in its iterative mode, iJET, 10 times, and the final conservation levels are the maximum
values obtained over the 10 iterations.
The JET2 configuration file is default.conf, and the JET2 output file is myProt_jet.res.
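
The combination rule for the iJET runs (maximum over iterations, position by position) is shown in the sketch below. It works on plain lists of per-position scores. It does not parse myProt_jet.res, whose column layout is not described here. The function combine_ijet_runs is hypothetical.

```python
def combine_ijet_runs(runs):
    """Combine per-position conservation scores from repeated iJET runs.

    Each run is a list of scores, one per query position; the combined
    level at each position is the maximum over all runs.
    """
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs[0])
    if any(len(r) != n for r in runs):
        raise ValueError("all runs must cover the same positions")
    return [max(values) for values in zip(*runs)]
```

Taking the maximum means that a position counts as conserved if any of the 10 iterations detects it as conserved.
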

By default, GEMME will output mutational effect predictions obtained from the global epistatic model,
the independent model, and a combination of those two using a reduced alphabet (alphabets/lw-i.11.txt):

* myProt_pred_evolEpi.txt
* myProt_normPred_evolEpi.txt
* myProt_pred_evolInd.txt
* myProt_normPred_evolInd.txt
* myProt_normPred_evolCombi.txt

The values of interest are the normalized predictions (normPred). Each file contains a 20 x n matrix,
where n is the number of positions in the query sequence.
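
The normPred files described above can be loaded for downstream analysis with the sketch below. The exact text layout is not specified in this README. The sketch assumes a header line with n column labels, then one line per amino acid: a row label (possibly quoted) followed by n whitespace-separated values, with "NA" marking missing entries. Adjust it to the actual files. read_pred_matrix is a hypothetical helper.

```python
def read_pred_matrix(text):
    """Parse a 20 x n prediction matrix into {amino_acid: [score or None, ...]}.

    Assumed layout (not specified in the README): a header line with n
    column labels, then one line per amino acid with a row label followed
    by n whitespace-separated values, where "NA" marks a missing value.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    n = len(header)
    matrix = {}
    for fields in rows:
        label, values = fields[0].strip('"'), fields[1:]
        if len(values) != n:
            raise ValueError(f"row {label!r} has {len(values)} values, expected {n}")
        matrix[label] = [None if v == "NA" else float(v) for v in values]
    return matrix
```

For a real normPred file, the result should have 20 keys, and each list should have one entry per query position.
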

If the user provides her/his own list of mutations, then only the global epistatic model is run,
and the output file contains 2 columns: the first with the mutations, the second with the
normalized predicted effects.
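
The two-column output described above can be read back with the following sketch. It assumes whitespace-separated columns and skips any line whose second field is not a number, such as a header or an "NA" entry. The function read_mutation_scores is hypothetical.

```python
def read_mutation_scores(text):
    """Parse two-column output (mutation or mutation combination, normalized score).

    Assumes whitespace-separated columns; lines that do not have exactly two
    fields, or whose second field is not numeric (e.g. a header), are skipped.
    """
    results = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        mutation, value = fields[0].strip('"'), fields[1]
        try:
            score = float(value)
        except ValueError:
            continue
        results.append((mutation, score))
    return results
```

The pairs can then be sorted with `sorted(results, key=lambda item: item[1])` to rank the mutations by predicted effect.
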
# Cite
Laine E, Karami Y, Carbone A. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects. Molecular Biology and Evolution, Volume 36, Issue 11, November 2019, Pages 2604–2619
@@ -296,8 +296,8 @@ def cleanTheMess(prot,bFile,fFile, chainID):
     if fFile!=prot+"_"+chainID+".fasta":
         if os.path.isfile(prot+"_"+chainID+".fasta"):
             os.remove(prot+"_"+chainID+".fasta")
-    # if os.path.isfile(prot+"_jet.res"):
-    #     os.remove(prot+"_jet.res")
+    if os.path.isfile(prot+"_jet.res"):
+        os.remove(prot+"_jet.res")
     if os.path.isfile(prot+".pdb"):
         os.remove(prot+".pdb")
......