# ESGEMME: Evolutionary and Structural Global Epistatic Model for Mutational Effects
## Installation
ESGEMME is implemented in Python 3 and R. It has been tested only on Linux. Since ESGEMME has many dependencies, we recommend using our web site or our docker image.
### Installation from the source:
#### Getting the source code and preparing the environment:
Download the ESGEMME source code from http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME.
Define and export the environment variable ESGEMME_PATH=/path-to-ESGEMME-directory/
* seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html
These tools should be installed to be able to use ESGEMME.
## Usage
### Required input files
ESGEMME takes two input files, one mandatory and one optional:
* a multiple sequence alignment (MSA) file in FASTA format (mandatory). Your query protein must be the first sequence in the FASTA file, and the query sequence must not contain any gaps.
* a structure file in PDB format (optional but recommended)
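The two constraints on the MSA (query first, no gaps in the query) can be checked before launching a run. The following is a minimal sketch, not part of ESGEMME: the helper names `read_fasta` and `check_query` are hypothetical, and treating both `-` and `.` as gap characters is an assumption.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query(records, gap_chars="-."):
    """Return the query (first) sequence; raise ValueError if it contains gaps.

    gap_chars is an assumption: '-' and '.' are both treated as gaps here.
    """
    if not records:
        raise ValueError("the alignment contains no sequences")
    header, query = records[0]
    gaps = [i + 1 for i, c in enumerate(query) if c in gap_chars]
    if gaps:
        raise ValueError(f"query '{header}' has gaps at alignment columns {gaps}")
    return query


# Toy alignment: the query comes first and has no gaps.
example = ">query\nMKTAYIAK\n>homolog1\nMK-AYLAK\n"
query = check_query(read_fasta(example))
print(len(query))
```

Gaps in the other sequences are left alone; only the first record is checked.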
Run the program by typing "python $ESGEMME_PATH/esgemme.py inputAli.fasta --pdbfile input.pdb".
Help can be accessed by typing "python $ESGEMME_PATH/esgemme.py --help".
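For scripted runs, the same command line can be assembled from Python. This is a minimal sketch that uses only the arguments shown above (the MSA file and `--pdbfile`); the function names `build_command` and `run_esgemme` are hypothetical and not part of ESGEMME.

```python
import os
import subprocess


def build_command(msa_file, pdb_file=None, esgemme_path=None):
    """Build the argument list for an ESGEMME run.

    If esgemme_path is not given, it is read from the ESGEMME_PATH
    environment variable, which must then be defined.
    """
    if esgemme_path is None:
        esgemme_path = os.environ["ESGEMME_PATH"]
    script = os.path.join(esgemme_path, "esgemme.py")
    command = ["python", script, msa_file]
    if pdb_file is not None:
        # The structure file is optional, so the flag is added only when given.
        command += ["--pdbfile", pdb_file]
    return command


def run_esgemme(msa_file, pdb_file=None, esgemme_path=None):
    """Run ESGEMME and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(msa_file, pdb_file, esgemme_path), check=True)


cmd = build_command("inputAli.fasta", "input.pdb",
                    esgemme_path="/path-to-ESGEMME-directory/")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.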
### Running the program
Let's assume that our input MSA is inputAli.fasta and input.pdb is our structure file in PDB format.
By default, GEMME will predict the effect of all possible single mutations at all positions in the
query sequence. Alternatively, a set of single or multiple mutations can be given in a file with the option -m.
Each line of the file should contain a mutation (e.g. D136R) or a combination of mutations separated
by commas (or colons) and ordered according to their positions in the sequence (e.g. D136R,V271A).
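The mutation file format described above can be validated with a few lines of Python. This is a minimal sketch under stated assumptions: the function `parse_mutation_line` is hypothetical (it is not ESGEMME's own parser), and amino acids are assumed to be written as single uppercase letters, as in the examples.

```python
import re

# One mutation: wild-type residue, 1-based position, mutant residue (e.g. D136R).
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_line(line):
    """Parse one line of a mutation file into (wild_type, position, mutant) tuples.

    Mutations may be separated by commas or colons and must be listed in
    order of their positions in the sequence.
    """
    tokens = [t.strip() for t in re.split(r"[,:]", line.strip()) if t.strip()]
    if not tokens:
        raise ValueError("empty mutation line")
    mutations = []
    for token in tokens:
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"malformed mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append((wild_type, int(position), mutant))
    positions = [position for _, position, _ in mutations]
    if positions != sorted(positions):
        raise ValueError(f"mutations are not ordered by position: {line.strip()!r}")
    return mutations


print(parse_mutation_line("D136R"))
print(parse_mutation_line("D136R,V271A"))
```

A line such as `V271A,D136R` is rejected because the positions are out of order.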
GEMME calls JET2 to compute evolutionary conservation levels. By default, JET2 will retrieve a set
of sequences related to the query, independent from the input set, according to specific criteria.
The retrieval method used in JET2 is PSI-BLAST, which can perform the search either locally (by
default) or remotely (-r server). Alternatively, the user can provide her/his own psiblast file
(-r input -b pFile) or her/his own multiple sequence alignment in FASTA format (-r input -f fFile).
JET is run in its iterative mode, iJET, 10 times and the final conservation levels are the maximum
values obtained over the 10 iterations.
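The aggregation over iterations amounts to a per-position maximum. The sketch below only illustrates that arithmetic on made-up numbers (three short runs instead of ten); it is not the JET2 implementation, and the function name `max_over_iterations` is hypothetical.

```python
def max_over_iterations(runs):
    """Return, for each position, the maximum value seen across all runs.

    runs is a list of equally long lists, one list of per-position
    conservation levels per iteration.
    """
    if not runs:
        raise ValueError("at least one run is required")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must cover the same number of positions")
    # zip(*runs) groups the values of all runs position by position.
    return [max(values) for values in zip(*runs)]


# Illustrative values only: three iterations over three positions.
runs = [
    [0.1, 0.8, 0.3],
    [0.4, 0.7, 0.2],
    [0.2, 0.9, 0.3],
]
final = max_over_iterations(runs)
print(final)  # [0.4, 0.9, 0.3]
```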
JET2 configuration file is: default.conf.
JET2 output file is: myProt_jet.res.
By default, GEMME will output mutational effects predictions obtained from the global epistatic model,
the independent model, and a combination of those two using a reduced alphabet (alphabets/lw-i.11.txt):
* myProt_pred_evolEpi.txt
* myProt_normPred_evolEpi.txt
* myProt_pred_evolInd.txt
* myProt_normPred_evolInd.txt
* myProt_normPred_evolCombi.txt
The values of interest are the normalized predictions (normPred). Each file contains a 20 x n matrix,
where n is the number of positions in the query sequence.
If the user provides her/his own list of mutations, then only the global epistatic model will be run
and the output file will contain 2 columns, the first one with the mutations, the second one with the
normalized predicted effects.
## Cite
Laine E, Karami Y, Carbone A. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects. Molecular Biology and Evolution, Volume 36, Issue 11, November 2019, Pages 2604–2619