Commit f5de1a26 by Mustafa Tekpinar

Some more documentation and Dockerfile

parent 2403b757
FROM ubuntu:20.04
LABEL maintainer="Mustafa Tekpinar <tekpinar@buffalo.edu>"
LABEL description="ESGEMME: Evolutionary and Structural Global Epistatic Model for Predicting Mutational Effects."
WORKDIR /home/tekpinar/research/lcqb
ENV JET2_PATH=/home/tekpinar/research/lcqb/JET2
ENV ESGEMME_PATH=/home/tekpinar/research/lcqb/ESGEMME
COPY ./JET2/ ./JET2/
COPY ./naccess2.1.1 ./naccess2.1.1
COPY ./ESGEMME/esgemme.py ./ESGEMME/esgemme.py
COPY ./ESGEMME/pred.R ./ESGEMME/pred.R
COPY ./ESGEMME/computePred.R ./ESGEMME/computePred.R
COPY ./ESGEMME/default.conf ./ESGEMME/default.conf
COPY ./ESGEMME/data/ ./ESGEMME/data/
###################################################################
RUN apt-get update --fix-missing && \
apt-get install -y --no-install-recommends apt-utils &&\
apt-get install -y software-properties-common && \
apt-get install -y build-essential && \
apt-get install -y python3-dev && \
apt-get install -y python3-pip && \
apt-get install -y r-base r-base-core && \
apt-get install -y muscle && \
apt-get install -y dssp && \
apt-get install -y default-jre && \
apt-get install -y csh && \
apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
###################################################################
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 install --no-cache-dir prody matplotlib scipy pandas biotite
RUN Rscript -e 'install.packages("seqinr", repos="http://cran.us.r-project.org", dependencies=TRUE)'
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
# ESGEMME: Evolutionary and Structural Global Epistatic Model for Mutational Effects
## Introduction
ESGEMME is a program that predicts the mutational effects of a protein from evolutionary and structural information.
It can calculate the effects of single point mutations and multiple point mutations.
We recommend using ESGEMME via our web site or our Docker image.
## Input Data Requirements
ESGEMME requires two files:
* a multiple sequence alignment (MSA) file in fasta format (mandatory):
...@@ -26,13 +18,26 @@ ESGEMME requires two files:
* a structure file in PDB format (optional but recommended)
One of the fastest ways to obtain both an input MSA and a PDB file is to run ColabFold:
https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
Please note that the MSA file produced by ColabFold (an a3m file) can contain gaps in the query sequence. You have to remove them before using the file with ESGEMME.
For testing purposes, you can find example input files for the BLAT protein in the data/ folder of this repository.
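The gap removal described above could be sketched in Python as follows. This is only an illustration and not part of ESGEMME: the function names `parse_fasta` and `drop_query_gap_columns` are hypothetical, the first record is assumed to be the query, lowercase letters are treated as a3m insertions and discarded, and lines starting with `#` are assumed to be comments.

```python
def parse_fasta(text):
    """Parse FASTA/a3m text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' lines carry no sequence data.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_query_gap_columns(records):
    """Remove every alignment column in which the query (first record) has a gap."""
    # a3m convention: lowercase letters are insertions, not aligned columns.
    aligned = [(h, "".join(c for c in s if not c.islower())) for h, s in records]
    if len({len(s) for _, s in aligned}) != 1:
        raise ValueError("sequences have different aligned lengths")
    query = aligned[0][1]
    keep = [i for i, c in enumerate(query) if c not in "-."]
    return [(h, "".join(s[i] for i in keep)) for h, s in aligned]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Reading the a3m file, passing the records through `drop_query_gap_columns`, and writing the result with `to_fasta` would then give a gap-free query. Check the result against your own alignment before relying on it.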
## Usage
### Running the program
Let's assume that our input MSA is inputAli.fasta and input.pdb is our structure file in PDB format.
Run the program by issuing the following command in a bash terminal:
```bash
python $ESGEMME_PATH/esgemme.py inputAli.fasta --pdbfile input.pdb
```
A quick help message can be displayed by typing:
```bash
python $ESGEMME_PATH/esgemme.py --help
```
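If you prefer to launch ESGEMME from a Python script, the command line shown above could be assembled as in the sketch below. The helper `build_esgemme_command` is hypothetical. It only uses the script name, the positional MSA argument, and the `--pdbfile` option that appear in this README, and it assumes `ESGEMME_PATH` is set when no path is passed explicitly.

```python
import os
import subprocess


def build_esgemme_command(msa_file, pdb_file=None, esgemme_path=None):
    """Return the argument list for running esgemme.py on an MSA (and optional PDB)."""
    # Fall back to the ESGEMME_PATH environment variable if no path is given.
    root = esgemme_path if esgemme_path is not None else os.environ["ESGEMME_PATH"]
    cmd = ["python", os.path.join(root, "esgemme.py"), msa_file]
    if pdb_file is not None:
        cmd += ["--pdbfile", pdb_file]
    return cmd


def run_esgemme(msa_file, pdb_file=None, esgemme_path=None):
    """Run ESGEMME and raise CalledProcessError if it exits with a non-zero status."""
    cmd = build_esgemme_command(msa_file, pdb_file, esgemme_path)
    return subprocess.run(cmd, check=True)
```

Calling `run_esgemme("inputAli.fasta", "input.pdb")` would be equivalent to the bash command above.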
By default, GEMME will predict the effect of all possible single mutations at all positions in the
query sequence. Alternatively, a set of single or multiple mutations can be given with the option -m.
...@@ -48,20 +53,43 @@ JET is run in its iterative mode, iJET, 10 times and the final conservation leve
values obtained over the 10 iterations.
JET2 configuration file is: default.conf.
JET2 output file is: myProt_jet.res.
### Analyzing the ESGEMME output
By default, ESGEMME outputs mutational effect predictions obtained from the global epistatic model, the independent model, and a combination of those two using a reduced alphabet (alphabets/lw-i.11.txt), in the following files:
* myProt_pred_evolEpi.txt
* myProt_normPred_evolEpi.txt
* myProt_pred_evolInd.txt
* myProt_normPred_evolInd.txt
* myProt_normPred_evolCombi.txt
The most important output file is **myProt_normPred_evolCombi.txt**.
The values of interest are the normalized predictions (normPred). Each file contains a 20 x n matrix,
where n is the number of positions in the query sequence.
If the user provides her/his own list of mutations, then only the global epistatic model will be run
and the output file will contain 2 columns, the first one with the mutations, the second one with the
normalized predicted effects.
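The 20 x n matrix described above could be loaded with a short Python helper such as the sketch below. The on-disk layout is an assumption, not something this README specifies: the parser expects a whitespace-delimited table with one header line, followed by one row per amino acid whose first field is the amino-acid letter, and it maps `NA` entries to NaN. The function names and the `G1C`-style mutation notation (wild type, 1-based position, mutant) are hypothetical.

```python
import math


def read_prediction_matrix(text):
    """Parse a prediction table into {amino_acid: [value per position]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    matrix = {}
    # Assumption: the first non-empty line is a header and is skipped.
    for line in lines[1:]:
        fields = line.replace('"', "").split()
        label, values = fields[0], fields[1:]
        matrix[label.upper()] = [
            math.nan if v == "NA" else float(v) for v in values
        ]
    if len({len(v) for v in matrix.values()}) > 1:
        raise ValueError("rows have different numbers of positions")
    return matrix


def predicted_effect(matrix, mutation):
    """Look up one mutation written as <wild type><1-based position><mutant>."""
    position = int(mutation[1:-1])
    mutant = mutation[-1].upper()
    return matrix[mutant][position - 1]
```

If your output files use a different layout, adjust the parsing accordingly. The lookup logic only depends on the matrix having one row per amino acid and one column per query position.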
## Installation
ESGEMME is implemented in Python 3 and R. It has been tested only on Linux. Since ESGEMME has many dependencies, we recommend using our web site or our Docker image. If you are a determined user, here are the steps required to install it from source.
### Installation from the source:
#### Getting the source code and preparing the environment:
Download the ESGEMME source code from http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME.
Define and export the environment variable ESGEMME_PATH=/path-to-ESGEMME-directory/
```bash
export ESGEMME_PATH=/path-to-ESGEMME-directory/
```
#### Installing the dependencies:
ESGEMME has the following external dependencies:
* Joint Evolutionary Trees: http://www.lcqb.upmc.fr/JET2/
* seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html
* naccess: http://www.bioinf.manchester.ac.uk/naccess/
These tools must be installed before ESGEMME can be used.
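Before a first run, it can help to verify that the environment is prepared. The sketch below is a hypothetical helper, not part of ESGEMME. The variable names `ESGEMME_PATH` and `JET2_PATH` come from the Dockerfile in this commit, while the executable names are assumptions derived from the packages that Dockerfile installs (r-base, default-jre, muscle, csh). naccess and dssp are not checked because their executable names are not stated here.

```python
import os
import shutil

# Environment variables set in the Dockerfile of this commit.
REQUIRED_ENV_VARS = ("ESGEMME_PATH", "JET2_PATH")
# Assumed executable names for the packages installed in the Dockerfile.
REQUIRED_EXECUTABLES = ("Rscript", "java", "muscle", "csh")


def missing_requirements(env=None, which=shutil.which):
    """Return a list describing every unset variable or executable not on PATH."""
    env = os.environ if env is None else env
    missing = [
        f"environment variable {name}"
        for name in REQUIRED_ENV_VARS
        if not env.get(name)
    ]
    missing += [
        f"executable {exe}"
        for exe in REQUIRED_EXECUTABLES
        if which(exe) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All checked requirements were found.")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.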
# Cite
Laine E, Karami Y, Carbone A. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects. Molecular Biology and Evolution, Volume 36, Issue 11, November 2019, Pages 2604–2619