Commit 2d1df3ad by Mustafa Tekpinar

Initial Commit

*~
._*
MIT License
Copyright (c) 2018: Elodie Laine, Yasaman Karami and Alessandra Carbone.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
############################################################################################################
# #
# GEMME: a tool to predict mutational outcomes using evolutionary conservation and global epistasis #
# #
############################################################################################################
#
#
# GEMME is implemented in Python and R.
# https://www.python.org/
# https://cran.r-project.org/
#
#
##################
# Dependencies: #
##################
#
# Joint Evolutionary Trees: http://www.lcqb.upmc.fr/JET2/
# seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html
# These tools should be installed to be able to use GEMME.
#
#
##################
# Installation: #
##################
#
# Download the GEMME.tgz archive from http://www.lcqb.upmc.fr/GEMME/.
# Uncompress the archive in the directory of your choice.
# Define and export the environment variable GEMME_PATH=/path-to-GEMME-directory/
# Run the program by typing "python $GEMME_PATH/gemme.py inputAli.fasta".
# Help can be accessed by typing "python $GEMME_PATH/gemme.py --help".
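#
# The invocation above can also be assembled from Python. The sketch below is only
# an illustration: the helper name build_gemme_command and its parameters are
# hypothetical, and placing the -m option (described in the usage notes) after the
# alignment file is an assumption. It builds the argument list and runs nothing.

```python
import os
from pathlib import Path


def build_gemme_command(alignment, mutations_file=None, gemme_path=None, python="python"):
    """Return the argument list for a GEMME run (hypothetical helper, nothing is executed)."""
    # GEMME_PATH is the environment variable defined during installation.
    root = Path(gemme_path or os.environ.get("GEMME_PATH", "."))
    cmd = [python, str(root / "gemme.py"), str(alignment)]
    if mutations_file is not None:
        # -m is the option the usage notes give for a user-supplied mutation list;
        # its position after the alignment file is an assumption.
        cmd += ["-m", str(mutations_file)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gemme_command("inputAli.fasta")))
```

# The returned list can be passed to subprocess.run(cmd, check=True) once JET2 and
# the seqinr R package are installed.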
#
#
#################
# Usage notes: #
#################
#
# The inputAli.fasta is a mandatory argument that corresponds to the input multiple sequence
# alignment file, in FASTA format. The query sequence is taken as the first sequence in the alignment.
#
# By default, GEMME will predict the effect of all possible single mutations at all positions in the
# query sequence. Alternatively, a set of single or multiple mutations can be given with the option -m.
# Each line of the file should contain a mutation (e.g. D136R) or combination of mutations separated
# by commas and ordered according to their positions in the sequence (e.g. D136R,V271A).
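#
# As an illustration of the mutation file format described above, here is a minimal
# sketch of a line parser. The function name parse_mutation_line is hypothetical and
# this is not GEMME's own parser; it only checks the two stated rules (comma-separated
# mutations, ordered by position).

```python
import re

# One mutation: wild-type residue, 1-based position, mutant residue (e.g. D136R).
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_line(line):
    """Parse 'D136R' or 'D136R,V271A' into a list of (wild_type, position, mutant)."""
    mutations = []
    for token in line.strip().split(","):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"not a mutation: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    positions = [position for _, position, _ in mutations]
    if positions != sorted(positions):
        raise ValueError("mutations must be ordered by position in the sequence")
    return mutations


if __name__ == "__main__":
    print(parse_mutation_line("D136R,V271A"))
```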
#
# GEMME calls JET2 to compute evolutionary conservation levels. By default, JET2 will retrieve a set
# of sequences related to the query, independent from the input set, according to specific criteria.
# The retrieval method used in JET2 is PSI-BLAST, which can perform the search either locally (by
# default) or remotely (-r server). Alternatively, the user can provide her/his own psiblast file
# (-r input -b pFile) or her/his own multiple sequence alignment in FASTA format (-r input -f fFile).
# JET is run in its iterative mode, iJET, 10 times and the final conservation levels are the maximum
# values obtained over the 10 iterations.
# JET2 configuration file is: default.conf.
# JET2 output file is: myProt_jet.res.
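#
# The rule of keeping the highest conservation level over the iterations can be
# sketched as a per-position maximum. The function name max_over_iterations is
# hypothetical, and the input is assumed to be already parsed into one list of
# per-position values per run; the layout of myProt_jet.res is not assumed here.

```python
def max_over_iterations(runs):
    """Return, for each position, the highest value seen across all runs."""
    if not runs:
        raise ValueError("at least one run is required")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must cover the same number of positions")
    # zip(*runs) groups the values of every run position by position.
    return [max(values) for values in zip(*runs)]


if __name__ == "__main__":
    # Made-up values for two runs over three positions.
    print(max_over_iterations([[0.1, 0.9, 0.4], [0.5, 0.2, 0.4]]))
```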
#
# By default, GEMME will output mutational effect predictions obtained from the global epistatic model,
# the independent model, and a combination of those two using a reduced alphabet (alphabets/lw-i.11.txt):
# myProt_pred_evolEpi.txt
# myProt_normPred_evolEpi.txt
# myProt_pred_evolInd.txt
# myProt_normPred_evolInd.txt
# myProt_normPred_evolCombi.txt
# The values of interest are the normalized predictions (normPred). Each file contains a 20 x n matrix,
# where n is the number of positions in the query sequence.
# If the user provides her/his own list of mutations, then only the global epistatic model will be run
# and the output file will contain 2 columns, the first one with the mutations, the second one with the
# normalized predicted effects.
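#
# A two-column output of this kind could be loaded as sketched below. The function
# name read_mutation_predictions is hypothetical, and the columns are assumed to be
# separated by whitespace, which the text above does not specify; lines that do not
# fit that shape are skipped. The values in the example are made up.

```python
import io


def read_mutation_predictions(handle):
    """Map each mutation string to its normalized predicted effect."""
    predictions = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line or unexpected layout
        mutation, value = fields
        try:
            predictions[mutation] = float(value)
        except ValueError:
            continue  # e.g. a header line
    return predictions


if __name__ == "__main__":
    example = io.StringIO("D136R -1.5\nD136R,V271A -3.0\n")
    print(read_mutation_predictions(example))
```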
#
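# The reduced-alphabet files shipped in this commit list one group of amino acids
# per line. A minimal sketch of turning such a file into a residue-to-group lookup
# follows. The function name load_reduced_alphabet is hypothetical, the groups used
# in the example are copied from one of the 11-group alphabets in this commit, and
# how GEMME uses the grouping internally is not shown here.

```python
def load_reduced_alphabet(lines):
    """Map each residue letter to the index of the group (line) it belongs to."""
    residue_to_group = {}
    groups = (line.strip() for line in lines)
    for index, group in enumerate(g for g in groups if g):
        for residue in group:
            if residue in residue_to_group:
                raise ValueError(f"residue {residue!r} appears in two groups")
            residue_to_group[residue] = index
    return residue_to_group


if __name__ == "__main__":
    groups = ["P", "G", "EKRQ", "D", "SN", "T", "HC", "IV", "WYF", "A", "LM"]
    alphabet = load_reduced_alphabet(groups)
    # E and K share a group in this alphabet, D and S do not.
    print(alphabet["E"] == alphabet["K"], alphabet["D"] == alphabet["S"])
```

# With a file on disk, the same function accepts an open file handle, since it only
# iterates over lines.
#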
#
####################
# Main reference: #
####################
#
# Laine E, Karami Y, Carbone A. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects.
# Molecular Biology and Evolution, Volume 36, Issue 11, November 2019, Pages 2604–2619.
#
#
#
############################################################################################################
P
G
EKRQ
DSN
T
HC
IV
WYF
A
LM
P
G
EKRQ
D
SN
T
HC
IV
WYF
A
LM
P
G
EKRQ
D
SN
T
H
C
IV
WYF
A
LM
P
G
E
KRQ
D
SN
T
H
C
IV
WYF
A
LM
P
G
E
KRQ
D
S
N
T
H
C
IV
WYF
A
LM
P
G
E
KRQ
D
S
N
T
H
C
IV
W
YF
A
LM
P
G
E
KRQ
D
S
N
T
H
C
IV
W
YF
A
L
M
P
G
E
K
RQ
D
S
N
T
H
C
IV
W
YF
A
L
M
P
G
E
K
RQ
D
S
N
T
H
C
I
V
W
YF
A
L
M
P
G
E
K
R
Q
D
S
N
T
H
C
I
V
W
YF
A
L
M
PGEKRQDSNTHC
IVWYFALM
PG
EKRQDSNTHC
IVWYFALM
PG
EKRQDSNTHC
IVWYF
ALM
PG
EKRQ
DSNTHC
IVWYF
ALM
PG
EKRQ
DSN
THC
IVWYF
ALM
PG
EKRQ
DSN
THC
IVWYF
A
LM
P
G
EKRQ
DSN
THC
IVWYF
A
LM
P
G
EKRQ
DSN
THC
IV
WYF
A
LM
LFIMVWCY
HATGPRQSNEDK
LFI
MVWCY
HA
TGPRQSNED
K
EKQR
IV
LY
F
AM
W
HT
C
DNS
GP
AEKQR
IV
LM
F
Y
W
C
H
T
DNS
GP
AEKQR
I
V
LM
F
Y
W
C
H
T
DNS
GP
AEKQR
I
V
L
M
F
Y
W
C
H
T
DNS
GP
DEKQ
R
A
I
V
L
M
F
Y
W
C
H
T
GNPS
ACEFHIKLMQRVWY
DGNPST
\ No newline at end of file
AEHKQR
CFILMVWY
DGNPST
\ No newline at end of file
AEHKQR
CFILMVWY
DNST
GP
\ No newline at end of file
AEHKQR
FILMVWY
CST
DN
GP
\ No newline at end of file
AEKQR
FIV
LMWY
HCT
DNS
GP
EKQR
FIV
LMWY
ACH
ST
DN
GP
EKQR
IV
LWY
AM
CF
HT
DNS
GP
EKQR
IV
L
F
AMW
CY
HT
DNS
GP
G
D
N
AEFIKLMQRVW
Y
H
C
T
S
P
\ No newline at end of file
G
D
N
AEFIKLMQRV
W
Y
H
C
T
S
P
\ No newline at end of file
G
D
N
AEFIKLMQV
R
W
Y
H
C
T
S
P
\ No newline at end of file
G
D
N
AEFIKLMV
Q
R
W
Y
H
C
T
S
P
\ No newline at end of file
G
D
N
AEFIKLV
M
Q
R
W
Y
H
C
T
S
P
\ No newline at end of file
GP
DNAEFIKLVMQRWYHCTS
G
DNAEFIKLVMQRWYHCTS
P
G
ADEKNQRST
CFHILMVWY
P
G
ND
AEHKQRST
CFILMVWY
P
\ No newline at end of file
G
DN
AEFHIKLMQRVWY
CT
S
P
\ No newline at end of file
G
DN
AEFIKLMQRVWY
CH
T
S
P
\ No newline at end of file
G
D
N
AEFIKLMQRVWY
CH
T
S
P
\ No newline at end of file
G
D
N
AEFIKLMQRVWY
H
C
T
S
P
\ No newline at end of file
FWYM
CLIV
AP
NST
GH
DE
KQR
FWYM
H
LIVC
A
NST
P
G
DE
KQR
LIVFM
Y
W
C
DN
TSKEQR
A
G
P
H
\ No newline at end of file
LIVF
M
Y
W
C
DN
TSKEQ
R
A
G
P
H
\ No newline at end of file
LIV
F
M
Y
W
C
DN
TS
KEQ
R
A
G
P
H
\ No newline at end of file
LIV
F
M
Y
W
C
D
N
TS
KEQ
R
A
G
P
H
\ No newline at end of file
LIV
F
M
Y
W
C
D
N
TS
KE
Q
R
A
G
P
H
\ No newline at end of file
LIV
F
M
Y
W
C
D
N
T
S
KE
Q
R
A
G
P
H
\ No newline at end of file
LIVFMYWC
DNTSKEQRAGPH
\ No newline at end of file
LIVFMYW
C
DNTSKEQRAGPH
\ No newline at end of file
LIVFMYW
C
DNTSKEQRAGP
H
\ No newline at end of file
LIVFMY
W
C
DNTSKEQRAGP
H
\ No newline at end of file
LIVFMY
W
C
DNTSKEQRAG
P
H
\ No newline at end of file
LIVFMY
W
C
DNTSKEQRA
G
P
H
\ No newline at end of file
LIVFM
Y
W
C
DNTSKEQRA
G
P
H
\ No newline at end of file
LIVFM
Y
W
C
DNTSKEQR
A
G
P
H
\ No newline at end of file
AST
C
DEN
FY
G
H
ILMV
KQR
P
W
FWY
ML
IV
CA
TS
NH
P
G
DE
QRK
FWY
ML
IV
CA
TS
NH
P
G
D
QE
RK
FWY
ML
IV
C
A
TS
NH
P
G
D
QE
RK
FWY
ML
IV
C
A
T
S
NH
P
G
D
QE
RK
FWY
ML
IV
C
A
T
S
NH
P
G
D
QE
R
K
FWY
ML
IV
C
A
T
S
N
H
P
G
D
QE
R
K
W
FY
ML
IV
C
A
T
S
N
H
P
G
D
QE
R
K
W
FY
ML
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
W
FY
M
L
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
W
F
Y
M
L
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
CMFILVWY
AGTSNQDEHRKP
CMFILVWY
AGTSP
NQDEHRK
CMFWY
ILV
AGTS
NQDEHRKP
FWYH
MILV
CATSP
G
NQDERK
FWYH
MILV
CATS
P
G
NQDERK
FWYH
MILV
CATS
P
G
NQDE
RK
FWYH
MILV
CA
NTS
P
G
DE
QRK
FWYH
ML
IV
CA
NTS
P
G
DE
QRK
C
FYW
ML
IV
G
P
ATS
NH
QED
RK
C
FYW
ML
IV
G
P
A
TS
NH
QED
RK
C
FYW
ML
IV
G
P
A
TS
NH
QE
D
RK
C
FYW
ML
IV
G
P
A
T
S
NH
QE
D
RK
C
FYW
ML
IV
G
P
A
T
S
N
H
QE
D
RK
FWY
ML
IV
C
A
T
S
N
H
P
G
D
QE
R
K
W
FY
ML
IV
C
A
T
S
N
H
P
G
D
QE
R
K
W
FY
ML
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
W
FY
M
L
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
W
F
Y
M
L
IV
C
A
T
S
N
H
P
G
D
Q
E
R
K
CFYWMLIV
GPATSNHQEDRK
CFYWMLIV
GPATS
NHQEDRK
CFYW
MLIV
GPATS
NHQEDRK
CFYW
MLIV
G
PATS
NHQEDRK
CFYW
MLIV
G
P
ATS
NHQEDRK
CFYW
MLIV
G
P
ATS
NHQED
RK