Analyzing and Modifying the ESCOTT Output
Raw ESCOTT Scores and Their Interpretation
There is no hard-coded limit for raw ESCOTT scores. However, the values
generally fall in the range [-12, 2]. Lower values mean that the mutation is impactful,
while values close to 0 mean that the mutation does not have any significant impact.
We should note that most of the 'impactful' mutations are deleterious, but
this is not always the case.
Entire Single Point Mutation Landscape Calculations
By default, ESCOTT outputs only the combined (independent
and epistatic) scores*.
Assuming that your fasta sequence has the name 'myProt' after the '>' character, there will be three output files:
#. myProt_normPred_evolCombi.txt
myProt is the short name of your protein in the MSA file. The
'myProt_normPred_evolCombi.txt' file contains 20 rows (one for each of the 20 amino acids,
in alphabetical order) and L columns, where L is the number of amino acids
in your protein of interest. Since this file is horizontal, it is easy to
read in R or Python, but it is difficult to find the mutations you are interested in.
#. myProt_normPred_evolCombiTransposedRanksorted.csv
As the name implies, this is the transposed and reverse rank-sorted version of the combined results.
It is easier to find the mutations you are interested in with this file. It can be opened with any spreadsheet
program like MS Excel. Each row is a position (the original amino acid) in the protein, and the 20 columns contain
the effects of mutating that amino acid. The values are between 0 and 1: 0 indicates no effect, while 1 indicates a
high impact.
#. myProt_normPred_evolCombi.png
This is the image file of the combined results. It uses the 'turbo_r'
matplotlib color map by default. You can select a different one by adding the
'--colormap' argument, followed by the color map name, to the escott call. The --colormap argument accepts
all the color maps in matplotlib.
If your query sequence is longer than 500 amino acids, the program may produce
multiple png files, each one containing a 500 residue segment.
Note*: If you want to see the epistatic and independent contributions as well,
you should add the '--verbose true' argument when calling escott.
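To pull a single value out of the matrix file from the command line, a small awk helper may be enough.
The sketch below is only illustrative. It assumes that the file is whitespace-separated, that it has no
header line and no row labels, and that the 20 rows follow the one-letter alphabetical order
ACDEFGHIKLMNPQRSTVWY. Please check these assumptions against your own output file. The function
name lookup_score is hypothetical and is not part of ESCOTT.

.. code:: bash

   # lookup_score FILE POSITION AMINO_ACID
   # Prints the score of mutating residue POSITION (1-based) to AMINO_ACID.
   # Assumed layout: 20 rows (A, C, D, ..., Y) and L whitespace-separated columns.
   lookup_score() {
       awk -v pos="$2" -v aa="$3" '
           # Row number = rank of the amino acid in the assumed alphabetical order.
           BEGIN { row = index("ACDEFGHIKLMNPQRSTVWY", aa) }
           NR == row { print $pos; exit }
       ' "$1"
   }

   # Example call (file name as described above):
   # lookup_score myProt_normPred_evolCombi.txt 26 A

If your file turns out to have a header line or a label column, the row or column index has to be shifted by one.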
Selected Single Point or Multiple Point Mutation Calculations
Since you used a mutation file to select the mutations, your combined (epistatic+independent)
results are written with one mutation per line, followed by its predicted score, such as:

* myProt_normPred_evolCombi.txt::

     A2C -6.23
     A2D -1.23
     D286F -0.23
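
Because each line holds a mutation and a score, standard command line tools are enough to rank the results.
The following is a minimal sketch that rebuilds the three example lines in a sample file, sorts them so that
the lowest (most impactful) score comes first, and then filters them by a cutoff. The file name
sample_evolCombi.txt and the cutoff of -2 are arbitrary choices for illustration, not recommended values.

.. code:: bash

   # Build a small sample file from the three example lines.
   printf 'D286F -0.23\nA2C -6.23\nA2D -1.23\n' > sample_evolCombi.txt

   # Sort numerically on the second column: lowest score first.
   # LC_ALL=C makes sure that '.' is treated as the decimal point.
   LC_ALL=C sort -k2,2n sample_evolCombi.txt

   # Keep only the mutations scoring below the chosen cutoff.
   awk -v cutoff=-2 '$2 < cutoff' sample_evolCombi.txt
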
Using ESCOTT via Docker
You need to have docker installed on your machine. You can consult the
following page for this:
I am assuming some basic familiarity with the Linux/Unix/MacOS terminal.
Let’s start our favorite terminal app.
You must create a folder called docker-tutorial and go to that empty
folder:
.. code:: bash
mkdir docker-tutorial
cd docker-tutorial
Getting the example input data
Let’s download the sample data provided in the PRESCOTT repository for
this exercise. First, we will download the multiple sequence alignment
file in fasta format:
.. code:: bash
If you don’t have wget, you can try the same command with curl:
.. code:: bash
curl >aliBLAT.fasta
Please verify that the aliBLAT.fasta file is in the folder.
Now, we will download the PDB (Protein Databank) file for BLAT:
.. code:: bash
Single point mutation calculations
To make sure that docker is installed:
.. code:: bash
sudo docker -h
If it shows you a list of options, you are on the right track. On MacOS,
you may not need the word ‘sudo’ before the docker command at all.
.. code:: bash
sudo docker run -ti --rm --mount type=bind,source=$PWD,target=/home/tekpinar/research/myexample \
You are in the container (your virtual operating system) now. You
created a folder called myexample in your container with the previous
command. Let’s change to that folder.
.. code:: bash
cd ../myexample/
When you check the data in that folder with the ‘ls’ command, you
should see the aliBLAT.fasta and blat-af2.pdb files. Basically, your
docker-tutorial folder on the host system and the myexample folder in the
docker container point to the same place.
Obtaining the entire single point mutation landscape
In this step, we will use only evolutionary information from an MSA file:
.. code:: bash
escott aliBLAT.fasta
After a few minutes of calculation, you should see at least two files named
BLAT_normPred_evolCombi.txt and BLAT_normPred_evolCombi.png. You have
the entire single point mutational landscape of the BLAT protein in these
files.
If you want to use structural information (highly recommended) as well as
evolutionary information:
.. code:: bash
escott aliBLAT.fasta --pdbfile blat-af2.pdb
Predicting the effect of a subset of single point mutations
If you are interested in only a handful of single point mutations,
you have to prepare a mut file. It is a simple text file in which
each line contains one single point mutation, such as D26A.
Fortunately, we have an example mut file in the data folder of the PRESCOTT repository.
.. code:: bash
Similar to the previous step, there are two possible ways to do the calculations: with or without
structural information. First, let's do it without structural information:
.. code:: bash
escott aliBLAT.fasta -m Stiffler_2015_BLAT_ECOLX.mut
You can include structural information in the following way:
.. code:: bash
escott aliBLAT.fasta --pdbfile blat-af2.pdb \
-m Stiffler_2015_BLAT_ECOLX.mut
You will have a BLAT_normPred_evolCombi.txt file in your folder. However, the output
format is completely different from that of the entire mutational landscape scanning file.
Each line of this file is a mutation and its predicted effect, separated by a space.
In addition, you won't have a png file as in the previous case.
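
If you prefer to write your own mut file, a few lines of shell can generate it. The sketch below lists all
19 substitutions at one position. It assumes the notation seen in D26A (original amino acid, position,
new amino acid) and assumes that you already know the original amino acid at that position. The function
name gen_site_mutations and the file name D26_scan.mut are hypothetical.

.. code:: bash

   # gen_site_mutations WILDTYPE POSITION
   # Prints one mutation per line, skipping the wild-type amino acid itself.
   gen_site_mutations() {
       local wt="$1" pos="$2" aa
       for aa in A C D E F G H I K L M N P Q R S T V W Y; do
           [ "$aa" = "$wt" ] || printf '%s%s%s\n' "$wt" "$pos" "$aa"
       done
   }

   # Write the 19 possible substitutions of D26 to a mut file.
   gen_site_mutations D 26 > D26_scan.mut

The resulting file can then be passed to escott with the -m argument, as in the commands above.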
Multiple point mutation calculations
Sometimes, we need to see the effects of double or triple mutations. ESCOTT can
perform these calculations if you provide a mut file. In this case, the mut file must
have the following format:
.. code:: bash
The first line of the text file is a double mutation and the second
line is a triple mutation. As you can see, the mutations are
separated by a colon (:) character.
The output file will be in a similar format. Each line will contain the multiple
mutation and its predicted effect, separated by a space.
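
Before starting a long calculation, it can be worth checking that every line of the mut file is well-formed.
The following is a minimal sketch, assuming that each mutation is written as in D26A and that multiple
mutations are joined with a colon. The mutations in example_multi.mut are placeholders and do not belong to
a real data set. The function name validate_mut_file is hypothetical.

.. code:: bash

   # Placeholder mut file: one double and one triple mutation.
   printf 'D26A:A2C\nD26A:A2C:D286F\n' > example_multi.mut

   # validate_mut_file FILE
   # Prints the malformed lines (with line numbers) and returns 1 if there are any.
   validate_mut_file() {
       [ -r "$1" ] || { echo "cannot read $1"; return 2; }
       # grep -v selects the lines that do NOT match the expected pattern.
       if LC_ALL=C grep -Evn '^[A-Z][0-9]+[A-Z](:[A-Z][0-9]+[A-Z])*$' "$1"; then
           return 1
       fi
       return 0
   }

   validate_mut_file example_multi.mut && echo "example_multi.mut looks well-formed"

This check only looks at the syntax. It does not verify that the original amino acids match your query sequence.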
Running several jobs using docker
If you want to use docker in a more automated way for several proteins,
you can call docker within a bash script.
.. code:: bash
sudo docker run --rm -v $PWD:/home/tekpinar/research/lcqb tekpinar/prescott-docker:v1.5.0 escott aliBLAT.fasta --pdbfile blat-af2.pdb
Note: It is very important to have the aliBLAT.fasta and blat-af2.pdb files in your local folder when you call docker like an executable.
Typically, I create a folder for each protein that contains the alignment and the structure. Then, I change the path to each folder with the 'cd'
command inside the bash script and execute the command above in each local folder.
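
The folder-per-protein approach described above can be sketched as a small bash function. The docker
command, the image name and the mount target are copied from the command above. The folder layout is an
assumption made only for this example: each protein NAME has its own folder containing NAME.fasta and
NAME.pdb. The function name run_all is hypothetical.

.. code:: bash

   # run_all ROOT [RUNNER]
   # Calls the ESCOTT container once in every sub-folder of ROOT.
   # Assumed layout (hypothetical): ROOT/NAME/NAME.fasta and ROOT/NAME/NAME.pdb
   # RUNNER defaults to "sudo docker"; pass "echo" to only print the commands.
   run_all() {
       local root="$1" runner="${2:-sudo docker}" dir name
       for dir in "$root"/*/; do
           name=$(basename "$dir")
           # Skip the folders that do not contain both input files.
           [ -f "${dir}${name}.fasta" ] && [ -f "${dir}${name}.pdb" ] || continue
           # Run in a subshell so that the 'cd' does not affect the next iteration.
           ( cd "$dir" && $runner run --rm -v "$PWD":/home/tekpinar/research/lcqb \
               tekpinar/prescott-docker:v1.5.0 \
               escott "$name.fasta" --pdbfile "$name.pdb" )
       done
   }

   # Dry run:  run_all /path/to/proteins echo
   # Real run: run_all /path/to/proteins

The jobs run one after another. A failed job does not stop the loop, so it is a good idea to check the output files of each folder afterwards.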
Welcome to PRESCOTT documentation!
.. toctree::
:maxdepth: 2
:caption: Contents:
Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Preparing Your Own Input
Preparing Your Input MSA and PDB with Colabfold
You have a fasta file for your protein of interest and you want to understand
the impact of (certain) mutations.
Before starting, please make sure that your fasta file does not contain any gaps.
The quickest method to obtain both multiple sequence alignment and a protein
structure is to use Colabfold. Let's do this step by step:
1. Let's go to the Colabfold web site:
Sign in using your gmail account.
2. Click on the 'Connect' button on the top right hand side.
3. Clear the 'query_sequence' box and paste your sequence into it.
I selected adenylate kinase (AKE) as my example fasta sequence.
4. Change the 'jobname' to something that makes more sense to you.
5. Go to the menu bar of your 'AlphaFold2.ipynb' notebook, where 'File, Edit,
View, Insert, Runtime, Tools, Help' are listed. Click on the Runtime and
select 'Run all'.
6. This process may take from a few minutes to a few hours depending on your
protein size. It will give you an a3m file and up to 5 PDB models. Put these
files in a clean folder and change the directory to that folder in your
terminal.
7. Unfortunately, the a3m file is not in fasta format and it contains gap columns.
We have to clean those gaps. We can do that with a GUI program like Ugene
or Jalview. However, it is a labor-intensive procedure. Here, I will use a
small tool that I developed and added to the PRESCOTT docker image that I created.
8. Start the docker image with the following command:
.. code:: bash
sudo docker run -ti --rm --mount type=bind,source=$PWD,target=/home/tekpinar/research/myexample \
9. Now, change the directory to the myexample folder.
.. code:: bash
cd ../myexample/
ls -l
We should see our a3m and pdb files in this folder.
10. Let's use a small script from hhsuite to convert the a3m file to fasta format.
.. code:: bash
a3m fas AKE.a3m AKE.fasta
11. Now, remove the gap columns:
.. code:: bash
demust removegaps -i AKE.fasta -o AKE_nogaps.fasta
There is one last step to reach our goal. The ID and description parts of the
header lines in the a3m and fasta files are too long. We have to shorten them. We can do that with awk:
.. code:: bash
# Keep only the first word of each header line. Header lines already start with '>', so no extra '>' is printed.
awk '/^>/ {print $1; next} {print}' AKE_nogaps.fasta > AKE_nogaps_short_names.fasta
Congratulations! Now, you have all the input files required for PRESCOTT:
#. An input MSA: AKE_nogaps_short_names.fasta
#. An input PDB: myprotein.pdb
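
Before running the calculation, a quick sanity check of the final alignment can save time. The sketch below
assumes that the first record of the fasta file is the query sequence and that, after the gap removal step,
every sequence in the alignment has the same length as the query. The function name check_msa is
hypothetical and is not part of PRESCOTT.

.. code:: bash

   # check_msa FILE
   # Returns 1 if the first (query) sequence contains gaps or if any other
   # sequence has a different length than the query.
   check_msa() {
       awk '
           function check_record() {
               if (n == 0) return                  # no record read yet
               if (n == 1) {                       # first record: the query
                   qlen = length(seq)
                   if (seq ~ /-/) { print "query sequence contains gaps"; bad = 1 }
               } else if (length(seq) != qlen) {
                   print "record " n " has length " length(seq) ", expected " qlen
                   bad = 1
               }
           }
           /^>/ { check_record(); n++; seq = ""; next }   # header: close previous record
           { seq = seq $0 }                               # sequence may span several lines
           END { check_record(); exit (bad ? 1 : 0) }
       ' "$1"
   }

   # Example call:
   # check_msa AKE_nogaps_short_names.fasta && echo "alignment looks consistent"
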
PRESCOTT is implemented in Python 3 and R. It has been tested only on
Linux. Since PRESCOTT has many dependencies, we recommend using our web
site or our docker image. If you are a determined user, here are the
steps required to install it from source.
Installing the dependencies:
PRESCOTT has the following external dependencies:
* Joint Evolutionary Trees: and its dependencies:
* java
* naccess:
After you have installed JET2, define an environment variable called JET2_PATH inside your .profile file.
You can open .profile as follows:
.. code:: bash
gedit ~/.profile
You should add a line like the one below to the end of that file, then save and exit.
.. code:: bash
export JET2_PATH=/home/tekpinar/JET2/
Please do not forget to replace /home/tekpinar/JET2 with your own file path.
Then, source the saved .profile so that the environment variable is taken into account:
.. code:: bash
source ~/.profile
JET2 is essential and it should be installed to be able to use PRESCOTT.
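
To confirm that the variable is visible in your current shell, a short check like the following can be used.
This is only a sketch: it verifies that JET2_PATH is set and points to an existing folder, not that JET2
itself works. The function name check_jet2_path is hypothetical.

.. code:: bash

   # check_jet2_path
   # Returns 0 if JET2_PATH is set and points to an existing folder.
   check_jet2_path() {
       if [ -z "${JET2_PATH:-}" ]; then
           echo "JET2_PATH is not set"
           return 1
       fi
       if [ ! -d "$JET2_PATH" ]; then
           echo "JET2_PATH does not point to a folder: $JET2_PATH"
           return 1
       fi
       echo "JET2_PATH is set to $JET2_PATH"
   }

   check_jet2_path || echo "Please fix JET2_PATH in your .profile and source it again."
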
Preparation of the environment and installation of PRESCOTT
Step by step installation on Ubuntu 22.04
Prepare your environment and install the required packages:
.. code:: bash
sudo apt-get update --fix-missing && \
sudo apt-get install -y --no-install-recommends apt-utils && \
sudo apt-get install -y software-properties-common && \
sudo apt-get install -y autotools-dev && \
sudo apt-get install -y automake && \
sudo apt-get install -y build-essential && \
sudo apt-get install -y python3-dev && \
sudo apt-get install -y python3-pip && \
sudo apt-get install -y r-base r-base-core && \
sudo apt-get install -y muscle && \
sudo apt-get install -y default-jre && \
sudo apt-get install -y ncbi-blast+ && \
sudo apt-get install -y nano && \
sudo apt-get install -y less && \
sudo apt-get install -y wget && \
sudo apt-get install -y csh && \
sudo apt-get install -y hmmer && \
sudo apt-get install -y libboost-all-dev && \
sudo apt-get clean && \
sudo rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#Dssp installation
If you are using Ubuntu 20.04, you can install dssp with the following command:
.. code:: bash
sudo apt-get install dssp
Otherwise, you can install it from source with the following commands.
Please note that the default dssp package in Ubuntu 22.04 does not work properly.
.. code:: bash
wget && \
unzip -o && cd dssp-master/ && \
./ && \
./configure && \
make && \
sudo make install && \
sudo ln -s /usr/local/bin/mkdssp /usr/local/bin/dssp && \
cd ../ && \
sudo rm -rf dssp-master/ && \
sudo rm -f
#HHSUITE installation
.. code:: bash
wget && \
mkdir hhsuite && \
mv hhsuite-3.3.0-AVX2-Linux.tar.gz hhsuite/ && \
cd hhsuite && \
tar xvfz hhsuite-3.3.0-AVX2-Linux.tar.gz && \
rm -f hhsuite-3.3.0-AVX2-Linux.tar.gz
#Add it to your path permanently inside .bashrc or .profile or .bash_profile
Check the location of the hhsuite folder and add it to your path.
In my case, it was in the /home/tekpinar/research/lcqb folder. Therefore, I added the following line
to my .profile file.
Open the .profile file with gedit:
.. code:: bash
gedit ~/.profile
Now, add the following line to the end of the file.
.. code:: bash
Of course, your path will not be /home/tekpinar/research/lcqb/, so you have to modify the path according to
your system. Save the file and exit. Then,
.. code:: bash
source ~/.profile
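
At this point, it can be useful to verify that the programs installed so far are found on your PATH.
The following is a minimal sketch. The list of program names is an assumption based on the packages
installed above (java, muscle, blastp from ncbi-blast+, hmmbuild from hmmer, Rscript, dssp and hhblits
from hhsuite), so adapt it to your own system. The function name check_tools is hypothetical.

.. code:: bash

   # check_tools NAME...
   # Reports whether each program is on the PATH; returns the number of missing ones.
   check_tools() {
       local tool missing=0
       for tool in "$@"; do
           if command -v "$tool" >/dev/null 2>&1; then
               echo "found:   $tool"
           else
               echo "missing: $tool"
               missing=$((missing + 1))
           fi
       done
       return "$missing"
   }

   check_tools java muscle blastp hmmbuild Rscript dssp hhblits \
       || echo "Some programs are missing from your PATH."
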
#Download PRESCOTT from the repository and go inside the PRESCOTT folder.
You can download the master version from the command line as follows:
.. code:: bash
git clone
If you would like the development version:
.. code:: bash
git clone -b development
.. code:: bash
Configuring default.conf file
Inside the PRESCOTT/esgemme folder, there is an important file called default.conf.
This file contains essential parameters of PRESCOTT, such as the paths of
external programs, default internal parameters, etc. You have to correct the Software section of this
file according to your system.
.. code:: bash
pip3 install -e . &&\
cd ../
#Installing the required R packages
.. code:: bash
sudo Rscript -e 'install.packages("seqinr", repos="", dependencies=TRUE)'
#Installing secondary programs such as ev_couplings to obtain MSA files.
.. code:: bash
wget && \
unzip -o && \
cd plmc-master && \
make all-openmp32 && \
sudo cp bin/plmc /usr/local/bin/ && \
cd ../ && \
rm -rf plmc-master
