Using ESCOTT via Docker
Requirements
You need to have Docker installed on your machine. Installation instructions are available at https://docs.docker.com/get-docker/
I am assuming some basic familiarity with Linux/Unix/MacOS terminal commands.
Let’s start our favorite terminal app.
Create an empty folder called docker-tutorial and change into it:
mkdir docker-tutorial
cd docker-tutorial
Getting the example input data
Let’s download the sample data provided in the PRESCOTT repository for this exercise. First, we will download the multiple sequence alignment (MSA) file in FASTA format:
wget http://gitlab.lcqb.upmc.fr/tekpinar/PRESCOTT/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta
If you don’t have wget, you can try the same command with curl:
curl http://gitlab.lcqb.upmc.fr/tekpinar/PRESCOTT/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta >aliBLAT.fasta
Please verify that the aliBLAT.fasta file is in the folder.
Now, we will download the PDB (Protein Data Bank) file for BLAT:
wget http://gitlab.lcqb.upmc.fr/tekpinar/PRESCOTT/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/blat-af2.pdb
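If you prefer to script the check that both files arrived intact, here is a minimal sketch. The helper name check_inputs is hypothetical and not part of PRESCOTT. It relies only on two general format facts: FASTA header lines start with '>' and PDB coordinate records start with ATOM.

```shell
# check_inputs FASTA PDB
# Hypothetical helper (not part of PRESCOTT): returns 0 if both files look usable.
check_inputs() {
  fasta="$1"
  pdb="$2"
  for f in "$fasta" "$pdb"; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  # A FASTA alignment has at least one header line starting with '>'
  if ! grep -q '^>' "$fasta"; then
    echo "no FASTA header found in $fasta" >&2
    return 1
  fi
  # A PDB structure has coordinate records starting with ATOM
  if ! grep -q '^ATOM' "$pdb"; then
    echo "no ATOM records found in $pdb" >&2
    return 1
  fi
  echo "inputs look fine: $fasta $pdb"
}
```

After defining the function in your terminal, you would call it as check_inputs aliBLAT.fasta blat-af2.pdb. A check like this can catch the case where a failed download saved an error page instead of the data.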
Single point mutation calculations
To make sure that Docker is installed, run:
sudo docker -h
If it shows you a list of options, you are on the right track. On MacOS, you may not need ‘sudo’ before the docker command at all. Now start the container, mounting the current folder into it:
sudo docker run -ti --rm --mount type=bind,source="$PWD",target=/home/tekpinar/research/myexample \
tekpinar/prescott-docker:v1.5.0
You are now inside the container (your virtual operating system). The previous command also created a folder called myexample in the container. Let’s change to that folder.
cd ../myexample/
When you list the contents of that folder with the ‘ls’ command, you should see the aliBLAT.fasta and blat-af2.pdb files. Basically, the docker-tutorial folder on the host system and the myexample folder in the docker container point to the same place.
Obtaining the entire single point mutation landscape
In this step, we will use only evolutionary information from an MSA file:
escott aliBLAT.fasta
After a few minutes of calculation, you should see at least two files named BLAT_normPred_evolCombi.txt and BLAT_normPred_evolCombi.png. These files contain the entire single point mutational landscape of the BLAT protein.
If you want to utilize structural information (highly recommended) as well as evolutionary information:
escott aliBLAT.fasta --pdbfile blat-af2.pdb
Predicting the effect of a subset of single point mutations
If you are interested in only a subset of single point mutations, you have to prepare a mut file. This is a plain text file in which each line contains one single point mutation, such as D26A…. Fortunately, there is an example mut file in the data folder of the PRESCOTT repository:
wget http://gitlab.lcqb.upmc.fr/tekpinar/PRESCOTT/raw/master/data/Stiffler_2015_BLAT_ECOLX.mut
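If you would like a shorter mut file for a quick trial run, one option is to copy the first few lines of the example file into a new one. This is only a sketch: the helper name take_first_mutations and the output file name are hypothetical, and it assumes the example file has exactly one mutation per line, as described above.

```shell
# take_first_mutations SRC N DEST
# Hypothetical helper: copy the first N lines (mutations) of SRC into DEST.
take_first_mutations() {
  src="$1"
  n="$2"
  dest="$3"
  head -n "$n" "$src" > "$dest"
  # grep -c '' counts the lines, i.e. the mutations that were kept
  printf '%s: %s mutations\n' "$dest" "$(grep -c '' "$dest")"
}
```

For instance, take_first_mutations Stiffler_2015_BLAT_ECOLX.mut 20 my_subset.mut would write a 20-line file named my_subset.mut, which can then be passed to the -m option in place of the full file.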
Similar to the previous step, there are two possible ways to do the calculations: with or without structural information. First, let’s do it without structural information:
escott aliBLAT.fasta -m Stiffler_2015_BLAT_ECOLX.mut
You can include structural information in the following way:
escott aliBLAT.fasta --pdbfile blat-af2.pdb \
-m Stiffler_2015_BLAT_ECOLX.mut
You will have a BLAT_normPred_evolCombi.txt file in your folder. However, its format is completely different from the file produced by the entire mutational landscape scan. Each line of this file contains a mutation and its predicted effect, separated by a space. In addition, you won’t get a png file as in the previous case.
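Because each line is a mutation followed by a number, the file is easy to post-process with standard tools. As a sketch, the hypothetical helper below sorts the lines by predicted value. It assumes the values are plain decimal numbers, and it makes no claim about whether low or high values mean a stronger effect.

```shell
# rank_mutations FILE [N]
# Hypothetical helper: print the N lines with the lowest predicted values
# (default 10), assuming "MUTATION VALUE" pairs separated by a space.
rank_mutations() {
  file="$1"
  n="${2:-10}"
  # -k2,2n sorts numerically on the second field;
  # LC_ALL=C makes sure '.' is treated as the decimal point
  LC_ALL=C sort -k2,2n "$file" | head -n "$n"
}
```

You would call it as rank_mutations BLAT_normPred_evolCombi.txt 5. To list the highest values instead, change -k2,2n to -k2,2nr.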
Multiple point mutation calculations
Sometimes, we need to see the effects of double or triple mutations. ESCOTT can perform these calculations if you provide a mut file. In this case, the mut file must have the following format:
E26D:Y44R
E56N:A77F:H94V
The first line of the file describes a double mutation and the second line describes a triple mutation. As you can see, the individual mutations on a line are separated by a colon (:) character. The output file has a similar format: each line contains the multiple mutation and its predicted effect, separated by a space.
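Here is a small sketch that writes the two example lines into a file and checks that every line follows the colon-separated pattern. The file name multi.mut and the helper check_mut_format are hypothetical, and the pattern (letter, position, letter) is inferred from the examples in this tutorial, not taken from ESCOTT’s own parser.

```shell
# Write the two example lines from this section into a mut file.
# The file name multi.mut is a placeholder.
cat > multi.mut <<'EOF'
E26D:Y44R
E56N:A77F:H94V
EOF

# check_mut_format FILE
# Hypothetical helper: print (with line numbers) every line that does NOT
# look like one or more colon-separated mutations. Returns 0 if all lines match.
check_mut_format() {
  # grep -v selects the non-matching lines and succeeds if it finds any
  if LC_ALL=C grep -n -v -E '^[A-Z][0-9]+[A-Z](:[A-Z][0-9]+[A-Z])*$' "$1"; then
    return 1
  fi
  return 0
}

check_mut_format multi.mut && echo "multi.mut: format looks fine"
```

The same check also accepts single point mutation files, since a line with no colon is the one-mutation case of the pattern. It tests only the shape of each line, not whether the residues match your protein sequence.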
Running several jobs using docker
If you want to use docker in a more automated way for several proteins, you can call docker within a bash script. The following command runs ESCOTT in the container without opening an interactive session:
sudo docker run --rm -v "$PWD":/home/tekpinar/research/lcqb tekpinar/prescott-docker:v1.5.0 escott aliBLAT.fasta --pdbfile blat-af2.pdb
Note: When you call docker like an executable, the aliBLAT.fasta and blat-af2.pdb files must be in the local folder from which you run the command. Typically, I create a folder for each protein that contains the alignment and the structure. Then, inside the bash script, I change to each folder with the ‘cd’ command and execute the command above there.
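The per-folder loop described in the note can be sketched as follows. This is a minimal sketch under stated assumptions, not the script I actually use: the function name run_escott_all, the DOCKER variable, and the file names alignment.fasta and structure.pdb are all hypothetical, and it assumes every protein folder uses the same two file names.

```shell
# Command used to start docker; override it if needed,
# e.g. DOCKER=docker on MacOS where sudo may be unnecessary.
DOCKER="${DOCKER:-sudo docker}"

# run_escott_all ROOT
# Hypothetical helper: run ESCOTT once in every sub-folder of ROOT.
# Assumes each sub-folder contains alignment.fasta and structure.pdb
# (placeholder names; adapt them to your own files).
run_escott_all() {
  root="$1"
  for dir in "$root"/*/; do
    # Skip the unexpanded pattern when ROOT has no sub-folders
    [ -d "$dir" ] || continue
    # The subshell keeps the cd local to this iteration
    (
      cd "$dir" || exit 1
      $DOCKER run --rm -v "$PWD":/home/tekpinar/research/lcqb \
        tekpinar/prescott-docker:v1.5.0 \
        escott alignment.fasta --pdbfile structure.pdb
    ) || echo "ESCOTT failed in $dir" >&2
  done
}
```

With a layout such as proteins/BLAT/ and proteins/OTHER/, you would run run_escott_all proteins. A failure in one folder is reported on standard error and the loop continues with the next protein.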