Using ESGEMME via Docker
Requirements
You need to have Docker installed on your machine. For installation instructions, see https://docs.docker.com/get-docker/
I am assuming some basic familiarity with terminal commands on Linux, Unix, or macOS.
Let’s open our favorite terminal app.
Create an empty folder called docker-tutorial and change into it:
mkdir docker-tutorial
cd docker-tutorial
Getting the example input data
Let’s download the sample data provided in the ESGEMME repository for this exercise. First, we will download the multiple sequence alignment (MSA) file in FASTA format:
wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta
If you don’t have wget, you can try the same command with curl:
curl -L -o aliBLAT.fasta http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta
Please verify that the aliBLAT.fasta file is now in the folder (for example, with the ls command).
Now, we will download the PDB (Protein Data Bank) file for BLAT in the same way:
wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/blat-af2.pdb
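To check both downloads at once, and to catch something like an error page saved under the expected file name, you can use a small helper such as the one below. check_inputs is a hypothetical function, not part of ESGEMME. It assumes that the alignment is a FASTA file whose records start with '>' and that the structure file contains ATOM records, and it prints the number of sequences in the alignment.

```shell
# Hypothetical helper (not part of ESGEMME): sanity-check the two input files.
check_inputs() {
  fasta="$1"
  pdb="$2"
  for f in "$fasta" "$pdb"; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$f" ]; then
      echo "Missing or empty: $f" >&2
      return 1
    fi
  done
  # A FASTA file starts with a header line beginning with '>'
  if [ "$(head -c 1 "$fasta")" != ">" ]; then
    echo "$fasta does not look like a FASTA file" >&2
    return 1
  fi
  # Atomic coordinates in a PDB file are stored in ATOM records
  if ! grep -q '^ATOM' "$pdb"; then
    echo "$pdb contains no ATOM records" >&2
    return 1
  fi
  # Each FASTA record has exactly one '>' header line, so this counts sequences
  echo "$(grep -c '^>' "$fasta") sequences in $fasta"
}
```

Run it from the docker-tutorial folder as check_inputs aliBLAT.fasta blat-af2.pdb.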
Single point mutation calculations
To make sure that Docker is installed, run:
sudo docker -h
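If it shows you a list of options, you are on the right track. On macOS, you may not need sudo before docker commands at all.
Now start the ESGEMME container, mounting your current folder (docker-tutorial) inside it. If the image is not on your machine yet, Docker downloads it first: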
sudo docker run -ti --rm --mount type=bind,source="$PWD",target=/home/tekpinar/research/myexample \
tekpinar/esgemme-docker:v1.3.0
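You are now inside the container, an isolated environment in which ESGEMME is installed. The -ti option gives you an interactive terminal, --rm removes the container when you exit, and --mount makes your docker-tutorial folder available inside the container as /home/tekpinar/research/myexample. Let’s change to that folder:
cd /home/tekpinar/research/myexample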
If you list the folder contents with the ls command, you should see the aliBLAT.fasta and blat-af2.pdb files. Your docker-tutorial folder on the host and the myexample folder in the container point to the same place, so the output files that ESGEMME writes there remain in docker-tutorial after you leave the container.
Obtaining the entire single point mutation landscape
In this step, we will use only evolutionary information from an MSA file:
esgemme aliBLAT.fasta -r input -f aliBLAT.fasta
After a few minutes of calculation, you should see at least two files named BLAT_normPred_evolCombi.txt and BLAT_normPred_evolCombi.png. Together, they contain the entire single point mutational landscape of the BLAT protein.
To use structural information (highly recommended) in addition to evolutionary information, run:
esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
--pdbfile blat-af2.pdb \
--normweightmode sstjetormax
Predicting the effect of a subset of single point mutations
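If you are interested in only a subset of single point mutations, you have to prepare a mut file. This is a plain text file in which each line contains one single point mutation, such as D26A… (the wild-type residue D at position 26 replaced by A). Fortunately, there is an example mut file in the data folder of the ESGEMME repository. If wget is not available inside the container, run the command from your docker-tutorial folder on the host instead; the file will also appear in myexample: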
wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/master/data/Stiffler_2015_BLAT_ECOLX.mut
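If you write your own mut file instead, it can save time to check its format before starting a calculation. The sketch below is a hypothetical helper (not part of ESGEMME) that assumes, based on the D26A example, that each line must be an uppercase one-letter residue code, a residue number and another uppercase one-letter code. It prints every line that does not match, with its line number. If it flags lines in a file that ESGEMME accepts, the assumed pattern is simply too strict.

```shell
# Hypothetical helper (not part of ESGEMME): report lines of a mut file that
# do not look like a single point mutation such as D26A.
check_single_mut_file() {
  mut_file="$1"
  if [ ! -r "$mut_file" ]; then
    echo "Cannot read $mut_file" >&2
    return 2
  fi
  # -v selects lines that do NOT match the pattern, -n prefixes them with
  # their line number, -E enables extended regular expressions.
  if grep -nvE '^[A-Z][0-9]+[A-Z]$' "$mut_file"; then
    echo "Unexpected format in $mut_file" >&2
    return 1
  fi
  echo "$mut_file looks fine"
}
```

For example, check_single_mut_file my_mutations.mut, where my_mutations.mut is a placeholder for your own file.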
As in the previous step, you can run the calculation with or without structural information. First, let’s do it without structural information:
esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
-m Stiffler_2015_BLAT_ECOLX.mut
You can include structural information in the following way:
esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
--pdbfile blat-af2.pdb \
--normweightmode sstjetormax \
-m Stiffler_2015_BLAT_ECOLX.mut
Either way, you will get a BLAT_normPred_evolCombi.txt file in your folder. However, its format is completely different from the file produced by the entire mutational landscape scan: each line contains a mutation and its predicted effect, separated by a space. In addition, no png file is produced in this case.
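Since each line is just a mutation and a score separated by a space, standard Unix tools are enough for a first look. The hypothetical helper below (not part of ESGEMME) sorts such a file numerically on its second column and prints the N lowest-scoring mutations, 5 by default. It assumes the file has no header line, and it only orders the scores; it makes no assumption about what low or high values mean biologically.

```shell
# Hypothetical helper (not part of ESGEMME): print the N lowest-scoring lines
# of a "mutation score" file (one space-separated pair per line).
lowest_scores() {
  file="$1"
  n="${2:-5}"
  # -k2,2g sorts on the second field only, as a general floating-point number;
  # LC_ALL=C makes '.' the decimal separator regardless of the user's locale.
  LC_ALL=C sort -k2,2g "$file" | head -n "$n"
}
```

For example, lowest_scores BLAT_normPred_evolCombi.txt 10. Using sort -k2,2gr instead lists the highest scores first.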
Multiple point mutation calculations
Sometimes, we need to see the effects of double or triple mutations. ESGEMME can predict them as well if you provide a mut file. In this case, the mut file must have the following format:
E26D:Y44R
E56N:A77F:H94V
The first line of the file describes a double mutation and the second line a triple mutation. As you can see, the individual mutations are separated by a colon (:). The output file has a similar format: each line contains the multiple mutation and its predicted effect, separated by a space.
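As an illustration, the hypothetical sketch below writes the two example lines above to a file named multi.mut (a placeholder name) with a here-document and checks that every line consists of one or more single point mutations joined by colons. The pattern is an assumption inferred from the examples, and the mutations themselves are only illustrative; replace them with the ones you want to study. The final commented command assumes that multiple mutation files are passed with the same -m option as single point mutation files.

```shell
# Write the two example lines from the text to a placeholder file multi.mut.
# These mutations are only illustrative; replace them with your own.
cat > multi.mut <<'EOF'
E26D:Y44R
E56N:A77F:H94V
EOF

# Hypothetical helper (not part of ESGEMME): report lines that are not one or
# more single point mutations (such as E26D) joined by colons.
check_multi_mut_file() {
  mut_file="$1"
  if [ ! -r "$mut_file" ]; then
    echo "Cannot read $mut_file" >&2
    return 2
  fi
  # Print (with line numbers) every line that does not match the pattern
  if grep -nvE '^[A-Z][0-9]+[A-Z](:[A-Z][0-9]+[A-Z])*$' "$mut_file"; then
    echo "Unexpected format in $mut_file" >&2
    return 1
  fi
  echo "$mut_file looks fine"
}

check_multi_mut_file multi.mut
# Assumed usage, mirroring the single point mutation run:
# esgemme aliBLAT.fasta -r input -f aliBLAT.fasta -m multi.mut
```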