Using ESGEMME via Docker

Requirements

You need to have Docker installed on your machine. Installation instructions are available at https://docs.docker.com/get-docker/

I am assuming some basic familiarity with Linux/Unix/macOS terminal commands.

Let’s start our favorite terminal app, create an empty folder called docker-tutorial, and go to that folder:

mkdir docker-tutorial
cd docker-tutorial

Getting the example input data

Let’s download the sample data provided in the ESGEMME repository for this exercise. First, we will download the multiple sequence alignment (MSA) file in FASTA format:

wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta

If you don’t have wget, you can try the same command with curl:

curl http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/aliBLAT.fasta >aliBLAT.fasta

Please verify that the aliBLAT.fasta file is in the folder.
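One quick way to verify the download is to count the sequences in the file. The sketch below is only a suggestion: the helper name count_fasta_records is made up for this tutorial, and it simply counts the header lines (those starting with '>'). A count of zero usually means that something other than the alignment was saved.

```shell
# Count the sequences in a FASTA file by counting its header lines,
# i.e. the lines that start with '>'.
# (count_fasta_records is a hypothetical helper name, not part of ESGEMME.)
count_fasta_records() {
    grep -c '^>' "$1"
}

# Run the check only if the downloaded file is present and not empty.
if [ -s aliBLAT.fasta ]; then
    echo "aliBLAT.fasta contains $(count_fasta_records aliBLAT.fasta) sequences"
else
    echo "aliBLAT.fasta is missing or empty"
fi
```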

Now, we will download the PDB (Protein Data Bank) file for BLAT:

wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936/data/blat-af2.pdb

Single point mutation calculations

First, let’s make sure that Docker is installed:

sudo docker -h

If it shows you a list of options, you are on the right track. On macOS, you may not need sudo before the docker command at all.
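If you would rather script this check, a minimal sketch could look like the one below. The helper name have_cmd is made up for this tutorial. The sketch only tests whether a docker executable is on your PATH; it does not test whether you need sudo.

```shell
# Return success if the given command can be found on PATH.
# (have_cmd is a hypothetical helper name used only in this tutorial.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
    # Print the version of the docker client that was found.
    docker --version
else
    echo "docker was not found on PATH; see https://docs.docker.com/get-docker/"
fi
```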

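Now, let’s start the container. The following command mounts the current folder (docker-tutorial) at /home/tekpinar/research/myexample inside the container:

sudo docker run -ti --rm --mount type=bind,source=$PWD,target=/home/tekpinar/research/myexample \
tekpinar/esgemme-docker:v1.3.0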

You are in the container (your virtual operating system) now. You created a folder called myexample in your container with the previous command. Let’s change to that folder.

cd ../myexample/

When you list the contents of that folder with the ls command, you should see the aliBLAT.fasta and blat-af2.pdb files. Basically, your docker-tutorial folder on the host system and the myexample folder in the docker container point to the same place.

Obtaining the entire single point mutation landscape

In this step, we will use only evolutionary information from an MSA file:

esgemme aliBLAT.fasta -r input -f aliBLAT.fasta

After a few minutes of calculation, you should see at least two files named BLAT_normPred_evolCombi.txt and BLAT_normPred_evolCombi.png. These files contain the entire single point mutational landscape of the BLAT protein.

If you want to utilize structural information (highly recommended) as well as evolutionary information:

esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
--pdbfile blat-af2.pdb \
--normweightmode sstjetormax

Predicting the effect of a subset of single point mutations

If you are interested in only a few single point mutations, you have to prepare a mut file. This is a simple text file in which each line contains a single point mutation such as D26A…. Fortunately, we have an example mut file in the data folder of the ESGEMME repository.

wget http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME/raw/master/data/Stiffler_2015_BLAT_ECOLX.mut
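If you prefer to write your own mut file, the sketch below shows one way to create it and to check its format before running ESGEMME. Everything in it is illustrative: the file name my_mutations.mut, the helper count_invalid_mutations and the three mutations are placeholders that have not been checked against the BLAT sequence. The pattern (wild-type residue, position, mutant residue, using the 20 standard one-letter amino acid codes) is my reading of the D26A example, not an official specification.

```shell
# Write a small mut file with one mutation per line.
# (File name and mutations are placeholders for illustration only.)
cat > my_mutations.mut <<'EOF'
D26A
E56N
H94V
EOF

# Count the lines that do NOT look like <wild type><position><mutant>.
# Assumption: residues are the 20 standard one-letter amino acid codes.
count_invalid_mutations() {
    aa='[ACDEFGHIKLMNPQRSTVWY]'
    # grep exits with status 1 when the count is 0, hence the '|| true'.
    grep -c -v -E "^${aa}[0-9]+${aa}\$" "$1" || true
}

echo "Lines with an unexpected format: $(count_invalid_mutations my_mutations.mut)"
```

You could then pass your own file to the -m option in place of Stiffler_2015_BLAT_ECOLX.mut.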

Similar to the previous step, there are two possible ways to do the calculations: with or without structural information. First, let’s do it without structural information:

esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
-m Stiffler_2015_BLAT_ECOLX.mut

You can include structural information in the following way:

esgemme aliBLAT.fasta -r input -f aliBLAT.fasta \
--pdbfile blat-af2.pdb \
--normweightmode sstjetormax \
-m Stiffler_2015_BLAT_ECOLX.mut

You will have a BLAT_normPred_evolCombi.txt file in your folder. However, its format is completely different from that of the file produced when scanning the entire mutational landscape. Each line of this file contains a mutation and its predicted effect, separated by a space. In addition, you won’t have a png file like in the previous case.
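Since each line is just a mutation and a number, standard command line tools are enough to explore the results. As a minimal sketch, the hypothetical helper lowest_effects below prints the lines with the smallest predicted values. It assumes that the file has no header line and that the values are plain decimal numbers. How to interpret small or large values is not covered here.

```shell
# Print the N lines with the smallest predicted effect (second column).
# Assumptions: no header line, columns separated by a single space,
# values written as plain decimal numbers.
# (lowest_effects is a hypothetical helper name, not part of ESGEMME.)
lowest_effects() {
    LC_ALL=C sort -t ' ' -k2,2n "$1" | head -n "${2:-10}"
}

# Run it only if the result file exists in the current folder.
if [ -f BLAT_normPred_evolCombi.txt ]; then
    lowest_effects BLAT_normPred_evolCombi.txt 10
fi
```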

Multiple point mutation calculations

Sometimes, we need to see the effects of double or triple mutations. ESGEMME can perform these calculations if you provide a mut file. In this case, the mut file must have the following format:

E26D:Y44R
E56N:A77F:H94V

The first line of the text file describes a double mutation and the second line describes a triple mutation. As you can see, the individual mutations are separated by a colon (:) character. The output file will be in a similar format: each line will contain the multiple mutation and its predicted effect, separated by a space.
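Before starting a long calculation, it can help to check how many mutations each line of such a file contains. The sketch below writes the two example lines from above into a file and counts the colon-separated entries with awk. The file name multi_example.mut and the helper count_mutations_per_line are made up for this illustration.

```shell
# Write the two example lines from the tutorial into a mut file.
# (multi_example.mut is a placeholder file name.)
cat > multi_example.mut <<'EOF'
E26D:Y44R
E56N:A77F:H94V
EOF

# With ':' as the field separator, awk's NF variable is the number of
# mutations on each line. Print the line followed by that number.
count_mutations_per_line() {
    awk -F: '{ print $0, NF }' "$1"
}

count_mutations_per_line multi_example.mut
```

For the two example lines, this reports 2 and 3 mutations, respectively.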

Running several jobs using docker