Analyzing and Modifying the ESGEMME Output

Raw ESGEMME Scores and Their Interpretation

There is no hard-coded limit for raw ESGEMME scores; however, the values generally fall in the range [-12, 2]. Lower values mean that the mutation is impactful, while values close to 0 mean that the mutation has no significant impact.

Note that most ‘impactful’ mutations are deleterious, but this is not always the case.
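Because lower raw scores mean a larger predicted impact, ranking a set of mutations amounts to an ascending sort on the score. The Python sketch below assumes you already have scores in a dict keyed by mutation name; `rank_by_impact` and `outside_typical_range` are hypothetical helpers, and the [-12, 2] bounds are only the typical range, not a limit enforced by ESGEMME.

```python
from typing import Dict, List, Tuple


def rank_by_impact(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (mutation, score) pairs sorted from most to least impactful.

    Lower (more negative) raw ESGEMME scores indicate a larger predicted
    impact, so an ascending sort puts the strongest effects first.
    """
    return sorted(scores.items(), key=lambda item: item[1])


def outside_typical_range(score: float, low: float = -12.0, high: float = 2.0) -> bool:
    """Flag scores outside the typical [-12, 2] range (not a hard limit)."""
    return score < low or score > high
```

For example, `rank_by_impact({"A2D": -1.23, "A2C": -6.23})` places `A2C` first. Keep in mind that a very negative score signals impact, not necessarily a deleterious effect.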

Entire Single Point Mutation Landscape Calculations

By default, ESGEMME outputs only the combined (independent and epistatic) scores*.

There are three output files:

  1. myProt_normPred_evolCombi.txt

    myProt is the short name of your protein in the MSA file. The ‘myProt_normPred_evolCombi.txt’ file contains 20 rows (one per amino acid, in alphabetical order) and L columns, where L is the number of amino acids in your protein of interest. Because the file is laid out horizontally, it is easy to load in R or Python but hard to scan by eye for the mutations you are interested in.
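A minimal Python sketch for loading this 20 x L matrix into a lookup table keyed by (position, substituted amino acid). It assumes that the rows follow the alphabetical order of the one-letter codes (ACDEFGHIKLMNPQRSTVWY), that values are whitespace-separated, and that missing entries (such as the wild-type residue) appear as "NA". The exact layout of the real file (header line, row labels) is not specified here, so the hypothetical `has_header` and `has_row_labels` flags let you skip them if present.

```python
from typing import Dict, Optional, Tuple

# Assumed row order: the 20 standard amino acids, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_combi_matrix(
    text: str,
    has_header: bool = False,
    has_row_labels: bool = False,
) -> Dict[Tuple[int, str], Optional[float]]:
    """Map (1-based position, substituted amino acid) -> score.

    Assumes 20 whitespace-separated rows (one per amino acid) and L columns
    (one per residue). "NA" entries are kept as None.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # drop an optional column-name line
    if len(lines) != len(AMINO_ACIDS):
        raise ValueError(f"expected 20 rows, found {len(lines)}")
    scores: Dict[Tuple[int, str], Optional[float]] = {}
    for aa, line in zip(AMINO_ACIDS, lines):
        fields = line.split()
        if has_row_labels:
            fields = fields[1:]  # drop an optional amino-acid label
        for pos, field in enumerate(fields, start=1):
            scores[(pos, aa)] = None if field.upper() == "NA" else float(field)
    return scores
```

With the file read into a string, `parse_combi_matrix(text)[(2, "C")]` would give the score for substituting cysteine at position 2, under the assumptions above.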

  2. myProt_normPred_evolCombiTransposed.txt

    As the name implies, this is the transposed version of the combined results (L rows and 20 columns). It is easier to look up the mutations you are interested in with this file: just check the row corresponding to the mutated position.
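The relationship between the two text files can be illustrated with a small Python sketch: transposing the 20 x L matrix gives L rows, one per residue, each holding the 20 substitution scores. This assumes a plain whitespace-separated matrix without header or row labels; `transpose_matrix_text` and `scores_at_position` are hypothetical helpers, not part of ESGEMME.

```python
from typing import Dict, List

# Assumed column order after transposing: alphabetical one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transpose_matrix_text(text: str) -> List[List[str]]:
    """Turn 20 rows x L columns into L rows x 20 columns.

    Row i of the result then holds all 20 substitution scores for residue i+1.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("expected a non-empty rectangular matrix")
    return [list(column) for column in zip(*rows)]


def scores_at_position(transposed: List[List[str]], position: int) -> Dict[str, str]:
    """Return {amino_acid: raw field} for a 1-based residue position."""
    return dict(zip(AMINO_ACIDS, transposed[position - 1]))
```

Fields are kept as raw strings so that "NA" entries pass through unchanged; convert with `float()` where needed.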

  3. myProt_normPred_evolCombi.Preparing

    This is an image of the combined results. By default it uses the ‘Oranges_r’ matplotlib colormap. You can change it by adding ‘--colormap turbo_r’ to the esgemme call for a fancier look. The --colormap argument accepts any matplotlib colormap.

Note*: If you also want to see the epistatic and independent contributions, add the ‘--verbose true’ argument when calling esgemme.
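If you drive esgemme from a Python script, the two optional flags described above can be appended to the argument list. This is only a sketch: it assumes the executable is invoked as `esgemme`, and `base_args` is a placeholder for whatever arguments you already pass (this sketch does not know or validate them). `build_esgemme_command` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def build_esgemme_command(
    base_args: Sequence[str],
    colormap: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Append the optional --colormap and --verbose flags to an esgemme call."""
    cmd = ["esgemme", *base_args]
    if colormap is not None:
        cmd += ["--colormap", colormap]  # any matplotlib colormap name, e.g. "turbo_r"
    if verbose:
        cmd += ["--verbose", "true"]  # also write the epistatic and independent scores
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids shell-quoting issues.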

Selected Single Point or Multiple Point Mutation Calculations

If you used a mutation file to predict only selected mutations, your combined (epistatic + independent) results will list each mutation followed by its score, for example:

myProt_normPred_evolCombi.txt:

A2C -6.23
A2D -1.23
. . .
D286F -0.23
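A small Python sketch for reading this output into (mutation, score) pairs. Splitting on any whitespace makes it work whether the pairs sit one per line or all on one line. Only single-point names such as "A2C" are decomposed into wild-type residue, position, and mutant; the naming of multiple-point mutations is not assumed here, so those names are returned as `None` by the hypothetical `split_single_point` helper.

```python
import re
from typing import List, Optional, Tuple

# Single-point names like "A2C": wild-type residue, 1-based position, new residue.
SINGLE_POINT = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_selected_scores(text: str) -> List[Tuple[str, float]]:
    """Read whitespace-separated "mutation score" pairs, in file order."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of fields (name, score pairs)")
    return [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def split_single_point(name: str) -> Optional[Tuple[str, int, str]]:
    """Return (wild_type, position, mutant) for names like "A2C", else None."""
    match = SINGLE_POINT.match(name)
    if match is None:
        return None  # e.g. multiple-point mutations, whose naming is not assumed here
    wt, pos, mut = match.groups()
    return wt, int(pos), mut
```

For instance, `split_single_point("D286F")` returns `("D", 286, "F")`.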