Analyzing and Modifying the ESCOTT Output
Raw ESCOTT Scores and Their Interpretation
There is no hardcoded limit for raw ESCOTT scores. However, the values generally fall in the range [-12, 2]. Lower values mean the mutation is impactful, while values close to 0 mean the mutation does not have any significant impact.
We should note that most of the ‘impactful’ mutations are deleterious, but this is not always the case.
Entire Single Point Mutation Landscape Calculations
By default, ESCOTT will only output the combined (independent and epistatic) scores*:
Assuming that your FASTA sequence has the name ‘myProt’ after the ‘>’ character, there will be three output files:
myProt_normPred_evolCombi.txt
myProt is the short name of your protein in the MSA file. The ‘myProt_normPred_evolCombi.txt’ file contains 20 rows (for the 20 amino acids in alphabetical order) and L columns, where L is the number of amino acids in your protein of interest. Since this file is horizontal, it is easy to read in R or Python but difficult to find the mutations you are interested in.
myProt_normPred_evolCombiTransposedRanksorted.csv
As the name implies, this is the transposed and reverse rank-sorted version of the combined results. The mutations you are interested in are easier to find in this file. It can be opened with any spreadsheet program, such as MS Excel. Each row is an amino acid in the protein, and the 20 columns contain the mutational effects for the original amino acid. The values are between 0 and 1: 0 indicates no effect, while 1 indicates a high impact.
myProt_normPred_evolCombi.png
This is the image file of the combined results. It uses the ‘turbo_r’ matplotlib color map by default. You can set the color map with the ‘--colormap’ argument during the escott call, written in the form ‘--colormap turbo_r’, for a fancier look. The ‘--colormap’ argument accepts all the color maps in matplotlib. If your query sequence is longer than 500 amino acids, the program may produce multiple png files, each one containing a 500-residue segment.
Note*: If you want to see the epistatic and independent contributions as well, you should add the ‘--verbose true’ argument when calling escott.
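The 20 x L layout of ‘myProt_normPred_evolCombi.txt’ described above can be searched with a few lines of Python. The following is a minimal sketch, not part of ESCOTT itself. It assumes that the file holds only whitespace-separated numbers (no header line and no row labels) and that "alphabetical order" refers to the one-letter amino acid codes; both assumptions should be checked against your own output. The helper names read_score_matrix and lookup are hypothetical, and the toy values are made up for illustration.

```python
import io

# Assumption: rows follow the 20 standard amino acids sorted by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_score_matrix(handle):
    """Parse whitespace-separated numeric rows into a list of lists of floats."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append([float(x) for x in fields])
    if len(rows) != len(AMINO_ACIDS):
        raise ValueError(f"expected 20 rows, found {len(rows)}")
    return rows


def lookup(matrix, position, mutant):
    """Return the score for mutating the 1-based `position` to `mutant`."""
    return matrix[AMINO_ACIDS.index(mutant)][position - 1]


# Toy 20 x 3 matrix standing in for a real file (values are made up).
toy_text = "\n".join(
    " ".join(f"{-(r + c) / 10:.1f}" for c in range(3)) for r in range(20)
)
matrix = read_score_matrix(io.StringIO(toy_text))

# Score for mutating position 2 to aspartate (D) in the toy matrix.
score_2D = lookup(matrix, 2, "D")
```

For a real run, the io.StringIO object would be replaced by an open file handle for ‘myProt_normPred_evolCombi.txt’. If your file turns out to contain labels or a header, the parsing step needs to be adjusted accordingly.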
Selected Single Point or Multiple Point Mutation Calculations
Since you used a mutation file to predict mutations, you will have your combined (epistatic + independent) results in the same format, such as:
myProt_normPred_evolCombi.txt:
A2C -6.23
A2D -1.23
. . .
D286F -0.23
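A mutation list of this kind can be parsed and sorted so that the most impactful mutations (the lowest raw scores) come first. The sketch below is only an illustration under stated assumptions: it assumes one mutation per line, with the label and the score separated by whitespace, and labels of the form wild type, position, mutant (as in A2C). The function name parse_mutation_scores is hypothetical. The example lines reuse the three values shown above.

```python
import re

# Pattern for labels such as "A2C": wild type, 1-based position, mutant.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_scores(lines):
    """Return (wild_type, position, mutant, score) tuples from text lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        match = MUTATION_RE.match(fields[0])
        if match is None:
            continue  # skip lines whose first field is not a mutation label
        wild_type, position, mutant = match.groups()
        records.append((wild_type, int(position), mutant, float(fields[1])))
    return records


# The three example entries shown in this section.
example_lines = ["A2C -6.23", "A2D -1.23", "D286F -0.23"]
records = parse_mutation_scores(example_lines)

# Lower raw scores indicate a larger impact, so sort in ascending order.
most_impactful_first = sorted(records, key=lambda rec: rec[3])
```

With a real output file, example_lines would be replaced by the lines read from ‘myProt_normPred_evolCombi.txt’.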