Commit f8504279 by Mustafa Tekpinar

Created csv output for both escott and prescott modules

The csv output for escott contains the 1-ranksorted scores given in the
myProt_normPredCombi.txt file. It contains 1 column for indices plus
20 columns for each residue. The csv output for prescott is in exactly the
same format. However, population information coming from the
gnomad file modulates some data points. The csv files can be visualized in
MS Excel or any other spreadsheet program.
parent b967a86c
@@ -16,7 +16,7 @@ Entire Single Point Mutation Landscape Calculations
By default, ESCOTT will only output the combined (independent
and epistatic) scores*:
There are three output files:
Assuming that your fasta sequence is named 'myProt' after the '>' character, there will be three output files:
#. myProt_normPred_evolCombi.txt
@@ -26,21 +26,25 @@ There are three output files:
in your protein of interest. Since this file is horizontal, it is easy to
read it in R or Python but difficult to find the mutations you are interested in.
#. myProt_normPred_evolCombiTransposed.txt
#. myProt_normPred_evolCombiTransposedRanksorted.csv
As the name implies, this is the transposed version of the combined results.
It is easier to find the mutations you are interested in this file. Just
check the row corresponding to the mutation.
As the name implies, this is the transposed and reverse ranksorted version of the combined results.
The mutations you are interested in are easier to find in this file. It can be opened with any spreadsheet
program such as MS Excel. Each row corresponds to an amino acid in the protein, and the 20 columns contain the
effects of mutating that original amino acid. The values are between 0 and 1: 0 indicates no effect, while 1
indicates a high impact.
#. myProt_normPred_evolCombi.png
This is the image file of the combined results. It selects 'Oranges_r'
This is the image file of the combined results. It selects 'turbo_r'
matplotlib color map by default. You can change it by adding '--colormap turbo_r'
for a more fancy look during the esgemme call. It --colormap argument accepts
for a more fancy look during the escott call. The --colormap argument accepts
all the color maps in matplotlib.
If your query sequence is longer than 500 amino acids, the program may produce
multiple png files, each one containing a 500 residue segment.
Note*: If you want to see epistatic and independent contributions as well,
you should add '--verbose true' argument while calling esgemme.
you should add the '--verbose true' argument when calling escott.
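Looking up a single mutation in the ranksorted csv file can also be scripted. The following is a minimal sketch under assumptions that the text above does not confirm: the file is comma-separated, its first line is a header whose amino acid columns are labeled with one-letter codes, and the first column of every row holds the bare residue number. The helper name ``lookup_score`` is hypothetical.

```shell
# Hypothetical helper: print the score of one mutation from the ranksorted csv.
# Assumed layout (not confirmed by the documentation): a header line with
# one-letter amino acid codes and the residue number in the first column.
lookup_score() {
    # $1: csv file, $2: residue number, $3: one-letter code of the mutant amino acid
    awk -F',' -v pos="$2" -v aa="$3" '
        NR == 1 {
            # Locate the column whose header matches the requested amino acid.
            for (i = 2; i <= NF; i++) if ($i == aa) col = i
            next
        }
        # Print the value in that column for the requested residue, then stop.
        $1 == pos && col { print $col; exit }
    ' "$1"
}
```

Under these assumptions, a call such as ``lookup_score myProt_normPred_evolCombiTransposedRanksorted.csv 136 R`` would print the score of the mutation to R at position 136. If the real header or index column differs, the column search and the row test need to be adapted.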
Selected Single Point or Multiple Point Mutation Calculations
-------------------------------------------------------------
@@ -145,3 +145,16 @@ mutation and its predicted effect, separated by a space.
Running several jobs using docker
---------------------------------
If you want to use docker in a more automated way for several proteins,
you can call docker within a bash script.
.. code:: bash
sudo docker run --rm -v $PWD:/home/tekpinar/research/lcqb tekpinar/prescott-docker:v1.5.0 escott aliBLAT.fasta --pdbfile blat-af2.pdb
Note: It is very important to have the aliBLAT.fasta and blat-af2.pdb files in your local folder when you call docker like an executable.
Typically, I create a folder for each protein that contains the alignment and the structure. Then, I change the path to each folder with the 'cd'
command inside a bash script and execute the command above in each local folder.
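The per-protein workflow described in the note can be sketched as a loop. This is only a sketch under stated assumptions: every sub-folder of the current directory is taken to hold one alignment (``*.fasta``) and one structure (``*.pdb``), and the ``DRY_RUN`` switch and the ``run_escott_in`` function name are hypothetical additions for illustration. The docker command itself is the one shown above.

```shell
# Sketch: call escott through docker once per protein folder.
# Assumption: each sub-folder holds one *.fasta alignment and one *.pdb structure.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="tekpinar/prescott-docker:v1.5.0"
MOUNT="/home/tekpinar/research/lcqb"

run_escott_in() {
    # $1: protein folder containing the alignment and the structure.
    (
        cd "$1" || exit 1
        # Pick the first alignment and the first structure found in the folder.
        fasta=$(ls ./*.fasta 2>/dev/null | head -n 1)
        pdb=$(ls ./*.pdb 2>/dev/null | head -n 1)
        if [ -z "$fasta" ] || [ -z "$pdb" ]; then
            echo "skip $1: missing fasta or pdb file" >&2
            exit 0
        fi
        # Same command as in the documentation, with the local folder mounted.
        set -- sudo docker run --rm -v "$PWD:$MOUNT" "$IMAGE" \
            escott "${fasta#./}" --pdbfile "${pdb#./}"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$@"
        else
            "$@"
        fi
    )
}

# Visit every sub-folder of the current directory.
for dir in */; do
    [ -d "$dir" ] || continue
    run_escott_in "${dir%/}"
done
```

Running the script with ``DRY_RUN=0`` executes the docker calls instead of printing them. The subshell around ``cd`` keeps the working directory of the calling script unchanged between proteins.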
@@ -94,7 +94,7 @@ it is not always the case.</p>
<h2>Entire Single Point Mutation Landscape Calculations<a class="headerlink" href="#entire-single-point-mutation-landscape-calculations" title="Permalink to this heading"></a></h2>
<p>By default, ESCOTT will only output the combined (independent
and epistatic) scores*:</p>
<p>There are three output files:</p>
<p>Assuming that your fasta sequence is named ‘myProt’ after the ‘&gt;’ character, there will be three output files:</p>
<ol class="arabic">
<li><p>myProt_normPred_evolCombi.txt</p>
<blockquote>
@@ -105,24 +105,28 @@ in your protein of interest. Since this file is horizontal, it is easy to
read it in R or Python but difficult to find the mutations you are interested in.</p>
</div></blockquote>
</li>
<li><p>myProt_normPred_evolCombiTransposed.txt</p>
<li><p>myProt_normPred_evolCombiTransposedRanksorted.csv</p>
<blockquote>
<div><p>As the name implies, this is the transposed version of the combined results.
It is easier to find the mutations you are interested in this file. Just
check the row corresponding to the mutation.</p>
<div><p>As the name implies, this is the transposed and reverse ranksorted version of the combined results.
The mutations you are interested in are easier to find in this file. It can be opened with any spreadsheet
program such as MS Excel. Each row corresponds to an amino acid in the protein, and the 20 columns contain the
effects of mutating that original amino acid. The values are between 0 and 1: 0 indicates no effect, while 1
indicates a high impact.</p>
</div></blockquote>
</li>
<li><p>myProt_normPred_evolCombi.png</p>
<blockquote>
<div><p>This is the image file of the combined results. It selects ‘Oranges_r’
<div><p>This is the image file of the combined results. It selects ‘turbo_r’
matplotlib color map by default. You can change it by adding ‘–colormap turbo_r’
for a more fancy look during the esgemme call. It –colormap argument accepts
all the color maps in matplotlib.</p>
for a more fancy look during the escott call. The –colormap argument accepts
all the color maps in matplotlib.
If your query sequence is longer than 500 amino acids, the program may produce
multiple png files, each one containing a 500 residue segment.</p>
</div></blockquote>
</li>
</ol>
<p>Note*: If you want to see epistatic and independent contributions as well,
you should add ‘–verbose true’ argument while calling esgemme.</p>
you should add the ‘–verbose true’ argument when calling escott.</p>
</section>
<section id="selected-single-point-or-multiple-point-mutation-calculations">
<h2>Selected Single Point or Multiple Point Mutation Calculations<a class="headerlink" href="#selected-single-point-or-multiple-point-mutation-calculations" title="Permalink to this heading"></a></h2>
@@ -200,6 +200,14 @@ mutation and its predicted effect, separated by a space.</p>
</section>
<section id="running-several-jobs-using-docker">
<h2>Running several jobs using docker<a class="headerlink" href="#running-several-jobs-using-docker" title="Permalink to this heading"></a></h2>
<p>If you want to use docker in a more automated way for several proteins,
you can call docker within a bash script.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo docker run --rm -v <span class="nv">$PWD</span>:/home/tekpinar/research/lcqb tekpinar/prescott-docker:v1.5.0 escott aliBLAT.fasta --pdbfile blat-af2.pdb
</pre></div>
</div>
<p>Note: It is very important to have the aliBLAT.fasta and blat-af2.pdb files in your local folder when you call docker like an executable.
Typically, I create a folder for each protein that contains the alignment and the structure. Then, I change the path to each folder with the ‘cd’
command inside a bash script and execute the command above in each local folder.</p>
</section>
</section>
@@ -169,13 +169,20 @@ positions in the sequence (e.g. D136R,V271A).</p>
</pre></div>
</div>
<p>Run the program by issuing the following command in a bash terminal:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
</pre></div>
</div>
<p>The most important output is prescott-scores.txt file, which produces
frequecy modified scores for the mutations.</p>
<p>Please note that the example input files for prescott are in the data
directory of this repository.</p>
<p>As far as we know, GnomAD v4.0.0 is the most comprehensive publicly available human population dataset. However,
if you would like to use GnomAD v2.1.1, you should specify the version with the ‘–gnomadversion’ parameter as shown below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion <span class="m">2</span>
</pre></div>
</div>
<p>The most important output is the prescott-scores.csv file, which contains the entire single point mutational landscape of the protein.</p>
<p>In addition, there is a file called prescott-scores-details.csv. This file contains detailed information about both the variants modulated by
population information coming from the gnomad file and the non-modulated variants.</p>
<p>Finally, if you have both pathogenic and benign labels in the gnomad file, there will be a ‘clinvar-vs-position.png’ file showing how these labeled
variants are affected by population information.</p>
<p>Please note that the example input files of the MLH1 protein for prescott calculations are in the data directory of this repository.</p>
</section>
</section>
<section id="installation">
Search.setIndex({"docnames": ["analysis", "docker", "index", "input-preparation", "installation", "introduction"], "filenames": ["analysis.rst", "docker.rst", "index.rst", "input-preparation.rst", "installation.rst", "introduction.rst"], "titles": ["Analyzing and Modifying the ESCOTT Output", "Using ESCOTT via Docker", "Welcome to PRESCOTT documentation!", "Preparing Your Own Input", "Installation", "Introduction"], "terms": {"There": [0, 3], "hardcod": 0, "limit": 0, "howev": [0, 1, 3], "valu": 0, "rang": 0, "between": 0, "12": 0, "2": 0, "gener": 0, "The": [0, 1, 3, 5], "lower": 0, "mean": 0, "impact": [0, 1, 3], "while": 0, "close": 0, "0": [0, 1, 3, 4], "doe": [0, 3], "have": [0, 1, 3, 4, 5], "ani": [0, 5], "signific": 0, "we": [0, 1, 3, 4, 5], "should": [0, 4, 5], "note": [0, 4, 5], "most": [0, 5], "ar": [0, 1, 3, 4, 5], "deleteri": 0, "alwai": 0, "case": [0, 1, 4], "By": [0, 5], "default": [0, 2, 5], "onli": [0, 1, 4, 5], "combin": [0, 5], "independ": 0, "epistat": [0, 5], "three": [0, 5], "file": [0, 1, 2, 3, 5], "myprot_normpred_evolcombi": 0, "txt": [0, 1, 5], "myprot": 0, "short": 0, "name": [0, 1], "msa": [0, 1, 2, 4, 5], "your": [0, 1, 2, 4, 5], "protein": [0, 1, 3, 5], "contain": [0, 1, 3, 4, 5], "20": [0, 4], "row": 0, "amino": 0, "acid": 0, "alphabet": 0, "order": [0, 1, 5], "l": [0, 3], "column": [0, 3], "where": [0, 3], "number": 0, "interest": [0, 1, 3], "sinc": [0, 4, 5], "thi": [0, 1, 3, 4, 5], "horizont": 0, "easi": 0, "read": 0, "r": [0, 4, 5], "python": [0, 4, 5], "difficult": 0, "find": [0, 5], "you": [0, 1, 3, 4, 5], "myprot_normpred_evolcombitranspos": 0, "As": [0, 1], "impli": 0, "transpos": 0, "version": [0, 4], "result": 0, "It": [0, 3, 4, 5], "easier": 0, "just": 0, "check": [0, 1, 4], "correspond": 0, "prepar": [1, 2], "imag": [0, 3, 4, 5], "oranges_r": 0, "matplotlib": 0, "color": 0, "map": 0, "can": [0, 1, 3, 4, 5], "chang": [0, 1, 3], "ad": [0, 3, 4], "colormap": 0, "turbo_r": 0, "more": [0, 3], "fanci": 0, "look": 0, "dure": 0, 
"call": [0, 1, 4], "argument": 0, "accept": 0, "all": [0, 1, 3, 4, 5], "If": [0, 1, 4, 5], "want": [0, 1, 3], "see": [0, 1, 3], "contribut": 0, "well": [0, 1], "add": [0, 4], "verbos": 0, "true": [0, 4], "us": [0, 2, 3, 4, 5], "predict": [0, 5], "same": [0, 1], "format": [0, 1, 3, 5], "a2c": 0, "6": 0, "23": 0, "a2d": 0, "1": [0, 3, 5], "d286f": 0, "need": [1, 5], "instal": [1, 2], "machin": 1, "consult": [1, 5], "follow": [1, 3, 4, 5], "page": [1, 2], "http": [1, 3, 4, 5], "doc": [1, 5], "com": [1, 3, 4, 5], "i": [1, 3, 4], "am": 1, "assum": [1, 5], "some": [1, 5], "basic": [1, 5], "familiar": 1, "linux": [1, 4, 5], "unix": 1, "maco": 1, "termin": [1, 3, 5], "command": [1, 3, 4, 5], "let": [1, 3, 5], "s": [1, 3, 4, 5], "start": [1, 3], "our": [1, 3, 4, 5], "favorit": 1, "app": 1, "must": [1, 5], "creat": [1, 3], "folder": [1, 3, 4, 5], "tutori": 1, "go": [1, 3, 4], "empti": 1, "mkdir": [1, 4], "cd": [1, 3, 4], "download": [1, 4, 5], "sampl": 1, "provid": 1, "repositori": [1, 4, 5], "exercis": 1, "first": [1, 5], "sequenc": [1, 3, 5], "align": [1, 3, 5], "fasta": [1, 3, 5], "wget": [1, 4], "gitlab": [1, 4], "lcqb": [1, 4], "upmc": [1, 4], "fr": [1, 4], "tekpinar": [1, 3, 4, 5], "raw": [1, 2], "8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936": 1, "aliblat": 1, "don": 1, "t": 1, "try": 1, "curl": 1, "pleas": [1, 3, 4, 5], "verifi": 1, "now": [1, 3, 4], "pdb": [1, 2, 5], "databank": 1, "blat": [1, 5], "af2": 1, "In": [1, 4, 5], "make": [1, 3, 4], "sure": [1, 3], "sudo": [1, 3, 4], "h": 1, "show": 1, "list": [1, 3, 4], "option": [1, 5], "good": 1, "track": 1, "On": [1, 5], "mai": 1, "word": 1, "befor": [1, 3, 5], "ti": [1, 3], "rm": [1, 3, 4], "mount": [1, 3], "type": [1, 3, 5], "bind": [1, 3], "sourc": [1, 3, 4, 5], "pwd": [1, 3], "target": [1, 3], "home": [1, 3, 4], "research": [1, 3, 4, 5], "myexampl": [1, 3], "v1": [1, 3], "3": [4, 5], "virtual": 1, "oper": 1, "system": [1, 4], "previou": 1, "when": 1, "ls": [1, 3], "suppos": [1, 3], "host": 1, "place": 1, "step": [1, 3, 
4, 5], "evolutionari": [1, 4, 5], "inform": [1, 5], "from": [1, 3, 4, 5], "an": [1, 3, 4], "esgemme_path": [], "py": [], "f": 4, "after": [1, 4], "few": [1, 3], "minut": [1, 3], "least": 1, "two": [1, 5], "blat_normpred_evolcombi": 1, "png": [0, 1], "util": [1, 4], "structur": [1, 3, 5], "highli": [1, 5], "recommend": [1, 4, 5], "pdbfile": [1, 5], "normweightmod": [], "sstjetormax": [], "bunch": 1, "mut": 1, "simpl": 1, "text": 1, "each": [1, 5], "line": [1, 4, 5], "d26a": 1, "fortun": 1, "master": [1, 4], "stiffler_2015_blat_ecolx": 1, "similar": 1, "possibl": [1, 5], "wai": [1, 5], "do": [1, 3, 4], "without": 1, "m": [1, 5], "includ": 1, "code": [], "bash": 5, "output": [1, 2, 5], "complet": 1, "differ": 1, "scan": 1, "its": [1, 4, 5], "separ": [1, 5], "space": 1, "addit": [1, 5], "won": 1, "like": [1, 3, 4], "sometim": 1, "doubl": 1, "tripl": 1, "perform": 1, "e26d": 1, "y44r": 1, "e56n": 1, "a77f": 1, "h94v": 1, "second": 1, "colon": [1, 5], "charact": 1, "via": [2, 5], "docker": [2, 3, 4, 5], "requir": [2, 3, 4], "get": [2, 4], "exampl": [2, 3, 5], "input": 2, "data": 2, "singl": [2, 5], "point": [2, 5], "mutat": [2, 3, 5], "calcul": [2, 5], "multipl": [2, 3, 5], "run": [2, 3], "sever": 2, "job": 2, "analyz": 2, "modifi": [2, 4, 5], "score": [2, 5], "Their": 2, "interpret": 2, "entir": 2, "landscap": 2, "select": [2, 3], "own": [2, 4], "colabfold": [2, 5], "depend": [2, 3, 5], "environ": 2, "configur": 2, "conf": 2, "index": 2, "modul": 2, "search": 2, "understand": 3, "certain": 3, "gap": [3, 5], "quickest": 3, "method": 3, "obtain": [3, 4, 5], "both": [3, 5], "web": [3, 4, 5], "site": [3, 4, 5], "colab": [3, 5], "googl": [3, 5], "github": [3, 4, 5], "sokrypton": [3, 5], "blob": [3, 5], "main": [3, 5], "alphafold2": [3, 5], "ipynb": [3, 5], "sign": 3, "gmail": 3, "account": [3, 4], "click": 3, "connect": 3, "button": 3, "top": 3, "right": 3, "hand": [3, 5], "side": 3, "clean": [3, 4], "query_sequ": 3, "box": 3, "past": 3, "For": [3, 5], "me": 3, "adenyl": 3, 
"kinas": 3, "ak": 3, "my": [3, 4], "www": [3, 4, 5], "rcsb": 3, "org": [3, 4, 5], "entri": 3, "4ake": 3, "displai": 3, "jobnam": 3, "someth": 3, "sens": 3, "menu": 3, "bar": 3, "notebook": 3, "edit": 3, "view": 3, "insert": 3, "runtim": 3, "tool": 3, "help": [3, 5], "process": 3, "take": 3, "hour": 3, "size": 3, "give": 3, "a3m": [3, 5], "up": [3, 5], "5": [1, 3], "model": [3, 5], "put": 3, "directori": [3, 5], "unfortun": 3, "those": 3, "gui": [3, 5], "program": [3, 4], "ugen": [3, 5], "jalview": [3, 5], "labor": 3, "intens": 3, "procedur": 3, "here": [3, 4], "small": 3, "develop": [3, 4], "esgemm": [0, 4], "script": [3, 4, 5], "hhsuit": [3, 4], "convert": 3, "reformat": 3, "pl": 3, "fa": 3, "final": 3, "demust": 3, "removegap": 3, "o": [3, 4], "ake_nogap": 3, "congratul": 3, "ii": [], "myprotein": 3, "implement": [4, 5], "ha": [4, 5], "been": [4, 5], "test": [4, 5], "mani": [4, 5], "determin": [4, 5], "user": [4, 5], "come": 4, "extern": 4, "joint": 4, "tree": 4, "jet2": 4, "java": 4, "naccess": 4, "bioinf": 4, "manchest": 4, "ac": 4, "uk": 4, "muscl": 4, "drive5": [], "seqinr": 4, "packag": [4, 5], "cran": 4, "project": 4, "html": [], "dssp": 4, "secondari": 4, "These": [], "abl": 4, "defin": 4, "export": 4, "variabl": 4, "path": 4, "insid": 4, "import": [4, 5], "essenti": 4, "paramet": 4, "part": [3, 4], "intern": 4, "etc": 4, "correct": 4, "softwar": 4, "section": 4, "accord": [4, 5], "ubuntu": 4, "22": 4, "04": 4, "apt": 4, "updat": 4, "fix": 4, "miss": 4, "y": 4, "properti": 4, "common": 4, "autotool": 4, "dev": 4, "automak": 4, "build": 4, "python3": 4, "pip": 4, "base": [4, 5], "core": 4, "jre": 4, "ncbi": 4, "blast": 4, "nano": 4, "less": 4, "csh": 4, "hmmer": 4, "libboost": 4, "rf": 4, "var": 4, "lib": 4, "tmp": 4, "otherwis": 4, "work": 4, "properli": 4, "cmbi": 4, "archiv": 4, "ref": 4, "head": 4, "zip": 4, "unzip": 4, "autogen": 4, "sh": 4, "ln": 4, "usr": 4, "local": 4, "bin": 4, "mkdssp": 4, "soedinglab": 4, "hh": 4, "suit": 4, "releas": 4, "v3": 4, 
"avx2": 4, "tar": 4, "gz": 4, "mv": 4, "xvfz": 4, "perman": 4, "bashrc": 4, "profil": 4, "bash_profil": 4, "locat": 4, "wa": 4, "therefor": [4, 5], "Then": 4, "rscript": 4, "e": [4, 5], "repo": 4, "ev_coupl": 4, "pip3": 4, "debbiemarkslab": 4, "plmc": 4, "openmp32": 4, "cp": 4, "one": 3, "last": 3, "reach": 3, "goal": 3, "id": 3, "descript": 3, "too": 3, "long": 3, "shorten": 3, "them": [3, 5], "awk": 3, "begin": 3, "fs": 3, "nf": 3, "printf": 3, "n": 3, "els": 3, "print": 3, "ake_nogaps_short_nam": 3, "recheck": 3, "remov": [3, 5], "extra": 3, "jet2_path": 4, "open": 4, "gedit": 4, "below": [4, 5], "end": [4, 5], "save": 4, "exit": 4, "forget": 4, "replac": 4, "so": 4, "taken": 4, "Of": 4, "cours": 4, "would": 4, "git": 4, "clone": 4, "b": 4, "4": [], "prescott": [1, 3], "escott": 2, "popul": 5, "awar": 5, "effect": 5, "introduct": 2, "usag": 2, "cite": [], "licens": [], "mit": [], "logo": [], "made": 5, "other": 5, "incorpor": 5, "frequenc": 5, "due": 5, "mandatori": 5, "queri": 5, "One": 5, "fastest": 5, "produc": 5, "pragram": 5, "net": 5, "purpos": 5, "normpredcombi": 5, "gnomad": 5, "csv": 5, "broadinstitut": 5, "document": 5, "further": 5, "detail": 5, "inputali": 5, "issu": 5, "A": 5, "quick": 5, "access": 5, "posit": 5, "altern": 5, "set": 5, "given": 5, "g": 5, "d136r": 5, "v271a": 5, "mlh1_normpred_evolcombi": 5, "gnomad_v2": 5, "1_mlh1_human_ensg00000076242": 5, "mlh1": 5, "which": 5, "frequeci": 5, "link": 5, "soon": [], "img": [], "shield": [], "io": [], "badg": [], "yellow": [], "svg": [], "opensourc": [], "what": 2, "citat": 2, "mustafa": 5, "thoma": 5, "henri": 5, "alessandra": 5, "carbon": 5, "accur": 5, "missens": 5}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"analyz": 0, "modifi": 0, "esgemm": [], "output": 0, "raw": 0, "score": 0, "Their": 0, "interpret": 0, "entir": [0, 1], "singl": [0, 1], "point": [0, 1], "mutat": [0, 1], "landscap": [0, 1], "calcul": [0, 1], "select": 0, "multipl": [0, 1], "us": 1, "via": 1, "docker": 1, 
"requir": [1, 5], "get": 1, "exampl": 1, "input": [1, 3, 5], "data": [1, 5], "obtain": 1, "predict": 1, "effect": 1, "subset": 1, "run": [1, 5], "sever": 1, "job": 1, "welcom": 2, "s": [], "document": 2, "content": 2, "indic": 2, "tabl": 2, "prepar": [3, 4], "your": 3, "own": 3, "msa": 3, "pdb": 3, "colabfold": 3, "instal": [4, 5], "depend": 4, "sourc": [], "code": [], "environ": 4, "configur": 4, "default": 4, "conf": 4, "file": 4, "prescott": [2, 4, 5], "escott": [0, 1, 5], "popul": [], "awar": [], "epistat": [], "structur": [], "model": [], "introduct": 5, "usag": 5, "program": 5, "cite": [], "what": 5, "citat": 5}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
Search.setIndex({"docnames": ["analysis", "docker", "index", "input-preparation", "installation", "introduction"], "filenames": ["analysis.rst", "docker.rst", "index.rst", "input-preparation.rst", "installation.rst", "introduction.rst"], "titles": ["Analyzing and Modifying the ESCOTT Output", "Using ESCOTT via Docker", "Welcome to PRESCOTT documentation!", "Preparing Your Own Input", "Installation", "Introduction"], "terms": {"There": [0, 3], "hardcod": 0, "limit": 0, "howev": [0, 1, 3, 5], "valu": 0, "rang": 0, "between": 0, "12": 0, "2": [0, 5], "gener": 0, "The": [0, 1, 3, 5], "lower": 0, "mean": 0, "impact": [0, 1, 3], "while": 0, "close": 0, "0": [0, 1, 3, 4, 5], "doe": [0, 3], "have": [0, 1, 3, 4, 5], "ani": [0, 5], "signific": 0, "we": [0, 1, 3, 4, 5], "should": [0, 4, 5], "note": [0, 1, 4, 5], "most": [0, 5], "ar": [0, 1, 3, 4, 5], "deleteri": 0, "alwai": 0, "case": [0, 1, 4], "By": [0, 5], "default": [0, 2, 5], "onli": [0, 1, 4, 5], "combin": [0, 5], "independ": 0, "epistat": [0, 5], "three": [0, 5], "file": [0, 1, 2, 3, 5], "myprot_normpred_evolcombi": 0, "txt": [0, 1, 5], "myprot": 0, "short": 0, "name": [0, 1], "msa": [0, 1, 2, 4, 5], "your": [0, 1, 2, 4, 5], "protein": [0, 1, 3, 5], "contain": [0, 1, 3, 4, 5], "20": [0, 4], "row": 0, "amino": 0, "acid": 0, "alphabet": 0, "order": [0, 1, 5], "l": [0, 3], "column": [0, 3], "where": [0, 3], "number": 0, "interest": [0, 1, 3], "sinc": [0, 4, 5], "thi": [0, 1, 3, 4, 5], "horizont": 0, "easi": 0, "read": 0, "r": [0, 4, 5], "python": [0, 4, 5], "difficult": 0, "find": [0, 5], "you": [0, 1, 3, 4, 5], "myprot_normpred_evolcombitranspos": [], "As": [0, 1], "impli": 0, "transpos": 0, "version": [0, 4, 5], "result": 0, "It": [0, 1, 3, 4, 5], "easier": 0, "just": [], "check": [1, 4], "correspond": [], "prepar": [1, 2], "imag": [0, 3, 4, 5], "oranges_r": [], "matplotlib": 0, "color": 0, "map": 0, "can": [0, 1, 3, 4, 5], "chang": [0, 1, 3], "ad": [0, 3, 4], "colormap": 0, "turbo_r": 0, "more": [0, 1, 3], "fanci": 0, 
"look": 0, "dure": 0, "call": [0, 1, 4, 5], "argument": 0, "accept": 0, "all": [0, 1, 3, 4, 5], "If": [0, 1, 4, 5], "want": [0, 1, 3], "see": [0, 1, 3], "contribut": 0, "well": [0, 1], "add": [0, 4], "verbos": 0, "true": [0, 4], "us": [0, 2, 3, 4, 5], "predict": [0, 5], "same": [0, 1], "format": [0, 1, 3, 5], "a2c": 0, "6": 0, "23": 0, "a2d": 0, "1": [0, 3, 5], "d286f": 0, "need": [1, 5], "instal": [1, 2], "machin": 1, "consult": [1, 5], "follow": [1, 3, 4, 5], "page": [1, 2], "http": [1, 3, 4, 5], "doc": [1, 5], "com": [1, 3, 4, 5], "i": [1, 3, 4], "am": 1, "assum": [0, 1, 5], "some": [1, 5], "basic": [1, 5], "familiar": 1, "linux": [1, 4, 5], "unix": 1, "maco": 1, "termin": [1, 3, 5], "command": [1, 3, 4, 5], "let": [1, 3, 5], "s": [1, 3, 4, 5], "start": [1, 3], "our": [1, 3, 4, 5], "favorit": 1, "app": 1, "must": [1, 5], "creat": [1, 3], "folder": [1, 3, 4, 5], "tutori": 1, "go": [1, 3, 4], "empti": 1, "mkdir": [1, 4], "cd": [1, 3, 4], "download": [1, 4, 5], "sampl": 1, "provid": 1, "repositori": [1, 4, 5], "exercis": 1, "first": [1, 5], "sequenc": [0, 1, 3, 5], "align": [1, 3, 5], "fasta": [0, 1, 3, 5], "wget": [1, 4], "gitlab": [1, 4], "lcqb": [1, 4], "upmc": [1, 4], "fr": [1, 4], "tekpinar": [1, 3, 4, 5], "raw": [1, 2], "8d766d4d11af0e93c9da8fc2c5cc1bfc457d2936": 1, "aliblat": 1, "don": 1, "t": 1, "try": 1, "curl": 1, "pleas": [1, 3, 4, 5], "verifi": 1, "now": [1, 3, 4], "pdb": [1, 2, 5], "databank": 1, "blat": [1, 5], "af2": 1, "In": [1, 4, 5], "make": [1, 3, 4], "sure": [1, 3], "sudo": [1, 3, 4], "h": 1, "show": [1, 5], "list": [1, 3, 4], "option": [1, 5], "good": 1, "track": 1, "On": [1, 5], "mai": [0, 1], "word": 1, "befor": [1, 3, 5], "ti": [1, 3], "rm": [1, 3, 4], "mount": [1, 3], "type": [1, 3, 5], "bind": [1, 3], "sourc": [1, 3, 4, 5], "pwd": [1, 3], "target": [1, 3], "home": [1, 3, 4], "research": [1, 3, 4, 5], "myexampl": [1, 3], "v1": [1, 3], "3": [4, 5], "virtual": 1, "oper": 1, "system": [1, 4], "previou": 1, "when": 1, "ls": [1, 3], "suppos": 
[1, 3], "host": 1, "place": 1, "step": [1, 3, 4, 5], "evolutionari": [1, 4, 5], "inform": [1, 5], "from": [1, 3, 4, 5], "an": [0, 1, 3, 4], "esgemme_path": [], "py": [], "f": 4, "after": [0, 1, 4], "few": [1, 3], "minut": [1, 3], "least": 1, "two": [1, 5], "blat_normpred_evolcombi": 1, "png": [0, 1, 5], "util": [1, 4], "structur": [1, 3, 5], "highli": [1, 5], "recommend": [1, 4, 5], "pdbfile": [1, 5], "normweightmod": [], "sstjetormax": [], "bunch": 1, "mut": 1, "simpl": 1, "text": 1, "each": [0, 1, 5], "line": [1, 4, 5], "d26a": 1, "fortun": 1, "master": [1, 4], "stiffler_2015_blat_ecolx": 1, "similar": 1, "possibl": [1, 5], "wai": [1, 5], "do": [1, 3, 4], "without": 1, "m": [1, 5], "includ": 1, "code": [], "bash": [1, 5], "output": [1, 2, 5], "complet": 1, "differ": 1, "scan": 1, "its": [1, 4, 5], "separ": [1, 5], "space": 1, "addit": [1, 5], "won": 1, "like": [0, 1, 3, 4, 5], "sometim": 1, "doubl": 1, "tripl": 1, "perform": 1, "e26d": 1, "y44r": 1, "e56n": 1, "a77f": 1, "h94v": 1, "second": 1, "colon": [1, 5], "charact": [0, 1], "via": [2, 5], "docker": [2, 3, 4, 5], "requir": [2, 3, 4], "get": [2, 4], "exampl": [2, 3, 5], "input": 2, "data": 2, "singl": [2, 5], "point": [2, 5], "mutat": [2, 3, 5], "calcul": [2, 5], "multipl": [2, 3, 5], "run": [2, 3], "sever": 2, "job": 2, "analyz": 2, "modifi": [2, 4], "score": [2, 5], "Their": 2, "interpret": 2, "entir": [2, 5], "landscap": [2, 5], "select": [2, 3], "own": [2, 4], "colabfold": [2, 5], "depend": [2, 3, 5], "environ": 2, "configur": 2, "conf": 2, "index": 2, "modul": [2, 5], "search": 2, "understand": 3, "certain": 3, "gap": [3, 5], "quickest": 3, "method": 3, "obtain": [3, 4, 5], "both": [3, 5], "web": [3, 4, 5], "site": [3, 4, 5], "colab": [3, 5], "googl": [3, 5], "github": [3, 4, 5], "sokrypton": [3, 5], "blob": [3, 5], "main": [3, 5], "alphafold2": [3, 5], "ipynb": [3, 5], "sign": 3, "gmail": 3, "account": [3, 4], "click": 3, "connect": 3, "button": 3, "top": 3, "right": 3, "hand": [3, 5], "side": 3, 
"clean": [3, 4], "query_sequ": 3, "box": 3, "past": 3, "For": [3, 5], "me": 3, "adenyl": 3, "kinas": 3, "ak": 3, "my": [3, 4], "www": [3, 4, 5], "rcsb": 3, "org": [3, 4, 5], "entri": 3, "4ake": 3, "displai": 3, "jobnam": 3, "someth": 3, "sens": 3, "menu": 3, "bar": 3, "notebook": 3, "edit": 3, "view": 3, "insert": 3, "runtim": 3, "tool": 3, "help": [3, 5], "process": 3, "take": 3, "hour": 3, "size": 3, "give": 3, "a3m": [3, 5], "up": [3, 5], "5": [1, 3], "model": [3, 5], "put": 3, "directori": [3, 5], "unfortun": 3, "those": 3, "gui": [3, 5], "program": [0, 3, 4], "ugen": [3, 5], "jalview": [3, 5], "labor": 3, "intens": 3, "procedur": 3, "here": [3, 4], "small": 3, "develop": [3, 4], "esgemm": 4, "script": [1, 3, 4, 5], "hhsuit": [3, 4], "convert": 3, "reformat": 3, "pl": 3, "fa": 3, "final": [3, 5], "demust": 3, "removegap": 3, "o": [3, 4], "ake_nogap": 3, "congratul": 3, "ii": [], "myprotein": 3, "implement": [4, 5], "ha": [0, 4, 5], "been": [4, 5], "test": [4, 5], "mani": [4, 5], "determin": [4, 5], "user": [4, 5], "come": [4, 5], "extern": 4, "joint": 4, "tree": 4, "jet2": 4, "java": 4, "naccess": 4, "bioinf": 4, "manchest": 4, "ac": 4, "uk": 4, "muscl": 4, "drive5": [], "seqinr": 4, "packag": [4, 5], "cran": 4, "project": 4, "html": [], "dssp": 4, "secondari": 4, "These": [], "abl": 4, "defin": 4, "export": 4, "variabl": 4, "path": [1, 4], "insid": [1, 4], "import": [1, 4, 5], "essenti": 4, "paramet": [4, 5], "part": [3, 4], "intern": 4, "etc": 4, "correct": 4, "softwar": 4, "section": 4, "accord": [4, 5], "ubuntu": 4, "22": 4, "04": 4, "apt": 4, "updat": 4, "fix": 4, "miss": 4, "y": 4, "properti": 4, "common": 4, "autotool": 4, "dev": 4, "automak": 4, "build": 4, "python3": 4, "pip": 4, "base": [4, 5], "core": 4, "jre": 4, "ncbi": 4, "blast": 4, "nano": 4, "less": 4, "csh": 4, "hmmer": 4, "libboost": 4, "rf": 4, "var": 4, "lib": 4, "tmp": 4, "otherwis": 4, "work": 4, "properli": 4, "cmbi": 4, "archiv": 4, "ref": 4, "head": 4, "zip": 4, "unzip": 4, "autogen": 
4, "sh": 4, "ln": 4, "usr": 4, "local": [1, 4], "bin": 4, "mkdssp": 4, "soedinglab": 4, "hh": 4, "suit": 4, "releas": 4, "v3": 4, "avx2": 4, "tar": 4, "gz": 4, "mv": 4, "xvfz": 4, "perman": 4, "bashrc": 4, "profil": 4, "bash_profil": 4, "locat": 4, "wa": 4, "therefor": [4, 5], "Then": [1, 4], "rscript": 4, "e": [4, 5], "repo": 4, "ev_coupl": 4, "pip3": 4, "debbiemarkslab": 4, "plmc": 4, "openmp32": 4, "cp": 4, "one": [0, 3], "last": 3, "reach": 3, "goal": 3, "id": 3, "descript": 3, "too": 3, "long": 3, "shorten": 3, "them": [3, 5], "awk": 3, "begin": 3, "fs": 3, "nf": 3, "printf": 3, "n": 3, "els": 3, "print": 3, "ake_nogaps_short_nam": 3, "recheck": 3, "remov": [3, 5], "extra": 3, "jet2_path": 4, "open": [0, 4], "gedit": 4, "below": [4, 5], "end": [4, 5], "save": 4, "exit": 4, "forget": 4, "replac": 4, "so": 4, "taken": 4, "Of": 4, "cours": 4, "would": [4, 5], "git": 4, "clone": 4, "b": 4, "4": [], "prescott": [1, 3], "escott": 2, "popul": 5, "awar": 5, "effect": [0, 5], "introduct": 2, "usag": 2, "cite": [], "licens": [], "mit": [], "logo": [], "made": 5, "other": 5, "incorpor": 5, "frequenc": 5, "due": 5, "mandatori": 5, "queri": [0, 5], "One": 5, "fastest": 5, "produc": [0, 5], "pragram": 5, "net": 5, "purpos": 5, "normpredcombi": 5, "gnomad": 5, "csv": [0, 5], "broadinstitut": 5, "document": 5, "further": 5, "detail": 5, "inputali": 5, "issu": 5, "A": 5, "quick": 5, "access": 5, "posit": 5, "altern": 5, "set": 5, "given": 5, "g": 5, "d136r": 5, "v271a": 5, "mlh1_normpred_evolcombi": 5, "gnomad_v2": 5, "1_mlh1_human_ensg00000076242": 5, "mlh1": 5, "which": 5, "frequeci": [], "link": 5, "soon": [], "img": [], "shield": [], "io": [], "badg": [], "yellow": [], "svg": [], "opensourc": [], "what": 2, "citat": 2, "mustafa": 5, "thoma": 5, "henri": 5, "alessandra": 5, "carbon": 5, "accur": 5, "missens": 5, "myprot_normpred_evolcombitransposedranksort": 0, "revers": 0, "ranksort": 0, "spreadsheet": 0, "ms": 0, "excel": 0, "origin": 0, "indic": 0, "high": 0, "longer": 
0, "than": 0, "500": 0, "residu": 0, "segment": 0, "autom": 1, "within": 1, "v": 1, "veri": 1, "execut": 1, "typic": 1, "abov": 1, "gnomad_v4": 5, "0_mlh1_human_ensg00000076242": 5, "v4": 5, "comprehens": 5, "publicli": 5, "avail": 5, "human": 5, "dataset": 5, "far": 5, "know": 5, "v2": 5, "specifi": 5, "gnomadvers": 5, "about": 5, "non": 5, "variant": 5, "pathogen": 5, "benign": 5, "label": 5, "clinvar": 5, "vs": 5, "how": 5, "affect": 5, "acalcul": 5}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"analyz": 0, "modifi": 0, "esgemm": [], "output": 0, "raw": 0, "score": 0, "Their": 0, "interpret": 0, "entir": [0, 1], "singl": [0, 1], "point": [0, 1], "mutat": [0, 1], "landscap": [0, 1], "calcul": [0, 1], "select": 0, "multipl": [0, 1], "us": 1, "via": 1, "docker": 1, "requir": [1, 5], "get": 1, "exampl": 1, "input": [1, 3, 5], "data": [1, 5], "obtain": 1, "predict": 1, "effect": 1, "subset": 1, "run": [1, 5], "sever": 1, "job": 1, "welcom": 2, "s": [], "document": 2, "content": 2, "indic": 2, "tabl": 2, "prepar": [3, 4], "your": 3, "own": 3, "msa": 3, "pdb": 3, "colabfold": 3, "instal": [4, 5], "depend": 4, "sourc": [], "code": [], "environ": 4, "configur": 4, "default": 4, "conf": 4, "file": 4, "prescott": [2, 4, 5], "escott": [0, 1, 5], "popul": [], "awar": [], "epistat": [], "structur": [], "model": [], "introduct": 5, "usag": 5, "program": 5, "cite": [], "what": 5, "citat": 5}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
# Fdb version 3
["makeindex prescott.idx"] 1691770116 "prescott.idx" "prescott.ind" "prescott" 1698222132
"prescott.idx" 1698222131 0 d41d8cd98f00b204e9800998ecf8427e "pdflatex"
["makeindex prescott.idx"] 1691770116 "prescott.idx" "prescott.ind" "prescott" 1700227059
"prescott.idx" 1700227058 0 d41d8cd98f00b204e9800998ecf8427e "pdflatex"
(generated)
"prescott.ilg"
"prescott.ind"
["pdflatex"] 1698222131 "prescott.tex" "prescott.pdf" "prescott" 1698222132
["pdflatex"] 1700227058 "prescott.tex" "prescott.pdf" "prescott" 1700227059
"/etc/texmf/web2c/texmf.cnf" 1686915816 475 c0e671620eb5563b2130f56340a5fde8 ""
"/usr/share/texlive/texmf-dist/fonts/map/fontname/texfonts.map" 1577235249 3524 cb3e574dea2d1052e39280babc910dc8 ""
"/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecrm1000.tfm" 1136768653 3584 adb004a0c8e7c46ee66cad73671f37b4 ""
......@@ -140,13 +140,13 @@
"/usr/share/texmf/web2c/texmf.cnf" 1644012257 39432 7155514e09a3d69036fac785183a21c2 ""
"/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map" 1692610720 128028 f533b797fba58d231669ea19e894e23e ""
"/var/lib/texmf/web2c/pdftex/pdflatex.fmt" 1692610730 1455466 cf5d347cc06e6f91065ccb2619b82e95 ""
"prescott.aux" 1698222131 8757 49bd9be15e4a1990a75c401ccac48f11 "pdflatex"
"prescott.aux" 1700227058 8757 49bd9be15e4a1990a75c401ccac48f11 "pdflatex"
"prescott.ind" 1694694190 0 d41d8cd98f00b204e9800998ecf8427e "makeindex prescott.idx"
"prescott.out" 1698222132 6307 8119393c8f110ffbb217a1e7b2c390de "pdflatex"
"prescott.tex" 1698222131 31425 773dd700c24f5dfc70e1f98f9758474c ""
"prescott.toc" 1698222132 2706 6a2a8c488f73b27e9fedb6cddefdfa53 "pdflatex"
"prescott.out" 1700227058 6307 8119393c8f110ffbb217a1e7b2c390de "pdflatex"
"prescott.tex" 1700227058 33833 c5b13bae0058b8edd46789a6b12ea3ec ""
"prescott.toc" 1700227058 2706 6a2a8c488f73b27e9fedb6cddefdfa53 "pdflatex"
"sphinx.sty" 1691571542 12780 919e6ba449302e2597e7722681a087c6 ""
"sphinxhighlight.sty" 1698222131 6679 76d10c62e0f0661410b46f5db6118e26 ""
"sphinxhighlight.sty" 1700227057 6679 76d10c62e0f0661410b46f5db6118e26 ""
"sphinxlatexadmonitions.sty" 1691571543 6238 2d867d769abf3f72abc17ef52adff78b ""
"sphinxlatexcontainers.sty" 1691571543 901 d3a3a1b7b2547f47ade2499350b5c420 ""
"sphinxlatexgraphics.sty" 1691571543 4840 a9578332b6f3b35e198751fb632c9b79 ""
......@@ -161,7 +161,7 @@
"sphinxlatexstyletext.sty" 1691571543 6177 c18841ce3fafc366cd3b145f8faa4c37 ""
"sphinxlatextables.sty" 1691571542 21848 2827eb0b11b99b185a8b77317d3e131c ""
"sphinxmanual.cls" 1691571543 4241 7b0d7a37df7b5715fb0dbd585c52ecdb ""
"sphinxmessages.sty" 1698222131 745 3f5fcd6cdd7964ed608767954a8ced6f ""
"sphinxmessages.sty" 1700227058 745 3f5fcd6cdd7964ed608767954a8ced6f ""
"sphinxoptionsgeometry.sty" 1691571542 2061 47bb34b8ed8a78823eb0c886abfb9f4d ""
"sphinxoptionshyperref.sty" 1691571543 1094 79beb8b8a3f10784f8cce804e0f9d3aa ""
"sphinxpackagefootnote.sty" 1691571543 15217 dd26fe418b6fb1b26b18f042a7f43d40 ""
......
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian) (preloaded format=pdflatex 2023.8.21) 25 OCT 2023 10:22
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian) (preloaded format=pdflatex 2023.8.21) 17 NOV 2023 14:17
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
......@@ -793,7 +793,7 @@ File: t1txtt.fd 2000/12/15 v3.1
Chapter 2.
[5]
LaTeX Font Info: Font shape `T1/txtt/b/n' in size <10> not available
(Font) Font shape `T1/txtt/bx/n' tried instead on input line 319.
(Font) Font shape `T1/txtt/bx/n' tried instead on input line 333.
[6] [7] [8
]
......@@ -804,18 +804,18 @@ Chapter 4.
]
LaTeX Font Info: Trying to load font information for TS1+txtt on input line
613.
644.
(/usr/share/texlive/texmf-dist/tex/latex/txfonts/ts1txtt.fd
File: ts1txtt.fd 2000/12/15 v3.1
)
LaTeX Font Info: Font shape `T1/txtt/m/it' in size <10> not available
(Font) Font shape `T1/txtt/m/sl' tried instead on input line 614.
(Font) Font shape `T1/txtt/m/sl' tried instead on input line 645.
[12]
Chapter 5.
[13
] [14]
Underfull \hbox (badness 7981) in paragraph at lines 798--801
Underfull \hbox (badness 7981) in paragraph at lines 829--832
[]\T1/qtm/m/n/10 #Down-load PRESCOTT from [][]$http : / / gitlab . lcqb . upmc
. fr / tekpinar / PRESCOTT$[][] repos-i-tory and go in-side the
[]
......@@ -846,7 +846,7 @@ usr/share/texlive/texmf-dist/fonts/type1/public/txfonts/t1xbtt.pfb></usr/share/
texlive/texmf-dist/fonts/type1/public/txfonts/t1xtt.pfb></usr/share/texlive/tex
mf-dist/fonts/type1/public/txfonts/t1xtt.pfb></usr/share/texlive/texmf-dist/fon
ts/type1/public/txfonts/tcxtt.pfb>
Output written on prescott.pdf (21 pages, 172069 bytes).
Output written on prescott.pdf (21 pages, 173782 bytes).
PDF statistics:
332 PDF objects out of 1000 (max. 8388607)
289 compressed objects within 3 object streams
......
......@@ -61,7 +61,7 @@
\title{prescott}
\date{Oct 25, 2023}
\date{Nov 17, 2023}
\release{1.5.0}
\author{Mustafa Tekpinar}
\newcommand{\sphinxlogo}{\vbox{}}
......@@ -217,16 +217,30 @@ prescott \PYGZhy{}\PYGZhy{}help
Run the program by issuing the following command in a bash terminal:
\begin{sphinxVerbatim}[commandchars=\\\{\}]
prescott \PYGZhy{}e ../data/MLH1\PYGZus{}normPred\PYGZus{}evolCombi.txt \PYGZhy{}g ../data/gnomAD\PYGZus{}v2.1.1\PYGZus{}MLH1\PYGZus{}HUMAN\PYGZus{}ENSG00000076242.csv \PYGZhy{}s ../data/MLH1.fasta
prescott \PYGZhy{}e ../data/MLH1\PYGZus{}normPred\PYGZus{}evolCombi.txt \PYGZhy{}g ../data/gnomAD\PYGZus{}v4.0.0\PYGZus{}MLH1\PYGZus{}HUMAN\PYGZus{}ENSG00000076242.csv \PYGZhy{}s ../data/MLH1.fasta
\end{sphinxVerbatim}
\sphinxAtStartPar
The most important output is prescott\sphinxhyphen{}scores.txt file, which produces
frequecy modified scores for the mutations.
GnomAD v4.0.0 is the most comprehensive, publicly available human population dataset as far as we know. However,
if you would like to use GnomAD v2.1.1, you should specify the version with ‘\textendash{}gnomadversion’ parameter as below:
\begin{sphinxVerbatim}[commandchars=\\\{\}]
prescott \PYGZhy{}e ../data/MLH1\PYGZus{}normPred\PYGZus{}evolCombi.txt \PYGZhy{}g ../data/gnomAD\PYGZus{}v2.1.1\PYGZus{}MLH1\PYGZus{}HUMAN\PYGZus{}ENSG00000076242.csv \PYGZhy{}s ../data/MLH1.fasta \PYGZhy{}\PYGZhy{}gnomadversion \PYG{l+m}{2}
\end{sphinxVerbatim}
\sphinxAtStartPar
The most important output is the prescott\sphinxhyphen{}scores.csv file, which contains the entire single point mutational landscape for the protein.
\sphinxAtStartPar
In addition, there is a file called prescott\sphinxhyphen{}scores\sphinxhyphen{}details.csv. The file contains all information about the points modulated by population
information coming from gnomad file and non\sphinxhyphen{}modulated variants.
\sphinxAtStartPar
Finally, if you have both pathogenic and benign labels in the gnomad file, there will be a ‘clinvar\sphinxhyphen{}vs\sphinxhyphen{}position.png’ file showing how these labeled
variants are affected by population information.
\sphinxAtStartPar
Please note that the example input files for prescott are in the data
directory of this repository.
Please note that the example input files of the MLH1 protein for prescott calculations are in the data directory of this repository.
\section{Installation}
......@@ -417,6 +431,19 @@ mutation and its predicted effect, separated by a space.
\section{Running several jobs using docker}
\label{\detokenize{docker:running-several-jobs-using-docker}}
\sphinxAtStartPar
If you want to use docker in a more automated way for several proteins,
you can call docker within a bash script.
\begin{sphinxVerbatim}[commandchars=\\\{\}]
sudo docker run \PYGZhy{}\PYGZhy{}rm \PYGZhy{}v \PYG{n+nv}{\PYGZdl{}PWD}:/home/tekpinar/research/lcqb tekpinar/prescott\PYGZhy{}docker:v1.5.0 escott aliBLAT.fasta \PYGZhy{}\PYGZhy{}pdbfile blat\PYGZhy{}af2.pdb
\end{sphinxVerbatim}
\sphinxAtStartPar
Note: It is very important to have the aliBLAT.fasta and blat\sphinxhyphen{}af2.pdb files in your local folder when you call docker like an executable.
Typically, I create a folder for each protein that contains the alignment and the structure. Then, I change into each folder with the ‘cd’
command inside the bash script and execute the command above in each local folder.
\sphinxstepscope
......@@ -442,7 +469,7 @@ By default, ESCOTT will only output the combined (independent
and epistatic) scores*:
\sphinxAtStartPar
There are three output files:
Assuming that the name after the ‘\textgreater{}’ character in your fasta file is ‘myProt’, there will be three output files:
\begin{enumerate}
\sphinxsetlistlabels{\arabic}{enumi}{enumii}{}{.}%
\item {}
......@@ -460,13 +487,15 @@ read it in R or Python but difficult to find the mutations you are interested.
\item {}
\sphinxAtStartPar
myProt\_normPred\_evolCombiTransposed.txt
myProt\_normPred\_evolCombiTransposedRanksorted.csv
\begin{quote}
\sphinxAtStartPar
As the name implies, this is the transposed version of the combined results.
It is easier to find the mutations you are interested in this file. Just
check the row corresponding to the mutation.
As the name implies, this is the transposed and reverse ranksorted version of the combined results.
It is easier to find the mutations you are interested in using this file. It can be opened with any spreadsheet
program such as MS Excel. Each row is a position (amino acid) in the protein, and the 20 columns contain the effects
of mutating the original amino acid. The values are between 0 and 1: 0 indicates no effect, while 1 indicates a
high impact.
\end{quote}
\item {}
......@@ -475,17 +504,19 @@ myProt\_normPred\_evolCombi.png
\begin{quote}
\sphinxAtStartPar
This is the image file of the combined results. It selects ‘Oranges\_r’
This is the image file of the combined results. It selects ‘turbo\_r’
matplotlib color map by default. You can change it by adding ‘\textendash{}colormap turbo\_r’
for a more fancy look during the esgemme call. It \textendash{}colormap argument accepts
for a more fancy look during the escott call. The \textendash{}colormap argument accepts
all the color maps in matplotlib.
If your query sequence is longer than 500 amino acids, the program may produce
multiple png files, each one containing a 500 residue segment.
\end{quote}
\end{enumerate}
\sphinxAtStartPar
Note*: If you want to see epistatic and independent contributions as well,
you should add ‘\textendash{}verbose true’ argument while calling esgemme.
you should add the ‘\textendash{}verbose true’ argument when calling escott.
\section{Selected Single Point or Multiple Point Mutation Calculations}
......
......@@ -16,7 +16,7 @@ Entire Single Point Mutation Landscape Calculations
By default, ESCOTT will only output the combined (independent
and epistatic) scores*:
There are three output files:
Assuming that the name after the '>' character in your fasta file is 'myProt', there will be three output files:
#. myProt_normPred_evolCombi.txt
......@@ -26,21 +26,25 @@ There are three output files:
in your protein of interest. Since this file is horizontal, it is easy to
read it in R or Python but difficult to find the mutations you are interested.
#. myProt_normPred_evolCombiTransposed.txt
#. myProt_normPred_evolCombiTransposedRanksorted.csv
As the name implies, this is the transposed version of the combined results.
It is easier to find the mutations you are interested in this file. Just
check the row corresponding to the mutation.
As the name implies, this is the transposed and reverse ranksorted version of the combined results.
It is easier to find the mutations you are interested in using this file. It can be opened with any spreadsheet
program such as MS Excel. Each row is a position (amino acid) in the protein, and the 20 columns contain the effects
of mutating the original amino acid. The values are between 0 and 1: 0 indicates no effect, while 1 indicates a
high impact.
#. myProt_normPred_evolCombi.png
This is the image file of the combined results. It selects 'Oranges_r'
This is the image file of the combined results. It selects 'turbo_r'
matplotlib color map by default. You can change it by adding '--colormap turbo_r'
for a more fancy look during the esgemme call. It --colormap argument accepts
for a more fancy look during the escott call. The --colormap argument accepts
all the color maps in matplotlib.
If your query sequence is longer than 500 amino acids, the program may produce
multiple png files, each one containing a 500 residue segment.
Note*: If you want to see epistatic and independent contributions as well,
you should add '--verbose true' argument while calling esgemme.
you should add the '--verbose true' argument when calling escott.
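As an illustration of working with the transposed, ranksorted CSV outside a spreadsheet, the following is a minimal sketch that loads such a table with pandas and lists the highest-scoring substitutions. The in-memory table stands in for myProt_normPred_evolCombiTransposedRanksorted.csv; its row labels, its reduced set of three columns, and its values are invented for the example, and the helper name `top_mutations` is hypothetical.

```python
import io

import pandas as pd

# Stand-in for myProt_normPred_evolCombiTransposedRanksorted.csv.
# Assumed layout: one index column, then one column per substituted amino acid.
# Row labels and scores below are invented for illustration only.
example_csv = io.StringIO(
    ",A,C,D\n"
    "M1,0.10,0.95,0.80\n"
    "K2,0.05,0.40,0.20\n"
)

scores = pd.read_csv(example_csv, index_col=0)


def top_mutations(df, n=3):
    """Return the n highest-scoring (residue, substitution) pairs as a table."""
    long_form = (
        df.rename_axis("residue")
        .reset_index()
        .melt(id_vars="residue", var_name="substitution", value_name="score")
    )
    return long_form.nlargest(n, "score").reset_index(drop=True)


top = top_mutations(scores, n=2)
print(top)
```

With a real output file, `example_csv` would be replaced by the path to the CSV; whether the real row labels look like `M1` should be checked against the file itself.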
Selected Single Point or Multiple Point Mutation Calculations
-------------------------------------------------------------
......
......@@ -145,3 +145,16 @@ mutation and its predicted effect, separated by a space.
Running several jobs using docker
---------------------------------
If you want to use docker in a more automated way for several proteins,
you can call docker within a bash script.
.. code:: bash
sudo docker run --rm -v $PWD:/home/tekpinar/research/lcqb tekpinar/prescott-docker:v1.5.0 escott aliBLAT.fasta --pdbfile blat-af2.pdb
Note: It is very important to have the aliBLAT.fasta and blat-af2.pdb files in your local folder when you call docker like an executable.
Typically, I create a folder for each protein that contains the alignment and the structure. Then, I change into each folder with the 'cd'
command inside the bash script and execute the command above in each local folder.
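The per-folder workflow described above could also be driven from Python instead of bash. The sketch below is one possible arrangement, not part of the repository: the function names, the `jobs` list, and the `dry_run` switch are hypothetical, while the docker image, mount point, and escott arguments are copied from the command above. The working folder is mounted in place of `$PWD`.

```python
import subprocess
from pathlib import Path

# Image and mount point taken from the docker command shown above.
IMAGE = "tekpinar/prescott-docker:v1.5.0"
MOUNT_POINT = "/home/tekpinar/research/lcqb"


def build_command(folder, alignment, pdbfile):
    """Assemble the docker call for one protein folder (hypothetical helper)."""
    return [
        "sudo", "docker", "run", "--rm",
        "-v", f"{Path(folder).resolve()}:{MOUNT_POINT}",
        IMAGE, "escott", alignment, "--pdbfile", pdbfile,
    ]


def run_jobs(jobs, dry_run=True):
    """Build (and optionally execute) one docker command per protein folder."""
    commands = []
    for folder, alignment, pdbfile in jobs:
        cmd = build_command(folder, alignment, pdbfile)
        commands.append(cmd)
        if not dry_run:
            # Equivalent to 'cd folder' followed by the docker command.
            subprocess.run(cmd, cwd=folder, check=True)
    return commands


# One (folder, alignment, structure) entry per protein; names are examples.
jobs = [("BLAT", "aliBLAT.fasta", "blat-af2.pdb")]
for cmd in run_jobs(jobs):
    print(" ".join(cmd))
```

Keeping `dry_run=True` only prints the commands, which is a cheap way to check paths before launching long calculations.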
......@@ -98,13 +98,24 @@ Run the program by issuing the following command in a bash terminal:
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
The most important output is prescott-scores.txt file, which produces
frequecy modified scores for the mutations.
GnomAD v4.0.0 is the most comprehensive, publicly available human population dataset as far as we know. However,
if you would like to use GnomAD v2.1.1, you should specify the version with '--gnomadversion' parameter as below:
Please note that the example input files for prescott are in the data
directory of this repository.
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 2
The most important output is the prescott-scores.csv file, which contains the entire single point mutational landscape for the protein.
In addition, there is a file called prescott-scores-details.csv. The file contains all information about the points modulated by population
information coming from gnomad file and non-modulated variants.
Finally, if you have both pathogenic and benign labels in the gnomad file, there will be a 'clinvar-vs-position.png' file showing how these labeled
variants are affected by population information.
Please note that the example input files of the MLH1 protein for prescott calculations are in the data directory of this repository.
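To look up a single variant in the landscape CSV, something along the following lines may help. It is a sketch under assumptions: the table layout (an empty first header cell, amino acid letters as columns, rows labelled by residue and position) follows the writer code in this commit, but the three-column table, its values, and the helper name `lookup` are invented for the example.

```python
import io
import re

import pandas as pd

# Stand-in for prescott-scores.csv with a reduced set of columns.
# Scores are invented; real files have 20 amino acid columns.
example_csv = io.StringIO(
    ",A,R,V\n"
    "M1,0.10,0.90,0.30\n"
    "D2,0.70,0.85,0.60\n"
)

scores = pd.read_csv(example_csv, index_col=0)


def lookup(df, mutation):
    """Return the score of a mutation written like 'D2R' (hypothetical helper)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation.upper())
    if match is None:
        raise ValueError(f"Cannot parse mutation: {mutation!r}")
    wild_type, position, substitution = match.groups()
    # Rows are labelled '<wild type><position>', columns by substituted residue.
    return float(df.loc[wild_type + position, substitution])


print(lookup(scores, "D2R"))
```

A `KeyError` from `df.loc` here would indicate that the wild-type residue or position in the query does not match the table.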
Installation
------------
......
#!/bin/bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
#If you have gnomad data from GnomAD version 2.1.1 or version 3.1.2
#prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 2
#If you are using GnomAD v4.0.0 data.
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 4
......@@ -1455,7 +1455,7 @@ def doit(inAli,mutFile,retMet,bFile,fFile,n,N, jetfile, pdbfile, normWeightMode,
# gemmeDFtrans.rename(index=aaAndPosition, inplace=True)
# gemmeDFtrans['pos'] = aaAndPosition
#print(df['pos'])
gemmeDFtrans.to_csv(prot+"_normPred_evolCombiTransposedRanksorted.txt", sep='\t', float_format='%.2f', na_rep='NaN')
gemmeDFtrans.to_csv(prot+"_normPred_evolCombiTransposedRanksorted.csv", float_format='%.2f', na_rep='NaN')
#sys.exit(-1)
else:
......
......@@ -875,7 +875,7 @@ def main():
arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
plt.xticks(rotation = 90)
plt.ylabel("Ranksorted Score")
plt.ylabel("PR/ESCOTT Score")
plt.xlabel("Position")
plt.legend(loc='upper right')
plt.tight_layout()
......@@ -884,9 +884,35 @@ def main():
print("@> AUC= {:.3f} {:.3f}".format( AUC_ESCOTT, AUC_PRESCOTT))
myBigMergedDF.to_csv(outfile+'-details.csv', index=None)
myBigMergedDF.to_csv(outfile+'.txt', columns=['mutant', 'PRESCOTT'], index=False, header=None, sep=' ')
# myBigMergedDF.to_csv(outfile+'.txt', columns=['mutant', 'PRESCOTT'], index=False, header=None, sep=' ')
    with open(outfile+'.csv', 'w') as my_file:
        my_file.write(",")
        for item in alphabeticalAminoAcidsList:
            if(item=='Y'):
                my_file.write(item+"\n")
            else:
                my_file.write(item+",")
        for pos in range(len(localResidueList)):
            resAndPos = str(localResidueList[pos])+str(pos+1)
            my_file.write("{},".format(resAndPos))
            for item in alphabeticalAminoAcidsList:
                variant = str(localResidueList[pos]).upper()+str(pos+1)+item
                if(item=='Y'):
                    #print(variant)
                    #print(myBigMergedDF.loc[myBigMergedDF['mutant']==variant, 'PRESCOTT'].values[0])
                    my_file.write("{:.2f}\n".format(float(myBigMergedDF.loc[myBigMergedDF['mutant']==variant, 'PRESCOTT'].values[0])))
                else:
                    #print(myBigMergedDF.loc[myBigMergedDF['mutant']==variant, 'PRESCOTT'].values[0])
                    my_file.write("{:.2f},".format(float(myBigMergedDF.loc[myBigMergedDF['mutant']==variant, 'PRESCOTT'].values[0])))
    if(os.path.exists(protein+'_singleline.txt')):
        os.remove(protein+'_singleline.txt')
    if(os.path.exists(protein+'_singleline_1-ranksort.txt')):
        os.remove(protein+'_singleline_1-ranksort.txt')
if __name__ == "__main__":
    main()
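The CSV writer in this hunk filters the merged table once per cell. As a possible alternative, not taken from the repository, the same position-by-amino-acid layout could be built with a single pandas pivot. The sketch assumes the `mutant` column holds strings such as `M1A` and the scores sit in a `PRESCOTT` column, as in the hunk; the function name `landscape_table` and the demo data are hypothetical.

```python
import pandas as pd

# Alphabetical one-letter amino acid codes, ending in 'Y' as in the hunk above.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")


def landscape_table(df, score_column="PRESCOTT"):
    """Pivot a long 'mutant'/score table into rows of positions, columns of residues."""
    parts = df["mutant"].str.extract(r"^([A-Z])(\d+)([A-Z])$")
    parts.columns = ["wt", "pos", "mut"]
    tidy = pd.DataFrame({
        "pos": parts["pos"].astype(int),       # numeric, so 10 sorts after 2
        "label": parts["wt"] + parts["pos"],   # row label such as 'M1'
        "mut": parts["mut"],
        "score": df[score_column].astype(float),
    })
    table = tidy.pivot(index=["pos", "label"], columns="mut", values="score")
    table = table.sort_index(level="pos").droplevel("pos")
    table.index.name = None
    table.columns.name = None
    return table.reindex(columns=AMINO_ACIDS)


# Invented demo data: two positions, all 20 substitutions each.
rows = [
    (f"{wt}{pos}{aa}", 0.01 * (i + 1))
    for pos, wt in enumerate("MD", start=1)
    for i, aa in enumerate(AMINO_ACIDS)
]
demo = pd.DataFrame(rows, columns=["mutant", "PRESCOTT"])

table = landscape_table(demo)
print(table.to_csv(float_format="%.2f"))
```

One behavioural difference to keep in mind: a variant missing from the input becomes an empty cell here, whereas the loop in the hunk would raise an `IndexError` at `.values[0]`.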