Commit cdece0cd by Mustafa Tekpinar

Updates for GnomAD 4.1.

parent 27078d3c
......@@ -72,8 +72,6 @@ a3m and fasta files are too long. We have to shorten them. We can do that with
.. code:: bash
awk 'BEGIN{FS=" "}{if(NF>1) {printf(">%s\n", $1)}else{print $0}}' AKE_nogaps.fasta > AKE_nogaps_short_names.fasta
# Recheck this command if you can remove extra >
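The ``# Recheck`` note above appears to refer to a doubled ``>``: the first field of a header line already carries the ``>`` character, so ``printf(">%s\n", $1)`` would print it twice. A minimal sketch of an alternative, assuming headers have the form ``>name description`` with no space after ``>`` (the function name ``shorten_fasta_headers`` is hypothetical and not part of this repository):

```shell
# Hypothetical helper, not part of the PRESCOTT repository.
# Keeps only the first whitespace-delimited token of each FASTA header
# and passes sequence lines through unchanged.
shorten_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Example usage with the file names from the command above:
# shorten_fasta_headers AKE_nogaps.fasta > AKE_nogaps_short_names.fasta
```

Matching on ``/^>/`` rather than on the field count means only header lines are shortened, and the ``>`` is printed exactly once because it is already part of ``$1``.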
Congratulations! Now, you have all the input files required for PRESCOTT:
......
......@@ -60,7 +60,7 @@ folder of this repository.
Below, you will find examples of the most basic usage. Consult the
documentation for further details.
Running the escott program
Running escott program
~~~~~~~~~~~~~~~~~~~~~~~~~~
Let’s assume that our input MSA is inputAli.fasta and input.pdb is our
......@@ -85,8 +85,8 @@ line of the file should contain a mutation (e.g. D136R) or combination
of mutations separated by commas and ordered according to their
positions in the sequence (e.g. D136R,V271A).
Running the prescott program with gnomAD frequency data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running prescott program with gnomAD frequency data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A quick help can be accessed by typing
......@@ -98,15 +98,10 @@ Run the program by issuing the following command in a bash terminal:
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
GnomAD v4.0.0 is the most comprehensive, publicly available human population dataset as far as we know. However,
if you would like to use GnomAD v2.1.1, you should specify the version with '--gnomadversion' parameter as below:
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.1.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 2
GnomAD v4.1.0 is the most comprehensive publicly available human population dataset that we are aware of.
The most important output is the prescott-scores.csv file, which contains the entire single-point mutational landscape for the protein.
In addition, there is a file called prescott-scores-details.csv. The file contains all information about the points modulated by population
......@@ -138,7 +133,23 @@ variants are affected by population information.
Please note that the example input files of the MLH1 protein for prescott calculations are in the data directory of this repository.
Running the prescott program with custom frequency data
Running prescott program with older versions of GnomAD data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you would like to use GnomAD v4.0.0 data in prescott, you should specify the GnomAD version with the '--gnomadversion' parameter as follows:
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 40
If you would like to use GnomAD v2.1.1 instead, you should specify the version with the '--gnomadversion' parameter as below:
.. code:: bash
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 2
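The version-to-flag mapping used in these commands can be sketched as a small shell helper. This is only an illustration: the function name ``gnomad_version_flag`` is hypothetical, and it assumes frequency files follow the ``gnomAD_v<version>_...`` naming seen in the examples. Only the values ``2`` and ``40`` shown above are used, and any other file name (including v4.1.0) yields no flag, matching the default invocation.

```shell
# Hypothetical helper, not part of the PRESCOTT repository.
# Prints the --gnomadversion argument matching a gnomAD frequency file name,
# or an empty line when the default (no flag) applies.
gnomad_version_flag() {
    case "$(basename "$1")" in
        gnomAD_v2.1.1_*) echo "--gnomadversion 2" ;;
        gnomAD_v4.0.0_*) echo "--gnomadversion 40" ;;
        *)               echo "" ;;
    esac
}

# Example usage (left unquoted on purpose so the flag and its value split into two words):
# freq=../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv
# prescott -e ../data/MLH1_normPred_evolCombi.txt -g "$freq" -s ../data/MLH1.fasta $(gnomad_version_flag "$freq")
```

Deriving the flag from the file name keeps the data file and the declared version from drifting apart, but it only works while the naming convention assumed here holds.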
Running prescott program with custom frequency data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What if you have your own frequencies for a set of missense mutations that you or another researcher measured?
The prescott module can also do calculations with custom frequency files. The first step is to prepare a plain text file with a ".txt" extension.
......@@ -188,6 +199,6 @@ and structural model accurately predicts missense effect. medRxiv. doi:10.1101/2
https://www.medrxiv.org/content/10.1101/2024.02.03.24302219v1
.. |License: CC BY-NC-SA 4.0| image:: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg
:target: http://creativecommons.org/licenses/by-nc-sa/4.0/
.. |License: GPLv3| image:: https://img.shields.io/badge/license-GPLv3-blue
:target: https://www.gnu.org/licenses/gpl-3.0.html
......@@ -4,7 +4,10 @@
#prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 2
#If you are using GnomAD v4.0.0 data.
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
#prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.0.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta --gnomadversion 40
#If you are using GnomAD v4.1.0 data.
prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v4.1.0_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta
#If you have a custom frequency file
#prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/custom-frequency-file.txt -s ../data/MLH1.fasta