Updated README.md file for prescott module.

d68afcdb · Mustafa Tekpinar · 6f4cbe77 · d68afcdb
Commit d68afcdb authored Sep 29, 2023 by Mustafa Tekpinar

Showing with 32 additions and 9 deletions

README.md README.md +32 -9

--- a/README.md
+++ b/README.md
@@ -9,33 +9,41 @@ It is made up of two main programs: escott and prescott.

 ESCOTT can calculate effects of single point mutations and multiple point mutations. On the other hand, PRESCOTT incorporates
 population frequencies into ESCOTT predictions. Therefore, you need to run ESCOTT first to have predictions of mutational effects. 
-We recommend using PRESCOTT via our web site or our docker image. 
-
-
+We recommend using the PRESCOTT package via our web site or our docker image because of its many dependencies. 

 ## Input Data Requirements
+### Input Data Requirements for escott

-ESCOTT requires two files:
+escott requires two files:
 * a multiple sequence alignment (MSA) file in fasta format (mandatory):

    Your query protein must be the first sequence in the fasta file. In addition, the query sequence should not contain any gaps. 

-* a structure file in PDB format (optional but recommended)
+* a structure file in PDB format (optional but highly recommended).

 One of the fastest ways to obtain both input MSA and a PDB file is to run colabfold:
 https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb

 Please note that the MSA file produced by colabfold (a3m file) can contain gaps in the query sequence. You have to remove them before using it in PRESCOTT. You can remove the gaps with programs that have a GUI, such as ugene (http://ugene.net/) or jalview (https://www.jalview.org/). 
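
If you prefer a script to a GUI, the gap removal described above can be sketched in Python. This is only a minimal illustration and not part of the PRESCOTT package: the function names are hypothetical, and it assumes an aligned fasta file in which every row has the same length. A raw a3m file encodes insertions as lowercase letters, so it would likely need to be converted to such an alignment first.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_query_gap_columns(records, gap_chars="-."):
    """Remove every alignment column where the first (query) row has a gap."""
    if not records:
        return []
    query = records[0][1]
    # Assumption: rows are already aligned, so they must share one length.
    if any(len(seq) != len(query) for _, seq in records):
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i, ch in enumerate(query) if ch not in gap_chars]
    return [(h, "".join(seq[i] for i in keep)) for h, seq in records]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    example = ">query\nMK-LV\n>hom1\nMKALV\n>hom2\nM-GL-\n"
    print(to_fasta(drop_query_gap_columns(read_fasta(example))), end="")
```

Dropping whole columns keeps the remaining rows aligned to the query, which is why the sketch filters by column index instead of stripping gap characters row by row.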

-For testing purpose, you can find example input files for BLAT protein in data/ folder of this repository. 
+For testing purposes, you can find some example input files for the BLAT protein in the data/ folder of this repository. 
+### Input Data Requirements for prescott
+prescott requires three files:
+* output file of escott (the file ending with ...normPredCombi.txt)
+* a fasta file containing only your query sequence
+* a gnomAD csv file for your protein, to be downloaded from https://gnomad.broadinstitute.org/.
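
One way to prepare the query-only fasta file listed above is to copy the first record of the escott MSA and strip any gap characters from it. The sketch below is an illustration under assumptions, not part of the package: the function names are hypothetical, and it assumes that prescott accepts the same header line that the MSA uses for the query.

```python
def first_fasta_record(text):
    """Return (header, sequence) of the first record in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # second record reached, stop reading
            header = line[1:]
        elif header is not None and line:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


def query_only_fasta(msa_text, gap_chars="-."):
    """Build single-record FASTA text holding the ungapped query sequence."""
    header, seq = first_fasta_record(msa_text)
    ungapped = "".join(ch for ch in seq if ch not in gap_chars)
    return f">{header}\n{ungapped}\n"


if __name__ == "__main__":
    msa = ">query\nMK-LV\n>hom1\nMKALV\n"
    print(query_only_fasta(msa), end="")
```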

 ## Usage
-### Running the program
+You can find example bash scripts for escott and prescott in the examples folder of this repository.
+
+Below, you will find examples of the most basic usage. Consult the documentation for further details. 
+
+### Running the escott program
 Let's assume that our input MSA is inputAli.fasta and input.pdb is our structure file in PDB format.   

 Run the program by issuing the following command in a bash terminal:
 ```bash
-escott inputAli.fasta --pdbfile input.pdb 
+escott inputAli.fasta -f inputAli.fasta --pdbfile input.pdb 
 ```

 A quick help can be accessed by typing 
@@ -45,7 +53,7 @@ escott --help

 By default, ESCOTT will predict the effect of all possible single mutations at all positions in the 
 query sequence. Alternatively, a set of single or multiple mutations can be given with the option -m.
-Eachline of the file should contain a mutation (e.g. D136R) or combination of mutations separated 
+Each line of the file should contain a mutation (e.g. D136R) or combination of mutations separated 
 by commas (or colons) and ordered according to their positions in the sequence (e.g. D136R,V271A).
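
Before passing a mutation file to the -m option, it can help to check that each line follows the format described above. The following Python sketch is a hypothetical helper, not something shipped with ESCOTT. It assumes that amino acids are written as single uppercase letters, as in the D136R example, and it only checks the syntax and the ordering by position.

```python
import re

# One mutation: wild-type letter, position, mutant letter (e.g. D136R).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_line(line):
    """Parse one line into a list of (wild_type, position, mutant) tuples."""
    # Mutations on a line may be separated by commas or colons.
    tokens = [t.strip() for t in re.split(r"[,:]", line.strip()) if t.strip()]
    if not tokens:
        raise ValueError("empty mutation line")
    mutations = []
    for token in tokens:
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"not a valid mutation: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    # Combined mutations must be ordered by their position in the sequence.
    positions = [pos for _, pos, _ in mutations]
    if positions != sorted(positions):
        raise ValueError(f"mutations are not ordered by position: {line!r}")
    return mutations


if __name__ == "__main__":
    for example in ("D136R", "D136R,V271A", "D136R:V271A"):
        print(example, "->", parse_mutation_line(example))
```

The check does not compare the wild-type residue with the query sequence, because that would require an assumption about how positions are numbered.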

 GEMME calls JET2 to compute evolutionary conservation levels. By default, JET2 will retrieve a set
@@ -58,6 +66,21 @@ values obtained over the 10 iterations.
 JET2 configuration file is: default.conf.
 JET2 output file is: myProt_jet.res.

+### Running the prescott program
+A quick help can be accessed by typing 
+```bash
+prescott --help
+```
+Run the program by issuing the following command in a bash terminal:
+```bash
+prescott -e ../data/MLH1_normPred_evolCombi.txt -g ../data/gnomAD_v2.1.1_MLH1_HUMAN_ENSG00000076242.csv -s ../data/MLH1.fasta 
+```
+The most important output is the prescott-scores.txt file, which contains frequency-modified scores for the mutations. 
+
+Please note that the example input files are in the data directory of this repository. 
+
+
+
 ## Installation
 PRESCOTT is implemented in Python 3 and R. It has been tested only on Linux. Since PRESCOTT has many dependencies, we recommend using our web site or our docker image. If you are a determined user, you can find the steps required to install it from the source in the following link (or in the docs folder of this repository):