Please note that the MSA file produced by ColabFold (an a3m file) can contain gaps in the query sequence. You must remove them before using the file in PRESCOTT. You can remove the gaps with programs that have a GUI, such as UGENE (http://ugene.net/) or Jalview (https://www.jalview.org/).
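If you prefer a script to a GUI, the gap removal can be sketched in a few lines of Python. This is a minimal sketch, not part of PRESCOTT: the function names `read_a3m` and `drop_query_gap_columns` are hypothetical, the query is assumed to be the first record, and lowercase letters are treated as insertions relative to the query (the usual a3m convention) and stripped so that all rows have the same length.

```python
def read_a3m(text):
    """Parse a3m/FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines (some a3m files start with one).
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_query_gap_columns(records):
    """Remove every alignment column in which the query (first record) has a gap."""
    # Assumption: lowercase letters are insertions relative to the query,
    # so removing them leaves rows of equal length.
    aligned = [(h, "".join(c for c in s if not c.islower())) for h, s in records]
    query = aligned[0][1]
    keep = [i for i, c in enumerate(query) if c != "-"]
    cleaned = []
    for header, seq in aligned:
        if len(seq) != len(query):
            raise ValueError(f"row '{header}' does not match the query length")
        cleaned.append((header, "".join(seq[i] for i in keep)))
    return cleaned
```

To use it on a file, read the file contents into a string, pass them through `read_a3m` and `drop_query_gap_columns`, and write each pair back as a `>header` line followed by the sequence line. Check the result in an alignment viewer before running PRESCOTT.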
For testing purposes, you can find some example input files for the BLAT protein in the data/ folder of this repository.
### Input Data Requirements for prescott
prescott requires three files:
* the output file of escott (the file ending with ...normPredCombi.txt)
* a FASTA file containing only your query sequence
* a gnomAD CSV file for your protein, to be downloaded from https://gnomad.broadinstitute.org/
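
Because the FASTA file must contain only the query sequence, a quick check before running prescott can catch a file that still holds several records. The snippet below is a minimal sketch under that single assumption; `read_single_fasta` is a hypothetical helper, not part of PRESCOTT.

```python
def read_single_fasta(text):
    """Return (header, sequence) if the FASTA text holds exactly one record."""
    headers = []
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            headers.append(line[1:])
        elif not headers:
            raise ValueError("sequence data found before the first '>' header")
        else:
            residues.append(line)
    if len(headers) != 1:
        raise ValueError(f"expected exactly 1 sequence, found {len(headers)}")
    sequence = "".join(residues)
    if not sequence:
        raise ValueError("the record has a header but no sequence")
    return headers[0], sequence
```

Calling it on the contents of your FASTA file either returns the header and the sequence joined into one string, or raises a `ValueError` that explains what is wrong.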
## Usage
### Running the program
You can find example bash scripts for escott and prescott in the examples folder of this repository.
Below, you will find examples of the most basic usage. Consult the documentation for further details.
### Running the escott program
Let's assume that our input MSA is inputAli.fasta and input.pdb is our structure file in PDB format.
Run the program by issuing the following command in a bash terminal:
The most important output is the prescott-scores.txt file, which contains frequency-modified scores for the mutations.
Please note that the example input files are in the data directory of this repository.
## Installation
PRESCOTT is implemented in Python 3 and R. It has been tested only on Linux. Since PRESCOTT has many dependencies, we recommend using our website or our Docker image. If you are a determined user, you can find the steps required to install it from source at the following link (or in the docs folder of this repository):