## Overview
![](Images/method_mutation.svg.png?raw=true "DLA")
Deep Local Analysis (DLA)-Mutation contrasts the patterns observed in two local cubes encapsulating the physico-chemical and geometrical environments around the wild-type and the mutant amino acids. The underlying self-supervised model (ssDLA) takes advantage of a large-scale exploration of non-redundant experimental protein complex structures in the Protein Data Bank (PDB) to learn the fundamental properties of protein-protein interfaces. Using evolutionary constraints and conformational heterogeneity improves the performance of DLA-Mutation.
#### Features:
- Prediction of the changes in binding affinity upon single-point mutations, based on a local comparison of the atomic patterns found in pairs of cubes around the wild-type residue and its mutant, using a Siamese architecture (see the sketch after this list).
- Transfer of the knowledge of protein-protein interfaces to various downstream tasks. For instance, given a single partially masked cube, it recovers the identity and physico-chemical class of the central residue. Given an ensemble of cubes representing an interface, it predicts the function of the complex.
- Using structural and evolutionary information.
- Fast generation of cubes and evaluation of interfaces.
- Training and testing 3D-CNN models.
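
One way to picture the Siamese setup is the minimal sketch below. It is a hypothetical PyTorch illustration, not the actual DLA-Mutation code: a shared 3D-CNN encodes the wild-type and the mutant cubes, and the change in binding affinity is regressed from the pair of embeddings. The channel count, cube size, and layer widths are assumptions.

```python
import torch
import torch.nn as nn

class CubeEncoder(nn.Module):
    """Shared 3D-CNN that embeds a local cube into a feature vector.
    Channel count, cube size, and layer widths are illustrative assumptions."""
    def __init__(self, in_channels: int = 4, embed_dim: int = 200):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # -> (batch, 64, 1, 1, 1)
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, cube: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(cube).flatten(1))

class SiameseDDG(nn.Module):
    """Passes the wild-type and mutant cubes through the same encoder
    and predicts a single scalar (the change in binding affinity)."""
    def __init__(self, encoder: CubeEncoder):
        super().__init__()
        self.encoder = encoder                # weights shared by both branches
        self.head = nn.Linear(2 * encoder.fc.out_features, 1)

    def forward(self, wt_cube, mut_cube):
        z = torch.cat([self.encoder(wt_cube), self.encoder(mut_cube)], dim=1)
        return self.head(z).squeeze(-1)

# Toy usage with random 4-channel 24x24x24 cubes (shapes are assumptions).
model = SiameseDDG(CubeEncoder())
wt, mut = torch.randn(1, 4, 24, 24, 24), torch.randn(1, 4, 24, 24, 24)
print(model(wt, mut))                         # predicted affinity change
```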
ssDLA is a structure-based, general-purpose model that generates informative representations from the local environments (masked or not) around interfacial residues for downstream tasks.
#### Finding residue-specific patterns
We can use the pre-trained ssDLA model to predict the type of amino acid given a masked cube.
##### Generating masked locally oriented cubes
- Place the protein complexes in a directory (*e.g. 'Examples/complex_directory'*) laid out as shown below. The 'complex_list.txt' is a csv file that contains three columns separated by ';': the name of the target complex (`Comp`), the receptor chain ID(s) (`ch1`), and the ligand chain ID(s) (`ch2`).
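
As a hypothetical illustration of this layout (complex names and chain IDs are invented, and the header row in 'complex_list.txt' is an assumption):

```
complex_directory/
├── complex_list.txt
├── 1ABC.pdb
└── 2XYZ.pdb
```

where 'complex_list.txt' could contain, for instance:

```
Comp;ch1;ch2
1ABC;A;B
2XYZ;AB;C
```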
- Specify the path to FreeSASA or NACCESS in ```lib/tools.py``` (```FREESASA_PATH``` or ```NACCESS_PATH```). The choice between FreeSASA and NACCESS is also set in ```lib/tools.py``` (default is ```USE_FREESASA = True```); see the settings sketch after this list.
- If you have an Nvidia GPU on your computer, or execute on Google Colab, set ```FORCE_CPU = False``` in ```lib/tools.py```. Otherwise set ```FORCE_CPU = True``` (default is ```FORCE_CPU = False```).
- Specify the type of masking in ```Representation/generate_cubes_interface.py```. You have the following options:
  - Masking a sphere of radius 5 Å randomly centered on an atom of the central residue. This is the default masking and the procedure the ssDLA model was trained with.
  - Masking a sphere of radius 3 Å randomly centered on an atom of the central residue.
  - Masking only the side-chain of the central residue.
  - Masking the whole central residue.
  - No masking at all.
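
Putting these settings together, the relevant block of ```lib/tools.py``` might look like the sketch below; the variable names come from this README, while the paths and values are placeholder assumptions:

```python
# lib/tools.py (sketch): settings referenced above; paths are placeholders
FREESASA_PATH = "/usr/local/bin/freesasa"  # location of the FreeSASA executable
NACCESS_PATH = "/opt/naccess/naccess"      # location of the NACCESS executable
USE_FREESASA = True    # True: accessibility via FreeSASA; False: via NACCESS
FORCE_CPU = False      # set True when no Nvidia GPU is available
```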
From the directory 'Evaluation', run ```python test_xray.py``` or ```python test_xray_4channels.py```, depending on the number of channels.
It processes all the target complexes and produces the csv files 'output_xray_wt_mask' ('output_xray_wt_mask_4channels') with the predictions and 'intermediate_xray_wt_mask_200' ('intermediate_xray_wt_mask_200_4channels') with the embedding vectors. Each row of the output file belongs to an interfacial residue of a target complex and has 10 columns separated by tabs:
Name of the complex (`complex`) <br>
Residue name (`resname`) <br>
Structural region of the residue (`resregion`) <br>
Residue number (`resnumber`; according to PDB) <br>
Residue coordinate position (`respos`) <br>
Receptor or ligand (`partner`) <br>
The predicted vector of size 20 (`prediction`) <br>
The one-hot encoding of the target residue (`target`) <br>
Entropy of the predicted vector (`entropy`) <br>
Cross-entropy between the predicted and target vectors (`crossentropy`) <br>
Each row of the embedding file also belongs to an interfacial residue. Besides the information mentioned above, it contains the feature vector of size 200 extracted from each cube. This file serves as input for the downstream tasks (transfer learning with frozen weights).
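
As an example of consuming the output file, the sketch below computes the residue recovery rate and the mean cross-entropy. It assumes the file is tab-separated with the 10 columns listed above, has no header row, and serializes the 20-dimensional vectors as bracketed, comma- or space-separated numbers; adjust the parsing if the actual format differs.

```python
import numpy as np
import pandas as pd

COLUMNS = ["complex", "resname", "resregion", "resnumber", "respos",
           "partner", "prediction", "target", "entropy", "crossentropy"]

# Assumption: no header row; pass header=0 instead if the file has one.
df = pd.read_csv("output_xray_wt_mask", sep="\t", names=COLUMNS, header=None)

def parse_vec(s: str) -> np.ndarray:
    """Parse a serialized 20-dim vector (bracket/delimiter style is an assumption)."""
    return np.array([float(x) for x in str(s).strip("[] ").replace(",", " ").split()])

pred = np.stack(df["prediction"].map(parse_vec).to_list())
target = np.stack(df["target"].map(parse_vec).to_list())

# Fraction of interfacial residues whose masked identity is recovered.
recovery = float((pred.argmax(axis=1) == target.argmax(axis=1)).mean())
print(f"residue recovery rate: {recovery:.3f}")
print(f"mean cross-entropy:    {df['crossentropy'].mean():.3f}")
```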
<p align="center">
......