0.5.9 minor changes

parent 973c7875
@@ -43,4 +43,6 @@ The package already comes with preinstalled model `senseppi.ckpt` that is used b
**N.B.**: Both pretrained models were made to work with proteins in range 50-800 amino acids.
In order to cite the original SENSE-PPI paper, please use the following link: https://doi.org/10.1101/2023.09.19.558413
\ No newline at end of file
The documentation for the package can be found [here](https://sense-ppi.readthedocs.io/en/latest/).
\ No newline at end of file
@@ -22,7 +22,7 @@ List of commands
 There are 5 commands available in the package:
-- `train`: trains SENSE-PPI on a given dataset
+- `train`: trains SENSE-PPI on a given dataset.
 - `test`: computes test metrics (AUROC, AUPRC, F1, MCC, Precision, Recall, Accuracy) on a given dataset
 - `predict`: predicts interactions for a given dataset
 - `predict_string`: predicts interactions for a given dataset using STRING database: the interactions are taken from the STRING database (based on seed proteins). Predictions are compared with the STRING database. Optionally, the graphs can be constructed.
@@ -127,6 +127,12 @@ Test
Train
------------
A dataset for training must be provided as two separate files:
- **pairs_file**: a .tsv file with pairs of proteins and their labels (1 for interacting, 0 for non-interacting)
- **fasta_file**: a FASTA file with protein sequences. The FASTA file is used to extract ESM2 embeddings for each protein. The embeddings are saved in a separate folder so they can be reused in multiple runs. In order to reuse the embeddings, make sure that `--output_dir_esm` is set to the correct folder.
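
The two input files described above can be sketched with a small Python script. This is only an illustration under stated assumptions, not the package's own tooling: the three-column, header-less layout of the pairs file, the protein IDs, and the sequences are all hypothetical, and the helper names (`write_fasta`, `write_pairs`, `check_dataset`) are not part of SENSE-PPI. The 50-800 length bounds are taken from the note about the pretrained models.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical toy data: IDs and sequences are made up for illustration.
sequences = {
    "protA": "M" + "ACDEFGHIKL" * 6,  # 61 residues
    "protB": "M" + "NPQRSTVWYA" * 7,  # 71 residues
    "protC": "M" + "GASTLIVKDE" * 5,  # 51 residues
}
# Assumed layout: (id1, id2, label), 1 = interacting, 0 = non-interacting.
pairs = [("protA", "protB", 1), ("protA", "protC", 0)]

# Length range stated for the pretrained models.
MIN_LEN, MAX_LEN = 50, 800


def write_fasta(path, seqs):
    """Write one '>id' header line and one sequence line per protein."""
    with open(path, "w") as fh:
        for pid, seq in seqs.items():
            fh.write(f">{pid}\n{seq}\n")


def write_pairs(path, rows):
    """Write tab-separated rows without a header (an assumption)."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)


def check_dataset(seqs, rows):
    """Return a list of human-readable problems; empty if none found."""
    problems = []
    for pid, seq in seqs.items():
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            problems.append(f"{pid}: length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
    for a, b, label in rows:
        for pid in (a, b):
            if pid not in seqs:
                problems.append(f"{pid}: missing from FASTA")
        if label not in (0, 1):
            problems.append(f"{a}/{b}: label must be 0 or 1")
    return problems


# Write both files into a throwaway directory and report any problems.
out_dir = Path(tempfile.mkdtemp())
write_fasta(out_dir / "proteins.fasta", sequences)
write_pairs(out_dir / "pairs.tsv", pairs)
print(check_dataset(sequences, pairs))
```

Checking IDs and lengths before training is a cheap way to catch pairs that reference sequences missing from the FASTA file.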
.. code-block:: bash

    usage: senseppi <command> [<args>] train [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--valid_size VALID_SIZE] [--seed SEED]
......
-__version__ = "0.5.8"
+__version__ = "0.5.9"
 __author__ = "Konstantin Volzhenin"
 from . import model, commands, esm2_model, dataset, utils, network_utils
......