Default model change: from senseppi.ckpt to fly_worm_human_chicken.ckpt

parent 358ad745
...@@ -127,6 +127,8 @@ dmypy.json ...@@ -127,6 +127,8 @@ dmypy.json
/esm2_embs_3B /esm2_embs_3B
*.sh *.sh
draft.py draft.py
/data/string_species/mmseqs_dbs/ /data/string_species/mmseqs_dbs_orig/
/data/human_virus/all_test_viruses.csv /data/human_virus/all_test_viruses.csv
/esm2_backup /esm2_backup
/data/string_species/mmseqs_dbs/
/data/string_species/mmseqs_dbs_fwh/
...@@ -33,15 +33,17 @@ the interactions are taken from the STRING database (based on seed proteins). ...@@ -33,15 +33,17 @@ the interactions are taken from the STRING database (based on seed proteins).
Predictions are compared with the STRING database. Optionally, the graphs can be constructed. Predictions are compared with the STRING database. Optionally, the graphs can be constructed.
- `create_dataset`: creates a dataset from the STRING database based on the taxonomic ID of the organism. - `create_dataset`: creates a dataset from the STRING database based on the taxonomic ID of the organism.
The package already comes with one pretrained version of the model, `fly_worm_human_chicken.ckpt` (a checkpoint with weights), which is used by **default** if the model path is not specified.
This model was trained on a dataset that combined PPIs from D. melanogaster, C. elegans, H. sapiens and G. gallus, and it provides the best performance compared with the other pretrained models.
The original SENSE-PPI repository contains two models (checkpoints with weights) pretrained on human PPIs: `senseppi.ckpt` and `dscript.ckpt` pretrained on SENSE-PPI and DSCRIPT human datasets respectively. The original SENSE-PPI repository also contains two human-based models pretrained on human PPIs: `senseppi.ckpt` and `dscript.ckpt` pretrained on SENSE-PPI and DSCRIPT human datasets respectively.
- `senseppi.ckpt`: Download from [here](http://gitlab.lcqb.upmc.fr/Konstvv/SENSE-PPI/raw/master/pretrained_models/senseppi.ckpt) - `senseppi.ckpt`: Download from [here](http://gitlab.lcqb.upmc.fr/Konstvv/SENSE-PPI/raw/master/pretrained_models/senseppi.ckpt)
- `dscript.ckpt` : Download from [here](http://gitlab.lcqb.upmc.fr/Konstvv/SENSE-PPI/raw/master/pretrained_models/dscript.ckpt) - `dscript.ckpt` : Download from [here](http://gitlab.lcqb.upmc.fr/Konstvv/SENSE-PPI/raw/master/pretrained_models/dscript.ckpt)
The package already comes with preinstalled model `senseppi.ckpt` that is used by default if model path is not specified. For information about the other models that can be found in the pretrained_models folder, please refer to the original article.
**N.B.**: Both pretrained models were made to work with proteins in range 50-800 amino acids. **N.B.**: All pretrained models were made to work with proteins in range 50-800 amino acids.
In order to cite the original SENSE-PPI paper, please use the following link: https://doi.org/10.1101/2023.09.19.558413 In order to cite the original SENSE-PPI paper, please use the following link: https://doi.org/10.1101/2023.09.19.558413
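The fallback described in the README text above (use the bundled checkpoint when no model path is given) can be sketched as follows. This is a minimal illustration and not the package's actual code: the names `resolve_model_path` and `DEFAULT_CKPT_NAME`, and the assumption that the checkpoint sits in a `pretrained_models` subfolder of `package_dir`, are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Default checkpoint name taken from the commit title; its location on disk
# is an assumption made for this sketch.
DEFAULT_CKPT_NAME = "fly_worm_human_chicken.ckpt"


def resolve_model_path(model_path: Optional[str], package_dir: Path) -> Path:
    """Return the user-supplied checkpoint path, or the bundled default when None."""
    if model_path is not None:
        # An explicit path always wins, e.g. a downloaded senseppi.ckpt.
        return Path(model_path)
    # Hypothetical layout: <package_dir>/pretrained_models/<default checkpoint>
    return package_dir / "pretrained_models" / DEFAULT_CKPT_NAME


# Usage: no path given, so the bundled default is selected.
default_path = resolve_model_path(None, Path("/pkg"))
# Usage: an explicit path is returned unchanged.
custom_path = resolve_model_path("senseppi.ckpt", Path("/pkg"))
```

Under this reading, users who relied on the previous default would need to pass the path to `senseppi.ckpt` explicitly.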
......
...@@ -34,9 +34,8 @@ Predict ...@@ -34,9 +34,8 @@ Predict
.. code-block:: bash .. code-block:: bash
usage: senseppi <command> [<args>] predict [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [--pairs_file PAIRS_FILE] usage: senseppi <command> [<args>] predict [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [--pairs_file PAIRS_FILE] [-o OUTPUT] [--with_self] [-p PRED_THRESHOLD]
[-o OUTPUT] [--with_self] [-p PRED_THRESHOLD] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--num_nodes NUM_NODES] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM]
[--toks_per_batch_esm TOKS_PER_BATCH_ESM]
fasta_file fasta_file
positional arguments: positional arguments:
...@@ -48,29 +47,29 @@ Predict ...@@ -48,29 +47,29 @@ Predict
--min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50) --min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50)
--max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800) --max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800)
--device {cpu,gpu,mps,auto} --device {cpu,gpu,mps,auto}
Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps is temporarily disabled, if it is chosen, cpu will
is temporarily disabled, if it is chosen, cpu will be used instead. (Default: auto) be used instead. (Default: auto)
Predict args: Predict args:
--model_path MODEL_PATH --model_path MODEL_PATH
A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled senseppi.ckpt trained version is used. (Trained on human PPIs) A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. (Trained on human PPIs) (Default: None)
(Default: None)
--pairs_file PAIRS_FILE --pairs_file PAIRS_FILE
A path to a .tsv file with pairs of proteins to test (Optional). If not provided, all-to-all pairs will be generated. (Default: None) A path to a .tsv file with pairs of proteins to test (Optional). If not provided, all-to-all pairs will be generated. (Default: None)
-o OUTPUT, --output OUTPUT -o OUTPUT, --output OUTPUT
A path to a file where the predictions will be saved. (.tsv format will be added automatically) (Default: predictions) A path to a file where the predictions will be saved. (.tsv format will be added automatically) (Default: predictions)
--with_self Include self-interactions in the predictions.By default they are not included since they were not part of training but they can be included by setting this --with_self Include self-interactions in the predictions.By default they are not included since they were not part of training but they can be included by setting this flag to True.
flag to True.
-p PRED_THRESHOLD, --pred_threshold PRED_THRESHOLD -p PRED_THRESHOLD, --pred_threshold PRED_THRESHOLD
Prediction threshold to determine interacting pairs that will be written to a separate file. Range: (0, 1). (Default: 0.5) Prediction threshold to determine interacting pairs that will be written to a separate file. Range: (0, 1). (Default: 0.5)
--num_nodes NUM_NODES
Number of nodes to use for launching on a cluster. (Default: 1)
Args_model: Args_model:
--batch_size BATCH_SIZE --batch_size BATCH_SIZE
Batch size for training/testing. (Default: 32) Batch size for training/testing. (Default: 32)
ESM2 model args: ESM2 model args:
ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in multiple runs. In order to reuse the embeddings, make
multiple runs. In order to reuse the embeddings, make sure that --output_dir_esm is set to the correct folder. sure that --output_dir_esm is set to the correct folder.
--output_dir_esm OUTPUT_DIR_ESM --output_dir_esm OUTPUT_DIR_ESM
output directory for extracted representations (Default: esm2_embs_3B) output directory for extracted representations (Default: esm2_embs_3B)
...@@ -83,8 +82,8 @@ Test ...@@ -83,8 +82,8 @@ Test
.. code-block:: bash .. code-block:: bash
usage: senseppi <command> [<args>] test [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [-o OUTPUT] usage: senseppi <command> [<args>] test [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [-o OUTPUT] [--crop_data_to_model_lims] [--num_nodes NUM_NODES]
[--crop_data_to_model_lims] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM]
pairs_file fasta_file pairs_file fasta_file
positional arguments: positional arguments:
...@@ -97,26 +96,26 @@ Test ...@@ -97,26 +96,26 @@ Test
--min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50) --min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50)
--max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800) --max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800)
--device {cpu,gpu,mps,auto} --device {cpu,gpu,mps,auto}
Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps is temporarily disabled, if it is chosen, cpu will
is temporarily disabled, if it is chosen, cpu will be used instead. (Default: auto) be used instead. (Default: auto)
Predict args: Predict args:
--model_path MODEL_PATH --model_path MODEL_PATH
A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled senseppi.ckpt trained version is used. (Trained on human PPIs) A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. (Trained on human PPIs) (Default: None)
(Default: None)
-o OUTPUT, --output OUTPUT -o OUTPUT, --output OUTPUT
A path to a file where the test metrics will be saved. (.tsv format will be added automatically) (Default: test_metrics) A path to a file where the test metrics will be saved. (.tsv format will be added automatically) (Default: test_metrics)
--crop_data_to_model_lims --crop_data_to_model_lims
If set, the data will be cropped to the limits of the model: evaluations will be done only for proteins >50aa and <800aa. WARNING: this will modify the If set, the data will be cropped to the limits of the model: evaluations will be done only for proteins >50aa and <800aa. WARNING: this will modify the original input files.
original input files. --num_nodes NUM_NODES
Number of nodes to use for launching on a cluster. (Default: 1)
Args_model: Args_model:
--batch_size BATCH_SIZE --batch_size BATCH_SIZE
Batch size for training/testing. (Default: 32) Batch size for training/testing. (Default: 32)
ESM2 model args: ESM2 model args:
ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in multiple runs. In order to reuse the embeddings, make
multiple runs. In order to reuse the embeddings, make sure that --output_dir_esm is set to the correct folder. sure that --output_dir_esm is set to the correct folder.
--output_dir_esm OUTPUT_DIR_ESM --output_dir_esm OUTPUT_DIR_ESM
output directory for extracted representations (Default: esm2_embs_3B) output directory for extracted representations (Default: esm2_embs_3B)
...@@ -135,14 +134,12 @@ A dataset for training must be provided as two separate files: ...@@ -135,14 +134,12 @@ A dataset for training must be provided as two separate files:
.. code-block:: bash .. code-block:: bash
usage: senseppi <command> [<args>] train [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--valid_size VALID_SIZE] [--seed SEED] usage: senseppi <command> [<args>] train [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--valid_size VALID_SIZE] [--seed SEED] [--num_epochs NUM_EPOCHS] [--num_nodes NUM_NODES]
[--num_epochs NUM_EPOCHS] [--num_devices NUM_DEVICES] [--num_nodes NUM_NODES] [--early_stop EARLY_STOP] [--lr LR] [--early_stop EARLY_STOP] [--lr LR] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM]
[--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM]
pairs_file fasta_file pairs_file fasta_file
positional arguments: positional arguments:
pairs_file A path to a .tsv file containing training pairs. Required format: 3 tab separated columns: first protein, second protein (protein names have to be present pairs_file A path to a .tsv file containing training pairs. Required format: 3 tab separated columns: first protein, second protein (protein names have to be present in fasta_file), label (0 or 1).
in fasta_file), label (0 or 1).
fasta_file FASTA file on which to extract the ESM2 representations and then train. fasta_file FASTA file on which to extract the ESM2 representations and then train.
options: options:
...@@ -151,8 +148,8 @@ A dataset for training must be provided as two separate files: ...@@ -151,8 +148,8 @@ A dataset for training must be provided as two separate files:
--min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50) --min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50)
--max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800) --max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800)
--device {cpu,gpu,mps,auto} --device {cpu,gpu,mps,auto}
Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps is temporarily disabled, if it is chosen, cpu will
is temporarily disabled, if it is chosen, cpu will be used instead. (Default: auto) be used instead. (Default: auto)
Training args: Training args:
Arguments for training the model. Arguments for training the model.
...@@ -162,12 +159,10 @@ A dataset for training must be provided as two separate files: ...@@ -162,12 +159,10 @@ A dataset for training must be provided as two separate files:
--seed SEED Global training seed. (Default: None) --seed SEED Global training seed. (Default: None)
--num_epochs NUM_EPOCHS --num_epochs NUM_EPOCHS
Number of training epochs. (Default: 100) Number of training epochs. (Default: 100)
--num_devices NUM_DEVICES
Number of devices to use for multi GPU training. (Default: 1)
--num_nodes NUM_NODES --num_nodes NUM_NODES
Number of nodes to use for training on a cluster. (Default: 1) Number of nodes to use for training on a cluster. (Default: 1)
--early_stop EARLY_STOP --early_stop EARLY_STOP
Number of epochs to wait before stopping the training (tracking is done with validation loss). By default, there is no early stopping. (Default: None) Number of epochs to wait before stopping the training (tracking is done with validation loss). By default, there is no early stopping. (Default: 10)
Args_model: Args_model:
--lr LR Learning rate for training. Cosine warmup will be applied. (Default: 0.0001) --lr LR Learning rate for training. Cosine warmup will be applied. (Default: 0.0001)
...@@ -175,8 +170,8 @@ A dataset for training must be provided as two separate files: ...@@ -175,8 +170,8 @@ A dataset for training must be provided as two separate files:
Batch size for training/testing. (Default: 32) Batch size for training/testing. (Default: 32)
ESM2 model args: ESM2 model args:
ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in multiple runs. In order to reuse the embeddings, make
multiple runs. In order to reuse the embeddings, make sure that --output_dir_esm is set to the correct folder. sure that --output_dir_esm is set to the correct folder.
--output_dir_esm OUTPUT_DIR_ESM --output_dir_esm OUTPUT_DIR_ESM
output directory for extracted representations (Default: esm2_embs_3B) output directory for extracted representations (Default: esm2_embs_3B)
...@@ -189,9 +184,8 @@ Predict_string ...@@ -189,9 +184,8 @@ Predict_string
.. code-block:: bash .. code-block:: bash
usage: senseppi <command> [<args>] predict_string [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [-s SPECIES] [-n NODES] usage: senseppi <command> [<args>] predict_string [-h] [-v] [--min_len MIN_LEN] [--max_len MAX_LEN] [--device {cpu,gpu,mps,auto}] [--model_path MODEL_PATH] [-s SPECIES] [-n NODES] [-r SCORE] [-p PRED_THRESHOLD] [--graphs]
[-r SCORE] [-p PRED_THRESHOLD] [--graphs] [-o OUTPUT] [--network_type {physical,functional}] [-o OUTPUT] [--network_type {physical,functional}] [--delete_proteins DELETE_PROTEINS [DELETE_PROTEINS ...]] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM]
[--delete_proteins DELETE_PROTEINS [DELETE_PROTEINS ...]] [--batch_size BATCH_SIZE] [--output_dir_esm OUTPUT_DIR_ESM]
[--toks_per_batch_esm TOKS_PER_BATCH_ESM] [--toks_per_batch_esm TOKS_PER_BATCH_ESM]
genes [genes ...] genes [genes ...]
...@@ -204,13 +198,12 @@ Predict_string ...@@ -204,13 +198,12 @@ Predict_string
--min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50) --min_len MIN_LEN Minimum length of the protein sequence. The sequences with smaller length will not be considered and will be deleted from the fasta file. (Default: 50)
--max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800) --max_len MAX_LEN Maximum length of the protein sequence. The sequences with larger length will not be considered and will be deleted from the fasta file. (Default: 800)
--device {cpu,gpu,mps,auto} --device {cpu,gpu,mps,auto}
Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps Device to use for computations. Options include: cpu, gpu, mps (for MacOS), and auto.If not selected the device is set by torch automatically. WARNING: mps is temporarily disabled, if it is chosen, cpu will
is temporarily disabled, if it is chosen, cpu will be used instead. (Default: auto) be used instead. (Default: auto)
General options: General options:
--model_path MODEL_PATH --model_path MODEL_PATH
A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled senseppi.ckpt trained version is used. (Trained on human PPIs) A path to .ckpt file that contains weights to a pretrained model. If None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. (Trained on human PPIs) (Default: None)
(Default: None)
-s SPECIES, --species SPECIES -s SPECIES, --species SPECIES
Species from STRING database. Default: H. Sapiens (Default: 9606) Species from STRING database. Default: H. Sapiens (Default: 9606)
-n NODES, --nodes NODES -n NODES, --nodes NODES
...@@ -232,8 +225,8 @@ Predict_string ...@@ -232,8 +225,8 @@ Predict_string
Batch size for training/testing. (Default: 32) Batch size for training/testing. (Default: 32)
ESM2 model args: ESM2 model args:
ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in ESM2: Extract per-token representations and model outputs for sequences in a FASTA file. The representations are saved in --output_dir_esm folder so they can be reused in multiple runs. In order to reuse the embeddings, make
multiple runs. In order to reuse the embeddings, make sure that --output_dir_esm is set to the correct folder. sure that --output_dir_esm is set to the correct folder.
--output_dir_esm OUTPUT_DIR_ESM --output_dir_esm OUTPUT_DIR_ESM
output directory for extracted representations (Default: esm2_embs_3B) output directory for extracted representations (Default: esm2_embs_3B)
...@@ -246,9 +239,8 @@ Create_dataset ...@@ -246,9 +239,8 @@ Create_dataset
.. code-block:: bash .. code-block:: bash
usage: senseppi <command> [<args>] create_dataset [-h] [--interactions INTERACTIONS] [--sequences SEQUENCES] [--not_remove_long_short_proteins] [--min_length MIN_LENGTH] usage: senseppi <command> [<args>] create_dataset [-h] [--interactions INTERACTIONS] [--sequences SEQUENCES] [--not_remove_long_short_proteins] [--min_length MIN_LENGTH] [--max_length MAX_LENGTH]
[--max_length MAX_LENGTH] [--max_positive_pairs MAX_POSITIVE_PAIRS] [--combined_score COMBINED_SCORE] [--max_positive_pairs MAX_POSITIVE_PAIRS] [--combined_score COMBINED_SCORE] [--experimental_score EXPERIMENTAL_SCORE]
[--experimental_score EXPERIMENTAL_SCORE]
species species
positional arguments: positional arguments:
...@@ -267,10 +259,9 @@ Create_dataset ...@@ -267,10 +259,9 @@ Create_dataset
--max_length MAX_LENGTH --max_length MAX_LENGTH
The maximum length of a protein to be included in the dataset. (Default: 800) The maximum length of a protein to be included in the dataset. (Default: 800)
--max_positive_pairs MAX_POSITIVE_PAIRS --max_positive_pairs MAX_POSITIVE_PAIRS
The maximum number of positive pairs to be included in the dataset. If None, all pairs are included. If specified, the pairs are selected based on the The maximum number of positive pairs to be included in the dataset. If None, all pairs are included. If specified, the pairs are selected based on the combined score in STRING. (Default: None)
combined score in STRING. (Default: None)
--combined_score COMBINED_SCORE --combined_score COMBINED_SCORE
The combined score threshold for the pairs extracted from STRING. Ranges from 0 to 1000. (Default: 500) The combined score threshold for the pairs extracted from STRING. Ranges from 0 to 1000. (Default: 500)
--experimental_score EXPERIMENTAL_SCORE --experimental_score EXPERIMENTAL_SCORE
The experimental score threshold for the pairs extracted from STRING. Ranges from 0 to 1000. Default is None, which means that the experimental score is The experimental score threshold for the pairs extracted from STRING. Ranges from 0 to 1000. Default is None, which means that the experimental score is not used. (Default: None)
not used. (Default: None)
__version__ = "0.6.1" __version__ = "0.6.2"
__author__ = "Konstantin Volzhenin" __author__ = "Konstantin Volzhenin"
from . import model, commands, esm2_model, dataset, utils, network_utils from . import model, commands, esm2_model, dataset, utils, network_utils
......
...@@ -71,7 +71,7 @@ def add_args(parser): ...@@ -71,7 +71,7 @@ def add_args(parser):
) )
predict_args.add_argument("--model_path", type=str, default=None, predict_args.add_argument("--model_path", type=str, default=None,
help="A path to .ckpt file that contains weights to a pretrained model. If " help="A path to .ckpt file that contains weights to a pretrained model. If "
"None, the preinstalled senseppi.ckpt trained version is used. " "None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. "
"(Trained on human PPIs)") "(Trained on human PPIs)")
predict_args.add_argument("--pairs_file", type=str, default=None, predict_args.add_argument("--pairs_file", type=str, default=None,
help="A path to a .tsv file with pairs of proteins to test (Optional). If not provided, " help="A path to a .tsv file with pairs of proteins to test (Optional). If not provided, "
......
...@@ -173,7 +173,7 @@ def add_args(parser): ...@@ -173,7 +173,7 @@ def add_args(parser):
"typed (separated by whitespaces).") "typed (separated by whitespaces).")
string_pred_args.add_argument("--model_path", type=str, default=None, string_pred_args.add_argument("--model_path", type=str, default=None,
help="A path to .ckpt file that contains weights to a pretrained model. If " help="A path to .ckpt file that contains weights to a pretrained model. If "
"None, the preinstalled senseppi.ckpt trained version is used. " "None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. "
"(Trained on human PPIs)") "(Trained on human PPIs)")
string_pred_args.add_argument("-s", "--species", type=int, default=9606, string_pred_args.add_argument("-s", "--species", type=int, default=9606,
help="Species from STRING database. Default: H. Sapiens") help="Species from STRING database. Default: H. Sapiens")
......
...@@ -47,7 +47,7 @@ def add_args(parser): ...@@ -47,7 +47,7 @@ def add_args(parser):
) )
test_args.add_argument("--model_path", type=str, default=None, test_args.add_argument("--model_path", type=str, default=None,
help="A path to .ckpt file that contains weights to a pretrained model. If " help="A path to .ckpt file that contains weights to a pretrained model. If "
"None, the preinstalled senseppi.ckpt trained version is used. " "None, the preinstalled fly_worm_human_chicken.ckpt trained version is used. "
"(Trained on human PPIs)") "(Trained on human PPIs)")
test_args.add_argument("-o", "--output", type=str, default="test_metrics", test_args.add_argument("-o", "--output", type=str, default="test_metrics",
help="A path to a file where the test metrics will be saved. " help="A path to a file where the test metrics will be saved. "
......
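The three hunks above change only the help string of `--model_path`; the argument itself keeps `default=None`. A small stand-alone sketch of how such an argument parses is shown below. The parser name `senseppi-sketch` and the `build_parser` helper are hypothetical, and the real CLI defines many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-alone parser that mirrors only the --model_path option.
    parser = argparse.ArgumentParser(prog="senseppi-sketch")
    parser.add_argument(
        "--model_path", type=str, default=None,
        help="A path to a .ckpt file; None selects the preinstalled default.",
    )
    return parser


parser = build_parser()
# No flag given: model_path stays None, so the preinstalled default would be used.
default_args = parser.parse_args([])
# Explicit path: for example a downloaded senseppi.ckpt to keep the earlier behaviour.
explicit_args = parser.parse_args(["--model_path", "senseppi.ckpt"])
```

Because the default value is still `None`, the switch to the new checkpoint has to happen wherever the code resolves `None` to a file, which this part of the diff does not show.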