0.6.6 generate_pairs optimized for predict + minor doc edit

parent 060842aa
@@ -9,13 +9,13 @@ Quick start
 SENSE-PPI can be used to predict pairwise physical interactions between proteins. The simplest input is single a FASTA file with protein sequences.
 The output is a .tsv file with all predictions as well as a secondary .tsv file that contains only positive interactions. By default, the predictions are made in "all vs all" manner: all possible protein pairs from the input file are considered.
-In order to copmute the predictions for all possible pairs from FASTA file, the following command can be used:
+In order to compute the predictions for all possible pairs from FASTA file, the following command can be used:
 .. code-block:: bash
     $ senseppi predict proteins.fasta
-By default, if no model is provided, the pre-trained model on human PPIs is used.
+By default, if no model is provided, the pre-trained model on human+worm+chicken+fly PPIs is used.
 List of commands
 ------------
......
-__version__ = "0.6.5"
+__version__ = "0.6.6"
 __author__ = "Konstantin Volzhenin"
 from . import model, commands, esm2_model, dataset, utils, network_utils
......
 from torch.utils.data import DataLoader
 import pytorch_lightning as pl
-from itertools import permutations, product
+from itertools import combinations
 import numpy as np
 import pandas as pd
 import pathlib
@@ -52,17 +52,16 @@ def generate_pairs(fasta_file, output_path, with_self=False):
     for record in SeqIO.parse(fasta_file, "fasta"):
         ids.append(record.id)
+    all_pairs = combinations(ids, 2)
     if with_self:
-        all_pairs = [p for p in product(ids, repeat=2)]
-    else:
-        all_pairs = [p for p in permutations(ids, 2)]
+        all_pairs = list(all_pairs)
+        for id in ids:
+            all_pairs.append((id, id))
-    pairs = []
-    for p in all_pairs:
-        if (p[1], p[0]) not in pairs and (p[0], p[1]) not in pairs:
-            pairs.append(p)
+    unique_pairs = set(all_pairs)
-    pairs = pd.DataFrame(pairs, columns=['seq1', 'seq2'])
+    pairs = pd.DataFrame(list(unique_pairs), columns=['seq1', 'seq2'])
     pairs.to_csv(output_path, sep='\t', index=False, header=False)
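
Note on the `generate_pairs` change: the old loop tested every candidate pair for membership in a growing list, so its cost grew roughly with the square of the number of pairs, whereas `combinations` yields each unordered pair once by construction. Two side effects of the new version may be worth checking: iterating a `set` of string tuples can give a different row order from one interpreter run to the next (string hashing is randomized by default), and if the FASTA file repeats an ID, both `(a, b)` and `(b, a)` can survive the `set`. The sketch below is one possible alternative, not the project's code: `unordered_pairs` is a hypothetical helper, and it assumes that repeated IDs should collapse to a single entry.

```python
from itertools import combinations, combinations_with_replacement


def unordered_pairs(ids, with_self=False):
    """Return each unordered pair of IDs exactly once, in a stable order.

    Hypothetical helper for illustration; assumes duplicate IDs
    in the input should be treated as one ID.
    """
    # dict.fromkeys drops repeated IDs while keeping first-seen order.
    unique_ids = list(dict.fromkeys(ids))
    if with_self:
        # Also yields (x, x) for every ID, without a separate loop.
        return list(combinations_with_replacement(unique_ids, 2))
    return list(combinations(unique_ids, 2))


pairs = unordered_pairs(["A", "B", "C"])
pairs_with_self = unordered_pairs(["A", "B", "C"], with_self=True)
```

The resulting list could be passed to `pd.DataFrame(..., columns=['seq1', 'seq2'])` in the same way as `unique_pairs` in the diff; because no `set` is involved, the order of rows in the output file stays the same between runs.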
......