Commit f41e2cf7 by Yasser Mohseni

Update

name: null
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - bokeh=1.4
  - cmake=3.16         # ensures that Gloo library extensions will be built
  - cudnn=7.6
  - cupti=10.1
  - cxx-compiler=1.0   # ensures C and C++ compilers are available
  - jupyterlab=1.2
  - mpi4py=3.0         # installs CUDA-aware OpenMPI
  - nccl=2.5
  - nodejs=13
  - nvcc_linux-64=10.1 # configures the environment to be "CUDA-aware"
  - pip=20.0
  - pip:
    - mxnet-cu101mkl==1.6.* # MXNet is installed prior to Horovod
    - -r file:requirements.txt
  - python=3.7
  - pytorch=1.4
  - tensorboard=2.2
  - tensorflow-gpu=2.2
  - torchvision=0.5
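Since the environment file sets `name: null`, a name has to be supplied on the command line when creating it. A minimal usage sketch (the environment name `horovod-env` and the filename `environment.yml` are assumptions, not from the commit):

```shell
# Create and activate the environment; -n is required because name is null.
conda env create -n horovod-env -f environment.yml
conda activate horovod-env
```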
This source diff could not be displayed because it is too large. You can view the blob instead.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Jan 2 20:21:06 2022

@author: yasser
"""

import logging
import os
import sys
import glob
import shutil
import time
import collections
import traceback
import subprocess
from os import path, mkdir, getenv, listdir, remove, system, stat
from math import exp
from subprocess import CalledProcessError, check_call, call
from random import shuffle, random, seed, sample

import pandas as pd
import numpy as np
from numpy import newaxis, asarray
import matplotlib.pyplot as plt
import seaborn as sns
from prody import *
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

import scr
import load_data as load
import generate_cubes_reduce_channels_multiproc as reduce_channels

logging.basicConfig(filename='manager.log', filemode='w',
                    format='%(levelname)s: %(message)s', level=logging.DEBUG)
mainlog = logging.getLogger('main')

sys.path.insert(1, '../lib/')
import tools as tl
import test as tst

comp_dir = 'conformations_directory'
target_comp = listdir(comp_dir)
bin_path = "./maps_generator"
v_dim = 24
def mapcomplex(file, pose_class, ch1, ch2, pair, pose):
    name = pair + '_' + str(pose)
    try:
        rec = parsePDB(file).select('protein').select('chain ' + ch1)
        rec.setChids('R')
        lig = parsePDB(file).select('protein').select('chain ' + ch2)
        lig.setChids('L')
        writePDB(name+'_r.pdb', rec.toAtomGroup())
        writePDB(name+'_l.pdb', lig.toAtomGroup())
        writePDB(name+'_complex.pdb', rec.toAtomGroup() + lig.toAtomGroup())
        scr.get_scr(name+'_r.pdb', name+'_l.pdb', name+'_complex.pdb', name)
        rimcoresup = pd.read_csv(name+'_rimcoresup.csv', header=None, sep=' ')
        rec_regions = rimcoresup.loc[rimcoresup[4] == 'receptor']
        rec_regions = pd.Series(rec_regions[5].values, index=rec_regions[2]).to_dict()
        lig_regions = rimcoresup.loc[rimcoresup[4] == 'ligand']
        lig_regions = pd.Series(lig_regions[5].values, index=lig_regions[2]).to_dict()
        res_num2name_map_rec = dict(zip(rec.getResnums(), rec.getResnames()))
        res_num2name_map_lig = dict(zip(lig.getResnums(), lig.getResnames()))
        res_num2coord_map_rec = dict(zip(rec.select('ca').getResnums(), rec.select('ca').getCoords()))
        res_num2coord_map_lig = dict(zip(lig.select('ca').getResnums(), lig.select('ca').getCoords()))
        L1 = list(set(rec.getResnums()))
        res_ind_map_rec = dict((x, inx) for inx, x in enumerate(sorted(L1)))
        L1 = list(set(lig.getResnums()))
        res_ind_map_lig = dict((x, inx + len(res_ind_map_rec)) for inx, x in enumerate(sorted(L1)))
        res_inter_rec = [(res_ind_map_rec[x], rec_regions[x], x, 'R', res_num2name_map_rec[x], res_num2coord_map_rec[x])
                         for x in sorted(rec_regions.keys()) if x in res_ind_map_rec]
        res_inter_lig = [(res_ind_map_lig[x], lig_regions[x], x, 'L', res_num2name_map_lig[x], res_num2coord_map_lig[x])
                         for x in sorted(lig_regions.keys()) if x in res_ind_map_lig]
        # Receptor-side annotations only; the ligand list is kept for the interface-size check below.
        reg_type = [x[1] for x in res_inter_rec]
        res_name = [[x[4]] for x in res_inter_rec]
        res_pos = [x[5] for x in res_inter_rec]
        # TODO: merge these two files.
        with open('resinfo', 'w') as fh_res:
            for x in res_inter_rec:
                fh_res.write(str(x[2]) + ';' + x[3] + '\n')
        with open('scrinfo', 'w') as fh_csr:
            for x in res_inter_rec:
                fh_csr.write(str(x[2]) + ';' + x[3] + ';' + x[1] + '\n')
        if len(res_inter_rec) < 5 or len(res_inter_lig) < 5:
            raise Exception('There is no interface!')
        mapcommand = [bin_path, "--mode", "map", "-i", name+'_complex.pdb', "--native",
                      "-m", str(v_dim), "-t", "167", "-v", "0.8", "-o", name+'_complex.bin']
        call(mapcommand)
        dataset_train = load.read_data_set(name+'_complex.bin')
        print(dataset_train.maps.shape)
        data_norm = dataset_train.maps
        X = np.reshape(data_norm, (-1, v_dim, v_dim, v_dim, 173))
        # Reduce the 173 input channels to the smaller channel set used by the model.
        X = reduce_channels.process_map(X)
        if X is None:
            remove(name+'_complex.bin')
            raise Exception('Dimensionality reduction failed!')
        y = [int(pose_class)] * len(res_inter_rec)
        _obj = (X, y, reg_type, res_pos, res_name, res_inter_rec)
        remove(name+'_complex.bin')
    except Exception as e:
        logging.info("Bad interface!" + '\nError message: ' + str(e) +
                     "\nMore information:\n" + traceback.format_exc())
        _obj = None
    finally:
        # Clean up intermediates; some may not exist if an early step failed.
        for f in (name+'_r.pdb', name+'_l.pdb', name+'_complex.pdb', name+'_rimcoresup.csv'):
            if path.isfile(f):
                remove(f)
    return _obj
def process_targetcomplex(targetcomplex, comp_dir, report_dict):
    try:
        logging.info('Processing target ' + targetcomplex + ' ...')
        predictions_file = open('predictions_SCR_' + targetcomplex, 'w')
        fold = 'test'
        predictions_file.write('Conf' + '\t' +
                               'Fold' + '\t' +
                               'Scores' + '\t' +
                               'Regions' + '\t' +
                               'Score' + '\t' +
                               'Time' + '\t' +
                               'Class' + '\t' +
                               'RecLig' + '\t' +
                               'ResNumber' + '\n')
        good_poses = [path.basename(x) for x in glob.glob(path.join(comp_dir, targetcomplex, '*'))]
        print(good_poses)
        for pose in good_poses:
            logging.info('Processing conformation ' + pose + ' ...')
            file = path.join(comp_dir, targetcomplex, pose)
            all_chain_ids = set(parsePDB(file).select('protein').getChids())
            for ch1 in all_chain_ids:
                test_interface = path.basename(pose).replace('.pdb', '') + '_' + ch1
                try:
                    logging.info('Processing interface ' + test_interface + ' ...')
                    all_chain_ids_tmp = all_chain_ids.copy()
                    all_chain_ids_tmp.remove(ch1)
                    ch2 = ' '.join(all_chain_ids_tmp)
                    _obj = mapcomplex(file, '1', ch1, ch2, targetcomplex, test_interface)
                    if _obj is None:
                        raise Exception('No map was generated!')
                    X_test, y_test, reg_type, res_pos, _, info = _obj
                    all_scores, start, end = tst.predict(test_interface, X_test, y_test, reg_type, res_pos, info)
                    if all_scores is None:
                        raise Exception('Prediction failed!')
                    test_preds = all_scores.mean()
                    predictions_file.write(test_interface + '\t' +
                                           str(fold) + '\t' +
                                           ','.join(str(x[0]) for x in all_scores) + '\t' +
                                           ','.join(reg_type) + '\t' +
                                           str(test_preds) + '\t' +
                                           str(end - start) + '\t' +
                                           str(y_test[0]) + '\t' +
                                           ','.join(x[3] for x in info) + '\t' +
                                           ','.join(str(x[2]) for x in info) + '\n')
                except Exception:
                    predictions_file.write(test_interface + '\t' +
                                           str(fold) + '\t' +
                                           '0' + '\t' +
                                           'NA' + '\t' +
                                           '0' + '\t' +
                                           'NA' + '\t' +
                                           'NA' + '\t' +
                                           'NA' + '\t' +
                                           'NA' + '\n')
        predictions_file.close()
    except Exception as e:
        logging.info("Bad target complex!" + '\nError message: ' + str(e) +
                     "\nMore information:\n" + traceback.format_exc())
def manage_pair_files(use_multiprocessing):
    tc_cases = [(tc, comp_dir) for tc in target_comp]
    report_dict = tl.do_processing(tc_cases, process_targetcomplex, use_multiprocessing)
    return report_dict

report_dict = manage_pair_files(False)
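The per-interface loop above pairs each chain against the union of all remaining chains. A minimal standalone sketch of that enumeration (plain Python; the chain set stands in for a parsed PDB structure, and `enumerate_interfaces` is a hypothetical helper, not part of the commit):

```python
def enumerate_interfaces(chain_ids):
    """For each chain ch1, build the ProDy-style selection string for all
    other chains (ch2), mirroring the ch1/ch2 loop in process_targetcomplex."""
    pairs = []
    for ch1 in sorted(chain_ids):
        rest = set(chain_ids) - {ch1}
        ch2 = ' '.join(sorted(rest))  # e.g. used as select('chain B C')
        pairs.append((ch1, ch2))
    return pairs

print(enumerate_interfaces({'A', 'B', 'C'}))
# [('A', 'B C'), ('B', 'A C'), ('C', 'A B')]
```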
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 28 16:00:07 2022
@author: mohseni
"""
import numpy as np
channels = {'ALA':['C','N','O','CA','CB'],
'ARG':['C','N','O','CA','CB','CG','CD','NE','CZ','NH1','NH2'],
'ASN':['C','N','O','CA','CB','CG','ND2','OD1'],
'ASP':['C','N','O','CA','CB','CG','OD1','OD2'],
'CYS':['C','N','O','CA','CB','SG'],
'GLN':['C','N','O','CA','CB','CG','CD','NE2','OE1'],
'GLU':['C','N','O','CA','CB','CG','CD','OE1','OE2'],
'GLY':['C','N','O','CA'],
'HIS':['C','N','O','CA','CB','CG','CD2','ND1','CE1','NE2'],
'ILE':['C','N','O','CA','CB','CG1','CG2','CD1'],
'LEU':['C','N','O','CA','CB','CG','CD1','CD2'],
'LYS':['C','N','O','CA','CB','CG','CD','CE','NZ'],
'MET':['C','N','O','CA','CB','CG','SD','CE'],
'PHE':['C','N','O','CA','CB','CG','CD1','CD2','CE1','CE2','CZ'],
'PRO':['C','N','O','CA','CB','CG','CD'],
'SER':['C','N','O','CA','CB','OG'],
'THR':['C','N','O','CA','CB','CG2','OG1'],
'TRP':['C','N','O','CA','CB','CG','CD1','CD2','CE2','CE3','NE1','CZ2','CZ3','CH2'],
'TYR':['C','N','O','CA','CB','CG','CD1','CD2','CE1','CE2','CZ','OH'],
'VAL':['C','N','O','CA','CB','CG1','CG2']}
v_dim = 24
n_channels = 4 + 4 + 2  # 10 output channels: 4 element channels + 6 extra feature channels

all_channels = []
for aa, a_vector in channels.items():
    all_channels += a_vector

C_index, O_index, N_index, S_index = [], [], [], []
for i, a in enumerate(all_channels):
    if a[0] == "C":
        C_index.append(i)
    elif a[0] == "O":
        O_index.append(i)
    elif a[0] == "N":
        N_index.append(i)
    elif a[0] == "S":
        S_index.append(i)

def process_map(X):
    try:
        # Collapse the per-atom-type channels into 4 element channels (C, N, O, S).
        X_new = np.zeros(X.shape[:-1] + (n_channels,))
        X_new[:, :, :, :, 0] = X[:, :, :, :, C_index].sum(axis=4)
        X_new[:, :, :, :, 1] = X[:, :, :, :, N_index].sum(axis=4)
        X_new[:, :, :, :, 2] = X[:, :, :, :, O_index].sum(axis=4)
        X_new[:, :, :, :, 3] = X[:, :, :, :, S_index].sum(axis=4)
        # Copy the 6 extra feature channels (inputs 167-172) unchanged.
        for i in range(6):
            X_new[:, :, :, :, i + 4] = X[:, :, :, :, 167 + i]
    except Exception:
        return None
    return X_new
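`process_map` reduces channels by index-list summation: all input channels whose atom name starts with the same element letter are summed into one output channel. A toy sketch of the same technique on a small array (the 6-channel label list and shapes are illustrative, not the real 167-channel layout):

```python
import numpy as np

# Toy layout: 6 input channels whose element is the first letter of the label.
labels = ['C', 'CA', 'N', 'O', 'CB', 'S']
C_idx = [i for i, a in enumerate(labels) if a[0] == 'C']  # [0, 1, 4]
N_idx = [i for i, a in enumerate(labels) if a[0] == 'N']  # [2]

X = np.ones((2, 3, 3, 3, 6))               # batch of 2 tiny 3x3x3 maps
X_new = np.zeros(X.shape[:-1] + (2,))
X_new[..., 0] = X[..., C_idx].sum(axis=4)  # three carbon-like channels -> one
X_new[..., 1] = X[..., N_idx].sum(axis=4)  # one nitrogen channel -> one

print(X_new[0, 0, 0, 0])  # [3. 1.]
```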
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gzip
import numpy
from six.moves import xrange
#from tensorflow.contrib.learn.python.learn.datasets import base
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import random_seed
import tensorflow as tf
def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32)
    return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]

def check_dims(f, gridSize, nbDim):
    print('Check dimensions ', f.name, flush=True)
    with f as bytestream:
        headerSize = _read32(bytestream)
        magic = _read32(bytestream)
        if magic != 7919:
            raise ValueError('Invalid magic number %d in maps file: %s' %
                             (magic, f.name))
        rows = _read32(bytestream)
        cols = _read32(bytestream)
        lays = _read32(bytestream)
        assert rows == gridSize
        assert cols == gridSize
        assert lays == gridSize
        chan = _read32(bytestream)
        assert chan == nbDim
def extract_maps(f):
#print('Extracting', f.name, flush = True)
with f as bytestream:
headerSize = _read32(bytestream)
magic = _read32(bytestream)
if magic != 7919:
raise ValueError('Invalid magic number %d in maps file: %s' %
(magic, f.name))
rows = _read32(bytestream)
#print("rows "+str(rows))
cols = _read32(bytestream)
#print("cols "+str(cols))
lays = _read32(bytestream)
#print("lays "+str(lays))
chan = _read32(bytestream)
#print("chan "+str(chan))
metaSize = _read32(bytestream)
#print("metaSize "+str(metaSize))
num_maps = _read32(bytestream)
#print("num_maps "+str(num_maps))
header_end = bytestream.read(headerSize - 4*8)
        if num_maps <= 0:
            return None, None
        size = int(rows) * int(cols) * int(lays) * int(chan) * int(num_maps)
        size += int(metaSize) * int(num_maps)
        try:
            buf = bytestream.read(size)
        except OverflowError:
            return None, None
data = numpy.frombuffer(buf, dtype=numpy.uint8)
data = data.reshape(num_maps, -1)
meta = numpy.ascontiguousarray(data[:, -int(metaSize):]).view(dtype=numpy.int32)
ss_dict = {0: -1,
66:0,#B
98:0,#b
67:1,#C
69:2,#E
71:3,#G
72:4,#H
73:5,#I
84:6,#T
}
#meta[:,3] = [ss_dict[x] for x in meta[:,3]] #Y commented!
res_dict = {0:-1,
65:0, #A
67:1, #C
68:2, #D
69:3, #E
70:4, #F
71:5, #G
72:6, #H
73:7, #I
75:8, #K
76:9, #L
77:10,#M
78:11,#N
80:12,#P
81:13,#Q
82:14,#R
83:15,#S
84:16,#T
86:17,#V
87:18,#W
89:19 #Y
}
#meta[:,1] = [res_dict[x] for x in meta[:,1]]
#print(meta[:,3])
#print(meta[:,2])
data = data[:,:-int(metaSize)]
return data , meta
class DataSet(object):
def __init__(self,
maps,
meta,
dtype=dtypes.float32,
seed=None,
prop = 1,
shuffle = False):
        # `prop` is the fraction of maps from the data that are put in the dataset,
        # useful to make the dataset lighter; shuffling then helps to take different residues each time
seed1, seed2 = random_seed.get_seed(seed)
numpy.random.seed(seed1 if seed is None else seed2)
dtype = dtypes.as_dtype(dtype).base_dtype
if dtype not in (dtypes.uint8, dtypes.float32, dtypes.float16):
raise TypeError('Invalid map dtype %r, expected uint8 or float32 or float16' %
dtype)
if dtype == dtypes.float32:
maps = maps.astype(numpy.float32)
numpy.multiply(maps, 1.0 / 255.0, out = maps)
if dtype == dtypes.float16:
maps = maps.astype(numpy.float16)
numpy.multiply(maps, 1.0 / 255.0, out = maps)
if shuffle:
perm0 = numpy.arange(maps.shape[0])[:int(maps.shape[0]*prop)]
self._maps = maps[perm0]
self._meta = meta[perm0]
else:
self._maps = maps
self._meta = meta
self._epochs_completed = 0
self._index_in_epoch = 0
self._num_res = self._maps.shape[0]
@property
def maps(self):
return self._maps
@property
def meta(self):
return self._meta
@property
def num_res(self):
return self._num_res
@property
def epochs_completed(self):
return self._epochs_completed
def next_batch(self, batch_size, shuffle=True, select_residue = -1):
"""Return the next `batch_size` examples from this data set."""
# Select residue is not used anymore, just kept for compatibility purposes
start = self._index_in_epoch
# Shuffle for the first epoch
if self._epochs_completed == 0 and start == 0 and shuffle:
perm0 = numpy.arange(self._num_res)
numpy.random.shuffle(perm0)
self._maps = self.maps[perm0]
self._meta = self._meta[perm0] # Go to the next epoch
if start + batch_size > self._num_res:
# Finished epoch
self._epochs_completed += 1
# Get the rest examples in this epoch
rest_num_examples = self._num_res - start
maps_rest_part = self._maps[start:self._num_res]
meta_rest_part = self._meta[start:self._num_res]
# Shuffle the data
if shuffle:
perm = numpy.arange(self._num_res)
numpy.random.shuffle(perm)
self._maps = self.maps[perm]
self._meta = self.meta[perm]
# Start next epoch
start = 0
self._index_in_epoch = batch_size - rest_num_examples
end = self._index_in_epoch
maps_new_part = self._maps[start:end]
meta_new_part = self._meta[start:end]
return numpy.concatenate((maps_rest_part, maps_new_part), axis=0) , numpy.concatenate((meta_rest_part, meta_new_part), axis=0)
else:
self._index_in_epoch += batch_size
end = self._index_in_epoch
return self._maps[start:end], self._meta[start:end]
def append(self, dataSet_):
self._maps = numpy.concatenate((self._maps, dataSet_._maps))
self._meta = numpy.concatenate((self._meta, dataSet_._meta))
self._num_res += dataSet_._num_res
def is_res(self, index, res_code):
if index < self._num_res :
if self._meta[index, 1] == res_code:
return True
else:
print('index = num_res')
return False
def find_next_res(self, index, res_code):
i = index + 1
while (not self.is_res(i, res_code)) and i < self._num_res - 1:
i += 1
if self.is_res(i, res_code):
return i
return -1
def read_data_set(filename,
dtype=dtypes.float32,
seed=None,
shuffle = False,
prop = 1):
local_file = filename
try :
with open(local_file, 'rb') as f:
train_maps,train_meta = extract_maps(f)
if train_maps is None :
return None
train = DataSet(
train_maps, train_meta, dtype=dtype, seed=seed, shuffle = shuffle, prop = prop)
return train
except ValueError :
return None
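The epoch-wraparound logic of `next_batch` can be sketched in isolation (numpy only; shuffling is omitted for clarity, and the `state` dict is a stand-in for the `DataSet` attributes):

```python
import numpy as np

def next_batch(state, batch_size):
    # When a batch crosses the end of an epoch, the tail of the current epoch
    # is concatenated with the head of the next one, as in DataSet.next_batch.
    start = state['index']
    n = len(state['data'])
    if start + batch_size > n:
        rest = state['data'][start:n]            # remaining examples of this epoch
        state['epochs'] += 1
        state['index'] = batch_size - len(rest)  # how far into the next epoch we go
        return np.concatenate((rest, state['data'][:state['index']]))
    state['index'] += batch_size
    return state['data'][start:state['index']]

state = {'data': np.arange(10), 'index': 0, 'epochs': 0}
batches = [next_batch(state, 4) for _ in range(3)]  # the third batch wraps around
```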
Conf Fold Scores Regions Score Time Class RecLig ResNumber
af2-multimer_H1106_3_A test 0.15815966,0.47225899,0.6619729,0.34243613,0.69934464,0.24911189,0.73987263,0.77989554,0.28382453,0.27332252,0.34490728,0.54095775,0.1938091,0.76082975,0.16924942,0.45580986,0.4401235,0.8006071,0.7609061,0.37165338,0.44692183,0.63350624,0.28104702,0.3782434,0.400609,0.0855076,0.20164119,0.2395203,0.24166702,0.3560921,0.32557496,0.19873475,0.16239041,0.4464147,0.38357538,0.3407107,0.40222043,0.4166043,0.45678082,0.46500087,0.22739935,0.45479804,0.2691929,0.55450726 R,R,C,R,R,C,R,R,S,R,C,C,C,C,C,R,C,R,R,R,C,R,R,C,R,R,C,S,R,C,R,S,R,S,R,R,C,R,C,C,S,R,C,R 0.40608442 1.346400260925293 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 40,41,44,45,47,48,49,52,53,54,56,57,58,60,61,62,63,64,65,66,67,68,69,70,75,76,79,80,82,83,86,87,89,90,92,94,95,97,98,101,102,104,105,108
af2-multimer_H1106_3_B test 0.29245356,0.31496918,0.5700532,0.16482458,0.93374944,0.9095075,0.83483255,0.95948464,0.95645744,0.95275676,0.96102035,0.95811564,0.880402,0.91624343,0.37646586,0.81785786,0.3388625,0.7688463,0.7242456,0.92971134,0.8913298,0.9183213,0.23050837,0.8228417,0.60270244,0.46086937,0.92439276,0.96556073,0.65267944,0.9152282,0.91309404,0.75065255,0.14330877,0.5740294,0.6536723,0.66610533,0.9210067,0.8153911,0.88763857,0.7396247,0.9259354,0.77128166,0.58458096,0.7891749,0.19399244,0.7670014,0.88130814,0.5977824 R,C,R,S,R,R,C,C,C,S,S,C,S,C,R,S,S,R,R,R,C,C,C,R,R,S,C,R,S,S,C,S,R,S,R,C,S,C,S,S,R,C,R,S,R,C,S,R 0.7191849 0.2427997589111328 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 1,3,4,5,6,8,9,11,12,13,14,15,16,18,19,20,21,22,28,31,32,34,35,38,39,43,45,48,49,51,52,55,56,70,77,79,80,82,83,85,86,89,94,97,98,101,104,108
af2-multimer_H1106_5_A test 0.1535242,0.5072091,0.5815127,0.23126394,0.6665723,0.27197725,0.6296713,0.7249014,0.32946756,0.4129447,0.29652178,0.56584823,0.19834426,0.763362,0.17817967,0.46319604,0.35225883,0.74155164,0.74608254,0.48741284,0.5438555,0.63963795,0.18552417,0.36671534,0.37741935,0.08951344,0.20488404,0.25155216,0.23253827,0.3768496,0.34520078,0.21791631,0.1644054,0.45157048,0.41221198,0.35784718,0.40234298,0.4164185,0.4913468,0.56175137,0.42087308,0.33733043 R,R,C,R,R,C,R,R,S,R,C,C,C,C,C,R,C,R,R,R,C,R,R,C,R,R,C,S,R,C,R,S,R,S,R,R,C,R,C,C,S,C 0.40832162 0.2245469093322754 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 40,41,44,45,47,48,49,52,53,54,56,57,58,60,61,62,63,64,65,66,67,68,69,70,75,76,79,80,82,83,86,87,89,90,92,94,95,97,98,101,102,105
af2-multimer_H1106_5_B test 0.2845522,0.28952688,0.58401823,0.16019438,0.93168783,0.9033336,0.85128963,0.96483827,0.95379055,0.95735383,0.95858216,0.9621105,0.887648,0.92338556,0.37638,0.8623992,0.34670344,0.66470647,0.73436296,0.93652886,0.89856243,0.91020644,0.22425595,0.85650045,0.61829376,0.43329453,0.9268552,0.9686239,0.6302407,0.91817105,0.9113155,0.75911564,0.14656074,0.5926981,0.6545509,0.67587113,0.9215565,0.83415955,0.8903751,0.75511605,0.9161205,0.795166,0.6508358,0.7971177,0.23682547,0.78460145,0.8857023,0.5881279 R,C,R,S,R,R,C,C,C,S,S,C,S,C,R,S,S,R,R,R,C,C,C,R,R,S,C,R,S,S,C,S,R,S,R,C,S,C,S,S,R,C,R,S,R,C,S,R 0.7232128 0.07693862915039062 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 1,3,4,5,6,8,9,11,12,13,14,15,16,18,19,20,21,22,28,31,32,34,35,38,39,43,45,48,49,51,52,55,56,70,77,79,80,82,83,85,86,89,94,97,98,101,104,108
af2-multimer_H1106_4_A test 0.8067786,0.45138788,0.13705601,0.7735964,0.53844553,0.72906744,0.36825904,0.20052111,0.4701209,0.3550733,0.15219027,0.7076484,0.0751357,0.50883377,0.5018989,0.47432303,0.5885639,0.4810102,0.31858745,0.24639626,0.3831833,0.13858977,0.26066616,0.26630467,0.24647567,0.34337282,0.34375226,0.16631483,0.13334109,0.34581298,0.360383,0.20901813,0.3975431,0.39339894,0.38641167,0.4313885,0.49785134,0.20950921,0.4230762,0.25472656,0.532815 C,R,C,R,R,R,C,R,R,C,C,C,C,R,C,R,R,C,R,C,R,R,C,S,R,C,R,S,R,S,R,R,R,C,R,C,C,S,R,C,R 0.38070318 0.14757657051086426 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 44,47,48,49,50,52,53,54,56,57,58,60,61,62,63,64,65,67,69,70,75,76,79,80,82,83,86,87,89,90,91,92,94,95,97,98,101,102,104,105,108
af2-multimer_H1106_4_B test 0.3365898,0.27210575,0.77000296,0.3530504,0.94553643,0.87286806,0.96772087,0.8330127,0.96294844,0.9728122,0.9553707,0.9491401,0.96168256,0.87980545,0.88912904,0.580102,0.69417626,0.36220327,0.72357064,0.8239404,0.966692,0.916691,0.92081153,0.5448731,0.8554709,0.52489036,0.62139326,0.9025005,0.94557935,0.7456535,0.9352551,0.9091163,0.73634225,0.22296005,0.54542553,0.58422774,0.8180086,0.89293087,0.8579953,0.9188472,0.84124666,0.90867436,0.7563263,0.727114,0.6801329,0.6971116,0.8711505,0.88466114,0.62878865 R,C,R,S,R,R,R,C,C,C,S,S,C,C,C,R,C,S,R,R,R,C,C,C,R,C,S,C,R,S,S,C,S,R,S,R,C,S,C,S,S,R,R,R,S,R,C,S,C 0.7646253 0.17351818084716797 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 1,3,4,5,6,7,8,9,11,12,13,14,15,16,18,19,20,21,22,28,31,32,34,35,38,39,43,45,48,49,51,52,55,56,70,77,79,80,82,83,85,86,89,94,97,98,101,104,108
af2-multimer_H1106_2_A test 0.38571474,0.8393652,0.42178312,0.18430029,0.65094656,0.52959704,0.79263,0.4498829,0.22420268,0.39447647,0.27274665,0.2453598,0.7016983,0.09784882,0.58379835,0.48960775,0.6230186,0.44802487,0.39284456,0.290107,0.28845924,0.40413377,0.1191006,0.2510668,0.31669876,0.23985124,0.34209105,0.26694033,0.13827279,0.12490044,0.37881115,0.2986529,0.26234666,0.31188908,0.4103466,0.4791775,0.4681471,0.53507596,0.25409606,0.39149398,0.34874997,0.5470431 R,C,R,C,R,R,R,C,R,C,C,C,C,C,R,C,R,R,C,R,C,R,R,C,S,R,C,R,S,R,S,R,R,R,C,R,C,C,S,R,C,R 0.38560236 0.07428789138793945 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 40,44,47,48,49,50,52,53,54,56,57,58,60,61,62,63,64,65,67,69,70,75,76,79,80,82,83,86,87,89,90,91,92,94,95,97,98,101,102,104,105,108
af2-multimer_H1106_2_B test 0.34864926,0.25960857,0.7665469,0.27272004,0.932127,0.9570522,0.85634017,0.96698385,0.96644896,0.96215534,0.956522,0.90253586,0.8914846,0.88125277,0.5140031,0.33262587,0.73752636,0.64393693,0.9018432,0.92390424,0.9257663,0.38943195,0.8332341,0.6617192,0.5597714,0.91380864,0.98174876,0.7843204,0.9358032,0.89938056,0.7790568,0.18099883,0.48241115,0.6848434,0.82328063,0.95724666,0.85457045,0.9136446,0.89612585,0.9216434,0.82345366,0.8527153,0.7544685,0.7761986,0.88444173,0.87768805,0.6190072 R,C,R,S,R,R,C,C,C,S,S,C,C,C,R,S,R,R,R,C,C,C,R,C,S,C,R,S,S,C,S,R,S,R,C,S,S,S,S,C,R,R,S,R,C,S,C 0.76470315 0.15354561805725098 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 1,3,4,5,6,8,9,11,12,13,14,15,16,18,19,21,22,28,31,32,34,35,38,39,43,45,48,49,51,52,55,56,70,77,79,80,82,83,85,86,89,94,97,98,101,104,108
af2-multimer_H1106_1_A test 0.14543971,0.82321316,0.387358,0.22056516,0.2531447,0.7989886,0.41904253,0.27782968,0.3916736,0.30695355,0.12868752,0.6648477,0.095708795,0.4230846,0.47683775,0.6134431,0.48064056,0.45141146,0.22497803,0.25045058,0.37017694,0.1490017,0.25143927,0.2586556,0.34447503,0.33515,0.23281255,0.113449395,0.16047075,0.35865575,0.35195217,0.33793807,0.34422982,0.46787623,0.5168594,0.5925903,0.29133162,0.36407736,0.52939856 R,C,R,C,R,R,C,R,R,C,C,C,C,R,C,R,R,C,R,C,R,R,C,S,R,C,R,S,R,S,R,R,C,R,C,C,S,C,R 0.36422667 0.13948369026184082 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 40,44,47,48,49,52,53,54,56,57,58,60,61,62,63,64,65,67,69,70,75,76,79,80,82,83,86,87,89,90,92,94,95,97,98,101,102,105,108
af2-multimer_H1106_1_B test 0.35119322,0.31339464,0.7666052,0.2775791,0.9238077,0.9521823,0.74951583,0.963935,0.9722168,0.962476,0.9557974,0.9590967,0.9026333,0.8840854,0.6870988,0.6787404,0.33046275,0.7446913,0.75817823,0.9529541,0.92939585,0.92616695,0.34919074,0.83341724,0.7091033,0.5493,0.9023294,0.9711945,0.74714535,0.9165675,0.91290784,0.7045464,0.22045785,0.42127126,0.75389665,0.82415116,0.9109828,0.8528461,0.9065334,0.7669932,0.9079912,0.76540995,0.7906235,0.6722181,0.47325116,0.55407834,0.9152896,0.64560837 R,C,R,S,R,R,C,C,C,S,S,C,C,C,R,C,S,R,R,R,C,C,C,R,C,S,C,R,S,S,C,S,R,S,R,C,S,C,S,S,R,R,R,S,R,C,S,C 0.74832326 0.0740807056427002 1 R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R,R 1,3,4,5,6,8,9,11,12,13,14,15,16,18,19,20,21,22,28,31,32,34,35,38,39,43,45,48,49,51,52,55,56,70,77,79,80,82,83,85,86,89,94,97,98,101,104,108
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Feb 4 23:49:27 2021
@author: yasser
"""
import re
import os
import sys
from os import path, remove
import glob
sys.path.insert(1, '../lib/')
import tools as tl
def rimcoresup(rsa_rec,rsa_lig,rsa_complex):
    '''INPUT: .rsa files from NACCESS.
    ###rASAm: relative ASA in the monomer
    ###rASAc: relative ASA in the complex
    ###Levy model
    ###deltarASA = rASAm - rASAc
    ###RIM     deltarASA > 0 and rASAc >= 25 and rASAm >= 25 ###corrected from rASAc > 25 and rASAm > 25
    ###CORE    deltarASA > 0 and rASAm >= 25 and rASAc < 25  ###corrected from rASAm > 25 and rASAc <= 25
    ###SUPPORT deltarASA > 0 and rASAc < 25 and rASAm < 25
    OUTPUT: rim, core, support'''
ASA1=[]
resNUMasa1=0
lines = [line.rstrip('\n') for line in open(rsa_rec)]
for lineee in lines:
a=re.split(' ',lineee)
a=list(filter(None, a))
if a[0] == 'RES' and len(a) == 14:
resNUMasa1=(resNUMasa1)+1
restype=a[1]
chain=a[2]
resnumb=a[3]
resnumb=re.findall('\d+', resnumb)
resnumb=resnumb[0]
rASAm=a[5]
ASA1.append((restype,chain,int(resnumb),rASAm,'receptor'))
elif a[0] == 'RES' and len(a) == 13:
resNUMasa1=(resNUMasa1)+1
restype=a[1]
testchain=re.findall('\d+|\D+', a[2])
if len(testchain) == 1:
chain=''
resnumb=int(testchain[0])
elif len(testchain) == 2:
primoterm=testchain[0]
if primoterm.isdigit():
chain=''
resnumb=int(testchain[0])
else:
chain=testchain[0]
resnumb=int(testchain[1])
#resnumb=re.findall('\d+', resnumb)
#resnumb=resnumb[0]
rASAm=a[4]
ASA1.append((restype,chain,int(resnumb),rASAm,'receptor'))
ASA2=[]
resNUMasa2=0
lines = [line.rstrip('\n') for line in open(rsa_lig)]
for lineee in lines:
a=re.split(' ',lineee)
a=list(filter(None, a))
if a[0] == 'RES' and len(a) == 14:
resNUMasa2=(resNUMasa2)+1
restype=a[1]
chain=a[2]
resnumb=a[3]
resnumb=re.findall('\d+', resnumb)
resnumb=resnumb[0]
rASAm=a[5]
ASA2.append((restype,chain,int(resnumb),rASAm,'ligand'))
elif a[0] == 'RES' and len(a) == 13:
resNUMasa2=(resNUMasa2)+1
restype=a[1]
testchain=re.findall('\d+|\D+', a[2])
if len(testchain) == 1:
chain=''
resnumb=int(testchain[0])
elif len(testchain) == 2:
primoterm=testchain[0]
if primoterm.isdigit():
chain=''
resnumb=int(testchain[0])
else:
chain=testchain[0]
resnumb=int(testchain[1])
#resnumb=re.findall('\d+', resnumb)
#resnumb=resnumb[0]
rASAm=a[4]
ASA2.append((restype,chain,int(resnumb),rASAm,'ligand'))
ASAfull=[]
resNUMasafull=0
lines = [line.rstrip('\n') for line in open(rsa_complex)]
for lineee in lines:
a=re.split(' ',lineee)
a=list(filter(None, a))
if a[0] == 'RES' and len(a) == 14:
resNUMasafull=resNUMasafull+1
restype=a[1]
chain=a[2]
resnumb=a[3]
resnumb=re.findall('\d+', resnumb)
resnumb=resnumb[0]
rASAm=a[5]
if resNUMasafull <= len(ASA1):
filename='receptor'
else:
filename='ligand'
ASAfull.append((restype,chain,int(resnumb),rASAm,filename))
elif a[0] == 'RES' and len(a) == 13:
resNUMasafull=resNUMasafull+1
restype=a[1]
testchain=re.findall('\d+|\D+', a[2])
if len(testchain) == 1:
chain=''
resnumb=int(testchain[0])
elif len(testchain) == 2:
primoterm=testchain[0]
if primoterm.isdigit():
chain=''
resnumb=int(testchain[0])
else:
chain=testchain[0]
resnumb=int(testchain[1])
#resnumb=re.findall('\d+', resnumb)
#resnumb=resnumb[0]
rASAm=a[4]
if resNUMasafull <= len(ASA1):
filename='receptor'
else:
filename='ligand'
ASAfull.append((restype,chain,int(resnumb),rASAm,filename))
rim=[]
core=[]
support=[]
for elements in ASAfull:
for x in ASA1:
if elements[0:3] == x[0:3] and elements[4] == x[4]:
rASAm=float(x[3])
rASAc=float(elements[3])
deltarASA=rASAm-rASAc
if deltarASA > 0:
if rASAm < 25:# and rASAc < 25:
support.append(x)
elif rASAm > 25:
if rASAc <= 25:
core.append(x)
else:
rim.append(x)
for x in ASA2:
if elements[0:3] == x[0:3] and elements[4] == x[4]:
rASAm=float(x[3])
rASAc=float(elements[3])
deltarASA=rASAm-rASAc
if deltarASA > 0:
if rASAm < 25:# and rASAc < 25:
support.append(x)
elif rASAm > 25:
if rASAc <= 25:
core.append(x)
else:
rim.append(x)
return rim, core, support
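The per-residue rule applied in the two loops above can be written as one function (a sketch mirroring the thresholds exactly, including the unclassified `rASAm == 25` edge case):

```python
def classify_scr(rASAm, rASAc):
    # Levy model: a residue is interfacial when its relative ASA drops
    # upon complex formation (deltarASA > 0).
    if rASAm - rASAc <= 0:
        return None
    if rASAm < 25:
        return 'S'                          # support: buried already in the monomer
    elif rASAm > 25:
        return 'C' if rASAc <= 25 else 'R'  # core buried in complex; rim stays exposed
    return None                             # rASAm == 25 exactly: left unclassified, as above
```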
def get_scr(rec, lig, com, name):
if tl.USE_FREESASA:
cmdcompl=tl.FreeSASA_PATH + ' --format=rsa ' + com + ' > ' + path.basename(com.replace('pdb', 'rsa'))
os.system(cmdcompl)
cmdrec=tl.FreeSASA_PATH + ' --format=rsa ' + rec + ' > ' + path.basename(rec.replace('pdb', 'rsa'))
os.system(cmdrec)
cmdlig=tl.FreeSASA_PATH + ' --format=rsa ' + lig + ' > ' + path.basename(lig.replace('pdb', 'rsa'))
os.system(cmdlig)
else:
cmdcompl=tl.NACCESS_PATH + ' ' + com
os.system(cmdcompl)
cmdrec=tl.NACCESS_PATH + ' ' + rec
os.system(cmdrec)
cmdlig=tl.NACCESS_PATH + ' ' + lig
os.system(cmdlig)
# ('GLN', 'B', '44', '55.7', 'receptor')
rim,core,support = rimcoresup(path.basename(rec.replace('pdb', 'rsa')), path.basename(lig.replace('pdb', 'rsa')), path.basename(com.replace('pdb', 'rsa')))
outprimcoresup = open(name+'_rimcoresup.csv', 'w')
for elementrim in rim:
outprimcoresup.write(str((' '.join(map(str,elementrim)))+" R")+"\n") #Rim
for elementcore in core:
outprimcoresup.write(str((' '.join(map(str,elementcore)))+" C")+"\n") #Core
for elementsup in support:
outprimcoresup.write(str((' '.join(map(str,elementsup)))+" S")+"\n") #Support
outprimcoresup.close()
for f in glob.glob("*.rsa"):
try:
remove(f)
        except OSError:  # file already removed or not accessible
            continue
for f in glob.glob("*.asa"):
try:
remove(f)
        except OSError:  # file already removed or not accessible
            continue
"""
remove(path.basename(rec.replace('pdb', 'rsa')))
remove(path.basename(lig.replace('pdb', 'rsa')))
remove(path.basename(com.replace('pdb', 'rsa')))
remove(path.basename(rec.replace('pdb', 'asa')))
remove(path.basename(lig.replace('pdb', 'asa')))
remove(path.basename(com.replace('pdb', 'asa')))
"""
try:
remove(path.basename(rec.replace('pdb', 'log')))
remove(path.basename(lig.replace('pdb', 'log')))
remove(path.basename(com.replace('pdb', 'log')))
    except OSError:  # .log files only exist when NACCESS was used
        pass
########################
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 13 21:09:56 2020
@author: yasser
"""
import logging
import os
import sys
import gc
from os import path, mkdir, getenv, listdir, remove, system, stat
import pandas as pd
import numpy as np
import glob
import seaborn as sns
from math import exp
from subprocess import CalledProcessError, check_call
import traceback
from random import shuffle, random, seed, sample
from numpy import newaxis
import matplotlib.pyplot as plt
import time
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
import tensorflow.keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Dot
from tensorflow.keras.backend import ones, ones_like
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score, roc_auc_score, roc_curve, precision_recall_curve
from sklearn.preprocessing import MinMaxScaler
sys.path.insert(1, '../lib/')
import tools as tl
print('Your python version: {}'.format(sys.version_info.major))
USE_TENSORFLOW_AS_BACKEND = True
if USE_TENSORFLOW_AS_BACKEND:
os.environ['KERAS_BACKEND'] = 'tensorflow'
else:
os.environ['KERAS_BACKEND'] = 'theano'
if tl.FORCE_CPU:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""
if USE_TENSORFLOW_AS_BACKEND == True:
import tensorflow as tf
print('Your tensorflow version: {}'.format(tf.__version__))
if not tl.FORCE_CPU:
print("GPU : "+tf.test.gpu_device_name())
physical_devices = tf.config.experimental.list_physical_devices('GPU')
#tf.config.experimental.set_memory_growth(physical_devices[0], True)
else:
import theano
print('Your theano version: {}'.format(theano.__version__))
seed(int(np.round(np.random.random()*10)))
#################################################################################################
v_dim = 24
#model = load_model(path.join('../Models', 'Dockground', '0_model'))
model = load_model(path.join('../Models', 'ALL_20_model'))
encoder = OneHotEncoder(sparse=False, handle_unknown='ignore')
onehot = encoder.fit(np.asarray([['S'], ['C'], ['R']]))
def predict(test_interface, X_test, y_test, reg_type, res_pos, info):
try:
print('Prediction for ' + test_interface)
X_aux = encoder.transform(list(map(lambda x: [x], reg_type)))
if len(X_test) == 0 or len(X_aux) != len(X_test):
#raise Exception("Not compatible features!")
return None, None, None
start = time.time()
all_scores = model.predict([X_test, X_aux], batch_size=X_test.shape[0])
end = time.time()
_ = gc.collect()
except Exception as e:
#logging.info("Bad target complex!" + '\nError message: ' + str(e) +
# "\nMore information:\n" + traceback.format_exc())
return None, None, None
return all_scores, start, end
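The auxiliary input built by the encoder above amounts to one one-hot vector per region label; a numpy-only sketch (assuming scikit-learn's sorted category order C, R, S for an encoder fit on `[['S'], ['C'], ['R']]`):

```python
import numpy as np

# Assumed sorted category order of the fitted OneHotEncoder.
categories = ['C', 'R', 'S']

def one_hot_regions(labels):
    # One row per residue, one column per region category.
    return np.array([[1.0 if cat == lab else 0.0 for cat in categories]
                     for lab in labels])

X_aux = one_hot_regions(['R', 'C', 'S'])
```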
MIT License
Copyright (c) 2022 Yasser Mohseni
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
### Contents
- [Overview](#overview)
- [Requirements](#Requirements)
- [Tutorial](#Tutorial)
- [License](./LICENSE)
## Citation:
```
@article {Mohseni Behbahani2022.04.05.487134,
author = {Mohseni Behbahani, Yasser and Crouzet, Simon and Laine, {\'E}lodie and Carbone, Alessandra},
title = {Deep Local Analysis evaluates protein docking conformations with locally oriented cubes},
elocation-id = {2022.04.05.487134},
year = {2022},
doi = {10.1101/2022.04.05.487134},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/04/06/2022.04.05.487134},
eprint = {https://www.biorxiv.org/content/early/2022/04/06/2022.04.05.487134.full.pdf},
journal = {bioRxiv}
}
```
## Overview
![](Images/method5.svg.png?raw=true "DLA-Ranker")
Deep Local Analysis (DLA)-Ranker is a deep learning framework applying 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of
the interfacial residues along with their neighboring atoms and the regions of the interface with different solvent accessibility. DLA-Ranker identifies near-native conformations and discovers alternative interfaces from ensembles generated by molecular docking.
#### Features:
- Useful APIs for fast preprocessing of large ensembles of complex conformations and for classifying them based on CAPRI criteria.
- Representation of an interface as a set of locally oriented cubes.
- *Atomic density map as a 3D grid.*
- *Structure class based on solvent accessibility (Support, Core, Rim).*
- *Information on Receptor and Ligand.*
- *Information on interfacial residues.*
- Classification of docking conformations based on CAPRI criteria (Incorrect, Acceptable, Medium, High quality)
- Fast generation of cubes and evaluation of interfaces.
- Training and testing 3D-CNN models.
- Support for various per-residue score aggregation schemes.
- *Considering only a subset of cubes for the evaluation of an interface.*
- *Residues from Support or Core regions.*
- *Residues from Core or Rim regions.*
- *Selecting residues exclusively from the receptor or from the ligand.*
- Extraction of embeddings and the topology of the interface for graph representation learning.
## Requirements
#### Packages:
DLA-Ranker runs on Linux, macOS, and Windows. We recommend running DLA-Ranker on machines with a GPU. It requires the following packages:
- [FreeSASA](https://github.com/mittinatten/freesasa) or [NACCESS](http://www.bioinf.manchester.ac.uk/naccess/)
- [ProDy](http://prody.csb.pitt.edu/)
- Python version 3.7 or 3.8.
- Tensorflow version 2.2 or 2.3.
- Cuda-Toolkit
- Scikit-Learn, numpy, pandas, matplotlib, lz4, and tqdm (`conda install -c pytorch -c pyg -c conda-forge python=3.9 numpy pandas matplotlib tqdm pytorch pyg scikit-learn cuda-toolkit`).

All-in-one: run `conda create --name dla-ranker --file dla-ranker.yml`
- For the requirements of InteractionGNN, please see its README.
## Tutorial
Place the ensemble of conformations in a directory (*e.g. 'conformations_directory'*) like below:
```
Evaluation
|___conformations_directory
|
|___target complex 1
| | Conformation 1
| | Conformation 2
| | ...
|
|___target complex 2
| | Conformation 1
| | Conformation 2
| | ...
|
..........
```
Specify the path to FreeSASA or NACCESS in ```lib/tools.py``` (```FREESASA_PATH``` or ```NACCESS_PATH```). The choice between FreeSASA or NACCESS can be specified in ```lib/tools.py``` (default is ```USE_FREESASA = True```). <br>
<br>
If you have an Nvidia GPU on your computer, or execute on Google Colab, set ```FORCE_CPU = False``` in ```lib/tools.py```. Otherwise set ```FORCE_CPU = True``` (default is ```FORCE_CPU = True```). <br>
Run `evaluation.py` from the `Evaluation` directory. It processes all the target complexes and their conformations and produces a CSV file `predictions_SCR` for each target complex. Each row of the output file corresponds to one conformation and has 9 tab-separated columns:
Name of target complex and the conformation (`Conf`) <br>
Fold Id (`Fold`) <br>
Score of each residue (`Scores`) <br>
Region (SCR) of each residue (`Regions`) <br>
Global averaged score of the interface (`Score`) <br>
Processing time (`Time`) <br>
Class of the conformation (`Class`, 0:incorrect, 1: near-native) <br>
Partner (`RecLig`) <br>
Residue number (`ResNumber`; according to PDB) <br>
One can combine the residue numbers, regions, scores, and partner information to evaluate the interface on a subset of interfacial residues.
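For instance, a region-restricted score can be computed from the `Scores` and `Regions` columns (a sketch with made-up values; averaging over Core residues only):

```python
# Made-up per-residue scores and their S/C/R regions, as parsed from one row
# of predictions_SCR (both columns are comma-separated in the file).
scores = [0.16, 0.47, 0.66, 0.34]
regions = ['R', 'C', 'C', 'S']

core_scores = [s for s, r in zip(scores, regions) if r == 'C']
core_average = sum(core_scores) / len(core_scores)
```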
#### Acknowledgement
We would like to thank Dr. Sergei Grudinin and his team for helping us with the initial source code of ```maps_generator``` and ```load_data.py```. See [Ornate](https://academic.oup.com/bioinformatics/article/35/18/3313/5341430?login=true).
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Feb 7 16:33:47 2020
@author: mohseni
"""
import logging
import numpy as np
import pickle
import shutil
import pypdb
import pandas as pd
from prody import *
from os import path, mkdir, remove, getenv, listdir, system
from io import StringIO
import urllib
import re
import glob
from subprocess import CalledProcessError, check_call
import traceback
import sys
import gzip
#========================================================
FORCE_CPU = True # IF YOU *DO* HAVE AN Nvidia GPU on your computer, or execute on Google COLAB, then change this option to False!
USE_FREESASA = True
NACCESS_PATH='naccess'
FreeSASA_PATH='freesasa'
#========================================================
def save_obj(obj, name):
with open(name + '.pkl', 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def load_obj(name):
with open(name + '.pkl', 'rb') as f:
return pickle.load(f)
def do_processing(cases, function, use_multiprocessing):
if use_multiprocessing:
import multiprocessing
max_cpus = 30
manager = multiprocessing.Manager()
report_dict = manager.dict()
pool = multiprocessing.Pool(processes = min(max_cpus, multiprocessing.cpu_count()))
else:
report_dict = dict()
for args in cases:
args += (report_dict,)
if use_multiprocessing:
pool.apply_async(function, args = args)
else:
function(*args)
if use_multiprocessing:
pool.close()
pool.join()
return dict(report_dict)
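A minimal usage sketch of the `do_processing` calling convention (sequential path shown; `square` and its cases are hypothetical): each case is a tuple of positional arguments, and the shared report dict is appended as the last argument so workers can record their results.

```python
def square(x, report_dict):
    # Workers receive the shared report dict as their last argument.
    report_dict[x] = x * x

report = {}
for args in [(2,), (3,)]:
    square(*(args + (report,)))
```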