Commit d1948723 by DLA

Updates

parent 39265aff
......@@ -4,11 +4,11 @@
# elodie.laine@sorbonne-universite.fr, alessandra.carbone@lip6.fr
# UMR 7238 Biologie Computationnelle et Quantitative
#
#Deep Local Analysis (DLA)-Mutation is a deep learning framework applying 3D convolutions
#to a set of locally oriented cubes representing the protein interface. It explicitly considers
#the local geometry of the interfacial residues along with their neighboring atoms and the regions of
#the interface with different solvent accessibility. DLA-Ranker identifies near-native conformations
#and discovers alternative interfaces from ensembles generated by molecular docking.
# Deep Local Analysis (DLA)-Mutation contrasts the patterns observed in two small cubes encapsulating
# the physico-chemical and geometrical environments around the wild-type and the mutant amino acids.
# The underlying self-supervised model (ssDLA) takes advantage of a large-scale exploration of non-redundant
# available experimental protein complex structures in the Protein Data Bank (PDB) to learn the fundamental
# properties of protein-protein interfaces.
#
# This software is governed by the CeCILL license under French law and abiding
# by the rules of distribution of free software. You can use, modify and/or
......
# DLA-Mutation
\ No newline at end of file
### Contents
- [Overview](#overview)
- [Requirements](#requirements)
- [Tutorial](#tutorial)
- [License](./LICENSE)
## Overview
![](Images/method_mutation.svg.png?raw=true "DLA")
Deep Local Analysis (DLA)-Mutation contrasts the patterns observed in two small cubes encapsulating the physico-chemical and geometrical environments around the wild-type and the mutant amino acids. The underlying self-supervised model (ssDLA) takes advantage of a large-scale exploration of available non-redundant experimental protein complex structures in the Protein Data Bank (PDB) to learn the fundamental properties of protein-protein interfaces. Using evolutionary constraints and conformational heterogeneity further improves the performance of DLA-Mutation.
#### Features:
- Useful APIs for fast estimation of changes in binding affinity upon single-point mutations, based on a local comparison of the atomic patterns found in pairs of cubes around a wild-type residue and its mutant. Beyond predicting the effects of mutations, DLA is a general framework for transferring the knowledge gained from the available non-redundant set of protein complex structures to various tasks. For instance, given a single partially masked cube, it recovers the identity and physico-chemical class of the central residue; given an ensemble of cubes representing an interface, it predicts the function of the complex.
- Prediction of changes in binding affinity based on a Siamese architecture.
- Transfer of knowledge of protein-protein interfaces.
- Use of structural and evolutionary information.
- Fast generation of cubes and evaluation of interfaces.
- Training and testing 3D-CNN models.
## Requirements
#### Packages:
DLA-Mutation can be run on Linux, macOS, and Windows. We recommend running it on a machine with a GPU. It requires the following packages:
- [FreeSASA](https://github.com/mittinatten/freesasa) or [NACCESS](http://www.bioinf.manchester.ac.uk/naccess/)
- [ProDy](http://prody.csb.pitt.edu/)
- lz4 compression tool
- Python version 3.7 or 3.8.
- Tensorflow version 2.2 or 2.3.
- Cuda-Toolkit
- Scikit-Learn, NumPy, pandas, matplotlib, lz4, and tqdm (`conda install -c pytorch -c pyg -c conda-forge python=3.9 numpy pandas matplotlib tqdm pytorch pyg scikit-learn cuda-toolkit lz4`).
All-in-one: run `conda env create -f dla.yml`.
## self-supervised Deep Local Analysis (ssDLA)
ssDLA works in two steps:
- Generating a set of masked, locally oriented cubes representing the interface.
- Running the deep learning framework to:
- *Train: creating a new ssDLA model.*
- *Test: Evaluating pre-trained ssDLA model to predict the type of amino acid from the masked cube.*
- *Encode: Extracting embeddings from the interfacial residues for downstream tasks.*
### Generating masked locally oriented cubes
#### Dataset of conformations:
Place the ensemble of conformations in a directory (*e.g. 'Examples/conformations_directory'*) like below:
```
Example
|___conformations_list.txt
|
|___conformations_directory
|
|___target complex 1
| | Conformation 1
| | Conformation 2
| | ...
|
|___target complex 2
| | Conformation 1
| | Conformation 2
| | ...
|
..........
```
'conformations_list.txt' is a CSV file with five ';'-separated columns: the name of the target complex (`Comp`); the receptor chain ID(s) (`ch1`); the ligand chain ID(s) (`ch2`); the name of the conformation file (`Conf`); and the class of the conformation (`Class`, 0: incorrect, 1: near-native).
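For illustration, a minimal 'conformations_list.txt' could look like this (the header row and all complex/file names below are hypothetical placeholders):
```
Comp;ch1;ch2;Conf;Class
complex1;A;B;conformation_1.pdb;1
complex1;A;B;conformation_2.pdb;0
```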
#### Processing the conformations
Specify the path to FreeSASA or NACCESS in ```lib/tools.py``` (```FREESASA_PATH``` or ```NACCESS_PATH```). The choice between FreeSASA and NACCESS is also made in ```lib/tools.py``` (the default is ```USE_FREESASA = True```). <br>
<br>
If you have an Nvidia GPU on your computer, or you run on Google Colab, keep ```FORCE_CPU = False``` in ```lib/tools.py```. Otherwise set ```FORCE_CPU = True``` (the default is ```FORCE_CPU = False```). <br>
<br>
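A minimal sketch of the relevant settings in ```lib/tools.py```, mirroring the variables defined at the top of that file (the values shown are examples; point the paths to your own installation):
```
FORCE_CPU = False          # True to force CPU execution
USE_FREESASA = True        # False to use NACCESS instead
NACCESS_PATH = 'naccess'   # path to the NACCESS executable
FreeSASA_PATH = 'freesasa' # path to the FreeSASA executable
```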
From directory 'Representation' run: ```python generate_cubes.py```
The output will be a directory 'map_dir' with the following structure:
```
Example
|___map_dir
|___target complex 1
| |___0
| | | conformation 1
| | | conformation 2
| |
| |___1
| | conformation 3
| | conformation 4
|
|___target complex 2
| |___0
| | | conformation 1
| | | conformation 2
| |
| |___1
| | conformation 3
| | conformation 4
..........
```
Each output represents the interface of a conformation and contains a set of local environments (*e.g. atomic density maps, structural classes (S, C, R), topology of the interface, ...*).
An atomic density map is a 4-dimensional tensor: a voxelized 3D grid of size ```24*24*24```. Each voxel encodes some characteristics of the protein atoms. Namely, the first 167 dimensions correspond to the atom types that can be found in amino acids (excluding hydrogen). This dimension can be reduced to the 4 element symbols (C, N, O, S) by running ```python generate_cubes_reduce_channels_multiproc.py``` (ATTENTION: this code overwrites the existing files). Dimension reduction must be applied in order to use the BM5 models as well as the general model.
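As a rough sketch of what this channel reduction does (this is not the repository script itself; the `atom_to_element` mapping below is a hypothetical lookup from the 167 atom types to their elements):
```
import numpy as np

def reduce_channels(density_map, atom_to_element):
    """Collapse the 167 atom-type channels into 4 element channels (C, N, O, S).

    density_map: array of shape (24, 24, 24, 167)
    atom_to_element: length-167 array with values in {0: C, 1: N, 2: O, 3: S}
    """
    reduced = np.zeros(density_map.shape[:3] + (4,), dtype=density_map.dtype)
    for atom_idx, elem_idx in enumerate(atom_to_element):
        reduced[..., elem_idx] += density_map[..., atom_idx]
    return reduced
```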
### Deep learning framework
The following commands use the trained models found in the directory 'Models'. This directory includes 3 sets of models:
'BM5': 10 models generated following a 10-fold cross-validation procedure on the 142 dimers of the Docking Benchmark version 5. The docking conformations were generated by HADDOCK. See [DeepRank](https://www.nature.com/articles/s41467-021-27396-0).<br>
'Dockground': 4 models generated following a 4-fold cross-validation procedure on the 59 target complexes of the Dockground database. The docking conformations were generated by GRAMM-X. See [GNN-Dove](https://www.frontiersin.org/articles/10.3389/fmolb.2021.647915/full).<br>
'CCD4PPI': 5 models generated following a 5-fold cross-validation procedure on the 400 target complexes. The conformations were generated by MAXDo.<br>
For detailed information, please read the article.
#### Evaluation of interfaces
From directory 'Test' run ```python test.py```
It processes all the target complexes and their conformations and produces the file 'predictions_SCR'. Each row of the output file belongs to a conformation and has 9 tab-separated columns:
Name of target complex and the conformation (`Conf`) <br>
Fold Id (`Fold`) <br>
Score of each residue (`Scores`) <br>
Region (SCR) of each residue (`Regions`) <br>
Global averaged score of the interface (`Score`) <br>
Processing time (`Time`) <br>
Class of the conformation (`Class`, 0:incorrect, 1: near-native) <br>
Partner (`RecLig`) <br>
Residue number (`ResNumber`; according to PDB) <br>
One can combine the residue numbers, regions, scores, and partners to evaluate the interface on a subset of interfacial residues, as sketched below.
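For example, a small post-processing sketch (it assumes the per-residue fields `Scores` and `Regions` are stored as comma-separated strings; adapt the parsing to the actual file contents):
```
import pandas as pd

df = pd.read_csv('predictions_SCR', sep='\t')
row = df.iloc[0]
scores = [float(s) for s in str(row['Scores']).split(',')]
regions = str(row['Regions']).split(',')
# Average score restricted to the core (COR) residues of this conformation
core = [s for s, r in zip(scores, regions) if r == 'COR']
print(row['Conf'], sum(core) / len(core) if core else float('nan'))
```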
#### Extraction of the embeddings
From directory 'Test' run ```python extract_embeddings.py```
It extracts the embeddings and the topology of the given interfaces and writes them to the directory 'Examples/intermediate'. For each conformation it produces an output file with the same name. Each row in a file belongs to a residue and includes its coordinates, its region, and its embedding vector. These files can be used for aggregation of embeddings based on graph learning.
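As an illustration of downstream use only, assuming each output file can be parsed row by row into coordinates, a region label, and an embedding vector (the actual on-disk format may differ; the file name below is hypothetical):
```
import numpy as np

residues = []
with open('Examples/intermediate/conformation_1') as f:
    for line in f:
        fields = line.split()
        coords = np.array(fields[:3], dtype=float)     # x, y, z
        region = fields[3]                             # e.g. SUP, COR, RIM
        embedding = np.array(fields[4:], dtype=float)  # embedding vector
        residues.append((coords, region, embedding))
# e.g. average embedding over the core residues
core_embeddings = [e for _, r, e in residues if r == 'COR']
if core_embeddings:
    core_avg = np.mean(core_embeddings, axis=0)
```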
#### Acknowledgement
We would like to thank Dr. Sergei Grudinin and his team for helping us with the initial source code of ```maps_generator``` and ```load_data.py```. See [Ornate](https://academic.oup.com/bioinformatics/article/35/18/3313/5341430?login=true).
......@@ -25,7 +25,7 @@ from random import shuffle, random, seed, sample
from numpy import newaxis
import matplotlib.pyplot as plt
import time
import deepScoring.load_data as load
import load_data as load
import collections
import scr
......@@ -43,7 +43,7 @@ map_dir_mut = 'map_dir_mut'
map_dir_mut_sep = 'map_dir_mut_sep'
inter_dir_mut = 'inter_dir_mut'
bin_path = "./mapsGenerator/build/maps_generator"
bin_path = "./maps_generator"
v_dim = 24
......@@ -262,4 +262,4 @@ def manage_mut_files(use_multiprocessing):
return report_dict
report_dict = manage_mut_files(False)
\ No newline at end of file
report_dict = manage_mut_files(False)
......@@ -7,7 +7,6 @@ import numpy as np
from prody import *
import glob
import shutil
#import matplotlib.pyplot as plt
import seaborn as sns
from math import exp
from subprocess import CalledProcessError, check_call, call
......@@ -23,7 +22,7 @@ from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
import subprocess
from subprocess import CalledProcessError, check_call
import deepScoring.load_data as load
import load_data as load
from sklearn.preprocessing import MinMaxScaler
sys.path.insert(1, '../lib/')
......@@ -37,7 +36,7 @@ if not path.exists(inter_dir_pdb):
mkdir(inter_dir_pdb)
bin_path = "./mapsGenerator_mask/build/maps_generator"
bin_path = "./maps_generator_masked_randomcenter_sphere_5A"
v_dim = 24
......@@ -169,7 +168,6 @@ def mapcomplex(comp_inter, name, map_path):
if not res_inter_rec or not res_inter_lig:
return [],[],[]
#tl.coarse_grain_pdb('train.pdb')
mapcommand = [bin_path, "--mode", "map", "-i", name+'_train.pdb', "--native", "-m", str(v_dim), "-t", "167", "-v", "0.8", "-o", name+'_train.bin']
call(mapcommand)
......@@ -180,7 +178,6 @@ def mapcomplex(comp_inter, name, map_path):
#print(list(dataset_train.meta[:,0]))
#print(res_name)
#print(list(map(lambda x: x[2],res_inter_rec)) + list(map(lambda x: x[2],res_inter_lig)))
#ddddd
print(dataset_train.maps.shape)
......@@ -289,7 +286,6 @@ def process_com(com_path_in, com, report_dict):
comp_list.append(sam_tup)
except Exception as e:
#dddd
logging.error("Bad interface!" + '\ninter_rec: ' + interface_file_rec + '\ninter_lig: ' + interface_file_lig + '\nError message: ' + str(e) +
"\nMore information:\n" + traceback.format_exc())
report_dict[path.basename(com_path_in)] = comp_list
......@@ -304,7 +300,6 @@ def manage_pdb_files(use_multiprocessing):
com_cases.append((com_path_in, com))
shuffle(com_cases)
#com_cases=com_cases[50000:]
report_dict = tl.do_processing(com_cases, process_com, use_multiprocessing)
return report_dict
......
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 28 14:42:12 2021
@author: yasser
"""
import glob
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, AveragePooling3D, Layer, BatchNormalization, Add, Lambda, Dense, Flatten, Concatenate
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam
from subprocess import CalledProcessError, check_call
from random import shuffle, random, seed, sample
import matplotlib.pyplot as plt
from os import path, remove, system
import pickle, sys
from scipy import stats
import shutil
import gc
from sklearn.preprocessing import OneHotEncoder
from random import shuffle, random, seed, sample
seed(int(np.round(np.random.random()*10)))
def save_obj(obj, name):
with open(name + '.pkl', 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def load_obj(name):
with open(name + '.pkl', 'rb') as f:
return pickle.load(f)
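# Read one pickled example: the wild-type and mutant cubes (keeping the first
# 167 atom-type channels), the target value y, and the SCR region label.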
def load_map(filename):
#system('copy ' + filename + ' file.pkl.lz4')
#shutil.copyfile(filename, 'file.pkl.lz4')
"""
check_call(
[
'lz4_win64_v1_9_3\lz4.exe', '-d', '-f',
'file.pkl.lz4'
],
stdout=sys.stdout)
"""
#X_wt, X_mut, y, scr, _, _ = load_obj('file')
X_wt, X_mut, y, scr, _, _ = load_obj(filename.replace('.pkl',''))
X_wt = X_wt[:,:,:,:,:167]
X_mut = X_mut[:,:,:,:,:167]
#remove('file.pkl')
#remove('file.pkl.lz4')
return X_wt, X_mut, y, scr
path_training = '../finetune_mapping_scr_gemme_jet/'
gemme_jet_dict = load_obj(path.join(path_training, '../comp_mut_map_gemme_jet'))
backrub_models = list(map(lambda x: x.strip(), open('../split/test_interfaces.txt', 'r').readlines()))
backrub_models_test = []
for br_model in backrub_models:
comp = path.basename(br_model)[:4]
muta = path.basename(br_model).split('--')[1]
comp_clustid = 'NA'
backrub_models_test.append((br_model, comp, comp_clustid, muta))
encoder = OneHotEncoder(sparse=False, handle_unknown='ignore')
onehot = encoder.fit(np.asarray([['SUP'], ['COR'], ['RIM'], ['SUR'], ['INT']]))
model_s = load_model(path.join(path_training,'model_s_50'))
test_preds = []
for backrub_model in backrub_models_test:
br_model = backrub_model[0]
comp = backrub_model[1]
muta = backrub_model[3]
comp_clustid = backrub_model[2]
try:
X_wt, X_mut, y, scr = load_map(br_model)
gemme_value, pc_value, tr_value, freq_value, trace_value = gemme_jet_dict[path.basename(path.dirname(br_model))]
except:
continue
if scr not in ['SUP', 'COR', 'RIM', 'SUR', 'INT']:
continue
scr = list(encoder.transform(np.array([scr]).reshape(-1,1)).reshape(-1))
aux_feat = np.array(scr + [gemme_value, pc_value, tr_value, freq_value, trace_value])
#aux_feat = np.array(scr)
print(aux_feat.shape)
aux_feat = np.array([aux_feat])
print(aux_feat.shape)
print('X_wt', X_wt.shape)
test_preds.append((model_s.predict([X_wt, X_mut, aux_feat])[0][0], y, comp, comp_clustid, br_model))
save_obj(test_preds, 'test_preds')
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 3 23:39:44 2022
@author: awadmin
"""
import pandas as pd
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
import pickle
list_aa = ['CYS', 'ASP', 'SER', 'GLN', 'LYS', 'ILE', 'PRO',
'THR', 'PHE', 'ASN', 'GLY', 'HIS', 'LEU', 'ARG',
'TRP', 'ALA', 'VAL', 'GLU', 'TYR', 'MET']
list_aa.sort()
def save_obj(obj, name):
with open(name + '.pkl', 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def load_obj(name):
with open(name + '.pkl', 'rb') as f:
return pickle.load(f)
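# Derive balanced class weights for the 20 amino-acid classes from the one-hot targets recorded in a training log.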
a=pd.read_csv('env_mask5_50epoches_completedatabase_nonorm_good/train_log_0_1',sep='\t')
y=list(map(lambda x: list(map(lambda y: float(y), x.split(','))), a.target.to_list()))
y_integers = np.argmax(y, axis=1)
class_weights = compute_class_weight('balanced', np.unique(y_integers), y_integers)
d_class_weights = dict(enumerate(class_weights))
print(list(zip(list_aa, d_class_weights.values())))
save_obj(d_class_weights, 'class_weights')
......@@ -221,14 +221,9 @@ fhandler_train = open('train_log', 'w')
fhandler_test = open('test_log', 'w')
#List of pair cluster ids for specific complexes (such as Covid RBD and extreme cases of SKEMPI) to be excluded from the training!
# The cluster id of Covid RBD is excluded from the training!
######################################################################
list_clustid = [(178, 440), #2PCC_A_B
(2992, 3676),#3SCJ_A_E
(2889, 264), #1EAW_A_B
(392, 1596), #1BRS_A_D
(3746, 8), #1S1Q_A_B
(2640, 13008)] #1IAR_A_B
list_clustid = [(2992, 3676)] #3SCJ_A_E
def load_chains_cluster(file_pdb_cluster):
"""Reads all PDB chains in clusters.
......@@ -472,4 +467,4 @@ for foldk in range(5):
np.save('epoch_log_val_ll_'+str(foldk)+'_'+str(epoch), np.array(epoch_log_val_ll))
np.save('epoch_log_test_ll_'+str(foldk)+'_'+str(epoch), np.array(epoch_log_test_ll))
fhandler_train_e.close()
fhandler_test_e.close()
\ No newline at end of file
fhandler_test_e.close()
......@@ -221,14 +221,9 @@ fhandler_train = open('train_log', 'w')
fhandler_test = open('test_log', 'w')
#List of pair cluster ids for specific complexes (such as Covid RBD and extreme cases of SKEMPI) to be excluded from the training!
# The cluster id of Covid RBD is excluded from the training!
######################################################################
list_clustid = [(178, 440), #2PCC_A_B
(2992, 3676),#3SCJ_A_E
(2889, 264), #1EAW_A_B
(392, 1596), #1BRS_A_D
(3746, 8), #1S1Q_A_B
(2640, 13008)] #1IAR_A_B
list_clustid = [(2992, 3676)] #3SCJ_A_E
def load_chains_cluster(file_pdb_cluster):
"""Reads all PDB chains in clusters.
......@@ -473,4 +468,4 @@ for foldk in range(5):
np.save('epoch_log_val_ll_'+str(foldk)+'_'+str(epoch), np.array(epoch_log_val_ll))
np.save('epoch_log_test_ll_'+str(foldk)+'_'+str(epoch), np.array(epoch_log_test_ll))
fhandler_train_e.close()
fhandler_test_e.close()
\ No newline at end of file
fhandler_test_e.close()
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 28 14:42:12 2021
@author: yasser
"""
import glob
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, AveragePooling3D, Layer, BatchNormalization, Add, Lambda, Dense, Flatten, Concatenate
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam
from subprocess import CalledProcessError, check_call
from random import shuffle, random, seed, sample
import matplotlib.pyplot as plt
from os import path, remove, system
import pickle, sys
from scipy import stats
import shutil
import gc
from sklearn.preprocessing import OneHotEncoder
from random import shuffle, random, seed, sample
seed(int(np.round(np.random.random()*10)))
def save_obj(obj, name):
with open(name + '.pkl', 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def load_obj(name):
with open(name + '.pkl', 'rb') as f:
return pickle.load(f)
model_pretrained = load_model('../models5_nonorm_classweight_Porlineweight08/0_30_model')
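# The shared Siamese branch below reuses the pre-trained ssDLA encoder up to 'layer1'
# and stacks two dense layers (100 and 10 units) on top of it.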
def get_siamese_model(input_shape, input_shape_aux):
"""
Model architecture
"""
# Define the tensors for the two input images
left_input = Input(shape=input_shape)
right_input = Input(shape=input_shape)
aux_input = Input(shape=input_shape_aux)
# Convolutional Neural Network
model = Sequential()
model_intermediate = Model(inputs=model_pretrained.input, outputs=model_pretrained.get_layer('layer1').output)
model.add(model_intermediate)
model.add(Dense(100, use_bias=True, activation='elu', kernel_initializer='he_uniform', kernel_regularizer=l2(1e-3))) #For comparison with no-pretraining I need to remove this layer!
model.add(Dense(10, use_bias=True, activation='elu', kernel_initializer='he_uniform', kernel_regularizer=l2(1e-3)))
# Generate the encodings (feature vectors) for the two images
encoded_l = model(left_input)
encoded_r = model(right_input)
# Add a customized layer to compute the absolute difference between the encodings
L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
feat_vector = Concatenate()([L1_distance, aux_input])
# Add a dense layer with a sigmoid unit to generate the similarity score
prediction = Dense(1, use_bias=True, activation='linear')(feat_vector)
# Connect the inputs with the outputs
siamese_net = Model(inputs=[left_input,right_input,aux_input],outputs=prediction)
siamese_net.compile(loss='mean_squared_error', optimizer=Adam(lr=0.001))
siamese_net.summary()
# return the model
return siamese_net
def load_map(filename):
#system('copy ' + filename + ' file.pkl.lz4')
#shutil.copyfile(filename, 'file.pkl.lz4')
"""
check_call(
[
'lz4_win64_v1_9_3\lz4.exe', '-d', '-f',
'file.pkl.lz4'
],
stdout=sys.stdout)
"""
#X_wt, X_mut, y, scr, _, _ = load_obj('file')
X_wt, X_mut, y, scr, _, _ = load_obj(filename.replace('.pkl',''))
X_wt = X_wt[:,:,:,:,:167]
X_mut = X_mut[:,:,:,:,:167]
#remove('file.pkl')
#remove('file.pkl.lz4')
return X_wt, X_mut, y, scr
gemme_jet_dict = load_obj('../comp_mut_map_gemme_jet')
backrub_models = list(map(lambda x: x.strip(), open('../split/train_interfaces.txt', 'r').readlines()))
skempi_homology = {}
skempi_homology['100'] = load_obj('../skempi_homology/skempi_homology_100')
skempi_homology['95'] = load_obj('../skempi_homology/skempi_homology_95')
skempi_homology['90'] = load_obj('../skempi_homology/skempi_homology_90')
skempi_homology['70'] = load_obj('../skempi_homology/skempi_homology_70')
skempi_homology['50'] = load_obj('../skempi_homology/skempi_homology_50')
skempi_homology['30'] = load_obj('../skempi_homology/skempi_homology_30')
encoder = OneHotEncoder(sparse=False, handle_unknown='ignore')
onehot = encoder.fit(np.asarray([['SUP'], ['COR'], ['RIM'], ['SUR'], ['INT']]))
backrub_models_train = []
for br_model in backrub_models:
comp = path.basename(br_model)[:4]
muta = path.basename(br_model).split('--')[1]
comp_clustid = skempi_homology['100'][comp]
backrub_models_train.append((br_model, comp, comp_clustid, muta))
v_dim = 24
#input_shape=(v_dim,v_dim,v_dim,167+4+2)
input_shape=(v_dim,v_dim,v_dim,167)
model_s = get_siamese_model(input_shape, 10)
NB_EPOCH = 80
#backrub_models_train = backrub_models_train[:80]
step = 40
history_epoch = {'loss':[], 'val_loss':[]}
for epoch in range(1, NB_EPOCH+1):
print('epoch: '+str(epoch))
history_batch = {'loss':[], 'val_loss':[]}
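# Assemble one mini-batch of `step` backrub models: load wild-type/mutant cube pairs and
# auxiliary features (SCR one-hot encoding plus GEMME/JET-derived values), then hold out
# the first 20% of the batch for validation.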
for batch_i in range(0, len(backrub_models_train), step):
try:
X_train, X_train_aux, y_train = [],[],[]
for step_i in range(step):
backrub_model = backrub_models_train[batch_i+step_i]
br_model = backrub_model[0]
comp = backrub_model[1]
muta = backrub_model[3]
comp_clustid = backrub_model[2]
try:
X_wt, X_mut, y, scr = load_map(br_model)
gemme_value, pc_value, tr_value, freq_value, trace_value = gemme_jet_dict[path.basename(path.dirname(br_model))]
except:
continue
if scr not in ['SUP', 'COR', 'RIM', 'SUR', 'INT']:
continue
scr = list(encoder.transform(np.array([scr]).reshape(-1,1)).reshape(-1))
aux_feat = np.array(scr + [gemme_value, pc_value, tr_value, freq_value, trace_value])
#aux_feat = np.array(scr)
X_train.append([X_wt[0], X_mut[0]])
X_train_aux.append(aux_feat)
y_train.append(y)
except:
continue
val_len = int(np.round(0.2*len(X_train)))
X_valid = np.array(X_train[:val_len])
X_valid_aux = np.array(X_train_aux[:val_len])
y_valid = np.array(y_train[:val_len])
X_train = np.array(X_train[val_len:])
X_train_aux = np.array(X_train_aux[val_len:])
y_train = np.array(y_train[val_len:])
#print(X_train.shape)
#print(X_train[:,0].shape)
#print(X_train[:,1].shape)
#print(np.array([X_train[:,0], X_train[:,1]]).shape)
history = model_s.fit([X_train[:,0], X_train[:,1], X_train_aux], y_train, validation_data=([X_valid[:,0], X_valid[:,1], X_valid_aux], y_valid), batch_size=X_train.shape[0], epochs=1, verbose=1)
history_batch['loss'].append(history.history['loss'][0])
history_batch['val_loss'].append(history.history['val_loss'][0])
_ = gc.collect()
history_epoch['loss'].append(np.array(history_batch['loss']).mean())
history_epoch['val_loss'].append(np.array(history_batch['val_loss']).mean())
model_s.save('model_s_'+str(epoch))
save_obj(history_epoch, 'history_epoch_'+str(epoch))
name: dla
channels:
- conda-forge
- anaconda
- pytorch
dependencies:
- python=3.8
- compilers
- cudnn=7.6
- cudatoolkit=10.1
- cudatoolkit-dev=10.1
- cupti=10.1
- pytorch-gpu=1.7.1
- torchvision=0.8.2
- cmake
- jupyterlab
- mpi4py
- nodejs
- pandas
- matplotlib
- seaborn
- scikit-learn
- prody
- lz4
- pip
- pip:
- tensorflow-gpu==2.3
- pypdb
- ipython
- bokeh
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Feb 7 16:33:47 2020
@author: mohseni
"""
import logging
import numpy as np
import pickle
import shutil
import pypdb
import pandas as pd
from prody import *
from os import path, mkdir, remove, getenv, listdir, system
from io import StringIO
import urllib
import re
import glob
from subprocess import CalledProcessError, check_call
import traceback
import sys
import gzip
#========================================================
FORCE_CPU = False # Set this to True to force CPU execution; keep it False if you have an Nvidia GPU on your computer or execute on Google Colab.
USE_FREESASA = True
NACCESS_PATH='naccess'
FreeSASA_PATH='freesasa'
#========================================================
def save_obj(obj, name):