Commit 9175fdba by Mustafa Tekpinar

Added sstjetormaxtwocomponent as default normweightmode.

parent 7447bfab
@@ -15,12 +15,30 @@ ESGEMME has the following external dependencies:
* java
* naccess: http://www.bioinf.manchester.ac.uk/naccess/
* muscle: https://www.drive5.com/muscle/
* seqinr R package: https://cran.r-project.org/web/packages/seqinr/index.html
* dssp for secondary structure assignment.
These tools must be installed before you can use ESGEMME. After installing JET2, define an environment variable called JET2_PATH in your .profile file.
You can open .profile as follows:
.. code:: bash
gedit ~/.profile
Add a line like the one below to the end of that file, then save and exit.
.. code:: bash
export JET2_PATH=/home/tekpinar/JET2/
Do not forget to replace /home/tekpinar/JET2 with your own path.
Then source the saved .profile so that the environment variable takes effect:
.. code:: bash
source ~/.profile
JET2 is essential and must be installed to use ESGEMME.
Preparation of the environment and installation of ESGEMME
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -88,16 +106,41 @@ Please note that default dssp in Ubuntu 22.04 is not working properly.
Check the location of hhsuite folder and add it to your path
In my case it was in /home/tekpinar/research/lcqb folder. Therefore, I added the following line
to my .profile file.
-PATH="/home/tekpinar/research/lcqb/hhsuite/bin:/home/tekpinar/research/lcqb/hhsuite/scripts:$PATH"
-Then
-source ~/.profile
+Open .profile file with gedit:
+.. code:: bash
+gedit ~/.profile
+Now, add the following line to the end of the file.
+.. code:: bash
+PATH="/home/tekpinar/research/lcqb/hhsuite/bin:/home/tekpinar/research/lcqb/hhsuite/scripts:$PATH"
+Of course, your path will not be /home/tekpinar/research/lcqb/ and you have to modify the path according to
+your system. Save the file and exit. Then,
+.. code:: bash
+source ~/.profile
#
cd ESGEMME
#Download ESGEMME from http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME repository and go inside the ESGEMME folder.
You can download the master version from the command line as follows:
.. code:: bash
git clone http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME.git
If you would like the development version:
.. code:: bash
git clone -b development http://gitlab.lcqb.upmc.fr/tekpinar/ESGEMME.git
.. code:: bash
cd ESGEMME
@@ -115,6 +158,7 @@ file according to your system.
cd ../
#Installing the required R packages
.. code:: bash
sudo Rscript -e 'install.packages("seqinr", repos="http://cran.us.r-project.org", dependencies=TRUE)'
......
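The README changes above ask the user to export JET2_PATH and to make several external tools reachable. A minimal sketch of a pre-flight check is given here; it is not part of ESGEMME. The function name `check_esgemme_environment` and the executable names `java`, `naccess`, `muscle` and `Rscript` are assumptions made for illustration, and dssp is left out because its executable name differs between installations.

```python
import os
import shutil


def check_esgemme_environment(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: the tool names below are assumed executable names,
    not values taken from the ESGEMME sources.
    """
    env = os.environ if env is None else env
    problems = []

    # JET2_PATH is the variable the README asks the user to export.
    jet2_path = env.get("JET2_PATH")
    if not jet2_path:
        problems.append("JET2_PATH is not set")
    elif not os.path.isdir(jet2_path):
        problems.append("JET2_PATH points to a missing directory: " + jet2_path)

    # Look each assumed executable up on PATH.
    for tool in ("java", "naccess", "muscle", "Rscript"):
        if which(tool) is None:
            problems.append(tool + " not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_esgemme_environment():
        print("WARNING:", problem)
```

Running the file after `source ~/.profile` prints one warning per missing item and nothing when every check passes.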
@@ -84,9 +84,38 @@ if ((normWeightMode=="max")){
quit(status=-1)
}
}
}else if (normWeightMode=="sstjetormaxtwocomponent"){
print("Using only sstjetormaxtwocomponent")
for (row in 1:nrow(jet)) {
if(sum(colnames(jet)=="sstjetormaxtwocomponent")==1){
trace<-append(trace, jet[row, "sstjetormaxtwocomponent"])
}else{
print("No field called sstjetormaxtwocomponent in the JET output!")
quit(status=-1)
}
}
}else if (normWeightMode=="sstjetormaxthirdchanged"){
print("Using only sstjetormaxthirdchanged")
for (row in 1:nrow(jet)) {
if(sum(colnames(jet)=="sstjetormaxthirdchanged")==1){
trace<-append(trace, jet[row, "sstjetormaxthirdchanged"])
}else{
print("No field called sstjetormaxthirdchanged in the JET output!")
quit(status=-1)
}
}
}else if (normWeightMode=="sstjetormaxthirdpcstar"){
print("Using only sstjetormaxthirdpcstar")
for (row in 1:nrow(jet)) {
if(sum(colnames(jet)=="sstjetormaxthirdpcstar")==1){
trace<-append(trace, jet[row, "sstjetormaxthirdpcstar"])
}else{
print("No field called sstjetormaxthirdpcstar in the JET output!")
quit(status=-1)
}
}
}else if (normWeightMode=="tjetormax"){
-print("Using tjetormax with inverse CV")
+print("Using tjetormax with inverse")
for (row in 1:nrow(jet)) {
if(sum(colnames(jet)=="traceMax")==1){
trace<-append(trace, max((jet[row, "traceMax"]+jet[row, "pc"])/2.0, max((jet[row, "traceMax"]+1.0-jet[row, "cv"])/2.0, (jet[row, "pc"]+1.0-jet[row, "cv"])/2.0 )))
......
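The three R branches added in this hunk differ only in the column name: each reads the column named after the mode and quits if it is absent. A sketch of that shared lookup follows, written in Python to match the other examples here. The names `SAME_NAME_MODES` and `extract_trace` are hypothetical, and the table is modelled as a plain dict of lists rather than an R data frame. Unlike the R code, the sketch checks for the column once instead of once per row.

```python
# Modes whose weights are read from a column with the same name
# (the three branches added in this hunk).
SAME_NAME_MODES = (
    "sstjetormaxtwocomponent",
    "sstjetormaxthirdchanged",
    "sstjetormaxthirdpcstar",
)


def extract_trace(jet, mode):
    """Return the per-residue weights for `mode`.

    `jet` is assumed to map column names to lists of values,
    standing in for the JET output table.
    """
    if mode not in SAME_NAME_MODES:
        raise ValueError("unsupported mode for this sketch: " + mode)
    if mode not in jet:
        # Mirrors the message printed by the R script before it quits.
        raise KeyError("No field called " + mode + " in the JET output!")
    return list(jet[mode])
```

A table-driven lookup like this would keep the handling of future modes to a one-line change, at the cost of hiding the per-mode print statements the R script currently emits.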
@@ -936,8 +936,8 @@ def parse_command_line():
parser.add_argument('--normweightmode', dest='normweightmode', type=str, \
help="It can be one of these: 'tjet', 'cv', 'pc',"+\
-"max, tjetormax or sstjetormax. Default is 'tjet'.",
-required=False, default="tjet")
+"max, tjetormax or sstjetormaxtwocomponent. Default is 'tjet'.",
+required=False, default="sstjetormaxtwocomponent")
parser.add_argument('--verbose', dest='verbose', type=bool, \
help="This argument controls amount of the output. Default is 'False'."+\
@@ -1004,6 +1004,9 @@ def doit(inAli,mutFile,retMet,bFile,fFile,n,N, jetfile, pdbfile, normWeightMode,
(normWeightMode != 'pc') and \
(normWeightMode != 'max') and \
(normWeightMode != 'tjetormax') and \
(normWeightMode != 'sstjetormaxtwocomponent') and \
(normWeightMode != 'sstjetormaxthirdchanged') and \
(normWeightMode != 'sstjetormaxthirdpcstar') and \
(normWeightMode != 'sstjetormax')):
print("ERROR: normWeightMode can only be 'tjet', 'cv', 'pc', "+\
"'max', 'tjetormax' or 'sstjetormax'!")
@@ -1073,6 +1076,154 @@ def doit(inAli,mutFile,retMet,bFile,fFile,n,N, jetfile, pdbfile, normWeightMode,
df.to_csv(prot+"_jet.res", header=True, index=None, sep='\t', mode='w')
#sys.exit(-1)
if((normWeightMode=='sstjetormaxtwocomponent')):
if (pdbfile == None):
print("ERROR: There is not any pdb file.")
sys.exit(-1)
else:
calculateSecondaryStructure(pdbfile)
countCoilSegments(pdbfile+".dssp")
df = pd.read_table(prot+"_jet.res", sep="\s+")
df2 = pd.read_table(pdbfile+".dssp.new", header=None, sep=",")
df2.columns = ['pos', 'ss', 'length']
mergedRes = pd.merge(df, df2, on ='pos', right_index=False)
if(debug):
print(df['pos'])
print(pdbfile+".dssp")
print(os.getcwd())
print(df2)
print(mergedRes)
sstjetormaxList = []
# maxCoilLength = 5
print("WARNING: Max. coil length = {}".format(maxCoilLength))
for index, row in mergedRes.iterrows():
if(row['ss']=='C') and (row['length']>maxCoilLength):
sstjetormaxList.append(row['trace'])
else:
maxVal = max([((row['trace']+row['pc'])/2.0), ((row['trace']+row['cv'])/2.0)])
sstjetormaxList.append(maxVal)
#sstjetormaxList=rankSortProteinData(sstjetormaxList, inverted=False)
df['sstjetormaxtwocomponent'] = sstjetormaxList
df['sstjetormaxtwocomponent'] = df['sstjetormaxtwocomponent'].round(decimals = 4)
df.to_csv(prot+"_jet.res", header=True, index=None, sep='\t', mode='w')
#sys.exit(-1)
if((normWeightMode=='sstjetormaxthirdchanged')):
if (pdbfile == None):
print("ERROR: There is not any pdb file.")
sys.exit(-1)
else:
calculateSecondaryStructure(pdbfile)
countCoilSegments(pdbfile+".dssp")
df = pd.read_table(prot+"_jet.res", sep="\s+")
df2 = pd.read_table(pdbfile+".dssp.new", header=None, sep=",")
df2.columns = ['pos', 'ss', 'length']
mergedRes = pd.merge(df, df2, on ='pos', right_index=False)
if(debug):
print(df['pos'])
print(pdbfile+".dssp")
print(os.getcwd())
print(df2)
print(mergedRes)
sstjetormaxList = []
# maxCoilLength = 5
print("WARNING: Max. coil length = {}".format(maxCoilLength))
for index, row in mergedRes.iterrows():
if(row['ss']=='C') and (row['length']>maxCoilLength):
sstjetormaxList.append(row['trace'])
else:
maxVal = max([((row['trace']+row['pc'])/2.0), ((row['trace']+row['cv'])/2.0), ((row['trace']))])
sstjetormaxList.append(maxVal)
#sstjetormaxList=rankSortProteinData(sstjetormaxList, inverted=False)
df['sstjetormaxthirdchanged'] = sstjetormaxList
df['sstjetormaxthirdchanged'] = df['sstjetormaxthirdchanged'].round(decimals = 4)
df.to_csv(prot+"_jet.res", header=True, index=None, sep='\t', mode='w')
if((normWeightMode=='sstjetormaxthirdpcstar')):
if (pdbfile == None):
print("ERROR: There is not any pdb file.")
sys.exit(-1)
else:
from collections import OrderedDict
pcStar = OrderedDict()
pcStar={'A':0.38,
'V':0.54,
'L':0.45,
'I':0.60,
'P':0.18,
'F':0.50,
'W':0.27,
'M':0.40,
'G':0.36,
'S':0.23,
'T':0.24,
'C':0.45,
'Y':0.15,
'N':0.13,
'Q':0.07,
'D':0.15,
'E':0.18,
'K':0.04,
'R':0.02,
'H':0.17}
oneLetter2ThreeLetters = {'C': 'CYS', 'D': 'ASP', 'S': 'SER', 'Q': 'GLN', 'K': 'LYS',
'I': 'ILE', 'P': 'PRO', 'T': 'THR', 'F': 'PHE', 'N': 'ASN',
'G': 'GLY', 'H': 'HIS', 'L': 'LEU', 'R': 'ARG', 'W': 'TRP',
'A': 'ALA', 'V': 'VAL', 'E': 'GLU', 'Y': 'TYR', 'M': 'MET'}
threeLetters2oneLetter = {v: k for k, v in oneLetter2ThreeLetters.items()}
calculateSecondaryStructure(pdbfile)
countCoilSegments(pdbfile+".dssp")
df = pd.read_table(prot+"_jet.res", sep="\s+")
pcstarList = []
for index, row in df.iterrows():
print(row['pos'], row['AA'])
singleLetterCode = threeLetters2oneLetter[row['AA']]
pcstarList.append(pcStar[singleLetterCode])
import numpy as np
df['pcstar'] = list(np.array(pcstarList)/(np.array(pcstarList).max()))
print(df)
df2 = pd.read_table(pdbfile+".dssp.new", header=None, sep=",")
df2.columns = ['pos', 'ss', 'length']
mergedRes = pd.merge(df, df2, on ='pos', right_index=False)
if(debug):
print(df['pos'])
print(pdbfile+".dssp")
print(os.getcwd())
print(df2)
print(mergedRes)
sstjetormaxList = []
# maxCoilLength = 5
print("WARNING: Max. coil length = {}".format(maxCoilLength))
for index, row in mergedRes.iterrows():
if(row['ss']=='C') and (row['length']>maxCoilLength):
sstjetormaxList.append(row['trace'])
else:
maxVal = max([((row['trace']+row['pc'])/2.0), ((row['trace']+row['cv'])/2.0), ( (row['cv']+row['pcstar'])/2.0 )])
sstjetormaxList.append(maxVal)
#sstjetormaxList=rankSortProteinData(sstjetormaxList, inverted=False)
df['sstjetormaxthirdpcstar'] = sstjetormaxList
df['sstjetormaxthirdpcstar'] = df['sstjetormaxthirdpcstar'].round(decimals = 4)
df.to_csv(prot+"_jet.res", header=True, index=None, sep='\t', mode='w')
#sys.exit(-1)
# #If a real pdb file is given, calculate dfi for the residues.
# if(((normWeightMode=='tracemovingaverage'))):
# print("Calculating trace moving average per residue!")
......
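The three Python blocks added in this commit share one per-residue rule and differ only in the optional third candidate. The sketch below condenses that rule into a single function so the differences are easier to compare; it is an illustration, not the committed code. The name `combine_scores` and its signature are hypothetical. `max_coil_length` is passed in explicitly because its value is not visible in this diff (only a commented-out `maxCoilLength = 5` appears). Python's built-in `round` stands in for the pandas `round(decimals = 4)` call.

```python
def combine_scores(trace, pc, cv, ss, coil_length, max_coil_length,
                   mode="sstjetormaxtwocomponent", pcstar=None):
    """Per-residue weight for the three modes added in this commit.

    Hypothetical helper: the arguments correspond to the 'trace', 'pc', 'cv',
    'ss' and 'length' columns of the merged table in the diff.
    """
    # Residues in long coil segments keep the plain trace value.
    if ss == "C" and coil_length > max_coil_length:
        return round(trace, 4)

    # Two candidates are common to all three modes.
    candidates = [(trace + pc) / 2.0, (trace + cv) / 2.0]

    if mode == "sstjetormaxthirdchanged":
        # Third candidate: the trace value itself.
        candidates.append(trace)
    elif mode == "sstjetormaxthirdpcstar":
        # Third candidate: mean of cv and the normalised pcstar value.
        if pcstar is None:
            raise ValueError("pcstar is required for sstjetormaxthirdpcstar")
        candidates.append((cv + pcstar) / 2.0)
    elif mode != "sstjetormaxtwocomponent":
        raise ValueError("unsupported mode: " + mode)

    return round(max(candidates), 4)


# Illustrative call; the threshold 5 is only an example value.
weight = combine_scores(trace=0.2, pc=0.8, cv=0.4, ss="H",
                        coil_length=0, max_coil_length=5)
```

In the committed code the `pcstar` value is the per-residue table entry divided by the largest `pcStar` value among the residues of the protein, so a caller of this sketch would have to apply the same normalisation beforehand.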