java.lang.Object > org.strbio.mol.Molecule > org.strbio.mol.Polymer > org.strbio.mol.Protein > org.strbio.mol.Profile

public class Profile extends Protein
Class to represent a protein with profile information. The main protein described by this class (i.e. the one with coordinates, sequence, predSS, etc.) is called the "key protein". Other aligned proteins are stored only as sequences. The names of the sequences are stored in this class, as are the "E values", which are a measure of the probability that each sequence is aligned by chance rather than being a real homologue. The sequences of these aligned proteins are stored in the residues of the key protein, in the seq[] array.
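The storage scheme just described (key-protein residues, each holding one character per aligned sequence in seq[]) can be pictured with a minimal sketch. SketchResidue and SketchProfile are hypothetical stand-ins, not the org.strbio.mol classes, and the sketch assumes every aligned sequence has exactly one character per key residue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a profile residue: one column of the alignment.
class SketchResidue {
    final char keyType;   // residue type of the key protein at this position
    final char[] seq;     // seq[i] = character of aligned sequence i at this position

    SketchResidue(char keyType, char[] seq) {
        this.keyType = keyType;
        this.seq = seq;
    }
}

// Hypothetical stand-in for a Profile: key-protein residues plus per-sequence names.
class SketchProfile {
    final List<SketchResidue> residues = new ArrayList<>();
    final List<String> seqName = new ArrayList<>();

    int sequences() {
        return seqName.size();
    }

    // Rebuild sequence n by reading entry n of every residue's seq[] array, in order.
    String sequence(int n) {
        StringBuilder sb = new StringBuilder(residues.size());
        for (SketchResidue r : residues) {
            sb.append(r.seq[n]);
        }
        return sb.toString();
    }

    // Assumes every aligned sequence has the same length as the key sequence.
    static SketchProfile fromAlignment(String key, List<String> names, List<String> seqs) {
        SketchProfile p = new SketchProfile();
        p.seqName.addAll(names);
        for (int col = 0; col < key.length(); col++) {
            char[] column = new char[seqs.size()];
            for (int i = 0; i < seqs.size(); i++) {
                column[i] = seqs.get(i).charAt(col);
            }
            p.residues.add(new SketchResidue(key.charAt(col), column));
        }
        return p;
    }
}
```

For example, SketchProfile.fromAlignment("MKV", List.of("a", "b"), List.of("MKV", "M-I")).sequence(1) rebuilds "M-I" by reading entry 1 of each residue's seq[] array.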
- Version 1.71, 3/2/05 - changed readBLAST to skip an occasional error in BLAST output formatting: Query with no Subject.
- Version 1.7, 3/1/04 - changed readBLAST to track the first and last residues of the subject (hit) sequence and add them to seqName.
- Version 1.61, 11/12/02 - fixed readMSF to read the bogus MSF format from 3DPSSM files.
- Version 1.6, 11/02/01 - blast no longer removes redundant sequences by default; this must now be done explicitly.
- Version 1.52, 10/19/01 - recognizes TBLASTN output as BLAST.
- Version 1.51, 8/2/01 - made seqPctID public, wrote better docs.
- Version 1.5, 7/18/01 - supports SAF format.
- Version 1.41, 10/26/00 - changed removeRedundantSequences to handle subsequences (ALL).
- Version 1.4, 2/11/00 - added Clustal format.
- Version 1.36, 2/10/00 - recognizes more bogus MSF formats.
- Version 1.35, 11/4/99 - added setSequence.
- Version 1.34, 10/14/99 - added support for precalculated conservation weights (PRECALC_CW).
- Version 1.33, 10/5/99 - sped up findCW by ignoring >500 seqs; fixed compatibility bugs in MSF format.
- Version 1.32, 10/4/99 - fixed kludge in readBLAST for faster PSI-BLAST reading.
- Version 1.31, 9/22/99 - fixed MSF routines for Clustal compatibility.
- Version 1.3, 6/22/99 - added blastLog10E, blastBits.
- Version 1.22, 3/30/99 - made printfs consistent, limited names to 4096 chars.
- Version 1.21, 2/17/99 - moved BLAST code from ProfileSet to here.
- Version 1.2, 2/9/99 - added YAPF format.
- Version 1.18, 2/2/99 - fixed bug in readBLAST: 'X' was being read as a gap.
- Version 1.17, 1/19/99 - added removeSeq, removeRedundantSequences.
- Version 1.16, 1/8/99 - reads BLAST files.
- Version 1.15, 10/26/98 - reads lower-case MSF files.
- Version 1.14, 9/24/98 - detects some unexpectedly truncated HSSP files.
- Version 1.13, 8/12/98 - made readMSF not reset() if at the end of a file; doing so triggers a bug in BufferedReader, which does not reset to the right place.
- Version 1.12, 8/7/98 - changed PrintfStream to Printf.
- Version 1.11, 7/17/98 - added writeTDP, doVarTom.
- Version 1.1, 7/10/98 - added choose, writeFasta, Profile(ProteinSet).
- Version 1.01, 5/19/98 - made readMSF more flexible about format.
- Version 1.0, 5/1/98 - original version.
See Also: ProfRes
Field Summary

| Type | Field | Description |
|---|---|---|
| int[] | blastBits | BLAST score for each sequence, if known; this should measure how close the sequence is to the key sequence. |
| double[] | blastLog10E | E values for each sequence, if known; this should measure how close the sequence is to the key sequence, and how likely the match was in the database searched. |
| java.lang.String[] | seqName | Name of each of the sequences in the profile, if sequence information is available; otherwise, null. |

Fields inherited from class org.strbio.mol.Polymer |
---|
chainID, includeFile, includingFile, monDistance, monomers, properties |
Fields inherited from class org.strbio.mol.Molecule |
---|
atoms, data, MAX_NAME_LENGTH, name |
Constructor Summary

| Constructor | Description |
|---|---|
| Profile() | Make an empty Profile. |
| Profile(Profile q) | Copy a Profile, including all data in it. |
| Profile(Protein q) | Make a Profile from a Protein, copying all data. |
| Profile(ProteinSet q) | Make a Profile from a ProteinSet, copying all data. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | addKeySeq() | Tell the profile to add the "key sequence": another sequence identical to the sequence of this protein. |
| void | addSeq(java.lang.String newSeqName) | Tell the profile to add a sequence with a given name. |
| void | addSeqsDirectlyFrom(Profile q) | Add additional sequences from another profile of the same length. |
| void | addSeqsFrom(Profile q) | Add sequence info from another profile; aligns and copies. |
| void | allocSeqs(int n) | Tell the Profile to allocate space for at least n known sequences. |
| void | allocSeqsRes() | Tell all residues in the profile to allocate space for at least as many sequences as we know about. |
| void | blast(Printf outfile) | Run BLAST on this profile, using NCBI's BLAST server. |
| void | blast(Printf outfile, Blast blastServer) | Run BLAST on this profile, using a specified server. |
| Profile | choose(int seqNum) | Make a profile based on this one, featuring one of the subsequences. |
| void | clearSeqs() | Tell the Profile to forget any sequence information it knows. |
| Polymer | copy() | Return a copy of yourself. |
| void | copySeqsDirectlyFrom(Profile q) | Copy sequences from another profile of the same length, replacing the current sequences. |
| void | copySeqsFrom(Profile q) | Copy sequence info from another profile; aligns and copies, replacing the current sequence info. |
| Protein | doVarTom(Printf outfile) | Run var-tom on this profile. |
| void | ensureUniqueNames() | Make sure that every seqName is unique and non-null. |
| void | ensureUniqueNames(int maxNameLen) | Make sure that every seqName is unique and non-null. |
| void | findConsensus() | Find the consensus residue for each residue in the profile. |
| void | findConservationWeights() | Calculate the conservation weight for each residue in the profile. |
| void | findFrequencies() | Calculate the frequencies for each residue in the profile, but leave the consensus residue as it is. |
| void | findNonZeroFrequencies() | Find the non-zero frequencies for each residue in the profile. |
| int | findSeqByName(java.lang.String name) | Find a sequence by name, if it exists. |
| int | firstNonGap(int n) | Return the position of the first non-gap residue in sequence n. |
| java.lang.String | getSeqName(int i) | Get the name of sequence i. |
| int | lastNonGap(int n) | Return the position of the last non-gap residue in sequence n. |
| Monomer | newMonomer() | Profiles are made of ProfRes-type monomers. |
| Monomer | newMonomer(char t) | This should return a new monomer of whatever type this polymer is made of (for a Profile, a ProfRes). |
| Monomer | newMonomer(java.lang.String s) | This should return a new monomer of whatever type this polymer is made of (for a Profile, a ProfRes). |
| void | printProfile(Printf outfile) | Print the profile to output in a readable format. |
| void | printProfile(Printf outfile, int pad) | Print the profile to output in a readable format. |
| void | processYAPF(java.lang.String buffer) | Process a line from a YAPF file; this should ignore the line if it doesn't recognize it. |
| void | readBLAST(java.io.BufferedReader infile, Printf outfile) | Read a profile from BLASTP 2.0.11 output. |
| void | readClustal(java.io.BufferedReader infile, Printf outfile) | Read a protein out of a CLUSTAL format file. |
| void | readHSSP(java.io.BufferedReader infile, Printf outfile) | Read a protein out of an HSSP (Sander & Schneider format) file. |
| void | readMSF(java.io.BufferedReader infile, Printf outfile) | Read a protein out of an MSF (GCG's format) file. |
| protected boolean | recognizeAndRead(java.lang.String buffer, java.io.BufferedReader infile, Printf outfile) | Look for a profile and read it in. |
| void | removeRedundantSequences(boolean removeSubSequences) | Eliminate redundant sequences (or subsequences) from the set. |
| void | removeSeq(int n) | Remove one of the sequences. |
| void | removeSpacesInNames() | Change all spaces in sequence names to underscores (_). |
| double | seqPctCoverage(int r, int s) | Find the percent coverage of the second sequence by the first. |
| double | seqPctID(int r, int s) | Find the percent identity between two sequences, without gap penalties. |
| java.lang.String | sequence(int n) | Return a string containing one of the sequences in the profile. |
| int | sequences() | Number of sequences in the profile, if known; otherwise, zero. |
| void | setSeqName(int i, java.lang.String s) | Set the name of sequence i. |
| void | setSequence(int n, java.lang.String newseq) | Set the sequence for one of the sequences in the profile. |
| protected Polymer | splitCopy() | When a Profile is split, sequence names should go to each child. |
| protected void | splitCopy(Profile q) | When a Profile is split, sequence names should go to each child. |
| void | truncateNames(int maxChars) | Truncate all names to at most a given length. |
| void | writeClustal(Printf outfile) | Write in CLUSTAL format. |
| void | writeFasta(Printf outfile) | Write each sequence in the profile in FASTA format. |
| void | writeMSF(Printf outfile) | Write in MSF format. |
| void | writeProf(Printf outfile) | Write in Prof format. |
| void | writeSAF(Printf outfile) | Write in Burkhard Rost's SAF format. |
| void | writeTDP(Printf outfile) | Print the profile in a simple format that var-tom uses as input. |
| protected void | writeYAPFInfo(Printf outfile) | Write applicable sections of YAPF info. |

Methods inherited from class org.strbio.mol.Protein |
---|
AA, actualAccuracy, copyPredSSFrom, expectedAccuracy, filterPred, filterPred, filterReal, filterReal, findAccess, findDSSP, findDSSPResult, findPDB, firstRes, fixDistanceGaps, getDSSPResults, getInfo, hasGaps, isCATHMatch, makeMonDistance, makeMonDistanceCB, makeVirtualCB, molecularWeight, preCalculateAlphaTau, preCalculateAngles, preCalculatePhiPsi, predictSS, predictSS, readAccess, readCASP, readConv, readDSSP, readEA, readPDB, readPDBAtom, readProf, readSwissProt, readVar, readVar2, residues, reverse, runDSSP, showGaps, smoothHE, stripAllButCA, thread, thread, translateEA, writeCASP, writeConv, writeEA, writePDB, writePDB, writeVar2 |
Methods inherited from class org.strbio.mol.Molecule |
---|
atomSearch, copyAtoms |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail
public java.lang.String[] seqName
public double[] blastLog10E
public int[] blastBits
Constructor Detail
public Profile()
public Profile(Protein q)
public Profile(Profile q)
public Profile(ProteinSet q)
Method Detail
public int sequences()
public int findSeqByName(java.lang.String name)
public java.lang.String getSeqName(int i)
public void setSeqName(int i, java.lang.String s)
public Polymer copy()
Overrides: copy in class Protein
public Monomer newMonomer()
Overrides: newMonomer in class Protein
public Monomer newMonomer(char t)
Description copied from class: Protein
Overrides: newMonomer in class Protein
public Monomer newMonomer(java.lang.String s)
Description copied from class: Protein
Overrides: newMonomer in class Protein
protected void splitCopy(Profile q)
protected Polymer splitCopy()
Overrides: splitCopy in class Protein
public void clearSeqs()
public void allocSeqsRes()
public void allocSeqs(int n)
public final void addSeq(java.lang.String newSeqName)
public final void addKeySeq()
public final void addSeqsDirectlyFrom(Profile q)
public final void copySeqsDirectlyFrom(Profile q)
public final void addSeqsFrom(Profile q)
public final void copySeqsFrom(Profile q)
public final java.lang.String sequence(int n)
public final void setSequence(int n, java.lang.String newseq)
public final void removeSeq(int n)
public final void removeRedundantSequences(boolean removeSubSequences)
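The page does not spell out the redundancy test used by removeRedundantSequences. A minimal sketch of one plausible rule, assuming a sequence is redundant when its ungapped residues equal those of an earlier sequence, or (when removeSubSequences is true) form a contiguous substring of another sequence; the gap characters '-' and '.' and the class RedundancySketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the org.strbio implementation.
class RedundancySketch {
    // Returns the indices of the sequences that survive redundancy removal.
    static List<Integer> keptIndices(List<String> seqs, boolean removeSubSequences) {
        List<String> ungapped = new ArrayList<>();
        for (String s : seqs) {
            ungapped.add(s.replace("-", "").replace(".", ""));   // assumed gap characters
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < ungapped.size(); i++) {
            String a = ungapped.get(i);
            boolean redundant = false;
            for (int j = 0; j < ungapped.size() && !redundant; j++) {
                if (i == j) continue;
                String b = ungapped.get(j);
                if (a.equals(b)) {
                    redundant = j < i;            // identical: keep only the first copy
                } else if (removeSubSequences && b.contains(a)) {
                    redundant = true;             // a is a strict substring of b
                }
            }
            if (!redundant) kept.add(i);
        }
        return kept;
    }
}
```

With removeSubSequences set, a short fragment such as "KV" is dropped whenever a longer sequence like "MKVL" contains it; without it, only exact duplicates are removed.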
public final void findFrequencies()
public final void findNonZeroFrequencies()
public final void findConsensus()
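findFrequencies and findConsensus both work one alignment column at a time. A minimal sketch for a single column, assuming gaps ('-' or '.') are skipped, frequencies are fractions of the non-gap residues, case is ignored, and consensus ties go to the alphabetically first residue; ColumnStats is hypothetical, and the real methods may count or weight differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the org.strbio implementation.
class ColumnStats {
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    // Fraction of the non-gap characters in the column taken by each residue type.
    static Map<Character, Double> frequencies(char[] column) {
        Map<Character, Integer> counts = new TreeMap<>();
        int total = 0;
        for (char c : column) {
            if (isGap(c)) continue;
            counts.merge(Character.toUpperCase(c), 1, Integer::sum);
            total++;
        }
        Map<Character, Double> freq = new TreeMap<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            freq.put(e.getKey(), e.getValue() / (double) total);
        }
        return freq;
    }

    // Most frequent non-gap residue, or '-' if the column is all gaps.
    static char consensus(char[] column) {
        char best = '-';
        double bestFreq = 0.0;
        // TreeMap iterates in character order, so a tie keeps the earliest residue.
        for (Map.Entry<Character, Double> e : frequencies(column).entrySet()) {
            if (e.getValue() > bestFreq) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```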
public double seqPctID(int r, int s)
public double seqPctCoverage(int r, int s)
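The page says only that seqPctID ignores gap penalties and that seqPctCoverage measures coverage of the second sequence by the first. A minimal sketch of one plausible reading, on two aligned rows: identity counts identical residues over the columns where both rows have a residue, and coverage counts the second row's residues that sit opposite a residue (not a gap) in the first. The denominators, the gap characters and the class PctSketch are assumptions.

```java
// Hypothetical sketch; not the org.strbio implementation.
class PctSketch {
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    // Percent identity over columns where neither row has a gap (no gap penalty).
    static double pctId(String r, String s) {
        int aligned = 0, identical = 0;
        for (int i = 0; i < Math.min(r.length(), s.length()); i++) {
            char a = r.charAt(i), b = s.charAt(i);
            if (isGap(a) || isGap(b)) continue;
            aligned++;
            if (Character.toUpperCase(a) == Character.toUpperCase(b)) identical++;
        }
        return aligned == 0 ? 0.0 : 100.0 * identical / aligned;
    }

    // Percent of s's residues that are aligned to a residue (not a gap) in r.
    static double pctCoverage(String r, String s) {
        int residuesInS = 0, covered = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isGap(s.charAt(i))) continue;
            residuesInS++;
            if (i < r.length() && !isGap(r.charAt(i))) covered++;
        }
        return residuesInS == 0 ? 0.0 : 100.0 * covered / residuesInS;
    }
}
```

For the rows "MKV-LA" and "MRVGL-", four columns have residues in both rows and three of them match, giving 75% identity; four of the second row's five residues face a residue in the first, giving 80% coverage.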
public final void findConservationWeights()
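The page does not say how findConservationWeights scores a column. As a swapped-in stand-in, not necessarily this class's method, a common choice is Shannon entropy: a column with a single residue type gets weight 1, and a column spread evenly over all 20 amino acids gets weight 0. The normalization by ln 20, the skipping of gaps and the class ConservationSketch are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entropy-based stand-in; not the org.strbio implementation.
class ConservationSketch {
    // 1 - H/ln(20); lies in [0, 1] for columns drawn from the 20 standard amino acids.
    static double weight(char[] column) {
        Map<Character, Integer> counts = new HashMap<>();
        int total = 0;
        for (char c : column) {
            if (c == '-' || c == '.') continue;   // assumed gap characters are ignored
            counts.merge(Character.toUpperCase(c), 1, Integer::sum);
            total++;
        }
        if (total == 0) return 0.0;               // all-gap column: no information
        double entropy = 0.0;
        for (int n : counts.values()) {
            double p = n / (double) total;
            entropy -= p * Math.log(p);
        }
        return 1.0 - entropy / Math.log(20.0);
    }
}
```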
public final int firstNonGap(int n)
public final int lastNonGap(int n)
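firstNonGap and lastNonGap locate where an aligned sequence actually starts and ends. A minimal sketch over a sequence string, assuming 0-based positions, '-' and '.' as gap characters, and -1 for an all-gap sequence; GapScan is hypothetical, and the real methods may index or report differently.

```java
// Hypothetical sketch; not the org.strbio implementation.
class GapScan {
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    // Index of the first non-gap character, or -1 if there is none.
    static int firstNonGap(String seq) {
        for (int i = 0; i < seq.length(); i++) {
            if (!isGap(seq.charAt(i))) return i;
        }
        return -1;
    }

    // Index of the last non-gap character, or -1 if there is none.
    static int lastNonGap(String seq) {
        for (int i = seq.length() - 1; i >= 0; i--) {
            if (!isGap(seq.charAt(i))) return i;
        }
        return -1;
    }
}
```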
public final void printProfile(Printf outfile, int pad) throws java.io.IOException
Parameters: outfile - where to print to; pad - number of leading spaces to pad the output with
Throws: java.io.IOException
public final void writeTDP(Printf outfile) throws java.io.IOException
Throws: java.io.IOException
public void processYAPF(java.lang.String buffer) throws java.io.IOException
Overrides: processYAPF in class Protein
Throws: java.io.IOException
protected void writeYAPFInfo(Printf outfile) throws java.io.IOException
Overrides: writeYAPFInfo in class Protein
Throws: java.io.IOException
public final void printProfile(Printf outfile) throws java.io.IOException
Parameters: outfile - where to print to
Throws: java.io.IOException
public final void truncateNames(int maxChars)
public final void removeSpacesInNames()
public final void ensureUniqueNames(int maxNameLen)
public final void ensureUniqueNames()
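truncateNames, removeSpacesInNames and ensureUniqueNames prepare sequence names for formats with strict naming rules. A minimal sketch combining the three steps, assuming null names become "seq" plus the sequence index and duplicates get a numeric suffix ("_1", "_2", ...) that still fits within maxNameLen; the default name, the suffix scheme and NameSketch are hypothetical, and the sketch assumes maxNameLen leaves room for the suffix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the org.strbio implementation.
class NameSketch {
    // Replace spaces with '_', cut to maxNameLen, then make every name unique and non-null.
    static List<String> clean(List<String> names, int maxNameLen) {
        Set<String> used = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            String base = names.get(i) == null ? "seq" + i : names.get(i).replace(' ', '_');
            if (base.length() > maxNameLen) base = base.substring(0, maxNameLen);
            String candidate = base;
            int suffix = 1;
            while (used.contains(candidate)) {
                String tag = "_" + suffix++;
                // Shorten the base so that base + tag still fits in maxNameLen.
                int keep = Math.max(0, Math.min(base.length(), maxNameLen - tag.length()));
                candidate = base.substring(0, keep) + tag;
            }
            used.add(candidate);
            out.add(candidate);
        }
        return out;
    }
}
```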
public final void readHSSP(java.io.BufferedReader infile, Printf outfile) throws java.io.IOException
Parameters: infile - an open HSSP file; outfile - if non-null, will print info on what's going on
Throws: java.io.IOException
public final void readMSF(java.io.BufferedReader infile, Printf outfile) throws java.io.IOException
This should go into a MultipleAlignment class instead: a Profile should turn itself into a MultipleAlignment (and vice versa), and this function should be called from here but coded there.
Parameters: infile - an open MSF file; outfile - if non-null, will print info on what's going on
Throws: java.io.IOException
public final void readClustal(java.io.BufferedReader infile, Printf outfile) throws java.io.IOException
Parameters: infile - an open CLUSTAL file; outfile - if non-null, will print info on what's going on
Throws: java.io.IOException
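A CLUSTAL file starts with a line beginning "CLUSTAL", followed by blocks of "name residues" lines separated by blank lines and conservation-mark lines. A minimal parser sketch that concatenates each name's pieces across blocks; it assumes names contain no spaces, treats lines starting with whitespace as conservation lines, and ignores an optional trailing residue count. ClustalSketch is hypothetical and far less forgiving than readClustal needs to be.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the org.strbio implementation.
class ClustalSketch {
    // Returns name -> aligned sequence, in order of first appearance.
    static Map<String, String> read(BufferedReader in) throws IOException {
        Map<String, StringBuilder> seqs = new LinkedHashMap<>();
        String line = in.readLine();
        if (line == null || !line.startsWith("CLUSTAL")) {
            throw new IOException("not a CLUSTAL file");
        }
        while ((line = in.readLine()) != null) {
            // Blank lines separate blocks; lines starting with whitespace hold conservation marks.
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0))) continue;
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) continue;
            // fields[0] = name, fields[1] = residues; a trailing count (fields[2]) is ignored.
            seqs.computeIfAbsent(fields[0], k -> new StringBuilder()).append(fields[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        seqs.forEach((name, sb) -> result.put(name, sb.toString()));
        return result;
    }
}
```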
public void blast(Printf outfile, Blast blastServer)
public void blast(Printf outfile)
public final void readBLAST(java.io.BufferedReader infile, Printf outfile) throws java.io.IOException
Throws: java.io.IOException
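Since version 1.7, readBLAST records the first and last residues of each subject (hit) sequence. A minimal sketch of that bookkeeping for one hit, assuming pairwise alignment lines of the form "Sbjct: 12  MKILA  16" (the pattern also accepts the form without a colon) and returning the range as "first-last"; SbjctRange and its output format are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the org.strbio implementation.
class SbjctRange {
    // "Sbjct", optional colon, start position, residues, end position.
    private static final Pattern SBJCT =
            Pattern.compile("^Sbjct:?\\s+(\\d+)\\s+\\S+\\s+(\\d+)\\s*$");

    // Smallest and largest subject positions across one hit's alignment lines, or null.
    static String range(List<String> lines) {
        int first = Integer.MAX_VALUE, last = Integer.MIN_VALUE;
        for (String line : lines) {
            Matcher m = SBJCT.matcher(line);
            if (!m.matches()) continue;
            int a = Integer.parseInt(m.group(1));
            int b = Integer.parseInt(m.group(2));
            // min/max also cover descending coordinates (e.g. minus-frame TBLASTN hits).
            first = Math.min(first, Math.min(a, b));
            last = Math.max(last, Math.max(a, b));
        }
        return first == Integer.MAX_VALUE ? null : first + "-" + last;
    }
}
```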
protected boolean recognizeAndRead(java.lang.String buffer, java.io.BufferedReader infile, Printf outfile) throws java.io.IOException
Overrides: recognizeAndRead in class Protein
Throws: java.io.IOException
See Also: Protein.recognizeAndRead(java.lang.String, java.io.BufferedReader, org.strbio.io.Printf)
public final void writeMSF(Printf outfile) throws java.io.IOException
This should go into a MultipleAlignment class instead: a Profile should turn itself into a MultipleAlignment (and vice versa), and this function should be called from here but coded there.
Parameters: outfile - where to write to
Throws: java.io.IOException
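MSF files carry GCG checksums, one per sequence plus one for the whole alignment. A minimal sketch of the checksum as commonly implemented by MSF writers such as ClustalW: each character, upper-cased, is weighted by (position mod 57) + 1, the products are summed, and the sum is reduced mod 10000; the alignment checksum is the sum of the per-sequence checksums mod 10000. Whether writeMSF computes it exactly this way (including over gap characters as written) is an assumption, and GcgChecksum is hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the GCG checksum; not the org.strbio implementation.
class GcgChecksum {
    // Checksum of one sequence, computed over its characters as written.
    static int of(String seq) {
        long check = 0;
        for (int i = 0; i < seq.length(); i++) {
            check += (long) ((i % 57) + 1) * Character.toUpperCase(seq.charAt(i));
        }
        return (int) (check % 10000);
    }

    // Alignment checksum: sum of the per-sequence checksums, mod 10000.
    static int ofAlignment(List<String> seqs) {
        int total = 0;
        for (String s : seqs) {
            total = (total + of(s)) % 10000;
        }
        return total;
    }
}
```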
public final void writeSAF(Printf outfile) throws java.io.IOException
Parameters: outfile - where to write to
Throws: java.io.IOException
public final void writeClustal(Printf outfile) throws java.io.IOException
Parameters: outfile - where to write to
Throws: java.io.IOException
public final void writeProf(Printf outfile) throws java.io.IOException
Parameters: outfile - where to write to
Throws: java.io.IOException
public final Profile choose(int seqNum)
public void writeFasta(Printf outfile) throws java.io.IOException
Overrides: writeFasta in class Polymer
Parameters: outfile - where to write to
Throws: java.io.IOException
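writeFasta emits each sequence of the profile as its own FASTA record. A minimal sketch, assuming sequences are written as stored (gaps included) and wrapped at 60 residues per line; the line width, the gap handling and FastaSketch are assumptions, and the sketch writes to a java.io.Writer rather than through Printf.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical sketch; not the org.strbio implementation.
class FastaSketch {
    static final int LINE_WIDTH = 60;   // assumed wrap width

    // One ">name" header per sequence, then the residues in LINE_WIDTH-character lines.
    static void write(Writer out, List<String> names, List<String> seqs) throws IOException {
        for (int i = 0; i < seqs.size(); i++) {
            out.write(">" + names.get(i) + "\n");
            String s = seqs.get(i);
            for (int start = 0; start < s.length(); start += LINE_WIDTH) {
                out.write(s, start, Math.min(LINE_WIDTH, s.length() - start));
                out.write("\n");
            }
        }
    }
}
```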
public Protein doVarTom(Printf outfile) throws java.io.IOException
Throws: java.io.IOException