org.strbio.mol
Class ThreadSet

java.lang.Object
  extended by org.strbio.mol.ThreadSet

public class ThreadSet
extends java.lang.Object

Class to keep track of two sets of proteins that are being used in threading algorithms.

 Version 2.0, 11/12/02 - can load from 3DPSSM file
 Version 1.8, 1/30/1 - fixed bug where ASpc was printed as ASns
 Version 1.71, 8/28/00 - saveStrMatches follows .casp suffix
 Version 1.7, 4/11/00 - fixed bug where alignmentParameters2 was
   saying it was finished with a given sequence... had to be re-setup()
 Version 1.69, 12/7/99 - fixed bug in probabilityOfFindingMatch
 Version 1.68, 9/17/99 - added makeCATHStrMatches
 Version 1.67, 9/2/99 - added alignmentParameters2
 Version 1.66, 7/14/99 - fixed API for globalCompare
 Version 1.65, 7/8/99 - added getEstPMatch
 Version 1.64, 6/25/99 - added calcEstPMatch function, moved gapModel
   and scoreList into alignmentParameters
 Version 1.63, 6/9/99 - added reciprocal weighted average
 Version 1.62, 6/3/99 - added median functions, fixed bug in
   definition of 'same sequence'
 Version 1.61, 3/31/99 - saves avg score with/out gaps
 Version 1.6, 2/26/99 - fixed to conform to new GapModel api
 Version 1.5, 2/22/99 - fixed to conform to new AlignmentSet api
 Version 1.4, 10/20/98 - added new Alignment format to loadStrMatches
 Version 1.33, 10/15/98 - added naildown alignment to loadStrMatches
 Version 1.32, 10/7/98 - added minareaInfoAll
 Version 1.31, 9/30/98 - added minareaInfo, removed 1.3 change
 Version 1.3, 9/28/98 - added ratio score output option to
   alignStrMatchesMinarea
 Version 1.22, 9/1/98 - don't let somebody add scoring function with
   zero weight, to minimize stupid errors
 Version 1.21, 7/22/98 - alignStrMatchesMinarea auto-renumbers polymers
 Version 1.2, 4/23/98 - uses PolymerSet, made everything double precision
 Version 1.1, 4/6/98 - added all functions from thread_set.cpp
 Version 1.0, 4/3/98 - original version
 

Version:
2.0, 11/12/02
Author:
JMC
See Also:
PolymerSet, Alignment

Field Summary
 AlignmentParameters alignmentParameters
          Alignment parameters
 AlignmentParameters alignmentParameters2
           Alignment parameters for evaluating alignment goodness in FR tests; null means use the same parameters that were used to generate the alignment.
 PolymerSet folds
          Folds being threaded.
 PolymerSet seqs
          Sequences being threaded.
 int tolerance
          margin for error in evaluating alignments.
 
Constructor Summary
ThreadSet()
          Initialize everything to default values.
 
Method Summary
 void addStrMatch(AlignmentSet aSet)
          add a single structural match to the set, created by the user
 AlignmentSet addStrMatch(Polymer seq, Polymer fold)
          add a single structural match to the set.
 void addStrMatch(java.lang.String seq_name, java.lang.String fold_name, java.lang.String correct_name, java.lang.String nail_name)
          add a single structural match to the set.
 void alignmentStats(Printf outfile)
          compute stats on alignment.
 void alignStrMatches()
           Calculates alignments for all structural matches without printing anything.
 void alignStrMatches(Printf outfile, boolean showAlignment, boolean showModeller)
          Aligns structural matches.
 void alignStrMatchesMinarea()
          redo all structural matches using minarea, and save the results to a single file
 double[] ASns(int tolerance)
           calculate alignment sensitivity, as in CASP2.
 double[] ASpc(int tolerance)
          calculate alignment specificity, as in CASP2.
 double averageRank()
          average rank of structural matches
 double averageRankRW()
          average rank with reciprocal weighting
 double averageZ()
          average z score of structural matches
 DVector calcMatchRank()
          Calculate vector of ranks, showing one rank for each match.
 DVector calcMatchZ()
          Calculate vector of z scores, showing one z score for each match.
 void clearSame()
          Forget that seq-fold pairs have the same name.
 void clearScores()
          forget about any alignments done so far.
 void clearStrMatches()
          Deny all knowledge of structural matches.
 void copyCorrectFrom(ThreadSet ts)
           Copy correct alignments from another ThreadSet to this one's calculated alignments.
 void findSameSeqs()
          Find seq-fold pairs with the same name.
 DVector getEstPMatch(int i)
          Get a DVector of estimated probabilities for one of the proteins in the seqs set, or null if this has not been calculated.
 IVector getSortedScores(int i)
          Get an IVector of sorted probabilities for one of the proteins in the seqs set, from most to least likely, or null if this has not been calculated.
 AlignmentSet getStrMatch(int i)
          Return a given structural match, or null if out of range.
 AlignmentSet getStrMatch(Polymer seq, Polymer fold)
          Get first structural match, if one exists, with a given seq and fold, or null if one doesn't exist.
 void globalCompare(Printf outfile, boolean optZ, boolean optN)
          do global comparison of all sequences, showing scores and z-scores.
 boolean isMatch(int n, int m)
          is sequence N a known structural match for fold M?
 boolean isSame(int n, int m)
          is sequence N the same as fold M?
 int load3DPSSM(java.lang.String filename)
          loads seqs, folds, and str matches from 3DPSSM output (mailbox of messages from server).
 int loadStrMatches(java.lang.String filename)
          get structural matches from a file, with or without correct and naildown alignments.
 int makeAllStrMatches()
          add all possible structural matches to the set.
 void makeCATHStrMatches()
          make a set of StrMatches using CATH data.
 double medianRank()
          median rank of structural matches
 double medianZ()
          median z score of structural matches
 void minareaInfo(Printf outfile)
          Show info on all structural matches.
 void minareaInfoAll(Printf outfile)
          Show info on all seq/fold pairs.
 int nMatches()
          How many structural matches do we know about?
 int nSame()
          How many seq-fold pairs are the same?
 int numberFound(int cutoff)
          number of structural matches in top N scores (cutoff)
 void oldAlignStrMatchesMinarea()
          Deprecated.  
 void optimizeAlignment(Printf outfile)
          optimize the gap penalties for highest alignment accuracy.
 void optimizeASns(Printf outfile)
           optimize the gap penalties for highest ASns(1), i.e. ASns with a tolerance of 1.
 void optimizeMedianRank(Printf outfile)
          optimize the gap penalties for best median rank of known structural matches.
 void optimizeRank(Printf outfile)
          optimize the gap penalties for best average rank of known structural matches.
 void optimizeRankRW(Printf outfile)
          optimize the gap penalties for best reciprocal weighted average rank of known structural matches.
 void optimizeTopN(int n, Printf outfile)
          optimize the gap penalties for most correct structural matches in the top N.
 void optimizeZScore(Printf outfile)
          optimize the gap penalties for highest average Z score among structural matches.
 double[] pctRight(int tolerance)
          Returns pctRight for the structural matches.
 DVector probabilityOfFindingMatch()
           Calculates a confidence vector: a DVector whose length is the number of folds.
 void removeStrMatchesOver(double score)
          Remove structural matches over a certain minarea score.
 void removeStrMatchesUnder(double score)
          Remove structural matches under a certain minarea score.
 void saveEstAAStats(java.lang.String filename)
          save the stats needed for alignment accuracy estimation to a file.
 void saveEstFRStats(java.lang.String filename)
          save the stats needed for FR accuracy estimation to a file.
 void saveStrMatches(java.lang.String filename)
          Save structural matches to a file.
 void saveStrMatchesModeller(java.lang.String filename)
          Save structural matches to a modeller format file.
 void setHookeParameters(double rho, double epsilon, int itermax, int iterstart)
          used to set optimizing parameters for optimize* functions; these are the Hooke parameters.
 void showFRAll(Printf outfile)
          show comparison results for all proteins.
 void showFRMatches(Printf outfile)
          show comparison results for str matches.
 void showPMatch(Printf outfile)
          shows probability of a structural match for various Z score ranges.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

seqs

public PolymerSet seqs
Sequences being threaded.


folds

public PolymerSet folds
Folds being threaded.


alignmentParameters

public AlignmentParameters alignmentParameters
Alignment parameters


alignmentParameters2

public AlignmentParameters alignmentParameters2
Alignment parameters for evaluating alignment goodness in FR tests; null means use the same parameters that were used to generate the alignment.


tolerance

public int tolerance
margin for error in evaluating alignments.

Constructor Detail

ThreadSet

public ThreadSet()
Initialize everything to default values. Alignment parameters are set to null.

Method Detail

nMatches

public final int nMatches()
How many structural matches do we know about?


isMatch

public final boolean isMatch(int n,
                             int m)
is sequence N a known structural match for fold M?


isSame

public final boolean isSame(int n,
                            int m)
is sequence N the same as fold M?


copyCorrectFrom

public final void copyCorrectFrom(ThreadSet ts)
Copy correct alignments from another ThreadSet to this one's calculated alignments. Very big kludge. Does no error checking whatsoever.


load3DPSSM

public final int load3DPSSM(java.lang.String filename)
Loads seqs, folds, and str matches from 3DPSSM output (a mailbox of messages from the server). E-values are stored in the estPMatch field of AlignmentStats. Returns the number of structural alignments read.


loadStrMatches

public final int loadStrMatches(java.lang.String filename)
Get structural matches from a file, with or without correct and naildown alignments. Returns the number of matches found.


saveStrMatches

public final void saveStrMatches(java.lang.String filename)
                          throws java.io.IOException
Save structural matches to a file.

Throws:
java.io.IOException

saveStrMatchesModeller

public final void saveStrMatchesModeller(java.lang.String filename)
                                  throws java.io.IOException
Save structural matches to a modeller format file.

Throws:
java.io.IOException

alignStrMatchesMinarea

public final void alignStrMatchesMinarea()
redo all structural matches using minarea, and save the results to a single file

See Also:
AlignmentSet.minareaAlign()

oldAlignStrMatchesMinarea

public final void oldAlignStrMatchesMinarea()
                                     throws java.io.IOException
Deprecated. 

redo all structural matches using minarea, and save the results to files called 'seqname.foldname.al_minarea'.

Throws:
java.io.IOException
See Also:
AlignmentSet.minareaAlign()

minareaInfo

public final void minareaInfo(Printf outfile)
                       throws java.io.IOException
Show info on all structural matches.

Throws:
java.io.IOException
See Also:
AlignmentSet.minareaInfo(org.strbio.io.Printf)

minareaInfoAll

public final void minareaInfoAll(Printf outfile)
                          throws java.io.IOException
Show info on all seq/fold pairs.

Throws:
java.io.IOException
See Also:
AlignmentSet.minareaInfo(org.strbio.io.Printf)

clearSame

public final void clearSame()
Forget that seq-fold pairs have the same name.


findSameSeqs

public final void findSameSeqs()
Find seq-fold pairs with the same name.
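A minimal sketch of name-based pairing, assuming "same" means exact string equality between a seq name and a fold name. The 1.62 changelog entry says the definition of "same sequence" was revised, so ThreadSet's real test may differ. The class and method names here are hypothetical; ThreadSet keeps this state internally and exposes it through isSame() and nSame().

```java
/** Hypothetical illustration of findSameSeqs()/nSame(); not ThreadSet's code. */
final class SameNameSketch {
    /** same[n][m] is true when sequence n and fold m have identical names (an assumption). */
    static boolean[][] findSame(String[] seqNames, String[] foldNames) {
        boolean[][] same = new boolean[seqNames.length][foldNames.length];
        for (int n = 0; n < seqNames.length; n++) {
            for (int m = 0; m < foldNames.length; m++) {
                same[n][m] = seqNames[n].equals(foldNames[m]);
            }
        }
        return same;
    }

    /** Counterpart of nSame(): how many seq-fold pairs share a name. */
    static int countSame(boolean[][] same) {
        int count = 0;
        for (boolean[] row : same) {
            for (boolean s : row) {
                if (s) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[][] same = findSame(new String[] {"1abc", "2xyz"},
                                    new String[] {"2xyz", "1abc", "3foo"});
        System.out.println(countSame(same)); // prints 2
    }
}
```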


nSame

public final int nSame()
How many seq-fold pairs are the same?


addStrMatch

public final void addStrMatch(java.lang.String seq_name,
                              java.lang.String fold_name,
                              java.lang.String correct_name,
                              java.lang.String nail_name)
add a single structural match to the set.


addStrMatch

public final void addStrMatch(AlignmentSet aSet)
add a single structural match to the set, created by the user


addStrMatch

public final AlignmentSet addStrMatch(Polymer seq,
                                      Polymer fold)
add a single structural match to the set. Returns the AlignmentSet created.


makeAllStrMatches

public final int makeAllStrMatches()
add all possible structural matches to the set.


makeCATHStrMatches

public final void makeCATHStrMatches()
Make a set of StrMatches using CATH data. Matches must agree at the H level or below. CATH data must be present for every sequence and fold that is considered.
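To illustrate what agreement at the H level means: a CATH code has the form C.A.T.H (Class, Architecture, Topology, Homologous superfamily), optionally followed by finer levels. The hypothetical helper below treats two domains as a match when their first four fields are equal, so agreement at any finer level also counts. How ThreadSet actually stores and compares CATH data is not documented here.

```java
/** Hypothetical helper; ThreadSet's own CATH handling may differ. */
final class CathSketch {
    /** True if both codes are classified to at least the H level and agree on C, A, T and H. */
    static boolean sameHomologousSuperfamily(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        if (pa.length < 4 || pb.length < 4) {
            return false; // not classified down to the H level
        }
        for (int i = 0; i < 4; i++) {
            if (!pa[i].equals(pb[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameHomologousSuperfamily("3.40.50.300", "3.40.50.300.10")); // true
        System.out.println(sameHomologousSuperfamily("3.40.50.300", "3.40.50.720"));    // false
    }
}
```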


clearStrMatches

public final void clearStrMatches()
Deny all knowledge of structural matches.


removeStrMatchesOver

public final void removeStrMatchesOver(double score)
Remove structural matches over a certain minarea score. This is probably not generally useful, so isn't well debugged.


removeStrMatchesUnder

public final void removeStrMatchesUnder(double score)
Remove structural matches under a certain minarea score. This is probably not generally useful, so isn't well debugged.


getStrMatch

public final AlignmentSet getStrMatch(int i)
Return a given structural match, or null if out of range.


getStrMatch

public final AlignmentSet getStrMatch(Polymer seq,
                                      Polymer fold)
Get first structural match, if one exists, with a given seq and fold, or null if one doesn't exist.


alignStrMatches

public final void alignStrMatches(Printf outfile,
                                  boolean showAlignment,
                                  boolean showModeller)
                           throws java.io.IOException
Aligns structural matches.

Throws:
java.io.IOException

alignStrMatches

public final void alignStrMatches()
                           throws java.io.IOException
Calculates alignments for all structural matches without printing anything.

Throws:
java.io.IOException

pctRight

public final double[] pctRight(int tolerance)
Returns pctRight for the structural matches, which must already have been aligned. The return value is a double[3] containing percent right, nCorrect, and nAligned.
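A minimal sketch of a tolerance-based percent-right measure. It assumes an alignment is an int[] mapping each sequence position to a fold position (-1 for a gap), that a position counts as right when the calculated fold position lies within `tolerance` of the correct one, and that nAligned counts the positions aligned in the correct alignment. These representations and names are hypothetical, not ThreadSet's internals.

```java
import java.util.Arrays;

/** Hypothetical pctRight-style measure over int[] alignments (-1 = gap). */
final class PctRightSketch {
    /** Returns {percent right, nCorrect, nAligned}, mirroring the documented double[3]. */
    static double[] pctRight(int[] calc, int[] correct, int tolerance) {
        int nCorrect = 0;
        int nAligned = 0;
        for (int i = 0; i < correct.length; i++) {
            if (correct[i] < 0) continue; // position not aligned in the reference
            nAligned++;
            if (i < calc.length && calc[i] >= 0
                    && Math.abs(calc[i] - correct[i]) <= tolerance) {
                nCorrect++; // within the tolerance window
            }
        }
        double pct = nAligned == 0 ? 0.0 : 100.0 * nCorrect / nAligned;
        return new double[] {pct, nCorrect, nAligned};
    }

    public static void main(String[] args) {
        int[] correct = {0, 1, 2, 3, -1};
        int[] calc = {0, 2, 2, -1, 4};
        System.out.println(Arrays.toString(pctRight(calc, correct, 0))); // [50.0, 2.0, 4.0]
        System.out.println(Arrays.toString(pctRight(calc, correct, 1))); // [75.0, 3.0, 4.0]
    }
}
```

Raising the tolerance from 0 to 1 lets the off-by-one position 1 count as right, which is the kind of slack the `tolerance` field is described as providing.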


ASpc

public final double[] ASpc(int tolerance)
Calculate alignment specificity, as in CASP2. Returns a double[4] with the ASpc value, the ACrct value (correctly aligned positions), Na (aligned positions in this alignment), and NaC (aligned positions in the correct alignment). The structural matches must already have been aligned.


ASns

public final double[] ASns(int tolerance)
Calculate alignment sensitivity, as in CASP2. Returns a double[4] with the ASns value, the ACrct value (correctly aligned positions), Na (aligned positions in this alignment), and NaC (aligned positions in the correct alignment). The structural matches must already have been aligned.
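Given the counts these two methods return, the scores can be sketched as simple ratios. This assumes ASpc = ACrct / Na (the fraction of this alignment's aligned positions that are correct) and ASns = ACrct / NaC (the fraction of the correct alignment's aligned positions that were recovered), reported as fractions; the real methods may scale them differently (for example as percentages).

```java
/** Hypothetical ratio definitions for ASpc and ASns from ACrct, Na and NaC. */
final class AlignmentSpecificitySketch {
    /** Specificity: correct positions over positions aligned in this alignment. */
    static double aspc(double aCrct, double na) {
        return na == 0 ? 0.0 : aCrct / na;
    }

    /** Sensitivity: correct positions over positions aligned in the correct alignment. */
    static double asns(double aCrct, double naC) {
        return naC == 0 ? 0.0 : aCrct / naC;
    }

    public static void main(String[] args) {
        // e.g. 30 correctly aligned positions, 40 aligned here, 60 in the reference
        System.out.println(aspc(30, 40)); // prints 0.75
        System.out.println(asns(30, 60)); // prints 0.5
    }
}
```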


alignmentStats

public final void alignmentStats(Printf outfile)
compute stats on alignment. For debugging purposes only; don't use it.


clearScores

public final void clearScores()
forget about any alignments done so far.


getEstPMatch

public final DVector getEstPMatch(int i)
Get a DVector of estimated probabilities for one of the proteins in the seqs set, or null if this has not been calculated.


getSortedScores

public final IVector getSortedScores(int i)
Get an IVector of sorted probabilities for one of the proteins in the seqs set, from most to least likely, or null if this has not been calculated.


saveEstAAStats

public final void saveEstAAStats(java.lang.String filename)
                          throws java.io.IOException
save the stats needed for alignment accuracy estimation to a file.

Throws:
java.io.IOException

saveEstFRStats

public final void saveEstFRStats(java.lang.String filename)
                          throws java.io.IOException
save the stats needed for FR accuracy estimation to a file.

Throws:
java.io.IOException

globalCompare

public final void globalCompare(Printf outfile,
                                boolean optZ,
                                boolean optN)
                         throws java.io.IOException
Does a global comparison of all sequences, showing scores and z-scores. Set optZ when optimizing for z scores (this saves time by not calculating sorted scores). Set optN when optimizing for top N scores (this saves time by not calculating z scores). If either is set, showAll, showMatch, and outfile should all be off.

Parameters:
outfile - show results here
optZ - don't calculate top N scores
optN - don't calculate Z scores
Throws:
java.io.IOException

showFRAll

public final void showFRAll(Printf outfile)
                     throws java.io.IOException
show comparison results for all proteins.

Throws:
java.io.IOException

showFRMatches

public final void showFRMatches(Printf outfile)
                         throws java.io.IOException
show comparison results for str matches.

Throws:
java.io.IOException

numberFound

public final int numberFound(int cutoff)
number of structural matches in top N scores (cutoff)
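A sketch of counting hits in the top N, assuming each match's rank is 1-based (1 = best-scoring fold) and stored as a double, as calcMatchRank's DVector suggests. It illustrates the idea behind numberFound rather than its actual implementation; the rank values are hypothetical.

```java
/** Hypothetical top-N counter over a vector of match ranks. */
final class NumberFoundSketch {
    /** Counts matches whose 1-based rank is within the cutoff. */
    static int numberFound(double[] matchRanks, int cutoff) {
        int found = 0;
        for (double rank : matchRanks) {
            if (rank <= cutoff) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        double[] ranks = {1, 3, 7, 2}; // hypothetical ranks, one per structural match
        System.out.println(numberFound(ranks, 2)); // prints 2
    }
}
```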


calcMatchZ

public final DVector calcMatchZ()
Calculate vector of z scores, showing one z score for each match.
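A sketch of the z score for a single match, computed against all fold scores for the same sequence. Assumptions: the mean and the population standard deviation are taken over every fold (including the match itself), and a zero spread yields 0. ThreadSet may use a different normalization.

```java
/** Hypothetical z score of one fold's score within a sequence's score list. */
final class MatchZSketch {
    static double zScore(double[] scores, int match) {
        double mean = 0.0;
        for (double s : scores) mean += s;
        mean /= scores.length;
        double var = 0.0;
        for (double s : scores) var += (s - mean) * (s - mean);
        double sd = Math.sqrt(var / scores.length); // population standard deviation
        return sd == 0.0 ? 0.0 : (scores[match] - mean) / sd;
    }

    public static void main(String[] args) {
        double[] scores = {1, 2, 3, 4, 5}; // hypothetical scores of one seq against five folds
        System.out.println(zScore(scores, 4)); // the match is fold 4
    }
}
```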


calcMatchRank

public final DVector calcMatchRank()
Calculate vector of ranks, showing one rank for each match. This is a DVector, for convenience in doing statistics.
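A sketch of how one match's rank might be derived, assuming higher threading scores are better and ties are resolved in the match's favor (rank = 1 + the number of folds scoring strictly higher). Returning a double mirrors the DVector convenience noted above. The names are hypothetical.

```java
/** Hypothetical rank of a matching fold among all folds for one sequence. */
final class MatchRankSketch {
    static double rank(double[] scores, int match) {
        int better = 0;
        for (double s : scores) {
            if (s > scores[match]) better++; // strictly better folds push the rank down
        }
        return 1 + better;
    }

    public static void main(String[] args) {
        double[] scores = {0.5, 2.0, 1.0, 3.0};
        System.out.println(rank(scores, 2)); // prints 3.0
    }
}
```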


averageZ

public final double averageZ()
average z score of structural matches


averageRank

public final double averageRank()
average rank of structural matches


medianRank

public final double medianRank()
median rank of structural matches


averageRankRW

public final double averageRankRW()
average rank with reciprocal weighting


medianZ

public final double medianZ()
median z score of structural matches
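medianRank and medianZ both reduce a per-match vector to its median. A sketch, assuming the common convention of averaging the two middle values when the count is even (ThreadSet's own convention is not documented):

```java
import java.util.Arrays;

/** Hypothetical median helper for rank or z-score vectors. */
final class MedianSketch {
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] v = values.clone(); // leave the caller's array unsorted
        Arrays.sort(v);
        int mid = v.length / 2;
        return (v.length % 2 == 1) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
    }

    public static void main(String[] args) {
        System.out.println(median(new double[] {3, 1, 2}));    // prints 2.0
        System.out.println(median(new double[] {4, 1, 3, 2})); // prints 2.5
    }
}
```

A median is less sensitive than averageRank to a few matches that land near the bottom of the list, which is presumably why both statistics are offered.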


probabilityOfFindingMatch

public final DVector probabilityOfFindingMatch()
Calculates a confidence vector: a DVector whose length is the number of folds. Each bin N holds the probability that, for a given seq, a matching fold was found in the top N-1 hits. Only sequences that have a matching fold are considered.
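A sketch of this cumulative calculation. It takes, for each sequence that has a matching fold, the 1-based rank of its best-ranked matching fold. The "top N-1 hits" wording leaves the indexing unclear, so this sketch uses its own explicit convention (0-based bin k = match found within the top k+1 hits), which may be offset by one from ThreadSet's. All names are hypothetical.

```java
/** Hypothetical cumulative probability of finding a match within the top hits. */
final class FindMatchProbabilitySketch {
    /**
     * bestRanks[s] is the 1-based rank of the best-ranked matching fold for sequence s;
     * sequences without a matching fold are left out. Bin k holds the fraction of those
     * sequences whose match appears within the top k + 1 hits.
     */
    static double[] probabilityOfFindingMatch(int[] bestRanks, int nFolds) {
        double[] p = new double[nFolds];
        if (bestRanks.length == 0) return p;
        for (int r : bestRanks) {
            // a match at rank r counts toward every bin k with k + 1 >= r
            for (int k = Math.max(r - 1, 0); k < nFolds; k++) p[k]++;
        }
        for (int k = 0; k < nFolds; k++) p[k] /= bestRanks.length;
        return p;
    }

    public static void main(String[] args) {
        double[] p = probabilityOfFindingMatch(new int[] {1, 2, 2, 4}, 4);
        System.out.println(java.util.Arrays.toString(p)); // [0.25, 0.75, 0.75, 1.0]
    }
}
```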


showPMatch

public final void showPMatch(Printf outfile)
                      throws java.io.IOException
shows probability of a structural match for various Z score ranges. The lowest range should contain no matches.

Throws:
java.io.IOException
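A sketch of the binning behind a probability-of-match table. It assumes fixed-width Z ranges starting at a hypothetical zMin, clamps out-of-range Z scores into the end bins, and reports NaN for empty bins. The actual ranges that showPMatch prints are not documented here.

```java
/** Hypothetical fraction of seq/fold pairs that are true matches, per Z-score range. */
final class PMatchSketch {
    /**
     * z[i] is the Z score of pair i and match[i] says whether it is a known structural
     * match. Returns, per bin, matches / pairs, or NaN when a bin has no pairs.
     */
    static double[] pMatchByZ(double[] z, boolean[] match, double zMin,
                              double binWidth, int nBins) {
        int[] total = new int[nBins];
        int[] hits = new int[nBins];
        for (int i = 0; i < z.length; i++) {
            int b = (int) Math.floor((z[i] - zMin) / binWidth);
            b = Math.max(0, Math.min(nBins - 1, b)); // clamp outliers into the end bins
            total[b]++;
            if (match[i]) hits[b]++;
        }
        double[] p = new double[nBins];
        for (int b = 0; b < nBins; b++) {
            p[b] = total[b] == 0 ? Double.NaN : (double) hits[b] / total[b];
        }
        return p;
    }

    public static void main(String[] args) {
        double[] z = {-1.0, 0.5, 1.5, 2.5, 3.5};
        boolean[] match = {false, false, true, true, true};
        System.out.println(java.util.Arrays.toString(pMatchByZ(z, match, 0.0, 2.0, 2)));
    }
}
```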

setHookeParameters

public void setHookeParameters(double rho,
                               double epsilon,
                               int itermax,
                               int iterstart)
used to set optimizing parameters for optimize* functions; these are the Hooke parameters.

See Also:
Hooke
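The names rho, epsilon, and itermax match the parameters of the classic Hooke-Jeeves direct-search routine (Johnson's hooke.c): rho shrinks the step after a round of exploration fails to improve, the search stops once the step falls below epsilon, and itermax caps the iterations. Below is a compact Java sketch of that method, written on the assumption that ThreadSet uses it; iterstart has no counterpart here and is omitted. It minimizes, so an optimize* method would presumably minimize the negative of the score it wants to maximize (also an assumption). The class name is hypothetical.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical Hooke-Jeeves pattern search that minimizes f. */
final class HookeSketch {
    /** Exploratory move: try +/- delta[i] on each coordinate, keeping any improvement. */
    private static double bestNearby(double[] delta, double[] point, double prevBest,
                                     ToDoubleFunction<double[]> f) {
        double minf = prevBest;
        double[] z = point.clone();
        for (int i = 0; i < z.length; i++) {
            z[i] = point[i] + delta[i];
            double ftmp = f.applyAsDouble(z);
            if (ftmp < minf) {
                minf = ftmp;
                continue;
            }
            delta[i] = -delta[i]; // try the opposite direction
            z[i] = point[i] + delta[i];
            ftmp = f.applyAsDouble(z);
            if (ftmp < minf) {
                minf = ftmp;
            } else {
                z[i] = point[i]; // neither direction helped
            }
        }
        System.arraycopy(z, 0, point, 0, z.length);
        return minf;
    }

    static double[] minimize(ToDoubleFunction<double[]> f, double[] start,
                             double rho, double epsilon, int itermax) {
        int n = start.length;
        double[] base = start.clone();
        double[] delta = new double[n];
        for (int i = 0; i < n; i++) {
            delta[i] = Math.abs(start[i] * rho);
            if (delta[i] == 0.0) delta[i] = rho;
        }
        double fBase = f.applyAsDouble(base);
        double step = rho;
        for (int iters = 0; iters < itermax && step > epsilon; iters++) {
            double[] trial = base.clone();
            double fTrial = bestNearby(delta, trial, fBase, f);
            // Pattern moves: while exploration improves, jump further in the same direction.
            while (fTrial < fBase) {
                double[] next = new double[n];
                for (int i = 0; i < n; i++) {
                    delta[i] = trial[i] <= base[i] ? -Math.abs(delta[i]) : Math.abs(delta[i]);
                    next[i] = 2 * trial[i] - base[i]; // extrapolate past the new point
                }
                base = trial;
                fBase = fTrial;
                trial = next;
                fTrial = bestNearby(delta, trial, fBase, f);
            }
            // No further improvement around base: shrink the step sizes by rho.
            step *= rho;
            for (int i = 0; i < n; i++) delta[i] *= rho;
        }
        return base;
    }

    public static void main(String[] args) {
        double[] best = minimize(p -> (p[0] - 1) * (p[0] - 1) + (p[1] + 2) * (p[1] + 2),
                new double[] {0, 0}, 0.5, 1e-6, 1000);
        System.out.printf("x = %.4f, y = %.4f%n", best[0], best[1]);
    }
}
```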

optimizeAlignment

public void optimizeAlignment(Printf outfile)
                       throws java.io.IOException
optimize the gap penalties for highest alignment accuracy.

Throws:
java.io.IOException

optimizeASns

public void optimizeASns(Printf outfile)
                  throws java.io.IOException
Optimize the gap penalties for highest ASns(1), i.e. ASns with a tolerance of 1.

Throws:
java.io.IOException

optimizeZScore

public void optimizeZScore(Printf outfile)
                    throws java.io.IOException
optimize the gap penalties for highest average Z score among structural matches.

Throws:
java.io.IOException

optimizeRank

public void optimizeRank(Printf outfile)
                  throws java.io.IOException
optimize the gap penalties for best average rank of known structural matches.

Throws:
java.io.IOException

optimizeRankRW

public void optimizeRankRW(Printf outfile)
                    throws java.io.IOException
optimize the gap penalties for best reciprocal weighted average rank of known structural matches.

Throws:
java.io.IOException

optimizeMedianRank

public void optimizeMedianRank(Printf outfile)
                        throws java.io.IOException
optimize the gap penalties for best median rank of known structural matches.

Throws:
java.io.IOException

optimizeTopN

public void optimizeTopN(int n,
                         Printf outfile)
                  throws java.io.IOException
optimize the gap penalties for most correct structural matches in the top N.

Throws:
java.io.IOException