org.strbio.mol.lib.pred2ary
Class TrainingSet

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.Vector
              extended by org.strbio.mol.PolymerSet
                  extended by org.strbio.mol.ProteinSet
                      extended by org.strbio.mol.ProfileSet
                          extended by org.strbio.mol.lib.pred2ary.PCPSet
                              extended by org.strbio.mol.lib.pred2ary.TPSet
                                  extended by org.strbio.mol.lib.pred2ary.TrainingSet
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Iterable, java.util.Collection, java.util.List, java.util.RandomAccess

public class TrainingSet
extends TPSet

Represents a set of profiles used as a training set in secondary (2ary) structure prediction.

 Version 1.2, 6/7/99 - moved to org.strbio.mol.lib.pred2ary
 Version 1.11, 8/17/98 - changed some fatal() to exceptions.
 Version 1.1, 5/20/98 - changed setupXXXNet, reduceTrain to allow
   binary input.
 Version 1.0, 5/12/98 - original version, adapted from 2ary_set.cpp
 

Version:
1.2, 6/7/99
Author:
JMC
See Also:
PredClassProfile, Serialized Form

Nested Class Summary
 class TrainingSet.Present2ary
           
 class TrainingSet.PresentClass1
          class to show proteins to class prediction network (1-output); depends on training_class being set.
 class TrainingSet.PresentClass4
          class to show proteins to class prediction network (4-output)
 class TrainingSet.PresentLvl2
           
 
Nested classes/interfaces inherited from class org.strbio.mol.lib.pred2ary.TPSet
TPSet.NetVars
 
Nested classes/interfaces inherited from class org.strbio.mol.PolymerSet
PolymerSet.PolymerEnumeration
 
Field Summary
 NeuralNet[] cnet1
           
 NeuralNet cnet4
           
 boolean done_2ary
           
 boolean done_class
           
 boolean done_lvl2
           
 NeuralNet net
           
 NeuralNet net_2
           
 PredictionSet p_set
           
 java.lang.String weight_2ary
           
 java.lang.String weight_class
           
 java.lang.String weight_lvl2
           
 
Fields inherited from class org.strbio.mol.lib.pred2ary.TPSet
CLASS_2ARY, CLASS_AA, CLASS_STRONG, CSM, CSM1, cutoff_2ary, cutoff_class, cutoff_lvl2, nstats, r_SM, SM, USE_LENGTH_INFO, vars
 
Fields inherited from class java.util.Vector
capacityIncrement, elementCount, elementData
 
Fields inherited from class java.util.AbstractList
modCount
 
Constructor Summary
TrainingSet()
           
TrainingSet(TrainingSet q)
          Copy the set, but don't duplicate any data/networks.
 
Method Summary
 void clear()
          Clear out all information in the set.
 void delete2aryNet()
           
 void deleteClassNet()
           
 void deleteLvl2Net()
           
 void estAccy()
          Estimate accuracy.
 double fcut2ary(int sample, Printf outfile)
           
 void fcutClass(Printf outfile)
           
 double fcutLvl2(int sample, Printf outfile)
           
 TrainingSet filterTrain(PredClassProfile q)
          reduce training set based on predicted classes of a protein
 void findClassStop(int[] stop)
          find optimum number of steps to stop after, based on a training set
 void jackknife(int sets, Printf outfile)
          split up a set using jackknife procedure
 void link(PredictionSet x)
           
 void newSM(int x)
          make/delete stats for all proteins in a set, and related sets
 void predict2aryHE()
           
 void predictClassRaw()
           
 void predictLvl2HE()
           
 void reduceTrain(Printf outfile)
          This sets the defaults properly, for backward compatibility.
 void reduceTrain(Printf outfile, java.io.DataInputStream binary_input)
          This sets the defaults properly.
 void reduceTrain(Printf outfile, double cutoff_r_a, double cutoff_r_b, double cutoff_r_a_b, double cutoff_lvl2_r_a, double cutoff_lvl2_r_b, double cutoff_lvl2_r_a_b)
          This sets the defaults properly, for backward compatibility.
 void reduceTrain(Printf outfile, double cutoff_r_a, double cutoff_r_b, double cutoff_r_a_b, double cutoff_lvl2_r_a, double cutoff_lvl2_r_b, double cutoff_lvl2_r_a_b, java.io.DataInputStream binary_input)
          Setup reduced training sets based on p_set.
 void saveClassWeights()
          save weights for all 5 class nets
 void setup2aryNet(java.io.DataInputStream binary_input)
          Setup 2ary network from a binary input stream.
 boolean setup2aryNet(java.lang.String weight_file, Printf outfile)
          Set up networks; returns true if old weights were loaded.
 void setupClassNet(java.io.DataInputStream binary_input)
          Setup class networks from a binary input stream.
 boolean setupClassNet(java.lang.String weight_file, Printf outfile)
          Set up class networks; returns true if old weights were loaded.
 void setupLvl2Net(java.io.DataInputStream binary_input)
          Setup lvl2 network from a binary input stream.
 boolean setupLvl2Net(java.lang.String weight_file, Printf outfile)
          Set up lvl2 network; returns true if old weights were loaded.
 void train2ary(Printf outfile)
          do the whole 2ary training procedure
 void train2aryLoop(Printf outfile)
          train one net as much as needed, including finding cutoff and predicting prediction set, if we're testing.
 int train2aryNet(int nsteps, Printf outfile)
          train a few steps, return actual number trained
 double[] train2aryTcp(int nsteps, int sample, Printf outfile)
          train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, and the average error.
 void trainClass(Printf outfile)
          do the whole class training procedure
 void trainClassLoop(Printf outfile)
          train one net as much as needed, including finding cutoff and predicting prediction set, if we're testing.
 int trainClassNet(int nsteps, boolean[] train_net, Printf outfile)
          train a few steps, return actual number trained
 double[] trainClassTcp(int nsteps, boolean[] train_net, Printf outfile)
          train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, avg error on cnet4, and average error on cnet1's.
 void trainLvl2(Printf outfile)
          do the whole 2ary training procedure on lvl2 net
 void trainLvl2Loop(Printf outfile)
          train one net as much as needed, including finding cutoff and predicting prediction set, if we're testing.
 int trainLvl2Net(int nsteps, Printf outfile)
          train a few steps, return actual number trained
 double[] trainLvl2Tcp(int nsteps, int sample, Printf outfile)
          train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, and average error.
 void unJackknife()
          reverse jackknifing
 void unReduceTrain()
          remove training set reduction.
 
Methods inherited from class org.strbio.mol.lib.pred2ary.TPSet
addSM, clearSM, compute2arySM, computeClassSM, copyLvl1HE, copyUnreducedStats, deleteSM, estAccy, estAccyCount, fcut2ary, fcutClass, fcutLvl2, filterPred, findInputs, name2ary, predict2ary, predict2aryHE, predictClass, predictClassDirectly, predictClassRaw, predictLvl2, predictLvl2HE, print2ary, print2aryByClass, print2aryStats, print2aryUnreduced, printClass, printClassStats, printDirectClassPrediction, printDirectClassStats, smooth, translate2ary, translateClass, translateEA
 
Methods inherited from class org.strbio.mol.lib.pred2ary.PCPSet
addClassPred, addPred, addPred, addPredToArray, clearClassPred, clearPred, combine, divideClassPred, dividePred, loadClassPred, loadPred, newPolymer, newPolymer, pcp, predDiffs, saveClassPred, savePred
 
Methods inherited from class org.strbio.mol.ProfileSet
blast, blast, write, writeClustal, writeClustal, writeMSF, writeMSF, writeProf, writeProf, writeSAF, writeSAF, writeTDP
 
Methods inherited from class org.strbio.mol.ProteinSet
findDSSP, findPDB, fixDistanceGaps, predictSS, predictSS, protein, residues, thread, thread, writeCASP, writeCASP, writeConv, writeConv, writeEA, writeEA, writePDB, writePDB, writePDB, writePDB, writeVar2, writeVar2
 
Methods inherited from class org.strbio.mol.PolymerSet
add, add, add, addReversedCopies, clearPolymers, clearProperties, clearProperty, ensureNames, findClosest, getNames, getPropertyAll, getPropertyOne, isEqual, keepOnlyChainID, keepOnlyNames, keepOnlyNamesFuzzy, load, n, nMonomers, noSpaceNames, nPolymersInFile, p, polymer, polymers, polymersInFile, polymersInFile, polymersInFile, polymersInFile, printNames, read, read, read, readList, remove, remove, removeRedundantSequences, save, searchByName, searchByNameFuzzy, searchByNameFuzzy, searchByNameFuzzyIndex, searchByNameFuzzyIndex, searchByNameIndex, setPolymerAt, setProperty, stripNoAtoms, writeFasta, writeFasta, writeList, writeList, writePTS, writePTS, writeYAPF, writeYAPF
 
Methods inherited from class java.util.Vector
add, add, addAll, addAll, addElement, capacity, clone, contains, containsAll, copyInto, elementAt, elements, ensureCapacity, equals, firstElement, get, hashCode, indexOf, indexOf, insertElementAt, isEmpty, lastElement, lastIndexOf, lastIndexOf, remove, removeAll, removeAllElements, removeElement, removeElementAt, removeRange, retainAll, set, setElementAt, setSize, size, subList, toArray, toArray, toString, trimToSize
 
Methods inherited from class java.util.AbstractList
iterator, listIterator, listIterator
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
iterator, listIterator, listIterator
 

Field Detail

p_set

public PredictionSet p_set

net

public NeuralNet net

net_2

public NeuralNet net_2

cnet4

public NeuralNet cnet4

cnet1

public NeuralNet[] cnet1

weight_2ary

public java.lang.String weight_2ary

weight_lvl2

public java.lang.String weight_lvl2

weight_class

public java.lang.String weight_class

done_2ary

public boolean done_2ary

done_lvl2

public boolean done_lvl2

done_class

public boolean done_class
Constructor Detail

TrainingSet

public TrainingSet()

TrainingSet

public TrainingSet(TrainingSet q)
Copy the set, but don't duplicate any data/networks.

Method Detail

clear

public final void clear()
Description copied from class: TPSet
Clear out all information in the set.

Specified by:
clear in interface java.util.Collection
Specified by:
clear in interface java.util.List
Overrides:
clear in class TPSet

link

public final void link(PredictionSet x)

predict2aryHE

public final void predict2aryHE()

predictLvl2HE

public final void predictLvl2HE()

predictClassRaw

public final void predictClassRaw()

setup2aryNet

public final boolean setup2aryNet(java.lang.String weight_file,
                                  Printf outfile)
Set up networks; returns true if old weights were loaded.


setup2aryNet

public final void setup2aryNet(java.io.DataInputStream binary_input)
                        throws java.io.IOException
Setup 2ary network from a binary input stream.

Throws:
java.io.IOException
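
As a rough illustration of loading network weights from a java.io.DataInputStream, here is a minimal sketch. The byte layout (an int count followed by that many doubles) and the names BinaryWeightsSketch, readWeights, writeWeights and roundTrip are assumptions made for this example; the actual format read by setup2aryNet is not documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, not part of org.strbio.
final class BinaryWeightsSketch {
    // Assumed layout: one int (weight count), then that many doubles.
    static double[] readWeights(DataInputStream in) throws IOException {
        int n = in.readInt();
        if (n < 0) {
            throw new IOException("negative weight count: " + n);
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = in.readDouble();
        }
        return w;
    }

    // Writes weights in the same assumed layout, so the reader can be exercised.
    static byte[] writeWeights(double[] w) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(w.length);
            for (double x : w) {
                out.writeDouble(x);
            }
        }
        return buf.toByteArray();
    }

    // Write then read back entirely in memory; wraps IOException as unchecked.
    static double[] roundTrip(double[] w) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(writeWeights(w)))) {
            return readWeights(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new double[] {0.25, -1.5, 3.0})));
    }
}
```

A truncated stream makes readDouble throw java.io.EOFException, a subclass of IOException, which matches the throws clause declared above.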

setupLvl2Net

public final boolean setupLvl2Net(java.lang.String weight_file,
                                  Printf outfile)
Set up lvl2 network; returns true if old weights were loaded.


setupLvl2Net

public final void setupLvl2Net(java.io.DataInputStream binary_input)
                        throws java.io.IOException
Setup lvl2 network from a binary input stream.

Throws:
java.io.IOException

setupClassNet

public final boolean setupClassNet(java.lang.String weight_file,
                                   Printf outfile)
Set up class networks; returns true if old weights were loaded.


setupClassNet

public final void setupClassNet(java.io.DataInputStream binary_input)
                         throws java.io.IOException
Setup class networks from a binary input stream.

Throws:
java.io.IOException

saveClassWeights

public final void saveClassWeights()
save weights for all 5 class nets


delete2aryNet

public final void delete2aryNet()

deleteLvl2Net

public final void deleteLvl2Net()

deleteClassNet

public final void deleteClassNet()

fcut2ary

public final double fcut2ary(int sample,
                             Printf outfile)

fcutLvl2

public final double fcutLvl2(int sample,
                             Printf outfile)

fcutClass

public final void fcutClass(Printf outfile)

train2aryNet

public final int train2aryNet(int nsteps,
                              Printf outfile)
train a few steps, return actual number trained


trainLvl2Net

public final int trainLvl2Net(int nsteps,
                              Printf outfile)
train a few steps, return actual number trained


trainClassNet

public final int trainClassNet(int nsteps,
                               boolean[] train_net,
                               Printf outfile)
train a few steps, return actual number trained


train2aryTcp

public final double[] train2aryTcp(int nsteps,
                                   int sample,
                                   Printf outfile)
train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, and the average error.


trainLvl2Tcp

public final double[] trainLvl2Tcp(int nsteps,
                                   int sample,
                                   Printf outfile)
train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, and average error.


trainClassTcp

public final double[] trainClassTcp(int nsteps,
                                    boolean[] train_net,
                                    Printf outfile)
train, cutoff, predict for a single training set, or group of them... returns actual number of steps taken, avg error on cnet4, and average error on cnet1's.


train2aryLoop

public final void train2aryLoop(Printf outfile)
Train one net as much as needed, including finding the cutoff and predicting the prediction set, if we're testing. If the set is divided, train the children, then combine statistics.
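
One way to picture "train as much as needed" is a loop that trains in chunks and stops once the average error no longer improves. The sketch below is only that: the stopping rule, the chunked trainer callback and the name TrainLoopSketch are assumptions for illustration, and the criterion used by train2aryLoop may differ. The callback returns {steps taken, average error}, mirroring the description of train2aryTcp.

```java
import java.util.function.IntFunction;

// Hypothetical stand-in, not part of org.strbio.
final class TrainLoopSketch {
    // trainChunk.apply(n) trains up to n steps and returns {stepsTaken, avgError}.
    // Returns {totalStepsTaken, bestAvgError}.
    static double[] trainUntilNoGain(IntFunction<double[]> trainChunk,
                                     int chunk, int maxSteps) {
        int total = 0;
        double best = Double.POSITIVE_INFINITY;
        while (total < maxSteps) {
            double[] r = trainChunk.apply(chunk);
            int taken = (int) Math.round(r[0]);
            total += taken;
            if (r[1] >= best) {
                break;              // assumed rule: stop when error stops improving
            }
            best = r[1];
            if (taken < chunk) {
                break;              // trainer gave up early; nothing more to do
            }
        }
        return new double[] {total, best};
    }

    public static void main(String[] args) {
        // Fake trainer: error improves twice, then gets worse.
        double[] errs = {0.5, 0.4, 0.45};
        int[] call = {0};
        double[] r = trainUntilNoGain(n -> new double[] {n, errs[call[0]++]}, 10, 1000);
        System.out.println("steps=" + (int) r[0] + " bestError=" + r[1]);
    }
}
```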


trainLvl2Loop

public final void trainLvl2Loop(Printf outfile)
Train one net as much as needed, including finding the cutoff and predicting the prediction set, if we're testing. If the set is divided, train the children, then combine statistics.


trainClassLoop

public final void trainClassLoop(Printf outfile)
Train one net as much as needed, including finding the cutoff and predicting the prediction set, if we're testing. If the set is divided, train the children, then combine statistics.


findClassStop

public final void findClassStop(int[] stop)
find optimum number of steps to stop after, based on a training set
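
A common way to choose a stopping point is to record the error at regular checkpoints and pick the step count where it was lowest. The sketch below assumes that criterion; the name StopSketch, the checkpoint layout and the one-row-per-network filling of a stop array are all hypothetical, since this page says only that the optimum is based on a training set.

```java
import java.util.Arrays;

// Hypothetical helper, not part of org.strbio.
final class StopSketch {
    // errors[i] is the error measured after (i + 1) * stepSize training steps.
    // Returns the step count with the lowest error; ties go to the earliest.
    static int bestStop(double[] errors, int stepSize) {
        if (errors.length == 0) {
            throw new IllegalArgumentException("no checkpoints recorded");
        }
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) {
                best = i;
            }
        }
        return (best + 1) * stepSize;
    }

    // Fills stop[k] for each network k from that network's error curve.
    static void fillStops(double[][] errorsPerNet, int stepSize, int[] stop) {
        for (int k = 0; k < errorsPerNet.length; k++) {
            stop[k] = bestStop(errorsPerNet[k], stepSize);
        }
    }

    public static void main(String[] args) {
        double[][] curves = {
            {0.50, 0.30, 0.20, 0.25},   // lowest at checkpoint 3
            {0.40, 0.35, 0.36, 0.37},   // lowest at checkpoint 2
        };
        int[] stop = new int[curves.length];
        fillStops(curves, 10, stop);
        System.out.println(Arrays.toString(stop));
    }
}
```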


filterTrain

public final TrainingSet filterTrain(PredClassProfile q)
reduce training set based on predicted classes of a protein


reduceTrain

public final void reduceTrain(Printf outfile,
                              double cutoff_r_a,
                              double cutoff_r_b,
                              double cutoff_r_a_b,
                              double cutoff_lvl2_r_a,
                              double cutoff_lvl2_r_b,
                              double cutoff_lvl2_r_a_b,
                              java.io.DataInputStream binary_input)
Set up reduced training sets based on p_set. Cutoffs can be given for each reduced set; they default to 0.0 if not given. If binary_input is non-null, this reads 4 reduced training networks (r_a, r_b, lvl2_r_a, lvl2_r_b) from the binary input and uses them if appropriate. In this case, the default is to have all cutoffs equal to 1.0.

If binary_input is used, the networks cannot be trained further; attempting to do so gives an error, since the networks are not fully initialized. Training them would be a bad idea anyway, since the networks are not necessarily unique to their own set (as they are when binary_input is null).
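
The default-forwarding behaviour described here can be sketched as a chain of overloads that all end in the full signature. ReduceDefaultsSketch is a hypothetical stand-in, not the library class: the Printf outfile parameter is omitted, the real method bodies are not shown on this page, and treating a null stream in the two-argument form as "cutoffs 0.0" is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.util.Arrays;

// Hypothetical stand-in that only records which defaults were chosen.
final class ReduceDefaultsSketch {
    double[] cutoffs;      // r_a, r_b, r_a_b, lvl2_r_a, lvl2_r_b, lvl2_r_a_b
    boolean usedBinary;    // true when networks would come from binary input

    // Full form: every other overload forwards here.
    void reduceTrain(double rA, double rB, double rAB,
                     double lvl2RA, double lvl2RB, double lvl2RAB,
                     DataInputStream binaryInput) {
        cutoffs = new double[] {rA, rB, rAB, lvl2RA, lvl2RB, lvl2RAB};
        usedBinary = (binaryInput != null);
    }

    // Explicit cutoffs, no binary input.
    void reduceTrain(double rA, double rB, double rAB,
                     double lvl2RA, double lvl2RB, double lvl2RAB) {
        reduceTrain(rA, rB, rAB, lvl2RA, lvl2RB, lvl2RAB, null);
    }

    // Default cutoffs: 1.0 with binary input, 0.0 without (the latter assumed).
    void reduceTrain(DataInputStream binaryInput) {
        double d = (binaryInput == null) ? 0.0 : 1.0;
        reduceTrain(d, d, d, d, d, d, binaryInput);
    }

    // No arguments: cutoffs 0.0, no binary input.
    void reduceTrain() {
        reduceTrain((DataInputStream) null);
    }

    public static void main(String[] args) {
        ReduceDefaultsSketch s = new ReduceDefaultsSketch();
        s.reduceTrain();
        System.out.println(Arrays.toString(s.cutoffs) + " binary=" + s.usedBinary);
        s.reduceTrain(new DataInputStream(new ByteArrayInputStream(new byte[0])));
        System.out.println(Arrays.toString(s.cutoffs) + " binary=" + s.usedBinary);
    }
}
```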


reduceTrain

public final void reduceTrain(Printf outfile,
                              double cutoff_r_a,
                              double cutoff_r_b,
                              double cutoff_r_a_b,
                              double cutoff_lvl2_r_a,
                              double cutoff_lvl2_r_b,
                              double cutoff_lvl2_r_a_b)
This sets the defaults properly, for backward compatibility.


reduceTrain

public final void reduceTrain(Printf outfile,
                              java.io.DataInputStream binary_input)
This sets the defaults properly.


reduceTrain

public final void reduceTrain(Printf outfile)
This sets the defaults properly, for backward compatibility.


unReduceTrain

public final void unReduceTrain()
remove training set reduction.


jackknife

public final void jackknife(int sets,
                            Printf outfile)
split up a set using jackknife procedure
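
To illustrate the general idea of a jackknife split, the sketch below deals items round-robin into a given number of subsets and, for each subset, trains on everything else. The round-robin assignment and the names JackknifeSketch, split and trainingFor are assumptions for illustration; how this method actually assigns proteins to subsets is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.strbio.
final class JackknifeSketch {
    // Deal items round-robin into `sets` held-out groups.
    static <T> List<List<T>> split(List<T> items, int sets) {
        if (sets < 1) {
            throw new IllegalArgumentException("sets must be >= 1");
        }
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < sets; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            folds.get(i % sets).add(items.get(i));
        }
        return folds;
    }

    // Training data for group k: every item that is not held out in group k.
    static <T> List<T> trainingFor(List<List<T>> folds, int k) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != k) {
                train.addAll(folds.get(i));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<String> proteins = Arrays.asList("p0", "p1", "p2", "p3", "p4", "p5", "p6");
        List<List<String>> folds = split(proteins, 3);
        for (int k = 0; k < folds.size(); k++) {
            System.out.println("held out " + folds.get(k)
                + ", train on " + trainingFor(folds, k));
        }
    }
}
```

Each item is held out exactly once, so per-group statistics can be combined into an estimate over the whole set.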


unJackknife

public final void unJackknife()
reverse jackknifing


train2ary

public final void train2ary(Printf outfile)
do the whole 2ary training procedure


trainLvl2

public final void trainLvl2(Printf outfile)
do the whole 2ary training procedure on lvl2 net


trainClass

public final void trainClass(Printf outfile)
do the whole class training procedure


newSM

public final void newSM(int x)
make/delete stats for all proteins in a set, and related sets

Overrides:
newSM in class TPSet

estAccy

public final void estAccy()
Estimate accuracy.