org.strbio.mol
Class PolymerSet

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.Vector
              extended by org.strbio.mol.PolymerSet
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Iterable, java.util.Collection, java.util.List, java.util.RandomAccess
Direct Known Subclasses:
ProteinSet

public class PolymerSet
extends java.util.Vector

Class to represent a set of polymers to be acted on in a group.

 Version 1.32, 6/24/03 - changed ensureNames, added noSpaceNames
 Version 1.31, 2/12/02 - added more suffixes for fasta format
 Version 1.3, 7/20/01 - added nPolymersInFile
 Version 1.27, 10/26/00 - changed removeRedundantSequences to
   handle subsequences (ALL)
 Version 1.26, 10/24/00 - fixed bug with null outfile
 Version 1.25, 12/6/99 - added addReversedCopies
 Version 1.24, 11/3/99 - added ensureNames
 Version 1.23, 9/2/99 - added keepOnlyChainID
 Version 1.22, 7/9/99 - added properties() calls for entire sets
 Version 1.21, 6/3/99 - extended fuzzy matching ability
 Version 1.2, 6/1/99 - added polymers, polymersInFile enumerations
 Version 1.18, 3/30/99 - added getNames, keepOnly* functions
 Version 1.17, 3/24/99 - made Java 1.2 compatible
 Version 1.16, 2/10/99 - added YAPF format
 Version 1.15, 11/23/98 - added yield() to read() to allow multithreading
 Version 1.14, 11/18/98 - added removeRedundantSequences
 Version 1.13, 10/28/98 - added searchByNameFuzzy
 Version 1.12, 8/7/98 - added read(infile, outfile), writeXXX(outfile)
 Version 1.11, 7/17/98 - added setPolymerAt
 Version 1.1, 5/21/98 - read() looks for filename and filename.gz;
   Files ending in .gz are transparently read in.
 Version 1.01, 5/5/98 - added writeFasta
 Version 1.0, 4/22/98 - based on ProteinSet 1.1
 

Version:
1.32, 6/24/03
Author:
JMC
See Also:
Polymer, ThreadSet, Serialized Form

Nested Class Summary
static class PolymerSet.PolymerEnumeration
          An enumeration of Polymers, that gets them one by one out of a file.
 
Field Summary
 
Fields inherited from class java.util.Vector
capacityIncrement, elementCount, elementData
 
Fields inherited from class java.util.AbstractList
modCount
 
Constructor Summary
PolymerSet()
          Create an empty set of polymers.
PolymerSet(PolymerSet y)
          Create a set by copying another set, but not the polymers in it.
 
Method Summary
 void add(Polymer q)
          Add a single Polymer to this set, without duplicating it.
 void add(PolymerSet q)
          Add another set of Polymers, without duplicating them.
 void add(java.lang.String filename)
          Add another set of Polymers from a file, without duplicating them.
 void addReversedCopies()
          Add reversed copies of the polymers in the set.
 void clear()
          Delete all info on this set, but not the polymers themselves.
protected  void clearPolymers()
          Delete info on which polymers are in this set, but not the polymers themselves.
 void clearProperties()
          Clears all the properties.
 void clearProperty(int prop)
          Clears (unsets) one property.
 void ensureNames()
          Name unnamed Polymers according to their position in the set (starting at 1).
 int[] findClosest(Polymer q)
          Find the Polymer in this set most similar (by sequence identity) to a given Polymer.
 java.lang.String[] getNames()
          Get the names of the Polymers in this set.
 boolean getPropertyAll(int prop)
          Looks up one property; returns true only if all polymers in the set have the property.
 boolean getPropertyOne(int prop)
          Looks up a property; returns true if at least one polymer in the set has the property.
 boolean isEqual(PolymerSet y)
          Do the two sets contain the same Polymers?
 int keepOnlyChainID(char ch)
          Removes all polymers in the set not having a particular chain ID.
 int keepOnlyNames(java.lang.String[] names)
          Like searchByName, but for a whole set of names.
 int keepOnlyNamesFuzzy(java.lang.String[] names)
          Like searchByNameFuzzy, but for a whole set of names.
 void load(java.lang.String filename, Printf outfile)
          Default load automatically figures out file type.
 int n()
          How many polymers are in the set?
 Polymer newPolymer()
          Create a new polymer to add to the set... this may be replaced by subclasses which encapsulate specific polymers.
 long nMonomers()
          Total number of monomers in the set.
 void noSpaceNames()
          Strip everything after the first space from each name.
 int nPolymersInFile(java.io.BufferedReader infile)
          Count the number of polymers in a file.
 Polymer p(int i)
          Shorthand for polymer(i).
 Polymer polymer(int i)
          Return the i'th polymer in the set.
 java.util.Enumeration polymers()
          Return an enumeration of polymers in the set.
 java.util.Enumeration polymersInFile(java.io.BufferedReader infile)
          Return an enumeration of polymers in a file.
 java.util.Enumeration polymersInFile(java.io.BufferedReader infile, Printf outfile)
          Return an enumeration of polymers in a file.
 java.util.Enumeration polymersInFile(java.lang.String filename)
          Return an enumeration of polymers in a file.
 java.util.Enumeration polymersInFile(java.lang.String filename, Printf outfile)
          Return an enumeration of polymers in a file.
 void printNames(int indent, Printf outfile)
          Print the names of the Polymers in this set.
 void read(java.io.BufferedReader infile, Printf outfile)
          Read Polymers from a file of unknown type.
 void read(java.lang.String filename)
          Read Polymers from a file of unknown type (works out what kind of Polymers they are).
 void read(java.lang.String filename, Printf outfile)
          Read Polymers from a file of unknown type.
 void readList(java.lang.String filename, Printf outfile)
          Read Polymers from a list file (a list of the Polymer names).
 java.lang.Object remove(int i)
          Remove polymer number i from the set, and return it.
 void remove(Polymer q)
          Remove a polymer from the set.
 void removeRedundantSequences(boolean removeSubSequences)
          Eliminate redundant sequences (or subsequences) from the set.
 void save(java.lang.String filename)
          Save is equivalent to write.
 Polymer searchByName(java.lang.String name)
          Return the Polymer matching a given name, or null if it is not in the set.
 Polymer searchByNameFuzzy(java.lang.String name)
          Find a Polymer matching a given name, using fuzzy matching.
 Polymer searchByNameFuzzy(java.lang.String name, int fuzziness)
          Find a Polymer matching a given name, using fuzzy matching.
 int searchByNameFuzzyIndex(java.lang.String name)
          Find a Polymer matching a given name, using fuzzy matching. Returns the index, or -1 if nothing close is found.
 int searchByNameFuzzyIndex(java.lang.String name, int fuzziness)
          Find a Polymer matching a given name, using fuzzy matching. Returns the index, or -1 if nothing close is found.
 int searchByNameIndex(java.lang.String name)
          Return the index of the Polymer matching a given name, or -1 if it is not in the set.
 void setPolymerAt(int i, Polymer p)
          Set the i'th polymer in the set.
 void setProperty(int prop)
          Sets one property.
 void stripNoAtoms()
          Strip out monomers without atomic coordinates, for all Polymers.
 boolean write(java.lang.String filename)
          Write the set to a file, with the format determined by the filename suffix.
 void writeFasta(Printf outfile)
          Write set to Fasta file.
 void writeFasta(java.lang.String filename)
          Write set to Fasta file.
 void writeList(Printf outfile)
          Create a file containing a list of protein names.
 void writeList(java.lang.String filename)
          Create a file containing a list of protein names.
 void writePDB(Printf outfile)
          Write set to PDB file.
 void writePDB(java.lang.String filename)
          Write set to PDB file.
 void writePTS(Printf outfile)
          Write set to PTS file.
 void writePTS(java.lang.String filename)
          Write set to PTS file.
 void writeYAPF(Printf outfile)
          Write set to YAPF file.
 void writeYAPF(java.lang.String filename)
          Write set to YAPF file.
 
Methods inherited from class java.util.Vector
add, add, addAll, addAll, addElement, capacity, clone, contains, containsAll, copyInto, elementAt, elements, ensureCapacity, equals, firstElement, get, hashCode, indexOf, indexOf, insertElementAt, isEmpty, lastElement, lastIndexOf, lastIndexOf, remove, removeAll, removeAllElements, removeElement, removeElementAt, removeRange, retainAll, set, setElementAt, setSize, size, subList, toArray, toArray, toString, trimToSize
 
Methods inherited from class java.util.AbstractList
iterator, listIterator, listIterator
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
iterator, listIterator, listIterator
 

Constructor Detail

PolymerSet

public PolymerSet()
Create an empty set of polymers.


PolymerSet

public PolymerSet(PolymerSet y)
Create a set by copying another set, but not the polymers in it.
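
Copying the set "but not the polymers in it" describes a shallow copy: the new set gets its own container, but both sets refer to the same polymer objects. The sketch below illustrates that behaviour with a plain java.util.Vector; StringBuilder is a stand-in for a mutable Polymer, and ShallowCopyDemo is a hypothetical name, not part of org.strbio.

```java
import java.util.Vector;

// Stand-in demo: StringBuilder plays the role of a mutable Polymer.
// Illustrates shallow-copy semantics only; this is not PolymerSet's code.
final class ShallowCopyDemo {
    /** Copy the container but share the element objects. */
    static Vector<StringBuilder> shallowCopy(Vector<StringBuilder> source) {
        return new Vector<>(source);
    }

    public static void main(String[] args) {
        Vector<StringBuilder> original = new Vector<>();
        original.add(new StringBuilder("ACDE"));

        Vector<StringBuilder> copy = shallowCopy(original);
        copy.add(new StringBuilder("FGHI")); // affects only the copy
        copy.get(0).append("K");             // visible through both sets

        System.out.println(original.size()); // 1: the container was duplicated
        System.out.println(original.get(0)); // ACDEK: the element is shared
    }
}
```

Under this reading, changing a polymer through either set changes it for both, while adding or removing members affects only one set.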

Method Detail

n

public int n()
How many polymers are in the set?


polymer

public Polymer polymer(int i)
Return the i'th polymer in the set.


setPolymerAt

public void setPolymerAt(int i,
                         Polymer p)
Set the i'th polymer in the set.


p

public Polymer p(int i)
Shorthand for polymer(i).


newPolymer

public Polymer newPolymer()
Create a new polymer to add to the set... this may be replaced by subclasses which encapsulate specific polymers.


clearPolymers

protected void clearPolymers()
Delete info on which polymers are in this set, but not the polymers themselves.


clear

public void clear()
Delete all info on this set, but not the polymers themselves.

Specified by:
clear in interface java.util.Collection
Specified by:
clear in interface java.util.List
Overrides:
clear in class java.util.Vector

polymers

public java.util.Enumeration polymers()
Return an enumeration of polymers in the set.


setProperty

public void setProperty(int prop)
Sets one property.


getPropertyAll

public boolean getPropertyAll(int prop)
Looks up one property; returns true only if all polymers in the set have the property.


getPropertyOne

public boolean getPropertyOne(int prop)
Looks up a property; returns true if at least one polymer in the set has the property.
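
The difference between getPropertyAll and getPropertyOne is the usual "for all" versus "there exists" distinction. A minimal sketch, assuming each polymer's properties can be modelled as a java.util.BitSet indexed by the property number (the real Polymer representation is not documented here, and PropertyQueryDemo is a hypothetical name):

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model: one BitSet of property flags per polymer.
final class PropertyQueryDemo {
    /** True only if every member has the property set. */
    static boolean propertyAll(List<BitSet> members, int prop) {
        for (BitSet flags : members) {
            if (!flags.get(prop)) return false;
        }
        // Vacuously true for an empty list; the real class may behave differently.
        return true;
    }

    /** True if at least one member has the property set. */
    static boolean propertyOne(List<BitSet> members, int prop) {
        for (BitSet flags : members) {
            if (flags.get(prop)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BitSet first = new BitSet();
        first.set(3);
        BitSet second = new BitSet();
        List<BitSet> set = Arrays.asList(first, second);

        System.out.println(propertyOne(set, 3)); // true
        System.out.println(propertyAll(set, 3)); // false
    }
}
```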


clearProperty

public void clearProperty(int prop)
Clears (unsets) one property.


clearProperties

public void clearProperties()
Clears all the properties.


readList

public void readList(java.lang.String filename,
                     Printf outfile)
              throws java.io.IOException
Read Polymers from a list file (a list of the Polymer names).

Throws:
java.io.IOException
See Also:
Polymer.readList(java.io.BufferedReader, org.strbio.io.Printf)

read

public void read(java.io.BufferedReader infile,
                 Printf outfile)
          throws java.io.IOException
Read Polymers from a file of unknown type, reporting progress to outfile. By default, only Fasta is understood; subclasses of Polymer are a bit more intelligent.

Throws:
java.io.IOException
See Also:
Polymer.read(java.io.BufferedReader, org.strbio.io.Printf, boolean)

read

public void read(java.lang.String filename,
                 Printf outfile)
          throws java.io.IOException
Read Polymers from a file of unknown type, reporting progress to outfile. By default, only Fasta is understood; subclasses of Polymer are a bit more intelligent.

Throws:
java.io.IOException
See Also:
Polymer.read(java.io.BufferedReader, org.strbio.io.Printf, boolean)

nPolymersInFile

public int nPolymersInFile(java.io.BufferedReader infile)
Count the number of polymers in a file.


polymersInFile

public java.util.Enumeration polymersInFile(java.io.BufferedReader infile)
Return an enumeration of polymers in a file. This is not static because it has to know about different types of Polymers (subclasses of Polymer).


polymersInFile

public java.util.Enumeration polymersInFile(java.io.BufferedReader infile,
                                            Printf outfile)
Return an enumeration of polymers in a file. This is not static because it has to know about different types of Polymers (subclasses of Polymer). Will print to outfile as it reads polymers (similar to read).


polymersInFile

public java.util.Enumeration polymersInFile(java.lang.String filename,
                                            Printf outfile)
Return an enumeration of polymers in a file. This is not static because it has to know about different types of Polymers (subclasses of Polymer). Prints status to outfile as it reads polymers.


polymersInFile

public java.util.Enumeration polymersInFile(java.lang.String filename)
Return an enumeration of polymers in a file. This is not static because it has to know about different types of Polymers (subclasses of Polymer).
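
An enumeration over a file differs from polymers() in that records can be produced one at a time, without loading the whole file. The sketch below shows that pattern for Fasta input only. FastaRecordEnumeration is a hypothetical stand-in, not PolymerSet.PolymerEnumeration; it yields {name, sequence} string pairs rather than Polymer objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.NoSuchElementException;

// Hypothetical stand-in: yields one Fasta record at a time as {name, sequence},
// reading only as far into the input as the caller has asked for.
final class FastaRecordEnumeration implements Enumeration<String[]> {
    private final BufferedReader in;
    private String pendingHeader; // next ">" line already read, or null at end of input

    FastaRecordEnumeration(BufferedReader in) {
        this.in = in;
        this.pendingHeader = skipToHeader();
    }

    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skip any text before the first header line. */
    private String skipToHeader() {
        String line;
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) return line;
        }
        return null;
    }

    @Override
    public boolean hasMoreElements() {
        return pendingHeader != null;
    }

    @Override
    public String[] nextElement() {
        if (pendingHeader == null) throw new NoSuchElementException();
        String name = pendingHeader.substring(1).trim();
        StringBuilder sequence = new StringBuilder();
        pendingHeader = null;
        String line;
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) { // start of the next record
                pendingHeader = line;
                break;
            }
            sequence.append(line.trim());
        }
        return new String[] { name, sequence.toString() };
    }

    public static void main(String[] args) {
        String fasta = ">p1\nACDE\nFG\n>p2\nKLM\n";
        Enumeration<String[]> e =
            new FastaRecordEnumeration(new BufferedReader(new StringReader(fasta)));
        while (e.hasMoreElements()) {
            String[] record = e.nextElement();
            System.out.println(record[0] + " " + record[1]);
        }
    }
}
```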


read

public void read(java.lang.String filename)
          throws java.io.IOException
Read Polymers from a file of unknown type (works out what kind of Polymers they are).

Throws:
java.io.IOException
See Also:
Polymer.read(java.io.BufferedReader, org.strbio.io.Printf, boolean)

writePDB

public void writePDB(java.lang.String filename)
              throws java.io.IOException
Write set to PDB file.

Parameters:
filename - name of file to write to
Throws:
java.io.IOException
See Also:
Polymer.writePDB(org.strbio.io.Printf)

writePDB

public void writePDB(Printf outfile)
              throws java.io.IOException
Write set to PDB file.

Throws:
java.io.IOException
See Also:
Polymer.writePDB(org.strbio.io.Printf)

writePTS

public void writePTS(java.lang.String filename)
              throws java.io.IOException
Write set to PTS file.

Throws:
java.io.IOException
See Also:
Polymer.writePTS(org.strbio.io.Printf)

writePTS

public void writePTS(Printf outfile)
              throws java.io.IOException
Write set to PTS file.

Throws:
java.io.IOException
See Also:
Polymer.writePTS(org.strbio.io.Printf)

writeFasta

public void writeFasta(java.lang.String filename)
                throws java.io.IOException
Write set to Fasta file.

Throws:
java.io.IOException
See Also:
Polymer.writeFasta(org.strbio.io.Printf)

writeFasta

public void writeFasta(Printf outfile)
                throws java.io.IOException
Write set to Fasta file.

Throws:
java.io.IOException
See Also:
Polymer.writeFasta(org.strbio.io.Printf)

writeYAPF

public void writeYAPF(java.lang.String filename)
               throws java.io.IOException
Write set to YAPF file.

Throws:
java.io.IOException
See Also:
Polymer.writeYAPF(org.strbio.io.Printf)

writeYAPF

public void writeYAPF(Printf outfile)
               throws java.io.IOException
Write set to YAPF file.

Throws:
java.io.IOException
See Also:
Polymer.writeYAPF(org.strbio.io.Printf)

write

public boolean write(java.lang.String filename)
              throws java.io.IOException
Write the set to a file, with the format determined by the filename suffix. Returns true if the suffix was understood, or false if a default (YAPF format) file was written.
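
The suffix-based dispatch with a YAPF fallback can be sketched as follows. The suffix strings used here (.pdb, .pts, .fa, .fasta, .yapf) are assumptions chosen for illustration; the suffixes actually recognized by write are not listed in this documentation. SuffixDispatchDemo and its nested types are hypothetical names.

```java
import java.util.Locale;

// Hypothetical sketch of choosing an output format from a filename suffix.
final class SuffixDispatchDemo {
    enum Format { PDB, PTS, FASTA, YAPF }

    /** The chosen format, plus whether the suffix was recognized. */
    static final class Choice {
        final Format format;
        final boolean recognized;

        Choice(Format format, boolean recognized) {
            this.format = format;
            this.recognized = recognized;
        }
    }

    static Choice choose(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        // Assumed suffixes; the real list may differ.
        if (lower.endsWith(".pdb")) return new Choice(Format.PDB, true);
        if (lower.endsWith(".pts")) return new Choice(Format.PTS, true);
        if (lower.endsWith(".fa") || lower.endsWith(".fasta")) return new Choice(Format.FASTA, true);
        if (lower.endsWith(".yapf")) return new Choice(Format.YAPF, true);
        // Unknown suffix: fall back to YAPF and report that the name was not understood.
        return new Choice(Format.YAPF, false);
    }

    public static void main(String[] args) {
        Choice known = choose("set.pdb");
        Choice unknown = choose("set.out");
        System.out.println(known.format + " " + known.recognized);
        System.out.println(unknown.format + " " + unknown.recognized);
    }
}
```

A caller that needs a specific format regardless of the filename can bypass this dispatch by calling writePDB, writePTS, writeFasta, or writeYAPF directly.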

Throws:
java.io.IOException

load

public void load(java.lang.String filename,
                 Printf outfile)
          throws java.io.IOException
Default load automatically figures out file type.

Throws:
java.io.IOException
See Also:
read(java.io.BufferedReader, org.strbio.io.Printf)

save

public void save(java.lang.String filename)
          throws java.io.IOException
Save is equivalent to write.

Throws:
java.io.IOException
See Also:
write(java.lang.String)

add

public void add(Polymer q)
Add a single Polymer to this set, without duplicating it.


remove

public final void remove(Polymer q)
Remove a polymer from the set.


remove

public final java.lang.Object remove(int i)
Remove polymer number i from the set, and return it.

Specified by:
remove in interface java.util.List
Overrides:
remove in class java.util.Vector

add

public final void add(PolymerSet q)
Add another set of Polymers, without duplicating them.


add

public final void add(java.lang.String filename)
Add another set of Polymers from a file, without duplicating them.

Parameters:
filename - name of file to load other set from.

printNames

public final void printNames(int indent,
                             Printf outfile)
Print the names of the Polymers in this set.

Parameters:
indent - number of spaces of indentation
outfile - output file to print to

writeList

public final void writeList(Printf outfile)
Create a file containing a list of protein names.


writeList

public final void writeList(java.lang.String filename)
                     throws java.io.IOException
Create a file containing a list of protein names.

Throws:
java.io.IOException

getNames

public final java.lang.String[] getNames()
Get the names of the Polymers in this set.


isEqual

public final boolean isEqual(PolymerSet y)
Do the two sets contain the same Polymers?


nMonomers

public final long nMonomers()
Total number of monomers in the set.


searchByNameIndex

public final int searchByNameIndex(java.lang.String name)
Return the index of the Polymer matching a given name, or -1 if it is not in the set.


searchByName

public final Polymer searchByName(java.lang.String name)
Return the Polymer matching a given name, or null if it is not in the set.


searchByNameFuzzyIndex

public final int searchByNameFuzzyIndex(java.lang.String name,
                                        int fuzziness)
Find a Polymer matching a given name, using fuzzy matching. Returns the index, or -1 if nothing close is found.
      fuzziness = 0 -> exact match
      fuzziness = 1 -> case insensitive match
      fuzziness = 2 -> no _
      fuzziness = 3 -> no _, case insensitive
      fuzziness = 4 -> no leading digit or _; case insensitive
      fuzziness = 5 -> no last character, case insensitive
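
One way to read the fuzziness table is that each level normalizes both the query and the stored names before comparing them, and that levels are tried from strictest to loosest up to the requested maximum. The sketch below implements that reading. It assumes "no _" means underscores are removed and that each level applies only the rule on its own line. The documentation does not confirm either point, and FuzzyNameDemo is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical interpretation of the fuzziness levels; not PolymerSet's code.
final class FuzzyNameDemo {
    /** Normalize a name under a single fuzziness level. */
    static String normalize(String name, int level) {
        switch (level) {
            case 0: // exact match
                return name;
            case 1: // case insensitive
                return name.toLowerCase(Locale.ROOT);
            case 2: // underscores removed
                return name.replace("_", "");
            case 3: // underscores removed, case insensitive
                return name.replace("_", "").toLowerCase(Locale.ROOT);
            case 4: { // leading digit and underscores removed, case insensitive
                String s = name.replace("_", "");
                if (!s.isEmpty() && Character.isDigit(s.charAt(0))) s = s.substring(1);
                return s.toLowerCase(Locale.ROOT);
            }
            case 5: { // last character dropped, case insensitive
                String s = name.isEmpty() ? name : name.substring(0, name.length() - 1);
                return s.toLowerCase(Locale.ROOT);
            }
            default:
                throw new IllegalArgumentException("fuzziness must be 0..5");
        }
    }

    /** Try each level from 0 up to maxFuzziness; return the first matching index, or -1. */
    static int searchIndex(String[] names, String query, int maxFuzziness) {
        for (int level = 0; level <= maxFuzziness; level++) {
            String q = normalize(query, level);
            for (int i = 0; i < names.length; i++) {
                if (normalize(names[i], level).equals(q)) return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = { "1abc_", "2XYZA" };
        System.out.println(searchIndex(names, "1ABC", 1));  // -1: case folding alone is not enough
        System.out.println(searchIndex(names, "1ABC", 3));  // 0: underscore removed, case folded
        System.out.println(searchIndex(names, "2xyzB", 5)); // 1: last character ignored
    }
}
```

Trying stricter levels first means an exact match always wins over a looser one, which is consistent with the stated goal of finding the closest name.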
      


searchByNameFuzzyIndex

public final int searchByNameFuzzyIndex(java.lang.String name)
Find a Polymer matching a given name, using fuzzy matching. Returns the index, or -1 if nothing close is found. Default is maximum fuzziness.


searchByNameFuzzy

public final Polymer searchByNameFuzzy(java.lang.String name,
                                       int fuzziness)
Find a Polymer matching a given name, using fuzzy matching. Null if nothing close is found.
      fuzziness = 0 -> exact match
      fuzziness = 1 -> case insensitive match
      fuzziness = 2 -> no _
      fuzziness = 3 -> no _, case insensitive
      fuzziness = 4 -> no leading digit or _; case insensitive
      fuzziness = 5 -> no last character, case insensitive
      


searchByNameFuzzy

public final Polymer searchByNameFuzzy(java.lang.String name)
Find a Polymer matching a given name, using fuzzy matching. Null if nothing close is found. Default is maximum fuzziness.


keepOnlyNames

public final int keepOnlyNames(java.lang.String[] names)
Like searchByName, but for a whole set of names. Removes all polymers in the set not matching one of the given names. Returns number removed.
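
The filter-and-count behaviour can be sketched on a plain list of names. This is a minimal illustration assuming exact string matching; KeepOnlyNamesDemo is a hypothetical name, and the list of strings stands in for the set's polymers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the list of names stands in for the polymers in the set.
final class KeepOnlyNamesDemo {
    /** Remove every entry whose name is not in wanted; return how many were removed. */
    static int keepOnlyNames(List<String> setNames, String[] wanted) {
        Set<String> keep = new HashSet<>(Arrays.asList(wanted));
        int removed = 0;
        for (Iterator<String> it = setNames.iterator(); it.hasNext();) {
            if (!keep.contains(it.next())) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("1abc", "2xyz", "3def"));
        int removed = keepOnlyNames(names, new String[] { "2xyz", "9zzz" });
        System.out.println(removed); // 2
        System.out.println(names);   // [2xyz]
    }
}
```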


keepOnlyChainID

public final int keepOnlyChainID(char ch)
Removes all polymers in the set not having a particular chain ID. Returns number removed.


keepOnlyNamesFuzzy

public final int keepOnlyNamesFuzzy(java.lang.String[] names)
Like searchByNameFuzzy, but for a whole set of names. Removes all polymers in the set not matching at least one of the given names. Returns number removed. Will keep only the best (fuzzy) match for each name, if there is a best match, so that common patterns don't keep too many proteins.


findClosest

public final int[] findClosest(Polymer q)
Find the Polymer in this set most similar (by sequence identity) to a given Polymer.

Returns:
array of 2 integers, the first being the index of the Polymer in this set, and the second being the raw # of matches.
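
The two-element return value pairs an index with a match count. The sketch below shows one way such a result could be computed, assuming an ungapped, position-by-position comparison over the shared length of the two sequences. The comparison actually used by findClosest is not documented here, so treat this scoring as an assumption; FindClosestDemo is a hypothetical name.

```java
// Hypothetical sketch: sequences are plain strings, compared without alignment.
final class FindClosestDemo {
    /** Count positions at which the two sequences carry the same character. */
    static int identicalPositions(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) matches++;
        }
        return matches;
    }

    /** Return {index of best match, raw number of matches}; {-1, 0} for an empty set. */
    static int[] findClosest(String[] set, String query) {
        int bestIndex = -1;
        int bestMatches = -1;
        for (int i = 0; i < set.length; i++) {
            int m = identicalPositions(set[i], query);
            if (m > bestMatches) { // ties keep the earliest entry
                bestMatches = m;
                bestIndex = i;
            }
        }
        return new int[] { bestIndex, Math.max(bestMatches, 0) };
    }

    public static void main(String[] args) {
        String[] set = { "ACDEFG", "ACDKLM", "WWWWWW" };
        int[] result = findClosest(set, "ACDKFG");
        System.out.println(result[0] + " " + result[1]); // 0 5
    }
}
```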

removeRedundantSequences

public final void removeRedundantSequences(boolean removeSubSequences)
Eliminate redundant sequences (or subsequences) from the set.
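
A minimal sketch of the two modes, assuming "redundant" means an exact duplicate of another sequence and "subsequence" means a contiguous substring of a longer one. It keeps the first copy of each duplicate and returns a new list, whereas the real method modifies the set in place. RedundancyDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sequences are plain strings.
final class RedundancyDemo {
    static List<String> removeRedundant(List<String> seqs, boolean removeSubSequences) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < seqs.size(); i++) {
            String s = seqs.get(i);
            boolean redundant = false;
            for (int j = 0; j < seqs.size() && !redundant; j++) {
                if (j == i) continue;
                String t = seqs.get(j);
                if (t.equals(s)) {
                    redundant = j < i; // keep only the first of several identical copies
                } else if (removeSubSequences) {
                    redundant = t.contains(s); // s is a proper substring of t
                }
            }
            if (!redundant) kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACDEFG", "CDE", "ACDEFG", "KLM");
        System.out.println(removeRedundant(seqs, false)); // [ACDEFG, CDE, KLM]
        System.out.println(removeRedundant(seqs, true));  // [ACDEFG, KLM]
    }
}
```

The nested loop is quadratic in the number of sequences, which is fine for a sketch but may be slow for large sets.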


stripNoAtoms

public final void stripNoAtoms()
Strip out monomers without atomic coordinates, for all Polymers.


ensureNames

public final void ensureNames()
Name unnamed Polymers according to their position in the set (starting at 1).


noSpaceNames

public final void noSpaceNames()
Strip everything after the first space from each name.
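
The two name clean-up steps, ensureNames and noSpaceNames, can be sketched on an array of names. This assumes an unnamed polymer is one whose name is null or empty and that the generated name is simply the 1-based position written as a decimal number; the exact format used by ensureNames is not documented here. NameCleanupDemo is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch: the array of names stands in for the polymers in the set.
final class NameCleanupDemo {
    /** Give every unnamed entry its 1-based position as a name. */
    static void ensureNames(String[] names) {
        for (int i = 0; i < names.length; i++) {
            if (names[i] == null || names[i].isEmpty()) {
                names[i] = Integer.toString(i + 1);
            }
        }
    }

    /** Cut each name at its first space, if it has one. */
    static void noSpaceNames(String[] names) {
        for (int i = 0; i < names.length; i++) {
            if (names[i] == null) continue;
            int space = names[i].indexOf(' ');
            if (space >= 0) names[i] = names[i].substring(0, space);
        }
    }

    public static void main(String[] args) {
        String[] names = { "1abc A chain", null, "x" };
        ensureNames(names);
        noSpaceNames(names);
        System.out.println(Arrays.toString(names)); // [1abc, 2, x]
    }
}
```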


addReversedCopies

public final void addReversedCopies()
Add reversed copies of the polymers in the set.