org.strbio.mol
Class Alignment

java.lang.Object
  extended by org.strbio.mol.Alignment

public class Alignment
extends java.lang.Object

An Alignment is a simple representation of an alignment between two polymers.

  Version 1.5, 1/7/03 - added inverse
  Version 1.4, 2/12/02 - added stripGaps, makeSameLength, stripCommonGaps
    improved printModeller
  Version 1.3, 8/31/99 - added avgScoreWith*Gaps to calculate score for
    any arbitrary AlignmentParameters
  Version 1.23, 8/30/99 - made arrayToPolymers do stripGaps first
  Version 1.22, 6/25/99 - added AlignmentParameters function calls
  Version 1.21, 6/23/99 - added lastScore
  Version 1.2, 4/21/99 - made seq, fold part of this object.  The
    polymers themselves are not copied, so this isn't too wasteful.
  Version 1.13, 4/15/99 - added nID, pctAligned
  Version 1.12, 4/2/99 - added pctID
  Version 1.11, 3/31/99 - added nAligned
  Version 1.1, 3/5/99 - added makeFromSuperposition, using JDB's alignment
    algorithm.
  Version 1.01, 12/2/98 - changed name to Alignment; old Alignment
    is now AlignmentSet.
  Version 1.0, 11/3/98 - original version
  

Version:
1.5, 1/7/03
Author:
JMC

Field Summary
 Polymer fold
          the fold in the alignment.
 double lastScore
          the total score recorded when this alignment was last made.
static double LONGEST_DISTANCE_CUTOFF
          The longest distance 2 aligned CA atoms can be apart in the makeFromSuperposition* algorithms: currently 6 Å.
 Polymer seq
          the sequence in the alignment.
protected  int[] sfArray
          A vector containing the current alignment.
static int SHORTEST_ALIGNMENT_LENGTH
          The smallest number of consecutive monomers that can be aligned: currently 2.
 
Constructor Summary
Alignment()
          Makes a new, empty alignment with everything set to null.
Alignment(Alignment av)
          make a new alignment from another one.
Alignment(int seqLength)
          make a new alignment vector of a given length, setting all entries to unaligned.
Alignment(int[] v)
          make a new alignment from an integer array.
Alignment(Polymer s, Polymer f)
          make a new alignment with a seq and fold.
Alignment(Polymer s, Polymer f, int[] v)
          make a new alignment from two polymers and an integer array.
 
Method Summary
 int aligned(int i)
          Returns the index of the fold monomer aligned to seq monomer i, or -1 if it is aligned to a gap.
 void arrayToPolymers()
          Sets up gaps in two Polymers to match this vector.
 double[] ASns(Alignment correct, int tolerance)
          calculate alignment sensitivity, as in CASP2.
 double[] ASpc(Alignment correct, int tolerance)
          calculate alignment specificity, as in CASP2.
 double averageScore(ScoreList sl)
          Returns average score of all pairs of aligned monomers.
 double avgScoreWithGaps(AlignmentParameters ap)
          Returns the average score of all pairs of aligned monomers, including gap penalties.
 double avgScoreWithoutGaps(AlignmentParameters ap)
          Returns the average score of all pairs of aligned monomers, excluding gap penalties.
 double[] compare(Alignment av, int tolerance)
          compares two alignments.
 void filterByDistance()
          Removes aligned pairs whose CA-CA distance exceeds the default cutoff (LONGEST_DISTANCE_CUTOFF).
 void filterByDistance(double cutoff)
          Check all aligned pairs of residues; remove any with CA-CA distance greater than cutoff.
 void filterByLength()
          Removes stretches of alignment shorter than the default cutoff (SHORTEST_ALIGNMENT_LENGTH).
 void filterByLength(int cutoff)
          Check all aligned pairs of residues; remove any stretches of alignment shorter than cutoff.
 int[] foldToSeq()
          Returns an array giving, for each fold monomer, the index of the sequence monomer aligned to it.
 int[] foldToSeq(int foldLength)
          Returns an array giving, for each fold monomer, the index of the sequence monomer aligned to it.
 double globalAlign(AlignmentParameters ap)
          Do the global alignment, returning the score (which is also stored in lastScore).
 double globalAlign(AlignmentParameters ap, Alignment nail)
          Do the global alignment, returning the score (which is also stored in lastScore).
 Alignment inverse()
          Inverts the alignment, returning the fold-to-seq alignment.
protected  void loadOld(java.lang.String in_file, int seqLength, int foldLength)
          Loads an Alignment from a simple-format file containing pairs of numbers that indicate which residue in the sequence aligns to which residue in the fold.
static Alignment makeFromSuperposition(Protein seq, Protein fold)
          Makes an alignment from 2 superimposed proteins, using JDB's method. 1) Make a table of every residue in (fold) that is at most LONGEST_DISTANCE_CUTOFF angstroms from each residue in (seq). 2) Take the longest consecutive stretch, set it as aligned, and remove those residues from further consideration. 3) Repeat until no remaining residue has any matches.
static Alignment makeFromSuperpositionDP(Protein seq, Protein fold)
          Make from a superposition, using dynamic programming.
 void makeSameLength()
          Pads both sequences to the same length.
 int minimumFoldLength()
          Return the last monomer in the fold which is aligned to something in the sequence.
 int nAligned()
          Returns the number of aligned positions.
 int nID()
          Returns the number of identical residues in the alignment.
 double pctAligned()
          Returns the percent aligned residues in the alignment.
 double pctAlignedLongest()
          Returns the percent aligned residues in the alignment, relative to the longer of the two sequences.
 double pctID()
          Returns the percent identical residues in the alignment.
 void polymersToArray()
          Sets up array based on alignment in Polymers.
 void print(Printf outfile, boolean showgaps)
          prints out both sequences with alternating seq and fold lines.
 void printModeller(Printf outfile)
          Prints both sequences in a format Modeller can read.
 void printRelative(Alignment b, Printf outfile)
          Prints out both sequences with alternating seq and fold lines, and also shows their relation to another alignment (e.g. the correct alignment).
 double RMS()
          RMS of aligned CA positions.
 void saveCASP(Printf outfile)
          Save alignment in CASP format, to open file.
 void saveCASP(java.lang.String filename)
          Save alignment in CASP format, creating new file.
 void saveOld(java.lang.String filename)
          Saves alignment to a file in old format.
 int[] seqToFold()
          Returns a copy of the seq to fold array.
 void setAligned(int i, int j)
          Sets seq monomer i to be aligned to fold monomer j.
 void setUnaligned(int i)
          Sets seq monomer i to be unaligned.
 double[] shift(Alignment av)
          calculate average alignment shift, as in CASP2.
static char shiftToChar(int shift)
          Convert an alignment shift to an ascii character for printing.
 void stats(Printf outfile)
          get statistics on the alignment.
 void stripCommonGaps()
          Strips the gaps common to both sequences.
 void stripGaps()
          Strips all gaps in the two sequences.
 char[][] toCharArrays()
          Saves alignment in 2 char[] arrays (sequence and fold), so it can be printed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

sfArray

protected int[] sfArray
A vector containing the current alignment. Internally, there is one integer per monomer in the sequence. The integer is -1 if the monomer is aligned to nothing (a gap); otherwise it holds the index of the fold monomer that the monomer is aligned to (starting with 0).
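A minimal sketch of this representation, using a bare int[] in place of the protected field. The class SeqToFoldSketch and its static helpers are hypothetical stand-ins for aligned, setAligned, and setUnaligned; they are not the org.strbio.mol implementation.

```java
// Hypothetical sketch of the seq-to-fold array; not the org.strbio.mol code.
class SeqToFoldSketch {
    static final int GAP = -1;

    // One entry per sequence monomer, all initially unaligned.
    static int[] newUnaligned(int seqLength) {
        int[] sf = new int[seqLength];
        java.util.Arrays.fill(sf, GAP);
        return sf;
    }

    // Mirrors setAligned(i, j): seq monomer i now pairs with fold monomer j.
    static void setAligned(int[] sf, int i, int j) {
        sf[i] = j;
    }

    // Mirrors setUnaligned(i): seq monomer i now sits opposite a gap.
    static void setUnaligned(int[] sf, int i) {
        sf[i] = GAP;
    }

    // Mirrors aligned(i): the fold index, or -1 for a gap.
    static int aligned(int[] sf, int i) {
        return sf[i];
    }
}
```

For a 5-monomer sequence whose monomers 1, 2, and 4 are aligned to fold monomers 0, 1, and 3, the array is {-1, 0, 1, -1, 3}.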


lastScore

public double lastScore
The total score recorded when this alignment was last made.


seq

public Polymer seq
the sequence in the alignment.


fold

public Polymer fold
the fold in the alignment.


LONGEST_DISTANCE_CUTOFF

public static final double LONGEST_DISTANCE_CUTOFF
The longest distance 2 aligned CA atoms can be apart in the makeFromSuperposition* algorithms: currently 6 Å.

See Also:
Constant Field Values

SHORTEST_ALIGNMENT_LENGTH

public static final int SHORTEST_ALIGNMENT_LENGTH
The smallest number of consecutive monomers that can be aligned: currently 2.

See Also:
Constant Field Values
Constructor Detail

Alignment

public Alignment()
Makes a new, empty alignment with everything set to null.


Alignment

public Alignment(Polymer s,
                 Polymer f)
make a new alignment with a seq and fold.


Alignment

public Alignment(int seqLength)
make a new alignment vector of a given length, setting all entries to unaligned.


Alignment

public Alignment(Polymer s,
                 Polymer f,
                 int[] v)
make a new alignment from two polymers and an integer array. No checking for validity is done.


Alignment

public Alignment(int[] v)
make a new alignment from an integer array. No checking for validity is done.


Alignment

public Alignment(Alignment av)
make a new alignment from another one.

Method Detail

aligned

public int aligned(int i)
Returns the index of the fold monomer aligned to seq monomer i, or -1 if it is aligned to a gap. No error checking is done.


setAligned

public void setAligned(int i,
                       int j)
Sets seq monomer i to be aligned to fold monomer j. No error checking is done.


setUnaligned

public void setUnaligned(int i)
Sets seq monomer i to be unaligned. No error checking is done.


nID

public final int nID()
Returns the number of identical residues in the alignment.


pctID

public final double pctID()
Returns the percentage of identical residues in the alignment. This is the number of aligned, identical residues divided by the alignment length (the length of the shorter of the 2 sequences).


pctAligned

public final double pctAligned()
Returns the percentage of aligned residues in the alignment. This is the number of aligned residues divided by the alignment length (the length of the shorter of the 2 sequences).


pctAlignedLongest

public final double pctAlignedLongest()
Returns the percent aligned residues in the alignment, relative to the longer of the two sequences.
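The three percentages can be sketched on plain strings, assuming each polymer is an ungapped string of one-letter residue codes and that the result is on a 0-100 scale (the library may return a fraction instead). PercentSketch and its methods are hypothetical.

```java
// Hypothetical sketch of nID/nAligned/pctID/pctAligned/pctAlignedLongest.
// Assumes seq and fold are ungapped one-letter strings and sf.length == seq.length().
class PercentSketch {
    // Aligned pairs whose residue letters match.
    static int nID(int[] sf, String seq, String fold) {
        int n = 0;
        for (int i = 0; i < sf.length; i++) {
            if (sf[i] >= 0 && seq.charAt(i) == fold.charAt(sf[i])) n++;
        }
        return n;
    }

    // Positions that are aligned to something (not a gap).
    static int nAligned(int[] sf) {
        int n = 0;
        for (int j : sf) {
            if (j >= 0) n++;
        }
        return n;
    }

    // Relative to the shorter sequence, on an assumed 0-100 scale.
    static double pctID(int[] sf, String seq, String fold) {
        return 100.0 * nID(sf, seq, fold) / Math.min(seq.length(), fold.length());
    }

    static double pctAligned(int[] sf, String seq, String fold) {
        return 100.0 * nAligned(sf) / Math.min(seq.length(), fold.length());
    }

    // Relative to the longer sequence.
    static double pctAlignedLongest(int[] sf, String seq, String fold) {
        return 100.0 * nAligned(sf) / Math.max(seq.length(), fold.length());
    }
}
```

For seq ACDE aligned position by position to fold ACGEF, three of the four pairs are identical, so pctID is 75, pctAligned is 100 (4 of the shorter length 4), and pctAlignedLongest is 80 (4 of 5).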


compare

public final double[] compare(Alignment av,
                              int tolerance)
compares two alignments. Returns a double[3] with the percent in common, the number of common positions, and the total number of positions compared. Positions are allowed to vary by +/- tolerance. This only counts positions where both alignments are aligned to something; if either is a gap, the position is not scored.
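A sketch of the comparison rule described above, working directly on two seq-to-fold arrays. It assumes the percentage is on a 0-100 scale and that a position counts as common when the two fold indices differ by at most tolerance; CompareSketch is hypothetical.

```java
// Hypothetical sketch of compare(av, tolerance) on raw seq-to-fold arrays.
class CompareSketch {
    // Returns {percent in common, common positions, positions compared}.
    static double[] compare(int[] a, int[] b, int tolerance) {
        int common = 0, total = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] < 0 || b[i] < 0) continue; // a gap in either alignment: not scored
            total++;
            if (Math.abs(a[i] - b[i]) <= tolerance) common++;
        }
        double pct = total == 0 ? 0.0 : 100.0 * common / total;
        return new double[] {pct, common, total};
    }
}
```

With a = {0, 1, 2, -1, 5} and b = {0, 2, 2, 3, 7}, position 3 is skipped, so 4 positions are compared: 2 agree exactly (50%), and 3 agree within a tolerance of 1 (75%).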


ASpc

public final double[] ASpc(Alignment correct,
                           int tolerance)
calculate alignment specificity, as in CASP2. Returns a double[4] with the ASpc value, the ACrct value (correctly aligned positions), Na (aligned positions in this alignment), and NaC (aligned positions in the correct alignment). See Marchler-Bauer & Bryant, Proteins Supplement 1, 1997, 74-82.


ASns

public final double[] ASns(Alignment correct,
                           int tolerance)
calculate alignment sensitivity, as in CASP2. Returns a double[4] with the ASns value, the ACrct value (correctly aligned positions), Na (aligned positions in this alignment), and NaC (aligned positions in the correct alignment)
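Given the four returned quantities, a natural reading is ASpc = ACrct / Na (the fraction of this alignment's pairs that are correct) and ASns = ACrct / NaC (the fraction of the correct alignment's pairs that were recovered). The sketch below assumes exactly that, counts a position as correct when both alignments pair it and the fold indices differ by at most tolerance, and reports plain fractions; the library may scale differently. CaspSketch is hypothetical.

```java
// Hypothetical sketch of ASpc / ASns on raw seq-to-fold arrays.
class CaspSketch {
    static int nAligned(int[] sf) {
        int n = 0;
        for (int j : sf) {
            if (j >= 0) n++;
        }
        return n;
    }

    // ACrct: positions aligned in both, within +/- tolerance of the correct fold index.
    static int correctlyAligned(int[] model, int[] correct, int tolerance) {
        int n = 0;
        for (int i = 0; i < Math.min(model.length, correct.length); i++) {
            if (model[i] >= 0 && correct[i] >= 0
                    && Math.abs(model[i] - correct[i]) <= tolerance) n++;
        }
        return n;
    }

    // Returns {ASpc, ACrct, Na, NaC}.
    static double[] aspc(int[] model, int[] correct, int tolerance) {
        int acrct = correctlyAligned(model, correct, tolerance);
        int na = nAligned(model), nac = nAligned(correct);
        double v = na == 0 ? 0.0 : (double) acrct / na;
        return new double[] {v, acrct, na, nac};
    }

    // Returns {ASns, ACrct, Na, NaC}.
    static double[] asns(int[] model, int[] correct, int tolerance) {
        int acrct = correctlyAligned(model, correct, tolerance);
        int na = nAligned(model), nac = nAligned(correct);
        double v = nac == 0 ? 0.0 : (double) acrct / nac;
        return new double[] {v, acrct, na, nac};
    }
}
```

For model {0, 1, -1, 4} against correct {0, 2, 2, 3}, only position 0 matches exactly, giving ASpc = 1/3 and ASns = 1/4; with a tolerance of 1, three positions match, giving 1.0 and 0.75.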


shift

public final double[] shift(Alignment av)
calculate average alignment shift, as in CASP2. Returns double[3] with average shift, total shift, and the number of positions compared.
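A sketch of the shift calculation, assuming the shift at a position is the absolute difference between the two fold indices and that only positions aligned in both alignments are counted. Whether the library takes absolute values is an assumption, and ShiftSketch is hypothetical.

```java
// Hypothetical sketch of shift(av) on raw seq-to-fold arrays.
class ShiftSketch {
    // Returns {average shift, total shift, positions compared}.
    static double[] shift(int[] a, int[] b) {
        int total = 0, count = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            if (a[i] < 0 || b[i] < 0) continue; // skip positions gapped in either
            total += Math.abs(a[i] - b[i]);
            count++;
        }
        double avg = count == 0 ? 0.0 : (double) total / count;
        return new double[] {avg, total, count};
    }
}
```

For a = {0, 1, 2, -1, 5} and b = {0, 2, 2, 3, 7}, the per-position shifts are 0, 1, 0, and 2, so the total is 3 over 4 positions and the average is 0.75.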


RMS

public final double RMS()
RMS of aligned CA positions. Structures must already be superimposed. Note that this returns 0.0 for non-Protein polymers, since for those the RMS would have to be measured over atoms other than CA.
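The RMS is the square root of the mean squared CA-CA distance over aligned pairs. A sketch assuming CA coordinates are supplied as n-by-3 double arrays already in a common frame; RmsSketch is hypothetical and does no superposition itself.

```java
// Hypothetical sketch of RMS() over already-superimposed CA coordinates.
class RmsSketch {
    static double rms(int[] sf, double[][] seqCA, double[][] foldCA) {
        double sumSq = 0.0;
        int n = 0;
        for (int i = 0; i < sf.length; i++) {
            int j = sf[i];
            if (j < 0) continue; // gaps do not contribute
            for (int k = 0; k < 3; k++) {
                double d = seqCA[i][k] - foldCA[j][k];
                sumSq += d * d;
            }
            n++;
        }
        return n == 0 ? 0.0 : Math.sqrt(sumSq / n);
    }
}
```

Two aligned pairs at distances 0 and 2 give squared distances 0 and 4, a mean of 2, and an RMS of the square root of 2.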


foldToSeq

public final int[] foldToSeq(int foldLength)
Returns an array giving, for each fold monomer, the index of the sequence monomer aligned to it. Indices start at 0; -1 indicates a gap.


minimumFoldLength

public final int minimumFoldLength()
Return the last monomer in the fold which is aligned to something in the sequence. The actual fold length might be greater.


inverse

public final Alignment inverse()
Inverts the alignment, returning the fold-to-seq alignment.


foldToSeq

public final int[] foldToSeq()
Returns an array giving, for each fold monomer, the index of the sequence monomer aligned to it. Indices start at 0; -1 indicates a gap. Use this version if you don't know the fold length: the largest (last) fold index in the seq-to-fold array is used to size the result.
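Inverting a seq-to-fold array amounts to writing each seq index into the slot named by its fold index. A sketch of both foldToSeq variants, assuming the unknown-length case sizes the result as one more than the largest fold index present; InverseSketch and impliedFoldLength are hypothetical names.

```java
// Hypothetical sketch of foldToSeq / inverse on raw arrays.
class InverseSketch {
    // One more than the largest fold index in sf (0 if nothing is aligned).
    static int impliedFoldLength(int[] sf) {
        int max = -1;
        for (int j : sf) max = Math.max(max, j);
        return max + 1;
    }

    // For each fold monomer, the seq monomer aligned to it, or -1 for a gap.
    static int[] foldToSeq(int[] sf, int foldLength) {
        int[] fs = new int[foldLength];
        java.util.Arrays.fill(fs, -1);
        for (int i = 0; i < sf.length; i++) {
            if (sf[i] >= 0) fs[sf[i]] = i;
        }
        return fs;
    }

    // Variant for an unknown fold length: sizes the result from sf itself.
    static int[] foldToSeq(int[] sf) {
        return foldToSeq(sf, impliedFoldLength(sf));
    }
}
```

For sf = {-1, 0, 1, -1, 3} the inverse is {1, 2, -1, 4}; inverting that again with the original sequence length 5 recovers sf.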


seqToFold

public final int[] seqToFold()
Returns a copy of the seq to fold array. Starts with 0; -1 indicates a gap.


arrayToPolymers

public final void arrayToPolymers()
Inserts gaps into the two Polymers so that they match this alignment array.


polymersToArray

public final void polymersToArray()
Rebuilds the alignment array from the gaps in the two Polymers.


stripGaps

public final void stripGaps()
Strips all gaps in the two sequences.


makeSameLength

public final void makeSameLength()
Pads both sequences to the same length.


stripCommonGaps

public final void stripCommonGaps()
Strips the gaps common to both sequences.
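The three gap operations can be illustrated on a pair of gapped strings. This sketch assumes '-' as the gap character, pads with trailing gaps, and treats a "common gap" as a column where both rows hold a gap; the library works on Polymer objects instead, and GapSketch is hypothetical.

```java
// Hypothetical sketch of makeSameLength / stripCommonGaps / stripGaps on gapped strings.
class GapSketch {
    static final char GAP = '-';

    // Appends trailing gaps until s has length n.
    static String pad(String s, int n) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < n) sb.append(GAP);
        return sb.toString();
    }

    static String[] makeSameLength(String a, String b) {
        int n = Math.max(a.length(), b.length());
        return new String[] {pad(a, n), pad(b, n)};
    }

    // Drops every column in which both rows hold a gap.
    static String[] stripCommonGaps(String a, String b) {
        String[] ab = makeSameLength(a, b);
        StringBuilder x = new StringBuilder(), y = new StringBuilder();
        for (int k = 0; k < ab[0].length(); k++) {
            char p = ab[0].charAt(k), q = ab[1].charAt(k);
            if (p == GAP && q == GAP) continue;
            x.append(p);
            y.append(q);
        }
        return new String[] {x.toString(), y.toString()};
    }

    // Removes every gap from a single row.
    static String stripGaps(String s) {
        return s.replace(String.valueOf(GAP), "");
    }
}
```

Padding before stripping keeps the columns lined up: "AC--D" against "A--E-F" becomes "AC-D-" against "A-E-F" once the single all-gap column is removed.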


loadOld

protected final void loadOld(java.lang.String in_file,
                             int seqLength,
                             int foldLength)
Loads an Alignment from a simple-format file containing pairs of numbers that indicate which residue in the sequence aligns to which residue in the fold.
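The exact file layout is not specified here, so this sketch assumes whitespace-separated integer pairs "seqIndex foldIndex", 0-based, read from a String rather than a file. The range checks against seqLength and foldLength are an illustrative guess at why loadOld takes those parameters; LoadOldSketch is hypothetical.

```java
// Hypothetical parser for an assumed "i j" pairs format (0-based indices).
class LoadOldSketch {
    static int[] parsePairs(String text, int seqLength, int foldLength) {
        int[] sf = new int[seqLength];
        java.util.Arrays.fill(sf, -1); // unlisted seq residues stay unaligned
        try (java.util.Scanner sc = new java.util.Scanner(text)) {
            while (sc.hasNextInt()) {
                int i = sc.nextInt();
                if (!sc.hasNextInt()) {
                    throw new IllegalArgumentException("odd number of integers");
                }
                int j = sc.nextInt();
                if (i < 0 || i >= seqLength || j < 0 || j >= foldLength) {
                    throw new IllegalArgumentException("pair out of range: " + i + " " + j);
                }
                sf[i] = j;
            }
        }
        return sf;
    }
}
```

Under these assumptions, the text "0 0\n1 1\n3 2" with seqLength 4 and foldLength 3 yields {0, 1, -1, 2}.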


nAligned

public final int nAligned()
Returns the number of aligned positions.


averageScore

public final double averageScore(ScoreList sl)
Returns average score of all pairs of aligned monomers.


avgScoreWithoutGaps

public final double avgScoreWithoutGaps(AlignmentParameters ap)
Returns average score for all pairs of aligned monomers. Gaps are not counted.


avgScoreWithGaps

public final double avgScoreWithGaps(AlignmentParameters ap)
Returns average score for all pairs of aligned monomers. Gaps are counted; i.e. the total alignment score (including gap penalties) is divided by the number of aligned monomers.
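The difference between the two averages is only whether gap penalties enter the numerator; both divide by the number of aligned pairs. AlignmentParameters is not described here, so this sketch uses a hypothetical pair score (+2 identical, -1 otherwise) and charges one linear gap penalty per unaligned monomer in either chain. AvgScoreSketch and that gap model are assumptions.

```java
// Hypothetical sketch: scoring scheme and gap model are assumptions,
// not what AlignmentParameters actually does.
class AvgScoreSketch {
    // Hypothetical pair score: +2 for identical residues, -1 otherwise.
    static double pairScore(char a, char b) {
        return a == b ? 2.0 : -1.0;
    }

    // Returns {average without gaps, average with gaps}.
    static double[] averages(int[] sf, String seq, String fold, double gapPenalty) {
        double pairSum = 0.0;
        int nAligned = 0;
        for (int i = 0; i < sf.length; i++) {
            if (sf[i] < 0) continue;
            pairSum += pairScore(seq.charAt(i), fold.charAt(sf[i]));
            nAligned++;
        }
        if (nAligned == 0) return new double[] {0.0, 0.0};
        // Each unaligned monomer in either chain is charged one gap penalty.
        int gapped = (seq.length() - nAligned) + (fold.length() - nAligned);
        double withoutGaps = pairSum / nAligned;
        double withGaps = (pairSum - gapPenalty * gapped) / nAligned;
        return new double[] {withoutGaps, withGaps};
    }
}
```

Aligning ACDE to ACE with D unaligned gives three identical pairs worth 6 in total: 6 / 3 = 2.0 without gaps, and (6 - 1.5) / 3 = 1.5 with a gap penalty of 1.5 for the single unaligned residue.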


shiftToChar

public static final char shiftToChar(int shift)
Convert an alignment shift to an ASCII character for printing: for example, 0 gives '|', +1 gives '\', and -1 gives '/'.


printRelative

public final void printRelative(Alignment b,
                                Printf outfile)
                         throws java.io.IOException
prints out both sequences with alternating seq and fold lines, and also shows their relation to another alignment (e.g. the correct alignment).

Throws:
java.io.IOException

toCharArrays

public final char[][] toCharArrays()
Saves alignment in 2 char[] arrays (sequence and fold), so it can be printed. The first dimension is always 2 (0=seq, 1=fold), and the second is the length of the alignment, including gaps.
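A sketch of how the two gapped rows can be built from the seq-to-fold array, assuming '-' as the gap character and that aligned fold indices increase along the sequence (no crossings). Placing unaligned fold residues before the next aligned pair is a layout choice the library may not share; CharArraysSketch is hypothetical.

```java
// Hypothetical sketch of toCharArrays(): row 0 = seq, row 1 = fold.
class CharArraysSketch {
    static char[][] toCharArrays(int[] sf, String seq, String fold) {
        StringBuilder s = new StringBuilder(), f = new StringBuilder();
        int j = 0; // next fold monomer not yet written
        for (int i = 0; i < sf.length; i++) {
            if (sf[i] < 0) { // seq residue opposite a gap
                s.append(seq.charAt(i));
                f.append('-');
                continue;
            }
            if (sf[i] < j) throw new IllegalArgumentException("crossing alignment at " + i);
            while (j < sf[i]) { // skipped fold residues opposite gaps
                s.append('-');
                f.append(fold.charAt(j++));
            }
            s.append(seq.charAt(i)); // the aligned pair itself
            f.append(fold.charAt(j++));
        }
        while (j < fold.length()) { // trailing fold residues
            s.append('-');
            f.append(fold.charAt(j++));
        }
        return new char[][] {s.toString().toCharArray(), f.toString().toCharArray()};
    }
}
```

For seq ACDE, fold AGCE, and sf = {0, 2, -1, 3}, the rows come out as A-CDE over AGC-E, both of length 5.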


print

public final void print(Printf outfile,
                        boolean showgaps)
                 throws java.io.IOException
prints out both sequences with alternating seq and fold lines.

Throws:
java.io.IOException

globalAlign

public final double globalAlign(AlignmentParameters ap,
                                Alignment nail)
Do the global alignment, returning the score (which is also stored in lastScore).


globalAlign

public final double globalAlign(AlignmentParameters ap)
Do the global alignment, returning the score (which is also stored in lastScore).
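The scoring that globalAlign takes from AlignmentParameters is not documented here, and neither is the role of the nail alignment. As a rough illustration only, this is a textbook Needleman-Wunsch global alignment with a hypothetical match score of +2, mismatch -1, and linear gap penalty -2. It fills a seq-to-fold array and returns the optimal score, which plays the role of lastScore; GlobalAlignSketch is hypothetical.

```java
// Hypothetical Needleman-Wunsch sketch; scores are assumptions, not AlignmentParameters.
class GlobalAlignSketch {
    static final int MATCH = 2, MISMATCH = -1, GAP = -2;

    static int score(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    // Fills sf (length seq.length()) with fold indices or -1; returns the optimal score.
    static int align(String seq, String fold, int[] sf) {
        int n = seq.length(), m = fold.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) d[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = d[i - 1][j - 1] + score(seq.charAt(i - 1), fold.charAt(j - 1));
                d[i][j] = Math.max(diag, Math.max(d[i - 1][j] + GAP, d[i][j - 1] + GAP));
            }
        }
        // Trace back from the bottom-right corner, recording aligned pairs.
        java.util.Arrays.fill(sf, -1);
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (d[i][j] == d[i - 1][j - 1] + score(seq.charAt(i - 1), fold.charAt(j - 1))) {
                sf[i - 1] = j - 1;
                i--;
                j--;
            } else if (d[i][j] == d[i - 1][j] + GAP) {
                i--; // seq residue opposite a gap
            } else {
                j--; // fold residue opposite a gap
            }
        }
        return d[n][m];
    }
}
```

Aligning ACDE to ACE puts D opposite a gap: three matches (+6) and one gap (-2) give a score of 4 and sf = {0, 1, -1, 2}.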


makeFromSuperposition

public static final Alignment makeFromSuperposition(Protein seq,
                                                    Protein fold)
Makes an alignment from 2 superimposed proteins, using JDB's method. 1) Make a table of every residue in (fold) that is at most LONGEST_DISTANCE_CUTOFF angstroms from each residue in (seq). 2) Take the longest consecutive stretch, set it as aligned, and remove those residues from further consideration. 3) Repeat until no remaining residue has any matches. Note: the result usually will not be a legal Needleman-Wunsch alignment, but it will not have any loopbacks.
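A sketch of the greedy loop in steps 2 and 3, starting from a precomputed table close[i][j] that is assumed to be true when seq residue i and fold residue j have CA atoms within LONGEST_DISTANCE_CUTOFF. It reads "consecutive stretch" as a diagonal run (i, j), (i+1, j+1), and so on, and breaks ties by taking the first run found. Those readings and the class name are assumptions, and the sketch does not reproduce whatever the library does to rule out loopbacks.

```java
// Hypothetical greedy-runs sketch; close[i][j] stands in for the distance table.
class SuperpositionSketch {
    static int[] greedyRuns(boolean[][] close) {
        int n = close.length, m = n == 0 ? 0 : close[0].length;
        int[] sf = new int[n];
        java.util.Arrays.fill(sf, -1);
        boolean[] foldUsed = new boolean[m];
        while (true) {
            int bestLen = 0, bestI = -1, bestJ = -1;
            // Find the longest diagonal run of close, still-unused residue pairs.
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    int len = 0;
                    while (i + len < n && j + len < m && sf[i + len] < 0
                            && !foldUsed[j + len] && close[i + len][j + len]) len++;
                    if (len > bestLen) {
                        bestLen = len;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (bestLen == 0) break; // nothing left with any match
            // Align the run and remove its residues from further consideration.
            for (int k = 0; k < bestLen; k++) {
                sf[bestI + k] = bestJ + k;
                foldUsed[bestJ + k] = true;
            }
        }
        return sf;
    }
}
```

With close pairs (0,0), (1,1), (2,2), and (3,4), the first pass takes the 3-residue run and the second takes the lone pair, giving {0, 1, 2, 4}.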


makeFromSuperpositionDP

public static final Alignment makeFromSuperpositionDP(Protein seq,
                                                      Protein fold)
Make from a superposition, using dynamic programming.


filterByDistance

public final void filterByDistance(double cutoff)
Check all aligned pairs of residues; remove any with CA-CA distance greater than cutoff. Warning: structures must already be superimposed!
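A sketch of the distance filter on raw coordinates, assuming CA positions are n-by-3 double arrays already in a common frame. FilterDistanceSketch is hypothetical; the library reads the coordinates from its Protein objects.

```java
// Hypothetical sketch of filterByDistance(cutoff) on raw CA coordinates.
class FilterDistanceSketch {
    // Euclidean distance between two 3-D points.
    static double dist(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Unaligns every pair whose CA-CA distance exceeds cutoff.
    static void filterByDistance(int[] sf, double[][] seqCA, double[][] foldCA, double cutoff) {
        for (int i = 0; i < sf.length; i++) {
            int j = sf[i];
            if (j >= 0 && dist(seqCA[i], foldCA[j]) > cutoff) sf[i] = -1;
        }
    }
}
```

With the 6 Å default, a pair 3 Å apart is kept and a pair 7 Å apart is unaligned.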


filterByLength

public final void filterByLength(int cutoff)
Check all aligned pairs of residues; remove any stretches of alignment shorter than cutoff.
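A sketch of the length filter, reading a "stretch" as a maximal run of consecutive seq monomers aligned to consecutive fold monomers (sf[i+1] == sf[i] + 1). That definition and the class name are assumptions.

```java
// Hypothetical sketch of filterByLength(cutoff) on a raw seq-to-fold array.
class FilterLengthSketch {
    static void filterByLength(int[] sf, int cutoff) {
        int i = 0;
        while (i < sf.length) {
            if (sf[i] < 0) {
                i++;
                continue;
            }
            // Extend the run while the fold index keeps increasing by exactly 1.
            int end = i + 1;
            while (end < sf.length && sf[end] == sf[end - 1] + 1) end++;
            if (end - i < cutoff) {
                for (int k = i; k < end; k++) sf[k] = -1; // run too short: unalign it
            }
            i = end;
        }
    }
}
```

With the default cutoff of 2, sf = {0, 1, -1, 5, 7, 8, 9} keeps the runs 0-1 and 7-9 but drops the isolated pair at position 3, giving {0, 1, -1, -1, 7, 8, 9}.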


filterByDistance

public final void filterByDistance()
Removes aligned pairs whose CA-CA distance exceeds the default cutoff (LONGEST_DISTANCE_CUTOFF).


filterByLength

public final void filterByLength()
Removes stretches of alignment shorter than the default cutoff (SHORTEST_ALIGNMENT_LENGTH).


saveOld

public final void saveOld(java.lang.String filename)
                   throws java.io.IOException
Saves alignment to a file in old format.

Throws:
java.io.IOException

saveCASP

public final void saveCASP(java.lang.String filename)
                    throws java.io.IOException
Save alignment in CASP format, creating new file.

Throws:
java.io.IOException

saveCASP

public final void saveCASP(Printf outfile)
                    throws java.io.IOException
Save alignment in CASP format, to open file.

Throws:
java.io.IOException

printModeller

public final void printModeller(Printf outfile)
                         throws java.io.IOException
Prints both sequences in a format Modeller can read.

Throws:
java.io.IOException

stats

public final void stats(Printf outfile)
                 throws java.io.IOException
Prints statistics on the alignment. This is intended for debugging only; you shouldn't use it.

Throws:
java.io.IOException