Cigar Class Reference

Classes

struct  CigarOperator

Public Types

enum  Operation {
  none, match, mismatch, insert,
  del, skip, softClip, hardClip,
  pad
}

Public Member Functions

void getCigarString (String &cigarString) const
void getCigarString (std::string &cigarString) const
void getExpandedString (std::string &s) const
 Obtain a non-run-length-encoded string of operations.
const CigarOperator & operator[] (int i) const
const CigarOperator & getOperator (int i) const
bool operator== (Cigar &rhs) const
int size () const
void Dump () const
int getExpectedQueryBaseCount () const
 Return the length of the read that corresponds to the current CIGAR string.
int getExpectedReferenceBaseCount () const
 Return the number of bases in the reference that this read "spans".
int getNumBeginClips () const
 Return the number of clips that are at the beginning of the cigar.
int getNumEndClips () const
 Return the number of clips that are at the end of the cigar.
int32_t getRefOffset (int32_t queryIndex)
int32_t getQueryIndex (int32_t refOffset)
int32_t getRefPosition (int32_t queryIndex, int32_t queryStartPos)
int32_t getQueryIndex (int32_t refPosition, int32_t queryStartPos)
uint32_t getNumOverlaps (int32_t start, int32_t end, int32_t queryStartPos)

Static Public Member Functions

static bool foundInQuery (Operation op)
static bool isClip (Operation op)

Static Public Attributes

static const int32_t INDEX_NA = -1

Protected Member Functions

void clearQueryAndReferenceIndexes ()
void setQueryAndReferenceIndexes ()

Protected Attributes

std::vector< CigarOperator > cigarOperations

Friends

std::ostream & operator<< (std::ostream &stream, const Cigar &cigar)

Detailed Description

Definition at line 80 of file Cigar.h.


Member Function Documentation

void Cigar::getExpandedString ( std::string &  s  )  const

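Obtain a non-run-length-encoded string of operations.

Each operation character is repeated once per position, so the result contains no digits, only the operation characters themselves (for example, 3M2D3M expands to MMMDDMMM). The expanded string is itself a valid CIGAR string, and in theory it makes some reads easier to parse.

Parameters: s  the string to populate; any existing contents are replaced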

Definition at line 63 of file Cigar.cpp.

{
    s.clear();

    // Progressively append the character representation of each operation
    // to the string passed in, once per position that the operation covers.
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); ++i)
    {
        for (uint32_t j = 0; j < i->count; j++)
        {
            s += i->getChar();
        }
    }
}
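
The expansion can be illustrated outside the class with a minimal sketch. RunSketch and expandCigarSketch are hypothetical names invented for this example and are not part of Cigar.h; the sketch assumes each run is already available as a character plus a count.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for Cigar::CigarOperator: one run of a single operation.
struct RunSketch
{
    char          op;     // CIGAR character, e.g. 'M', 'I', 'D'
    std::uint32_t count;  // run length
};

// Expand run-length encoded operations into one character per position.
std::string expandCigarSketch(const std::vector<RunSketch>& runs)
{
    std::string expanded;
    for (const RunSketch& run : runs)
    {
        // Append `count` copies of the operation character.
        expanded.append(run.count, run.op);
    }
    return expanded;
}
```

Under these assumptions, the runs for 3M2D3M expand to MMMDDMMM, and an empty list of runs yields an empty string.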

int Cigar::getExpectedQueryBaseCount (  )  const

Return the length of the read that corresponds to the current CIGAR string.

For validation, a sequence read in a SAM file should be the same length as the value returned by this method. Only operations that are present in the query sequence are counted: match, mismatch, insert, and soft clip.

Example: 3M2D3M describes a read with three bases aligned to the reference (match/mismatch), then a deletion that skips 2 reference bases, then three more aligned bases. The deleted bases are not in the read, so the expected read length is 6.

Example: 3M2I3M describes a read with 3 match/mismatch bases, two inserted bases that are not in the reference, and then 3 more match/mismatch bases. The expected read length in this example is 8.

Returns: the expected read length

Definition at line 110 of file Cigar.cpp.

{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); ++i)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case softClip:
            case insert:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the query sequence.
                break;
        }
    }
    return matchCount;
}
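
The two examples above can be checked with a small standalone sketch. QueryOpSketch, QueryRunSketch, and expectedQueryBasesSketch are hypothetical names used only for illustration; the enum mirrors the Operation values listed in this reference, and the counting rule follows the switch statement shown above.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the Operation values listed in this reference (minus "none").
enum class QueryOpSketch { match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for one run-length encoded CIGAR operation.
struct QueryRunSketch
{
    QueryOpSketch op;
    std::uint32_t count;
};

// Sum the run lengths of every operation that consumes bases in the read.
int expectedQueryBasesSketch(const std::vector<QueryRunSketch>& runs)
{
    int total = 0;
    for (const QueryRunSketch& run : runs)
    {
        switch (run.op)
        {
            case QueryOpSketch::match:
            case QueryOpSketch::mismatch:
            case QueryOpSketch::insert:
            case QueryOpSketch::softClip:
                total += static_cast<int>(run.count);
                break;
            default:
                break;  // del, skip, hardClip, and pad are absent from the read
        }
    }
    return total;
}
```

With these definitions, the runs for 3M2D3M sum to 6 and the runs for 3M2I3M sum to 8, matching the examples in the description.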

int Cigar::getExpectedReferenceBaseCount (  )  const

Return the number of bases in the reference that this read "spans".

When doing range checking, we occasionally need to know how many reference bases the CIGAR string covers in total. Only operations that are present in the reference sequence are counted: match, mismatch, deletion, and skip.

Examples: 3M2D3M describes a read that overlays 8 bases in the reference. 3M2I3M describes a read with 3 bases aligned to the reference, two additional bases that are not in the reference, and 3 more bases aligned to the reference, so it spans 6 bases in the reference.

Returns: the number of reference bases spanned by the current CIGAR string

Definition at line 149 of file Cigar.cpp.

{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); ++i)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case del:
            case skip:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the reference sequence.
                break;
        }
    }
    return matchCount;
}
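
As a rough illustration of the range-checking use case, the sketch below computes the reference span and from it the last reference position an alignment covers. All names (RefOpSketch, RefRunSketch, referenceSpanSketch, lastReferencePositionSketch) are hypothetical and not part of the library; the last-position helper additionally assumes an inclusive coordinate convention and a span of at least one base.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the Operation values listed in this reference (minus "none").
enum class RefOpSketch { match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for one run-length encoded CIGAR operation.
struct RefRunSketch
{
    RefOpSketch   op;
    std::uint32_t count;
};

// Sum the run lengths of every operation that consumes bases in the reference.
int referenceSpanSketch(const std::vector<RefRunSketch>& runs)
{
    int total = 0;
    for (const RefRunSketch& run : runs)
    {
        switch (run.op)
        {
            case RefOpSketch::match:
            case RefOpSketch::mismatch:
            case RefOpSketch::del:
            case RefOpSketch::skip:
                total += static_cast<int>(run.count);
                break;
            default:
                break;  // insert, clips, and pad do not advance along the reference
        }
    }
    return total;
}

// Inclusive last reference position covered by an alignment that starts at
// startPos. Assumes the span is at least 1.
std::int32_t lastReferencePositionSketch(std::int32_t startPos,
                                         const std::vector<RefRunSketch>& runs)
{
    return startPos + referenceSpanSketch(runs) - 1;
}
```

Under these assumptions, 3M2D3M starting at position 100 covers positions 100 through 107, while 3M2I3M starting at the same position covers 100 through 105.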


The documentation for this class was generated from the following files: Cigar.h, Cigar.cpp
Generated on Wed Nov 17 15:38:30 2010 for StatGen Software by  doxygen 1.6.3