Classes | |
struct | CigarOperator |
Public Types | |
enum | Operation { none, match, mismatch, insert, del, skip, softClip, hardClip, pad } |
Public Member Functions | |
void | getCigarString (String &cigarString) const |
void | getCigarString (std::string &cigarString) const |
void | getExpandedString (std::string &s) const |
obtain a non-run length encoded string of operations | |
const CigarOperator & | operator[] (int i) const |
const CigarOperator & | getOperator (int i) const |
bool | operator== (Cigar &rhs) const |
int | size () const |
void | Dump () const |
int | getExpectedQueryBaseCount () const |
return the length of the read that corresponds to the current CIGAR string. | |
int | getExpectedReferenceBaseCount () const |
return the number of bases in the reference that this read "spans" | |
int | getNumBeginClips () const |
Return the number of clips that are at the beginning of the cigar. | |
int | getNumEndClips () const |
Return the number of clips that are at the end of the cigar. | |
int32_t | getRefOffset (int32_t queryIndex) |
int32_t | getQueryIndex (int32_t refOffset) |
int32_t | getRefPosition (int32_t queryIndex, int32_t queryStartPos) |
int32_t | getQueryIndex (int32_t refPosition, int32_t queryStartPos) |
uint32_t | getNumOverlaps (int32_t start, int32_t end, int32_t queryStartPos) |
Static Public Member Functions | |
static bool | foundInQuery (Operation op) |
static bool | isClip (Operation op) |
Static Public Attributes | |
static const int32_t | INDEX_NA = -1 |
Protected Member Functions | |
void | clearQueryAndReferenceIndexes () |
void | setQueryAndReferenceIndexes () |
Protected Attributes | |
std::vector< CigarOperator > | cigarOperations |
Friends | |
std::ostream & | operator<< (std::ostream &stream, const Cigar &cigar) |
Definition at line 80 of file Cigar.h.
void Cigar::getExpandedString | ( | std::string & | s | ) | const |
obtain a non-run length encoded string of operations
The returned string is itself a valid CIGAR string, but it contains no digits: each operation character is simply repeated as many times as its count. This can make position-by-position parsing of a read easier.
Parameters: s the string to populate
Definition at line 63 of file Cigar.cpp.
{
    s = "";

    std::vector<CigarOperator>::const_iterator i;

    // Progressively append the character representations of the operations to
    // the string passed in

    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        for (uint32_t j = 0; j<(*i).count; j++) s += (*i).getChar();
    }
    return;
}
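To illustrate the expansion described above, here is a minimal standalone sketch that works on the text form of a CIGAR string instead of the Cigar class. The function name expandCigar is hypothetical and is not part of Cigar.h; it only mirrors the idea of emitting each operation character once per count.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of Cigar.h: expands a run-length encoded
// CIGAR string such as "3M2D3M" into one operation character per unit.
std::string expandCigar(const std::string &cigar)
{
    std::string expanded;
    std::string::size_type count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // Accumulate the (possibly multi-digit) run length.
            count = count * 10 + static_cast<std::string::size_type>(c - '0');
        }
        else
        {
            // An operation character closes the run: emit it count times.
            expanded.append(count, c);
            count = 0;
        }
    }
    // A well-formed CIGAR string never ends with a dangling run length.
    assert(count == 0);
    return expanded;
}
```

With this sketch, "3M2D3M" expands to "MMMDDMMM", which contains no digits and has one character per operation unit.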
int Cigar::getExpectedQueryBaseCount | ( | ) | const |
return the length of the read that corresponds to the current CIGAR string.
For validation, we should expect that a sequence read in a SAM file will be the same length as the value returned by this method.
Example: 3M2D3M describes a read with three match/mismatch bases, then a deletion that skips 2 bases of the reference, then three more match/mismatch bases. In this case, the read length is expected to be 6.
Example: 3M2I3M describes a read with 3 match/mismatch bases, two extra (inserted) bases, and then 3 more match/mismatch bases. The total in this example is 8 bases.
Returns: the expected read length
Definition at line 110 of file Cigar.cpp.
{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case softClip:
            case insert:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the query sequence.
                break;
        }
    }
    return matchCount;
}
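The two examples above can be checked with a small standalone sketch that counts query bases directly from CIGAR text. The function name expectedQueryBaseCount is hypothetical and not part of Cigar.h. It assumes the SAM operation characters M, =, X, I and S for the operations that consume bases of the read, which corresponds to the match, mismatch, insert and softClip cases.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of Cigar.h: sums the counts of the
// operations that consume bases of the query (read) sequence.
int expectedQueryBaseCount(const std::string &cigar)
{
    int total = 0;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // Accumulate the (possibly multi-digit) run length.
            count = count * 10 + (c - '0');
            continue;
        }
        switch (c)
        {
            case 'M': // match or mismatch
            case '=': // sequence match
            case 'X': // sequence mismatch
            case 'I': // insertion relative to the reference
            case 'S': // soft clip (bases still present in the read)
                total += count;
                break;
            default:
                // D, N, H and P do not consume query bases.
                break;
        }
        count = 0;
    }
    // A well-formed CIGAR string never ends with a dangling run length.
    assert(count == 0);
    return total;
}
```

For 3M2D3M this yields 6 and for 3M2I3M it yields 8, matching the read lengths given in the examples.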
int Cigar::getExpectedReferenceBaseCount | ( | ) | const |
return the number of bases in the reference that this read "spans"
When doing range checking, we occasionally need to know how many bases of the reference the CIGAR string covers in total.
Examples: 3M2D3M describes a read that overlays 8 bases in the reference. 3M2I3M describes a read with 3 bases that match the reference, two additional bases that aren't in the reference, and 3 more bases that match the reference, so it spans 6 bases in the reference.
Returns: how many bases in the reference are spanned by the given CIGAR string
Definition at line 149 of file Cigar.cpp.
{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case del:
            case skip:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the reference sequence.
                break;
        }
    }
    return matchCount;
}
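The reference span can be sketched the same way, working on CIGAR text instead of the Cigar class. The function name expectedReferenceBaseCount is hypothetical and not part of Cigar.h. It assumes the SAM operation characters M, =, X, D and N for the operations that consume reference bases, which corresponds to the match, mismatch, del and skip cases.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of Cigar.h: sums the counts of the
// operations that consume bases of the reference sequence.
int expectedReferenceBaseCount(const std::string &cigar)
{
    int total = 0;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // Accumulate the (possibly multi-digit) run length.
            count = count * 10 + (c - '0');
            continue;
        }
        switch (c)
        {
            case 'M': // match or mismatch
            case '=': // sequence match
            case 'X': // sequence mismatch
            case 'D': // deletion from the reference
            case 'N': // skipped region of the reference
                total += count;
                break;
            default:
                // I, S, H and P do not consume reference bases.
                break;
        }
        count = 0;
    }
    // A well-formed CIGAR string never ends with a dangling run length.
    assert(count == 0);
    return total;
}
```

For 3M2D3M this yields 8 and for 3M2I3M it yields 6, matching the spans given in the examples.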