This class represents the CIGAR without any methods to set the cigar (see CigarRoller for that).
#include <Cigar.h>
Classes
struct CigarOperator
Public Types
enum Operation { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad }
Enum for the cigar operations.
Public Member Functions
Cigar ()
Default constructor; initializes a CIGAR with no operations.
void getCigarString (String &cigarString) const
Set the passed-in String to the string representation of the Cigar operations in this object.
void getCigarString (std::string &cigarString) const
Set the passed-in std::string to the string representation of the Cigar operations in this object.
void getExpandedString (std::string &s) const
Set the specified string to the CIGAR operation characters with no digits, one character per base (a CIGAR of "3M" yields "MMM").
const CigarOperator & operator[] (int i) const
Return the Cigar operation at the specified index (starting at 0).
const CigarOperator & getOperator (int i) const
Return the Cigar operation at the specified index (starting at 0).
bool operator== (Cigar &rhs) const
Return true if the two Cigars are the same (the same operations with the same sizes).
int size () const
Return the number of cigar operations.
void Dump () const
Write this object as a string to cout.
int getExpectedQueryBaseCount () const
Return the length of the read that corresponds to the current CIGAR string.
int getExpectedReferenceBaseCount () const
Return the number of bases in the reference that this CIGAR "spans".
int getNumBeginClips () const
Return the number of clips that are at the beginning of the cigar.
int getNumEndClips () const
Return the number of clips that are at the end of the cigar.
int32_t getRefOffset (int32_t queryIndex)
Return the reference offset associated with the specified query index, or INDEX_NA if there is none, based on this cigar.
int32_t getQueryIndex (int32_t refOffset)
Return the query index associated with the specified reference offset, or INDEX_NA if there is none, based on this cigar.
int32_t getRefPosition (int32_t queryIndex, int32_t queryStartPos)
Return the reference position associated with the specified query index, or INDEX_NA if there is none, based on this cigar and queryStartPos (the leftmost mapping position of the first matching base in the query).
int32_t getQueryIndex (int32_t refPosition, int32_t queryStartPos)
Return the query index associated with the specified reference position, or INDEX_NA if there is none, when the query starts at reference position queryStartPos.
uint32_t getNumOverlaps (int32_t start, int32_t end, int32_t queryStartPos)
Return the number of bases in the read associated with this cigar that align to the reference within the specified region.
bool hasIndel ()
Return whether or not the cigar has indels (insertions or deletions).
Static Public Member Functions
static bool foundInQuery (Operation op)
Return true if the specified operation is found in the query sequence, false if not.
static bool foundInQuery (CigarOperator op)
Return true if the specified operation is found in the query sequence, false if not.
static bool isClip (Operation op)
Return true if the specified operation is a clipping operation, false if not.
static bool isClip (CigarOperator op)
Return true if the specified operation is a clipping operation, false if not.
static bool isMatchOrMismatch (Operation op)
Return true if the specified operation is a match/mismatch operation, false if not.
static bool isMatchOrMismatch (CigarOperator op)
Return true if the specified operation is a match/mismatch operation, false if not.
Static Public Attributes
static const int MAX_OP_VALUE = pad
static const int32_t INDEX_NA = -1
Value associated with an index that is not applicable/does not exist, used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
Protected Member Functions
void clearQueryAndReferenceIndexes ()
void setQueryAndReferenceIndexes ()
Protected Attributes
std::vector< CigarOperator > cigarOperations
Friends
std::ostream & operator<< (std::ostream &stream, const Cigar &cigar)
Writes all of the cigar operations contained in the cigar to the passed-in stream.
This class represents the CIGAR. It contains methods for converting to strings and extracting information from the cigar on how a read maps to the reference.
It contains only read-only methods; there is no way to set values through this class. To set values, use a child class (such as CigarRoller).
Definition at line 83 of file Cigar.h.
enum Cigar::Operation
Enum for the cigar operations.
Definition at line 87 of file Cigar.h.
```cpp
enum Operation
{
    none = 0,  ///< no operation has been set.
    match,     ///< match/mismatch operation. Associated with CIGAR Operation "M"
    mismatch,  ///< mismatch operation. Associated with CIGAR Operation "M"
    insert,    ///< insertion to the reference (the query sequence contains bases that have no corresponding base in the reference). Associated with CIGAR Operation "I"
    del,       ///< deletion from the reference (the reference contains bases that have no corresponding base in the query sequence). Associated with CIGAR Operation "D"
    skip,      ///< skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence). Associated with CIGAR Operation "N"
    softClip,  ///< Soft clip on the read (clipped sequence present in the query sequence, but not in reference). Associated with CIGAR Operation "S"
    hardClip,  ///< Hard clip on the read (clipped sequence not present in the query sequence or reference). Associated with CIGAR Operation "H"
    pad        ///< Padding (not in reference or query). Associated with CIGAR Operation "P"
};
```
static bool Cigar::foundInQuery (CigarOperator op) [inline, static]
static bool Cigar::foundInQuery (Operation op) [inline, static]
Return true if the specified operation is found in the query sequence, false if not.
Definition at line 177 of file Cigar.h.
References insert, match, mismatch, and softClip.
Referenced by SamRecord::shiftIndelsLeft(), and SamFilter::softClip().
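The inline body of foundInQuery is not shown on this page. As a rough sketch only, the predicate can be written as a switch that returns true for exactly the four operations listed under References. The `Op` enum class below is a hypothetical local copy of the Operation values, not the library's declaration.

```cpp
// Hypothetical local copy of the Cigar::Operation values, for illustration only.
enum class Op { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Sketch of foundInQuery: true for operations whose bases appear in the
// query (read) sequence.
constexpr bool foundInQuery(Op op)
{
    switch (op)
    {
        case Op::match:
        case Op::mismatch:
        case Op::insert:
        case Op::softClip:
            return true;
        default:
            return false; // none, del, skip, hardClip, pad
    }
}
```

Hard clips return false here because, per the enum documentation, hard-clipped sequence is not present in the query sequence.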
void Cigar::getCigarString (std::string &cigarString) const
Set the passed-in std::string to the string representation of the Cigar operations in this object.
Definition at line 36 of file Cigar.cpp.
```cpp
void Cigar::getCigarString(std::string &cigarString) const
{
    // Brings in the operator<< overloads that append to a std::string.
    using namespace STLUtilities;

    std::vector<CigarOperator>::const_iterator i;

    cigarString.clear(); // clear result string

    // Progressively append the character representations of the operations to
    // the cigar string.
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        cigarString << (*i).count << (*i).getChar();
    }
}
```
void Cigar::getCigarString (String &cigarString) const
Set the passed-in String to the string representation of the Cigar operations in this object.
Definition at line 52 of file Cigar.cpp.
Referenced by Dump(), and SamRecord::setCigar().
```cpp
void Cigar::getCigarString(String &cigarString) const
{
    std::string cigar;

    // Build the std::string version, then copy it into the String.
    getCigarString(cigar);

    cigarString = cigar.c_str();

    return;
}
```
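getCigarString writes each operation as its count followed by its character. The library appends to the std::string through the `<<` overloads brought in by `using namespace STLUtilities`; the sketch below does the same thing portably with `std::ostringstream`. `OpRun` and `toCigarString` are hypothetical names used only for illustration, and each operation is stored directly as its CIGAR character rather than as an Operation value.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for CigarOperator: a run length plus the CIGAR character.
struct OpRun
{
    uint32_t count;
    char op; // one of M, I, D, N, S, H, P
};

// Build the run-length CIGAR string, e.g. {3,'M'},{2,'I'} -> "3M2I".
std::string toCigarString(const std::vector<OpRun>& ops)
{
    std::ostringstream out;
    for (const OpRun& run : ops)
    {
        out << run.count << run.op; // count as decimal digits, then the op character
    }
    return out.str();
}
```

As with the method above, an empty list of operations produces an empty string.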
void Cigar::getExpandedString (std::string &s) const
Set the specified string to the CIGAR operation characters with no digits, one character per base (a CIGAR of "3M" yields "MMM").
Because each character stands for a single base of its operation, this form can make some reads easier to parse.
Definition at line 63 of file Cigar.cpp.
```cpp
void Cigar::getExpandedString(std::string &s) const
{
    s = "";

    std::vector<CigarOperator>::const_iterator i;

    // Progressively append the character representations of the operations to
    // the string passed in, repeating each character once per base.
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        for (uint32_t j = 0; j < (*i).count; j++) s += (*i).getChar();
    }
    return;
}
```
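The expansion amounts to repeating each operation character `count` times. A minimal standalone sketch, assuming the operations are given as hypothetical `(count, character)` pairs instead of CigarOperator objects, can use `std::string::append(count, ch)` in place of the inner loop:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Expand (count, op) runs into one character per base, e.g. 3M1D -> "MMMD".
// The pair representation is a hypothetical stand-in for CigarOperator.
std::string expandCigar(const std::vector<std::pair<uint32_t, char>>& ops)
{
    std::string s;
    for (const auto& [count, op] : ops)
    {
        s.append(count, op); // append 'op' repeated 'count' times
    }
    return s;
}
```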
int Cigar::getExpectedQueryBaseCount () const
Return the length of the read that corresponds to the current CIGAR string.
For validation, we should expect that a sequence read in a SAM file will be the same length as the value returned by this method.
Example: 3M2D3M describes a read with three bases matching the reference, then a 2-base deletion from the reference, then three more bases that match the reference (match/mismatch). In this case, the read length is expected to be 6.
Example: 3M2I3M describes a read with 3 match/mismatch bases, two inserted bases, and then 3 more match/mismatch bases. The total in this example is 8 bases.
Definition at line 95 of file Cigar.cpp.
References insert, match, mismatch, and softClip.
Referenced by SamValidator::isValidCigar(), and SamFilter::softClip().
```cpp
int Cigar::getExpectedQueryBaseCount() const
{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case softClip:
            case insert:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the query sequence.
                break;
        }
    }
    return matchCount;
}
```
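The two examples above can be checked with a small standalone parser. This is a sketch that works on a CIGAR string rather than on a Cigar object; `queryBaseCount` is a hypothetical helper and assumes a well-formed string. Like the method, it sums the M, I, and S lengths and ignores D, N, H, and P.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: sum the lengths of the operations that consume query
// bases (M, I, S), as getExpectedQueryBaseCount does.
int queryBaseCount(const std::string& cigar)
{
    int total = 0;
    int count = 0; // length of the operation currently being read
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0'); // accumulate multi-digit lengths
            continue;
        }
        if (c == 'M' || c == 'I' || c == 'S')
        {
            total += count;
        }
        count = 0; // an operation character ends the current length
    }
    return total;
}
```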
int Cigar::getExpectedReferenceBaseCount () const
Return the number of bases in the reference that this CIGAR "spans".
When doing range checking, we occasionally need to know how many total bases the CIGAR string represents as compared to the reference.
Examples: 3M2D3M describes a read that overlays 8 bases in the reference. 3M2I3M describes a read with 3 bases that match the reference, two additional bases that aren't in the reference, and 3 more bases that match the reference, so it spans 6 bases in the reference.
Definition at line 120 of file Cigar.cpp.
References del, match, mismatch, and skip.
Referenced by SamTags::createMDTag().
```cpp
int Cigar::getExpectedReferenceBaseCount() const
{
    int matchCount = 0;
    std::vector<CigarOperator>::const_iterator i;
    for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
    {
        switch (i->operation)
        {
            case match:
            case mismatch:
            case del:
            case skip:
                matchCount += i->count;
                break;
            default:
                // we only care about operations that are in the reference sequence.
                break;
        }
    }
    return matchCount;
}
```
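The reference span counts a different set of operations (M, D, N) than the query length. A minimal sketch on a CIGAR string, assuming well-formed input, can read `<count><op>` pairs with `std::istringstream`; `referenceBaseCount` is a hypothetical helper, not part of the library.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: sum the lengths of operations that consume reference
// bases (M, D, N), mirroring getExpectedReferenceBaseCount.
int referenceBaseCount(const std::string& cigar)
{
    std::istringstream in(cigar);
    int count = 0;
    char op = 0;
    int total = 0;
    while (in >> count >> op) // read "<count><op>" pairs until the input is exhausted
    {
        if (op == 'M' || op == 'D' || op == 'N')
        {
            total += count;
        }
    }
    return total;
}
```

For example, 3M2I3M spans 6 reference bases while its read length is 8.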
uint32_t Cigar::getNumOverlaps (int32_t start, int32_t end, int32_t queryStartPos)
Return the number of bases in the read associated with this cigar that align to the reference within the specified region.
start : inclusive 0-based start position (reference position) of the region to check for overlaps (-1 means start at the beginning of the reference).
end : exclusive 0-based end position (reference position) of the region to check for overlaps (-1 means go to the end of the reference).
queryStartPos : 0-based leftmost mapping position of the first matching base in the query.
Definition at line 260 of file Cigar.cpp.
References getRefOffset().
Referenced by SamRecord::getNumOverlaps().
```cpp
uint32_t Cigar::getNumOverlaps(int32_t start, int32_t end, int32_t queryStartPos)
{
    // Get the overlap info.
    if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
    {
        setQueryAndReferenceIndexes();
    }

    // Get the start and end offsets.
    int32_t startRefOffset = 0;
    // If the specified start is more than the queryStartPos, set
    // the startRefOffset to the appropriate non-zero value.
    // (if start is <= queryStartPos, then startRefOffset is 0; it should
    // not be set to a negative value.)
    if (start > queryStartPos)
    {
        startRefOffset = start - queryStartPos;
    }

    int32_t endRefOffset = end - queryStartPos;
    if (end == -1)
    {
        // -1 means that the region goes to the end of the reference.
        // So set endRefOffset to the max refOffset + 1 which is the
        // size of the refToQuery vector.
        endRefOffset = refToQuery.size();
    }

    // if endRefOffset is less than 0, then this read does not fall within
    // the specified region, so return 0.
    if (endRefOffset < 0)
    {
        return(0);
    }

    // Get the overlaps for these offsets.
    // Loop through the read counting positions that match the reference
    // within this region.
    int32_t refOffset = 0;
    int32_t numOverlaps = 0;
    for (unsigned int queryIndex = 0; queryIndex < queryToRef.size();
         queryIndex++)
    {
        refOffset = getRefOffset(queryIndex);
        if (refOffset > endRefOffset)
        {
            // Past the end of the specified region, so stop checking
            // for overlaps since there will be no more.
            break;
        }
        else if ((refOffset >= startRefOffset) && (refOffset < endRefOffset))
        {
            // within the region, increment the counter.
            ++numOverlaps;
        }
    }

    return(numOverlaps);
}
```
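getNumOverlaps relies on the queryToRef vector built by setQueryAndReferenceIndexes(), whose body is not shown on this page. The sketch below assumes one entry per query base holding its 0-based reference offset, or -1 (INDEX_NA) for inserted and soft-clipped bases, and counts aligned bases whose reference position falls in [start, end). The `Runs` alias, `buildQueryToRef`, and `numOverlaps` are hypothetical names.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical (count, op-character) runs standing in for CigarOperator.
using Runs = std::vector<std::pair<int32_t, char>>;

// Assumed queryToRef layout: one entry per query base, holding its reference
// offset, or -1 when the base is not aligned (insertions and soft clips).
std::vector<int32_t> buildQueryToRef(const Runs& runs)
{
    std::vector<int32_t> queryToRef;
    int32_t refOffset = 0;
    for (const auto& [count, op] : runs)
    {
        for (int32_t i = 0; i < count; ++i)
        {
            if (op == 'M') queryToRef.push_back(refOffset++);
            else if (op == 'I' || op == 'S') queryToRef.push_back(-1);
            else if (op == 'D' || op == 'N') ++refOffset;
            // H and P consume neither query nor reference bases.
        }
    }
    return queryToRef;
}

// Count query bases aligned to reference positions in [start, end).
// start == -1 and end == -1 leave that side of the region unbounded.
uint32_t numOverlaps(const Runs& runs, int32_t start, int32_t end, int32_t queryStartPos)
{
    uint32_t overlaps = 0;
    for (int32_t refOffset : buildQueryToRef(runs))
    {
        if (refOffset < 0) continue; // not aligned to the reference
        const int32_t refPos = refOffset + queryStartPos;
        if (refPos >= start && (end == -1 || refPos < end)) ++overlaps;
    }
    return overlaps;
}
```

For 2S3M1D2M starting at position 100, the aligned bases sit at 100, 101, 102, 104, and 105, so the region [101, 104) contains two of them. Unlike this sketch, the method builds its index vectors once and reuses them.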
int32_t Cigar::getQueryIndex (int32_t refPosition, int32_t queryStartPos)
Return the query index associated with the specified reference position, or INDEX_NA if there is none, when the query starts at reference position queryStartPos.
Definition at line 240 of file Cigar.cpp.
References INDEX_NA.
```cpp
int32_t Cigar::getQueryIndex(int32_t refPosition, int32_t queryStartPos)
{
    // If the vectors aren't set, set them.
    if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
    {
        setQueryAndReferenceIndexes();
    }

    // Convert the reference position to an offset from the query start.
    int32_t refOffset = refPosition - queryStartPos;
    if ((refOffset < 0) || ((uint32_t)refOffset >= refToQuery.size()))
    {
        return(INDEX_NA);
    }

    return(refToQuery[refOffset]);
}
```
int32_t Cigar::getQueryIndex (int32_t refOffset)
Return the query index associated with the specified reference offset, or INDEX_NA if there is none, based on this cigar.
Definition at line 202 of file Cigar.cpp.
References INDEX_NA.
Referenced by SamTags::createMDTag().
```cpp
int32_t Cigar::getQueryIndex(int32_t refOffset)
{
    // If the vectors aren't set, set them.
    if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
    {
        setQueryAndReferenceIndexes();
    }
    if ((refOffset < 0) || ((uint32_t)refOffset >= refToQuery.size()))
    {
        return(INDEX_NA);
    }
    return(refToQuery[refOffset]);
}
```
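Both getQueryIndex overloads index into refToQuery; the position overload first subtracts queryStartPos to turn a reference position into an offset. Since the code that fills refToQuery is not shown, the sketch below assumes one entry per spanned reference base, holding the aligned query index or INDEX_NA for deleted and skipped bases. `buildRefToQuery` and `queryIndexAt` are hypothetical helpers.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

constexpr int32_t INDEX_NA = -1; // same value as Cigar::INDEX_NA

// Assumed refToQuery layout: one entry per reference base spanned by the
// CIGAR, holding the aligned query index, or INDEX_NA for D/N bases.
std::vector<int32_t> buildRefToQuery(const std::vector<std::pair<int32_t, char>>& runs)
{
    std::vector<int32_t> refToQuery;
    int32_t queryIndex = 0;
    for (const auto& [count, op] : runs)
    {
        for (int32_t i = 0; i < count; ++i)
        {
            if (op == 'M') refToQuery.push_back(queryIndex++);
            else if (op == 'D' || op == 'N') refToQuery.push_back(INDEX_NA);
            else if (op == 'I' || op == 'S') ++queryIndex; // query-only bases
        }
    }
    return refToQuery;
}

// Mirrors getQueryIndex(refPosition, queryStartPos): convert the position
// to an offset, then bounds-check before indexing.
int32_t queryIndexAt(const std::vector<int32_t>& refToQuery,
                     int32_t refPosition, int32_t queryStartPos)
{
    const int32_t refOffset = refPosition - queryStartPos;
    if (refOffset < 0 || static_cast<uint32_t>(refOffset) >= refToQuery.size())
    {
        return INDEX_NA;
    }
    return refToQuery[refOffset];
}
```

For 2S3M1D2M, refToQuery comes out as {2, 3, 4, INDEX_NA, 5, 6}: the two soft-clipped bases shift the first aligned query index to 2, and the deleted reference base has no query counterpart.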
int32_t Cigar::getRefOffset (int32_t queryIndex)
Return the reference offset associated with the specified query index, or INDEX_NA if there is none, based on this cigar.
Definition at line 187 of file Cigar.cpp.
References INDEX_NA.
Referenced by SamQuerySeqWithRefIter::getNextMatchMismatch(), getNumOverlaps(), SamQuerySeqWithRef::seqWithEquals(), and SamQuerySeqWithRef::seqWithoutEquals().
```cpp
int32_t Cigar::getRefOffset(int32_t queryIndex)
{
    // If the vectors aren't set, set them.
    if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
    {
        setQueryAndReferenceIndexes();
    }
    if ((queryIndex < 0) || ((uint32_t)queryIndex >= queryToRef.size()))
    {
        return(INDEX_NA);
    }
    return(queryToRef[queryIndex]);
}
```
int32_t Cigar::getRefPosition (int32_t queryIndex, int32_t queryStartPos)
Return the reference position associated with the specified query index, or INDEX_NA if there is none, based on this cigar and queryStartPos (the leftmost mapping position of the first matching base in the query).
Definition at line 217 of file Cigar.cpp.
References INDEX_NA.
Referenced by SamFilter::softClip().
```cpp
int32_t Cigar::getRefPosition(int32_t queryIndex, int32_t queryStartPos)
{
    // If the vectors aren't set, set them.
    if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
    {
        setQueryAndReferenceIndexes();
    }
    if ((queryIndex < 0) || ((uint32_t)queryIndex >= queryToRef.size()))
    {
        return(INDEX_NA);
    }

    // Only bases aligned to the reference have a position; shift their
    // offset by the start position of the query.
    if (queryToRef[queryIndex] != INDEX_NA)
    {
        return(queryToRef[queryIndex] + queryStartPos);
    }
    return(INDEX_NA);
}
```
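getRefPosition is getRefOffset plus queryStartPos, except that INDEX_NA is passed through unchanged. The sketch below assumes the same queryToRef layout as elsewhere on this page (one entry per query base, INDEX_NA for inserted and soft-clipped bases); `buildQueryToRef` and `refPositionOf` are hypothetical helpers, not library functions.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

constexpr int32_t INDEX_NA = -1; // same value as Cigar::INDEX_NA

// Assumed queryToRef layout: one entry per query base holding its 0-based
// reference offset, or INDEX_NA for bases with no reference counterpart.
std::vector<int32_t> buildQueryToRef(const std::vector<std::pair<int32_t, char>>& runs)
{
    std::vector<int32_t> queryToRef;
    int32_t refOffset = 0;
    for (const auto& [count, op] : runs)
    {
        for (int32_t i = 0; i < count; ++i)
        {
            if (op == 'M') queryToRef.push_back(refOffset++);                 // aligned base
            else if (op == 'I' || op == 'S') queryToRef.push_back(INDEX_NA);  // query-only base
            else if (op == 'D' || op == 'N') ++refOffset;                     // reference-only base
            // H and P consume neither sequence.
        }
    }
    return queryToRef;
}

// Mirrors getRefPosition: bounds-check the index, then shift a valid
// reference offset by queryStartPos.
int32_t refPositionOf(const std::vector<int32_t>& queryToRef,
                      int32_t queryIndex, int32_t queryStartPos)
{
    if (queryIndex < 0 || static_cast<uint32_t>(queryIndex) >= queryToRef.size())
    {
        return INDEX_NA;
    }
    const int32_t refOffset = queryToRef[queryIndex];
    return (refOffset == INDEX_NA) ? INDEX_NA : refOffset + queryStartPos;
}
```

For 2S3M1D2M starting at 100, query index 2 (the first M base) maps to position 100 and query index 5 maps to 104, one past the deleted base.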
bool Cigar::hasIndel ()
Return whether or not the cigar has indels (insertions or deletions).
Definition at line 324 of file Cigar.cpp.
```cpp
bool Cigar::hasIndel()
{
    for (unsigned int i = 0; i < cigarOperations.size(); i++)
    {
        if ((cigarOperations[i].operation == insert) ||
            (cigarOperations[i].operation == del))
        {
            // Found an indel, so return true.
            return(true);
        }
    }
    // Went through all the operations, and found no indel, so return false.
    return(false);
}
```
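Only insert and del count as indels here; skipped regions (N) do not. On a CIGAR string the same check reduces to looking for an 'I' or 'D' character, since digits can never match. A minimal sketch, with `hasIndel` taking a string as a hypothetical free-function variant:

```cpp
#include <algorithm>
#include <string>

// Hypothetical string-based variant of hasIndel: true if any operation is an
// insertion ('I') or deletion ('D'). Skips ('N') are not treated as indels.
bool hasIndel(const std::string& cigar)
{
    return std::any_of(cigar.begin(), cigar.end(),
                       [](char c) { return c == 'I' || c == 'D'; });
}
```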
static bool Cigar::isClip (CigarOperator op) [inline, static]
static bool Cigar::isClip (Operation op) [inline, static]
Return true if the specified operation is a clipping operation, false if not.
Definition at line 211 of file Cigar.h.
References hardClip, and softClip.
Referenced by SamFilter::softClip().
static bool Cigar::isMatchOrMismatch (CigarOperator op) [inline, static]
static bool Cigar::isMatchOrMismatch (Operation op) [inline, static]
Return true if the specified operation is a match/mismatch operation, false if not.
Definition at line 241 of file Cigar.h.
References match, and mismatch.
Referenced by SamRecord::shiftIndelsLeft().
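The References lines for isClip (hardClip, softClip) and isMatchOrMismatch (match, mismatch) suggest two simple comparisons. The inline bodies are not shown on this page, so the following is only a sketch over a hypothetical local copy of the Operation enum:

```cpp
// Hypothetical local copy of the Cigar::Operation values, for illustration only.
enum class Op { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Sketch of isClip: soft and hard clips are the two clipping operations.
constexpr bool isClip(Op op)
{
    return op == Op::softClip || op == Op::hardClip;
}

// Sketch of isMatchOrMismatch: both values correspond to CIGAR "M".
constexpr bool isMatchOrMismatch(Op op)
{
    return op == Op::match || op == Op::mismatch;
}
```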
bool Cigar::operator== (Cigar &rhs) const
const int32_t Cigar::INDEX_NA = -1 [static]
Value associated with an index that is not applicable/does not exist, used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
Definition at line 409 of file Cigar.h.
Referenced by SamTags::createMDTag(), SamQuerySeqWithRefIter::getNextMatchMismatch(), getQueryIndex(), getRefOffset(), getRefPosition(), SamQuerySeqWithRef::seqWithEquals(), and SamQuerySeqWithRef::seqWithoutEquals().