Cigar Class Reference

This class represents the CIGAR without any methods to set the cigar (see CigarRoller for that).

#include <Cigar.h>



Classes

struct  CigarOperator

Public Types

enum  Operation {
  none = 0, match, mismatch, insert,
  del, skip, softClip, hardClip,
  pad
}

Enum for the cigar operations.

Public Member Functions

 Cigar ()
 Default constructor initializes as a CIGAR with no operations.
void getCigarString (String &cigarString) const
 Set the passed in String to the string representation of the Cigar operations in this object.
void getCigarString (std::string &cigarString) const
 Set the passed in std::string to the string representation of the Cigar operations in this object.
void getExpandedString (std::string &s) const
 Sets the specified string to a valid CIGAR string of characters that represent the cigar with no digits (a CIGAR of "3M" would return "MMM").
const CigarOperator & operator[] (int i) const
 Return the Cigar Operation at the specified index (starting at 0).
const CigarOperator & getOperator (int i) const
 Return the Cigar Operation at the specified index (starting at 0).
bool operator== (Cigar &rhs) const
 Return true if the 2 Cigars are the same (the same operations of the same sizes).
int size () const
 Return the number of cigar operations.
void Dump () const
 Write this object as a string to cout.
int getExpectedQueryBaseCount () const
 Return the length of the read that corresponds to the current CIGAR string.
int getExpectedReferenceBaseCount () const
 Return the number of bases in the reference that this CIGAR "spans".
int getNumBeginClips () const
 Return the number of clips that are at the beginning of the cigar.
int getNumEndClips () const
 Return the number of clips that are at the end of the cigar.
int32_t getRefOffset (int32_t queryIndex)
 Return the reference offset associated with the specified query index or INDEX_NA based on this cigar.
int32_t getQueryIndex (int32_t refOffset)
 Return the query index associated with the specified reference offset or INDEX_NA based on this cigar.
int32_t getRefPosition (int32_t queryIndex, int32_t queryStartPos)
 Return the reference position associated with the specified query index, or INDEX_NA, based on this cigar and the specified queryStartPos (the leftmost mapping position of the first matching base in the query).
int32_t getQueryIndex (int32_t refPosition, int32_t queryStartPos)
 Return the query index, or INDEX_NA, associated with the specified reference position when the first matching base of the query maps to reference position queryStartPos.
uint32_t getNumOverlaps (int32_t start, int32_t end, int32_t queryStartPos)
 Return the number of bases that overlap both the reference and the read associated with this cigar and that fall within the specified region.
bool hasIndel ()
 Return whether or not the cigar has indels (insertions or deletions).

Static Public Member Functions

static bool foundInQuery (Operation op)
 Return true if the specified operation is found in the query sequence, false if not.
static bool foundInQuery (CigarOperator op)
 Return true if the specified operation is found in the query sequence, false if not.
static bool isClip (Operation op)
 Return true if the specified operation is a clipping operation, false if not.
static bool isClip (CigarOperator op)
 Return true if the specified operation is a clipping operation, false if not.
static bool isMatchOrMismatch (Operation op)
 Return true if the specified operation is a match/mismatch operation, false if not.
static bool isMatchOrMismatch (CigarOperator op)
 Return true if the specified operation is a match/mismatch operation, false if not.

Static Public Attributes

static const int MAX_OP_VALUE = pad
static const int32_t INDEX_NA = -1
 Value associated with an index that is not applicable/does not exist, used for converting between query and reference indexes/offsets when an associated index/offset does not exist.

Protected Member Functions

void clearQueryAndReferenceIndexes ()
void setQueryAndReferenceIndexes ()

Protected Attributes

std::vector< CigarOperator > cigarOperations

Friends

std::ostream & operator<< (std::ostream &stream, const Cigar &cigar)
 Writes all of the cigar operations contained in the cigar to the passed in stream.

Detailed Description

This class represents the CIGAR without any methods to set the cigar (see CigarRoller for that).

Docs from Sam1.pdf:

Clipped alignment. In Smith-Waterman alignment, a sequence may not be aligned from the first residue to the last one. Subsequences at the ends may be clipped off. We introduce operation 'S' to describe (softly) clipped alignment. Here is an example. Suppose the clipped alignment is:

    REF:  AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTCAGCTAGTCAGACTAGTCGATCGATGTG
    READ:        gggGTGTAACC-GACTAGgggg

where on the read sequence, bases in uppercase are matches and bases in lowercase are clipped off. The CIGAR for this alignment is: 3S8M1D6M4S.
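To make the example concrete, here is a minimal standalone sketch (not part of libStatGen; CigarOp and parseCigar are hypothetical names) that splits a CIGAR string such as 3S8M1D6M4S into its length/operation pairs. It assumes only the seven operation letters of the SAM operation table in this section and rejects anything else; following the SAM convention, "*" yields no operations.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one CIGAR element: a length and an operation letter.
struct CigarOp {
    uint32_t count;
    char op;
};

// Split a CIGAR string such as "3S8M1D6M4S" into (count, op) pairs.
// Accepts only the letters M I D N S H P, each preceded by a length.
std::vector<CigarOp> parseCigar(const std::string& cigar) {
    std::vector<CigarOp> ops;
    if (cigar == "*") return ops;  // unavailable CIGAR: no operations

    const std::string validOps = "MIDNSHP";
    uint32_t count = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + static_cast<uint32_t>(c - '0');
            haveDigits = true;
        } else if (haveDigits && validOps.find(c) != std::string::npos) {
            ops.push_back({count, c});
            count = 0;
            haveDigits = false;
        } else {
            throw std::invalid_argument("malformed CIGAR: " + cigar);
        }
    }
    if (haveDigits) {
        throw std::invalid_argument("CIGAR ends with a dangling length: " + cigar);
    }
    return ops;
}
```

On the example above it yields 3S, 8M, 1D, 6M and 4S: the read holds 3 + 8 + 6 + 4 = 21 bases and spans 8 + 1 + 6 = 15 reference bases.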

If the mapping position of the query is not available, RNAME and CIGAR are set as “*”.

A CIGAR string consists of a series of operation lengths plus the operations. The conventional CIGAR format allows for three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format further allows four more operations, as shown in the following table, to describe clipping, padding and splicing:

    op  Description
    --  -----------
    M   Match or mismatch
    I   Insertion to the reference
    D   Deletion from the reference
    N   Skipped region from the reference
    S   Soft clip on the read (clipped sequence present in <seq>)
    H   Hard clip on the read (clipped sequence NOT present in <seq>)
    P   Padding (silent deletion from the padded reference sequence)

This class represents the CIGAR. It contains methods for converting to strings and extracting information from the cigar on how a read maps to the reference.

It contains only read-only methods; there is no way to set values. To set values, a child class (such as CigarRoller) must be used.

Definition at line 83 of file Cigar.h.
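Which sequence each operation consumes drives most of the counting and index-conversion methods of this class. A minimal sketch, assuming a standalone copy of the Operation enum (opFromChar, consumesQuery and consumesReference are hypothetical helpers, not Cigar members): consumesQuery accepts the same operations as foundInQuery, and consumesReference accepts the ones summed by getExpectedReferenceBaseCount.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical standalone copy of Cigar::Operation (same order as in Cigar.h).
enum Operation { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Map an operation letter from the table to an Operation. 'M' cannot tell a
// match from a mismatch, so this sketch maps it to match.
Operation opFromChar(char c) {
    switch (c) {
        case 'M': return match;
        case 'I': return insert;
        case 'D': return del;
        case 'N': return skip;
        case 'S': return softClip;
        case 'H': return hardClip;
        case 'P': return pad;
        default:  throw std::invalid_argument("unknown CIGAR operation");
    }
}

// Same set of operations as Cigar::foundInQuery().
bool consumesQuery(Operation op) {
    return op == match || op == mismatch || op == insert || op == softClip;
}

// Same set of operations that getExpectedReferenceBaseCount() sums.
bool consumesReference(Operation op) {
    return op == match || op == mismatch || op == del || op == skip;
}
```

Hard clips and padding consume neither sequence, which is why they never affect the read length or the reference span.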


Member Enumeration Documentation

Enum for the cigar operations.

Enumerator:
none 

no operation has been set.

match 

match/mismatch operation. Associated with CIGAR Operation "M"

mismatch 

mismatch operation. Associated with CIGAR Operation "M"

insert 

insertion to the reference (the query sequence contains bases that have no corresponding base in the reference). Associated with CIGAR Operation "I"

del 

deletion from the reference (the reference contains bases that have no corresponding base in the query sequence). Associated with CIGAR Operation "D"

skip 

skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence). Associated with CIGAR Operation "N"

softClip 

Soft clip on the read (clipped sequence present in the query sequence, but not in reference). Associated with CIGAR Operation "S".

hardClip 

Hard clip on the read (clipped sequence not present in the query sequence or reference). Associated with CIGAR Operation "H".

pad 

Padding (not in reference or query). Associated with CIGAR Operation "P".

Definition at line 87 of file Cigar.h.

00087                    {
00088         none=0, ///< no operation has been set.
00089         match, ///< match/mismatch operation.  Associated with CIGAR Operation "M"
00090         mismatch, ///< mismatch operation.  Associated with CIGAR Operation "M"
00091         insert,  ///< insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).  Associated with CIGAR Operation "I"
00092         del,  ///< deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).  Associated with CIGAR Operation "D"
00093         skip,  ///< skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence).  Associated with CIGAR Operation "N"
00094         softClip,  ///< Soft clip on the read (clipped sequence present in the query sequence, but not in reference).  Associated with CIGAR Operation "S"
00095         hardClip,  ///< Hard clip on the read (clipped sequence not present in the query sequence or reference).  Associated with CIGAR Operation "H"
00096         pad ///< Padding (not in reference or query).  Associated with CIGAR Operation "P"
00097     };


Member Function Documentation

static bool Cigar::foundInQuery ( CigarOperator  op  )  [inline, static]

Return true if the specified operation is found in the query sequence, false if not.

Definition at line 194 of file Cigar.h.

References insert, match, mismatch, and softClip.

00195     {
00196         switch(op.operation)
00197         {
00198             case match:
00199             case mismatch:
00200             case insert:
00201             case softClip:
00202                 return true;
00203             default:
00204                 return false;
00205         }
00206         return false;
00207     }

static bool Cigar::foundInQuery ( Operation  op  )  [inline, static]

Return true if the specified operation is found in the query sequence, false if not.

Definition at line 177 of file Cigar.h.

References insert, match, mismatch, and softClip.

Referenced by SamRecord::shiftIndelsLeft(), and SamFilter::softClip().

00178     {
00179         switch(op)
00180         {
00181             case match:
00182             case mismatch:
00183             case insert:
00184             case softClip:
00185                 return true;
00186             default:
00187                 return false;
00188         }
00189         return false;
00190     }

void Cigar::getCigarString ( std::string &  cigarString  )  const

Set the passed in std::string to the string representation of the Cigar operations in this object.

Definition at line 36 of file Cigar.cpp.

00037 {
00038     using namespace STLUtilities;
00039 
00040     std::vector<CigarOperator>::const_iterator i;
00041 
00042     cigarString.clear();  // clear result string
00043 
00044     // Progressively append the character representations of the operations to
00045     // the cigar string.
00046     for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
00047     {
00048         cigarString << (*i).count << (*i).getChar();
00049     }
00050 }
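The listing relies on an operator<< from libStatGen's STLUtilities namespace that appears to append to a std::string. With only the standard library, the same output can be built with std::ostringstream. In this sketch, CigarOperator and toChar are hypothetical stand-ins for the library's CigarOperator and its getChar() method; mapping the unset none operation to '?' is this sketch's own choice.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

enum Operation { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for Cigar::CigarOperator.
struct CigarOperator {
    Operation operation;
    uint32_t count;
};

// Hypothetical version of getChar(): match and mismatch both print as 'M'.
char toChar(Operation op) {
    switch (op) {
        case match:
        case mismatch: return 'M';
        case insert:   return 'I';
        case del:      return 'D';
        case skip:     return 'N';
        case softClip: return 'S';
        case hardClip: return 'H';
        case pad:      return 'P';
        default:       return '?';  // none: no operation has been set
    }
}

// Standard-library equivalent of getCigarString(): "<count><op>" per operator.
std::string toCigarString(const std::vector<CigarOperator>& ops) {
    std::ostringstream out;
    for (const CigarOperator& op : ops) {
        out << op.count << toChar(op.operation);
    }
    return out.str();
}
```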

void Cigar::getCigarString ( String &  cigarString  )  const

Set the passed in String to the string representation of the Cigar operations in this object.

Definition at line 52 of file Cigar.cpp.

Referenced by Dump(), and SamRecord::setCigar().

00053 {
00054     std::string cigar;
00055 
00056     getCigarString(cigar);
00057 
00058     cigarString = cigar.c_str();
00059 
00060     return;
00061 }

void Cigar::getExpandedString ( std::string &  s  )  const

Sets the specified string to a valid CIGAR string of characters that represent the cigar with no digits (a CIGAR of "3M" would return "MMM").

The resulting string is also a valid CIGAR string. In theory, this makes some reads easier to parse.

Parameters:
s : the string to populate

Definition at line 63 of file Cigar.cpp.

00064 {
00065     s = "";
00066 
00067     std::vector<CigarOperator>::const_iterator i;
00068 
00069     // Progressively append the character representations of the operations to
00070     // the string passed in
00071 
00072     for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
00073     {
00074         for (uint32_t j = 0; j<(*i).count; j++) s += (*i).getChar();
00075     }
00076     return;
00077 }
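A standalone sketch of the same expansion, working directly on CIGAR text rather than on a Cigar object (expandCigar is a hypothetical name). It assumes well-formed input and does not validate the operation letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: expand "3M1I" into "MMMI" (one letter per base).
std::string expandCigar(const std::string& cigar) {
    std::string expanded;
    uint32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + static_cast<uint32_t>(c - '0');
        } else {
            expanded.append(count, c);  // append `count` copies of the letter
            count = 0;
        }
    }
    return expanded;
}
```

Using std::string::append(count, ch) replaces the inner per-character loop of the listing with a single call.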

int Cigar::getExpectedQueryBaseCount (  )  const

Return the length of the read that corresponds to the current CIGAR string.

For validation, we should expect that a sequence read in a SAM file will be the same length as the value returned by this method.

Example: 3M2D3M describes a read with three bases matching the reference, then a 2-base deletion (two reference bases with no corresponding bases in the read), then three more bases that match the reference (match/mismatch). In this case, the read length is expected to be 6.

Example: 3M2I3M describes a read with 3 match/mismatch bases, two extra (inserted) bases, and then 3 more match/mismatch bases. The total in this example is 8 bases.

Returns:
returns the expected read length

Definition at line 95 of file Cigar.cpp.

References insert, match, mismatch, and softClip.

Referenced by SamFilter::softClip().

00096 {
00097     int matchCount = 0;
00098     std::vector<CigarOperator>::const_iterator i;
00099     for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
00100     {
00101         switch (i->operation)
00102         {
00103             case match:
00104             case mismatch:
00105             case softClip:
00106             case insert:
00107                 matchCount += i->count;
00108                 break;
00109             default:
00110                 // we only care about operations that are in the query sequence.
00111                 break;
00112         }
00113     }
00114     return matchCount;
00115 }
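The same sum can be computed straight from the CIGAR text. A minimal sketch, assuming well-formed input (expectedQueryBaseCount is a hypothetical name). Because mismatches are written as 'M' in a CIGAR string, 'M' covers both match and mismatch; the checks reuse the two examples above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: number of read bases implied by a CIGAR string.
// Sums the lengths of M, I and S, the operations that appear in the read.
int expectedQueryBaseCount(const std::string& cigar) {
    int total = 0;
    int count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        if (c == 'M' || c == 'I' || c == 'S') total += count;
        count = 0;  // D, N, H and P do not add read bases
    }
    return total;
}
```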

int Cigar::getExpectedReferenceBaseCount (  )  const

Return the number of bases in the reference that this CIGAR "spans".

When doing range checking, we occasionally need to know how many bases of the reference the CIGAR string spans in total.

Examples: 3M2D3M describes a read that overlays 8 bases in the reference. 3M2I3M describes a read with 3 bases that match the reference, two additional bases that aren't in the reference, and 3 more bases that match the reference, so it spans 6 bases in the reference.

Returns:
how many bases in the reference are spanned by the given CIGAR string

Definition at line 120 of file Cigar.cpp.

References del, match, mismatch, and skip.

Referenced by SamTags::createMDTag().

00121 {
00122     int matchCount = 0;
00123     std::vector<CigarOperator>::const_iterator i;
00124     for (i = cigarOperations.begin(); i != cigarOperations.end(); i++)
00125     {
00126         switch (i->operation)
00127         {
00128             case match:
00129             case mismatch:
00130             case del:
00131             case skip:
00132                 matchCount += i->count;
00133                 break;
00134             default:
00135                 // we only care about operations that are in the reference sequence.
00136                 break;
00137         }
00138     }
00139     return matchCount;
00140 }
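The reference span follows the same pattern with a different set of operations. A minimal sketch over CIGAR text, assuming well-formed input (expectedReferenceBaseCount is a hypothetical name); the checks reuse the examples above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: number of reference bases spanned by a CIGAR string.
// Sums the lengths of M, D and N, the operations that appear in the reference.
int expectedReferenceBaseCount(const std::string& cigar) {
    int total = 0;
    int count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        if (c == 'M' || c == 'D' || c == 'N') total += count;
        count = 0;  // I, S, H and P do not add reference bases
    }
    return total;
}
```

A spliced read such as 5M100N5M holds only 10 bases but spans 110 reference bases, which is why range checks need this count rather than the read length.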

uint32_t Cigar::getNumOverlaps ( int32_t  start,
int32_t  end,
int32_t  queryStartPos 
)

Return the number of bases that overlap both the reference and the read associated with this cigar and that fall within the specified region.

Parameters:
start : inclusive 0-based start position (reference position) of the region to check for overlaps in (-1 indicates to start at the beginning of the reference.)
end : exclusive 0-based end position (reference position) of the region to check for overlaps in (-1 indicates to go to the end of the reference.)
queryStartPos : 0-based leftmost mapping position of the first matching base in the query.

Definition at line 260 of file Cigar.cpp.

References getRefOffset().

Referenced by SamRecord::getNumOverlaps().

00262 {
00263     // Get the overlap info.
00264     if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
00265     {
00266         setQueryAndReferenceIndexes();
00267     }
00268 
00269     // Get the start and end offsets.
00270     int32_t startRefOffset = 0;
00271     // If the specified start is more than the queryStartPos, set
00272     // the startRefOffset to the appropriate non-zero value.
00273     // (if start is <= queryStartPos, then startRefOffset is 0 - it should
00274     // not be set to a negative value.)
00275     if (start > queryStartPos)
00276     {
00277         startRefOffset = start - queryStartPos;
00278     }
00279 
00280     int32_t endRefOffset = end - queryStartPos;
00281     if (end  == -1)
00282     {
00283         // -1 means that the region goes to the end of the reference.
00284         // So set endRefOffset to the max refOffset + 1 which is the
00285         // size of the refToQuery vector.
00286         endRefOffset = refToQuery.size();
00287     }
00288 
00289 
00290     // if endRefOffset is less than 0, then this read does not fall within
00291     // the specified region, so return 0.
00292     if (endRefOffset < 0)
00293     {
00294         return(0);
00295     }
00296 
00297     // Get the overlaps for these offsets.
00298     // Loop through the read counting positions that match the reference
00299     // within this region.
00300     int32_t refOffset = 0;
00301     int32_t numOverlaps = 0;
00302     for (unsigned int queryIndex = 0; queryIndex < queryToRef.size();
00303             queryIndex++)
00304     {
00305         refOffset = getRefOffset(queryIndex);
00306         if (refOffset > endRefOffset)
00307         {
00308             // Past the end of the specified region, so stop checking
00309             // for overlaps since there will be no more.
00310             break;
00311         }
00312         else if ((refOffset >= startRefOffset) && (refOffset < endRefOffset))
00313         {
00314             // within the region, increment the counter.
00315             ++numOverlaps;
00316         }
00317     }
00318 
00319     return(numOverlaps);
00320 }
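Instead of checking each read base as the loop above does, the overlap can be computed by intersecting each M run with the region. This is an alternative technique, not the library's code; countOverlaps is a hypothetical name. It keeps the conventions above (inclusive 0-based start, exclusive 0-based end, -1 for an open bound, queryStartPos being the position of the first matching base) and assumes, as the INDEX_NA convention suggests, that only match/mismatch bases have both a read index and a reference offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: count read bases aligned (M) inside [start, end).
uint32_t countOverlaps(const std::string& cigar, int32_t start, int32_t end,
                       int32_t queryStartPos) {
    // Convert the region to reference offsets relative to queryStartPos.
    const int64_t startOff = (start > queryStartPos) ? int64_t{start} - queryStartPos : 0;
    const int64_t endOff = (end == -1) ? std::numeric_limits<int64_t>::max()
                                       : int64_t{end} - queryStartPos;
    if (endOff < 0) return 0;  // region ends before the read starts

    uint32_t overlaps = 0;
    int64_t refOff = 0;  // reference offset of the next reference-consuming base
    int64_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        if (c == 'M') {
            // Intersect [refOff, refOff + count) with [startOff, endOff).
            const int64_t lo = std::max(refOff, startOff);
            const int64_t hi = std::min(refOff + count, endOff);
            if (hi > lo) overlaps += static_cast<uint32_t>(hi - lo);
        }
        if (c == 'M' || c == 'D' || c == 'N') refOff += count;
        count = 0;
    }
    return overlaps;
}
```

For the clipped alignment in the detailed description (3S8M1D6M4S, placed so its first matching base is at 0-based reference position 10), the region [12, 20) covers six bases of the 8M run and one base of the 6M run, giving 7. The work is proportional to the number of operations rather than the read length.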

int32_t Cigar::getQueryIndex ( int32_t  refPosition,
int32_t  queryStartPos 
)

Return the query index, or INDEX_NA, associated with the specified reference position when the first matching base of the query maps to reference position queryStartPos.

Definition at line 240 of file Cigar.cpp.

References INDEX_NA.

00241 {
00242     // If the vectors aren't set, set them.
00243     if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
00244     {
00245         setQueryAndReferenceIndexes();
00246     }
00247 
00248     int32_t refOffset = refPosition - queryStartPos;
00249     if ((refOffset < 0) || ((uint32_t)refOffset >= refToQuery.size()))
00250     {
00251         return(INDEX_NA);
00252     }
00253 
00254     return(refToQuery[refOffset]);
00255 }

int32_t Cigar::getQueryIndex ( int32_t  refOffset  ) 

Return the query index associated with the specified reference offset or INDEX_NA based on this cigar.

Definition at line 202 of file Cigar.cpp.

References INDEX_NA.

Referenced by SamTags::createMDTag().

00203 {
00204     // If the vectors aren't set, set them.
00205     if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
00206     {
00207         setQueryAndReferenceIndexes();
00208     }
00209     if ((refOffset < 0) || ((uint32_t)refOffset >= refToQuery.size()))
00210     {
00211         return(INDEX_NA);
00212     }
00213     return(refToQuery[refOffset]);
00214 }

int32_t Cigar::getRefOffset ( int32_t  queryIndex  ) 

Return the reference offset associated with the specified query index or INDEX_NA based on this cigar.

Definition at line 187 of file Cigar.cpp.

References INDEX_NA.

Referenced by SamQuerySeqWithRefIter::getNextMatchMismatch(), getNumOverlaps(), SamQuerySeqWithRef::seqWithEquals(), and SamQuerySeqWithRef::seqWithoutEquals().

00188 {
00189     // If the vectors aren't set, set them.
00190     if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
00191     {
00192         setQueryAndReferenceIndexes();
00193     }
00194     if ((queryIndex < 0) || ((uint32_t)queryIndex >= queryToRef.size()))
00195     {
00196         return(INDEX_NA);
00197     }
00198     return(queryToRef[queryIndex]);
00199 }

int32_t Cigar::getRefPosition ( int32_t  queryIndex,
int32_t  queryStartPos 
)

Return the reference position associated with the specified query index, or INDEX_NA, based on this cigar and the specified queryStartPos (the leftmost mapping position of the first matching base in the query).

Definition at line 217 of file Cigar.cpp.

References INDEX_NA.

Referenced by SamFilter::softClip().

00218 {
00219     // If the vectors aren't set, set them.
00220     if ((queryToRef.size() == 0) || (refToQuery.size() == 0))
00221     {
00222         setQueryAndReferenceIndexes();
00223     }
00224     if ((queryIndex < 0) || ((uint32_t)queryIndex >= queryToRef.size()))
00225     {
00226         return(INDEX_NA);
00227     }
00228 
00229     if (queryToRef[queryIndex] != INDEX_NA)
00230     {
00231         return(queryToRef[queryIndex] + queryStartPos);
00232     }
00233     return(INDEX_NA);
00234 }
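setQueryAndReferenceIndexes() itself is not listed on this page. The sketch below is a guess at the kind of tables it builds, assuming that read indexes include soft-clipped bases, that reference offsets start at the first matching base, and that bases present on only one side map to INDEX_NA. buildIndexes and refPosition are hypothetical names; refPosition mirrors the getRefPosition() listing above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same value as Cigar::INDEX_NA.
const int32_t INDEX_NA = -1;

// Hypothetical helper: build both lookup tables for a well-formed CIGAR string.
//   queryToRef[q] = reference offset of read base q, or INDEX_NA
//   refToQuery[r] = read index aligned to reference offset r, or INDEX_NA
void buildIndexes(const std::string& cigar,
                  std::vector<int32_t>& queryToRef,
                  std::vector<int32_t>& refToQuery) {
    queryToRef.clear();
    refToQuery.clear();
    uint32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + static_cast<uint32_t>(c - '0');
            continue;
        }
        for (uint32_t j = 0; j < count; ++j) {
            switch (c) {
                case 'M':  // base on both sequences: link the two indexes
                    queryToRef.push_back(static_cast<int32_t>(refToQuery.size()));
                    refToQuery.push_back(static_cast<int32_t>(queryToRef.size()) - 1);
                    break;
                case 'I':
                case 'S':  // base only in the read
                    queryToRef.push_back(INDEX_NA);
                    break;
                case 'D':
                case 'N':  // base only in the reference
                    refToQuery.push_back(INDEX_NA);
                    break;
                default:   // 'H' and 'P' consume neither sequence
                    break;
            }
        }
        count = 0;
    }
}

// Mirrors getRefPosition(): reference position of read base queryIndex, or INDEX_NA.
int32_t refPosition(const std::vector<int32_t>& queryToRef,
                    int32_t queryIndex, int32_t queryStartPos) {
    if (queryIndex < 0 || static_cast<std::size_t>(queryIndex) >= queryToRef.size()) {
        return INDEX_NA;
    }
    const int32_t offset = queryToRef[queryIndex];
    return (offset == INDEX_NA) ? INDEX_NA : offset + queryStartPos;
}
```

For 3S8M1D6M4S, read index 11 (the first base after the deletion) maps to reference offset 9, so with queryStartPos 10 its reference position is 19, while soft-clipped read index 0 has no reference position.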

bool Cigar::hasIndel (  ) 

Return whether or not the cigar has indels (insertions or deletions).

Returns:
true if it has an insertion or deletion, false if not.

Definition at line 324 of file Cigar.cpp.

References del, and insert.

00325 {
00326     for(unsigned int i = 0; i < cigarOperations.size(); i++)
00327     {
00328         if((cigarOperations[i].operation == insert) ||
00329            (cigarOperations[i].operation == del))
00330         {
00331             // Found an indel, so return true.
00332             return(true);
00333         }
00334     }
00335     // Went through all the operations, and found no indel, so return false.
00336     return(false);
00337 }
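For CIGAR text, the same check reduces to a character search, because the length digits never collide with the letters I and D. A minimal sketch (cigarHasIndel is a hypothetical name); like hasIndel(), it does not treat skipped regions (N) as indels.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the CIGAR text contains an insertion or deletion.
bool cigarHasIndel(const std::string& cigar) {
    return cigar.find_first_of("ID") != std::string::npos;
}
```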

static bool Cigar::isClip ( CigarOperator  op  )  [inline, static]

Return true if the specified operation is a clipping operation, false if not.

Definition at line 226 of file Cigar.h.

References hardClip, and softClip.

00227     {
00228         switch(op.operation)
00229         {
00230             case softClip:
00231             case hardClip:
00232                 return true;
00233             default:
00234                 return false;
00235         }
00236         return false;
00237     }

static bool Cigar::isClip ( Operation  op  )  [inline, static]

Return true if the specified operation is a clipping operation, false if not.

Definition at line 211 of file Cigar.h.

References hardClip, and softClip.

Referenced by SamFilter::softClip().

00212     {
00213         switch(op)
00214         {
00215             case softClip:
00216             case hardClip:
00217                 return true;
00218             default:
00219                 return false;
00220         }
00221         return false;
00222     }

static bool Cigar::isMatchOrMismatch ( CigarOperator  op  )  [inline, static]

Return true if the specified operation is a match/mismatch operation, false if not.

Definition at line 256 of file Cigar.h.

References match, and mismatch.

00257     {
00258         switch(op.operation)
00259         {
00260             case match:
00261             case mismatch:
00262                 return true;
00263             default:
00264                 return false;
00265         }
00266         return false;
00267     }

static bool Cigar::isMatchOrMismatch ( Operation  op  )  [inline, static]

Return true if the specified operation is a match/mismatch operation, false if not.

Definition at line 241 of file Cigar.h.

References match, and mismatch.

Referenced by SamRecord::shiftIndelsLeft().

00242     {
00243         switch(op)
00244         {
00245             case match:
00246             case mismatch:
00247                 return true;
00248             default:
00249                 return false;
00250         }
00251         return false;
00252     }

bool Cigar::operator== ( Cigar &  rhs  )  const

Return true if the 2 Cigars are the same (the same operations of the same sizes).

Definition at line 80 of file Cigar.cpp.

References size().

00081 {
00082 
00083     if (this->size() != rhs.size()) return false;
00084 
00085     for (int i = 0; i < this->size(); i++)
00086     {
00087         if (cigarOperations[i]!=rhs.cigarOperations[i]) return false;
00088     }
00089     return true;
00090 }
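A size check followed by an element-by-element comparison is exactly what std::vector's own operator== does, so a standalone version can delegate to it. This sketch uses hypothetical stand-ins for CigarOperator and its comparison; whether the library's CigarOperator compares the operation and the count in the same way is not shown on this page. In this sketch, match and mismatch operators compare unequal even though both are written as 'M'.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum Operation { none = 0, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for Cigar::CigarOperator.
struct CigarOperator {
    Operation operation;
    uint32_t count;
};

// Two operators are equal when both the operation and the length agree.
bool operator==(const CigarOperator& a, const CigarOperator& b) {
    return a.operation == b.operation && a.count == b.count;
}

// std::vector's operator== compares sizes first, then elements in order,
// which is the same logic as the explicit loop in Cigar::operator==.
bool sameCigar(const std::vector<CigarOperator>& lhs,
               const std::vector<CigarOperator>& rhs) {
    return lhs == rhs;
}
```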


Member Data Documentation

const int32_t Cigar::INDEX_NA = -1 [static]

Value associated with an index that is not applicable/does not exist, used for converting between query and reference indexes/offsets when an associated index/offset does not exist.

Definition at line 409 of file Cigar.h.

Referenced by SamTags::createMDTag(), SamQuerySeqWithRefIter::getNextMatchMismatch(), getQueryIndex(), getRefOffset(), getRefPosition(), SamQuerySeqWithRef::seqWithEquals(), and SamQuerySeqWithRef::seqWithoutEquals().


The documentation for this class was generated from the following files:
Cigar.h
Cigar.cpp
Generated on Tue Aug 23 18:19:07 2011 for libStatGen Software by doxygen 1.6.3