SamFile Class Reference

Allows the user to easily read/write a SAM/BAM file.

#include <SamFile.h>

Public Types

enum  OpenType { READ, WRITE }
 Enum for indicating whether to open the file for read or write.
enum  SortedType { UNSORTED = 0, FLAG, COORDINATE, QUERY_NAME }
 Enum for indicating the type of sort for the file.

Public Member Functions

 SamFile ()
 Default Constructor.
 SamFile (ErrorHandler::HandlingType errorHandlingType)
 Constructor that sets the error handling type.
 SamFile (const char *filename, OpenType mode)
 Constructor that opens the specified file based on the specified mode (READ/WRITE).
 SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType)
 Constructor that opens the specified file based on the specified mode (READ/WRITE) and handles errors per the specified errorHandlingType.
bool OpenForRead (const char *filename)
 Open a sam/bam file for reading with the specified filename.
bool OpenForWrite (const char *filename)
 Open a sam/bam file for writing with the specified filename.
bool ReadBamIndex (const char *filename)
 Reads the specified bam index file.
void SetReference (GenomeSequence *reference)
 Sets the reference to the specified genome sequence object.
void SetReadSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when reading the sequence.
void SetWriteSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when writing the sequence.
void Close ()
 Close the file if there is one open.
bool IsEOF ()
 Returns whether or not the end of the file has been reached.
bool ReadHeader (SamFileHeader &header)
 Reads the header section from the file and stores it in the passed in header.
bool WriteHeader (SamFileHeader &header)
 Writes the specified header into the file.
bool ReadRecord (SamFileHeader &header, SamRecord &record)
 Reads the next record from the file & stores it in the passed in record.
bool WriteRecord (SamFileHeader &header, SamRecord &record)
 Writes the specified record into the file.
void setSortedValidation (SortedType sortType)
 Set the flag to validate that the file is sorted as it is read/written.
uint32_t GetCurrentRecordCount ()
 Return the number of records that have been read/written so far.
SamStatus::Status GetFailure ()
 Get the Status of the last call that sets status.
SamStatus::Status GetStatus ()
 Get the Status of the last call that sets status.
const char * GetStatusMessage ()
 Get the Status Message of the last call that sets status.
bool SetReadSection (int32_t refID)
 Sets what part of the BAM file should be read.
bool SetReadSection (const char *refName)
 Sets what part of the BAM file should be read.
bool SetReadSection (int32_t refID, int32_t start, int32_t end)
 Sets what part of the BAM file should be read.
bool SetReadSection (const char *refName, int32_t start, int32_t end)
 Sets what part of the BAM file should be read.
uint32_t GetNumOverlaps (SamRecord &samRecord)
 Returns the number of bases in the passed in read that overlap the region that is currently set.
void GenerateStatistics (bool genStats)
 Whether or not statistics should be generated for this file.
void PrintStatistics ()

Protected Member Functions

void resetFile ()
 Resets the file prepping for a new file.
bool validateSortOrder (SamRecord &record, SamFileHeader &header)
 Validate that the record is sorted compared to the previously read record if there is one, according to the specified sort order.
SortedType getSortOrderFromHeader (SamFileHeader &header)
bool readIndexedRecord (SamFileHeader &header, SamRecord &record)
 Overwrites read record to read from the specific reference only.
bool processNewSection (SamFileHeader &header)

Protected Attributes

IFILE myFilePtr
GenericSamInterface * myInterfacePtr
bool myIsOpenForRead
 Flag to indicate if a file is open for reading.
bool myIsOpenForWrite
 Flag to indicate if a file is open for writing.
bool myHasHeader
 Flag to indicate if a header has been read/written - required before being able to read/write a record.
SortedType mySortedType
int32_t myPrevCoord
 Previous values used for checking if the file is sorted.
int32_t myPrevRefID
std::string myPrevReadName
uint32_t myRecordCount
 Keep a count of the number of records that have been read/written so far.
SamStatistics * myStatistics
 Pointer to the statistics for this file.
SamStatus myStatus
 The status of the last SamFile command.
bool myIsBamOpenForRead
 Values for reading Sorted BAM files via the index.
bool myNewSection
int32_t myRefID
int32_t myStartPos
int32_t myEndPos
uint64_t myCurrentChunkEnd
SortedChunkList myChunksToRead
BamIndex * myBamIndex
GenomeSequence * myRefPtr
SamRecord::SequenceTranslation myReadTranslation
SamRecord::SequenceTranslation myWriteTranslation
std::string myRefName

Detailed Description

Allows the user to easily read/write a SAM/BAM file.

Definition at line 30 of file SamFile.h.


Member Enumeration Documentation

enum SamFile::OpenType

Enum for indicating whether to open the file for read or write.

Enumerator:
READ open for reading.
WRITE open for writing.

Definition at line 34 of file SamFile.h.

00034                   {
00035         READ, ///< open for reading.
00036         WRITE ///< open for writing.
00037     };

enum SamFile::SortedType

Enum for indicating the type of sort for the file.

Enumerator:
UNSORTED file is not sorted.
FLAG SO flag from the header indicates the sort type.
COORDINATE file is sorted by coordinate.
QUERY_NAME file is sorted by queryname.

Definition at line 41 of file SamFile.h.

00041                     {
00042         UNSORTED = 0, ///< file is not sorted.
00043         FLAG,         ///< SO flag from the header indicates the sort type.
00044         COORDINATE,   ///< file is sorted by coordinate.
00045         QUERY_NAME    ///< file is sorted by queryname.
00046     };


Constructor & Destructor Documentation

SamFile::SamFile ( ErrorHandler::HandlingType  errorHandlingType  ) 

Constructor that sets the error handling type.

Parameters:
errorHandlingType how to handle errors.

Definition at line 40 of file SamFile.cpp.

References resetFile().

00041     : myFilePtr(NULL),
00042       myInterfacePtr(NULL),
00043       myStatistics(NULL),
00044       myStatus(errorHandlingType),
00045       myBamIndex(NULL),
00046       myRefPtr(NULL),
00047       myReadTranslation(SamRecord::NONE),
00048       myWriteTranslation(SamRecord::NONE)
00049 {
00050     resetFile();
00051 }

SamFile::SamFile ( const char *  filename,
OpenType  mode 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE).

Parameters:
filename name of the file to open.
mode mode to use for opening the file.

Definition at line 56 of file SamFile.cpp.

References GetStatusMessage(), OpenForRead(), OpenForWrite(), READ, and resetFile().

00057     : myFilePtr(NULL),
00058       myInterfacePtr(NULL),
00059       myStatistics(NULL),
00060       myStatus(),
00061       myBamIndex(NULL),
00062       myRefPtr(NULL),
00063       myReadTranslation(SamRecord::NONE),
00064       myWriteTranslation(SamRecord::NONE)
00065 {
00066     resetFile();
00067 
00068     bool openStatus = true;
00069     if(mode == READ)
00070     {
00071         // open the file for read.
00072         openStatus = OpenForRead(filename);
00073     }
00074     else
00075     {
00076         // open the file for write.
00077         openStatus = OpenForWrite(filename);
00078     }
00079     if(!openStatus)
00080     {
00081         // Failed to open the file - print error and abort.
00082         fprintf(stderr, "%s\n", GetStatusMessage());
00083         std::cerr << "FAILURE - EXITING!!!" << std::endl;
00084         exit(-1);
00085     }
00086 }

SamFile::SamFile ( const char *  filename,
OpenType  mode,
ErrorHandler::HandlingType  errorHandlingType 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE) and handles errors per the specified errorHandlingType.

Parameters:
filename name of the file to open.
mode mode to use for opening the file.
errorHandlingType how to handle errors.

Definition at line 91 of file SamFile.cpp.

References GetStatusMessage(), OpenForRead(), OpenForWrite(), READ, and resetFile().

00093     : myFilePtr(NULL),
00094       myInterfacePtr(NULL),
00095       myStatistics(NULL),
00096       myStatus(errorHandlingType),
00097       myBamIndex(NULL),
00098       myRefPtr(NULL),
00099       myReadTranslation(SamRecord::NONE),
00100       myWriteTranslation(SamRecord::NONE)
00101 {
00102     resetFile();
00103 
00104     bool openStatus = true;
00105     if(mode == READ)
00106     {
00107         // open the file for read.
00108         openStatus = OpenForRead(filename);
00109     }
00110     else
00111     {
00112         // open the file for write.
00113         openStatus = OpenForWrite(filename);
00114     }
00115     if(!openStatus)
00116     {
00117         // Failed to open the file - print error and abort.
00118         fprintf(stderr, "%s\n", GetStatusMessage());
00119         std::cerr << "FAILURE - EXITING!!!" << std::endl;
00120         exit(-1);
00121     }
00122 }


Member Function Documentation

void SamFile::GenerateStatistics ( bool  genStats  ) 

Whether or not statistics should be generated for this file.

The value is carried over between files and is not reset, but the statistics themselves are reset between files.

Parameters:
genStats set to true if statistics should be generated, false if not.

Definition at line 674 of file SamFile.cpp.

References myStatistics.

00675 {
00676     if(genStats)
00677     {
00678         if(myStatistics == NULL)
00679         {
00680             // Want to generate statistics, but do not yet have the
00681             // structure for them, so create one.
00682             myStatistics = new SamStatistics();
00683         }
00684     }
00685     else
00686     {
00687         // Do not generate statistics, so if myStatistics is not NULL, 
00688         // delete it.
00689         if(myStatistics != NULL)
00690         {
00691             delete myStatistics;
00692             myStatistics = NULL;
00693         }
00694     }
00695 
00696 }

SamStatus::Status SamFile::GetFailure (  )  [inline]

Get the Status of the last call that sets status.

Kept for backward compatibility; will be removed later.

Definition at line 138 of file SamFile.h.

References GetStatus().

00139     {
00140         return(GetStatus());
00141     }

uint32_t SamFile::GetNumOverlaps ( SamRecord &  samRecord  ) 

Returns the number of bases in the passed in read that overlap the region that is currently set.

Parameters:
samRecord the record to check for overlapping bases.
Returns:
number of bases that overlap region that is currently set.

Definition at line 660 of file SamFile.cpp.

References SamRecord::getNumOverlaps(), SamRecord::setReference(), and SamRecord::setSequenceTranslation().

00661 {
00662     if(myRefPtr != NULL)
00663     {
00664         samRecord.setReference(myRefPtr);
00665     }
00666     samRecord.setSequenceTranslation(myReadTranslation);
00667 
00668     // Get the overlaps in the sam record for the region currently set
00669     // for this file.
00670     return(samRecord.getNumOverlaps(myStartPos, myEndPos));
00671 }
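
The count can be pictured as the intersection of two half-open intervals. The following is a minimal sketch of that idea, not the library's implementation: `countOverlap` is a hypothetical helper, it assumes a gapless read covering `[readStart, readEnd)`, and it assumes a region bound of -1 means "unbounded" (based on the -1 default used when no start/end is given to SetReadSection). How SamRecord::getNumOverlaps() treats gaps or clipping is not shown on this page.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of SamFile: counts the positions shared by a
// gapless read covering [readStart, readEnd) and a region
// [regionStart, regionEnd). All positions are 0-based.
// Assumption: a region bound of -1 is treated as "unbounded".
uint32_t countOverlap(int32_t readStart, int32_t readEnd,
                      int32_t regionStart, int32_t regionEnd)
{
    int32_t lo = (regionStart == -1) ? readStart
                                     : std::max(readStart, regionStart);
    int32_t hi = (regionEnd == -1) ? readEnd
                                   : std::min(readEnd, regionEnd);
    // An empty or inverted intersection means no overlapping bases.
    return (hi > lo) ? static_cast<uint32_t>(hi - lo) : 0;
}
```

For example, a read covering [100, 150) shares 30 positions with the region [120, 200), and none with [150, 200) because the region start is inclusive while the read end is exclusive.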

bool SamFile::IsEOF (  ) 

Returns whether or not the end of the file has been reached.

Returns:
true = EOF; false = not EOF. If the file is not open, true is returned.

Definition at line 356 of file SamFile.cpp.

00357 {
00358     if (myFilePtr != NULL)
00359     {
00360         // File Pointer is set, so return if eof.
00361         return(ifeof(myFilePtr));
00362     }
00363     // File pointer is not set, so return true, eof.
00364     return true;
00365 }

bool SamFile::OpenForRead ( const char *  filename  ) 

Open a sam/bam file for reading with the specified filename.

Parameters:
filename the sam/bam file to open for reading.
Returns:
true = success; false = failure.

Definition at line 136 of file SamFile.cpp.

References myIsBamOpenForRead, myIsOpenForRead, myStatus, and resetFile().

Referenced by SamFile(), and SamFileReader::SamFileReader().

00137 {
00138     // Reset for any previously operated on files.
00139     resetFile();
00140 
00141     int lastchar = 0;
00142 
00143     while (filename[lastchar] != 0) lastchar++;
00144 
00145     // If at least one character, check for '-'.
00146     if((lastchar >= 1) && (filename[0] == '-'))
00147     {
00148         // Read from stdin - determine type of file to read.
00149         // Determine if compressed bam.
00150         if(strcmp(filename, "-.bam") == 0)
00151         {
00152             // Compressed bam - open as bgzf.
00153             // -.bam is the filename, read compressed bam from stdin
00154             filename = "-";
00155             myFilePtr = ifopen(filename, "rb", InputFile::BGZF);
00156             
00157             myInterfacePtr = new BamInterface;
00158 
00159             // Read the magic string.
00160             char magic[4];
00161             ifread(myFilePtr, magic, 4);
00162         }
00163         else if(strcmp(filename, "-.ubam") == 0)
00164         {
00165             // uncompressed BAM File.
00166             // -.ubam is the filename, read uncompressed bam from stdin
00167             filename = "-";
00168             myFilePtr = ifopen(filename, "rb", InputFile::UNCOMPRESSED);
00169         
00170             myInterfacePtr = new BamInterface;
00171 
00172             // Read the magic string.
00173             char magic[4];
00174             ifread(myFilePtr, magic, 4);
00175         }
00176         else
00177         {
00178             // SAM File.
00179             // read sam from stdin
00180             filename = "-";
00181             myFilePtr = ifopen(filename, "rb", InputFile::UNCOMPRESSED);
00182             myInterfacePtr = new SamInterface;
00183         }
00184     }
00185     else
00186     {
00187         // Not from stdin.  Read the file to determine the type.
00188         myFilePtr = ifopen(filename, "rb");
00189         
00190         if (myFilePtr == NULL)
00191         {
00192             std::string errorMessage = "Failed to Open ";
00193             errorMessage += filename;
00194             errorMessage += " for reading";
00195             myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
00196             return(false);
00197         }
00198         
00199         char magic[4];
00200         ifread(myFilePtr, magic, 4);
00201         
00202         if (magic[0] == 'B' && magic[1] == 'A' && magic[2] == 'M' &&
00203             magic[3] == 1)
00204         {
00205             myInterfacePtr = new BamInterface;
00206             // Set that it is a bam file open for reading.  This is needed to
00207             // determine if an index file can be used.
00208             myIsBamOpenForRead = true;
00209         }
00210         else
00211         {
00212             // Not a bam, so rewind to the beginning of the file so it
00213             // can be read.
00214             ifrewind(myFilePtr);
00215             myInterfacePtr = new SamInterface;
00216         }
00217     }
00218 
00219     // File is open for reading.
00220     myIsOpenForRead = true;
00221     // Successfully opened the file.
00222     myStatus = SamStatus::SUCCESS;
00223     return(true);
00224 }
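
For a named file, the listing above decides between BAM and SAM by inspecting the first four bytes it reads. A minimal standalone sketch of that check follows. `looksLikeBam` is a hypothetical helper, and the length guard is an addition for illustration; the listing itself does not check how many bytes ifread() returned.

```cpp
#include <cstddef>

// Hypothetical helper, not part of SamFile: returns true when the buffer
// starts with the four bytes 'B', 'A', 'M', 1 that the listing compares
// against. The length check is an added safeguard for short reads.
bool looksLikeBam(const char* bytes, std::size_t length)
{
    return length >= 4 &&
           bytes[0] == 'B' && bytes[1] == 'A' &&
           bytes[2] == 'M' && bytes[3] == 1;
}
```

Anything that fails this check is treated as SAM in the listing, which is why the file is rewound before the SAM interface is created.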

bool SamFile::OpenForWrite ( const char *  filename  ) 

Open a sam/bam file for writing with the specified filename.

Returns:
true = success; false = failure.

Definition at line 228 of file SamFile.cpp.

References myIsOpenForWrite, myStatus, and resetFile().

Referenced by SamFile(), and SamFileWriter::SamFileWriter().

00229 {
00230     // Reset for any previously operated on files.
00231     resetFile();
00232     
00233     int lastchar = 0;
00234     while (filename[lastchar] != 0) lastchar++;   
00235     if (lastchar >= 4 && 
00236         filename[lastchar - 4] == 'u' &&
00237         filename[lastchar - 3] == 'b' &&
00238         filename[lastchar - 2] == 'a' &&
00239         filename[lastchar - 1] == 'm')
00240     {
00241         // BAM File.
00242         // if -.ubam is the filename, write uncompressed bam to stdout
00243         if((lastchar == 6) && (filename[0] == '-') && (filename[1] == '.'))
00244         {
00245             filename = "-";
00246         }
00247         myFilePtr = ifopen(filename, "wb", InputFile::UNCOMPRESSED);
00248 
00249         myInterfacePtr = new BamInterface;
00250     }
00251     else if (lastchar >= 3 && 
00252              filename[lastchar - 3] == 'b' &&
00253              filename[lastchar - 2] == 'a' &&
00254              filename[lastchar - 1] == 'm')
00255     {
00256         // BAM File.
00257         // if -.bam is the filename, write compressed bam to stdout
00258         if((lastchar == 5) && (filename[0] == '-') && (filename[1] == '.'))
00259         {
00260             filename = "-";
00261         }
00262         myFilePtr = ifopen(filename, "wb", InputFile::BGZF);
00263         
00264         myInterfacePtr = new BamInterface;
00265     }
00266     else
00267     {
00268         // SAM File
00269         // if - (followed by anything is the filename,
00270         // write uncompressed sam to stdout
00271         if((lastchar >= 1) && (filename[0] == '-'))
00272         {
00273             filename = "-";
00274         }
00275         myFilePtr = ifopen(filename, "wb", InputFile::UNCOMPRESSED);
00276    
00277         myInterfacePtr = new SamInterface;
00278     }
00279 
00280     if (myFilePtr == NULL)
00281     {
00282         std::string errorMessage = "Failed to Open ";
00283         errorMessage += filename;
00284         errorMessage += " for writing";
00285         myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
00286         return(false);
00287     }
00288    
00289     myIsOpenForWrite = true;
00290 
00291     // Successfully opened the file.
00292     myStatus = SamStatus::SUCCESS;
00293     return(true);
00294 }
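
The output format is chosen purely from the filename. The sketch below restates the branching of the listing above as a standalone function; `OutputFormat`, `OutputTarget`, `endsWith`, and `classifyOutputName` are hypothetical names introduced for illustration and do not exist in the library.

```cpp
#include <string>

// Hypothetical names for illustration only.
enum class OutputFormat { SAM, BAM, UNCOMPRESSED_BAM };

struct OutputTarget
{
    OutputFormat format; // which writer the name selects
    bool toStdout;       // true when the name maps to "-"
};

static bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Mirrors the order of the checks in the listing: "ubam" is tested before
// "bam", and everything else is written as SAM.
OutputTarget classifyOutputName(const std::string& name)
{
    if (endsWith(name, "ubam"))
        return { OutputFormat::UNCOMPRESSED_BAM, name == "-.ubam" };
    if (endsWith(name, "bam"))
        return { OutputFormat::BAM, name == "-.bam" };
    // SAM: any name beginning with '-' is redirected to stdout.
    return { OutputFormat::SAM, !name.empty() && name[0] == '-' };
}
```

Note the asymmetry this makes visible: for BAM output only the exact names "-.bam" and "-.ubam" go to stdout, while for SAM output any name starting with '-' does.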

bool SamFile::ReadBamIndex ( const char *  filename  ) 

Reads the specified bam index file.

It must be read prior to setting a read section, for seeking and reading portions of a bam file.

Returns:
true = success; false = failure.

Definition at line 298 of file SamFile.cpp.

References myStatus, and BamIndex::readIndex().

00299 {
00300     // Cleanup a previously setup index.
00301     if(myBamIndex != NULL)
00302     {
00303         delete myBamIndex;
00304         myBamIndex = NULL;
00305     }
00306 
00307     // Create a new bam index.
00308     myBamIndex = new BamIndex();
00309     SamStatus::Status indexStat = myBamIndex->readIndex(bamIndexFilename);
00310 
00311     if(indexStat != SamStatus::SUCCESS)
00312     {
00313         std::string errorMessage = "Failed to read the bam Index file: ";
00314         errorMessage += bamIndexFilename;
00315         myStatus.setStatus(indexStat, errorMessage.c_str());
00316         delete myBamIndex;
00317         myBamIndex = NULL;
00318         return(false);
00319     }
00320     myStatus = SamStatus::SUCCESS;
00321     return(true);
00322 }

bool SamFile::ReadHeader ( SamFileHeader &  header  ) 

Reads the header section from the file and stores it in the passed in header.

Returns:
true = success; false = failure.

Definition at line 369 of file SamFile.cpp.

References myHasHeader, myIsOpenForRead, and myStatus.

00370 {
00371     if(myIsOpenForRead == false)
00372     {
00373         // File is not open for read
00374         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00375                            "Cannot read header since the file is not open for reading");
00376         return(false);
00377     }
00378 
00379     if(myHasHeader == true)
00380     {
00381         // The header has already been read.
00382         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00383                            "Cannot read header since it has already been read.");
00384         return(false);
00385     }
00386 
00387     myStatus = myInterfacePtr->readHeader(myFilePtr, header);
00388     if(myStatus == SamStatus::SUCCESS)
00389     {
00390         // The header has now been successfully read.
00391         myHasHeader = true;
00392         return(true);
00393     }
00394     return(false);
00395 }

bool SamFile::ReadRecord ( SamFileHeader &  header,
SamRecord &  record 
)

Reads the next record from the file & stores it in the passed in record.

Returns:
true = record was successfully set. false = record was not successfully set.

Definition at line 433 of file SamFile.cpp.

References myHasHeader, myIsOpenForRead, myRecordCount, myStatistics, myStatus, readIndexedRecord(), BamIndex::REF_ID_ALL, SamRecord::setReference(), SamRecord::setSequenceTranslation(), and validateSortOrder().

00435 {
00436     myStatus = SamStatus::SUCCESS;
00437 
00438     if(myIsOpenForRead == false)
00439     {
00440         // File is not open for read
00441         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00442                            "Cannot read record since the file is not open for reading");
00443         throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to opening the file."));
00444         return(false);
00445     }
00446 
00447     if(myHasHeader == false)
00448     {
00449         // The header has not yet been read.
00450         // TODO - maybe just read the header.
00451         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00452                            "Cannot read record since the header has not been read.");
00453         throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to reading the header."));
00454         return(false);
00455     }
00456 
00457     // Check to see if a new region has been set.  If so, determine the
00458     // chunks for that region.
00459     if(myNewSection)
00460     {
00461         if(!processNewSection(header))
00462         {
00463             // Failed processing a new section.  Could be an 
00464             // order issue like the file not being open or the
00465             // indexed file not having been read.
00466             // processNewSection sets myStatus with the failure reason.
00467             return(false);
00468         }
00469     }
00470 
00471     // Check to see if the file should be read by index.
00472     if(myRefID != BamIndex::REF_ID_ALL)
00473     {
00474         // Reference ID is set, so read by index.
00475         return(readIndexedRecord(header, record));
00476     }
00477 
00478     record.setReference(myRefPtr);
00479     record.setSequenceTranslation(myReadTranslation);
00480 
00481     // File is open for reading and the header has been read, so read the next
00482     // record.
00483     myInterfacePtr->readRecord(myFilePtr, header, record, myStatus);
00484     if(myStatus == SamStatus::SUCCESS)
00485     {
00486         // A record was successfully read, so increment the record count.
00487         myRecordCount++;
00488 
00489         if(myStatistics != NULL)
00490         {
00491             // Statistics should be updated.
00492             myStatistics->updateStatistics(record);
00493         }
00494 
00495         // Successfully read the record, so check the sort order.
00496         if(!validateSortOrder(record, header))
00497         {
00498             // ValidateSortOrder sets the status on a failure.
00499             return(false);
00500         }
00501         return(true);
00502     }
00503     // Failed to read the record.
00504     return(false);
00505 }
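
The listings for ReadHeader and ReadRecord enforce a call order: open the file, read the header once, then read records. ReadHeader reports an out-of-order call by returning false, while ReadRecord throws std::runtime_error. The sketch below models only those ordering checks and performs no I/O. `ReadOrderGuard` and `readRecordWouldThrow` are hypothetical names, and the sketch assumes that opening a file clears the header flag, which the resetFile() call in OpenForRead suggests but this page does not show.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the ordering checks; it performs no I/O.
class ReadOrderGuard
{
public:
    // Assumption: opening resets the header flag, as resetFile() presumably does.
    void open() { myIsOpen = true; myHasHeader = false; }

    // Mirrors ReadHeader: fails if not open or if the header was already read.
    bool readHeader()
    {
        if (!myIsOpen || myHasHeader) return false;
        myHasHeader = true;
        return true;
    }

    // Mirrors the checks at the top of ReadRecord: out-of-order calls throw.
    void checkReadRecord() const
    {
        if (!myIsOpen)
            throw std::runtime_error("record requested before the file was opened");
        if (!myHasHeader)
            throw std::runtime_error("record requested before the header was read");
    }

private:
    bool myIsOpen = false;
    bool myHasHeader = false;
};

// Convenience wrapper: reports whether a ReadRecord-style call would throw.
bool readRecordWouldThrow(const ReadOrderGuard& guard)
{
    try { guard.checkReadRecord(); }
    catch (const std::runtime_error&) { return true; }
    return false;
}
```

A caller that cannot guarantee the order may therefore want to wrap ReadRecord in a try block, since checking the boolean return value alone does not cover the out-of-order cases.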

bool SamFile::SetReadSection ( const char *  refName,
int32_t  start,
int32_t  end 
)

Sets what part of the BAM file should be read.

This version will set it to only read a specific reference name and start/end position. The records for this section will be retrieved on each ReadRecord call. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading.

Parameters:
refName the reference name of the records to read from the file.
start inclusive 0-based start position of records that should be read for this reference.
end exclusive 0-based end position of records that should be read for this reference.
Returns:
true = success; false = failure.

Definition at line 620 of file SamFile.cpp.

References myIsBamOpenForRead, myStatus, BamIndex::REF_ID_ALL, and BamIndex::REF_ID_UNMAPPED.

00621 {
00622     // If there is not a BAM file open for reading, return failure.
00623     // Opening a new file clears the read section, so it must be
00624     // set after the file is opened.
00625     if(!myIsBamOpenForRead)
00626     {
00627         // There is not a BAM file open for reading.
00628         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00629                            "Canot set section since there is no bam file open");
00630         return(false);
00631     }
00632 
00633     myNewSection = true;
00634     myStartPos = start;
00635     myEndPos = end;
00636     if((strcmp(refName, "") == 0) || (strcmp(refName, "*") == 0))
00637     {
00638         // No Reference name specified, so read just the "-1" entries.
00639         myRefID = BamIndex::REF_ID_UNMAPPED;
00640     }
00641     else
00642     {
00643         // save the reference name and revert the reference ID to unknown
00644         // so it will be calculated later.
00645         myRefName = refName;
00646         myRefID = BamIndex::REF_ID_ALL;
00647     }
00648     myChunksToRead.clear();
00649     // Reset the end of the current chunk.  We are resetting our read, so
00650     // we no longer have a "current chunk" that we are reading.
00651     myCurrentChunkEnd = 0;
00652     myStatus = SamStatus::SUCCESS;
00653     
00654     return(true);
00655 }

bool SamFile::SetReadSection ( int32_t  refID,
int32_t  start,
int32_t  end 
)

Sets what part of the BAM file should be read.

This version will set it to only read a specific reference id and start/end position. The records for this section will be retrieved on each ReadRecord call. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading.

Parameters:
refID the reference ID of the records to read from the file.
start inclusive 0-based start position of records that should be read for this refID.
end exclusive 0-based end position of records that should be read for this refID.
Returns:
true = success; false = failure.

Definition at line 591 of file SamFile.cpp.

References myIsBamOpenForRead, and myStatus.

00592 {
00593     // If there is not a BAM file open for reading, return failure.
00594     // Opening a new file clears the read section, so it must be
00595     // set after the file is opened.
00596     if(!myIsBamOpenForRead)
00597     {
00598         // There is not a BAM file open for reading.
00599         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00600                            "Canot set section since there is no bam file open");
00601         return(false);
00602     }
00603 
00604     myNewSection = true;
00605     myStartPos = start;
00606     myEndPos = end;
00607     myRefID = refID;
00608     myRefName.clear();
00609     myChunksToRead.clear();
00610     // Reset the end of the current chunk.  We are resetting our read, so
00611     // we no longer have a "current chunk" that we are reading.
00612     myCurrentChunkEnd = 0;
00613     myStatus = SamStatus::SUCCESS;
00614     
00615     return(true);
00616 }
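
Because start is inclusive and end is exclusive, both 0-based, a region written in the common 1-based inclusive style has to be shifted before it is passed in. A minimal sketch of that conversion follows; `ZeroBasedRegion` and `fromOneBasedInclusive` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper type: a 0-based, half-open region [start, end).
struct ZeroBasedRegion
{
    int32_t start; // inclusive
    int32_t end;   // exclusive
};

// Converts a 1-based inclusive range [first, last] to the 0-based half-open
// form: the start moves down by one, the end value stays the same.
ZeroBasedRegion fromOneBasedInclusive(int32_t first, int32_t last)
{
    return { first - 1, last };
}
```

For example, the first 100 bases of a reference, positions 1 through 100 in 1-based terms, become start = 0 and end = 100, and end - start gives the region length directly.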

bool SamFile::SetReadSection ( const char *  refName  ) 

Sets what part of the BAM file should be read.

This version will set it to only read a specific reference name. The records for that reference name will be retrieved on each ReadRecord call. When all records have been retrieved for the specified reference name, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading.

Parameters:
refName the reference name of the records to read from the file.
Returns:
true = success; false = failure.

Definition at line 583 of file SamFile.cpp.

References SetReadSection().

00584 {
00585     // No start/end specified, so set back to default -1.
00586     return(SetReadSection(refName, -1, -1));
00587 }

bool SamFile::SetReadSection ( int32_t  refID  ) 

Sets what part of the BAM file should be read.

This version will set it to only read a specific reference id. The records for that reference id will be retrieved on each ReadRecord call. When all records have been retrieved for the specified reference id, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading.

Parameters:
refID the reference ID of the records to read from the file.
Returns:
true = success; false = failure.

Definition at line 574 of file SamFile.cpp.

Referenced by SetReadSection().

00575 {
00576     // No start/end specified, so set back to default -1.
00577     return(SetReadSection(refID, -1, -1));
00578 }

void SamFile::SetReadSequenceTranslation ( SamRecord::SequenceTranslation  translation  ) 

Set the type of sequence translation to use when reading the sequence.

Passed down to the SamRecord when it is read. The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters:
translation type of sequence translation to use.

Definition at line 333 of file SamFile.cpp.

00334 {
00335     myReadTranslation = translation;
00336 }

void SamFile::SetReference ( GenomeSequence *  reference  ) 

Sets the reference to the specified genome sequence object.

Parameters:
reference pointer to the GenomeSequence object.

Definition at line 326 of file SamFile.cpp.

00327 {
00328     myRefPtr = reference;
00329 }

void SamFile::setSortedValidation ( SortedType  sortType  ) 

Set the flag to validate that the file is sorted as it is read/written.

Must be called after the file has been opened.

Definition at line 560 of file SamFile.cpp.

00561 {
00562     mySortedType = sortType;
00563 }

void SamFile::SetWriteSequenceTranslation ( SamRecord::SequenceTranslation  translation  ) 

Set the type of sequence translation to use when writing the sequence.

Passed down to the SamRecord when it is written. The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters:
translation type of sequence translation to use.

Definition at line 340 of file SamFile.cpp.

00341 {
00342     myWriteTranslation = translation;
00343 }

bool SamFile::validateSortOrder ( SamRecord &  record,
SamFileHeader &  header 
) [protected]

Validate that the record is sorted compared to the previously read record if there is one, according to the specified sort order.

If the sort order is UNSORTED, true is returned.

Definition at line 752 of file SamFile.cpp.

References FLAG, SamRecord::get0BasedPosition(), SamRecord::getReadName(), SamRecord::getReferenceID(), myPrevCoord, myRecordCount, myStatus, QUERY_NAME, BamIndex::REF_ID_UNMAPPED, SamRecord::setReference(), SamRecord::setSequenceTranslation(), and UNSORTED.

Referenced by readIndexedRecord(), ReadRecord(), and WriteRecord().

00753 {
00754     if(myRefPtr != NULL)
00755     {
00756         record.setReference(myRefPtr);
00757     }
00758     record.setSequenceTranslation(myReadTranslation);
00759 
00760     bool status = false;
00761     if(mySortedType == UNSORTED)
00762     {
00763         // Unsorted, so nothing to validate, just return true.
00764         status = true;
00765     }
00766     else 
00767     {
00768         // Check to see if mySortedType is based on the header.
00769         if(mySortedType == FLAG)
00770         {
00771             // Determine the sorted type from what was read out of the header.
00772             mySortedType = getSortOrderFromHeader(header);
00773         }
00774 
00775         if(mySortedType == QUERY_NAME)
00776         {
00777             // Validate that it is sorted by query name.
00778             // Get the query name from the record.
00779             const char* readName = record.getReadName();
00780             if(myPrevReadName.compare(readName) > 0)
00781             {
00782                 // The previous name is greater than the new record's name, so
00783                 // return false.
00784                 String errorMessage = "ERROR: File is not sorted at record ";
00785                 errorMessage += myRecordCount;
00786                 myStatus.setStatus(SamStatus::INVALID_SORT, 
00787                                    errorMessage.c_str());
00788                 status = false;
00789             }
00790             else
00791             {
00792                 myPrevReadName = readName;
00793                 status = true;
00794             }
00795         }
00796         else 
00797         {
00798             // Validate that it is sorted by COORDINATES.
00799             // Get the leftmost coordinate and the reference index.
00800             int32_t refID = record.getReferenceID();
00801             int32_t coord = record.get0BasedPosition();
00802             // The unmapped reference id is at the end of a sorted file.
00803             if(refID == BamIndex::REF_ID_UNMAPPED)
00804             {
00805                 // A new reference ID that is for the unmapped reads
00806                 // is always valid.
00807                 status = true;
00808                 myPrevRefID = refID;
00809                 myPrevCoord = coord;
00810             }
00811             else if(myPrevRefID == BamIndex::REF_ID_UNMAPPED)
00812             {
00813                 // Previous reference ID was for unmapped reads, but the
00814                 // current one is not, so this is not sorted.
00815                 String errorMessage = "ERROR: File is not sorted at record ";
00816                 errorMessage += myRecordCount;
00817                 myStatus.setStatus(SamStatus::INVALID_SORT, 
00818                                    errorMessage.c_str());
00819                 status = false;
00820             }
00821             else if(refID < myPrevRefID)
00822             {
00823                 // Current reference id is less than the previous one,
00824                 // meaning that it is not sorted.
00825                 String errorMessage = "ERROR: File is not sorted at record ";
00826                 errorMessage += myRecordCount;
00827                 myStatus.setStatus(SamStatus::INVALID_SORT, 
00828                                    errorMessage.c_str());
00829                 status = false;
00830             }
00831             else
00832             {
00833                 // The reference IDs are in the correct order.
00834                 if(refID > myPrevRefID)
00835                 {
00836                     // New reference id, so set the previous coordinate to -1
00837                     myPrevCoord = -1;
00838                 }
00839             
00840                 // Check the coordinates.
00841                 if(coord < myPrevCoord)
00842                 {
00843                     // New Coord is less than the previous position.
00844                     String errorMessage = "ERROR: File is not sorted at record ";
00845                     errorMessage += myRecordCount;
00846                     myStatus.setStatus(SamStatus::INVALID_SORT, 
00847                                        errorMessage.c_str());
00848                     status = false;
00849                 }
00850                 else
00851                 {
00852                     myPrevRefID = refID;
00853                     myPrevCoord = coord;
00854                     status = true;
00855                 }
00856             }
00857         }
00858     }
00859 
00860     return(status);
00861 }
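
The coordinate branch of the listing above can be condensed into a small standalone checker. This is a sketch rather than the library code: CoordinateSortChecker is a hypothetical class, REF_ID_UNMAPPED is a stand-in for BamIndex::REF_ID_UNMAPPED and is assumed to be -1 (the BAM convention for "no reference"), and the initial values of the previous reference ID and coordinate are assumptions as well.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for BamIndex::REF_ID_UNMAPPED; the value -1 is an assumption.
const int32_t REF_ID_UNMAPPED = -1;

// Tracks the previous record's position and reports whether each new
// record keeps the stream coordinate-sorted.
class CoordinateSortChecker
{
public:
    // Assumed starting state: reference 0, no coordinate seen yet.
    CoordinateSortChecker() : myPrevRefID(0), myPrevCoord(-1) {}

    // Returns true if (refID, coord) may follow the previous record.
    // The stored position is only updated when the record is accepted.
    bool check(int32_t refID, int32_t coord)
    {
        if(refID != REF_ID_UNMAPPED)
        {
            // A mapped record may not follow an unmapped one or move
            // to a lower reference ID.
            if(myPrevRefID == REF_ID_UNMAPPED || refID < myPrevRefID)
            {
                return false;
            }
            // On the same reference it may not move backwards.
            if(refID == myPrevRefID && coord < myPrevCoord)
            {
                return false;
            }
        }
        // Unmapped records sort to the end, so they are always accepted.
        myPrevRefID = refID;
        myPrevCoord = coord;
        return true;
    }

private:
    int32_t myPrevRefID;
    int32_t myPrevCoord;
};
```

Skipping the coordinate comparison when the reference ID increases has the same effect as resetting the previous coordinate to -1, as the listing does.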

bool SamFile::WriteHeader ( SamFileHeader & header  ) 

Writes the specified header into the file.

Returns:
true = success; false = failure.

Definition at line 399 of file SamFile.cpp.

References myHasHeader, myIsOpenForWrite, and myStatus.

00400 {
00401     if(myIsOpenForWrite == false)
00402     {
00403         // File is not open for writing, so the header cannot be
00404         // written. (The case where the header has already been
00405         // written is handled separately below.)
00406         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00407                            "Cannot write header since the file is not open for writing");
00408         return(false);
00409     }
00410 
00411     if(myHasHeader == true)
00412     {
00413         // The header has already been written.
00414         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00415                            "Cannot write header since it has already been written");
00416         return(false);
00417     }
00418 
00419     myStatus = myInterfacePtr->writeHeader(myFilePtr, header);
00420     if(myStatus == SamStatus::SUCCESS)
00421     {
00422         // The header has now been successfully written.
00423         myHasHeader = true;
00424         return(true);
00425     }
00426 
00427     // The write failed; myStatus holds the reason.
00428     return(false);
00429 }

bool SamFile::WriteRecord ( SamFileHeader & header,
SamRecord & record 
)

Writes the specified record into the file.

Returns:
true = success; false = failure.

Definition at line 510 of file SamFile.cpp.

References myHasHeader, myIsOpenForWrite, myRecordCount, myStatus, SamRecord::setReference(), and validateSortOrder().

00512 {
00513     if(myIsOpenForWrite == false)
00514     {
00515         // File is not open for writing
00516         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00517                            "Cannot write record since the file is not open for writing");
00518         return(false);
00519     }
00520 
00521     if(myHasHeader == false)
00522     {
00523         // The header has not yet been written.
00524         myStatus.setStatus(SamStatus::FAIL_ORDER, 
00525                            "Cannot write record since the header has not been written");
00526         return(false);
00527     }
00528 
00529     // Before trying to write the record, validate the sort order.
00530     if(!validateSortOrder(record, header))
00531     {
00532         // Not sorted like it is supposed to be, do not write the record
00533         myStatus.setStatus(SamStatus::INVALID_SORT, 
00534                            "Cannot write the record since the file is not properly sorted.");
00535         return(false);
00536     }
00537 
00538     if(myRefPtr != NULL)
00539     {
00540         record.setReference(myRefPtr);
00541     }
00542 
00543     // File is open for writing and the header has been written, so write the
00544     // record.
00545     myStatus = myInterfacePtr->writeRecord(myFilePtr, header, record,
00546                                            myWriteTranslation);
00547 
00548     if(myStatus == SamStatus::SUCCESS)
00549     {
00550         // A record was successfully written, so increment the record count.
00551         myRecordCount++;
00552         return(true);
00553     }
00554     return(false);
00555 }
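
The ordering rules enforced by WriteHeader() and WriteRecord() (open first, then exactly one header, then records) can be modelled without any file I/O. The sketch below is a simplified stand-in: WriteOrderModel and its methods are hypothetical, the Status enum only borrows the SUCCESS and FAIL_ORDER names from SamStatus, and sort validation and the actual write are left out.

```cpp
#include <cassert>

// Simplified status codes for this sketch; the names follow the SamStatus
// values used in the listings, but the enum itself is a stand-in.
enum Status { SUCCESS, FAIL_ORDER };

// Minimal model of the call-order checks: nothing is written anywhere,
// only the state transitions are tracked.
class WriteOrderModel
{
public:
    WriteOrderModel()
        : myIsOpenForWrite(false), myHasHeader(false), myRecordCount(0) {}

    // Opening for write resets the header flag and the record count.
    void openForWrite()
    {
        myIsOpenForWrite = true;
        myHasHeader = false;
        myRecordCount = 0;
    }

    // Fails if the file is not open or the header was already written.
    Status writeHeader()
    {
        if(!myIsOpenForWrite || myHasHeader)
        {
            return FAIL_ORDER;
        }
        myHasHeader = true;
        return SUCCESS;
    }

    // Fails if the file is not open or no header has been written yet.
    Status writeRecord()
    {
        if(!myIsOpenForWrite || !myHasHeader)
        {
            return FAIL_ORDER;
        }
        ++myRecordCount;
        return SUCCESS;
    }

    int recordCount() const { return myRecordCount; }

private:
    bool myIsOpenForWrite;
    bool myHasHeader;
    int myRecordCount;
};
```

As in the listings, the record count only advances on a successful write, so a rejected call leaves the model unchanged.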


Member Data Documentation

bool SamFile::myHasHeader [protected]

Flag indicating whether a header has been read or written. A header must be read or written before any record can be read or written.

Definition at line 240 of file SamFile.h.

Referenced by ReadHeader(), ReadRecord(), resetFile(), WriteHeader(), and WriteRecord().


The documentation for this class was generated from the following files:
SamFile.h
SamFile.cpp
Generated on Thu Dec 9 12:22:22 2010 for StatGen Software by  doxygen 1.6.3