InputFile Class Reference

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

#include <InputFile.h>

Public Types

enum  ifileCompression { DEFAULT, UNCOMPRESSED, GZIP, BGZF }
 Compression to use when writing a file & decompression used when reading a file from stdin.

Public Member Functions

 InputFile ()
 Default constructor.
 ~InputFile ()
 Destructor.
 InputFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT)
 Constructor for opening a file.
void bufferReads (unsigned int bufferSize=DEFAULT_BUFFER_SIZE)
 Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.
void disableBuffering ()
 Disable read buffering.
int ifclose ()
 Close the file.
int ifread (void *buffer, unsigned int size)
 Read size bytes from the file into the buffer.
int ifgetc ()
 Get a character from the file.
void ifrewind ()
 Reset to the beginning of the file.
int ifeof ()
 Check to see if we have reached the EOF.
unsigned int ifwrite (const void *buffer, unsigned int size)
 Write the specified buffer into the file.
bool isOpen ()
 Returns whether or not the file was successfully opened.
int64_t iftell ()
 Get current position in the file.
bool ifseek (int64_t offset, int origin)
 Seek to the specified offset from the origin.
const char * getFileName () const
 Get the filename that is currently opened.
void setAttemptRecovery (bool flag=false)
 Enable or disable recovery (disabled by default).
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
bool openFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode)

Protected Member Functions

int readFromFile (void *buffer, unsigned int size)

Protected Attributes

FileType * myFileTypePtr
unsigned int myAllocatedBufferSize
char * myFileBuffer
int myBufferIndex
int myCurrentBufferSize
std::string myFileName

Static Protected Attributes

static const unsigned int DEFAULT_BUFFER_SIZE = 1048576

Detailed Description

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

It hides the low-level file operations and structure from the user, allowing them to open and operate on a file through the same interface without knowing the file format (standard uncompressed, gzip, or bgzf). For writing, the user must specify the file type. The typedef IFILE is InputFile* and is set up to mimic FILE, including global functions that take an IFILE as a parameter.

Definition at line 42 of file InputFile.h.


Member Enumeration Documentation

Compression to use when writing a file & decompression used when reading a file from stdin.

Any other read checks the file to determine how to uncompress it.

Enumerator:
DEFAULT 

Check the extension: if it is ".gz", treat the file as gzip; otherwise treat it as UNCOMPRESSED.

UNCOMPRESSED 

uncompressed file.

GZIP 

gzip file.

BGZF 

bgzf file.

Definition at line 50 of file InputFile.h.

00050                           {
00051         DEFAULT,  ///< Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.
00052         UNCOMPRESSED,  ///< uncompressed file.
00053         GZIP,  ///< gzip file.
00054         BGZF ///< bgzf file.
00055     };
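
The DEFAULT rule described above can be sketched as a small stand-alone function. This is only an illustration: the enum below is a local copy rather than the one from InputFile.h, resolveCompression is a hypothetical helper name, and treating "extension" as "the filename ends in .gz" is an assumption about how the check is performed.

```cpp
#include <cassert>
#include <string>

// Local stand-in mirroring InputFile::ifileCompression (not the library's enum).
enum class Compression { DEFAULT, UNCOMPRESSED, GZIP, BGZF };

// Hypothetical helper: resolve DEFAULT from the filename extension.
// Assumption: "extension" means the filename ends with ".gz".
Compression resolveCompression(const std::string& filename,
                               Compression requested)
{
    if (requested != Compression::DEFAULT)
    {
        // An explicit choice is used as given.
        return requested;
    }
    const std::string ext = ".gz";
    const bool endsWithGz =
        filename.size() >= ext.size() &&
        filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
    return endsWithGz ? Compression::GZIP : Compression::UNCOMPRESSED;
}
```

Under these assumptions, a name such as "reads.txt.gz" resolves to GZIP and "reads.txt" to UNCOMPRESSED, while an explicit BGZF request is left untouched.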


Constructor & Destructor Documentation

InputFile::InputFile ( const char *  filename,
const char *  mode,
InputFile::ifileCompression  compressionMode = InputFile::DEFAULT 
)

Constructor for opening a file.

Parameters:
filename file to open
mode same format as fopen: "r" for read & "w" for write.
compressionMode set the type of file to open for writing or for reading from stdin (when reading files, the compression type is determined by reading the file).

Definition at line 28 of file InputFile.cpp.

00030 {
00031     // XXX duplicate code
00032     myAttemptRecovery = false;
00033     myFileTypePtr = NULL;
00034     myBufferIndex = 0;
00035     myCurrentBufferSize = 0;
00036     myAllocatedBufferSize = DEFAULT_BUFFER_SIZE;
00037     myFileBuffer = new char[myAllocatedBufferSize];
00038     myFileName.clear();
00039 
00040     openFile(filename, mode, compressionMode);
00041 }


Member Function Documentation

void InputFile::bufferReads ( unsigned int  bufferSize = DEFAULT_BUFFER_SIZE  )  [inline]

Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.

This improves performance over reading the file small bits at a time. Buffering reads disables the tell call for bgzf files. Any previous values in the buffer will be deleted.

Parameters:
bufferSize number of bytes to read/buffer at a time. The default is 1048576; set bufferSize = 1 to turn off read buffering.

Definition at line 90 of file InputFile.h.

Referenced by disableBuffering().

00091     {
00092         // If the buffer size is the same, do nothing.
00093         if(bufferSize == myAllocatedBufferSize)
00094         {
00095             return;
00096         }
00097         // Delete the previous buffer.
00098         if(myFileBuffer != NULL)
00099         {
00100             delete[] myFileBuffer;
00101         }
00102         myBufferIndex = 0;
00103         myCurrentBufferSize = 0;
00104         // The buffer size must be at least 1 so one character can be
00105         // read and ifgetc can just assume reading into the buffer.
00106         if(bufferSize < 1)
00107         {
00108             bufferSize = 1;
00109         }
00110         myFileBuffer = new char[bufferSize];
00111         myAllocatedBufferSize = bufferSize;
00112 
00113         if(myFileTypePtr != NULL)
00114         {
00115             if(bufferSize == 1)
00116             {
00117                 myFileTypePtr->setBuffered(false);
00118             }
00119             else
00120             {
00121                 myFileTypePtr->setBuffered(true);
00122             }
00123         }
00124     }
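
The size handling in the listing above can be summarized in a few lines. The sketch below is a stand-alone model, not library code: BufferPlan and planBuffer are hypothetical names, and the early return taken when the requested size equals the current allocation is deliberately not modeled.

```cpp
#include <cassert>

// Hypothetical summary of what bufferReads() settles on for a request.
struct BufferPlan
{
    unsigned int size;  // number of bytes that would be allocated
    bool buffered;      // value that would be passed to setBuffered()
};

BufferPlan planBuffer(unsigned int requested)
{
    // As in the listing, a size below 1 is raised to 1 so that
    // a single character can always be read into the buffer.
    const unsigned int size = (requested < 1) ? 1u : requested;
    // A size of exactly 1 means read buffering is turned off.
    return BufferPlan{size, size != 1};
}
```

Both planBuffer(0) and planBuffer(1) yield a one-byte buffer with buffering off, which matches disableBuffering() being listed as a caller of bufferReads().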

const char* InputFile::getFileName (  )  const [inline]

Get the filename that is currently opened.

Returns:
filename associated with this class

Definition at line 411 of file InputFile.h.

Referenced by SamFile::ReadBamIndex().

00412     {
00413         return(myFileName.c_str());
00414     }

int InputFile::ifclose (  )  [inline]

Close the file.

Returns:
status of the close (0 is success).

Definition at line 140 of file InputFile.h.

Referenced by ifclose().

00141     {
00142         if (myFileTypePtr == NULL)
00143         {
00144             return EOF;
00145         }
00146         int result = myFileTypePtr->close();
00147         delete myFileTypePtr;
00148         myFileTypePtr = NULL;
00149         myFileName.clear();
00150         return result;
00151     }

int InputFile::ifeof (  )  [inline]

Check to see if we have reached the EOF.

Returns:
0 if not EOF, any other value means EOF.

Definition at line 324 of file InputFile.h.

Referenced by ifeof().

00325     {
00326         // Not EOF if we are not at the end of the buffer.
00327         if (myBufferIndex < myCurrentBufferSize)
00328         {
00329             // There are still available bytes in the buffer, so NOT EOF.
00330             return false;
00331         }
00332         else
00333         {
00334             if (myFileTypePtr == NULL)
00335             {
00336                 // No myFileTypePtr, so not eof (return 0).
00337                 return 0;
00338             }
00339             // exhausted our buffer, so check the file for eof.
00340             return myFileTypePtr->eof();
00341         }
00342     }

int InputFile::ifgetc (  )  [inline]

Get a character from the file.

Read a character from the internal buffer. If the end of the buffer has been reached, first read from the file into the buffer and return the character at index 0.

Returns:
character that was read or EOF.

Definition at line 288 of file InputFile.h.

Referenced by ifgetc(), and operator>>().

00289     {
00290         if (myBufferIndex >= myCurrentBufferSize)
00291         {
00292             // at the last index, read a new buffer.
00293             myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
00294             myBufferIndex = 0;
00295         }
00296         // If the buffer index is still greater than or equal to the
00297         // myCurrentBufferSize, then we failed to read the file - return EOF.
00298         if (myBufferIndex >= myCurrentBufferSize)
00299         {
00300             return(EOF);
00301         }
00302         return(myFileBuffer[myBufferIndex++]);
00303     }
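
The refill-then-return pattern above can be tried out with an in-memory stand-in. MiniReader and its members are hypothetical and are not part of libStatGen. The sketch departs from the listing in one respect: it converts the byte through unsigned char before returning it, because on platforms where char is signed a plain char holding 0xFF converts to a negative int that can compare equal to EOF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a buffered reader.
class MiniReader
{
public:
    MiniReader(std::string data, std::size_t bufferSize)
        : mySource(std::move(data)),
          myBuffer(bufferSize < 1 ? 1 : bufferSize) {}

    int nextChar()
    {
        if (myIndex >= mySize)
        {
            // Buffer exhausted: refill it and restart at index 0.
            mySize = refill();
            myIndex = 0;
        }
        if (myIndex >= mySize)
        {
            // The refill produced nothing, so report EOF.
            return EOF;
        }
        // Return the byte as a value in 0..255 so it cannot equal EOF.
        return static_cast<unsigned char>(myBuffer[myIndex++]);
    }

    int refillCount() const { return myRefillCount; }

private:
    std::size_t refill()
    {
        ++myRefillCount;
        const std::size_t n =
            std::min(myBuffer.size(), mySource.size() - myOffset);
        std::memcpy(myBuffer.data(), mySource.data() + myOffset, n);
        myOffset += n;
        return n;
    }

    std::string mySource;
    std::vector<char> myBuffer;
    std::size_t myOffset = 0;
    std::size_t myIndex = 0;
    std::size_t mySize = 0;
    int myRefillCount = 0;
};
```

With a two-byte buffer over the three bytes 'a', 'b', 0xFF, the first two calls are served from one refill, the third triggers a second refill and returns 255, and the fourth returns EOF.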

int InputFile::ifread ( void *  buffer,
unsigned int  size 
) [inline]

Read size bytes from the file into the buffer.

Parameters:
buffer pointer to memory at least size bytes big to write the data into.
size number of bytes to be read
Returns:
number of bytes read. If it is not equal to size, either an error occurred or the end of the file was reached; use ifeof to determine which.

Definition at line 160 of file InputFile.h.

Referenced by ifread().

00161     {
00162         // There are 2 cases:
00163         //  1) There are already size available bytes in buffer.
00164         //  2) There are not size bytes in buffer.
00165 
00166         // Determine the number of available bytes in the buffer.
00167         unsigned int availableBytes = myCurrentBufferSize - myBufferIndex;
00168         int returnSize = 0;
00169 
00170         // Case 1: There are already size available bytes in buffer.
00171         if (size <= availableBytes)
00172         {
00173             //   Just copy from the buffer, increment the index and return.
00174             memcpy(buffer, myFileBuffer+myBufferIndex, size);
00175             // Increment the buffer index.
00176             myBufferIndex += size;
00177             returnSize = size;
00178         }
00179         // Case 2: There are not size bytes in buffer.
00180         else
00181         {
00182             // Check to see if there are some bytes in the buffer.
00183             if (availableBytes > 0)
00184             {
00185                 // Size > availableBytes > 0
00186                 // Copy the available bytes into the buffer.
00187                 memcpy(buffer, myFileBuffer+myBufferIndex, availableBytes);
00188             }
00189             // So far availableBytes have been copied into the read buffer.
00190             returnSize = availableBytes;
00191             // Increment myBufferIndex  by what was read.
00192             myBufferIndex += availableBytes;
00193 
00194             unsigned int remainingSize = size - availableBytes;
00195 
00196             // Check if the remaining size is more or less than the
00197             // max buffer size.
00198             if(remainingSize < myAllocatedBufferSize)
00199             {
00200                 // the remaining size is not the full buffer, but read
00201                 //  a full buffer worth of data anyway.
00202                 myCurrentBufferSize =
00203                     readFromFile(myFileBuffer, myAllocatedBufferSize);
00204 
00205                 // Check for an error.
00206                 if(myCurrentBufferSize <= 0)
00207                 {
00208                     // No more data was successfully read, so check to see
00209                     // if any data was copied to the return buffer at all.
00210                     if( returnSize == 0)
00211                     {
00212                         // No data has been copied at all into the
00213                         // return read buffer, so just return the value
00214                         // returned from readFromFile.
00215                         returnSize = myCurrentBufferSize;
00216                         // Otherwise, returnSize is already set to the
00217                         // available bytes that was already copied (so no
00218                         // else statement is needed).
00219                     }
00220                     // Set myBufferIndex & myCurrentBufferSize to 0.
00221                     myCurrentBufferSize = 0;
00222                     myBufferIndex = 0;
00223                 }
00224                 else
00225                 {
00226                     // Successfully read more data.
00227                     // Check to see how much was copied.
00228                     int copySize = remainingSize;
00229                     if(copySize > myCurrentBufferSize)
00230                     {
00231                         // Not the entire requested amount was read
00232                         // (either from EOF or there was a partial read due to
00233                         // an error), so set the copySize to what was read.
00234                         copySize = myCurrentBufferSize;
00235                     }
00236 
00237                     // Now copy the rest of the bytes into the buffer.
00238                     memcpy((char*)buffer+availableBytes, 
00239                            myFileBuffer, copySize);
00240 
00241                     // set the buffer index to the location after what we are
00242                     // returning as read.
00243                     myBufferIndex = copySize;
00244                 
00245                     returnSize += copySize;
00246                 }
00247             }
00248             else
00249             {
00250                 // More remaining to be read than the max buffer size, so just
00251                 // read directly into the output buffer.
00252                 int readSize = readFromFile((char*)buffer + availableBytes,
00253                                             remainingSize);
00254 
00255                 // Already used the buffer, so "clear" it.
00256                 myCurrentBufferSize = 0;
00257                 myBufferIndex = 0;
00258                 if(readSize <= 0)
00259                 {
00260                     // No more data was successfully read, so check to see
00261                     // if any data was copied to the return buffer at all.
00262                     if(returnSize == 0)
00263                     {
00264                         // No data has been copied at all into the
00265                         // return read buffer, so just return the value
00266                         // returned from readFromFile.
00267                         returnSize = readSize;
00268                         // Otherwise, returnSize is already set to the
00269                         // available bytes that was already copied (so no
00270                         // else statement is needed).
00271                     }
00272                 }
00273                 else
00274                 {
00275                     // More data was read, so increment the return count.
00276                     returnSize += readSize;
00277                 }
00278             }
00279         }
00280         return(returnSize);
00281     }
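
The branching in the listing above reduces to a choice among three paths. The following stand-alone sketch models only that decision; ReadPath and choosePath are hypothetical names introduced for illustration, and no data is actually read.

```cpp
#include <cassert>

// Hypothetical labels for the three paths through ifread().
enum class ReadPath
{
    FromBuffer,      // request fully served from bytes already buffered
    RefillThenCopy,  // drain the buffer, refill it, copy the remainder
    DirectRead       // drain the buffer, read the rest straight into the caller's memory
};

ReadPath choosePath(unsigned int size,
                    unsigned int availableBytes,
                    unsigned int allocatedBufferSize)
{
    if (size <= availableBytes)
    {
        return ReadPath::FromBuffer;
    }
    const unsigned int remainingSize = size - availableBytes;
    // A remainder smaller than the buffer is served via a full-buffer refill;
    // anything larger bypasses the internal buffer.
    return (remainingSize < allocatedBufferSize) ? ReadPath::RefillThenCopy
                                                 : ReadPath::DirectRead;
}
```

One consequence visible in this model: when the buffer size is 1 (buffering disabled), any request that cannot be served from the buffer takes the direct-read path.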

bool InputFile::ifseek ( int64_t  offset,
int  origin 
) [inline]

Seek to the specified offset from the origin.

Parameters:
offset offset into the file to move to (must be from a tell call)
origin can be any of the following (note: not all are valid for all file types):
    SEEK_SET - Beginning of file
    SEEK_CUR - Current position of the file pointer
    SEEK_END - End of file
Returns:
true on successful seek and false on a failed seek.

Definition at line 395 of file InputFile.h.

Referenced by ifseek().

00396     {
00397         if (myFileTypePtr == NULL)
00398         {
00399             // No myFileTypePtr, so return false - could not seek.
00400             return false;
00401         }
00402         // TODO - may be able to seek within the buffer if applicable.
00403         // Reset buffering since a seek is being done.
00404         myBufferIndex = 0;
00405         myCurrentBufferSize = 0;
00406         return myFileTypePtr->seek(offset, origin);
00407     }

int64_t InputFile::iftell (  )  [inline]

Get current position in the file.

Returns:
current position in the file, -1 indicates an error.

Definition at line 374 of file InputFile.h.

Referenced by iftell().

00375     {
00376         if (myFileTypePtr == NULL)
00377         {
00378             // No myFileTypePtr, so return -1 - could not tell.
00379             return -1;
00380         }
00381         int64_t pos = myFileTypePtr->tell();
00382         pos -= (myCurrentBufferSize - myBufferIndex);
00383         return(pos);
00384     }
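
The adjustment in the listing above accounts for bytes that have been read from the file into the buffer but not yet handed to the caller. A minimal arithmetic sketch, assuming the underlying tell() returns a plain byte offset (logicalPosition is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: position as seen by the caller, given the
// underlying file position and the state of the read buffer.
int64_t logicalPosition(int64_t underlyingTell,
                        int currentBufferSize,
                        int bufferIndex)
{
    // Bytes still waiting in the buffer have not been consumed yet,
    // so they are subtracted from the underlying position.
    return underlyingTell - (currentBufferSize - bufferIndex);
}
```

For example, if a full 1048576-byte buffer has been read and the caller has consumed 10 bytes of it, the underlying position is 1048576 but the reported position is 10.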

unsigned int InputFile::ifwrite ( const void *  buffer,
unsigned int  size 
) [inline]

Write the specified buffer into the file.

Parameters:
buffer buffer containing size bytes to write to the file.
size number of bytes to write
Returns:
number of bytes written. Write calls are not buffered, so the data is passed straight through to the underlying file.

Definition at line 349 of file InputFile.h.

Referenced by ifwrite().

00350     {
00351         if (myFileTypePtr == NULL)
00352         {
00353             // No myFileTypePtr, so return 0 - nothing written.
00354             return 0;
00355         }
00356         return myFileTypePtr->write(buffer, size);
00357     }

bool InputFile::isOpen (  )  [inline]

Returns whether or not the file was successfully opened.

Returns:
true if the file is open, false if not.

Definition at line 361 of file InputFile.h.

Referenced by ifopen(), SamFile::IsOpen(), GlfHeader::read(), SamRecord::setBufferFromFile(), GlfHeader::write(), and SamRecord::writeRecordBuffer().

00362     {
00363         // It is open if the myFileTypePtr is set and says it is open.
00364         if ((myFileTypePtr != NULL) && myFileTypePtr->isOpen())
00365         {
00366             return true;
00367         }
00368         // File was not successfully opened.
00369         return false;
00370     }

void InputFile::setAttemptRecovery ( bool  flag = false  )  [inline]

Enable or disable recovery (disabled by default).

When true, we can attach a myFileTypePtr that implements a recovery-capable decompressor. This requires that the caller be able to catch the exception XXX "blah blah blah".

Definition at line 423 of file InputFile.h.

Referenced by SamFile::OpenForRead().

00424     {
00425         myAttemptRecovery = flag;
00426     }


The documentation for this class was generated from the following files:
InputFile.h
InputFile.cpp
Generated on Tue Sep 6 17:52:02 2011 for libStatGen Software by doxygen 1.6.3