InputFile Class Reference

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

#include <InputFile.h>



Public Types

enum  ifileCompression { DEFAULT, UNCOMPRESSED, GZIP, BGZF }
 

Compression to use when writing a file & decompression used when reading a file from stdin.


Public Member Functions

 InputFile ()
 Default constructor.
 ~InputFile ()
 Destructor.
 InputFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT)
 Constructor for opening a file.
void bufferReads (unsigned int bufferSize=DEFAULT_BUFFER_SIZE)
 Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.
void disableBuffering ()
 Disable read buffering.
int ifclose ()
 Close the file.
int ifread (void *buffer, unsigned int size)
 Read size bytes from the file into the buffer.
int readTilChar (const std::string &stopChars, std::string &stringRef)
 Read until one of the specified stop characters is found, appending the other characters read to the specified string; returns the index in stopChars of the character that caused the stop, or -1 for EOF.
int readTilChar (const std::string &stopChars)
 Read until one of the specified stop characters is found, dropping all characters read; returns the index in stopChars of the character that caused the stop, or -1 for EOF.
int discardLine ()
 Read until the end of the line, discarding the characters; returns -1 for EOF and 0 if the end of the line was found.
int readLine (std::string &line)
 Read, appending the characters into the specified string until new line or EOF is found, returning -1 if EOF is found first and 0 if new line is found first.
int readTilTab (std::string &field)
 Read, appending the characters into the specified string until tab, new line, or EOF is found, returning -1 if EOF is found first, 0 if new line is found first, or 1 if a tab is found first.
int ifgetc ()
 Get a character from the file.
bool ifgetline (void *voidBuffer, size_t max)
 Get a line from the file.
void ifrewind ()
 Reset to the beginning of the file.
int ifeof ()
 Check to see if we have reached the EOF.
unsigned int ifwrite (const void *buffer, unsigned int size)
 Write the specified buffer into the file.
bool isOpen ()
 Returns whether or not the file was successfully opened.
int64_t iftell ()
 Get current position in the file.
bool ifseek (int64_t offset, int origin)
 Seek to the specified offset from the origin.
const char * getFileName () const
 Get the filename that is currently opened.
void setAttemptRecovery (bool flag=false)
 Enable or disable recovery (the flag argument defaults to false, which disables it).
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
bool openFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode)

Protected Member Functions

int readFromFile (void *buffer, unsigned int size)

Protected Attributes

FileType * myFileTypePtr
unsigned int myAllocatedBufferSize
char * myFileBuffer
int myBufferIndex
int myCurrentBufferSize
std::string myFileName

Static Protected Attributes

static const unsigned int DEFAULT_BUFFER_SIZE = 1048576

Detailed Description

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

It hides the low-level file operations and structure from the user, so a file can be opened and read through the same interface regardless of its format (standard uncompressed, gzip, or bgzf). For writing, the user must specify the file type. The typedef IFILE (an InputFile*) is set up to mimic FILE, including global functions that take an IFILE as a parameter.

Definition at line 36 of file InputFile.h.


Member Enumeration Documentation

Compression to use when writing a file & decompression used when reading a file from stdin.

Any other read checks the file to determine how to uncompress it.

Enumerator:
DEFAULT 

Check the extension: if it is ".gz", treat the file as gzip; otherwise treat it as UNCOMPRESSED.

UNCOMPRESSED 

uncompressed file.

GZIP 

gzip file.

BGZF 

bgzf file.

Definition at line 44 of file InputFile.h.

00044                           {
00045         DEFAULT,  ///< Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.
00046         UNCOMPRESSED,  ///< uncompressed file.
00047         GZIP,  ///< gzip file.
00048         BGZF ///< bgzf file.
00049     };
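
The DEFAULT rule described above can be illustrated with a minimal sketch. The `Compression` enum and `resolveDefaultCompression` below are hypothetical stand-ins written for this example, not part of libStatGen, and the library's actual check may differ in detail (for example in how it treats case).

```cpp
#include <string>

// Stand-in for InputFile::ifileCompression; only the two values that the
// DEFAULT rule can resolve to are listed here.
enum class Compression { Uncompressed, Gzip };

// Hypothetical helper (not a libStatGen function): resolve DEFAULT by
// checking whether the file name ends in ".gz".
Compression resolveDefaultCompression(const std::string& filename)
{
    const std::string ext = ".gz";
    if (filename.size() >= ext.size() &&
        filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0)
    {
        return Compression::Gzip;
    }
    return Compression::Uncompressed;
}
```

Under this rule a BGZF file is never selected by extension, so a caller that wants BGZF output would presumably have to pass BGZF explicitly.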


Constructor & Destructor Documentation

InputFile::InputFile ( const char *  filename,
const char *  mode,
InputFile::ifileCompression  compressionMode = InputFile::DEFAULT 
)

Constructor for opening a file.

Parameters:
filename file to open
mode same format as fopen: "r" for read & "w" for write.
compressionMode set the type of file to open for writing or for reading from stdin (when reading files, the compression type is determined by reading the file).

Definition at line 28 of file InputFile.cpp.

00030 {
00031     // XXX duplicate code
00032     myAttemptRecovery = false;
00033     myFileTypePtr = NULL;
00034     myBufferIndex = 0;
00035     myCurrentBufferSize = 0;
00036     myAllocatedBufferSize = DEFAULT_BUFFER_SIZE;
00037     myFileBuffer = new char[myAllocatedBufferSize];
00038     myFileName.clear();
00039 
00040     openFile(filename, mode, compressionMode);
00041 }


Member Function Documentation

void InputFile::bufferReads ( unsigned int  bufferSize = DEFAULT_BUFFER_SIZE  )  [inline]

Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.

This improves performance over reading the file small bits at a time. Buffering reads disables the tell call for bgzf files. Any previous values in the buffer will be deleted.

Parameters:
bufferSize number of bytes to read/buffer at a time. The default buffer size is 1048576; set bufferSize = 1 to turn off read buffering.

Definition at line 84 of file InputFile.h.

Referenced by disableBuffering().

00085     {
00086         // If the buffer size is the same, do nothing.
00087         if(bufferSize == myAllocatedBufferSize)
00088         {
00089             return;
00090         }
00091         // Delete the previous buffer.
00092         if(myFileBuffer != NULL)
00093         {
00094             delete[] myFileBuffer;
00095         }
00096         myBufferIndex = 0;
00097         myCurrentBufferSize = 0;
00098         // The buffer size must be at least 1 so one character can be
00099         // read and ifgetc can just assume reading into the buffer.
00100         if(bufferSize < 1)
00101         {
00102             bufferSize = 1;
00103         }
00104         myFileBuffer = new char[bufferSize];
00105         myAllocatedBufferSize = bufferSize;
00106 
00107         if(myFileTypePtr != NULL)
00108         {
00109             if(bufferSize == 1)
00110             {
00111                 myFileTypePtr->setBuffered(false);
00112             }
00113             else
00114             {
00115                 myFileTypePtr->setBuffered(true);
00116             }
00117         }
00118     }

int InputFile::discardLine (  ) 

Read until the end of the line, discarding the characters; returns -1 for EOF and 0 if the end of the line was found.

Returns:
0 if the end of the line was found before EOF or -1 for EOF.

Definition at line 95 of file InputFile.cpp.

References ifgetc().

00096 {
00097     int charRead = 0;
00098     // Loop until a new line or EOF is read.
00099     while((charRead != EOF) && (charRead != '\n'))
00100     {
00101         charRead = ifgetc();
00102     }
00103     // First Check for EOF.  If EOF is found, just return -1
00104     if(charRead == EOF)
00105     {
00106         return(-1);
00107     }
00108     return(0);
00109 }

const char* InputFile::getFileName (  )  const [inline]

Get the filename that is currently opened.

Returns:
filename associated with this class

Definition at line 474 of file InputFile.h.

Referenced by SamFile::ReadBamIndex().

00475     {
00476         return(myFileName.c_str());
00477     }

int InputFile::ifclose (  )  [inline]

Close the file.

Returns:
status of the close (0 is success).

Definition at line 134 of file InputFile.h.

Referenced by ifclose().

00135     {
00136         if (myFileTypePtr == NULL)
00137         {
00138             return EOF;
00139         }
00140         int result = myFileTypePtr->close();
00141         delete myFileTypePtr;
00142         myFileTypePtr = NULL;
00143         myFileName.clear();
00144         return result;
00145     }

int InputFile::ifeof (  )  [inline]

Check to see if we have reached the EOF.

Returns:
0 if not EOF, any other value means EOF.

Definition at line 387 of file InputFile.h.

Referenced by ifeof(), readLine(), and readTilTab().

00388     {
00389         // Not EOF if we are not at the end of the buffer.
00390         if (myBufferIndex < myCurrentBufferSize)
00391         {
00392             // There are still available bytes in the buffer, so NOT EOF.
00393             return false;
00394         }
00395         else
00396         {
00397             if (myFileTypePtr == NULL)
00398             {
00399                 // No myFileTypePtr, so not eof (return 0).
00400                 return 0;
00401             }
00402             // exhausted our buffer, so check the file for eof.
00403             return myFileTypePtr->eof();
00404         }
00405     }

int InputFile::ifgetc (  )  [inline]

Get a character from the file.

Read a character from the internal buffer. If the end of the buffer has been reached, first read from the file into the buffer and then return the character at index 0.

Returns:
character that was read or EOF.

Definition at line 325 of file InputFile.h.

Referenced by discardLine(), ifgetc(), ifgetline(), operator>>(), readLine(), readTilChar(), and readTilTab().

00326     {
00327         if (myBufferIndex >= myCurrentBufferSize)
00328         {
00329             // at the last index, read a new buffer.
00330             myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
00331             myBufferIndex = 0;
00332             // If the buffer index is still greater than or equal to the
00333             // myCurrentBufferSize, then we failed to read the file - return EOF.
00334             // NB: This only needs to be checked when myCurrentBufferSize
00335             // is changed.  Simplify check - readFromFile returns zero on EOF
00336             if (myCurrentBufferSize == 0)
00337             {
00338                 return(EOF);
00339             }
00340         }
00341         return(myFileBuffer[myBufferIndex++]);
00342     }

bool InputFile::ifgetline ( void *  voidBuffer,
size_t  max 
) [inline]

Get a line from the file.

Parameters:
voidBuffer the buffer into which the line is placed (the stored line is always null-terminated)
max the maximum size of the buffer, in bytes
Returns:
true if the last character read was an EOF

Definition at line 348 of file InputFile.h.

References ifgetc().

Referenced by ifgetline().

00349     {
00350         int ch;
00351         char *buffer = (char *) voidBuffer;
00352 
00353         while( (ch=ifgetc()) != '\n' && ch != EOF) {
00354             *buffer++ = ch;
00355             if((--max)<2)
00356             {
00357                 // truncate the line, so drop remainder
00358                 while( (ch=ifgetc()) && ch != '\n' && ch != EOF)
00359                 {
00360                 }
00361                 break;
00362             }
00363         }
00364         *buffer++ = '\0';
00365         return ch==EOF;
00366     }

int InputFile::ifread ( void *  buffer,
unsigned int  size 
) [inline]

Read size bytes from the file into the buffer.

Parameters:
buffer pointer to memory at least size bytes big to write the data into.
size number of bytes to be read
Returns:
number of bytes read. If it is not equal to size, either an error occurred or the end of the file was reached; use ifeof to determine which.

Definition at line 154 of file InputFile.h.

Referenced by ifread().

00155     {
00156         // There are 2 cases:
00157         //  1) There are already size available bytes in buffer.
00158         //  2) There are not size bytes in buffer.
00159 
00160         // Determine the number of available bytes in the buffer.
00161         unsigned int availableBytes = myCurrentBufferSize - myBufferIndex;
00162         int returnSize = 0;
00163 
00164         // Case 1: There are already size available bytes in buffer.
00165         if (size <= availableBytes)
00166         {
00167             //   Just copy from the buffer, increment the index and return.
00168             memcpy(buffer, myFileBuffer+myBufferIndex, size);
00169             // Increment the buffer index.
00170             myBufferIndex += size;
00171             returnSize = size;
00172         }
00173         // Case 2: There are not size bytes in buffer.
00174         else
00175         {
00176             // Check to see if there are some bytes in the buffer.
00177             if (availableBytes > 0)
00178             {
00179                 // Size > availableBytes > 0
00180                 // Copy the available bytes into the buffer.
00181                 memcpy(buffer, myFileBuffer+myBufferIndex, availableBytes);
00182             }
00183             // So far availableBytes have been copied into the read buffer.
00184             returnSize = availableBytes;
00185             // Increment myBufferIndex  by what was read.
00186             myBufferIndex += availableBytes;
00187 
00188             unsigned int remainingSize = size - availableBytes;
00189 
00190             // Check if the remaining size is more or less than the
00191             // max buffer size.
00192             if(remainingSize < myAllocatedBufferSize)
00193             {
00194                 // the remaining size is not the full buffer, but read
00195                 //  a full buffer worth of data anyway.
00196                 myCurrentBufferSize =
00197                     readFromFile(myFileBuffer, myAllocatedBufferSize);
00198 
00199                 // Check for an error.
00200                 if(myCurrentBufferSize <= 0)
00201                 {
00202                     // No more data was successfully read, so check to see
00203                     // if any data was copied to the return buffer at all.
00204                     if( returnSize == 0)
00205                     {
00206                         // No data has been copied at all into the
00207                         // return read buffer, so just return the value
00208                         // returned from readFromFile.
00209                         returnSize = myCurrentBufferSize;
00210                         // Otherwise, returnSize is already set to the
00211                         // available bytes that was already copied (so no
00212                         // else statement is needed).
00213                     }
00214                     // Set myBufferIndex & myCurrentBufferSize to 0.
00215                     myCurrentBufferSize = 0;
00216                     myBufferIndex = 0;
00217                 }
00218                 else
00219                 {
00220                     // Successfully read more data.
00221                     // Check to see how much was copied.
00222                     int copySize = remainingSize;
00223                     if(copySize > myCurrentBufferSize)
00224                     {
00225                         // Not the entire requested amount was read
00226                         // (either from EOF or there was a partial read due to
00227                         // an error), so set the copySize to what was read.
00228                         copySize = myCurrentBufferSize;
00229                     }
00230 
00231                     // Now copy the rest of the bytes into the buffer.
00232                     memcpy((char*)buffer+availableBytes, 
00233                            myFileBuffer, copySize);
00234 
00235                     // set the buffer index to the location after what we are
00236                     // returning as read.
00237                     myBufferIndex = copySize;
00238                 
00239                     returnSize += copySize;
00240                 }
00241             }
00242             else
00243             {
00244                 // More remaining to be read than the max buffer size, so just
00245                 // read directly into the output buffer.
00246                 int readSize = readFromFile((char*)buffer + availableBytes,
00247                                             remainingSize);
00248 
00249                 // Already used the buffer, so "clear" it.
00250                 myCurrentBufferSize = 0;
00251                 myBufferIndex = 0;
00252                 if(readSize <= 0)
00253                 {
00254                     // No more data was successfully read, so check to see
00255                     // if any data was copied to the return buffer at all.
00256                     if(returnSize == 0)
00257                     {
00258                         // No data has been copied at all into the
00259                         // return read buffer, so just return the value
00260                         // returned from readFromFile.
00261                         returnSize = readSize;
00262                         // Otherwise, returnSize is already set to the
00263                         // available bytes that was already copied (so no
00264                         // else statement is needed).
00265                     }
00266                 }
00267                 else
00268                 {
00269                     // More data was read, so increment the return count.
00270                     returnSize += readSize;
00271                 }
00272             }
00273         }
00274         return(returnSize);
00275     }
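
The branching in the listing above (serve from the buffer, refill and serve, or bypass the buffer for large requests) can be condensed into a short sketch. `MemorySource` and `SketchReader` are hypothetical names for this example, and an in-memory string stands in for the file. Unlike ifread, the sketch returns an unsigned count and does not model negative error returns.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the underlying file.
struct MemorySource
{
    std::string data;
    std::size_t pos = 0;

    // Copy up to size bytes; returns the count copied (0 at end of data).
    std::size_t read(char* out, std::size_t size)
    {
        const std::size_t n = std::min(size, data.size() - pos);
        std::memcpy(out, data.data() + pos, n);
        pos += n;
        return n;
    }
};

class SketchReader
{
public:
    SketchReader(MemorySource& source, std::size_t capacity)
        : mySource(source), myBuffer(capacity < 1 ? 1 : capacity)
    {
    }

    std::size_t read(void* out, std::size_t size)
    {
        char* dest = static_cast<char*>(out);
        const std::size_t available = mySize - myIndex;

        // Case 1: the buffer already holds everything requested.
        if (size <= available)
        {
            std::memcpy(dest, myBuffer.data() + myIndex, size);
            myIndex += size;
            return size;
        }

        // Case 2: hand over what is buffered, then fetch the rest.
        std::memcpy(dest, myBuffer.data() + myIndex, available);
        std::size_t copied = available;
        const std::size_t remaining = size - available;
        myIndex = 0;
        mySize = 0;

        if (remaining < myBuffer.size())
        {
            // Small remainder: refill a whole buffer, serve from its front.
            mySize = mySource.read(myBuffer.data(), myBuffer.size());
            const std::size_t n = std::min(remaining, mySize);
            std::memcpy(dest + copied, myBuffer.data(), n);
            myIndex = n;
            copied += n;
        }
        else
        {
            // Large remainder: read straight into the caller's memory.
            copied += mySource.read(dest + copied, remaining);
        }
        return copied;
    }

private:
    MemorySource& mySource;
    std::vector<char> myBuffer;
    std::size_t myIndex = 0;  // next unread position in myBuffer
    std::size_t mySize = 0;   // number of valid bytes in myBuffer
};
```

A return value smaller than the requested size signals that the source ran out, which is the condition the Returns note above tells callers to check with ifeof.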

bool InputFile::ifseek ( int64_t  offset,
int  origin 
) [inline]

Seek to the specified offset from the origin.

Parameters:
offset offset into the file to move to (must be from a tell call)
origin can be any of the following (note: not all are valid for all file types):
    SEEK_SET - Beginning of file
    SEEK_CUR - Current position of the file pointer
    SEEK_END - End of file
Returns:
true on successful seek and false on a failed seek.

Definition at line 458 of file InputFile.h.

Referenced by ifseek().

00459     {
00460         if (myFileTypePtr == NULL)
00461         {
00462             // No myFileTypePtr, so return false - could not seek.
00463             return false;
00464         }
00465         // TODO - may be able to seek within the buffer if applicable.
00466         // Reset buffering since a seek is being done.
00467         myBufferIndex = 0;
00468         myCurrentBufferSize = 0;
00469         return myFileTypePtr->seek(offset, origin);
00470     }

int64_t InputFile::iftell (  )  [inline]

Get current position in the file.

Returns:
current position in the file, -1 indicates an error.

Definition at line 437 of file InputFile.h.

Referenced by iftell().

00438     {
00439         if (myFileTypePtr == NULL)
00440         {
00441             // No myFileTypePtr, so return -1 - could not tell.
00442             return -1;
00443         }
00444         int64_t pos = myFileTypePtr->tell();
00445         pos -= (myCurrentBufferSize - myBufferIndex);
00446         return(pos);
00447     }
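
The adjustment on listing line 445 accounts for bytes that have been read from the file but not yet handed to the caller. A minimal worked sketch of that arithmetic follows; `logicalPosition` is a hypothetical helper for this example, and it assumes positions are plain byte offsets.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the arithmetic in iftell: the underlying
// file position is ahead of the caller by the number of unread bytes
// still sitting in the buffer.
std::int64_t logicalPosition(std::int64_t underlyingPos,
                             int currentBufferSize,
                             int bufferIndex)
{
    return underlyingPos - (currentBufferSize - bufferIndex);
}
```

For example, after one full 1048576-byte buffer has been read and 10 bytes consumed, the underlying position is 1048576 but the caller's position is 10. This subtraction only makes sense for byte offsets, which is presumably why bufferReads notes that buffering disables the tell call for bgzf files.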

unsigned int InputFile::ifwrite ( const void *  buffer,
unsigned int  size 
) [inline]

Write the specified buffer into the file.

Parameters:
buffer buffer containing size bytes to write to the file.
size number of bytes to write
Returns:
number of bytes written. Write calls are not buffered.

Definition at line 412 of file InputFile.h.

Referenced by ifwrite(), and operator<<().

00413     {
00414         if (myFileTypePtr == NULL)
00415         {
00416             // No myFileTypePtr, so return 0 - nothing written.
00417             return 0;
00418         }
00419         return myFileTypePtr->write(buffer, size);
00420     }

bool InputFile::isOpen (  )  [inline]

Returns whether or not the file was successfully opened.

Returns:
true if the file is open, false if not.

Definition at line 424 of file InputFile.h.

Referenced by ifopen(), FastQFile::isOpen(), SamFile::IsOpen(), GlfHeader::read(), SamRecord::setBufferFromFile(), GlfHeader::write(), and SamRecord::writeRecordBuffer().

00425     {
00426         // It is open if the myFileTypePtr is set and says it is open.
00427         if ((myFileTypePtr != NULL) && myFileTypePtr->isOpen())
00428         {
00429             return true;
00430         }
00431         // File was not successfully opened.
00432         return false;
00433     }

int InputFile::readLine ( std::string &  line  ) 

Read, appending the characters into the specified string until new line or EOF is found, returning -1 if EOF is found first and 0 if new line is found first.

The new line and EOF are not written into the specified string.

Parameters:
line reference to a string that the read characters should be appended to (does not include the new line or EOF).
Returns:
0 if new line and -1 for EOF.

Definition at line 112 of file InputFile.cpp.

References ifeof(), and ifgetc().

00113 {
00114     int charRead = 0;
00115     while(!ifeof())
00116     {
00117         charRead = ifgetc();
00118         if(charRead == EOF)
00119         {
00120             return(-1);
00121         }
00122         if(charRead == '\n')
00123         {
00124             return(0);
00125         }
00126         line += charRead;
00127     }
00128     // Should never get here.
00129     return(-1);
00130 }

int InputFile::readTilChar ( const std::string &  stopChars  ) 

Read until one of the specified stop characters is found, dropping all characters read; returns the index in stopChars of the character that caused the stop, or -1 for EOF.

Note: If stopChars is just '\n', discardLine is faster.

Parameters:
stopChars characters to stop reading when they are hit.
Returns:
index of the character in stopChars that caused it to stop reading or -1 for EOF.

Definition at line 73 of file InputFile.cpp.

References ifgetc().

00074 {
00075     int charRead = 0;
00076     size_t pos = std::string::npos;
00077     // Loop until a stop character is found.
00078     while(pos == std::string::npos)
00079     {
00080         charRead = ifgetc();
00081 
00082         // First Check for EOF.  If EOF is found, just return -1
00083         if(charRead == EOF)
00084         {
00085             return(-1);
00086         }
00087         
00088         // Try to find the character in the stopChars.
00089         pos = stopChars.find(charRead);
00090     }
00091     return(pos);
00092 }

int InputFile::readTilChar ( const std::string &  stopChars,
std::string &  stringRef 
)

Read until one of the specified stop characters is found, appending the other characters read to the specified string; returns the index in stopChars of the character that caused the stop, or -1 for EOF.

Note: If stopChars is just '\n', readLine is faster, and if stopChars is just '\n' and '\t', readTilTab is faster.

Parameters:
stopChars characters to stop reading when they are hit.
stringRef reference to a string that the read characters should be appended to (does not include the stop character).
Returns:
index of the character in stopChars that caused it to stop reading or -1 for EOF.

Definition at line 44 of file InputFile.cpp.

References ifgetc().

00045 {
00046     int charRead = 0;
00047     size_t pos = std::string::npos;
00048     // Loop until a stop character is found.
00049     while(pos == std::string::npos)
00050     {
00051         charRead = ifgetc();
00052 
00053         // First Check for EOF.  If EOF is found, just return -1
00054         if(charRead == EOF)
00055         {
00056             return(-1);
00057         }
00058         
00059         // Try to find the character in the stopChars.
00060         pos = stopChars.find(charRead);
00061 
00062         if(pos == std::string::npos)
00063         {
00064             // Didn't find a stop character and it is not an EOF, 
00065             // so add it to the string.
00066             stringRef += charRead;
00067         }
00068     }
00069     return(pos);
00070 }

int InputFile::readTilTab ( std::string &  field  ) 

Read, appending the characters into the specified string until tab, new line, or EOF is found, returning -1 if EOF is found first, 0 if new line is found first, or 1 if a tab is found first.

The tab, new line, and EOF are not written into the specified string.

Parameters:
field reference to a string that the read characters should be appended to (does not include the tab, new line, or EOF).
Returns:
1 if tab is found, 0 if new line, and -1 for EOF.

Definition at line 133 of file InputFile.cpp.

References ifeof(), and ifgetc().

00134 {
00135     int charRead = 0;
00136     while(!ifeof())
00137     {
00138         charRead = ifgetc();
00139         if(charRead == EOF)
00140         {
00141             return(-1);
00142         }
00143         if(charRead == '\n')
00144         {
00145             return(0);
00146         }
00147         if(charRead == '\t')
00148         {
00149             return(1);
00150         }
00151         field += charRead;
00152     }
00153     return(-1);
00154 }

void InputFile::setAttemptRecovery ( bool  flag = false  )  [inline]

Enable or disable recovery (the flag argument defaults to false, which disables it).

When true, we can attach a myFileTypePtr that implements a recovery capable decompressor. This requires that the caller be able to catch the exception XXX "blah blah blah".

Definition at line 486 of file InputFile.h.

Referenced by SamFile::OpenForRead().

00487     {
00488         myAttemptRecovery = flag;
00489     }


The documentation for this class was generated from the following files:
Generated on Mon Feb 11 13:45:22 2013 for libStatGen Software by  doxygen 1.6.3