Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.
#include <InputFile.h>
Public Types | |
enum | ifileCompression { DEFAULT, UNCOMPRESSED, GZIP, BGZF } |
Compression to use when writing a file & decompression used when reading a file from stdin. | |
Public Member Functions | |
InputFile () | |
Default constructor. | |
~InputFile () | |
Destructor. | |
InputFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT) | |
Constructor for opening a file. | |
void | bufferReads (unsigned int bufferSize=DEFAULT_BUFFER_SIZE) |
Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call. | |
void | disableBuffering () |
Disable read buffering. | |
int | ifclose () |
Close the file. | |
int | ifread (void *buffer, unsigned int size) |
Read size bytes from the file into the buffer. | |
int | ifgetc () |
Get a character from the file. | |
void | ifrewind () |
Reset to the beginning of the file. | |
int | ifeof () |
Check to see if we have reached the EOF. | |
unsigned int | ifwrite (const void *buffer, unsigned int size) |
Write the specified buffer into the file. | |
bool | isOpen () |
Returns whether or not the file was successfully opened. | |
int64_t | iftell () |
Get current position in the file. | |
bool | ifseek (int64_t offset, int origin) |
Seek to the specified offset from the origin. | |
const char * | getFileName () const |
Get the filename that is currently opened. | |
void | setAttemptRecovery (bool flag=false) |
Enable or disable recovery (flag defaults to false). | |
bool | attemptRecoverySync (bool(*checkSignature)(void *data), int length) |
bool | openFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode) |
Protected Member Functions | |
int | readFromFile (void *buffer, unsigned int size) |
Protected Attributes | |
FileType * | myFileTypePtr |
unsigned int | myAllocatedBufferSize |
char * | myFileBuffer |
int | myBufferIndex |
int | myCurrentBufferSize |
std::string | myFileName |
Static Protected Attributes | |
static const unsigned int | DEFAULT_BUFFER_SIZE = 1048576 |
Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.
It hides the low-level file operations and structure from the user, allowing them to open and operate on a file through the same interface without knowing the file format (standard uncompressed, gzip, or bgzf). For writing, the user must specify the file type. The typedef IFILE is InputFile* and is set up to mimic FILE, including global functions that take an IFILE as a parameter.
Definition at line 42 of file InputFile.h.
Compression to use when writing a file & decompression used when reading a file from stdin.
Any other read checks the file to determine how to uncompress it.
DEFAULT |
Check the extension: if it is ".gz", treat the file as gzip; otherwise treat it as UNCOMPRESSED. |
UNCOMPRESSED |
uncompressed file. |
GZIP |
gzip file. |
BGZF |
bgzf file. |
Definition at line 50 of file InputFile.h.
    enum ifileCompression {
        DEFAULT,       ///< Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.
        UNCOMPRESSED,  ///< uncompressed file.
        GZIP,          ///< gzip file.
        BGZF           ///< bgzf file.
    };
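The DEFAULT rule described above can be sketched in isolation. This is a minimal illustration, not the library's code: the `Compression` enum and the `resolveCompression` helper are hypothetical stand-ins, and the only behavior taken from the documentation is that a ".gz" extension selects gzip and anything else selects uncompressed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for InputFile::ifileCompression, for illustration only.
enum class Compression { DEFAULT, UNCOMPRESSED, GZIP, BGZF };

// Hypothetical helper: resolve DEFAULT using the documented extension rule.
// Explicit modes (UNCOMPRESSED, GZIP, BGZF) pass through unchanged.
inline Compression resolveCompression(const std::string& filename,
                                      Compression requested)
{
    if (requested != Compression::DEFAULT)
    {
        return requested;
    }
    const std::string suffix = ".gz";
    const bool hasGzSuffix =
        filename.size() >= suffix.size() &&
        filename.compare(filename.size() - suffix.size(),
                         suffix.size(), suffix) == 0;
    return hasGzSuffix ? Compression::GZIP : Compression::UNCOMPRESSED;
}
```

Note that under this rule a caller who wants bgzf output has to request BGZF explicitly, since DEFAULT never selects it.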
InputFile::InputFile(const char *filename, const char *mode, InputFile::ifileCompression compressionMode = InputFile::DEFAULT)
Constructor for opening a file.
filename | file to open | |
mode | same format as fopen: "r" for read & "w" for write. | |
compressionMode | set the type of file to open for writing or for reading from stdin (when reading files, the compression type is determined by reading the file). |
Definition at line 28 of file InputFile.cpp.
    {
        // XXX duplicate code
        myAttemptRecovery = false;
        myFileTypePtr = NULL;
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        myAllocatedBufferSize = DEFAULT_BUFFER_SIZE;
        myFileBuffer = new char[myAllocatedBufferSize];
        myFileName.clear();

        openFile(filename, mode, compressionMode);
    }
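The parameter notes say that, when reading, the compression type is determined by reading the file, but this page does not show how. One plausible approach, sketched here purely as an assumption, is to inspect the leading bytes: gzip members begin with the magic bytes 0x1f 0x8b, and BGZF blocks are gzip members that set the FEXTRA flag and conventionally carry a "BC" extra subfield at byte offsets 12 and 13. The `DetectedType` enum and `detectType` function are hypothetical names, not part of InputFile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type, for illustration only.
enum class DetectedType { UNCOMPRESSED, GZIP, BGZF };

// Hypothetical helper: classify a file by its first bytes.
// Assumes the BGZF "BC" subfield is the first extra subfield in the header.
inline DetectedType detectType(const unsigned char* header, std::size_t length)
{
    // Anything without the gzip magic bytes is treated as uncompressed.
    if (length < 2 || header[0] != 0x1f || header[1] != 0x8b)
    {
        return DetectedType::UNCOMPRESSED;
    }
    // BGZF: deflate method (8), FEXTRA flag (0x04), and 'B','C' subfield id.
    if (length >= 14 && header[2] == 8 && (header[3] & 0x04) != 0 &&
        header[12] == 'B' && header[13] == 'C')
    {
        return DetectedType::BGZF;
    }
    return DetectedType::GZIP;
}
```

Detecting by content rather than by extension is what lets a reader open a gzip or bgzf file regardless of how it is named.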
void InputFile::bufferReads(unsigned int bufferSize = DEFAULT_BUFFER_SIZE) [inline]
Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.
This improves performance over reading the file in small pieces. Buffering reads disables the tell call for bgzf files. Any previous contents of the buffer are discarded.
bufferSize | number of bytes to read/buffer at a time. The default is 1048576 (DEFAULT_BUFFER_SIZE); set bufferSize = 1 to turn off read buffering. |
Definition at line 90 of file InputFile.h.
Referenced by disableBuffering().
    {
        // If the buffer size is the same, do nothing.
        if(bufferSize == myAllocatedBufferSize)
        {
            return;
        }
        // Delete the previous buffer.
        if(myFileBuffer != NULL)
        {
            delete[] myFileBuffer;
        }
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        // The buffer size must be at least 1 so one character can be
        // read and ifgetc can just assume reading into the buffer.
        if(bufferSize < 1)
        {
            bufferSize = 1;
        }
        myFileBuffer = new char[bufferSize];
        myAllocatedBufferSize = bufferSize;

        if(myFileTypePtr != NULL)
        {
            if(bufferSize == 1)
            {
                myFileTypePtr->setBuffered(false);
            }
            else
            {
                myFileTypePtr->setBuffered(true);
            }
        }
    }
const char* InputFile::getFileName() const [inline]
Get the filename that is currently opened.
Definition at line 411 of file InputFile.h.
Referenced by SamFile::ReadBamIndex().
int InputFile::ifclose() [inline]
Close the file.
Definition at line 140 of file InputFile.h.
Referenced by ifclose().
int InputFile::ifeof() [inline]
Check to see if we have reached the EOF.
Definition at line 324 of file InputFile.h.
Referenced by ifeof().
    {
        // Not EOF if we are not at the end of the buffer.
        if (myBufferIndex < myCurrentBufferSize)
        {
            // There are still available bytes in the buffer, so NOT EOF.
            return false;
        }
        else
        {
            if (myFileTypePtr == NULL)
            {
                // No myFileTypePtr, so not eof (return 0).
                return 0;
            }
            // exhausted our buffer, so check the file for eof.
            return myFileTypePtr->eof();
        }
    }
int InputFile::ifgetc() [inline]
Get a character from the file.
Read a character from the internal buffer; if the end of the buffer has been reached, refill the buffer from the file and return the character at index 0.
Definition at line 288 of file InputFile.h.
Referenced by ifgetc(), and operator>>().
    {
        if (myBufferIndex >= myCurrentBufferSize)
        {
            // at the last index, read a new buffer.
            myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
            myBufferIndex = 0;
        }
        // If the buffer index is still greater than or equal to the
        // myCurrentBufferSize, then we failed to read the file - return EOF.
        if (myBufferIndex >= myCurrentBufferSize)
        {
            return(EOF);
        }
        return(myFileBuffer[myBufferIndex++]);
    }
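A detail that may matter to callers reading binary data: the buffer in the listing is a `char*` and its element is returned directly as an `int`. On platforms where plain `char` is signed, a data byte of 0xFF would convert to -1, which commonly equals EOF. The sketch below shows the usual getc-style conversion through `unsigned char`; `byteToGetcResult` is a hypothetical helper used only for illustration and is not part of InputFile.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: convert a buffered byte to the int a getc-style
// function returns, so every data byte maps to 0..255 and never to EOF.
inline int byteToGetcResult(char c)
{
    return static_cast<unsigned char>(c);
}
```

Whether this affects a given caller depends on the platform's `char` signedness and on whether the input can contain 0xFF bytes.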
int InputFile::ifread(void *buffer, unsigned int size) [inline]
Read size bytes from the file into the buffer.
buffer | pointer to memory at least size bytes big to write the data into. | |
size | number of bytes to be read |
Definition at line 160 of file InputFile.h.
Referenced by ifread().
    {
        // There are 2 cases:
        //  1) There are already size available bytes in buffer.
        //  2) There are not size bytes in buffer.

        // Determine the number of available bytes in the buffer.
        unsigned int availableBytes = myCurrentBufferSize - myBufferIndex;
        int returnSize = 0;

        // Case 1: There are already size available bytes in buffer.
        if (size <= availableBytes)
        {
            // Just copy from the buffer, increment the index and return.
            memcpy(buffer, myFileBuffer+myBufferIndex, size);
            // Increment the buffer index.
            myBufferIndex += size;
            returnSize = size;
        }
        // Case 2: There are not size bytes in buffer.
        else
        {
            // Check to see if there are some bytes in the buffer.
            if (availableBytes > 0)
            {
                // Size > availableBytes > 0
                // Copy the available bytes into the buffer.
                memcpy(buffer, myFileBuffer+myBufferIndex, availableBytes);
            }
            // So far availableBytes have been copied into the read buffer.
            returnSize = availableBytes;
            // Increment myBufferIndex by what was read.
            myBufferIndex += availableBytes;

            unsigned int remainingSize = size - availableBytes;

            // Check if the remaining size is more or less than the
            // max buffer size.
            if(remainingSize < myAllocatedBufferSize)
            {
                // the remaining size is not the full buffer, but read
                // a full buffer worth of data anyway.
                myCurrentBufferSize =
                    readFromFile(myFileBuffer, myAllocatedBufferSize);

                // Check for an error.
                if(myCurrentBufferSize <= 0)
                {
                    // No more data was successfully read, so check to see
                    // if any data was copied to the return buffer at all.
                    if( returnSize == 0)
                    {
                        // No data has been copied at all into the
                        // return read buffer, so just return the value
                        // returned from readFromFile.
                        returnSize = myCurrentBufferSize;
                        // Otherwise, returnSize is already set to the
                        // available bytes that was already copied (so no
                        // else statement is needed).
                    }
                    // Set myBufferIndex & myCurrentBufferSize to 0.
                    myCurrentBufferSize = 0;
                    myBufferIndex = 0;
                }
                else
                {
                    // Successfully read more data.
                    // Check to see how much was copied.
                    int copySize = remainingSize;
                    if(copySize > myCurrentBufferSize)
                    {
                        // Not the entire requested amount was read
                        // (either from EOF or there was a partial read due to
                        // an error), so set the copySize to what was read.
                        copySize = myCurrentBufferSize;
                    }

                    // Now copy the rest of the bytes into the buffer.
                    memcpy((char*)buffer+availableBytes,
                           myFileBuffer, copySize);

                    // set the buffer index to the location after what we are
                    // returning as read.
                    myBufferIndex = copySize;

                    returnSize += copySize;
                }
            }
            else
            {
                // More remaining to be read than the max buffer size, so just
                // read directly into the output buffer.
                int readSize = readFromFile((char*)buffer + availableBytes,
                                            remainingSize);

                // Already used the buffer, so "clear" it.
                myCurrentBufferSize = 0;
                myBufferIndex = 0;
                if(readSize <= 0)
                {
                    // No more data was successfully read, so check to see
                    // if any data was copied to the return buffer at all.
                    if(returnSize == 0)
                    {
                        // No data has been copied at all into the
                        // return read buffer, so just return the value
                        // returned from readFromFile.
                        returnSize = readSize;
                        // Otherwise, returnSize is already set to the
                        // available bytes that was already copied (so no
                        // else statement is needed).
                    }
                }
                else
                {
                    // More data was read, so increment the return count.
                    returnSize += readSize;
                }
            }
        }
        return(returnSize);
    }
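As the listing shows, ifread can return fewer bytes than requested (at EOF or after a partial read) and returns the underlying read's non-positive value when nothing was copied. A caller that needs exactly size bytes may want a retry loop. The sketch below is an assumption-laden illustration: `readFully` and the mock `ChunkedReader` are hypothetical, and the loop relies only on the `int ifread(void*, unsigned int)` signature documented above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: keep calling ifread until size bytes have been read
// or a call returns a non-positive value.
template <typename Reader>
int readFully(Reader& reader, void* buffer, unsigned int size)
{
    unsigned int total = 0;
    while (total < size)
    {
        int n = reader.ifread(static_cast<char*>(buffer) + total, size - total);
        if (n <= 0)
        {
            // Nothing read so far: pass the result through unchanged.
            // Otherwise report the partial byte count.
            return total == 0 ? n : static_cast<int>(total);
        }
        total += static_cast<unsigned int>(n);
    }
    return static_cast<int>(total);
}

// Mock reader (not InputFile): returns at most `chunk` bytes per call.
struct ChunkedReader
{
    std::string data;
    std::size_t pos;
    unsigned int chunk;

    int ifread(void* buffer, unsigned int size)
    {
        std::size_t n = data.size() - pos;
        if (n > size) n = size;
        if (n > chunk) n = chunk;
        std::memcpy(buffer, data.data() + pos, n);
        pos += n;
        return static_cast<int>(n);
    }
};
```

Whether such a loop is needed in practice depends on whether the underlying file type ever delivers short reads before EOF, which this page does not specify.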
bool InputFile::ifseek(int64_t offset, int origin) [inline]
Seek to the specified offset from the origin.
offset | offset into the file to move to (must be from a tell call) | |
origin | one of the following (note: not all are valid for all file types): SEEK_SET (beginning of file), SEEK_CUR (current position of the file pointer), SEEK_END (end of file). |
Definition at line 395 of file InputFile.h.
Referenced by ifseek().
    {
        if (myFileTypePtr == NULL)
        {
            // No myFileTypePtr, so return false - could not seek.
            return false;
        }
        // TODO - may be able to seek within the buffer if applicable.
        // Reset buffering since a seek is being done.
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        return myFileTypePtr->seek(offset, origin);
    }
int64_t InputFile::iftell() [inline]
Get current position in the file.
Definition at line 374 of file InputFile.h.
Referenced by iftell().
unsigned int InputFile::ifwrite(const void *buffer, unsigned int size) [inline]
Write the specified buffer into the file.
buffer | buffer containing size bytes to write to the file. | |
size | number of bytes to write |
Definition at line 349 of file InputFile.h.
Referenced by ifwrite().
bool InputFile::isOpen() [inline]
Returns whether or not the file was successfully opened.
Definition at line 361 of file InputFile.h.
Referenced by ifopen(), SamFile::IsOpen(), GlfHeader::read(), SamRecord::setBufferFromFile(), GlfHeader::write(), and SamRecord::writeRecordBuffer().
void InputFile::setAttemptRecovery(bool flag = false) [inline]
Enable or disable recovery (flag defaults to false).
When true, we can attach a myFileTypePtr that implements a recovery capable decompressor. This requires that the caller be able to catch the exception XXX "blah blah blah".
Definition at line 423 of file InputFile.h.
Referenced by SamFile::OpenForRead().