Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.
#include <InputFile.h>
Public Types | |
enum | ifileCompression { DEFAULT, UNCOMPRESSED, GZIP, BGZF } |
Compression to use when writing a file & decompression used when reading a file from stdin. | |
Public Member Functions | |
InputFile () | |
Default constructor. | |
~InputFile () | |
Destructor. | |
InputFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT) | |
Constructor for opening a file. | |
void | bufferReads (unsigned int bufferSize=DEFAULT_BUFFER_SIZE) |
Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call. | |
void | disableBuffering () |
Disable read buffering. | |
int | ifclose () |
Close the file. | |
int | ifread (void *buffer, unsigned int size) |
Read size bytes from the file into the buffer. | |
int | ifgetc () |
Get a character from the file. | |
void | ifrewind () |
Reset to the beginning of the file. | |
int | ifeof () |
Check to see if we have reached the EOF. | |
unsigned int | ifwrite (const void *buffer, unsigned int size) |
Write the specified buffer into the file. | |
bool | isOpen () |
Returns whether or not the file was successfully opened. | |
int64_t | iftell () |
Get current position in the file. | |
bool | ifseek (int64_t offset, int origin) |
Seek to the specified offset from the origin. | |
const char * | getFileName () const |
Get the filename that is currently opened. | |
void | setAttemptRecovery (bool flag=false) |
Enable or disable recovery (the default argument, false, disables it). | |
bool | attemptRecoverySync (bool(*checkSignature)(void *data), int length) |
bool | openFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode) |
Protected Member Functions | |
int | readFromFile (void *buffer, unsigned int size) |
Protected Attributes | |
FileType * | myFileTypePtr |
unsigned int | myAllocatedBufferSize |
char * | myFileBuffer |
int | myBufferIndex |
int | myCurrentBufferSize |
std::string | myFileName |
Static Protected Attributes | |
static const unsigned int | DEFAULT_BUFFER_SIZE = 1048576 |
Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.
Definition at line 36 of file InputFile.h.
Compression to use when writing a file & decompression used when reading a file from stdin.
Any other read checks the file to determine how to uncompress it.
DEFAULT |
Check the extension: if it is ".gz", treat the file as gzip; otherwise treat it as UNCOMPRESSED. |
UNCOMPRESSED |
uncompressed file. |
GZIP |
gzip file. |
BGZF |
bgzf file. |
Definition at line 44 of file InputFile.h.
    {
        DEFAULT,       ///< Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.
        UNCOMPRESSED,  ///< uncompressed file.
        GZIP,          ///< gzip file.
        BGZF           ///< bgzf file.
    };
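The DEFAULT rule can be illustrated with a small, self-contained sketch. `Compression` and `resolveDefault` are hypothetical names used only for this example; the library's own decision is made inside `openFile`, which is not listed on this page. The sketch also assumes a case-sensitive match on the ".gz" suffix.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for InputFile::ifileCompression.
enum class Compression { DEFAULT, UNCOMPRESSED, GZIP, BGZF };

// Resolve DEFAULT from the file extension; an explicit choice is
// returned unchanged. Assumes a case-sensitive ".gz" suffix match.
Compression resolveDefault(const std::string& filename, Compression requested)
{
    if (requested != Compression::DEFAULT)
    {
        return requested;
    }
    const std::string suffix = ".gz";
    const bool endsWithGz =
        filename.size() >= suffix.size() &&
        filename.compare(filename.size() - suffix.size(),
                         suffix.size(), suffix) == 0;
    return endsWithGz ? Compression::GZIP : Compression::UNCOMPRESSED;
}

// Usage example for the sketch.
void exampleResolveDefault()
{
    assert(resolveDefault("reads.txt.gz", Compression::DEFAULT) == Compression::GZIP);
    assert(resolveDefault("reads.txt", Compression::DEFAULT) == Compression::UNCOMPRESSED);
    assert(resolveDefault("reads.txt", Compression::BGZF) == Compression::BGZF);
}
```

As the description notes, this rule only matters when writing or when reading from stdin; for ordinary reads the file contents decide.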
InputFile::InputFile | ( | const char * | filename, | |
const char * | mode, | |||
InputFile::ifileCompression | compressionMode = InputFile::DEFAULT | |||
) |
Constructor for opening a file.
filename | file to open | |
mode | same format as fopen: "r" for read & "w" for write. | |
compressionMode | set the type of file to open for writing or for reading from stdin (when reading files, the compression type is determined by reading the file). |
Definition at line 28 of file InputFile.cpp.
    {
        // XXX duplicate code
        myAttemptRecovery = false;
        myFileTypePtr = NULL;
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        myAllocatedBufferSize = DEFAULT_BUFFER_SIZE;
        myFileBuffer = new char[myAllocatedBufferSize];
        myFileName.clear();

        openFile(filename, mode, compressionMode);
    }
void InputFile::bufferReads | ( | unsigned int | bufferSize = DEFAULT_BUFFER_SIZE |
) | [inline] |
Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.
This improves performance over reading the file a few bytes at a time. Buffering reads disables the tell call for bgzf files. Any data already held in the buffer is discarded.
bufferSize | number of bytes to read/buffer at a time. The default is 1048576; set bufferSize = 1 to turn off read buffering. |
Definition at line 84 of file InputFile.h.
Referenced by disableBuffering().
    {
        // If the buffer size is the same, do nothing.
        if(bufferSize == myAllocatedBufferSize)
        {
            return;
        }
        // Delete the previous buffer.
        if(myFileBuffer != NULL)
        {
            delete[] myFileBuffer;
        }
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        // The buffer size must be at least 1 so one character can be
        // read and ifgetc can just assume reading into the buffer.
        if(bufferSize < 1)
        {
            bufferSize = 1;
        }
        myFileBuffer = new char[bufferSize];
        myAllocatedBufferSize = bufferSize;

        if(myFileTypePtr != NULL)
        {
            if(bufferSize == 1)
            {
                myFileTypePtr->setBuffered(false);
            }
            else
            {
                myFileTypePtr->setBuffered(true);
            }
        }
    }
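The size handling in bufferReads can be tried in isolation with the sketch below. `ReadBufferSketch`, `kDefaultBufferSize`, `allocatedSize` and `isBuffered` are hypothetical names for this example, and the class touches no file. Having `disableBuffering` call `bufferReads(1)` is an assumption based on the "Referenced by disableBuffering()" note and the parameter description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as DEFAULT_BUFFER_SIZE on this page.
const unsigned int kDefaultBufferSize = 1048576;

// Hypothetical stand-in that mirrors only the buffer bookkeeping.
class ReadBufferSketch
{
public:
    ReadBufferSketch()
        : myBuffer(kDefaultBufferSize), myBufferIndex(0),
          myCurrentBufferSize(0), myBuffered(true) {}

    void bufferReads(unsigned int bufferSize = kDefaultBufferSize)
    {
        // Same size as before: keep the buffer and its contents.
        if (bufferSize == myBuffer.size())
        {
            return;
        }
        // A size of 0 is raised to 1 so one character can always be held.
        if (bufferSize < 1)
        {
            bufferSize = 1;
        }
        // Replace the buffer; anything previously buffered is discarded.
        myBuffer.assign(bufferSize, '\0');
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        // A one-byte buffer is treated as "buffering off".
        myBuffered = (bufferSize != 1);
    }

    // Assumption: disabling buffering is a request for a one-byte buffer.
    void disableBuffering() { bufferReads(1); }

    std::size_t allocatedSize() const { return myBuffer.size(); }
    bool isBuffered() const { return myBuffered; }

private:
    std::vector<char> myBuffer;
    std::size_t myBufferIndex;
    std::size_t myCurrentBufferSize;
    bool myBuffered;
};

// Usage example for the sketch.
void exampleBufferReads()
{
    ReadBufferSketch reader;
    reader.bufferReads(4096);
    assert(reader.allocatedSize() == 4096u && reader.isBuffered());
    reader.disableBuffering();
    assert(reader.allocatedSize() == 1u && !reader.isBuffered());
}
```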
const char* InputFile::getFileName | ( | ) | const [inline] |
Get the filename that is currently opened.
Definition at line 405 of file InputFile.h.
Referenced by SamFile::ReadBamIndex().
int InputFile::ifclose | ( | ) | [inline] |
Close the file.
Definition at line 134 of file InputFile.h.
Referenced by ifclose().
int InputFile::ifeof | ( | ) | [inline] |
Check to see if we have reached the EOF.
Definition at line 318 of file InputFile.h.
Referenced by ifeof().
    {
        // Not EOF if we are not at the end of the buffer.
        if (myBufferIndex < myCurrentBufferSize)
        {
            // There are still available bytes in the buffer, so NOT EOF.
            return false;
        }
        else
        {
            if (myFileTypePtr == NULL)
            {
                // No myFileTypePtr, so not eof (return 0).
                return 0;
            }
            // exhausted our buffer, so check the file for eof.
            return myFileTypePtr->eof();
        }
    }
int InputFile::ifgetc | ( | ) | [inline] |
Get a character from the file.
Read a character from the internal buffer. If the end of the buffer has been reached, refill the buffer from the file and return the character at index 0, or EOF if nothing could be read.
Definition at line 282 of file InputFile.h.
Referenced by ifgetc(), and operator>>().
    {
        if (myBufferIndex >= myCurrentBufferSize)
        {
            // at the last index, read a new buffer.
            myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
            myBufferIndex = 0;
        }
        // If the buffer index is still greater than or equal to the
        // myCurrentBufferSize, then we failed to read the file - return EOF.
        if (myBufferIndex >= myCurrentBufferSize)
        {
            return(EOF);
        }
        return(myFileBuffer[myBufferIndex++]);
    }
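The refill-then-return pattern of ifgetc can be sketched over an in-memory string. `GetcSketch`, `nextChar` and `refill` are hypothetical names, and the string stands in for the file. One deliberate difference, which is a choice of this sketch and not something the listing does: the byte is converted through `unsigned char` so that a 0xFF byte cannot be mistaken for EOF on platforms where `char` is signed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory reader; the string plays the role of the file.
class GetcSketch
{
public:
    GetcSketch(const std::string& data, std::size_t bufferSize)
        : mySource(data), mySourcePos(0),
          myBuffer(bufferSize < 1 ? 1 : bufferSize),
          myBufferIndex(0), myCurrentBufferSize(0) {}

    int nextChar()
    {
        // Buffer exhausted: try to load the next chunk.
        if (myBufferIndex >= myCurrentBufferSize)
        {
            myCurrentBufferSize = refill();
            myBufferIndex = 0;
        }
        // Still nothing available: the source is at its end.
        if (myBufferIndex >= myCurrentBufferSize)
        {
            return EOF;
        }
        return static_cast<unsigned char>(myBuffer[myBufferIndex++]);
    }

private:
    // Copy up to one buffer's worth of bytes; returns the count copied.
    std::size_t refill()
    {
        const std::size_t n =
            std::min(myBuffer.size(), mySource.size() - mySourcePos);
        std::memcpy(myBuffer.data(), mySource.data() + mySourcePos, n);
        mySourcePos += n;
        return n;
    }

    std::string mySource;
    std::size_t mySourcePos;
    std::vector<char> myBuffer;
    std::size_t myBufferIndex;
    std::size_t myCurrentBufferSize;
};

// Usage example: three bytes read through a two-byte buffer.
void exampleNextChar()
{
    GetcSketch in("abc", 2);
    assert(in.nextChar() == 'a');
    assert(in.nextChar() == 'b');
    assert(in.nextChar() == 'c');
    assert(in.nextChar() == EOF);
}
```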
int InputFile::ifread | ( | void * | buffer, | |
unsigned int | size | |||
) | [inline] |
Read size bytes from the file into the buffer.
buffer | pointer to memory at least size bytes big to write the data into. | |
size | number of bytes to be read |
Definition at line 154 of file InputFile.h.
Referenced by ifread().
    {
        // There are 2 cases:
        //  1) There are already size available bytes in buffer.
        //  2) There are not size bytes in buffer.

        // Determine the number of available bytes in the buffer.
        unsigned int availableBytes = myCurrentBufferSize - myBufferIndex;
        int returnSize = 0;

        // Case 1: There are already size available bytes in buffer.
        if (size <= availableBytes)
        {
            // Just copy from the buffer, increment the index and return.
            memcpy(buffer, myFileBuffer+myBufferIndex, size);
            // Increment the buffer index.
            myBufferIndex += size;
            returnSize = size;
        }
        // Case 2: There are not size bytes in buffer.
        else
        {
            // Check to see if there are some bytes in the buffer.
            if (availableBytes > 0)
            {
                // Size > availableBytes > 0
                // Copy the available bytes into the buffer.
                memcpy(buffer, myFileBuffer+myBufferIndex, availableBytes);
            }
            // So far availableBytes have been copied into the read buffer.
            returnSize = availableBytes;
            // Increment myBufferIndex by what was read.
            myBufferIndex += availableBytes;

            unsigned int remainingSize = size - availableBytes;

            // Check if the remaining size is more or less than the
            // max buffer size.
            if(remainingSize < myAllocatedBufferSize)
            {
                // the remaining size is not the full buffer, but read
                // a full buffer worth of data anyway.
                myCurrentBufferSize =
                    readFromFile(myFileBuffer, myAllocatedBufferSize);

                // Check for an error.
                if(myCurrentBufferSize <= 0)
                {
                    // No more data was successfully read, so check to see
                    // if any data was copied to the return buffer at all.
                    if( returnSize == 0)
                    {
                        // No data has been copied at all into the
                        // return read buffer, so just return the value
                        // returned from readFromFile.
                        returnSize = myCurrentBufferSize;
                        // Otherwise, returnSize is already set to the
                        // available bytes that was already copied (so no
                        // else statement is needed).
                    }
                    // Set myBufferIndex & myCurrentBufferSize to 0.
                    myCurrentBufferSize = 0;
                    myBufferIndex = 0;
                }
                else
                {
                    // Successfully read more data.
                    // Check to see how much was copied.
                    int copySize = remainingSize;
                    if(copySize > myCurrentBufferSize)
                    {
                        // Not the entire requested amount was read
                        // (either from EOF or there was a partial read due to
                        // an error), so set the copySize to what was read.
                        copySize = myCurrentBufferSize;
                    }

                    // Now copy the rest of the bytes into the buffer.
                    memcpy((char*)buffer+availableBytes,
                           myFileBuffer, copySize);

                    // set the buffer index to the location after what we are
                    // returning as read.
                    myBufferIndex = copySize;

                    returnSize += copySize;
                }
            }
            else
            {
                // More remaining to be read than the max buffer size, so just
                // read directly into the output buffer.
                int readSize = readFromFile((char*)buffer + availableBytes,
                                            remainingSize);

                // Already used the buffer, so "clear" it.
                myCurrentBufferSize = 0;
                myBufferIndex = 0;
                if(readSize <= 0)
                {
                    // No more data was successfully read, so check to see
                    // if any data was copied to the return buffer at all.
                    if(returnSize == 0)
                    {
                        // No data has been copied at all into the
                        // return read buffer, so just return the value
                        // returned from readFromFile.
                        returnSize = readSize;
                        // Otherwise, returnSize is already set to the
                        // available bytes that was already copied (so no
                        // else statement is needed).
                    }
                }
                else
                {
                    // More data was read, so increment the return count.
                    returnSize += readSize;
                }
            }
        }
        return(returnSize);
    }
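The case analysis in ifread (serve from the buffer, refill for a small remainder, bypass the buffer for a large one) can be exercised with a simplified in-memory sketch. `BufferedReadSketch`, `read` and `readFromSource` are hypothetical names, and the string stands in for the file. Unlike the listing, this sketch has no error path: the source can only run out of data, so it returns a byte count and never a negative value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory reader mirroring the three paths of ifread.
class BufferedReadSketch
{
public:
    BufferedReadSketch(const std::string& data, std::size_t bufferSize)
        : mySource(data), mySourcePos(0),
          myBuffer(bufferSize < 1 ? 1 : bufferSize),
          myBufferIndex(0), myCurrentBufferSize(0) {}

    // Returns the number of bytes copied into out (0 at end of data).
    std::size_t read(void* out, std::size_t size)
    {
        char* dest = static_cast<char*>(out);
        const std::size_t available = myCurrentBufferSize - myBufferIndex;

        // Case 1: the buffer already holds everything requested.
        if (size <= available)
        {
            std::memcpy(dest, myBuffer.data() + myBufferIndex, size);
            myBufferIndex += size;
            return size;
        }

        // Case 2: drain what is buffered, then fetch the remainder.
        std::memcpy(dest, myBuffer.data() + myBufferIndex, available);
        myBufferIndex += available;
        const std::size_t remaining = size - available;

        if (remaining < myBuffer.size())
        {
            // Small remainder: refill the whole buffer, hand out part of it.
            myCurrentBufferSize =
                readFromSource(myBuffer.data(), myBuffer.size());
            const std::size_t copySize =
                std::min(remaining, myCurrentBufferSize);
            std::memcpy(dest + available, myBuffer.data(), copySize);
            myBufferIndex = copySize;
            return available + copySize;
        }

        // Large remainder: skip the buffer and read straight into out.
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        return available + readFromSource(dest + available, remaining);
    }

private:
    // Copy up to size bytes from the source; returns the count copied.
    std::size_t readFromSource(char* dest, std::size_t size)
    {
        const std::size_t n = std::min(size, mySource.size() - mySourcePos);
        std::memcpy(dest, mySource.data() + mySourcePos, n);
        mySourcePos += n;
        return n;
    }

    std::string mySource;
    std::size_t mySourcePos;
    std::vector<char> myBuffer;
    std::size_t myBufferIndex;
    std::size_t myCurrentBufferSize;
};

// Usage example: a short read is returned once the data runs out.
void exampleBufferedRead()
{
    BufferedReadSketch reader("abcdef", 4);
    char out[8] = {0};
    assert(reader.read(out, 3) == 3 && std::string(out, 3) == "abc");
    assert(reader.read(out, 8) == 3 && std::string(out, 3) == "def");
    assert(reader.read(out, 1) == 0);
}
```

Reading a full buffer even when only a few bytes are requested is what makes later small reads cheap: they are served from memory by the first case.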
bool InputFile::ifseek | ( | int64_t | offset, | |
int | origin | |||
) | [inline] |
Seek to the specified offset from the origin.
offset | offset into the file to move to (must be from a tell call) | |
origin | one of the following (note: not all are valid for all file types): SEEK_SET (beginning of file), SEEK_CUR (current position of the file pointer), SEEK_END (end of file) |
Definition at line 389 of file InputFile.h.
Referenced by ifseek().
    {
        if (myFileTypePtr == NULL)
        {
            // No myFileTypePtr, so return false - could not seek.
            return false;
        }
        // TODO - may be able to seek within the buffer if applicable.
        // Reset buffering since a seek is being done.
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        return myFileTypePtr->seek(offset, origin);
    }
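Why the read buffer is reset before seeking can be seen in a small sketch built on `std::istringstream`. `SeekSketch`, `nextChar` and `seekFromStart` are hypothetical names, and only the SEEK_SET-style case is modelled. If the buffered bytes were kept, the read after the seek would return stale data from the old position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <sstream>
#include <string>
#include <vector>

// Hypothetical buffered reader over an in-memory stream.
class SeekSketch
{
public:
    SeekSketch(const std::string& data, std::size_t bufferSize)
        : mySource(data), myBuffer(bufferSize < 1 ? 1 : bufferSize),
          myBufferIndex(0), myCurrentBufferSize(0) {}

    int nextChar()
    {
        if (myBufferIndex >= myCurrentBufferSize)
        {
            // Refill; gcount() reports how many bytes were actually read.
            mySource.read(myBuffer.data(),
                          static_cast<std::streamsize>(myBuffer.size()));
            myCurrentBufferSize = static_cast<std::size_t>(mySource.gcount());
            myBufferIndex = 0;
            if (myCurrentBufferSize == 0)
            {
                return EOF;
            }
        }
        return static_cast<unsigned char>(myBuffer[myBufferIndex++]);
    }

    // Seek relative to the start of the data (the SEEK_SET case only).
    bool seekFromStart(std::streamoff offset)
    {
        // Drop buffered bytes: they belong to the old position.
        myBufferIndex = 0;
        myCurrentBufferSize = 0;
        // Clear eof/fail left over from a short read, then reposition.
        mySource.clear();
        mySource.seekg(offset, std::ios::beg);
        return !mySource.fail();
    }

private:
    std::istringstream mySource;
    std::vector<char> myBuffer;
    std::size_t myBufferIndex;
    std::size_t myCurrentBufferSize;
};

// Usage example: after the seek, the next byte comes from offset 4,
// not from the "bcd" that was still sitting in the buffer.
void exampleSeek()
{
    SeekSketch reader("abcdef", 4);
    assert(reader.nextChar() == 'a');
    assert(reader.seekFromStart(4));
    assert(reader.nextChar() == 'e');
}
```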
int64_t InputFile::iftell | ( | ) | [inline] |
Get current position in the file.
Definition at line 368 of file InputFile.h.
Referenced by iftell().
unsigned int InputFile::ifwrite | ( | const void * | buffer, | |
unsigned int | size | |||
) | [inline] |
Write the specified buffer into the file.
buffer | buffer containing size bytes to write to the file. | |
size | number of bytes to write |
Definition at line 343 of file InputFile.h.
Referenced by ifwrite().
bool InputFile::isOpen | ( | ) | [inline] |
Returns whether or not the file was successfully opened.
Definition at line 355 of file InputFile.h.
Referenced by ifopen(), SamFile::IsOpen(), GlfHeader::read(), SamRecord::setBufferFromFile(), GlfHeader::write(), and SamRecord::writeRecordBuffer().
void InputFile::setAttemptRecovery | ( | bool | flag = false |
) | [inline] |
Enable or disable recovery (the default argument, false, disables it).
When true, a myFileTypePtr that implements a recovery-capable decompressor can be attached. This requires that the caller be able to catch the exception XXX "blah blah blah".
Definition at line 417 of file InputFile.h.
Referenced by SamFile::OpenForRead().