PSEUDO -- A Note Concerning Resource Usage
Memory and disk space usage
It's important to consider the resources available on your system when you're deciding what hypotheses
to test and what output you'd like PSEUDO to generate. Although PSEUDO is designed to
handle memory as efficiently as possible without compromising algorithmic performance, even a moderately sized data set
can require a considerable amount of memory to store the input z-scores. For instance, if you have a data set with
5 affections, 200 families, 5000 positions spread across 23 chromosomes, and 20 replicates per z-score pool, you'll need to store
1 z-score for each affection, family, replicate, and position:
5 * 200 * 5000 * 20 * sizeof (double)
= 5 * 200 * 5000 * 20 * 8 bytes (with 8-byte doubles)
= 800,000,000 bytes = 800,000K = 800M
or about 800M of memory for the z-score replicates alone.
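The arithmetic above can be wrapped in a small helper so you can try other data set sizes before running PSEUDO. This is a sketch, not part of PSEUDO: the function names are hypothetical, it assumes one 8-byte double per (affection, family, position, replicate) exactly as in the formula, and it ignores any bookkeeping overhead PSEUDO itself adds.

```python
# Hypothetical helper: estimate memory needed just to hold z-score replicates.
# Assumes one 8-byte double per (affection, family, position, replicate),
# matching the formula in the text; PSEUDO's own overhead is not included.

BYTES_PER_DOUBLE = 8  # sizeof(double) on typical platforms


def zscore_bytes(affections: int, families: int, positions: int, replicates: int) -> int:
    """Return the number of bytes needed to store every z-score replicate."""
    return affections * families * positions * replicates * BYTES_PER_DOUBLE


def to_megabytes(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1M = 1,000,000 bytes), as in the text."""
    return n_bytes / 1_000_000


if __name__ == "__main__":
    # 5 affections, 200 families, 5000 positions in total, 20 replicates per pool
    needed = zscore_bytes(affections=5, families=200, positions=5000, replicates=20)
    print(f"{needed:,} bytes = {to_megabytes(needed):,.0f}M")  # 800,000,000 bytes = 800M
```

Because the estimate is a plain product, doubling any one factor (for example, the number of replicates per pool) doubles the memory required.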
Disk space should also be a consideration, particularly when you're selecting the types
of output you'd like to see. Because each replicate simulation generated by PSEUDO
reproduces an entire genome scan for one or several outcomes, the volume of output a typical simulation produces can
quickly escalate if you're not mindful of the numbers. For instance, suppose you request the maximum lod score for each simulated chromosome
(using the --simpeaks option) for a data set with 5 unique outcomes and markers on 23 chromosomes. If you run 500,000 simulations,
the file produced by PSEUDO will have approximately
23 * 500,000 * 5 = 57,500,000 lines
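The line count above follows from one line per chromosome, per simulation, per outcome. A minimal sketch of that estimate, assuming this one-line-per-combination layout for the --simpeaks output (the function name is hypothetical, not part of PSEUDO):

```python
# Hypothetical helper: approximate line count of --simpeaks output,
# assuming one line per (chromosome, simulation, outcome) as in the text.

def simpeaks_lines(chromosomes: int, simulations: int, outcomes: int) -> int:
    """Approximate number of lines in the per-chromosome peak-lod output."""
    return chromosomes * simulations * outcomes


if __name__ == "__main__":
    print(f"{simpeaks_lines(23, 500_000, 5):,} lines")  # 57,500,000 lines
```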
Options available for managing resource usage
PSEUDO has a number of options that can help you safeguard against unnecessary crashes or extremely large output files.
Memory
The --megabytes option estimates the amount of memory required to complete the requested
simulation and terminates at the outset if more than the preset limit would be needed. For instance, to limit memory for your simulation to 500M,
you'd type:
pseudo -l pseudo.list -h pseudo.hyp -n 100000 --megabytes 500
When the estimated requirement is close to, but greater than, the preset limit, PSEUDO instead attempts to use less memory
by keeping fewer outcomes in memory and writing results to disk more frequently. This may slow your simulation
somewhat, but it can make it possible to run simulations that would otherwise not fit in the available memory.
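Before choosing a value for --megabytes, you may want to check your data set against it yourself. The sketch below is a hypothetical pre-flight check, not PSEUDO's internal logic: it reuses the z-score storage formula from the memory section as the estimate, treats 1M as 1,000,000 bytes, and does not model PSEUDO's fallback of keeping fewer outcomes in memory. The function names are illustrative only.

```python
# Hypothetical pre-flight check to run before invoking pseudo with --megabytes.
# Uses only the z-score storage formula from the text as the estimate;
# PSEUDO's real estimate and its reduced-memory fallback are NOT modeled here.

BYTES_PER_DOUBLE = 8
BYTES_PER_MEGABYTE = 1_000_000  # decimal megabytes, as in the text


def fits_in_limit(limit_mb: int, affections: int, families: int,
                  positions: int, replicates: int) -> bool:
    """True if the z-score replicates alone fit under limit_mb."""
    needed = affections * families * positions * replicates * BYTES_PER_DOUBLE
    return needed <= limit_mb * BYTES_PER_MEGABYTE


def max_replicates(limit_mb: int, affections: int, families: int, positions: int) -> int:
    """Largest replicates-per-pool value whose z-scores fit under limit_mb."""
    per_replicate = affections * families * positions * BYTES_PER_DOUBLE
    return (limit_mb * BYTES_PER_MEGABYTE) // per_replicate


if __name__ == "__main__":
    print(fits_in_limit(500, 5, 200, 5000, 20))  # False: 800M > 500M
    print(max_replicates(500, 5, 200, 5000))     # 12
```

With the example data set, a 500M limit leaves room for at most 12 replicates per pool by this estimate, so the 20-replicate run would either need the reduced-memory mode described above or a larger limit.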
Disk space
If you'd like to limit the amount of disk space used, the --diskSpace option can help. To
set an upper bound of 50M on any single output file, you'd type:
pseudo -l pseudo.list -h pseudo.hyp -n 100000 --diskSpace 50
Whenever PSEUDO estimates that the disk space required for the requested output options exceeds this limit,
it terminates the simulation.
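To pick a --diskSpace value, or a number of simulations that stays under it, you can estimate output size from the line count. This is a hypothetical sketch, not PSEUDO's own estimate: it assumes one --simpeaks line per chromosome, simulation, and outcome, and BYTES_PER_LINE is an assumed average line length that you would need to measure from a short trial run on your own data.

```python
# Hypothetical estimate of --simpeaks output size for comparison with --diskSpace.
# ASSUMPTIONS: one line per (chromosome, simulation, outcome), and an average
# line length (bytes_per_line) measured from your own trial run.

BYTES_PER_MEGABYTE = 1_000_000  # decimal megabytes, as in the text


def estimated_file_mb(chromosomes: int, simulations: int, outcomes: int,
                      bytes_per_line: int) -> float:
    """Approximate output file size in megabytes."""
    lines = chromosomes * simulations * outcomes
    return lines * bytes_per_line / BYTES_PER_MEGABYTE


def max_simulations(limit_mb: int, chromosomes: int, outcomes: int,
                    bytes_per_line: int) -> int:
    """Largest number of simulations whose output stays under limit_mb."""
    bytes_per_simulation = chromosomes * outcomes * bytes_per_line
    return (limit_mb * BYTES_PER_MEGABYTE) // bytes_per_simulation


if __name__ == "__main__":
    BYTES_PER_LINE = 40  # assumed value for illustration, not taken from PSEUDO
    print(estimated_file_mb(23, 500_000, 5, BYTES_PER_LINE))  # 2300.0
    print(max_simulations(50, 23, 5, BYTES_PER_LINE))         # 10869
```

Under that assumed line length, the 500,000-simulation example would far exceed a 50M limit, while roughly 10,000 simulations would fit.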