University of Michigan Center for Statistical Genetics

PSEUDO -- A Note Concerning Resource Usage

Memory and disk space usage

It's important to consider the resources available on your system when you're deciding what hypotheses to test and what output you'd like PSEUDO to generate. Although PSEUDO is designed to handle memory as efficiently as possible without compromising algorithmic performance, even a fairly modest data set can require a considerable amount of memory to store the input z-scores. For instance, if you have a data set with 5 affections, 200 families, and 23 chromosomes with 5000 positions in total, and 20 replicates per z-score pool, you'll need to input

	
	1 z-score for each affection, family, position, and replicate
		= 5 * 200 * 5000 * 20 * sizeof(double)
		= 800,000,000 bytes = 800,000K = 800M

or about 800M of memory simply to store the z-score replicates.
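
The same arithmetic can be reproduced for your own data set before you start a run. The following is a minimal sketch, not part of PSEUDO: the function name and its parameters are hypothetical, and it assumes each z-score is stored as an 8-byte double, as in the calculation above.

```python
def zscore_memory_bytes(affections, families, positions, replicates,
                        bytes_per_value=8):
    """Bytes needed to hold one z-score per affection, family,
    position, and replicate (8 bytes per value assumed)."""
    return affections * families * positions * replicates * bytes_per_value


# Figures from the example above: 5 affections, 200 families,
# 5000 positions, 20 replicates per z-score pool.
estimate = zscore_memory_bytes(affections=5, families=200,
                               positions=5000, replicates=20)
print(f"{estimate:,} bytes = {estimate / 1_000_000:,.0f}M")
# prints "800,000,000 bytes = 800M"
```

This counts only the stored z-scores; any other memory PSEUDO uses during a simulation is not included.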

Disk space should also be a consideration, particularly when you're selecting the types of output you'd like to see. Because each replicate simulation generated by PSEUDO reproduces an entire genome scan for one or several outcomes, the volume of data generated by a typical simulation can quickly escalate if you're not mindful of the numbers. For instance, suppose you request the maximum lod score for each simulated chromosome (using the --simpeaks option) for a data set with 5 unique outcomes and markers on 23 chromosomes. If you run 500,000 simulations, the file produced by PSEUDO will have approximately

   23 * 500,000 * 5  = 57,500,000 lines
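
The line count above can be checked for other settings with a short calculation. This is a sketch only: the function name is hypothetical, and it assumes one output line per chromosome, simulation, and outcome, as the example implies.

```python
def simpeaks_lines(chromosomes, simulations, outcomes):
    """Approximate number of output lines, assuming one line per
    chromosome, simulation, and outcome."""
    return chromosomes * simulations * outcomes


# Figures from the example above: 23 chromosomes, 500,000 simulations,
# 5 unique outcomes.
lines = simpeaks_lines(chromosomes=23, simulations=500_000, outcomes=5)
print(f"{lines:,} lines")
# prints "57,500,000 lines"
```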

Options available for managing resource usage

PSEUDO has a number of options that can help you safeguard against unnecessary crashes or extremely large output files.

Memory

The --megabytes option will estimate the amount of memory required to complete the requested simulation, and terminate the run at the beginning if more than the preset limit would be required. For instance, to limit memory for your simulation to 500M, you'd type:

	pseudo -l pseudo.list -h pseudo.hyp -n 100000 --megabytes 500

When the amount of memory requested is close to but greater than the preset limit, PSEUDO attempts to use less memory by saving fewer outcomes in memory and writing results to disk more frequently. While this might slow your simulation somewhat, it can also make it possible to run simulations that wouldn't otherwise be possible given available memory.

Disk space

If you'd like to limit the amount of disk space used, the --diskSpace option can be helpful. To set an upper bound of 50M for any one output file, you'd type:

	pseudo -l pseudo.list -h pseudo.hyp -n 100000 --diskSpace 50

Whenever PSEUDO estimates that the disk space required for the requested output options exceeds the disk space limit, it will terminate the simulation.
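
To get a rough idea beforehand of whether an output file is likely to exceed a limit like this, you can multiply the expected number of lines by an average line length. The sketch below is an illustration only and does not reproduce PSEUDO's own estimate: the function name is hypothetical, and the figure of 40 bytes per line is an assumption chosen for the example, not a property of PSEUDO's output format.

```python
def exceeds_disk_limit(lines, bytes_per_line, limit_megabytes):
    """Return True if an output file of the given size would exceed
    the limit (1M taken as 1,000,000 bytes)."""
    estimated_bytes = lines * bytes_per_line
    return estimated_bytes > limit_megabytes * 1_000_000


# Hypothetical case: one line per chromosome, simulation, and outcome
# for 23 chromosomes, 100,000 simulations, and 5 outcomes.
lines = 23 * 100_000 * 5   # 11,500,000 lines
# ASSUMPTION: 40 bytes per line; measure a real output file to refine this.
print(exceeds_disk_limit(lines, bytes_per_line=40, limit_megabytes=50))
# prints "True" (about 460M estimated against a 50M limit)
```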


 
 

University of Michigan | School of Public Health | Abecasis Lab