PSEUDO -- Input file formats
PSEUDO requires three types of input in order to run a simulation: (1) a hypothesis file
containing a list of tests to run, (2) a set of z-score files (generated by
MERLIN) to be used for sampling,
and (3) a list file telling PSEUDO where to find your z-score replicate files.
PSEUDO is designed to work with .zscore format files generated by MERLIN
version 0.10.3 or higher. If you don't have MERLIN installed on your system,
or if you haven't updated it recently, you'll want to download an update from
the MERLIN download site before proceeding.
Hypothesis File
The first thing you need to do before starting your simulation is to settle on a set of hypotheses to
test based on your original results. Most of the time, you'll be interested in evaluating the significance
of one or more lod score peaks that you've gotten from a genome scan.
As an illustration, we'll look at output produced
by MERLIN when a genome scan is done on the sample data contained in pseudo_scan.ped,
pseudo_scan.dat and pseudo_scan.map. If
you've downloaded the example files for this tutorial, you should be able
to reproduce the genome scan results by typing:
merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map --npl --pairs --qtl
If you take a close look, you'll see that the scan produced several promising lod score peaks.
For the --npl option, we had a lod of 3.60 on chromosome 2 and a lod of 2.68 for
chromosome 8. We also had some interesting results for trait with lods of
3.12 on chromosome 3 and 1.81 on chromosome 8 using the --qtl scoring option.
If we'd like to assign a p-value to our best lod, we only
need to determine the empirical probability of a genome scan containing 1 or more lods
greater than 3.60 for affection.
However, to properly evaluate significance for our second best peak for
affection, we'll need to account for the
presence of the first by only counting those simulations that produced 2
or more independent lods greater than 2.68. PSEUDO can test each of these hypotheses jointly if we give it the
hypothesis file below:
HYPOTHESIS_1 affection [ALL] 3.60 1
HYPOTHESIS_2 affection [ALL] 2.68 2
Here, the first column specifies a label for each hypothesis. Columns two and three contain the trait
label (here, affection) and the score type; together they fully specify the outcome you're interested in testing. The number
in the fourth column is the lod score to be tested, and the last column gives the rank of that lod
(1 for the best peak, 2 for the second best).
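In terms of the simulated replicates, the two hypotheses above amount to the empirical estimates sketched below. The notation is introduced here for illustration only and is not PSEUDO's own: N is the number of simulated genome scans, and L_(k)^(i) is the k-th largest lod among independent peaks for affection [ALL] in scan i.

```latex
\hat{p}_1 = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[ L_{(1)}^{(i)} > 3.60 \right],
\qquad
\hat{p}_2 = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[ L_{(2)}^{(i)} > 2.68 \right]
```

The second estimate uses the fact that a scan contains 2 or more independent lods greater than 2.68 exactly when its second largest independent lod exceeds 2.68.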
Using scan
If you're just getting started with your linkage data, and really don't have a set of specific
hypotheses to test in mind, a quick way to get started with PSEUDO is to use a small companion
utility called scan. Scan can save a few minutes by quickly going through your MERLIN screen
output, finding the maximum lod value for each outcome in your data set and generating a hypothesis table
to be used by PSEUDO. To run scan, simply save your original MERLIN output to a file:
merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map --npl --pairs --qtl > merlin.output
and then tell scan to go through your output:
scan -s merlin.output -h your_hypotheses.hyp -l 2
The -l command line argument tells scan to pick out the 2 largest lods
(on separate chromosomes) for each outcome and write a list of hypotheses to a file:
Hypothesis_0 affection [ALL] 3.60 1
Hypothesis_1 affection [ALL] 2.68 2
Hypothesis_2 affection [Pairs] 3.58 1
Hypothesis_3 affection [Pairs] 2.66 2
Hypothesis_4 trait [QTL] 3.12 1
Hypothesis_5 trait [QTL] 1.81 2
A hypothesis with multiple outcomes
If you look at the results for affection [ALL] and affection [PAIRS], you may notice
that we've gotten a promising lod for both outcomes at the same position. While this might be the case for any two correlated outcomes,
the similarity between affection [ALL] and affection [PAIRS] is expected,
since we are doing two different tests of linkage on the same
data. To correct for this correlation without overcorrecting, you may want to
test these outcomes jointly. To do so, simply add a joint hypothesis to your file:
HYPOTHESIS_1 affection [ALL], affection [PAIRS] 3.60 1
HYPOTHESIS_2 affection [ALL], affection [PAIRS] 2.68 2
Hypothesis_3 trait [QTL] 3.12 1
Hypothesis_4 trait [QTL] 1.81 2
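Because a malformed line in a hand-edited hypothesis file is easy to miss by eye, a quick sanity check before running PSEUDO can help. The sketch below is not part of PSEUDO. It assumes only what the examples above show: every line has at least five whitespace-separated fields and ends with a numeric lod followed by a positive integer rank. The function name check_hyp and the file name joint.hyp are hypothetical.

```shell
# Hypothetical helper, not part of PSEUDO: flag lines that do not end in
# "<lod> <rank>" or that have fewer than five whitespace-separated fields.
check_hyp() {
  awk '
    NF == 0 { next }    # skip blank lines
    NF < 5 || $(NF-1) !~ /^[0-9]+(\.[0-9]+)?$/ || $NF !~ /^[1-9][0-9]*$/ {
      printf "line %d looks malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example: the joint hypothesis file shown above (the file name is an assumption).
cat > joint.hyp <<'EOF'
HYPOTHESIS_1 affection [ALL], affection [PAIRS] 3.60 1
HYPOTHESIS_2 affection [ALL], affection [PAIRS] 2.68 2
Hypothesis_3 trait [QTL] 3.12 1
Hypothesis_4 trait [QTL] 1.81 2
EOF

check_hyp joint.hyp && echo "joint.hyp: every line ends in a lod and a rank"
```

Checking only the last two fields keeps the test valid for both single-outcome and joint hypotheses, since a joint line simply carries more outcome fields before the lod.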
Z-score files
In order to generate a pool of z-score replicates for sampling, you'll need to
use MERLIN's --simulate option to generate
gene-dropping replicates of your original datasets under the null hypothesis of no
linkage and no association. You'll also need to tell MERLIN to analyse
each of these replicates using the same Kong and Cox
scoring options as your original analysis (e.g. --npl --pairs)
and, additionally, to save family-specific z-scores (using the --zscores
option):
merlin -p example.ped -d example.dat -m example.map --npl --pairs --qtl --zscores --simulate
You may also want to save yourself a few keystrokes by using MERLIN's
--reruns option
to generate multiple replicates with a single command line.
For example, the command
merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map --npl --pairs --qtl --zscores --simulate --reruns:20
will generate twenty gene-dropping replicates using the data in pseudo_scan.ped, pseudo_scan.map and pseudo_scan.dat as a template, do a Kong and Cox test of
linkage on each of these replicates using Spairs (--pairs) and Sall (--npl), and save the family-specific z-scores for each replicate. Once you
run this command, you'll see 20 z-score files in your directory, each with the same (randomly generated) file prefix and the replicate number as a suffix:
% ls merlin*.zscore
merlin-00123456-00001.zscore
merlin-00123456-00002.zscore
merlin-00123456-00003.zscore
merlin-00123456-00004.zscore
merlin-00123456-00005.zscore
merlin-00123456-00006.zscore
merlin-00123456-00007.zscore
merlin-00123456-00008.zscore
merlin-00123456-00009.zscore
merlin-00123456-00010.zscore
merlin-00123456-00011.zscore
merlin-00123456-00012.zscore
merlin-00123456-00013.zscore
merlin-00123456-00014.zscore
merlin-00123456-00015.zscore
merlin-00123456-00016.zscore
merlin-00123456-00017.zscore
merlin-00123456-00018.zscore
merlin-00123456-00019.zscore
merlin-00123456-00020.zscore
List file
After you've got a set of z-score replicates, you'll need to make a list file telling PSEUDO where to find them.
The quickest way to do this on a UNIX system is to figure out the prefix used by MERLIN to tag your z-score files and then use the ls
command to produce a list:
ls merlin-00123456*.zscore > pseudo.list
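Before handing the list to PSEUDO, it can be worth confirming that every entry points to a non-empty file, for instance if replicates were moved after the list was made. The following is a minimal sketch, not part of PSEUDO or MERLIN; check_list is a hypothetical helper, and it assumes only that the list file holds one path per line, as ls produces.

```shell
# Hypothetical helper, not part of PSEUDO or MERLIN: report list entries
# that are missing or empty, and print how many usable files were found.
check_list() {
  list=$1
  found=0
  missing=0
  while IFS= read -r f; do
    [ -n "$f" ] || continue    # ignore blank lines
    if [ -s "$f" ]; then
      found=$((found + 1))
    else
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  echo "$found z-score files found, $missing missing"
  [ "$missing" -eq 0 ]         # non-zero exit status if anything is missing
}

# Usage, assuming the list file created above:
#   check_list pseudo.list
```

Paths in the list are interpreted relative to the directory where the check is run, so it is simplest to run it from the directory in which the list was created.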
Now that you've got your input files together, you're ready to run your first pseudosimulation. If you haven't already installed PSEUDO on your
system, you'll want to get a copy from our download site before proceeding to the section on
output produced by PSEUDO.