PSEUDO -- Output produced by the program
Basics of running the program
Now that we've got a set of z-score files, a list file and a file listing our hypotheses, we're ready to
evaluate empirical p-values using PSEUDO. To run 10000 pseudosimulations, all you need to do is type
pseudo -l pseudo.list -h pseudo.hyp -n 10000
If you'd also like to see pdf output, try adding a --pdf argument to your command line:
pseudo -l pseudo.list -h pseudo.hyp -n 10000 --pdf
Once you press return, the first thing you'll see is a program header followed by a section detailing
which program options have been selected.
PseudoSimulator
(c) 2003-2005 Jan Wigginton, 2003-2005 Goncalo Abecasis
The following parameters are in effect:
List of replicates : pseudo.list (-lname)
Replicates to generate : 10000 (-n9999)
Hypothesis Table : pseudo.hyp (-hname)
Random Seed : 1128619888 (-r9999)
Additional Options
Analysis Options : --balanced, --random [ON], --chrMax [23]
Optional Summaries : --peaks [ON], --simpeaks [ON], --hits [ON]
Weight Summaries : --reps [ON], --weights [ON]
Output Options : --quiet, --prefix [pseudo], --pdf [ON]
Resource Usage : --megabytes [1000], --diskSpace [100], --thinData [1]
Since we've selected pdf output, the --pdf option is set to ON. Notice as well that the
--random option is set to ON. This is the default sampling scheme for the program.
After the header and program options have been listed, PSEUDO will open your first z-score file and
construct a list of all affections, families, chromosomes and positions. Once this is done, PSEUDO will
print out values for important parameters of the simulation as well as an estimate of the maximum memory usage
for the simulation.
Number of families: 200
Number of positions: 438
Number of chromosomes: 22
Number of affections: 3
Number of hypotheses: 4
Number of real simulations: 20
Number of pseudosimulations: 10000
Estimated maximum memory required to perform this simulation is 98.5 M
Once PSEUDO knows how many positions, families and affections it will be working with, it opens
each z-score replicate file and reads in the family-specific z-scores for each family and affection
variable at each position for that replicate. If it encounters any position, family, affection or chromosome that wasn't
listed in the first z-score file, this will be flagged as an error. Conversely, if a z-score value is not listed for any position
that was listed in the first z-score file, PSEUDO will use a value of 0.00 at that position for that
replicate. If there are no problems reading in your z-score information, PSEUDO will tell you that the file
loaded successfully.
Reading information from file merlin-00123456-00001.zscore
Reading information from file merlin-00123456-00002.zscore
Reading information from file merlin-00123456-00003.zscore
Reading information from file merlin-00123456-00004.zscore
Reading information from file merlin-00123456-00005.zscore
Reading information from file merlin-00123456-00006.zscore
Reading information from file merlin-00123456-00007.zscore
Reading information from file merlin-00123456-00008.zscore
Reading information from file merlin-00123456-00009.zscore
Reading information from file merlin-00123456-00010.zscore
Reading information from file merlin-00123456-00011.zscore
Reading information from file merlin-00123456-00012.zscore
Reading information from file merlin-00123456-00013.zscore
Reading information from file merlin-00123456-00014.zscore
Reading information from file merlin-00123456-00015.zscore
Reading information from file merlin-00123456-00016.zscore
Reading information from file merlin-00123456-00017.zscore
Reading information from file merlin-00123456-00018.zscore
Reading information from file merlin-00123456-00019.zscore
Reading information from file merlin-00123456-00020.zscore
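The loading rules described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not PSEUDO's actual code: the function name `load_replicate`, the tuple layout of the keys and the use of dictionaries are all assumptions made for this example, and the z-score file format itself is not parsed here.

```python
def load_replicate(reference_keys, replicate_scores):
    """Align one replicate's z-scores to the entries found in the first file.

    reference_keys   : set of (chromosome, position, family, affection) tuples
                       collected from the first z-score file (hypothetical layout)
    replicate_scores : dict mapping such tuples to z-scores for one replicate
    """
    # Anything that did not appear in the first file is treated as an error.
    unknown = set(replicate_scores) - set(reference_keys)
    if unknown:
        raise ValueError(
            "entries not listed in the first z-score file: "
            + ", ".join(sorted(repr(key) for key in unknown))
        )
    # Anything the replicate leaves out is filled in with a z-score of 0.00.
    return {key: replicate_scores.get(key, 0.0) for key in reference_keys}


# Made-up example: two positions for one family and one affection.
reference = {(1, 0.0, "fam1", "affection"), (1, 5.0, "fam1", "affection")}
aligned = load_replicate(reference, {(1, 0.0, "fam1", "affection"): 1.25})
print(aligned)
```

In this toy run the replicate lists only the first position, so the second position is filled in as 0.0, while a replicate mentioning an unlisted family or position would raise an error.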
How PSEUDO assigns empirical p-values
When PSEUDO evaluates p-values, it will look for chromosomes with one or more lods exceeding the lod of
interest. Any such chromosome is considered a "hit", and a simulation with multiple hits has one or
more lods greater than the lod of interest on several different chromosomes. The ranked p-value
reported is calculated as
$T_k$ = number of simulations with $k$ or more chromosomes with max lod > $L$
$N$ = total simulations
$P_k = T_k / N$
Actual distribution of hits
Before it starts the main simulation, PSEUDO will produce a table showing the distribution of hits per simulation in your
original z-score replicates.
ACTUAL DISTRIBUTION OF SIGNIFICANT CHROMOSOMES PER SIMULATION
=============================================================
Hypothesis : Hypothesis_0
Affections : affection [PAIRS], affection [ALL]
Target Lod : 3.60
Target Rank : 1
Simulations : 20
HITS SIMS PROB CUMULATIVE
------------------------------------------------------------
0 20 1.00 1.00
1 0 0.00 0.00
2 0 0.00 0.00
3 0 0.00 0.00
4 0 0.00 0.00
5 0 0.00 0.00
6 0 0.00 0.00
7 0 0.00 0.00
8 0 0.00 0.00
9 0 0.00 0.00
P-value for 1 or more lod > 3.600: 0.0
For this section, lod score results are re-created for the 20 original gene-dropping simulations
that we'll be using for z-score sampling. The distribution is given in the table; column one
lists the possible number of hits; column two gives the number of simulations that had exactly that many hits.
For Hypothesis_0, we're testing the significance of a lod score of 3.60 for the outcome affection
[PAIRS] or affection [ALL] at a rank of 1. Since this is our best lod from the actual data, it's really
not all that surprising to see that linkage analysis on the 20 datasets simulated under the null failed to
produce any chromosomes with a lod greater than 3.60.
Simulated distribution of hits
Once the main simulation runs, you'll see another set of tables giving the simulated distribution of hits for
each hypothesis. If you use the example data, you should see a table very similar to the one below:
SIMULATED DISTRIBUTION OF SIGNIFICANT CHROMOSOMES PER SIMULATION
================================================================
Hypothesis : Hypothesis_0
Affections : affection [PAIRS], affection [ALL]
Target Lod : 3.60
Target Rank : 1
Simulations : 10000
HITS SIMS PROB CUMULATIVE
------------------------------------------------------------
0 9929 0.99 1.00
1 71 0.007 0.007
2 0 0.00 0.00
3 0 0.00 0.00
4 0 0.00 0.00
5 0 0.00 0.00
6 0 0.00 0.00
7 0 0.00 0.00
8 0 0.00 0.00
9 0 0.00 0.00
P-value for 1 or more lod > 3.600: 0.0071
Estimated standard deviation for p-value: 0.054
In this case, we see that 71 out of 10000 simulations had exactly one chromosome with a lod greater than
3.60, and we calculate a replicate pool p-value for this hypothesis to be pRP = 0.0071!
Below this, PSEUDO reports an estimated standard deviation for pRP. In this case, the estimate
(0.054) is quite large relative to pRP. This is because calculation of Var(pRP)
involves estimating a variance weight for each family and chromosome. The weight estimator implemented in PSEUDO
is designed to be conservative, which has the virtue of placing a reliably conservative confidence bound
on pRP. The tradeoff is that the overall variance estimate will converge somewhat slowly. If you find
an interesting result in your data, and find that the variance estimate is overly conservative, we recommend rerunning that
analysis with a larger number of pseudo-replicates. This will allow the weight estimators to converge and make it
possible to place a narrower confidence bound on your p-value.
If you're interested in learning more about how PSEUDO estimates Var(pRP), you might want to
check out the section on variance estimation for the replicate pool method.
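To get a feel for how wide a bound the reported standard deviation implies, one option is a simple normal approximation. The sketch below is an assumption made for illustration only: the function name `normal_upper_bound`, the one-sided multiplier of 1.645 and the normal approximation itself are not taken from PSEUDO, which may construct its confidence bounds differently.

```python
def normal_upper_bound(p, sd, z=1.645):
    """One-sided upper bound p + z * sd, clipped to the interval [0, 1].

    Assumes a normal approximation; z = 1.645 corresponds to a one-sided
    95% bound under that assumption.
    """
    return min(1.0, max(0.0, p + z * sd))


# Values from the example table: pRP = 0.0071, estimated sd = 0.054.
upper = normal_upper_bound(0.0071, 0.054)
print(round(upper, 5))
```

Under this approximation the upper bound is roughly 0.096, more than ten times pRP itself, which is why a larger number of pseudo-replicates may be worthwhile for an interesting result.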
Joint p-value for all hypotheses
Below the last set of hypothesis results, PSEUDO will output two different joint p-values for all hypotheses.
Depending on the set of hypotheses you've constructed, one or both of these values may be of interest.
The first p-value listed gives the probability that at least one hypothesis is true in any one simulation.
In general this will give a genomewide p-value that corrects for correlation among the various outcomes. While this
will be no smaller than the p-value for your least significant hypothesis, it will properly account for multiple comparisons.
If correlation between your outcomes is high, this will be much less conservative than a Bonferroni corrected p-value.
JOINT HYPOTHESES
================
Probability that (1 LOD > 3.60 for affection [PAIRS], affection [ALL])
OR (2 LODs > 2.68 for affection [PAIRS], affection [ALL])
OR (1 LOD > 3.12 for trait [QTL])
OR (2 LODs > 1.81 for trait [QTL])
= 0.063
The second p-value listed by PSEUDO gives the probability that all listed hypotheses are true in any one simulation.
This corresponds to a test of significance of the overall lod score profile you've obtained. In the latter circumstance,
you may find evidence for a genetic effect in your dataset even though no individual peak yields conclusive evidence for linkage.
Probability that (1 LOD > 3.60 for affection [PAIRS], affection [ALL])
AND (2 LODs > 2.68 for affection [PAIRS], affection [ALL])
AND (1 LOD > 3.12 for trait [QTL])
AND (2 LODs > 1.81 for trait [QTL])
= 0.0
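The two joint p-values can be pictured as simple counts over simulations. The Python sketch below is a minimal illustration with made-up data: the function name `joint_p_values`, the boolean table and the comparison against the sum of the individual p-values (a Bonferroni-style bound) are assumptions for this example rather than a description of PSEUDO's internals.

```python
def joint_p_values(outcomes):
    """Return (p_any, p_all) from one row of booleans per simulation.

    Each row holds one entry per hypothesis, True when that hypothesis
    is satisfied in that simulation.
    """
    n = len(outcomes)
    p_any = sum(any(row) for row in outcomes) / n  # the "OR" p-value
    p_all = sum(all(row) for row in outcomes) / n  # the "AND" p-value
    return p_any, p_all


# Made-up data: 8 simulations, 3 correlated hypotheses.
toy = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (False, False, False),
    (True, True, True),
    (False, True, False),
    (False, False, False),
    (False, False, False),
]
p_any, p_all = joint_p_values(toy)

# Individual p-values and the Bonferroni-style bound given by their sum.
per_hyp = [sum(row[j] for row in toy) / len(toy) for j in range(3)]
union_bound = min(1.0, sum(per_hyp))
print(p_any, p_all, per_hyp, union_bound)
```

In this toy table the "OR" p-value (0.5) is no smaller than the largest individual p-value (0.375) but sits well below the summed bound (0.875), because the hypotheses tend to be satisfied in the same simulations.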
The next step ...
Now that we've covered the absolute basics, you might want to go over the sections describing other types of summaries
that PSEUDO will produce, including linkage peak summaries, family weight summaries, and graphical output.
You might also find it useful to cover the sections on the replicate pool method or possibly take a look at
the section on variance estimation.