PSEUDO Tutorial - Output files

PSEUDO -- Output produced by the program

Basics of running the program

Now that we've got a set of z-score files, a list file and a file listing our hypotheses, we're ready to evaluate empirical p-values using PSEUDO. To run 10000 pseudosimulations, all you need to do is type

     pseudo -l pseudo.list -h pseudo.hyp -n 10000

If you'd also like to see pdf output, try adding a --pdf argument to your command line:

     pseudo -l pseudo.list -h pseudo.hyp -n 10000 --pdf

Once you press return, the first thing you'll see is a program header followed by a section detailing which program options have been selected.


PseudoSimulator
(c) 2003-2005 Jan Wigginton, 2003-2005 Goncalo Abecasis


The following parameters are in effect:
            List of replicates :     pseudo.list (-lname)
        Replicates to generate :           10000 (-n9999)
              Hypothesis Table :      pseudo.hyp (-hname)
                   Random Seed :      1128619888 (-r9999)

Additional Options
     Analysis Options : --balanced, --random [ON], --chrMax [23]
   Optional Summaries : --peaks [ON], --simpeaks [ON], --hits [ON]
     Weight Summaries : --reps [ON], --weights [ON]
       Output Options : --quiet, --prefix [pseudo], --pdf [ON]
       Resource Usage : --megabytes [1000], --diskSpace [100], --thinData [1]

Since we requested pdf output, the --pdf option is shown as ON. Notice as well that the --random option is ON; this is the default sampling scheme for the program.

After the header and program options have been listed, PSEUDO will open your first z-score file and construct a list of all affections, families, chromosomes and positions. Once this is done, PSEUDO will print out values for important parameters of the simulation as well as an estimate of maximum memory usage for the simulation.


        Number of families:               200
        Number of positions:              438
        Number of chromosomes:             22
        Number of affections:               3
        Number of hypotheses:               4
        Number of real simulations:        20
        Number of pseudosimulations:    10000


Estimated maximum memory required to perform this simulation is 98.5 M

Once PSEUDO knows how many positions, families and affections it will be working with, it opens each z-score replicate file and reads in the family-specific z-scores for each family and affection variable at each position for that replicate. If it encounters any position, family, affection or chromosome that wasn't listed in the first z-score file, this will be flagged as an error. Conversely, if a z-score value is not listed for a position that was listed in the first z-score file, PSEUDO will use a value of 0.00 at that position for that replicate. If there are no problems reading in your z-score information, PSEUDO will tell you that each file loaded successfully.
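The consistency rules described above can be sketched in Python. This is only an illustration of the behaviour described in the text, not PSEUDO's actual code: the record layout (a dict keyed by family, affection, chromosome and position) and the function name `align_replicate` are assumptions made for this example.

```python
def align_replicate(reference_keys, replicate):
    """Align one replicate's z-scores to the keys seen in the first file.

    reference_keys: set of (family, affection, chromosome, position) tuples
                    collected from the first z-score file (assumed layout).
    replicate:      dict mapping the same kind of tuple to a z-score.
    """
    # Any entry that the first file did not list is treated as an error.
    unknown = set(replicate) - set(reference_keys)
    if unknown:
        raise ValueError(f"entries not listed in first z-score file: {sorted(unknown)}")

    # Entries missing from this replicate default to a z-score of 0.00.
    return {key: replicate.get(key, 0.0) for key in reference_keys}


# Hypothetical data: the first file listed two positions, the replicate only one.
reference = {("fam1", "affection", 1, 0.0), ("fam1", "affection", 1, 5.0)}
rep = {("fam1", "affection", 1, 0.0): 1.25}
aligned = align_replicate(reference, rep)
print(aligned[("fam1", "affection", 1, 5.0)])  # the missing position is filled in
```

Under these assumptions, a replicate that mentions a family or position absent from the first file raises an error, while a replicate that simply omits a position is padded with zeros.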


	Reading information from file merlin-00123456-00001.zscore
	Reading information from file merlin-00123456-00002.zscore
	Reading information from file merlin-00123456-00003.zscore
	Reading information from file merlin-00123456-00004.zscore
	Reading information from file merlin-00123456-00005.zscore
	Reading information from file merlin-00123456-00006.zscore
	Reading information from file merlin-00123456-00007.zscore	
	Reading information from file merlin-00123456-00008.zscore
	Reading information from file merlin-00123456-00009.zscore	
	Reading information from file merlin-00123456-00010.zscore
	Reading information from file merlin-00123456-00011.zscore
	Reading information from file merlin-00123456-00012.zscore
	Reading information from file merlin-00123456-00013.zscore
	Reading information from file merlin-00123456-00014.zscore
	Reading information from file merlin-00123456-00015.zscore
	Reading information from file merlin-00123456-00016.zscore
	Reading information from file merlin-00123456-00017.zscore
	Reading information from file merlin-00123456-00018.zscore
	Reading information from file merlin-00123456-00019.zscore
	Reading information from file merlin-00123456-00020.zscore

How PSEUDO assigns empirical p-values

When PSEUDO evaluates p-values, it will look for chromosomes with one or more lods exceeding the lod of interest. Any such chromosome is considered a "hit", and a simulation with multiple hits has one or more lods greater than the lod of interest on several different chromosomes. The ranked p-value reported is calculated as

T_k	=	number of simulations with k or more chromosomes with max lod > L
N	=	total simulations
P_k	=	T_k / N
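
As a minimal sketch of this calculation, the Python below counts hits per simulation and then applies P_k = T_k / N. The function names and the lod values are made up for illustration; they are not PSEUDO output or PSEUDO internals.

```python
def count_hits(max_lod_by_chromosome, target_lod):
    """Number of chromosomes whose maximum lod exceeds the lod of interest."""
    return sum(1 for lod in max_lod_by_chromosome.values() if lod > target_lod)


def ranked_p_value(hits_per_simulation, k):
    """Return P_k = T_k / N for a list of per-simulation hit counts."""
    n = len(hits_per_simulation)
    if n == 0:
        raise ValueError("need at least one simulation")
    # T_k: simulations with k or more chromosomes with max lod > L.
    t_k = sum(1 for hits in hits_per_simulation if hits >= k)
    return t_k / n


# Four hypothetical simulations, each with a maximum lod on two chromosomes.
simulations = [
    {1: 0.8, 2: 3.9},
    {1: 1.1, 2: 0.4},
    {1: 3.7, 2: 4.2},
    {1: 0.2, 2: 1.0},
]
hits = [count_hits(sim, target_lod=3.60) for sim in simulations]
print(hits)
print(ranked_p_value(hits, k=1), ranked_p_value(hits, k=2))
```

With a rank of 1 the p-value counts every simulation with at least one hit; with a rank of 2 only simulations with hits on two or more chromosomes contribute.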

Actual distribution of hits

Before it starts the main simulation, PSEUDO will produce a table showing the distribution of hits per simulation in your original z-score replicates.


ACTUAL DISTRIBUTION OF SIGNIFICANT CHROMOSOMES PER SIMULATION
=============================================================

Hypothesis  :  Hypothesis_0
Affections  :  affection [PAIRS], affection [ALL]
Target Lod  :  3.60
Target Rank :  1
Simulations :  20

      HITS      SIMS                PROB          CUMULATIVE
------------------------------------------------------------

         0        20                1.00                1.00
         1         0                0.00                0.00
         2         0                0.00                0.00
         3         0                0.00                0.00
         4         0                0.00                0.00
         5         0                0.00                0.00
         6         0                0.00                0.00
         7         0                0.00                0.00
         8         0                0.00                0.00
         9         0                0.00                0.00


P-value for 1 or more lod > 3.600: 0.0

For this section, lod score results are re-created for the 20 original gene-dropping simulations that we'll be using for z-score sampling. The distribution is given in the table: column one lists the possible number of hits, and column two gives the number of simulations that had exactly that many hits. For Hypothesis_0, we're testing the significance of a lod score of 3.60 for the outcome affection [PAIRS] or affection [ALL] at a rank of 1. Since this is our best lod from the actual data, it's not all that surprising that linkage analysis on the 20 datasets simulated under the null failed to produce any chromosomes with a lod greater than 3.60.

Simulated distribution of hits

Once the main simulation runs, you'll see another set of tables giving the simulated distribution of hits for each hypothesis. If you use the example data, you should see a table very similar to the one below:


SIMULATED DISTRIBUTION OF SIGNIFICANT CHROMOSOMES PER SIMULATION
================================================================

Hypothesis  :  Hypothesis_0
Affections  :  affection [PAIRS], affection [ALL]
Target Lod  :  3.60
Target Rank :  1
Simulations :  10000

      HITS      SIMS                PROB          CUMULATIVE
------------------------------------------------------------
      
         0      9929                0.99                1.00 
         1        71               0.007               0.007
         2         0                0.00                0.00
         3         0                0.00                0.00
         4         0                0.00                0.00
         5         0                0.00                0.00
         6         0                0.00                0.00
         7         0                0.00                0.00
         8         0                0.00                0.00
         9         0                0.00                0.00
         
         
P-value for 1 or more lod > 3.600: 0.0071
Estimated standard deviation for p-value:  0.054

In this case, we see that 71 out of 10000 simulations had exactly one chromosome with a lod greater than 3.60, and none had more than one. The replicate pool p-value for this hypothesis is therefore p_RP = 71/10000 = 0.0071.
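
The columns of the table can be reproduced from the SIMS counts alone. The sketch below assumes that PROB is the fraction of simulations with exactly that many hits and that CUMULATIVE is the fraction with at least that many hits; this reading is inferred from the numbers in the table and from the formula P_k = T_k / N, and `hit_table` is a hypothetical helper, not part of PSEUDO.

```python
def hit_table(sims_by_hits):
    """Build (hits, sims, prob, cumulative) rows from a hits -> count mapping."""
    total = sum(sims_by_hits.values())
    rows = []
    for hits in sorted(sims_by_hits):
        count = sims_by_hits[hits]
        # Simulations with at least this many hits (T_k in the formula).
        at_least = sum(c for h, c in sims_by_hits.items() if h >= hits)
        rows.append((hits, count, count / total, at_least / total))
    return rows


# Counts taken from the simulated table for Hypothesis_0.
rows = hit_table({0: 9929, 1: 71})
for hits, sims, prob, cumulative in rows:
    print(f"{hits:>10}{sims:>10}{prob:>20.4f}{cumulative:>20.4f}")
```

The CUMULATIVE entry in the row for one hit is the value reported as the p-value for "1 or more lod > 3.600".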

Below this, PSEUDO reports an estimated standard deviation for p_RP. In this case, the estimate (0.054) is quite large relative to p_RP. This is because calculation of Var(p_RP) involves estimating a variance weight for each family and chromosome. The weight estimator implemented in PSEUDO is designed to be conservative, which has the virtue of placing a reliably conservative confidence bound on p_RP. The tradeoff is that the overall variance estimate will converge somewhat slowly. If you find an interesting result in your data and the variance estimate is overly conservative, we recommend rerunning that analysis with a larger number of pseudo-replicates. This will allow the weight estimators to converge and makes it possible to place a narrower confidence bound on your p-value.
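
To see why a standard deviation of 0.054 is large here, one rough way to turn it into a confidence bound is a normal approximation. This is an assumption made purely for illustration: the text does not say how PSEUDO users should construct the bound, and `upper_bound` is a hypothetical helper.

```python
def upper_bound(p_value, std_dev, z=1.96):
    """Upper end of an approximate 95% interval, p + z * sd, clipped to [0, 1].

    The normal approximation and the choice z = 1.96 are assumptions of this
    sketch, not something PSEUDO reports.
    """
    return min(1.0, max(0.0, p_value + z * std_dev))


p_rp = 0.0071  # replicate pool p-value reported for Hypothesis_0
sd = 0.054     # estimated standard deviation reported by PSEUDO
print(round(upper_bound(p_rp, sd), 4))
```

Under this approximation the upper bound is roughly 0.11, more than ten times the point estimate, which is why a larger number of pseudo-replicates is worthwhile for results of interest.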

If you're interested in learning more about how PSEUDO estimates Var(p_RP), you might want to check out the section on variance estimation for the replicate pool method.

Joint p-value for all hypotheses

Below the last set of hypothesis results, PSEUDO will output two different joint p-values for all hypotheses. Depending on the set of hypotheses you've constructed, one or both of these values may be of interest.

The first p-value listed gives the probability that at least one hypothesis is true in any one simulation. In general this will give a genomewide p-value that corrects for correlation among the various outcomes. While this will be no smaller than the p-value of your least significant hypothesis, it will correctly account for multiple comparisons. If correlation between your outcomes is high, this will be much less conservative than a Bonferroni corrected p-value.

JOINT HYPOTHESES
================

Probability that (1 LOD  > 3.60 for affection [PAIRS], affection [ALL])
      OR (2 LODs > 2.68 for affection [PAIRS], affection [ALL])
      OR (1 LOD  > 3.12 for trait [QTL])
      OR (2 LODs > 1.81 for trait [QTL])
      =  0.063

The second p-value listed by PSEUDO gives the probability that all listed hypotheses are true in any one simulation. This corresponds to a test of significance of the overall lod score profile you've obtained. With this test, you may find evidence for a genetic effect in your dataset even though no individual peak yields conclusive evidence for linkage.


Probability that (1 LOD  > 3.60 for affection [PAIRS], affection [ALL])
      AND (2 LODs > 2.68 for affection [PAIRS], affection [ALL])
      AND (1 LOD  > 3.12 for trait [QTL])
      AND (2 LODs > 1.81 for trait [QTL])
      =  0.0
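
Both joint p-values can be thought of as simple counts over simulations. The sketch below assumes we already know, for each simulation, whether each hypothesis was met; the data and the function name `joint_p_values` are invented for illustration and do not come from PSEUDO.

```python
def joint_p_values(outcomes):
    """Return (p_any, p_all) from per-simulation hypothesis indicators.

    outcomes: list with one tuple of booleans per simulation, one boolean
              per hypothesis (True if that hypothesis was met).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one simulation")
    # OR: at least one hypothesis met in the simulation.
    p_any = sum(1 for sim in outcomes if any(sim)) / n
    # AND: every hypothesis met in the same simulation.
    p_all = sum(1 for sim in outcomes if all(sim)) / n
    return p_any, p_all


# Ten hypothetical simulations and two strongly correlated hypotheses.
outcomes = [(True, True), (True, False)] + [(False, False)] * 8
p_any, p_all = joint_p_values(outcomes)

# Individual p-values, and their sum as a Bonferroni-style upper bound.
p_each = [sum(1 for sim in outcomes if sim[i]) / len(outcomes) for i in range(2)]
print(p_any, p_all, p_each, sum(p_each))
```

In this toy example the OR p-value equals the larger of the two individual p-values and is smaller than their sum, because the simulation that meets both hypotheses is counted only once.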

The next step ...

Now that we've covered the absolute basics, you might want to go over the sections describing other types of summaries that PSEUDO will produce, including linkage peak summaries, family weight summaries, and graphical output. You might also find it useful to read the section on the replicate pool method or the section on variance estimation.

University of Michigan | School of Public Health | Abecasis Lab