PSEUDO Tutorial -- Input File Formats

PSEUDO -- Input file formats

PSEUDO requires three types of input in order to run a simulation: (1) a hypothesis file containing a list of tests to run, (2) a set of z-score files (generated by MERLIN) to be used for sampling, and (3) a list file telling PSEUDO where to find your z-score replicate files.

PSEUDO is designed to work with .zscore format files generated by MERLIN version 0.10.3 or higher. If you don't have MERLIN installed, or if you haven't updated it recently, you'll want to download an update from the MERLIN download site before proceeding.

Hypothesis File

The first thing you need to do before starting your simulation is to settle on a set of hypotheses to test based on your original results. Most of the time, you'll be interested in evaluating the significance of one or more lod score peaks that you've gotten from a genome scan.

As an illustration, we'll look at output produced by MERLIN when a genome scan is done on the sample data contained in pseudo_scan.ped, pseudo_scan.dat and pseudo_scan.map. If you've downloaded the example files for this tutorial, you should be able to reproduce the genome scan results by typing

 
     merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map --npl --pairs --qtl

If you take a close look, you'll see that the scan produced several promising lod score peaks. For affection, the --npl option gave a lod of 3.60 on chromosome 2 and a lod of 2.68 on chromosome 8. We also had some interesting results for trait, with lods of 3.12 on chromosome 3 and 1.81 on chromosome 8 using the --qtl scoring option.

If we'd like to assign a p-value to our best lod, we only need to determine the empirical probability of a genome scan containing 1 or more lods greater than 3.60 for affection. However, to properly evaluate significance for our second best peak for affection, we'll need to account for the presence of the first by only counting those simulations that produced 2 or more independent lods greater than 2.68. PSEUDO can test each of these hypotheses jointly if we give it the hypothesis file below:

        HYPOTHESIS_1   affection  [ALL]    3.60    1
        HYPOTHESIS_2   affection  [ALL]    2.68    2

Here, the first column specifies a label for each hypothesis. Columns two and three contain the outcome label (here, affection) and the score type (here, [ALL]); together they fully specify the outcome you're interested in testing. The number in the fourth column is the lod score to be tested, and the last column gives the rank for the lod.
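
The counting rule behind the lod and rank columns can be sketched with awk. This is only an illustration of the arithmetic, not a description of how PSEUDO is implemented. The input layout (one line per simulated genome scan, one peak lod per chromosome), the function name empirical_p and the lod values themselves are all made up for this example.

```shell
# Hypothetical illustration: each line is one simulated genome scan and
# each field is the largest lod on one chromosome (made-up values).
empirical_p() {
    # $1 = lod threshold, $2 = rank (minimum number of peaks above it)
    awk -v lod="$1" -v rank="$2" '
        {
            hits = 0
            for (i = 1; i <= NF; i++) if ($i + 0 > lod + 0) hits++
            if (hits >= rank) count++
        }
        END { if (NR > 0) printf "%.2f\n", count / NR }
    '
}

sims=$(printf '%s\n' \
    '3.71 0.40 1.10' \
    '2.90 2.75 0.20' \
    '1.00 0.50 0.30' \
    '0.80 2.60 3.65')

# Fraction of scans with 1 or more lods greater than 3.60
printf '%s\n' "$sims" | empirical_p 3.60 1
# Fraction of scans with 2 or more lods greater than 2.68
printf '%s\n' "$sims" | empirical_p 2.68 2
```

With these four made-up scans, the first call prints 0.50 and the second prints 0.25. Keeping one peak per chromosome is a simple way to treat the peaks as independent in this sketch.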

Using scan

If you're just getting started with your linkage data and don't yet have a set of specific hypotheses in mind, a quick way to begin with PSEUDO is to use a small companion utility called scan. Scan can save a few minutes by going through your MERLIN screen output, finding the maximum lod value for each outcome in your data set and generating a hypothesis table to be used by PSEUDO. To run scan, simply save your original MERLIN output to a file

    merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map --npl --pairs --qtl > merlin.output

and then tell scan to go through your output,

      	scan -s merlin.output -h your_hypotheses.hyp -l 2

The -l command line argument tells scan to pick out the 2 largest lods (on separate chromosomes) for each outcome and write a list of hypotheses to a file:


	Hypothesis_0  affection [ALL]    3.60    1
	Hypothesis_1  affection [ALL]    2.68    2
	Hypothesis_2  affection [Pairs]  3.58    1
	Hypothesis_3  affection [Pairs]  2.66    2
	Hypothesis_4  trait [QTL]        3.12    1
	Hypothesis_5  trait [QTL]        1.81    2
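
Before handing a hypothesis file to PSEUDO, it can be worth checking that every line ends with a numeric lod followed by an integer rank. The sketch below does this with awk. It only checks the layout described in this tutorial; it is not part of PSEUDO or scan, and the function name check_hyp is made up for illustration.

```shell
# Hypothetical helper: report lines whose last two fields are not
# a numeric lod followed by a positive integer rank.
check_hyp() {
    awk '
        NF == 0 { next }    # skip blank lines
        NF < 5 || $(NF-1) !~ /^[0-9]+(\.[0-9]+)?$/ || $NF !~ /^[1-9][0-9]*$/ {
            printf "line %d looks malformed: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}

# Try it on a small temporary copy of the table above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Hypothesis_0  affection [ALL]    3.60    1
Hypothesis_1  affection [ALL]    2.68    2
Hypothesis_4  trait [QTL]        3.12    1
EOF

check_hyp "$tmp" && echo "hypothesis file looks well formed"
rm -f "$tmp"
```

Because only the field count and the last two fields are inspected, lines that name more than one outcome pass the same check.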

A hypothesis with multiple outcomes

If you look at the results for affection [ALL] and affection [PAIRS], you may notice that we got a promising lod for both outcomes at the same position. While this might be the case for any two correlated outcomes, the similarity between affection [ALL] and affection [PAIRS] is expected, since we are doing two different tests of linkage on the same data. To correct for this correlation without overcorrecting, you may want to test these outcomes jointly. To do so, simply add a joint hypothesis to your file

        HYPOTHESIS_1   affection  [ALL], affection [PAIRS]    3.60    1
        HYPOTHESIS_2   affection  [ALL], affection [PAIRS]    2.68   2
        Hypothesis_3  trait [QTL]        3.12    1
        Hypothesis_4  trait [QTL]        1.81    2

Z-score files

In order to generate a pool of z-score replicates for sampling, you'll need to use MERLIN's --simulate option to generate gene-dropping replicates of your original datasets under the null hypothesis of no linkage and no association. You'll also need to tell MERLIN to analyse each of these replicates using the same Kong and Cox scoring options as your original analysis (e.g. --npl --pairs), and additionally to save family-specific z-scores (using the --zscores option).

        merlin -p example.ped -d example.dat -m example.map --npl --pairs --qtl --zscores --simulate

You may also want to save yourself a few keystrokes by using MERLIN's --reruns option to generate multiple replicates with a single command line. For example, the command

        merlin -p pseudo_scan.ped -d pseudo_scan.dat -m pseudo_scan.map  --npl --pairs --qtl --zscores --simulate --reruns:20

will generate twenty gene-dropping replicates using the data in pseudo_scan.ped, pseudo_scan.map and pseudo_scan.dat as a template, do a Kong and Cox test of linkage on each of these replicates using Spairs (--pairs) and Sall (--npl), and save the family-specific z-scores for each replicate. Once you run this command, you'll see 20 z-score files in your directory, each with the same (randomly generated) file prefix and the replicate number as a suffix

          % ls merlin*.zscore
       
          merlin-00123456-00001.zscore
          merlin-00123456-00002.zscore
          merlin-00123456-00003.zscore 
          merlin-00123456-00004.zscore 
          merlin-00123456-00005.zscore 
          merlin-00123456-00006.zscore 
          merlin-00123456-00007.zscore 
          merlin-00123456-00008.zscore 
          merlin-00123456-00009.zscore 
          merlin-00123456-00010.zscore 
          merlin-00123456-00011.zscore
          merlin-00123456-00012.zscore
          merlin-00123456-00013.zscore
          merlin-00123456-00014.zscore   
          merlin-00123456-00015.zscore
          merlin-00123456-00016.zscore
          merlin-00123456-00017.zscore
          merlin-00123456-00018.zscore
          merlin-00123456-00019.zscore
          merlin-00123456-00020.zscore

List file

After you've got a set of z-score replicates, you'll need to make a list file telling PSEUDO where to find them. The quickest way to do this on a UNIX system is to figure out the prefix used by MERLIN to tag your z-score files and then use the ls command to produce a list:

        ls merlin-00123456*.zscore > pseudo.list
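
A quick check that the list holds the number of replicates you expect, and that every entry exists on disk, can catch a mistyped prefix early. The sketch below is self-contained: it works in a scratch directory and uses empty placeholder files in place of real MERLIN z-score output, so the file count of 3 and the variable names are assumptions made for the demonstration.

```shell
# Self-contained demonstration in a scratch directory; the empty files
# below only stand in for real MERLIN z-score output.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 1 2 3; do
    touch "$(printf 'merlin-00123456-%05d.zscore' "$i")"
done

# Build the list file exactly as in the tutorial.
ls merlin-00123456*.zscore > pseudo.list

# Count the entries and flag any that are missing on disk.
count=$(wc -l < pseudo.list | tr -d ' ')
echo "replicates listed: $count"
missing=0
while IFS= read -r f; do
    [ -e "$f" ] || { echo "missing: $f"; missing=$((missing + 1)); }
done < pseudo.list
echo "missing files: $missing"
```

On your own data, only the last part is needed: run it in the directory that holds pseudo.list and compare the count against the value you gave to --reruns.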

Now that you've got your input files together, you're ready to run your first pseudosimulation. If you haven't already installed PSEUDO on your system, you'll want to get a copy from our download site before proceeding to the section on output produced by PSEUDO.

University of Michigan | School of Public Health | Abecasis Lab