LAMP Tutorial -- Linkage Analysis with MOD Scores

Linkage Analysis with MOD Scores

In this example, we will see how to use LAMP to carry out linkage analysis in a sample of sibpairs. LAMP can calculate parametric LOD scores but, in contrast to most other packages that carry out parametric linkage analysis, it does not rely on a fixed disease model. Instead, it uses the available data to estimate an allele frequency for the unobserved disease allele and a penetrance for each genotype. This approach produces a LOD score maximized over all possible disease models and is traditionally called a MOD score analysis.

Input Files

We will analyse an example dataset consisting of 200 affected sibpairs, genotyped at 20 microsatellites along a single chromosome. The genotype and phenotype data are described in two files, asp.dat and asp.ped (in the examples subdirectory of the LAMP distribution). You can check the contents of these files by opening them in a text editor or, more conveniently, using the PEDSTATS program.

When carrying out a linkage analysis, LAMP requires two additional input files. The first file describes a framework map, which specifies marker locations. The second file, a candidate map, lists the positions at which the disease model will be estimated and a LOD score reported.

The first few lines of the asp-frame.map file are reproduced below. The three columns correspond to chromosome number, marker name and marker position.

<Contents of asp-frame.map>
24  MRK1     0.000
24  MRK2     5.268
24  MRK3    10.536
24  MRK4    15.804
24  MRK5    21.072
< Additional lines not shown ... >
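
As a quick sanity check before running an analysis, a map file in this three-column layout can be read with a few lines of Python. This is only a sketch: the `MapEntry` type and `parse_map` function are hypothetical helpers, not part of LAMP, and the sample text is just the five rows shown above.

```python
from typing import List, NamedTuple


class MapEntry(NamedTuple):
    chromosome: str
    label: str
    position: float


def parse_map(text: str) -> List[MapEntry]:
    """Parse whitespace-delimited 'chromosome label position' rows."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not three columns
        chromosome, label, position = fields
        entries.append(MapEntry(chromosome, label, float(position)))
    return entries


# The first five rows of asp-frame.map, as reproduced in this tutorial.
SAMPLE = """\
24  MRK1     0.000
24  MRK2     5.268
24  MRK3    10.536
24  MRK4    15.804
24  MRK5    21.072
"""

markers = parse_map(SAMPLE)
# Distance between consecutive markers, rounded to the precision of the file.
spacings = [round(b.position - a.position, 3) for a, b in zip(markers, markers[1:])]
print(spacings)
```

For the rows shown, every pair of neighbouring markers is the same distance apart.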

We plan to carry out analysis along a grid of equally spaced locations 5 cM apart, and thus the asp-candidate.map file is quite similar. In each row, a chromosome label is followed by a label for the analysis position and the actual analysis position in centimorgans. For simplicity, it is a good choice to make the label and the analysis position identical.

<Contents of asp-candidate.map>
24   2.5   2.5
24   7.5   7.5
24  12.5  12.5
24  17.5  17.5
24  22.5  22.5
<Additional lines not shown ...>
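
A candidate map on a regular grid does not need to be typed by hand. The sketch below generates rows in the layout shown above; `candidate_rows` is a hypothetical helper, and the start, stop and step values are chosen only to reproduce the five rows listed here, so the stop value would need to be adjusted to cover the whole framework map.

```python
def candidate_rows(chromosome, start_cm, stop_cm, step_cm):
    """Return 'chromosome label position' rows on an equally spaced grid."""
    rows = []
    count = 0
    while start_cm + count * step_cm <= stop_cm:
        position = start_cm + count * step_cm
        # The label and the analysis position are deliberately identical.
        rows.append(f"{chromosome}{position:6.1f}{position:6.1f}")
        count += 1
    return rows


rows = candidate_rows("24", 2.5, 22.5, 5.0)
print("\n".join(rows))
```

Writing the joined rows to a file gives a candidate map in the same whitespace-delimited layout as the listing above.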

Running the Analysis

Since all the files are ready, we can proceed to run the analysis. Although LAMP will estimate disease allele frequencies and penetrances, we do need to provide an estimate of the prevalence of the trait at hand (through the --prevalence command line option, given below in the shortened form --prev). In this case, the trait was simulated with a prevalence of 0.05, and our final command line will look like this:

  lamp -d asp.dat -p asp.ped -f asp-frame.map -c asp-candidate.map --prev 0.05

The first four options specify input file names, starting with the data file (-d), followed by the pedigree file (-p), the framework map file (-f) and the candidate positions file (-c). If you execute the above command, you should see LAMP output scroll by. The most interesting part is usually the table of estimated LOD scores at the end, so we will start there.

Estimated LOD scores

An abbreviated version of the table of LOD scores produced by LAMP is reproduced below. You will see that the strongest evidence for linkage was identified close to the middle of the chromosome (at position 52.5) and corresponds to a LOD score of 2.36 (with 2 degrees of freedom).

                                TEST FOR
                                LINKAGE
                            ----------------
LOCATION      TRAIT            LOD df pvalue
============================================

... additional output lines removed here ...

37.5          affection       0.62  2    0.2
42.5          affection       1.49  2   0.03
47.5          affection       1.72  2   0.02
52.5          affection       2.36  2  0.004 <-- Linkage peak here
57.5          affection       1.13  2   0.07
62.5          affection       0.51  2    0.3

... additional output lines removed here ...
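
The p-values in the table appear to be consistent with converting each LOD score to a likelihood ratio statistic, 2 ln(10) x LOD, and referring it to a chi-squared distribution with 2 degrees of freedom. This is an inference from the numbers shown above, not a documented description of how LAMP computes them. The sketch below, which uses hypothetical helper names, repeats that calculation for the six tabulated rows.

```python
import math


def lod_to_chi2(lod):
    """Convert a LOD score (log10 likelihood ratio) to a chi-squared statistic."""
    return 2.0 * math.log(10.0) * lod


def chi2_sf_2df(x):
    """Upper tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def lod_to_pvalue(lod):
    return chi2_sf_2df(lod_to_chi2(lod))


# (location, LOD) pairs copied from the abbreviated table above.
rows = [(37.5, 0.62), (42.5, 1.49), (47.5, 1.72),
        (52.5, 2.36), (57.5, 1.13), (62.5, 0.51)]

for location, lod in rows:
    print(f"{location:5.1f}  {lod:5.2f}  {lod_to_pvalue(lod):.1g}")

# Location with the largest LOD score among the rows shown.
peak = max(rows, key=lambda row: row[1])
print("peak:", peak)
```

With 2 degrees of freedom the tail probability simplifies to 10 raised to the power of minus the LOD score, which is why no statistics library is needed here.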

Since LAMP searched for the best fitting disease model at each position, you will notice that all the tabulated LODs are greater than or equal to zero. This is in contrast to parametric analyses using a fixed disease model. In general, carrying out an analysis that searches for the best fitting disease model will be desirable when the dataset is large and the disease model is unknown, since model misspecification can reduce power.

Estimated Model Parameters

Although the maximum LOD score and its location are usually the most interesting features of a linkage analysis, LAMP produces other output that is worth considering. For example, it is often worth inspecting the file lamp-linkage.out, which includes details of the estimated linkage model.

If you browse through the lamp-linkage.out file, you will see that it starts with a warning that, for this affected sib-pair sample, the disease allele frequency was fixed. Affected sibpair samples only allow 2 parameters to be estimated in a linkage test (the two parameters correspond, ultimately, to the probabilities of sharing 0, 1 and 2 alleles IBD, which must sum to one). Since we allowed both penetrances to vary, LAMP decided to fix the disease allele frequency. As a result, you will notice the estimated disease allele frequency is the same at all locations. If you'd like an estimate of the disease allele frequency, you must constrain the disease model using the --additive, --dominant, --recessive, or --multiplicative option. In a more informative sample, including different types of pedigrees, LAMP would try to estimate both the disease allele frequency and penetrances.
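
To see why an affected sibpair sample carries only two free parameters, it can help to compute the IBD sharing probabilities implied by a single-locus disease model. The sketch below is a textbook calculation under stated assumptions (a biallelic disease locus, Hardy-Weinberg proportions, random mating, no recombination between the locus and the analysis position); it is not LAMP's code. The function names and the example allele frequency and penetrances are hypothetical.

```python
from itertools import product


def prevalence(q, pen):
    """Population prevalence; pen[i] is the penetrance with i risk alleles."""
    freq = (1.0 - q, q)
    return sum(freq[a] * freq[b] * pen[a + b]
               for a, b in product((0, 1), repeat=2))


def asp_ibd_sharing(q, pen):
    """P(sharing 0, 1, 2 alleles IBD | both sibs affected) at the disease locus."""
    freq = (1.0 - q, q)
    k = prevalence(q, pen)
    # IBD 0: the two genotypes are independent draws from the population.
    both0 = k * k
    # IBD 1: one shared allele (a) plus one independent allele per sib (b, c).
    both1 = sum(freq[a] * freq[b] * freq[c] * pen[a + b] * pen[a + c]
                for a, b, c in product((0, 1), repeat=3))
    # IBD 2: the sibs carry identical genotypes.
    both2 = sum(freq[a] * freq[b] * pen[a + b] ** 2
                for a, b in product((0, 1), repeat=2))
    prior = (0.25, 0.5, 0.25)
    joint = [p * like for p, like in zip(prior, (both0, both1, both2))]
    total = sum(joint)
    return tuple(j / total for j in joint)


# Hypothetical model: risk allele frequency 0.2, penetrances 2%, 8% and 20%.
model_pen = (0.02, 0.08, 0.20)
print(prevalence(0.2, model_pen))
print(asp_ibd_sharing(0.2, model_pen))
```

Whatever allele frequency and penetrances are supplied, the result is just three probabilities that sum to one, so the sibpair data can pin down at most two of the model's parameters.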

Later in the file, you will see that, at the location of the peak LOD score, LAMP estimated that individuals homozygous for the risk allele had a probability of being affected of about 20%. The disease allele frequency was fixed at 0.223.

Additional Options

We just completed a simple linkage analysis using LAMP. In this case, we specified only the disease prevalence (with the --prevalence option) and the input files (with the -d, -p, -f, -c options). You will use these 5 options in nearly all analyses, but other options can be useful in a linkage analysis. All available options and their current settings are summarized at the top of the LAMP output:

lamp -- Linkage and Association Models for Pedigree Data
        version 0.0.0 (c) 2005-2006 Goncalo Abecasis and Mingyao Li

The following parameters are in effect:
                 Pedigree File :         asp.ped (-pname)
                     Data File :         asp.dat (-dname)
                 Framework Map :   asp-frame.map (-fname)
                Candidates Map : asp-candidate.map (-cname)

Additional Options
   Disease Model : --additive, --dominant, --recessive, --multiplicative,
                   --free [ON], --prevalence [0.05]
     Constraints : --unrelateds-and-trios, --sibpairs, --none, --auto [ON]
    Optimization : --fletcher-reeves, --nelder-mead [ON], --stochastic,
                   --rounds [3], --precision [1.0e-08]
     Performance : --maxbits [16], --buffers [ON], --skipCausality
          Output : --nodetails

For a linkage analysis, the most common options will be to constrain the disease model (with the options --additive, --dominant, --recessive, or --multiplicative). If you want to tweak the approach used by LAMP to maximize LOD scores, you can change the optimization options, by specifying how many different starting points should be evaluated (--rounds option), the accuracy of the estimated likelihood (--precision option) or the numeric optimizer to be used (--fletcher-reeves, --nelder-mead or --stochastic options). We have found the default precision of 10^-8 combined with 3 rounds of the Nelder-Mead optimizer to work well in most settings.
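
If you want to compare the unconstrained analysis with each constrained disease model, the command lines can be assembled programmatically. The sketch below only builds and prints them. It spells the flags as they appear in the option summary above and assumes they can simply be appended to the command used earlier in this tutorial. Because details are written to lamp-linkage.out, it is probably safest to run each command in its own working directory; that precaution is an assumption, not documented behaviour.

```python
import shlex

# Input files and prevalence from the tutorial command line.
BASE = ["lamp", "-d", "asp.dat", "-p", "asp.ped",
        "-f", "asp-frame.map", "-c", "asp-candidate.map",
        "--prevalence", "0.05"]

# Disease model flags as listed in the LAMP option summary.
MODEL_FLAGS = ["--free", "--additive", "--dominant",
               "--recessive", "--multiplicative"]


def build_commands(base, flags):
    """Return one argument list per disease model flag."""
    return [base + [flag] for flag in flags]


commands = build_commands(BASE, MODEL_FLAGS)
for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` once you have confirmed where each run writes its output.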

The --maxbits option specifies the maximum complexity of a pedigree that LAMP will attempt to analyze. The --buffers option tells LAMP to cache frequently used calculations in memory and should not be disabled unless you are running out of memory (which should only happen in samples including larger pedigrees). Finally, the --nodetails option suppresses output of parameter estimates (in the lamp-linkage.out file) and is useful to save disk space.

Hopefully this tutorial gave you a flavor of how to use LAMP for linkage analyses. You can also read about combined linkage and association analysis, parametric association analysis or return to the main tutorial menu.

University of Michigan | School of Public Health | Abecasis Lab