Linkage Analysis with MOD Scores
In this example, we will see how to use LAMP to carry out linkage
analysis in a sample of sibpairs. LAMP can calculate parametric LOD
scores but, in contrast to most other packages that execute parametric
linkage analysis, it does not rely on a fixed disease model. Instead, it
uses the available data to estimate an allele frequency for the
unobserved disease allele and a penetrance for each genotype. This
approach produces a LOD score maximized over all possible disease
models and is traditionally called a MOD score analysis.
Input Files
We will analyse an exemplar dataset consisting of 200 affected
sibpairs, genotyped at 20 microsatellites along a single chromosome.
The genotype and phenotype data are described in two files, asp.dat and
asp.ped (in the examples subdirectory of the
LAMP distribution). You can check the contents of these
files by opening them in a text editor or, more conveniently, using
the PEDSTATS program.
When carrying out a linkage analysis, LAMP requires two additional
input files. The first file describes a framework map, which
specifies marker locations. The second file, a candidate map,
lists positions at which the disease model will be estimated and a
LOD score reported.
The first few lines of the asp-frame.map file are reproduced below.
The three columns correspond to chromosome number, marker name and marker
position.
<Contents of asp-frame.map>
24 MRK1 0.000
24 MRK2 5.268
24 MRK3 10.536
24 MRK4 15.804
24 MRK5 21.072
< Additional lines not shown ... >
We plan to carry out analysis along a 5 cM grid of equally spaced locations,
and thus the asp-candidate.map is quite similar. In each row, a chromosome
label is followed by a label for the analysis position and the actual analysis
position in centimorgans. For simplicity, it is convenient to make the label
identical to the analysis position.
<Contents of asp-candidate.map>
24 2.5 2.5
24 7.5 7.5
24 12.5 12.5
24 17.5 17.5
24 22.5 22.5
<Additional lines not shown ...>
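A candidate map on a regular grid is easy to generate with a short script. The following is a minimal Python sketch, not part of LAMP itself. The function name candidate_rows is hypothetical, and the end of the map (100 cM) is an assumption, since only the first few framework markers are shown above.

```python
def candidate_rows(chrom, first, last, step):
    """Return candidate-map rows: chromosome, label, position (cM).

    The label is set to the position itself, as in asp-candidate.map.
    """
    rows = []
    n = 0
    # Compute each position from its index to avoid accumulating float error.
    while first + n * step <= last:
        pos = first + n * step
        rows.append(f"{chrom} {pos:.1f} {pos:.1f}")
        n += 1
    return rows


# Assumed map end of 100 cM; adjust to the last marker of your framework map.
rows = candidate_rows(chrom=24, first=2.5, last=100.0, step=5.0)
print("\n".join(rows[:5]))
```

The first five rows printed match the excerpt of asp-candidate.map shown above. Writing the full list to a file would give a candidate map covering the assumed map length.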
Running the Analysis
Since all the files are ready, we can proceed to run the analysis. Although
LAMP will estimate disease allele frequencies and penetrances, we do need to
provide an estimate of the prevalence of the trait at hand (through the
--prevalence command line option). In this case, the trait was simulated
with a prevalence of 0.05, and our final command line will look like
this:
lamp -d asp.dat -p asp.ped -f asp-frame.map -c asp-candidate.map --prev 0.05
The first four options specify input file names, starting with the datafile (-d),
and followed by the pedigree file (-p), framework map file (-f) and candidate
positions file (-c). If you execute the above command, you should see LAMP output
scroll past. The most interesting part is usually the table of estimated LOD scores
at the end, so we will start there.
Estimated LOD scores
An abbreviated version of the table of LOD scores produced by LAMP is reproduced below.
You will see that the strongest evidence for linkage was identified close
to the middle of the chromosome (at position 52.5) and corresponds to a LOD score of 2.36
(with 2 degrees of freedom).
                           TEST FOR
                           LINKAGE
                       ----------------
LOCATION  TRAIT        LOD   df  pvalue
============================================
... additional output lines removed here ...
    37.5  affection   0.62    2     0.2
    42.5  affection   1.49    2    0.03
    47.5  affection   1.72    2    0.02
    52.5  affection   2.36    2   0.004    <-- Linkage peak here
    57.5  affection   1.13    2    0.07
    62.5  affection   0.51    2     0.3
... additional output lines removed here ...
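If you save the table to a file, the peak can also be located with a few lines of code. The sketch below is not a LAMP feature. It assumes the result rows are whitespace-separated with the five columns shown in the excerpt, and parse_row is a hypothetical helper name.

```python
# Result rows copied from the excerpt above (annotation text removed).
table = """\
37.5 affection 0.62 2 0.2
42.5 affection 1.49 2 0.03
47.5 affection 1.72 2 0.02
52.5 affection 2.36 2 0.004
57.5 affection 1.13 2 0.07
62.5 affection 0.51 2 0.3
"""


def parse_row(line):
    """Split one result row into named, typed fields."""
    location, trait, lod, df, pvalue = line.split()
    return {
        "location": float(location),
        "trait": trait,
        "lod": float(lod),
        "df": int(df),
        "pvalue": float(pvalue),
    }


results = [parse_row(line) for line in table.splitlines() if line.strip()]
# The linkage peak is the row with the largest LOD score.
peak = max(results, key=lambda row: row["lod"])
print(f"peak LOD {peak['lod']} at {peak['location']} cM")
```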
Since LAMP searched for the best fitting disease model at each position, you
will notice that all the tabulated LODs are greater than or equal to zero.
This is in contrast to parametric analyses using a fixed disease model. In
general, carrying out an analysis that searches for the best fitting disease
model will be desirable when the dataset is large and the disease model
is unknown, since model misspecification can reduce power.
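The p-values in the table are consistent with converting each LOD score to a likelihood-ratio chi-square statistic, 2 ln(10) x LOD, and referring it to a chi-square distribution with 2 degrees of freedom. For 2 degrees of freedom the upper tail probability is exp(-x/2), so the p-value reduces to 10^-LOD. The Python sketch below illustrates this relationship. That LAMP computes its p-values in exactly this way is an assumption based on the tabulated numbers, not something stated in this tutorial, and the function name is hypothetical.

```python
import math


def lod_to_pvalue_df2(lod):
    """Approximate p-value for a LOD score with 2 degrees of freedom.

    Assumes the statistic 2*ln(10)*LOD follows a chi-square distribution
    with 2 df, whose upper tail probability is exp(-x / 2).
    """
    chi2 = 2.0 * math.log(10.0) * lod
    return math.exp(-chi2 / 2.0)


# LOD scores taken from the table excerpt above.
for lod in (0.62, 1.49, 1.72, 2.36, 1.13, 0.51):
    print(f"LOD {lod:.2f} -> p = {lod_to_pvalue_df2(lod):.1g}")
```

Rounded to one significant digit, these values agree with the pvalue column of the excerpt.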
Estimated Model Parameters
Although the maximum LOD score and its location are usually the most interesting
features of a linkage analysis, LAMP produces other output that is worth considering.
For example, it is often worth inspecting the file lamp-linkage.out which
includes details of the estimated linkage model.
If you browse through the lamp-linkage.out file, you will see that it starts with a warning that,
for this affected sib-pair sample, the disease allele frequency was fixed. Affected
sibpair samples only allow 2 parameters to be estimated in a linkage test (the two
parameters correspond, ultimately, to the probabilities of sharing 0, 1 and 2 alleles
IBD, which sum to one and so leave two free quantities). Since we allowed both penetrances
to vary, LAMP decided to fix the disease allele frequency. As a result, you will notice the
estimated disease allele frequency is the same at all locations. If you'd like an estimate
of the disease allele frequency, you must constrain the disease model using the --additive,
--dominant, --recessive, or --multiplicative option. In a more informative sample, including
different types of pedigrees, LAMP would try to estimate both the disease allele frequency
and penetrances.
Later in the file, you will see that, at the location of the peak LOD score, LAMP estimated
that individuals homozygous for the risk allele had a probability of being affected of about
20%. The disease allele frequency was fixed at 0.223.
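To see how the prevalence ties the model parameters together, consider the standard Hardy-Weinberg expression for prevalence, K = (1-p)^2 f0 + 2p(1-p) f1 + p^2 f2, where p is the disease allele frequency and f0, f1, f2 are the penetrances for 0, 1 and 2 copies of the risk allele. The Python sketch below uses the values reported above (p = 0.223, f2 of about 0.20, K = 0.05) together with a purely hypothetical heterozygote penetrance to solve for the remaining penetrance. Whether LAMP parameterizes its model in exactly this way is an assumption. The function name and the f1 value are illustrative only.

```python
def baseline_penetrance(prevalence, p, f1, f2):
    """Solve K = (1-p)^2*f0 + 2p(1-p)*f1 + p^2*f2 for f0.

    Assumes Hardy-Weinberg genotype frequencies for a biallelic locus.
    """
    q = 1.0 - p
    return (prevalence - 2.0 * p * q * f1 - p * p * f2) / (q * q)


K = 0.05    # prevalence supplied on the command line
p = 0.223   # fixed disease allele frequency reported by LAMP
f2 = 0.20   # approximate penetrance for risk-allele homozygotes
f1 = 0.08   # HYPOTHETICAL heterozygote penetrance, for illustration only

f0 = baseline_penetrance(K, p, f1, f2)
# Check that the three penetrances reproduce the stated prevalence.
check = (1 - p) ** 2 * f0 + 2 * p * (1 - p) * f1 + p ** 2 * f2
print(f"f0 = {f0:.4f}, implied prevalence = {check:.4f}")
```

With the prevalence and allele frequency held fixed, choosing two penetrances determines the third, which is one way to read the statement that two parameters were free to vary in this sample.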
Additional Options
We just completed a simple linkage analysis using LAMP. In this case, we specified only the
disease prevalence (with the --prevalence option) and the input files (with the -d,
-p, -f, -c options). You will use these 5 options in nearly all analyses,
but other options can be useful in a linkage analysis. All available options and their
current settings are summarized at the top of the LAMP output:
lamp -- Linkage and Association Models for Pedigree Data
version 0.0.0 (c) 2005-2006 Goncalo Abecasis and Mingyao Li
The following parameters are in effect:
Pedigree File : asp.ped (-pname)
Data File : asp.dat (-dname)
Framework Map : asp-frame.map (-fname)
Candidates Map : asp-candidate.map (-cname)
Additional Options
Disease Model : --additive, --dominant, --recessive, --multiplicative,
--free [ON], --prevalence [0.05]
Constraints : --unrelateds-and-trios, --sibpairs, --none, --auto [ON]
Optimization : --fletcher-reeves, --nelder-mead [ON], --stochastic,
--rounds [3], --precision [1.0e-08]
Performance : --maxbits [16], --buffers [ON], --skipCausality
Output : --nodetails
For a linkage analysis, the most common options will be to constrain the disease
model (with the options --additive, --dominant, --recessive,
or --multiplicative). If you want to tweak the approach used by LAMP to
maximize LOD scores, you can change the optimization options, by specifying how
many different starting points should be evaluated (--rounds option), the
accuracy of the estimated likelihood (--precision option) or the
numeric optimizer to be used (--fletcher-reeves, --nelder-mead or
--stochastic options). We have found the default precision of 10^-8
combined with 3 rounds of the Nelder-Mead optimizer to work well in most settings.
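As an illustration, the earlier command line could be extended with a model constraint and a different number of optimization rounds. The Python sketch below only assembles and prints such a command; it does not run LAMP. The option names come from the listing above, but passing the value as "--rounds 5" is an assumption made by analogy with "--prev 0.05".

```python
import shlex

# Command line used earlier in this tutorial.
base = [
    "lamp",
    "-d", "asp.dat",
    "-p", "asp.ped",
    "-f", "asp-frame.map",
    "-c", "asp-candidate.map",
    "--prev", "0.05",
]

# Extra options: an additive model and 5 optimizer starting points.
# The "--rounds 5" argument syntax is assumed, not confirmed by the tutorial.
extra = ["--additive", "--rounds", "5"]

command = base + extra
print(shlex.join(command))
```

Keeping the arguments in a list makes it easy to swap the model constraint, for example to compare results under different disease models.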
The --maxbits option specifies the maximum complexity of a pedigree
that LAMP will attempt to analyze. The --buffers option tells LAMP to
cache frequently used calculations in memory and should not be disabled unless
you are running out of memory (which should only happen in samples including
larger pedigrees). Finally, the --nodetails option suppresses output
of parameter estimates (in the lamp-linkage.out file) and is useful to
save disk space.
Hopefully this tutorial gave you a flavor of how to use LAMP for linkage analyses.
You can also read about combined linkage and association
analysis, parametric association analysis or
return to the main tutorial menu.