Combined Linkage and Association Analysis
In this example, we will compare the evidence for linkage and association
in a single dataset. Further, we will test whether a specific SNP variant can
account for an observed linkage signal. Recall that linkage measures the
effect of a broad chromosomal region (which may include thousands of distinct
alleles) whereas association focuses on a single allele (or sometimes, a very
specific subset of alleles). Thus, testing whether a SNP can explain a linkage signal
is equivalent to testing for the presence of additional alleles of large
effect in the region covered by a linkage signal.
Before attempting this tutorial, it is recommended that you first
complete the tutorials describing a simple
association analysis and a parametric
linkage analysis using MOD scores.
Input Files
We will examine evidence for linkage and association of age-related macular degeneration (AMD) with the CFH gene on chromosome 1q. Strong association between a polymorphism in the gene and age-related macular degeneration was originally
reported in Science by three different groups
[Pubmed]. We will use a subset of the data previously published by Abecasis et al. (2004) and Zareparsi et al. (2005). This subset of the data includes only genotypes for three markers and has been edited to remove all identifying information, so as to preserve anonymity of all study participants.
The data set is described in four files, named Y402H.dat, Y402H.ped,
Y402H-frame.map and Y402H.map (in the examples subdirectory of the LAMP
distribution). You can check the contents of these files by opening them in a text editor or, more conveniently, using
the PEDSTATS program. The data set includes ~800 unrelated individuals as well as ~100 small families,
each with multiple affected individuals.
Running the Analysis
Since all the files are ready, we can proceed to run the analysis. Although
LAMP will estimate disease allele frequencies and penetrances, we do need to
provide an estimate of the prevalence of the trait at hand (through the
--prevalence command line option, written as --prev in the command below). In this case, the individuals are elderly
and the disease is expected to be very common, so we set the prevalence to 20%.
lamp -d Y402H.dat -p Y402H.ped -c Y402H.map -f Y402H-frame.map --prev 0.20
The first four options specify input file names, starting with the datafile (-d),
and followed by the pedigree file (-p), candidate positions file (-c) and the framework file (-f). In this case, the framework file lists two nearby microsatellite markers (which are unlikely to be associated with disease, but can
help evaluate evidence for linkage) and the candidate file lists only the Y402H SNP,
which has been previously associated with macular degeneration in multiple studies.
After you execute the command above, LAMP should take a few minutes to run. Take a moment to see how the first few lines of LAMP output summarize current parameter settings
and available alternatives and some diagnostics about the dataset. In this case, you will
see that a few families (which included no genotyped individuals) were excluded from analysis, and that a single large family was excluded from analysis to preserve computing resources. You can repeat the analysis including this family later by adding the --maxbits 20 option to the command line. While you wait for analysis to complete, you could grab a coffee or peruse the original manuscripts [Abecasis et al. (2004) and Zareparsi et al. (2005)].
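If you plan to rerun the analysis with different settings, it can help to assemble the command line in a small script. The following is a minimal sketch in Python, not part of LAMP itself: the function names build_lamp_command and missing_inputs are hypothetical, and the only options used are the ones shown in this tutorial (-d, -p, -c, -f, --prev and --maxbits).

```python
from pathlib import Path


def build_lamp_command(prevalence=0.20, maxbits=None, stem="Y402H"):
    """Assemble the LAMP command line from this tutorial as an argument list."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    cmd = [
        "lamp",
        "-d", f"{stem}.dat",        # data file
        "-p", f"{stem}.ped",        # pedigree file
        "-c", f"{stem}.map",        # candidate positions file
        "-f", f"{stem}-frame.map",  # framework file
        "--prev", f"{prevalence:.2f}",
    ]
    if maxbits is not None:
        # Optional setting mentioned in the tutorial for including the large family.
        cmd += ["--maxbits", str(maxbits)]
    return cmd


def missing_inputs(cmd, directory="."):
    """Return the input files named in cmd that are absent from directory."""
    flags = ("-d", "-p", "-c", "-f")
    files = [cmd[i + 1] for i, arg in enumerate(cmd) if arg in flags]
    return [name for name in files if not (Path(directory) / name).is_file()]


if __name__ == "__main__":
    print(" ".join(build_lamp_command()))
    print(" ".join(build_lamp_command(maxbits=20)))
```

The argument list can be passed to subprocess.run once missing_inputs reports an empty list for the examples subdirectory.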
Estimated Test Statistics
Once LAMP completes running, you will see the evidence for linkage and association
summarized in three LOD scores. Your results should match those transcribed below:
< ... snippet of LAMP results begins here ... >
                            TEST FOR          TEST FOR       TEST FOR OTHER
                            LINKAGE         ASSOCIATION     LINKED VARIANTS
                        ----------------  ----------------  ----------------
LOCATION         TRAIT    LOD df  pvalue    LOD df  pvalue    LOD df  pvalue
============================================================================
Y402H              AMD   1.46  3    0.08  22.06  1   7e-24   1.41  2    0.04
Peak association was LOD = 22.057 (1 df) at Y402H for AMD
Peak linkage was LOD = 1.460 (3 df) at Y402H for AMD
< ... snippet ends here ... >
In this case, you will see that the linkage test finds only very modest evidence for
linkage in the region, with a LOD score of 1.46 (3 degrees of freedom, since the
disease allele frequency and two penetrances were estimated; p = 0.08). As previously reported, there
is extremely strong evidence for association (LOD = 22.06 with 1 degree of freedom, p = 7e-24).
Interestingly, it appears that some evidence for linkage remains
even after accounting for the effects of the Y402H variant (LOD = 1.41, 2 df, p = 0.04). This
last result suggests there may be other independently associated alleles in the region or,
perhaps, that Y402H is only indirectly associated with AMD.
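The p-values in the table are consistent with treating 2 ln(10) x LOD as a chi-square statistic with the listed degrees of freedom. The sketch below checks that relationship using only the Python standard library. It is an assumption about how the reported numbers relate to each other, not a description of LAMP's internal code, and the function names chi2_sf and lod_to_pvalue are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the 1 df tail, erfc(sqrt(x/2)), then add one
    # term per extra pair of degrees of freedom.
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for k in range((df - 1) // 2):
        total += term
        term *= half / (k + 1.5)
    return total


def lod_to_pvalue(lod, df):
    """Convert a LOD score to a p-value, assuming 2*ln(10)*LOD ~ chi-square(df)."""
    return chi2_sf(2.0 * math.log(10.0) * lod, df)


if __name__ == "__main__":
    # LOD scores and degrees of freedom copied from the results table above.
    for label, lod, df in [("linkage", 1.46, 3),
                           ("association", 22.06, 1),
                           ("other linked variants", 1.41, 2)]:
        print(f"{label}: LOD = {lod}, df = {df}, p = {lod_to_pvalue(lod, df):.2g}")
```

Under this assumption the three LOD scores give p-values that round to the values LAMP reports, which is a convenient sanity check when comparing results across runs.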
As with all other LAMP analyses, you can find details of the estimated parameters in
the companion output files.
Learning More
I hope you found this tutorial helpful. If you haven't already done so, you might want to read the sections on parametric linkage analysis with MOD scores or simple association analyses. Alternatively, you can
return to the main tutorial menu.