University of Michigan Center for Statistical Genetics

Combined Linkage and Association Analysis

In this example, we will compare the evidence for linkage and association in a single dataset. Further, we will test whether a specific SNP variant can account for an observed linkage signal. Recall that linkage measures the effect of a broad chromosomal region (which may include thousands of distinct alleles), whereas association focuses on a single allele (or sometimes, a very specific subset of alleles). Thus, testing whether a SNP can fully explain a linkage signal amounts to testing for the presence of additional alleles of large effect in the region covered by that linkage signal.

Before attempting this tutorial, it is recommended that you first complete the tutorials describing a simple association analysis and a parametric linkage analysis using MOD scores.

Input Files

We will examine evidence for linkage and association of age-related macular degeneration to the CFH gene on chromosome 1q. Strong association between a polymorphism in the gene and age-related macular degeneration was originally reported in Science by three different groups [Pubmed]. We will use a subset of the data previously published by Abecasis et al. (2004) and Zareparsi et al. (2005). This subset includes genotypes for only three markers and has been edited to remove all identifying information, so as to preserve the anonymity of all study participants.

The data set is described in four files, named Y402H.dat, Y402H.ped, Y402H-frame.map and Y402H.map (in the examples subdirectory of the LAMP distribution). You can check the contents of these files by opening them in a text editor or, more conveniently, using the PEDSTATS program. The data set includes ~800 unrelated individuals as well as ~100 small families, each with multiple affected individuals.
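If you want a quick scripted check of the family structure before running LAMP, the Python sketch below tallies one-person and multi-member families in a pedigree file. It assumes a Merlin-style layout (each line begins with family ID, individual ID, father ID, mother ID and sex, followed by the columns described in the .dat file) and that each unrelated individual is coded as its own one-person family. The helper name summarize_pedigree and the example rows are hypothetical; PEDSTATS remains the more complete tool.

```python
from collections import defaultdict


def summarize_pedigree(lines):
    """Tally families in a Merlin-style pedigree file (hypothetical helper).

    Assumes each non-blank line starts with: family ID, individual ID,
    father ID, mother ID, sex. Remaining columns are ignored here.
    Returns (number of one-person families, number of multi-member families).
    """
    members = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        members[fields[0]] += 1  # count individuals per family ID
    singletons = sum(1 for n in members.values() if n == 1)
    families = sum(1 for n in members.values() if n > 1)
    return singletons, families


if __name__ == "__main__":
    # Hypothetical rows: two unrelated individuals and one trio (F3).
    example = [
        "U1 1 0 0 2 2",
        "U2 1 0 0 1 1",
        "F3 1 0 0 1 1",
        "F3 2 0 0 2 2",
        "F3 3 1 2 2 2",
    ]
    print(summarize_pedigree(example))  # (2, 1)
```

To summarize the real file, pass an open file handle, e.g. `summarize_pedigree(open("Y402H.ped"))`; with this data set you would expect roughly 800 singletons and roughly 100 families, subject to the coding assumption above.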

Running the Analysis

Since all the files are ready, we can proceed to run the analysis. Although LAMP will estimate disease allele frequencies and penetrances, we do need to provide an estimate of the prevalence of the trait at hand (through the --prevalence command line option). In this case, the individuals are elderly and the disease is expected to be very common, so we set the prevalence to 20%.
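To see why a fixed prevalence matters, consider a standard biallelic disease model (an assumption here, not a description of LAMP's internals) with disease allele frequency p and penetrances f0, f1 and f2 for carriers of zero, one and two disease alleles. Under Hardy-Weinberg equilibrium the prevalence is K = (1-p)^2 f0 + 2p(1-p) f1 + p^2 f2, so fixing K removes one of the four parameters and leaves three free ones, which matches the 3 degrees of freedom reported for the linkage test. The functions and the parameter values in the example are hypothetical illustrations.

```python
def prevalence(p, f0, f1, f2):
    """Population prevalence under Hardy-Weinberg equilibrium.

    p  : disease allele frequency
    f0 : penetrance with no copies of the disease allele
    f1 : penetrance with one copy
    f2 : penetrance with two copies
    """
    q = 1.0 - p
    return q * q * f0 + 2.0 * p * q * f1 + p * p * f2


def solve_f0(prev, p, f1, f2):
    """Return the f0 that makes the model reproduce a fixed prevalence.

    Raises ValueError if no penetrance in [0, 1] is consistent.
    """
    q = 1.0 - p
    if q <= 0.0:
        raise ValueError("disease allele frequency must be below 1")
    f0 = (prev - 2.0 * p * q * f1 - p * p * f2) / (q * q)
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("parameters are incompatible with this prevalence")
    return f0


if __name__ == "__main__":
    # Hypothetical values: prevalence 20% (as in --prev 0.20), p = 0.3.
    f0 = solve_f0(0.20, p=0.3, f1=0.25, f2=0.5)
    print(round(f0, 4))  # 0.102
    print(round(prevalence(0.3, f0, 0.25, 0.5), 6))  # 0.2
```

Once p, f1 and f2 are chosen, f0 is determined, so only three quantities are free to be estimated.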

  lamp -d Y402H.dat -p Y402H.ped -c Y402H.map -f Y402H-frame.map --prev 0.20

The first four options specify input file names: the data file (-d), the pedigree file (-p), the candidate positions file (-c) and the framework file (-f). In this case, the framework file lists two nearby microsatellite markers (which are unlikely to be associated with disease, but can help evaluate evidence for linkage) and the candidate file lists only the Y402H SNP, which has been previously associated with macular degeneration in multiple studies.

After you execute the command above, LAMP should take a few minutes to run. Take a moment to look at the first few lines of LAMP output, which summarize the current parameter settings, the available alternatives and some diagnostics about the dataset. In this case, you will see that a few families (which included no genotyped individuals) were excluded from analysis, and that a single large family was also excluded to conserve computing resources. You can later repeat the analysis including this family by adding the --maxbits 20 option to the command line. While you wait for the analysis to complete, you could grab a coffee or peruse the original manuscripts [Abecasis et al. (2004) and Zareparsi et al. (2005)].
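If you plan to rerun the analysis with and without the large family, it can help to build the command line in a script. The Python sketch below uses a hypothetical helper, lamp_args, that assembles exactly the options shown in this tutorial (-d, -p, -c, -f, --prev and, optionally, --maxbits); it assumes the lamp executable is on your PATH if you choose to run it.

```python
import shlex


def lamp_args(prefix="Y402H", prevalence=0.20, maxbits=None):
    """Build the LAMP command line used in this tutorial (hypothetical helper).

    Only options that appear in the tutorial are used.
    """
    args = [
        "lamp",
        "-d", f"{prefix}.dat",        # data file
        "-p", f"{prefix}.ped",        # pedigree file
        "-c", f"{prefix}.map",        # candidate positions file
        "-f", f"{prefix}-frame.map",  # framework file
        "--prev", f"{prevalence:.2f}",
    ]
    if maxbits is not None:
        args += ["--maxbits", str(maxbits)]  # include the large family
    return args


if __name__ == "__main__":
    print(shlex.join(lamp_args()))
    print(shlex.join(lamp_args(maxbits=20)))
    # To actually run LAMP (assuming it is installed and on your PATH):
    #   import subprocess
    #   subprocess.run(lamp_args(), check=True)
```

The first printed line reproduces the command given earlier; the second adds --maxbits 20.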

Estimated Test Statistics

Once LAMP completes running, you will see the evidence for linkage and association summarized in three LOD scores. Your results should match those transcribed below:

< ... snippet of LAMP results begins here ... >

                                TEST FOR         TEST FOR      TEST FOR OTHER
                                LINKAGE         ASSOCIATION   LINKED VARIANTS
                            ---------------- ---------------- ----------------
LOCATION      TRAIT            LOD df pvalue    LOD df pvalue    LOD df pvalue
==============================================================================
Y402H         AMD             1.46  3   0.08  22.06  1  7e-24   1.41  2   0.04

Peak association was LOD = 22.057 (1 df) at Y402H for AMD
Peak linkage was LOD = 1.460 (3 df) at Y402H for AMD

< ... snippet ends here ... >
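If you want to collect these statistics across several runs, a results row like the one above can be split into its three tests. The Python sketch below assumes the whitespace-separated layout shown in the snippet (location, trait, then LOD, df and p-value for each of the three tests) and a trait name without spaces; the column layout may differ in other LAMP versions, and the names Statistic and parse_result_row are hypothetical.

```python
from typing import NamedTuple


class Statistic(NamedTuple):
    lod: float
    df: int
    pvalue: float


def parse_result_row(row):
    """Split one row of the LAMP summary table into its three tests.

    Assumes the layout shown in the snippet: LOCATION TRAIT, then
    LOD/df/pvalue for linkage, association and other linked variants.
    """
    fields = row.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    location, trait = fields[0], fields[1]
    stats = [
        Statistic(float(fields[i]), int(fields[i + 1]), float(fields[i + 2]))
        for i in (2, 5, 8)
    ]
    names = ("linkage", "association", "other_linked")
    return location, trait, dict(zip(names, stats))


if __name__ == "__main__":
    row = "Y402H  AMD  1.46  3  0.08  22.06  1  7e-24  1.41  2  0.04"
    location, trait, tests = parse_result_row(row)
    print(tests["association"])  # Statistic(lod=22.06, df=1, pvalue=7e-24)
```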

In this case, you will see that the linkage test finds only modest evidence for linkage in the region, with LOD = 1.46 (3 degrees of freedom, since the disease allele frequency and two penetrances were estimated; p = 0.08). As previously reported, there is extremely strong evidence for association (LOD = 22.06 (!), with a reassuringly small p-value of 7e-24). Interestingly, some evidence for linkage appears to remain even after accounting for the effects of the Y402H variant (LOD = 1.41, 2 df, p = 0.04). This last result suggests that there may be other independently associated alleles in the region or, perhaps, that Y402H is only indirectly associated with AMD.
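The tutorial does not say how LAMP turns each LOD score into a p-value. One common convention is to treat 2 ln(10) × LOD as a chi-square statistic with the stated degrees of freedom; as an assumption, this reproduces all three p-values in the snippet (0.08, 7e-24 and 0.04). The Python sketch below implements that conversion with closed-form chi-square tail probabilities for integer df, so it needs only the standard library; lod_to_pvalue is a hypothetical name, not a LAMP function.

```python
import math


def lod_to_pvalue(lod, df):
    """Upper-tail chi-square p-value for a LOD score with `df` degrees of freedom.

    Assumes chi2 = 2 * ln(10) * LOD. Negative LODs are treated as 0 (p = 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    x = 2.0 * math.log(10.0) * max(lod, 0.0)
    half = x / 2.0
    if df % 2 == 0:
        # Even df: P = exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: P = erfc(sqrt(x/2))
    #           + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


if __name__ == "__main__":
    print(round(lod_to_pvalue(1.46, 3), 2))    # 0.08  (linkage)
    print(f"{lod_to_pvalue(22.06, 1):.0e}")    # 7e-24 (association)
    print(round(lod_to_pvalue(1.41, 2), 2))    # 0.04  (other linked variants)
```

With 2 df the expression simplifies to 10^(-LOD), which is why LOD = 1.41 gives p of about 0.039.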

As with all other LAMP analyses, you can find details of the estimated parameters in the companion output files.

Learning More

I hope you found this tutorial helpful. If you haven't already done so, you might want to read the sections on parametric linkage analysis with MOD scores or on simple association analysis. Alternatively, you can return to the main tutorial menu.

University of Michigan | School of Public Health | Abecasis Lab