University of Michigan Center for Statistical 


Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies.

Skol AD, Scott LJ, Abecasis GR and Boehnke M

Nat Genet (2006) 38:209-13

Genome-wide association is a promising approach to identify common genetic variants that predispose to human disease. Because of the high cost of genotyping hundreds of thousands of markers on thousands of subjects, genome-wide association studies often follow a staged design in which a proportion (π_samples) of the available samples are genotyped on a large number of markers in stage 1, and a proportion (π_markers) of these markers are later followed up by genotyping them on the remaining samples in stage 2. The standard strategy for analyzing such two-stage data is to view stage 2 as a replication study and focus on findings that reach statistical significance when stage 2 data are considered alone. We demonstrate that the alternative strategy of jointly analyzing the data from both stages almost always results in increased power to detect genetic association, despite the need to use more stringent significance levels, even when effect sizes differ between the two stages. We recommend joint analysis for all two-stage genome-wide association studies, especially when a relatively large proportion of the samples are genotyped in stage 1 (π_samples ≥ 0.30), and a relatively large proportion of markers are selected for follow-up in stage 2 (π_markers ≥ 0.01).
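The comparison described in the abstract can be illustrated numerically. The following is a minimal sketch under simplifying assumptions, not the authors' implementation. It assumes that each marker yields independent standard normal test statistics z1 (stage 1) and z2 (stage 2), that a marker is followed up when |z1| exceeds a threshold chosen so that a fraction `pi_markers` of null markers pass, that the replication-based test applies a two-sided Bonferroni correction for the followed-up markers to z2 alone, and that the joint statistic is sqrt(`pi_samples`)·z1 + sqrt(1 − `pi_samples`)·z2, with a threshold set so that the overall per-marker false positive rate is `alpha / n_markers`. The function names, the noncentrality parameter `ncp` (the expected z statistic if all samples were genotyped), and the numeric inputs are hypothetical choices for illustration.

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist


def upper_tail(x):
    """P(Z > x) for standard normal Z; erfc stays accurate far in the tail."""
    return 0.5 * erfc(x / sqrt(2.0))


def two_sided(c, mean=0.0):
    """P(|Z + mean| > c) for standard normal Z."""
    return upper_tail(c - mean) + upper_tail(c + mean)


def joint_prob(c1, cj, pi_samples, ncp=0.0, steps=2000):
    """P(|z1| > c1 and |z_joint| > cj), by the trapezoid rule over z1."""
    w1, w2 = sqrt(pi_samples), sqrt(1.0 - pi_samples)
    mu1, mu2 = ncp * w1, ncp * w2  # stage means scale with sqrt(sample share)

    def integrand(z1):
        density = exp(-0.5 * (z1 - mu1) ** 2) / sqrt(2.0 * pi)
        # z_joint = w1*z1 + w2*z2, so |z_joint| > cj bounds z2 on both sides.
        above = upper_tail((cj - w1 * z1) / w2 - mu2)
        below = upper_tail((cj + w1 * z1) / w2 + mu2)
        return density * (above + below)

    upper = max(c1, abs(mu1)) + 10.0  # density is negligible beyond this
    h = (upper - c1) / steps
    total = 0.0
    for i in range(steps + 1):
        z = c1 + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (integrand(z) + integrand(-z))
    return total * h


def compare_power(n_markers, alpha, pi_samples, pi_markers, ncp):
    """Thresholds and power for replication-based versus joint analysis."""
    std = NormalDist()
    c1 = -std.inv_cdf(pi_markers / 2.0)  # stage 1 follow-up threshold
    # Assumed replication rule: Bonferroni over the followed-up markers.
    c2 = -std.inv_cdf(alpha / (2.0 * n_markers * pi_markers))
    # Joint threshold: bisect so the two-stage false positive rate is alpha/M.
    target = alpha / n_markers
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if joint_prob(c1, mid, pi_samples) > target:
            lo = mid
        else:
            hi = mid
    c_joint = 0.5 * (lo + hi)
    mu1 = ncp * sqrt(pi_samples)
    mu2 = ncp * sqrt(1.0 - pi_samples)
    return {
        "c1": c1,
        "c2": c2,
        "c_joint": c_joint,
        # z1 and z2 are independent, so the replication power factorizes.
        "replication_power": two_sided(c1, mu1) * two_sided(c2, mu2),
        "joint_power": joint_prob(c1, c_joint, pi_samples, ncp),
    }


# Hypothetical inputs: 300,000 markers, alpha = 0.05, 30% of samples in
# stage 1, 1% of markers followed up, full-sample noncentrality of 5.5.
for name, value in compare_power(300000, 0.05, 0.30, 0.01, 5.5).items():
    print(f"{name}: {value:.4f}")
```

In this simplified model the joint threshold `c_joint` is larger than the replication threshold `c2`, which corresponds to the more stringent significance level mentioned above, yet the joint statistic uses the data from both stages, so its power can still come out higher. Varying `pi_samples`, `pi_markers`, and `ncp` shows how the gap changes across designs.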



