University of Michigan Center for Statistical 


Recommendations for Data Analysis

If you are analyzing data from a two-stage study, then after calculating the recommended thresholds you should calculate a z-statistic to compare allele frequencies between cases and controls in each stage. If pcases and pcontrols are the observed allele frequencies in cases and controls in the stage being analyzed, and Ncases and Ncontrols are the numbers of cases and controls genotyped in that stage, a suitable statistic can be calculated as:

Vcases = pcases (1 - pcases) / Ncases
Vcontrols = pcontrols (1 - pcontrols) / Ncontrols

Vdiff = Vcases + Vcontrols

Zstatistic = (pcases - pcontrols) / sqrt(Vdiff)
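As an illustration, the four formulas above can be written as one small Python function. This is only a sketch: the function name and argument names are hypothetical, and it assumes both allele frequencies lie strictly between 0 and 1 so that the variance is not zero.

```python
import math


def allele_z_statistic(p_cases, p_controls, n_cases, n_controls):
    """Z-statistic comparing allele frequencies in cases and controls.

    Mirrors the formulas in the text for a single stage. The names are
    illustrative and not part of any published tool.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("sample sizes must be positive")

    # Variance of each observed allele frequency, as defined in the text
    v_cases = p_cases * (1.0 - p_cases) / n_cases
    v_controls = p_controls * (1.0 - p_controls) / n_controls

    # Variance of the difference between the two frequencies
    v_diff = v_cases + v_controls
    if v_diff == 0.0:
        raise ValueError("variance is zero; the statistic is undefined")

    return (p_cases - p_controls) / math.sqrt(v_diff)
```

For example, allele_z_statistic(0.5, 0.4, 1000, 1000) gives a value of roughly 4.52, and equal frequencies in cases and controls give 0.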

The statistics for each stage can be adjusted for population stratification using methods analogous to genomic control (for example, by rescaling the statistics using the square root of their empirical variance to ensure they have variance 1.0). Then, the analysis should proceed as follows:
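The simple rescaling mentioned in parentheses might look like the sketch below. It assumes the z-statistics for all markers in one stage are held in a list, and it uses the sample variance as the "empirical variance"; both choices, like the function name, are assumptions made for illustration. Other genomic control variants estimate the inflation differently.

```python
import statistics


def rescale_to_unit_variance(z_values):
    """Divide each z-statistic by the square root of the empirical variance.

    Assumption: the empirical variance is the sample variance of the
    statistics from all markers in the stage being analyzed.
    """
    # statistics.stdev raises StatisticsError if fewer than two values are given
    sd = statistics.stdev(z_values)
    if sd == 0.0:
        raise ValueError("all statistics are identical; cannot rescale")
    return [z / sd for z in z_values]
```

After rescaling, statistics.variance applied to the returned list is 1.0 up to rounding error.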

For a one-stage design, results should be deemed significant if the absolute value of the statistic exceeds the suggested threshold.

For a two-stage design, markers should be selected for follow-up genotyping when the absolute value of the stage 1 statistic exceeds the suggested stage 1 threshold. Markers should be deemed significantly associated if the absolute value of the combined statistic exceeds the suggested stage 2 threshold. The combined statistic is defined as follows:

Zcombined = Zstage1 * sqrt(πsamples) + Zstage2 * sqrt(1.0 - πsamples)
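The two-stage procedure can be sketched in Python as follows. The function names and any threshold values passed in are hypothetical. The sketch takes pi_samples to be the proportion of samples genotyped in stage 1, which this section does not restate, and it compares absolute values at both steps, which is an assumption for the combined statistic.

```python
import math


def combined_z(z_stage1, z_stage2, pi_samples):
    """Combine the stage 1 and stage 2 statistics as in the formula above."""
    if not 0.0 <= pi_samples <= 1.0:
        raise ValueError("pi_samples must be between 0 and 1")
    return (z_stage1 * math.sqrt(pi_samples)
            + z_stage2 * math.sqrt(1.0 - pi_samples))


def is_significant_two_stage(z_stage1, z_stage2, pi_samples,
                             threshold_stage1, threshold_stage2):
    """Apply the two-stage rule to one marker.

    The thresholds are placeholders for the suggested values calculated
    earlier; they are not computed here.
    """
    # Step 1: the marker must pass the stage 1 threshold to be followed up
    if abs(z_stage1) <= threshold_stage1:
        return False
    # Step 2: the combined statistic must pass the stage 2 threshold
    z_joint = combined_z(z_stage1, z_stage2, pi_samples)
    return abs(z_joint) > threshold_stage2
```

With pi_samples = 0.5, two statistics of 2.0 combine to 2.0 * sqrt(2), or about 2.83. The weights are chosen so that their squares sum to 1.0, which keeps the combined statistic on the same unit-variance scale as the stage statistics when those are independent.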



University of Michigan | School of Public Health | Abecasis Lab