Recommendations for Data Analysis
If you are analysing data from a two-stage study, then after calculating the recommended thresholds you should
calculate a z-statistic to compare allele frequencies between cases and controls in each stage. If
pcases and pcontrols are the observed allele frequencies in cases and controls in the stage
being analysed, and Ncases and Ncontrols are the numbers of cases and controls genotyped
in that stage, a suitable statistic can be calculated as:
Vcases = pcases (1 - pcases) / Ncases
Vcontrols = pcontrols (1 - pcontrols) / Ncontrols
Vdiff = Vcases + Vcontrols
Zstatistic = (pcases - pcontrols) / sqrt(Vdiff)
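The four formulas above translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original tutorial; the function name and argument names are hypothetical, and the inputs are assumed to be frequencies strictly between 0 and 1 with positive sample counts.

```python
import math


def allele_z_statistic(p_cases, p_controls, n_cases, n_controls):
    """Z-statistic comparing allele frequencies in cases and controls.

    Hypothetical helper that mirrors the Vcases, Vcontrols, Vdiff and
    Zstatistic formulas given in the text, for a single stage.
    """
    v_cases = p_cases * (1.0 - p_cases) / n_cases
    v_controls = p_controls * (1.0 - p_controls) / n_controls
    v_diff = v_cases + v_controls
    # v_diff is zero only if the allele is fixed (frequency 0 or 1) in
    # both groups; such markers carry no information and are not handled here.
    return (p_cases - p_controls) / math.sqrt(v_diff)


# Illustrative values only, not data from any real study.
z = allele_z_statistic(0.6, 0.5, 1000, 1000)
```

The same function would be called once per marker in each stage, using the frequencies and sample counts observed in that stage.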
The statistics for each stage can be adjusted for population stratification using methods analogous to
genomic control (for example, by dividing the statistics by the square root of their empirical
variance so that they have variance 1.0). Then, the analysis should proceed as follows:
For a one-stage design, results should be deemed significant if the absolute value of the statistic
exceeds the suggested threshold.
For a two-stage design, markers should be selected for follow-up genotyping when the absolute value
of the stage 1 statistic exceeds the suggested stage 1 threshold. Markers should be deemed significantly associated
if the absolute value of the combined statistic exceeds the suggested stage 2 threshold. The combined statistic is defined as follows:
Zcombined = Zstage1 * sqrt(πsamples) + Zstage2 * sqrt(1.0 - πsamples)
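The two-stage procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tutorial's own implementation: all function names are hypothetical, πsamples is assumed to be the proportion of all samples genotyped in stage 1 (it is not defined in this section), the combined statistic is compared with its threshold in absolute value, and the thresholds passed in are placeholders for whatever values the calculator recommends. The rescaling step is one simple reading of the stratification adjustment mentioned earlier, dividing by the sample standard deviation.

```python
import math
import statistics


def rescale_to_unit_variance(z_values):
    """Divide each statistic by the square root of the empirical variance."""
    scale = math.sqrt(statistics.variance(z_values))
    return [z / scale for z in z_values]


def combined_statistic(z_stage1, z_stage2, pi_samples):
    """Zcombined = Zstage1 * sqrt(pi_samples) + Zstage2 * sqrt(1 - pi_samples)."""
    return (z_stage1 * math.sqrt(pi_samples)
            + z_stage2 * math.sqrt(1.0 - pi_samples))


def two_stage_decision(z_stage1, z_stage2, pi_samples,
                       stage1_threshold, stage2_threshold):
    """Classify one marker under the two-stage rule described in the text."""
    # Stage 1: only markers passing the first threshold are followed up.
    if abs(z_stage1) <= stage1_threshold:
        return "not followed up"
    # Stage 2: the combined statistic decides significance
    # (absolute value used here; this is an assumption).
    z_comb = combined_statistic(z_stage1, z_stage2, pi_samples)
    if abs(z_comb) > stage2_threshold:
        return "significant"
    return "not significant"


# Placeholder thresholds and statistics, for illustration only.
result = two_stage_decision(3.0, 4.0, 0.5,
                            stage1_threshold=2.0, stage2_threshold=4.5)
```

In practice the rescaling would be applied to the full set of statistics from a stage before any marker is compared with a threshold.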