University of Michigan Center for Statistical 


The CaTS Power Calculator

CaTS is a user-friendly tool for power calculations in genetic association studies. It can estimate power for any genetic association study, but it is designed especially to support the design of two-stage genetic association studies. This page describes in detail the parameters that CaTS requires as input and the output it generates.

Sample Size

In this section, you should specify the total number of cases and controls available for the study. In a one-stage design, all the cases and controls would be genotyped; in a more cost-effective design only a fraction of the cases and controls would be genotyped initially and the results from this preliminary analysis would be used to select markers to genotype in the remaining individuals.

Cases: The number of cases available for the study. This includes the cases to be genotyped in both stage 1 and stage 2.

Controls: The number of controls available for the study. This includes the controls to be genotyped in both stage 1 and stage 2.

Two Stage Design

This section allows you to specify design parameters for a two-stage study. In a two-stage study, all the markers are examined in a fraction of the sample. The results of this initial analysis are then used to select a fraction of the markers to be followed up in the remainder of the sample.

Samples Genotyped in Stage 1 (%): The proportion of samples genotyped in stage 1. The number of cases and controls genotyped in stage 1 is a function of the total number of samples available (specified in the Sample Size section). For example, if you have 1000 cases and 1000 controls and set this proportion to 30%, you should plan to genotype 300 cases and 300 controls in stage 1. Using the notation developed in Skol et al. (2006), this value is πsamples.

Markers Genotyped in Stage 2 (%): The percentage of markers that you plan to follow up in stage 2. Using the notation developed in Skol et al., this value is πmarkers. The power calculation assumes that you will test each marker for association at the end of stage 1, and follow up markers whose corresponding p-value is < πmarkers by genotyping them in the remaining cases and controls.

Significance level: The desired false positive rate per marker. If all M markers are independent and you wish to maintain a genome-wide false positive rate of .05, the per-marker false positive rate should be .05/M. In the Skol et al. paper this value is denoted αmarker.
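The arithmetic behind these three settings can be sketched in a few lines of Python. This is only an illustration of the definitions above, not CaTS code: the function name, the dictionary keys, and the example marker count of 300,000 are assumptions made for the example.

```python
def two_stage_plan(n_cases, n_controls, pi_samples, n_markers, pi_markers,
                   genomewide_alpha=0.05):
    """Translate the two-stage design settings into counts (illustrative only)."""
    # Samples genotyped in stage 1: a fraction pi_samples of each group.
    stage1_cases = round(n_cases * pi_samples)
    stage1_controls = round(n_controls * pi_samples)
    return {
        "stage1_cases": stage1_cases,
        "stage1_controls": stage1_controls,
        # The remaining samples are genotyped only on the followed-up markers.
        "stage2_cases": n_cases - stage1_cases,
        "stage2_controls": n_controls - stage1_controls,
        # If nearly all markers are null, about pi_markers * M reach stage 2.
        "expected_stage2_markers": n_markers * pi_markers,
        # Bonferroni-style per-marker rate for M independent markers.
        "alpha_marker": genomewide_alpha / n_markers,
    }


# 1000 cases, 1000 controls, 30% of samples in stage 1, 1% of markers followed up.
plan = two_stage_plan(1000, 1000, 0.30, 300_000, 0.01)
for name, value in plan.items():
    print(name, value)
```

With the example from the text (1000 cases, 1000 controls, 30%), the plan contains 300 stage 1 cases and 300 stage 1 controls.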

Disease Model

Prevalence: The disease prevalence. This is the probability that a randomly sampled individual is affected by the disease.

Disease Allele Frequency: The frequency of the risk allele in the general population. Usually, the allele will be a little more common in cases, and a little rarer in controls.

Genotype Relative Risk: The definition of genotype relative risk (GRR) depends on the disease model. If f0, f1, f2 are the probabilities of being affected for individuals with 0, 1, or 2 copies of the risk allele, then GRR is defined as follows:

Multiplicative model: GRR = f1 / f0 = f2 / f1
Additive model: GRR = f1 / f0
Dominant model: GRR = f1 / f0 = f2 / f0
Recessive model: GRR = f2 / f0 = f2 / f1
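To make these definitions concrete, the sketch below converts a prevalence, a risk allele frequency, and a GRR into the three penetrances f0, f1, f2. It rests on several assumptions that are not stated on this page: that the four definitions correspond, in order, to multiplicative, additive, dominant, and recessive models; that genotypes are in Hardy-Weinberg proportions in the general population; and that under the additive model f2 / f0 = 2 × GRR - 1. The function name and arguments are hypothetical and are not part of CaTS.

```python
def penetrances(prevalence, risk_allele_freq, grr, model="multiplicative"):
    """Return (f0, f1, f2) for a given disease model (illustrative sketch)."""
    p = risk_allele_freq
    q = 1.0 - p
    # Assumption: Hardy-Weinberg genotype frequencies for 0, 1, 2 risk alleles.
    geno_freq = (q * q, 2.0 * p * q, p * p)

    # Relative risks of each genotype compared with 0 copies of the risk allele.
    if model == "multiplicative":
        rel_risk = (1.0, grr, grr * grr)
    elif model == "additive":
        # Assumption: each extra allele adds the same increment to the risk.
        rel_risk = (1.0, grr, 2.0 * grr - 1.0)
    elif model == "dominant":
        rel_risk = (1.0, grr, grr)
    elif model == "recessive":
        rel_risk = (1.0, 1.0, grr)
    else:
        raise ValueError(f"unknown model: {model}")

    # Scale f0 so the population-average penetrance equals the prevalence.
    f0 = prevalence / sum(g * r for g, r in zip(geno_freq, rel_risk))
    return tuple(f0 * r for r in rel_risk)


f0, f1, f2 = penetrances(prevalence=0.10, risk_allele_freq=0.50, grr=1.5)
print(f0, f1, f2)
```

The sketch does not check that the resulting penetrances stay below 1; with a high prevalence and a large GRR they may not, and such parameter combinations are not meaningful.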

CaTS output

Power Tab

This section displays estimated power for different analysis and genotyping strategies.

One Stage Design: Power attained when all samples are genotyped on all markers in a single stage.

Replication Analysis: Power attained when stage 2 data is analyzed independently of the strength of association in stage 1. Replication is deemed successful when the two stages provide evidence for an effect in the same direction.

Joint Analysis: Power attained when test statistics from stage 1 and stage 2 are combined.
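For orientation, the one-stage power can be approximated with a normal approximation to the allele-frequency z-test. The sketch below is a textbook approximation written for this page, not the CaTS implementation; the function name and the example allele frequencies are assumptions, and the case and control allele frequencies are taken as given inputs.

```python
from statistics import NormalDist


def one_stage_power(p_case, p_control, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allele-frequency z-test (sketch)."""
    norm = NormalDist()
    # Each individual contributes two alleles, hence the factor of 2.
    variance = (p_case * (1.0 - p_case) / (2.0 * n_cases)
                + p_control * (1.0 - p_control) / (2.0 * n_controls))
    # Expected value of the z-statistic under the alternative.
    mu = (p_case - p_control) / variance ** 0.5
    # Two-sided critical value for the per-marker significance level.
    critical = norm.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic falls in either rejection tail.
    return norm.cdf(mu - critical) + norm.cdf(-mu - critical)


# Hypothetical inputs: allele frequency 0.33 in cases and 0.30 in controls.
print(one_stage_power(0.33, 0.30, 1000, 1000, alpha=0.05 / 300_000))
```

When the two allele frequencies are equal, the function returns the significance level itself, which is a quick sanity check on the approximation.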

Thresholds Tab

This section displays suggested thresholds for association tests. These thresholds maintain the user-specified type I error rate. At each stage, a z-statistic should be calculated to compare allele frequencies in cases and controls (for example, as defined in Skol et al.). If desired, the statistic can be adjusted for population stratification using Genomic Control or another appropriate strategy.

One Stage Design: Critical value for two-sided test of association using a z-statistic and the marker-wise false positive rate specified by Significance Level.

Stage One Threshold: Critical value to be used when selecting markers for follow-up genotyping in stage 2.

Replication Threshold: Critical value to be used for stage 2 when using replication-based analysis.

Joint Analysis Threshold: Critical value to be used when test statistics from stage 1 and stage 2 are combined.
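The two simplest thresholds follow directly from the standard normal distribution, and the sketch below shows how they could be reproduced. It also shows the combined statistic as weighted in Skol et al. (2006), where each stage is weighted by the square root of its share of the samples. The function names and example values are assumptions for illustration. The replication and joint analysis thresholds are not computed here, because both depend on the stage 1 selection step.

```python
import math
from statistics import NormalDist


def two_sided_critical_value(tail_probability):
    """Critical value c such that P(|Z| > c) equals tail_probability."""
    return NormalDist().inv_cdf(1.0 - tail_probability / 2.0)


def joint_statistic(z1, z2, pi_samples):
    """Combine stage 1 and stage 2 z-statistics, weighting by sample share."""
    return math.sqrt(pi_samples) * z1 + math.sqrt(1.0 - pi_samples) * z2


# One-stage threshold for a hypothetical 300,000 independent markers.
c_one_stage = two_sided_critical_value(0.05 / 300_000)
# Stage one threshold when 1% of markers are followed up (pi_markers = 0.01).
c_stage_one = two_sided_critical_value(0.01)
print(c_one_stage, c_stage_one)
print(joint_statistic(z1=3.1, z2=4.0, pi_samples=0.30))
```

The joint statistic cannot simply be compared with the one-stage critical value, since only markers that passed the stage one threshold are ever combined.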

Penetrances Tab

This tab displays the probability that an individual will be affected for each marker genotype. It also displays the attributable fraction, which is the proportion of cases due to the effect of the disease-predisposing locus, and the recurrence risk to siblings, which is a factor summarizing the increase in risk to siblings of affected individuals due to this locus.
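Both summary quantities can be derived from the penetrances with standard single-locus formulas, sketched below. This is an illustration under stated assumptions rather than the CaTS code: the attributable fraction is taken to be 1 - f0 / K, the sibling recurrence risk is taken to be the risk ratio 1 + (VA/2 + VD/4) / K², where K is the prevalence and VA and VD are the additive and dominance variances of the locus, and genotypes are assumed to be in Hardy-Weinberg proportions. The function names and example penetrances are hypothetical.

```python
def prevalence_from_penetrances(f0, f1, f2, risk_allele_freq):
    """Population prevalence implied by the penetrances (assumes HWE)."""
    p = risk_allele_freq
    q = 1.0 - p
    return q * q * f0 + 2.0 * p * q * f1 + p * p * f2


def attributable_fraction(f0, f1, f2, risk_allele_freq):
    """Proportion of cases that would not occur if everyone had risk f0."""
    k = prevalence_from_penetrances(f0, f1, f2, risk_allele_freq)
    return 1.0 - f0 / k


def sibling_recurrence_risk_ratio(f0, f1, f2, risk_allele_freq):
    """Sibling relative risk from the locus variance components (sketch)."""
    p = risk_allele_freq
    q = 1.0 - p
    k = prevalence_from_penetrances(f0, f1, f2, risk_allele_freq)
    # Additive variance: 2pq times the squared average effect of the allele.
    additive_var = 2.0 * p * q * (p * (f2 - f1) + q * (f1 - f0)) ** 2
    # Dominance variance: departure of the heterozygote from the midpoint.
    dominance_var = (p * q * (f2 - 2.0 * f1 + f0)) ** 2
    # Full siblings share half the additive and a quarter of the dominance variance.
    return 1.0 + (additive_var / 2.0 + dominance_var / 4.0) / (k * k)


print(attributable_fraction(0.05, 0.10, 0.20, 0.5))
print(sibling_recurrence_risk_ratio(0.05, 0.10, 0.20, 0.5))
```

A locus with no effect (f0 = f1 = f2) gives an attributable fraction of 0 and a sibling risk ratio of 1.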

Information Tab

Reduction in total genotyping required: Number of genotypes saved when using the specified two-stage design, divided by the number of genotypes performed in a one-stage design.

Case and Control allele frequency: Risk allele frequency in cases and controls given the disease model.

Probability marker is followed up: Probability that a disease-predisposing variant with the specified disease model and parameters is selected for genotyping in stage 2, given the specified two-stage design.
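The first two quantities on this tab can be reproduced by hand, as the following sketch shows. It is an illustration rather than CaTS code: the function names are hypothetical, genotypes are assumed to be in Hardy-Weinberg proportions, and the genotyping saved is counted relative to genotyping every sample on every marker.

```python
def genotyping_reduction(pi_samples, pi_markers):
    """Fraction of one-stage genotypes saved by the two-stage design."""
    # Stage 1: all markers on a fraction pi_samples of the samples.
    # Stage 2: a fraction pi_markers of the markers on the remaining samples.
    two_stage_fraction = pi_samples + (1.0 - pi_samples) * pi_markers
    return 1.0 - two_stage_fraction


def case_control_allele_freqs(f0, f1, f2, risk_allele_freq):
    """Risk allele frequency in cases and in controls (assumes HWE)."""
    p = risk_allele_freq
    q = 1.0 - p
    prevalence = q * q * f0 + 2.0 * p * q * f1 + p * p * f2
    # Homozygotes carry two risk alleles and heterozygotes one, so each
    # heterozygote contributes half as much per allele sampled.
    p_case = (p * p * f2 + p * q * f1) / prevalence
    p_control = (p * p * (1.0 - f2) + p * q * (1.0 - f1)) / (1.0 - prevalence)
    return p_case, p_control


print(genotyping_reduction(pi_samples=0.30, pi_markers=0.01))
print(case_control_allele_freqs(0.05, 0.10, 0.20, risk_allele_freq=0.5))
```

For a risk-increasing allele, the case frequency comes out above the population frequency and the control frequency slightly below it, as described in the Disease Model section.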

Optimization Tab

Genotyping Cost Ratio: Specifies the relative cost of genotypes generated in stage 2 (typically, stage 1 uses a massive-throughput platform with very low per-genotype costs, whereas stage 2 uses more customizable and expensive assays). If it costs one penny to generate a genotype in stage 1 and 5 pennies to generate a genotype in stage 2, the cost ratio is $0.05 / $0.01 = 5.

Target Power (%): This is the power you want to achieve for the two-stage design, using joint analysis. It must be less than the power of the one-stage design, since two-stage designs always lose a little power. After pressing the Optimize! button, the two-stage design that achieves Target Power at the lowest cost will be reported.

Target Cost (%): This is the amount of funds available for genotyping, as a proportion of the cost for a one-stage design. For example, if the one-stage design would cost $1 million, but you only have $400,000 available, set the target cost to 40%. Pressing the Optimize! button will report the two-stage design that achieves the highest possible power for the target cost.

Optimize! button: When you click this button, the most cost-effective design (for a Target Power) or the most powerful design (for a Target Cost) is reported. Optimization may take a few seconds, depending on the speed of your computer, so please be patient.
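The cost of a candidate two-stage design, expressed as a proportion of the one-stage cost, can be sketched as follows. This is a plausible reading of the definitions above, not the formula CaTS is confirmed to use; the function name and the example design are assumptions.

```python
def relative_cost(pi_samples, pi_markers, cost_ratio):
    """Two-stage genotyping cost as a proportion of the one-stage cost."""
    # Stage 1 genotypes are priced at the one-stage, per-genotype rate.
    stage1_cost = pi_samples
    # Stage 2 genotypes cost cost_ratio times as much per genotype.
    stage2_cost = cost_ratio * (1.0 - pi_samples) * pi_markers
    return stage1_cost + stage2_cost


# 30% of samples in stage 1, 1% of markers followed up, stage 2 genotypes 5x the price.
cost = relative_cost(pi_samples=0.30, pi_markers=0.01, cost_ratio=5.0)
print(f"{100 * cost:.1f}% of the one-stage cost")
```

A design is within a Target Cost of 40% when this value is at most 0.40; an optimizer would search over pi_samples and pi_markers subject to that kind of constraint.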


University of Michigan | School of Public Health | Abecasis Lab