University of Michigan Center for Statistical 


PEDSTATS Tutorial -- Selection Strategies for Hardy-Weinberg Tests

PEDSTATS is able to test markers for Hardy-Weinberg equilibrium using one of three selection strategies (founders only, all individuals, or a set of unrelated individuals selected by the program). In this section, we consider each of these strategies and describe situations where each may be indicated.

Hardy-Weinberg testing using unrelated individuals

In the ideal case, tests of Hardy-Weinberg equilibrium should be performed using a sample of unrelated individuals. When this convention is followed, the tested genotypes are independent, and genotype and allele frequency estimates are free from the bias that might be introduced if they were estimated from correlated genotypes.

One possible strategy for choosing an unrelated set of individuals from family data is to select all founders. Although this approach has the advantage of simplicity, when founder genotypes are unavailable, the resulting test may have too few genotypes to ensure adequate power. In this situation, a good alternative is to first identify a set of individuals with a high proportion of markers genotyped. This restricted set can then be used to select a group of unrelated individuals. In addition to providing an independent set of genotypes, this approach will almost always yield a larger genotype sample than a founders only strategy. It also has the advantage of providing a single sample of individuals that can be used for testing across the entire marker set.

Although PEDSTATS also implements a founders only strategy, by default, it will perform Hardy-Weinberg testing on your data using a strategy that focuses on selecting a set of unrelated individuals that have as few missing genotypes as possible. The algorithm for this "selected unrelated" strategy is as follows:

Selected unrelated strategy

For each family f in the pedigree:

  1. Let If = {If1, If2 ... Ifn } represent all individuals in family f, and let M be the number of markers in the data set.

  2. Select a set Gf of "most genotyped" individuals from If.

    1. For each individual in If, calculate the proportion of markers that have been genotyped for that person

      pfi = Nfi / M
      where Nfi is the number of markers genotyped for Ifi

    2. Calculate the family-wise maximum proportion of markers genotyped:

      Pmax(f) = max { pf1 , pf2 ... , pfn }

    3. Let Gf represent the set of individuals in family f whose individual genotype proportion is at least 90% of the family-wise maximum:

      Gf = { Ifi : pfi >= 0.9 * Pmax(f) }

  3. Select a set Uf of unrelated, genotyped individuals for family f by iteratively removing candidates from Gf

    1. Select the individual Ifk in Gf with the fewest relatives and add them to the unrelated set (Uf).

    2. Remove Ifk and all individuals related to Ifk from Gf

    3. If Gf is empty, stop. Otherwise, return to substep 1 of step 3.
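
The steps above can be sketched in Python for a single family. This is a minimal illustration under stated assumptions, not the PEDSTATS source code: the function name select_unrelated, the dictionary-based inputs, and the example family are all hypothetical. The outline does not say how ties are broken or whether "fewest relatives" counts relatives in the whole family or only those still in Gf, so the sketch counts relatives remaining in Gf and breaks ties by ID. It also returns an empty set when nobody in the family is genotyped.

```python
def select_unrelated(genotyped_counts, relatives, n_markers, threshold=0.9):
    """Sketch of the "selected unrelated" strategy for one family.

    genotyped_counts: dict mapping individual ID -> number of markers genotyped (Nfi)
    relatives:        dict mapping individual ID -> set of IDs related to that person
    n_markers:        total number of markers in the data set (M)
    """
    if n_markers <= 0 or not genotyped_counts:
        return []

    # Step 2.1: proportion of markers genotyped for each individual.
    props = {i: n / n_markers for i, n in genotyped_counts.items()}

    # Step 2.2: family-wise maximum proportion.
    p_max = max(props.values())
    if p_max == 0:
        return []  # assumption: no genotypes in this family, nothing to select

    # Step 2.3: candidates within 90% of the family-wise maximum.
    candidates = {i for i, p in props.items() if p >= threshold * p_max}

    # Step 3: repeatedly pick the candidate with the fewest relatives, then
    # drop that person and all of their relatives from the candidate set.
    unrelated = []
    while candidates:
        # Assumption: count relatives still in the candidate set; ties go to
        # the smallest ID so that the result is reproducible.
        pick = min(
            candidates,
            key=lambda i: (len(relatives.get(i, set()) & candidates), i),
        )
        unrelated.append(pick)
        candidates -= relatives.get(pick, set()) | {pick}
    return unrelated


# Hypothetical three-generation family: father F (ungenotyped), mother M,
# children C1 and C2, C1's married-in spouse S, and grandchild G.
example_counts = {"F": 0, "M": 20, "C1": 20, "C2": 19, "S": 19, "G": 20}
example_relatives = {
    "F": {"C1", "C2", "G"},
    "M": {"C1", "C2", "G"},
    "C1": {"F", "M", "C2", "G"},
    "C2": {"F", "M", "C1", "G"},
    "S": {"G"},
    "G": {"F", "M", "C1", "C2", "S"},
}
selected = select_unrelated(example_counts, example_relatives, n_markers=20)
print(selected)
```

In this example the ungenotyped father is excluded in step 2, the spouse S is picked first because only one relative remains in the candidate set, and one of the remaining related individuals is picked second.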

Founders only strategy

The Hardy-Weinberg test just described selects a single sample of unrelated individuals for testing of all markers. Although this test will outperform a founders only test under most circumstances, if only a small proportion of selected individuals are genotyped for a marker with available founder genotypes, you might find a founders only test to be useful. You can try this out by typing the command:

       pedstats -p asp.ped -d asp.dat --checkFounders
This runs the Hardy-Weinberg check using data from founders only.

For the asp.ped data set, Hardy-Weinberg tests using just founder individuals cannot be done due to a low number of founder genotypes. When this is the case, PEDSTATS will indicate which markers were not screened.


The following markers were not tested for Hardy-Weinberg due to a low
number of genotyped individuals:

   MRK15 MRK16 MRK17 MRK18 MRK19 MRK20

Hardy-Weinberg testing using all individuals

When both the unrelated and founders only samples fail to yield enough genotypes for reliable testing of a marker, another alternative is to test for Hardy-Weinberg equilibrium using all individuals in your data set. Although the test as implemented ignores family structure, both the chi-squared and exact tests are reasonably robust to violations of independence. This allows testing to be performed on markers for which only a small number of independent genotypes are available.
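
For readers unfamiliar with the chi-squared test mentioned above, the following Python sketch shows the textbook Pearson goodness-of-fit test of Hardy-Weinberg proportions for a biallelic marker. It is an illustration only and makes no claim about how PEDSTATS implements its tests. The function name hwe_chi_square and the genotype counts are hypothetical, and multiallelic markers and the exact test are not covered.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-squared test of Hardy-Weinberg proportions (1 df).

    n_aa, n_ab, n_bb: observed counts of the three genotypes at a
    biallelic marker. Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated by gene counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        raise ValueError("marker is monomorphic; test is undefined")

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: the first sample matches Hardy-Weinberg proportions
# exactly, the second has a strong deficit of heterozygotes.
stat_fit, p_fit = hwe_chi_square(25, 50, 25)
stat_dev, p_dev = hwe_chi_square(40, 20, 40)
print(stat_fit, p_fit)
print(stat_dev, p_dev)
```

The p-value from this test assumes independent genotypes, which is why the choice of sample (unrelated individuals, founders, or all individuals) matters when interpreting the result.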

