PEDSTATS Tutorial -- Selection Strategies for Hardy-Weinberg Tests
PEDSTATS is able to test markers for Hardy-Weinberg equilibrium using one of three selection strategies (founders
only, all individuals, or a set of unrelated individuals selected by the program). In this section, we consider each of these strategies and describe
situations where each may be indicated.
Hardy-Weinberg testing using unrelated individuals
In the ideal case, tests of Hardy-Weinberg equilibrium should be performed using a sample of unrelated individuals. When this convention is followed,
the tested genotypes will be independent, and genotype and allele frequency estimates will be free from the bias that might be introduced if they were estimated
using correlated genotypes.
One possible strategy for choosing an unrelated set of individuals from family data is to
select all founders. Although this approach has the advantage of simplicity, when founder
genotypes are unavailable, the resulting test may have too few genotypes to ensure adequate power.
In this situation, a good alternative is to first identify a set of individuals with a high proportion of markers genotyped. This restricted set
can then be used to select a group of unrelated individuals. In addition to providing an independent set of genotypes, this approach will almost always yield a larger genotype sample
than a founders only strategy. It also has the advantage of
providing a single sample of individuals that can be used for testing across the entire marker set.
Although PEDSTATS also implements a founders only strategy, by default it will perform Hardy-Weinberg testing on your data using a strategy that focuses on
selecting a set of unrelated individuals that have as few missing genotypes as possible. The algorithm for this "selected unrelated"
strategy is as follows:
Selected unrelated strategy
For each family f in the pedigree:
1. Let I_f = { I_f1, I_f2, ..., I_fn } represent all individuals in family f, and let M be the number of markers in the data set.
2. Select a set G_f of "most genotyped" individuals from I_f.
   a. For each individual in I_f, calculate the proportion of markers that have been genotyped for that person:
      p_fi = N_fi / M
      where N_fi is the number of markers genotyped for I_fi.
   b. Calculate the family-wise maximum proportion of markers genotyped:
      P_max(f) = max { p_f1, p_f2, ..., p_fn }
   c. Let G_f represent the set of individuals in family f whose genotyped proportion is at least 90% of the family-wise maximum:
      G_f = { I_fi : p_fi >= 0.9 * P_max(f) }
3. Select a set U_f of unrelated, genotyped individuals for family f by iteratively removing candidates from G_f.
   a. Select the individual I_fk in G_f with the fewest relatives and add them to the unrelated set U_f.
   b. Remove I_fk and all individuals related to I_fk from G_f.
   c. If G_f is empty, stop. Otherwise, go to step 3a.
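The selection steps above can be sketched in Python for a single family. This is an illustration only, not the PEDSTATS source code. The function name select_unrelated, the input dictionaries, and the alphabetical tie-breaking rule are all hypothetical. The sketch also assumes that "fewest relatives" is counted among the candidates still remaining, which the description above does not specify.

```python
def select_unrelated(genotyped_counts, relatives, n_markers, threshold=0.9):
    """Pick unrelated, well-genotyped individuals from one family.

    genotyped_counts: dict mapping person ID -> number of markers genotyped
    relatives: dict mapping person ID -> set of IDs related to that person
               (assumed symmetric; how relatedness is decided is left open)
    n_markers: total number of markers M in the data set
    threshold: fraction of the family-wise maximum needed to be a candidate
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    if not genotyped_counts:
        return []

    # Step 2a: proportion of markers genotyped for each person.
    proportions = {p: n / n_markers for p, n in genotyped_counts.items()}

    # Step 2b: family-wise maximum proportion.
    p_max = max(proportions.values())

    # Step 2c: candidates are within 90% of the family-wise maximum.
    candidates = {p for p, v in proportions.items() if v >= threshold * p_max}

    # Step 3: repeatedly take the candidate with the fewest relatives
    # (counted among remaining candidates; an assumption of this sketch),
    # then drop that person and everyone related to them.
    unrelated = []
    while candidates:
        pick = min(
            sorted(candidates),  # sorted IDs give a deterministic tie-break
            key=lambda p: len(relatives.get(p, set()) & candidates),
        )
        unrelated.append(pick)
        candidates -= relatives.get(pick, set()) | {pick}
    return unrelated


# Nuclear family: both parents and one child fully genotyped, one child sparse.
family_relatives = {
    "F": {"C1", "C2"},
    "M": {"C1", "C2"},
    "C1": {"F", "M", "C2"},
    "C2": {"F", "M", "C1"},
}
selected = select_unrelated({"F": 10, "M": 10, "C1": 10, "C2": 2},
                            family_relatives, n_markers=10)
print(selected)
```

In this example the two parents are selected, because each has only one relative among the well-genotyped candidates. If neither parent were genotyped, the same procedure would fall back to a single child.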
Founders only strategy
The Hardy-Weinberg test just described selects a single sample of unrelated individuals for testing of all markers. Although this test will outperform
a founders only test under most circumstances, if only a small proportion of selected individuals are genotyped for
a marker with available founder genotypes, you might find a founders only test to be useful. You can try this out by typing the command:
pedstats -p asp.ped -d asp.dat --checkFounders
to only run the check using data from founders.
For the asp.ped data set, Hardy-Weinberg tests using just founder individuals cannot be done due to a low number of founder genotypes. When this is
the case, PEDSTATS will indicate which markers were not screened.
HARDY-WEINBERG CHECK AMONG FOUNDERS
===================================
The following markers were not tested for Hardy-Weinberg due to a low
number of genotyped individuals:
MRK1 MRK2 MRK3 MRK4 MRK5 MRK6 MRK7 MRK8 MRK9 MRK10 MRK11 MRK12 MRK13 MRK14
MRK15 MRK16 MRK17 MRK18 MRK19 MRK20
Hardy-Weinberg testing using all individuals
When both the unrelated and founders only samples fail to yield enough genotypes for reliable testing of a marker, another alternative is to
test for Hardy-Weinberg using all individuals in your data set. Although the test as implemented ignores family structure, both the chi-squared and exact tests
are reasonably robust to violations of independence. This allows testing to be performed on markers for which only a
small number of independent genotypes are available.
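For readers unfamiliar with the chi-squared version of the test, the following Python sketch shows the standard textbook goodness-of-fit calculation for a marker with two alleles. It is not the PEDSTATS implementation: the function name hwe_chi_squared is hypothetical, the sketch handles only biallelic markers, and, like the all-individuals strategy, it treats every genotype as independent.

```python
import math


def hwe_chi_squared(n_aa, n_ab, n_bb):
    """Chi-squared Hardy-Weinberg test for a biallelic marker.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes available")

    # Allele frequencies estimated by counting alleles.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; test is undefined")

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


stat, p_value = hwe_chi_squared(25, 50, 25)
print(stat, p_value)
```

When expected counts are small, as is likely for the sparsely genotyped markers discussed here, the chi-squared approximation becomes unreliable and an exact test is generally the safer choice.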