PEDSTATS Tutorial -- Fast Exact Hardy Weinberg Test for SNPs
The exact Hardy-Weinberg test implemented in PEDSTATS is based on an exact calculation of
the probability of observing $H$ heterozygotes conditional on the number
of copies of the minor SNP allele.
In general, if we have $N$ total genotypes, $R$ copies of the minor allele, and therefore $C = 2N - R$
copies of the common allele, there will be $(2N)!/(C!\,R!)$ possible arrangements for the alleles in the sample. If we have
$H$ heterozygotes, the number of homozygous carriers of the minor allele can be written as

$$R_H = \frac{R - H}{2}$$

while the number of homozygous carriers of the common allele is given by

$$C_H = \frac{2N - R - H}{2}.$$
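Of the $(2N)!/(C!\,R!)$ possible allele arrangements,
$2^H N!/(H!\,R_H!\,C_H!)$ contain exactly $H$ heterozygotes: the multinomial coefficient
$N!/(H!\,R_H!\,C_H!)$ counts the ways to assign the three genotype classes to the $N$ individuals,
and each heterozygote can carry its two alleles in either order, which contributes the factor $2^H$.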
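The count above can be checked by brute force for small samples. The sketch below is only an illustration, not part of PEDSTATS: it treats the sample as 2N allele slots, where slots 2i and 2i+1 form genotype i, enumerates every placement of the R minor alleles, and compares the tally with the closed-form count. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, factorial


def count_by_enumeration(n, r):
    """Tally allele arrangements by heterozygote count (hypothetical helper).

    Enumerates every way to place r minor alleles into 2n slots, so it is
    only practical for very small n.
    """
    counts = Counter()
    for minor_slots in combinations(range(2 * n), r):
        chosen = set(minor_slots)
        # Genotype i is heterozygous when exactly one of its two slots is minor.
        hets = sum((2 * i in chosen) != (2 * i + 1 in chosen) for i in range(n))
        counts[hets] += 1
    return dict(counts)


def count_by_formula(n, r, h):
    """Closed-form count 2^H N! / (H! R_H! C_H!); h must have the parity of r."""
    r_h = (r - h) // 2          # homozygous carriers of the minor allele
    c_h = (2 * n - r - h) // 2  # homozygous carriers of the common allele
    return 2 ** h * factorial(n) // (factorial(h) * factorial(r_h) * factorial(c_h))


def counts_agree(n, r):
    """True when enumeration and formula match and the total is (2N)!/(C! R!)."""
    counts = count_by_enumeration(n, r)
    return (sum(counts.values()) == comb(2 * n, r)
            and all(count_by_formula(n, r, h) == k for h, k in counts.items()))
```

For N = 3 and R = 2, for example, there are 15 arrangements in total: 3 with no heterozygotes and 12 with two.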
Therefore, the probability of observing $H$ heterozygotes, conditional on $R$ copies of the minor allele and
$N$ total genotypes, is given by

$$P(H \mid N, R) = \frac{2^H\, N!}{H!\, R_H!\, C_H!} \times \frac{R!\, C!}{(2N)!}$$
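As an illustration, the probability can be evaluated directly from the factorials. This is a minimal sketch rather than the PEDSTATS implementation; the function name `het_probability` is hypothetical, and exact rational arithmetic is an assumption made here to avoid rounding, at the cost of speed for large samples.

```python
from fractions import Fraction
from math import factorial


def het_probability(h, n, r):
    """P(H | N, R) evaluated from the closed form (hypothetical helper).

    Returns 0 for heterozygote counts that cannot occur: negative values,
    values larger than either allele count, or values whose parity differs
    from that of r.
    """
    c = 2 * n - r  # copies of the common allele
    if h < 0 or h > min(r, c) or (r - h) % 2 != 0:
        return Fraction(0)
    r_h = (r - h) // 2  # homozygous carriers of the minor allele
    c_h = (c - h) // 2  # homozygous carriers of the common allele
    with_h_hets = 2 ** h * factorial(n) // (factorial(h) * factorial(r_h) * factorial(c_h))
    all_arrangements = factorial(2 * n) // (factorial(r) * factorial(c))
    return Fraction(with_h_hets, all_arrangements)
```

With N = 3 genotypes and R = 2 minor alleles, `het_probability(2, 3, 2)` evaluates to 4/5 and `het_probability(0, 3, 2)` to 1/5, so the two feasible outcomes sum to 1.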
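PEDSTATS uses a fast algorithm to evaluate this probability for all possible values of $H$ in
order to recreate the exact probability distribution. Once this distribution is known, a p-value for
the observed number of heterozygotes, $H_{obs}$, is calculated as:

$$p = \sum_{H^*} I\left[P(H_{obs} \mid N, R) \ge P(H^* \mid N, R)\right] \times P(H^* \mid N, R)$$

Here, the sum runs over every possible heterozygote count $H^*$, and $I[\,]$ is an indicator function that returns 1 when the comparison is true and 0 otherwise.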
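The tutorial does not show the fast algorithm itself, so the following is only a sketch of one way to build the whole distribution cheaply. It assumes the ratio P(H+2 | N, R) / P(H | N, R) = 4 R_H C_H / ((H+2)(H+1)), which follows from the probability formula by cancelling factorials, and it normalizes the resulting weights so they sum to 1. The p-value is then the total probability of all heterozygote counts that are no more likely than the observed one. The function names are hypothetical, and exact fractions are used here so that ties in the comparison are handled exactly.

```python
from fractions import Fraction


def het_distribution(n, r):
    """Return {h: P(H = h | N, R)} for every feasible h (hypothetical helper)."""
    if not 0 <= r <= 2 * n:
        raise ValueError("r must lie between 0 and 2n")
    c = 2 * n - r
    h = r % 2                      # smallest feasible count shares the parity of r
    r_h, c_h = (r - h) // 2, (c - h) // 2
    weights = {h: Fraction(1)}     # unnormalized weight, relative to the start
    while h + 2 <= min(r, c):
        # Step from h to h + 2 heterozygotes using the ratio of probabilities.
        weights[h + 2] = weights[h] * Fraction(4 * r_h * c_h, (h + 2) * (h + 1))
        h, r_h, c_h = h + 2, r_h - 1, c_h - 1
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def hwe_exact_pvalue(obs_hets, n, r):
    """Sum the probabilities of all outcomes no more likely than obs_hets."""
    dist = het_distribution(n, r)
    if obs_hets not in dist:
        raise ValueError("observed heterozygote count is not feasible")
    p_obs = dist[obs_hets]
    return sum(p for p in dist.values() if p <= p_obs)
```

For N = 3 and R = 2, observing no heterozygotes gives `hwe_exact_pvalue(0, 3, 2)` equal to 1/5, while observing two heterozygotes gives a p-value of 1. A production version would more likely use floating point and start the recurrence near the most probable count to limit rounding error; that choice is not made here.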