Variance Components Models of Association and Permutations
for Exact p-values
Have a look at the second set of example files, sibs.dat,
sibs.ped and sibs.ibd. These describe 50 nuclear
families with four offspring each, but no parental phenotypes
or genotypes. The pedigree file may look familiar, as it is, in
fact, a pre-makeped LINKAGE format file. The file of IBD probabilities
is generated automatically by prelude and finale,
as described in the next section.
The data include 3 markers, 1 quantitative phenotype and 1
covariate. Missing genotypes are encoded as zeros, and missing
phenotypes are encoded as -99.999. If you wish, run pedstats
-d sibs.dat -p sibs.ped -x-99.999 to get a more detailed
description of these data (do not place a space between -x
and -99.999).
File: sibs.dat
M SNP_1
M SNP_2
M SNP_3
T Trait
C Covariate

File: sibs.ped
1 1 0 0 1 0 0 0 0 0 0 -99.999 -99.999
1 2 0 0 2 0 0 0 0 0 0 -99.999 -99.999
1 3 1 2 1 1 1 1 1 2 2 80.690 81.722
1 4 1 2 1 1 2 1 2 2 2 80.955 90.420
1 5 1 2 1 1 2 1 2 2 2 73.202 90.747
1 6 1 2 2 1 1 1 1 2 2 90.030 99.436
(... additional families follow ...)
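To make the layout of the two files concrete, here is a minimal Python sketch that reads the data file to work out how many columns each item occupies and then splits each pedigree line accordingly. It is not part of qtdt or pedstats; the function names and the dictionary layout are hypothetical, and it assumes five leading columns (family, person, father, mother, sex), two allele columns per marker, and one column per trait or covariate, as in the listing above.

```python
MISSING_PHENOTYPE = -99.999  # missing value code used in sibs.ped

DAT_TEXT = """M SNP_1
M SNP_2
M SNP_3
T Trait
C Covariate"""

PED_TEXT = """1 1 0 0 1 0 0 0 0 0 0 -99.999 -99.999
1 2 0 0 2 0 0 0 0 0 0 -99.999 -99.999
1 3 1 2 1 1 1 1 1 2 2 80.690 81.722
1 4 1 2 1 1 2 1 2 2 2 80.955 90.420
1 5 1 2 1 1 2 1 2 2 2 73.202 90.747
1 6 1 2 2 1 1 1 1 2 2 90.030 99.436"""


def parse_dat(text):
    """Return a list of (type code, name) pairs, one per data file line."""
    return [tuple(line.split(None, 1)) for line in text.splitlines() if line.strip()]


def parse_ped(text, columns):
    """Split each pedigree line into a dict, mapping missing codes to None."""
    people = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        person = {
            "family": tokens[0], "id": tokens[1],
            "father": tokens[2], "mother": tokens[3], "sex": tokens[4],
        }
        pos = 5
        for kind, name in columns:
            if kind == "M":
                # Markers take two columns; an allele of 0 means missing.
                alleles = (tokens[pos], tokens[pos + 1])
                person[name] = None if "0" in alleles else alleles
                pos += 2
            else:
                # Traits (T) and covariates (C) take one column each.
                value = float(tokens[pos])
                person[name] = None if value == MISSING_PHENOTYPE else value
                pos += 1
        people.append(person)
    return people


people = parse_ped(PED_TEXT, parse_dat(DAT_TEXT))
phenotyped = [p for p in people if p["Trait"] is not None]
print(len(people), "individuals,", len(phenotyped), "with a trait value")
```

In this first family the two parents carry only missing codes, so only the four offspring contribute phenotypes and genotypes.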
Simple linear models do not provide valid tests of linkage
disequilibrium when multiple offspring per family are considered.
qtdt uses variance components to model the phenotypic similarities
that are common in family data. Variance components are specified
using the -w option. A typical model for the variances
might include environmental (e), polygenic (g) and
additive major-gene (a) components of variance.
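As a rough illustration of what such a model implies, the sketch below builds the expected phenotypic covariance matrix for a single sibship under the standard variance components formulation. This is an assumption about the general approach rather than a description of qtdt's internals: it takes the (a) component to be weighted by the proportion of alleles shared IBD at the marker, and the function name and the IBD values in the example are hypothetical.

```python
def sibship_covariance(ve, vg, va, pi_hat):
    """Expected covariance matrix for the offspring of one nuclear family.

    ve, vg, va : environmental, polygenic and additive (IBD-weighted) variances
    pi_hat     : square matrix; pi_hat[i][j] is the estimated proportion of
                 alleles sibs i and j share IBD at the marker (0 to 1)
    """
    n = len(pi_hat)
    kinship_full_sibs = 0.25  # kinship coefficient between full sibs
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Each person's total variance is the sum of all components.
                omega[i][j] = ve + vg + va
            else:
                # Sibs share half the polygenic variance on average, and a
                # marker-specific share of the additive component.
                omega[i][j] = 2 * kinship_full_sibs * vg + pi_hat[i][j] * va
    return omega


# Hypothetical IBD sharing for four sibs (values are illustrative only).
example_pi = [
    [1.0, 0.5, 1.0, 0.0],
    [0.5, 1.0, 0.5, 0.5],
    [1.0, 0.5, 1.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
]
for row in sibship_covariance(ve=1.0, vg=2.0, va=4.0, pi_hat=example_pi):
    print(row)
```

Pairs that share more alleles IBD get a larger expected covariance, which is why an IBD file is supplied alongside the pedigree and data files.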
Run the following command: qtdt -d sibs.dat -p sibs.ped -i
sibs.ibd -x-99.999 -wega. (The command-line parameters specify
the input file names, the missing value code and the model for
the variances.) After the usual copyright notice, reference list,
and summary of command-line parameters, you should see this model
description:
The following models will be evaluated...
NULL MODEL
Means = Mu + Covariate + B
Variances = Ve + Vg + Va
FULL MODEL
Means = Mu + Covariate + B + W
Variances = Ve + Vg + Va
The model description now includes not only a linear model
for the means (with the covariate defined in the pedigree file)
but also a model for the variances. Means and variances are fitted
by maximum likelihood using a numeric minimizer (a different minimizer
can be selected with the -n command-line option). The results section
looks similar to the one in the previous section:
Testing trait: Trait
=============================================
Testing marker: SNP_1
---------------------------------------------
Allele df(0) LnLk(0) df(T) LnLk(T) ChiSq p
1 : 194 681.22 193 674.58 13.28 0.0003 ( 164/200 probands)
2 : 194 681.22 193 674.58 13.28 0.0003 ( 164/200 probands)
Testing marker: SNP_2
---------------------------------------------
Allele df(0) LnLk(0) df(T) LnLk(T) ChiSq p
1 : 194 683.68 193 678.30 10.76 0.0010 ( 152/200 probands)
2 : 194 683.68 193 678.30 10.76 0.0010 ( 152/200 probands)
Testing marker: SNP_3
---------------------------------------------
Allele df(0) LnLk(0) df(T) LnLk(T) ChiSq p
1 : 194 684.99 193 682.61 4.76 0.0291 ( 168/200 probands)
2 : 194 684.99 193 682.61 4.76 0.0291 ( 168/200 probands)
SNP_1 appears to provide the strongest evidence for association.
To find out what happens if you don't include the covariate in
the analysis, run qtdt -d sibs.dat -p sibs.ped -i sibs.ibd
-x-99.999 -wega -c-.
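The ChiSq and p columns in the results table can be reproduced from the two likelihood columns. The Python sketch below assumes that LnLk(0) and LnLk(T) are minus log-likelihoods for the null and full models (consistent with the values falling when W is added) and that the test has one degree of freedom, as the difference between df(0) and df(T) suggests. The function name is hypothetical.

```python
import math


def likelihood_ratio_test(neg_lnlk_null, neg_lnlk_full):
    """Return (chi-square, p-value) for a one-degree-of-freedom LRT.

    Both arguments are assumed to be minus log-likelihoods.
    """
    chisq = 2.0 * (neg_lnlk_null - neg_lnlk_full)
    # For 1 df, the chi-square tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(chisq, 0.0) / 2.0))
    return chisq, p_value


# Likelihood values copied from the results table above.
results = {
    "SNP_1": (681.22, 674.58),
    "SNP_2": (683.68, 678.30),
    "SNP_3": (684.99, 682.61),
}
for marker, (null, full) in results.items():
    chisq, p_value = likelihood_ratio_test(null, full)
    print(f"{marker}: ChiSq = {chisq:.2f}, p = {p_value:.4f}")
```

Because the two alleles of a biallelic marker carry the same information, both allele rows for each SNP yield the same statistic.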
Variance components models can be sensitive to the phenotypic
distribution, especially in small or selected samples. qtdt
can calculate empirical p-values using a Monte-Carlo permutation
framework. These permutations condition on the trait distribution,
linkage and familiality, and provide a test for linkage disequilibrium.
This can be relatively slow, but provides added confidence in
your result.
To try some permutations, run qtdt -d sibs.dat -p sibs.ped
-i sibs.ibd -x-99.999 -wega -m1000 -1 and go have a
break!
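For readers unfamiliar with Monte-Carlo p-values, the following Python sketch shows the general idea: compare the observed statistic with statistics from randomly permuted data sets and report the fraction that are at least as extreme. This is a generic illustration, not qtdt's actual permutation scheme. The sign-flipping of within-family deviations, the add-one correction, the function names and the toy numbers are all assumptions made for the example.

```python
import random


def empirical_p_value(observed, null_stats):
    """Fraction of null statistics at least as large as the observed one.

    The +1 in numerator and denominator counts the observed data set as one
    of the permutations, so the estimate is never exactly zero.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def sign_flip_null(deviations, n_perm, rng):
    """Null distribution of |sum| obtained by randomly flipping signs."""
    stats = []
    for _ in range(n_perm):
        total = sum(d if rng.random() < 0.5 else -d for d in deviations)
        stats.append(abs(total))
    return stats


# Toy within-family deviations (illustrative numbers, not from sibs.ped).
toy_deviations = [0.8, 1.1, -0.2, 0.9, 0.4, 1.3, -0.1, 0.7]
observed_stat = abs(sum(toy_deviations))

rng = random.Random(12345)  # fixed seed so that the run is repeatable
null = sign_flip_null(toy_deviations, n_perm=1000, rng=rng)
print("empirical p-value:", empirical_p_value(observed_stat, null))
```

With 1000 permutations the smallest p-value this estimator can report is 1/1001, so more permutations are needed to resolve very small p-values.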
To find out how the IBD probabilities were estimated using
simwalk2, proceed to the next section.