Quantitative trait linkage analyses examine whether a chromosomal region is responsible for some of the variation in a trait of interest. Here, we will describe how fast quantitative trait regression analyses can be carried out using MERLIN.
For this example, we will use a simulated data set that you will find in the examples subdirectory of the MERLIN distribution or in the download page.
The dataset consists of a simulated 5-cM scan of chromosome 24 in 200 sib-pair families and is organized into 3 files, a data file (asp.dat), a pedigree file (asp.ped) and a map file (asp.map). A quantitative trait has been scored for each offspring.
The recommended first step in any analysis is to verify that input files are being interpreted correctly. So let's start by running pedstats... Pedstats requires an input data file (-d parameter) and pedigree file (-p parameter):
prompt> pedstats -d asp.dat -p asp.ped
By examining the abbreviated pedstats output below, you should be able to confirm that there are 200 pedigrees, each with 4 individuals (two siblings and their parents). The pedigree includes a quantitative trait that has been measured on all 400 offspring but none of the founders.
Pedigree Statistics (c) 1999-2001 Goncalo Abecasis The following parameters are in effect: QTDT Pedigree File : asp.ped (-pname) QTDT Data File : asp.dat (-dname) Missing Value Code : -99.999 (-xname) PEDIGREE STRUCTURE ================== Individuals: 800 (400 founders, 400 nonfounders) Families: 200 Average Family Sizes: 4.00 Average Generations: 2.00 QUANTITATIVE TRAIT STATISTICS ============================= [Phenotypes] [Founders] Mean Var trait 400 50.0% 0 0.0% 0.021 1.496 AFFECTION STATISTICS ==================== [Diagnostics] [Founders] Prevalence affection 400 50.0% 0 0.0% 100.0% Total 400 50.0% 0 0.0% MARKER GENOTYPE STATISTICS ========================== [Genotypes] [Founders] Hetero MRK1 400 50.0% 0 0.0% 72.8% MRK2 400 50.0% 0 0.0% 73.2% (...statistics for other markers would appear here...) Total 8000 50.0% 0 0.0% 74.1%
The most popular method of quantitative trait linkage is the Haseman-Elston (1972) procedure where squared trait differences for sib-pairs are regressed on IBD allele-sharing. If a gene in the region being investigate influences trait levels, sib-pairs who share more alleles are expected to show similar phenotypes and, therefore, smaller squared trait differences.
The flexibility of the method of Haseman and Elston has lead many authors to propose enhancements and extensions. Sham et al. (2002) have recently described a regression-based procedure for linkage analysis that uses trait-squared sums and differences to predict IBD sharing between any non-inbred relative pairs. This method is implemented in the MERLIN-REGRESS program, included in the merlin distribution. The method of Sham et al. can be applied to selected samples but requires specification of the trait distribution parameters in the general population.
To run MERLIN-REGRESS, we will need to specify the input data(-d parameter), pedigree (-p parameter) and map (-m parameter) file names. In addition, we will need to specify the trait distribution parameters (--mean, --variance and --heritability options). In this case, we will assume that the trait of interest has mean=0.0, variance=1.5 and heritability=80% in the general population:
prompt> merlin-regress -d asp.dat -p asp.ped -m asp.map --mean 0.0 --var 1.5 --her 0.8
After running the command, you should first see the familiar MERLIN banner and a summary of currently selected options:
MERLIN 0.9.1 - (c) 2000-2002 Goncalo Abecasis The following parameters are in effect: Data File : asp.dat (-dname) Pedigree File : asp.ped (-pname) Missing Value Code : -99.999 (-xname) Map File : asp.map (-mname) Allele Frequencies : ALL INDIVIDUALS (-f[a|e|f|file]) Random Seed : 123456 (-r9999) Regression Analysis Options Trait Model : --mean [0.00], --variance [1.50], --heritability [0.80] Recombination : --zero, --one, --two, --three, --singlepoint Positions : --steps, --maxStep, --minStep, --grid, --start, --stop Limits : --bits [24], --megabytes, --minutes Output : --quiet, --markerNames Others : --simulate, --swap, --rankFamilies Estimating allele frequencies... [using all genotypes] MRK1 MRK2 MRK3 MRK4 MRK5 MRK6 MRK7 MRK8 MRK9 MRK10 MRK11 MRK12 MRK13 MRK14 MRK15 MRK16 MRK17 MRK18 MRK19 MRK20
After a few moments, you should see analysis results at each location:
Pedigree-Wide Regression Analysis (Trait: trait) ====================================================== Position H2 Stdev Info LOD pvalue 0.000 0.406 0.192 64.8% 0.970 0.02 5.268 0.526 0.183 71.1% 1.792 0.002 10.536 0.598 0.182 72.1% 2.343 0.0005 15.804 0.733 0.182 72.1% 3.520 0.00003 21.072 0.586 0.182 72.2% 2.255 0.0006 26.340 0.596 0.190 66.0% 2.135 0.0009 31.608 0.535 0.189 67.0% 1.744 0.002 36.876 0.522 0.184 70.6% 1.752 0.002 42.144 0.414 0.181 73.0% 1.137 0.011 47.412 0.295 0.175 77.5% 0.614 0.05 (... results continue at other locations ...)
Successive columns indicate position along the chromosome (in CM), estimated locus specific heritability, standard deviation for the estimate of locus specific heritability, proportion of linkage information extracted at this location (100% information corresponds to the smallest possible confidence interval for estimated effect size), LOD score and corresponding p-value. In this case, linkage peaks at position 15.8 with an estimated locus specific heritability of 73.3% and a LOD score of 3.52 (probability 0.00003).
Another useful option in MERLIN-REGRESS is the ability to quantify the expected amount of linkage information in each family. This can be useful when focusing genotyping efforts (for example, by genotyping the most informative families first) or identifying problematic outliers (extreme outliers will lead to some families with very large weights which can reduce effective sample size in linkage analyses).
To estimate family informativeness, specify the trait distribution in the population (by specifying it's mean, variance and heritability) and use the --rankFamilies option. Using the example input files the command line would read:
prompt> merlin-regress -d asp.dat -p asp.ped --mean 0 --var 1.5 --her 0.8 --rank
Running this command would produce the familiar MERLIN output screen followed by a table looking like the one below:
Family Informativeness ====================== Family Trait People Phenos Pairs Info ELOD20 1 trait 4 2 1 0.099 0.001 2 trait 4 2 1 0.025 0.000 3 trait 4 2 1 1.989 0.017 4 trait 4 2 1 0.269 0.002 5 trait 4 2 1 0.327 0.003 (... additional rows follow for other families)
Each row indicates the family and trait of interest, followed by number of individuals and phenotypes in each family, the number of phenotyped relative pairs and the relative informativeness of the family. The final column indicates the expected LOD score for a region with a locus specific heritability of 20% when a fully informative marker is typed. In this case family 3 seems particularly informative (you can try and find out why by examining the phenotypes for each individual in the asp.ped pedigree file).
Expected LOD scores are proportional to the squared locus specific heritability. To calculate expected LOD scores for a different effect size, simply multiply the expected LOD score by (heritability/20)^2, where H2 denotes your desired effect size and ^2 denotes the square operator. For example, for an effect size of 40%, you should multiply each expected LOD score by 4.
Often multiple quantitative traits may be available in a particular dataset. Each of these traits is likely to have a distinct mean, variance and heritability in the population. The -t models_file specifies the name of a text file listing analysis models, one for each trait. Using a models table allows distinct models to be specified for each phenotype in the pedigree file.
A models table includes four columns. The first column indicates the trait name and is followed by columns indicating the trait mean, variance and heritability. Optionally, a fifth column can be included with a label for each model. Here is an example:
<sample regression models file> TRAIT MEAN VARIANCE HERITABILITY LABEL Weight_Kilograms 75 10 0.63 metric_analysis Weight_Pounds 160 40 0.63 imperial_analysis <end of sample regression models file>
Now that you know how to carry out a pedigree-wide regression analysis using MERLIN you might want to find out estimate empirical p-values using simulation, or perhaps explore the sections on error detection, linkage analysis, haplotyping or ibd estimation.