University of Michigan Center for Statistical 


How to apply MERLIN to your repeated measure data

To analyze repeated measure data in MERLIN using the variance component models described in Liang et al (2009), you simply need to use the --vc option and format your input files as described below, so that MERLIN recognizes the presence of repeated measure data. In the special case where all subjects have been measured the same number of times and there are no missing measurements, you will get correct linkage statistics by using averaged measurements as input and carrying out a standard variance component analysis. For other situations, two models are available. The "full" model, which directly examines each measurement for every subject, should be more powerful but is also more computationally intensive. The "average" model, which examines averaged measurements and weights each average according to the number of measurements it is based on, is a little less powerful but computationally more efficient.
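To decide whether the special case applies to your data, you can check that every subject has the same number of measurements and that none are missing. The Python sketch below shows one way to do this. The function name is_balanced and the dictionary layout (subject label mapped to a list of values, with None marking a missing value) are assumptions made for illustration and are not part of MERLIN.

```python
def is_balanced(measurements):
    """Return True if every subject has the same number of measurements
    and none are missing (None marks a missing value; an assumed convention)."""
    counts = set()
    for values in measurements.values():
        if any(v is None for v in values):
            return False
        counts.add(len(values))
    # At most one distinct count means all subjects were measured equally often.
    return len(counts) <= 1


balanced = {"1": [3.2, 4.2], "2": [2.9, 3.8]}
unbalanced = {"1": [3.2, 4.2], "2": [2.9, None]}
print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

If the check fails, simple averaging followed by a standard analysis is not sufficient, and one of the two models described here should be used.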

You may find it useful to first learn about MERLIN input formats in general.

Full model: to model all individual measures for each subject

To use the full model, each repeated measure should be recorded as a trait (T) in the .dat and .ped files. If a trait has been measured up to, say, 4 times, there will be 4 columns in the pedigree file with the corresponding data. If a particular measurement is missing for an individual, simply enter an 'X' in the corresponding column of that individual's row. In the .dat file, you'll need to use a special naming convention for the repeated measures so that MERLIN can recognize them as such. Specifically, the names of repeated measures should end with _measurement1 (for the first measurement), _measurement2 (for the second measurement), and so on.

An example follows...

<example contents of the .dat file for a maximum of 4 measurements of the LDL trait >
T   LDL_measurement1
T   LDL_measurement2
T   LDL_measurement3
T   LDL_measurement4
<example content of the .ped file>
    1  1  0  0  1  3.2  4.2  5.4  6.3 
    1  2  0  0  2  2.9  3.8  X    X
    1  3  1  2  1  2.5  7.6  8.8  X
    1  4  1  2  2  6.6  3.6  X    1.8
In this example, individual 1 has 4 trait measurements, individual 2 has only 2 measurements, and individuals 3 and 4 have 3 measurements each.
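If your measurements are stored per subject and per occasion, files in the layout above can be generated with a short script. The following Python sketch is one possible way to do it. The function name full_model_files and the layout of its arguments are assumptions made for illustration and are not part of MERLIN; only the _measurementN naming convention and the 'X' missing code come from the description above.

```python
def full_model_files(trait, records, pedigree, max_repeats):
    """Build .dat and .ped text for the full model.

    records:  {(family, person): {occasion: value}}, occasions numbered from 1
    pedigree: list of (family, person, father, mother, sex) tuples
    (both layouts are assumptions of this sketch)
    """
    occasions = range(1, max_repeats + 1)
    # One trait (T) line per measurement occasion, named <trait>_measurement<k>.
    dat_lines = [f"T   {trait}_measurement{k}" for k in occasions]
    ped_lines = []
    for fam, person, father, mother, sex in pedigree:
        values = records.get((fam, person), {})
        # Occasions without a recorded value are written as 'X'.
        cells = [str(values.get(k, "X")) for k in occasions]
        ids = [str(fam), str(person), str(father), str(mother), str(sex)]
        ped_lines.append("  ".join(ids + cells))
    return "\n".join(dat_lines), "\n".join(ped_lines)


pedigree = [(1, 1, 0, 0, 1), (1, 2, 0, 0, 2), (1, 3, 1, 2, 1), (1, 4, 1, 2, 2)]
records = {
    (1, 1): {1: 3.2, 2: 4.2, 3: 5.4, 4: 6.3},
    (1, 2): {1: 2.9, 2: 3.8},
    (1, 3): {1: 2.5, 2: 7.6, 3: 8.8},
    (1, 4): {1: 6.6, 2: 3.6, 4: 1.8},
}
dat_text, ped_text = full_model_files("LDL", records, pedigree, max_repeats=4)
print(dat_text)
print(ped_text)
```

The sample input reproduces the LDL example, including the 'X' entries for individuals 2, 3 and 4.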

Average model: to model average measurements for each subject

To use this computationally efficient model, the average of all repeated measures for each subject should be recorded as a trait (T) in the data and pedigree files. The number of measurements taken for each subject should also be recorded as a specially named covariate (C). The name of this covariate should start with the same label used for the average measurement trait and end with the string '_repeats'.

<example content of the .dat file>
T   LDL   
C   LDL_repeats

<example contents of the .ped file>
    1  1  0  0  1  4.775   4 
    1  2  0  0  2  3.350   2 
    1  3  1  2  1  6.300   3
    1  4  1  2  2  4.000   3
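
If your data are already laid out for the full model, the averages and repeat counts can be derived from those rows. The Python sketch below is one way to do this. The function name to_average_row, the assumption of five identifier columns before the trait columns, and the choice to reject subjects with no measurements at all are illustrative and are not part of MERLIN.

```python
def to_average_row(row, n_id_cols=5, missing="X"):
    """Convert one full-model pedigree row into an average-model row.

    Returns the identifier fields followed by the mean of the observed
    measurements (3 decimals) and the number of observed measurements.
    """
    fields = row.split()
    ids, values = fields[:n_id_cols], fields[n_id_cols:]
    observed = [float(v) for v in values if v.upper() != missing]
    if not observed:
        # Subjects without any measurement are outside the scope of this sketch.
        raise ValueError("no observed measurements in row: " + row)
    mean = sum(observed) / len(observed)
    return ids + [f"{mean:.3f}", str(len(observed))]


full_rows = [
    "1  1  0  0  1  3.2  4.2  5.4  6.3",
    "1  2  0  0  2  2.9  3.8  X    X",
    "1  3  1  2  1  2.5  7.6  8.8  X",
    "1  4  1  2  2  6.6  3.6  X    1.8",
]
average_rows = [to_average_row(r) for r in full_rows]
for fields in average_rows:
    print("  ".join(fields))
```

Applied to the full-model example, this gives the averages 4.775, 3.350, 6.300 and 4.000 with 4, 2, 3 and 3 repeats, matching the pedigree file above.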

A toy example of analysis:

To fit the variance component model, you need to specify the input data (-d parameter), pedigree (-p parameter) and map (-m parameter) files, together with the --vc option. Other related options include --start (start position for the analysis), --stop (stop position for the analysis) and --grid (the distance between grid points, in cM; the LOD score at each grid point will be output).

prompt> merlin -d LDL.dat -p LDL.ped -m --vc

After running the command, you should see the MERLIN banner and a summary of currently selected options:

MERLIN LOCAL - (c) 2000-2007 Goncalo Abecasis

References for this version of Merlin:

Abecasis et al (2002) Nat Gen 30:97-101 [original citation]
Fingerlin et al (2004) AJHG 74:432-43 [case selection for association studies]
Abecasis and Wigginton (2005) AJHG 77:754-67 [ld modeling, parametric analyses]
Fingerlin et al (2006) Gen Epidemiol 30:384-96 [sex-specific maps]
Chen and Abecasis (2007) AJHG 81:913-26 [qtl association analysis, qtl simulation]

The following parameters are in effect:
Data File : LDL.dat (-dname)
Pedigree File : LDL.ped (-pname)
Missing Value Code : -99.999 (-xname)
Map File : (-mname)
Allele Frequencies : ALL INDIVIDUALS (-f[a|e|f|m|file])
Random Seed : 123456 (-r9999)

Data Analysis Options
General : --information, --likelihood, --model [param.tbl]
Errors : --flag, --perAllele [0.00], --perGenotype [0.00], --fit
IBD States : --ibd, --kinship, --matrices, --extended, --select
NPL Linkage : --npl, --pairs, --qtl, --deviates, --extras, --exp
VC Linkage : --vc [ON], --useCovariates, --ascertainment, --unlinked [0.00]
Association : --infer, --assoc, --fastAssoc, --filter, --custom [cov.tbl]
Haplotyping : --best, --sample, --all, --founders, --horizontal
Recombination : --zero, --one, --two, --three, --singlepoint
Positions : --steps, --maxStep, --minStep, --grid [], --start [], --stop []
LD Clusters : --clusters [], --distance, --rsq, --cfreq
Limits : --bits [24], --megabytes, --minutes
Performance : --trim, --noCoupleBits, --swap, --smallSwap, --cache []
Output : --quiet, --markerNames, --frequencies, --perFamily, --pdf, --tabulate, --prefix [merlin]
Simulation : --simulate, --reruns, --save, --trait []

Estimating allele frequencies... [using all genotypes]

After a few moments, you should see analysis results at each location:

Information on repeated measurements found
An average of 5.5 measurements per subject were taken
A measurement error component will be fitted

Phenotype: simp_phen [VC] (1000 families, h2 = 37.96%, error = 40.42%)
Position H2 ChiSq LOD pvalue
0.000 9.88% 15.35 3.33 0.00004
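
To work with these results in a script, each result row can be split into its columns, and the LOD score can be checked against the chi-square statistic using LOD = ChiSq / (2 ln 10). The Python sketch below does both. The function names are hypothetical, and the p-value check assumes a one-sided test in which the p-value is half the upper tail of a chi-square distribution with 1 degree of freedom; that assumption agrees with the row shown above but is not stated in MERLIN's output.

```python
import math


def parse_vc_row(line):
    """Split one 'Position H2 ChiSq LOD pvalue' row into typed fields."""
    position, h2, chisq, lod, pvalue = line.split()
    return {
        "position": float(position),
        "h2": float(h2.rstrip("%")) / 100.0,  # percentage to proportion
        "chisq": float(chisq),
        "lod": float(lod),
        "pvalue": float(pvalue),
    }


def lod_from_chisq(chisq):
    """Convert a likelihood-ratio chi-square to a LOD score."""
    return chisq / (2.0 * math.log(10.0))


def one_sided_pvalue(chisq):
    """Half the upper tail of a 1-df chi-square (assumed test, see lead-in).

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))


row = parse_vc_row("0.000 9.88% 15.35 3.33 0.00004")
print(round(lod_from_chisq(row["chisq"]), 2))  # 3.33, as in the LOD column
print(one_sided_pvalue(row["chisq"]))
```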


University of Michigan | School of Public Health