MERLIN - Reference Sheet
The following is a summary of all available MERLIN command line options and their meanings:
Input Files and Basic Parameters
-d datafile
- Selects input data file, in linkage or QTDT format.
-p pedfile
- Selects pedigree file, with genotype, phenotype and family structure information
Newer versions of Merlin (>1.1) can combine multiple data and pedigree files on the fly.
To do this, list multiple data files separated by commas after the -d option, for example,
-d pheno.dat,geno.dat, and also list the corresponding pedigree files separated by commas
after the -p option, for example, -p pheno.ped,geno.ped.
-x missing_value_code
- Selects the missing value code for quantitative phenotypes and covariates in the pedigree
file. If possible, it is always safer to replace missing values with 'x', rather than use
this option.
-m mapfile
- File indicating chromosome and centimorgan position for each marker. Use with QTDT
format input files. Recombination fractions will be derived from marker
positions using the Haldane mapping function.
-f [a|e|f|m|file]
- Source for allele frequency information. Allele frequencies can be set in a
user specified file (-f filename), they can be estimated using maximum likelihood (-fm),
or they can be estimated by counting in founders (-ff) or in all individuals (-fa), or
assumed equal (-fe). For use with QTDT format input files.
-r seed
- Selects a different random sequence for simulation and sampling of haplotypes.
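The conversion described under -m can be illustrated with a short sketch. It applies the standard Haldane mapping function, theta = 0.5 * (1 - exp(-2d)) with d in Morgans, to the distance between neighbouring markers. The function name and the marker positions are hypothetical, and the sketch only illustrates the formula; it is not taken from MERLIN's source code.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Hypothetical marker positions (cM) along one chromosome.
positions_cm = [0.0, 5.0, 15.0, 40.0]

# One recombination fraction per interval between consecutive markers.
thetas = [haldane_theta(b - a) for a, b in zip(positions_cm, positions_cm[1:])]
for (a, b), theta in zip(zip(positions_cm, positions_cm[1:]), thetas):
    print(f"{a:5.1f} -> {b:5.1f} cM: theta = {theta:.4f}")
```

Under this function the recombination fraction approaches, but never exceeds, 0.5 as the distance grows.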
General Analyses
--error
- Find unlikely genotypes. Likely errors are listed in merlin.err file.
--information
- Calculate information based on entropy at each analysis position.
--likelihood
- Calculate likelihood of observed genotype data.
--model parametric_models.tbl
- Calculate parametric LOD scores, using the models specified in
parametric_models.tbl. For a detailed description of this
option, see the MERLIN parametric linkage analysis tutorial and the
MERLIN reference.
IBD State Calculations
--ibd
- Output pairwise IBD coefficients to merlin.ibd
--kinship
- Output pairwise kinship coefficients to merlin.kin
--matrices [see * note]
- Calculate possible pairwise IBD matrices and their probabilities for
each family. This information is stored in the file merlin.kmx
--extended [see * note]
- Output extended IBD state information
to merlin.s15. Extended IBD states track sharing of maternal and paternal
alleles separately and also provide additional information for inbred pedigrees.
--select
- Select most informative affected individuals
on the basis of allele sharing information, and record the results in the file merlin.sel.
If it is only practical to genotype a single individual per family in an association study,
genotyping these individuals can improve power
(Fingerlin et al, 2004).
Non-Parametric Linkage Analyses
--npl
- Use the Whittemore and Halpern NPL all statistic to test for
allele sharing among affected individuals. Also calculates a
LOD score using the Kong and Cox linear model.
--pairs
- Use the Whittemore and Halpern NPL pairs statistic to test for
allele sharing among affected individuals. Also calculates a
LOD score using the Kong and Cox linear model. Versions 0.10.1
and higher also consider sharing within inbred individuals when
computing this statistic.
--qtl
- Use a non-parametric statistic to test for sharing among
individuals with similar phenotypes. Use the sample mean
to estimate the population mean, and calculate a
LOD score using the Kong and Cox linear model.
Follow
this link for additional details on this option.
--deviates
- Similar to --qtl, but assumes that phenotypes are deviates
from the population mean. Follow
this link for additional
details on this option.
--exp
- Calculate non-parametric LOD scores using the Kong and Cox
exponential model. Although more time consuming, this option
can be powerful in datasets that show very strong linkage
signals or which include larger pedigrees.
--zscores
- Generate a compact file summarizing family-specific NPL scores
at each location. The file can be used for additional follow-up
analyses.
Variance Components Linkage Analysis
--vc
- Perform variance components linkage analysis assuming no
dominance. Also calculates sample heritability for each
trait.
--useCovariates
- Model covariate effects during analysis. In QTDT format
data files, covariates are indicated by "C" data type.
--ascertainment
- Model single proband ascertainment. In ascertained families, the
proband can be tagged by setting his individual id to "proband"
or by including a dummy affection status variable named "proband"
and setting its value to 2 (or affected) for probands and missing
otherwise.
--unlinked alpha
- Use a simple heterogeneity model for linkage. The model assumes that
a fraction alpha of the families are unlinked.
Association Analyses
--infer
- Estimate missing genotypes in a pedigree. When this option is selected,
MERLIN will estimate the posterior distribution of each missing SNP genotype
conditional on available genotype data. A new pedigree file will be
generated including the most likely genotype for each individual [whenever
this most likely genotype has a posterior probability of > 95%], the
probability that each missing genotype is a homozygote for the reference
allele, the probability that each missing genotype is a heterozygote, and the
expected number of copies of a reference allele in each missing genotype.
Genotypes will only be inferred for markers with exactly two alleles.
Multi-allelic and monomorphic markers will not be included in the output
file.
If you want a smaller output pedigree file, consider the --inferBest,
--inferExpected and --inferProbabilities options as alternatives.
--assoc
- This option uses a variance component model to estimate an additive effect
for each SNP and carry out an association test. Before evaluating evidence
for association, missing genotypes are estimated to increase power.
--fastAssoc
- This option uses a rapid score test to estimate an additive effect for
each SNP. It is slightly less accurate, but much more computationally
efficient, than the --assoc option and recommended for first pass
analysis of genome-wide scans and other large datasets.
--filter threshold
- When the --fastAssoc option is used, only output p-values below
a certain threshold.
--custom covariates.tbl
- The custom file allows users to customize the covariate model for each
trait. For each trait to be analyzed, this file should contain two lines.
The first line should include the TRAIT keyword followed by the trait name.
The second line should include the COVARIATE keyword followed by a list
of appropriate covariates.
This option affects both association analysis and quantitative trait linkage analyses.
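As a rough illustration of the two-line layout described for the --custom file, the sketch below builds the file contents as text. The trait and covariate names are hypothetical, and separating the covariates with single spaces is an assumption, since the description above only says that a list of covariates follows the keyword.

```python
def custom_model_text(models):
    """Build TRAIT / COVARIATE line pairs for a --custom file.

    `models` maps each trait name to its list of covariate names.
    Assumption: covariates are separated by single spaces.
    """
    lines = []
    for trait, covariates in models.items():
        lines.append(f"TRAIT {trait}")
        lines.append("COVARIATE " + " ".join(covariates))
    return "\n".join(lines) + "\n"


# Hypothetical traits and covariates, for illustration only.
models = {"BMI": ["AGE", "SEX"], "HEIGHT": ["SEX"]}
text = custom_model_text(models)
print(text, end="")
```

The resulting text could be saved as covariates.tbl and passed with --custom covariates.tbl.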
If you use the above options for association analysis, please cite
Chen and Abecasis (AJHG, 2007)
which provides a full account of the approach.
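The quantities reported by --infer are related by simple arithmetic, which the sketch below works through for a single missing genotype. Given the posterior probability of a reference homozygote and of a heterozygote, the expected number of reference allele copies is 2 * P(homozygote) + P(heterozygote), and a genotype is called only when its posterior probability exceeds 95%. The function name and genotype labels are hypothetical and do not reflect MERLIN's actual output format.

```python
def summarize_posterior(p_hom_ref, p_het, threshold=0.95):
    """Return (genotype call or None, expected copies of the reference allele)."""
    p_hom_alt = 1.0 - p_hom_ref - p_het
    if p_hom_ref < 0.0 or p_het < 0.0 or p_hom_alt < -1e-9:
        raise ValueError("posterior probabilities must be non-negative and sum to at most 1")
    p_hom_alt = max(p_hom_alt, 0.0)

    # Expected number of reference alleles carried by this genotype.
    expected_copies = 2.0 * p_hom_ref + p_het

    # Hypothetical labels; MERLIN's own allele coding is not shown here.
    posteriors = {"ref/ref": p_hom_ref, "ref/alt": p_het, "alt/alt": p_hom_alt}
    best, best_p = max(posteriors.items(), key=lambda item: item[1])
    call = best if best_p > threshold else None
    return call, expected_copies


confident = summarize_posterior(0.97, 0.03)
uncertain = summarize_posterior(0.50, 0.40)
print(confident, uncertain)
```

In the second case no genotype passes the 95% threshold, so only the probabilities and the expected allele count carry information.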
Analysis Positions
--steps:n
- Carry out analyses at n equally spaced locations between consecutive
markers.
--minStep:dist
- When carrying out analyses between markers, ensure that consecutive analysis locations
are separated by at least dist centiMorgans.
--maxStep:dist
- When carrying out analyses between markers, ensure that consecutive analysis locations
are separated by no more than dist centiMorgans.
--grid:n
- Carry out analysis along an n-cM grid of equally spaced locations, starting at
the location specified with --start option and continuing up to the location specified
with the --stop option. If --start and --stop are left blank, start at the first marker
and stop after the final marker in each chromosome.
--start:pos
- Start analyses at pos centiMorgans.
--stop:pos
- Stop analyses at pos centiMorgans.
--positions:pos1,pos2,...
- Carry out analysis only at the specified positions. Each position can be a marker name or
centimorgan location.
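To make the --grid, --start and --stop options concrete, the sketch below lists the analysis locations implied by an n-cM grid. It assumes that the --stop position itself is included whenever it falls exactly on the grid; the description above does not settle that detail, and the function name is hypothetical.

```python
import math


def grid_positions(step_cm, start_cm, stop_cm):
    """Equally spaced positions from start_cm up to stop_cm, step_cm apart.

    Assumption: stop_cm is included when it falls exactly on the grid.
    """
    if step_cm <= 0:
        raise ValueError("grid spacing must be positive")
    # Small tolerance so that floating point noise does not drop the last point.
    count = int(math.floor((stop_cm - start_cm) / step_cm + 1e-9)) + 1
    return [round(start_cm + i * step_cm, 6) for i in range(max(count, 0))]


# Comparable to --grid:5 --start:0 --stop:20
print(grid_positions(5.0, 0.0, 20.0))
# Comparable to --grid:2 --start:1 --stop:6
print(grid_positions(2.0, 1.0, 6.0))
```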
Haplotyping Analyses
--best
- Output the most likely haplotype vector to merlin.chr
--sample
- Samples a likely haplotype vector according to likelihood and
outputs it to merlin.chr. Use the random seed parameter,
-r, to sample a different vector.
--sample:n
- Repeats the sampling process n times for each family.
--all
- List all possible haplotype vectors for each family in merlin.chr. Must be
used with the --zero recombination option.
--founders
- List founder haplotype graphs in merlin.hap.
--horizontal
- Use an alternative, horizontal format for outputting haplotypes. In this
alternative format alleles for each individual haplotype are listed along
a single line
Recombination Options
--zero
- Assume no recombination between markers. Families with obligate recombinants
will be discarded.
--one, --two, --three
- Allow 1, 2 or 3 recombination events between consecutive informative markers.
This can improve performance of Lander-Green algorithm convolutions and
still provide accurate solutions when markers are closely spaced.
--singlepoint
- Consider each marker individually.
Marker Clustering Options for Modelling Linkage Disequilibrium
--cluster clustering.tbl
- Model linkage disequilibrium for clusters of neighboring markers
defined in the clustering.tbl file. The file should indicate
groups of markers that are in linkage disequilibrium and, optionally,
frequencies of the haplotypes they define. If haplotype frequencies
are not provided, they will be estimated automatically. For more
details of options for modeling linkage disequilibrium, see the
tutorial on modeling marker-marker
disequilibrium with MERLIN
--distance threshold
- Automatically define clusters and estimate haplotype frequencies
for groups of markers that are less than threshold cM apart.
--rsq threshold
- Automatically define clusters including pairs of SNPs for which
pairwise r2 exceeds threshold and all intervening
markers.
--cfreq
- This option instructs merlin to generate a file summarizing clusters
of markers in linkage disequilibrium and the haplotype frequency
distribution within each cluster. This file can be used with the
--cluster option in subsequent analysis.
Resource Usage
--bits:n
- Do not attempt to analyse pedigrees of more than n bit complexity.
--megabytes:n
- Do not attempt to allocate more than n megabytes of memory.
Starting with version 1.1 Merlin will
select different strategies to analyze larger pedigrees when it expects the
standard approach will exhaust memory. This option can stop unnecessary crashes
and facilitate the analysis of large pedigrees.
--minutes:n
- Do not attempt to analyse families where calculations for the forward portion
of the Markov-Chain require more than n minutes.
Performance
--trim
- Trim pedigree by removing individuals with no phenotype or genotype data who
are not required to define kin relationships between other individuals in the
pedigree
--noCoupleBits
- Disable founder couple symmetry. This option generally slows things down, but
allows grandmaternal and grandpaternal haplotypes to be distinguished during
haplotyping analyses even when grandparents are not genotyped.
--swap
- Use swap file to reduce memory usage.
--smallSwap
- Uses an alternative strategy to manage swap files, so as to conserve disk space.
Output Formatting
--quiet
- Do not output progress reports when analyzing large families
--markerNames
- Use marker names, rather than cM positions, to label results
--frequencies
- Output allele frequencies calculated internally by MERLIN to a file
--perFamily
- Output LOD scores for each family to a file. For non-parametric
analyses, output includes the non-parametric Z score for each family
and two LOD scores calculated using the
Kong and Cox method, one using best fitting overall model (pLOD) and the
other maximized within each family (LOD). For variance components analyses
the output includes each family's contributions to the log-likelihood
under the null and alternative hypothesis as well as to the LOD score.
--pdf
- Output LOD score plots to pdf file merlin.pdf.
--tabulate
- Generate tables summarizing key analysis results in tab-delimited format. These
tables can be convenient for subsequent analysis.
--prefix label
- Requests that output file names should be derived from label. For example,
estimated haplotypes should be stored in a file called label.chr.
Simulation Options
--simulate
- Perform gene dropping simulation. Generate random genotypes for
each marker, conditional on current missing data pattern, genetic
map and allele frequencies. Use the random seed option (-r seed)
to select a different replicate. For more details on this option,
follow this link.
--reruns N
- Repeat simulation N times.
--trait AFFECTION,FREQ(-),PEN(+/+),PEN(+/-),PEN(-/-),POSITION
--trait QTLNAME,SNP,Var(QTL),Var(Polygenes),Var(Environment)
- When combined with the --simulate option, this instructs Merlin to simulate a
quantitative trait or discrete trait. The --trait option is interpreted slightly
differently in each case.
For a discrete trait, genotypes are simulated conditional on observed phenotypic
data: if a particular family includes two affected individuals, Merlin will sample
genotypes conditional on that outcome and the genetic model you specify. Merlin
will simulate genotypes conditional on the phenotypes in a discrete trait labeled
AFFECTION. It will assume that the disease risk allele has frequency FREQ(-) and
that the probability of developing disease, conditional on the (unobserved) disease
locus genotypes is PEN(+/+),PEN(+/-) and PEN(-/-). The disease locus will be placed
at position POSITION.
For a quantitative trait, simulated phenotypes will replace observed phenotypes (but
the original missing data pattern will be respected so that if, for example, all
parental trait values are missing in the original data, they will also be missing in
the simulated data). With the second format above, simulated trait values will be
stored in a column labeled QTLNAME. The
QTL will be influenced by SNP, which will explain Var(QTL) of the total
variance. The remaining variance will be polygenic, Var(Polygenes), or
environmental, Var(Environment).
The QTL phenotypes will replace the original trait values for QTLNAME, but
will respect the original missing data pattern. The QTL genotypes will also replace the
original genotypes for SNP, but will respect the original missing data pattern.
An example set of options might be: --simulate --trait BMI,rs9930506,0.01,0.39,0.60.
This would simulate trait BMI such that marker rs9930506 accounts for 0.01 of the
variance, with residual polygenic variance of 0.39 and residual environmental
variance of 0.60.
--save
- Save simulated pedigree and corresponding data, map and allele
frequency files as merlin-replicate.ped, merlin-replicate.dat,
merlin-replicate.map and merlin-replicate.freq, respectively.
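The quantitative form of the --trait option is a comma-separated list, so it is easy to assemble programmatically when scripting many simulations. The sketch below builds the argument list for the BMI example given above. The helper name is hypothetical, and formatting the variance components with two decimals is simply a choice made here to match that example.

```python
def qtl_trait_arguments(name, snp, var_qtl, var_polygenes, var_environment):
    """Arguments for simulating a quantitative trait with --simulate --trait."""
    components = [var_qtl, var_polygenes, var_environment]
    if any(value < 0 for value in components):
        raise ValueError("variance components must be non-negative")
    # Two-decimal formatting is an illustrative choice, not a MERLIN requirement.
    values = ",".join(f"{value:.2f}" for value in components)
    return ["--simulate", "--trait", f"{name},{snp},{values}"]


arguments = qtl_trait_arguments("BMI", "rs9930506", 0.01, 0.39, 0.60)
print(" ".join(arguments))
```

The list can be appended to the rest of a MERLIN command line, for example together with -r seed and --save to keep each replicate.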
Miscellaneous options
--simwalk2
- Perform a smart linkage analysis in conjunction with Simwalk2. MERLIN
tackles the small pedigrees, Simwalk2 does the larger ones, you get
one answer. This option requires Mega2 version 2.3 or later and MERLIN
version 0.9.2 or later. Please see the
Mega2 Manual for more
detailed information.
--inverseNormal
- Apply quantile normalization to each quantitative trait prior to analysis.
Options marked * are currently available on a trial basis. They probably require
careful validation, but they may still be useful.
MINX: Chromosome X Analyses
MINX (MERLIN in X) is an X-specific version of Merlin. It is available in distributions
of MERLIN version 0.9.1 and later. There is currently no manuscript describing MINX
performance and algorithms in detail. Although I believe MINX results to be correct,
the methods are unpublished and I would advise using MINX with care.
MINX implements X-chromosome specific versions of the functions provided by the
standard Merlin implementation. Males are hemizygous and carry only one X chromosome.
MINX assumes that males are scored as homozygous in the input pedigree file.
MERLIN-REGRESS: Pedigree Wide Regression Analysis
Sham et al. (Am J Hum Genet 71:238-253)
MERLIN-REGRESS implements an extension of the Haseman-Elston quantitative trait linkage
analysis procedure that extracts linkage information from trait squared-sums and
differences from all non-inbred relative pairs. For a detailed analytical description
of this approach, please see the manuscript by Sham et al. (2002).
This regression approach provides a powerful quantitative trait linkage test even in
selected samples, but requires specification of the trait mean, variance and covariances
between different relative pairs. The present implementation derives covariances between
different types of relative pairs from their kinship coefficients and an estimate of the
trait heritability.
Most of the MERLIN-REGRESS options are shared with MERLIN and described above. The following are
MERLIN-REGRESS specific options:
Basic Trait Modeling Options
--mean:x
- Mean for the trait under investigation in an (unselected) population.
Misspecifying this parameter will generally result in decreased power.
--variance:x
- Variance for the trait under investigation in an (unselected) population.
Misspecifying this parameter will generally result in decreased power.
--heritability:x
- Heritability for the trait under investigation in an (unselected) population.
Underestimating the trait heritability can result in inflated error rates, so
it is prudent to avoid setting this value too low.
--testRetest
- Specifies the correlation between repeated measures of the same variable. This
is useful when multiple measurements have been taken (and averaged) for each
subject. To use this option, the pedigree file should include covariates (one per trait)
indicating the number of times each subject was measured for each trait. This variable must be
named TRAIT_REPEATS for each TRAIT where repeat measurements are available.
-t modelsFile
- Specifies the name of a file listing alternative models for analysis. This should
be a space delimited file where each line indicates a trait name, mean, variance
and heritability. An example is available
in the tutorial. When this table exists, the --mean, --variance and --heritability
command line options are ignored.
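A models file for the -t option can be generated from a small table, as sketched below. Each line holds a trait name, mean, variance and heritability separated by spaces, as described above. The trait names and parameter values are hypothetical, and the sketch writes no header line, which is an assumption; the example in the tutorial should be treated as the authoritative layout.

```python
def models_table(models):
    """Space-delimited lines: trait name, mean, variance, heritability."""
    lines = []
    for trait, mean, variance, heritability in models:
        if variance <= 0:
            raise ValueError(f"variance for {trait} must be positive")
        if not 0.0 <= heritability <= 1.0:
            raise ValueError(f"heritability for {trait} must lie between 0 and 1")
        lines.append(f"{trait} {mean:g} {variance:g} {heritability:g}")
    return "\n".join(lines) + "\n"


# Hypothetical population parameters, for illustration only.
models = [("BMI", 25.0, 16.0, 0.4), ("HEIGHT", 170.0, 64.0, 0.8)]
table = models_table(models)
print(table, end="")
```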
Options for Modeling Random Samples
--randomSample
- Specifies that the sample was not selected, and that MERLIN-REGRESS should use
the observed sample mean, variance and heritability as estimates of population
parameters.
--useCovariates
- Specifies that covariates in the pedigree file should be "regressed-out" before
analysis. This option is only available for random samples.
--inverseNormal
- Specifies that inverse normal transformation (where each measurement is transformed
to its corresponding quantile in a standard normal distribution) should be
applied to the data before analysis. This option is only available in random
samples, and can be helpful in dealing with data where outliers are present.
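The inverse normal transformation described for --inverseNormal can be sketched as a rank-based mapping onto standard normal quantiles. The version below uses the quantile (rank - 0.5) / n and breaks ties by order of appearance; both choices are assumptions made for illustration, and MERLIN-REGRESS may use a different offset or tie rule.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Map each value to the standard normal quantile of its rank.

    Assumptions: quantile = (rank - 0.5) / n; ties keep their input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        transformed[index] = standard_normal.inv_cdf((rank - 0.5) / n)
    return transformed


# The extreme value 1000.0 is pulled in to an ordinary upper quantile.
scores = inverse_normal([3.0, 1.0, 2.0, 1000.0])
print([round(score, 3) for score in scores])
```

Because only ranks are used, a single extreme measurement cannot dominate the transformed trait, which is why the option helps when outliers are present.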
Other Options
--rankFamilies
- Rank families according to their expected informativeness. This information can
help focus genotyping efforts.