SIMLINK:  A PROGRAM FOR ESTIMATING THE POWER OF A
PROPOSED LINKAGE STUDY BY COMPUTER SIMULATION

Version 4.12


April 2, 1997

Michael Boehnke and Lynn M. Ploughman

Department of Biostatistics
School of Public Health
University of Michigan
Ann Arbor, Michigan  48109-2029
Phone:  (734) 936-1001
FAX:  (734) 763-2215
Email:  boehnke@umich.edu


TABLE OF CONTENTS

   I. Introduction

  II. Definitions

 III. Assumptions of the Power Calculation

  IV. Options

   V. Outline of the Power Calculation

  VI. Input for SIMLINK

 VII. Output from SIMLINK

VIII. Four Sample Problems

  IX. Array Sizes, File Management, and Other Practical Hints

   X. Error Conditions

  XI. References


I. Introduction

This document describes a computer program to estimate the probability, or 
power, of detecting linkage given family history information on a set of 
identified pedigrees.  It is assumed that the pedigrees are of known structure 
and that some data may be available for the genetic trait that is to be mapped.  
The analysis described here can be applied to autosomal or X-linked traits 
determined by a single major locus.  The trait may be dichotomous with complete 
or reduced penetrance, or may be quantitative.  This power calculation is most 
usefully undertaken after family history data are gathered, but prior to 
examination and testing of pedigree members to obtain marker information. The 
result of this power calculation is an objective answer to the question:  Will 
my families be sufficient to demonstrate linkage?  The theoretical basis for 
this program is given by Ploughman and Boehnke (1989) and Boehnke (1986).

The program SIMLINK required for this power calculation (LODSTAT is now 
incorporated as part of SIMLINK) has three major components:

(A) Trait and Marker Genotype Simulation:  This component of the program 
simulates cosegregation of trait and marker loci in pedigrees.  If simulating 
one marker locus for lod score analysis, a particular (set of) recombination 
fraction(s) is assumed; if simulating two flanking marker loci for analysis by
location scores, a particular map distance is assumed.  The program assumes that 
phenotypic information may be available for some pedigree members for the trait, 
but not for the marker(s). Genotypes are simulated in an unbiased fashion 
(Boehnke, 1986) so that individuals are assigned a trait genotype consistent 
with their observed trait phenotype and the phenotypes of the other pedigree 
members.  Marker genotype simulation is based on population marker gene 
frequencies, trait genotypes, and the recombination fraction(s) between the 
trait and marker loci, and assumes Hardy-Weinberg and linkage equilibrium.  
Traits can be genetically homogeneous, or can be heterogeneous between 
pedigrees.  Individuals identified as unavailable for sampling are assigned 
unknown marker phenotypes for subsequent lod or location score calculation.

(B) Lod or Location Score Calculation:  This component of the program calculates 
lod or location scores based on the simulation results for each replicate 
pedigree.  Lod scores are calculated if one marker locus was simulated; location 
scores are calculated if two flanking marker loci were simulated.  A modified 
version of the computer program MENDEL (Lange et al., 1988) acts as a subroutine 
for implementing these calculations.

(C) Linkage Information Calculation:  This component of the program calculates 
sample statistics for the maximum lod/location score distributions, resulting in 
estimates of (1) expected maximum lod/location scores, (2) probabilities of 
maximum lod/location scores sufficiently large to conclude linkage, and (3) 
expected exclusion regions when the trait is not linked to the marker(s).  
Expected maximum lod scores for each pedigree conditional on whether individual 
pedigree members are homozygous or heterozygous can be used to identify key 
individuals for the linkage analysis.

To estimate the power of a proposed linkage study, multiple replicates of each 
pedigree for each of several true recombination fractions or map distances 
between the trait and marker loci are simulated.  After a replicate pedigree has 
been simulated for each pedigree type and each true recombination fraction or 
map distance, MENDEL calculates lod or location scores.  The resulting scores 
are used to estimate the maximum lod/location score for each pedigree and for 
the set of pedigrees and to update the linkage information statistics.  Once 
this process has been completed for the desired number of replicates, estimates 
of the linkage information provided by the pedigrees, including expected maximum 
lod/location scores and the probabilities of maximum lod/location scores greater 
than particular constants, are calculated and output to a series of tables.  The 
probability of a maximum lod/location score greater than 3.0 gives the 
probability that the pedigree or set of pedigrees will be sufficient to 
demonstrate linkage.
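
As an illustration of the final tabulation step, the sketch below summarizes a 
list of maximum lod scores, one per replicate.  It is not SIMLINK code:  the 
function name, the thresholds, and the example scores are hypothetical, and the 
binomial standard error is an assumption about how one might judge the accuracy 
of the estimate.

```python
import math

def summarize_max_lods(max_lods, thresholds=(1.0, 2.0, 3.0)):
    """Summarize maximum lod scores, one value per simulated replicate."""
    n = len(max_lods)
    if n == 0:
        raise ValueError("need at least one replicate")
    mean = sum(max_lods) / n
    # Sample variance (n - 1 denominator); zero when only one replicate exists.
    var = sum((z - mean) ** 2 for z in max_lods) / (n - 1) if n > 1 else 0.0
    exceed = {}
    for c in thresholds:
        # Proportion of replicates whose maximum lod score is greater than c.
        p = sum(1 for z in max_lods if z > c) / n
        # Binomial standard error of that proportion.
        exceed[c] = (p, math.sqrt(p * (1.0 - p) / n))
    return {"n": n, "mean": mean, "se_mean": math.sqrt(var / n), "exceed": exceed}

# Hypothetical maximum lod scores from ten replicates (not real SIMLINK output).
hypothetical_max_lods = [0.4, 1.2, 3.5, 2.8, 4.1, 0.0, 3.2, 1.9, 3.0, 2.2]
summary = summarize_max_lods(hypothetical_max_lods)
power, power_se = summary["exceed"][3.0]
print("expected maximum lod score:", round(summary["mean"], 3))
print("estimated power (max lod > 3.0):", power, "+/-", round(power_se, 3))
```

Because the standard error of an estimated probability shrinks only as the 
square root of the number of replicates, a small number of replicates gives a 
rough estimate of power.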

We thank Kenneth Lange and Daniel Weeks for their work in developing MENDEL and 
for generously allowing us to incorporate portions of it into SIMLINK.  Any 
problems that arise through the use of the modified version of MENDEL as a 
component of SIMLINK are the responsibilities of Boehnke and Ploughman, and 
questions should be directed to us.


II. Definitions

Several terms are used in this document that are of key importance.  These 
include:

True Recombination Fraction:  recombination fraction used to simulate replicate 
pedigrees when simulating one marker locus.

True Map Distance:  map distance between the two flanking marker loci used to 
simulate replicate pedigrees when simulating two flanking marker loci.  
Replicate pedigrees are simulated placing the trait locus at a series of 
distances along the interval between the two marker loci.  All map distances are 
converted to recombination fractions using Haldane's (1919) mapping function for 
use in the simulation.
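
The conversion can be sketched as follows.  Haldane's mapping function is 
theta = (1 - exp(-2d))/2 for a map distance d in Morgans; the function names 
and the helper that places the trait locus between two markers are 
hypothetical and are not taken from SIMLINK.

```python
import math

def haldane_theta(d_morgans):
    """Recombination fraction for a map distance in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d_morgans)))

def haldane_distance(theta):
    """Inverse mapping: map distance in Morgans for a recombination fraction."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * theta)

def flanking_thetas(total_distance, position):
    """Recombination fractions between the trait locus and each flanking marker.

    position is the distance in Morgans from the first marker to the trait
    locus; total_distance is the distance between the two markers.
    """
    if not 0.0 <= position <= total_distance:
        raise ValueError("trait locus must lie between the two markers")
    return haldane_theta(position), haldane_theta(total_distance - position)

# Trait locus in the middle of a 0.20 Morgan interval.
theta1, theta2 = flanking_thetas(0.20, 0.10)
theta12 = haldane_theta(0.20)
print(round(theta1, 5), round(theta2, 5), round(theta12, 5))
```

Under no interference the three recombination fractions satisfy 
theta12 = theta1 + theta2 - 2*theta1*theta2, which the values above reproduce.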

Test Recombination Fraction:  recombination fraction at which lod/location 
scores are calculated.  In general, there will be several test recombination 
fractions for each true recombination fraction or map distance, since by chance 
a replicate pedigree may achieve its maximum lod/location score at a 
recombination fraction or map position different from the true one.

Replicate Pedigree:  a copy of one of the user-supplied pedigrees for which 
trait and/or marker phenotypes are simulated.  In general, a large number of 
replicate copies should be simulated for each pedigree to achieve sufficiently 
accurate estimates of statistical power and mean maximum lod/location scores.


III. Assumptions of the Power Calculation

This power calculation for a linkage study assumes:

(A) One or more pedigrees have been identified in which a dichotomous or 
quantitative trait determined by a two-allele genetic locus is segregating.  If 
the dichotomous trait exhibits incomplete penetrance, the penetrance can be 
described by a piecewise-linear or cumulative normal penetrance function.

(B) Pedigree structures (that is, relationships among pedigree members) are 
known for all pedigrees.  Trait phenotypes may be known (but need not be) for 
some or all pedigree members.  Marker phenotypes are unknown.

(C) Mode of inheritance is known for the trait.  If mode of inheritance for the 
trait is not clear, the power calculation corresponds to the power of a linkage 
study if the assumed trait mode of inheritance is true.  Given several different 
candidate trait models, it may be desirable to carry out a power calculation for 
each model.

(D) Hardy-Weinberg and linkage equilibrium.

(E) No interference, so that Haldane's (1919) mapping function is appropriate. 
This assumption is relevant only if flanking markers are simulated.

(F) No monozygotic (MZ) twins are present in the pedigrees.  Given a pedigree 
with MZ twins, we recommend including only one of the twins in the data set for 
the power calculation.


IV. Options

The power calculation outlined here can be carried out in several different ways 
depending on the trait of interest and the interests and preferences of the 
investigator.  Options available include:

(A) Chromosomal Location:  The trait and marker loci may be either all autosomal 
or all X-linked. 

(B) Marker Loci:  The investigator must choose the situation to simulate:  
either a single marker locus or a pair of flanking marker loci.  Marker mode of 
inheritance can follow any simple Mendelian pattern.  The default maximum number 
of alleles per marker locus is 4, but can be increased by changing a set of 
dimension statements and recompiling.  Gene frequencies must also be specified.  
If in the proposed study particular marker loci are to be used or are of 
predominant importance, modes of inheritance and allele frequencies for those 
markers can be simulated.  If not, a reasonable choice might be to assume two-
allele, codominant markers with equal allele frequencies.

(C) Recombination Fractions or Map Distances:  The results of the power 
calculation depend very strongly on the distance to the linked marker(s).  
Therefore, it may be helpful to consider several true recombination fractions 
between the trait locus and a single marker locus or to consider several true 
map distances between the two flanking marker loci.

(D) Unlinked Marker:  It is also of interest to estimate the region about an 
unlinked marker or pair of unlinked markers that might be excluded from linkage.  
SIMLINK can estimate this exclusion region.

(E) Genetic Heterogeneity:  Genetic heterogeneity can be allowed for using the 
admixture model for heterogeneity (Smith, 1963). Under this model, the 
probability of the trait being linked in a given pedigree is alpha; with 
probability 1 - alpha the trait is unlinked.  This model assumes that while 
different pedigrees may have different genetic forms of the disease, within a 
pedigree only a single genetic form is present.  If genetic heterogeneity is 
allowed for, two different lod scores are calculated:  the standard lod score 
which assumes genetic homogeneity, and a lod score which allows for maximization 
as a function of both the recombination fraction and the linked fraction alpha.  
Risch (1989) has demonstrated that for simple genetic models and nuclear family 
data, ignoring heterogeneity and calculating the standard lod score tends to be 
the more powerful choice unless the linked fraction alpha is small, the 
pedigrees are large, and the recombination fraction is small.  The relative 
merits of these two analytic strategies for a specific combination of genetic 
model and pedigree data set can be evaluated using SIMLINK.

(F) Identifying Key Pedigree Members:  Often, particular pedigree members are of 
key importance in determining the linkage information provided by a pedigree.  
To assess that importance, we allow calculation of the expected maximum lod 
score for each pedigree conditional on the marker heterozygosity/homozygosity 
status of each pedigree member.  We regard an individual as a key pedigree 
member if there is a large difference in the expected maximum lod score for 
his/her pedigree depending on whether or not (s)he is marker heterozygous.
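
The heterogeneity lod score of option (E) can be illustrated with a short 
sketch.  Under the admixture model, the lod score for a set of pedigrees at a 
given recombination fraction and linked fraction alpha is the sum over 
pedigrees of log10(alpha*10**Z + 1 - alpha), where Z is the pedigree's standard 
lod score at that recombination fraction.  The grid search over alpha, the 
function names, and the lod scores below are assumptions made for illustration; 
the maximization procedure SIMLINK itself uses is not described here.

```python
import math

def admixture_lod(pedigree_lods, alpha):
    """Lod score allowing for heterogeneity at one test recombination fraction."""
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in pedigree_lods)

def max_admixture_lod(lods_by_theta, n_alpha_steps=20):
    """Maximize over the test recombination fractions and a grid of alpha values.

    Returns (maximum lod score, theta, alpha).
    """
    best = (float("-inf"), None, None)
    for theta, lods in lods_by_theta.items():
        for k in range(n_alpha_steps + 1):
            alpha = k / n_alpha_steps
            score = admixture_lod(lods, alpha)
            if score > best[0]:
                best = (score, theta, alpha)
    return best

# Hypothetical per-pedigree lod scores at three test recombination fractions:
# two pedigrees look linked, two look unlinked.
lods_by_theta = {
    0.05: [1.8, 1.5, -1.2, -0.9],
    0.10: [1.6, 1.4, -0.7, -0.5],
    0.50: [0.0, 0.0, 0.0, 0.0],
}
standard_max = max(sum(lods) for lods in lods_by_theta.values())
hetero_max, best_theta, best_alpha = max_admixture_lod(lods_by_theta)
print("standard maximum lod score:", round(standard_max, 3))
print("heterogeneity maximum lod score:", round(hetero_max, 3),
      "at theta =", best_theta, "alpha =", best_alpha)
```

With alpha fixed at 1 the heterogeneity lod score reduces to the standard lod 
score, so its maximum over both parameters is never smaller than the standard 
maximum.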


V. Outline of the Power Calculation

The power calculation is a four-step process, involving (A) calculation of 
genotype conditional probabilities for each pedigree member; (B) simulation of a 
replicate of each of the user-supplied pedigree(s); (C) calculation of 
lod/location scores for the replicate of each of the pedigree(s); and (D) 
calculation of statistics based on the lod/location scores.  Step (A) is carried 
out once prior to replicate pedigree simulation, steps (B) and (C) are repeated 
in sequence for each replicate, and step (D) is carried out after all replicates 
have been simulated. Each of these steps is described in this section.

(A) Calculation of Genotype Conditional Probabilities:  To facilitate unbiased 
genotype simulation, conditional probabilities for the trait genotypes of each 
pedigree member are calculated conditional on the trait genotypes of (some of) 
their relatives.  This is accomplished by a single trait-model likelihood 
evaluation using MENDEL.

(B) Simulation of Pedigrees:  SIMLINK simulates cosegregation at the trait and 
marker loci for multiple replicates of each pedigree.  Simulations are carried 
out at the specified true recombination fractions for one marker locus or at the
recombination fractions corresponding to the specified map distance for two 
flanking marker loci.  Input required includes (for details, see Input):

  (1) Family History Information for Each Pedigree Member:  an ID, IDs for the 
parents, gender, trait phenotype if known, trait availability indicator, 
and, if desired, a variable (e.g. age) which along with gender and 
genotype determines the penetrance function.

  (2) Trait and Marker Locus Descriptions:  mode of inheritance and allele 
frequency information for the trait and marker loci in the form required 
by MENDEL.

  (3) Recombination Fractions/Map Distance:  true recombination fractions at 
which cosegregation is to be simulated, if simulating one marker locus; a 
single map distance, if simulating two flanking marker loci.  For two 
marker loci, the trait locus will be placed at positions along the 
interval between the two marker loci and the resulting map distances 
converted to recombination fractions using Haldane's (1919) mapping 
function.

  (4) Penetrance Function:  Currently, SIMLINK allows for a piecewise-linear 
penetrance function or a cumulative normal penetrance function for 
dichotomous traits.  The program allows for different forms of these 
penetrance functions for each trait genotype/gender combination and allows 
them to depend on one quantitative variable.  This variable typically will 
be age, and we will assume that it is age for the remainder of this 
document.  The piecewise-linear function assumes that a minimum penetrance 
holds for ages less than a minimum age, increases linearly to a maximum 
penetrance at a maximum age, and remains at the maximum penetrance for 
ages greater than the maximum age.  The cumulative normal penetrance 
function assumes that penetrance increases from the minimum penetrance at 
age minus infinity to the maximum penetrance at age plus infinity 
following a cumulative normal distribution with a specified mean and 
standard deviation. Quantitative traits with genotype-specific normal 
distributions are the third penetrance option.

  (5) Control Information:  Number of replicates to be simulated for each 
available pedigree, locus and pedigree file names, seeds for the random 
number generator, and other control variables.
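
The two penetrance functions for dichotomous traits described under (4) can be 
sketched as follows.  The function and argument names are hypothetical; the 
formulas follow the verbal description above and are not copied from the 
SIMLINK source.

```python
import math

def piecewise_linear_penetrance(age, min_age, max_age, min_pen, max_pen):
    """Minimum penetrance up to min_age, linear rise to max_pen at max_age."""
    if age <= min_age:
        return min_pen
    if age >= max_age:
        return max_pen
    fraction = (age - min_age) / (max_age - min_age)
    return min_pen + fraction * (max_pen - min_pen)

def cumulative_normal_penetrance(age, mean, sd, min_pen, max_pen):
    """Penetrance rising from min_pen to max_pen along a normal CDF in age."""
    z = (age - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return min_pen + (max_pen - min_pen) * cdf

# Constant penetrance of 0.80, independent of age.
print(piecewise_linear_penetrance(35.0, 0.0, 60.0, 0.80, 0.80))
# Penetrance rising from 0 at age 0 to 0.80 at age 60, evaluated at age 30.
print(piecewise_linear_penetrance(30.0, 0.0, 60.0, 0.0, 0.80))
# Cumulative normal penetrance evaluated at its mean age.
print(cumulative_normal_penetrance(50.0, 50.0, 10.0, 0.0, 0.80))
```

Setting the minimum and maximum penetrance equal gives an age-independent 
penetrance under either function.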

SIMLINK creates pedigree files appropriate for MENDEL containing a single 
replicate of each pedigree type.  In each replicate pedigree, members with known 
trait phenotype are assigned their correct trait phenotype.  Pedigree members of 
currently unknown trait phenotype may be assigned a trait phenotype if desired;
marker phenotypes can also be simulated and assigned.  When simulating one 
marker locus, one marker phenotype will be listed for each true recombination 
fraction under which pedigrees were simulated; when simulating two flanking 
marker loci, two marker phenotypes, one per locus, will be listed for each pair 
of true recombination fractions under which pedigrees were simulated.

(C) Lod or Location Score Calculations:  Using the pedigree file created by 
SIMLINK, MENDEL calculates log likelihoods for subsequent calculation of lod 
scores or location scores.

(D) Calculation of Linkage Information Estimates:  SIMLINK calculates the 
following linkage information criteria for the pedigrees at the different true 
recombination fractions/map distances:

  (1) For linked markers:

      (a) the expected maximum lod/location score for each pedigree and for the 
summed pedigrees assuming homogeneity or allowing for heterogeneity 
(optional); and
      (b) the probability of a maximum lod/location score greater than specified 
constants for each pedigree, the summed pedigrees assuming homogeneity 
or allowing for heterogeneity (optional), and any one pedigree.

  (2) For unlinked markers:

      (a) the expected lod/location score for several test recombination 
fractions/map distances for each pedigree and the summed pedigrees; 
and
      (b) the probability of a lod/location score greater than specified 
constants.

These information criteria may be used to estimate:

(1) The Power of the Linkage Study:  The power of a proposed linkage study is 
the probability of detecting a linked marker if it is tested.  Equivalently, it 
is the probability of obtaining a maximum lod score of at least 3.0 for a 
linked marker (Morton, 1955).  This probability is estimated under (1b) above 
when the constant equals 3.0.  The power can be estimated for (a) each pedigree 
alone, (b) the summed pedigrees (under the assumption that the trait is caused 
by the same locus in all pedigrees), (c) the summed pedigrees allowing for 
between pedigree heterogeneity (optional), and (d) all the pedigrees but without 
summing the lod scores (allowing in the analysis for the possibility that the 
trait may be caused by two or more loci, but assuming in the simulation that 
only one locus is actually involved).

(2) The Expected Exclusion Region for An Unlinked Marker (Pair):  A lod score of 
less than -2.0 is customarily accepted as conclusive evidence for the exclusion 
of linkage (Morton, 1955).  Calculating the expected lod/location scores for an 
unlinked marker (pair) at each of several test recombination fractions/map 
distances yields an estimate of the exclusion region when testing for linkage 
to an unlinked marker (pair).

(3) Probability of Incorrectly Concluding Linkage:  Estimating the probability 
of a maximum lod/location score greater than 3.0 for a true recombination 
fraction of .50 gives the probability of incorrectly concluding linkage to an 
unlinked marker (pair).  In statistical terms, that is the probability "a" of 
making a type I error for a single marker (pair).  Since many (pairs of 
flanking) markers will often be considered, the overall probability of making a 
type I error is greater.  Assuming that the linkage calculations for the 
different (pairs of flanking) markers are independent, the overall probability 
of making a type I error becomes 1 - (1 - a)**n, where n is the number of (pairs 
of flanking) markers and "**" represents exponentiation.
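
As a worked example of the formula in (3), the sketch below evaluates the 
overall type I error probability for several numbers of markers.  The 
per-marker probability of 0.001 is a hypothetical value chosen for 
illustration, not a SIMLINK result.

```python
def overall_type1_error(a, n):
    """Probability of at least one false conclusion of linkage among n
    independent markers (or marker pairs), each with type I error a."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - a) ** n

# Hypothetical per-marker type I error probability.
a = 0.001
for n in (1, 10, 100, 300):
    print(n, round(overall_type1_error(a, n), 4))
```

For small a the result is close to n*a, so testing 100 independent markers at 
a = 0.001 gives an overall probability of a little under 0.10.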

In addition, SIMLINK will as an option calculate the expected maximum lod score 
for each pedigree conditional on the heterozygosity/homozygosity status of each 
pedigree member. This provides a means of identifying pedigree member(s) whose
marker status has a strong impact on the linkage information provided by the 
pedigree.


VI. Input for SIMLINK

Three input files are required:  (A) the control file, (B) the locus file, and 
(C) the pedigree file.

(A) The Control File:  The control file contains general information describing 
the power calculation.  The sample control file below requests a power 
calculation based on 100 replicates for a genetically homogeneous dominant trait 
called "TRAIT" with penetrance 0.80 in both males and females (independent of 
age).  Power is to be estimated for a marker linked at 0%, 5%, or 10% 
recombination to the trait; free recombination is also simulated.  The data will 
be echoed in the output file, and the effect of individual marker 
heterozygosity/homozygosity status will be determined.

     100       1       1       1       4       1       1       0    1.00
    0.00    0.05    0.10    0.50
    0.00    60.0    0.00    0.00
    0.00    60.0    0.80    0.80
    0.00    60.0    0.80    0.80
    0.00    60.0    0.00    0.00
    0.00    60.0    0.80    0.80
    0.00    60.0    0.80    0.80
       M       F
TRAIT
LOCUS.DAT
PEDIG.DAT
   31171    2413   19771

The following records in the given order and with variables and formats as 
described below are required in the control file (see Examples):

1. Control Information:  The following nine variables in order, each within an 
8-column field, all but the last right justified (8I8,F8.5):

Note:  This record and its format have been substantially altered since version
4.0.  The definition of NTHETA has also been changed to include free 
recombination.

Col  1- 8  NREP:   the number of replicate data sets to simulate.
Col  9-16  NMLOCI: the number of marker loci:
                   =1 then lod scores are calculated,
                   =2 then two markers are assumed to flank the
                      trait locus and location scores are
                      calculated.
Col 17-24  PENOPT: the indicator of the type of penetrance
                   function for the trait:
                   =1 a piecewise-linear penetrance function for
                      a dichotomous trait,
                   =2 a cumulative normal penetrance function for
                      a dichotomous trait,
                   =3 a quantitative trait due to a mixture of
                      normal distributions.
Col 25-32  IFREE:  indicator of whether free recombination
                   between the trait and marker locus (loci)
                   is to be simulated:
                   =0 if no,
                   =1 if yes.
Col 33-40  NTHETA: if using one marker locus, the number of
                   different true recombination fractions between
                   the trait and marker loci to be considered.
                   Ignored if using two flanking marker loci.
Col 41-48  IECHO:  data echoing indicator
                   =0 if data will not be echoed in the output
                      file
                   =1 if data will be echoed in the output file
Col 49-56  INDINF: identify key individuals by heterozygosity/
                   homozygosity status; =0 if no, =1 if yes
Col 57-64  LNKOPT: linkage heterogeneity option indicator
                   =0 if genetic homogeneity is assumed
                   =1 if genetic heterogeneity is allowed
Col 65-72  ALPHA:  probability that a pedigree is segregating the
                   linked form of the trait (ignored if LNKOPT=0)

2. Recombination Fractions/Map Distance:  If lod scores are to be calculated 
(NMLOCI=1), the set of possible true recombination fractions between the trait 
and marker loci input in fields eight columns wide (8F8.6).  If location scores 
are to be calculated (NMLOCI=2), the true map distance in Morgans between the 
two marker loci (only one distance is allowed), followed by the distance option 
variable DISOPT, input in fields eight columns wide (F8.6,I8), with DISOPT right 
justified.

Col  1- 8  First true recombination fraction if one marker locus
           or the true map distance if two marker loci,
Col  9-16  Second true recombination fraction if one marker locus
           or DISOPT if two marker loci (right justified)
           DISOPT=0 says to allow for multiple locations for the
           disease locus between the two markers;
           DISOPT=1 says to assume the disease locus is in the
           middle; DISOPT=1 requires much less computation
Col 17-24  Third true recombination fraction if one marker locus
etc.

3. Parameter values for the trait penetrance function:  For each possible trait 
genotype/gender combination, input four parameters per line in fields eight 
columns wide (4F8.4) (see Outline of the Power Calculation):

line 3:    for a male with trait genotype 11;
line 4:    for a male with trait genotype 12;
line 5:    for a male with trait genotype 22;
line 6:    for a female with trait genotype 11;
line 7:    for a female with trait genotype 12;
line 8:    for a female with trait genotype 22.

Here, alleles 1 and 2 correspond to the first and second trait alleles entered 
in the locus file, respectively.

For a dichotomous trait with a piecewise linear penetrance function (PENOPT=1):

     Col  1- 8  minimum age (or whatever quantitative
                variable is to be used),
     Col  9-16  maximum age,
     Col 17-24  minimum penetrance, i.e., penetrance at the
                minimum age,
     Col 25-32  maximum penetrance, i.e., penetrance at the
                maximum age.

Note:  If a constant penetrance of 80% is desired, independent of age, a line 
with the values  0.  60.  .80  .80 could be entered.

For a dichotomous trait with a cumulative normal penetrance function (PENOPT=2):

     Col  1- 8  mean age for the penetrance function,
     Col  9-16  standard deviation of age for the penetrance
                function,
     Col 17-24  minimum penetrance assuming an age of minus
                infinity,
     Col 25-32  maximum penetrance assuming an age of plus
                infinity.

If dealing with a quantitative trait due to a mixture of normal distributions 
(PENOPT=3):

     Col  1- 8  mean trait value at age zero,
     Col  9-16  rate at which the mean trait value changes
                linearly with age,
     Col 17-24  standard deviation of the trait value at
                age zero,
     Col 25-32  rate at which the standard deviation of the
                trait value changes linearly with age.

4. Male and female symbols:  The symbols used to identify males and females in 
the pedigree file (e.g., M and F or 1 and 2). Enter the symbols in character 
fields eight columns wide (2A8):

Col  1- 8  male symbol,
Col  9-16  female symbol.

5. Trait locus name:  The name given the trait locus in the locus file.  Enter 
the name in a character field eight columns wide (A8):
Col  1- 8  trait locus name.

6. Locus file name:  The name of the locus file, in character format (A).
 
7. Pedigree file name:  The name of the pedigree file, in character format (A).

8. Seeds for the random number generator:  These three positive integers will be 
used to start the random number generator used in the simulation (Wichmann and 
Hill, 1982).  The values should be relatively large, though no larger than 
32767, and should be changed from one run to the next.  Input the numbers right
justified in fields eight columns wide (3I8).

Col  1- 8 First random number generator seed,
Col  9-16 Second random number generator seed,
Col 17-24 Third random number generator seed.

Note:  The control file should end with an end-of-file symbol.
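
For readers unfamiliar with the three-seed generator cited in record 8, the 
sketch below implements the generator published by Wichmann and Hill (1982), 
which combines three small multiplicative congruential generators.  It is 
given for illustration only:  the class name is hypothetical, and the 
generator coded in SIMLINK may differ in detail.

```python
class WichmannHill:
    """Three-seed generator in the style of Wichmann and Hill (1982)."""

    def __init__(self, ix, iy, iz):
        for seed in (ix, iy, iz):
            if seed <= 0:
                raise ValueError("seeds must be positive integers")
        self.ix, self.iy, self.iz = ix, iy, iz

    def random(self):
        """Return the next pseudo-random number in the interval [0, 1)."""
        self.ix = (171 * self.ix) % 30269
        self.iy = (172 * self.iy) % 30307
        self.iz = (170 * self.iz) % 30323
        total = self.ix / 30269.0 + self.iy / 30307.0 + self.iz / 30323.0
        return total % 1.0

# Seeds taken from the last record of the sample control file.
generator = WichmannHill(31171, 2413, 19771)
draws = [generator.random() for _ in range(5)]
print(draws)

# The same seeds reproduce the same stream, which is why the seeds should be
# changed from one run to the next.
repeat = WichmannHill(31171, 2413, 19771)
print([repeat.random() for _ in range(5)] == draws)
```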


(B) The Locus File:  The locus file contains information describing the genetic 
loci involved in the power calculation. This includes one trait locus and either 
one or two marker loci. The sample locus file below includes a trait locus and 
two markers, and could be used for a linkage power calculation based on location 
scores.

TRAIT   AUTOSOME 2 3
d       .99
D       .01
1.       1
d/d
2.       2
d/d
D/d
3.       1
D/d
MARKER1 AUTOSOME 2 3
1       .50
2       .50
11       1
1/1
12       1
1/2
22       1
2/2
ABO     AUTOSOME 3 4
A       .26
B       .06
O       .68
A        2
A/A
A/O
B        2
B/B
B/O
AB       1
A/B
O        1
O/O

The trait locus has autosomal dominant inheritance with reduced penetrance; the 
specific penetrance functions are described in the control file.  Because the D 
allele is relatively rare, the D/D genotype is assumed impossible, and  
unaffected spouses in the pedigree file (see below) will be assumed not at risk
(phenotype 1.).  While these assumptions are not exactly true, they are 
reasonably accurate, and they result in a much simplified power calculation.  We 
strongly recommend the use of such assumptions whenever possible.  It is 
important to remember that this is a power calculation; approximate answers 
should be quite satisfactory.  Note:  excluding either homozygous genotype is 
not appropriate for an X-linked trait, since hemizygous males are assumed by 
MENDEL to be homozygous for their allele.

The first marker in the locus file is a two allele codominant marker with equal 
allele frequencies (note, allele names can be characters, including numbers).  
Given no prior interest in a particular marker, we generally use such a 
codominant marker as a compromise along the broad continuum between infinitely
polymorphic "magic markers" at one extreme and two allele polymorphisms with one 
rare allele at the other extreme.  The second marker is the ABO locus, and 
demonstrates how dominance relationships are dealt with when all genotypes are 
allowed for.

Inspection of this example shows that data on the loci are provided one locus 
at a time with the following records (also see Examples and Lange et al., 1988):

1. Trait locus general information:  the following four variables in (2A8,2I2) 
format, the two integer variables right justified:

Col  1- 8  the name of the trait locus,
Col  9-16  the chromosomal type of the trait locus:
           =AUTOSOME, if the trait locus is autosomal,
           =X-LINKED, if the trait locus is X-linked.
Col 17-18  number of alleles at the trait locus (must be 2),
Col 19-20  number of trait phenotypes (by convention, this must
           be 3 for a dichotomous trait (see below) or 0 for a
           quantitative trait).

2. Trait allele information:  for each allele, a record with the following two 
variables in (A8,F8.5) format:

Col  1- 8  trait allele name,
Col  9-16  trait allele frequency.

Note:  Allele frequencies should sum to 1.0.

For each trait phenotype, enter record 3 below once and record 4 below once for 
each trait genotype that corresponds to the particular trait phenotype.

For dichotomous traits, three trait phenotypes are possible: 1.=normal and not 
at risk of becoming affected; 2.=normal and at risk of becoming affected; 
3.=affected.  Using the not at risk phenotype 1. when possible (for example, for 
spouses who marry into the pedigree for a relatively rare trait) can result in
substantial computational savings since it will usually correspond to fewer 
possible trait genotypes than the at risk phenotype (2.).

For quantitative traits, by convention, zero trait phenotypes are possible.

Note:  The dichotomous trait phenotypes must be 1., 2., or 3., in that order, 
and the trailing decimal points are required.

3. Trait phenotype information (dichotomous traits only):  the following two 
variables in a record in (A8,I2) format, the integer variable right justified:

Col  1- 8  trait phenotype name:  1., 2., or 3. (in that order)
Col  9-10  number of trait genotypes associated with this
           trait phenotype.

4. Trait phenotype/genotype correspondence (dichotomous traits): following each 
trait phenotype record, list the trait genotypes corresponding to that 
phenotype, one record per genotype, each genotype in (A17) format.  Each 
genotype is denoted by its two allele names separated by a slash (/).  The slash 
character should not be part of an allele name.

Note:  For an X-linked trait, no special symbols are required for males.  If a 
listed phenotype is appropriate for both females and males, only the associated 
homozygous genotypes will be assigned to a male with the phenotype.  Internally, 
the program identifies hemizygous genotypes with the corresponding homozygous 
genotypes.

Data on the marker loci are provided one locus at a time with the following 
records 5-8 required for each marker locus.

5. Marker locus general information:  the following four variables in (2A8,2I2) 
format, the two integer variables right justified:

Col  1- 8  the marker locus name,
Col  9-16  the chromosomal type of the marker locus:
           =AUTOSOME, if the marker locus is autosomal,
           =X-LINKED, if the marker locus is X-linked,
Col 17-18  number of alleles at the marker locus,
Col 19-20  number of phenotypes at the marker locus.

Note:  Lod/location score calculation time can increase rapidly as a function of 
the number of marker alleles.  Given more alleles, attendant array sizes may 
also become too large, particularly on microcomputers.

6. Marker allele information:  for each allele, a record with the following two 
variables in (A8,F8.5) format:

Col  1- 8  marker allele name,
Col  9-16  marker allele frequency.

Note:  Allele frequencies should sum to 1.0.
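
As an illustration of record 6, the following Python sketch slices allele 
records at the (A8,F8.5) column boundaries and checks that the frequencies sum 
to 1.0.  It is not part of SIMLINK:  the function names and the tolerance are 
assumptions made for this example, and the sketch assumes that every frequency 
is typed with an explicit decimal point.

```python
def parse_allele_record(line):
    """Split one record into (name, frequency) using (A8,F8.5) columns."""
    name = line[0:8].strip()    # Col 1-8: marker allele name
    freq = float(line[8:16])    # Col 9-16: marker allele frequency
    return name, freq


def check_allele_frequencies(lines, tol=1e-5):
    """Parse all allele records; raise ValueError unless they sum to 1.0."""
    alleles = [parse_allele_record(line) for line in lines]
    total = sum(freq for _, freq in alleles)
    if abs(total - 1.0) > tol:
        raise ValueError("allele frequencies sum to %.5f, not 1.0" % total)
    return alleles


# Two allele records of the kind used in the sample locus files.
alleles = check_allele_frequencies(["A       .50", "B       .50"])
print(alleles)
```

A check of this kind can be run on a locus file before starting a long 
simulation.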

For each phenotype for the current marker, enter record 7 below once and record 
8 below once for each marker genotype that corresponds to the particular marker 
phenotype.

7. Marker phenotype information:  the following two variables in a record in 
(A8,I2) format, the integer variable right justified:

Col  1- 8  marker phenotype name,
Col  9-10  number of marker genotypes associated with this
           marker phenotype.

8. Marker phenotype/genotype correspondence:  following each marker phenotype 
record, list the marker genotypes associated with the marker phenotype in one 
record per marker genotype, each genotype in (A17) format.  Each marker genotype 
is denoted by its two allele names separated by a slash (/).  The slash 
character should not be part of an allele name.

Note:  For an X-linked marker, no special symbols are required for males.  If a 
listed phenotype is appropriate for both females and males, only the associated 
homozygous genotypes will be assigned to a male with the phenotype.  Internally, 
the program identifies hemizygous genotypes with the corresponding homozygous 
genotypes.

9. End-of-file symbol.  The locus file must end with one and only one end-of-
file symbol.  THIS IS CRITICAL!!  On some computers and with some word 
processors, an end-of-file symbol is added automatically, and the symbol is 
invisible.  On other computers there is a visible or partially visible symbol.  
All FORTRAN 77 compilers have an ENDFILE command if it is necessary to produce 
the end-of-file symbol.

(C) The Pedigree File:  The pedigree file contains information describing the 
pedigrees identified for use in the power calculation.  The sample pedigree file 
below includes two pedigrees of ten and six individuals, respectively.

(I3,1X,A8)
(3(A3,1X),2A1,A2,T15,A2,A3,A4)
 10 FAMILY1
  1         M 3. 1. 80.
  2         F 1. 1. 70.
  3   1   2 F 3. 1. 80.
  4   1   2 M 1. 1. 80.
  5   8   9 F 3. 1. 80.
  6   4   5 M 1. 1. 80.
  7   4   5 M 1. 1. 85.
  8         M 3. 1. 80.
  9         F 1. 1. 75.
 10   8   9 F 3. 1. 50.
  6 FAMILY2
  1   5   6 M 3. 1. 80.
  2         F 1. 1. 70.
  3   1   2 F 3. 1. 80.
  4   1   2 M 3. 1. 80.
  5         M 3. 1. 80.
  6         F 1. 1. 80.

In the pedigree file, two format statements are followed by information on each 
pedigree, one pedigree at a time.  Pedigree information includes a pedigree 
description record, followed by a record for each pedigree member.  The 
following records in the given order and with variables and formats as described 
below are required in the pedigree file (see Examples and Lange et al., 1988):

1. Pedigree record format statement:  This FORTRAN format statement is used to 
read the pedigree description records.  It should consist of an integer format 
for reading the number of individuals in a pedigree and a character format 
(maximum of eight characters) for reading the pedigree ID.  For example, 
(I3,1X,A8).

2. Individual record format statement:  This FORTRAN format statement is used to 
read the individual records.  Each individual record consists of an ID, parents' 
IDs, gender, MZ-twin status, trait phenotype for the first time (in character 
format corresponding exactly to what appears in the locus file for a dichotomous 
trait, or a blank field if this is for a quantitative trait), trait phenotype 
again (present for both dichotomous and quantitative traits), the observable 
phenotype indicator, and penetrance variable (such as age).  In order to read a 
dichotomous trait phenotype a second time, a tab (T) can be used to reread the 
previous field; two different fields must be read for quantitative trait data 
(see below).  All items or fields on an individual record should be read in 
character format (A) and each should consist of eight characters or fewer. This 
includes the quantitative variables (trait phenotype, observable phenotype 
indicator, and penetrance variable), for which decimal points are mandatory.  
For example, (3(A3,1X),2A1,A2,T15,A2,A3,A4).
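
To make the column layout implied by the example format concrete, the Python 
sketch below slices one individual record the way 
(3(A3,1X),2A1,A2,T15,A2,A3,A4) would read it, including the tab (T15) that 
rereads the trait phenotype.  The function and key names are hypothetical and 
are not part of SIMLINK; the sketch only mirrors this one format.

```python
def parse_individual(line):
    """Slice a record as (3(A3,1X),2A1,A2,T15,A2,A3,A4) would read it."""
    line = line.ljust(23)                    # pad short lines with blanks
    return {
        "id":          line[0:3].strip(),    # A3, cols 1-3 (1X skips col 4)
        "parent1":     line[4:7].strip(),    # A3, cols 5-7 (1X skips col 8)
        "parent2":     line[8:11].strip(),   # A3, cols 9-11 (1X skips col 12)
        "gender":      line[12:13].strip(),  # A1, col 13
        "mz_twin":     line[13:14].strip(),  # A1, col 14 (left blank)
        "trait":       line[14:16].strip(),  # A2, cols 15-16
        "trait_again": line[14:16].strip(),  # T15 returns to col 15; A2 rereads
        "indicator":   line[16:19].strip(),  # A3, cols 17-19
        "pen_var":     line[19:23].strip(),  # A4, cols 20-23
    }


# Individual 3 of FAMILY1 in the sample pedigree file.
rec = parse_individual("  3   1   2 F 3. 1. 80.")
print(rec)
```

Blank parent fields come back as empty strings, which corresponds to parents 
who are not in the pedigree.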

3. Pedigree information.  This record is present once for each pedigree.  Enter 
the following two variables in the format specified in record 1.
Field 1:  the number of individuals in the pedigree (right
          justified),
Field 2:  the pedigree ID (optional).

4. Individual data.  This record is present once for each pedigree member.  For 
each pedigree member, input the following variables in the format specified in 
record 2.

Field 1:  Individual's ID,
Field 2:  ID of one of his/her parents, blank if the parent is
          not in the pedigree,
Field 3:  ID of the other parent, blank if the parent is not in
          the pedigree,
Field 4:  Individual's gender, using symbols specified in the
          control file (for example, M or F, 1 or 2),
Field 5:  MZ-twin status, must be left blank since SIMLINK does
          not allow for MZ twins,
Field 6:  Individual's trait phenotype (see note below for
          quantitative traits),
Field 7:  Individual's trait phenotype again,
Field 8:  Indicator of the availability of the individual's
          phenotypes if a linkage study is carried out.
          =0. if marker phenotypes should not be simulated, and
              the trait phenotype should be left as specified in
              the pedigree file;
          =1. if marker phenotypes should be simulated, and a
              trait phenotype should be simulated if not listed
              in the pedigree file;
          =2. if marker phenotypes should be simulated, and the
              trait phenotype should be left as specified in the
              pedigree file;
          =3. if marker phenotypes should not be simulated, and
              the trait phenotype should be simulated if not
              listed in the pedigree file.

Note:  These last two options were not available in earlier versions of SIMLINK.

Field 9:  penetrance function variable, for example age.

Note 1:  Individual IDs must be unique within pedigrees.

Note 2:  Either both parents or neither parent of a person must be listed in a 
pedigree.

Note 3:  Missing values for any field must be represented by blanks.

Note 4:  For a dichotomous trait, the trait phenotype is read twice for each 
individual.  This can be done either by having two identical input fields and 
reading them both, or having a single input field and reading it twice using a 
tab (T) in the format statement.  For a quantitative trait, there must be two 
separate trait phenotype fields.  The first trait phenotype field must be left 
blank and the second trait phenotype field must contain the quantitative trait 
phenotype.  This approach to input makes it possible to use the same program for 
both dichotomous and quantitative traits.  Our apologies for any confusion it 
may cause.
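
Notes 1 and 2 can be checked mechanically before a pedigree file is submitted.  
The Python sketch below is one possible check, not SIMLINK code; the function 
name and the (ID, parent, parent) tuple layout are assumptions made for 
illustration, with blank parent fields represented by empty strings.

```python
def check_pedigree(members):
    """Return a list of problems found under Notes 1 and 2.

    members is a list of (id, parent1, parent2) string tuples.
    """
    problems = []
    seen = set()
    for ind, p1, p2 in members:
        # Note 1: individual IDs must be unique within a pedigree.
        if ind in seen:
            problems.append("duplicate ID %s" % ind)
        seen.add(ind)
        # Note 2: either both parents or neither parent must be listed.
        if bool(p1) != bool(p2):
            problems.append("individual %s has only one parent listed" % ind)
    return problems


# FAMILY2 from the sample pedigree file above.
family2 = [("1", "5", "6"), ("2", "", ""), ("3", "1", "2"),
           ("4", "1", "2"), ("5", "", ""), ("6", "", "")]
print(check_pedigree(family2))
```

An empty list means that neither rule is violated.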

5. End-of-file symbol.  The pedigree file must end with one and only one end-of-
file symbol.  THIS IS CRITICAL!!  On some computers and with some word 
processors, this is done automatically, and the symbol is invisible.  On other 
computers there is a visible or partially visible symbol.  All FORTRAN 77 
compilers have an ENDFILE command if it is necessary to produce the end-of-file 
symbol.


VII. Output from SIMLINK

The output from SIMLINK takes the form of up to seven tables, depending on the 
analyses carried out.  Maximum lod/location scores for each replicate of each 
pedigree are estimated by quadratic interpolation over the lod/location score 
values calculated at the test recombination fractions/map distances.
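
The idea of quadratic interpolation can be sketched as follows in Python.  
This section does not spell out the exact scheme SIMLINK uses, so the code 
assumes one common choice:  fit a parabola through the grid point with the 
largest score and its two neighbors, and report the vertex.  The function name 
and the sample scores are hypothetical.

```python
def interpolated_max(thetas, lods):
    """Estimate the location and height of the maximum of a score curve."""
    i = max(range(len(lods)), key=lods.__getitem__)
    if i == 0 or i == len(lods) - 1:
        return thetas[i], lods[i]          # maximum on the edge of the grid
    x0, x1, x2 = thetas[i - 1:i + 2]
    y0, y1, y2 = lods[i - 1:i + 2]
    # Newton divided differences of the parabola through the three points:
    # p(x) = y0 + d1*(x - x0) + d2*(x - x0)*(x - x1)
    d1 = (y1 - y0) / (x1 - x0)
    d2 = ((y2 - y1) / (x2 - x1) - d1) / (x2 - x0)
    if d2 >= 0.0:
        return x1, y1                      # no interior peak to interpolate
    xv = (x0 + x1) / 2.0 - d1 / (2.0 * d2)   # where p'(x) = 0
    yv = y0 + d1 * (xv - x0) + d2 * (xv - x0) * (xv - x1)
    return xv, yv


# Hypothetical lod scores at four test recombination fractions.
theta_hat, max_lod = interpolated_max([0.0, 0.1, 0.2, 0.3],
                                      [0.62, 1.18, 0.94, -0.10])
print(theta_hat, max_lod)
```

The interpolated maximum is never smaller than the largest score on the grid, 
which is why a finer grid of test values is not always needed.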

Table 1. Summary of Information Used in the Simulation.

Table 1 summarizes the information used in the simulation.  This includes the 
trait locus name, the number of pedigree replicates simulated, true 
recombination fractions/map distances, and the test recombination fractions/map 
distances used.

Tables 2 and 3 give estimates of the mean maximum lod/location score and the 
probabilities of maximum lod/location scores greater than specified constants 
for each of the true recombination fractions/map distances.  These estimates are 
given for each pedigree separately (listed under 1, 2, and so forth), for the 
pedigrees combined assuming genetic homogeneity (under SUMMED), for the 
pedigrees combined allowing for between-pedigree heterogeneity (under SUMMEDH) 
(optional), and for any one pedigree over all the available pedigrees (under 
ANY).

The values for a specific pedigree give estimates of the expected information 
provided by that pedigree.  The values for the summed pedigrees estimate the 
expected information provided by pooling the data.  Pooling the data in this way 
assumes that the trait is caused by a single genetic locus, that is, there is no
heterogeneity.  The values for the summed pedigrees allowing for heterogeneity 
estimate the expected information provided by pooling the data while explicitly 
allowing for heterogeneity. The values under ANY correspond to the information 
provided when an analysis is carried out under the assumption of genetic 
heterogeneity, and information from different pedigrees is not pooled, but the 
trait is actually homogeneous. 

Table 2. Estimated Mean Maximum Lod/Location Score for a Marker (Pair).

This table lists the estimated mean maximum lod/location score, its standard 
error, and the maximum maximum-lod/location-score among all replicates for each 
pedigree, for the summed pedigrees assuming homogeneity, for the summed 
pedigrees allowing for between-pedigree heterogeneity (optional), and for any of 
the pedigrees.  These estimates are reported for each of the true recombination 
fractions/map distances.

Note:  Since the maximum of the sum is usually less than the sum of the maxima, 
the expected maximum summed lod/location score (for all pedigrees combined) will 
usually be less than the sum of the expected maximum lod/location scores for the 
individual pedigrees.
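
A small numerical illustration of this note, written in Python with made-up 
lod scores for two pedigrees evaluated on the same grid of test recombination 
fractions (grid maxima are used here instead of interpolated maxima to keep 
the sketch short):

```python
# Hypothetical lod scores at test recombination fractions 0.0, 0.1, 0.2, 0.3.
ped1 = [-1.5, 0.8, 0.6, 0.3]   # pedigree 1 peaks at 0.1
ped2 = [0.9, 0.5, 0.2, 0.1]    # pedigree 2 peaks at 0.0

# Summed lod score curve for the two pedigrees.
summed = [a + b for a, b in zip(ped1, ped2)]

sum_of_maxima = max(ped1) + max(ped2)   # each pedigree maximized separately
max_of_sum = max(summed)                # pooled data maximized once

print(sum_of_maxima, max_of_sum)
```

Because the two curves peak at different recombination fractions, the maximum 
of the summed curve (about 1.3) falls below the sum of the separate maxima 
(about 1.7).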

Table 3. Estimated Probabilities of Maximum Lod/Location Scores Greater than 
Specified Constants for a Linked Marker (Pair).

This table lists the estimates and standard errors of probabilities of maximum 
lod/location scores greater than 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for each 
pedigree, for the summed pedigrees assuming homogeneity, for the summed 
pedigrees allowing for heterogeneity (optional), and for any of the pedigrees.
These values are reported for each of the true recombination fractions/map 
distances.  For linked loci, estimates of the probabilities of maximum 
lod/location scores greater than 3.0 give estimates of the power of a proposed 
linkage study based on the corresponding data and the assumption of a linked 
marker or a pair of flanking markers at the given recombination fraction/map 
distance.  For unlinked loci, these same estimates give estimates of the 
probability of incorrectly inferring linkage to an unlinked marker or pair of 
markers.  In statistical terms, this estimates the probability "a" of making a 
type I error for a single analysis.  Since many markers will often be 
considered, the overall probability of making a type I error is greater. 
Assuming that the linkage calculations for the different marker (pairs) are 
independent, the overall probability of making a type I error becomes 
1-(1-a)**n, where n is the number of marker (pairs) and "**" represents 
exponentiation.
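
The overall type I error formula is easy to evaluate directly.  The Python 
sketch below does so; the function name and the values a=0.001 and n=100 are 
hypothetical and chosen only to show the size of the effect.

```python
def overall_type1_error(a, n):
    """Probability of at least one type I error among n independent analyses,
    each with single-analysis type I error probability a."""
    return 1.0 - (1.0 - a) ** n


# Hypothetical values: a = 0.001 per marker (pair), n = 100 marker (pairs).
overall = overall_type1_error(0.001, 100)
print(round(overall, 4))
```

With these values the overall probability is roughly 0.095, nearly 100 times 
the single-analysis value.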

Table 4. Estimated Probabilities of Maximum Location Scores Greater Than 
Specified Constants, Averaged Over the Interval Between the Two Marker Loci.

This table lists estimates of the average probability, when the trait locus is 
located somewhere between the two marker loci, of a maximum location score 
greater than constants 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for each pedigree, for 
the summed pedigrees assuming homogeneity, for the summed pedigrees allowing for
heterogeneity, and for any of the pedigrees.  Table 4 is omitted when 
simulating only one marker locus or if only a single location for the disease 
locus was chosen in the control file (see above).  See Boehnke (1986) for a 
method using two-point lod scores to calculate a lower bound on the information 
provided by flanking markers and location scores.
 
Tables 5 and 6 provide estimates of the expected lod/location score and 
probability of a lod/location score greater than specified constants when the 
marker (pair) is unlinked.  These tables differ from Tables 2 and 3 by reporting 
values for each test recombination fraction/map distance, rather than maximizing
over all test recombination fractions/map distances.  Tables 5 and 6 can be used 
to estimate the distance to each side of an unlinked marker (pair) that is 
likely to be excluded using the available pedigrees.  Tables 5 and 6 are 
included only if free recombination is simulated (that is, IFREE=1).

Table 5. Estimated Mean Lod/Location Score for an Unlinked Marker (Pair).

For each test recombination fraction/map distance, this table gives the estimate 
of the mean lod/location score, its standard error, and the sample maximum and 
minimum lod/location scores for each pedigree and for the summed pedigrees 
assuming homogeneity. In addition, an estimate of the test recombination 
fraction/map distance at which the mean lod/location score equals -2.0 is 
printed.  This estimate is based on quadratic interpolation of the lod/location 
score.  This recombination fraction/map distance gives an estimate of the 
expected exclusion distance when testing for linkage to an unlinked marker 
(pair).  If interpolation is not possible, asterisks are printed.

Table 6. Estimated Probabilities of Lod/Location Scores Greater than Specified 
Constants for an Unlinked Marker (Pair).

For each test recombination fraction/map distance, estimates and standard errors 
for the probabilities of lod/location scores greater than -2.0, -1.5, -1.0, ... 
, 2.5, and 3.0 are given. For each test recombination fraction/map distance, one 
minus the probability of a lod/location score greater than -2.0 gives an 
estimate of the probability that linkage will be excluded for at least that 
distance from an unlinked marker (pair).


VIII. Four Sample Problems

Input files for these examples are EXAMPLE*.CON, EXAMPLE*.LOC, and EXAMPLE*.PED; 
output files are EXAMPLE*.OUT (*=1,2,3,4). These files are all included on the 
diskette.  Before using SIMLINK for your own data, we strongly recommend running 
the test problems to verify that you are obtaining the same results.  The 
example input files should be helpful when you go to prepare input files for 
your own analyses.

Example 1:  Eight Pedigrees, Autosomal Dominant Trait with Piecewise Linear 
Penetrance Function

Each of the eight pedigrees in this example is identical to that described by 
Ploughman and Boehnke (1989).  Eight copies are used to achieve a moderate-sized 
power estimate for demonstration purposes.

Pedigrees 1 through 8 are segregating an autosomal dominant trait with complete 
penetrance by age 40.  Three pedigree members, numbered 4, 6, and 7, in each of 
the pedigrees, are unaffected, at risk, and below the age of 40.  The penetrance 
for these pedigree members is described by a piecewise linear function 
(PENOPT=1) which increases from 0 at age 0 to 1.0 at age 40 for trait genotypes 
DD and Dd, and is 0 at all ages for trait genotype dd.  The remaining pedigree 
members are either affected or unaffected and assumed not to be at risk.  The 
ages listed for these pedigree members are not needed by the penetrance 
function, and, hence, need not be correct (see pedigree file).
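
The piecewise linear penetrance described above can be sketched in Python as 
follows.  This is an illustration rather than SIMLINK's own code:  the function 
name is hypothetical, and holding the penetrance constant below the minimum age 
and above the maximum age is an assumption consistent with "complete penetrance 
by age 40".

```python
def piecewise_linear_penetrance(age, min_age, max_age, min_pen, max_pen):
    """Penetrance rising linearly from min_pen at min_age to max_pen at
    max_age, and held constant outside that age range (assumed)."""
    if age <= min_age:
        return min_pen
    if age >= max_age:
        return max_pen
    frac = (age - min_age) / (max_age - min_age)
    return min_pen + frac * (max_pen - min_pen)


# Pedigree members 4, 6, and 7 (ages 30, 10, and 5) with genotype Dd.
for age in (30.0, 10.0, 5.0):
    print(age, piecewise_linear_penetrance(age, 0.0, 40.0, 0.0, 1.0))
```

For genotype dd both penetrance limits are 0, so the function returns 0 at 
every age.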

Only 20 replicates are simulated in this example, so that it can be used to 
quickly check that the program is producing the same results as are given in 
EXAMPLE1.OUT.

Control file:  EXAMPLE1.CON

Column numbers are provided for easy reference; they are not part of the input 
file.

         1         2         3         4         5         6
1234567890123456789012345678901234567890123456789012345678901234

      20       1       1       1       4       1       0       0
    0.00    0.10    0.20    0.50             2. True rec. frac.
     0.0    40.0     0.0     1.0             3. for males, DD
     0.0    40.0     0.0     1.0                for males, Dd
     0.0    40.0     0.0     0.0                for males, dd
     0.0    40.0     0.0     1.0                for females, DD
     0.0    40.0     0.0     1.0                for females, Dd
     0.0    40.0     0.0     0.0                for females, dd
M       F                          4. male and female symbols
AUTODOM                            5. trait locus name
EXAMPLE1.LOC                       6. locus file name
EXAMPLE1.PED                       7. pedigree file name
    3791    3271     313    8. seeds for random number generator

1. The control line states that 20 replicates will be simulated for each 
pedigree (NREP=20), 1 marker locus will be simulated (NMLOCI=1), the penetrance 
function is piecewise linear (PENOPT=1), free recombination will be simulated 
(IFREE=1), 4 true recombination fractions will be considered (NTHETA=4), echo
the data (IECHO=1), do not examine the effects of individual 
heterozygosity/homozygosity status (INDINF=0), and assume the trait is 
homogeneous (LNKOPT=0).  Since LNKOPT=0, SIMLINK assumes the linked fraction 
alpha is 1.

2. Linked marker phenotypes will be simulated at the following true 
recombination fractions between the trait and marker loci: 0.00, 0.10, 0.20, and 
0.50.

3. These six records give the minimum age, maximum age, minimum penetrance, and 
maximum penetrance of the piecewise linear penetrance function for each 
possible trait genotype/gender combination.

4. The male and female symbols used in the pedigree file are M and F.

5. The trait locus name is AUTODOM in the locus file.

6. The locus file name is EXAMPLE1.LOC, chosen to make clear the contents of the 
file.

7. The pedigree file name is EXAMPLE1.PED, chosen to make clear the contents of 
the file.

8. These three values are chosen as seeds for the random number generator.  If 
the same values are used in a later run, the same results will be obtained.  If 
they are changed, the results will change too.
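
For checking a control file by eye, it can help to label the eight values on 
the control line.  The Python sketch below attaches the option names used in 
item 1 above to the values from EXAMPLE1.CON.  It splits on blanks as a 
simplification (SIMLINK itself reads fixed columns), and the function name is 
hypothetical.

```python
CONTROL_NAMES = ("NREP", "NMLOCI", "PENOPT", "IFREE",
                 "NTHETA", "IECHO", "INDINF", "LNKOPT")


def parse_control_line(line):
    """Label the eight integers on the control line, in the order of item 1."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(CONTROL_NAMES):
        raise ValueError("expected %d values, found %d"
                         % (len(CONTROL_NAMES), len(values)))
    return dict(zip(CONTROL_NAMES, values))


opts = parse_control_line(
    "      20       1       1       1       4       1       0       0")
for name in CONTROL_NAMES:
    print(name, opts[name])
```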


Locus file:  EXAMPLE1.LOC

Column numbers are provided for easy reference; they are not part of the input 
file.

         1         2
12345678901234567890123456789  Comments:

AUTODOM AUTOSOME 2 3           1. Trait locus information
D       .01                    2. Trait allele information
d       .99
1.       1                     3. Trait phenotype information
d/d                            4. Pheno/geno correspondence
2.       2                     3. Trait phenotype information
D/d                            4. Pheno/geno correspondence
d/d                            4. Pheno/geno correspondence
3.       1                     3. Trait phenotype information
D/d                            4. Pheno/geno correspondence
MARKER1 AUTOSOME 2 3           5. Marker locus information
A       .50                    6. Marker allele information
B       .50
AA       1                     7. Marker phenotype information
A/A                            8. Pheno/geno correspondence
AB       1                     7. Marker phenotype information
A/B                            8. Pheno/geno correspondence
BB       1                     7. Marker phenotype information
B/B                            8. Pheno/geno correspondence

1. The trait locus name is AUTODOM; it is autosomal, has 2 alleles, and 3 
phenotypes.

2. The 2 trait alleles are the dominant disease-susceptibility allele D, with 
allele frequency 0.01, and the recessive allele d, with allele frequency 0.99.

3., 4. There are 3 trait phenotypes:  phenotype 1. has 1 associated genotype, 
d/d, phenotype 2. has 2 associated genotypes, D/d and d/d, and phenotype 3. has 
1 associated genotype, D/d.  Because it is so rare, genotype D/D has been 
omitted from this analysis, reducing the amount of computation time 
substantially.  We strongly recommend this approach whenever feasible.  Note:  
Homozygous genotypes should not be eliminated if the trait locus is X-linked.

5. The marker locus name is MARKER1; it is autosomal, has 2 alleles, and 3 
phenotypes.

6. The 2 marker alleles are A and B, each with allele frequency 0.50.

7., 8. There are 3 marker phenotypes:  phenotype AA has 1 associated genotype, 
A/A, phenotype AB has 1 associated genotype, A/B, and phenotype BB has 1 
associated genotype, B/B, so that the marker is codominant.


Pedigree file:  EXAMPLE1.PED

Column numbers are provided for easy reference; they are not part of the input 
file.

         1         2
12345678901234567890123456789  Comments:

(I3,1X,A8)                     1. Pedigree record format
(3(A3,1X),2A1,A2,T15,A2,A3,A4) 2. Individual record format
 10 FAMILY 1                   3. Pedigree information
  1         M 3. 1. 80.        4. Individual data
  2         F 1. 1. 70.
  3   1   2 F 3. 1. 80.
  4   1   2 M 2. 1. 30.
  5   8   9 F 3. 1. 80.
  6   4   5 M 2. 1. 10.
  7   4   5 M 2. 1.  5.
  8         M 3. 1. 80.
  9         F 1. 1. 75.
 10   8   9 F 1. 1. 50.
 10 FAMILY 2                   3. Pedigree information
  1         M 3. 1. 80.        4. Individual data
  2         F 1. 1. 70.
  3   1   2 F 3. 1. 80.
  4   1   2 M 2. 1. 30.
  5   8   9 F 3. 1. 80.
  6   4   5 M 2. 1. 10.
  7   4   5 M 2. 1.  5.
  8         M 3. 1. 80.
  9         F 1. 1. 75.
 10   8   9 F 1. 1. 50.

          .
          .
          .

 10 FAMILY 8                   3. Pedigree information
  1         M 3. 1. 80.        4. Individual data
  2         F 1. 1. 70.
  3   1   2 F 3. 1. 80.
  4   1   2 M 2. 1. 30.
  5   8   9 F 3. 1. 80.
  6   4   5 M 2. 1. 10.
  7   4   5 M 2. 1.  5.
  8         M 3. 1. 80.
  9         F 1. 1. 75.
 10   8   9 F 1. 1. 50.

1. Each pedigree record, consisting of the number of individuals in a pedigree 
and the pedigree ID (optional), will be read in format (I3,1X,A8).

2. Each individual record, consisting of an ID, parents' IDs, gender, MZ-twin 
status (blank), trait phenotype, trait phenotype again (by tabbing to the 
previous field), the observable marker phenotype indicator, and age, will be 
read in format (3(A3,1X),2A1,A2,T15,A2,A3,A4).

3. There are ten individuals in each of the eight pedigrees.  The pedigree IDs 
are FAMILY 1, FAMILY 2, ..., and FAMILY 8.

4. Each individual record gives his/her ID, the IDs of both of his/her parents, 
his/her gender (using the symbols M and F as specified in the control file), a 
blank field for MZ-twin status, his/her trait phenotype, a 1. indicating that 
his/her marker phenotype should be simulated, and his/her age.


Example 2:  Two Pedigrees, Autosomal Dominant Trait with Cumulative Normal 
Penetrance Function

Pedigrees 1 and 2 are segregating a heterogeneous autosomal dominant trait with 
complete penetrance by age 40.  In pedigree 1, individuals 32, 35, 39, and 40 
are unaffected, at risk, and below the age of 40; likewise, in pedigree 2, 
individuals 30, 33, 36, and 38 are unaffected, at risk, and below the age of 40.  
The penetrance for these individuals is described by a cumulative normal 
function (PENOPT=2) with a mean age of 10.0, a standard deviation of 4.0, a 
minimum penetrance of 0.0, and a maximum penetrance of 1.0 for trait genotypes 
DD and Dd.  The penetrance is 0.0 at all ages for trait genotype dd.  The 
remaining pedigree members are either affected or unaffected and not at risk.  
The linked fraction of pedigrees is assumed to be .80.  A related example is 
described by Boehnke (1986).
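
A cumulative normal penetrance function with these parameters can be sketched 
in Python as below.  The exact form used inside SIMLINK is not restated in this 
example, so the scaling of the normal distribution function between the minimum 
and maximum penetrance is an assumption, and the function name is hypothetical.

```python
import math


def cumulative_normal_penetrance(age, mean, sd, min_pen, max_pen):
    """Penetrance rising from min_pen toward max_pen along a normal
    cumulative distribution function in age (assumed form)."""
    z = (age - mean) / sd
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    return min_pen + (max_pen - min_pen) * phi


# At-risk members of pedigree 1 (ages 10, 5, 12, and 8) with genotype Dd.
for age in (10.0, 5.0, 12.0, 8.0):
    print(age, cumulative_normal_penetrance(age, 10.0, 4.0, 0.0, 1.0))
```

At the mean age of 10.0 this gives a penetrance of 0.5, and by age 40 the 
value is indistinguishable from 1.0.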


Control file:  EXAMPLE2.CON

     250       1       2       1       2       1       0       1
0.80
    0.05    0.50                             2. True rec. frac.
    10.0     4.0     0.0     1.0             3. for males, DD
    10.0     4.0     0.0     1.0                for males, Dd
     0.0     4.0     0.0     0.0                for males, dd
    10.0     4.0     0.0     1.0                for females, DD
    10.0     4.0     0.0     1.0                for females, Dd
     0.0     4.0     0.0     0.0                for females, dd
1       2                          4. male and female symbols
AUTODOM                            5. trait locus name
EXAMPLE2.LOC                       6. locus file name
EXAMPLE2.PED                       7. pedigree file name
    3191     371   21713     8. seeds for random number generator


Locus file:  EXAMPLE2.LOC

AUTODOM AUTOSOME 2 3           1. Trait locus information
D       .01                    2. Trait allele information
d       .99
1.       1                     3. Trait phenotype information
d/d                            4. Pheno/geno correspondence
2.       2                     3. Trait phenotype information
D/d                            4. Pheno/geno correspondence
d/d                            4. Pheno/geno correspondence
3.       1                     3. Trait phenotype information
D/d                            4. Pheno/geno correspondence
MARKER1 AUTOSOME 2 3           5. Marker locus information
A       .50                    6. Marker allele information
B       .50
AA       1                     7. Marker phenotype information
A/A                            8. Pheno/geno correspondence
AB       1                     7. Marker phenotype information
A/B                            8. Pheno/geno correspondence
BB       1                     7. Marker phenotype information
B/B                            8. Pheno/geno correspondence


Pedigree file:  EXAMPLE2.PED

(I2,1X,A8)                     1. Pedigree record format
(3(A3,1X),2A1,A3,T15,3A3)      2. Individual record format
40 FAMILY 1                    3. Pedigree information
  1         1 3. 0. 80.        4. Individual data
  2         2 1. 0. 80.
  3         2 1. 0. 80.
  4   1   2 1 3. 0. 80.
  5   1   2 1 3. 0. 80.
  6         2 1. 1. 80.
  7         2 1. 1. 80.
  8   3   4 1 3. 1. 80.
  9   3   4 2 1. 1. 80.
 10   3   4 1 3. 1. 80.
 11         2 1. 1. 80.
 12         2 1. 1. 80.
 13   5   6 1 3. 1. 80.
 14   5   6 1 3. 1. 80.
 15         2 1. 1. 80.
 16   5   6 2 1. 1. 80.
 17   5   6 2 3. 1. 80.
 18         1 1. 1. 80.
 19   5   6 1 1. 1. 80.
 20         1 1. 1. 80.
 21   7   8 2 3. 1. 80.
 22   7   8 1 1. 1. 80.
 23   7   8 1 1. 1. 80.
 24   7   8 1 3. 1. 80.
 25  10  11 1 1. 1. 80.
 26  10  11 2 1. 1. 80.
 27  10  11 1 3. 1. 80.
 28  12  13 2 1. 1. 80.
 29  12  13 2 3. 1. 80.
 30  12  13 2 1. 1. 80.
 31  14  15 2 1. 1. 80.
 32  14  15 2 2. 1. 10.
 33  14  15 1 3. 1. 80.
 34  17  18 2 1. 1. 80.
 35  17  18 2 2. 1.  5.
 36  17  18 2 3. 1. 80.
 37  17  18 1 1. 1. 80.
 38  20  21 2 1. 1. 80.
 39  20  21 2 2. 1. 12.
 40  20  21 2 2. 1.  8.
38 FAMILY 2                    3. Pedigree information
  1         1 3. 0. 80.        4. Individual data
  2         2 1. 0. 80.
  3         1 1. 1. 80.
  4   1   2 2 3. 0. 80.
  5   1   2 2 3. 1. 80.
  6   1   2 2 1. 1. 80.
  7         1 1. 1. 80.
  8   3   4 2 3. 1. 80.
  9   3   4 2 1. 1. 80.
 10   3   4 1 3. 1. 80.
 11         2 1. 1. 80.
 12         1 1. 1. 80.
 13   7   8 2 3. 1. 80.
 14         1 1. 1. 80.
 15   7   8 2 3. 1. 80.
 16   7   8 2 3. 1. 80.
 17         1 1. 1. 80.
 18  10  11 2 1. 1. 80.
 19  10  11 1 3. 1. 80.
 20         2 1. 1. 80.
 21  12  13 1 1. 1. 80.
 22  12  13 1 1. 1. 80.
 23  14  15 2 1. 1. 80.
 24         2 1. 1. 80.
 25  16  17 1 3. 1. 80.
 26  16  17 2 3. 1. 80.
 27         1 1. 1. 80.
 28  16  17 1 3. 1. 80.
 29  16  17 1 3. 1. 80.
 30  16  17 1 2. 1. 17.
 31  19  20 1 3. 1. 80.
 32  19  20 2 3. 1. 80.
 33  19  20 1 2. 1. 13.
 34  24  25 1 1. 1. 80.
 35  24  25 1 3. 1. 80.
 36  26  27 2 2. 1.  8.
 37  26  27 1 1. 1. 80.
 38  26  27 2 2. 1. 10.


Example 3:   Three Pedigrees, X-linked Recessive Trait with Two Flanking Marker 
Loci

The rare, X-linked recessive trait segregating in these pedigrees is Becker 
Muscular Dystrophy.  The pedigrees BD28, BD78, and BD9 were taken from Brown et 
al. (1985) with some modification of ages.  Although this trait has age-
dependent penetrance, usually appearing in the 20s, since all unaffecteds in the 
line of descent of the trait are beyond the typical range of onset ages, 
assuming complete penetrance is reasonable for a power calculation and will save 
computation time.  Therefore, the piecewise linear penetrance function used in 
the analysis has complete penetrance for individuals with trait genotype dd and
0.0 penetrance for individuals with trait genotype DD or Dd.  Two flanking 
marker loci with a true map distance of 10 cM between them were used in the 
simulation.


Control file:  EXAMPLE3.CON

     250       2       1       1       1       1       1       0
    0.10       1                      2. True map dist., dist. option
     0.0    40.0     1.0     1.0      3. for males, dd
     0.0    40.0     0.0     0.0         for males, Dd
     0.0    40.0     0.0     0.0         for males, DD
     0.0    40.0     1.0     1.0         for females, dd
     0.0    40.0     0.0     0.0         for females, Dd
     0.0    40.0     0.0     0.0         for females, DD
M       F                          4. male and female symbols
XREC                               5. trait locus name
EXAMPLE3.LOC                       6. locus file name
EXAMPLE3.PED                       7. pedigree file name
    2791    3903    1313           8. seeds for random numbers


Locus file:  EXAMPLE3.LOC

XREC    X-LINKED 2 3           1. Trait locus information
d       .0001                  2. Trait allele information
D       .9999
1.       2                     3. Trait phenotype information
D/D                            4. Pheno/geno correspondence
D/d
2.       3                     3. Trait phenotype information
D/D                            4. Pheno/geno correspondence
D/d
d/d
3.       1                     3. Trait phenotype information
d/d                            4. Pheno/geno correspondence
MARKER1 X-LINKED 2 3           5. Marker locus information
A       .50                    6. Marker allele information
B       .50
AA       1                     7. Marker phenotype information
A/A                            8. Pheno/geno correspondence
AB       1                     7. Marker phenotype information
A/B                            8. Pheno/geno correspondence
BB       1                     7. Marker phenotype information
B/B                            8. Pheno/geno correspondence
MARKER2 X-LINKED 2 3           5. Marker locus information
Y       .50                    6. Marker allele information
Z       .50
YY       1                     7. Marker phenotype information
Y/Y                            8. Pheno/geno correspondence
YZ       1                     7. Marker phenotype information
Y/Z                            8. Pheno/geno correspondence
ZZ       1                     7. Marker phenotype information
Z/Z                            8. Pheno/geno correspondence

Note:  The genotypes DD and dd must be included in this X-linked example so that 
the male hemizygous genotypes will be allowed for by MENDEL.

Pedigree file:  EXAMPLE3.PED

(I3,1X,A8)                     1. Pedigree record format
(3(A3,1X),2A1,A2,T15,A2,A3,A4) 2. Individual record format
 10 BD28                       3. Pedigree information
  1         M 1. 0. 80.        4. Individual data
  2         F 1. 0. 80.
  3         M 1. 1. 80.
  4   1   2 F 1. 1. 80.
  5   1   2 M 3. 0. 80.
  6         F 1. 1. 80.
  7   1   2 M 3. 1. 80.
  8   3   4 M 3. 1. 80.
  9   5   6 M 1. 1. 80.
 10   5   6 F 1. 1. 80.
  7 BD78                       3. Pedigree information
  1         M 1. 1. 90.        4. Individual data
  2         F 1. 1. 85.
  3         M 1. 1. 65.
  4   1   2 F 1. 1. 60.
  5   1   2 M 3. 0. 60.
  6   1   2 M 1. 1. 60.
  7   3   4 M 3. 1. 33.
 12 BD9                        3. Pedigree information
  1         M 1. 0. 90.        4. Individual data
  2         F 1. 0. 90.
  3         M 1. 1. 90.
  4   1   2 F 1. 1. 90.
  5   1   2 M 1. 1. 90.
  6   3   4 M 3. 1. 62.
  7   3   4 M 3. 1. 64.
  8   3   4 M 3. 1. 66.
  9   3   4 F 1. 1. 63.
 10         M 1. 1. 66.
 11   9  10 M 3. 1. 36.
 12   9  10 M 3. 1. 40.


Example 4:  One Pedigree with an Autosomal Dominant Quantitative Trait

The large nuclear family in this example is segregating an autosomal major locus 
for a quantitative trait.  The mean trait value for an individual with the DD or 
Dd trait genotype is 10.0 plus 0.10 times the age of the individual; the 
standard deviation is 1.0.  The mean trait value for an individual with the dd 
trait genotype is 5.0 and is not a function of age; the standard deviation is 
also 1.0.
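
The trait model just described can be checked by hand for any pedigree member.  
The sketch below is illustrative only and is not part of SIMLINK.  It computes 
the expected trait value under each genotype and the number of standard 
deviations separating an observed value from that expectation.  The names 
PARAMS, expected_mean, and z_score are hypothetical.  The parameter values are 
the first three columns of the Example 4 control file; the fourth column is not 
used here, and because the male and female values are identical a single table 
is kept.

```python
# (intercept, age coefficient, standard deviation) per trait genotype,
# taken from the first three columns of the Example 4 control file.
PARAMS = {
    "DD": (10.0, 0.10, 1.0),
    "Dd": (10.0, 0.10, 1.0),
    "dd": (5.0, 0.0, 1.0),
}


def expected_mean(genotype, age):
    """Mean trait value = intercept + age coefficient * age."""
    intercept, age_coef, _sd = PARAMS[genotype]
    return intercept + age_coef * age


def z_score(value, genotype, age):
    """Standard deviations between an observed value and the genotype mean."""
    sd = PARAMS[genotype][2]
    return (value - expected_mean(genotype, age)) / sd


# Individuals 1 and 2 of EXAMPLE4.PED: (ID, trait value, age).
for person, value, age in [(1, 20.0, 80.0), (2, 5.0, 70.0)]:
    for genotype in ("Dd", "dd"):
        print(person, genotype,
              round(expected_mean(genotype, age), 2),
              round(z_score(value, genotype, age), 2))
```

Under these assumptions individual 1 (value 20, age 80) lies about 2 standard 
deviations from the Dd mean of 18 but 15 from the dd mean of 5, while 
individual 2 (value 5, age 70) sits on the dd mean.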

Control file:  EXAMPLE4.CON

     250       1       3       1       3       1       1       0
    0.00    0.10    0.50                     2. True rec. frac.
    10.0    0.10     1.0     0.0             3. for males, DD
    10.0    0.10     1.0     0.0                for males, Dd
     5.0     0.0     1.0     0.0                for males, dd
    10.0    0.10     1.0     0.0                for females, DD
    10.0    0.10     1.0     0.0                for females, Dd
     5.0     0.0     1.0     0.0                for females, dd
M       F                           4. male and female symbols
QUANT                               5. trait locus name
EXAMPLE4.LOC                        6. locus file name
EXAMPLE4.PED                        7. pedigree file name
    3191     371   21713     8. seeds for random number generator
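
The three seeds on line 8 match the three-seed generator of Wichmann and Hill 
(1982), which is cited in the references.  The sketch below shows that 
published recurrence in Python.  It assumes SIMLINK uses the algorithm as 
published, which this section does not state, and the function name 
wichmann_hill is hypothetical.

```python
def wichmann_hill(ix, iy, iz):
    """Yield uniform deviates in [0, 1) from three integer seeds.

    Uses the three multiplicative congruential recurrences of the
    Wichmann-Hill (1982) generator and combines them modulo 1.
    """
    while True:
        ix = (171 * ix) % 30269
        iy = (172 * iy) % 30307
        iz = (170 * iz) % 30323
        yield (ix / 30269.0 + iy / 30307.0 + iz / 30323.0) % 1.0


# Seeds taken from line 8 of EXAMPLE4.CON.
gen = wichmann_hill(3191, 371, 21713)
first_three = [next(gen) for _ in range(3)]
print(first_three)
```

Rerunning with the same three seeds reproduces the same sequence, which is why 
the seeds are recorded in the control file.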


Locus file:  EXAMPLE4.LOC

QUANT   AUTOSOME 2 0           1. Trait locus information
D       .01                    2. Trait allele information
d       .99
MARKER1 AUTOSOME 2 3           5. Marker locus information
A       .50                    6. Marker allele information
B       .50
AA       1                     7. Marker phenotype information
A/A                            8. Pheno/geno correspondence
AB       1                     7. Marker phenotype information
A/B                            8. Pheno/geno correspondence
BB       1                     7. Marker phenotype information
B/B                            8. Pheno/geno correspondence


Pedigree file:  EXAMPLE4.PED

(I2,1X,A8)                     1. Pedigree record format
(3(A3,1X),3A1,A4,A3,A4)        2. Individual record format
15 QUANT                       3. Pedigree information
  1         M  20. 1. 80.      4. Individual data
  2         F   5. 1. 70.
  3   1   2 M  19. 1. 55.
  4   1   2 F  16. 1. 52.
  5   1   2 M  16. 1. 50.
  6   1   2 M  14. 1. 48.
  7   1   2 M  15. 1. 46.
  8   1   2 F   6. 1. 44.
  9   1   2 M   4. 1. 41.
 10   1   2 F  17. 1. 39.
 11   1   2 F  16. 1. 36.
 12   1   2 M   5. 1. 35.
 13   1   2 F  12. 1. 33.
 14   1   2 F   6. 1. 31.
 15   1   2 M   5. 1. 29.

Note:  A blank must be present in the first trait phenotype field for a 
quantitative trait.
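
To see where that blank falls, the individual record format 
(3(A3,1X),3A1,A4,A3,A4) can be applied to a record by column position.  The 
Python sketch below is illustrative only; SIMLINK itself reads these records 
with FORTRAN formatted input.  The names FIELDS and split_record are 
hypothetical, and the reading that the third one-character field (column 15) is 
the blank trait phenotype field is an assumption based on the note above.

```python
# Edit descriptors of (3(A3,1X),3A1,A4,A3,A4), expanded in order.
# "A" fields are kept as character data; "X" columns are skipped.
FIELDS = [("A", 3), ("X", 1)] * 3 + [("A", 1)] * 3 + [("A", 4), ("A", 3), ("A", 4)]


def split_record(line):
    """Split one fixed-width individual record into its A fields."""
    out, pos = [], 0
    for kind, width in FIELDS:
        # Pad short records with blanks, as formatted input would.
        chunk = line[pos:pos + width].ljust(width)
        if kind == "A":
            out.append(chunk)
        pos += width
    return out


# First individual of EXAMPLE4.PED: ID in columns 1-3, blank parent IDs.
record = "  1" + " " * 9 + "M  20. 1. 80."
fields = split_record(record)
print(fields)
```

The three A1 fields come out as "M", " ", and " ", and the following A4 field 
holds the trait value "20. ".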


IX. Array Sizes, File Management, and Other Practical Hints

The maximum sizes of the variables and arrays in SIMLINK are initially set 
according to the values of the following variables:

                                                    Initial
Variable          Description                        Value

MAXALL   maximum number of marker alleles                4
MAXCON   maximum number of constants for comparing
         to lod/location scores                          9
MAXGEN   maximum number of marker genotypes             10
MAXP     maximum number of people on whom a person's
         conditional probabilities can depend            4
MAXPED   maximum number of pedigrees                    20
MAXPEO   maximum number of people per pedigree         100
MAXPHN   maximum number of marker phenotypes            10
MAXTH    maximum number of true recombination
         fractions/map distances                         8
MAXTOT   maximum number of people in entire data set   200
MAXTST   maximum number of test recombination
         fractions/map distances                         8
MXGLST   maximum size of GLIST array                  1200
MXMG     maximum size of MARGEN array                 6400
MXMP     maximum size of MKPHEN array                 3200
MXTM     maximum size of the hetero/homozygous arrays 1600
MXPLST   maximum size of PLIST array                   800
MXPROB   maximum size of CONDPR array                16200
MXTEMP   maximum size of TEMPPR array (maximum number
         of conditional probabilities per person)       81
LENC     maximum size of CARRAY array for MENDEL       200
LENI     maximum size of IARRAY array for MENDEL      5000
LENL     maximum size of LARRAY array for MENDEL       100
LENR     maximum size of RARRAY array for MENDEL      5000

To change these dimensions, as you will almost certainly need to do, use a file 
editor to change the value in the PARAMETER statement in SIMLINK.FOR for the 
variable in question.  Then recompile SIMLINK.FOR and link the .OBJ files.

Note:  Many of the maximum sizes listed above are interrelated, so that if one 
is altered, others may need to be as well.  The relationships are given below:

MAXTH  =  maximum number of recombination fractions
MAXTOT =  maximum total number of people in the data set
MAXLOC =  maximum number of loci (1 or 2)
MAXP   =  maximum number of individuals on whom someone's
          conditional genotype probabilities might depend
          (roughly speaking, no more than 3 + the number of
          loops in a pedigree)

MXGLST =  MAXTOT*3*2 (where 3 is the number of possible trait
          genotypes and 2 is the number of haplotypes)
MXMG   =  MAXTOT*MAXTH*MAXLOC*2 (where 2 is the number of
          haplotypes)
MXMP   =  MXMG/2
MXPLST <= MAXTOT*MAXP
MXPROB <= MAXTOT*3**MAXP (where 3 is the number of possible trait
          genotypes and "**" represents exponentiation)
MXTEMP =  3**MAXP (where 3 is the number of possible trait
          genotypes)
MXTM   =  MAXTOT*MAXTH

Note:  MAXTST must be greater than or equal to the number of test
recombination fractions/map distances (NTEST).
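
When one of the base dimensions is changed, the dependent sizes can be 
recomputed from the relationships above before editing SIMLINK.FOR.  The Python 
sketch below is a convenience calculation only and is not part of SIMLINK; the 
function name derived_sizes is hypothetical.  MAXLOC = 2 is assumed because it 
reproduces the initial MXMG value of 6400.  For MXPLST and MXPROB the function 
returns the upper bound.

```python
def derived_sizes(maxtot, maxth, maxloc, maxp):
    """Return the dependent array sizes implied by the base dimensions."""
    n_trait_genotypes = 3   # possible trait genotypes
    n_haplotypes = 2        # haplotypes per person
    mxmg = maxtot * maxth * maxloc * n_haplotypes
    return {
        "MXGLST": maxtot * n_trait_genotypes * n_haplotypes,
        "MXMG": mxmg,
        "MXMP": mxmg // 2,
        "MXPLST": maxtot * maxp,                        # upper bound
        "MXPROB": maxtot * n_trait_genotypes ** maxp,   # upper bound
        "MXTEMP": n_trait_genotypes ** maxp,
        "MXTM": maxtot * maxth,
    }


# Initial values from the table: MAXTOT=200, MAXTH=8, MAXP=4 (MAXLOC=2 assumed).
initial = derived_sizes(maxtot=200, maxth=8, maxloc=2, maxp=4)
for name, size in initial.items():
    print(name, size)
```

With the initial values this reproduces the table entries, for example MXGLST = 
1200 and MXPROB = 16200.  Note that MXPROB and MXTEMP grow as 3**MAXP, so 
raising MAXP by one triples both.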


X. Error Conditions

When SIMLINK stops without completing the desired analysis, error messages may 
be found (1) on the screen, (2) in the output file, or (3) in the file 
SIMERR.SCR.  SIMDOC.SCR can be consulted to determine the correspondence between 
input IDs and MENDEL IDs.

The most frequent error encountered when using SIMLINK is insufficient size for 
one of the many arrays.  This can be dealt with by editing SIMLINK.FOR, 
identifying the PARAMETER statement associated with the array dimension that is 
too small, increasing its value, recompiling SIMLINK.FOR, and linking the 
program.

NOTE:  On a microcomputer using MICROSOFT FORTRAN, it may not be possible to 
make all arrays sufficiently large because of the 640K limitation of DOS.  In 
such cases, possible solutions include:  (a) limiting the number of trait 
genotypes whenever possible (see above); (b) decreasing the number of marker 
alleles; (c) decreasing some array dimensions if possible; (d) calculating lod 
scores rather than location scores; (e) using the F77L-EM/32 compiler; or (f) 
using a larger computer.

As you encounter other errors that are not clearly explained by the error 
message(s) provided, I would appreciate knowing about them so that I can add 
them to this documentation and/or add better error messages to the program.


XI. References

Boehnke M (1986) Estimating the power of a proposed linkage study:  a practical 
computer simulation approach.  American Journal of Human Genetics 39:513-527.

Brown CS, Thomas NST, Sarfarazi M, Davies KE, Kunkel L, Pearson PL, Kingston HM, 
Shaw DJ, Harper PS (1985) Genetic linkage relationships of seven DNA probes with 
Duchenne and Becker muscular dystrophy.  Human Genetics 71:62-74.

Haldane JBS (1919) The combination of linkage values and the calculation of 
distances between the loci of linked factors. Journal of Genetics 8:299-309.

Lange K, Boehnke M, Weeks D (1988) Documentation for MENDEL, Version 2.3, 
November, 1988.

Lange K, Weeks D, Boehnke M (1988) Programs for pedigree analysis:  MENDEL, 
FISHER, and dGENE.  Genetic Epidemiology 5:471-472.

Morton NE (1955) Sequential tests for the detection of linkage. American Journal 
of Human Genetics 7:277-318.

Ploughman LM, Boehnke M (1989) Estimating the power of a proposed linkage study 
for a complex genetic trait.  American Journal of Human Genetics 44:543-551.

Risch N (1989) Linkage detection tests under heterogeneity. Genetic Epidemiology 
6:473-480.

Smith CAB (1963) Testing for heterogeneity of recombination fraction values in 
human genetics.  Annals of Human Genetics 27:175-182.

Wichmann BA, Hill ID (1982) An efficient and portable pseudo-random number 
generator.  Applied Statistics 31:188-190.