RHMAP:  STATISTICAL PACKAGE FOR
MULTIPOINT RADIATION HYBRID MAPPING

VERSION 2.01


October 1995


Programmed by:

Michael Boehnke, Elizabeth Hauser, Kenneth Lange,
Kathryn Lunetta, Justine Uro, and Jill VanderStoep


Address questions and correspondence to:

Michael Boehnke, Ph.D.
Department of Biostatistics
School of Public Health
1420 Washington Heights
University of Michigan
Ann Arbor, Michigan  48109-2029
Phone:  (734) 936-1001
FAX:  (734) 763-2215
E-Mail:  boehnke@umich.edu


TABLE OF CONTENTS


INTRODUCTION
RHMAP:  CHANGES IN VERSION 2
RH2PT:  INTRODUCTION AND ASSUMPTIONS
RH2PT:  CHANGES IN VERSION 2
RH2PT:  INPUT
RH2PT:  OUTPUT
RHMINBRK:  INTRODUCTION AND ASSUMPTIONS
RHMINBRK:  CHANGES IN VERSION 2
RHMINBRK:  ORDERING STRATEGIES
RHMINBRK:  INPUT
RHMINBRK:  OUTPUT
RHMAXLIK:  INTRODUCTION, ASSUMPTIONS, AND MODELS
RHMAXLIK:  CHANGES IN VERSION 2
RHMAXLIK:  ORDERING STRATEGIES
RHMAXLIK:  INPUT
RHMAXLIK:  OUTPUT
INPUT DIFFERENCES IN THE PROGRAMS
CHECKING FOR DATA ERRORS AND INFLUENTIAL HYBRIDS IN THE
  MULTIPOINT ANALYSES
OUTLINE FOR THE ANALYSIS OF RH MAPPING DATA
DEFAULT ARRAY DIMENSIONS
ERROR CONDITIONS AND USER SUPPORT
FUTURE PLANS
ACKNOWLEDGEMENTS
REFERENCES


INTRODUCTION

Building on the earlier work of Goss and Harris (1975, 1977ab), Cox and his
colleagues (1990) have demonstrated that radiation hybrid (RH) mapping provides 
a powerful method for fine-structure mapping of human chromosomes.
Cox et al. used the method of moments and the analysis of two and four loci at a 
time to estimate distances between loci and to determine locus order.  In 
contrast, we (Boehnke et al. 1991) have developed multipoint mapping methods 
that make use of information on many loci simultaneously.  These methods are 
based on (1) minimizing obligate chromosome breaks, and (2) maximizing the 
likelihood for several different breakage and retention models.  Detailed 
description of RH mapping will not be presented in this document; the papers of 
Cox et al. (1990), Boehnke et al. (1991), and Walter et al. (1994) can be 
consulted for such a description, including definitions of many of the terms 
that will be used here.

RHMAP version 2 is a set of three FORTRAN 77 programs that provide the means for 
a complete statistical analysis of RH mapping data.  RH2PT is a program for data 
description and two-point analysis.  It provides estimates of locus-specific 
retention probabilities and pairwise breakage probabilities, two-point lod 
scores for linkage of the various marker pairs, and linkage groups.

RHMINBRK is a program for multilocus ordering by minimization of the number of 
obligate chromosome breaks; RHMAXLIK is a program for multilocus ordering by 
maximization of the likelihood of the hybrid data under a variety of breakage 
and retention models.  Both these programs can evaluate a user-specified list of 
locus orders, or can employ one of several strategies of combinatorial 
optimization to attempt to identify the best locus orders.  Both multipoint 
methods can be used to identify influential hybrids that have a large impact on 
ordering conclusions.

The files that accompany this documentation have both source and executable 
files for all three programs, as well as input and output files for several 
sample analyses of the proximal chromosome 21q data set of Cox et al. (1990).  
This document describes each of the three programs in turn, discussing 
assumptions, options, input, output, and sample analyses.  It concludes with a 
general discussion of how to carry out a RH mapping analysis, how to compile and 
run the programs, error recovery, consulting, future plans, and references.


RHMAP:  CHANGES IN VERSION 2

Version 2 of RHMAP replaces version 1.1.  The principal enhancements in the new 
software include:  (1) analysis of diploid and more generally polyploid RH 
mapping data (all programs); (2) map construction in which a subset of the 
genetic markers are fixed in a user-specified order (RHMINBRK and RHMAXLIK); and 
(3) determination of the distribution of the number of obligate chromosome 
breaks for a hybrid as a further aid in the detection of marker mistyping or 
misscoring (RHMAXLIK).  These and other less significant changes to the various 
programs are described in detail in the descriptions of the individual programs.  
Manuscripts describing the new methods are currently being written and should be 
submitted sometime in the winter of 1995.

Note:  RHMAP version 1.1 input files for RH2PT and RHMINBRK should be usable for 
version 2 of these programs.  RHMAXLIK version 1.1 files will require one change 
(see below for details).


RH2PT:  INTRODUCTION AND ASSUMPTIONS

RH2PT is a FORTRAN 77 program for data description and two-point analysis of RH 
mapping data.  It prints tables of (1) locus names; (2) retention status 
characters; (3) observed RH retention data; (4) locus retention probabilities; 
(5) two-locus conditional coretention probabilities; (6) two-locus breakage 
probability estimates, distance estimates, and maximum lod scores for the equal 
retention probability model that assumes all fragments have the same probability 
of being retained in a RH; (7) linkage groups indicating which loci are linked 
on the basis of two-locus lod scores of at least 2.0, at least 3.0, or at least 
4.0; and (8) a list of locus-pairs that are never discordant in the data and so 
appear completely linked.

While tables 1-5 and 8 are merely descriptive and require no assumptions, 
estimation of breakage probabilities and distances and calculation of maximum 
lod scores require assumptions about the breakage and retention processes.  
Following Cox et al. (1990), we assume that (1) breakage is at random along the 
chromosome, with constant intensity and no interference (in probabilistic terms, 
breakage along the chromosome is a Poisson process); (2) different chromosomal 
fragments are retained independently in the resulting RHs; and (3) retention 
probabilities for the various fragments are all equal.


RH2PT:  CHANGES IN VERSION 2

Changes in RH2PT in version 2 include:  (a) analysis of diploid and more 
generally polyploid RH mapping data; (b) elimination from Table 6 of lod scores 
and parameter estimates for the general retention model, since the equal 
and general retention models give very similar results; (c) basing the linkage 
groups in Table 7 on equal-retention rather than general-retention lod scores; 
(d) addition of Table 8 that lists all locus pairs that are completely linked, 
that is, demonstrate no obligate chromosome breaks between them; and (e) 
elimination of several minor programming bugs, one of which in some cases caused 
incorrect parameter estimates and lod scores when hybrids were reported as 
having been present in multiple copies.

These changes result in one modification in program input:  optional 
specification of the ploidy NCHR; default is haploid (NCHR=1).  No modifications 
of existing input files should be required if haploid data are analyzed.


RH2PT:  INPUT

Input for RH2PT is in the form of a single file that contains numbers of loci 
and hybrids, locus names, format for reading the hybrid names and retention 
data, retention characters, an output permutation, and hybrid names and the 
retention data.

An abbreviated version of the sample data file RH2PT.DAT is provided below:

  14  99   0   1
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
(A2,14(1X,A1),T3,I1)
+-?
S16 S48 S46 S4  S52 S11 S1  S18 S8  APP S12 S111S47 SOD1
 1 - - - - + - - - - + - - - +
 2 + + + + + + + + + + + + + +
 3 ? - + ? - + + + ? ? + ? ? ?
 4 - - + - + - - - + - - + ? -
 5 - - - - - - - - - - - - - -
 6 - - - - - - - - - - - - - -
 7 - - + - - - + - + - + ? ? ?
 8 + + + + + + + + + + + + + -
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
98 - + + - + - + + + - + + - -
99 ? + + + + + + - + + + + + +


The following records in the given order and with variables and formats as 
described below are required as input for RH2PT:

1. Numbers of loci and RHs, output option, and ploidy, each right-justified in a 
4 column field (4I4).

Columns  1- 4  NLOCUS:  the number of loci in the data set
Columns  5- 8  NHYB:    the number of RHs in the data set
Columns  9-12  OUTOPT:  output option
               =0 print table 5
               =1 do not print table 5 (see below).
Columns 13-16  NCHR:  the ploidy for these data; =1 for haploid data, =2
               for diploid data, etc.  If left blank, defaults to 1
               (haploid).

2. Locus names for all NLOCUS loci, each left-justified in a 4 column field 
(20A4).  Locus names can include any characters.  If there are more than 20 loci, 
locus names should be entered on multiple lines, 20 names per line.
Columns  1- 4  LNAME(1):  name of the first locus
Columns  5- 8  LNAME(2):  name of the second locus, etc.

3. Format for reading the hybrid names and retention status data.  This FORTRAN 
format statement is used to read the information on each RH.  Each hybrid record 
consists of the hybrid name, retention information for each locus, and the 
number of times that hybrid was observed.  The hybrid name will be read in 
character (A) format, and may be up to 4 characters long.  Retention information 
on each locus is also in character (A) format, one character per locus.  
Finally, the number of times the hybrid was observed is read in integer (I) 
format.  A zero or a blank in this field is interpreted by the program as one 
hybrid of this type.  For example, (A2,14(1X,A1),T3,I1) is a format for a RH 
mapping data set with 14 loci. Note:  the T3 in this format statement says to 
tab back to column 3 which happens to be an entirely blank column in the sample 
data set; the result is that the program assumes each hybrid is present once.

4. Retention status characters representing (a) locus typed and present, (b) 
locus typed and absent, and (c) locus not typed.  A single character is allowed 
for each of these three situations.  These characters are read in (3A1) format.  
In the above example, +, -, and ? are used.
Column 1  Character representing that locus is typed and present.
Column 2  Character representing that locus is typed and absent.
Column 3  Character representing that locus is not typed.

5. Locus names specifying the output permutation for the loci.  Locus names 
should be specified for all NLOCUS loci in the order in which they will be 
output in the tables.  Each locus name should be left-justified in a 4 column 
field (20A4).  Locus names can include any characters.  If there are more than 
20 loci, locus names should be entered on multiple lines, 20 names per line.
Columns  1- 4  LNAMEP(1):  first locus in the permutation
Columns  5- 8  LNAMEP(2):  second locus in the permutation, etc.

6. Hybrid records, one per hybrid, specifying the hybrid name, retention 
information for each locus, and the number of times that hybrid was observed.
Each of these variables will be read as indicated in the format statement 
defined in 3. above.  The hybrid name may be up to 4 characters long and can be 
anywhere within the input field; any characters can be used.  Retention 
information on each locus may also be any character, but must correspond to 
those defined in 4. above.  Finally, the number of times a hybrid is observed is 
read right-justified in integer format.

Note:  If the number of times a hybrid is observed is specified as zero or 
blank, it is interpreted as 1.  Thus, if all hybrids are observed exactly once 
(the usual case), the number of times observed column may be left blank in the 
hybrid records.  However, the format item for reading those blanks must still be 
present in the format statement, and the blank column(s) must be present in the 
input file.
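
The fixed-column layout of records 1, 2, and 6 can be illustrated with a short 
Python sketch that slices the sample lines by column.  This is an illustration 
only and is not part of RHMAP:  the function names are hypothetical, the column 
positions are hard-coded for the sample format (A2,14(1X,A1),T3,I1) rather than 
derived from an arbitrary FORTRAN format, and parse_names handles a single line 
of at most 20 names.

```python
def parse_counts(line):
    """Read record 1: four right-justified integers in 4-column fields (4I4)."""
    fields = [line[i:i + 4].strip() for i in range(0, 16, 4)]
    nlocus, nhyb, outopt, nchr = (int(f) if f else 0 for f in fields)
    # A blank ploidy field defaults to haploid (NCHR=1).
    return nlocus, nhyb, outopt, nchr or 1


def parse_names(line, nlocus):
    """Read locus names, each left-justified in a 4-column field (20A4)."""
    return [line[4 * i:4 * i + 4].strip() for i in range(nlocus)]


def parse_hybrid(line, nlocus=14):
    """Read one hybrid record laid out as (A2,14(1X,A1),T3,I1)."""
    line = line.ljust(2 + 2 * nlocus)
    name = line[0:2].strip()                            # A2: columns 1-2
    status = [line[3 + 2 * i] for i in range(nlocus)]   # 1X,A1 repeated
    count_field = line[2].strip()                       # T3,I1: column 3
    count = int(count_field) if count_field else 0
    return name, status, count or 1                     # zero or blank means 1


if __name__ == "__main__":
    print(parse_counts("  14  99   0   1"))
    print(parse_names(
        "APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1", 14))
    print(parse_hybrid(" 1 - - - - + - - - - + - - - +"))
```

Because column 3 is blank in every sample hybrid record, each hybrid is read 
with a count of 1, as described in the note above.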


RH2PT:  OUTPUT

The output from RH2PT is in the form of eight tables.  Descriptions and 
abbreviated examples of these tables follow.

Table 1 gives the locus names in the order specified by the above output 
permutation.

TABLE 1:  PERMUTED LOCUS NAMES

      LOCUS        LOCUS
      NUMBER       NAME

         1          S16
         2          S48
         3          S46
         4          S4
         5          S52
         6          S11
         7          S1
         8          S18
         9          S8
        10          APP
        11          S12
        12          S111
        13          S47
        14          SOD1


Table 2 provides symbols for retention status.  These are the symbols for marker 
typed and retained, marker typed and lost, and marker not typed, respectively.


TABLE 2:  RETENTION STATUS CHARACTERS

    + = RETAINED
    - = NOT RETAINED
    ? = UNTYPED


Table 3 echoes the retention status data for this problem.  The data are 
permuted according to the output permutation.  Loci are labelled with the locus 
numbers specified in Table 1.  Also output are the numbers of RHs and the number 
of unique retention status patterns observed.


TABLE 3:  PERMUTED RADIATION HYBRID RETENTION STATUS DATA

HYBRID  HYBRID   NUMBER   LOCUS NUMBER
NUMBER   NAME   OBSERVED  1  2  3  4  5  6  7  8  9 10 11 12 13 14

   1      1         1     -  -  -  -  -  +  -  -  -  -  -  -  +  +
   2      2         1     +  +  +  +  +  +  +  +  +  +  +  +  +  +
   3      3         1     +  +  ?  +  ?  -  -  +  ?  ?  +  ?  ?  ?
   4      4         1     -  -  +  +  +  +  -  -  -  -  -  ?  -  -
   5      5         1     -  -  -  -  -  -  -  -  -  -  -  -  -  -
   6      6         1     -  -  -  -  -  -  -  -  -  -  -  -  -  -
   .      .         .     .  .  .  .  .  .  .  .  .  .  .  .  .  .
   .      .         .     .  .  .  .  .  .  .  .  .  .  .  .  .  .
   .      .         .     .  .  .  .  .  .  .  .  .  .  .  .  .  .
  98     98         1     +  +  +  +  +  +  +  +  -  -  -  -  -  -
  99     99         1     +  +  +  +  +  +  +  -  +  ?  +  +  +  +

TOTAL NUMBER OF HYBRIDS OBSERVED                       99
NUMBER OF UNIQUE HYBRID RETENTION PATTERNS OBSERVED:   71
PLOIDY:                                                 1


Table 4 prints the number and proportion of hybrids typed for each locus, the 
number and proportion of typed hybrids that retain each locus, and the estimated 
retention rate on a per chromosome basis.  For haploid data, these two retention 
estimates are the same; for c-ploid data, the overall rate R and the haploid 
rate r are related as R=1-(1-r)**c.  Totals for each of these quantities are 
also printed.


TABLE 4:  LOCUS RETENTION PROBABILITIES

                                           P(RETAINED)
 LOCUS    TYPED    P(TYPED)  RETAINED   OVERALL   HAPLOID

  S16       81       0.818      48       0.593     0.593
  S48       96       0.970      56       0.583     0.583
  S46       71       0.717      38       0.535     0.535
  S4        96       0.970      48       0.500     0.500
  S52       67       0.677      31       0.463     0.463
  S11       94       0.949      53       0.564     0.564
  S1        91       0.919      43       0.473     0.473
  S18       95       0.960      35       0.368     0.368
  S8        71       0.717      29       0.408     0.408
  APP       71       0.717      24       0.338     0.338
  S12       94       0.949      34       0.362     0.362
  S111      68       0.687      22       0.324     0.324
  S47       85       0.859      36       0.424     0.424
  SOD1      64       0.646      26       0.406     0.406

 TOTAL    1144       0.825     523       0.457     0.457
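
The relation R=1-(1-r)**c between the overall and haploid retention rates can be 
inverted to recover the per-chromosome rate from the observed proportion.  The 
following Python sketch shows both directions; the function names are 
hypothetical and the code is an illustration, not an excerpt from RH2PT.

```python
def overall_retention(haploid, ploidy):
    """Forward relation R = 1 - (1 - r)**c."""
    return 1.0 - (1.0 - haploid) ** ploidy


def haploid_retention(overall, ploidy):
    """Invert R = 1 - (1 - r)**c to get the per-chromosome rate r."""
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    return 1.0 - (1.0 - overall) ** (1.0 / ploidy)


if __name__ == "__main__":
    # Haploid data: the two rates coincide, as in the sample Table 4.
    print(haploid_retention(0.593, 1))
    # Diploid data: an overall rate of 0.75 implies a haploid rate of 0.5.
    print(haploid_retention(0.75, 2))
```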


Table 5 prints conditional coretention probabilities for each locus pair.  These 
are the probability the first locus is retained given that the second locus is 
(is not) retained, and conversely, the probability the second locus is retained 
given that the first locus is (is not) retained.  These coretention 
probabilities are measures of the dependence of retention for the different 
locus pairs.  Output is given in sections with the second locus varying first.


TABLE 5:  CONDITIONAL CORETENTION PROBABILITIES

              BOTH
LOC1  LOC2   TYPED  P(L1|L2) P(L1|NOT L2)  P(L2|L1) P(L2|NOT L1)

S16   S48      81     0.979      0.059       0.958     0.030
S16   S46      71     0.947      0.121       0.900     0.065
S16   S4       81     0.949      0.262       0.771     0.061
S16   S52      67     0.871      0.250       0.750     0.129
S16   S11      79     0.750      0.371       0.717     0.333
S16   S1       77     0.857      0.357       0.667     0.156
S16   S18      80     0.897      0.431       0.542     0.094
S16   S8       71     0.828      0.381       0.600     0.161
S16   APP      71     0.833      0.404       0.513     0.125
S16   S12      80     0.840      0.473       0.447     0.121
S16   S111     65     0.857      0.409       0.500     0.103
S16   S47      73     0.741      0.457       0.488     0.219
S16   SOD1     64     0.692      0.421       0.529     0.267

S48   S46      71     0.974      0.061       0.949     0.031
S48   S4       96     0.958      0.208       0.821     0.050
S48   S52      67     0.903      0.222       0.778     0.097
S48   S11      94     0.736      0.366       0.722     0.350
S48   S1       91     0.837      0.333       0.692     0.179
S48   S18      95     0.886      0.417       0.554     0.103
S48   S8       71     0.793      0.381       0.590     0.188
S48   APP      71     0.833      0.383       0.526     0.121
S48   S12      94     0.824      0.450       0.509     0.154
S48   S111     68     0.864      0.391       0.514     0.097
S48   S47      85     0.750      0.449       0.551     0.250
S48   SOD1     64     0.692      0.421       0.529     0.267
....  ....     ..     .....      .....       .....     .....
....  ....     ..     .....      .....       .....     .....
....  ....     ..     .....      .....       .....     .....

S111  S47      61     0.640      0.111       0.800     0.220
S111  SOD1     53     0.455      0.194       0.625     0.324

S47   SOD1     62     0.800      0.054       0.909     0.125
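
As an illustration of how the entries of Table 5 relate to the two-locus counts 
printed in Table 6, the Python sketch below computes the four conditional 
proportions from the numbers of hybrids with status --, -+, +-, and ++.  It 
assumes the probabilities are simple observed proportions among hybrids typed 
for both loci; that assumption reproduces the S16/S48 row of the sample output, 
but the code is not taken from RH2PT and the function name is hypothetical.

```python
def coretention(n_mm, n_mp, n_pm, n_pp):
    """Conditional coretention proportions from two-locus counts.

    Arguments are the numbers of hybrids typed for both loci with status
    --, -+, +-, and ++ (first symbol is locus 1, second is locus 2).
    Returns P(L1|L2), P(L1|not L2), P(L2|L1), P(L2|not L1); an entry is
    None when its conditioning event was never observed.
    """
    def ratio(num, den):
        return num / den if den else None

    return (
        ratio(n_pp, n_pp + n_mp),   # L1 retained given L2 retained
        ratio(n_pm, n_pm + n_mm),   # L1 retained given L2 not retained
        ratio(n_pp, n_pp + n_pm),   # L2 retained given L1 retained
        ratio(n_mp, n_mp + n_mm),   # L2 retained given L1 not retained
    )


if __name__ == "__main__":
    # S16/S48 counts from the sample Table 6: -- 32, -+ 1, +- 2, ++ 46
    print([round(p, 3) for p in coretention(32, 1, 2, 46)])
```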


Table 6 prints for each locus pair (1) the number of hybrids typed for both 
loci; (2) the numbers of hybrids typed for both loci that are negative for both 
loci, negative for the first locus and positive for the second locus, positive 
for the first locus and negative for the second locus, and positive for both 
loci; and (3) estimates of the breakage probability and the distance (in Rays) 
between the loci, and the corresponding maximum lod scores, all assuming equal 
retention for all fragments.  Output is in sections as in table 5, with the 
second locus varying first.


TABLE 6:  MAXIMUM LOD SCORES AND BREAKAGE PROBABILITY AND DISTANCE ESTIMATES

                    BOTH                                               LOD
  LOCUS1  LOCUS2    TYPED    --    -+    +-    ++    P(BR)     DIST   SCORE

   S16     S48        81     32     1     2    46    0.076    0.079   18.30
   S16     S46        71     29     2     4    36    0.171    0.187   12.31
   S16     S4         81     31     2    11    37    0.323    0.390    8.81
   S16     S52        67     27     4     9    27    0.388    0.491    5.85
   S16     S11        79     22    11    13    33    0.620    0.967    2.53
   S16     S1         77     27     5    15    30    0.520    0.735    4.01
   S16     S18        80     29     3    22    26    0.626    0.983    2.49
   S16     S8         71     26     5    16    24    0.592    0.897    2.64
   S16     APP        71     28     4    19    20    0.656    1.068    1.85
   S16     S12        80     29     4    26    21    0.758    1.417    1.03
   S16     S111       65     26     3    18    18    0.656    1.067    1.70
   S16     S47        73     25     7    21    20    0.771    1.473    0.84
   S16     SOD1       64     22     8    16    18    0.753    1.398    0.86

   S48     S46        71     31     1     2    37    0.085    0.089   15.87
   S48     S4         96     38     2    10    46    0.252    0.290   13.07
   S48     S52        67     28     3     8    28    0.328    0.398    7.18
   S48     S11        94     26    14    15    39    0.629    0.992    2.86
   S48     S1         91     32     7    16    36    0.506    0.706    5.03
   S48     S18        95     35     4    25    31    0.612    0.946    3.19
   S48     S8         71     26     6    16    23    0.621    0.970    2.27
   S48     APP        71     29     4    18    20    0.630    0.994    2.15
   S48     S12        94     33     6    27    28    0.704    1.218    1.81
   S48     S111       68     28     3    18    19    0.629    0.991    2.07
   S48     S47        85     27     9    22    27    0.729    1.307    1.37
   S48     SOD1       64     22     8    16    18    0.753    1.398    0.86

   ...     ....       ..     ..     .    ..    ..    .....    .....    ....
   ...     ....       ..     ..     .    ..    ..    .....    .....    ....
   ...     ....       ..     ..     .    ..    ..    .....    .....    ....

   S111    S47        61     32     9     4    16    0.458    0.612    3.98
   S111    SOD1       53     25    12     6    10    0.738    1.341    0.78

   S47     SOD1       62     35     5     2    20    0.240    0.274    8.48
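
Under the Poisson breakage assumption, the probability of no break between two 
loci a distance d Rays apart is exp(-d), so the breakage probability and the 
distance are related by P(BR) = 1 - exp(-d).  The Python sketch below applies 
this conversion; it agrees with the DIST column above to within the rounding of 
the printed P(BR) values.  The function names are hypothetical, and the code is 
an illustration rather than RH2PT's own estimation routine.

```python
import math


def breakage_to_distance(theta):
    """Distance in Rays implied by breakage probability theta (0 <= theta < 1)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


def distance_to_breakage(dist):
    """Breakage probability implied by a distance in Rays."""
    return 1.0 - math.exp(-dist)


if __name__ == "__main__":
    for theta in (0.076, 0.323):        # S16/S48 and S16/S4 in Table 6
        print(theta, round(breakage_to_distance(theta), 3))
```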


Table 7 presents linkage groups constructed from the results of Table 6.  A 
linkage group is defined here as a set of loci for which there is clear pairwise 
evidence of linkage.  That is, loci A and B are in the same linkage group if the 
maximum lod score for A and B is at least some constant c, or if there exist 
loci C, D, ..., H such that the maximum lod scores between A and C, C and D, 
..., and H and B all are at least c.  For this purpose, we have arbitrarily 
chosen to use the maximum lod scores calculated under the equal retention 
model, and values c = 2.0, 3.0, and 4.0.  For the chromosome 21 data of Cox et 
al. (1990), all loci are in the same linkage group under each of the two-point 
lod score criteria.


TABLE 7:  LINKAGE GROUPS

LOD SCORE CRITERION:   2.00

LINKAGE GROUP  1:
S16   S48   S46   S4    S52   S11   S1    S18   S8    APP
S12   S111  S47   SOD1


LOD SCORE CRITERION:   3.00

LINKAGE GROUP  1:
S16   S48   S46   S4    S52   S11   S1    S18   S8    APP
S12   S111  S47   SOD1


LOD SCORE CRITERION:   4.00

LINKAGE GROUP  1:
S16   S48   S46   S4    S52   S11   S1    S18   S8    APP
S12   S111  S47   SOD1


If a data set includes more than one linkage group, multipoint analyses should 
begin with separate analyses of the apparently distinct linkage groups.
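
The definition of a linkage group amounts to finding connected sets of loci, 
where two loci are joined whenever their two-point lod score reaches the 
criterion.  The Python sketch below does this with a small union-find 
structure.  It is an illustration under that reading of the definition, not 
RH2PT's implementation; the function name, the dictionary layout, and the lod 
scores in the example are all made up for the purpose.

```python
def linkage_groups(loci, lod, threshold):
    """Group loci connected by chains of pairwise lod scores >= threshold.

    lod maps frozenset({a, b}) to the two-point maximum lod score;
    pairs missing from the mapping are treated as unlinked.
    """
    parent = {locus: locus for locus in loci}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for pair, score in lod.items():
        a, b = tuple(pair)
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for locus in loci:                      # keep input order within groups
        groups.setdefault(find(locus), []).append(locus)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical loci and lod scores, for illustration only.
    loci = ["A", "B", "C", "D"]
    lod = {
        frozenset("AB"): 5.0, frozenset("BC"): 3.5, frozenset("AC"): 1.0,
        frozenset("CD"): 2.5, frozenset("AD"): 0.2, frozenset("BD"): 0.4,
    }
    for c in (2.0, 3.0, 4.0):
        print(c, linkage_groups(loci, lod, c))
```

In the example, A and C fall in the same group at c = 3.0 even though their own 
lod score is only 1.0, because both are linked to B.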

Table 8 presents a list of locus pairs that fail to display obligate chromosome 
breaks in the data, together with their co-retention pattern.  When building a 
map, the analysis will be substantially simplified by removing one of the two 
loci in each pair.  Such an approach can occasionally alter the results if the 
two markers have different patterns of missing data.


TABLE 8:  TOTALLY-LINKED LOCUS PAIRS

                         LOCUS-PAIR RETENTION STATUS
 LOCUS1    LOCUS2    --  -+  +-  ++  -?  +?  ?-  ?+  ??

  S12       S111     45   0   0  22  15  12   1   0   4
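
A locus pair belongs in Table 8 when no hybrid typed for both loci retains one 
and lacks the other.  The Python sketch below screens for such pairs.  It is 
illustrative only:  the function name and the toy data are hypothetical, and 
how RH2PT treats pairs that are never typed together is not addressed here.

```python
from itertools import combinations


def totally_linked_pairs(hybrids, present="+", absent="-"):
    """Return index pairs of loci that are never discordant.

    hybrids is a list of equal-length status strings, one per hybrid.
    A pair is discordant in a hybrid when both loci are typed and one is
    retained while the other is not; untyped entries never count.
    """
    nloci = len(hybrids[0])
    pairs = []
    for i, j in combinations(range(nloci), 2):
        discordant = any(
            {h[i], h[j]} == {present, absent} for h in hybrids
        )
        if not discordant:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy data: three loci typed on four hybrids.
    print(totally_linked_pairs(["++-", "--+", "+?-", "?-+"]))
```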


RHMINBRK:  INTRODUCTION AND ASSUMPTIONS

RHMINBRK is a FORTRAN 77 program that calculates numbers of obligate chromosome 
breaks for locus orders, and attempts to identify those orders requiring the 
fewest obligate chromosome breaks (Boehnke et al. 1991; Bishop and Crockford 
1992; Boehnke 1992; Weeks et al. 1992).  The idea behind the minimum break 
approach is that the closer two loci are on the chromosome, the fewer breaks 
should occur between them.  Thus, the best locus order is the one requiring 
the fewest obligate chromosome breaks. Such an approach is analogous to genetic 
mapping by minimizing recombinants (Thompson 1987).  Note that the minimum 
obligate breaks approach requires only that loci be arranged in a linear way 
along the chromosome.  Thus, minimum obligate chromosome breaks provides a 
non-parametric method for locus ordering.

Counting obligate breaks is straightforward.  For a given locus order, obligate 
breaks occur when a retained locus follows a locus which is lost or vice versa; 
in this tabulation, untyped loci are ignored.  It should be noted that the 
number of obligate breaks is generally substantially less than the number of 
actual breaks.  Indeed, if r is the probability a human chromosome fragment is 
retained in a hybrid, the mean values of the number of obligate breaks B and 
number of actual breaks N are related according to E(B) = 2r(1-r)E(N) (Barrett 
1992).  Because 2r(1-r) is at most 1/2 (attained at r = 1/2), the number of 
actual chromosome breaks will on average be at least twice as large as the 
number of obligate breaks.
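
The counting rule can be sketched in a few lines of Python.  The statuses of one 
hybrid are listed in locus order, untyped loci are dropped, and each change 
between retained and not retained counts as one obligate break.  The function 
names are hypothetical, and the code illustrates the rule rather than 
reproducing RHMINBRK's source.

```python
def obligate_breaks(statuses, present="+", absent="-"):
    """Count obligate breaks for one hybrid whose statuses follow the locus order."""
    typed = [s for s in statuses if s in (present, absent)]
    return sum(1 for a, b in zip(typed, typed[1:]) if a != b)


def total_breaks(hybrids, counts=None):
    """Sum obligate breaks over hybrids, weighting by times observed."""
    if counts is None:
        counts = [1] * len(hybrids)
    return sum(n * obligate_breaks(h) for h, n in zip(hybrids, counts))


def obligate_fraction(r):
    """Factor 2r(1-r) relating E(B) to E(N); never larger than 1/2."""
    return 2.0 * r * (1.0 - r)


if __name__ == "__main__":
    # Hybrids 1, 2, and 3 of the sample data in the Table 3 permutation.
    sample = ["-----+------++", "++++++++++++++", "++?+?--+??+???"]
    print([obligate_breaks(h) for h in sample])
    print(total_breaks(sample))
```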

RHMINBRK prints tables of (1) locus names; (2) marker retention symbols; (3) 
observed RH retention data; (4) best locus orders ranked on the basis of minimum 
obligate breaks; (5) RH retention data permuted to be consistent with the best 
minimum break locus order; (6) observed distribution of the number of obligate 
breaks per hybrid; and (7) influential hybrids for the various nearly-best locus 
orders.


RHMINBRK:  CHANGES IN VERSION 2

Changes in RHMINBRK in version 2 include:  (a) analysis in which a subset of the 
loci are forced in a pre-specified order within the map, allowing incorporation 
of prior information from other mapping methods; and (b) analysis in which a 
particular genetic marker is forced to be at the end of the map if it is 
included, providing a method to eliminate "flip-flops" of marker groups at the 
end(s) of a map.

These changes result in one modification in program input:  optional 
specification of the ordering restriction variable NFORCE; default is no forcing 
(NFORCE=0).  No modifications of existing input files should be required if 
no ordering restrictions are imposed.


RHMINBRK:  ORDERING STRATEGIES

Given n loci A(1), A(2), ..., A(n), RHMINBRK provides four strategies for locus 
ordering.  These are:

1. List of user-specified locus orders.  Each order is evaluated in terms of 
minimum number of obligate chromosome breaks, and the orders are ranked on that 
basis.

2. Stepwise locus ordering.  This strategy builds locus orders one locus at a 
time.  At step m (m <= n), a new locus is added to the list of currently saved 
partial locus orders, all of which contain the same m-1 loci.  An m-locus order 
constructed in this way is then saved for further consideration if its number of 
obligate chromosome breaks is not too much larger than the number of obligate 
chromosome breaks for the best locus order made up of the same m loci.  This 
approach is analogous to the build option employed in CRIMAP (Barker et al. 
1987).

3. Simulated annealing (Kirkpatrick et al. 1983; Press et al. 1989).  This 
strategy starts with an n-locus order, and moves to different possible n-locus 
orders by proposal and (conditional) acceptance of random block inversions of 
loci.  At an early stage in the process, nearly all proposed block inversions of 
loci are taken.  As the process continues, the probability of accepting a move 
to an order requiring more obligate breaks becomes progressively smaller.  The 
goal is to sample a substantial number of locus orders, and not get bogged down 
early on in the region of a locally rather than globally best order.  A list of 
best encountered orders is kept during the process.

4. Branch and bound (see, for example, Nijenhuis and Wilf 1978).  This strategy 
is similar to stepwise locus ordering.  The difference is that partial locus 
orders are saved if they do not require too many more obligate breaks than a 
candidate locus order for the complete set of loci. This strategy, unlike the 
other three, guarantees the best locus order is identified.  However, if the 
number of loci n is large, branch and bound can require too much computation.  
Identification of a good candidate order is of critical importance for branch 
and bound, but this can be done automatically by a greedy algorithm.  This 
greedy algorithm builds the candidate order one locus at a time.  At each stage, 
the locus to add is the one for which the difference in obligate breaks between 
its best and next best positions in the current best partial locus order is 
largest. The idea is to add the locus for which the positioning is most clear.
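
Strategy 2 can be sketched as follows in Python.  At each step the new locus is 
tried in every position of every saved partial order, and an order is kept if 
it needs at most a fixed number of breaks more than the best order on the same 
loci.  This is a minimal illustration, not RHMINBRK's code:  the function names 
and the slack parameter are hypothetical, and the toy hybrids are invented.

```python
def obligate_breaks(statuses):
    """Count status changes among typed loci (+ or -); others are ignored."""
    typed = [s for s in statuses if s in ("+", "-")]
    return sum(1 for a, b in zip(typed, typed[1:]) if a != b)


def order_breaks(hybrids, order):
    """Total obligate breaks when loci are arranged in the given order."""
    return sum(obligate_breaks([h[i] for i in order]) for h in hybrids)


def stepwise_orders(hybrids, loci, slack=0):
    """Add loci one at a time, keeping partial orders within slack of the best."""
    saved = [[loci[0]]]
    for new in loci[1:]:
        candidates = {}
        for partial in saved:
            for pos in range(len(partial) + 1):
                cand = partial[:pos] + [new] + partial[pos:]
                # An order and its reverse need the same breaks; keep one form.
                key = min(tuple(cand), tuple(reversed(cand)))
                candidates[key] = order_breaks(hybrids, key)
        best = min(candidates.values())
        saved = [list(k) for k, b in candidates.items() if b <= best + slack]
    return sorted((order_breaks(hybrids, o), o) for o in saved)


if __name__ == "__main__":
    # Toy hybrids on four loci (indices 0-3); each string is one hybrid.
    hybrids = ["++--", "-++-", "--++", "+---", "---+", "+++-"]
    # Loci are added in the order 2, 0, 3, 1.
    print(stepwise_orders(hybrids, [2, 0, 3, 1]))
    print(stepwise_orders(hybrids, [2, 0, 3, 1], slack=2))
```

With slack=0 only the best partial orders survive each step, so a good complete 
order can be lost early; a larger slack keeps more partial orders at the cost 
of more computation.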

REMINDER:  Of the four ordering options, only branch and bound guarantees that 
the best locus order is identified.  To help ensure the best locus order is 
found when using stepwise locus ordering, we recommend increasing SAVMAX (see
below) until no changes in results are observed.  For simulated annealing, 
we recommend re-starting the process several times with different locus orders
and merging the resulting lists of locus orders.  To emphasize the importance of 
this advice for stepwise locus ordering, we recommend running test problem 2 for 
RHMINBRK with SAVMAX=9 and again with SAVMAX=10; very different sets of locus 
orders are obtained.
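
The restart advice for simulated annealing can be illustrated with the Python 
sketch below.  Each run proposes random block inversions, always accepts an 
order needing no more breaks, and accepts a worse order with a probability that 
shrinks as the temperature falls; the best orders from several runs with 
different starting orders are then merged.  The cooling schedule, parameter 
values, function names, and toy data are all assumptions made for illustration 
and do not describe RHMINBRK's actual settings.

```python
import math
import random


def order_breaks(hybrids, order):
    """Total obligate breaks for a locus order, ignoring untyped loci."""
    total = 0
    for h in hybrids:
        typed = [h[i] for i in order if h[i] in ("+", "-")]
        total += sum(1 for a, b in zip(typed, typed[1:]) if a != b)
    return total


def anneal(hybrids, start, rng, temp=5.0, cooling=0.9, sweeps=40, moves=50):
    """Return (breaks, order) for the best order met in one annealing run."""
    order = list(start)
    cost = order_breaks(hybrids, order)
    best = (cost, tuple(order))
    for _ in range(sweeps):
        for _ in range(moves):
            # Propose reversing the block order[i:j].
            i, j = sorted(rng.sample(range(len(order) + 1), 2))
            proposal = order[:i] + order[i:j][::-1] + order[j:]
            new_cost = order_breaks(hybrids, proposal)
            accept = new_cost <= cost or \
                rng.random() < math.exp((cost - new_cost) / temp)
            if accept:
                order, cost = proposal, new_cost
                if cost < best[0]:
                    best = (cost, tuple(order))
        temp *= cooling
    return best


def anneal_with_restarts(hybrids, nloci, restarts=5, seed=0):
    """Run several annealing passes from random orders and merge the results."""
    rng = random.Random(seed)
    results = set()
    for _ in range(restarts):
        start = list(range(nloci))
        rng.shuffle(start)
        breaks, order = anneal(hybrids, start, rng)
        # Store an order and its reverse under one representative.
        results.add((breaks, min(order, order[::-1])))
    return sorted(results)


if __name__ == "__main__":
    hybrids = ["++--", "-++-", "--++", "+---", "---+", "+++-"]
    for breaks, order in anneal_with_restarts(hybrids, 4):
        print(breaks, order)
```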


RHMINBRK:  INPUT

As for RH2PT, input for RHMINBRK is in the form of a single file.  In contrast 
to the input for RH2PT, the input file for RHMINBRK can contain information for 
multiple problems for each of several data sets.  If there are several data 
sets, they follow, one after the other, in the input file. 

Input for each data set includes numbers of problems, loci, and hybrids, screen 
output option, locus names, format for reading the hybrid names and retention 
status data, retention status characters, hybrid names and the retention status 
data, and problem-specific information for each problem. Problem-specific 
information includes number and names of loci used in the problem, ordering 
option, information specific to the ordering option, and output options.

An abbreviated version of the sample data file RHMINBRK.DAT is provided below
("Problem 1", "Problem 2", etc. are not required items in the data file, but are 
included for ease of reading the sample problems):

   4  14  99   1
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
(A2,14(1X,A1),T3,I1)
+-?
 1 - - - - + - - - - + - - - +
 2 + + + + + + + + + + + + + +
 3 ? - + ? - + + + ? ? + ? ? ?
 4 - - + - + - - - + - - + ? -
 5 - - - - - - - - - - - - - -
 6 - - - - - - - - - - - - - -
 7 - - + - - - + - + - + ? ? ?
 8 + + + + + + + + + + + + + -
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
98 - + + - + - + + + - + + - -
99 ? + + + + + + - + + + + + +
  14   1   1   0   1                           Problem 1
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
   4
(14A4)
S16 S48 S46 S4  S52 S11 S1  S18 S8  APP S12 S111S47 SOD1
S16 S48 S46 S4  S52 S11 S1  S18 S8  APP S111S12 S47 SOD1
S16 S48 S46 S4  S52 S11 S1  S18 APP S8  S12 S111S47 SOD1
S16 S48 S46 S4  S52 S11 S1  S18 APP S8  S111S12 S47 SOD1
  14   2   1   0   1                           Problem 2
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
   0   0  10   3  -4
S16 S4  APP S111
  14   4   1   0   1                           Problem 3
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
   0   0   3   3
  14   3   1   0   1                           Problem 4
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
       0      50   31131   17571    9713       4
     100     140    1400   1000.    0.90

This data file includes a single data set with four problems.

For each data set, the following records in the given order and with variables 
and formats as described below are required as input for RHMINBRK.  Multiple 
data sets can be included in a file simply by putting them one after another in 
the data file.

1. Numbers of problems, loci, and RHs, and screen output option, each right-
justified in a 4 column field (4I4).
Columns  1- 4  NPROB:   the number of problems for the data set
Columns  5- 8  NLOCT:   the total number of loci in the data set
Columns  9-12  NHYBT:   the total number of RHs in the data set
Columns 13-16  SCROPT:  screen output option.
               =0 for essentially no intermediate screen output
               =1 for some intermediate screen output
               =2 for lots of intermediate screen output

Note:  NLOCT cannot be greater than 999, even if MAXLOC is increased beyond 999.

2. Locus names for all NLOCT loci, each left-justified in a 4 column field 
(20A4).  Locus names can include any characters.  If there are more than 20 
loci, locus names should be entered on multiple lines, 20 names per line.
Columns  1- 4  LNAMET(1):  name of the first locus
Columns  5- 8  LNAMET(2):  name of the second locus, etc.
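
The fixed 4 column layout can be illustrated with a short Python sketch.  This 
is not part of RHMAP; the function name read_locus_names is hypothetical.  It 
shows why 4-character names such as S111 and SOD1 abut with no blank between 
them in the sample file.

```python
def read_locus_names(lines, nloc):
    """Split (20A4) records into locus names, 20 names per line."""
    names = []
    for line in lines:
        line = line.ljust(80)                 # pad short lines with blanks
        for k in range(20):
            if len(names) < nloc:
                # each name occupies exactly 4 columns, left-justified
                names.append(line[4 * k:4 * k + 4].rstrip())
    return names


sample = "APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1"
print(read_locus_names([sample], 14)[-2:])    # ['S111', 'SOD1']
```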

3. Format for reading the hybrid names and retention status data.  This FORTRAN 
format statement is used to read the information on each RH.  Each hybrid record
consists of the hybrid name, retention information for each locus, and the 
number of times that hybrid was observed.  The hybrid name will be read in 
character (A) format, and may be up to 4 characters long. Retention information 
on each locus is also in character (A) format, one character per locus.  
Finally, the number of times the hybrid was observed is read right-justified in 
integer (I) format.  A zero or a blank in this field is interpreted by the 
program as one hybrid of this type.  For example, (A2,14(1X,A1),T3,I1) is a 
format for a RH mapping data set with 14 loci. Note:  the T3 in this format 
statement says to tab back to column 3.  This allows the reading of a blank 
column.
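
As an illustration only, the following Python sketch mimics what the example 
format (A2,14(1X,A1),T3,I1) reads from one hybrid record.  It is not a general 
FORTRAN format interpreter, and the function name parse_hybrid_record is 
hypothetical.  Stripping blanks from the hybrid name is a simplification made 
here for readability.

```python
def parse_hybrid_record(line, nloci=14):
    """Mimic (A2,14(1X,A1),T3,I1) on a single hybrid record."""
    line = line.ljust(2 + 2 * nloci)          # pad short records with blanks
    name = line[0:2].strip()                  # A2: hybrid name, columns 1-2
    # 14(1X,A1): skip one column, then read one retention character
    status = [line[2 + 2 * k + 1] for k in range(nloci)]
    count_field = line[2:3]                   # T3,I1: tab back to column 3
    count = int(count_field) if count_field.strip() else 0
    if count == 0:                            # zero or blank means one hybrid
        count = 1
    return name, status, count


name, status, count = parse_hybrid_record(" 3 ? - + ? - + + + ? ? + ? ? ?")
print(name, "".join(status), count)
```

Column 3 is blank in every sample record, so each hybrid is counted once.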

4. Three different retention status characters representing (a) locus typed and 
present, (b) locus typed and absent, and (c) locus not typed.  A single 
character is allowed for each of these three situations.  These characters are 
read in (3A1) format.  In the above example, +, -, and ? are used. 
Column 1  Character representing locus typed and present.
Column 2  Character representing locus typed and absent.
Column 3  Character representing locus not typed.

5. Hybrid records, one per hybrid, specifying the hybrid name, retention 
information for each locus, and the number of times that hybrid was observed. 
Each of these variables will be read as indicated in the format statement 
defined in 3. above.  The hybrid name may be up to 4 characters long and can be 
anywhere within the input field; any characters can be used.  Retention 
information on each locus may also be any character, but must correspond to 
those defined in 4. above.  Finally, the number of times a hybrid is observed is 
read right-justified in integer format.

Note:  If the number of times a hybrid is observed is specified as zero or 
blank, it is interpreted as 1.  Thus, if all hybrids are observed exactly once 
(the usual case), the number of times observed column may be left blank in all 
hybrid records.  However, the format item for reading those blanks must be 
present in the format statement, and the blank column(s) must be present in the 
input file.

The following information is required for each of the NPROB problems for the 
current data set.  Problem information is entered problem by problem following
the data set information.

1. Problem control record:  The following five variables right-justified in 4 
column fields (5I4).

Columns  1- 4  NLOCUS:  the number of loci in the problem
Columns  5- 8  ORDOPT:  the ordering option for the problem
               =1 list of locus orders
               =2 stepwise locus ordering
               =3 simulated annealing
               =4 branch and bound
Columns  9-12  USEINC:  incomplete hybrid use option
               =0 exclude partially typed hybrids from the analysis
               =1 include partially typed hybrids in the analysis
Columns 13-16  NBEST:  upper bound on the number of locus orders to
               print.  If 0, no explicit upper bound.
Columns 17-20  INFOPT:  influential hybrid printing option
               =0 if no
               =1 to print influential hybrids and retention data
               ordered according to the best locus order

Note:  We strongly recommend that partially typed hybrids be included in the 
analysis (USEINC=1).  The primary reason for the option to exclude such hybrids 
is to permit comparison to other methods that exclude them.  We also recommend 
printing influential hybrid information (INFOPT=1) since it provides a useful 
indication of the degree and origin of support for the locus-ordering inferences 
made.
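
A minimal Python sketch of how a (5I4) problem control record breaks into its 
five variables follows.  The helper name read_i4_fields is hypothetical, and 
treating a blank field as zero is an assumption of this sketch.

```python
def read_i4_fields(record, names):
    """Read right-justified integers from consecutive 4 column fields."""
    values = {}
    for k, name in enumerate(names):
        field = record[4 * k:4 * k + 4]
        values[name] = int(field) if field.strip() else 0
    return values


control = "  14   2   1   0   1                           Problem 2"
fields = read_i4_fields(control,
                        ["NLOCUS", "ORDOPT", "USEINC", "NBEST", "INFOPT"])
print(fields)
```

Text beyond column 20, such as the "Problem 2" label, is never read.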

2. Locus names for all NLOCUS loci in the problem, each left-justified in a 4 
column field (20A4).  Locus names can include any characters.  If there are more 
than 20 loci, locus names should be entered on multiple lines, 20 names per 
line.
Columns  1- 4  LNAME(1):  name of the first locus
Columns  5- 8  LNAME(2):  name of the second locus, etc.

One of the following four sets of additional information is required, depending 
on which of the four ordering options is selected:

List of Locus Orders (ORDOPT=1) (see Problem 1 above):

3. Number of locus orders right-justified in a 4 column field (I4).

4. Format statement for reading the locus names in order.  For example, (14A4) 
for a problem involving 14 loci.

5. For each locus order, locus names for that order, each name left-justified as 
required by the format described in 4. above.

Stepwise Locus Ordering (ORDOPT=2) (see Problem 2 above) or Branch and Bound 
(ORDOPT=4) (see Problem 3 above):

3. Stepwise locus ordering/branch and bound control record:

The following five variables right-justified in 4 column fields (5I4).
Columns  1- 4  Stepwise locus ordering or branch and bound option BBOPT
               =0 machine-generated locus adding order
               =1 user-specified locus adding order
Columns  5- 8  Candidate order option CANOPT
               =0 machine-generated candidate order
               =1 user-specified candidate order
Columns  9-12  SAVMAX:  maximum difference in number of obligate
               chromosome breaks to keep a partial locus order under
               consideration.  SAVMAX=k says to at any step delete a
               partial locus order that requires more than k more
               obligate breaks than the current best partial (ORDOPT=2)
               or candidate (ORDOPT=4) locus order.
Columns 13-16  PRTMAX:  maximum number of breaks different from the best
               locus order for printing a locus order.  Controls the number
               of locus orders to print.
Columns 17-20  NFORCE:  variable for restricting the locus orders considered.
               NFORCE=0 results in no restrictions on locus ordering.
               NFORCE = k (k = 3, 4, ..., NLOCUS) says to force the first k
               loci in LNAMEA (see below) into the locus order in the order
               given.  NFORCE=-1 says to force the first locus in LNAMEA to
               be an end locus in the map.  NFORCE = -k (k = 3, 4, ...,
               NLOCUS) says to force the first k loci in LNAMEA into the map
               in the order given AND to make the first locus in LNAMEA an
               end locus in the map.  See Problem 2.
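
The NFORCE conventions can be summarized in a short Python sketch.  The function 
name decode_nforce is hypothetical, and rejecting values not described above 
(for example 1, 2, or -2) is a choice made for this sketch, not documented 
program behavior.  The upper limit NLOCUS is not checked here.

```python
def decode_nforce(nforce):
    """Return (number of loci forced in order, first locus forced to an end)."""
    if nforce == 0:
        return 0, False                       # no restrictions
    if nforce == -1:
        return 0, True                        # first LNAMEA locus is an end locus
    if abs(nforce) >= 3:
        # k loci forced in the given order; a negative sign also pins
        # the first locus to an end of the map
        return abs(nforce), nforce < 0
    raise ValueError("NFORCE value not described in the manual: %d" % nforce)


print(decode_nforce(-4))                      # (4, True), as in Problem 2
```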

Note:  We strongly recommend BBOPT=0 and CANOPT=0:  machine-generated locus
adding order and machine-generated candidate locus order.  The greedy algorithms 
we employ seem to generate very efficient locus-adding and candidate locus 
orders.

Note:  CANOPT is ignored for stepwise locus ordering (ORDOPT=2), since in that 
case, no candidate locus order is required.

Note:  If branch and bound is used, SAVMAX and PRTMAX should be equal.  If 
stepwise locus ordering is used, SAVMAX should be substantially larger than 
PRTMAX so that there is a good chance that all locus orders within PRTMAX breaks 
of the best locus orders will be obtained.
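
The role of SAVMAX in stepwise locus ordering can be sketched in Python as 
follows.  This is a simplified illustration, not the RHMINBRK implementation:  
the function names and the five toy hybrids are hypothetical, obligate breaks 
are assumed to be counted as changes in retention status between consecutive 
typed loci, and the adding order is supplied by the caller.

```python
def count_breaks(pattern):
    typed = [c for c in pattern if c in "+-"]          # ignore untyped loci
    return sum(a != b for a, b in zip(typed, typed[1:]))


def total_breaks(order, hybrids):
    return sum(count_breaks([h[loc] for loc in order]) for h in hybrids)


def stepwise_orders(adding_order, hybrids, savmax):
    partial = [tuple(adding_order[:2])]
    for locus in adding_order[2:]:
        scored = {}
        for order in partial:
            for pos in range(len(order) + 1):          # try every insertion point
                cand = order[:pos] + (locus,) + order[pos:]
                cand = min(cand, cand[::-1])           # an order equals its reverse
                scored[cand] = total_breaks(cand, hybrids)
        best = min(scored.values())
        # keep partial orders within SAVMAX breaks of the best partial order
        partial = [o for o, b in scored.items() if b - best <= savmax]
    return sorted((total_breaks(o, hybrids), o) for o in partial)


toy = [dict(zip("ABCD", p)) for p in ("++--", "+---", "---+", "-+++", "--++")]
print(stepwise_orders(list("ABCD"), toy, savmax=0))
```

Raising savmax retains more partial orders at each step, at the cost of more 
computation.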

4. If branch and bound is used (ORDOPT=4), and a user-specified candidate locus 
order was requested (CANOPT=1), this order comes next.  This order should be 
chosen as a guess of the true order of loci.  It can be obtained from a prior 
two-point analysis.  Locus names for all NLOCUS loci in the problem, each left-
justified in a 4 column field (20A4).  Locus names can include any characters.  
If there are more than 20 loci, locus names should be entered on multiple lines, 
20 names per line.
Columns  1- 4  LNAMEC(1):  name of the first locus
Columns  5- 8  LNAMEC(2):  name of the second locus, etc.

Note:  There is no need for a candidate locus order if stepwise locus ordering 
is used (ORDOPT=2).

It is very important that the candidate order be as nearly optimal as possible 
for efficient ordering by the branch and bound strategy.

5. If a user-specified order for adding loci was requested (BBOPT=1) or if the 
locus orders are to be restricted (NFORCE NE 0), specify a locus order next.  If 
BBOPT=1, locus names for all NLOCUS loci in the problem, each left-justified in 
a 4 column field (20A4).  If only restricting the possible locus orders 
(BBOPT=0, NFORCE NE 0), only the names of the forced loci need to be specified; 
they must be in the order desired.  Locus names can include any characters.  If 
there are more than 20 loci, locus names should be entered on multiple lines, 20 
names per line.
Columns  1- 4  LNAMEA(1):  name of the first locus
Columns  5- 8  LNAMEA(2):  name of the second locus, etc.

Simulated Annealing (ORDOPT=3) (see Problem 4 above):

3. Simulated annealing control record 1:  The following six variables right-
justified in 8 column fields (6I8).
Columns  1- 8  Simulated annealing option SAOPT.
               =0 for random initial locus order
               =1 for user-specified initial locus order
Columns  9-16  NUMORD:  number of best locus orders to save
Columns 17-24  ISEED1:  first random number generator seed
Columns 25-32  ISEED2:  second random number generator seed
Columns 33-40  ISEED3:  third random number generator seed
Columns 41-48  PRTMAX:  maximum obligate break difference from the
               best order for printing a locus order.  Controls the
               number of locus orders to print.

Among the NUMORD best locus orders encountered, those requiring no more than 
PRTMAX obligate breaks beyond the minimum number encountered will be printed.

Note:  Seeds for the random number generator should be between 1 and 32767.

4. Simulated annealing control record 2:  The following variables, the first 3 
values right-justified in 8 column fields, the last 2 anywhere in 8 column 
fields:
Columns  1- 8  NTEMP:  number of temperatures for simulated annealing
Columns  9-16  NBET:  number of moves to a better order required for
               temperature decrease
Columns 17-24  NMOVE:  maximum number of moves before a temperature
               decrease
Columns 25-32  TMAX:  initial (maximum) temperature
Columns 33-40  FACTOR:  factor by which temperature is decreased

Note 1:  FACTOR should be strictly greater than 0 and strictly less than 1.

Note 2:  TMAX and FACTOR both should have decimal points. For example, a maximum 
temperature of 100 could be represented as 100. or 100.0, but not simply as 100.

5. If a user-specified initial locus order was requested (SAOPT=1), that order 
comes next.  Locus names for all NLOCUS loci in the problem, each left-justified 
in a 4 column field (20A4).  Locus names can include any characters.  If there 
are more than 20 loci, locus names should be entered on multiple lines, 20 names 
per line.
Columns  1- 4  LNAMEA(1):  name of the first locus
Columns  5- 8  LNAMEA(2):  name of the second locus, etc.

Note:  We currently use an initial temperature of 1000.0, a temperature decrease 
factor of 0.90, require 10n successful moves for an early temperature decrease, 
and limit the number of moves before the next temperature decrease to 100n, 
where n is the number of loci in the problem. THESE ARE NOT OPTIMIZED VALUES!!  
For example, the initial temperature almost certainly should depend on the 
number of obligate breaks for a candidate locus order.  If you use simulated 
annealing, we encourage you to play around with the parameter settings, and to 
compare results from different settings.  We would be very interested to learn 
of the results of such investigations.
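
To make the roles of NTEMP, NBET, NMOVE, TMAX, and FACTOR concrete, here is a 
rough Python sketch of an annealing loop over locus orders.  It is not the 
RHMINBRK code.  The acceptance rule (accept a worse order with probability 
exp(-delta/temperature)) is an assumption of this sketch, as are the function 
names, the single seed, and the toy hybrids.

```python
import math
import random


def count_breaks(pattern):
    typed = [c for c in pattern if c in "+-"]          # ignore untyped loci
    return sum(a != b for a, b in zip(typed, typed[1:]))


def total_breaks(order, hybrids):
    return sum(count_breaks([h[loc] for loc in order]) for h in hybrids)


def anneal(order, hybrids, ntemp, nbet, nmove, tmax, factor, seed=1):
    rng = random.Random(seed)
    current = list(order)
    cost = total_breaks(current, hybrids)
    best, best_cost = list(current), cost
    temp = tmax
    for _ in range(ntemp):                             # NTEMP temperatures
        better = 0
        for _ in range(nmove):                         # at most NMOVE moves
            i, j = sorted(rng.sample(range(len(current)), 2))
            # propose a random block inversion of loci i..j
            proposal = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
            delta = total_breaks(proposal, hybrids) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                if delta < 0:
                    better += 1
                current, cost = proposal, cost + delta
                if cost < best_cost:                   # remember the best order seen
                    best, best_cost = list(current), cost
            if better >= nbet:                         # early temperature decrease
                break
        temp *= factor                                 # cool by FACTOR
    return best, best_cost


toy = [dict(zip("ABCD", p)) for p in ("++--", "+---", "---+", "-+++", "--++")]
n = 4
print(anneal(list("CADB"), toy, ntemp=20, nbet=10 * n, nmove=100 * n,
             tmax=1000.0, factor=0.90))
```

Running the sketch with several seeds and starting orders, then merging the 
results, mirrors the restart advice given for simulated annealing.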

Branch and Bound (ORDOPT=4) (see Problem 3 above):

Input requirements for branch and bound are exactly the same as those for 
stepwise locus ordering (see above).


RHMINBRK:  OUTPUT

The output from RHMINBRK is in the form of several tables.

For each data set (set of problems), the following three tables are provided:  
(1) locus numbers and names; (2) retention status characters; and (3) hybrid 
names, retention information, and numbers of times observed.

For each problem, the following tables are presented:  (1) description of the 
analysis undertaken; (2) best locus orders ranked by minimum obligate chromosome 
breaks; (3) RH data permuted to be consistent with the best minimum breaks locus 
order; (4) observed distribution of the number of obligate chromosome breaks per 
hybrid; and (5) influential hybrids for the nearly-best locus orders.  (3)-(5) 
are printed only if influential hybrid information is requested (INFOPT=1).

Descriptions and abbreviated examples of these tables follow.

The first table gives locus names and numbers for the current data set.  These 
locus numbers are used in the tables that follow.

LOCUS NAMES FOR PROBLEM SET     1

LOCUS        LOCUS
NUMBER       NAME

   1         APP
   2         S1
   3         S4
   4         S8
   5         S11
   6         S12
   7         S16
   8         S18
   9         S46
  10         S47
  11         S48
  12         S52
  13         S111
  14         SOD1


The second table provides symbols for retention status.  These are the symbols 
for marker typed and retained, marker typed and lost, and marker not typed, 
respectively.

RETENTION STATUS CHARACTERS

    + = RETAINED
    - = NOT RETAINED
    ? = UNTYPED


The third table echoes the retention status information for each hybrid.  It 
also lists each hybrid's number, name, and the number of times the hybrid was 
observed.

RADIATION HYBRID RETENTION STATUS DATA

HYBRID    HYBRID    NUMBER
NUMBER     NAME    OBSERVED     RETENTION STATUS

   1        1          1        - - - - + - - - - + - - - +
   2        2          1        + + + + + + + + + + + + + +
   3        3          1        ? - + ? - + + + ? ? + ? ? ?
   4        4          1        - - + - + - - - + - - + ? -
   5        5          1        - - - - - - - - - - - - - -
   6        6          1        - - - - - - - - - - - - - -
   7        7          1        - - + - - - + - + - + ? ? ?
   8        8          1        + + + + + + + + + + + + + -
   .        .          .        . . . . . . . . . . . . . .
   .        .          .        . . . . . . . . . . . . . .
   .        .          .        . . . . . . . . . . . . . .
  98       98          1        - + + - + - + + + - + + - -
  99       99          1        ? + + + + + + - + + + + + +

TOTAL NUMBER OF HYBRIDS:         99


The first three tables are printed once per data set.  The remaining tables are 
printed for each problem.  Complete results for all four analyses can be found 
in the file RHMINBRK.OUT.  Here, only the output for problem 2 has been printed, 
and it has been compressed.

Output for each problem begins with an annotated echoing of the input data.

PROBLEM NUMBER                         2

NUMBER OF LOCI:                       14
ORDERING OPTION:                STEPWISE
USE INCOMPLETE HYBRIDS:              YES
IDENTIFY INFLUENTIAL HYBRIDS:        YES

GENETIC LOCI:      APP  S1   S4   S8   S11  S12  S16  S18  S46  S47
                   S48  S52  S111 SOD1


STEPWISE LOCUS ORDERING OPTIONS

MAXIMUM BREAK DIFFERENCE TO SAVE ORDER:  10
MAXIMUM BREAK DIFFERENCE TO PRINT ORDER:  3
ADDING ORDER:             MACHINE-GENERATED
CANDIDATE ORDER:          MACHINE-GENERATED
ORDER FOR FORCED LOCI:    S16  S4   APP  S111
ORDER FOR ADDING LOCI:    S16  S4   APP  S111 S48  S52  S46  S47  SOD1 S1
                          S18  S11  S8   S12


Next, the list of best minimum break locus orders is printed.  In this output, 
RANK has been abbreviated as RK, BREAKS as BRKS, and columns have been deleted 
to allow the information to fit in this document.  BREAKS gives the number of 
obligate breaks for the order.  If the maximum number of locus orders is not 
exceeded, locus orders are sorted by number of obligate breaks; otherwise, they 
are given in the order in which they were encountered.

In this example, four locus orders satisfy the forced suborder S16-S4-APP-S111 
and require no more than 3 obligate breaks beyond the minimum of 123.


LIST OF BEST MINIMUM OBLIGATE BREAK LOCUS ORDERS
RK BRKS  LOCUS ORDER

1 123 S16  S48  S46  S4   S52  S11  S1   S18  S8   APP  S12  S111 S47  SOD1
       7   11    9    3   12    5    2    8    4    1    6   13   10   14

2 123 S16  S48  S46  S4   S52  S11  S1   S18  S8   APP  S111 S12  S47  SOD1
       7   11    9    3   12    5    2    8    4    1   13    6   10   14

3 125 S16  S48  S46  S4   S52  S11  S1   S18  APP  S8   S12  S111 S47  SOD1
       7   11    9    3   12    5    2    8    1    4    6   13   10   14

4 125 S16  S48  S46  S4   S52  S11  S1   S18  APP  S8   S111 S12  S47  SOD1
       7   11    9    3   12    5    2    8    1    4   13    6   10   14


If influential hybrid information is requested (INFOPT=1), the following three 
tables are printed:

The first presents the retention information permuted according to (one of) the 
best minimum break locus order(s), as well as numbers of obligate breaks.  This 
listing can be scanned for retention patterns suggesting possible typing errors.  
These include patterns, such as ++++++-+++++++ or -------+-------, which may 
correctly indicate two close breaks, or instead may represent false negatives or 
positives.

RETENTION DATA PERMUTED IN THE BEST LOCUS ORDER FOR PROBLEM 2

HYBRID HYBRID  NUMBER  OBLIGATE   RETENTION STATUS
NUMBER  NAME  OBSERVED  BREAKS    14 10  6 13  1  4  8  2  5  12  3  9 11  7

   1     1        1        3       +  +  -  -  -  -  -  -  +   -  -  -  -  -
   2     2        1        0       +  +  +  +  +  +  +  +  +   +  +  +  +  +
   3     3        1        2       ?  ?  +  ?  ?  ?  +  -  -   ?  +  ?  +  +
   4     4        1        2       -  -  -  ?  -  -  -  -  +   +  +  +  -  -
   5     5        1        0       -  -  -  -  -  -  -  -  -   -  -  -  -  -
   6     6        1        0       -  -  -  -  -  -  -  -  -   -  -  -  -  -
   7     7        1        1       ?  -  -  ?  -  -  -  -  -   ?  +  +  +  +
   8     8        1        1       -  +  +  +  +  +  +  +  +   +  +  +  +  +
   .     .        .        .       .  .  .  .  .  .  .  .  .   .  .  .  .  .
   .     .        .        .       .  .  .  .  .  .  .  .  .   .  .  .  .  .
   .     .        .        .       .  .  .  .  .  .  .  .  .   .  .  .  .  .
  98    98        1        1       -  -  -  -  -  -  +  +  +   +  +  +  +  +
  99    99        1        2       +  +  +  +  ?  +  -  +  +   +  +  +  +  +
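
The OBLIGATE BREAKS column can be reproduced with a few lines of Python, 
assuming an obligate break is counted each time retention status changes between 
consecutive typed loci, with untyped loci skipped.  That assumption matches the 
rows shown above; the function name count_obligate_breaks is hypothetical.

```python
def count_obligate_breaks(pattern, present="+", absent="-"):
    """Count status changes between consecutive typed loci."""
    typed = [c for c in pattern if c in (present, absent)]
    return sum(1 for a, b in zip(typed, typed[1:]) if a != b)


# Permuted retention patterns for hybrids 1, 3, and 99 from the table above
print(count_obligate_breaks("++------+-----"))   # 3
print(count_obligate_breaks("??+???+--?+?++"))   # 2
print(count_obligate_breaks("++++?+-+++++++"))   # 2
```

An isolated discordant marker, as in ++++++-+++++++, always contributes two 
obligate breaks, which is why such patterns merit a second look.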

The next table provides the observed distribution of the number of obligate 
breaks per hybrid under the best locus order.  The presence of a large number of 
hybrids requiring multiple obligate breaks suggests the possibility that typing 
errors may be present in the data.  Re-scoring or re-typing of hybrids requiring 
substantial numbers of obligate breaks should be considered; these hybrids can 
be identified in the above permutation of the retention data.

NUMBERS OF OBLIGATE BREAKS PER HYBRID:

NUMBER OF BREAKS        0    1    2    3    4    5
NUMBER OF HYBRIDS      36   26   22    9    4    2
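
As a consistency check, the distribution above can be summed in Python:  the 
hybrid counts should total the 99 hybrids in the data set, and the weighted sum 
should equal the 123 obligate breaks reported for the best order.  The variable 
names are illustrative only.

```python
# NUMBER OF BREAKS -> NUMBER OF HYBRIDS, copied from the table above
breaks_per_hybrid = {0: 36, 1: 26, 2: 22, 3: 9, 4: 4, 5: 2}

n_hybrids = sum(breaks_per_hybrid.values())
n_breaks = sum(k * v for k, v in breaks_per_hybrid.items())
print(n_hybrids, n_breaks)                    # 99 123
```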


The influential hybrid table lists for each nearly best locus order the hybrids 
that require different numbers of obligate breaks than under the best order, and 
how those numbers differ.  For locus order 2, no such hybrids exist.  For locus 
orders 3 and 4, only hybrid 69 requires a different number of breaks, namely two 
more, than under the best locus order.  Thus, hybrid 69 is solely responsible 
for the relative ordering of APP and S8; hybrid 69 is influential.  Re-scoring 
or even re-typing of hybrid 69 might be undertaken, since hybrid 69 is the sole 
basis for the relative ordering of these two loci.

INFLUENTIAL HYBRIDS FOR THE MOST LIKELY ORDERS

RANK  BREAKS   HYBRID NAME AND BREAK DIFFERENCES (OTHER-BEST)

  2     123     NO INFLUENTIAL HYBRIDS IDENTIFIED.

  3     125     69
                   2

  4     125     69
                   2
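
The comparison behind this table can be sketched in Python.  The hybrids X1, X2, 
and X3 and their retention patterns are invented for illustration and are not 
taken from the sample data; the function names are hypothetical, and obligate 
breaks are assumed to be status changes between consecutive typed loci.

```python
def count_obligate_breaks(pattern):
    typed = [c for c in pattern if c in "+-"]          # ignore untyped loci
    return sum(1 for a, b in zip(typed, typed[1:]) if a != b)


def influential_hybrids(hybrids, best_order, other_order):
    """Map hybrid name to break difference (other order minus best order)."""
    diffs = {}
    for name, status in hybrids.items():
        best = count_obligate_breaks([status[loc] for loc in best_order])
        other = count_obligate_breaks([status[loc] for loc in other_order])
        if other != best:
            diffs[name] = other - best
    return diffs


hybrids = {
    "X1": {"S18": "+", "S8": "+", "APP": "-", "S12": "-"},
    "X2": {"S18": "+", "S8": "+", "APP": "+", "S12": "-"},
    "X3": {"S18": "-", "S8": "?", "APP": "-", "S12": "+"},
}
print(influential_hybrids(hybrids,
                          ["S18", "S8", "APP", "S12"],
                          ["S18", "APP", "S8", "S12"]))   # {'X1': 2}
```

Only a hybrid that separates S8 from APP can distinguish the two orders, so a 
single such hybrid carries all of the support.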


RHMAXLIK:  INTRODUCTION, ASSUMPTIONS, AND MODELS

RHMAXLIK is a FORTRAN 77 program that carries out maximum likelihood estimation 
of model parameters for four different breakage and retention models.  Each of 
these models assumes that X-ray breakage occurs as a Poisson process (see, for 
example, Karlin and Taylor 1975) along the chromosome, that is, constant 
breakage intensity and no interference. Given n loci A(1), A(2), ..., A(n), all 
models are parameterized in terms of the n-1 breakage probabilities between 
adjacent loci.  Under the above assumptions, a breakage probability t can be 
converted to an additive distance d using the formula d=-ln(1-t); note the close 
analogy to the Haldane (1919) genetic mapping function.
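
A small Python sketch of the conversion, with hypothetical function names, 
follows.  It also checks additivity:  if breaks in two adjacent intervals occur 
independently with probabilities t1 and t2, the probability of at least one 
break across both is 1-(1-t1)(1-t2), and the corresponding distances add.

```python
import math


def breakage_to_distance(t):
    """d = -ln(1 - t) for a breakage probability 0 <= t < 1."""
    return -math.log(1.0 - t)


def distance_to_breakage(d):
    """Inverse conversion: t = 1 - exp(-d)."""
    return 1.0 - math.exp(-d)


t1, t2 = 0.2, 0.3
t12 = 1.0 - (1.0 - t1) * (1.0 - t2)           # break in either interval
print(breakage_to_distance(t12))
print(breakage_to_distance(t1) + breakage_to_distance(t2))
```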

The models supported by RHMAXLIK differ in their assumptions about fragment 
retention.  Let r(i,j) be the probability a fragment containing exactly loci 
A(i), A(i+1), ..., A(j) (i <= j) should be retained in a RH.  The models 
currently supported are:

1. Equal retention model (Bishop and Crockford 1992; Boehnke 1992; Boehnke et 
al. 1991; Chakravarti and Reefer 1992; Lawrence and Morton 1992):  r(i,j) = r 
for all i <= j.  This simplest model assumes all fragments have the same 
retention probability.  This model includes a total of n parameters, n-1 
breakage probabilities and one retention probability.

2. Centromeric (telomeric) retention model (Bishop and Crockford 1992; Boehnke 
1992; Boehnke et al. 1991; Lawrence and Morton 1992):  r(1,j) = r(1) for all j; 
r(i,j) = r(2) for all 1 < i <= j.  This model allows for a higher or lower 
retention probability for fragments containing the centromere, telomere, or more 
generally, one endpoint of the map.  This model includes a total of n+1 
parameters, n-1 breakage probabilities and two retention probabilities.

3. Left-endpoint retention model (Boehnke et al. 1991; Bishop and Crockford 
1992; Boehnke 1992):  r(i,j) = r(i) for all i <= j.  This model allows the 
fragment retention probability to depend on the left-most locus present on the 
fragment.  This model includes a total of 2n-1 parameters, n-1 breakage 
probabilities and n retention probabilities.

4. General retention model (Cox et al. 1990):  allows all retention 
probabilities to differ.  This model includes a total of (n**2+3n-2)/2 
parameters, n-1 breakage probabilities and n(n+1)/2 retention probabilities.
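
The parameter counts for the four models can be tabulated for any number of loci 
n with a short Python sketch; the function name and dictionary keys are 
illustrative only.

```python
def parameter_counts(n):
    """Number of parameters (breakage plus retention) for n loci."""
    breakage = n - 1
    return {
        "equal": breakage + 1,                     # n
        "centromeric": breakage + 2,               # n + 1
        "left-endpoint": breakage + n,             # 2n - 1
        "general": breakage + n * (n + 1) // 2,    # (n**2 + 3n - 2)/2
    }


print(parameter_counts(14))
```

For the 14 loci of the sample data set this gives 14, 15, 27, and 118 
parameters, respectively.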

Note that these four models are nested, so that conditional on order, likelihood 
ratio tests can be carried out to test for the relative fit of the models to the 
data.  In particular, if logL(p) and logL(q) are the maximum log10-likelihoods 
for a particular locus order for models p and q, with p the simpler model nested 
within q, then (2 ln10) [logL(q)-logL(p)] should be asymptotically distributed 
as chi-squared, with degrees of freedom equal to the difference in numbers of 
parameters for the two models.  Note:  the factor (2 ln10) (about 4.605) 
converts the log10-likelihood difference to twice the natural log-likelihood 
difference.
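
The test statistic can be computed as in the following Python sketch.  The 
function name and the log10-likelihood values are hypothetical; the result would 
be compared against a chi-squared table with the returned degrees of freedom.

```python
import math


def likelihood_ratio_test(logl_p, logl_q, npar_p, npar_q):
    """Return (statistic, degrees of freedom) for nested models p within q.

    logl_p and logl_q are maximum log10-likelihoods for the same locus order.
    """
    statistic = 2.0 * math.log(10.0) * (logl_q - logl_p)
    return statistic, npar_q - npar_p


# Hypothetical values: equal (14 parameters) versus centromeric (15) model
stat, df = likelihood_ratio_test(-50.0, -48.0, 14, 15)
print(round(stat, 3), df)                     # 9.21 1
```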

For haploid data, all models but the general model are Markovian in the sense 
that conditional on the retention status of the last locus, previous loci 
contribute no additional information (Boehnke et al. 1991). Computation of locus 
order likelihoods for these models scales linearly with the number of loci n.  
In contrast, the general retention model is non-Markovian, and computation of a 
locus order likelihood under this model scales geometrically with the number of 
loci.  For diploid and more generally polyploid data, the general model is not 
supported.  The three models that are Markovian in the haploid setting are no 
longer so in the polyploid.  However, likelihoods still can be computed in a 
reasonably straightforward way using the theory of hidden Markov chains.  
Computation times scale a bit worse than linear in the number of chromosome 
copies present (the ploidy).

While all four models are in theory identifiable given at least n = 4 loci, it 
is our experience that the left-endpoint and general models often result in 
fragment retention probabilities at the boundary values of zero and one. Even 
so, their predictions of locus-specific retention probabilities can be 
noticeably better than those for the equal and centromeric models.  The 
resolving power for the more complex models to compare different orders seems 
often to be compromised by the greater flexibility provided by the large number 
of parameters.

Model fitting for a given locus order requires iterative techniques, since 
analytic expressions for the parameter estimates are not generally available.  
We employ EM algorithms (Dempster et al. 1977) for this purpose (Boehnke et al. 
1991).


RHMAXLIK:  CHANGES IN VERSION 2

Additions and other changes in RHMAXLIK in version 2 include:  (a) analysis of 
diploid and more generally polyploid RH mapping data; (b) restriction to a 
single set of initial values for parameter estimates based on two-locus maximum 
likelihood estimates in contrast to the previous existence of two initial value 
options; (c) analysis in which a subset of the loci are forced in a pre-
specified order within the map, allowing incorporation of prior information from 
other mapping methods; (d) analysis in which a particular genetic marker is 
forced to be at the end of the map if it is included, providing a method to 
eliminate "flip-flops" of marker groups at the end(s) of a map; (e) calculation 
of the distribution of the number of obligate chromosome breaks for a hybrid as 
a further aid in the detection of marker mistyping or misscoring; and (f) 
correction of several logical inconsistencies and formatting errors that should 
have had no effect on previous analyses.

These changes result in three modifications in program input:  (a) elimination 
of variable IGUESS which had specified the initial parameter guess option - 
this change requires modification of existing RHMAXLIK input files; (b) optional 
specification of the ploidy NCHR; default is haploid (NCHR=1); and (c) optional 
specification of the ordering restriction variable NFORCE; default is no 
forcing.


RHMAXLIK:  ORDERING STRATEGIES

The ordering strategies available in RHMAXLIK are the same as those available in 
RHMINBRK:

1. List of user-specified locus orders.  This strategy is particularly useful in 
RHMAXLIK if we wish to calculate maximum likelihoods for a set of locus orders 
identified under the simpler minimum breaks criterion.  Such an approach might 
be taken if the number of loci n is quite large or if the computationally 
intensive general retention model is used.

2. Stepwise locus ordering.  This strategy builds locus orders one locus at a 
time.  At step m (m <= n), a new locus is added to the list of currently saved 
partial locus orders, all of which contain the same m-1 loci.  An m-locus order 
constructed in this way is then saved for further consideration if its maximum 
likelihood is not too much smaller than the maximum likelihood for the best 
locus order made up of the same m loci.  This approach is analogous to the build 
option employed in CRIMAP (Barker et al. 1987).

3. Simulated annealing (Kirkpatrick et al. 1983; Press et al. 1989).  This 
strategy starts with an n-locus order, and moves to different possible n-locus 
orders by proposal and (conditional) acceptance of random block inversions of 
loci.  At an early stage in the process, nearly all proposed block inversions of 
loci are taken.  As the process continues, the probability of accepting a move 
to a worse locus order becomes progressively smaller.  The goal is to sample 
a substantial number of locus orders, and not get bogged down in the region of a 
locally rather than globally best order.  A list of best encountered orders is 
kept during the process.

4. Branch and bound (see, for example, Nijenhuis and Wilf 1978).  This strategy 
is similar to stepwise locus ordering.  The difference is that partial locus 
orders are saved if they are not too much worse in likelihood than a candidate 
locus order for the complete set of loci.  This strategy, unlike the other 
three, guarantees the best locus order is identified. However, unless the number 
of loci n is relatively small, branch and bound tends to require too much 
computation.  Identification of a good candidate order is of critical importance 
for branch and bound, but this can be done very nicely using the minimum breaks 
criterion and RHMINBRK.

REMINDER:  Of the four ordering options, only branch and bound guarantees that 
the best locus order is identified.  To help ensure the best locus order is 
found when using stepwise locus ordering, we recommend increasing SAVMAX (see 
below) until no changes in results are observed.  For simulated annealing, we 
recommend re-starting the process several times with different locus orders and 
merging the resulting lists of locus orders.  To emphasize the importance of 
this advice for stepwise locus ordering, we recommend running test problem 2 for 
RHMINBRK with SAVMAX=9 and again with SAVMAX=10; very different sets of locus 
orders are obtained.


RHMAXLIK:  INPUT

As for the other two programs, input for RHMAXLIK is in the form of a
single file.  Like RHMINBRK, the data file can contain information for multiple 
problems for each of several data sets. 

For each data set, the data include numbers of problems, loci, and hybrids, 
screen output option, locus names, format for reading the hybrid names and 
retention status data, retention characters, hybrid names and the retention 
status data, and problem-specific information for each problem.  Problem-
specific information includes number and names of loci used in the problem, 
retention model, ordering option, information specific to the ordering option, 
and output options.

An abbreviated version of the sample data file RHMAXLIK.DAT is provided below 
("Problem 1", "Problem 2", etc. are not required items in the data file, but are 
included for ease of reading the sample problems):

   4  14  99   1   1
APP S1  S4  S8  S11 S12 S16 S18 S46 S47 S48 S52 S111SOD1
(A2,14(1X,A1),T3,I1)
+-?
 1 - - - - + - - - - + - - - +
 2 + + + + + + + + + + + + + +
 3 ? - + ? - + + + ? ? + ? ? ?
 4 - - + - + - - - + - - + ? -
 5 - - - - - - - - - - - - - -
 6 - - - - - - - - - - - - - -
 7 - - + - - - + - + - + ? ? ?
 8 + + + + + + + + + + + + + -
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . .
98 - + + - + - + + + - + + - -
99 ? + + + + + + - + + + + + +
   6   4   1   1   0   0   1    0.00          Problem 1
S11 S1  S18 S8  APP S12
   4
(6A4)
S11 S1  S18 S8  APP S12
S1  S11 S18 S8  APP S12
S11 S1  S18 APP S8  S12
S1  S11 S18 APP S8  S12
   8   1   2   1   0   1   1    1.00          Problem 2
S11 S1  S18 S8  APP S12 S47 SOD1
   0    2.00    9.00    4.00  -3
S1  SOD1S18
   8   2   4   1   0   1   1    1.00          Problem 3
S11 S1  S18 S8  APP S12 S47 SOD1
   0    0.00    3.00    3.00
S11 S1  S18 S8  APP S12 S47 SOD1
   8   1   3   1   0   1   1    1.00          Problem 4
S1  S4  S11 S16 S46 S48 S52 SOD1
       0     100   31611    4531    6512    3.00
     100      80     800 1000.00    0.90

This data file includes a single data set with four problems.

For each data set, the following records in the given order and with variables 
and formats as described below are required as input for RHMAXLIK.  Multiple 
data sets can be included in a file simply by putting them one after another in 
the data file.

1. Numbers of problems, loci, and RHs, screen output option, and ploidy, each 
right-justified in a 4 column field (5I4).
Columns  1- 4  NPROB:   the number of problems for the data set
Columns  5- 8  NLOCT:   the total number of loci in the data set
Columns  9-12  NHYBT:   the total number of RHs in the data set
Columns 13-16  SCROPT:  screen output option.
               =0 for essentially no intermediate screen output
               =1 for some intermediate screen output
               =2 for lots of intermediate screen output
Columns 17-20  NCHR:  the ploidy for these data; =1 for haploid data, =2
               for diploid data, etc.  If left blank, defaults to 1
               (haploid).

Note:  NLOCT cannot be greater than 999, even if MAXLOC is increased beyond 999.
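
As an illustration only (Python is not part of RHMAP), the (5I4) layout of this 
record can be sketched as follows.  The function name parse_dataset_header is 
hypothetical; the sketch assumes that an all-blank integer field reads as zero 
and applies the stated default of NCHR=1 when the ploidy field is blank.

```python
def parse_dataset_header(line):
    """Split a (5I4) data set record into its five integer fields."""
    names = ["NPROB", "NLOCT", "NHYBT", "SCROPT", "NCHR"]
    padded = line.rstrip("\n").ljust(20)      # a short line reads as trailing blanks
    values = {}
    for i, name in enumerate(names):
        field = padded[4 * i:4 * i + 4].strip()
        values[name] = int(field) if field else 0   # assumption: blank field reads as 0
    if values["NCHR"] == 0:
        values["NCHR"] = 1                    # blank ploidy defaults to haploid
    return values


# First record of the sample file RHMAXLIK.DAT shown above.
header = parse_dataset_header("   4  14  99   1   1")
print(header)
```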

2. Locus names for all NLOCT loci, each left-justified in a 4 column field 
(20A4).  Locus names can include any characters.  If there are more than 20 
loci, locus names should be entered on multiple lines, 20 names per line.
Columns  1- 4  LNAMET(1):  name of the first locus
Columns  5- 8  LNAMET(2):  name of the second locus, etc.

3. Format for reading the hybrid names and retention status data.  This FORTRAN 
format statement is used to read the information on each RH.  Each hybrid 
record consists of the hybrid name, retention information for each locus, and 
the number of times that hybrid was observed.  The hybrid name will be read in 
character (A) format, and may be up to 4 characters long. Retention information 
on each locus is also in character (A) format, one character per locus.  
Finally, the number of times the hybrid is observed is read right-justified in 
integer (I) format.  A zero or a blank in this field is interpreted by the 
program as one hybrid of this type.  For example, (A2,14(1X,A1),T3,I1) is a 
format for an RH mapping data set with 14 loci. Note:  the T3 in this format 
statement says to tab back to column 3.  This allows the reading of a blank 
column.
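
The way the example format (A2,14(1X,A1),T3,I1) walks across a hybrid record 
can be sketched in Python.  This is an illustration under stated assumptions, 
not the program's own reading routine:  the function name read_hybrid_record is 
hypothetical, and the column arithmetic is specific to this one format.

```python
def read_hybrid_record(line, nloci=14):
    """Mimic (A2,<nloci>(1X,A1),T3,I1) for a single hybrid record."""
    padded = line.rstrip("\n").ljust(2 + 2 * nloci)
    name = padded[0:2].strip()                        # A2: columns 1-2
    # <nloci>(1X,A1): skip one column, then read one status character
    status = [padded[3 + 2 * k] for k in range(nloci)]
    count_field = padded[2]                           # T3,I1: back to column 3
    count = int(count_field) if count_field.strip() else 0
    if count == 0:
        count = 1                                     # zero or blank means one hybrid
    return name, status, count


# Hybrid 1 of the sample data; column 3 is blank, so the count is taken as 1.
name, status, count = read_hybrid_record(" 1 - - - - + - - - - + - - - +")
print(name, "".join(status), count)
```

Because the count is read from column 3, which the first 1X otherwise skips, 
the same column serves as the separator and as the (usually blank) count field.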

4. Retention status characters representing (a) locus typed and present, (b) 
locus typed and absent, and (c) locus not typed.  A single character is allowed 
for each of these three situations.  These characters are read in (3A1) format.  
In the above example, +, -, and ? are used. 
Column 1  Character representing locus typed and present.
Column 2  Character representing locus typed and absent.
Column 3  Character representing locus not typed.

5. Hybrid records, one per hybrid, specifying the hybrid name, retention 
information for each locus, and the number of times that hybrid was observed. 
Each of these variables will be read as indicated in the format statement 
defined in 3. above.  The hybrid name may be up to 4 characters long and can be 
anywhere within the input field; any characters can be used.  Retention 
information on each locus may also be any character, but must correspond to 
those defined in 4. above.  Finally, the number of times a hybrid is observed is 
read right-justified in integer format.

Note:  If the number of times a hybrid is observed is specified as zero or 
blank, it is interpreted as 1.  Thus, if all hybrids are observed exactly once 
(the usual case), the number of times observed column may be left blank in all 
hybrid records.  However, the format item for reading those blanks must still be 
present in the format statement, and the blank column(s) must be present in the 
input file.

The following information is required for each of the NPROB problems for the 
current data set.  Problem information is entered problem by problem following 
the data set information. 

1. Problem control record:  The following variables, the first seven right-
justified in 4 column fields, the last anywhere in an 8 column field (7I4,F8.5).

Columns  1- 4  NLOCUS:  the number of loci in the problem 
Columns  5- 8  MODEL:  the retention model for this problem
               =1 equal retention probability model
               =2 centromeric retention probability model
               =3 left-endpoint retention probability model
               =4 general retention probability model
Columns  9-12  ORDOPT:  the ordering option for the problem
               =1 list of locus orders
               =2 stepwise locus ordering
               =3 simulated annealing
               =4 branch and bound
Columns 13-16  USEINC:  incomplete hybrid use option
               =0 exclude partially typed hybrids from the analysis
               =1 include partially typed hybrids in the analysis
Columns 17-20  NBEST:  upper bound on the number of locus orders to
               print.  If 0, no upper bound.
Columns 21-24  INFOPT:  influential hybrid printing option
               =0 for no printing of influential hybrids
               =1 to print influential hybrids and retention data
               ordered according to the best locus order
Columns 25-28  OUTOPT:  output option
               =0 basic output only
               =1 also print estimates for the nearly best locus orders
               =2 also print iteration output for each likelihood
                  maximization
Columns 29-36  HYBMIN:  minimum log10-likelihood difference to define
               a hybrid as influential (should have a decimal point in the
               number)

Note:  The elimination of IGUESS has altered the alignment of variables on this 
record.  Old RHMAXLIK files will need to be changed here!!!
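
A minimal Python sketch of the (7I4,F8.5) layout may help when converting old 
files.  The names parse_problem_control, MODELS, and ORDOPTS are hypothetical; 
the sketch assumes HYBMIN is written with a decimal point, as recommended 
above, and ignores anything beyond column 36.

```python
MODELS = {1: "equal", 2: "centromeric", 3: "left-endpoint", 4: "general"}
ORDOPTS = {1: "list of locus orders", 2: "stepwise locus ordering",
           3: "simulated annealing", 4: "branch and bound"}


def parse_problem_control(line):
    """Split a (7I4,F8.5) problem control record into named fields."""
    names = ["NLOCUS", "MODEL", "ORDOPT", "USEINC", "NBEST", "INFOPT", "OUTOPT"]
    padded = line.rstrip("\n").ljust(36)
    record = {}
    for i, name in enumerate(names):
        field = padded[4 * i:4 * i + 4].strip()
        record[name] = int(field) if field else 0     # assumption: blank reads as 0
    hybmin = padded[28:36].strip()                    # columns 29-36
    record["HYBMIN"] = float(hybmin) if hybmin else 0.0
    return record


# Problem 3 of the sample file; the trailing label lies past column 36.
problem3 = parse_problem_control(
    "   8   2   4   1   0   1   1    1.00          Problem 3")
print(MODELS[problem3["MODEL"]], "/", ORDOPTS[problem3["ORDOPT"]])
```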

Note:  We strongly recommend that partially typed hybrids be included in the 
analysis (USEINC=1).  The primary reason for the option to exclude such hybrids 
is to permit comparison to other methods that exclude them.  We also recommend 
printing influential hybrid information (INFOPT=1) since it provides a useful 
indication of the degree and origin of support for the inferences made.

Note:  Since likelihood maximization is carried out for every locus order 
considered, OUTOPT=2 can result in tremendous quantities of output.  It 
generally should be used only if ORDOPT=1 or if the number of loci n is very 
small.

Note:  Since the likelihood calculation for the general model scales 
geometrically with n, it is generally computationally impractical for n more 
than about 10.

Note:  Convergence is assumed after NCONV consecutive iterations in which the 
change in log10-likelihood is no greater than CONV.  In the distributed version, 
NCONV is set to 4, while CONV is set to CONV1=.002 or CONV2=.00002.  CONV=CONV1 
is used for initial locus ordering using stepwise locus ordering, simulated 
annealing, or branch and bound (ORDOPT=2,3,4), and for identification of 
influential hybrids.  CONV=CONV2 is used for re-evaluation of best locus orders 
for ORDOPT=2,3,4, as well as for evaluation of a list of locus orders 
(ORDOPT=1).  CONV=CONV2 is also used for estimation of model parameters.  More 
or less stringent convergence criteria can be set by modifying the data 
statement in line 50 of RHMAXL1.FOR.
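
The stopping rule described in this note can be sketched as follows.  This is 
an illustration, not the code in RHMAXL1.FOR:  the function name converged is 
hypothetical, and the sketch assumes the change is measured as the absolute 
difference between successive log10-likelihoods.

```python
def converged(loglik_history, conv=0.00002, nconv=4):
    """True once the last nconv changes in log10-likelihood are all <= conv."""
    if len(loglik_history) < nconv + 1:
        return False                       # not enough iterations yet
    recent = loglik_history[-(nconv + 1):]
    return all(abs(b - a) <= conv for a, b in zip(recent, recent[1:]))


# The same history passes the looser CONV1 test but not the stricter CONV2 test.
history = [-5.0, -4.999, -4.998, -4.997, -4.996]
print(converged(history, conv=0.002), converged(history, conv=0.00002))
```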

2. Locus names for all NLOCUS loci in the problem, each left-justified in a 4 
column field (20A4).  Locus names can include any characters.  If there are more 
than 20 loci, locus names should be entered on multiple lines, 20 names per 
line.
Columns  1- 4  LNAME(1):  name of the first locus
Columns  5- 8  LNAME(2):  name of the second locus, etc.


One of the following four sets of additional information is required, depending 
on which of the four ordering options was selected:

List of Locus Orders (ORDOPT=1) (see Problem 1 above):

3. Number of locus orders right-justified in a 4 column field (I4).

4. Format statement for reading the locus names in order.  For example, (14A4) 
for a problem involving 14 loci.

5. For each locus order, locus names for that order, each name left-justified as 
required by the format described in 4. above.

Stepwise Locus Ordering (ORDOPT=2) (see Problem 2 above) or Branch and Bound 
(ORDOPT=4) (see Problem 3 above):

3. Stepwise locus ordering/branch and bound control record:

The following variables, the first right-justified in a 4 column field, the next 
three anywhere in 8 column fields, and the last right-justified in a 4 column 
field (I4,3F8.5,I4).

Columns  1- 4  Stepwise locus ordering or branch and bound option BBOPT
               =0 machine-generated locus adding order
               =1 user-specified locus adding order
Columns  5-12  ADDMIN:  minimum log10-likelihood difference to add a
               locus.  ADDMIN=0.0 produces a comprehensive map of
               loci.  ADDMIN=c>0 produces a framework map of loci
               all ordered with at least 10**c odds.
Columns 13-20  SAVMAX:  maximum log10-likelihood difference to keep a
               partial locus order under consideration.  SAVMAX=k
               says to at any step delete a partial locus order with
               maximum likelihood 10**k times smaller than that of
               the current best partial (ORDOPT=2) or candidate
               (ORDOPT=4) locus order.
Columns 21-28  PRTMAX:  maximum log10-likelihood difference from the best
               locus order for printing a locus order.  Controls the number
               of locus orders to print.
Columns 29-32  NFORCE:  variable for restricting the locus orders considered.
               NFORCE=0 results in no restrictions on locus ordering.
               NFORCE = k (k = 3, 4, ..., NLOCUS) says to force the first k
               loci in LNAMEA (see below) into the locus order in the order
               given.  NFORCE=-1 says to force the first locus in LNAMEA to
               be an end locus in the map.  NFORCE = -k (k = 3, 4, ...,
               NLOCUS) says to force the first k loci in LNAMEA into the map
               in the order given AND to make the first locus in LNAMEA an
               end locus in the map.
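
The NFORCE conventions can be summarized in a short Python sketch.  The 
function names split_names and decode_nforce are hypothetical.  Values of 
NFORCE not described above (1, 2, and -2) are rejected here, which is an 
assumption of the sketch rather than documented program behavior.

```python
def split_names(line, width=4):
    """Split a record of left-justified 4 column locus names."""
    text = line.rstrip()
    return [text[i:i + width].strip() for i in range(0, len(text), width)]


def decode_nforce(nforce, lnamea):
    """Return (forced_order, end_locus) implied by NFORCE and LNAMEA."""
    if nforce == 0:
        return [], None                    # no restrictions
    if nforce == -1:
        return [], lnamea[0]               # first locus must be an end locus
    k = abs(nforce)
    if k < 3 or k > len(lnamea):
        raise ValueError("NFORCE value not covered by the description")
    end_locus = lnamea[0] if nforce < 0 else None
    return lnamea[:k], end_locus


# Problem 2 of the sample file: NFORCE=-3 with LNAMEA record "S1  SOD1S18".
forced, end_locus = decode_nforce(-3, split_names("S1  SOD1S18"))
print(forced, end_locus)
```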

Note:  We strongly recommend BBOPT=0:  machine-generated locus adding order when 
stepwise locus ordering or branch and bound is selected.  The greedy algorithm 
we employ seems to generate very efficient locus-adding orders.

Note:  While ADDMIN=3, for example, results in a framework map of loci ordered 
with 1000:1 relative likelihood, the resulting framework map need not be maximal 
in the sense of containing the largest possible number of loci.  Limited 
experience suggests that the approach often generates nearly-maximal framework 
maps.  Combining several runs with ADDMIN = 0, 2, and 3 seems a useful approach 
for generating framework maps.

4. Candidate locus order for branch and bound (ORDOPT=4).  If branch and bound 
is used (ORDOPT=4), this order comes next.  This order should be chosen as a 
guess of the true order of loci.  It can be obtained from a prior minimum breaks 
or two-point analysis.  Locus names for all NLOCUS loci in the problem, each 
left-justified in a 4 column field (20A4).  Locus names can include any 
characters.  If there are more than 20 loci, locus names should be entered on 
multiple lines, 20 names per line.
Columns  1- 4  LNAMEC(1):  name of the first locus
Columns  5- 8  LNAMEC(2):  name of the second locus, etc.

Note:  There is no need for a candidate locus order for stepwise locus ordering 
(ORDOPT=2).

It is very important that the candidate order be as nearly optimal as possible 
for efficient ordering by the branch and bound strategy.

5. If a user-specified order for adding loci was requested (BBOPT=1) or if the 
locus orders are to be restricted (NFORCE NE 0), specify a locus order next.  If 
BBOPT=1, locus names for all NLOCUS loci in the problem, each left-justified in 
a 4 column field (20A4).  If only restricting the possible locus orders 
(BBOPT=0, NFORCE NE 0), only the names of the forced loci need to be specified; 
they must be in the order desired.  Locus names can include any characters.  If 
there are more than 20 loci, locus names should be entered on multiple lines, 20 
names per line.
Columns  1- 4  LNAMEA(1):  name of the first locus
Columns  5- 8  LNAMEA(2):  name of the second locus, etc.

Simulated Annealing (ORDOPT=3) (see Problem 4 above):

3. Simulated annealing control record 1:  The following variables, the first 
five right-justified in 8 column fields, the last anywhere in an 8 column field 
(5I8,F8.5).

Columns  1- 8  Simulated annealing option SAOPT.
               =0 for random initial locus order
               =1 for user-specified initial locus order
Columns  9-16  NUMORD:  number of best locus orders to save
Columns 17-24  ISEED1:  first random number generator seed
Columns 25-32  ISEED2:  second random number generator seed
Columns 33-40  ISEED3:  third random number generator seed
Columns 41-48  PRTMAX:  maximum log10-likelihood difference from the
               best order for printing a locus order.  Controls the
               number of locus orders to print.

Among the NUMORD best locus orders encountered, those with maximum likelihood no 
more than 10**PRTMAX times smaller than that for the best encountered order will 
be printed.
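
As a sketch of this rule (the function name orders_to_print and the numbers 
below are illustrative only), the printed orders are those among the NUMORD 
best whose log10-likelihood lies within PRTMAX of the best one:

```python
def orders_to_print(log10_liks, numord, prtmax):
    """Keep the numord best values, then those within prtmax of the best."""
    best = sorted(log10_liks, reverse=True)[:numord]
    return [x for x in best if best[0] - x <= prtmax]


# Hypothetical maximum log10-likelihoods for four locus orders.
kept = orders_to_print([-121.38, -119.95, -124.00, -121.73], numord=100, prtmax=3.0)
print(kept)
```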

4. Simulated annealing control record 2:  The following variables, the first 3 
right-justified in 8 column fields, the last 2 anywhere in 8 column fields 
(3I8,2F8.5):

Columns  1- 8  NTEMP:  number of temperatures for simulated annealing
Columns  9-16  NBET:  number of moves to a better order required
               before an early temperature decrease
Columns 17-24  NMOVE:  maximum number of moves before a temperature
               decrease
Columns 25-32  TMAX:  initial (maximum) temperature
Columns 33-40  FACTOR:  factor by which temperature is decreased

Note 1:  Seeds for the random number generator should be between 1 and 32767.

Note 2:  TMAX and FACTOR both should have decimal points. For example, a
maximum temperature of 100 could be represented as 100. or 100.0, but not
simply as 100.

5. If a user-specified initial locus order was requested (SAOPT=1), that order 
comes next.  Locus names for all NLOCUS loci in the problem, each left-justified 
in a 4 column field (20A4).  Locus names can include any characters.  If there 
are more than 20 loci, locus names should be entered on multiple lines, 20 names 
per line.
Columns  1- 4  LNAMEA(1):  name of the first locus
Columns  5- 8  LNAMEA(2):  name of the second locus, etc.

Note:  We currently use an initial temperature of 1000.0, a temperature decrease 
factor of 0.90, require 10n successful moves for an early temperature decrease, 
and limit the number of moves before the next temperature decrease to 100n, 
where n is the number of loci in the problem. THESE ARE NOT OPTIMIZED VALUES!!  
For example, the initial temperature almost certainly should depend on the log-
likelihood for a candidate order. If you use simulated annealing, we encourage 
you to play around with the parameter settings, and to compare results from 
different settings.  We would be very interested to learn of the results of such 
investigations.
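
For experimenting with these settings, the values quoted in the note can be 
generated as below.  The function names are hypothetical, and the sketch 
assumes each temperature decrease multiplies the current temperature by 
FACTOR; the program's actual schedule may differ.

```python
def annealing_defaults(nlocus, tmax=1000.0, factor=0.90):
    """Settings quoted in the note, expressed in terms of the number of loci."""
    return {"NBET": 10 * nlocus, "NMOVE": 100 * nlocus,
            "TMAX": tmax, "FACTOR": factor}


def temperature_schedule(tmax, factor, ntemp):
    """Assumed geometric cooling: ntemp temperatures starting at tmax."""
    return [tmax * factor ** i for i in range(ntemp)]


settings = annealing_defaults(8)           # 8 loci, as in Problem 4
temps = temperature_schedule(settings["TMAX"], settings["FACTOR"], 100)
print(settings["NBET"], settings["NMOVE"], temps[:3])
```

With 8 loci this gives NBET=80 and NMOVE=800, the values used in Problem 4 of 
the sample file.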

Branch and Bound (ORDOPT=4) (see Problem 3 above):

Input requirements for branch and bound are exactly the same as those for 
stepwise locus ordering (see above).


RHMAXLIK:  OUTPUT

The output from RHMAXLIK is in the form of several tables.

For each data set (set of problems), the following three tables are provided:  
(1) locus numbers and names; (2) retention status characters; and (3) hybrid 
names, retention information, and numbers of times observed.

For each problem, the following tables are presented:  (1) description of the 
analysis undertaken; (2) best locus orders ranked by maximum likelihood; (3) if 
ORDOPT>1, possible positions for the various loci under the locus orders with 
maximum likelihoods no more than 10**PRTMAX or 1000 (whichever is smaller) times 
smaller than that of the best locus order; (4) parameter estimates (breakage 
probabilities, distances, and fragment retention probabilities), and predicted 
and observed locus retentions for the best locus order (OUTOPT=0) or orders 
(OUTOPT>0); and (5) influential hybrids and permutation of the RH data 
consistent with the most likely locus order (if INFOPT=1).

Descriptions and abbreviated examples of these tables follow:

The first table gives locus names and numbers for the current data set. These 
locus numbers are used in the tables that follow.

LOCUS NAMES FOR PROBLEM SET     1

LOCUS         LOCUS
NUMBER        NAME

   1           APP
   2           S1
   3           S4
   4           S8
   5           S11
   6           S12
   7           S16
   8           S18
   9           S46
  10           S47
  11           S48
  12           S52
  13           S111
  14           SOD1


The second table provides symbols for retention status.  These are the symbols 
for marker typed and retained, marker typed and lost, and marker not typed, 
respectively.

RETENTION STATUS CHARACTERS

    + = RETAINED
    - = NOT RETAINED
    ? = UNTYPED


The third table echoes the retention status data for this problem.  The data are 
permuted according to the output permutation.  Loci are labeled with the locus 
numbers specified in Table 1.

RADIATION HYBRID RETENTION STATUS DATA FOR PROBLEM SET     1
HYBRID  HYBRID   NUMBER     LOCUS NUMBER
 NAME   NUMBER  OBSERVED    1  2  3  4  5  6  7  8  9 10 11 12 13 14

  1        1        1       -  -  -  -  +  -  -  -  -  +  - -  -  +
  2        2        1       +  +  +  +  +  +  +  +  +  +  + +  +  +
  3        3        1       ?  -  +  ?  -  +  +  +  ?  ?  + ?  ?  ?
  4        4        1       -  -  +  -  +  -  -  -  +  -  - +  ?  -
  5        5        1       -  -  -  -  -  -  -  -  -  -  - -  -  -
  6        6        1       -  -  -  -  -  -  -  -  -  -  - -  -  -
  7        7        1       -  -  +  -  -  -  +  -  +  -  + ?  ?  ?
  8        8        1       +  +  +  +  +  +  +  +  +  +  + +  +  -
  .        .        .       .  .  .  .  .  .  .  .  .  .  . .  .  .
  .        .        .       .  .  .  .  .  .  .  .  .  .  . .  .  .
  .        .        .       .  .  .  .  .  .  .  .  .  .  . .  .  .
 98       98        1       -  +  +  -  +  -  +  +  +  -  + +  -  -
 99       99        1       ?  +  +  +  +  +  +  -  +  +  + +  +  +

TOTAL NUMBER OF HYBRIDS:    99

The first three tables are printed once per data set.  The remaining tables are 
printed for each problem.  Complete results for all four analyses can be found 
in the file RHMAXLIK.OUT.  Here, only the output for problem 3 has been printed, 
and it has been compressed.

Output for each problem begins with an annotated echoing of the input data.

PROBLEM NUMBER                         3

NUMBER OF LOCI:                        8
RETENTION MODEL:              CENTROMERE
ORDERING OPTION:                  BRANCH
USE INCOMPLETE HYBRIDS:              YES
INITIAL GUESS OPTION:                  0
IDENTIFY INFLUENTIAL HYBRIDS:        YES

GENETIC LOCI:   S11  S1   S18  S8   APP  S12  S47  SOD1

BRANCH AND BOUND ORDERING OPTIONS

MAXIMUM LOG10-L DIFFERENCE TO SAVE ORDER:      3.00000
MAXIMUM LOG10-L DIFFERENCE TO PRINT ORDER:     3.00000
MINIMUM LOG10-L SUPPORT TO ADD LOCUS:           .00000

CANDIDATE LOCUS ORDER:   S11  S1   S18  S8   APP  S12  S47  SOD1

ORDER FOR ADDING LOCI:   S11  S1   S18  SOD1 S12  S47  S8   APP


Next, the list of best maximum likelihood locus orders is printed.  If the 
maximum number of locus orders is not exceeded, locus orders are sorted by 
maximum likelihood; otherwise, they are given in the order in which they were 
encountered.

In this example, there are three locus orders that have maximum likelihood no 
more than 1000 times smaller than the best locus order.

MOST LIKELY LOCUS ORDERS

       LOG10
        LIKE     LIKE
RANK    DIFF     RATIO BRKS   LOCUS ORDER

  1    .0000       1.0   75   S11  S1   S18  S8   APP  S12  S47  SOD1
                               1    2    3    4    5    6    7    8

  2   1.4315      27.0   77   S11  S1   S18  APP  S8   S12  S47  SOD1
                               1    2    3    5    4    6    7    8

  3   1.7785      60.0   75   SOD1 S47  S12  APP  S8   S18  S1   S11
                               8    7    6    5    4    3    2    1


LOG-LIKELIHOOD FOR THE MAXIMUM LIKELIHOOD LOCUS ORDER:
LOG(10)    -119.9471446
LOG(E)     -276.1885071


LOG10 LIKE DIFF is the difference of maximum log10-likelihoods for the maximum 
likelihood locus order and the current locus order.

LIKE RATIO is the ratio of maximum likelihoods for the maximum likelihood locus 
order and the current locus order.

BRKS reports the number of obligate chromosome breaks for the locus order.

Note that for the centromeric model (or the left-endpoint model), opposite 
orientations along the chromosome are listed (for example, orders ranked 1 and 3 
above).
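
Since LOG10 LIKE DIFF is a difference of log10-likelihoods, LIKE RATIO is 10 
raised to that difference, and the LOG(E) value is the LOG(10) value multiplied 
by ln(10).  A short Python check against the table above (function names are 
hypothetical):

```python
import math


def like_ratio(log10_diff):
    """Likelihood ratio corresponding to a log10-likelihood difference."""
    return 10.0 ** log10_diff


def log10_to_ln(log10_value):
    """Convert a log10-likelihood to a natural-log likelihood."""
    return log10_value * math.log(10.0)


print(like_ratio(1.4315), like_ratio(1.7785), log10_to_ln(-119.9471446))
```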

The next table provides a list of possible positions for the different loci 
among the competing orders.  In this accounting, opposite orientations along the 
chromosome are ignored.  It is often possible based on this table to suggest a 
possible framework map.  In this case, S11-S1-S18-S8-S12-S47-SOD1 and S11-S1-
S18-APP-S12-S47-SOD1 are suggested as possible framework maps.

POSSIBLE LOCUS POSITIONS AMONG ORDERS WITH MAXIMUM LIKELIHOODS NO MORE THAN    
1000.000 TIMES SMALLER THAN THAT OF THE MOST LIKELY LOCUS ORDER

LOCUS NUMBER   LOCUS NAME   POSSIBLE LOCUS POSITIONS

      1            S11        1
      2            S1         2
      3            S18        3
      4            S8         4  5
      5            APP        4  5
      6            S12        6
      7            S47        7
      8            SOD1       8


The next table prints parameter estimates for the best locus order (OUTOPT=0) or 
each of the best locus orders (OUTOPT>0).  Breakage probability estimates (BRK) 
and distance estimates (DIST) are printed for adjacent loci.  RETOBS and RETEST 
are the observed and estimated locus-specific retention probabilities; the 
observed values are sample proportions for the observed retention data.  RETPAR 
are the fragment retention probability estimates.  TOTAL MAP LENGTH is the sum 
of the distance estimates.  All these estimates are calculated based on the
retention and breakage probability estimates.
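
The printed DIST values in the example below are consistent with the usual RH 
mapping relation DIST = -ln(1 - BRK).  The following Python check uses the 
rounded BRK values printed for the rank 1 order, so agreement is only to within 
rounding error; it is a sketch, not the program's own calculation.

```python
import math

# BRK and DIST as printed for the rank 1 locus order in the example output.
brk = [0.163, 0.440, 0.309, 0.111, 0.229, 0.319, 0.237]
printed_dist = [0.178, 0.580, 0.370, 0.118, 0.260, 0.384, 0.271]

dist = [-math.log(1.0 - b) for b in brk]   # distance from breakage probability
total = sum(dist)                          # compare with TOTAL MAP LENGTH

for b, d, p in zip(brk, dist, printed_dist):
    print(f"{b:.3f} {d:.3f} {p:.3f}")
print(f"total {total:.3f}")
```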

For the equal retention probability model, a single retention probability 
estimate is printed.  For the centromeric model, the end-locus retention 
probability estimate is printed first followed by the other retention 
probability estimate.  For the left-endpoint model, the retention probability 
estimate for each locus is printed under that locus.  For the general model, the 
estimate of the retention probability r(i,j) is printed in the ith row and jth 
column among the retention probability estimates.

PARAMETER ESTIMATES FOR THE MOST LIKELY LOCUS ORDERS

RANK       LOCUS ORDER

   1     S11   S1    S18   S8    APP   S12   S47   SOD1
BRK        0.163 0.440 0.309 0.111 0.229 0.319 0.237
DIST       0.178 0.580 0.370 0.118 0.260 0.384 0.271
RETOBS  0.564 0.473 0.368 0.408 0.338 0.362 0.424 0.406
RETEST  0.559 0.525 0.448 0.417 0.410 0.396 0.381 0.374
RETPAR  0.559 0.350
TOTAL MAP LENGTH:   2.161

   2     S11   S1    S18   APP   S8    S12   S47   SOD1
BRK        0.163 0.440 0.306 0.126 0.270 0.319 0.237
DIST       0.178 0.580 0.365 0.135 0.315 0.384 0.271
RETOBS  0.564 0.473 0.368 0.338 0.408 0.362 0.424 0.406
RETEST  0.559 0.525 0.448 0.418 0.409 0.393 0.379 0.372
RETPAR  0.559 0.350
TOTAL MAP LENGTH:   2.228

   3     SOD1  S47   S12   APP   S8    S18   S1    S11
BRK        0.227 0.304 0.212 0.106 0.298 0.426 0.165
DIST       0.257 0.362 0.238 0.112 0.354 0.556 0.180
RETOBS  0.406 0.424 0.362 0.338 0.408 0.368 0.473 0.564
RETEST  0.436 0.444 0.453 0.457 0.458 0.462 0.466 0.467
RETPAR  0.436 0.472
TOTAL MAP LENGTH:   2.060

The influential hybrid table lists hybrids that result in substantially 
different log-likelihoods under the best and nearly best locus orders, and shows 
how those log-likelihoods differ.  Differences to be printed are controlled by 
the value of the variable HYBMIN (see above).

For locus order 2, only hybrid 69 gives a substantially different log-likelihood 
than under the best locus order.  Hybrid 69 is influential in the sense of 
having substantial impact on the relative ranking of orders 1 and 2.  Indeed, 
hybrid 69 has likelihood 10**1.54 = 35 times larger under the maximum likelihood 
locus order than under the order ranked second.  Re-scoring of hybrid 69 might 
be justified.

INFLUENTIAL HYBRIDS FOR THE MOST LIKELY ORDERS.  MIN. DIFFERENCE:  1.00000

RANK    HYBRID CLASS AND LOG10-LIKELIHOOD DIFFERENCES (OTHER-BEST)


   2      69
       -1.53

   3    NO INFLUENTIAL HYBRIDS IDENTIFIED.


The permuted retention data can be examined for unusual retention patterns. Such 
patterns might include (1) a large number of obligate chromosome breaks, (2) a 
locus absent symbol interrupting a string of locus present symbols (possible 
false negative), or (3) a locus present symbol interrupting a string of locus 
absent symbols (possible false positive).

A new feature of the program is the printing of (1) the expected number of 
obligate breaks for each hybrid (E), and (2) the tail probability specifying the 
probability that the number of obligate breaks should be greater than or equal 
to the number actually observed (O).  These values are calculated based on the 
maximum likelihood parameter estimates of the breakage and retention 
probabilities and the missing data pattern for the hybrid.  The method for 
determining these quantities will be described in a forthcoming paper (Lunetta 
et al., in preparation).  Large numbers of observed breaks in comparison to the 
expected number, and/or small tail probabilities, suggest the possibility of 
mistyping.

RETENTION DATA PERMUTED IN THE BEST LOCUS ORDER FOR PROBLEM    3

HYBRID HYBRID  NUMBER   OBLIGATE BREAKS   LOCUS ORDER
NUMBER  NAME  OBSERVED  O   E    TAILPR   1  2  3  4  5  6  7  8

   1     1        1     2  0.9   0.2050    +  -  -  -  -  -  +  +
   2     2        1     0  0.9   1.0000    +  +  +  +  +  +  +  +
   3     3        1     1  0.6   0.4780    -  -  +  ?  ?  +  ?  ?
   4     4        1     1  0.9   0.6226    +  -  -  -  -  -  -  -
   5     5        1     0  0.9   1.0000    -  -  -  -  -  -  -  -
   6     6        1     0  0.9   1.0000    -  -  -  -  -  -  -  -
   7     7        1     0  0.8   1.0000    -  -  -  -  -  -  -  ?
   8     8        1     1  0.9   0.6226    +  +  +  +  +  +  +  -
   .     .        .        .        .      .  .  .  .  .  .  .  .
   .     .        .        .        .      .  .  .  .  .  .  .  .
   .     .        .        .        .      .  .  .  .  .  .  .  .
  98    98        1     1  0.9   0.6226    +  +  +  -  -  -  -  -
  99    99        1     2  0.9   0.2001    +  +  -  +  ?  +  +  +
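
The observed break counts (O) in this table can be reproduced by counting 
changes in retention status between consecutive typed loci, skipping untyped 
loci.  The Python sketch below illustrates that counting rule for the rows 
shown; the function name obligate_breaks is hypothetical, and the expected 
counts and tail probabilities are not reproduced here.

```python
def obligate_breaks(statuses, present="+", absent="-"):
    """Count status changes between consecutive typed loci."""
    typed = [s for s in statuses if s in (present, absent)]   # drop untyped loci
    return sum(1 for a, b in zip(typed, typed[1:]) if a != b)


# Rows for hybrids 1, 3, and 99 of the permuted retention data above.
rows = {"1": "+-----++", "3": "--+??+??", "99": "++-+?+++"}
for hybrid, pattern in rows.items():
    print(hybrid, obligate_breaks(pattern))
```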


Also new to version 2 is the printing of the distribution of the expected and 
observed numbers of hybrids with 0, 1, 2, ... obligate breaks.  The expected 
numbers are calculated both allowing for the observed degree of partial typing:  
EXP (PARTIAL); or ignoring the partial typing and assuming all hybrids are fully 
typed:  EXP (COMPLETE).


DISTRIBUTION OF THE NUMBER OF OBLIGATE CHROMOSOME BREAKS

BREAKS               0     1     2     3     4
OBS                 51    28    14     5     1
EXP (PARTIAL)    42.51 39.75 13.51  2.90  0.31  0.02
EXP (COMPLETE)   37.36 41.34 15.88  3.91  0.47  0.04


INPUT DIFFERENCES IN THE PROGRAMS

To simplify data entry, we have attempted to make the input requirements for the 
programs as similar as possible.  Our goal is to facilitate the analysis of an 
RH mapping data set with all three programs (see below). Here we provide a list 
of the principal differences between the data set requirements for the three 
programs as an aid in moving from an analysis under one program to an analysis 
under the next.

1. The largest single difference in input for the programs is that RH2PT allows 
only a single analysis of only a single data set.  In contrast, RHMINBRK and 
RHMAXLIK allow multiple analyses of multiple data sets. Because of this, RH2PT 
includes no input data after the retention information on the hybrids, whereas 
RHMINBRK and RHMAXLIK both require such information.

2. Line 1:  Problem definition.
RH2PT:  Numbers of loci and hybrids, table option.
RHMINBRK:  Numbers of problems, loci, hybrids, and screen output option.
RHMAXLIK:  Same as RHMINBRK.

3. Output permutation vector for RH2PT.  Just after the retention symbols and 
just before the retention data, RH2PT requires a permutation for the loci for 
purposes of output.  This should not be present for the other two programs.

4. Problem Control Record.  This record, input for each problem after the 
retention data, includes different variables for RHMINBRK and RHMAXLIK.

5. Depending on which ordering option is chosen, there may be an additional 
difference for RHMINBRK and RHMAXLIK.  For a list of locus orders (ORDOPT=1), 
there are no further differences.  For stepwise locus ordering (ORDOPT=2) or 
branch and bound (ORDOPT=4), the stepwise locus ordering/branch and bound 
control record is different for the two programs.  For simulated annealing 
(ORDOPT=3), the simulated annealing control record 1 is different for the two 
programs.


CHECKING FOR DATA ERRORS AND INFLUENTIAL HYBRIDS IN THE MULTIPOINT ANALYSES

Several situations suggest that individual RHs merit special attention. (1) A 
hybrid requires a large number of obligate chromosome breaks under the best 
locus order.  (2) A hybrid scored as ------+------ or +++++-++++++ may correctly 
indicate two closely positioned breaks; alternatively, the discordant marker may 
have been mistyped.  (3) A hybrid is identified as influential in the analysis 
in the sense that it requires a different number of obligate breaks under the 
best and other nearly best locus orders (minimum breaks), or that its likelihood 
is substantially different under the best and other nearly best locus orders 
(maximum likelihood). Under each of these circumstances, we recommend the 
hybrids be re-scored or perhaps even re-typed.  In this process, we recommend 
separation of the tasks of identification of hybrids for reconsideration, and 
actual reconsideration of the hybrids.  If a hybrid will simply be re-scored, we
recommend that all markers be re-scored, with no special attention drawn to
particular markers.


OUTLINE FOR THE ANALYSIS OF RH MAPPING DATA

Proper analysis of RH mapping data involves several steps.  RHMAP provides a set 
of programs to carry out many of these steps.  We advocate the following 
approach to the analysis of RH mapping data:  (1) careful marker scoring, data 
entry, and data checking; (2) calculation of descriptive statistics and two-
point analysis by RH2PT; (3) nonparametric multipoint RH mapping by RHMINBRK; 
and (4) maximum likelihood RH mapping by RHMAXLIK. These steps are next 
considered in greater detail.

Scoring of markers in RH mapping studies is not trivial, particularly if PCR is 
used.  In cases of ambiguity, such as weak positives, consideration should be 
given to repeating the experiment or simply calling the data missing. If only a 
small portion of the data are ambiguous, treating those data as missing generally 
will not result in substantial information loss.

Data entry requires a file editor.  Double entry, in which the data are entered 
into two different files with the resulting files compared, should be used; 
double entry requires only a little extra time, and should result in detection 
of nearly all data entry errors.  After data are entered, careful re-checking to 
eliminate errors is strongly recommended, even if double entry was used.
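
The comparison step of double entry is easy to automate.  The following Python 
sketch is not part of RHMAP, and the function name compare_entries is 
hypothetical; it reports the line and column of every character that differs 
between two independently entered copies, ignoring trailing blanks.

```python
from itertools import zip_longest


def compare_entries(lines_a, lines_b):
    """Return (line, column) positions, 1-based, where two entries differ."""
    diffs = []
    pairs = zip_longest(lines_a, lines_b, fillvalue="")
    for lineno, (a, b) in enumerate(pairs, start=1):
        a, b = a.rstrip("\n"), b.rstrip("\n")
        # Pad the shorter line with blanks so trailing blanks do not count.
        for col, (ca, cb) in enumerate(zip_longest(a, b, fillvalue=" "), start=1):
            if ca != cb:
                diffs.append((lineno, col))
    return diffs


first = [" 1 - - +", " 2 + + +"]
second = [" 1 - - +", " 2 + - +"]
print(compare_entries(first, second))
```

In practice the two lists would be read from the two data files, for example 
with open(...).readlines().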

Once data have been entered, RH2PT can be used to determine linkage groups, 
estimate locus retention probabilities, and estimate distances between the 
various markers.  Subsequent multipoint analyses (see below) are best undertaken 
on those (sets of) loci that appear to be linked based on the two-point 
analyses; inclusion of unlinked markers complicates the analysis and 
interpretation of the RH mapping data.  Retention probability estimates for the 
different loci will suggest whether the equal fragment retention probability 
model should be appropriate for the maximum likelihood analysis, or whether 
alternative fragment retention models should be considered.  Breakage 
probability and distance estimates for the marker pairs will suggest locus 
orders for the markers; these orders can be compared to those inferred in the 
more complex multipoint analyses. Substantial discrepancies suggest checking the 
analyses undertaken. Breakage probability estimates for the two-point and 
multipoint maximum likelihood analyses generally should be similar, particularly 
for loci that are close together.  The estimates from these analyses should be 
compared for consistency.  Substantial discrepancies again suggest checking the
analyses undertaken.
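
As a rough check on the retention estimates, the observed fraction of typed 
hybrids that retain each locus can be tabulated directly.  The sketch below is 
not the RH2PT estimator; it simply computes observed proportions, and it 
assumes a hypothetical coding of "1" for present, "0" for absent, and "?" for 
missing.

```python
def retention_fractions(hybrids):
    """Fraction of typed hybrids retaining each locus (None if never typed)."""
    n_loci = len(hybrids[0])
    fractions = []
    for j in range(n_loci):
        # Only scores of "0" or "1" count as typed; anything else is missing.
        typed = [h[j] for h in hybrids if h[j] in "01"]
        fractions.append(typed.count("1") / len(typed) if typed else None)
    return fractions


# Hypothetical data: one string per hybrid, one character per locus.
hybrids = ["110?", "0100", "1110", "0?01"]
for locus, fraction in enumerate(retention_fractions(hybrids), start=1):
    print(f"locus {locus}: observed retention fraction = {fraction}")
```

Fractions that differ widely among loci are one informal indication that the 
equal fragment retention probability model may not be adequate.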

Multipoint analysis under the minimum breaks criterion provides a useful adjunct 
to subsequent maximum likelihood analysis.  Since the minimum breaks approach is 
nonparametric, violations of assumptions such as independent fragment retention 
or specific fragment retention models are of no concern.  Locus orders obtained 
under the minimum breaks criterion can be used as a list of input orders for 
subsequent maximum likelihood analysis; alternatively, the lists of best orders 
under the two methods can be compared and discrepancies noted.
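
To make the criterion concrete, the sketch below counts obligate breaks for a 
given locus order:  for each hybrid, one break is counted each time the score 
switches between present and absent along the order, with untyped loci 
skipped.  This is an illustrative sketch, not the RHMINBRK code; the coding 
("1" present, "0" absent, "?" missing) and the function names are assumptions.

```python
def obligate_breaks(hybrid, order):
    """Count present/absent switches along `order`, skipping untyped loci."""
    breaks = 0
    previous = None
    for locus in order:
        score = hybrid[locus]
        if score not in "01":
            continue  # a missing score gives no evidence of a break
        if previous is not None and score != previous:
            breaks += 1
        previous = score
    return breaks


def total_obligate_breaks(hybrids, order):
    """Sum the obligate breaks over all hybrids for one locus order."""
    return sum(obligate_breaks(h, order) for h in hybrids)


# Hypothetical data: one string per hybrid; loci are indexed from 0.
hybrids = ["1100", "0110", "1?01", "0011"]
for order in [(0, 1, 2, 3), (0, 2, 1, 3)]:
    print(order, total_obligate_breaks(hybrids, order))
```

For these hypothetical data the first order requires 6 obligate breaks and the 
second requires 10, so the first order would be preferred under the minimum 
breaks criterion.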

When carrying out the minimum breaks analysis, we advocate trying first to use 
the branch and bound approach with SAVMAX and PRTMAX set to a relatively small 
positive number, say 5.  If this approach succeeds, all locus orders requiring 
no more than 5 obligate breaks more than the best locus order are guaranteed to 
be found.  If 
this approach proves too time consuming, switching to stepwise locus ordering is 
suggested, with SAVMAX set substantially larger than PRTMAX, say SAVMAX=15 and 
PRTMAX=5.  If this succeeds, SAVMAX can be increased and any changes in the 
results noted.  In this way, reasonably strong assurance of identifying the best 
locus orders can be obtained. Simulated annealing will probably be most useful 
when the number of loci n is so large that even stepwise locus ordering is too 
time consuming.  If simulated annealing is used, we recommend repeating it with 
several different initial locus orders (note that this requires different seeds 
for the random number generator if random initial orders are used), with the 
results compared.
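
The point about seeds can be illustrated outside RHMAP.  In the sketch below, 
which uses Python's own random number generator rather than the one in the 
RHMAP programs, each seed determines one random initial order, so repeated runs 
only differ if the seeds differ.  The function name and locus labels are 
hypothetical.

```python
import random


def random_initial_order(loci, seed):
    """Return a random permutation of `loci` determined entirely by `seed`."""
    rng = random.Random(seed)  # private generator; global state is untouched
    order = list(loci)
    rng.shuffle(order)
    return order


loci = ["L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8"]
for seed in (101, 202, 303):
    print(seed, random_initial_order(loci, seed))
```

Reusing a seed reproduces the same initial order, which is useful for 
repeating an analysis but provides no additional check on the results.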

Multipoint maximum likelihood is the most complicated RH mapping strategy 
because of its substantial computational burden, and because of the availability 
of several different retention probability models to consider. If the number of 
loci n is quite modest (say n < 12), branch and bound may be feasible.  More 
typically, stepwise locus ordering will be the method of choice.  Usually, we 
first try stepwise locus ordering with PRTMAX=3 and SAVMAX=8 to see whether the 
analysis can be carried out fairly quickly.  As for the minimum breaks analysis, 
SAVMAX can be increased and any changes in the results noted.  Again, simulated 
annealing can be attempted if stepwise locus ordering takes too long.  However, 
for maximum likelihood, our experience suggests that under such circumstances 
simulated annealing also can require many hours of computation (Boehnke et al. 
1991).
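
The computational burden is easier to appreciate from the number of possible 
locus orders.  The sketch below assumes that an order and its reverse are 
treated as equivalent, so that n loci have n!/2 distinct orders; branch and 
bound avoids evaluating most of these, but the count indicates how quickly the 
search space grows.

```python
import math


def distinct_locus_orders(n):
    """Number of orders of n loci, counting an order and its reverse once."""
    return math.factorial(n) // 2 if n > 1 else 1


for n in (5, 10, 12, 20):
    print(f"n = {n:2d}: {distinct_locus_orders(n)} distinct locus orders")
```

For n = 12 there are 239,500,800 distinct orders, and each additional locus 
multiplies the count by the new number of loci.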

Estimates of locus-specific retention probabilities from RH2PT can suggest 
whether the equal fragment retention probability model is reasonable.  If locus-
specific retention probabilities appear variable, combining these estimates with 
the best minimum breaks locus order(s) can further suggest whether there might 
be a gradient of locus retention probabilities as one travels along the  
chromosome.  Such gradients have been noted for several chromosomes, including 
chromosomes 21 (Cox et al. 1990; Burmeister et al. 1991), 16 (Ceccherini et al. 
1992), and X (Gorski et al. 1992).  We generally carry out our initial maximum 
likelihood analysis under the equal fragment retention model, and then repeat 
the analysis under the centromeric (telomeric) or the left-endpoint model if the 
retention data suggest the equal retention model might not fit the data well.  
Conditional on locus order, a likelihood ratio test can be used to test whether 
the more complicated retention models provide a more satisfactory fit to the 
data (see above).
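
As an illustration of the arithmetic, the sketch below computes a likelihood 
ratio statistic from two maximized log likelihoods for the same locus order and 
converts it to an approximate p-value.  It assumes the usual large-sample 
chi-square approximation and, for simplicity, a difference of one parameter 
between the models; the log likelihood values are hypothetical, and the 
base10 option is included only in case the likelihoods at hand are reported 
as base 10 logarithms.

```python
import math

LN10 = math.log(10.0)


def lrt_statistic(loglik_null, loglik_alt, base10=False):
    """Twice the gain in natural log likelihood of the larger model."""
    diff = loglik_alt - loglik_null
    if base10:
        diff *= LN10  # convert a log10 likelihood difference to natural logs
    return 2.0 * diff


def chi2_sf_df1(x):
    """Upper tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


# Hypothetical maximized natural log likelihoods for one locus order.
loglik_equal = -100.0   # simpler retention model
loglik_larger = -97.0   # retention model with one additional parameter

stat = lrt_statistic(loglik_equal, loglik_larger)
print(f"LRT statistic = {stat:.2f}, approximate p = {chi2_sf_df1(stat):.4f}")
```

When the models differ by more than one parameter, the statistic is computed 
in the same way but must be referred to a chi-square distribution with the 
corresponding degrees of freedom.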

Identification of possible data errors and influential hybrids is an important 
aspect of both multipoint methods (see above), and will often lead to data re-
checking or even re-typing of RH data, and subsequent re-analysis.


DEFAULT ARRAY DIMENSIONS

The maximum array dimensions for the programs are initially set according to the 
values of the following variables:


Variable Description                                         Initial Value

RH2PT.FOR:
MAXHYB   maximum number of RHs in a data set                           200
MAXLOC   maximum number of loci in a data set                           60
MXPAIR   maximum number of locus pairs in a data set                  1770
         MXPAIR=MAXLOC*(MAXLOC-1)/2

RHMINBRK.FOR:
MAXHYB   maximum number of RHs in a data set                           200
MAXLOC   maximum number of loci in a data set                           60
MAXORD   maximum number of locus orders to process                    1000

RHMAXLIK.FOR:
MAXHYB   maximum number of RHs in a data set                           100
MAXLOC   maximum number of loci in a data set                           32
MAXORD   maximum number of locus orders to process                    1000
MAXPAR   maximum number of model parameters                             64

Note that with MAXPAR=64, the Markovian models can be supported for up to 32 
markers, while the general model can be supported for up to 10 markers.

To modify these dimensions, modify the PARAMETER statement on lines 23-24 of 
RH2PT, line 20 of RHMINBRK, or lines 24-25 of RHMAXL1.  This may be accomplished 
by using a file editor.  Then recompile the program as described below.
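
When MAXLOC is changed in RH2PT, MXPAIR must be changed to match the relation 
MXPAIR=MAXLOC*(MAXLOC-1)/2 given above.  The sketch below computes the required 
value and checks whether a data set fits within a pair of dimensions; it is a 
convenience calculation only, and the function names are hypothetical and are 
not part of RHMAP.

```python
def mxpair(maxloc):
    """Number of locus pairs for maxloc loci: MAXLOC*(MAXLOC-1)/2."""
    return maxloc * (maxloc - 1) // 2


def fits(n_hybrids, n_loci, maxhyb, maxloc):
    """True if a data set fits within the given array dimensions."""
    return n_hybrids <= maxhyb and n_loci <= maxloc


print(mxpair(60))              # 1770, the initial value listed for RH2PT
print(mxpair(100))             # value needed if MAXLOC were raised to 100
print(fits(150, 40, 100, 32))  # False: exceeds the RHMAXLIK initial values
```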


ERROR CONDITIONS AND USER SUPPORT

When one of the RHMAP programs stops without completing the desired analyses, 
error messages may be found either on the screen, in the output file, or both.  
ALWAYS CHECK THE OUTPUT FILE FOR ERROR MESSAGES IF THE PROGRAM STOPS 
UNEXPECTEDLY!  After correcting the error(s) noted, the program can be re-run.

While we have tried to carefully document RHMAP and to make it reasonably easy 
to use, problems will undoubtedly arise.  Before calling us, please read the 
documentation carefully.  It is our experience with other programs that many of 
the questions we are asked are answered in the documentation. If after reading 
the documentation you are still having problems, help on an as-available basis 
can be obtained by phone, letter, fax, or e-mail from 

Michael Boehnke, Ph.D.
Department of Biostatistics
School of Public Health
1420 Washington Heights
University of Michigan
Ann Arbor, Michigan  USA 48109-2029
Phone:  (734) 936-1001
FAX:    (734) 763-2215
E-Mail: boehnke@umich.edu

If you think you have found a bug in one of the programs, or an error in the 
documentation, we also would like very much to know about it.

Although RHMAP is distributed free of charge, please do not pass on a copy of 
the programs to others.  Instead, please ask anyone wishing to use the programs 
to obtain them directly from us.  This procedure allows us to keep accurate 
records of software users, and to send out updates to everyone as improvements 
are made and errors are corrected.


FUTURE PLANS

Since this is the first distributed release of the polyploid version of RHMAP, 
the programs remain in a somewhat fluid state of development. Obviously, errors 
may be found; we will do our best to correct any errors discovered as quickly as 
possible, and to inform all users immediately. 

Future additions to the programs that we may undertake, depending on time, grant 
support, and user interest, include:  (1) modeling and parameter estimation for 
models that include typing error; (2) allowance for hybrid construction based on 
a selectable marker; and (3) allowing for more than one set of RHs for a given 
mapping problem; this would permit combination of RH mapping data from two or 
more sources, allowing for differences in retention and breakage probabilities.  
Suggestions from users interested in these or other extensions of the software 
would be most welcome.


ACKNOWLEDGEMENTS

We thank David Cox for many helpful discussions during the development of this 
software and for allowing us to use his 21q data set for the examples presented 
here.  Thanks to Tempie Shearon for testing version 2.0 and providing comments 
and suggested improvements.  Support for this work was provided by NIH grant 
HG00209 to MB.


REFERENCES

Barker D, Green P, Knowlton R, Schumm J, Lander E, Oliphant A, Willard H, et al. 
(1987) Genetic linkage map of human chromosome 7 with 63 DNA markers. Proc Natl 
Acad Sci USA 84:8006-8010

Barrett JH (1992) Genetic mapping based on radiation hybrid data. Genomics 
13:95-104

Bishop DT, Crockford GP (1992) Comparisons of radiation hybrid mapping and 
linkage mapping.  Genetic Analysis Workshop 7:  Issues in Gene Mapping and 
Detection of Major Genes.  MacCluer JW, Chakravarti A, Cox D, Bishop DT, Bale 
SJ, Skolnick MH (eds.).  Cytogenet Cell Genet 59:93-95

Boehnke M (1992) Radiation hybrid mapping by minimization of the number of 
obligate chromosome breaks.  Genetic Analysis Workshop 7:  Issues in Gene 
Mapping and Detection of Major Genes.  MacCluer JW, Chakravarti A, Cox D, Bishop 
DT, Bale SJ, Skolnick MH (eds.).  Cytogenet Cell Genet 59:96-98

Boehnke M, Lange K, Cox DR (1991) Statistical methods for multipoint radiation 
hybrid mapping.  Am J Hum Genet 49:1174-1188

Burmeister M, Kim S, Price ER, de Lange T, Tantravahi U, Myers RM, Cox DR (1991) 
A map of the distal region of the long arm of human chromosome 21 constructed by 
radiation hybrid mapping and pulsed-field gel electrophoresis. Genomics 9:19-30

Ceccherini I, Romeo G, Lawrence S, Breuning MH, Harris PC, Himmelbauer H, 
Frischauf AM, Sutherland GR, Germino GG, Reeders ST, Morton NE (1992) 
Construction of a map of chromosome 16 by using radiation hybrids.  Proc Natl 
Acad Sci USA 89:104-108

Chakravarti A, Reefer JE (1992) A theory for radiation hybrid (Goss-Harris) 
mapping:  application to proximal 21q markers.  Genetic Analysis Workshop 7:  
Issues in Gene Mapping and Detection of Major Genes.  MacCluer JW, Chakravarti 
A, Cox D, Bishop DT, Bale SJ, Skolnick MH (eds.).  Cytogenet Cell Genet 
59:99-101

Cox DR, Burmeister M, Price ER, Kim S, Myers RM (1990) Radiation hybrid mapping:  
a somatic cell genetic method for constructing high-resolution maps of mammalian 
chromosomes.  Science 250:245-250

Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data 
via the EM algorithm.  J Roy Statist Soc B 39:1-22

Walter MA, Spillett DJ, Thomas P, Weissenbach J, Goodfellow PN (1994) A method 
for constructing radiation hybrid maps of whole genomes.  Nat Genet 7:22-28

Gorski JL, Boehnke M, Reyner EL, Burright EN (1992) A radiation hybrid map of 
the proximal short arm of the human X chromosome spanning Incontinentia Pigmenti 
1 (IP1) translocation breakpoints.  Genomics 14:657-665

Goss SJ, Harris H (1975) New method for mapping genes in human chromosomes. 
Nature 255:680-684

Goss SJ, Harris H (1977a) Gene transfer by means of cell fusion.  I. Statistical 
mapping of the human X-chromosome by analysis of radiation-induced gene 
segregation.  J Cell Sci 25:17-37

Goss SJ, Harris H (1977b) Gene transfer by means of cell fusion.  II. The 
mapping of 8 loci on human chromosome 1 by statistical analysis of gene 
assortment in somatic cell hybrids.  J Cell Sci 25:39-57

Haldane JBS (1919) The combination of linkage values, and the calculation of 
distance between the loci of linked factors.  J Genet 8:299-309

Karlin S, Taylor HM (1975) A first course in stochastic processes, 2nd ed. 
Academic Press, New York, pp. 45-80, 117-128

Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing.  
Science 220:671-680

Lange K, Boehnke M (1992) Bayesian methods and optimal experimental design for 
gene mapping by radiation hybrids.  Ann Hum Genet 56:119-144

Lawrence S, Morton N (1992) Physical mapping by multiple pairwise analysis. 
Genetic Analysis Workshop 7:  Issues in Gene Mapping and Detection of Major 
Genes.  MacCluer JW, Chakravarti A, Cox D, Bishop DT, Bale SJ, Skolnick MH 
(eds.).  Cytogenet Cell Genet 59:107-109

Nijenhuis A, Wilf HS (1978) Combinatorial algorithms, 2nd ed. Academic Press, 
New York, pp. 240-246

Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical recipes.  
The art of scientific computing (FORTRAN version).  Cambridge University Press, 
Cambridge, pp. 326-334

Thompson EA (1987) Crossover counts and likelihood in multipoint linkage 
analysis.  IMA J Math Appl Med Biol 4:93-108

Weeks DE, Lehner T, Ott J (1992) Preliminary ranking procedures for multilocus 
ordering based on radiation hybrid data.  Genetic Analysis Workshop 7:  Issues 
in Gene Mapping and Detection of Major Genes.  MacCluer JW, Chakravarti A, Cox 
D, Bishop DT, Bale SJ, Skolnick MH (eds.). Cytogenet Cell Genet 59:125-127