PEDSTATS Tutorial - Input Files

PEDSTATS provides graphical and text summaries of the information contained in any pair of pedigree and data files.
Pedigree (.ped) files describe relationships between individuals in your dataset and also store marker genotypes, 
disease status and quantitative trait values. Data (.dat) files provide a description of the contents of the 
associated pedigree file.

PEDSTATS supports input files in QTDT, LINKAGE or MENDEL format. Although the three formats are similar, the discussion below focuses on QTDT format.

Describing Relationships Between Individuals

Although pedigrees can become quite complex, all the information that is necessary to
reconstruct individual relationships in a pedigree file can be summarized in five items: 
a family identifier, an individual identifier, a link to each parent (if available) and 
finally an indicator of each individual's sex.

As an example of how family relationships are described, we will construct a pedigree
file for a small pedigree with two siblings, their parents and maternal grandparents.
For this simple pedigree, the five key items take the following values:
FAMILY     PERSON   FATHER   MOTHER   SEX
example    granpa   unknown  unknown    m
example    granny   unknown  unknown    f
example    father   unknown  unknown    m
example    mother   granpa   granny     f
example    sister   father   mother     f
example    brother  father   mother     m

These key values constitute the first five columns of any pedigree
file. Because of restrictions in early genetic programs, text identifiers
are usually replaced by unique numeric values. After replacing each
identifier with a unique integer and recoding sexes as 2 (female) and 1 (male),
this is what a basic space-delimited pedigree file would look like:
 <contents of basic.ped>
1   1   0  0  1
1   2   0  0  2
1   3   0  0  1
1   4   1  2  2
1   5   3  4  2
1   6   3  4  1
<end of basic.ped>

A pedigree file can include multiple families. Each family can
have a unique structure, independent of other families in the dataset.
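
The recoding step described above (text identifiers to integers, sexes to 1 and 2, unknown parents to 0) can be sketched in a few lines of Python. This is only an illustration: `recode_pedigree`, `SEX_CODES` and `EXAMPLE_ROWS` are hypothetical names, not part of PEDSTATS, and numbering individuals in order of appearance within each family is an assumption.

```python
# Illustrative sketch only; these names are hypothetical, not part of PEDSTATS.
SEX_CODES = {"m": "1", "f": "2"}

EXAMPLE_ROWS = [
    ("example", "granpa", "unknown", "unknown", "m"),
    ("example", "granny", "unknown", "unknown", "f"),
    ("example", "father", "unknown", "unknown", "m"),
    ("example", "mother", "granpa", "granny", "f"),
    ("example", "sister", "father", "mother", "f"),
    ("example", "brother", "father", "mother", "m"),
]


def recode_pedigree(rows, unknown="unknown"):
    """Turn (family, person, father, mother, sex) text rows into numeric lines.

    Families and individuals are numbered in order of appearance
    (individuals are numbered within their family); unknown parents become 0.
    """
    family_ids = {}  # family label -> integer
    person_ids = {}  # family label -> {person label -> integer}

    # First pass: give every family and every individual an integer.
    for family, person, *_ in rows:
        family_ids.setdefault(family, len(family_ids) + 1)
        members = person_ids.setdefault(family, {})
        members.setdefault(person, len(members) + 1)

    # Second pass: rewrite each row using the integer codes.
    lines = []
    for family, person, father, mother, sex in rows:
        members = person_ids[family]
        fields = [
            family_ids[family],
            members[person],
            0 if father == unknown else members[father],
            0 if mother == unknown else members[mother],
            SEX_CODES[sex],
        ]
        lines.append(" ".join(str(field) for field in fields))
    return lines


if __name__ == "__main__":
    print("\n".join(recode_pedigree(EXAMPLE_ROWS)))
```

Applied to the six example rows, the sketch yields the same values as basic.ped, with single spaces between columns.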

Describing Phenotypes and Genotypes

Usually the five standard columns are followed by various
types of genetic data, including phenotypes for discrete and quantitative
traits and marker genotypes.

Disease status is usually encoded in a single column as
U or 1 for unaffecteds,
A or 2 for affecteds, and
X or 0 for missing phenotypes.
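
As a rough illustration of the disease status codes listed above, the following Python sketch maps each code to a readable value. The names `AFFECTION_CODES` and `decode_affection` are hypothetical, and treating letter codes as case-insensitive is an assumption made here, not something this tutorial states.

```python
# Illustrative sketch only; these names are hypothetical, not part of PEDSTATS.
AFFECTION_CODES = {
    "U": "unaffected", "1": "unaffected",
    "A": "affected", "2": "affected",
    "X": None, "0": None,  # missing phenotype
}


def decode_affection(token):
    """Map a disease status code to 'unaffected', 'affected' or None (missing).

    Assumption: letter codes are accepted in either case.
    """
    try:
        return AFFECTION_CODES[token.strip().upper()]
    except KeyError:
        raise ValueError(f"unrecognised disease status code: {token!r}") from None


if __name__ == "__main__":
    for code in ["1", "A", "0", "x"]:
        print(code, "->", decode_affection(code))
```

Raising an error on unknown codes, rather than silently treating them as missing, makes typos in the pedigree file visible early.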

Quantitative traits are encoded as numeric values with X
denoting missing values (it is also possible to use a distinctive numeric
value to flag missing phenotypes, but this practice is error-prone
and not recommended).

Marker genotypes are encoded as two consecutive integers,
one for each allele, optionally separated by a "/". A 0 (zero) or X
can be used as a placeholder for missing alleles. The following are all
valid genotype entries: 1/1 (homozygote for allele 1), 0/0
(missing genotype), and 3 4 (heterozygote for alleles 3 and 4). For
the X chromosome, males should be encoded as if they had two identical
alleles.
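
The genotype encoding described above can be illustrated with a small Python sketch. `parse_genotype` is a hypothetical helper, not a PEDSTATS function. Accepting a lowercase x as a missing allele is an assumption based on the sample files in this tutorial.

```python
# Illustrative sketch only; parse_genotype is a hypothetical helper.
def parse_genotype(text):
    """Parse a genotype entry such as '1/1', '3 4' or '0/0'.

    Returns a pair (allele1, allele2) of integers, with None standing in
    for a missing allele (written as 0 or X in the pedigree file).
    """
    alleles = text.replace("/", " ").split()
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {text!r}")

    decoded = []
    for allele in alleles:
        if allele == "0" or allele.upper() == "X":
            decoded.append(None)          # missing allele
        else:
            decoded.append(int(allele))   # raises ValueError if not an integer
    return tuple(decoded)


if __name__ == "__main__":
    for entry in ["1/1", "0/0", "3 4", "x x"]:
        print(repr(entry), "->", parse_genotype(entry))
```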

This is what the previous pedigree file might look like after adding a
column for disease status, measurements for a quantitative trait and
genotypes for two markers:
<contents of basic2.ped>
1   1   0  0  1   1      x   3 3   x x
1   2   0  0  2   1      x   4 4   x x
1   3   0  0  1   1      x   1 2   x x
1   4   1  2  2   1      x   4 3   x x
1   5   3  4  2   2  1.234   1 3   2 2
1   6   3  4  1   2  4.321   2 4   2 2
<end of basic2.ped>

Notice that the two siblings (individuals 5 and 6 in the last two rows)
are marked as affected (value 2 in the sixth column), while everyone else is marked
as unaffected (value 1 in the sixth column). The
quantitative trait (seventh column) takes values 1.234 and 4.321 for individuals 5 and 6 respectively and is missing for everyone else. Whereas
everyone is genotyped at the first marker, only
individuals 5 and 6 are genotyped at the second marker.

Describing the Pedigree File

Pedigree files can include any number of marker genotype, disease
status and quantitative trait variables, limited only by available 
memory. Since each pedigree file has a unique structure (apart from 
the first five columns), its contents must be described in a companion
data file. 

The data file includes one row per data item in the pedigree file,
indicating the data type (encoded as M - marker, A - affection status,
T - quantitative trait, C - covariate and Z - twins) and providing a
one-word label
for each item. A data file for the pedigree above, which has one affection
status, followed by one quantitative trait and two marker genotypes, might
read:
 <contents of basic2.dat>
A  some_disease
T  some_trait
M  some_marker
M  another_marker
<end of basic2.dat>

Now that you understand pedigree and data file formats, you'll probably want to actually run PEDSTATS! You can
get a copy from our download page or, if you'd like, you can take a look
at some text or PDF output first.
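
To see how a data file gives meaning to the columns of its pedigree file, here is a minimal Python sketch that reads the basic2.dat rows and uses them to label one line of basic2.ped. The helpers `parse_dat` and `parse_ped_line` are hypothetical and unrelated to how PEDSTATS itself reads files. The sketch handles only the A, T and M types shown in the example, and it leaves alleles and disease codes as text.

```python
# Illustrative sketch only; these helpers are hypothetical, not part of PEDSTATS.
BASIC2_DAT = """\
A  some_disease
T  some_trait
M  some_marker
M  another_marker
"""


def parse_dat(text):
    """Return a list of (type_code, label) pairs, one per non-blank row."""
    items = []
    for line in text.splitlines():
        if line.strip():
            code, label = line.split()
            items.append((code.upper(), label))
    return items


def parse_ped_line(line, items):
    """Label the fields of one pedigree line using the data file items."""
    tokens = iter(line.split())
    # The first five columns are always the same.
    record = {}
    for name in ("family", "person", "father", "mother", "sex"):
        record[name] = next(tokens)

    for code, label in items:
        token = next(tokens)
        if code == "A":
            record[label] = token  # disease status code, kept as text
        elif code == "T":
            # X marks a missing quantitative trait value
            record[label] = None if token.upper() == "X" else float(token)
        elif code == "M":
            # Alleles come either as "a/b" in one field or as two fields
            if "/" in token:
                pair = token.split("/")
            else:
                pair = [token, next(tokens)]
            record[label] = tuple(pair)
        else:
            # C and Z rows are outside the scope of this sketch
            raise ValueError(f"data type {code!r} not handled in this sketch")
    return record


if __name__ == "__main__":
    items = parse_dat(BASIC2_DAT)
    print(parse_ped_line("1   5   3  4  2   2  1.234   1 3   2 2", items))
```

Driving the parser from the data file rows, rather than hard-coding column positions, mirrors the point made earlier: only the first five columns are fixed, and everything after them is defined by the companion data file.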