PEDSTATS Tutorial -- Text Output
In addition to graphical output, PEDSTATS provides a text summary of the
input pedigree and data files. We'll walk through the text output in this section.
Graphical output is discussed in a separate section of this tutorial.
For purposes of illustration, we'll use simulated
data and pedigree files that are very similar to
those discussed in the section on input files. These files
are also in the examples sub-directory of the PEDSTATS distribution available from the download page.
To begin, in the examples sub-directory, type
pedstats -p basic7.ped -d basic7.dat --pairStatistics
You should first see a program header summarizing the currently
selected program options. Options are specified on the command line
and may be abbreviated (e.g., --pairs instead of --pairStatistics).
Because we've used the --pairStatistics option, the header shows it as --pairs [ON].
Pedigree Statistics - 0.6.3
(c) 1999-2005 Goncalo Abecasis, 2002-2005 Jan Wigginton
The following parameters are in effect:
Pedigree File : basic7.ped (-pname)
Data File : basic7.dat (-dname)
IBD File : pedstats.ibd (-iname)
Adobe PDF File : pedstats.pdf (-aname)
Missing Value Code : -99.999 (-xname)
Additional Options
Pedigree File : --ignoreMendelianErrors, --chromosomeX, --trim
Hardy-Weinberg : --hardyWeinberg, --showAll, --cutoff [0.05]
HW Sample : --checkFounders, --checkAll, --checkUnrelated
Output : --pairs [ON], --rewritePedigree, --markerTables, --verbose
Grouping : --bySex, --byFamily
Age Checking : --age [], --birth []
Generations : --minGap [13.00], --maxGap [70.00], --sibGap [30.00]
PDF Options : --pdf, --familyPDF, --traitPDF, --affPDF, --markerPDF
Filter : --minGenos, --minPhenos, --minCovariates, --affectedFor []
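If you want to check option settings programmatically, a small script can pull the bracketed values out of the header. The following is only a sketch in Python: it assumes that a value in square brackets after an option name shows that option's current setting, as --pairs [ON] does above. The names header_lines and options_with_values are made up for this illustration and are not part of PEDSTATS.

```python
import re

# A few lines copied from the "Additional Options" part of the header above.
header_lines = [
    "Hardy-Weinberg : --hardyWeinberg, --showAll, --cutoff [0.05]",
    "Output : --pairs [ON], --rewritePedigree, --markerTables, --verbose",
    "Generations : --minGap [13.00], --maxGap [70.00], --sibGap [30.00]",
]

# An option name followed by a bracketed value, e.g. "--pairs [ON]".
OPTION_WITH_VALUE = re.compile(r"(--\w+)\s+\[([^\]]*)\]")

def options_with_values(lines):
    """Return {option: value} for every option that shows a bracketed value."""
    found = {}
    for line in lines:
        for name, value in OPTION_WITH_VALUE.findall(line):
            found[name] = value
    return found

settings = options_with_values(header_lines)
print(settings["--pairs"])   # ON
print(settings["--cutoff"])  # 0.05
```

Options listed without brackets (for example --verbose) are simply skipped by this sketch.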
Pedigree structure
After the program header, PEDSTATS will produce a brief summary of family structure
in your dataset. As you can see from the listing below,
our pedigree file includes 2 families for a total of 15 individuals. The average
family size is 7.5 and the average number of generations is 3.5.
After producing summary statistics, PEDSTATS checks that all individuals
in each family are connected. This data set passes the check.
PEDIGREE STRUCTURE
==================
Individuals: 15
Founders: 7 founders, 8 nonfounders
Gender: 8 females, 7 males
Families: 2
Family Sizes
Average: 7.50 (6 to 9)
Distribution: 6 (50.0%), 9 (50.0%) and 0 (0.0%)
Generations
Average: 3.50 (3 to 4)
Distribution: 3 (50.0%), 4 (50.0%) and 0 (0.0%)
Checking family connectedness ...
All individuals in each family are connected.
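To make the idea of a connectedness check concrete, here is a minimal Python sketch that groups the members of each family through their parent links using a union-find structure. This is not PEDSTATS's own code, and the tutorial does not say how PEDSTATS performs the check. The rows below are hypothetical, and the layout (family, person, father, mother, with "0" for a missing parent) is an assumption made for this illustration.

```python
from collections import defaultdict

# Hypothetical pedigree rows: (family, person, father, mother).
# "0" is assumed to mean "parent not in the pedigree" (a founder).
rows = [
    ("1", "1", "0", "0"),
    ("1", "2", "0", "0"),
    ("1", "3", "1", "2"),
    ("1", "4", "0", "0"),   # married-in founder, linked through child 5
    ("1", "5", "3", "4"),
    ("2", "1", "0", "0"),
    ("2", "2", "0", "0"),
    ("2", "3", "1", "2"),
    ("2", "9", "0", "0"),   # no relatives listed, so not connected
]

def find(parent, x):
    """Follow links to the representative of x's group, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def family_components(rows):
    """Return {family: number of connected groups of individuals}."""
    parent = {(fam, pid): (fam, pid) for fam, pid, _, _ in rows}
    for fam, pid, father, mother in rows:
        for p in (father, mother):
            if p != "0":
                # Merge the child's group with the parent's group.
                parent[find(parent, (fam, pid))] = find(parent, (fam, p))
    groups = defaultdict(set)
    for key in parent:
        groups[key[0]].add(find(parent, key))
    return {fam: len(roots) for fam, roots in groups.items()}

components = family_components(rows)
for fam, n in sorted(components.items()):
    print(fam, "connected" if n == 1 else f"{n} separate groups")
```

In this made-up data, family 1 forms a single group, while family 2 splits into two groups because individual 9 has no listed relatives.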
Quantitative trait statistics
If your data include any quantitative traits, summary information for each
trait is listed after the section on family structure. From the listing below,
you can see that the mean for some_trait was 5.749, the sample variance was
11.547, and the correlation among all sibling pairs was 0.407. In addition,
you can verify that 9 individuals (60% of the 15 in the data set) were
phenotyped for some_trait and that 2 of the 9 phenotyped individuals were
founders.
QUANTITATIVE TRAIT STATISTICS
=============================
              [All Phenotypes]      Min      Max     Mean      Var  SibCorr
some_trait          9    60.0%    0.512   10.000    5.749   11.547    0.407
Total               9    60.0%
              [Founders Only]       Min      Max     Mean      Var  SibCorr
some_trait          2    28.6%    8.000    9.000    8.500    0.500        -
Total               2    28.6%
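The percentages and the founder statistics in this table can be reproduced by hand. The short Python sketch below does so, using the counts from the pedigree structure section (15 individuals, 7 founders). It assumes that the two phenotyped founders have the values 8.0 and 9.0, which follows from the founder Min and Max columns when only two values are present.

```python
from statistics import mean, variance

total_individuals = 15    # from the PEDIGREE STRUCTURE listing
total_founders = 7
phenotyped = 9            # individuals phenotyped for some_trait
phenotyped_founders = 2

pct_all = 100 * phenotyped / total_individuals
pct_founders = 100 * phenotyped_founders / total_founders
print(f"{pct_all:.1f}%")        # 60.0%
print(f"{pct_founders:.1f}%")   # 28.6%

# With two phenotyped founders, Min = 8.000 and Max = 9.000 are the two values.
founder_values = [8.0, 9.0]
founder_mean = mean(founder_values)      # 8.5
founder_var = variance(founder_values)   # sample variance (n - 1 denominator): 0.5
print(founder_mean, founder_var)
```

Note that the founder percentage is taken relative to the 7 founders, not to all 15 individuals.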
Affection status statistics
If affection status information is included in your pedigree file, a summary of each affection
will be listed next. The output below tells us that for the one affection status (some_disease)
included in the data set, all individuals are phenotyped and prevalence in the sample is 40%.
AFFECTION STATISTICS
====================
                 [Diagnostics]      [Founders]  Prevalence
some_disease       15   100.0%       7   100.0%       40.0%
Total              15   100.0%       7   100.0%
Marker allele frequencies
The next section of output gives information about the marker data in your pedigree file. In this example, we have one marker
(some_marker). The table below indicates that all 15 individuals in the pedigree were genotyped
at some_marker and that 80% of the typed individuals were heterozygous at this locus.
MARKER GENOTYPE STATISTICS
==========================
                   [Genotypes]      [Founders]      Hetero
some_marker        15   100.0%       7   100.0%       80.0%
Total              15   100.0%       7   100.0%       80.0%
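The heterozygosity figure is the fraction of typed individuals whose two alleles differ. As a rough Python illustration, the genotypes below are invented so that 12 of 15 typed individuals are heterozygous. The (0, 0) convention for an untyped individual is an assumption made here, not something taken from the example files.

```python
# Hypothetical genotypes at one marker: one pair of allele codes per individual.
# (0, 0) is assumed to mean "not genotyped".
genotypes = [
    (1, 2), (1, 3), (2, 3), (1, 2), (2, 2),
    (1, 3), (2, 3), (1, 2), (1, 1), (1, 2),
    (2, 3), (1, 3), (3, 3), (1, 2), (2, 3),
]

typed = [g for g in genotypes if g != (0, 0)]
heterozygous = sum(1 for a, b in typed if a != b)
het_rate = heterozygous / len(typed)

print(f"{len(typed)} typed, {heterozygous} heterozygous, "
      f"{100 * het_rate:.1f}% heterozygosity")
```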
Summary mode for large data sets
If your dataset includes more than 50 markers, PEDSTATS
switches to a summary mode. In this case, you'll see tables similar to those below, listing the markers with the
highest and lowest genotyping rates and heterozygosities. If you're interested in more details, the
marker genotype table described in the previous section is written to a separate text file
(pedstats.markerinfo).
Switching to summary output mode because there are more than 50 markers.
See file pedstats.markerinfo for detailed marker information.
DATA QUALITY
============
HIGHEST AND LOWEST GENOTYPING RATES BY MARKER
MARKER RANK PROP N_GENO | MARKER RANK PROP N_GENO
---------------------------------------------------------------------------------
MRK153 1 50.0% 400 | MRK305 458 50.0% 400
MRK154 2 50.0% 400 | MRK306 457 50.0% 400
MRK152 3 50.0% 400 | MRK302 456 50.0% 400
MRK458 4 50.0% 400 | MRK1 455 50.0% 400
MRK144 5 50.0% 400 | MRK304 454 50.0% 400
MRK155 6 50.0% 400 | MRK303 453 50.0% 400
MRK156 7 50.0% 400 | MRK307 452 50.0% 400
MRK151 8 50.0% 400 | MRK312 451 50.0% 400
MRK146 9 50.0% 400 | MRK308 450 50.0% 400
MRK147 10 50.0% 400 | MRK313 449 50.0% 400
Totals 458 50.0% 183200
HIGHEST AND LOWEST HETEROZYGOSITIES BY MARKER
MARKER RANK HET N_GENO | MARKER RANK HET N_GENO
---------------------------------------------------------------------------------
MRK8 1 78.0% 400 | MRK197 458 35.2% 400
MRK3 2 77.2% 400 | MRK26 457 36.2% 400
MRK15 3 76.2% 400 | MRK42 456 37.5% 400
MRK13 4 75.8% 400 | MRK437 455 37.8% 400
MRK6 5 75.5% 400 | MRK375 454 38.2% 400
MRK4 6 75.2% 400 | MRK436 453 38.8% 400
MRK14 7 75.0% 400 | MRK147 452 39.0% 400
MRK12 8 74.2% 400 | MRK273 451 39.5% 400
MRK5 9 74.0% 400 | MRK104 450 39.5% 400
MRK11 10 74.0% 400 | MRK146 449 39.5% 400
Totals 458 46.8% 183200
Detailed marker summaries for all 458 markers written to file pedstats.markerinfo
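The ranking in these tables amounts to sorting markers by a statistic and reporting both ends of the list. The Python sketch below does this for six heterozygosity values copied from the table above and also checks the genotype total. It does not read pedstats.markerinfo, whose layout is not described in this section.

```python
# Heterozygosity (%) for six markers, copied from the summary table above.
het = {
    "MRK8": 78.0, "MRK3": 77.2, "MRK15": 76.2,
    "MRK197": 35.2, "MRK26": 36.2, "MRK42": 37.5,
}

# Sort marker names from highest to lowest heterozygosity.
ranked = sorted(het, key=het.get, reverse=True)
highest = ranked[:2]
lowest = ranked[-2:][::-1]   # lowest first

print("highest:", highest)
print("lowest: ", lowest)

# The Totals row: 458 markers with 400 genotypes each.
n_markers, n_geno_per_marker = 458, 400
total_genotypes = n_markers * n_geno_per_marker
print(total_genotypes)   # 183200
```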
Pairwise summaries
Because the --pairStatistics option is specified,
summary statistics for pairwise distributions of affection, covariate, and
quantitative trait variables are also listed. At the top of this section, a
listing of overall relative pair counts is given; this indicates that
our pedigree file includes 4 sib pairs, 16 parent-child pairs, 12
grandparent-grandchild pairs, and 2 avuncular (e.g., uncle-nephew or
aunt-niece) pairs. The next two tables give (respectively) correlations
and counts of phenotyped pairs for each pair
type. From these tables, you should verify that
all 4 sib pairs had both members phenotyped for some_trait, while only 8 of 16 parent-child pairs were fully phenotyped. Among
these 8 phenotyped parent-child pairs, the within-pair correlation for some_trait was 0.4014.
The next table in this section summarizes pairwise affection status by relative pair type.
From this you should be able to see that all
16 parent-child pairs were diagnosed for the affection some_disease, and that among
these 16 pairs, there were 5 pairs with both members unaffected, 10 discordant pairs, and 1 pair with both members affected.
PAIR STATISTICS
===============
Relative Pair Counts
Sib-pairs: 4 pairs
Parent-Child: 16 pairs
Grandparent-Grandchild: 12 pairs
Avuncular: 2 pairs
Pair Correlations for Each Trait:
                         Sib     HalfSib      Cousin ParentChild Grandparent   Avuncular
some_trait            0.4070           -           -      0.4014      0.0000      0.0000
Pair Counts for Each Trait:
                         Sib     HalfSib      Cousin ParentChild Grandparent   Avuncular
some_trait                 4           0           0           8           5           2
Pair Counts by Affection Status:
                         Sib     HalfSib      Cousin ParentChild Grandparent   Avuncular
some_disease
  [ Unaffected]            0           0           0           5           1           0
  [ Discordant]            2           0           0          10           9           1
  [ Affected]              2           0           0           1           2           1
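One way to check that every pair was diagnosed is to add up the three affection-status rows for each pair type and compare the sums with the relative pair counts. The Python sketch below does this with the numbers typed in from the listing above. The variable names are chosen for this illustration only.

```python
# Relative pair counts from the top of the PAIR STATISTICS listing.
pair_totals = {"Sib": 4, "ParentChild": 16, "Grandparent": 12, "Avuncular": 2}

columns = ["Sib", "HalfSib", "Cousin", "ParentChild", "Grandparent", "Avuncular"]

# Rows of "Pair Counts by Affection Status" for some_disease.
status_rows = {
    "Unaffected": [0, 0, 0, 5, 1, 0],
    "Discordant": [2, 0, 0, 10, 9, 1],
    "Affected":   [2, 0, 0, 1, 2, 1],
}

# Diagnosed pairs per pair type = sum over the three status rows.
diagnosed_pairs = {
    col: sum(row[i] for row in status_rows.values())
    for i, col in enumerate(columns)
}

for pair_type, total in pair_totals.items():
    print(f"{pair_type}: {diagnosed_pairs[pair_type]} of {total} pairs diagnosed")
```

For this data set every pair type comes out fully diagnosed; for example, the parent-child column sums to 5 + 10 + 1 = 16.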
When the --pairStatistics option is used in conjunction with the --pdf
option, PEDSTATS also appends several pages of graphical output summarizing
pairwise information to the standard PDF output. To see some
examples, read the section on graphical output first, and then the
section on graphical output for pair distributions.