University of Michigan Center for Statistical Genetics

PEDSTATS Tutorial -- Text Output

In addition to graphical output, PEDSTATS provides a text summary of the input pedigree and data files. We'll walk through the text output in this section. A discussion of graphical output can be found here.

For purposes of illustration, we'll use simulated data and pedigree files that are very similar to those discussed in the section on input files. These files are also in the examples sub-directory of the PEDSTATS distribution available from the download page.

To begin, in the examples sub-directory, type

       pedstats -p basic7.ped -d basic7.dat --pairStatistics

You should first see a program header summarizing the currently selected program options. Options are specified on the command line and may be abbreviated (e.g., --pairs instead of --pairStatistics). Because we've used the --pairStatistics option, the Output line of the header shows --pairs [ON].

Pedigree Statistics - 0.6.3
(c) 1999-2005 Goncalo Abecasis, 2002-2005 Jan Wigginton

The following parameters are in effect:
                 Pedigree File :      basic7.ped (-pname)
                     Data File :      basic7.dat (-dname)
                      IBD File :    pedstats.ibd (-iname)
                Adobe PDF File :    pedstats.pdf (-aname)
            Missing Value Code :         -99.999 (-xname)

Additional Options
    Pedigree File : --ignoreMendelianErrors, --chromosomeX, --trim
   Hardy-Weinberg : --hardyWeinberg, --showAll, --cutoff [0.05]
        HW Sample : --checkFounders, --checkAll, --checkUnrelated
           Output : --pairs [ON], --rewritePedigree, --markerTables, --verbose
         Grouping : --bySex, --byFamily
     Age Checking : --age [], --birth []
      Generations : --minGap [13.00], --maxGap [70.00], --sibGap [30.00]
      PDF Options : --pdf, --familyPDF, --traitPDF, --affPDF, --markerPDF
           Filter : --minGenos, --minPhenos, --minCovariates, --affectedFor []

Pedigree structure

After the program header, PEDSTATS will produce a brief summary of family structure in your dataset. As you can see from the listing below, our pedigree file includes 2 families for a total of 15 individuals. The average family size is 7.5 and the average number of generations is 3.5.

After producing these summary statistics, PEDSTATS checks that all individuals in each family are connected. This data set passes the check.

PEDIGREE STRUCTURE
==================
       Individuals: 15
          Founders: 7 founders, 8 nonfounders
            Gender: 8 females, 7 males
          Families: 2

       Family Sizes
           Average: 7.50 (6 to 9)
      Distribution: 6 (50.0%), 9 (50.0%) and 0 (0.0%)
           
       Generations
           Average: 3.50 (3 to 4)
      Distribution: 3 (50.0%), 4 (50.0%) and 0 (0.0%)

Checking family connectedness ...
  All individuals in each family are connected.
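
The counts in this section can be reproduced by hand from the first five columns of a pedigree file. The sketch below is only an illustration, not how PEDSTATS itself is implemented: it assumes the usual column order (family, person, father, mother, sex), a parent ID of 0 for founders, and sex coded 1 = male, 2 = female. The rows are a made-up toy pedigree, not the contents of basic7.ped.

```python
from collections import Counter

# Hypothetical pedigree rows: (family, person, father, mother, sex).
# Assumed coding: parent ID "0" marks a founder; sex 1 = male, 2 = female.
PEDIGREE = [
    ("1", "1", "0", "0", "1"),
    ("1", "2", "0", "0", "2"),
    ("1", "3", "1", "2", "1"),
    ("1", "4", "1", "2", "2"),
    ("2", "1", "0", "0", "1"),
    ("2", "2", "0", "0", "2"),
    ("2", "3", "1", "2", "2"),
]


def summarize(rows):
    """Return the basic counts shown under PEDIGREE STRUCTURE."""
    family_sizes = Counter(row[0] for row in rows)
    founders = sum(1 for row in rows if row[2] == "0" and row[3] == "0")
    females = sum(1 for row in rows if row[4] == "2")
    sizes = list(family_sizes.values())
    return {
        "individuals": len(rows),
        "founders": founders,
        "nonfounders": len(rows) - founders,
        "females": females,
        "males": len(rows) - females,
        "families": len(sizes),
        "average_size": sum(sizes) / len(sizes),
        "size_range": (min(sizes), max(sizes)),
    }


summary = summarize(PEDIGREE)
for key, value in summary.items():
    print(f"{key:>14}: {value}")
```

Applied to the tutorial data, the same arithmetic gives the reported average family size: two families of 6 and 9 individuals average (6 + 9) / 2 = 7.5.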

Quantitative trait statistics

If your data include any quantitative traits, summary information for each trait is listed after the family structure section. From the output below, you can see that the mean of some_trait was 5.749, the sample variance was 11.547, and the correlation among all sibling pairs was 0.407. You can also verify that 9 individuals, or 60% of the 15 individuals in the pedigree, were phenotyped for some_trait, and that 2 of these 9 were founders (28.6% of the 7 founders).


QUANTITATIVE TRAIT STATISTICS 
=============================

               [All Phenotypes]      Min      Max     Mean      Var  SibCorr
     some_trait        9  60.0%    0.512   10.000    5.749   11.547    0.407
          Total        9  60.0%


                [Founders Only]      Min      Max     Mean      Var  SibCorr
     some_trait        2  28.6%    8.000    9.000    8.500    0.500        -
          Total        2  28.6%
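
The [Founders Only] row is small enough to check by hand. With only two phenotyped founders, a minimum of 8.000 and a maximum of 9.000, the two values must be 8 and 9. The short sketch below recomputes the mean, the variance and the two coverage percentages from the numbers in the tables. It assumes the Var column uses the usual n - 1 divisor, which is consistent with the reported value of 0.500.

```python
import statistics

# The two founder phenotypes implied by the [Founders Only] row
# (n = 2, Min = 8.000, Max = 9.000).
founder_values = [8.0, 9.0]

mean = statistics.mean(founder_values)
var = statistics.variance(founder_values)  # sample variance, divisor n - 1

# Coverage percentages reported in the two tables.
pct_all = 100 * 9 / 15      # phenotyped individuals out of 15 individuals
pct_founders = 100 * 2 / 7  # phenotyped founders out of 7 founders

print(f"mean={mean:.3f} var={var:.3f} all={pct_all:.1f}% founders={pct_founders:.1f}%")
```

The printed values (8.500, 0.500, 60.0% and 28.6%) match the listing above.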

Affection status statistics

If affection status information is included in your pedigree file, a summary of each affection status is listed next. The output below tells us that for the one affection status (some_disease) included in the data set, all 15 individuals are phenotyped and prevalence in the sample is 40% (6 of 15 individuals).


AFFECTION STATISTICS
====================

                  [Diagnostics]      [Founders] Prevalence
   some_disease       15 100.0%        7 100.0%      40.0%
          Total       15 100.0%        7 100.0%
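
As a rough illustration of the Prevalence column, the sketch below computes the proportion of affected individuals among those with a known status. The status list is hypothetical, chosen to match the counts above (15 diagnosed, 6 affected). The coding 0 = unknown, 1 = unaffected, 2 = affected is an assumption about the pedigree file, and taking the diagnosed individuals as the denominator is also an assumption. In this example everyone is diagnosed, so either denominator gives the same answer.

```python
# Hypothetical affection codes. Assumed coding:
# 0 = unknown, 1 = unaffected, 2 = affected.
statuses = [2] * 6 + [1] * 9  # 15 individuals, 6 of them affected

# Only individuals with a known status count as diagnosed.
diagnosed = [s for s in statuses if s != 0]
affected = sum(1 for s in diagnosed if s == 2)

diagnosed_pct = 100 * len(diagnosed) / len(statuses)
prevalence_pct = 100 * affected / len(diagnosed)

print(f"diagnosed: {len(diagnosed)} ({diagnosed_pct:.1f}%), prevalence: {prevalence_pct:.1f}%")
```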

Marker allele frequencies

The next section of output gives information about the marker data in your pedigree file. In this example, we have one marker (some_marker). The table below indicates that all 15 individuals in the pedigree were genotyped at some_marker and that 80% of the typed individuals were heterozygous at this locus.

MARKER GENOTYPE STATISTICS
==========================

                    [Genotypes]      [Founders]     Hetero
    some_marker       15 100.0%        7 100.0%      80.0%
          Total       15 100.0%        7 100.0%      80.0%
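
The Hetero column is the proportion of typed individuals who carry two different alleles. The sketch below shows that calculation on a hypothetical set of 15 genotypes, chosen so that 12 are heterozygous, which matches the 80.0% above. Coding an untyped genotype with a 0 allele is an assumption about the input file.

```python
# Hypothetical genotypes as (allele1, allele2) pairs.
# Assumption: a 0 allele would mark an untyped individual.
genotypes = [(1, 2)] * 8 + [(2, 3)] * 4 + [(1, 1)] * 2 + [(2, 2)]

# Keep only genotypes with both alleles observed.
typed = [g for g in genotypes if 0 not in g]
heterozygous = sum(1 for a, b in typed if a != b)

genotyped_pct = 100 * len(typed) / len(genotypes)
hetero_pct = 100 * heterozygous / len(typed)

print(f"typed: {len(typed)} ({genotyped_pct:.1f}%), heterozygous: {hetero_pct:.1f}%")
```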

Summary mode for large data sets

If your dataset includes more than 50 markers, PEDSTATS switches to a summary mode. In this case, you'll see tables listing the markers with the highest and lowest genotyping rates and heterozygosities, similar to those below. If you're interested in more details, you'll find that the marker genotype table described in the previous section has been written to a separate text file (pedstats.markerinfo).


Switching to summary output mode because there are more than 50 markers.
See file pedstats.markerinfo for detailed marker information.



DATA QUALITY
============


HIGHEST AND LOWEST GENOTYPING RATES BY MARKER


         MARKER   RANK    PROP   N_GENO  |         MARKER   RANK    PROP   N_GENO
---------------------------------------------------------------------------------
         MRK153      1   50.0%      400  |         MRK305    458   50.0%      400
         MRK154      2   50.0%      400  |         MRK306    457   50.0%      400
         MRK152      3   50.0%      400  |         MRK302    456   50.0%      400
         MRK458      4   50.0%      400  |           MRK1    455   50.0%      400
         MRK144      5   50.0%      400  |         MRK304    454   50.0%      400
         MRK155      6   50.0%      400  |         MRK303    453   50.0%      400
         MRK156      7   50.0%      400  |         MRK307    452   50.0%      400
         MRK151      8   50.0%      400  |         MRK312    451   50.0%      400
         MRK146      9   50.0%      400  |         MRK308    450   50.0%      400
         MRK147     10   50.0%      400  |         MRK313    449   50.0%      400

         Totals   458   50.0%   183200



HIGHEST AND LOWEST HETEROZYGOSITIES BY MARKER


         MARKER   RANK     HET   N_GENO  |         MARKER   RANK     HET   N_GENO
---------------------------------------------------------------------------------
           MRK8      1   78.0%      400  |         MRK197    458   35.2%      400
           MRK3      2   77.2%      400  |          MRK26    457   36.2%      400
          MRK15      3   76.2%      400  |          MRK42    456   37.5%      400
          MRK13      4   75.8%      400  |         MRK437    455   37.8%      400
           MRK6      5   75.5%      400  |         MRK375    454   38.2%      400
           MRK4      6   75.2%      400  |         MRK436    453   38.8%      400
          MRK14      7   75.0%      400  |         MRK147    452   39.0%      400
          MRK12      8   74.2%      400  |         MRK273    451   39.5%      400
           MRK5      9   74.0%      400  |         MRK104    450   39.5%      400
          MRK11     10   74.0%      400  |         MRK146    449   39.5%      400

         Totals   458   46.8%   183200


Detailed marker summaries for all 458 markers written to file pedstats.markerinfo
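
The Totals rows in the two tables can be cross-checked with simple arithmetic. The sketch below uses only numbers from the listing. The implied number of individuals assumes that PROP is N_GENO divided by the number of individuals in the pedigree file, which the output does not state explicitly.

```python
# Values taken from the summary tables above.
n_markers = 458
n_geno_per_marker = 400
genotyping_rate = 0.50

# Total genotypes across all markers (the N_GENO entry in the Totals row).
total_genotypes = n_markers * n_geno_per_marker

# Assumption: PROP = N_GENO / number of individuals.
implied_individuals = round(n_geno_per_marker / genotyping_rate)

print(total_genotypes, implied_individuals)
```

The first number printed, 183200, agrees with the Totals row. The second, 800, is the number of individuals implied by a 50.0% genotyping rate under the stated assumption.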

Pairwise summaries

Because the --pairStatistics option is specified, summary statistics for pairwise distributions of affection, covariate and quantitative trait variables are also listed. At the top of this section, a listing of overall relative pair counts is given; this indicates that our pedigree file includes 4 sib pairs, 16 parent-child pairs, 12 grandparent-grandchild pairs, and 2 avuncular (e.g., uncle-nephew or aunt-niece) pairs.

The next two tables give, respectively, correlations and counts of phenotyped pairs for each pair type. From these tables, you can verify that all 4 sib pairs had both members phenotyped for some_trait, while only 8 of the 16 parent-child pairs were fully phenotyped. Among these 8 phenotyped parent-child pairs, the within-pair correlation for some_trait was 0.4014.

The last table in this section summarizes pairwise affection status by relative pair type. From this you can see that all 16 parent-child pairs were diagnosed for the affection some_disease, and that among these 16 pairs there were 5 unaffected pairs, 10 discordant pairs and 1 affected pair.

                         

PAIR STATISTICS
===============

Relative Pair Counts
                Sib-pairs:        4 pairs
             Parent-Child:       16 pairs
   Grandparent-Grandchild:       12 pairs
                Avuncular:        2 pairs


Pair Correlations for Each Trait:
                     Sib  HalfSib   Cousin  ParentChild  Grandparent Avuncular
     some_trait   0.4070        -        -       0.4014       0.0000    0.0000

Pair Counts for Each Trait:
                     Sib  HalfSib   Cousin  ParentChild  Grandparent Avuncular
     some_trait        4        0        0            8            5         2


Pair Counts by Affection Status:
                     Sib  HalfSib   Cousin  ParentChild  Grandparent Avuncular
   some_disease
  [ Unaffected]        0        0        0            5            1         0
  [ Discordant]        2        0        0           10            9         1
  [   Affected]        2        0        0            1            2         1


When the --pairStatistics option is used in conjunction with the --pdf option, PEDSTATS also appends several pages of graphical output summarizing pairwise information to the standard PDF output. To see some examples, you might want to look at the section on graphical output for pair distributions after you take a look at the section on graphical output.


 
 

University of Michigan | School of Public Health