Manual for GHOST

GHOST is a software package for family-based genomewide association (GWA) analysis, with the ability to infer missing genotypes using the Elston-Stewart algorithm. When SNPs from an association panel are less complete (i.e., having more missing genotypes) than markers from a linkage panel, many of the missing genotypes can be determined (Chen and Abecasis 2007). GHOST can handle large pedigrees. (When pedigrees are small, Merlin is also recommended for this analysis.)


GENERAL INPUT FILES

The input files include at least a data file, a pedigree file, and a map file. They can be specified with the -d, -p, and -m parameters. The command line will look like this:

prompt> ghost -d ex.dat -p ex.ped -m ex.map

GHOST supports input files in LINKAGE, GENEHUNTER, and MERLIN format. Please read the MERLIN tutorial for details of MERLIN input files.
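
As a rough illustration of the three input files, the following Python sketch writes a tiny nuclear family in MERLIN format. The marker names, trait name, covariate, and all values are made up for illustration, and the helper write_example_inputs is hypothetical. The MERLIN tutorial remains the authoritative description of the formats.

```python
from pathlib import Path
import tempfile

# Data file: one line per column that follows the first five pedigree columns.
# T = quantitative trait, C = covariate, M = marker (names are hypothetical).
DAT = """\
T trait1
C age
M snp1
M snp2
"""

# Pedigree file: family, individual, father, mother, sex (1 = male, 2 = female),
# then trait, covariate, and one genotype per marker.
# Assumption: "x" marks a missing trait value and "0/0" a missing genotype.
PED = """\
1 1 0 0 1 x 52 1/2 0/0
1 2 0 0 2 1.8 50 1/1 1/2
1 3 1 2 1 2.4 24 1/2 0/0
1 4 1 2 2 0.9 21 1/1 1/2
"""

# Map file: chromosome, marker name, position in cM.
MAP = """\
1 snp1 6.2
1 snp2 7.1
"""


def write_example_inputs(directory):
    """Write ex.dat, ex.ped and ex.map into directory and return their paths."""
    directory = Path(directory)
    paths = {}
    for name, text in (("ex.dat", DAT), ("ex.ped", PED), ("ex.map", MAP)):
        path = directory / name
        path.write_text(text)
        paths[name] = path
    return paths


if __name__ == "__main__":
    for name, path in write_example_inputs(tempfile.mkdtemp()).items():
        print(name, "->", path)
```

Here the father and the first child have a missing genotype at snp2, which is the kind of gap that genotype inference is meant to fill.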


GENOTYPE INFERENCE

Missing genotypes can be inferred (entirely or partially) in GHOST. For the analysis of quantitative traits, the genotype inference procedure is incorporated in the test procedure, so a separate genotype inference step before carrying out the test is not necessary. However, when genotype inference needs to be carried out independently of the test, GHOST provides the option of stand-alone genotype inference.

The parameter for genotype inference is --infer. The inferred data are stored in ghost-infer.dat and ghost-infer.ped. The command line for genotype inference may look like this:

prompt> ghost -d ex.dat -p ex.ped -m ex.map --flank 6 --infer

Note that only a few adjacent markers are involved in the computation at each SNP under inference. Therefore, the number of flanking markers used in the Elston-Stewart procedure needs to be specified with the parameter --flankingMarkers n (written as --flank 6 in the example above).


ASSOCIATION ANALYSIS

Two major association tests are implemented in GHOST: the score test and the likelihood-ratio test (LRT). The score test is rapid and suitable for genome-wide association analysis. A score test can be specified with the --SCORE option. In the current version of GHOST, the score test is the default test, so an explicit specification is not necessary. In contrast, the LRT is more reliable but more time-consuming. It is well suited to following up the initial genome-wide association results. The LRT can be specified with the --ASSOC option.

All tests can be combined with the --noinfer option for a fast analysis without carrying out the genotype inference.

When genotype data consist of both linkage markers and association SNPs, and the association SNPs have many more missing genotypes than the linkage markers, the option --twoStage provides a more efficient strategy to handle this type of data.

Examples of association tests are:

prompt> ghost -d ex.dat -p ex.ped -m ex.map --noinfer
prompt> ghost -d ex.dat -p ex.ped -m ex.map --flank 6 --two
prompt> ghost -d ex.dat -p ex.ped -m ex.map --ASSOC --flank 8 --two --start 6 --stop 8

The first command carries out a score test without inferring any missing genotypes. The second command carries out a score test incorporating genotype inference with 6 nearby flanking markers, using the two-stage strategy (--two). The third command carries out an LRT incorporating two-stage genotype inference with 8 nearby flanking markers, starting from 6.0 cM and ending at 8.0 cM.
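
When several regions need an LRT follow-up, command lines like the third example can be assembled in a script. The Python sketch below only builds and prints the commands; it does not run GHOST. The helper name ghost_command and the list of regions are hypothetical, and the option spellings are copied from the example commands above.

```python
import shlex


def ghost_command(dat, ped, map_file, lrt=False, noinfer=False,
                  flank=None, two_stage=False, start=None, stop=None):
    """Return a ghost command line as a list of arguments.

    Only options that appear in the examples of this manual are used.
    """
    cmd = ["ghost", "-d", dat, "-p", ped, "-m", map_file]
    if lrt:
        cmd.append("--ASSOC")          # likelihood-ratio test
    if noinfer:
        cmd.append("--noinfer")        # skip genotype inference
    if flank is not None:
        cmd += ["--flank", str(flank)] # number of flanking markers
    if two_stage:
        cmd.append("--two")            # two-stage strategy
    if start is not None:
        cmd += ["--start", str(start)] # region start in cM
    if stop is not None:
        cmd += ["--stop", str(stop)]   # region end in cM
    return cmd


if __name__ == "__main__":
    # Hypothetical follow-up regions (start, stop) in cM.
    regions = [(6, 8), (20, 25)]
    for start, stop in regions:
        cmd = ghost_command("ex.dat", "ex.ped", "ex.map", lrt=True,
                            flank=8, two_stage=True, start=start, stop=stop)
        print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run once the printed commands have been checked by hand.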


SOME OTHER USEFUL PARAMETERS

--trait trait_names specifies one or more traits to be analyzed. All traits will be analyzed if not specified.
--covariate covariate_names specifies covariates to be included in the model.
--normalize normalizes trait(s) through inverse normal transformation prior to analysis.
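
To make the idea behind --normalize concrete, the Python sketch below applies a rank-based inverse normal transformation to a list of trait values. It is an illustration of the general technique, not GHOST's own code: the offset used to turn ranks into probabilities and the handling of ties (average ranks) are assumptions, and missing values are not handled.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map trait values to standard normal quantiles according to their ranks.

    Ties receive the average of their ranks. The rank-to-probability formula
    (rank - offset) / (n - 2 * offset + 1) is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


if __name__ == "__main__":
    trait = [2.4, 0.9, 1.8, 15.0]  # made-up values with one outlier
    for raw, z in zip(trait, inverse_normal_transform(trait)):
        print(f"{raw:6.1f} -> {z:+.3f}")
```

Because only the ranks enter the transformation, the outlier in the example is pulled in to the same quantile it would have if it were only slightly larger than the other values.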


REFERENCES

1. Chen WM, Abecasis GR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81:913-926
2. Burdick JT, Chen WM, Abecasis GR, Cheung VG (2006) In silico method for inferring genotypes in pedigrees. Nat Genet 38:1002-1004


======================================
Last updated: September 18, 2007 by Wei-Min Chen