University of Michigan Center for Statistical Genetics

GAINQC First Pass: Sample Checks

GAINQC operates in two passes. In the first pass, it performs sample-based checks and flags samples that fail to meet the user-set thresholds for sample quality. In the second pass, the flagged samples are excluded and quality checks are performed on SNPs. This section describes, in moderate detail, the sample-based checks performed by GAINQC.

NOTE: All tests are performed on genotypes after thresholding for quality score, if scores are provided. Genotypes with a quality score below the threshold are blanked out (treated as missing).
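The thresholding step described in the note can be sketched as follows. This is a minimal illustration, not GAINQC's actual code: the function name `apply_quality_threshold`, the use of `None` as the missing-genotype marker, and the string genotype encoding are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

def apply_quality_threshold(
    genotypes: Sequence[Optional[str]],
    scores: Optional[Sequence[float]],
    threshold: float,
) -> List[Optional[str]]:
    """Return a copy of `genotypes` with low-quality calls blanked out.

    None stands for a missing genotype (an assumption of this sketch).
    If no scores are provided, the genotypes are returned unchanged.
    """
    if scores is None:
        return list(genotypes)
    # A call is kept only if its score is at or above the threshold.
    return [g if (g is not None and s >= threshold) else None
            for g, s in zip(genotypes, scores)]

blanked = apply_quality_threshold(["A/A", "A/G", "G/G"], [0.9, 0.4, 0.7], 0.5)
```

Here the second call has a score below the threshold, so it is treated as missing in every later check.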

Individual Sample Checks

Genotyping completeness: The genotyping completeness rate is computed for every sample as the ratio of the number of markers with a non-missing genotype call to the total number of markers. Completeness is checked both on an absolute scale and relative to the other samples. Any sample whose absolute genotyping completeness is too low is flagged. Samples whose genotyping completeness is too low or too high relative to the other samples (outliers) are also flagged.
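As a rough sketch of this check, the code below computes the completeness rate and applies an absolute and a relative test. The text does not say how outliers are defined, so the relative test here assumes a cutoff in standard deviations from the mean across samples; the names `completeness`, `flag_samples`, `min_abs` and `max_sd` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence, Set

def completeness(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of markers with a non-missing call (None marks missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def flag_samples(rates: Dict[str, float], min_abs: float, max_sd: float) -> Set[str]:
    """Flag samples failing the absolute or the relative check.

    Assumption: a sample is a relative outlier when its rate lies more
    than `max_sd` standard deviations from the mean of all samples.
    """
    values = list(rates.values())
    mu = mean(values)
    sd = pstdev(values)
    flagged = set()
    for name, rate in rates.items():
        if rate < min_abs:                               # absolute check
            flagged.add(name)
        elif sd > 0 and abs(rate - mu) > max_sd * sd:    # relative check
            flagged.add(name)
    return flagged

rates = {"s1": 0.99, "s2": 0.98, "s3": 0.99, "s4": 0.98, "s5": 0.60}
flagged_abs = flag_samples(rates, min_abs=0.90, max_sd=3.0)
flagged_rel = flag_samples(rates, min_abs=0.0, max_sd=1.5)
```

In this toy data set, sample `s5` fails the absolute check in the first call and the relative check in the second.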

Heterozygosity: The heterozygosity of each sample is computed as the ratio of the number of heterozygous genotype calls to the total number of non-missing calls. As with genotyping completeness, samples are flagged if their heterozygosity is too low or too high, on both the absolute and the relative scale.
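The ratio itself is straightforward to compute. The sketch below assumes genotypes are stored as pairs of alleles with `None` for a missing call; the function name `heterozygosity` is illustrative only.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call (assumption)

def heterozygosity(genotypes: Sequence[Genotype]) -> Optional[float]:
    """Heterozygous calls divided by non-missing calls, or None if no calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    het = sum(1 for a, b in called if a != b)
    return het / len(called)

sample = [("A", "G"), ("A", "A"), None, ("G", "G"), ("G", "A")]
het_rate = heterozygosity(sample)
```

The missing call is excluded from the denominator, so this sample has 2 heterozygous calls out of 4 non-missing calls.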

Mendelian inconsistencies: If trios are present in the study sample, GAINQC counts, for each sample, the number of markers at which the sample belongs to a trio showing a Mendelian inconsistency. Samples with too many Mendelian inconsistencies are flagged.
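A simple way to picture the trio test at an autosomal marker: the child's genotype is consistent if one allele can come from the father and the other from the mother. The sketch below counts inconsistent markers for one trio, and that count would then be attributed to each member of the trio. The function names and the allele-pair encoding are assumptions, and markers with any missing call in the trio are skipped here, which may differ from how GAINQC handles them.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call (assumption)

def is_mendelian_consistent(child, father, mother) -> bool:
    """True if one child allele can come from each parent (autosomal marker)."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def count_inconsistencies(child: Sequence[Genotype],
                          father: Sequence[Genotype],
                          mother: Sequence[Genotype]) -> int:
    """Number of markers at which the trio is Mendelian-inconsistent."""
    count = 0
    for c, f, m in zip(child, father, mother):
        if c is None or f is None or m is None:
            continue  # cannot be assessed with a missing call
        if not is_mendelian_consistent(c, f, m):
            count += 1
    return count

child  = [("A", "G"), ("A", "A"), ("G", "G"), None]
father = [("A", "A"), ("G", "G"), ("G", "G"), ("A", "A")]
mother = [("G", "G"), ("G", "G"), ("A", "G"), ("A", "A")]
n_bad = count_inconsistencies(child, father, mother)
```

Only the second marker is inconsistent: the child is A/A while both parents are G/G.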

Sample Sex Odds: For each sample, the odds of the sample being male as opposed to female are calculated using the X-linked markers. The odds are reported on a log scale: values less than zero indicate that the sample is more likely to be female, whereas positive values indicate that the sample is more likely to be male. Samples whose sex odds contradict their putative sex with enough confidence (that is, beyond a threshold) are flagged.
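The text does not spell out the model behind the sex odds, so the following is only one plausible sketch, not GAINQC's method. It assumes that males carry one X chromosome, so a heterozygous call at an X-linked marker can only arise from a genotyping error with a small probability `error_rate`, while female genotypes follow Hardy-Weinberg proportions. Genotypes are coded as the number of copies (0, 1 or 2) of an allele with frequency `p`. All names and the default error rate are hypothetical.

```python
import math
from typing import Optional, Sequence

def sex_log_odds(x_genotypes: Sequence[Optional[int]],
                 freqs: Sequence[float],
                 error_rate: float = 0.01) -> float:
    """Log odds of male versus female from X-linked genotypes.

    Positive values favour male, negative values favour female.
    """
    total = 0.0
    for g, p in zip(x_genotypes, freqs):
        if g is None or not 0.0 < p < 1.0:
            continue  # skip missing calls and monomorphic markers
        q = 1.0 - p
        # Assumed male model: hemizygous, heterozygous calls only by error.
        male = {0: (1 - error_rate) * q, 1: error_rate, 2: (1 - error_rate) * p}[g]
        # Assumed female model: Hardy-Weinberg proportions.
        female = {0: q * q, 1: 2 * p * q, 2: p * p}[g]
        total += math.log(male) - math.log(female)
    return total

def sex_mismatch(log_odds: float, putative_sex: str, threshold: float) -> bool:
    """True if the log odds contradict the putative sex beyond the threshold."""
    if putative_sex == "M":
        return log_odds < -threshold
    if putative_sex == "F":
        return log_odds > threshold
    return False

all_hom = sex_log_odds([2] * 10, [0.5] * 10)
many_het = sex_log_odds([1] * 5 + [2] * 5, [0.5] * 10)
```

Under these assumptions, a sample with no heterozygous X-linked calls gets positive log odds, and a sample with many heterozygous calls gets negative log odds.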

Individual Sample Statistics

These statistics are calculated per sample in the first pass of GAINQC, but these are not used to flag any samples. They are provided as additional information.

Log-likelihood of the genotypes: The likelihood of observing the sample's genotypes is calculated using the allele frequencies of all the SNPs. The allele frequencies are estimated from all the available sample data. A histogram of the log-likelihoods (the log of the likelihood) is generated and included with the sample histograms.
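One way to make this statistic concrete is sketched below. It assumes Hardy-Weinberg proportions and independent markers, neither of which is stated in the text, and it codes each genotype as the number of copies (0, 1 or 2) of one allele. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

def allele_frequencies(samples: Sequence[Sequence[Optional[int]]]) -> List[Optional[float]]:
    """Per-marker frequency of the counted allele, using all non-missing calls."""
    freqs = []
    for j in range(len(samples[0])):
        calls = [s[j] for s in samples if s[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else None)
    return freqs

def genotype_log_likelihood(genotypes: Sequence[Optional[int]],
                            freqs: Sequence[Optional[float]]) -> float:
    """Sum of log genotype probabilities under Hardy-Weinberg (an assumption)."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None or p is None or not 0.0 < p < 1.0:
            continue  # skip missing calls and monomorphic markers
        q = 1.0 - p
        total += math.log({0: q * q, 1: 2 * p * q, 2: p * p}[g])
    return total

samples = [[0, 1, 2], [1, 1, None], [2, 1, 2]]
freqs = allele_frequencies(samples)
loglik = genotype_log_likelihood(samples[0], freqs)
```

A sample whose log-likelihood is far lower than that of the other samples carries many genotypes that are unusual given the estimated frequencies.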

Average quality score: The average quality score of the sample's genotypes is computed using the genotypes before thresholding, i.e. all of the sample's genotypes are used, including those with a quality score lower than the threshold. A histogram of average quality scores is included with the sample histograms. This statistic is computed only if quality scores are available.
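For contrast with the thresholded checks, here is a minimal sketch of this statistic. The function name `average_quality` and the use of `None` for unavailable scores are assumptions of the example.

```python
from typing import Optional, Sequence

def average_quality(scores: Optional[Sequence[Optional[float]]]) -> Optional[float]:
    """Mean quality score over all scored genotypes, with no thresholding.

    Returns None when quality scores are not available.
    """
    if scores is None:
        return None
    values = [s for s in scores if s is not None]
    return sum(values) / len(values) if values else None

avg = average_quality([0.9, 0.4, 0.8])
```

The low score 0.4 contributes to the average even though a genotype with that score might be blanked out before the other checks.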

 
 

University of Michigan | School of Public Health | Abecasis Lab