University of Michigan Center for Statistical Genetics

1000G 2009-08 Download

The original data (generated by Richard Durbin and colleagues) are the August 2009 release of phased data from the 1000 Genomes Project, downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/REL-0908/LowCov/. Seven individuals were removed because no sequencing data were used to build their haplotypes: NA12891 and NA12878 (the father and daughter of the Pilot 2 trio), NA10847, NA10851, NA12004, NA12414 and NA12717. This leaves 112 haplotypes. Singletons (SNPs whose minor allele appears only once) were also removed.

Download Data

    0908_CEU_NoSingleton.hap.tgz
    0908_CEU_NoSingleton.snp.tgz
    0908_CEU_NoSingleton.map.tgz
    ID
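
The three archives are gzipped tarballs, so they need to be unpacked before the files are passed to the imputation commands. The following is a minimal sketch, assuming the archives were saved under the names listed above; the function name extract_reference is hypothetical, and the names of the files inside each archive are not stated on this page (tar -tzf lists them without extracting).

```shell
# Hypothetical helper: unpack the three reference archives found in a directory.
# $1 = directory holding the downloaded archives (defaults to the current directory)
extract_reference() {
  dir="${1:-.}"
  for kind in hap snp map; do
    archive="$dir/0908_CEU_NoSingleton.$kind.tgz"
    if [ -f "$archive" ]; then
      # Extract next to the archive itself
      tar -xzf "$archive" -C "$dir"
    else
      echo "missing: $archive" >&2
    fi
  done
}

# Usage (run from the download directory):
#   extract_reference .
```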
  

The files can be fed directly to MaCH. We recommend a two-step imputation procedure:
(step 1) a representative subset of >= 200 unrelated individuals is used to calibrate model parameters; and
(step 2) actual genotype imputation is performed for every individual using the parameters inferred in step 1.

Example command lines for a 2-step imputation:

mach1 -d sample.dat -p subset.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip -r 100 -o par_infer > mach.infer.log
mach1 -d sample.dat -p sample.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip --errorMap par_infer.erate --crossoverMap par_infer.rec --mle --mldetails > mach.imp.log
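
The first command reads subset.ped, the step 1 calibration subset. One possible way to build it from sample.ped is sketched below. This is an illustration, not the authors' procedure. It assumes a pedigree file with one individual per line and the family ID in the first column, that individuals from different families are unrelated, and that GNU shuf is available. The function name make_subset is hypothetical.

```shell
# Hypothetical helper: draw a random subset with at most one individual per family.
# $1 = full pedigree file, $2 = output subset file, $3 = number of individuals wanted
make_subset() {
  # awk keeps the first line seen for each family ID (column 1);
  # shuf then picks $3 of those lines at random.
  awk '!seen[$1]++' "$1" | shuf -n "$3" > "$2"
}

# Usage:
#   make_subset sample.ped subset.ped 200
```

If the file holds fewer families than requested, shuf returns all of them, so it is worth checking the line count of subset.ped against the >= 200 recommendation.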

 Warning:
Report to Yun Li if a large number of genotyped SNPs are discarded because they are absent from this reference. You can check with the following command:
> grep "will be ignored" mach.*.log
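
To judge whether the number is large, it can help to count the matching lines per log file rather than read them. The sketch below assumes each discarded SNP produces one log line containing "will be ignored"; the function name count_ignored is hypothetical.

```shell
# Hypothetical helper: print "<log file><TAB><number of matching lines>" for each log.
count_ignored() {
  for log in "$@"; do
    # Skip names that do not exist (for example an unmatched glob)
    [ -f "$log" ] || continue
    printf '%s\t%s\n' "$log" "$(grep -c "will be ignored" "$log")"
  done
}

# Usage:
#   count_ignored mach.*.log
```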

 Notes:
Do not turn on --compact if memory is not an issue.

 
 

University of Michigan | School of Public Health | Abecasis Lab