University of Michigan Center for Statistical Genetics

1000G 2009-08 Download

The original data (generated by Yun Li, Goncalo Abecasis and colleagues) are the August 2009 release of phased data from the 1000 Genomes Project, downloadable from ftp://share.sph.umich.edu/1000genomes/pilot1/. Two individuals (the father and daughter of the Pilot 2 trio, NA12891 and NA12878) were removed because no sequencing data were used to build their haplotypes, leaving 122 haplotypes. Singletons (SNPs whose minor allele appears only once) were also removed.

Download Data

    UoM_0908_CEU_NoSingleton.hap.tgz          
    UoM_0908_CEU_NoSingleton.snps.tgz        	      
    UoM_0908_CEU_NoSingleton.map.tgz        	      
    UoM_0908_CEU_NoSingleton.info.tgz        	      
    README        	      
    Annotation Files        	      
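
After downloading, the archives need to be unpacked before use. A minimal sketch, assuming the .tgz files are ordinary gzip-compressed tar archives saved in one directory; the function name extract_reference and the default output directory "reference" are hypothetical, and the layout of the files inside each archive is not described on this page:

```shell
# Hypothetical helper: unpack every .tgz archive found in src_dir into dest_dir.
extract_reference() {
    src_dir=$1
    dest_dir=${2:-reference}   # assumed default output directory

    mkdir -p "$dest_dir" || return 1

    for archive in "$src_dir"/*.tgz; do
        # If the glob matched nothing, the literal pattern is left in place.
        [ -e "$archive" ] || { echo "no .tgz archives in $src_dir" >&2; return 1; }
        tar -xzf "$archive" -C "$dest_dir" || return 1
    done
}
```

For example, `extract_reference . reference` would unpack all archives in the current directory into ./reference.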
  

 The files can be fed directly to MaCH. We recommend a 2-step imputation procedure:
(step 1) a representative subset of >= 200 unrelated individuals is used to calibrate the model parameters; and
(step 2) the actual genotype imputation is performed for every person, using the parameters inferred in step 1.
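
One way to build the step 1 subset (subset.ped in the example commands) is sketched here. It assumes a Merlin/LINKAGE-style pedigree file with the family ID in the first column, and it uses "at most one person per family ID" as a rough proxy for unrelatedness, which may not hold if related people carry different family IDs. The function name make_subset is hypothetical, and shuf comes from GNU coreutils:

```shell
# Hypothetical helper: select up to n individuals, at most one per family ID.
# Assumes column 1 of the pedigree file is the family ID.
make_subset() {
    in_ped=$1
    out_ped=$2
    n=${3:-200}

    # Keep the first listed member of each family, then sample n lines at random.
    awk 'NF && !seen[$1]++' "$in_ped" | shuf -n "$n" > "$out_ped"

    # Warn if fewer individuals were available than requested.
    got=$(awk 'END { print NR }' "$out_ped")
    if [ "$got" -lt "$n" ]; then
        echo "warning: only $got individuals selected (wanted $n)" >&2
    fi
}
```

For example, `make_subset sample.ped subset.ped 200` writes 200 randomly chosen lines, one per family, to subset.ped. Whether the result is representative of the full sample still has to be checked by hand.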

Example command lines for a 2-step imputation:

mach1 -d sample.dat -p subset.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip -r 100 -o par_infer > mach.infer.log
mach1 -d sample.dat -p sample.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip --errorMap par_infer.erate --crossoverMap par_infer.rec --mle --mldetails > mach.imp.log
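
Because step 2 reads the par_infer.erate and par_infer.rec files written by step 1, it can help to chain the two commands so that step 2 does not start if step 1 failed. A sketch using only the flags shown above; the function name run_two_step and the chromosome argument are hypothetical, and it assumes the reference files follow the chrN.snps / chrN.hap naming used in the example:

```shell
# Hypothetical wrapper: run both MaCH steps for one chromosome.
run_two_step() {
    chr=$1
    prefix=par_infer

    # Step 1: calibrate model parameters on the subset.
    mach1 -d sample.dat -p subset.ped -s "chr${chr}.snps" -h "chr${chr}.hap" \
        --compact --greedy --autoFlip -r 100 -o "$prefix" > mach.infer.log || return 1

    # Step 2 needs the error and crossover maps produced by step 1.
    for f in "$prefix.erate" "$prefix.rec"; do
        [ -s "$f" ] || { echo "missing $f; check mach.infer.log" >&2; return 1; }
    done

    # Step 2: impute genotypes for every person with the inferred parameters.
    mach1 -d sample.dat -p sample.ped -s "chr${chr}.snps" -h "chr${chr}.hap" \
        --compact --greedy --autoFlip \
        --errorMap "$prefix.erate" --crossoverMap "$prefix.rec" \
        --mle --mldetails > mach.imp.log
}
```

Calling `run_two_step 20` reproduces the two example commands for chromosome 20.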

 Warning:
Report to Yun Li if a large number of genotyped SNPs are discarded because they are absent from this reference. You can check with the following command:
> grep "will be ignored" mach.*.log
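
To get a quick per-log tally rather than the full list of messages, the matching lines can be counted. A sketch, assuming each discarded SNP produces one log line containing "will be ignored" (the exact log format is not documented here); the function name count_ignored is hypothetical:

```shell
# Hypothetical helper: print "<log file><TAB><number of matching lines>" per log.
count_ignored() {
    for log in "$@"; do
        # grep -c prints the number of matching lines (0 if there are none).
        printf '%s\t%s\n' "$log" "$(grep -c "will be ignored" "$log")"
    done
}
```

For example, `count_ignored mach.*.log` prints one line per log file with its count.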

 Notes:
Do not turn on --compact if memory is not an issue.

 
 

University of Michigan | School of Public Health | Abecasis Lab