1000G 2010-03 Download

The original data are the March 2010 release of phased data from the 1000 Genomes Project, generated by merging three preliminary call sets: (1) by Jared Maguire and colleagues at the Broad Institute; (2) by Yun Li and Goncalo Abecasis at the University of Michigan; and (3) by Quang Le and Richard Durbin at the Sanger Institute. They can be downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/release/2010_03/pilot1/. The CEU dataset contains 120 haplotypes. Singletons (SNPs whose minor allele appears only once) are NOT removed.

Download Data

    2010-03.CEU.hap.tgz
    2010-03.CEU.snps.tgz
    2010-03.CEU.map.tgz
    README
    Annotation Files

The files can be fed directly to MaCH. We recommend a 2-step imputation procedure:
(step 1) a representative subset of >= 200 unrelated individuals is used to calibrate model parameters; and
(step 2) actual genotype imputation is performed for every individual using the parameters inferred in step 1.
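
One possible way to build the step-1 subset is sketched below. It is not part of MaCH: make_subset is a hypothetical helper, and it assumes that the pedigree file has one individual per line with the family ID in the first column, that keeping at most one individual per family ID is an acceptable stand-in for "unrelated", and that GNU shuf is installed.

```shell
# Hypothetical helper (not part of MaCH): print a random subset of individuals,
# at most one per family ID, from a pedigree file.
# Usage: make_subset sample.ped 200 > subset.ped
make_subset() {
    ped=$1   # input pedigree file, one individual per line (assumption)
    n=$2     # maximum number of individuals to keep
    # Keep the first individual seen for each family ID (column 1),
    # then draw up to n of those lines at random.
    awk '!seen[$1]++' "$ped" | shuf -n "$n"
}
```

If fewer than the requested number of families are present, the output simply contains one line per family, so it is worth checking the line count of the result before running step 1.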

Example command lines for a 2-step imputation:
mach1 -d sample.dat -p subset.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip -r 100 -o par_infer > mach.infer.log
mach1 -d sample.dat -p sample.ped -s chr20.snps -h chr20.hap --compact --greedy --autoFlip --errorMap par_infer.erate --crossoverMap par_infer.rec --mle --mldetails > mach.imp.log

Warning:
Please report to Yun Li if a large number of genotyped SNPs are discarded because they are absent from this reference. You can check with the following command:
> grep "will be ignored" mach.*.log
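
To get a count per log file rather than the matching lines themselves, a small wrapper along the following lines may help. count_ignored is a hypothetical helper, and it assumes that each discarded SNP produces exactly one log line containing the phrase "will be ignored".

```shell
# Hypothetical helper: for each log file given, print the file name and the
# number of lines containing "will be ignored", separated by a tab.
# Usage: count_ignored mach.*.log
count_ignored() {
    for log in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when the count is 0, so "|| true" keeps the loop going.
        n=$(grep -c "will be ignored" "$log" || true)
        printf '%s\t%s\n' "$log" "$n"
    done
}
```

What counts as "a large number" depends on how many genotyped SNPs were supplied, so the counts are best read against the total number of markers in sample.dat.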

Notes:
Use --compact only if memory is a constraint; otherwise leave it off.

University of Michigan | School of Public Health | Abecasis Lab