1000G Phase I Integrated Release Version 2 Haplotypes
(2010-11 data freeze, 2012-02-14 haplotypes)
The release contains haplotypes for 1,092 samples (2,184 haplotypes) at a total of 40,309,712 bi-allelic polymorphic markers.
Of the ~40 million markers, 3,660,720 are short indels and large deletions; the rest are SNPs. Indel alleles are coded as
"R", "D", and "I" for REF (reference allele), deletion ALT (deletion
alternative allele), and insertion ALT (insertion alternative allele), respectively. SNPs are still coded as 1234 or ACGT.
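As a rough illustration of the allele coding described above, the following Python sketch translates single-character allele codes into readable labels. The numeric mapping 1=A, 2=C, 3=G, 4=T is an assumption (it is the usual MaCH convention but is not stated in this release note), and the names `describe_allele` and `decode_alleles` are hypothetical helpers, not part of MaCH or minimac.

```python
# Hypothetical helpers illustrating the allele codes described above.
# Assumption: numeric SNP codes follow 1=A, 2=C, 3=G, 4=T.
NUMERIC_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T"}

# Indel codes as listed in this release note.
INDEL_CODES = {
    "R": "REF",
    "D": "deletion ALT",
    "I": "insertion ALT",
}


def describe_allele(code: str) -> str:
    """Return a readable label for one allele code."""
    code = code.strip().upper()
    if code in INDEL_CODES:
        return INDEL_CODES[code]
    if code in NUMERIC_TO_BASE:
        return NUMERIC_TO_BASE[code]
    if code in NUMERIC_TO_BASE.values():
        return code
    raise ValueError(f"unrecognized allele code: {code!r}")


def decode_alleles(codes: str) -> list:
    """Decode a string of allele codes, one character per marker."""
    return [describe_allele(c) for c in codes]
```

For example, `decode_alleles("A3RD")` would label the four markers as an A, a G (under the assumed numeric mapping), a reference indel allele, and a deletion allele.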
The latest versions of MaCH/MaCH-Admix and minimac can handle both
the extended MaCH reference haplotype format described above
and VCF format.
The original data are available from the
1000 Genomes Project FTP site. The sub-population and continental group information for the 1,092 individuals can be found in
phase1_integrated_calls.20101123.ALL.panel. A breakdown by continental group is listed below:
- AFR 246
- AMR 181
- ASN 286
- EUR 379
This set of phased haplotypes includes singletons (for completeness, although you probably won't be able to impute them very well!).
Monomorphic sites are removed.
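The continental group counts listed above could be recomputed from the panel file with a short script. The sketch below is a minimal example under stated assumptions: it assumes the panel file is whitespace-delimited with the sample ID, sub-population, and continental group in the first three columns (the layout is not described in this note), and `count_by_group` is a hypothetical helper name.

```python
from collections import Counter

# Counts quoted in this release note, used only as a consistency check.
EXPECTED = {"AFR": 246, "AMR": 181, "ASN": 286, "EUR": 379}


def count_by_group(lines, group_column=2):
    """Tally samples per continental group.

    Assumption: each line is whitespace-delimited and the continental
    group is in the third column (index 2).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= group_column:
            continue  # skip blank or short lines
        counts[fields[group_column]] += 1
    return counts


# The listed counts add up to the 1,092 samples (2,184 haplotypes).
total_samples = sum(EXPECTED.values())
assert total_samples == 1092
assert 2 * total_samples == 2184

# Usage, assuming the panel file has been downloaded locally:
# with open("phase1_integrated_calls.20101123.ALL.panel") as fh:
#     print(count_by_group(fh))
```

If the file layout differs from the assumption, adjusting `group_column` should be enough; the comparison against `EXPECTED` gives a quick sanity check that the right column was picked.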
If you have any questions, please email Yun Li or Christian Fuchsberger.