Original data (generated by the Broad Institute) are available at
BI autosomes and BI X chromosome.
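There are 629 individuals in total, broken down as follows (the 5 PUR and 17 MXL individuals are each listed under two groups, so the group sizes sum to 651):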
174 AFR = 78 YRI + 67 LWK + 24 ASW + 5 PUR
283 EUR = 90 CEU + 92 TSI + 43 GBR + 36 FIN + 17 MXL + 5 PUR
194 ASN = 68 CHB + 25 CHS + 84 JPT + 17 MXL
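The counts above can be checked with a few lines of shell arithmetic. This is only a sanity check of the numbers listed here; it assumes that the PUR and MXL entries repeated across groups refer to the same individuals, which is inferred from the totals rather than stated in the original data description.

```shell
# Group sizes, copied from the breakdown above.
afr=$((78 + 67 + 24 + 5))             # YRI + LWK + ASW + PUR
eur=$((90 + 92 + 43 + 36 + 17 + 5))   # CEU + TSI + GBR + FIN + MXL + PUR
asn=$((68 + 25 + 84 + 17))            # CHB + CHS + JPT + MXL

# Sum over groups counts PUR (5) and MXL (17) twice.
sum=$((afr + eur + asn))

# Assumption: the repeated PUR and MXL entries are the same individuals,
# so subtract one copy of each to get the number of distinct individuals.
unique=$((sum - 5 - 17))

echo "AFR=$afr EUR=$eur ASN=$asn sum=$sum unique=$unique"
```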
For each continental group (AFR, EUR, ASN), SNPs with missing genotypes are removed. For autosomes, we additionally removed SNPs not flagged as QC+ in the 4-way (Broad Institute, Michigan, Boston College and NCBI) merged set. The original supporting data used to construct the 4-way consensus SNP set are available at 4-way merged autosomes. Note that this original set does not include the X chromosome, so the QC+ filter is not applied to X chromosome SNPs.
Singletons (SNPs whose minor allele appears only once) are NOT removed.
The files can be fed directly to MaCH. We recommend a 2-step imputation procedure: pre-phasing using MaCH, followed by imputation using minimac. For details, please see minimac.
Please report to Yun Li if a large number of genotyped SNPs are discarded because they are absent from this reference. You can check with the following command line:
> grep "will be ignored" mach.*.log
* $pop.chrX.hap.gz contains two duplicated chromosomes for each male (the same X chromosome appears twice). Please use $pop.chrX.noDup.hap.gz instead.
* Do not turn on --compact if memory is not an issue.
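
When there are many log files, the "will be ignored" check above may be easier to read as a per-file count. The following is a minimal sketch, not part of MaCH itself: the function name count_ignored is hypothetical, and the only thing it relies on from the logs is the phrase "will be ignored" quoted above.

```shell
# count_ignored FILE...
# Print one line per existing log file: the file name, a space, and the
# number of lines in that file containing "will be ignored".
count_ignored() {
    for log in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$log" ] || continue
        # grep -c prints the count but exits non-zero when it is 0,
        # so "|| true" keeps the loop going in that case.
        n=$(grep -c "will be ignored" "$log" || true)
        printf '%s %s\n' "$log" "$n"
    done
}

# Usage: run in the directory that holds the MaCH logs.
count_ignored mach.*.log
```

What counts as "a large number" is not defined here; the counts are only meant to make it easier to judge whether a report is warranted.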