#These files contain the polygenic score weights included in the Graham et al. 2021 Global Lipids Genetics Consortium manuscript (under review)

#The format of each file is as follows:
Variant(build 37)	Effect_Allele	Weight
19:45412079:C:T	T	-0.488976
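
A minimal sketch of reading a file in this format with Python's standard library. It assumes each file is tab-delimited as shown above and may or may not carry the header row; the names read_weights and ScoreVariant are illustrative and not part of the release. The variant ID is split into chromosome, position, and two alleles without assuming which allele is the reference.

```python
import csv
import io
from typing import Dict, NamedTuple


class ScoreVariant(NamedTuple):
    chrom: str
    pos: int            # build 37 position
    allele_a: str       # first allele listed in the variant ID
    allele_b: str       # second allele listed in the variant ID
    effect_allele: str  # allele the weight refers to
    weight: float


def read_weights(handle) -> Dict[str, ScoreVariant]:
    """Parse a tab-delimited weight file into a dict keyed by variant ID."""
    variants = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and the header row if present.
        if not row or row[0].startswith("#") or row[0].startswith("Variant"):
            continue
        variant_id, effect_allele, weight = row[0], row[1], float(row[2])
        chrom, pos, allele_a, allele_b = variant_id.split(":")
        # Sanity check: the effect allele should be one of the two ID alleles.
        if effect_allele not in (allele_a, allele_b):
            raise ValueError(f"effect allele not in variant ID: {variant_id}")
        variants[variant_id] = ScoreVariant(
            chrom, int(pos), allele_a, allele_b, effect_allele, weight
        )
    return variants


# The example row from this README, held in memory instead of a file on disk.
example = "Variant(build 37)\tEffect_Allele\tWeight\n19:45412079:C:T\tT\t-0.488976\n"
weights = read_weights(io.StringIO(example))
print(weights["19:45412079:C:T"])
```

For a real file, pass an open file handle, e.g. open(path, newline=""), in place of the in-memory example.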

#Naming convention:
PT = pruning and thresholding in PLINK (using the optimal thresholds selected after varying the r2, p-value, and distance thresholds)
PRS-CS = from PRS-CS, with the phi auto option (https://github.com/getian107/PRScs)
Full details on the variant and threshold selection are included in the supplementary information and tables

#Ancestry abbreviations
ALL = AFR+EAS+EUR+HIS+SAS
AFR = African + African American (primarily African American)
EAS = East Asian
EUR = European
HIS = Hispanic
SAS = South Asian

#PT parameters (based on optimization in ancestry-matched individuals)
ALL = r2=0.1 distance=500kb p=5e-4
AFR = r2=0.1 distance=250kb p=5e-9
EAS = r2=0.1 distance=500kb p=5e-9
EUR = r2=0.1 distance=500kb p=5e-4
HIS = r2=0.1 distance=500kb p=5e-7
SAS = r2=0.1 distance=500kb p=5e-10
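
The parameters above can be kept as a small lookup table. The sketch below also shows how they might map onto PLINK 1.9 clumping options (--clump, --clump-p1, --clump-r2, --clump-kb). This README does not give the exact PLINK command that was used, so the mapping of "distance" to --clump-kb, the function name clump_args, and all file names are assumptions made for illustration only.

```python
from typing import Dict, List

# PT parameters per ancestry, copied from the table above.
PT_PARAMS: Dict[str, Dict[str, float]] = {
    "ALL": {"r2": 0.1, "kb": 500, "p": 5e-4},
    "AFR": {"r2": 0.1, "kb": 250, "p": 5e-9},
    "EAS": {"r2": 0.1, "kb": 500, "p": 5e-9},
    "EUR": {"r2": 0.1, "kb": 500, "p": 5e-4},
    "HIS": {"r2": 0.1, "kb": 500, "p": 5e-7},
    "SAS": {"r2": 0.1, "kb": 500, "p": 5e-10},
}


def clump_args(ancestry: str, bfile: str, sumstats: str, out: str) -> List[str]:
    """Build a hypothetical PLINK 1.9 clumping command for one ancestry.

    Assumption: the r2, distance, and p-value thresholds correspond to
    --clump-r2, --clump-kb, and --clump-p1; the authors' real command
    is not given in this README.
    """
    params = PT_PARAMS[ancestry]
    return [
        "plink",
        "--bfile", bfile,            # LD reference genotypes (hypothetical prefix)
        "--clump", sumstats,         # meta-analysis summary statistics
        "--clump-p1", format(params["p"], "g"),
        "--clump-r2", str(params["r2"]),
        "--clump-kb", str(int(params["kb"])),
        "--out", out,
    ]


# Placeholder file names; print the command rather than running it.
print(" ".join(clump_args("AFR", "ld_reference_afr", "ldl_afr_sumstats.txt", "ldl_afr_pt")))
```

The list form can be passed to subprocess.run if PLINK is installed; printing it first makes it easy to check the thresholds against the table.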


#PRS methods section from manuscript
LDL-C polygenic scores
----------------------
	"Weights for the LDL-C polygenic scores were derived from beta estimates generated from each of the ancestry-specific meta-analyses and from the trans-ancestry results using METAL. Additional meta-analyses were carried out using the 2010 Global Lipids Genetics Consortium LDL-C meta-analysis results5 in combination with the i) AdmAFR or ii) AdmAFR, EAS, HIS, and SAS results from the present meta-analysis for comparison. Furthermore, we performed a meta-analysis of European cohorts randomly selected to reach a total sample size near 100K, 200K, or 400K to understand the role of increasing European sample size and the influence of imputation panel. In addition, we tested possible methods for improving performance of European-derived scores in African-ancestry individuals by separately fitting the EUR polygenic scores in the UK Biobank AdmAFR subset to determine the best set of risk score parameters (either pruning and thresholding or PRS-CS). 
We generated polygenic score weights using both: i) significant variants only (at a variety of p-value thresholds) and ii) using genome-wide methods. Meta-analysis results were first filtered to variants present in UK Biobank, MGI, and MVP with imputation info score > 0.3. Pruning and thresholding was performed in PLINK [62] with ancestry-matched subsets of UK Biobank individuals (AdmAFR N=7,324, EUR N=40,000, SAS N=7,193, trans-ancestry: N=10,000 (80% EUR, 15% AdmAFR, 5% SAS)) or 1KGP3 (HIS N=347, EAS N=504) used for LD reference. We additionally tested 1000 Genomes phase 3 with all populations included as the LD reference panel for the trans-ancestry score (results not shown), which gave very similar results to those of the UK Biobank trans-ancestry reference set originally selected for its larger sample size. P-value thresholds (after GC correction) of 5x10^-10, 5x10^-9, 5x10^-8, 5x10^-7, 5x10^-6, 5x10^-5, 5x10^-4, 5x10^-3, and 5x10^-2 were tested with distance thresholds of 250 and 500 kb and LD r2 thresholds of 0.1 and 0.2. Polygenic score weights were also generated using PRS-CS [63] with the LD reference panels for AFR, EAS, and EUR populations from 1000 Genomes provided by the developers. PRS-CS LD reference panels for the other ancestries were generated using 1000 Genomes following the same protocol as provided by the PRS-CS authors [63]. This included removing variants with MAF ≤ 0.01, ambiguous A/T or G/C variants, and restricting to variants included in HapMap3. Pairwise LD matrices within pre-defined LD blocks [64] (using EUR LDetect blocks for HIS and trans-ancestry LD calculations and ASN blocks for SAS) were then calculated using PLINK and converted to HDF5 format.
For each individual in the PRS testing cohorts, polygenic scores were calculated as the sum of the dosages multiplied by the given weight at each variant. UK Biobank individuals not present in datasets used to generate the summary statistics (either AdmAFR, white British, both AdmAFR and white British, EAS, SAS, or all individuals excluding SAS) were used to select the best performing AdmAFR, EUR, AdmAFR+EUR, EAS, SAS, and trans-ancestry polygenic scores, respectively. UK Biobank SAS individuals were included in the trans-ancestry risk score weights but excluded from the UK Biobank trans-ancestry testing set due to an initial focus on comparing predictions among European- and African-ancestry individuals. Sample sizes of the ancestry groups in UK Biobank used to test PRS performance included: AdmAFR N=6,863; EAS N=1,441; EUR N=389,158; SAS N=6,814; ALL=461,918. The best performing HIS polygenic score weights were selected based on performance in Hispanic individuals in the Michigan Genomics Initiative dataset. Model fit was assessed by the adjusted R2 of a linear model for LDL-C value at initial assessment adjusted for cholesterol medication (divided by 0.7 to estimate pre-medication levels) with sex, batch, age at initial assessment, and PCs1-4 as covariates. 
The best performing polygenic score in each ancestry group was then tested in the validation cohorts: the Michigan Genomics Initiative (EUR N=17,190; AFRAMR N=1,341), East London Genes and Health [65] (ELGH; SAS N=15,242), Tohoku Medical Megabank Community Cohort Study (ToMMo; EAS N=28,217), Korean Genome and Epidemiology Study [66] (KoGES; EAS N=118,260), Penn Medicine BioBank (PMBB; AFRAMR=2,138), Africa America Diabetes Mellitus (AADM; 3,566 West AFR; 707 East AFR), Africa Wits-INDEPTH partnership for Genomic Studies (AWI-Gen; 1,744 East AFR; 4,972 South AFR; 3,744 West AFR) and Million Veterans Program participants not included in the discovery meta-analysis (MVP; EUR N=68,381; AFRAMR N=18,251; EAS/SAS N=4,155; HIS N=7,669). Adjusted R2 values were reported for each cohort and ancestry group, with 95% confidence intervals for the adjusted R2 values calculated using bootstrapping. Within each cohort, covariates used were: MGI- sex, batch, PC1-4, and birth year; PMBB- birth year, sex, and PC1-4; ELGH- age, sex, and PC1-10; MVP- sex, PC1-4, birth year, and mean age; ToMMo- sex, age, recruitment method, and PC1-20 (only participants from Miyagi Prefecture were included); KoGES- age, sex, and recruitment area; AADM- age, sex, and PC1-3; AWI-Gen East Africa- age, sex, and PC1-6; AWI-Gen South Africa- age, sex, and PC1-6; and AWI-Gen West Africa- age, sex, and PC1-4. The type of LDL-C value used in the model varied depending on the measurements selected by each cohort. Mean LDL-C values were used for MGI, MVP and PMBB, maximum LDL-C values for ELGH, and baseline measurements for AADM, AWI-Gen, ToMMo and KoGES. A descriptive summary of each replication cohort is included in Supplementary Table 14. African admixture for MGI was calculated using all African-ancestry individuals in 1000 Genomes with ADMIXTURE v1.3 [67]. African admixture for MVP was calculated using the YRI and LWK African-ancestry individuals in 1000 Genomes."
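
The scoring and model-fit steps described in the methods text can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for the manuscript: the function names are made up, variants missing from an individual's genotypes are simply skipped, dosages are assumed to count the effect allele, and the second variant in the example is invented. The adjusted R2 helper applies the standard formula to an R2 obtained from any linear-model fit.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of effect-allele dosage times weight over the score variants.

    Assumption: variants absent from `dosages` contribute nothing.
    """
    return sum(dosages[v] * w for v, w in weights.items() if v in dosages)


def premedication_ldl(ldl: float, on_medication: bool) -> float:
    """Estimate pre-medication LDL-C by dividing treated values by 0.7."""
    return ldl / 0.7 if on_medication else ldl


def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Standard adjusted R2 for a linear model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Weights keyed by variant ID; the first entry is the example row from this
# README, the second is a made-up variant for illustration only.
weights = {"19:45412079:C:T": -0.488976, "1:1000:A:G": 0.25}
# Effect-allele dosages (0 to 2) for one hypothetical individual.
dosages = {"19:45412079:C:T": 2.0, "1:1000:A:G": 0.5}

score = polygenic_score(dosages, weights)
print(score)
print(premedication_ldl(70.0, on_medication=True))
print(adjusted_r2(0.5, n_samples=101, n_predictors=5))
```

Bootstrap confidence intervals, as mentioned in the methods text, would repeat the model fit on resampled individuals and take percentiles of the resulting adjusted R2 values; that step is left out here to keep the sketch short.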