Method
This page explains how the preserved ExPRSweb release was assembled and how to read the score, download, and PRS-PheWAS outputs that now live in the static portal.
Use it when you are comparing candidate ExPRS for one exposure, checking whether a score-detail page reports a useful model in MGI or UK Biobank, or deciding what the linked follow-up files mean for binary and continuous traits.
The preserved release follows the original resource described in Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG. ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet. 2022 Oct 6;109(10):1742-1760. The local manuscript PDF and the article below remain the best primary references for the full scientific background.
Method in brief
- Use Scores to compare multiple ExPRS methods for the same exposure, then open a score-detail page for one model's tuning choices, cohort, downloads, and PRS-PheWAS context.
- Binary and continuous exposures are evaluated differently, so some shared score-detail fields are intentionally blank for continuous traits.
- Main comparisons depend on cohort-specific train-test evaluation, adjusted regression models, and method-specific tuning rather than one universal score.
- Treat exclusion PRS-PheWAS as an exposure-based sensitivity analysis that helps separate broad follow-up signals from associations driven mainly by the primary exposure definition.
Evaluation cohorts
The original ExPRSweb study evaluated exposure PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal Michigan Medicine biorepository with linked EHR data, and UK Biobank (UKB), a large population-based cohort. Some ExPRS are only shown in one cohort when the preserved legacy source files did not include both settings.
- MGI analyses used 46,782 unrelated genotyped participants of inferred recent European ancestry with integrated EHR data
- UKB analyses used 408,595 documented White British participants from the UK Biobank imputed dataset
- MGI recruitment extended the original anesthesia-based collection effort with MIPACT, MHB2, and MGI-MEND participants
Exposure definitions and cohort phenomes
ExPRSweb was not only a score catalog. It also defined how target exposures were represented in cohort data before PRS evaluation, which is why some rows and downstream PRS-PheWAS artifacts remain cohort-specific.
- MGI genotypes came from customized Illumina Infinium CoreExome-24 arrays, with over 24 million HRC-imputed variants retained after QC
- UKB analyses used the UK Biobank imputed dataset and cohort-specific subsets for LD calculations on UKB-derived GWAS inputs
- The original release covered 21 continuous exposures and 7 binary exposures with freely available, complete GWAS summary statistics
- Binary disorders such as type 2 diabetes, hypertension, insomnia, and sleep apnea used phenotype definitions aligned with the cohort phenomes
- Smoking and alcohol variables with repeated survey responses were recoded to never versus ever use
- Repeated continuous measurements were harmonized by removing within-person outliers and averaging retained observations
- MGI phenome analyses matched up to 10 controls per case and covered 1,685 studies with more than 50 cases
- UKB phenome analyses used birth year as an age proxy, handled relatedness before matching, and covered 1,419 studies with more than 50 cases
ICD9 and ICD10 diagnoses were aggregated to PheCodes in both cohorts. Matching used Mahalanobis distance on age or birth year and the first four genotype principal components, with exact matching on sex and genotyping array.
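The matching rule above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the original pipeline: the function names, the record layout (`id`, `sex`, `array`, `covars`), and the greedy nearest-neighbour strategy without replacement are all assumptions made for the example. In the study, `covars` would hold age or birth year plus the first four genotype principal components.

```python
from statistics import mean

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def mahalanobis_sq(x, y, inv_cov):
    """Squared Mahalanobis distance between two covariate vectors."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

def match_controls(cases, controls, max_controls=10):
    """Greedy matching without replacement (an assumed strategy).

    Controls must match the case exactly on sex and genotyping array;
    among those, the closest by Mahalanobis distance are chosen.
    """
    inv_cov = invert(covariance_matrix([p["covars"] for p in cases + controls]))
    used, matches = set(), {}
    for case in cases:
        pool = [c for c in controls
                if c["id"] not in used
                and c["sex"] == case["sex"] and c["array"] == case["array"]]
        pool.sort(key=lambda c: mahalanobis_sq(case["covars"], c["covars"], inv_cov))
        chosen = pool[:max_controls]
        used.update(c["id"] for c in chosen)
        matches[case["id"]] = [c["id"] for c in chosen]
    return matches
```

Calling `match_controls(cases, controls)` with the default `max_controls=10` mirrors the "up to 10 controls per case" limit reported for MGI; a case with fewer eligible controls simply receives a shorter list.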
For UKB-derived kidney function traits, the original workflow also derived eGFR from harmonized creatinine, race, and sex information using the CKD-EPI equation.
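As a worked illustration of the eGFR step, the sketch below assumes the 2009 CKD-EPI creatinine equation, which is the commonly used version that includes a race term. The text above does not say which CKD-EPI version was applied, so treat that choice, the function name, and the mg/dL input unit as assumptions. The equation also needs age, which is passed in explicitly here.

```python
def egfr_ckd_epi_2009(creatinine_mg_dl, age_years, is_female, is_black):
    """eGFR in mL/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation.

    Assumes serum creatinine has already been harmonized to mg/dL
    (values in umol/L would first be divided by 88.4).
    """
    kappa = 0.7 if is_female else 0.9
    alpha = -0.329 if is_female else -0.411
    ratio = creatinine_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if is_female:
        egfr *= 1.018
    if is_black:
        egfr *= 1.159
    return egfr
```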
Exposure GWAS summary statistic sources
For each exposure, the original catalog combined complete GWAS summary statistics from up to four classes of sources, then performed ancestry and allele-harmonization checks before score construction.
- NHGRI-EBI GWAS Catalog records
- FinnGen Consortium GWAS
- Published GWAS meta-analyses
- UK Biobank-derived GWAS resources including Lee Lab and Neale Lab releases
Only broad European-ancestry summary statistics were retained to match the target cohorts. Coordinates were lifted to GRCh37 when needed, rows with missing effect alleles or effect sizes were excluded, and reported allele frequencies were used as an additional QC screen for likely flipped alleles.
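The row-level checks described above might look like the following sketch. The field names (`variant`, `effect_allele`, `beta`, `eaf`), the reference-frequency lookup, and the 0.15 tolerance are hypothetical choices for illustration; the text does not state the thresholds the original workflow used, nor whether suspect rows were dropped or re-oriented, so this sketch only sets them aside.

```python
def qc_summary_rows(rows, reference_freq, tolerance=0.15):
    """Drop incomplete rows and set aside variants whose allele looks flipped.

    rows: list of dicts with 'variant', 'effect_allele', 'beta', 'eaf'.
    reference_freq: dict mapping variant ID to the expected effect-allele
    frequency in a reference panel (hypothetical input).
    Returns (kept_rows, flagged_variant_ids).
    """
    kept, flagged = [], []
    for row in rows:
        # Exclude rows with a missing effect allele or effect size.
        if not row.get("effect_allele") or row.get("beta") is None:
            continue
        ref = reference_freq.get(row["variant"])
        eaf = row.get("eaf")
        if ref is not None and eaf is not None:
            # A reported frequency that disagrees with the reference but
            # agrees after swapping alleles suggests a flipped effect allele.
            if abs(eaf - ref) > tolerance and abs((1 - eaf) - ref) <= tolerance:
                flagged.append(row["variant"])
                continue
        kept.append(row)
    return kept, flagged
```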
ExPRS construction strategies
ExPRSweb compared multiple construction methods for each exposure rather than presenting one universal score. For each exposure GWAS, the original workflow generated up to five PRS across five implementations of four method families and retained the method-specific tuning parameter in the released metadata.
- C+T (GT) and C+T (DS): optimized clumping-and-thresholding runs that differed in whether LD calculations used best-guess genotypes or dosage data
- lassosum with a 5,000-sample LD reference panel and tuning over multiple s and lambda combinations
- DBSLMM using GWAS summary statistics plus LD information, default software parameters, and no cross-validation requirement
- PRS-CS in the default auto mode with an external European 1000 Genomes LD reference panel
The original method page also specified shared filters used across methods: autosomal overlap between summary statistics, reference, and target data; minor allele frequency filtering; LD clumping windows of 1 Mb with r2 < 0.1; and exclusion of approaches that produced fewer than 5 retained variants or weights.
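To make the clumping filter concrete, here is a minimal greedy clumping sketch. It is not the software used in the original workflow: the function name, the variant record layout, the caller-supplied `r2` lookup, and the reading of "1 Mb window" as a distance of up to 1 Mb on either side of the index variant are all assumptions.

```python
def clump(variants, r2, window_bp=1_000_000, r2_threshold=0.1):
    """Greedy LD clumping.

    variants: list of dicts with 'id', 'chrom', 'pos', 'p'.
    r2: function (id_a, id_b) -> squared correlation between two variants.
    Repeatedly keeps the most significant remaining variant as an index
    variant and removes nearby variants in LD with it (r2 >= threshold),
    so retained variants within a window have pairwise r2 below the threshold.
    """
    remaining = sorted(variants, key=lambda v: v["p"])
    index_variants = []
    while remaining:
        lead = remaining.pop(0)
        index_variants.append(lead)
        remaining = [v for v in remaining
                     if not (v["chrom"] == lead["chrom"]
                             and abs(v["pos"] - lead["pos"]) <= window_bp
                             and r2(lead["id"], v["id"]) >= r2_threshold)]
    return index_variants

def passes_minimum_size(retained, minimum=5):
    """Approaches yielding fewer than 5 retained variants were excluded."""
    return len(retained) >= minimum
```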
Evaluation and score interpretation
Each ExPRS was evaluated with a 50/50 training-test split performed separately within each trait and cohort, keeping the gender ratio unchanged between splits. Training data selected tuning parameters, and testing data supplied the reported performance summaries.
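A 50/50 split that preserves the sex ratio can be sketched by shuffling and halving each stratum separately. The function name, the `sex` field, and the seed handling below are illustrative assumptions rather than the original code.

```python
import random

def split_half_by_sex(participants, seed=0):
    """Split participants 50/50 into training and testing sets within each sex.

    participants: list of dicts with a 'sex' key.
    When a stratum has an odd size, the extra person goes to the test set.
    """
    rng = random.Random(seed)
    by_sex = {}
    for person in participants:
        by_sex.setdefault(person["sex"], []).append(person)
    train, test = [], []
    for group in by_sex.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train.extend(shuffled[:half])
        test.extend(shuffled[half:])
    return train, test
```

In the workflow described here, this kind of split would be repeated for each trait and cohort, with tuning done on `train` and reported summaries computed on `test`.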
- Regression models adjusted for age or birth-year proxy, sex, genotyping array, and the first four genotype principal components
- Binary exposures used logistic models and selected tuned C+T or lassosum variants by Nagelkerke's pseudo-R2
- Continuous exposures used R2-based tuning and evaluation rather than discrimination metrics such as AAUC
- Centered and scaled PRS values made effect sizes more comparable across exposures and construction methods
- Binary rows in the static portal keep pseudo-R2, Brier score, AAUC, and top-percentile effects together for direct comparison
- Continuous rows keep fit and beta-based percentile summaries, so some shared score-detail fields remain blank by design
The release keeps the chosen tuning parameter, cohort label, and construction method together so users can compare alternative ExPRS for the same exposure without depending on a live query backend.
Firth bias reduction was used when logistic regression separation occurred, and percentile summaries should always be interpreted in light of the reported trait type and effect metric.
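For readers unfamiliar with Nagelkerke's pseudo-R2, the sketch below computes it from the log-likelihoods of two fitted logistic models using the standard definition (the Cox-Snell R2 rescaled by its maximum attainable value). Which models are compared is an assumption here: a natural choice is the covariate-only model versus the same model plus the PRS, but the text above does not spell this out. The function and argument names are illustrative.

```python
import math

def nagelkerke_r2(loglik_null, loglik_full, n):
    """Nagelkerke's pseudo-R2 from two model log-likelihoods.

    loglik_null: log-likelihood of the reference (smaller) model.
    loglik_full: log-likelihood of the model being evaluated.
    n: number of observations used to fit both models.
    """
    # Cox-Snell R2: 1 - (L_null / L_full)^(2/n), written on the log scale.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_full) / n)
    # Maximum value Cox-Snell R2 can reach for this reference model.
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell
```

The result is 0 when the evaluated model fits no better than the reference and 1 when it fits perfectly, which keeps values comparable across binary exposures with different prevalence.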
PheWAS interpretation
PRS-PheWAS pages summarize downstream associations between one exposure PRS and EHR-derived phenotypes across the phenome. The linked files preserve both the full scan and the secondary exclusion scan from the original resource.
- All PheWAS: scan across the full phenome for one ExPRS
- Exclusion PheWAS for binary exposures: repeat the scan after excluding individuals with the exposure itself
- Exclusion PheWAS for quantitative exposures: repeat the scan within the study-defined normal-range subset after removing low and high exposure values
- Interpret the exclusion scan as an exposure-based sensitivity analysis rather than a diagnosis-based exclusion
- Use the paired views to distinguish broad follow-up signals from associations driven mainly by the primary exposure definition
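The two exclusion rules in the list above amount to different participant filters applied before the scan is repeated. The sketch below illustrates that logic only; the function name, the `exposure` field, and the inclusive range bounds are assumptions, and the actual normal ranges come from the preserved metadata rather than from this example.

```python
def exclusion_subset(participants, trait_type, normal_range=None):
    """Select the participants kept for an exclusion PRS-PheWAS.

    participants: list of dicts with an 'exposure' key (True/False for
    binary exposures, a number for continuous ones).
    normal_range: (low, high) bounds, required for continuous exposures.
    """
    if trait_type == "binary":
        # Exclude individuals who have the exposure itself.
        return [p for p in participants if not p["exposure"]]
    if trait_type == "continuous":
        if normal_range is None:
            raise ValueError("continuous exposures need a normal range")
        low, high = normal_range
        # Keep only values inside the study-defined normal range.
        return [p for p in participants if low <= p["exposure"] <= high]
    raise ValueError(f"unknown trait type: {trait_type}")
```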
The static plotting layer is simpler than the original Grails portal, but the underlying PRS-PheWAS and exclusion-PheWAS result tables remain downloadable for exact odds ratios, confidence intervals, and counts. For quantitative traits, the exclusion scan context on each score page is drawn from the preserved normal-range metadata described in the manuscript and supporting range table.
The ExPRSweb manuscript also describes trait-PheWAS and exclusion-trait-PheWAS analyses that use measured exposures or normal-range measured exposures as predictors. This static portal preserves the ExPRS-driven PRS-PheWAS artifacts and documents that distinction, but it does not add separate trait-PheWAS pages.
Release model and provenance
This instance is intentionally read-only. Each ExPRS release is assembled from normalized model metadata, PheWAS rows, shared phecode-to-ICD mappings, and download manifests so the published site stays reproducible and easy to rebuild.
The original ExPRSweb portal was implemented in Grails. This static replacement preserves the scientific outputs and manuscript context while removing the live application stack. When an external catalog record or other preserved weight resource was available, the score detail page links to it directly.
The release documentation also preserves the original acknowledgement context for MGI participants, Precision Health, the University of Michigan data and biorepository groups, the Advanced Genomics Core, the Center for Statistical Genetics, and UK Biobank application 24460.
Unless otherwise noted, the original ExPRS analyses were performed in R 4.1.1.
Full manuscript acknowledgments
The authors acknowledge the Michigan Genomics Initiative participants, Precision Health at the University of Michigan, and the University of Michigan Medical School Data Office for Clinical and Translational Research; the University of Michigan Medical School Central Biorepository and the University of Michigan Advanced Genomics Core for providing data storage, management, processing, and distribution services; and the Center for Statistical Genetics in the Department of Biostatistics at the School of Public Health for genotype data curation, imputation, and management in support of the research reported in this publication.
Part of this research has been conducted with both the UK Biobank Resource under application number 24460 and with results and data generated by previous researchers who have used the UK Biobank Resource.
This material is based in part upon work supported by the National Institutes of Health/NIH (NCI P30CA046592 [L.G.F. and B.M.]), by the University of Michigan (UM-Precision Health Investigators Award U063790 [L.G.F., S.P., Y.M., and B.M.]), and by the National Science Foundation under grant number DMS-1712933. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.