Departments of Anesthesiology and Biostatistics | University of Michigan

Method

This page explains how the preserved Cancer PRSweb release was assembled and how to read the score, download, and PheWAS outputs that now live in the static portal.

Use it when you are comparing candidate PRS for one cancer trait, checking whether a score-detail page reports a useful model in MGI or UK Biobank, or deciding what the linked follow-up files mean.

The preserved release follows the original resource described in Fritsche LG, Patil S, Beesley LJ, VandeHaar P, Salvatore M, Ma Y, Peng RB, Taliun D, Zhou X, Mukherjee B. Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. Am J Hum Genet. 2020 Nov 5;107(5):815-836. The local manuscript PDF and the article below remain the best primary references for the full scientific background.

Method in brief

  • Use Scores to compare multiple cancer PRS construction strategies for the same trait, then open a score-detail page for one model's metrics, downloads, and linked PheWAS results.
  • The preserved catalog evaluates phecode-defined cancer traits in two biobanks, MGI and UK Biobank, using matched case-control analyses.
  • Main score comparisons should start with pseudo-R2, Brier score, and AAUC, then use top-percentile odds ratios as supporting tail-risk summaries.
  • Read all-subject and exclusion PheWAS together to separate primary-trait confirmation from secondary signals that persist after related cancers are removed.

Evaluation cohorts

The original Cancer PRSweb study evaluated cancer PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal Michigan Medicine biorepository, and UK Biobank (UKB), a large population-based cohort.

  • MGI analyses used 38,360 unrelated genotyped participants of inferred recent European ancestry with EHR data
  • UKB analyses were based on documented White British participants from the UK Biobank resource

Source phenotypes and matching

ICD9-CM and ICD10-CM diagnoses were aggregated to PheCodes, and each cancer trait was evaluated in matched case-control data. The original workflow used exact matching for sex and genotyping array and Mahalanobis-based matching on age or birth year and the first four genotype principal components.

  • Up to 10 controls were matched to each case with MatchIt
  • MGI phenome analyses covered 1,689 studies with more than 50 cases
  • UKB phenome analyses covered 1,419 studies with more than 50 cases
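
The matching step described above can be illustrated with a minimal sketch. It assumes a greedy nearest-neighbor rule without replacement, which is one common way to do this kind of matching; it does not reproduce MatchIt's implementation or the exact options used in the original workflow. The function name `match_controls` and its arguments are hypothetical.

```python
import numpy as np


def match_controls(covariates, exact_keys, is_case, ratio=10):
    """Greedy nearest-neighbor matching without replacement (illustrative only).

    covariates: (n, k) array, e.g. age plus the first four genotype PCs.
    exact_keys: length-n sequence of hashable labels, e.g. (sex, array).
    is_case:    length-n boolean array.
    Returns {case_index: [control_index, ...]} with up to `ratio` controls.
    """
    covariates = np.asarray(covariates, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # The pooled inverse covariance defines the Mahalanobis metric.
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(covariates, rowvar=False)))
    available = set(np.flatnonzero(~is_case).tolist())
    matches = {}
    for case in np.flatnonzero(is_case).tolist():
        # Exact matching: only controls sharing the case's key are eligible.
        pool = [c for c in sorted(available) if exact_keys[c] == exact_keys[case]]
        if not pool:
            matches[case] = []
            continue
        diff = covariates[pool] - covariates[case]
        # Squared Mahalanobis distance from the case to every eligible control.
        dist = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        chosen = [pool[i] for i in np.argsort(dist, kind="stable")[:ratio]]
        matches[case] = chosen
        available.difference_update(chosen)
    return matches
```

Because the sketch is greedy, the order in which cases are processed can change which controls they receive; a production workflow would normally rely on the matching package itself.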

GWAS summary statistic sources

For each cancer trait, the original catalog combined up to three classes of GWAS summary statistics and harmonized coordinates to GRCh37 when needed.

  • NHGRI-EBI GWAS Catalog signals after allele harmonization and QC
  • Large cancer GWAS meta-analyses and publication-specific summary statistics
  • UK Biobank-derived resources including UKB PHECODE, PHESANT, ICD10, and FINNGEN-style phenotype sets

PRS construction strategies

Cancer PRSweb compared multiple construction methods rather than presenting one universal PRS per trait. Fixed thresholds and optimized thresholds were both evaluated after LD clumping, and full-summary-statistic sources could also be used with penalized or Bayesian weighting methods.

  • GWAS Hits (p ≤ 5e-08) and other fixed-threshold LD pruning / p-value thresholding runs
  • Optimized Pruning and Thresholding (P&T) using the pseudo-R2-selected threshold
  • lassosum and PRS-CS when full summary statistics were available
  • PRS with fewer than 5 included variants were excluded
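
As a rough illustration of the thresholding strategies listed above, the sketch below computes a weighted allele-count score from variants that pass a p-value cutoff and applies the fewer-than-5-variants exclusion. It assumes LD clumping has already been done and that dosages are coded for the effect allele; the function name `threshold_prs` and the input layout are hypothetical, and lassosum or PRS-CS weighting is not covered.

```python
import numpy as np

MIN_VARIANTS = 5  # scores with fewer included variants were excluded


def threshold_prs(dosages, betas, p_values, p_threshold):
    """Weighted allele-count PRS over variants passing a p-value threshold.

    dosages:  (n_subjects, n_variants) effect-allele dosages for variants
              that already survived LD clumping (clumping is not shown here).
    betas:    per-variant effect sizes (log odds ratios) from the GWAS source.
    p_values: per-variant GWAS p-values.
    Returns None when fewer than MIN_VARIANTS variants pass the threshold.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(p_values, dtype=float) <= p_threshold
    if keep.sum() < MIN_VARIANTS:
        return None
    # Each subject's score is the sum of beta * dosage over retained variants.
    return dosages[:, keep] @ betas[keep]


# A "GWAS Hits" style score uses the fixed genome-wide threshold.
GWAS_HITS_THRESHOLD = 5e-08
```

An optimized P&T run would repeat the same calculation over a grid of thresholds and keep the one with the best pseudo-R2 in the training half.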

Evaluation and score interpretation

PRS evaluation used logistic regression adjusted for age, sex, genotyping array, and the first four genotype principal components. Tuning relied on Nagelkerke's pseudo-R2, and the released score summaries emphasize pseudo-R2, Brier score, AAUC, and top 1%, 2%, 5%, 10%, and 25% odds ratios.

  • Matched strata were split 50/50 into training and testing sets for tuning and final evaluation
  • Firth bias reduction was used when logistic regression separation occurred
  • Pseudo-R2, Brier score, and AAUC remain the main comparison metrics on score pages
  • Top-percentile odds ratios show enrichment in the upper PRS tail and are best read alongside the main fit metrics
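
For readers who want to see how these metrics are defined, the following sketch computes Nagelkerke's pseudo-R2, the Brier score, and a top-percentile odds ratio from outcomes and predicted probabilities. Several simplifications are assumptions: the null model here is intercept-only, the odds ratio is a crude top-versus-remainder comparison with a 0.5 continuity correction rather than a covariate-adjusted estimate, and AAUC is not shown. All function names are hypothetical.

```python
import numpy as np


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def nagelkerke_r2(y, p_model):
    """Nagelkerke's pseudo-R2 relative to an intercept-only null (assumption)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ll_model = log_likelihood(y, np.asarray(p_model, dtype=float))
    ll_null = log_likelihood(y, np.full(n, y.mean()))
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - np.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def brier_score(y, p_model):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((np.asarray(p_model, dtype=float) - np.asarray(y)) ** 2))


def top_percentile_odds_ratio(y, prs, top_fraction):
    """Crude odds ratio of case status in the top PRS fraction vs. the rest."""
    y = np.asarray(y).astype(bool)
    prs = np.asarray(prs, dtype=float)
    in_top = prs >= np.quantile(prs, 1.0 - top_fraction)
    a = np.sum(in_top & y)    # cases in the top fraction
    b = np.sum(in_top & ~y)   # controls in the top fraction
    c = np.sum(~in_top & y)   # cases in the remainder
    d = np.sum(~in_top & ~y)  # controls in the remainder
    # Haldane-Anscombe 0.5 correction keeps the ratio finite with empty cells.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
```

Higher pseudo-R2 and lower Brier score both indicate a better-fitting model, which is why the two are read together before turning to the tail-risk odds ratios.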

The released rows also keep tuning parameters, cohort labels, and quality flags together so users can compare multiple PRS constructions for the same cancer trait without relying on a live query backend.

PheWAS interpretation

Phenome-wide exploration was reserved for a subset of PRS that were strongly associated with their primary cancer trait. Analyses were run in MGI and, when the GWAS source was not derived from UKB, also in UKB.

  • Phenome-wide Bonferroni correction used 1,689 analyzed PheCodes in MGI and 1,419 in UKB
  • Directional markers indicate positive and inverse PRS-phenotype associations in the PheWAS plot
  • All-subject rows show the primary phenome scan for that PRS across the full eligible cohort
  • Exclusion rows repeat the scan after removing subjects with the primary or related cancer traits and refitting the associations
  • Use the two views together to distinguish primary-trait confirmation from secondary signals that persist after exclusion

Upward markers indicate odds ratios above 1.0, downward markers indicate odds ratios below 1.0 (inverse associations), and the dashed threshold line marks the phenome-wide significance cutoff for the currently selected analysis.
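
The phenome-wide cutoff follows directly from the PheCode counts given above. The sketch below works it out under an assumed family-wise error rate of 0.05, which this page does not state explicitly; the function names are hypothetical.

```python
import math

# Numbers of analyzed PheCodes reported above for each biobank.
N_PHECODES = {"MGI": 1689, "UKB": 1419}
ALPHA = 0.05  # assumed family-wise error rate; not stated on this page


def bonferroni_cutoff(n_tests, alpha=ALPHA):
    """Per-test p-value cutoff under Bonferroni correction."""
    return alpha / n_tests


def is_phenome_wide_significant(p_value, cohort):
    """True when a PheWAS p-value clears the cohort's Bonferroni cutoff."""
    return p_value <= bonferroni_cutoff(N_PHECODES[cohort])


for cohort, n_tests in N_PHECODES.items():
    cutoff = bonferroni_cutoff(n_tests)
    # The -log10 value is the height of the threshold line on a PheWAS plot.
    print(f"{cohort}: p <= {cutoff:.2e} (-log10 p = {-math.log10(cutoff):.2f})")
```

Under that assumption the cutoffs are roughly 3.0e-05 in MGI and 3.5e-05 in UKB, so the same association can clear the threshold in one cohort and miss it in the other.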

Release model and provenance

This instance is intentionally read-only. Each cancer release is assembled from normalized PRS model metadata, PheWAS rows, shared phecode and ICD mappings, and download manifests so the published site stays reproducible and easy to rebuild.

When an external catalog record or preserved weight file was available for the original resource, the score detail page links to it directly. Cancer PRSweb therefore acts as a stable discovery layer for evaluation results, preserved downloads, and external weight resources without recreating the full mutable application stack.

Unless otherwise noted, the original analyses were performed in R 3.6.1.

Full manuscript acknowledgments

The authors acknowledge the Michigan Genomics Initiative participants, Precision Health at the University of Michigan, and the University of Michigan Medical School Data Office for Clinical and Translational Research, the University of Michigan Medical School Central Biorepository, and the University of Michigan Advanced Genomics Core for providing data storage, management, processing, and distribution services, and the Center for Statistical Genetics in the Department of Biostatistics at the School of Public Health for genotype data curation, imputation, and management in support of the research reported in this publication.

Part of this research has been conducted using both the UK Biobank Resource under application number 24460 and also using results and data generated by previous researchers who have used the UK Biobank Resource.

This material is based in part upon work supported by the National Institutes of Health/NIH (NCI P30CA046592 [L.G.F., L.J.B., M.S., B.M.], NHGRI R01 HG008773 [B.M.], and T32 CA83654 [R.B.P.]), by the University of Michigan (UM-Precision Health Investigators Award U063790 [L.G.F., S.P., Y.M., B.M.]), and by the National Science Foundation under grant number DMS-1712933. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Additional acknowledgments of GWAS sources are listed in Supplemental Acknowledgments.

Primary sources
