Screening mammography and advances in treatments have decreased breast cancer mortality in recent years.

Reducing the burden of breast cancer requires more personalized screening and prevention strategies: existing risk prediction models do not correlate with mortality risk, mammography can lead to overdiagnosis, and the effectiveness of chemoprevention remains uncertain.

A study published in The Lancet Digital Health set out to develop a prognostic model that accurately predicts the 10-year risk of breast cancer mortality in females without a prior breast cancer diagnosis, using a large, representative dataset of more than 11.6 million women.

Using QResearch primary care database records from 2000 to 2020 in England, UK, the study sought to identify individuals at high risk of life-threatening breast cancers rather than focusing solely on cancer incidence.

Key findings from the study included:

  1. Dataset: Data from 11,626,969 female individuals were analyzed, totaling 70,095,574 person-years of follow-up. Among these, 1.2% received breast cancer diagnoses, 0.2% experienced breast cancer-related deaths, and 6.0% died from other causes.
  2. Model Performance: Researchers utilized various modeling approaches, with the competing risks model demonstrating the highest performance, achieving a Harrell’s C-index of 0.932. This model exhibited excellent calibration across diverse age and ethnic groups.
  3. Clinical Utility: Decision curve analysis indicated favorable clinical utility across all age groups, suggesting that this model could assist in stratified screening or preventive strategies.
  4. Implications: A model that combines the risks of developing and dying from breast cancer at the population level could guide more effective screening and prevention approaches. Further research should assess the impact and cost-effectiveness of strategies informed by this model.
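
Decision curve analysis, mentioned in the findings above, compares strategies by their net benefit at a chosen risk threshold. The sketch below is a minimal Python illustration for a simple binary outcome, not the study's own code. The function names and toy inputs are hypothetical, and a survival setting with censoring and competing risks (as in the study) would need event probabilities estimated from cumulative incidence rather than raw counts.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    outcomes: list of 1 (event occurred) or 0 (no event); binary simplification.
    predicted_risks: list of model-predicted probabilities in [0, 1].
    threshold: risk threshold strictly between 0 and 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(outcomes) == 0 or len(outcomes) != len(predicted_risks):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(outcomes)
    pairs = list(zip(outcomes, predicted_risks))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)

    # False positives are weighted by the odds of the threshold.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - weight * false_pos / n


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the reference strategy 'intervene on everyone'."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(outcomes) / len(outcomes)
    weight = threshold / (1.0 - threshold)
    return prevalence - weight * (1.0 - prevalence)
```

A model is useful at a given threshold when its net benefit exceeds both the "intervene on everyone" value and zero (the value for "intervene on no one").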

The study was funded by Cancer Research UK.

Methods and Participants

This study used several modeling techniques (Cox proportional hazards, competing risks regression, XGBoost, and neural networks) to predict 10-year breast cancer mortality risk in women with no prior breast cancer history. Researchers evaluated the models with an internal–external validation approach that partitioned the dataset by time period and geographical region. Data came from the QResearch database, which links primary care, hospital records, national cancer registry data, and mortality records. The study received ethical approval.
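
To make the internal–external validation idea above more concrete, here is a minimal Python sketch of one way to partition records by region and time period. It is an illustration under stated assumptions, not the study's procedure: the field names `region` and `entry_year`, the cutoff year, and the specific rule (train on the earlier period from all other regions, test on the later period in the held-out region) are all hypothetical choices.

```python
def internal_external_folds(records, cutoff_year):
    """Yield (held_out_region, train_rows, test_rows) for each region in turn.

    records: list of dicts with hypothetical keys "region" and "entry_year".
    cutoff_year: assumed boundary between the earlier and later time periods.
    """
    regions = sorted({row["region"] for row in records})
    for held_out in regions:
        # Assumed rule: develop on earlier-period data from the other regions.
        train_rows = [
            row for row in records
            if row["region"] != held_out and row["entry_year"] < cutoff_year
        ]
        # Assumed rule: evaluate on later-period data from the held-out region.
        test_rows = [
            row for row in records
            if row["region"] == held_out and row["entry_year"] >= cutoff_year
        ]
        yield held_out, train_rows, test_rows
```

Each fold's performance estimates can then be summarized across regions, which shows how well a model transfers to places and periods it was not fitted on.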

The study included adult females aged 20–90 years who entered the QResearch database between January 1, 2000, and December 31, 2020. Researchers excluded women with previous diagnoses of invasive breast carcinoma or ductal carcinoma in situ.

Outcomes and Candidate Predictors

The primary outcome was breast cancer mortality, defined as breast cancer recorded as a primary or contributory cause of death. Candidate predictor variables associated with breast cancer diagnosis or mortality were identified from clinical and epidemiological literature. These predictors were assessed at cohort entry or the most recent record before entry.

Procedures for Missing Data

Researchers employed multiple imputation to address missing data for variables such as alcohol intake, smoking status, BMI, deprivation score, and ethnicity. These imputed values were utilized throughout model development and evaluation.
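
When multiple imputation produces several completed datasets, estimates from each are commonly combined with Rubin's rules. The summary does not state how the study pooled its results, so the Python sketch below is only an illustration of that standard technique; the function name `pool_rubin` and its inputs are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates using Rubin's rules.

    estimates: one point estimate per imputed dataset (for example a coefficient).
    variances: the squared standard error of each estimate.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

The between-imputation term inflates the variance to reflect uncertainty about the missing values themselves.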

Modelling Strategy

Researchers fitted each model to the entire cohort and evaluated it with internal–external cross-validation, which splits the data by time period and geographical region. They assessed performance using metrics such as Harrell’s C-index, calibration slope, and calibration-in-the-large. Calibration plots visualized the agreement between predicted and observed risks, and decision curve analysis gauged clinical value.
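
Harrell's C-index, named above, measures how often the model ranks two individuals correctly. The Python sketch below is a simple quadratic-time version for right-censored data, offered as an illustration rather than the study's implementation. It skips pairs with tied follow-up times and does not handle competing events, both of which are simplifying assumptions; the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Proportion of comparable pairs in which the higher-risk person fails first.

    times: follow-up time for each individual.
    events: 1 if the outcome was observed, 0 if censored.
    risks: predicted risk scores (higher means earlier event expected).
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on an individual with an observed event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: risk scores perfectly ordered against event times.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.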

To develop regression models, they determined a minimum sample size of 199,500 participants with 400 outcome events, based on specific statistical parameters and a Cox-Snell R² of 0.0045.
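
The summary does not say which formula produced this figure. One widely used criterion for prediction model development (from Riley and colleagues) sets the minimum sample size so that expected shrinkage of predictor effects stays near a target value. The Python sketch below applies that criterion with two inputs that are assumptions, not facts from the summary: 100 candidate predictor parameters and a target shrinkage of 0.9. The function name is hypothetical.

```python
import math


def min_sample_size_for_shrinkage(n_parameters, r2_cox_snell, shrinkage=0.9):
    """Minimum n so that expected uniform shrinkage is at least `shrinkage`.

    Formula: n = p / ((S - 1) * ln(1 - R2_CS / S)), rounded up.
    n_parameters (p) and shrinkage (S) are assumed inputs in this sketch.
    """
    if not 0.0 < shrinkage < 1.0:
        raise ValueError("shrinkage must be strictly between 0 and 1")
    if not 0.0 < r2_cox_snell < shrinkage:
        raise ValueError("r2_cox_snell must be between 0 and shrinkage")
    n = n_parameters / ((shrinkage - 1.0) * math.log(1.0 - r2_cox_snell / shrinkage))
    return math.ceil(n)


# Assumed inputs: 100 parameters, shrinkage 0.9; R-squared taken from the text.
print(min_sample_size_for_shrinkage(100, 0.0045))  # 199500
```

With these assumed inputs the formula returns 199,500, which matches the reported minimum, though the study's actual inputs are not given here.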

The analyses were carried out using Stata version 17 and R version 3.7.


After excluding female individuals with a recorded history of previous or current breast cancer (n=152,870) or ductal carcinoma in situ diagnoses (n=5,409), the final study cohort comprised 11,626,969 females.

This study is the largest to develop clinical prediction models in breast cancer and the first to develop models estimating the risks of breast cancer mortality in the general female population.

This study developed prediction models for breast cancer mortality in females without breast cancer, with the competing risks regression model demonstrating the highest clinical utility. Accurate risk prediction tools can help target interventions and improve outcomes in breast cancer prevention and screening programs. Further research and validation are necessary before implementing these models in clinical practice.