Analyses of factors associated with breast cancer and construction of a risk stratification model among adult women in suburban Shanghai

Abstract:

Objective: To examine factors associated with breast cancer among adult women in suburban Shanghai and to develop a risk stratification model, providing a basis for identifying high-risk populations.

Methods: A cross-sectional study was conducted using baseline data from the Shanghai Suburban Adult Cohort and Biobank. A total of 39,683 women aged 20-74 years were included, comprising 249 women with breast cancer and 39,434 women without breast cancer. Information on demographic characteristics, reproductive health, lifestyle, chronic diseases, and psychological health was collected. Missing data were handled using multiple imputation. Multivariable Firth logistic regression was used to identify factors associated with breast cancer, and restricted cubic spline analyses were performed to assess the dose-response relationship between hemoglobin A1c and breast cancer. With breast cancer as the outcome, the data were randomly split into a training set and a test set at a ratio of 7:3. A logistic regression model was developed for risk stratification, and its predicted probabilities were calibrated using Platt scaling. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), the Brier score, the calibration intercept, and the calibration slope. Because breast cancer was a low-prevalence outcome, the model's precision, recall, and lift among the women with the top 5% and top 10% of predicted risk were also calculated to evaluate its ability to identify and stratify risk in high-risk screening scenarios.

Results: In the fully adjusted multivariable Firth logistic regression model, increasing age was associated with a higher risk of breast cancer (OR=1.050, 95% CI: 1.033-1.067), whereas older age at menarche (OR=0.921, 95% CI: 0.859-0.987) and a greater number of pregnancies (OR=0.875, 95% CI: 0.770-0.986) were associated with a lower risk of breast cancer. Women with a history of estrogen use had a higher risk of breast cancer than those without such a history (OR=3.098, 95% CI: 1.481-5.728), and women with hyperglycemia had a higher risk than those without (OR=1.754, 95% CI: 1.305-2.334). Compared with women without anxiety or depression, those with mild (OR=2.239, 95% CI: 1.484-3.266) and severe (OR=10.104, 95% CI: 1.106-32.798) anxiety or depression had a higher risk of breast cancer. Restricted cubic spline analyses showed a nonlinear association between hemoglobin A1c and breast cancer (P=0.003). The logistic regression-based risk stratification model showed moderate discriminative ability (ROC-AUC=0.730, 95% CI: 0.673-0.784) and a low overall prediction error (Brier score=0.006, 95% CI: 0.006-0.006). The lift values were 4.261 for the top 5% high-risk group and 3.199 for the top 10% high-risk group, which captured 21.3% and 32.0% of breast cancer cases, respectively.

Conclusion: Breast cancer among adult women in suburban Shanghai was associated with older age, earlier age at menarche, fewer pregnancies, a history of estrogen use, hyperglycemia, and mild and severe anxiety or depression. These findings suggest that community-based women's health management should pay greater attention to reproductive hormone-related factors, metabolic abnormalities, and psychological health. The logistic regression-based model built on routine epidemiological indicators showed moderate discriminative ability and the capacity to enrich high-risk populations, and it may support initial screening and stratified management of women at high risk of breast cancer in resource-limited community settings.
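The top-fraction metrics reported above can be illustrated with a short sketch. When exactly a fraction f of women is flagged as high risk, lift (precision divided by overall prevalence) equals recall divided by f. The reported figures agree with this: 0.213 / 0.05 ≈ 4.26 and 0.320 / 0.10 = 3.20, in line with the lift values of 4.261 and 3.199. The function name `top_fraction_metrics`, the rounding rule for the group size, and the toy data below are assumptions made for illustration only; they are not taken from the study.

```python
def top_fraction_metrics(y_true, y_score, fraction):
    """Precision, recall, and lift among the top `fraction` of predicted risks.

    y_true  : list of 0/1 outcome labels (1 = breast cancer)
    y_score : list of predicted probabilities, same length as y_true
    fraction: share of the population flagged as high risk, e.g. 0.05
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    total_cases = sum(y_true)
    if n == 0 or total_cases == 0:
        raise ValueError("need at least one observation and one case")

    # Rank individuals from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)

    # Size of the high-risk group (assumed rule: round to nearest, at least 1).
    k = max(1, round(n * fraction))
    cases_in_top = sum(y_true[i] for i in order[:k])

    precision = cases_in_top / k            # share of flagged women who are cases
    recall = cases_in_top / total_cases     # share of all cases that were flagged
    lift = precision / (total_cases / n)    # enrichment relative to prevalence
    return precision, recall, lift


# Hypothetical toy data: 20 women, 2 cases, scores increasing with index.
y_score = [i / 20 for i in range(20)]
y_true = [0] * 20
y_true[10] = 1
y_true[19] = 1

precision, recall, lift = top_fraction_metrics(y_true, y_score, 0.10)
print(precision, recall, lift)
```

In this toy example the top 10% contains two women, one of whom is a case, so half of all cases are captured in one tenth of the population. With a prevalence as low as in this study (249 of 39,683), precision in the flagged group stays small in absolute terms even when lift is well above 1, which is why lift and recall are reported alongside it.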

     
