Chinese Journal of Thoracic Surgery (Electronic Edition), 2025, Vol. 12, Issue (03): 152-161. doi: 10.3877/cma.j.issn.2095-8773.2025.03.05

Original Article

Exploring the application of large language models in lung cancer auxiliary diagnosis and treatment

Zhiyun Duan, Fangyi Liu, Dongxian Jiang, Qingle Wang, Wenyi Luan, Ying Wu, Tian Jiang, Han Tang, Lijie Tan

  1. Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
  • Received:2025-07-11 Revised:2025-08-13 Accepted:2025-08-27 Published:2025-08-28
  • Corresponding author: Lijie Tan
  • Funding:
    National Natural Science Foundation of China (82300108)
  • About author:

    *Co-first authors

Cite this article:

Zhiyun Duan, Fangyi Liu, Dongxian Jiang, Qingle Wang, Wenyi Luan, Ying Wu, Tian Jiang, Han Tang, Lijie Tan. Exploring the application of large language models in lung cancer auxiliary diagnosis and treatment[J/OL]. Chinese Journal of Thoracic Surgery (Electronic Edition), 2025, 12(03): 152-161.

Objective

To explore the current status and prospects of mainstream Chinese and international large language models (LLMs) in assisting lung cancer diagnosis and treatment.

Methods

A multidisciplinary lung cancer team at Zhongshan Hospital, Fudan University, drawing on domestic and international guidelines and long-term clinical experience, designed 40 questions covering five modules of lung cancer care: basic concepts, screening, diagnosis, treatment, and pathology. The questions were posed to five mainstream LLMs (DeepSeek-V3, DeepSeek-R1, Doubao, Kimi, and GPT-4o), and the models' outputs were collected. Two experienced thoracic surgeons then rated each response for accuracy and emotional support on a five-point Likert scale, and performance was compared across models.

Results

For accuracy, GPT-4o, DeepSeek-V3, and DeepSeek-R1 performed similarly, with a median [interquartile range (IQR)] score of 5.00 (4.50–5.00), significantly outperforming Kimi [4.25 (3.50–4.50)] and Doubao [4.50 (3.88–4.50)]. Subgroup analysis showed that DeepSeek-R1 excelled in the basic concepts, diagnosis, treatment, and pathology modules. DeepSeek-V3 performed well overall, particularly in the diagnosis module, while GPT-4o was strongest in the screening module. For emotional support, all LLMs scored markedly lower than for accuracy, with medians clustered around 3.00. DeepSeek-R1 provided the highest level of emotional support, with a median (IQR) of 3.50 (3.00–4.50). GPT-4o [2.50 (2.50–3.12)], DeepSeek-V3 [3.25 (2.50–3.50)], and Doubao [3.00 (2.50–3.50)] performed comparably, and Kimi scored lowest [2.50 (2.50–3.00)]. Subgroup analysis further indicated that emotional support ratings were low across all modules, with a high proportion of low scores, highlighting a critical limitation of current LLMs in patient-centered communication.

Conclusions

LLMs show preliminary potential in lung cancer diagnosis and treatment, but shortcomings remain in handling complex clinical scenarios and in patient communication. With continued development, LLMs are expected to find broad application in this field. To the best of our knowledge, this study is the first systematic evaluation of domestic LLMs in the context of lung cancer care in China.
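
The median (IQR) summaries reported in the Results can be computed from per-question ratings in a few lines of Python. The sketch below is illustrative only. It assumes that each response's score is the mean of the two raters' ratings and that quartiles are obtained by linear interpolation; the abstract states neither. The function name `summarize` and the toy ratings are hypothetical and are not study data.

```python
from statistics import median, quantiles


def summarize(rater_a, rater_b):
    """Return (median, Q1, Q3) of per-question scores.

    Assumption: the per-question score is the mean of the two raters'
    ratings on the five-point scale.
    """
    per_question = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
    # method="inclusive" treats the data as the full population and
    # interpolates linearly between neighbouring order statistics.
    q1, _, q3 = quantiles(per_question, n=4, method="inclusive")
    return median(per_question), q1, q3


# Toy ratings for four questions (hypothetical, not from the study).
toy_rater_a = [5, 5, 4, 4]
toy_rater_b = [5, 4, 4, 3]
med, q1, q3 = summarize(toy_rater_a, toy_rater_b)
print(f"{med:.2f} ({q1:.2f} to {q3:.2f})")
```

Averaging two integer ratings and interpolating quartiles naturally yields values on quarter and eighth steps, which is one plausible reason figures such as 4.25 or 3.88 can appear on a five-point scale.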

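Figure 1. Sample size estimation for the paired Wilcoxon signed-rank test with Bonferroni correction (n=10)
Table 1. Models and usage settings
Table 2. Five-point rating scale for accuracy assessment
Table 3. Five-point rating scale for emotional support assessment
Figure 2. Frequency distribution of LLM accuracy scores. LLM: large language model
Table 4. Paired Wilcoxon signed-rank tests of accuracy between LLMs (Bonferroni correction, n=10)
Table 5. Frequency distribution of LLM accuracy scores by module
Figure 3. Frequency distribution of LLM emotional support scores. LLM: large language model
Table 6. Paired Wilcoxon signed-rank tests of emotional support between LLMs (Bonferroni correction, n=10)
Table 7. Frequency distribution of LLM emotional support scores by module
Figure 4. Character counts of responses generated by different LLMs. LLM: large language model
Table 8. Text complexity analysis of responses generated by different LLMs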
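
Tables 4 and 6 refer to paired Wilcoxon signed-rank tests with Bonferroni correction (n=10). With five models there are 10 possible pairwise comparisons, which presumably explains the n=10. The sketch below shows one way such an analysis could be set up. It is not the authors' code: the function names and the toy scores are hypothetical, and the test uses a normal approximation with tie correction and no continuity correction, so p-values may differ slightly from those of a statistics package such as SciPy's `scipy.stats.wilcoxon`.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no non-zero differences: nothing to test
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


def pairwise_bonferroni(scores):
    """Bonferroni-adjusted p-values for every pair of models."""
    pairs = list(combinations(scores, 2))
    m = len(pairs)  # 10 comparisons when there are five models
    adjusted = {}
    for a, b in pairs:
        _, p = wilcoxon_signed_rank(scores[a], scores[b])
        adjusted[(a, b)] = min(1.0, p * m)
    return adjusted


# Toy per-question scores for five hypothetical models (not study data).
toy_scores = {
    "model_a": [5.0, 4.5, 5.0, 4.0, 5.0, 4.5],
    "model_b": [5.0, 5.0, 4.5, 4.5, 5.0, 4.0],
    "model_c": [4.0, 3.5, 4.5, 3.0, 4.0, 3.5],
    "model_d": [4.5, 4.0, 4.5, 3.5, 4.5, 4.0],
    "model_e": [3.0, 3.5, 4.0, 2.5, 3.5, 3.0],
}
for pair, p_adj in pairwise_bonferroni(toy_scores).items():
    print(pair, round(p_adj, 3))
```

Multiplying each raw p-value by the number of comparisons is equivalent to testing each pair at a significance level of 0.05/10 = 0.005. This is conservative, but it keeps the family-wise error rate at or below 0.05.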