Abstract:
Objective To assess the current performance and potential of mainstream large language models (LLMs) as aids to lung cancer diagnosis and treatment.
Methods A multidisciplinary team from Zhongshan Hospital Affiliated to Fudan University designed 40 questions based on domestic and international guidelines and long-term clinical experience. The questions covered five modules of lung cancer care: basic concepts, screening, diagnosis, treatment, and pathology. Each question was posed to five mainstream LLMs: DeepSeek-V3, DeepSeek-R1, Doubao, Kimi, and GPT-4o. Two experienced thoracic surgeons rated each model's output for accuracy and emotional support on a five-point Likert scale.
Results For accuracy, GPT-4o, DeepSeek-V3, and DeepSeek-R1 performed similarly, each with a median [interquartile range (IQR)] score of 5.00 (4.50–5.00), significantly outperforming Kimi [4.25 (3.50–4.50)] and Doubao [4.50 (3.88–4.50)]. In subgroup analysis, DeepSeek-R1 excelled in the basic concepts, diagnosis, treatment, and pathology modules; DeepSeek-V3 performed well overall, particularly in the diagnosis module; and GPT-4o performed best in the screening module. For emotional support, all LLMs scored around 3.00, notably lower than their accuracy scores. DeepSeek-R1 provided the highest level of emotional support, with a median (IQR) of 3.50 (3.00–4.50). GPT-4o [2.50 (2.50–3.12)], DeepSeek-V3 [3.25 (2.50–3.50)], and Doubao [3.00 (2.50–3.50)] performed comparably, while Kimi scored lowest [2.50 (2.50–3.00)]. Subgroup analysis further showed that emotional support ratings were consistently lower than accuracy ratings across all modules, highlighting a critical limitation of current LLMs in patient-centered communication.
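The median (IQR) summaries reported above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: that the two surgeons' Likert scores are averaged per question (which would explain half-point values such as 4.50) and that quartiles are obtained by linear interpolation (which would be consistent with values such as 3.88). The function name `summarize_scores` and the example ratings are hypothetical and are not study data.

```python
from statistics import median, quantiles


def summarize_scores(rater_a, rater_b):
    """Return (median, Q1, Q3) of per-question scores for one model.

    Assumption: each question's score is the mean of the two raters'
    five-point Likert ratings. The abstract does not state this.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same questions")
    per_question = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
    # "inclusive" uses linear interpolation between observed values;
    # the quartile method used in the study is not reported.
    q1, _, q3 = quantiles(per_question, n=4, method="inclusive")
    return median(per_question), q1, q3


# Hypothetical ratings for eight questions (illustrative only).
rater_a = [5, 5, 4, 5, 4, 5, 3, 5]
rater_b = [5, 4, 5, 5, 4, 5, 4, 5]
med, q1, q3 = summarize_scores(rater_a, rater_b)
print(f"{med:.2f} ({q1:.2f}–{q3:.2f})")
```

Other quartile conventions can give slightly different IQR bounds on the same ratings, so the choice of method matters when comparing against published values.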
Conclusions LLMs show early promise in lung cancer diagnosis and treatment, but shortcomings remain in handling complex clinical scenarios and in patient communication. With continued development, LLMs are expected to find broad application in lung cancer care. To the best of our knowledge, this study is the first systematic evaluation of domestic LLMs in the context of lung cancer care in China.
Key words:
Large language models,
Lung cancer,
Clinical decision support
Zhiyun Duan, Fangyi Liu, Dongxian Jiang, Qingle Wang, Wenyi Luan, Ying Wu, Tian Jiang, Han Tang, Lijie Tan. Exploring the application of large language models in lung cancer auxiliary diagnosis and treatment[J]. Chinese Journal of Thoracic Surgery (Electronic Edition), 2025, 12(03): 152-161.