Performance Assessment of Large Language Model-Generated English Translations of Traditional Chinese Medicine Classics Based on BERTScore Semantic Matching
-
Abstract
Traditional Chinese Medicine (TCM) classics have preserved a holistic medical paradigm for over two millennia, offering valuable insights for global chronic disease management and personalized healthcare. Their philosophical and medical significance is undeniable. This study selected 100 sample sentences from five representative TCM classics as test corpora: Huangdi Neijing: Suwen (Yellow Emperor's Inner Canon: Basic Questions), Yinhai Jingwei (Essential Subtleties on the Silver Sea), Bencao Gangmu (Compendium of Materia Medica), Shanghan Lun (Treatise on Cold Damage), and Jingui Yaolue (Essential Prescriptions from the Golden Cabinet). Six Chinese-developed large language models (LLMs), namely DeepSeek, Doubao, Qwen3, ERNIE 4.5, Kimi, and Hunyuan T1, were evaluated for translation performance using BERTScore semantic matching scores. Quantitative analysis revealed that ERNIE 4.5 ranked first, scoring significantly higher than the lowest-ranked model, Qwen3, while differences among the other models were not statistically significant. Overall, the LLM-generated translations met the comprehensibility threshold and minimum post-editing requirements. Qualitative analysis of the highest- and lowest-scoring sentences identified key weaknesses in TCM term interpretation, spatiotemporal concept mapping, contextual coherence, and cultural metaphor transfer. Based on these findings, this paper proposes four improvement pathways: constructing terminological knowledge graph networks, developing spatiotemporal concept mapping modules, enhancing contextual intelligence frameworks, and strengthening cultural metaphor model training. These strategies aim to advance the English translation and global dissemination of TCM classics, promoting the worldwide sharing and inheritance of TCM culture.
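For readers unfamiliar with the metric, the greedy matching at the core of BERTScore can be sketched as follows. This is a minimal illustration, not the evaluation pipeline used in this study: the two-dimensional token vectors are invented toy values standing in for contextual embeddings from a pretrained encoder, the function names are hypothetical, and optional steps such as IDF weighting and baseline rescaling are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bertscore(ref_vecs, cand_vecs):
    """Return (precision, recall, F1) from greedy token matching.

    ref_vecs / cand_vecs: one embedding per token of the reference
    translation and of the candidate (model) translation.
    """
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # F1 is the harmonic mean; guard against a zero denominator.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings (assumed values, for illustration only).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

precision, recall, f1 = greedy_bertscore(reference, candidate)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case every reference token has an exact counterpart in the candidate, so recall is 1, while the extra candidate token lowers precision. In practice the embeddings would come from a pretrained model, and a maintained implementation of the metric would normally be used instead of hand-written code.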