Construction and Application of A Knowledge Base for Ancient Poetry Annotation with Large Language Models

LI Jiabin, WEI Tingxin, QU Weiguang, LI Bin, FENG Minxuan, WANG Dongbo

LIBRARY TRIBUNE ›› 2025, Vol. 45 ›› Issue (3) : 99-109.

Abstract

The high semantic complexity of allusions, imagery, and proper nouns in ancient poetry hinders the public's understanding of its meaning. To address this problem, this paper systematically analyzes and categorizes the complex semantic groups in ancient poems. Drawing on the text-processing and information-extraction capabilities of large language models, it integrates knowledge from various dictionaries to build a knowledge base for ancient poetry annotation, which is then verified and applied in automatic annotation and translation tasks. The experimental results indicate that the annotation knowledge base achieves an average macro-F1 score of 93.90% on the task of annotating five major semantic groups in ancient poems, outperforming existing annotation schemes. AnnoKB_GLM, a domain-specific language model of ancient poetry obtained by further pre-training on the knowledge base, surpasses existing general large language models of modern Chinese and benchmark models of ancient Chinese texts on machine translation tasks, verifying the practical value of the annotation knowledge base.
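The reported score is an average macro-F1 over five semantic groups. As a minimal sketch of how such a score is computed (the paper's actual evaluation code is not shown here, and the category names below are placeholders, not the paper's label set), macro-F1 is the unweighted mean of per-category F1 scores:

```python
def macro_f1(gold, pred, labels):
    """Macro-F1: unweighted mean of per-label F1 scores,
    so rare categories weigh as much as frequent ones."""
    f1s = []
    for label in labels:
        # Per-label true positives, false positives, false negatives
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical labels standing in for the five semantic groups
labels = ["allusion", "imagery", "person", "place", "office"]
gold = ["allusion", "imagery", "person", "place", "office", "allusion"]
pred = ["allusion", "imagery", "person", "place", "office", "imagery"]
print(round(macro_f1(gold, pred, labels), 4))  # → 0.8667
```

Because macro averaging gives each semantic group equal weight, a model cannot reach a high score by excelling only on the most frequent group.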

Key words

annotations on ancient poetry / knowledge base construction / large language model

Cite this article

LI Jiabin, WEI Tingxin, QU Weiguang, LI Bin, FENG Minxuan, WANG Dongbo. Construction and Application of A Knowledge Base for Ancient Poetry Annotation with Large Language Models[J]. LIBRARY TRIBUNE, 2025, 45(3): 99-109
