《计算机应用研究》|Application Research of Computers

CRF与规则相结合的医学病历实体识别

Combining CRF and rule based medical named entity recognition

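Authors LI Wei (栗伟), ZHAO Da-zhe (赵大哲), LI Bo (李博), PENG Xin-ming (彭新茗), LIU Ji-ren (刘积仁)
Affiliations 1. Northeastern University (东北大学): a. Key Laboratory of Medical Image Computing, Ministry of Education; b. College of Information Science and Engineering, Shenyang 110004; 2. Neusoft Group Co., Ltd. (东软集团股份有限公司), Shenyang 110179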
Article ID 1001-3695(2015)04-1082-05
DOI 10.3969/j.issn.1001-3695.2015.04.029
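Abstract To address the difficulty of named entity recognition when structuring electronic medical records, this paper proposes a medical record entity recognition algorithm that combines CRF with rules. The algorithm uses CRF for the initial recognition of medical record entities and then refines the recognition results with rules, which include rules generated from a decision tree and clinical knowledge rules. Experiments show that the algorithm reaches accuracy and recall of up to 91.03% and 87.26%, respectively, when recognizing medical record entities, which meets the needs of clinical system applications. The experiments also show that the algorithm has good robustness and stability.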
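The two-stage idea in the abstract (statistical tagging first, rule-based correction second) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the CRF stage is not implemented and its output is passed in as a list of BIO tags, the `SYMPTOM` label, the lexicon, and the function names `apply_lexicon`, `repair_bio`, and `refine` are hypothetical, and the two hand-written rules merely stand in for the decision-tree-generated and clinical knowledge rules, which this page does not spell out.

```python
from typing import List, Set


def apply_lexicon(tokens: List[str], tags: List[str],
                  lexicon: Set[str], etype: str) -> List[str]:
    """Illustrative knowledge rule: tag any untagged token found in a lexicon."""
    return [
        f"B-{etype}" if tag == "O" and tok in lexicon else tag
        for tok, tag in zip(tokens, tags)
    ]


def repair_bio(tags: List[str]) -> List[str]:
    """Illustrative consistency rule: I-X must follow B-X or I-X, else becomes B-X."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                tag = f"B-{etype}"  # promote a dangling continuation tag
        fixed.append(tag)
        prev = tag
    return fixed


def refine(tokens: List[str], crf_tags: List[str],
           lexicon: Set[str], etype: str = "SYMPTOM") -> List[str]:
    """Second stage: post-process tags that a CRF (not shown) is assumed to have produced."""
    return repair_bio(apply_lexicon(tokens, crf_tags, lexicon, etype))


if __name__ == "__main__":
    tokens = ["patient", "fever", "with", "cough"]
    crf_tags = ["O", "I-SYMPTOM", "O", "O"]  # hypothetical first-stage output
    print(refine(tokens, crf_tags, {"cough"}))
```

With the hypothetical input above, the lexicon rule tags "cough" and the consistency rule promotes the dangling `I-SYMPTOM`, giving `['O', 'B-SYMPTOM', 'O', 'B-SYMPTOM']`. Keeping the rules as separate pure functions makes it easy to add or reorder them without touching the statistical stage.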
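Keywords electronic medical record; medical record entity; named entity recognition; conditional random field; decision tree
Funding National Natural Science Foundation of China (61172002, 61302012)
Fundamental Research Funds for the Central Universities (N120518001, N110718001)
Natural Science Foundation of Liaoning Province (2013020021)
Article URL http://www.arocmag.com/article/01-2015-04-029.html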
English title Combining CRF and rule based medical named entity recognition
Author names (English) LI Wei, ZHAO Da-zhe, LI Bo, PENG Xin-ming, LIU Ji-ren
Affiliations (English) 1. a. Key Laboratory of Medical Image Computing of Ministry of Education, b. College of Information Science & Engineering, Northeastern University, Shenyang 110004, China; 2. Neusoft Group LTD., Shenyang 110179, China
English abstract In the preprocessing step of electronic medical record analysis, medical named entity recognition is a key issue. This paper proposes a medical named entity recognition algorithm that combines CRF with rules. The algorithm performs an initial entity recognition with CRF and then applies a rule-based recognition method to improve accuracy; the rules are derived from a decision tree and from domain knowledge. The results show that the algorithm achieves accuracy and recall of up to 91.03% and 87.26%, respectively, on medical record entity recognition, which meets the requirements of clinical application. The algorithm also shows good robustness and stability on datasets of different sizes and types.
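For readers who want to see how figures such as the 91.03% and 87.26% above are typically obtained, the following is a minimal sketch of entity-level scoring. It rests on assumptions not stated on this page: that the reported "accuracy" is entity-level precision, that entities are encoded with BIO tags, and that a predicted entity counts only when its span and type both match exactly. The names `bio_to_spans` and `precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = ["O", "B-SYMPTOM", "O", "B-SYMPTOM", "I-SYMPTOM"]
    pred = ["O", "B-SYMPTOM", "O", "B-SYMPTOM", "O"]
    print(precision_recall(gold, pred))
```

In this made-up example one of the two predicted entities matches a gold entity exactly, so precision and recall are both 0.5; the truncated second entity counts as an error under exact matching.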
English keywords electronic medical record (EMR); medical named entity; named entity recognition; conditional random field (CRF); decision tree
Received 2014/3/11
Revised 2014/4/24
Pages 1082-1086
CLC number TP391.4
Document code A