《计算机应用研究》|Application Research of Computers

医学知识图谱构建技术与研究进展

Construction techniques and research development of medical knowledge graph

Authors 袁凯琦, 邓扬, 陈道源, 张冰, 雷凯
Affiliation 北京大学深圳研究生院 深圳市云计算关键技术与应用重点实验室, 广东 深圳 518055
Article ID 1001-3695(2018)07-1929-08
DOI 10.3969/j.issn.1001-3695.2018.07.002
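Abstract Existing knowledge graph construction techniques commonly suffer from low efficiency, many restrictions, and poor extensibility when applied to the medical domain. In view of the cross-lingual, highly specialized, and structurally complex nature of medical data, this paper gives a comprehensive bottom-up analysis of the key techniques for constructing medical knowledge graphs, covering five parts: medical knowledge representation, extraction, fusion, reasoning, and quality assessment. It also introduces the current state of medical knowledge graph applications in medical services such as information retrieval, knowledge question answering, and intelligent diagnosis. Finally, in light of the major challenges and key problems facing medical knowledge graph construction, it discusses prospects for future development.
Keywords knowledge graph; knowledge acquisition; knowledge fusion; knowledge reasoning; natural language processing
Funding National Natural Science Foundation of China, Young Scientists Fund (61602013)
Basic Research Projects of the Shenzhen Science and Technology Innovation Commission (JCYJ20160330095313861, JCYJ20151030154330711, JCYJ20151014093505032)
Article URL http://www.arocmag.com/article/01-2018-07-002.html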
Authors (English names) Yuan Kaiqi, Deng Yang, Chen Daoyuan, Zhang Bing, Lei Kai
Affiliation (English) Shenzhen Key Laboratory for Cloud Computing Technology & Applications, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
Abstract (as published in English) Current knowledge graph construction techniques share common defects in efficiency, scalability, and applicability. Considering the specific features of medical data, this paper analyzes and classifies the key techniques and methods involved in constructing medical knowledge graphs in a bottom-up way, including the representation, extraction, fusion, and reasoning of medical knowledge and the quality assessment of medical knowledge graphs. It also introduces research on and applications of search engines, question-answering systems, and decision support systems based on medical knowledge graphs. Finally, it summarizes the challenges and major problems facing medical knowledge graphs and discusses prospects for future development.
Keywords (as published in English) knowledge graph; knowledge extraction; knowledge fusion; knowledge reasoning; natural language processing
Received 2017/6/12
Revised 2017/7/28
Pages 1929-1936
CLC number TP182
Document code A