《计算机应用研究》|Application Research of Computers

面向中文敏感词变形体的识别方法研究

Study on identification method for change form of Chinese sensitive words

Authors 付聪, 余敦辉, 张灵莉
Affiliations 1. 湖北大学 计算机与信息工程学院, 武汉 430062; 2. 湖北省教育信息化工程技术中心, 武汉 430062
Article ID 1001-3695(2019)04-007-0988-04
DOI 10.19734/j.issn.1001-3695.2017.11.0996
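Abstract Recognizing sensitive words in online content, and in particular variant forms of Chinese sensitive words, has become a pressing problem. By analyzing features of Chinese characters such as their structure and pronunciation, this paper proposes a method for recognizing variant forms of Chinese sensitive words. For three kinds of variant (the pinyin of a word, the abbreviation of a word, and the splitting of a word's characters), the method designs, respectively, a sensitive word recognition algorithm based on confusable-pinyin grouping (SPGR), a string abbreviation recognition algorithm (SNR), and a KMP-based Chinese character split recognition algorithm (WS-KMP), which effectively improve the accuracy and efficiency of sensitive word review. Experimental results show that the method achieves relatively high recall and precision when recognizing variant forms of Chinese sensitive words.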
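
The abstract names a KMP-based split recognition algorithm but gives no details of it. As a rough illustration of the general idea only, and not the authors' WS-KMP, the sketch below expands a word into a split form using a toy decomposition table and then searches the text for that form with standard KMP matching. The names `SPLIT_TABLE`, `split_form`, and `find_split_variants`, and the two table entries, are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical decomposition table (illustrative only, not from the paper):
# each character maps to the components it might be split into.
SPLIT_TABLE: Dict[str, str] = {
    "好": "女子",
    "明": "日月",
}


def build_failure(pattern: str) -> List[int]:
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits: List[int] = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


def split_form(word: str) -> str:
    """Replace each character by its components when the table has an entry."""
    return "".join(SPLIT_TABLE.get(ch, ch) for ch in word)


def find_split_variants(text: str, word: str) -> List[int]:
    """Find positions in text where the split form of word occurs."""
    return kmp_find_all(text, split_form(word))
```

For instance, `find_split_variants("今天日月白了", "明白")` searches the text for "日月白". A practical system would need a full decomposition dictionary and would have to handle characters with more than one plausible split, neither of which this sketch attempts.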
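Keywords variant forms; sensitive word recognition; edit distance; KMP algorithm
Funding National Basic Research Program of China ("973" Program) (2014CB340404)
National Natural Science Foundation of China (61373037, 61672387)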
URL http://www.arocmag.com/article/01-2019-04-007.html
Title (English) Study on identification method for change form of Chinese sensitive words
Authors (English) Fu Cong, Yu Dunhui, Zhang Lingli
Affiliations (English) 1. School of Computer Science & Information Engineering, Hubei University, Wuhan 430062, China; 2. Hubei Provincial Center for Education Information Technology Studies, Wuhan 430062, China
Abstract (English) Recognizing sensitive words in network information, especially variant forms of Chinese sensitive words, is an urgent problem. By analyzing the structure and pronunciation of Chinese characters, this paper proposes a method for recognizing variant forms of Chinese sensitive words. For the pinyin, abbreviation, and split forms of a word, the method designs, respectively, a sensitive word recognition algorithm based on the grouping of confusable pinyin, a string abbreviation recognition algorithm, and a KMP-based character split recognition algorithm, improving the accuracy and efficiency of review. The experimental results show that the proposed method achieves relatively high recall and precision when recognizing variant forms of Chinese sensitive words.
Keywords (English) change form; sensitive word recognition; edit distance; KMP algorithm
Received 2017/11/11
Revised 2017/12/26
Pages 988-991
CLC number TP391.1
Document code A