Application Research of Computers (《计算机应用研究》)

Text feature weight calculation based on category information and feature entropy (基于类别信息和特征熵的文本特征权重计算)

Feature weighting scheme based on category information and term entropy

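Authors Alimjan Aysa (阿力木江·艾沙), Yin Xiaoyu (殷晓雨), Kurban Ubul (库尔班·吾布力), Li Zhe (李喆)
Affiliation Xinjiang University (新疆大学): a. Network and Information Technology Center; b. School of Information Science and Engineering, Urumqi 830046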
Article ID 1001-3695(2019)11-007-3237-03
DOI 10.19734/j.issn.1001-3695.2018.05.0294
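Abstract (translated from the Chinese) Feature weighting methods based on category information do not express the relationship between features and categories accurately enough: for features with the same category frequency, their ability to discriminate between categories cannot be compared, so the distribution of a feature within each category must also be considered. The inverse category frequency (ICF) and the intra-class entropy of a feature are combined and introduced into the feature weighting scheme, yielding two supervised feature weighting schemes. Experiments on a Uygur text classification corpus show that the method markedly improves the spatial distribution of the samples and raises the micro-averaged <i>F</i><sub>1</sub> value for Uygur text classification.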
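Keywords text classification; text feature; weight calculation; category frequency
Funding Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01C068)
Article URL http://www.arocmag.com/article/01-2019-11-007.html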
English title Feature weighting scheme based on category information and term entropy
English author names Alimjan Aysa, Yin Xiaoyu, Kurban Ubul, Li Zhe
English affiliation a. Network & Information Technology Center, b. School of Information Science & Engineering, Xinjiang University, Urumqi 830046, China
English abstract Feature weighting schemes based on category information do not express the relationship between features and categories accurately enough. That is, features with the same category frequency cannot be compared in terms of their ability to discriminate between categories, so the distribution of a feature within each category should also be considered. This paper combines the inverse category frequency (ICF) and the intra-category entropy of features in the term weight calculation and constructs two supervised feature weighting schemes. Experimental results on a Uygur text categorization dataset show that the method clearly improves the spatial distribution of the samples and raises the micro-averaged <i>F</i><sub>1</sub> value of Uygur text classification.
English keywords text classification; text feature; term weighting; category frequency
Received 2018/5/7
Revised 2018/6/27
Pages 3237-3239,3285
CLC number TP391.1
Document code A
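
The abstract names two ingredients, inverse category frequency (ICF) and the entropy of a feature within a category, but this record does not give the paper's formulas. The following is a minimal Python sketch of the general idea only. The smoothed ICF form `log(1 + C / cf)`, the normalized entropy over documents of one category, and the combination `tf * icf * (1 + entropy)` are all assumptions chosen for illustration, not the two schemes the authors constructed; the function names and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict


def inverse_category_frequency(docs, labels):
    """ICF per term, using the assumed form log(1 + C / cf(t)),
    where cf(t) is the number of categories in which t occurs."""
    cats_with_term = defaultdict(set)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            cats_with_term[term].add(label)
    n_cats = len(set(labels))
    return {t: math.log(1 + n_cats / len(cs)) for t, cs in cats_with_term.items()}


def intra_class_entropy(docs, labels, term, category):
    """Normalized entropy (0..1) of how a term's occurrences are spread
    over the documents of one category (assumed definition)."""
    counts = [tokens.count(term) for tokens, label in zip(docs, labels)
              if label == category]
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    h = sum((c / total) * math.log(total / c) for c in counts if c > 0)
    return h / math.log(len(counts))


def term_weight(tf, icf_value, entropy):
    """Assumed combination: evenly spread terms get a higher weight."""
    return tf * icf_value * (1 + entropy)


# Hypothetical toy corpus: "ball" and "goal" both occur in one category only,
# so ICF alone cannot tell them apart.
docs = [
    ["ball", "ball", "goal"],  # sports
    ["goal", "team"],          # sports
    ["vote", "law"],           # politics
    ["vote", "vote"],          # politics
]
labels = ["sports", "sports", "politics", "politics"]

icf = inverse_category_frequency(docs, labels)
h_ball = intra_class_entropy(docs, labels, "ball", "sports")
h_goal = intra_class_entropy(docs, labels, "goal", "sports")
w_ball = term_weight(1, icf["ball"], h_ball)
w_goal = term_weight(1, icf["goal"], h_goal)
```

In this toy corpus both terms have the same ICF, but "goal" appears in every sports document while "ball" is concentrated in one, so only the entropy factor separates them. That is the gap in category-frequency weighting the abstract describes.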