Application Research of Computers (《计算机应用研究》)

Cross-domain text classification approach based on multi-bridge mapping

Original Chinese title: 一种基于多桥映射的跨领域文本分类方法

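Authors: Yang Qiqi (杨奇奇), Zhang Yuhong (张玉红), Hu Xuegang (胡学钢)
Affiliation: School of Computer & Information, Hefei University of Technology (合肥工业大学 计算机与信息学院), Hefei 230009, China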
Article ID: 1001-3695(2018)04-0996-05
DOI: 10.3969/j.issn.1001-3695.2018.04.008
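Abstract: Cross-domain classification aims to use labeled source-domain data to train an accurate classifier for an unlabeled target domain whose probability distribution differs from that of the source. Most existing work represents text by topics and builds a mapping between the domain-specific topics of the two domains on the basis of shared topics, thereby achieving cross-domain learning. In practice, domains can be connected from multiple angles, yet a mapping based on a single set of shared topics yields an incomplete and biased semantic representation, which reduces cross-domain classification accuracy. To address this, the paper proposes a cross-domain classification method based on multi-bridge mapping: it extracts multiple sets of shared topics together with domain-specific topics, and uses the multiple shared topics as bridges to build multiple mappings between the domain-specific topics, thereby achieving cross-domain classification. Experiments on the 20Newsgroups and Reuters-21578 datasets show that the proposed algorithm achieves higher classification accuracy than comparable algorithms.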
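Keywords: cross-domain classification; multi-bridge mapping; topic; text classification
Funding: National Key Research and Development Program of China, special project (2016YFC0801406)
National Natural Science Foundation of China, Young Scientists Fund (61503112, 61305063)
National Natural Science Foundation of China (61673152)
Article URL: http://www.arocmag.com/article/01-2018-04-008.html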
English title: Cross-domain text classification approach based on multi-bridge mapping
Authors (English): Yang Qiqi, Zhang Yuhong, Hu Xuegang
Affiliation (English): School of Computer & Information, Hefei University of Technology, Hefei 230009, China
English abstract: Cross-domain text classification aims to exploit labeled data in a source domain to train an accurate classifier for a target domain whose distribution differs from that of the source domain. To achieve cross-domain learning, many existing works use topics as a new feature representation and build a mapping between the domain-specific topics using the shared topics as a bridge. However, the connections between domains can be multi-angle, and mapping methods based on a single set of shared topics have several weaknesses that reduce classification precision; for example, the semantic representation is incomplete and biased. Motivated by this, this paper proposes a new approach based on multi-bridge mapping for cross-domain text classification. It first extracts both multi-layer shared topics and domain-specific topics, and then builds multiple mappings between the domain-specific topics of different domains, using the multi-layer shared topics as bridges. Experimental results on the 20Newsgroups and Reuters-21578 datasets demonstrate the effectiveness of the proposed approach.
English keywords: cross-domain classification; multi-bridge mapping; topics; text classification
Received: 2016/12/12
Revised: 2017/2/15
Pages: 996-1000
CLC number: TP391.1
Document code: A