《计算机应用研究》|Application Research of Computers

基于图划分抽样算法的图表示学习

Graph representation learning based on graph partition sampling algorithm

免费全文下载 (已被下载 次)  
获取PDF全文
作者 夏鑫,高品,陈康,姜进磊
机构 1.清华大学 计算机科学与技术系,北京 100084;2.腾讯 微信事业群,广东 深圳 518057
统计 摘要被查看 次,已被下载
文章编号 1001-3695(2020)09-004-2586-05
DOI 10.19734/j.issn.1001-3695.2019.03.0130
摘要 在基于神经网络的图表示算法中,当节点属性维度过高、图的规模过大时,从内存到显存的数据传输会成为训练性能的瓶颈。针对这类问题,该方法将图划分算法应用于图表示学习中,降低了内存访问的I/O开销。该方法根据图节点的度数,将图划分成若干个块,使用显存缓存池存储若干个特征矩阵块。每一轮训练,使用缓存池中的特征矩阵块,以此来减少内存到显存的数据拷贝。针对这一思想,该方法使用基于图划分的抽样算法,设计显存的缓存池来降低内存的访问,运用多级负采样算法,降低训练中负样本采样的时间复杂度。在多个数据集上,与现有方法对比发现,该方法的下游机器学习准确率与原算法基本一致,训练效率可以提高2~ 7倍。实验结果表明,基于图划分的图表示学习能高效训练模型,同时保证节点表示向量的测试效果。今后的课题可以使用严谨的理论证明,阐明图划分模型与原模型的理论误差。
关键词 图划分; 图表示学习; 图抽样; 图神经网络
基金项目 国家重点研发计划资助项目(2018YFB1003505)
本文URL http://www.arocmag.com/article/01-2020-09-004.html
英文标题 Graph representation learning based on graph partition sampling algorithm
作者英文名 Xia Xin, Gao Pin, Chen Kang, Jiang Jinlei
机构英文名 1.Dept. of Computer Science & Technology,Tsinghua University,Beijing 100084,China;2.WeChat Group,Tencent Corporation,Shenzhen Guangdong 518057,China
英文摘要 When training graph embedding via neural network, high dimension feature vector and large scale graph cause data transferring from memory to GPU to be a bottleneck. Aimed to solve this problem, this paper proposed graph partition based graph representation learning. This method splitted graph nodes into blocks according to their degree. It stored several node feature matrices in buffer pool on GPU. Every epoch, it trained representation during several blocks which fitted into buffer pool to reduce the data transferred from memory to GPU. Based on block split, this method used blocked based sampling algorithm, cached block feature matrix in GPU buffer pool to reduce memory read and built hierarchical negative sampling table, which could sample nodes in const time complex. Compared to related work on several real world datasets, this method achieves competitive accuracy at downstream machine learning task and 2~7 times speedup on training process. The experiments show that graph representation learning based on partition can train model efficiently and generate accurate embedding vectors. In future work, it is worth to prove the deviation between partition based method and original method in theory.
英文关键词 graph partition; graph representation learning; graph sampling; graph neural network
参考文献 查看稿件参考文献
 
收稿日期 2019/3/14
修回日期 2019/5/29
页码 2586-2590,2599
中图分类号 TP183
文献标志码 A