摘要
作为一种典型的网络大数据,社交信息网络如微博、Tweeter等,不仅包含用户间复杂的网络结构,而且包含大量用户所发表的微博/Tweet信息.现有链路预测算法大多只利用单方面的网络拓扑信息或非拓扑信息,仍然缺乏有效融合社交信息网络中拓扑与非拓扑信息的链路预测方法.为此,从社交信息网络中用户的主题角度出发,提出一种融合主题相似信息的链路预测方法.首先基于用户文本内容抽取用户的主题表示,并定义用户间的主题相似度;然后基于用户主题相似度,构建了一种用户主题相似稀疏网络;进一步将用户主题相似网络与用户间关注/被关注网络融合在统一的概率矩阵分解框架下,通过学习获得用户的潜在特征表示和网络链路参数;最终在此概率矩阵分解框架下,基于用户的潜在特征表示和链路参数计算得到用户间的链路可能性.所提出的模型提供了一种融合多种网络信息的通用策略和学习方法.实验在包含网络结构与文本信息的4组微博与推特数据集中显示,所提出的融合概率矩阵分解链路方法相比其他链路预测方法更有效.
As one kind of typical network big data, social-information networks such as Weibo and Twitter include both the complex network structure among users and rich microblog/Tweet information published by users. It is notable that most of the existing methods only make use of the network topological information or the non-topological information for link prediction, but there is still a lack of effective methods by fusing the topological information or non-topological information in social-information networks. A link prediction method is proposed from the perspective of users' topic by fusing users' topic similarity in social-information networks. The method goes in accordance with the following sequence: firstly, a topic similarity between users based on users' topic representation is defined, followed by which a topic similarity-based sparse network is constructed; secondly, the information of the following/followed network and the topic similarity-based network are fused into a unified framework of probabilistic matrix factorization, based on which the latent-feature representation of the network nodes and the linking relation parameters are obtained; finally, the linking probability between network nodes is calculated based on the obtained latent-feature representation and linking relation parameters. The proposed approach provides a general modeling strategy fusing multi-network information and a learning-based solution. Link prediction experiments are conducted on four real network datasets, i.e. Twitter and Weibo. The experimental results demonstrate that the proposed method is more effective than others.
引文
[1]Romero D M,Kleinberg J.The directed closure process in hybrid social-information networks,with an analysis of link formation on twitter[C]Proc of the 4th AAAI Int Conf on Weblogs and Social Media.Menlo Park,CA:AAAI,2010:138-145
[2]Wang Yuanzhuo,Jin Xiaolong,Cheng Xueqi.Network big data:Present and future[J].Chinese Journal of Computers,2013,36(6):1125-1138(in Chinese)(王元卓,靳小龙,程学旗.网络大数据:现状与展望[J].计算机学报,2013,36(6):1125-1138)
[3]Zhao Shu,Liu Xiaoman,Duan Zhen,et al.A survey on social ties mining[J].Chinese Journal of Computers,2017,40(3):535-555(in Chinese)(赵姝,刘晓曼,段震,等.社交关系挖掘综述[J].计算机学报,2017,40(3):535-555)
[4]Zhao Zeya,Jia Yantao,Wang Yuanzhuo,et al.Link inference in large scale evolutionable knowledge network[J].Journal of Computer Research and Development,2016,53(2):492-502(in Chinese)(赵泽亚,贾岩涛,王元卓,等.大规模演化知识网络中的关联推理[J].计算机研究与发展,2016,53(2):492-502)
[5]Wang Li,Cheng Suqi,Shen Huawei,et al.Structure inference and prediction in the co-evolution of social networks[J].Journal of Computer Research and Development,2013,50(12):2492-2503(in Chinese)(王莉,程苏琦,沈华伟,等.在线社会网络共演化的结构推断与预测[J].计算机研究与发展,2013,50(12):2492-2503)
[6]Clauset A,Moore C,Newman M E.Hierarchical structure and the prediction of missing links in networks[J].Nature,2008,453(7191):98-101
[7]Zhou Tao,LüLinyuan,Zhang Yicheng.Predicting missing links via local information[J].The European Physical Journal B,2009,71(4):623-630
[8]Wu Sen,Sun Jimeng,Tang Jie.Patent partner recommendation in enterprise social networks[C]Proc of the 6th ACMInt Conf on Web Search and Data Mining.New York:ACM,2013:43-52
[9]Aiello L M,Barrat A,Schifanella R,et al.Friendship prediction and homophily in social media[J].ACMTransactions on the Web,2012,6(2):No.9
[10]Jansen R,Yu Haiyuan,Greenbaum D,et al.A Bayesian networks approach for predicting protein-protein interactions from genomic data[J].Science,2003,302(5644):449-453
[11]Wang Peng,Xu Baowen,Wu Yurong,et al.Link prediction in social networks:The state-of-the-art[J].Science China Information Sciences,2015,58(1):011101
[12]Liben-Nowell D,Kleinberg J.The link prediction problem for social networks[C]Proc of the 12th ACM Int Conf on Information and Knowledge Management.New York:ACM,2003:556-559
[13]LüLinyuan,Zhou Tao.Link Prediction[M].Beijing:Higher Education Press,2013(in Chinese)(吕琳媛,周涛.链路预测[M].北京:高等教育出版社,2013)
[14]Adamic L A,Adar E.Friends and neighbors on the Web[J].Social Networks,2003,25(3):211-230
[15]Lorrain F,White H C.Structural equivalence of individuals in social networks[J].The Journal of Mathematical Sociology,1971,1(1):67-98
[16]Barabasi A L,Albert R.Emergence of scaling in random networks[J].Science,1999,286(5439):509-512
[17]Katz L.A new status index derived from sociometric analysis[J].Psychometrika,1953,18(1):39-43
[18]LüLinyuan,Jin Cihang,Zhou Tao.Similarity index based on local paths for link prediction of complex networks[J].Physical Review E,2009,80(2):046122
[19]Fouss F,Pirotte A,Renders J M,et al.Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation[J].IEEETransactions on Knowledge&Data Engineering,2007,19(3):355-369
[20]Jeh G,Widom J.SimRank:A measure of structural-context similarity[C]Proc of the 8th Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2002:538-543
[21]Edoardo M A,David M B,Stephen E F,et al.Mixed membership stochastic blockmodels[J].Journal of Machine Learning Research,2008,9(5):1981-2014
[22]Holland P W,Laskey K B,Leinhardt S.Stochastic blockmodels:First steps[J].Social Networks,1983,5(2):109-137
[23]Zhu Jun.Max-margin nonparametric latent feature models for link prediction[C]Proc of the 29th Int Conf on Machine Learning.New York:IEEE Communications Society,2012:1179-1186
[24]Palla K,Knowles D,Ghahramani Z.An infinite latent attribute model for network data[C]Proc of the 29th Int Conf on Machine Learning.New York:IEEE Communications Society,2012:1179-1186
[25]Wang Zhiqiang,Liang Jiye,Li Ru,et al.An approach to cold-start link prediction:Establishing connections between non-topological and topological information[J].IEEETransactions on Knowledge&Data Engineering,2016,28(11):2857-2870
[26]Menon A K,Elkan C.Link prediction via matrix factorization[C]Proc of the 22nd European Conf on Machine Learning and the 15th Principles and Practice of Knowledge Discovery in Databases.Berlin:Springer,2011:437-452
[27]Zhai Shuangfei,Zhang Zhongfei.Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs[C]Proc of the 15th Int Conf on Data Mining.Philadelphia,PA:SIAM,2015:451-459
[28]Jaccard P.Etude de la distribution florale dans une portion des Alpes et du Jura[J].Bulletin De La Societe Vaudoise Des Sciences Naturelles,1901,37(142):547-579
[29]Ravasz E,Somera A L,Mongru D A,et al.Hierarchical organization of modularity in metabolic networks[J].Science,2002,297(5586):1551-1555
[30]Papadimitriou A,Symeonidis P,Manolopoulos Y.Fast and accurate link prediction in social networking systems[J].Journal of Systems&Software,2012,85(9):2119-2132
[31]Brin S,Page L.Reprint of:The anatomy of a large-scale hypertextual Web search engine[J].Computer Networks,2012,56(18):3825-3833
[32]Liu Weiping,LüLinyuan.Link prediction based on local random walk[J].Europhysics Letters,2010,89(5):58007
[33]Rowe M,Stankovic M,Alani H.Who will follow whom/Exploiting semantics for link prediction in attentioninformation networks[C]Proc of the 11th Int Semantic Web Conf.Berlin:Springer,2012:476-491
[34]Wang Zhongqing,Li Shoushan,Zhou Guodong.Personal group recommendation via textual and social information[J].Journal of Software,2017,28(9):2468-2480(in Chinese)(王中卿,李寿山,周国栋.基于文本与社交信息的用户群组识别[J].软件学报,2017,28(9):2468-2480)
[35]Bhattacharyya P,Garg A,Wu S F.Analysis of user keyword similarity in online social networks[J].Social Network Analysis and Mining,2011,1(3):143-158
[36]Sa H R D,Prudencio R B C.Supervised link prediction in weighted networks[C]Proc of the Int Joint Conf on Neural Networks.Piscataway,NJ:IEEE,2011:2281-2288
[37]Lichtenwalter R N,Chawla N V.Vertex collocation profiles:Subgraph counting for link analysis and prediction[C]Proc of the 21st Int Conf on World Wide Web.New York:ACM,2012:1019-1028
[38]Leskovec J,Huttenlocher D,Kleinberg J.Predicting positive and negative links in online social networks[C]Proc of the19th Int Conf on World Wide Web.New York:ACM,2010:641-650
[39]Chiang K Y,Natarajan N,Tewari A,et al.Exploiting longer cycles for link prediction in signed networks[C]Proc of the 20th Int Conf on Information and Knowledge Management.New York:ACM,2011:1157-1162
[40]Golub G,Kahan W.Calculating the singular values and pseudo-inverse of a matrix[J].Journal of the Society for Industrial&Applied Mathematics,1965,2(2):205-224
[41]Lee D D,Seung H S.Algorithms for non-negative matrix factorization[C]Proc of the 7th Int Conf on Neural Information Processing Systems.Cambridge,MA:MITPress,2000:556-562
[42]Salakhutdinov R,Mnih A.Probabilistic matrix factorization[C]Proc of the 14th Int Conf on Neural Information Processing Systems.Cambridge,MA:MIT Press,2007:1257-1264
[43]Aral S,Muchnik L,Sundararajan A.Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks[J].Proceedings of the National Academy of Sciences of the United States of America,2009,106(51):21544-21549
[44]Wu Xindong,Li Yi,Li Lei.Influence analysis of online social networks[J].Chinese Journal of Computers,2014,37(6):735-752(in Chinese)(吴信东,李毅,李磊.在线社交网络影响力分析[J].计算机学报,2014,37(4):735-752)
[45]Blei D M,Ng A Y,Jordan M I.Latent dirichlet allocation[J].Journal of Machine Learning Research,2003,3:993-1022
[46]Kullback S.The Kullback-Leibler distance[J].American Statistician,1987,41(4):340-341
[47]Zhu Shenghuo,Yu Kai,Chi Yun,et al.Combining content and link for classification using matrix factorization[C]Proc of the 30th Annual Int Conf on Research and Development in Information Retrieval.New York:ACM,2007:487-494
[48]Zhang Jing,Liu Biao,Tang Jie,et al.Social influence locality for modeling retweeting behaviors[C]Proc of the22nd Int Joint Conf on Artificial Intelligence.Menlo Park,CA:AAAI,2013:2761-2767
[49]Pedregosa F,Gramfort A,Michel V,et al.Scikit-learn:Machine learning in python[J].Journal of Machine Learning Research,2011,12(10):2825-2830
[50]Fawcett T.An introduction to ROC analysis[J].Pattern Recognition Letters,2006,27(8):861-874