Gender Inference of Microblog Users Based on Interest Preferences
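  • English title: User Interest Preferences for Gender Inference on Microblog
  • Authors: SONG Wei (宋巍); LIU Li-zhen (刘丽珍); WANG Han-shi (王函石)
  • Affiliation: College of Information Engineering, Capital Normal University (首都师范大学信息工程学院)
  • Keywords: user latent attribute; user gender inference; user preference modeling; social media
  • Chinese journal title (code): DZXU
  • English journal title: Acta Electronica Sinica
  • Publication date: 2016-10-15
  • Publisher: Acta Electronica Sinica (电子学报)
  • Year: 2016
  • Issue: v.44; No.404
  • Funding: National Natural Science Foundation of China (No.61402304, No.61303105); Beijing Natural Science Foundation (No.4154065); Humanities and Social Sciences Planning Project of the Ministry of Education (No.14YJAZH046); Scientific Research Support Project of the Beijing Municipal Education Commission (No.KM201610028015)
  • Language: Chinese
  • Page: DZXU201610034
  • Number of pages: 8
  • CN: 10
  • ISSN: 11-2087/TN
  • Classification number: 237-244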
Abstract
User demographic attributes such as gender and age are core factors in research and applications in computational psychology, personalized search, and social commerce marketing. Automatically inferring latent user attributes from user-generated data has become an emerging research topic. This paper studies gender inference for microblog users based on their interest preferences, examining how content preferences and following-behavior preferences each contribute to the inference. Experiments on a Sina Weibo dataset of nearly 10,000 users demonstrate the effectiveness of user preference features. Compared with traditional language-usage features, combining content preference and following preference features significantly improves inference accuracy. Following preference features are especially effective for inferring the gender of inactive users.
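The idea of combining content preference and following preference features can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-topic content representation, the account names in `ACCOUNT_INDEX`, the toy users, the arbitrary class labels 0 and 1, and the choice of a small logistic regression trained by batch gradient descent as the classifier.

```python
import math

NUM_TOPICS = 2  # hypothetical number of content topics
# Hypothetical vocabulary of followed accounts, mapped to feature positions.
ACCOUNT_INDEX = {"acct_a": 0, "acct_b": 1, "acct_c": 2, "acct_d": 3}


def content_features(topic_counts, num_topics):
    """Turn per-topic post counts into proportions (all zeros if no posts)."""
    total = sum(topic_counts.values())
    return [topic_counts.get(t, 0) / total if total else 0.0
            for t in range(num_topics)]


def following_features(followed, account_index):
    """Binary indicators: 1.0 where the user follows the indexed account."""
    vec = [0.0] * len(account_index)
    for account in followed:
        if account in account_index:
            vec[account_index[account]] = 1.0
    return vec


def combine(user, num_topics, account_index):
    """Concatenate content and following preference features."""
    return (content_features(user["topics"], num_topics)
            + following_features(user["follows"], account_index))


def score(w, b, x):
    """Linear score of feature vector x under weights w and bias b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, epochs=200, lr=0.5):
    """Logistic regression fitted with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score(w, b, x))) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 if the linear score is non-negative, else class 0."""
    return 1 if score(w, b, x) >= 0 else 0


# Toy users (hypothetical data; labels are arbitrary class ids).
users = [
    {"topics": {0: 8, 1: 2}, "follows": {"acct_a", "acct_b"}, "label": 1},
    {"topics": {0: 7, 1: 3}, "follows": {"acct_a"}, "label": 1},
    {"topics": {0: 2, 1: 8}, "follows": {"acct_c", "acct_d"}, "label": 0},
    {"topics": {0: 3, 1: 7}, "follows": {"acct_c"}, "label": 0},
]
X = [combine(u, NUM_TOPICS, ACCOUNT_INDEX) for u in users]
y = [u["label"] for u in users]
w, b = train_logistic(X, y)

# An inactive user has no posts, so only the following features carry signal.
inactive = {"topics": {}, "follows": {"acct_a", "acct_b"}}
print(predict(w, b, combine(inactive, NUM_TOPICS, ACCOUNT_INDEX)))
```

In this toy setup the inactive user's content features are all zero, so the prediction rests entirely on the following indicators, which mirrors the abstract's observation about inactive users without reproducing the paper's actual features or classifier.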
