Topic Detection in Microblogs Based on Overlapping Community Discovery in Complex Networks
CLC number: TP391

Fund project: National Natural Science Foundation of China


Topic Detection Based on Overlapping Community in Complex Network
    Abstract (Chinese):

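    Topic detection in social media has long been an active research problem, and it is also a difficult one, because social media data are noisy, heterogeneous, time-sensitive, and semantically ambiguous. This paper models social text data as a complex network and detects social topics with an overlapping community discovery method based on agglomerative hierarchical clustering of maximal cliques. In the modeling step, topic words are quantified by a self-defined burst coefficient, that is, topic words are treated as keywords whose occurrences are concentrated in particular time periods, and topic words are linked by a self-defined correlation coefficient, which yields the topic network. To make these coefficients better suited to a dynamic data environment, their values were tuned through adaptivity tests on the experimental data. The approach, which uses the EAGLE overlapping community discovery method, was evaluated on public datasets, where its Q-function values were clearly better than those of several current overlapping community discovery strategies. Finally, the proposed strategy was used to analyze topics in a sample of 600,000 social media posts by teenagers, and the results were visualized.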
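The abstract does not give formulas for the self-defined burst and correlation coefficients. As a rough illustration only, the Python sketch below builds a topic-word network under assumed definitions: the burst coefficient is taken to be a smoothed ratio of a word's document frequency in the current time window to its average over earlier windows, and the correlation coefficient is taken to be the Jaccard overlap of the sets of posts that contain two words. These definitions, the thresholds `burst_min` and `corr_min`, and all function names are hypothetical and are not the authors' method.

```python
from collections import Counter
from itertools import combinations


def document_frequency(posts):
    """Count, for each word, how many posts contain it."""
    return Counter(word for post in posts for word in set(post))


def burst_scores(history, current):
    """Hypothetical burst coefficient: smoothed ratio of a word's document
    frequency in the current window to its mean over earlier windows."""
    cur = document_frequency(current)
    past = document_frequency(post for window in history for post in window)
    n_windows = max(len(history), 1)
    return {w: (c + 1) / (past[w] / n_windows + 1) for w, c in cur.items()}


def build_topic_network(history, current, burst_min=2.0, corr_min=0.3):
    """Keep bursty words as nodes and link two words when the Jaccard overlap
    of the posts containing them (a hypothetical correlation coefficient)
    is at least corr_min. Returns (nodes, {(u, v): weight})."""
    scores = burst_scores(history, current)
    topic_words = {w for w, s in scores.items() if s >= burst_min}
    # Map each topic word to the indices of the current posts that contain it.
    posts_with = {w: set() for w in topic_words}
    for idx, post in enumerate(current):
        for w in set(post) & topic_words:
            posts_with[w].add(idx)
    edges = {}
    for u, v in combinations(sorted(topic_words), 2):
        shared = len(posts_with[u] & posts_with[v])
        if shared:
            corr = shared / len(posts_with[u] | posts_with[v])
            if corr >= corr_min:
                edges[(u, v)] = corr
    return topic_words, edges
```

Posts are assumed to be already tokenized (Chinese text would first need word segmentation). In this sketch, the coefficient tuning mentioned in the abstract would amount to choosing `burst_min` and `corr_min` from data.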

    Abstract:

    Topic detection in social media is an active yet challenging problem in social computing, given that most social media data are heterogeneous, time-evolving, and linguistically ambiguous. In this paper, we explore achieving this goal through complex network modeling, which has shown good interpretability of real-world systems. Specifically, a complex network was constructed from pre-processed topic words, and two parameters, namely the emergency (burst) coefficient and the correlation coefficient, were introduced to filter social data through the network and to determine their possible correlations. With the help of the well-established algorithm EAGLE, the approach was then applied to identify overlapping communities in 600,000 messages posted by teenage users on Weibo.com. Compared with other popular approaches such as CONGO and Peacock, the proposed method obtained much better Q values.
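EAGLE is named in the abstract but not described there. The Python sketch below follows the procedure as generally described by Shen et al. (2009): enumerate maximal cliques, drop those smaller than a threshold `k`, treat uncovered vertices as singleton communities, repeatedly merge the pair of communities with the highest similarity M, and keep the cut with the highest extended modularity EQ. Assumptions: the graph is undirected and unweighted with at least one edge (so any correlation weights from the topic network are ignored), `k = 3` is an arbitrary default, and reading the abstract's "Q function" as EQ is an assumption rather than something the abstract states. Function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting.
    adj maps each vertex to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex adjacent to the most remaining candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def extended_modularity(adj, communities):
    """EQ: each ordered pair (v, w) inside a community contributes
    (A_vw - k_v * k_w / 2m) / (O_v * O_w), where O_v is the number of
    communities containing v; the total is divided by 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    overlap = {v: 0 for v in adj}
    for c in communities:
        for v in c:
            overlap[v] += 1
    eq = 0.0
    for c in communities:
        for v in c:
            for w in c:
                a_vw = 1.0 if w in adj[v] else 0.0
                expected = len(adj[v]) * len(adj[w]) / two_m
                eq += (a_vw - expected) / (overlap[v] * overlap[w])
    return eq / two_m


def similarity(adj, two_m, c1, c2):
    """EAGLE merge score M: sum of (A_vw - k_v * k_w / 2m) over
    v in c1, w in c2, v != w, divided by 2m."""
    total = 0.0
    for v in c1:
        for w in c2:
            if v != w:
                a_vw = 1.0 if w in adj[v] else 0.0
                total += a_vw - len(adj[v]) * len(adj[w]) / two_m
    return total / two_m


def eagle(adj, k=3):
    """Agglomerative clustering of maximal cliques (k is a hypothetical
    default); returns the cut with the highest EQ and that EQ value."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    covered = set().union(*cliques)
    # Vertices that lie in no large clique start as singleton communities.
    communities = cliques + [frozenset([v]) for v in adj if v not in covered]
    best, best_eq = list(communities), extended_modularity(adj, communities)
    while len(communities) > 1:
        i, j = max(
            combinations(range(len(communities)), 2),
            key=lambda ij: similarity(adj, two_m, communities[ij[0]], communities[ij[1]]),
        )
        merged = communities[i] | communities[j]
        communities = [c for t, c in enumerate(communities) if t not in (i, j)]
        communities.append(merged)
        eq = extended_modularity(adj, communities)
        if eq > best_eq:
            best, best_eq = list(communities), eq
    return best, best_eq
```

On two 4-cliques {1, 2, 3, 4} and {4, 5, 6, 7} that share vertex 4, this sketch keeps both cliques as overlapping communities (EQ = 0.25) instead of merging them (EQ = 0). Each merge rescans every pair of communities, so the sketch only suits small graphs.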

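Cite this article:

Yin Lan, Cheng Fei, Ren Yafeng, Ji Donghong (尹兰, 程飞, 任亚峰, 姬东鸿). Topic Detection Based on Overlapping Community in Complex Network [J]. Journal of Sichuan University (Natural Science Edition), 2016, 53: 1233.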

History
  • Received: 2015-11-12
  • Last revised: 2016-02-26
  • Accepted: 2016-03-29
  • Published online: 2016-11-29