Constructing Spatio-Temporal Topic Model for Microblog Topic Retrieving
Abstract: Objective: Existing geographic topic models do not account for differences in how strongly different regions influence microblog topics. They also discretize time, which makes it difficult to obtain topic intensities over continuous time. This paper proposes a spatio-temporal topic model that takes both continuous time and regional influence into account. The method divides a city into multiple regions and assigns each region a weight, based on the types and numbers of its points of interest (POIs), to express how strongly the region's social function influences microblog topics. A sparse additive generative model represents the microblog topic distributions, Beta distributions describe topic intensity over continuous time, and Gibbs sampling estimates the model parameters. Experiments show that the proposed model tracks the evolution of microblog topics over continuous time and extracts topics more accurately than existing geographic topic models.
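To illustrate how a Beta distribution can describe topic intensity over continuous time, the sketch below fits a Beta density to the timestamps of posts assigned to one topic. It assumes timestamps have been normalized to the open interval (0, 1) and uses a method-of-moments fit, a common choice in continuous-time topic models. The abstract does not state how the Beta parameters are updated, so the fitting rule and the function names `fit_beta_moments` and `beta_pdf` are illustrative assumptions, not the paper's procedure.

```python
import math


def fit_beta_moments(times):
    """Method-of-moments estimate of Beta(a, b) parameters.

    `times` are timestamps of posts assigned to one topic, normalized
    to the open interval (0, 1). This fitting rule is an assumption.
    """
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    # A Beta distribution requires 0 < var < mean * (1 - mean).
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("sample variance is incompatible with a Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def beta_pdf(t, a, b):
    """Beta density at t in (0, 1), read here as topic intensity at time t."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


# Hypothetical normalized timestamps for one topic.
topic_times = [0.2, 0.3, 0.4, 0.5, 0.6]
a, b = fit_beta_moments(topic_times)
intensity_mid = beta_pdf(0.4, a, b)   # near the bulk of the posts
intensity_late = beta_pdf(0.9, a, b)  # far from the bulk of the posts
```

Because the fitted density is defined for every point in (0, 1), the intensity can be evaluated at any moment rather than only at the boundaries of discrete time slices, which is the property the abstract contrasts with earlier models.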