ZHANG Zheng, JIANG Nan, CAO Yibing, ZHANG Jiangshui, YANG Zhenkai. A Method for Friendship Judgement Based on Improved Gravity Model with Check-in Data[J]. Geomatics and Information Science of Wuhan University, 2022, 47(4): 604-612. DOI: 10.13203/j.whugis20190180

A Method for Friendship Judgement Based on Improved Gravity Model with Check-in Data

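  • Abstract: Predicting friendship from check-in data is one of the main research directions in location-based social networks. Because social relationship network data is often difficult to obtain in advance, this paper proposes a friendship judgment method based on an improved gravity model that relies on location check-in data alone. First, information gain is used to measure the influence of different feature parameters on friendship, and two feature parameters, user residence and spatiotemporal co-occurrence zone, are selected. Then, the gravity model is improved for these two feature parameters, and a Sigmoid function is used to map its value range to 0 to 1, which facilitates both friendship judgment and model parameter calibration. Finally, the model parameters are calibrated by logistic regression, and friendship is predicted on a test dataset. Cross experiments were conducted with the improved gravity model on the Gowalla and Brightkite datasets, and the method was compared with a friendship probability model. The results show that the proposed method can judge friendship relying only on location check-in data, that the model is stable across data from different sources, and that its overall performance is clearly better than that of the comparison method.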
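The first step of the method ranks candidate feature parameters by information gain, IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v), where Y is the friend/non-friend label of a user pair. The abstract does not say how continuous features such as residence distance were discretized, so this minimal sketch assumes each feature has already been binned into discrete values per user pair. The feature names (`shares_zone`, `constant`) and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = defaultdict(list)  # feature value -> labels of the user pairs with that value
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one entry per user pair, 1 = friends, 0 = not friends.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# A binned candidate feature (hypothetical): "does the pair share any co-occurrence zone?"
shares_zone = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
# An uninformative candidate feature for comparison.
constant = ["same"] * 8

print(round(information_gain(shares_zone, labels), 4))
print(information_gain(constant, labels))
```

A feature that perfectly separates friends from non-friends would score H(Y), about 0.954 bits for this label mix, while a constant feature scores 0; under this reading, candidate features would be kept in descending order of the score.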

     

    Abstract:
      Objectives  As social network products and applications grow in popularity, location check-in data is becoming ever easier to obtain. Judging friendship from check-in data is one of the most active research directions in location-based social networks (LBSNs). However, social network data is often difficult to obtain in advance. To judge friendship from location check-in data alone, this paper proposes a friendship judgment method based on an improved gravity model.
      Methods  First, information gain was used to measure how strongly different feature parameters influence friendship judgment, and two feature parameters, residence distance and spatiotemporal co-occurrence zone, were selected. Second, the gravity model was improved around these two feature parameters, and a Sigmoid function was used to map the model's output to the range 0 to 1, which facilitates both friendship judgment and calibration of the model parameters. Finally, the model parameters were calibrated by logistic regression, and friendship was predicted on the Gowalla and Brightkite datasets.
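The abstract does not give the improved model's formula, so the following is only a minimal sketch under explicit assumptions. We assume a classic gravity form G = k · (m + 1)^α / (d + 0.001)^β, where m is the number of spatiotemporal co-occurrence zones a user pair shares and d is the distance in km between their residences (the +1 and +0.001 offsets are our choice, to keep the logarithms finite). Applying the Sigmoid function to ln G gives σ(ln G) = G / (1 + G), a value in (0, 1), and σ(ln k + α ln(m + 1) − β ln(d + 0.001)) is exactly a logistic regression on the two log features, which is one way to see why logistic regression can calibrate k, α and β. All function names and the toy data are hypothetical, and plain batch gradient descent stands in for whatever solver the authors used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_features(shared_zones, residence_km):
    """Hypothetical log features: ln(m + 1) for co-occurrence, ln(d + 0.001) for distance."""
    return math.log(shared_zones + 1.0), math.log(residence_km + 1e-3)

def friend_probability(params, shared_zones, residence_km):
    """sigmoid(ln G) for the assumed form G = k * (m + 1)**alpha / (d + 0.001)**beta."""
    ln_k, alpha, beta = params
    x_m, x_d = log_features(shared_zones, residence_km)
    return sigmoid(ln_k + alpha * x_m - beta * x_d)

def calibrate(samples, labels, lr=0.1, epochs=2000):
    """Fit (ln k, alpha, beta) by batch gradient descent on the log-loss (logistic regression)."""
    ln_k = alpha = beta = 0.0
    feats = [log_features(m, d) for m, d in samples]
    n = len(feats)
    for _ in range(epochs):
        g_k = g_alpha = g_beta = 0.0
        for (x_m, x_d), y in zip(feats, labels):
            err = sigmoid(ln_k + alpha * x_m - beta * x_d) - y  # d(log-loss)/dz
            g_k += err
            g_alpha += err * x_m
            g_beta -= err * x_d  # z depends on beta through -beta * x_d
        ln_k -= lr * g_k / n
        alpha -= lr * g_alpha / n
        beta -= lr * g_beta / n
    return ln_k, alpha, beta

# Toy user pairs (hypothetical): (shared co-occurrence zones, residence distance in km)
samples = [(5, 1.0), (4, 2.0), (6, 0.5), (3, 1.5), (0, 50.0), (1, 80.0), (0, 30.0), (0, 120.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = friends, 0 = not friends

params = calibrate(samples, labels)
p_close = friend_probability(params, 5, 1.0)
p_far = friend_probability(params, 0, 100.0)
print(params, p_close, p_far)
```

On these toy pairs the fitted α and β should come out positive, so a pair with many shared zones and nearby residences scores higher than a distant pair with none. Fitting in log space keeps the parameters interpretable as the gravity model's "mass" and distance-decay exponents.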
      Results  Multiple experiments were conducted. First, part of the check-in data was used to test the validity of the feature parameters. The spatiotemporal co-occurrence zone parameter achieved an AUC (area under the curve) of 0.710 on the Gowalla dataset and 0.760 on the Brightkite dataset; the residence distance parameter achieved 0.634 on Gowalla and 0.647 on Brightkite. Next, the improved gravity model was used in friendship prediction experiments with different degrees of data imbalance. As the imbalance increased, accuracy rose while recall and F-value fell; nevertheless, the model still achieved high accuracy and recall at an imbalance of 1:150. With Gowalla as the training set and Brightkite as the test set, accuracy was 0.8461 and recall 0.7885; with Brightkite as the training set and Gowalla as the test set, accuracy was 0.8591 and recall 0.8537. In the comparison experiment with the friendship probability model, the recall of the comparison model dropped from 0.75 to nearly 0 as the threshold increased. These results show that the selected feature parameters and the proposed model predict friendship effectively.
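The results are reported as AUC for single features and as accuracy, recall and F-value for the full model, but the abstract defines neither the metrics nor the classification threshold. This sketch uses the standard definitions: AUC as the probability that a randomly chosen friend pair scores higher than a randomly chosen non-friend pair (ties count one half), accuracy as (TP + TN) / N, and F-value assumed to be the harmonic mean of precision and recall (the paper may use a different F variant). The scores, labels and the default threshold of 0.5 are made up for illustration.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def threshold_metrics(scores, labels, threshold=0.5):
    """Confusion-matrix metrics when pairs scoring >= threshold are predicted as friends."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted_friend = s >= threshold
        if predicted_friend and y == 1:
            tp += 1
        elif predicted_friend:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_value": f_value}

# Hypothetical model scores for 8 user pairs (1 = friends).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(auc(scores, labels))
print(threshold_metrics(scores, labels))

# Recall can only stay the same or fall as the threshold rises.
recalls = [threshold_metrics(scores, labels, t)["recall"] for t in (0.3, 0.5, 0.85, 0.95)]
print(recalls)

# Padding with many easy negatives (stronger imbalance) inflates accuracy but leaves recall unchanged.
padded = threshold_metrics(scores + [0.05] * 100, labels + [0] * 100)
print(padded)
```

The threshold sweep shows the general mechanism behind a recall curve that falls toward 0 as the threshold rises, and the padded case shows why accuracy alone can look better as the class imbalance grows, so recall and F-value are worth reading alongside it.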
      Conclusions  The proposed method can judge friendship relying only on location check-in data. The model remains stable across datasets from different sources, and its overall performance is significantly better than that of the comparison method.

     
