Objectives With the continuing popularity of social network products and applications, there are more and more ways to obtain location check-in data. Judging friendship from check-in data is one of the most active research directions in location-based social networks (LBSNs). However, social network data is often difficult to obtain in advance. To judge friend relationships from location check-in data alone, this paper proposed a friend relationship judgment method based on an improved gravity model.
Methods Firstly, information gain was used to quantify the influence of different feature parameters on friendship judgment, and two feature parameters, residence distance and the spatial-temporal co-occurrence zone, were selected. Secondly, the gravity model was improved on the basis of the selected feature parameters, and the Sigmoid function was used to map the model's output into the interval (0, 1), which facilitates friendship judgment and the calibration of model parameters. Finally, the model parameters were calibrated by logistic regression, and friend relationship prediction was carried out on the Gowalla and Brightkite datasets.
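The abstract does not give the exact functional form of the improved gravity model, so the sketch below assumes one plausible form, F = G * (1 + C)^alpha / (1 + D)^beta, where C is the number of spatial-temporal co-occurrence zones two users share and D is their residence distance in km. Applying the Sigmoid to log F gives sigmoid(log G + alpha * log(1 + C) - beta * log(1 + D)) = F / (1 + F), which always lies in (0, 1), so (log G, alpha, beta) can be fitted as ordinary logistic-regression coefficients. The equal-width binning used for information gain, all function names, and the gradient-descent fitting are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels, bins=4):
    """IG = H(Y) - H(Y | X); X is discretised into equal-width bins (assumption)."""
    lo, hi = min(feature_values), max(feature_values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in feature_values]
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def features(cooccur, dist_km):
    """Log inputs of the hypothetical gravity form G * (1+C)^alpha / (1+D)^beta."""
    return math.log1p(cooccur), math.log1p(dist_km)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def friend_probability(params, cooccur, dist_km):
    """sigmoid(log F) = F / (1 + F): the gravity score mapped into (0, 1)."""
    log_g, alpha, beta = params
    x_c, x_d = features(cooccur, dist_km)
    return sigmoid(log_g + alpha * x_c - beta * x_d)


def calibrate(samples, labels, lr=0.1, epochs=2000):
    """Fit (log G, alpha, beta) by batch gradient descent on the logistic loss."""
    log_g = alpha = beta = 0.0
    n = len(samples)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for (c, d), y in zip(samples, labels):
            x_c, x_d = features(c, d)
            err = friend_probability((log_g, alpha, beta), c, d) - y  # dLoss/dz
            g0 += err
            g1 += err * x_c
            g2 += -err * x_d  # z depends on -beta * x_d
        log_g -= lr * g0 / n
        alpha -= lr * g1 / n
        beta -= lr * g2 / n
    return log_g, alpha, beta
```

In use, `calibrate` is run on labeled user pairs given as (co-occurrence count, residence distance) tuples, and the returned triple is passed to `friend_probability`; thresholding its output (for example at 0.5) yields the friend or non-friend decision. Working in log space is what turns the multiplicative gravity form into a linear predictor, which is why logistic regression can calibrate it directly.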
Results Multiple experiments were conducted. A subset of the check-in data was selected to test the validity of the feature parameters. The spatial-temporal co-occurrence zone parameter achieved an AUC (area under the curve) of 0.710 on the Gowalla dataset and 0.760 on the Brightkite dataset; the residence distance parameter achieved an AUC of 0.634 on Gowalla and 0.647 on Brightkite. The improved gravity model was then used for friend relationship prediction under different degrees of class imbalance. As the imbalance increased, the accuracy increased while the recall and F-value decreased; however, the model still achieved high accuracy and recall even at an imbalance ratio of 1:150. With Gowalla as the training set and Brightkite as the test set, the accuracy was 0.8461 and the recall was 0.7885; with Brightkite as the training set and Gowalla as the test set, the accuracy was 0.8591 and the recall was 0.8537. In a comparison experiment with the friend relationship probability model, the recall of the comparison model dropped from 0.75 to almost 0 as the threshold increased. These results show that the selected feature parameters and the proposed model predict friend relationships well.
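The AUC values above measure how well a single feature (or the model score) ranks friend pairs above non-friend pairs, whereas accuracy, recall and F-value depend on a decision threshold. The following sketch computes both kinds of metric from scores and binary labels. The function names are hypothetical, AUC uses the pairwise Mann-Whitney formulation (ties count one half), and because the abstract does not state whether its "accuracy" means precision or overall accuracy, both are returned.

```python
def auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold=0.5):
    """Confusion-matrix metrics after predicting 'friend' when score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / len(labels),
    }
```

The pairwise loop is quadratic in the number of pairs, which is fine for a small sample; at imbalance ratios such as 1:150 on full datasets, a sort-based rank computation of the same statistic scales better.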
Conclusions The proposed method can judge friend relationships using location check-in data alone. The model remains stable across datasets from different sources, and its overall performance is significantly higher than that of the comparison method.