Abstract:
Objectives Mobile phone data lack semantic information such as trip purpose, which limits their application. This paper explores a method for identifying the trip purposes of mobile phone users to overcome this limitation.
Methods This paper proposes a framework for identifying residents' trip purposes that combines a rule-based approach with machine learning, using large-scale, long-term mobile phone data and multi-source spatiotemporal data. First, users' home and work locations are inferred with the rule-based approach to identify commuting trips. Next, area of interest (AOI) data are used to identify the purposes of some non-commuting trips. Finally, the K-Prototypes-XGBoost model is used to identify the purposes of the remaining non-commuting trips. The model constructs feature vectors from multi-source spatiotemporal data, including the built environment, user attributes, activity attributes, and users' historical activity, to capture individuals' travel habits and patterns.
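The three-stage cascade described in Methods (commute rule, area of interest lookup, then a learned classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the night and working-hour windows, the "most frequent cell" rule, and the names `infer_home_work`, `label_trip`, `aoi_purpose`, and `fallback`. The `fallback` callable stands in for the K-Prototypes-XGBoost model, which is not reproduced.

```python
from collections import Counter
from datetime import datetime

# Assumed time windows; the abstract does not state the actual rules.
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 7))   # 21:00-06:59
WORK_HOURS = set(range(9, 17))                        # 09:00-16:59


def most_common_cell(records, keep):
    """Return the cell with the most records passing `keep`, or None."""
    counts = Counter(cell for ts, cell in records if keep(ts))
    return counts.most_common(1)[0][0] if counts else None


def infer_home_work(records):
    """records: list of (datetime, cell_id) observations for one user."""
    home = most_common_cell(records, lambda ts: ts.hour in NIGHT_HOURS)
    work = most_common_cell(
        records,
        lambda ts: ts.weekday() < 5 and ts.hour in WORK_HOURS,  # Mon-Fri only
    )
    return home, work


def label_trip(origin, dest, home, work, aoi_purpose, fallback):
    """Hypothetical cascade: commute rule, AOI lookup, then a classifier."""
    known = home is not None and work is not None and home != work
    if known and {origin, dest} == {home, work}:
        return "commute"
    if dest in aoi_purpose:          # destination falls in a labeled AOI
        return aoi_purpose[dest]
    return fallback(origin, dest)    # stand-in for the learned model


# Toy usage with made-up cell IDs.
records = [
    (datetime(2023, 5, 1, 23), "A"),
    (datetime(2023, 5, 2, 2), "A"),
    (datetime(2023, 5, 1, 10), "B"),
    (datetime(2023, 5, 2, 14), "B"),
    (datetime(2023, 5, 6, 11), "C"),   # Saturday, ignored by the work rule
]
home, work = infer_home_work(records)
```

Ordering the stages this way means the classifier only sees trips that neither rule can resolve, which matches the order given in the abstract.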
Results Comparative experiments on activity data with known trip purposes from May 1 to 31, 2023 show that, without the spatiotemporal features of users' historical activity or clustering information, extreme gradient boosting (XGBoost) outperforms adaptive boosting, random forest, gradient boosting decision tree, and artificial neural network, achieving an accuracy of 73.2%, a precision of 74.3%, a recall of 73.2%, and an F1-score of 74.3%. After the spatiotemporal features of users' historical activity are introduced, the prediction performance of all five models improves significantly, with the average accuracy, precision, recall, and F1-score increasing by 24.2%, 21.2%, 24.1%, and 23.0%, respectively. On this basis, the proposed K-Prototypes-XGBoost model further improves prediction performance by incorporating user clustering information. Compared with the conventional XGBoost model, its accuracy, precision, recall, and F1-score increase by 2.7%, 1.5%, 2.7%, and 2.8%, respectively, and all four evaluation metrics reach 91.5%, indicating the best overall performance in identifying non-commuting trip purposes from large-scale, long-term mobile phone data. The confusion matrix further shows that the proposed model performs best on medical activities, with an accuracy of 94.7%, whereas dining activities have the lowest recognition accuracy, at 88.4%. Most misclassifications occur among leisure and entertainment, dining, and shopping activities, because these activities have highly overlapping spatiotemporal characteristics and are often located within the same commercial complexes or in adjacent areas.
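The four evaluation metrics can all be derived from a multi-class confusion matrix. The sketch below assumes support-weighted averaging across classes, which the abstract does not state; under that assumption, weighted recall equals overall accuracy by construction. The function name `weighted_metrics` and the toy matrix are hypothetical and do not reproduce the paper's data.

```python
def weighted_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision = recall = f1 = 0.0
    for i in range(n):
        support = sum(cm[i])                        # true samples of class i
        predicted = sum(cm[r][i] for r in range(n)) # samples predicted as i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total                    # support-weighted average
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy two-class matrix (made-up counts).
toy = [[8, 2],
       [1, 9]]
scores = weighted_metrics(toy)
```

Per-class figures such as those read from the confusion matrix correspond to the per-class `r` values inside the loop, if "accuracy" for a single activity type is taken to mean the share of that activity's samples classified correctly (an interpretation, not confirmed by the abstract).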
Conclusions By integrating rule-based inference with the K-Prototypes-XGBoost model, this paper provides an effective solution for recovering the semantic information of trip purposes from mobile phone data. The proposed framework enables a more complete reconstruction of trip purpose chains and promotes the transformation of mobile phone data applications from mobility pattern description to semantic understanding of travel behavior, thereby supporting finer-grained urban transport planning and human activity analysis.