An Integrated Rule-Based and Machine Learning Method for Trip Purpose Identification

柯日宏; 肖亮; 赵大郅; 李胜楠; 王璞

doi:10.13203/j.whugis20240329

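Summary: Mobile phone positioning data lack semantic information, so methods for identifying the trip purposes of mobile phone users can help overcome the limits on how such data are applied. Existing research on trip purpose identification commonly suffers from insufficient data volume, reliance on a single data type, and unsatisfactory identification of non-commuting trip purposes. Based on large-scale, long-term mobile phone positioning data and multi-source spatiotemporal data, this paper proposes a framework for identifying residents' trip purposes that integrates a rule-based method with machine learning. First, users' home and work locations are extracted with the rule-based method, and commuting trip purposes are identified from them; next, area of interest (AOI) data are used to identify some of the non-commuting trip purposes; finally, a K-Prototypes-XGBoost model is built to identify the remaining non-commuting trip purposes. The model constructs feature variables from multi-source spatiotemporal data covering the built environment, user attributes, activity attributes, and users' historical activities, which allows it to capture individual travel habits and behavior patterns effectively. The results show that the proposed method identifies the trip purposes of mobile phone users fairly accurately; compared with five baseline methods, the K-Prototypes-XGBoost model predicts non-commuting trip purposes best. Using the K-Prototypes model to extract groups with similar travel behavior before prediction further improves model performance and interpretability.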

Abstract:

Objectives Mobile phone data lack semantic information, which limits their application. This paper explores a method for identifying the trip purposes of mobile phone users in order to overcome that limitation.

Methods This paper proposes a framework for identifying the trip purposes of residents that combines a rule-based approach with machine learning, using large-scale, long-term mobile phone data and multi-source spatiotemporal data. First, the home and work locations of users are extracted with the rule-based approach and used to identify commuting trips. Area of interest (AOI) data are then used to identify the purposes of some non-commuting trips. Finally, the K-Prototypes-XGBoost model is used to identify the purposes of the remaining non-commuting trips. The proposed model constructs feature vectors from multi-source spatiotemporal data, including the built environment, user attributes, activity attributes, and users' historical activities, to capture the travel habits and behavior patterns of individuals.
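The three stages described above can be read as a cascade in which each trip is labeled by the first stage that applies. The following is a minimal sketch of that control flow only, not the authors' implementation: the `Trip` fields, the label strings, the AOI lookup table, and the `fallback_model` callable (standing in for a trained K-Prototypes-XGBoost model) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Trip:
    """One trip of a user; all field names are hypothetical."""
    user_id: str
    origin_cell: str        # location identifier of the trip origin
    destination_cell: str   # location identifier of the trip destination


def identify_purpose(
    trip: Trip,
    home: Dict[str, str],
    work: Dict[str, str],
    aoi_purpose: Dict[str, str],
    fallback_model: Callable[[Trip], str],
) -> str:
    """Label a trip with the first stage of the cascade that applies."""
    h = home.get(trip.user_id)
    w = work.get(trip.user_id)

    # Stage 1 (rule-based): a trip between the user's distinct home and
    # work locations is treated as a commuting trip (assumed rule).
    if h is not None and w is not None and h != w:
        if {trip.origin_cell, trip.destination_cell} == {h, w}:
            return "commuting"

    # Stage 2 (AOI): if the destination maps to an area of interest with
    # a known purpose, use that purpose directly.
    purpose: Optional[str] = aoi_purpose.get(trip.destination_cell)
    if purpose is not None:
        return purpose

    # Stage 3 (machine learning): defer the remaining non-commuting trips
    # to a trained classifier, represented here by a plain callable.
    return fallback_model(trip)
```

Ordering the stages this way means the classifier is only asked about trips that the two cheaper, more transparent rules could not label.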

Results Comparative experiments on activity data with known trip purposes from May 1 to 31, 2023 show that, without the spatiotemporal features of users' historical activities and without clustering information, extreme gradient boosting (XGBoost) outperforms adaptive boosting, random forest, gradient boosting decision tree, and artificial neural network, achieving an accuracy of 73.2%, a precision of 74.3%, a recall of 73.2%, and an F1-score of 74.3%. After the spatiotemporal features of users' historical activities are introduced, the prediction performance of all five models improves substantially, with the average accuracy, precision, recall, and F1-score increasing by 24.2%, 21.2%, 24.1%, and 23.0%, respectively. On this basis, the proposed K-Prototypes-XGBoost model further improves prediction performance by incorporating user clustering information. Compared with the conventional XGBoost model, its accuracy, precision, recall, and F1-score increase by 2.7%, 1.5%, 2.7%, and 2.8%, respectively, and all four evaluation metrics reach 91.5%, indicating the best overall performance in identifying non-commuting trip purposes from large-scale, long-term mobile phone data. The confusion matrix further shows that the proposed model performs best on medical activities, with an accuracy of 94.7%, whereas dining activities have the lowest recognition accuracy, 88.4%. Most misclassifications occur among leisure and entertainment, dining, and shopping activities, because these activities often share highly overlapping spatiotemporal characteristics and are frequently located within the same commercial complexes or in adjacent areas.
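The four reported metrics can all be derived from a multi-class confusion matrix. The sketch below assumes support-weighted averaging over classes, which the abstract does not state; under that assumption weighted recall is mathematically identical to accuracy, which would be consistent with the equal accuracy and recall values reported above. The function name `weighted_metrics` is illustrative and no data from the paper are used.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy and support-weighted precision, recall and F1.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        support = sum(cm[c])                                  # true samples of class c
        predicted = sum(cm[r][c] for r in range(n_classes))   # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                              # class share of all samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Because each class's recall is weighted by its share of the samples, the weighted sum collapses to the total number of correct predictions divided by the total number of samples, which is the definition of accuracy.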

Conclusions By integrating rule-based inference with the K-Prototypes-XGBoost model, this paper provides an effective solution for recovering the semantic information of trip purposes from mobile phone data. The proposed framework enables a more complete reconstruction of trip purpose chains and promotes the transformation of mobile phone data applications from mobility pattern description to semantic understanding of travel behavior, thereby supporting finer-grained urban transport planning and human activity analysis.
