Abstract:
Objectives: With the rapid increase in vehicle ownership and the growing complexity of road environments, traditional traffic management methods face the dual challenges of perception accuracy and response efficiency in dense, dynamic highway scenarios. Video surveillance, as a low-cost, real-time sensing approach, has become an essential data source for vehicle information collection. However, existing video-based studies focus primarily on vehicle detection and often fail to perceive multidimensional attributes such as speed, color, traffic flow, and lane position accurately. To address these limitations, a video data-driven framework is proposed for precise perception of multidimensional vehicle information on highways, aiming to enhance the comprehensiveness and reliability of traffic perception.
Methods: First, a scale-adaptive highway vehicle detection method is introduced. The proposed C3k2_DSConv module enhances the model's adaptability to vehicles of various scales and morphologies by integrating depthwise separable convolutions and feature reuse strategies. A convolution kernel adjustment mechanism is further designed within the Dynamic Head to improve detection robustness under diverse viewing angles and occlusion conditions. Second, based on the detected vehicle bounding boxes, multidimensional vehicle attributes are derived through specialized perception algorithms: (1) Vehicle speed is estimated using a dual-virtual-line timing model, which tracks vehicle centroids frame by frame and converts the crossing interval between two virtual lines into a velocity. (2) Vehicle color is identified by combining RGB-to-HSV color space transformation with feature enhancement through region-based color histogram mapping. (3) Traffic flow is calculated via a virtual-line constraint counting method that minimizes duplicate counting. (4) Lane position is calibrated using a virtual bounding-box constraint model that aligns vehicle trajectories with detected lane boundaries. Minimal code sketches of the detection and perception components are given below. To validate the proposed framework, a new highway vehicle dataset is constructed from real-world surveillance footage, incorporating variations in vehicle type, illumination, and scale to ensure robustness and generalization.
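The internal structure of C3k2_DSConv is not spelled out here; as a point of reference, a minimal PyTorch sketch of the depthwise separable convolution it builds on is shown below. Class and parameter names are illustrative, not the paper's; the point is the parameter and FLOP savings of a per-channel convolution followed by a 1x1 pointwise convolution.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) conv
    followed by a 1x1 (pointwise) conv, reducing parameters and FLOPs
    relative to a standard convolution of the same receptive field.
    Illustrative building block, not the paper's exact C3k2_DSConv."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```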
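For the dual-virtual-line timing model, a minimal sketch might look as follows, assuming two virtual lines a known road distance apart and a fixed camera frame rate; the same crossing test can also drive the one-shot counting used for traffic flow. All function names and the known-line-spacing assumption are illustrative.

```python
def crossed(prev_cy: float, cy: float, line_y: float) -> bool:
    """True when a tracked centroid moves across a horizontal virtual line
    between consecutive frames; also usable to count each vehicle once."""
    return (prev_cy < line_y) != (cy < line_y)

def estimate_speed_kmh(entry_frame: int, exit_frame: int,
                       line_gap_m: float, fps: float) -> float:
    """Dual-virtual-line timing: speed = known line spacing / elapsed time,
    with time measured as a frame count divided by the camera frame rate."""
    elapsed_s = (exit_frame - entry_frame) / fps
    return line_gap_m / elapsed_s * 3.6  # m/s -> km/h

# Example: 30 frames at 25 fps over a 20 m line spacing -> 60 km/h.
print(estimate_speed_kmh(120, 150, 20.0, 25.0))
```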
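For color identification, the sketch below shows the RGB-to-HSV transformation followed by hue-histogram mapping over the detected region, assuming OpenCV; the hue ranges are hand-picked placeholders, and the paper's calibrated ranges and feature enhancement step are not reproduced here.

```python
import cv2
import numpy as np

# Illustrative hue ranges (OpenCV hue spans 0..179), not the paper's calibration.
HUE_RANGES = {
    "red": [(0, 10), (170, 179)],
    "yellow": [(20, 34)],
    "green": [(35, 85)],
    "blue": [(100, 130)],
}

def classify_color(bgr_roi: np.ndarray) -> str:
    """Map a vehicle ROI to a dominant color via an HSV hue histogram.
    Low-saturation/low-value pixels are treated as white/gray/black."""
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    chromatic = (s > 60) & (v > 60)
    if chromatic.mean() < 0.2:  # mostly achromatic ROI
        if v.mean() < 80:
            return "black"
        return "white" if v.mean() > 170 else "gray"
    mask = chromatic.astype(np.uint8) * 255
    hist = cv2.calcHist([hsv], [0], mask, [180], [0, 180]).ravel()
    scores = {name: sum(hist[lo:hi + 1].sum() for lo, hi in spans)
              for name, spans in HUE_RANGES.items()}
    return max(scores, key=scores.get)
```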
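Lane calibration via the virtual bounding-box constraint model is sketched only at the level of the final assignment step: given lane boundaries fitted as functions x(y) in image coordinates, the vehicle centroid is binned into the lane whose boundaries bracket it. The callable-boundary representation is an assumption for illustration, not the paper's constraint model.

```python
from typing import Callable, List

def assign_lane(cx: float, cy: float,
                boundaries: List[Callable[[float], float]]) -> int:
    """Return the index of the lane whose fitted boundaries bracket the
    vehicle centroid (cx, cy); boundaries are ordered left to right and
    evaluated at the centroid's image row. -1 means outside all lanes."""
    xs = [b(cy) for b in boundaries]
    for i in range(len(xs) - 1):
        if xs[i] <= cx < xs[i + 1]:
            return i
    return -1

# Example with three straight boundaries (two lanes):
lanes = [lambda y: 100.0, lambda y: 300.0, lambda y: 500.0]
print(assign_lane(350.0, 400.0, lanes))  # -> 1 (second lane)
```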
Results: Comprehensive experiments were conducted against state-of-the-art detectors, including YOLOv5 through YOLOv13, RT-DETR, SSD, Faster R-CNN, and RetinaNet. The proposed method achieved superior results, with precision (P), recall (R), and mAP@0.5 of 0.903, 0.899, and 0.954, respectively, outperforming all baseline models. Furthermore, multidimensional perception experiments showed that vehicle detection reached an mAP@0.5 of 95.4%, traffic flow counting accuracy was 99%, lane calibration accuracy reached 92%, vehicle color recognition accuracy was 81%, and the average absolute error of vehicle speed estimation was 9.36%. These quantitative results confirm the effectiveness and robustness of the proposed framework in complex highway environments.
Conclusions: The framework demonstrates that video-based methods can achieve high-precision perception of multiple vehicle attributes when enhanced with adaptive feature extraction and multidimensional information modeling. It enables comprehensive perception of speed, color, traffic flow, and lane position, thereby providing a reliable basis for intelligent highway management, traffic law enforcement, and accident early-warning systems.