Abstract:
Objectives: Beyond the spatial extent of water accumulation, the depth of accumulated water is an even more important criterion for measuring the severity of urban waterlogging. To address the limitations of traditional urban waterlogging monitoring methods, namely high cost, limited monitoring range, and coverage restricted to designated flood-prone areas, we propose a model that assesses urban waterlogging depth levels from image information captured by the public.
Methods: We used web crawling with keywords such as "urban waterlogging" and "vehicles submerged" to collect images and videos from the internet containing small vehicles in water, forming a sample set. Using the submersion depth of the vehicles as a reference standard, we divided waterlogging depth into three levels: safe (<20 cm), unsafe (20-60 cm), and dangerous (>60 cm), and labeled the samples accordingly. Based on the YOLOv8 framework, we constructed four models of different network complexity: YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv8l. During training, we used a confusion matrix and four accuracy metrics, Precision, Recall, mean Average Precision (mAP), and Average Precision (AP), to comprehensively evaluate model accuracy.
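The three-level labeling rule described above can be sketched as a simple thresholding function. This is an illustrative helper under the stated depth cutoffs; the function name and boundary handling at exactly 20 cm and 60 cm are assumptions, not taken from the paper's code.

```python
# Illustrative sketch of the depth-level labeling rule (hypothetical helper;
# the name and inclusive/exclusive boundary choices are assumptions).

def waterlogging_level(depth_cm: float) -> str:
    """Map an estimated vehicle submersion depth (cm) to a hazard level."""
    if depth_cm < 20:
        return "safe"        # < 20 cm
    elif depth_cm <= 60:
        return "unsafe"      # 20-60 cm
    else:
        return "dangerous"   # > 60 cm

print(waterlogging_level(15))   # safe
print(waterlogging_level(45))   # unsafe
print(waterlogging_level(75))   # dangerous
```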
Results: (1) On the training set, all four YOLOv8 models converged after about 40 training epochs, with Precision, Recall, and mAP50 stabilizing at around 60%-70% and mAP50:95 at around 35%.
(2) On the test set, for the Precision metric, YOLOv8m scored highest for the safe class at 80.7%, YOLOv8l for the unsafe class at 55.4%, and YOLOv8s for the dangerous class at 84.8%. For the Recall metric, YOLOv8m scored highest for both the safe and unsafe classes at 66.3% and 68.8%, respectively, while YOLOv8n scored highest for the dangerous class at 68.3%. For the mAP50 metric, YOLOv8s scored highest for all classes, with 74.4% for safe, 61.7% for unsafe, and 76.2% for dangerous. For the mAP50:95 metric, YOLOv8m scored highest for the safe class at 43.1%, while YOLOv8s scored highest for both the unsafe and dangerous classes at 37.7% and 36.7%, respectively.
(3) It should be noted that network depth and width, and with them the capacity to extract features, also have a significant impact on model performance. Although increasing network width and depth substantially increases the number of parameters and the computational load, not all evaluation metrics improve accordingly. Although YOLOv8l is the most complex model, its Precision and mAP50 scores are not dominant, ranking fourth and third at 69.3% and 68.0%, respectively, while its Recall and mAP50:95 scores are the highest. Conversely, although YOLOv8n has the fewest parameters and the lowest computational load, its mAP50 score still ranked second. Overall, YOLOv8m performed best on the validation set, achieving the highest Precision, mAP50, and mAP50:95, with Precision and mAP50 exceeding the runner-up by 2.2% and 2.6%, respectively, and without much sacrifice in Recall, making it the best overall performer.
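The per-class Precision and Recall figures reported above are derived from the confusion matrix. As a minimal illustration of that computation (the matrix values below are made up for demonstration and are not the paper's results):

```python
# Per-class Precision and Recall from a confusion matrix.
# cm[i][j] = number of samples of true class i predicted as class j.
# The counts here are illustrative only, not the paper's data.

def precision_recall(cm, k):
    """Return (precision, recall) for class index k."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # column sum minus TP
    fn = sum(cm[k]) - tp                             # row sum minus TP
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

classes = ["safe", "unsafe", "dangerous"]
cm = [[80, 15,  5],
      [20, 55, 25],
      [ 5, 20, 75]]
for k, name in enumerate(classes):
    p, r = precision_recall(cm, k)
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

mAP additionally averages the Average Precision over classes (mAP50 at an IoU threshold of 0.5; mAP50:95 averaged over IoU thresholds from 0.5 to 0.95).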
(4) All four models detect the unsafe class less accurately than the other two levels. On the one hand, this is due to sample imbalance: unsafe-class samples number only 68% of safe-class samples and 59% of dangerous-class samples. On the other hand, the models' detection performance has inherent limitations: moving vehicles easily splash water, leading to misjudged water levels; when the water level sits right at the midpoint of the tire, the unsafe class is easily confused with the safe class; and when it sits right at the engine's exhaust outlet, it is easily confused with the dangerous class. The probability of correctly detecting the unsafe class is therefore the lowest. To address this issue, future research will apply targeted data augmentation to improve the model's recognition of unsafe-class samples and introduce attention-mechanism modules into the network structure to strengthen the model's learning of key features.
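One common remedy for the imbalance noted above is inverse-frequency class weighting in the training loss, so that the under-represented unsafe class contributes more per sample. A minimal sketch, assuming illustrative sample counts chosen only to match the 68% and 59% ratios stated in the text:

```python
# Inverse-frequency class weights (a standard rebalancing heuristic, not the
# paper's method). Counts are illustrative: 340/500 = 0.68, 340/575 ≈ 0.59.

def inverse_freq_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n = len(counts)
    return {cls: total / (n * cnt) for cls, cnt in counts.items()}

counts = {"safe": 500, "unsafe": 340, "dangerous": 575}
weights = inverse_freq_weights(counts)
for cls, w in weights.items():
    print(f"{cls}: weight={w:.3f}")
```

The rarest class (unsafe) receives the largest weight, nudging the optimizer to penalize its misclassifications more heavily.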