Abstract:
Objectives Beyond the extent of water accumulation, water depth is a more important criterion for measuring the severity of urban waterlogging. Traditional urban waterlogging monitoring methods are costly, cover a limited range, and can monitor only specific flood-prone areas. To address these limitations, we propose a model that assesses the depth grade of urban waterlogging from images captured by the public.
Methods We use web crawling with keywords such as "urban waterlogging" and "vehicles submerged" to collect images and videos from the internet that show small vehicles in water, and build a sample set from them. Using the depth of vehicle submersion as the reference standard, we divide waterlogging depth into three levels: safe (0 to 20 cm), unsafe (20 to 60 cm), and dangerous (>60 cm), and create corresponding labels for these levels. We use YOLOv8 to build four models of different network complexity: YOLOv8l, YOLOv8m, YOLOv8n, and YOLOv8s. During training, we use a confusion matrix and four accuracy metrics, namely precision, recall, average precision (AP), and mean AP (mAP), to comprehensively evaluate model accuracy.
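The grading rule and the per-class metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the boundary handling (20 cm counted as safe, 60 cm as unsafe) is an assumption, and the confusion matrix is simplified to the three depth classes, without the background class that a detection confusion matrix normally includes.

```python
from typing import Dict, List

CLASSES = ["safe", "unsafe", "dangerous"]


def depth_to_grade(depth_cm: float) -> str:
    """Map a waterlogging depth in cm to one of the three grades.

    Assumption: the upper bound of each interval is inclusive.
    """
    if depth_cm <= 0:
        raise ValueError("depth must be positive")
    if depth_cm <= 20:
        return "safe"
    if depth_cm <= 60:
        return "unsafe"
    return "dangerous"


def per_class_precision_recall(
    confusion: List[List[int]],
) -> Dict[str, Dict[str, float]]:
    """Compute precision and recall for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (rows: truth, columns: prediction).
    """
    result = {}
    for k, name in enumerate(CLASSES):
        tp = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column sum
        actual_k = sum(confusion[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        result[name] = {"precision": precision, "recall": recall}
    return result
```

Precision for a class is the share of predictions of that class that are correct, and recall is the share of true samples of that class that are found, which is why the two can rank the four models differently.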
Results (1) On the training set, all four YOLOv8 models converge after about 40 training epochs, with precision and mAP50 stabilizing at around 60% to 70%, recall at around 50% to 60%, and mAP50:95 at around 35%. (2) On the test set, for precision, YOLOv8m scores the highest for the safe class at 80.7%, YOLOv8l for the unsafe class at 55.4%, and YOLOv8s for the dangerous class at 84.8%. For recall, YOLOv8m scores the highest for both the safe and unsafe classes at 66.3% and 68.8%, respectively, while YOLOv8n scores the highest for the dangerous class at 68.3%. For mAP50, YOLOv8s scores the highest for all classes, with 74.4% for safe, 61.7% for unsafe, and 76.2% for dangerous. For mAP50:95, YOLOv8m scores the highest for the safe class at 43.1%, while YOLOv8s scores the highest for both the unsafe and dangerous classes at 37.7% and 36.7%, respectively.
Conclusions (1) The depth and width of the network, as well as its feature extraction capability, have a significant impact on model performance. Increasing network width and depth substantially increases the number of model parameters and the computational load, but not all evaluation metrics improve as a result. The YOLOv8l model is the most complex, yet its precision and mAP50 scores are not superior, ranking fourth (69.3%) and third (68.0%), respectively. The YOLOv8n model has the fewest parameters and the lowest computational load, yet its mAP50 score still ranks second. Overall, YOLOv8m performs the best on the validation set, achieving the highest scores in precision, mAP50, and mAP50:95, with precision and mAP50 being 2.2% and 2.6% higher than those of the second-ranked model, respectively, without a significant sacrifice in recall, making it the optimal model in terms of comprehensive performance. (2) On the validation set, the detection accuracy of all four models is lower for the unsafe class than for the other two classes. This is due partly to the imbalance in sample sizes and partly to limitations in the models' detection performance. To address this issue, future research will focus on applying targeted data augmentation to the samples to improve recognition of the unsafe class, and on introducing attention mechanism modules into the network structure to enhance the model's ability to learn key features.
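One simple way to make augmentation "targeted" at under-represented classes is to derive, from the class counts, how many augmented copies each original sample would need for its class to roughly match the largest class. The sketch below is only an illustration of that idea: the function name is hypothetical, the sample counts in the usage note are invented for the example and are not the paper's data, and the actual augmentation strategy is left to future work in the text.

```python
import math
from typing import Dict


def augmentation_copies(class_counts: Dict[str, int]) -> Dict[str, int]:
    """Return, per class, the number of augmented copies to generate
    for each original sample so that the class size reaches at least
    the size of the largest class.
    """
    if any(n <= 0 for n in class_counts.values()):
        raise ValueError("every class needs at least one sample")
    target = max(class_counts.values())
    # ceil(target / n) is the total multiplier; subtract the original.
    return {
        name: math.ceil(target / n) - 1
        for name, n in class_counts.items()
    }
```

For example, with hypothetical counts of 900 safe, 300 unsafe, and 600 dangerous samples, the function asks for no extra copies for safe, two per sample for unsafe, and one per sample for dangerous.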