EVALUATING THE EFFECTIVENESS OF FEATURE FUSION-BASED OBJECT DETECTION MODELS IN UAV IMAGERY

Dũng Nguyễn1, Nguyen Ngoc Thuy Nguyen2, Bui Luong Vu Ngoc3
1 University of Sciences, Hue University
2 Hong Duc University
3 Hanoi Medical University, Thanh Hoa Campus

Abstract

Object detection from the UAV perspective has attracted increasing attention due to its importance in applications such as traffic monitoring, smart agriculture, and environmental observation. However, UAV imagery often contains small, densely distributed objects with frequent occlusions and complex backgrounds, posing significant challenges. This paper conducts a comprehensive survey and experimental evaluation of modern object detection models based on CNNs and Transformers in UAV scenarios, using the VisDrone2019, TinyPerson, and HIT-UAV benchmarks. The results reveal a clear trade-off between detection accuracy and computational cost, while recent approaches such as adaptive attention mechanisms and multi-scale feature pyramid architectures demonstrate strong potential for achieving a favorable balance between performance and efficiency in UAV deployment.
