EVALUATING THE EFFECTIVENESS OF OBJECT DETECTION MODELS BASED ON FEATURE FUSION MECHANISMS IN THE CONTEXT OF UAV IMAGERY

Nguyễn Dũng; Nguyễn Ngọc Thủy; Bùi Lương Vũ Ngọc

doi:10.70117/hdujs.84.2.2026.1146


Issue: No. 84-02.2026: Natural Sciences, Engineering and Technology

Section: Natural Sciences, Engineering and Technology


Publication date: 25/03/2026


How to cite

Nguyễn, D., Nguyễn, N. T., & Bùi, L. V. N. (2026). EVALUATING THE EFFECTIVENESS OF OBJECT DETECTION MODELS BASED ON FEATURE FUSION MECHANISMS IN THE CONTEXT OF UAV IMAGERY. Tạp chí Khoa học Trường Đại học Hồng Đức, 84(2), 20-29. https://doi.org/10.70117/hdujs.84.2.2026.1146


Nguyễn Dũng¹, Nguyễn Ngọc Thủy², Bùi Lương Vũ Ngọc³
¹ University of Sciences, Hue University
² University of Sciences, Hue University
³ Thanh Hoa Province Branch of Hanoi Medical University

Abstract

Object detection from the UAV viewpoint is attracting growing attention thanks to applications such as traffic monitoring, smart agriculture, and environmental observation. However, UAV images typically contain small, densely packed, and occluded objects against complex backgrounds, which makes detection challenging. This paper surveys and experimentally evaluates modern CNN-based and Transformer-based object detection models in UAV scenarios on the VisDrone2019, TinyPerson, and HIT-UAV datasets. The results show a clear trade-off between accuracy and computational cost. Adaptive attention methods and multi-level feature pyramids show potential for balancing performance and efficiency in UAV deployment.

Keywords

UAV, object detection, deep learning, attention mechanism, multi-level feature pyramid

