Design and Development of Semi-Automated Image Labeling and Annotation Software Utilizing Artificial Intelligence

Cong Hoang Anh1, Dinh Cong Nguyen, Hoang Long Nguyen2, The Anh Pham2, Sam Le Van3
1 Thanh Hoa University of Culture, Sports and Tourism
2 Hong Duc University
3 Hanoi Medical University, Thanh Hoa Campus


Abstract

This paper presents a semi-automated image labeling and annotation software system that leverages artificial intelligence to support the creation of training datasets for computer vision models. The system integrates the YOLOv8 model for automatic object detection, allowing users to adjust bounding boxes, write and edit image annotations, retrieve data, and download datasets. Additionally, the software offers user access control, data category management, batch image uploads, and flexible image-annotation querying. Featuring a user-friendly, cross-platform interface, the system is easy to deploy in educational and research environments. Experimental results demonstrate that it significantly reduces labeling time while maintaining high accuracy. The proposed solution provides an effective, practical approach to semi-automating training data preparation and also paves the way for future integration of automatic caption generation models.
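To illustrate the pre-annotation workflow the abstract describes, the sketch below converts a detector's pixel-coordinate bounding boxes into the normalized YOLO label format that annotators would then review and adjust before export. This is a minimal illustration, not the paper's implementation: the helper name `to_yolo_line` and the `(x1, y1, x2, y2)` box convention are assumptions; in a real deployment the boxes would come from a YOLOv8 model rather than be hard-coded.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-coordinate box (x1, y1, x2, y2) into one line of
    the YOLO label format: "class x_center y_center width height",
    with all four geometry values normalized to [0, 1] by image size."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w   # normalized box center, x
    yc = (y1 + y2) / 2 / img_h   # normalized box center, y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: one hypothetical detection on a 640x480 image
print(to_yolo_line(0, (100, 120, 300, 360), 640, 480))
# -> 0 0.312500 0.500000 0.312500 0.500000
```

One label file per image, with one such line per object, is the layout YOLO-family training tools expect, so datasets exported this way can feed directly back into model training.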

