Design and Development of Semi-Automated Image Labeling and Annotation Software Utilizing Artificial Intelligence

Cong Hoang Anh1, Dinh Cong Nguyen, Hoang Long Nguyen2, The Anh Pham2, Sam Le Van3
1 Thanh Hoa University of Culture, Sports and Tourism
2 Hong Duc University
3 Hanoi Medical University, Thanh Hoa Campus


Abstract

This paper presents a semi-automated image labeling and annotation software system that leverages artificial intelligence to support the creation of training datasets for computer vision models. The system integrates the YOLOv8 model for automatic object detection, allowing users to adjust bounding boxes, write and edit image annotations, retrieve data, and download datasets. Additionally, the software offers user access control, data category management, batch image uploads, and flexible image-annotation querying. Featuring a user-friendly, cross-platform interface, the system is easy to deploy in educational and research environments. Experimental results demonstrate that it significantly reduces labeling time while maintaining high accuracy. The proposed solution provides an effective, practical approach to semi-automating training data preparation and also paves the way for future integration of automatic caption generation models.
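To illustrate the pre-annotation workflow the abstract describes, the sketch below converts a detector's pixel-coordinate bounding boxes into the normalized YOLO label format that annotators would then review and adjust before export. This is a minimal illustration, not the paper's implementation: the helper name `to_yolo_line` and the `(x1, y1, x2, y2)` box convention are assumptions; in a real deployment the boxes would come from a YOLOv8 model rather than be hard-coded.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-coordinate box (x1, y1, x2, y2) into one line of
    the YOLO label format: "class x_center y_center width height",
    with all four geometry values normalized to [0, 1] by image size."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w   # normalized box center, x
    yc = (y1 + y2) / 2 / img_h   # normalized box center, y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: one hypothetical detection on a 640x480 image
print(to_yolo_line(0, (100, 120, 300, 360), 640, 480))
# -> 0 0.312500 0.500000 0.312500 0.500000
```

One label file per image, with one such line per object, is the layout YOLO-family training tools expect, so datasets exported this way can feed directly back into model training.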

