Design and Development of Semi-Automated Image Labeling and Annotation Software Utilizing Artificial Intelligence
Abstract
This paper presents a semi-automated image labeling and annotation software system that leverages artificial intelligence to support the creation of training datasets for computer vision models. The system integrates the YOLOv8 model for automatic object detection, allowing users to adjust bounding boxes, write and edit image annotations, retrieve data, and download datasets. Additionally, the software offers user access control, data category management, batch image uploads, and flexible image-annotation querying. Featuring a user-friendly, cross-platform interface, the system is easy to deploy in educational and research environments. Experimental results demonstrate that it significantly reduces labeling time while maintaining high accuracy. The proposed solution provides an effective, practical approach to semi-automating training data preparation and also paves the way for future integration of automatic caption generation models.
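The workflow described above (a detector proposes bounding boxes, users adjust them, and the result is exported as a training dataset) hinges on converting detections into a standard label format. As an illustrative sketch only, not the paper's actual implementation, the helper below converts an absolute `(x1, y1, x2, y2)` box, such as one produced by a YOLOv8 detection, into a YOLO-format label line; the function name and signature are hypothetical.

```python
def to_yolo_label(box_xyxy, class_id, img_w, img_h):
    """Convert an absolute (x1, y1, x2, y2) box into a YOLO-format label
    line: 'class cx cy w h', with coordinates normalized to [0, 1].

    In a semi-automated pipeline, box_xyxy would come from a detector
    (e.g. YOLOv8 output), possibly after manual adjustment by the user.
    """
    x1, y1, x2, y2 = box_xyxy
    cx = (x1 + x2) / 2 / img_w   # box center x, normalized
    cy = (y1 + y2) / 2 / img_h   # box center y, normalized
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: a 200x200 box centered at (200, 300) in a 640x640 image.
print(to_yolo_label((100, 200, 300, 400), 0, 640, 640))
# → 0 0.312500 0.468750 0.312500 0.312500
```

One label line per object is written to a `.txt` file sharing the image's base name, which is the convention YOLO-family trainers expect when a dataset is downloaded for model training.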
Keywords
Image labeling, semantic annotation, deep learning.