A Transformer-Based Approach for Video Deepfake Detection

Van Hao Le, Tran Doan Minh Tran¹, Trinh Thi Hop¹
¹ Hong Duc University

Abstract

The rapid advancement of Deepfake technology has made it possible to generate videos with realistic facial manipulations, raising serious concerns about the authenticity of digital content. This work proposes HDU-DFNet, a Transformer-based deep learning model for the automatic detection of Deepfake videos. Experimental results indicate that the model achieves higher accuracy and better generalization than conventional CNN architectures such as ResNet50. The model effectively identifies fine-grained facial inconsistencies and the blending artifacts introduced by face-swapping operations. In addition, interpretability analyses are used to explain the model’s decisions by highlighting the facial regions most strongly associated with forgery detection.
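The abstract describes HDU-DFNet only at a high level. To illustrate the general idea behind Transformer-based detectors in the style of the Vision Transformer (ViT), the NumPy sketch below splits a face crop into patches, embeds them as tokens together with a [CLS] token, applies one self-attention layer, and maps the [CLS] output to a probability that the face is fake. It is not HDU-DFNet: the names `patchify` and `TinyViTDetector`, the single layer and head, and all sizes are hypothetical; the weights are random (untrained); and LayerNorm and the MLP sub-block are omitted for brevity. The [CLS]-to-patch attention row is returned as a crude saliency map, which is one common way to inspect which facial regions a ViT relies on. The interpretability method actually used in the paper is not specified in the abstract.

```python
import numpy as np


def patchify(img, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches of shape (N, p*p*C)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C), row-major patch order
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyViTDetector:
    """Hypothetical single-layer, single-head ViT-style real/fake classifier with random weights."""

    def __init__(self, img_size=32, patch=8, channels=3, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        n_patches = (img_size // patch) ** 2
        in_dim = patch * patch * channels
        self.patch = patch
        self.W_embed = rng.normal(0.0, 0.02, (in_dim, dim))     # linear patch embedding
        self.cls = rng.normal(0.0, 0.02, (1, dim))              # learnable [CLS] token
        self.pos = rng.normal(0.0, 0.02, (n_patches + 1, dim))  # positional embeddings
        self.Wq = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.Wk = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.Wv = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.w_head = rng.normal(0.0, 0.02, (dim,))             # binary classification head
        self.b_head = 0.0

    def forward(self, img):
        """Return (p_fake, attention map over patches) for one (H, W, C) face crop."""
        tokens = patchify(img, self.patch) @ self.W_embed            # (N, D)
        x = np.concatenate([self.cls, tokens], axis=0) + self.pos   # (N+1, D)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))              # (N+1, N+1), rows sum to 1
        x = x + attn @ v                                             # residual connection
        logit = x[0] @ self.w_head + self.b_head                     # read out the [CLS] token
        p_fake = 1.0 / (1.0 + np.exp(-logit))                        # sigmoid -> probability
        cls_to_patches = attn[0, 1:]                                 # [CLS] attention to each patch
        side = int(np.sqrt(cls_to_patches.size))
        return p_fake, cls_to_patches.reshape(side, side)


if __name__ == "__main__":
    model = TinyViTDetector()
    face = np.random.default_rng(1).random((32, 32, 3))  # stand-in for an aligned face crop
    p_fake, attn_map = model.forward(face)
    print(f"p_fake={p_fake:.3f}")
    print(attn_map.round(3))
```

In a trained detector, patches covering blending boundaries (for example the jawline or the edges of the swapped face region) would be expected to receive higher [CLS] attention; with the random weights above, the map carries no meaning and only shows the data flow.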
