Show simple item record

dc.contributor.author	Hasan, Md. Tarek
dc.date.accessioned	2025-10-15T02:55:05Z
dc.date.available	2025-10-15T02:55:05Z
dc.date.issued	2025-08-02
dc.identifier.uri	http://dspace.uiu.ac.bd/handle/52243/3310
dc.description.abstract	The rapid advancement of deepfake technology has significantly increased the realism and accessibility of synthetic media. Emerging techniques such as diffusion-based models and Neural Radiance Fields (NeRF), along with improvements in traditional Generative Adversarial Networks (GANs), have enabled the sophisticated generation of deepfake videos, posing growing threats to biometric security and trust. In parallel, detection methods have advanced through innovations in Transformer-based architectures, contrastive learning, and other deep learning approaches. Yet this progress continues to play out within a persistent cat-and-mouse dynamic between generation and detection. In this work, we present a comprehensive empirical evaluation of state-of-the-art deepfake detection methods, alongside a human subject study on identifying deepfakes, using a curated stimulus set generated by cutting-edge deepfake synthesis techniques. Unlike prior efforts, our study establishes a benchmark that captures the challenges posed by the latest generation methods in realistic settings. Our findings expose a critical vulnerability: both leading detection models and human evaluators struggle when confronted with high-quality, modern deepfakes. To address this gap, we introduce a multimodal detection framework that incorporates both audio and visual modalities, enhancing the robustness of detection systems in cross-modal scenarios. Our methodology includes evaluating performance across diverse conditions, such as different resolutions and clip lengths, and comparing unimodal versus multimodal fusion strategies. Extensive experimentation highlights the urgent need to refine detection models to keep pace with rapidly evolving generative techniques. By establishing a rigorous benchmark and revealing current limitations, our study offers a timely foundation for developing more robust and future-ready deepfake detection systems. Our results demonstrate that incorporating the audio modality alongside video consistently improves detection performance, underscoring the value of multimodal analysis for robust generalization. Notably, the proposed multimodal framework, evaluated on the FakeAVCeleb and AV-Deepfake1M datasets, achieved superior performance across all tested conditions, with early fusion yielding the highest AUC and precision, and cross-modal attention demonstrating particular effectiveness under low-resolution scenarios.	en_US
dc.language.iso	en_US	en_US
dc.publisher	UIU	en_US
dc.subject	Deepfake Detection, Comparative Analysis, GAN, Diffusion	en_US
dc.title	Evaluating the Generalizability of Deepfake Detection Models: A Comparative Analysis of GAN and Diffusion-Based Generated Content	en_US
dc.type	Article	en_US
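
The abstract above contrasts early fusion with cross-modal attention as audio-visual fusion strategies. As an illustration only, the following is a minimal sketch of what those two strategies can look like in PyTorch; the module names, feature dimensions, and pooling choices are assumptions for this sketch and are not taken from the thesis itself.

    # Illustrative sketch (not the thesis's actual architecture): early fusion
    # versus cross-modal attention for audio-visual deepfake detection.
    # All names and dimensions below are hypothetical assumptions.
    import torch
    import torch.nn as nn

    class EarlyFusionDetector(nn.Module):
        """Concatenate clip-level audio and visual embeddings, then classify."""
        def __init__(self, vis_dim=512, aud_dim=128, hidden=256):
            super().__init__()
            self.classifier = nn.Sequential(
                nn.Linear(vis_dim + aud_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),  # single real/fake logit
            )

        def forward(self, vis_emb, aud_emb):
            fused = torch.cat([vis_emb, aud_emb], dim=-1)  # early fusion
            return self.classifier(fused)

    class CrossModalAttentionDetector(nn.Module):
        """Visual tokens attend to audio tokens before pooling and classifying."""
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.classifier = nn.Linear(dim, 1)

        def forward(self, vis_tokens, aud_tokens):
            # query: visual frame tokens, key/value: audio frame tokens
            attended, _ = self.attn(vis_tokens, aud_tokens, aud_tokens)
            pooled = attended.mean(dim=1)  # temporal average pooling
            return self.classifier(pooled)

    # Toy usage with random tensors standing in for real encoder outputs.
    vis = torch.randn(2, 512)          # (batch, vis_dim)
    aud = torch.randn(2, 128)          # (batch, aud_dim)
    print(EarlyFusionDetector()(vis, aud).shape)                  # torch.Size([2, 1])

    vis_seq = torch.randn(2, 16, 256)  # (batch, video frames, dim)
    aud_seq = torch.randn(2, 50, 256)  # (batch, audio frames, dim)
    print(CrossModalAttentionDetector()(vis_seq, aud_seq).shape)  # torch.Size([2, 1])

In this sketch, early fusion simply concatenates the two embeddings before classification, while cross-modal attention lets visual tokens query the audio stream, which is one plausible reason attention-style fusion can stay informative when the visual signal is degraded (for example at low resolution).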

