Show simple item record

dc.contributor.author	Hasan, Md. Tarek
dc.date.accessioned	2025-10-15T02:55:05Z
dc.date.available	2025-10-15T02:55:05Z
dc.date.issued	2025-08-02
dc.identifier.uri	http://dspace.uiu.ac.bd/handle/52243/3310
dc.description.abstract	The rapid advancement of deepfake technology has significantly increased the realism and accessibility of synthetic media. Emerging techniques such as diffusion-based models and Neural Radiance Fields (NeRF), along with improvements in traditional Generative Adversarial Networks (GANs), have enabled the sophisticated generation of deepfake videos, posing growing threats to biometric security and trust. In parallel, detection methods have advanced through innovations in Transformer-based architectures, contrastive learning, and other deep learning approaches. Yet this progress continues to play out within a persistent cat-and-mouse dynamic between generation and detection. In this work, we present a comprehensive empirical evaluation of state-of-the-art deepfake detection methods, alongside a human subject study on identifying deepfakes, using a curated stimulus set generated by cutting-edge deepfake synthesis techniques. Unlike prior efforts, our study establishes a benchmark that captures the challenges posed by the latest generation methods in realistic settings. Our findings expose a critical vulnerability: both leading detection models and human evaluators struggle when confronted with high-quality, modern deepfakes. To address this gap, we introduce a multimodal detection framework that incorporates both audio and visual modalities, enhancing the robustness of detection systems in cross-modal scenarios. Our methodology includes evaluating performance across diverse conditions, such as different resolutions and clip lengths, and comparing unimodal versus multimodal fusion strategies. Extensive experimentation highlights the urgent need to refine detection models to keep pace with rapidly evolving generative techniques. By establishing a rigorous benchmark and revealing current limitations, our study offers a timely foundation for developing more robust and future-ready deepfake detection systems. Our results demonstrate that incorporating the audio modality alongside video consistently improves detection performance, underscoring the value of multimodal analysis for robust generalization. Notably, the proposed multimodal framework, evaluated on the FakeAVCeleb and AV-Deepfake1M datasets, achieved superior performance across all tested conditions, with early fusion yielding the highest AUC and precision, and cross-modal attention demonstrating particular effectiveness under low-resolution scenarios.	en_US
dc.language.iso	en_US	en_US
dc.publisher	UIU	en_US
dc.subject	Deepfake Detection, Comparative Analysis, GAN, Diffusion	en_US
dc.title	Evaluating the Generalizability of Deepfake Detection Models: A Comparative Analysis of GAN and Diffusion-Based Generated Content	en_US
dc.type	Article	en_US
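
The abstract above contrasts early fusion with cross-modal attention as audio-visual fusion strategies. As an illustration only, the following is a minimal sketch of what those two strategies can look like in PyTorch; the module names, feature dimensions, and pooling choices are assumptions for this sketch and are not taken from the thesis itself.

    # Illustrative sketch (not the thesis's actual architecture): early fusion
    # versus cross-modal attention for audio-visual deepfake detection.
    # All names and dimensions below are hypothetical assumptions.
    import torch
    import torch.nn as nn

    class EarlyFusionDetector(nn.Module):
        """Concatenate clip-level audio and visual embeddings, then classify."""
        def __init__(self, vis_dim=512, aud_dim=128, hidden=256):
            super().__init__()
            self.classifier = nn.Sequential(
                nn.Linear(vis_dim + aud_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),  # single real/fake logit
            )

        def forward(self, vis_emb, aud_emb):
            fused = torch.cat([vis_emb, aud_emb], dim=-1)  # early fusion
            return self.classifier(fused)

    class CrossModalAttentionDetector(nn.Module):
        """Visual tokens attend to audio tokens before pooling and classifying."""
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.classifier = nn.Linear(dim, 1)

        def forward(self, vis_tokens, aud_tokens):
            # query: visual frame tokens, key/value: audio frame tokens
            attended, _ = self.attn(vis_tokens, aud_tokens, aud_tokens)
            pooled = attended.mean(dim=1)  # temporal average pooling
            return self.classifier(pooled)

    # Toy usage with random tensors standing in for real encoder outputs.
    vis = torch.randn(2, 512)          # (batch, vis_dim)
    aud = torch.randn(2, 128)          # (batch, aud_dim)
    print(EarlyFusionDetector()(vis, aud).shape)                  # torch.Size([2, 1])

    vis_seq = torch.randn(2, 16, 256)  # (batch, video frames, dim)
    aud_seq = torch.randn(2, 50, 256)  # (batch, audio frames, dim)
    print(CrossModalAttentionDetector()(vis_seq, aud_seq).shape)  # torch.Size([2, 1])

In this sketch, early fusion simply concatenates the two embeddings before classification, while cross-modal attention lets visual tokens query the audio stream, which is one plausible reason attention-style fusion can stay informative when the visual signal is degraded (for example at low resolution).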

