Evaluating the Generalizability of Deepfake Detection Models: A Comparative Analysis of GAN and Diffusion-Based Generated Content

UIU Institutional Repository


    View/Open
    Thesis - Md. Tarek Hasan 0122410019 - MSCSE (1).pdf (12.63Mb)
    Date
    2025-08-02
    Author
    Hasan, Md. Tarek
    Abstract
    The rapid advancement of deepfake technology has significantly increased the realism and accessibility of synthetic media. Emerging techniques such as diffusion-based models and Neural Radiance Fields (NeRF), along with improvements in traditional Generative Adversarial Networks (GANs), have enabled the sophisticated generation of deepfake videos, posing growing threats to biometric security and trust. In parallel, detection methods have advanced through innovations in Transformer-based architectures, contrastive learning, and other deep learning approaches. Yet this progress continues to play out within a persistent cat-and-mouse dynamic between generation and detection. In this work, we present a comprehensive empirical evaluation of state-of-the-art deepfake detection methods, alongside a human subject study focused on identifying deepfakes, using a curated stimulus set generated by cutting-edge deepfake synthesis techniques. Unlike prior efforts, our study establishes a benchmark that captures the challenges posed by the latest generation methods in realistic settings. Our findings expose a critical vulnerability: both leading detection models and human evaluators struggle when confronted with high-quality, modern deepfakes. To address this gap, we introduce a multimodal detection framework that incorporates both audio and visual modalities, enhancing the robustness of detection systems in cross-modal scenarios. Our methodology includes evaluating performance across diverse conditions, such as different resolutions and clip lengths, and comparing unimodal versus multimodal fusion strategies. Extensive experimentation highlights the urgent need to refine detection models to keep pace with rapidly evolving generative techniques. By establishing a rigorous benchmark and revealing current limitations, our study offers a timely foundation for developing more robust and future-ready deepfake detection systems. Our results demonstrate that incorporating the audio modality alongside video consistently improves detection performance, underscoring the value of multimodal analysis for robust generalization. Notably, the proposed multimodal framework, evaluated on the FakeAVCeleb and AV-Deepfake1M datasets, achieved superior performance across all tested conditions, with early fusion yielding the highest AUC and precision, and cross-modal attention demonstrating particular effectiveness under low-resolution scenarios.
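    The two fusion strategies contrasted in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a minimal, hypothetical illustration, not the thesis's actual architecture: early fusion concatenates per-clip audio and visual embeddings before a shared classifier, while cross-modal attention lets visual tokens attend to audio tokens before temporal pooling. All module names, layer sizes, and feature dimensions are assumptions chosen for clarity.

        import torch
        import torch.nn as nn

        class EarlyFusionDetector(nn.Module):
            """Early fusion: concatenate audio and visual embeddings, then classify."""
            def __init__(self, vis_dim=512, aud_dim=128, hidden=256):
                super().__init__()
                self.classifier = nn.Sequential(
                    nn.Linear(vis_dim + aud_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, 1),  # single real-vs-fake logit
                )

            def forward(self, vis_emb, aud_emb):
                fused = torch.cat([vis_emb, aud_emb], dim=-1)  # (batch, vis_dim + aud_dim)
                return self.classifier(fused)

        class CrossModalAttentionDetector(nn.Module):
            """Cross-modal attention: visual tokens query audio tokens, then pool."""
            def __init__(self, dim=256, heads=4):
                super().__init__()
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.head = nn.Linear(dim, 1)

            def forward(self, vis_tokens, aud_tokens):
                # query = per-frame visual tokens, key/value = per-frame audio tokens
                attended, _ = self.attn(vis_tokens, aud_tokens, aud_tokens)
                pooled = attended.mean(dim=1)  # average over the temporal dimension
                return self.head(pooled)

        # Toy usage with random features standing in for real per-clip embeddings
        vis_emb, aud_emb = torch.randn(8, 512), torch.randn(8, 128)
        print(EarlyFusionDetector()(vis_emb, aud_emb).shape)          # torch.Size([8, 1])

        vis_tok, aud_tok = torch.randn(8, 16, 256), torch.randn(8, 50, 256)
        print(CrossModalAttentionDetector()(vis_tok, aud_tok).shape)  # torch.Size([8, 1])

    In this sketch, early fusion only needs clip-level embeddings of the two modalities, whereas cross-modal attention operates on token sequences, which is one way an attention-based fusion can remain informative when individual visual frames are degraded (for example, at low resolution).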
    URI
    http://dspace.uiu.ac.bd/handle/52243/3310
    Collections
    • M.Sc Thesis/Project [156]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
