Deepfake detector (XceptionNet) on Fake Video | Deepfake detector (XceptionNet) on Adversarial Fake Video |
Deepfakes, or facially manipulated videos, can be used maliciously to spread disinformation, harass individuals, or defame famous personalities. Recently developed Deepfake detection methods rely on Convolutional Neural Network (CNN) based classifiers to distinguish AI-generated fake videos from real videos. In this work, we demonstrate that it is possible to bypass such detectors by adversarially modifying fake videos synthesized using existing Deepfake generation methods. We design adversarial examples for the FaceForensics++ dataset to fool Deepfake detectors.
We propose attacks that target Deepfake detectors relying on CNN-based classification models. The victim detectors used in our experiments work at the frame level and classify each frame independently as either Real or Fake using a two-step pipeline.
In the white-box setting, we assume the attacker has complete knowledge of the detector model's architecture and parameters. We use iterative gradient-sign-based attacks to craft adversarial examples in this setting. In our robust white-box attack, we use Expectation Over Transforms to craft adversarial videos that remain adversarial after video and image compression. The following are example videos of our white-box attacks on XceptionNet.
Fake (From dataset) | White-box | Robust White-box |
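The white-box idea can be illustrated with a minimal sketch. It is not the implementation used in our experiments: the detector here is a toy logistic regression over a flattened frame (a stand-in for XceptionNet), and the transform family (a brightness gain plus Gaussian noise), the function names, and all hyperparameters are assumptions chosen for illustration. With `eot_samples=0` the loop is a plain iterative gradient sign attack; with `eot_samples > 0` the gradient is averaged over randomly transformed copies of the frame, in the spirit of Expectation Over Transforms.

```python
import math
import random


def fake_prob(x, w, b):
    """Toy stand-in for the detector: logistic regression on a flattened frame."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def grad_fake_prob(x, w, b):
    """Gradient of the toy P(fake) with respect to each input pixel."""
    p = fake_prob(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def eot_gradient(x, w, b, rng, samples):
    """Average the input gradient over randomly sampled transforms.

    Assumed transform family: t(x) = gain * x + Gaussian noise.
    """
    total = [0.0] * len(x)
    for _ in range(samples):
        gain = rng.uniform(0.8, 1.2)
        transformed = [gain * xi + rng.gauss(0.0, 0.02) for xi in x]
        g = grad_fake_prob(transformed, w, b)
        # Chain rule through t(x): d t(x) / d x = gain.
        total = [ti + gain * gi for ti, gi in zip(total, g)]
    return [ti / samples for ti in total]


def gradient_sign_attack(x, w, b, eps=0.05, step=0.01, iters=20,
                         eot_samples=0, seed=0):
    """Iteratively lower P(fake) inside an L-infinity ball of radius eps."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(iters):
        if eot_samples > 0:
            g = eot_gradient(x_adv, w, b, rng, eot_samples)
        else:
            g = grad_fake_prob(x_adv, w, b)
        # Step against the gradient sign to reduce the fake score.
        stepped = [xi - step * ((gi > 0) - (gi < 0))
                   for xi, gi in zip(x_adv, g)]
        # Project back into the eps-ball around x and the valid range [0, 1].
        x_adv = [min(max(si, x0 - eps, 0.0), x0 + eps, 1.0)
                 for si, x0 in zip(stepped, x)]
    return x_adv
```

Because this toy detector is linear, the averaged gradient has the same sign as the plain one, so both variants produce the same perturbation here. With a real CNN the two generally differ, which is where averaging over transformations matters. In practice the gradient would come from automatic differentiation through the detector rather than a closed-form expression.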
In the black-box setting, we assume the attacker knows the structure of the detector pipeline but can only query the classification CNN as a black box to obtain the probability of a frame being real or fake. We use Natural Evolution Strategies (NES) to estimate the gradient of the output probabilities with respect to the input, and use this estimate to craft adversarial examples. As in the white-box setting, we craft adversarial examples that are robust to compression by ensuring robustness to input transformations during optimization. The following are example videos of our black-box attacks on XceptionNet.
Fake (From dataset) | Black-box | Robust Black-box |
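The query-only gradient estimate can be sketched as follows. This is an illustrative toy, not the code behind the videos above: `toy_detector` is a hypothetical four-pixel logistic model standing in for the real CNN, and the function names, `sigma`, the number of sample pairs, and the step sizes are all assumed values. The estimator uses antithetic Gaussian sampling, a common form of NES, and the attack loop feeds the estimate into the same sign-step and projection used in gradient-based attacks.

```python
import math
import random


def nes_gradient(query, x, rng, sigma=0.01, pairs=50):
    """Estimate the gradient of query(x) from output probabilities only.

    Uses antithetic Gaussian sampling: each direction u is evaluated at
    x + sigma * u and x - sigma * u, costing 2 * pairs queries in total.
    """
    grad = [0.0] * len(x)
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        p_plus = query([xi + sigma * ui for xi, ui in zip(x, u)])
        p_minus = query([xi - sigma * ui for xi, ui in zip(x, u)])
        diff = p_plus - p_minus
        grad = [gi + diff * ui for gi, ui in zip(grad, u)]
    return [gi / (2.0 * sigma * pairs) for gi in grad]


def black_box_attack(query, x, eps=0.05, step=0.01, iters=20,
                     pairs=50, seed=0):
    """Lower query(x) = P(fake) using only estimated gradients."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(iters):
        g = nes_gradient(query, x_adv, rng, pairs=pairs)
        # Step against the estimated gradient sign.
        stepped = [xi - step * ((gi > 0) - (gi < 0))
                   for xi, gi in zip(x_adv, g)]
        # Project into the eps-ball around x and the valid range [0, 1].
        x_adv = [min(max(si, x0 - eps, 0.0), x0 + eps, 1.0)
                 for si, x0 in zip(stepped, x)]
    return x_adv


def toy_detector(x):
    """Hypothetical black box returning P(fake); stands in for the real CNN."""
    w = [1.0, -1.0, 0.8, -0.8]
    z = sum(wi * xi for wi, xi in zip(w, x)) + 2.0
    return 1.0 / (1.0 + math.exp(-z))
```

Calling `black_box_attack(toy_detector, [0.5, 0.5, 0.5, 0.5], pairs=200)` returns a perturbed input within `eps` of the original. Each iteration costs `2 * pairs` queries, so the query budget grows with both the number of iterations and the accuracy wanted from the estimate. Under the same assumptions, a robust variant could apply a randomly sampled transformation to each perturbed frame before querying.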