Multi-Layer Fusion Neural Network for Deepfake Detection

Zheng Zhao, Penghui Wang, Wei Lu
Copyright: © 2021 | Pages: 14
DOI: 10.4018/IJDCF.20210701.oa3

Abstract

The spread of videos forged by deepfake tools has recently become a widespread concern, and effective ways of detecting them are urgently needed. Such artificial intelligence-aided forgery is known to leave artifacts at no fewer than three levels: microscopic or statistical features, mesoscopic features, and macroscopic or semantic features. However, existing detection methods have not been designed to exploit them all. This work proposes a new approach for more effective detection of deepfake videos. A multi-layer fusion neural network (MFNN) is designed to capture the artifacts at different levels. Feature maps output from specially designed shallow, middle, and deep layers, which serve as statistical, mesoscopic, and semantic features, respectively, are fused together before classification. The FaceForensics++ dataset was used to train and test the method. The experimental results show that MFNN outperforms other relevant methods. In particular, it shows a greater advantage in detecting low-quality deepfake videos.
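
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this preview does not specify how the MFNN fuses its feature maps, so the sketch assumes global average pooling followed by concatenation, and a plain linear classifier with a sigmoid. All function names, tensor shapes, and weights below are hypothetical.

```python
import math
from typing import List

# A feature map is stored as channels x height x width (nested lists).
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel of a feature map to its mean activation."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_features(shallow: FeatureMap, middle: FeatureMap, deep: FeatureMap) -> List[float]:
    """Assumed fusion rule: concatenate pooled shallow, middle, and deep features."""
    return (global_average_pool(shallow)
            + global_average_pool(middle)
            + global_average_pool(deep))


def classify(fused: List[float], weights: List[float], bias: float) -> float:
    """Hypothetical linear classifier with a sigmoid; returns a score in (0, 1)."""
    if len(fused) != len(weights):
        raise ValueError("weight vector must match fused feature length")
    logit = sum(f * w for f, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: spatial size shrinks and channel count grows with depth.
shallow = [[[0.0, 1.0], [2.0, 3.0]] for _ in range(2)]  # 2 channels, 2x2
middle = [[[0.5, 0.5]] for _ in range(3)]               # 3 channels, 1x2
deep = [[[1.0]] for _ in range(4)]                      # 4 channels, 1x1

fused = fuse_features(shallow, middle, deep)
# Untrained placeholder weights: every input maps to a score of 0.5.
score = classify(fused, weights=[0.0] * len(fused), bias=0.0)
```

Pooling each level to a fixed-length vector before concatenation lets feature maps of different spatial sizes be combined; a real network would learn the classifier weights and might fuse the levels in some other way.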

Introduction

The human face is the most significant marker of a person's identity. Nowadays, digital videos showing human faces are widely used on serious occasions, such as in court evidence and news reports. Their validity evidently depends on the assumption that the faces are infeasible to forge.

For a long time, forging human faces in video was considered a time-consuming and expensive task. However, the situation has changed recently. With the development of neural network-based methods such as deep learning, more and more techniques that support facial tampering and face swapping have begun to emerge. Based on convolutional autoencoders (Bengio, Lamblin, Popovici, & Larochelle, 2007) and generative adversarial networks (GAN) (Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, & Bengio, 2014), the best-known face manipulation tools, grouped under the name Deepfake (Korshunova, Shi, Dambre, & Theis, 2017; Faceswap Project, 2018; Faceswap-GAN Project, 2018), can replace a person's face in a video with anyone else's face easily and effectively. Although face swapping in video has also been implemented with methods based on computer graphics, such as Face2Face (Thies, Zollhofer, Stamminger, Theobalt, & Nießner, 2016) and FaceSwap (Kowalski, 2016), Deepfake tools are widely considered more promising. Moreover, Deepfake technologies have been adopted by face forgery software designed for the general public, such as FaceApp and Deepfakesapp. These tools, which run on personal computers or smartphones, have friendly interfaces that guide people without professional training to forge video faces with convincing results. As a result, more and more Deepfake videos have appeared on social networks, and their harmful side effects have made the technology a worldwide concern. Effective ways of detecting them are therefore urgently needed.

To detect Deepfake videos, several methods have been proposed that recognize forgery features. Under the traditional framework of pattern recognition, semantic features such as inconsistent head poses (Yang, Li, & Lyu, 2019), color anomalies (Li, Li, Tan, & Huang, 2019; McCloskey & Albright, 2018), color differences between the left and right eyes, shading artifacts, and missing reflection details in the eyes (Matern, Riess, & Stamminger, 2019) have been extracted and classified. In the past two years, deep learning-based methods have become more common. Li and Lyu (2018) proposed a deep learning network to detect the artifacts resulting from the face warping transform. Li, Chang, and Lyu (2018) adopted a convolutional neural network (CNN) and long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997) to detect anomalies in eye blinking. Here, the LSTM, a typical recurrent neural network (RNN), learns from the sequence of features that the CNN extracts from each frame. Similarly, Güera and Delp (2018) used the CNN named Inception v3 (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2016) and LSTM (Hochreiter & Schmidhuber, 1997) to detect anomalies within and between frames, respectively. Afchar, Nozick, Yamagishi, and Echizen (2018) designed a network named MesoNet to detect so-called mesoscopic features, which are considered middle-level features between the semantic and statistical ones. Other well-known neural networks previously used for image classification and image forgery detection, such as Xception (Chollet, 2017) and MISLnet (Bayar & Stamm, 2018), have also been applied to the detection of Deepfake videos (Rössler, Cozzolino, Verdoliva, Riess, Thies, & Nießner, 2019).
