Face Sketch-to-Photo Translation Using Generative Adversarial Networks

Document Type: Research Article

Authors

Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Abstract

Translating face sketches into photo-realistic face images is an important task in applications such as law enforcement and digital entertainment. A central challenge is the inherent gap between a sketch and a real photo: the sketch lacks color and fine skin-texture detail. Since the advent of generative adversarial models, a growing number of methods have been proposed for sketch-to-image synthesis. However, these models still have limitations, such as the large amount of paired data required for training, the low resolution of the produced images, or their unrealistic appearance. In this paper, we propose a method for converting an input facial sketch into a color photo without the need for any paired dataset. To do so, we use a pre-trained face photo generation model to synthesize high-quality, natural face photos and employ an optimization procedure to maintain high fidelity to the input sketch. We train a network to map the facial features extracted from the input sketch to a vector in the latent space of the face-generating model. We also study different optimization criteria and compare the proposed model with state-of-the-art models both quantitatively and qualitatively. The proposed model achieves an SSIM of 0.655 and a rank-1 face recognition rate of 97.59%, while producing higher-quality images.
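The core idea in the abstract, searching the latent space of a frozen generator for an image whose sketch-like features match the input sketch, can be illustrated with a toy example. This is only a minimal sketch under strong assumptions and not the authors' implementation: a fixed random linear map (`G_WEIGHTS`, hypothetical) stands in for the pre-trained face generator, simple intensity differences (`sketch_features`, hypothetical) stand in for the facial features, and plain gradient descent on a squared-error fidelity loss stands in for the optimization criteria studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, H, W = 8, 16, 16

# Assumption: a fixed random linear map stands in for the frozen, pre-trained generator.
G_WEIGHTS = rng.normal(size=(H * W, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def generate(z):
    """Map a latent vector to an H x W image; the weights are never updated."""
    return (G_WEIGHTS @ z).reshape(H, W)

def sketch_features(img):
    """Toy sketch-like features: horizontal and vertical intensity differences."""
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return np.concatenate([dx.ravel(), dy.ravel()])

def fidelity_loss(z, target):
    """Squared error between features of the generated image and the target features."""
    residual = sketch_features(generate(z)) - target
    return float(residual @ residual)

# Toy "input sketch": features of an image produced from a hidden latent vector.
z_hidden = rng.normal(size=LATENT_DIM)
target = sketch_features(generate(z_hidden))

# Both toy maps are linear, so their composition is a matrix A (one column per latent axis).
A = np.stack([sketch_features(generate(e)) for e in np.eye(LATENT_DIM)], axis=1)

# Step size 1/L, where L = 2 * sigma_max(A)^2 bounds the curvature of the loss.
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)

z = np.zeros(LATENT_DIM)  # initial latent guess
initial_loss = fidelity_loss(z, target)
for _ in range(200):
    grad = 2.0 * A.T @ (A @ z - target)  # analytic gradient of the squared error
    z = z - step * grad
final_loss = fidelity_loss(z, target)

print(f"fidelity loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

With a real generator the mapping is nonlinear, so the gradient would come from automatic differentiation rather than a closed-form matrix, and the starting latent vector would plausibly be the output of the trained mapping network mentioned in the abstract rather than zeros.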
