An Efficient Knowledge Distillation Architecture for Real-time Semantic Segmentation

Document Type: Research Article


Department of Computer Engineering, Sharif University of Technology, Tehran, Iran


In recent years, Convolutional Neural Networks (CNNs) have made significant strides in semantic segmentation, a task in which both accuracy and efficiency are crucial. Despite their high accuracy, however, these deep networks are not practical for real-time use because of their low inference speed. This issue has prompted researchers to explore techniques for improving the efficiency of CNNs. One such technique is knowledge distillation, which transfers knowledge from a larger, cumbersome (teacher) model to a smaller, more compact (student) model. This paper proposes a simple yet efficient knowledge distillation approach to address the low inference speed of CNNs. The proposed method distills knowledge from the feature maps of the teacher model to guide the learning of the student model. A straightforward technique known as pixel-wise distillation transfers the feature maps of the last convolution layer of the teacher model to the student model. Additionally, a pair-wise distillation technique transfers the pair-wise similarities of the intermediate layers. To validate the effectiveness of the proposed method, extensive experiments were conducted on the PASCAL VOC 2012 dataset using the state-of-the-art DeepLabV3+ segmentation network with different backbone architectures. The results show that the proposed method achieves a good balance between mean Intersection over Union (mIoU) and training time.
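The abstract names two distillation terms but does not give their formulas, so the following is only a minimal sketch under stated assumptions rather than the paper's implementation. It assumes the pixel-wise term is a mean per-pixel KL divergence between teacher and student class distributions, and that the pair-wise term is a mean squared difference between cosine-similarity matrices computed over spatial positions. The function names, the (C, H, W) array layout, and the use of NumPy are all illustrative choices.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def pixel_wise_loss(student_map, teacher_map):
    """Assumed pixel-wise term: mean over pixels of KL(teacher || student).

    Both inputs have shape (C, H, W), where C indexes classes. Each pixel's
    C values are turned into a probability distribution with a softmax.
    """
    log_p_t = log_softmax(teacher_map, axis=0)
    log_p_s = log_softmax(student_map, axis=0)
    kl_per_pixel = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=0)  # (H, W)
    return float(kl_per_pixel.mean())


def pairwise_similarity(feat):
    """Cosine similarity between every pair of spatial positions.

    feat has shape (C, H, W); the result has shape (H*W, H*W).
    """
    channels = feat.shape[0]
    flat = feat.reshape(channels, -1).T  # (H*W, C): one feature vector per position
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)  # guard against zero vectors
    return flat @ flat.T


def pair_wise_loss(student_feat, teacher_feat):
    """Assumed pair-wise term: mean squared gap between similarity matrices.

    Channel counts may differ between student and teacher; only the spatial
    size (H, W) must match, because the matrices are indexed by position.
    """
    diff = pairwise_similarity(student_feat) - pairwise_similarity(teacher_feat)
    return float(np.mean(diff ** 2))
```

In a training loop these two terms would typically be added, with weighting coefficients, to the student's ordinary segmentation loss. The weights are hyperparameters that the abstract does not specify. Because the pair-wise term compares similarity matrices rather than raw features, it needs no projection layer when the student backbone has fewer channels than the teacher.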


