A Gaussian Mixture Variational Graph Autoencoder for Node Classification

Document Type: Research Article

Authors

1 Department of Computer Engineering, Amirkabir University of Technology

2 PhD Candidate, Department of Computer Engineering, Amirkabir University of Technology

Abstract

Graph embedding is the procedure of transforming a graph into a low-dimensional, informative representation. Most existing graph embedding techniques focus on the graph's structure and pay less attention to the distribution of the latent codes. Recently, Variational Graph Autoencoders (VGAEs) have demonstrated good performance by learning smooth representations from unlabeled training samples. In standard VGAEs, however, the prior distribution over latent variables is generally a single Gaussian distribution, and complex data distributions cannot be modeled well under this assumption. The choice of prior distribution is important because each dimension of a multivariate Gaussian can learn a separate continuous latent feature, which can result in a more structured and disentangled representation. In this paper, we employ the Gaussian Mixture Model (GMM) as the prior distribution in a Variational Graph Autoencoder (GMM-VGAE) framework for node classification in graphs. In this framework, the GMM captures the inherently complex data distribution, while graph convolutional networks (GCNs) exploit the structure among the nodes of a graph to learn more informative representations. The proposed model incorporates three GCNs: one maps the input feature vector to the latent representation used for classification, another generates the parameters of the latent distribution for learning from unlabeled data, and a third reconstructs the input and provides the reconstruction loss. Through extensive experiments on well-known citation, co-authorship, and social network graphs, we demonstrate that GMM-VGAE outperforms state-of-the-art methods.
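The three-GCN arrangement described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The toy path graph, the random untrained weights, the purely linear GCN layers, the mean-squared reconstruction loss, the single-sample estimate of the KL term, and all function and variable names (`toy_forward`, `log_gmm_prior`, and so on) are assumptions made only for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used by GCNs."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One linear graph convolution: aggregate neighbor features, then project."""
    return a_norm @ x @ w

def log_gaussian(z, mu, log_var):
    """Log density of each row of z under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi) + log_var
                   + (z - mu) ** 2 / np.exp(log_var)).sum(axis=1)

def log_gmm_prior(z, weights, means, log_vars):
    """Log density of each row of z under a diagonal-covariance Gaussian mixture."""
    diff = z[:, None, :] - means[None, :, :]                     # (N, K, D)
    log_comp = -0.5 * (np.log(2 * np.pi) + log_vars[None]
                       + diff ** 2 / np.exp(log_vars)[None]).sum(axis=2)
    log_joint = np.log(weights)[None, :] + log_comp              # (N, K)
    m = log_joint.max(axis=1, keepdims=True)                     # log-sum-exp trick
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))).ravel()

def toy_forward(seed=0, n_classes=2, latent_dim=2, n_components=2):
    """Forward pass on a hypothetical 4-node path graph with random weights."""
    rng = np.random.default_rng(seed)
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    x = rng.normal(size=(4, 3))                  # 3 input features per node
    a_norm = normalize_adjacency(adj)

    # GCN 1 (assumed form): features -> representation used for classification.
    logits = gcn_layer(a_norm, x, rng.normal(size=(3, n_classes)))

    # GCN 2 (assumed form): features -> parameters of the latent distribution.
    mu = gcn_layer(a_norm, x, rng.normal(size=(3, latent_dim)))
    log_var = gcn_layer(a_norm, x, 0.1 * rng.normal(size=(3, latent_dim)))
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)   # reparameterization

    # GCN 3 (assumed form): latent code -> reconstructed input features.
    x_rec = gcn_layer(a_norm, z, rng.normal(size=(latent_dim, 3)))

    # Gaussian mixture prior with uniform weights and unit variances (assumed).
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.normal(size=(n_components, latent_dim))
    prior_log_vars = np.zeros((n_components, latent_dim))

    # Single-sample Monte Carlo estimate of KL(q(z|x) || p(z)); it can be noisy.
    kl_estimate = np.mean(log_gaussian(z, mu, log_var)
                          - log_gmm_prior(z, weights, means, prior_log_vars))
    reconstruction_loss = np.mean((x - x_rec) ** 2)
    return {"logits": logits, "z": z, "x_rec": x_rec,
            "kl_estimate": kl_estimate, "reconstruction_loss": reconstruction_loss}
```

Calling `toy_forward()` returns the class logits, the sampled latent codes, the reconstruction, and the two unsupervised loss terms. The KL term is estimated by sampling because the divergence between a Gaussian and a Gaussian mixture has no closed form. How the paper itself handles this term is not stated in the abstract.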

Keywords

Main Subjects