Semantic

We want to give every node of the taxonomy a semantic embedding, and the emphasis of the embedding should differ across the levels of the taxonomy: as the level goes higher, the representation shifts from concrete to abstract.

So we choose a GMM for concept clustering, and the mean and variance of each Gaussian component serve as the embeddings.
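The idea above can be sketched as follows. This is a minimal illustration with `scikit-learn`, not the actual VDGEC implementation: the node features, dimensionality, and component count are made-up assumptions, and the fitted Gaussian means stand in for the concept embeddings.

```python
# Minimal sketch: cluster node vectors with a GMM and take each
# Gaussian's mean as that concept's embedding. Data are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
node_vectors = rng.normal(size=(200, 16))   # hypothetical node features

gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(node_vectors)       # hard cluster assignment per node

concept_embeddings = gmm.means_              # one embedding per concept cluster
print(concept_embeddings.shape)              # (5, 16)
```

The per-component variances (`gmm.covariances_`) carry the uncertainty part of the embedding; more abstract concepts tend to get wider Gaussians.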

Moreover, the result of the HGMM is used to learn the latent graph embedding for the VAE, and in turn the result of the VAE is used to update the HGMM.
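The alternation between the two components can be sketched like this. The linear "encoder" and the least-squares feedback rule below are illustrative stand-ins for the real VAE training step; only the alternating structure (encode, cluster, feed the clustering back) mirrors the text.

```python
# Sketch of the alternating update: latent codes from a toy "VAE" encoder
# are clustered by a GMM, and the GMM responsibilities are fed back to
# nudge the encoder. Both the encoder and the update rule are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 32))
W = rng.normal(size=(32, 8)) * 0.1           # stand-in for the VAE encoder

for it in range(3):
    z = x @ W                                # "VAE" step: encode to latent z
    gmm = GaussianMixture(n_components=4, random_state=0).fit(z)
    resp = gmm.predict_proba(z)              # GMM step: soft assignments
    targets = resp @ gmm.means_              # where the clustering pulls each z
    # feedback step: move the encoder so z drifts toward the GMM centers
    W += 0.1 * np.linalg.lstsq(x, targets - z, rcond=None)[0]

print(z.shape, resp.shape)
```

In the real model the feedback would enter as a clustering-aware term in the VAE loss rather than a closed-form least-squares update.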


As shown in the figure, at every iteration the VDGEC generates a new layer: the concepts are embedded as a five-component GMM (a, b, c, d, e) at iteration 1, a three-component GMM (a, f, d) at iteration 2, and a single Gaussian distribution at the final iteration. Different layers therefore capture very different semantic meanings.
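The layer-by-layer abstraction in the figure can be sketched as below. The component counts (5, 3, 1) follow the example in the text; the data and the choice to refit on the same points each iteration are illustrative assumptions.

```python
# Sketch of the iterative layers: each iteration fits a GMM with fewer
# components, so layers go from concrete (5 concepts) to abstract
# (1 Gaussian). Synthetic data; counts taken from the example in the text.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
points = rng.normal(size=(150, 8))

layer_means = []
for k in (5, 3, 1):                          # components per taxonomy layer
    gmm = GaussianMixture(n_components=k, random_state=0).fit(points)
    layer_means.append(gmm.means_)           # embeddings of this layer's concepts

print([m.shape for m in layer_means])        # [(5, 8), (3, 8), (1, 8)]
```

Each layer's means are the concept embeddings at that abstraction level; the final single Gaussian summarizes the whole space.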

For standard clustering algorithms, the bigger problem is that the latent z of a vanilla VAE cannot help extract hierarchical relations, while the clustering result cannot feed back to guide the generation of the latent z.

So we propose a new unified model, VDGEC, which chooses a GMM for concept clustering.