In our VDGEC method, we jointly optimize the clustering loss and the reconstruction loss in a single objective function.
The loss function is $\mathcal{L} = \mathcal{L}_{\text{clu}} + \mathcal{L}_{\text{rec}}$, where the first term is the clustering loss and the second term is the reconstruction loss.
As the training epochs increase, the decoder reconstructs the concept features more accurately, so the reconstruction loss goes down; likewise, the encoder embeddings fit the GMM prior better, so the clustering loss goes down.
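As an illustration, here is a minimal PyTorch-style sketch of such a joint objective. The diagonal-Gaussian-mixture negative log-likelihood used for the clustering term, the MSE reconstruction term, and the balancing weight lam are assumptions made for this sketch, not necessarily the exact forms used in VDGEC.

```python
import math
import torch
import torch.nn.functional as F

def gmm_nll(z, means, log_vars, weights):
    """Clustering-term sketch: negative log-likelihood of the embeddings under a
    diagonal-covariance GMM prior (an assumed stand-in for the actual clustering loss)."""
    diff = z.unsqueeze(1) - means.unsqueeze(0)                        # (N, K, D)
    log_comp = -0.5 * ((log_vars + diff.pow(2) / log_vars.exp()).sum(-1)
                       + z.size(-1) * math.log(2 * math.pi))          # (N, K)
    log_mix = torch.logsumexp(weights.log() + log_comp, dim=1)        # mixture log-likelihood
    return -log_mix.mean()

def total_loss(x, encoder, decoder, gmm_params, lam=1.0):
    """Joint objective: clustering term plus reconstruction term."""
    z = encoder(x)                          # latent concept embeddings
    x_hat = decoder(z)                      # decoder reconstruction
    clu = gmm_nll(z, *gmm_params)           # first term: clustering loss
    rec = F.mse_loss(x_hat, x)              # second term: reconstruction loss
    return clu + lam * rec                  # lam is an assumed balancing weight
```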
About the clear reduction in the early epochs:
Total loss: A clear reduction can be seen in the figure.
Reconstruction Loss: There is a clear reduction in the early epochs, which shows that the Encoder and Decoder are learning the features of the concepts.
Clustering Loss: After every epoch, we cluster the concepts based on their embeddings (i.e., the latent representations generated by the encoder). After every 100 epochs, we fuse the GMM components into a smaller number of components under the information loss threshold τ, which changes the GMM prior and increases the first (clustering) term. When τ is small, HGMM can hardly fuse components and the learning process ends earlier. When τ is large, HGMM merges more components in one iteration and the total number of levels becomes smaller. In practice, we raise τ after every fusion, since the margins between clusters become wider, as shown by the peaks in the figure.
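Below is a small numpy sketch of such a fusion step. Measuring the information loss of merging two components by their symmetric KL divergence, merging greedily, and raising τ by a fixed factor after each fusion pass are illustrative assumptions; HGMM's actual criterion may differ.

```python
import numpy as np

def merge_pair(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two diagonal-covariance Gaussian components."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return w, m, v

def sym_kl_diag(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal Gaussians,
    used here as a stand-in for the information-loss measure."""
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1.0)
    return kl12 + kl21

def fuse_components(weights, means, vars_, tau, tau_growth=1.1):
    """One fusion pass: greedily merge the closest pair of components while the
    information loss stays below tau, then raise tau for the next fusion."""
    weights, means, vars_ = list(weights), list(means), list(vars_)
    merged = True
    while merged and len(weights) > 1:
        merged = False
        best, pair = None, None
        for i in range(len(weights)):                 # find the cheapest pair to merge
            for j in range(i + 1, len(weights)):
                loss = sym_kl_diag(means[i], vars_[i], means[j], vars_[j])
                if best is None or loss < best:
                    best, pair = loss, (i, j)
        if best is not None and best < tau:           # merge only if the loss is below tau
            i, j = pair
            w, m, v = merge_pair(weights[i], means[i], vars_[i],
                                 weights[j], means[j], vars_[j])
            for idx in (j, i):                        # remove the merged pair
                del weights[idx], means[idx], vars_[idx]
            weights.append(w)
            means.append(m)
            vars_.append(v)
            merged = True
    return np.array(weights), np.array(means), np.array(vars_), tau * tau_growth
```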
About the absence of a clear reduction in the later epochs:
Reconstruction Loss: The reconstruction loss is stable after enough epochs. Although the fusion of GMM components changes the prior of VDGEC, only small fluctuations can be found at epochs 100, 200, 300, and 400, where the prior changed.
Clustering Loss: We perform clustering every epoch; however, the concept embeddings are stable and the GMM parameters do not change until the fusion step, so fluctuations appear only around the fusion epochs.
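For concreteness, a small sketch of this per-epoch clustering step, assuming each concept is assigned to the GMM component with the highest posterior responsibility (the assignment rule is an assumption of this sketch):

```python
import numpy as np

def assign_clusters(z, weights, means, vars_):
    """Assign each concept embedding to the GMM component with the highest
    (unnormalized) posterior responsibility under a diagonal-covariance GMM."""
    # z: (N, D); weights: (K,); means, vars_: (K, D)
    diff = z[:, None, :] - means[None, :, :]                         # (N, K, D)
    log_comp = -0.5 * (np.log(vars_) + diff ** 2 / vars_).sum(-1)    # up to an additive constant
    log_post = np.log(weights) + log_comp                            # unnormalized log posterior
    return log_post.argmax(axis=1)                                   # cluster label per concept
```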
Effect of Model Parameters.
The evaluation is conducted by changing one parameter (e.g., K) while fixing the other (e.g., τ). The figure above shows the performance of VDGEC for different values of K and τ. When τ is small, HGMM can hardly fuse components and the learning process ends earlier. When τ is large, HGMM merges more components in one iteration and the total number of levels becomes smaller. In practice, we raise τ after every iteration, since the margins between clusters become wider. When K is small, the initial number of clusters is small and HGMM contributes less, since the initial clustering may already be the final result. When K is large, the number of edges in the taxonomy increases and more fine-grained relations are produced.
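A hedged sketch of this one-parameter-at-a-time evaluation is given below; train_vdgec and evaluate are hypothetical callables standing in for the actual training and evaluation routines, and the parameter grids are supplied by the caller rather than taken from the paper.

```python
def sweep(train_vdgec, evaluate, Ks, taus, default_K, default_tau):
    """Grid evaluation: vary one parameter (K or tau) while fixing the other.

    `train_vdgec(K=..., tau=...)` and `evaluate(model)` are hypothetical
    callables for training VDGEC and scoring the resulting taxonomy."""
    results = {}
    for K in Ks:                                             # vary K, fix tau
        results[("K", K)] = evaluate(train_vdgec(K=K, tau=default_tau))
    for tau in taus:                                         # vary tau, fix K
        results[("tau", tau)] = evaluate(train_vdgec(K=default_K, tau=tau))
    return results
```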