Task
We investigate the problem of automatically learning multimodal taxonomies from the multimedia data on the Web.

Framework

Dataset
Dataset size: 7.23 GB
Every line is one document in JSON format, for example:
{
  "images": [
    "http://zhuanzhiimg.b0.upaiyun.com/images/wx/9cc43e5bd67854194860257410f94634",
    "http://zhuanzhiimg.b0.upaiyun.com/images/wx/8032a0a94cf2f197132490f3b40b5e2c"
  ],
  "videos": [],
  "content": ""
}
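
Because the dataset stores one document per line and is several gigabytes in size, it is usually more practical to stream it line by line than to load it all at once. The following is a minimal sketch of such a reader, not the project's own loader. It assumes each line is strict JSON (double-quoted keys and strings) with the `images`, `videos`, and `content` fields shown above; the function name `iter_documents` and the in-memory sample are hypothetical and used only for illustration.

```python
import io
import json
from typing import Dict, Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed document per non-empty line.

    Assumption: every line is a strict JSON object. Missing fields
    fall back to empty defaults so downstream code can rely on them.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        yield {
            "images": doc.get("images", []),
            "videos": doc.get("videos", []),
            "content": doc.get("content", ""),
        }


# Illustrative in-memory sample standing in for the real file.
sample = io.StringIO(
    '{"images": ["http://zhuanzhiimg.b0.upaiyun.com/images/wx/'
    '9cc43e5bd67854194860257410f94634"], "videos": [], "content": ""}\n'
)
docs = list(iter_documents(sample))
print(len(docs), len(docs[0]["images"]))
```

To read the real data, pass an open file handle instead of the sample, for example `with open(path, encoding="utf-8") as f: for doc in iter_documents(f): ...`, where `path` is wherever the dataset file is stored locally. Since the function is a generator, only one document is held in memory at a time.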