Task
We investigate the problem of automatically learning multimodal taxonomies from the multimedia data on the Web.
Framework
Dataset
Dataset 7.23GB
Every line is a document in JSON format, for example:
{
  images:[
    'http://zhuanzhiimg.b0.upaiyun.com/images/wx/9cc43e5bd67854194860257410f94634',
    'http://zhuanzhiimg.b0.upaiyun.com/images/wx/8032a0a94cf2f197132490f3b40b5e2c'
  ],
  videos:[],
  content:''
}
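Since the dataset stores one document per line, it can be streamed without loading all 7.23GB into memory. The following is a minimal sketch, assuming each line is strictly valid JSON (double-quoted keys and strings) with the `images`, `videos`, and `content` fields shown above. The function name `iter_documents` and the file name `dataset.json` are hypothetical and not part of the original release.

```python
import json
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one document per non-empty line of a JSON-lines source.

    Assumption: every line is a standalone JSON object. Missing fields
    fall back to empty values so downstream code can rely on the keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        yield {
            "images": doc.get("images", []),
            "videos": doc.get("videos", []),
            "content": doc.get("content", ""),
        }


if __name__ == "__main__":
    # Hypothetical path; replace with the location of the downloaded file.
    with open("dataset.json", encoding="utf-8") as f:
        for i, doc in enumerate(iter_documents(f)):
            print(len(doc["images"]), len(doc["videos"]), len(doc["content"]))
            if i >= 4:  # preview only the first five documents
                break
```

Iterating over the file handle reads one line at a time, so memory use stays proportional to the largest single document rather than to the whole file.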