Multi-Object Classification and Scene Understanding

People

Abstract

Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, in which each image contains a single object. Multi-object classification, however, is far more challenging. We present a unified framework that leverages the strengths of multiple machine learning methods, namely deep learning, probabilistic models, and kernel methods, to obtain state-of-the-art performance on Microsoft COCO, which consists of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), in which the object co-occurrences are conditioned on fc7 features extracted from a pre-trained ImageNet CNN. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network that takes the fc7 features as input. Moreover, the latent variables in the CLTM capture scene information: simple k-means clustering on this auxiliary information improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining.
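To make the idea of learning a tree over object co-occurrences more concrete, here is a minimal sketch in plain Python. It is not the CLTM procedure itself, and it rests on several simplifying assumptions: it uses unconditional correlations between binary object-presence labels instead of kernel-estimated conditional probabilities given fc7 features, it takes -log|rho| as the pairwise distance, and it builds only a minimum spanning tree over the observed objects without adding latent nodes. The function names and the toy labels are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations


def correlation(xs, ys):
    """Pearson correlation between two equal-length 0/1 label columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no co-occurrence signal
    return cov / math.sqrt(var_x * var_y)


def information_distance(xs, ys, eps=1e-6):
    """Assumed distance -log|rho|: strongly dependent objects are close.

    Because of the absolute value, objects that strongly exclude each
    other are also treated as close, not only objects that co-occur.
    """
    return -math.log(max(abs(correlation(xs, ys)), eps))


def minimum_spanning_tree(labels):
    """Kruskal's algorithm over pairwise distances between label columns.

    labels: dict mapping object name -> list of 0/1 presence flags per image.
    Returns a list of (name_a, name_b) edges with name_a < name_b.
    """
    names = sorted(labels)
    edges = sorted(
        (information_distance(labels[a], labels[b]), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for _, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Toy presence labels for eight images (hypothetical data, illustration only).
toy_labels = {
    "fork":  [1, 1, 1, 0, 0, 0, 0, 0],
    "knife": [1, 1, 0, 0, 0, 0, 0, 0],
    "car":   [0, 0, 0, 1, 1, 1, 0, 0],
    "truck": [0, 0, 0, 1, 1, 0, 0, 0],
}
tree_edges = minimum_spanning_tree(toy_labels)
print(tree_edges)
```

On this toy data the tree links fork with knife and car with truck, then joins the two pairs through a single fork-car edge. In a latent tree model, hidden nodes would additionally be introduced to explain such groups, which is where scene-like information can emerge.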

Paper & Presentation

Paper


Citation 


Demo



  • Extracted Latent Tree Structure




  • Latent Tree in Action

Related Papers

Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky, "Learning Latent Tree Graphical Models," Journal of Machine Learning Research (JMLR).

Code

Dataset and Other Tools