Multi-Object Classification and Scene Understanding




Deep learning has shown state-of-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, viz deep learning, probabilistic models and kernel methods to obtain state-of-art performance on Microsoft COCO, consisting of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where the object co-occurrences are conditioned on the extracted fc7 features from pre-trained Imagenet CNN as input. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Moreover, the latent variables in the CLTM capture scene information and with simple k-means clustering on this auxiliary information improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining.

Paper & Presentation


[Show BibTex]


  • Extracted Latent Tree Structure

  • Latent Tree in Action
  • Related Papers

    Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky "Learning Latent Tree Graphical Models" Journal of Machine Learning Research (JMLR).


    Dataset and Other Tools