Mid-level Deep Pattern Mining

Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

The University of Adelaide     NICTA     Australian Centre for Robotic Vision

[Figure: pipeline overview]

Abstract: Mid-level visual element discovery aims to find clusters of image patches that are both representative and discriminative. In this work, we study this problem from the perspective of pattern mining while relying on the recently popularized Convolutional Neural Networks (CNNs). Specifically, we find that for an image patch, activations extracted from the fully-connected layer of a CNN have two appealing properties which enable their seamless integration with pattern mining. Patterns are then discovered from a large number of CNN activations of image patches through well-known association rule mining. When we retrieve and visualize image patches sharing the same pattern (see Fig. 1), they are, surprisingly, not only visually similar but also semantically consistent. We apply our approach to scene and object classification tasks and demonstrate that it outperforms all previous works on mid-level visual element discovery by a sizeable margin while using far fewer elements. Our approach also outperforms or matches recent CNN-based works on these tasks.
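For concreteness, below is a minimal Python sketch (not the authors' released code) of the two steps the abstract describes: converting each patch's fully-connected CNN activations into a transaction by keeping the indices of its k largest activations plus a class-label item, and then mining association rules whose consequent is that label. The constants K, MIN_SUPPORT and MIN_CONFIDENCE, and the brute-force itemset counting, are illustrative assumptions; at scale, a proper association rule miner such as Apriori would be used instead.

```python
from itertools import combinations
from collections import Counter

import numpy as np

K = 20                 # assumption: number of largest activations kept per patch
MIN_SUPPORT = 0.01     # "representative": pattern must cover enough patches
MIN_CONFIDENCE = 0.8   # "discriminative": pattern should imply the class label


def patch_to_transaction(fc_activation, label):
    """Binarize one patch: keep only the indices of its K largest
    fully-connected activations, plus one item encoding the class label."""
    top_k = np.argsort(fc_activation)[-K:]
    return frozenset({int(i) for i in top_k} | {f"class_{label}"})


def mine_rules(transactions, target_label, max_len=3):
    """Brute-force stand-in for association rule mining: count small itemsets
    and keep rules `itemset -> target_label` whose support and confidence
    clear the thresholds."""
    n = len(transactions)
    label_item = f"class_{target_label}"
    antecedent, joint = Counter(), Counter()
    for t in transactions:
        items = sorted(i for i in t if not isinstance(i, str))  # drop label items
        has_label = label_item in t
        for r in range(1, max_len + 1):
            for itemset in combinations(items, r):
                antecedent[itemset] += 1
                if has_label:
                    joint[itemset] += 1
    rules = []
    for itemset, j in joint.items():
        support, confidence = j / n, j / antecedent[itemset]
        if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
            rules.append((itemset, support, confidence))
    return rules


# Toy usage with random vectors standing in for real fc-layer activations.
rng = np.random.default_rng(0)
transactions = [
    patch_to_transaction(rng.standard_normal(4096), label=int(rng.integers(2)))
    for _ in range(200)
]
print(mine_rules(transactions, target_label=1)[:5])
```

In this sketch, support measures how representative a pattern is (it must appear in many patch transactions) while confidence measures how discriminative it is (patches containing it should come mostly from the target class), matching the two criteria stated in the abstract.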

Papers: 

  • Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel. Mid-level Deep Pattern Mining. Computer Vision and Pattern Recognition (CVPR), 2015. [PDF] [arXiv]
  • Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel. Mining Mid-level Visual Patterns with Deep CNN Activations. International Journal of Computer Vision (IJCV), 2016. [PDF] (extended version of the CVPR paper)

Code is publicly available on GitHub.

Qualitative results

Discovered mid-level visual elements from the PASCAL VOC 2007 dataset (two per category).

Discovered mid-level visual elements and their detections on test images on the MIT Indoor dataset.

Discovered mid-level visual elements and their detections on test images on the PASCAL VOC 2007 dataset.

Funding

This work was partly supported by Australian Research Council grants FT120100969 and LP130100156.