Sequentially Generated Instance-Dependent Image Representations for Classification - Full Paper ICLR 2014

Visual Attention for Image Classification with Reinforcement Learning

 

Full Paper : http://arxiv.org/abs/1312.6594

Abstract

In this paper, we introduced an adaptive representation process for image classification. The presented strategy combines both an exploration strategy used to find the best subset of regions for each image, and the final classification algorithm. New regions are iteratively selected based on the location and content of the previous ones. The resulting scheme produces an effective instance-based classification algorithm. We demonstrated the strategy's pertinence on two different image classification datasets. When using our exploration strategy limited to half of the regions of the images, we obtained a significant gain relative to baseline methods.

Notations

Let us denote \mathcal{X} the set of possible images and $\mathcal{Y}$ the discrete set of C categories. A classifier is a parametrized function f_\theta such that f_\theta: \mathcal{X} \rightarrow \mathcal{Y} where f_\theta(x)=y means that category y has been predicted for image x. To learn f_\theta, a set of \ell labeled training images S_\text{train} = \{(x_1,y_1),...,(x_\ell,y_\ell)\} is provided to the system.

We also consider for x a fixed grid N\times M of regions \{r^x_i\}_{i\ \leq N \times M} where r^x_i is the i-th region. The set of all possible regions is denoted \mathcal{R}, and \mathcal{R}(x)  corresponds to the set of regions over image x. r^x_i is represented by a feature vector \phi(r^x_i)   of size K. We  use a SIFT bag-of-words representation in our experiments.

Model formalization

The classifier is modeled as a sequential decision process that, given an image,  first  sequentially selects regions, and then classifies the image using the information available in the visited regions. At each step, the classifier has already selected a sequence of regions denoted (s^x_{1},..,s^x_{t}) where s^x_t is the index of the region of x selected at step t.

The sequence (s^x_{1},..,s^x_{t}) is thus a representation tailored to the specific image and the current classification task.  \mathcal{S}(x) denotes the set of all possible trajectories over image x and \mathcal{S}^t(x) the trajectories composed of t selected regions.

Given a fixed budget B,  new regions are acquired resulting in a trajectory of size B. Given this trajectory, the classifier then decides which category to assign to the image.
There are two important aspects of our approach: First, the way these regions are acquired depends on the content of the previously acquired regions resulting in a classifier that is able to adapt its representation to each image being classified, thus selecting the best regions for each image. Second, the final decision is made given the features of the acquired regions only, without needing the computation of the features for the other regions, thus resulting in both a speed-up of the classification process --- not all features have to be computed --- but also in an improvement of the classification rate due to the exclusion of noisy regions.

We now give details concerning the features, the classification phase --- which classifies the image given the B previously selected regions --- and the exploration phase --- which  selects B-1 additional regions over an image to classify.

iclr2014-1

Inference and Learning Algorithms

iclr2014-2

See Full Paper.

Experimental Results

Expeirmental results have been obtained on different datasets an show the ability of the model to produce a high accuracy while exploring only a subset of the regions of the image.

 

iclr2014-3

 

Conclusion

In this paper, we introduced an adaptive representation process for image classification. The presented strategy combines both an exploration strategy used to find the best subset of regions for each image, and the final classification algorithm. New regions are iteratively selected based on the location and content of the previous ones. The resulting scheme produces an effective instance-based classification algorithm. We demonstrated the strategy's pertinence on two different image classification datasets. When using our exploration strategy limited to half of the regions of the images, we obtained a significant gain relative to baseline methods.