Visual Attention for Image Classification with Reinforcement Learning
Full Paper : http://arxiv.org/abs/1312.6594
Abstract
In this paper, we introduce an adaptive representation process for image classification. The presented strategy combines an exploration strategy, used to find the best subset of regions for each image, with the final classification algorithm. New regions are iteratively selected based on the location and content of the previous ones. The resulting scheme produces an effective instance-based classification algorithm. We demonstrate the strategy's pertinence on two different image classification datasets. When our exploration strategy is limited to half of the regions of each image, we obtain a significant gain relative to baseline methods.
Notations
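Let $\mathcal{X}$ denote the set of possible images and $\mathcal{Y}$ the discrete set of categories. A classifier is a parametrized function $f_\theta$ such that $f_\theta : \mathcal{X} \rightarrow \mathcal{Y}$, where $f_\theta(x) = y$ means that category $y$ has been predicted for image $x$. To learn $f_\theta$, a set of $\ell$ labeled training images $\{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$ is provided to the system.
We also consider, for each image, a fixed grid of $N \times M$ regions $\{r_i\}_{i \leq N \times M}$, where $r_i$ is the i-th region. The set of all possible regions is denoted $\mathcal{R}$, and $\mathcal{R}(x)$ corresponds to the set of regions over image $x$. Each region $r_i(x)$ is represented by a feature vector $\phi(r_i(x))$ of size $K$. We use a SIFT bag-of-words representation in our experiments.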
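As a rough illustration of the notation above, the sketch below splits an image into a fixed grid of regions and builds a normalized bag-of-words histogram for one region. It assumes the local descriptors have already been quantized to visual-word indices (the SIFT extraction and codebook steps are omitted). The helper names `grid_regions` and `bow_histogram`, the grid size, and the vocabulary size are hypothetical and not taken from the paper.

```python
from collections import Counter

def grid_regions(width, height, n_cols, n_rows):
    """Return region bounding boxes (x0, y0, x1, y1) in row-major order."""
    boxes = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = col * width // n_cols
            y0 = row * height // n_rows
            x1 = (col + 1) * width // n_cols
            y1 = (row + 1) * height // n_rows
            boxes.append((x0, y0, x1, y1))
    return boxes

def bow_histogram(keypoints, box, vocab_size):
    """L1-normalized histogram of the visual words whose keypoint lies in box.

    keypoints: list of (x, y, word_index) triples (assumed pre-quantized).
    """
    x0, y0, x1, y1 = box
    counts = Counter(w for (x, y, w) in keypoints
                     if x0 <= x < x1 and y0 <= y < y1)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * vocab_size  # empty region: all-zero feature vector
    return [counts[w] / total for w in range(vocab_size)]

# Toy data: a 120x120 image, a 4x4 grid, and a 3-word vocabulary.
boxes = grid_regions(width=120, height=120, n_cols=4, n_rows=4)
keypoints = [(5, 5, 0), (10, 20, 2), (25, 3, 2), (100, 100, 1)]
phi_0 = bow_histogram(keypoints, boxes[0], vocab_size=3)
```

Here `phi_0` plays the role of the feature vector of the first region; only keypoints falling inside that region contribute to it.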
Model formalization
The classifier is modeled as a sequential decision process that, given an image, first sequentially selects regions, and then classifies the image using the information available in the visited regions. At each step $t$, the classifier has already selected a sequence of regions denoted $(r_{p_1}(x), \ldots, r_{p_t}(x))$, where $p_t$ is the index of the region of $x$ selected at step $t$. The sequence is thus a representation tailored to the specific image and the current classification task. $\mathcal{S}(x)$ denotes the set of all possible trajectories over image $x$ and $\mathcal{S}_t(x)$ the trajectories composed of $t$ selected regions.
Given a fixed budget $B$, new regions are acquired, resulting in a trajectory of size $B$. Given this trajectory, the classifier then decides which category to assign to the image.
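The decision process described above can be sketched as a simple loop. This is only a schematic under stated assumptions: `select_next` (the exploration policy), `classify` (the final classifier), `features` (a per-region feature function), and the choice to start from an empty trajectory are hypothetical stand-ins rather than the paper's learned components; the actual inference and learning algorithms are given in the full paper.

```python
def sequential_classify(n_regions, features, select_next, classify, budget):
    """Acquire `budget` regions one at a time, then classify from them only."""
    visited = []    # indices of the selected regions, in acquisition order
    observed = {}   # region index -> feature vector, computed lazily
    for _ in range(budget):
        remaining = [i for i in range(n_regions) if i not in observed]
        if not remaining:
            break
        # The choice depends on the location and content of earlier regions.
        nxt = select_next(visited, observed, remaining)
        observed[nxt] = features(nxt)  # features computed only when visited
        visited.append(nxt)
    return classify(visited, observed), visited

# Toy stand-ins (hypothetical), used only to exercise the loop.
calls = []  # records which regions had their features computed

def toy_features(i):
    calls.append(i)
    return [float(i % 2)]

def toy_select_next(visited, observed, remaining):
    if not visited:
        return remaining[0]
    last = visited[-1]
    # Content-dependent move: jump further after an "active" region.
    step = 2 if observed[last][0] > 0 else 1
    target = last + step
    return min(remaining, key=lambda i: (abs(i - target), i))

def toy_classify(visited, observed):
    score = sum(observed[i][0] for i in visited)
    return "odd-heavy" if score * 2 > len(visited) else "even-heavy"

label, trajectory = sequential_classify(
    8, toy_features, toy_select_next, toy_classify, budget=4
)
```

With a budget of 4 out of 8 regions, `calls` ends up holding only the four visited indices, which mirrors the point that features of unvisited regions are never computed.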
There are two important aspects of our approach. First, the way regions are acquired depends on the content of the previously acquired regions, so the classifier adapts its representation to each image being classified and selects the best regions for that image. Second, the final decision is made from the features of the acquired regions only, without computing features for the other regions. This yields both a speed-up of the classification process, since not all features have to be computed, and an improvement of the classification rate due to the exclusion of noisy regions.
We now give details concerning the features, the classification phase (which classifies the image given the previously selected regions), and the exploration phase (which selects additional regions over an image to classify).
Inference and Learning Algorithms
See Full Paper.
Experimental Results
Experimental results have been obtained on different datasets and show the ability of the model to achieve high accuracy while exploring only a subset of the regions of the image.
Conclusion
In this paper, we introduced an adaptive representation process for image classification. The presented strategy combines both an exploration strategy used to find the best subset of regions for each image, and the final classification algorithm. New regions are iteratively selected based on the location and content of the previous ones. The resulting scheme produces an effective instance-based classification algorithm. We demonstrated the strategy's pertinence on two different image classification datasets. When using our exploration strategy limited to half of the regions of the images, we obtained a significant gain relative to baseline methods.