Object Localization

Classification and retrieval are key computer vision tasks, but they do not tell the whole story. Many applications need a finer description of the image: for instance, we might want to segment the image to retrieve a specific portion of it, or to detect whether an object is present and where. Many techniques exist for this, but user expectations keep rising and many existing methods do not scale, so object localization remains a significant challenge.

Object localization encompasses two different computer vision tasks. Semantic segmentation labels each pixel of an image with its semantic category, so that regions of the image are defined precisely, at the pixel level. Object detection produces a rough location (a bounding box) for every instance of a set of object classes of interest. Our research addresses both tasks.

Semantic segmentation

Semantic segmentation assigns one of a set of pre-defined class labels to each pixel of an image. The image is thereby divided into semantic regions, such as sky, water, tree and building in the example image.

Figure: semantic segmentation example.
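The per-pixel labelling described above can be sketched in a few lines. This is only an illustration, assuming that some upstream model has already produced a score for every class at every pixel; the class list and the `label_pixels` helper are hypothetical names, not part of the methods cited on this page.

```python
# Hypothetical class list, borrowed from the regions named in the text.
CLASSES = ["sky", "water", "tree", "building"]


def label_pixels(scores):
    """Turn per-pixel class scores into a label map.

    scores[y][x] is a list with one score per class (assumed to come from
    some upstream classifier). Each pixel receives the index of its
    highest-scoring class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A toy 2x2 "image" with made-up scores.
scores = [
    [[0.9, 0.0, 0.1, 0.0], [0.7, 0.1, 0.1, 0.1]],
    [[0.1, 0.8, 0.05, 0.05], [0.1, 0.2, 0.1, 0.6]],
]
label_map = label_pixels(scores)
names = [[CLASSES[i] for i in row] for row in label_map]
```

Here `names` holds one class name per pixel, which is exactly the kind of output a semantic segmentation system delivers.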

Our group has a long history of contributions to semantic segmentation. We proposed a simple yet very efficient method [BMVC08] that won the PASCAL VOC segmentation challenge in 2008, and several improvements followed over the years ([IJCV11], [ICVGIP12]). The group has also examined how to evaluate segmentation in the fairest and most meaningful manner [BMVC13].
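To make the evaluation question concrete, here is a minimal sketch of one widely used measure, the per-class intersection-over-union (Jaccard index) used in the PASCAL VOC segmentation challenge. It is shown only as a familiar baseline and is not necessarily the measure recommended in [BMVC13]; the function name and the toy label maps are assumptions made for illustration.

```python
def per_class_iou(pred, truth, num_classes):
    """Per-class intersection-over-union between two label maps.

    pred and truth are equally sized 2D lists of class indices.
    Returns one IoU per class, or None for a class absent from both maps.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel is in both the predicted and the true region of class p.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel is in the predicted region of p and the true region of t.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(inter, union)]


# Toy example with two classes (made-up data).
truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
ious = per_class_iou(pred, truth, num_classes=2)
mean_iou = sum(ious) / len(ious)
```

Averaging over classes rather than over pixels keeps large classes such as sky from dominating the score, which is one of the design choices an evaluation measure has to settle.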

We also consider segmentation when images are enriched with additional information. For instance, near-infrared (NIR) information is available for free from most consumer digital cameras, and we showed that it can be used successfully to improve segmentation [ECCV12] (more information about this work can be found on the website of our open innovation partners at EPFL).

Object detection

Localizing objects is crucial, but for some applications a rough localization with a bounding box is enough. This is the task addressed by object detection. Because each instance gets its own box, detection also allows objects to be counted precisely, which is not always possible from semantic segmentation results.
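The counting argument can be illustrated with a toy case. The sketch below assumes two objects of the same class standing side by side: in a semantic segmentation mask they merge into one connected region, whereas a detector reports one box per instance. The mask, the boxes and the `count_regions` helper are made up for illustration.

```python
def count_regions(mask):
    """Count 4-connected regions of non-zero pixels in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not seen[y][x]:
                regions += 1
                seen[y][x] = True
                stack = [(y, x)]
                # Flood-fill the region that contains (y, x).
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return regions


# Segmentation view: two adjacent objects of one class form a single blob.
object_mask = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
# Detection view: one (x, y, width, height) box per instance.
object_boxes = [(0, 0, 3, 2), (3, 0, 3, 2)]

count_from_segmentation = count_regions(object_mask)
count_from_detection = len(object_boxes)
```

The segmentation mask yields a single region while the detector output contains two boxes, so only the latter gives the correct count in this example.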

Most existing methods cast detection as a classification problem. Every possible window in the image is considered and passed to a classifier that decides whether or not it contains an object. This type of approach is successful but extremely costly, since such a system has to classify thousands, if not millions, of windows for a single input image.
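A minimal sketch of this sliding-window scheme shows where the cost comes from. The window sizes, the stride and the `score_window` callback below are illustrative assumptions, not the settings of any particular detector; the toy scorer stands in for a real window classifier.

```python
def sliding_windows(width, height, sizes, stride):
    """Yield every (x, y, w, h) window that fits inside the image."""
    for w, h in sizes:
        for y in range(0, height - h + 1, stride):
            for x in range(0, width - w + 1, stride):
                yield (x, y, w, h)


def detect(width, height, sizes, stride, score_window, threshold):
    """Keep the windows whose classifier score reaches the threshold."""
    return [
        box
        for box in sliding_windows(width, height, sizes, stride)
        if score_window(box) >= threshold
    ]


# Assumed settings: a 640x480 image, three square window sizes, stride of 8.
sizes = [(64, 64), (128, 128), (256, 256)]
num_windows = sum(1 for _ in sliding_windows(640, 480, sizes, 8))


def toy_score(box):
    """Stand-in for a real classifier: fires on one hard-coded window."""
    return 1.0 if box == (64, 64, 128, 128) else 0.0


detections = detect(640, 480, sizes, 8, toy_score, threshold=0.5)
```

Even with only three window sizes and a coarse stride, `num_windows` already runs into the thousands, and the classifier is called once per window. Adding more scales, aspect ratios or a finer stride multiplies that number quickly.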

Our group has proposed a radically different approach to detection. Detection is performed using a single global image descriptor, which significantly reduces its cost [ICCV13].


Object detection and image segmentation find many applications across the different lines of business of Xerox, such as transportation and retail.


[BMVC13] What is a good evaluation measure for semantic segmentation? Gabriela Csurka, Diane Larlus, Florent Perronnin. BMVC 2013.

[ICCV13] Predicting an Object Location using a Global Image Representation. José A. Rodriguez, Diane Larlus. ICCV 2013.

[ICVGIP12] On the use of Regions for Semantic Image Segmentation. Rui Hu, Diane Larlus, Gabriela Csurka. ICVGIP 2012.

[ECCV12] Semantic Image Segmentation Using Visible and Near-Infrared Channels. Neda Salamati, Diane Larlus, Gabriela Csurka, Sabine Süsstrunk. Workshop on Color and Photometry in Computer Vision at ECCV 2012.

[BMVC08] A Simple High Performance Approach to Semantic Segmentation. Gabriela Csurka, Florent Perronnin. BMVC 2008.