Our Research

Visual search is the computer vision problem of predicting whether two images or videos display the same content, for instance the same object or the same person. This is a fundamental research problem with practical applications ranging from query-by-example search in image/video datasets to vehicle re-identification in multi-camera networks.

To match content reliably, we need a robust metric for comparing objects in images or videos despite differences in viewpoint, lighting conditions, or occlusions. The first challenge, then, is to extract visual signatures that are informative yet robust to such confounding factors. Recently, deep learning techniques have allowed us to go beyond simply extracting visual signatures: we can now learn, directly from the image pixels, how to represent images for the image search task.
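The matching step can be illustrated with a minimal numpy sketch: once each image is mapped to a fixed-length signature (here stand-in random vectors; in practice the output of a learned model), comparison reduces to cosine similarity between L2-normalized vectors. The names and sizes below are illustrative assumptions, not a specific system.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cosine_scores(query, database):
    """Cosine similarity between one query signature and all database signatures."""
    return l2_normalize(database) @ l2_normalize(query)  # shape: (n_database,)

rng = np.random.default_rng(0)
# Stand-in for learned 128-d signatures of 1000 database images.
database = rng.normal(size=(1000, 128)).astype(np.float32)
# A query depicting the same content as image 42 (simulated by a small perturbation).
query = database[42] + 0.05 * rng.normal(size=128).astype(np.float32)

scores = cosine_scores(query, database)
print(int(np.argmax(scores)))  # the matching image ranks first
```

With robust signatures, images of the same content land close in this space, so ranking by similarity retrieves the match despite the perturbation.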


One recurrent problem is the need to compare an image or video not with a single image or video, but with millions, if not billions, of them. When dealing with such vast amounts of visual content, two considerations are paramount. The first is computational cost: computing the distance between two visual signatures should rely on efficient operations. The second is memory cost: the memory footprint of each signature should be small enough that all database signatures fit in the memory of the machines. To address these two interrelated issues, we proposed several efficient compression techniques with state-of-the-art results on large-scale image retrieval datasets containing up to 100M images.
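One classical way to meet both constraints is to compress signatures into compact binary codes and compare them with Hamming distance. The numpy sketch below shows the idea with simple per-dimension sign binarization; the thresholds and sizes are illustrative, not the specific compression techniques mentioned above.

```python
import numpy as np

def binarize(x, thresholds):
    """Quantize real-valued signatures to 1 bit per dimension, packed into bytes."""
    bits = (x > thresholds).astype(np.uint8)
    return np.packbits(bits, axis=-1)  # 128 float32 dims (512 B) -> 16 bytes

def hamming_distances(code, codes):
    """Hamming distance between one packed code and a database of packed codes."""
    xor = np.bitwise_xor(codes, code)          # differing bits
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

rng = np.random.default_rng(1)
# Stand-in for 100k learned 128-d signatures (~51 MB as float32).
signatures = rng.normal(size=(100_000, 128)).astype(np.float32)
thresholds = np.median(signatures, axis=0)     # balance 0/1 bits across the database

codes = binarize(signatures, thresholds)       # 100_000 x 16 bytes = ~1.6 MB total
query_code = binarize(signatures[7] + 0.1 * rng.normal(size=128), thresholds)

dists = hamming_distances(query_code, codes)
print(int(np.argmin(dists)))                   # the perturbed original is nearest
```

This addresses both costs at once: the database shrinks by a factor of 32, and distance computation reduces to XOR and bit counting, operations that are cheap on modern hardware.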

Another key aspect of visual search is that visual content does not exist in isolation. Every instance is part of a wider context: business workflows, textual information, social networks, links to databases, user interactions, and more, all of which carry potentially useful information. This information is often inconsistent, unstructured, and only partially observable. Hence, one active line of research in the group has been how to leverage this context to improve visual search results.

Selected publications: