Visual search is the computer vision problem that involves predicting whether two images or videos display the same content, for instance the same object or the same person. This is a fundamental research problem with practical applications ranging from query-by-example search in image/video datasets to vehicle re-identification in multi-camera networks.
For matching, we need a robust way to compare objects in images or videos despite differences in viewpoint, lighting conditions, or occlusions. The first challenge is therefore to extract visual signatures that are informative yet robust to such confounding factors. Recently, deep learning techniques have allowed us to go beyond simply extracting visual signatures: we can now learn, directly from the image pixels, how to build representations optimized for the image search task.
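The text does not say how such representations are trained. As an illustration only, a common objective for learning retrieval descriptors is a triplet ranking loss: a query descriptor should be closer to a matching image than to a non-matching one by some margin. The sketch below computes that loss for a single triplet in plain Python; the function names and the margin of 0.1 are hypothetical choices, and in practice the loss would be minimized over a network's weights with a deep learning framework rather than evaluated by hand.

```python
import math

# Illustrative sketch: an assumed training objective, not necessarily the
# exact setup used in the group's work. Names and margin are hypothetical.

def l2_normalize(v):
    """Return v scaled to unit L2 norm, so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_ranking_loss(query, positive, negative, margin=0.1):
    """Hinge loss for one (query, positive, negative) triplet of descriptors.

    The loss is zero once the query is closer to the matching image (positive)
    than to the non-matching one (negative) by at least `margin`; otherwise it
    grows with the size of the violation. Descriptors are L2-normalized first.
    """
    q, p, n = (l2_normalize(v) for v in (query, positive, negative))
    return max(0.0, margin + squared_distance(q, p) - squared_distance(q, n))
```

For example, `triplet_ranking_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0])` returns 0.0: after normalization the positive coincides with the query, while the negative lies at squared distance 2.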
A recurring problem is the need to compare an image or video not with a single other item but with millions, if not billions, of them. When dealing with vast amounts of visual content, two considerations are of paramount importance. The first is computational cost: computing the distance between two visual signatures should rely on efficient operations. The second is memory cost: each signature should be small enough that all database image signatures fit in the memory of the machines. To address these two interrelated issues, we have proposed several efficient compression techniques that achieved state-of-the-art results on large-scale image retrieval datasets containing up to 100M images.
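One standard way to meet both constraints is to compress each signature into a short binary code and compare codes with the Hamming distance, which reduces to an XOR and a bit count. As a rough illustration, assuming 4,096-dimensional float32 descriptors, an uncompressed signature takes about 16 KB per image (about 1.6 TB for 100M images), whereas a 64-bit code takes 8 bytes (800 MB for 100M images). The sketch below uses random-projection hashing as a simple stand-in: each bit records the sign of the descriptor's projection onto a random hyperplane. Learned schemes such as Iterative Quantization instead choose the projection from data, so this is not the group's method; all names and parameters are hypothetical.

```python
import random

# Illustrative sketch: random-projection hashing as a stand-in for learned
# binary codes. Function names and parameters are hypothetical.

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def encode(vector, hyperplanes):
    """Pack the signs of the projections into one integer, one bit per hyperplane."""
    code = 0
    for bit, h in enumerate(hyperplanes):
        if sum(x * w for x, w in zip(vector, h)) > 0:
            code |= 1 << bit
    return code

def hamming(a, b):
    """Number of differing bits: an XOR followed by a population count."""
    return bin(a ^ b).count("1")

def search(query_code, database_codes, k=5):
    """Brute-force ranking of database indices by Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))[:k]
```

Python integers keep the sketch short; a production index would store codes in fixed-width machine words and use a hardware popcount instruction, which is what makes scanning millions of codes cheap.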
Another key aspect of visual search is that visual content does not exist in isolation. Every instance is part of a wider context, such as business workflows, textual information, social network information, relationships to databases, and user interactions, all of which potentially carry useful information. This information is often inconsistent, unstructured, and only partially observable. Hence, one active line of research in the group has been how to leverage it to improve visual search results.
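The paragraph leaves open how such context is combined with visual similarity. One of the simplest options, shown here only as an illustration, is late fusion: compute a visual score and a textual score for each database item separately, rescale each to a common range, and rank by a weighted sum. The min-max rescaling, the weight `alpha`, and the function names are assumptions for this sketch, not necessarily the fusion operators studied in the group's work.

```python
# Illustrative sketch of weighted late fusion; the normalization scheme,
# the weight alpha, and all names are hypothetical choices.

def min_max(scores):
    """Rescale a dict of scores to [0, 1] so the two modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (s - lo) / (hi - lo) for k, s in scores.items()}

def late_fusion(visual, textual, alpha=0.5):
    """Rank items by alpha * visual + (1 - alpha) * textual normalized scores.

    An item missing from one modality gets 0 for it, since side information
    is often only partially observed.
    """
    v, t = min_max(visual), min_max(textual)
    items = set(v) | set(t)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * t.get(d, 0.0) for d in items}
    return sorted(fused, key=fused.get, reverse=True)
```

For example, with visual scores `{"a": 0.9, "b": 0.5, "c": 0.1}` and textual scores `{"b": 3.0, "c": 1.0}`, the default `alpha=0.5` ranks the items `["b", "a", "c"]`: item "b" is only moderately similar visually but strongly supported by its text.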
“Deep Image Retrieval: Learning global representations for image search”, Albert Gordo, Jon Almazán, Jerome Revaud, Diane Larlus, ECCV, 2016.
“LEWIS: Latent Embeddings for Word Images and their Semantics”, Albert Gordo, Jon Almazán, Naila Murray, Florent Perronnin, ICCV, 2015.
“Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval”, Yunchao Gong, Svetlana Lazebnik, Albert Gordo, Florent Perronnin, IEEE TPAMI, 2013.
“Aggregating Local Image Descriptors into Compact Codes”, Hervé Jégou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Pérez, Cordelia Schmid, IEEE TPAMI, 2012.
“A Model-Based Sequence Similarity with Application to Handwritten Word Spotting”, José A. Rodríguez-Serrano, Florent Perronnin, IEEE TPAMI, 2012.
“Leveraging category-level labels for instance-level image retrieval”, Albert Gordo, José A. Rodríguez-Serrano, Florent Perronnin, Ernest Valveny, CVPR, 2012.
“Data-Driven Vehicle Identification by Image Matching”, José A. Rodríguez-Serrano, Harsimrat Sandhawalia, Raja Bala, Florent Perronnin, Craig Saunders, ECCV Workshop, 2012.
“An empirical study of fusion operators for multimodal image retrieval”, Gabriela Csurka, Stéphane Clinchant, CBMI, 2012.
“XRCE's Participation at Wikipedia Retrieval of ImageCLEF 2011”, Gabriela Csurka, Stéphane Clinchant, Adrian Popescu, CLEF Notebook Papers, 2011.
“Medical image modality classification and retrieval”, Gabriela Csurka, Stéphane Clinchant, Guillaume Jacquet, CBMI, 2011.
“Large-scale image retrieval with compressed Fisher vectors”, Florent Perronnin, Yan Liu, Jorge Sánchez, Hervé Poirier, CVPR, 2010.