Phone: +33 (0)4 76 61 50 17
Email: florent.perronnin@xrce.xerox.com
My name is Florent Perronnin, welcome to my home page.
I am the manager of the Computer Vision group and a Principal Scientist at the Xerox Research Center Europe (XRCE) in Grenoble, France. My interests are in the practical application of machine learning to computer vision tasks such as image classification, retrieval or segmentation.
Here are links to my Google Scholar and DBLP profiles.
I obtained my Engineering Degree in 2000 from the Ecole Nationale Supérieure des Télécommunications (Paris, France), known today as Telecom ParisTech. From 2000 to 2001, I was a Research Engineer at the Panasonic Speech Technology Laboratory (Santa Barbara, California), working on speech and speaker recognition. At the end of 2001, I joined the Multimedia Communications Department of the Institut Eurecom, where I pursued a Ph.D. on face recognition in collaboration with France Telecom Research (now Orange Labs). I obtained my Ph.D. degree in 2004 from the Ecole Polytechnique Federale de Lausanne (Lausanne, Switzerland). In 2005, I joined the Xerox Research Center Europe as a Research Scientist, and in 2010 I was appointed Senior Scientist.
Here is a (non-exhaustive) list of research projects I have been involved in. The list of related publications can be found here.
We are interested in characterizing the content of an image (e.g. a photograph, a painting, a document or a drawing) with a "robust" signature, i.e. a representation that is invariant to changes in viewpoint and scale, to variations in lighting or to occlusion. State-of-the-art image signatures all rely on a similar principle: they aggregate descriptors computed on local patches extracted at multiple scales. However, finding the "best" aggregation strategy is still an active research topic. We have proposed several image representations and have found the Fisher Vector to work extremely well in a variety of computer vision tasks, including image classification, retrieval and segmentation. It is also extremely simple to implement and very efficient to compute.
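To make the aggregation step concrete, here is a minimal sketch of a Fisher Vector computed from a set of local descriptors. It assumes a diagonal-covariance Gaussian mixture model (GMM) has already been trained on descriptors (its weights, means and variances are inputs), takes gradients with respect to the means and standard deviations, and applies power ("signed square root") and L2 normalization. The function name `fisher_vector` and the choice of normalizations are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Aggregate local descriptors into a Fisher Vector (sketch).

    X: (T, D) local descriptors from one image.
    weights: (K,), means: (K, D), variances: (K, D) of a diagonal GMM (assumed pre-trained).
    Returns a vector of length 2*K*D.
    """
    T, D = X.shape
    sigma = np.sqrt(variances)
    # Whitened differences between every descriptor and every Gaussian: (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian: (T, K)
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigma), axis=1)[None, :]
                - 0.5 * D * np.log(2 * np.pi))
    # Soft assignments (posteriors) gamma_t(k), computed stably in log space
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients w.r.t. the means and the standard deviations, one D-dim block per Gaussian
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

For example, with a 64-Gaussian GMM on 64-dimensional descriptors, every image, whatever its number of patches, maps to a fixed-length vector of 2 × 64 × 64 = 8192 dimensions, which a linear classifier can consume directly.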
Within the computer vision community, annotated data used to be viewed as a scarce resource. However, the availability of platforms to "crowdsource" the annotation process has radically changed this perception in the past few years. Scaling to thousands of classes and millions of images raises new challenges, especially in terms of computational and storage efficiency. Our system is efficient in both respects: it uses compressed Fisher Vectors to represent images compactly and Stochastic Gradient Descent (SGD) to learn image classifiers efficiently. It performed very well in the ImageNet Large Scale Visual Recognition Challenges (ILSVRC) 2010 and 2011.
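The classifier-learning side can be sketched as one linear classifier per class trained by SGD. This minimal version assumes an L2-regularized hinge loss (a linear SVM) and a decreasing step size; the loss, the step-size schedule and the names `sgd_linear_svm` and `train_one_vs_rest` are illustrative assumptions rather than the exact system described above, and the compression of the Fisher Vectors is not shown.

```python
import numpy as np

def sgd_linear_svm(X, y, lam=1e-4, epochs=5, eta0=1.0, seed=0):
    """Binary linear SVM trained by SGD on the L2-regularized hinge loss.

    X: (N, D) image signatures; y: (N,) labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w, b, t = np.zeros(D), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(N):  # visit samples in random order
            t += 1
            eta = eta0 / (1.0 + lam * eta0 * t)  # decreasing step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1.0 - eta * lam)  # step on the regularizer
            if margin < 1.0:  # hinge loss active: step on the loss
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def train_one_vs_rest(X, labels, n_classes, **kw):
    """One independent binary classifier per class."""
    W = np.zeros((n_classes, X.shape[1]))
    B = np.zeros(n_classes)
    for c in range(n_classes):
        y = np.where(labels == c, 1.0, -1.0)
        W[c], B[c] = sgd_linear_svm(X, y, **kw)
    return W, B
```

Each SGD step touches a single sample, so memory use does not grow with the size of the training set, and the per-class loop parallelizes trivially. A test image is then assigned to `np.argmax(x @ W.T + B)`.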
Query-by-example image retrieval is the problem of returning the closest matches to a given query image within a dataset. When the database contains a large number of images, two considerations are of paramount importance. The first is computational cost: computing the distance between two image signatures should rely on efficient operations. The second is memory cost: the signatures should be compact enough that all database image signatures fit in RAM. We have proposed efficient image retrieval techniques that make it possible to search a database of a hundred million images with high accuracy, in real time, on a single server.
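One standard way to address both costs at once is product quantization (PQ) with asymmetric distance computation; it is shown here purely as an illustration and is not necessarily the technique used in the work described above. Each D-dimensional signature is split into `m` sub-vectors, each quantized with its own small k-means codebook, so a database image is stored as `m` bytes. At query time the uncompressed query is compared against the codebooks once, and each database distance then costs only `m` table lookups. The class `ProductQuantizer` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # leave empty clusters unchanged
                C[j] = members.mean(0)
    return C

class ProductQuantizer:
    """Splits D-dim vectors into m sub-vectors, each quantized with its own codebook."""

    def __init__(self, m, k=256):
        assert k <= 256, "codes are stored as uint8"
        self.m, self.k = m, k

    def _sub(self, X, i):
        return X[..., i * self.ds:(i + 1) * self.ds]

    def fit(self, X):
        assert X.shape[1] % self.m == 0
        self.ds = X.shape[1] // self.m
        self.codebooks = [kmeans(self._sub(X, i), self.k, seed=i) for i in range(self.m)]
        return self

    def encode(self, X):
        """Compress each vector to m one-byte codes."""
        codes = np.empty((len(X), self.m), dtype=np.uint8)
        for i, C in enumerate(self.codebooks):
            codes[:, i] = ((self._sub(X, i)[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        return codes

    def search(self, q, codes, topk=10):
        """Asymmetric distance: the query stays uncompressed, the database is codes."""
        # (m, k) table of squared distances from each query sub-vector to each centroid
        table = np.stack([((self._sub(q, i) - C) ** 2).sum(-1)
                          for i, C in enumerate(self.codebooks)])
        # m lookups and additions per database item
        dists = table[np.arange(self.m), codes].sum(1)
        order = np.argsort(dists)[:topk]
        return order, dists[order]
```

With `m = 8` and `k = 256`, each signature occupies 8 bytes instead of 4·D bytes as float32, which is what makes it feasible to keep very large databases in RAM; the price is that returned distances are approximate (they are exact distances to the quantized reconstructions).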
Semantic image segmentation is the problem of assigning each pixel in an image to one of a set of predefined semantic categories. State-of-the-art semantic segmentation algorithms typically consist of three components: a local appearance model, a local consistency model and a global consistency model. These three components are generally tightly integrated into a unified framework, which makes training and inference computationally costly. We have proposed a decoupled system which can be trained extremely efficiently. Although in principle suboptimal, our system achieved state-of-the-art accuracy in the PASCAL VOC 2008 segmentation challenge.
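A minimal sketch of how the three components can be decoupled at inference time. It assumes the local appearance model has already produced per-pixel class posteriors (for instance by classifying patch-level signatures) and that an image-level classifier has produced one score per class. Local consistency is approximated here by box-filter smoothing of the posteriors, and global consistency by discarding classes the image-level classifier deems absent. The names `box_smooth` and `segment`, the threshold and the window radius are hypothetical choices, not the exact system described above.

```python
import numpy as np

def box_smooth(prob, radius):
    """Average each class map over a (2r+1)x(2r+1) window using an integral image."""
    H, W, C = prob.shape
    k = 2 * radius + 1
    padded = np.pad(prob, ((radius, radius), (radius, radius), (0, 0)), mode='edge')
    integral = np.zeros((H + k, W + k, C))
    integral[1:, 1:] = padded.cumsum(0).cumsum(1)
    window_sum = (integral[k:, k:] - integral[:-k, k:]
                  - integral[k:, :-k] + integral[:-k, :-k])
    return window_sum / (k * k)

def segment(pixel_probs, image_scores, threshold=0.5, radius=2):
    """Combine independently computed components into a label map.

    pixel_probs: (H, W, C) posteriors from the local appearance model.
    image_scores: (C,) image-level class scores (global consistency).
    """
    smoothed = box_smooth(pixel_probs, radius)  # local consistency
    keep = image_scores >= threshold
    keep[np.argmax(image_scores)] = True  # always keep the most likely class
    smoothed = smoothed * keep[None, None, :]  # suppress classes judged absent
    return smoothed.argmax(axis=2)
```

Because each component is trained and applied separately, any one of them can be replaced (for example a better image-level classifier) without retraining the others.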
Handwritten word-spotting is the task of detecting one or more keywords in a handwritten document. While the document analysis literature has primarily viewed it as a query-by-example problem, we have proposed to address it as an object detection problem. Our system, which combines robust features inspired by the computer vision literature with robust statistical models that cope with scarce training data, has shown excellent performance on real-world datasets. Handwritten word-spotting is part of the Xerox Smart Document Technologies (SDT) suite.
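Treating word-spotting as detection means scoring each candidate word image and thresholding the score. A common formulation, sketched here under stated assumptions, is a likelihood-ratio test: the feature sequence extracted from the word image (one vector per column or frame) is scored under a left-to-right hidden Markov model (HMM) of the keyword and normalized by a frame-independent background GMM. The Gaussian emissions, the fixed self-loop probability and all function names are hypothetical; this is not presented as the author's system.

```python
import numpy as np

def log_gauss(X, mean, var):
    """Diagonal Gaussian log-density of each row of X (T, D)."""
    return -0.5 * (np.sum((X - mean) ** 2 / var, axis=1) + np.sum(np.log(2 * np.pi * var)))

def hmm_log_likelihood(X, means, variances, self_loop=0.5):
    """Forward algorithm for a left-to-right HMM that starts in the first state and ends in the last."""
    T, S = len(X), len(means)
    emit = np.stack([log_gauss(X, means[s], variances[s]) for s in range(S)], axis=1)  # (T, S)
    log_stay, log_move = np.log(self_loop), np.log(1.0 - self_loop)
    alpha = np.full(S, -np.inf)
    alpha[0] = emit[0, 0]
    for t in range(1, T):
        stay = alpha + log_stay
        move = np.concatenate([[-np.inf], alpha[:-1] + log_move])
        alpha = np.logaddexp(stay, move) + emit[t]
    return alpha[-1]

def background_log_likelihood(X, weights, means, variances):
    """Frame-independent GMM ('filler') model used to normalize the keyword score."""
    comp = np.stack([np.log(w) + log_gauss(X, m, v)
                     for w, m, v in zip(weights, means, variances)], axis=1)
    return np.logaddexp.reduce(comp, axis=1).sum()

def keyword_score(X, kw_model, bg_model):
    """Per-frame log-likelihood ratio; a detection is declared above a tuned threshold."""
    return (hmm_log_likelihood(X, *kw_model) - background_log_likelihood(X, *bg_model)) / len(X)
```

Normalizing by the background model makes scores comparable across word images of different lengths and writing styles, so a single threshold can trade off precision against recall.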