Learning image and video representations

Unit: Computer Vision

Diane Larlus

Duration: 5 to 6 months
Start Date: September 2016

Computer vision research, as well as several other fields, has seen a significant boost in performance since hand-crafted features carefully designed for the task at hand were replaced by deep features trained end-to-end, such as those learned by convolutional neural networks. Despite their broad success, these deep features are not without limitations. First, they rely on very large annotated datasets: training them requires hundreds of thousands, if not millions, of examples. Second, designing their architecture involves many difficult choices, such as the number of layers, their size, and their order. These choices have a large impact on performance, yet they lack theoretical justification.

In this internship, we would like to explore ways to overcome these limitations. One possible line of work would consider weakly supervised or unsupervised scenarios that exploit large amounts of video sequences. Another line of research would be to learn several models jointly. This research direction would build on Convolutional Neural Fabrics. Possible applications of the newly built deep networks include, but are not limited to, image categorization, object localization, video classification, and anomaly detection.

Applicants should be enrolled in a graduate program (PhD is a plus). They should have strong knowledge of computer vision and machine learning and a good programming background in Python and C/C++. Experience with deep learning and frameworks such as Caffe or Theano is a plus.

The successful candidate will be part of the Computer Vision group at XRCE and will work with the researchers in the group. The intern will be given the freedom and flexibility to find his/her own solutions and to work in a way that suits him/her, with the guidance and support of experienced full-time Xerox researchers, and will thereby gain an introduction to the field of commercial research. This internship is also part of a collaboration with the world-class INRIA laboratory in Grenoble, in particular with Jakob Verbeek from the Thoth team.
For further details, please contact Diane Larlus.

