International Conference on Computer Vision (ICCV) 2015; Santiago, Chile
Abstract: The goal of this work is to bring semantics into the tasks of text recognition and retrieval in natural images.
Although text recognition and retrieval have received a lot of attention in recent years, previous works have focused on recognizing or retrieving exactly the same word used as a query, without taking the semantics into consideration.
For this goal we propose a convolutional neural network (CNN) with a weighted ranking loss objective that ensures that the concepts relevant to the query image are ranked ahead of those that are not relevant.
This can also be interpreted as learning a Euclidean space where word images and concepts are jointly embedded.
This model is learned in an end-to-end manner, from image pixels to semantic concepts, using a dataset of synthetically generated word images and concepts mined from a lexical database (WordNet).
Our results show that, despite the complexity of the task, word images and concepts can indeed be associated with a high degree of accuracy.