Statistical Relational Learning

The era of big data is just beginning, and with it comes a rise of new methodological tools that aim to analyze and predict data automatically, without following the standard but complex data analytics process of data acquisition, modelling, algorithm development and real-world validation. Perhaps surprisingly, the most effective predictive models are the linear models invented in the early days of statistics and computing.

Their success comes mainly from their ability to scale linearly with the number of observations, which allows them to be applied to massive amounts of data.
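As a rough illustration of this linear scaling, the sketch below fits a linear model with stochastic gradient descent, where each pass touches every observation exactly once. It is not the method used in this research; the function name `sgd_linear_regression`, the learning rate, and the synthetic data are all assumptions chosen for the example.

```python
import random


def sgd_linear_regression(rows, n_features, lr=0.05, epochs=5):
    """Fit y ~ w.x + b with plain stochastic gradient descent.

    Every epoch visits each observation once, so the total cost is
    O(epochs * n_observations * n_features): linear in the data size.
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            # Prediction and error for a single observation.
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            # Gradient step on the squared error of this observation.
            for i in range(n_features):
                w[i] -= lr * err * x[i]
            b -= lr * err
    return w, b


# Hypothetical noise-free data generated from y = 2*x0 - 3*x1 + 0.5.
rng = random.Random(0)
data = []
for _ in range(5000):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 2.0 * x[0] - 3.0 * x[1] + 0.5))

w, b = sgd_linear_regression(data, n_features=2)
print(w, b)
```

Because the model never needs more than one observation in memory at a time, the same loop could in principle read its rows from a stream or a database cursor rather than from a list.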

Statistical Relational Learning covers algorithms that can predict or correct values in a relational database by exploiting the redundancies and similarities in the data. The goal is to derive models and algorithms that are generic enough to perform reasonably well on a large variety of problems, including recommendation, time series prediction, text and image categorization, and approximate logical reasoning. By developing a unified framework for predictive queries, we aim to perform any type of supervised prediction task on top of an existing knowledge base, including categorization, regression and outlier detection. This research direction goes hand in hand with the Linked Open Data initiative, which promotes the exchange of information in a single, common format. In a statistical relational learning application, this format can be used directly by the predictive algorithms. Ultimately, this research will have a direct impact on Xerox services across multiple domains, such as City Mobility (predictions based on city sensor information), Healthcare (recommendation of health plans and fraud detection), Customer Care (identifying customer problems and proposing solutions), Finance (fusing multiple sources of data to anticipate market evolution), and Education (computing a personalized training program).

In short, research in this domain implements the “prescriptive analytics” vision supported by Xerox research over several years.