Embedding probabilistic logic for machine reading

5th March 2015

Sebastian Riedel, senior lecturer at University College London, London, U.K.

Abstract: We want to build machines that read and make inferences based on what they have read. A long line of work in the field has focussed on approaches where language is converted (possibly using machine learning) into a symbolic and relational representation. A reasoning algorithm (such as a theorem prover) then derives new knowledge from this representation. This allows rich knowledge to be captured, but it generally suffers from two problems: acquiring sufficient symbolic background knowledge, and coping with noise and uncertainty in the data. Probabilistic logics (such as Markov Logic) offer a solution, but are known to scale poorly.

In recent years a third alternative has emerged: latent variable models in which entities and relations are embedded in vector spaces (and thus represented "distributionally"). Such approaches scale well and are robust to noise, but they raise their own set of questions: What types of inference do they support? What is a proof in an embedding model? How can explicit background knowledge be injected into embeddings? In this talk I will first present our work on latent variable models for machine reading, which draws on matrix factorisation as well as both closed and open information extraction. I will then present recent work addressing the questions of injecting symbolic knowledge into, and extracting it from, embedding-based models. In particular, I will show how accurate relation extractors can be built rapidly by combining logic and embeddings.
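To make the matrix-factorisation view concrete, the following is a minimal, hypothetical sketch (not the actual system described in the talk): rows index entity pairs, columns index relations (surface patterns from open information extraction alongside knowledge-base relations), and sigmoid(a_i · v_j) models the probability that pair i stands in relation j. Unobserved cells are masked out of the loss, and the low-rank structure lets the model score facts never stated explicitly in text. All names and data here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def factorise(X, M, rank=4, lr=0.1, epochs=1000, seed=0):
    """Fit entity-pair embeddings A and relation embeddings V by gradient
    descent on a logistic loss over the observed cells (where M == 1)."""
    rng = np.random.default_rng(seed)
    n_pairs, n_rels = X.shape
    A = 0.1 * rng.standard_normal((n_pairs, rank))  # entity-pair embeddings
    V = 0.1 * rng.standard_normal((n_rels, rank))   # relation embeddings
    for _ in range(epochs):
        G = M * (sigmoid(A @ V.T) - X)  # masked logistic-loss gradient
        A, V = A - lr * (G @ V), V - lr * (G.T @ A)
    return A, V

# Toy universal-schema matrix. Columns: "X-professor-at-Y" (surface
# pattern), employedBy (KB relation), bornIn (KB relation).
X = np.array([[1., 1., 0.],    # pair 1: pattern seen, employedBy known
              [1., 1., 0.],    # pair 2: pattern seen, employedBy known
              [1., 0., 0.]])   # pair 3: pattern seen, employedBy unknown
M = np.array([[1., 1., 1.],
              [1., 1., 1.],
              [1., 0., 1.]])   # mask: pair 3's employedBy cell unobserved

A, V = factorise(X, M)
P = sigmoid(A @ V.T)
# Because the surface pattern co-occurs with employedBy in the observed
# rows, the model scores the unobserved employedBy fact for pair 3
# higher than the genuinely absent bornIn fact.
```

The same setup is the natural place to inject symbolic knowledge: a rule such as professor-at ⇒ employedBy can be imposed as a soft constraint on the embeddings, rather than learned only from co-occurrence.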