A link key approach to data interlinking

23rd April 2015

Jérôme Euzenat, senior research scientist at Exmo team, INRIA Rhône-Alpes, Montbonnot, France

Co-presented with Jérôme David

Abstract: Vast quantities of data are published in RDF format on the Web. Links that identify the same resource across different data sets allow the published data to be exploited jointly. Yet extracting such links is not an easy task. We develop an approach for this purpose based on extracting link keys. Link keys extend the notion of a key to the case of different data sets. They are sets of pairs of properties, belonging to two different classes, that identify objects as the same when they have the same, or intersecting, values for these properties.
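To make the definition concrete, here is a minimal Python sketch of how a link key could generate links between two small data sets. It is an illustration only, not the speakers' implementation: the function name links_from_key, the dictionary representation of the data, the property names and the sample values are all assumptions made for this example.

```python
from itertools import product

def links_from_key(key, source, target, mode="intersect"):
    """Return the pairs (s, t) that satisfy every property pair in `key`.

    `source` and `target` map an instance name to {property: set of values}.
    mode="intersect": the two value sets must share at least one value.
    mode="equal": the two value sets must be non-empty and identical.
    """
    links = set()
    for (s, s_props), (t, t_props) in product(source.items(), target.items()):
        for p, q in key:
            vs = s_props.get(p, set())
            vt = t_props.get(q, set())
            ok = (bool(vs) and vs == vt) if mode == "equal" else bool(vs & vt)
            if not ok:
                break
        else:
            # No property pair failed, so the key identifies s with t.
            links.add((s, t))
    return links

# Hypothetical data: two book catalogues with different vocabularies.
source = {
    "a1": {"title": {"Dune"}, "creator": {"Herbert"}},
    "a2": {"title": {"Emma"}, "creator": {"Austen"}},
}
target = {
    "b1": {"titre": {"Dune"}, "auteur": {"Herbert", "F. Herbert"}},
    "b2": {"titre": {"Emma"}, "auteur": {"Austen"}},
}
key = {("title", "titre"), ("creator", "auteur")}

print(links_from_key(key, source, target))           # intersecting values
print(links_from_key(key, source, target, "equal"))  # same values
```

With intersecting values both books are linked; with strictly equal values only the second pair is, because b1 lists an extra author name.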

After defining link keys, we present how candidate link keys can be extracted automatically from data, and we relate this operation to formal concept analysis. We also discuss measures for evaluating the quality of the extracted candidate link keys, depending on whether actual links are available (supervised case) or not (unsupervised case). The accuracy and robustness of these measures are illustrated on a real-world example.
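The abstract does not spell out the extraction procedure or the quality measures, so the following Python sketch rests on stated assumptions. It treats a candidate as a set of property pairs on which at least one pair of instances shares a value, closes these sets under intersection (mirroring the intents of a formal context whose objects are instance pairs and whose attributes are property pairs), and scores each candidate with precision and recall against reference links as one possible supervised measure. All function names and the sample data are hypothetical.

```python
from itertools import product

def agreement(s_props, t_props):
    """Property pairs (p, q) on which two instances share at least one value."""
    return frozenset(
        (p, q) for p, q in product(s_props, t_props) if s_props[p] & t_props[q]
    )

def candidate_link_keys(source, target):
    """Non-empty agreement sets, closed under pairwise intersection."""
    candidates = {
        agreement(sp, tp) for sp, tp in product(source.values(), target.values())
    }
    candidates.discard(frozenset())
    changed = True
    while changed:  # repeat until no new intersection appears
        changed = False
        for a, b in product(list(candidates), repeat=2):
            c = a & b
            if c and c not in candidates:
                candidates.add(c)
                changed = True
    return candidates

def links(key, source, target):
    """Instance pairs whose values intersect on every property pair of `key`."""
    return {
        (s, t)
        for (s, sp), (t, tp) in product(source.items(), target.items())
        if key <= agreement(sp, tp)
    }

def precision_recall(found, reference):
    """Supervised scores of generated links against known reference links."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical data and reference links, for illustration only.
source = {
    "a1": {"title": {"Dune"}, "creator": {"Herbert"}},
    "a2": {"title": {"Emma"}, "creator": {"Austen"}},
    "a3": {"title": {"Dune"}, "creator": {"Lynch"}},
}
target = {
    "b1": {"titre": {"Dune"}, "auteur": {"Herbert"}},
    "b2": {"titre": {"Emma"}, "auteur": {"Austen"}},
}
reference = {("a1", "b1"), ("a2", "b2")}

for k in sorted(candidate_link_keys(source, target), key=len):
    print(sorted(k), precision_recall(links(k, source, target), reference))
```

On this toy data the title-only candidate also links a3 to b1 and so loses precision, while the candidate combining title and creator recovers exactly the reference links. The brute-force comparison of all instance pairs is quadratic and is meant only to show the idea.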