Categorisation de documents PubMed pour l'annotation médicale dans SWISS-PROT [Categorisation of PubMed documents for medical annotation in SWISS-PROT]

Cyril Goutte, Pavel Dobrokhotov, Eric Gaussier, Anne-Lise Veuthey
The goal of medical annotation of human proteins in Swiss-Prot is to provide researchers working on genetic diseases and polymorphisms with all the useful information. To this end, curators must search through a vast number of publications to extract the relevant information. Natural language processing and machine learning techniques have shown promise for this problem. Our solution relies on a categorisation step that re-orders documents so that relevant articles are easier to access. Our first results are promising: relevant documents are returned within the top 40% of the list, and about 60% of the articles in the top part of the list are relevant (while only 15% of documents are relevant overall).
EGC Conférence, Atelier "Fouille de données et recherche d'informations dans des bases de données multimédia semi-structurées", Lyon, France, January 22, 2003.

