Categorisation de documents PubMed pour l'annotation médicale dans SWISS-PROT (Categorisation of PubMed documents for medical annotation in SWISS-PROT)
Cyril Goutte, Pavel Dobrokhotov, Eric Gaussier, Anne-Lise Veuthey
The goal of medical annotation of human proteins in Swiss-Prot is to provide researchers working on genetic
diseases and polymorphisms with all the useful information. To this end, curators must access and
search through a vast number of publications in order to extract the relevant information. Natural language
processing and machine learning techniques have shown promise in addressing this problem. Our solution
relies on a categorisation step that re-orders documents so that relevant articles are easier to access.
Our first results are promising: relevant documents are returned within the top 40% of the list, and
about 60% of the articles in the top part of the list are relevant (while only 15% of documents are
relevant overall).
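
The re-ordering and the two kinds of figures quoted above (how deep in the list the relevant documents reach, and how dense the top of the list is) can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not say which classifier or features are used, so the relevance scores below are hypothetical placeholders, as are the function names and the toy data.

```python
# Illustrative sketch only. The scores stand in for the output of some
# categoriser; the abstract does not specify the model, so they are made up.

def rerank(docs):
    """Re-order (score, is_relevant) pairs by decreasing score."""
    return sorted(docs, key=lambda d: d[0], reverse=True)

def precision_at_fraction(ranked_labels, fraction):
    """Share of relevant documents within the top `fraction` of the list."""
    k = max(1, int(len(ranked_labels) * fraction))
    return sum(ranked_labels[:k]) / k

def depth_for_full_recall(ranked_labels):
    """Fraction of the list that must be read to see every relevant document."""
    positions = [i for i, rel in enumerate(ranked_labels) if rel]
    if not positions:
        raise ValueError("no relevant document in the list")
    return (positions[-1] + 1) / len(ranked_labels)

# Toy collection (hypothetical): 10 documents, 3 of them relevant.
toy_docs = [
    (0.90, 1), (0.10, 0), (0.80, 1), (0.30, 0), (0.20, 0),
    (0.70, 0), (0.05, 0), (0.60, 0), (0.40, 1), (0.50, 0),
]

ranked_labels = [rel for _, rel in rerank(toy_docs)]
baseline = sum(ranked_labels) / len(ranked_labels)   # overall share of relevant docs
top_half = precision_at_fraction(ranked_labels, 0.5)  # density in the top half
depth = depth_for_full_recall(ranked_labels)          # how far down relevant docs reach

print(f"baseline={baseline:.2f} top_half={top_half:.2f} depth={depth:.2f}")
```

On a real collection, the scores would come from the trained categoriser, and the two measures would be compared against the overall share of relevant documents, as in the figures reported above.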
EGC Conférence, Atelier "Fouille de données et recherche d'informations dans des bases de données
multimédia semi-structurées", Lyon, France, January 22, 2003.