A Probabilistic information retrieval approach to medical annotation in SWISS-PROT

Pavel Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier
The goal of medical annotation of human proteins in SWISS-PROT is to add features specifically intended for researchers working on genetic diseases and polymorphisms. This requires searching through a vast number of publications for relevant information. Applying natural language processing and machine learning techniques to this problem has produced promising results. Using the probabilistic latent categoriser on representative query sets, 69% recall and 59% precision were achieved for relevant documents. The classifier also rejected irrelevant abstracts with more than 96% precision. Better linguistic pre-processing of the source documents can further improve the results of such a computer-based approach.
Proceedings of Medical Informatics Europe (MIE2003), Saint-Malo, France, May 4-7, 2003.
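
The recall and precision figures quoted in the abstract follow the usual set-based definitions for document retrieval. The sketch below is only an illustration of those definitions, not the authors' evaluation code: the function name `precision_recall` and the toy document identifiers are assumptions made up for this example and do not come from the paper.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted document IDs
    measured against the set of truly relevant document IDs."""
    predicted = set(predicted)
    relevant = set(relevant)
    true_positives = len(predicted & relevant)
    # Precision: fraction of the returned documents that are relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the relevant documents that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical IDs, not taken from the paper).
predicted_relevant = {"doc1", "doc2", "doc3", "doc4"}
truly_relevant = {"doc1", "doc2", "doc5"}

precision, recall = precision_recall(predicted_relevant, truly_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same function can be applied to the rejected class by passing the set of abstracts the classifier rejected and the set of abstracts that are truly irrelevant, which is the sense in which a precision is reported for irrelevant abstracts.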

