A probabilistic information retrieval approach to medical annotation in SWISS-PROT
Pavel Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier
The goal of medical annotation of human proteins in SWISS-PROT is to add features specifically intended for
researchers working on genetic diseases and polymorphisms. For this purpose, it is necessary to search
through a vast number of publications containing relevant information. Promising results have been obtained
by applying natural language processing and machine learning techniques to this problem.
By using the probabilistic latent categoriser on representative query sets, 69% recall and 59% precision were
achieved for relevant documents. This classifier also rejected irrelevant abstracts with more than 96%
precision. Better linguistic pre-processing of source documents can further improve results of such computer
Proceedings of Medical Informatics Europe (MIE2003), Saint Malo, France, May 4-7, 2003.
dobrokhotov03probabilistic.pdf (82.95 kB)