Combining NLP and probabilistic categorisation for document and term selection for SWISS-PROT Medical annotation

Pavel Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier
Motivation: Looking for relevant publications for manual database annotation is a tedious task. In this paper, we show that the combination of natural language processing (NLP) and classification tools can help re-rank the documents returned by PubMed according to their relevance to SWISS-PROT annotation.
Results: With a probabilistic latent categoriser (PLC), we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the symmetric Kullback-Leibler divergence to determine the most discriminating words for SWISS-PROT medical annotation. This information should help curators better understand the classification results, and it is also of great value for fine-tuning the linguistic pre-processing of documents, which in turn can improve overall classifier performance.
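To illustrate the term-ranking idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that two class-conditional term distributions (relevant and non-relevant documents) are already available. The function name, the class names, and the term probabilities are hypothetical and made up for the example. Each term contributes (p - q) * log(p / q) to the symmetric Kullback-Leibler divergence, and terms are ranked by that contribution.

```python
import math


def symmetric_kl_contributions(p, q):
    """Per-term contributions to the symmetric KL divergence.

    p and q map each term to a strictly positive probability and must
    share the same vocabulary. Each contribution is non-negative.
    """
    return {w: (p[w] - q[w]) * math.log(p[w] / q[w]) for w in p}


# Hypothetical term probabilities for two document classes (assumed data).
p_relevant = {"mutation": 0.5, "patient": 0.3, "protein": 0.2}
p_other = {"mutation": 0.1, "patient": 0.3, "protein": 0.6}

contributions = symmetric_kl_contributions(p_relevant, p_other)

# Terms with the largest contribution discriminate best between the classes.
ranked = sorted(contributions, key=contributions.get, reverse=True)

# The symmetric divergence itself is the sum of the per-term contributions.
divergence = sum(contributions.values())
```

In this toy vocabulary, a term with the same probability in both classes contributes nothing, while terms whose probabilities differ strongly in either direction rank highest. A real system would also need smoothing so that no term has zero probability in either class.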
Proceedings of the 11th International Conference on Intelligent Systems for Molecular Biology (ISMB 2003)


dobrokhotov03combining.pdf (383.27 kB)