Publications
Authors:
  • Pavel Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier
Citation:
Proceedings of the 11th International Conference on Intelligent Systems for Molecular Biology (ISMB 2003)
Abstract:
Motivation: Looking for relevant publications for manual database annotation is a tedious task. In this paper, we
show that the combination of natural language processing (NLP) and classification tools can help re-rank
the documents returned by PubMed according to their relevance to SWISS-PROT annotation.
Results: With a probabilistic latent categoriser (PLC), we obtained 69% recall and 59% precision for relevant
documents in a representative query. As the PLC technique provides the relative contribution of each term to
the final document score, we used the symmetric Kullback-Leibler divergence to determine the most
discriminating words for SWISS-PROT medical annotation. This information should help curators better
understand classification results and is also of great value for fine-tuning the linguistic pre-processing of
documents, which in turn can improve the overall classifier performance.
Year:
2003
Report number:
2003/009