Publications
Authors:
  • Cyril Goutte, Eric Gaussier, Nicola Cancedda, Hervé Dejean
Citation:
JADT 2004, 7èmes journées internationales analyse statistique des données textuelles, Louvain-la-Neuve, Belgium, 10-12 March 2004.
Abstract:
Annotating biomedical text for Named Entity Recognition (NER) is usually a tedious and expensive process,
while unannotated data is freely available in large quantities. It therefore seems relevant to address biomedical
NER using Machine Learning techniques that learn from a combination of labelled and unlabelled data. We
consider two approaches: one is discriminative, using Support Vector Machines, the other generative, using
mixture models. We compare the two on a biomedical NER task with various levels of annotation, and
different similarity measures. We also investigate the use of Fisher kernels as a way to leverage the strengths
of both approaches. Overall, the discriminative approach using standard similarity measures seems to
outperform both the generative approach and the Fisher kernels.
Year:
2004
Report number:
2003/079