Corpus-Based vs. Model-Based Selection of Relevant Features

Cyril Goutte, Pavel Dobrokhotov, Eric Gaussier, Anne-Lise Veuthey
In this contribution, we review a number of approaches to feature selection, divided into two broad classes. Some are corpus-based, i.e. they use only the data to assess the relevance of each feature, and aim at identifying a small subset of relevant features on which to train categorisation models. Others are model-based, i.e. they assess the relevance of each feature on the basis of the model used for categorisation. This second class of measures makes it easier to understand the model's decisions. Furthermore, comparing the two classes provides insight into whether corpus-based feature extraction is selective enough and does not overgenerate compared to model-based selection. Our experimental comparison is mainly based on a collection of medical abstracts provided by the Swiss Institute of Bioinformatics.
Proceedings of CORIA04, Toulouse, France, March 10-12, 2004, pp. 75-88.