A Detailed Analysis of English Stemming Algorithms

David Hull, Greg Grefenstette
We present a study comparing the performance of traditional stemming algorithms based on suffix removal with linguistic methods that perform morphological analysis. The results indicate that most conflation algorithms perform about 5% better than no stemming, and that there is little difference between methods in terms of average performance. However, a detailed analysis of individual queries indicates that performance at this level is often highly sensitive to the choice of stemming technique. From this analysis, we suggest a number of ways to modify linguistic approaches so that they are better suited to the stemming problem.
Xerox technical report


DHull-GGrefenstette-Technical-report-MLTT96.pdf (1.14 MB)