Problems and Approaches to Cross Language Information Retrieval

Greg Grefenstette
As information becomes globally accessible, queries may return documents in foreign languages. These documents can be regarded either as a nuisance to be discarded or as an unmined source of potential answers to the original query. The second view underlies the new research area called Cross Language Information Retrieval. This paper discusses the problems in this area and the research techniques currently being explored for finding documents in languages other than that of the original query. We cover language identification methods (n-gram and short-word techniques), techniques for automatically generating queries in other languages (stemming and morphological analysis; dictionary-based, corpus-based, and machine-translation-based translation; query conflation), and the retrieval and merging of query results (weighting schemes).
ASIS Annual Meeting, Pittsburgh, PA, October 26-29, 1998.
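
The abstract names two language identification methods, n-grams and short words, without showing either. The following is only a minimal sketch of the general idea, not the method used in the paper. The sample sentences, the function names (trigrams, short_words, identify), the four-letter cutoff for "short" words, and the simple overlap score are all assumptions made for illustration.

```python
from collections import Counter

# Toy training samples: invented for illustration, not data from the paper.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "is in the house with all of the other animals",
    "french": "le renard brun saute par dessus le chien paresseux et le chat "
              "est dans la maison avec tous les autres animaux",
    "german": "der schnelle braune fuchs springt hinter den faulen hund und "
              "die katze ist in dem haus mit allen anderen tieren",
}


def trigrams(text):
    """Count character trigrams, padding the text with one space per side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def short_words(text, max_len=4):
    """Count words of at most max_len letters (the cutoff is an assumption)."""
    return Counter(w for w in text.lower().split() if len(w) <= max_len)


def score(profile, observed):
    """Fraction of observed feature occurrences that also occur in the profile."""
    total = sum(observed.values())
    if total == 0:
        return 0.0
    return sum(n for feat, n in observed.items() if feat in profile) / total


def identify(text, featurize):
    """Return the sample language whose profile best covers the text's features."""
    observed = featurize(text)
    profiles = {lang: featurize(sample) for lang, sample in SAMPLES.items()}
    return max(profiles, key=lambda lang: score(profiles[lang], observed))


if __name__ == "__main__":
    for query in ("the dog is in the house",
                  "le chat est dans la maison",
                  "der hund ist in dem haus"):
        print(query, "->", identify(query, trigrams), identify(query, short_words))
```

Both feature extractors plug into the same scoring function, so the two techniques can be compared on the same input. A realistic system would build its profiles from much larger corpora and would likely use a more careful scoring scheme than this overlap fraction.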