The World Wide Web as a Resource for Example-Based Machine Translation Tasks

Greg Grefenstette
The WWW is two orders of magnitude larger than the largest corpora. Although noisy, Web text presents language as it is actually used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied like any other computationally available linguistic resource. In this article, we illustrate this point by showing that an example-based approach to lexical choice for machine translation can use the Web as an adequate and free resource.
ASLIB, Translating and the Computer 21, London, Nov 10-11, 1999.

