This paper introduces to the finite-state calculus a family of directed replace operators. In contrast to the
simple replace expression, UPPER -> LOWER, defined in Karttunen (1995), the new directed version,
UPPER @-> LOWER, yields an unambiguous transducer if the lower language consists of a single string.
It transduces the input string from left to right, making only the longest possible replacement at each point.
A new type of replacement expression, UPPER @-> PREFIX ... SUFFIX, yields a transducer that inserts text
around strings that are instances of UPPER. The symbol ... denotes the matching part of the input which itself
remains unchanged. PREFIX and SUFFIX are regular expressions describing the insertions. Expressions of the
type UPPER @-> PREFIX ... SUFFIX may be used to compose a deterministic parser for a 'local grammar' in
the sense of Gross (1989). Other useful applications of directed replacement include tokenization and filtering
of text streams.
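The left-to-right, longest-match behaviour described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: UPPER is taken to be a finite set of non-empty plain strings rather than an arbitrary regular language, and the input is rewritten directly instead of by compiling a transducer. The function names (longest_match, directed_rewrite, replace, mark) are hypothetical and do not come from the paper or from any finite-state toolkit.

```python
def longest_match(upper, text, start):
    """Return the longest string in `upper` found at text[start:], or None."""
    best = None
    for candidate in upper:
        # Empty strings are skipped so the scan always makes progress.
        if candidate and text.startswith(candidate, start):
            if best is None or len(candidate) > len(best):
                best = candidate
    return best


def directed_rewrite(upper, text, rewrite):
    """Scan `text` left to right, rewriting only the longest match at each point."""
    out = []
    i = 0
    while i < len(text):
        match = longest_match(upper, text, i)
        if match is None:
            # No instance of UPPER starts here: copy one symbol unchanged.
            out.append(text[i])
            i += 1
        else:
            # Rewrite the match and resume scanning after it.
            out.append(rewrite(match))
            i += len(match)
    return "".join(out)


def replace(upper, lower, text):
    """Simulate UPPER @-> LOWER, assuming LOWER is a single string."""
    return directed_rewrite(upper, text, lambda _match: lower)


def mark(upper, prefix, suffix, text):
    """Simulate UPPER @-> PREFIX ... SUFFIX, assuming single-string insertions."""
    return directed_rewrite(upper, text, lambda match: prefix + match + suffix)


if __name__ == "__main__":
    upper = {"ab", "b", "ba", "aba"}
    print(replace(upper, "x", "aba"))             # the whole input is the longest match
    print(mark({"a", "ab"}, "[", "]", "cabab"))   # matched parts are kept and bracketed
```

With UPPER = {ab, b, ba, aba} and the input aba, the scan commits to the longest match at the first position, so a single output is produced. An undirected replacement could also choose the shorter matches ab, b, or ba and would therefore admit several outputs for the same input.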
Proceedings of ACL'96. Santa Cruz, CA, June 23-28, 1996. pp. 108-115.