Publications
Authors:
  • Andre Kempe
Citation:
Research Report
Abstract:
The automatic extraction of acronyms and their meanings from corpora
is an important sub-task of text mining. It can be seen as a special
case of string alignment, where a text chunk is aligned with an
acronym. Alternative alignments have different costs, and ideally the
least costly one should give the correct meaning of the acronym. We
show how this approach can be implemented by means of a 3-tape
weighted finite-state machine (3-WFSM) which reads a text chunk on
tape 1 and an acronym on tape 2, and generates all alternative
alignments on tape 3. The 3-WFSM can be automatically generated from
a simple regular expression. No additional algorithms are required
at any stage. Our 3-WFSM has a size of 27 states and 64 transitions,
and finds the best analysis of an acronym in a few milliseconds.
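
To make the alignment-with-costs idea concrete, here is a minimal sketch in Python. It is not the 3-WFSM construction described in the report: it swaps in a plain dynamic-programming search, and the cost model is an assumption chosen for illustration (an acronym letter matched at the start of a word costs 0, a letter matched inside a word costs `inner_cost`, and skipped text characters cost nothing). The function name `align_acronym` and its parameters are hypothetical.

```python
from math import inf


def align_acronym(chunk, acronym, inner_cost=2):
    """Return (cost, positions) for the cheapest in-order match of the
    acronym's letters against the characters of the text chunk.

    Assumed cost model (illustrative only): a letter at the start of a
    word costs 0, a letter inside a word costs inner_cost.
    """
    text, acro = chunk.lower(), acronym.lower()
    n, m = len(text), len(acro)

    def use_cost(i):
        # A character is word-initial if nothing alphanumeric precedes it.
        at_word_start = i == 0 or not text[i - 1].isalnum()
        return 0 if at_word_start else inner_cost

    # best[j][i]: least cost of matching acro[:j] within text[:i].
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    # took[j][i]: True if the best solution matches acro[j-1] to text[i-1].
    took = [[False] * (n + 1) for _ in range(m + 1)]
    best[0] = [0] * (n + 1)  # an empty acronym prefix always costs 0

    for j in range(1, m + 1):
        for i in range(1, n + 1):
            best[j][i] = best[j][i - 1]  # option 1: skip text[i-1]
            if text[i - 1] == acro[j - 1]:
                cand = best[j - 1][i - 1] + use_cost(i - 1)
                if cand < best[j][i]:  # option 2: use text[i-1]
                    best[j][i] = cand
                    took[j][i] = True

    if best[m][n] == inf:
        return inf, []  # the acronym cannot be aligned with this chunk

    # Walk back through the table to recover the matched positions.
    positions, i, j = [], n, m
    while j > 0:
        if took[j][i]:
            positions.append(i - 1)
            j -= 1
        i -= 1
    positions.reverse()
    return best[m][n], positions


print(align_acronym("weighted finite-state machine", "WFSM"))
# -> (0, [0, 9, 16, 22])
print(align_acronym("extensible markup language", "XML"))
# -> (2, [1, 11, 18])
```

In the first call every acronym letter lands on a word-initial character, so the alignment costs 0. In the second, the "X" can only be matched inside "extensible", which adds the assumed penalty of 2.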
Year:
2006
Report number:
2006/019