Automatic F-structure Annotation of Treebank Trees

Anette Frank
We describe a method that automatically induces LFG f-structures from treebank tree representations, given a set of f-structure annotation principles that define partial, modular c- to f-structure correspondences in a linguistically informed, principle-based way. These principles are applied to treebank tree representations using an existing term rewriting system. Because the input trees are already disambiguated, the resulting f-structures require only minimal manual disambiguation. The annotation principles define partial, characteristic c- to f-structure correspondences that abstract away from irrelevant c-structure contexts, and therefore apply to previously unseen tree configurations. The method is fully automated and inherently robust: when annotation rules are missing, it yields partial, unconnected f-structures. We describe the results of a first experiment in which we apply this method to the Susanne treebank, and extend the model to selective ambiguity filtering, using lexical subcategorization information. Finally, we address some conceptual issues, such as changes to treebank encodings, and which types of encoding should be exploited for different applications: the construction of f-structure banks, as opposed to more far-reaching goals, including rapid, corpus-based LFG grammar development and robust parsing architectures.
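The idea of partial, context-independent annotation principles can be illustrated with a minimal Python sketch. It is not the term rewriting system used in the paper. The `Node` class, the `PRINCIPLES` table, the `annotate` function, and the toy tree are all hypothetical names and data introduced here for illustration. Each principle looks only at a mother/daughter category pair, so it applies regardless of the daughter's sisters, and a daughter with no matching principle simply starts a new, unconnected f-structure fragment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A toy c-structure node (hypothetical, not a Susanne encoding)."""
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None


# Hypothetical annotation principles, keyed on (mother, daughter) only.
# "=" marks a head daughter (it shares the mother's f-structure);
# any other value names the grammatical function the daughter fills.
PRINCIPLES = {
    ("S", "NP"): "SUBJ",
    ("S", "VP"): "=",
    ("VP", "V"): "=",
    ("VP", "NP"): "OBJ",
    ("NP", "N"): "=",
    ("ADVP", "ADV"): "=",
    # Deliberately no principle for ("VP", "ADVP").
}


def annotate(root, principles):
    """Return a list of f-structure fragments; the first is the root's."""
    top = {}
    fragments = [top]

    def build(node, fs):
        if node.word is not None:
            fs["PRED"] = node.word
        for child in node.children:
            func = principles.get((node.label, child.label))
            if func == "=":
                # Head daughter: contribute to the mother's f-structure.
                build(child, fs)
            elif func is not None:
                # Daughter fills a grammatical function of the mother.
                build(child, fs.setdefault(func, {}))
            else:
                # Missing principle: keep the daughter as an unconnected fragment.
                orphan = {}
                fragments.append(orphan)
                build(child, orphan)

    build(root, top)
    return fragments


tree = Node("S", [
    Node("NP", [Node("N", word="Mary")]),
    Node("VP", [
        Node("V", word="sees"),
        Node("NP", [Node("N", word="John")]),
        Node("ADVP", [Node("ADV", word="often")]),
    ]),
])

fragments = annotate(tree, PRINCIPLES)
for fragment in fragments:
    print(fragment)
```

In this toy run the adverb ends up in a separate fragment because no principle covers an ADVP daughter of VP, while the rest of the tree is still annotated. Adding one entry to the table would attach it without touching the other principles.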
Proceedings of the LFG 2000 Conference, University of California at Berkeley, 19-20 July (to appear)