Grammar based generation: algorithms, error mining and applications
Claire Gardent, directeur de recherche at CNRS, LORIA, Nancy, France
While statistical approaches to grammar based generation have been shown to be fast and robust, hybrid symbolic and statistical approaches are generally more precise and more transparent. They allow for a simple encoding of basic linguistic phenomena such as subject-verb agreement or temporal inflection, and they can easily be modified or corrected to fit the needs of a particular application. On the other hand, these approaches are difficult to debug, often lack coverage and may fail to scale up.
In this talk, I will report on work we did in Nancy to address these shortcomings and present a hybrid grammar based approach to sentence generation in which the grammar is a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG). I will start by presenting a surface realisation (SR) algorithm that can generate from the data derived from the Penn Treebank by the organisers of the SR Task. On this data, the grammar has a coverage of 83% (with no robustness mechanism added), the average generation time is 2.57 seconds, and the SR algorithm achieves a BLEU score of 0.73, thus outperforming by a large margin (+0.36) the best hybrid symbolic/statistical system participating in the SR Task. I will then go on to show how the grammar was semi-automatically improved using statistical error mining techniques. Finally, I will motivate the use of precise hand-coded grammars by showing how an FB-LTAG can be used to automatically generate grammar exercises and their solutions.