Distributed semantics for automatically classifying discourse relations
Discourse structure provides a link between individual sentences, which can be analyzed syntactically, and document-level labels, which are typically computed from bag-of-words statistics. Successfully modeling discourse therefore has the potential to improve a range of natural language processing applications, from sentiment analysis to machine translation. One challenge to achieving this vision of discourse-driven NLP is that many discourse relations are fundamentally semantic in nature. Theoretical work has elucidated connections between formal semantics and discourse, but that work is currently of little practical use in automatic discourse parsing, because open-domain formal semantic analysis remains intractable. Distributed compositional semantics offers an appealing alternative: the meaning of each discourse argument is captured in a dense numerical vector, which is constructed incrementally from smaller linguistic units, and the compositional operations themselves can be learned so as to optimize performance on discourse parsing. The key question is whether these vector-based representations are sufficiently expressive to capture the semantics behind discourse relations. This talk describes three projects in which distributed semantic representations yield significant improvements in discourse relation detection: Rhetorical Structure Theory (RST) parsing, supervised Penn Discourse Treebank (PDTB) relation classification, and a generative discourse relational language model.
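
To make the compositional idea concrete, here is a minimal sketch of how a vector for a discourse argument might be built bottom-up from word vectors. It assumes a binary tree over the words and a single tanh composition function; the names `compose`, `encode`, `W`, `B`, and the toy word vectors are hypothetical illustrations, not the models presented in the talk, and the parameters are fixed by hand rather than learned.

```python
import math

DIM = 2  # toy dimensionality; real models use far larger vectors

# Hypothetical word vectors, chosen by hand for illustration only.
WORD_VECS = {
    "rain": [0.5, -0.1],
    "fell": [0.2, 0.3],
    "hard": [-0.4, 0.6],
}

# Hypothetical composition parameters: a DIM x (2*DIM) matrix and a bias.
# In a trained system these would be learned; here they are fixed.
W = [
    [0.1, 0.2, -0.3, 0.4],
    [0.0, -0.5, 0.25, 0.1],
]
B = [0.0, 0.1]


def compose(left, right):
    """Combine two child vectors into one parent: tanh(W [left; right] + B)."""
    children = left + right  # list concatenation, length 2*DIM
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
        for row, bias in zip(W, B)
    ]


def encode(tree):
    """Recursively encode a binary tree whose leaves are words."""
    if isinstance(tree, str):
        return WORD_VECS[tree]
    left, right = tree
    return compose(encode(left), encode(right))


# One vector for the whole argument, built from smaller units.
arg_vec = encode(("rain", ("fell", "hard")))
print(arg_vec)
```

In a setup like this, the vectors for the two arguments of a candidate relation would be passed to a classifier, and `W` and `B` would be trained against that classifier's loss; that training step is omitted here.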