Nested Propositions in Open Information Extraction

Nikita Bhutani¹, H V Jagadish¹, Dragomir Radev²
¹University of Michigan, Ann Arbor; ²University of Michigan


Abstract

The challenges of Machine Reading and Knowledge Extraction at web scale require a system capable of extracting diverse information from large, heterogeneous corpora. The Open Information Extraction (OIE) paradigm aims to extract assertions from large corpora without requiring a predefined vocabulary or relation-specific training data. Most systems built on this paradigm extract binary relations from arbitrary sentences, ignoring the context under which the assertions are correct and complete. They lack the expressiveness needed to properly represent and extract the complex assertions commonly found in text. To address this limited expressiveness, we propose NestIE, which uses a nested representation to extract higher-order relations and complex, interdependent assertions. Nesting the extracted propositions allows NestIE to reflect the meaning of the original sentence more accurately. Our experimental study on real-world datasets suggests that NestIE obtains precision comparable to existing approaches, with better minimality and informativeness. NestIE produces 1.7-1.8 times more minimal extractions and achieves 1.1-1.2 times higher informativeness than ClausIE.
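The central idea above, a proposition whose argument may itself be a proposition, can be illustrated with a small data structure. The following is a minimal Python sketch under our own assumptions: the class `Proposition`, its fields, the `depth` and `render` helpers, and the `(subject; relation; args)` output format are all hypothetical and are not NestIE's actual representation or API. The example sentence, "The researchers reported that the drug reduces symptoms in mice", is likewise only illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# An argument is either a plain text span or another (nested) proposition.
Arg = Union[str, "Proposition"]


@dataclass
class Proposition:
    """Hypothetical relational tuple whose arguments may be propositions."""
    subject: Arg
    relation: str
    args: list[Arg] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth: 1 for a flat tuple, plus 1 per embedded level."""
        inner = [a.depth() for a in (self.subject, *self.args)
                 if isinstance(a, Proposition)]
        return 1 + max(inner, default=0)

    def render(self) -> str:
        """Render as (subject; relation; arg1; ...), recursing into nested args."""
        def fmt(a: Arg) -> str:
            return a.render() if isinstance(a, Proposition) else a
        parts = [fmt(self.subject), self.relation, *map(fmt, self.args)]
        return "(" + "; ".join(parts) + ")"


# A flat extraction asserts the embedded claim as an unconditional fact.
flat = Proposition("the drug", "reduces", ["symptoms in mice"])

# The nested form keeps the claim scoped under its attribution context.
nested = Proposition("The researchers", "reported", [flat])

print(flat.render())    # (the drug; reduces; symptoms in mice)
print(nested.render())  # (The researchers; reported; (the drug; reduces; symptoms in mice))
print(nested.depth())   # 2
```

Keeping the inner tuple as a first-class `Proposition`, rather than flattening it into a string argument, is one way to let later steps inspect or link the embedded assertion; it is a design choice of this sketch, not a claim about how NestIE stores its output.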