Recent exploratory efforts in discourse-level language modeling have relied heavily on calculating Pointwise Mutual Information (PMI), which involves significant computation when done over large collections. Prior work has required aggressive pruning or independence assumptions to compute these scores at scale. We show Conditional Random Sampling, thus far an under-utilized technique, to be a space-efficient means of representing the sufficient statistics that underlie recent PMI-based work on discourse. This is demonstrated in the context of inducing Schankian script-like structures over news articles.
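For intuition, the following is a minimal sketch of how Conditional-Random-Sampling-style sketches can stand in for the full co-occurrence counts that PMI requires: each word retains only the k smallest permuted document IDs from its postings list, and joint counts are estimated from the overlap of two such sketches within their shared effective sample. This is an illustrative sketch under our own assumptions (the function names, estimator, and toy data are ours), not the implementation used in the paper.

```python
# Illustrative CRS-style sketching for co-occurrence counts feeding PMI.
# Names and parameters are illustrative assumptions, not the paper's code.
import math
import random

def build_sketch(postings, perm, k):
    """Keep the k smallest permuted document IDs from a word's postings list."""
    return sorted(perm[d] for d in postings)[:k]

def estimate_joint(sk1, sk2, num_docs):
    """Estimate how many documents in the collection contain both words."""
    # Effective sample: permuted IDs up to the smaller of the two sketch maxima,
    # below which both sketches record every occurrence.
    d_s = min(sk1[-1], sk2[-1])
    s1 = {d for d in sk1 if d <= d_s}
    s2 = {d for d in sk2 if d <= d_s}
    joint_in_sample = len(s1 & s2)
    return joint_in_sample * num_docs / d_s  # scale up to the full collection

def pmi(joint, c1, c2, num_docs):
    """Pointwise mutual information from (estimated) counts."""
    p_xy = joint / num_docs
    p_x, p_y = c1 / num_docs, c2 / num_docs
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

if __name__ == "__main__":
    num_docs, k = 10_000, 100
    random.seed(0)
    # One shared random permutation of document IDs (1..num_docs).
    perm = dict(zip(range(num_docs), random.sample(range(1, num_docs + 1), num_docs)))
    # Toy postings lists: the documents in which each word occurs.
    postings_a = set(random.sample(range(num_docs), 800))
    postings_b = set(random.sample(range(num_docs), 600)) | set(list(postings_a)[:200])
    sk_a = build_sketch(postings_a, perm, k)
    sk_b = build_sketch(postings_b, perm, k)
    joint_hat = estimate_joint(sk_a, sk_b, num_docs)
    print("estimated PMI:", pmi(joint_hat, len(postings_a), len(postings_b), num_docs))
```

Only the small fixed-size sketches need to be stored per word, rather than full postings lists or a pruned co-occurrence matrix, which is the space saving the abstract refers to.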