Space Efficiencies in Discourse Modeling via Conditional Random Sampling

Brian Kjersten1 and Benjamin Van Durme2
1Johns Hopkins University Center for Language and Speech Processing, 2Johns Hopkins University, Human Language Technology Center of Excellence


Abstract

Recent exploratory efforts in discourse-level language modeling have relied heavily on calculating Pointwise Mutual Information (PMI), which involves significant computation when done over large collections. Prior work has required aggressive pruning or independence assumptions to compute these scores at scale. We show the method of Conditional Random Sampling, thus far an under-utilized technique, to be a space-efficient means of representing the sufficient statistics in discourse that underlie recent PMI-based work. This is demonstrated in the context of inducing Schankian script-like structures over news articles.
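To make the idea concrete, the following is a minimal Python sketch of Conditional Random Sampling in the style of Li and Church: each word keeps only the k smallest doc IDs from its postings list under a shared random permutation, and co-occurrence counts are estimated from the prefix where both sketches are complete. All names, sizes, and parameters here are illustrative assumptions, not the paper's implementation.

```python
import random

def build_sketch(postings, perm, k):
    # Sketch = the k smallest permuted doc IDs in the word's postings list.
    return sorted(perm[d] for d in postings)[:k]

def estimate_intersection(sk_x, sk_y, n_docs):
    # Each sketch is a complete record of its word's postings up to
    # D_s = min(max(sk_x), max(sk_y)), so the intersection count inside
    # that prefix is exact and can be scaled up to the full collection.
    d_s = min(sk_x[-1], sk_y[-1])
    a = len({v for v in sk_x if v <= d_s} & {v for v in sk_y if v <= d_s})
    return a * n_docs / d_s

random.seed(0)
n_docs = 10_000
perm = list(range(1, n_docs + 1))   # one shared random permutation of doc IDs
random.shuffle(perm)

x_docs = range(0, 1_000)            # toy data: word x appears in docs 0..999
y_docs = range(500, 1_500)          # word y appears in docs 500..1499,
                                    # so the true co-occurrence count is 500
sk_x = build_sketch(x_docs, perm, k=200)
sk_y = build_sketch(y_docs, perm, k=200)

est = estimate_intersection(sk_x, sk_y, n_docs)
```

The estimated joint count `est` (close to the true 500 in expectation) can then be plugged, along with the exact marginal counts, into the standard PMI formula log(P(x,y) / (P(x)P(y))), while storing only k values per word instead of full postings lists.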