Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions
Marta Recasens, Matthew Can and Dan Jurafsky
Coreference resolution systems rely heavily on string overlap (e.g., "Google
Inc." and "Google"), performing badly on mentions with very different words
("opaque" mentions) like "Google" and "the search giant". Yet prior attempts to
resolve opaque pairs using ontologies or distributional semantics hurt
precision more than they improved recall. We present a new unsupervised method for
mining opaque pairs. Our intuition is to "restrict" distributional semantics to
articles about the same event, thus promoting referential match. Using an
English comparable corpus of tech news, we built a dictionary of opaque
coreferent mentions (only 3% are in WordNet). Our dictionary can be integrated
into any coreference system (it increases the performance of a state-of-the-art
system by 1% F1 on all measures) and is easily extendable by using news
aggregators.
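
The intuition of restricting distributional evidence to articles about the same event can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(mention, context)` input format, the small stop-word list, and the `min_count` threshold are all hypothetical, and mention extraction and event clustering are assumed to have been done beforehand. Within each story (a group of articles on one event), mentions that share a context word are paired, and pairs with any word overlap are discarded so that only opaque candidates remain.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "inc", "inc."}


def content_words(mention):
    """Lower-cased words of a mention, minus the stop words above."""
    return {w for w in mention.lower().split() if w not in STOP_WORDS}


def mine_opaque_pairs(stories, min_count=1):
    """Count opaque mention pairs that share a context within the same story.

    stories: list of stories; each story is a list of (mention, context)
             tuples drawn from articles about the same event (assumed input).
    Returns a dict mapping (mention_a, mention_b) to the number of stories
    in which the pair was observed, keeping pairs seen >= min_count times.
    """
    counts = Counter()
    for story in stories:
        # Group the story's mentions by the context they appear in.
        by_context = {}
        for mention, context in story:
            by_context.setdefault(context, set()).add(mention)

        seen_in_story = set()
        for mentions in by_context.values():
            for a, b in combinations(sorted(mentions), 2):
                # Any shared content word means string overlap: not opaque.
                if content_words(a) & content_words(b):
                    continue
                seen_in_story.add((a, b))
        # Each pair counts at most once per story.
        counts.update(seen_in_story)
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    toy_stories = [
        [("Google", "acquire"), ("the search giant", "acquire")],
        [("Google", "announce"), ("the search giant", "announce")],
    ]
    print(mine_opaque_pairs(toy_stories, min_count=2))
```

Counting each pair once per story, rather than once per article, is a design choice in this sketch: it keeps a single heavily covered event from dominating the dictionary.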