Re-examining Machine Translation Metrics for Paraphrase Identification

Nitin Madnani¹, Joel Tetreault¹, Martin Chodorow²
¹Educational Testing Service, ²Hunter College, CUNY


Abstract

We re-examine the hypothesis that automated metrics developed for machine translation (MT) evaluation can be useful for paraphrase identification, in light of the significant work on new MT metrics over the last four years. We show that a meta-classifier trained using nothing but recent MT metrics outperforms all previous paraphrase identification approaches on the Microsoft Research Paraphrase corpus. In addition, we apply our system to a second corpus, developed for the task of plagiarism detection, and obtain very strong results. Finally, we conduct an extensive error analysis and uncover the top systematic sources of error for a paraphrase identification approach that relies solely on MT metrics. We release both the new dataset and the error analysis annotations for use by the community.