

Paper Abstracts

Guiding Statistical Word Alignment Models With Prior Knowledge

Yonggang Deng and Yuqing Gao

We present a general framework to incorporate prior knowledge, such as heuristics or linguistic features, into statistical generative word alignment models. Prior knowledge plays the role of probabilistic soft constraints between bilingual word pairs, which are used to guide word alignment model training. We investigate knowledge that can be derived automatically from the entropy principle and from bilingual latent semantic analysis, and show how it can be applied to improve translation performance.

A Discriminative Syntactic Word Order Model for Machine Translation

Pi-Chuan Chang and Kristina Toutanova

We present a global discriminative statistical word order model for machine translation. Our model combines syntactic movement and surface movement information, and is discriminatively trained to choose among possible word orders. We show that combining discriminative training with features to detect these two different kinds of movement phenomena leads to substantial improvements in word ordering performance over strong baselines. Integrating this word order model into a baseline MT system results in a 2.4-point BLEU improvement for English-to-Japanese translation.

Tailoring Word Alignments to Syntactic Machine Translation

John DeNero and Dan Klein

Extracting tree transduction rules for syntactic MT systems can be complicated by word alignment errors which violate syntactic correspondences. We propose a novel model for unsupervised word alignment which explicitly takes into account target language constituent structure during training, while retaining the robustness and efficiency of the HMM model. The model's predictions improve the yield of a tree transduction extraction system without sacrificing alignment quality.

Transductive learning for statistical machine translation

Nicola Ueffing, Gholamreza Haffari and Anoop Sarkar

Statistical machine translation systems are usually trained on large amounts of bilingual text (to learn translation models) and monolingual text in the target language (for language models). In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French-English EuroParl data set and on data from the NIST 2006 Chinese-English large-data track. We show a significant improvement in translation quality on both data sets.
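
To make the transductive idea concrete, the skeleton below shows one generic self-training loop over source-language monolingual data. It is a minimal sketch, not the authors' algorithm: the trainer, decoder and confidence scorer are assumed to be supplied by the caller, and the paper compares several selection and weighting variants that are not reproduced here.

    # Generic self-training skeleton in the spirit of transductive SMT.
    # train_smt, decode and confidence are hypothetical callables supplied
    # by the caller, not functions from the paper or a specific toolkit.
    def self_train(train_smt, decode, confidence, bitext, mono_source,
                   rounds=3, keep_ratio=0.2):
        model = train_smt(bitext)
        for _ in range(rounds):
            # Translate the additional source-language monolingual data.
            pairs = [(src, decode(model, src)) for src in mono_source]
            # Keep only the most confidently translated sentence pairs.
            pairs.sort(key=lambda p: confidence(model, p[0], p[1]), reverse=True)
            selected = pairs[:max(1, int(len(pairs) * keep_ratio))]
            # Retrain on the original bitext plus the selected synthetic pairs.
            model = train_smt(bitext + selected)
        return model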

Word Sense Disambiguation Improves Statistical Machine Translation

Yee Seng Chan, Hwee Tou Ng and David Chiang

Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-based MT system, Hiero. We show for the first time that integrating a WSD system improves the performance of a state-of-the-art statistical MT system on an actual translation task. Furthermore, the improvement is statistically significant.

Learning Expressive Models for Word Sense Disambiguation

Lucia Specia, Mark Stevenson and Maria das Graças Volpe Nunes

We present a novel approach to the word sense disambiguation problem which makes use of corpus-based evidence combined with background knowledge. Employing an inductive logic programming algorithm, the approach generates expressive disambiguation rules which exploit several knowledge sources and can also model relations between them. The approach is evaluated in two tasks: identification of the correct translation for a set of highly ambiguous verbs in English-Portuguese translation and disambiguation of verbs from the Senseval-3 lexical sample task. The average accuracy obtained for the multilingual task outperforms the other machine learning techniques investigated. In the monolingual task, the approach performs as well as the state-of-the-art system which reported results for the same set of verbs.

Domain Adaptation with Active Learning for Word Sense Disambiguation

Yee Seng Chan and Hwee Tou Ng

When a word sense disambiguation (WSD) system is trained on one domain but applied to a different domain, a drop in accuracy is frequently observed. This highlights the importance of domain adaptation for word sense disambiguation. In this paper, we first show that an active learning approach can be successfully used to perform domain adaptation of WSD systems. Then, by using the predominant sense predicted by expectation-maximization (EM) and adopting a count-merging technique, we improve the effectiveness of the original adaptation process achieved by the basic active learning approach.
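
As a rough illustration of the active learning component, here is a generic pool-based uncertainty-sampling loop. It is a sketch under the assumption that a trainer, a probability predictor and a human oracle are available as callables; the paper's count-merging and EM-based predominant-sense extensions are not shown.

    import math

    def entropy(dist):
        """Entropy of a predicted sense distribution."""
        return -sum(p * math.log(p) for p in dist if p > 0)

    # train, predict_proba and oracle are hypothetical callables, not the
    # paper's components: train(labeled) returns a model, predict_proba(model, x)
    # returns a sense distribution, oracle(x) returns a human-assigned sense.
    def active_adapt(train, predict_proba, oracle, seed_data, pool, budget=100):
        labeled = list(seed_data)      # out-of-domain (general-corpus) examples
        unlabeled = list(pool)         # in-domain unlabeled pool
        model = train(labeled)
        for _ in range(budget):
            if not unlabeled:
                break
            # Query the in-domain example the current model is least sure about.
            x = max(unlabeled, key=lambda u: entropy(predict_proba(model, u)))
            unlabeled.remove(x)
            labeled.append((x, oracle(x)))
            model = train(labeled)
        return model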

Making Lexical Ontologies Functional and Context-Sensitive

Tony Veale and Yanfen Hao

Human categorization is neither a binary nor a context-free process. Rather, some concepts are better examples of a category than others, while the criteria for category membership may be satisfied to different degrees by different concepts in different contexts. In light of these empirical facts, WordNet's static category structure appears both excessively rigid and unduly fragile for processing real texts. In this paper we describe a syntagmatic, corpus-based approach to re-defining WordNet's categories in a functional, gradable and context-sensitive fashion. We describe how the diagnostic properties for these definitions are automatically acquired from the web, and how the increased flexibility in categorization that arises from these redefinitions offers a robust account of metaphor comprehension in the mold of Glucksberg's (2001) theory of category-inclusion. Furthermore, we demonstrate how this competence with figurative categorization can effectively be governed by automatically-generated ontological constraints, also acquired from the web.

A Bayesian Model for Discovering Typological Implications

Hal Daume III and Lyle Campbell

A standard form of analysis for linguistic typology is the universal implication. These implications state facts about the range of extant languages, such as "if objects come after verbs, then adjectives come after nouns." Such implications are typically discovered by painstaking hand analysis over a small sample of languages. We propose a computational model for assisting in this process. Our model is able to discover both well-known implications and some novel implications that deserve further study. Moreover, through a careful application of hierarchical analysis, we are able to cope with the well-known sampling problem: languages are not independent.

A discriminative language model with pseudo-negative samples

Daisuke Okanohara and Jun'ichi Tsujii

In this paper, we propose a novel discriminative language model which can be used for general applications. In contrast to the well-known N-gram language models, discriminative language models can achieve more accurate discrimination because they can employ overlapping features and non-local information. However, discriminative language models have so far been used only for re-ranking in specific applications because negative examples are not available. We propose to sample pseudo-negative examples from N-gram language models. This formulation, however, incurs prohibitive computational cost in handling very large numbers of features and training samples. We tackle this problem by estimating the latent information of sentences using a semi-Markov class model, and then extracting features from it. We also use an online max-margin algorithm with efficient kernel computation. Experimental results show that pseudo-negative examples can be seen as real negative examples and that our model can discriminate these sentences correctly.
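
The sampling idea can be illustrated with a small self-contained toy: sentences drawn from an n-gram model act as negative training data, real corpus sentences as positive data, and a discriminative classifier is trained to separate them. A plain perceptron over bigram features stands in here for the paper's semi-Markov class model and online max-margin learner.

    import random
    from collections import defaultdict

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"],
              ["a", "cat", "chased", "the", "dog"]]

    # 1) Estimate a bigram model with sentence boundary markers.
    bigrams = defaultdict(list)
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            bigrams[a].append(b)

    def sample_sentence(max_len=15):
        """Draw a pseudo-negative sentence from the bigram model."""
        out, w = [], "<s>"
        while len(out) < max_len:
            w = random.choice(bigrams[w])
            if w == "</s>":
                break
            out.append(w)
        return out

    def features(sent):
        toks = ["<s>"] + sent + ["</s>"]
        return [a + "_" + b for a, b in zip(toks, toks[1:])]

    # 2) Perceptron: real sentences are positive, sampled ones are negative.
    random.seed(0)
    weights = defaultdict(float)
    data = [(s, 1) for s in corpus] + [(sample_sentence(), -1) for _ in range(20)]
    for _ in range(10):
        random.shuffle(data)
        for sent, y in data:
            score = sum(weights[f] for f in features(sent))
            if y * score <= 0:                  # misclassified: update weights
                for f in features(sent):
                    weights[f] += y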

Detecting Erroneous Sentences using Automatically Mined Sequential Patterns

Guihua Sun, Xiaohua Liu, Gao Cong, Ming Zhou, Zhongyang Xiong, John Lee and Chin-Yew Lin

This paper studies the problem of identifying erroneous/correct sentences. The problem has important applications, e.g., providing feedback for writers of English as a Second Language, controlling the quality of parallel bilingual sentences mined from the Web, and evaluating machine translation results. We propose a new approach to detecting erroneous/correct sentences by integrating pattern discovery with supervised learning models. Experimental results show that our techniques are promising.

Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition

Antti Puurula and Mikko Kurimo

Speech recognition in many morphologically rich languages suffers from a very high out-of-vocabulary (OOV) ratio. Earlier work has shown that vocabulary decomposition methods can practically solve this problem for a subset of these languages. This paper compares various vocabulary decomposition approaches to open vocabulary speech recognition, using Estonian speech recognition as a benchmark. Comparisons are performed utilizing large models of 60000 lexical items and smaller vocabularies of 5000 items. A large vocabulary model based on a manually constructed morphological tagger is shown to give the lowest word error rate, while the unsupervised morphology discovery method Morfessor Baseline gives marginally weaker results. Only the Morfessor-based approach is shown to adequately scale to smaller vocabulary sizes.

Phonological Constraints and Morphological Preprocessing for Grapheme-to-Phoneme Conversion

Vera Demberg, Helmut Schmid and Gregor Möhler

Grapheme-to-phoneme conversion (g2p) is a core component of any text-to-speech system. We show that adding simple syllabification and stress assignment constraints, namely ‘one nucleus per syllable’ and ‘one main stress per word’, to a joint n-gram model for g2p conversion leads to a dramatic improvement in conversion accuracy. Secondly, we assessed morphological preprocessing for g2p conversion. While morphological information has been incorporated in some past systems, its contribution, if any, has never been quantitatively assessed. We compared the relevance of morphological preprocessing with respect to the morphological segmentation method, training set size, the g2p conversion algorithm, and two languages, English and German.

Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World's Languages

Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly

In this paper, we put forward an information theoretic definition of the redundancy that is observed across the sound inventories of the world's languages. Through rigorous statistical analysis, we find that this redundancy is an invariant property of the consonant inventories. The statistical analysis further unfolds that the vowel inventories do not exhibit any such property, which in turn points to the fact that the organizing principles of the vowel and the consonant inventories are quite different in nature.

Multilingual Transliteration Using Feature based Phonetic Method

Su-Youn Yoon, Kyoung-Young Kim and Richard Sproat

In this paper we investigate named entity transliteration using a phonetic scoring method. The phonetic method is computed using phonetic features and pseudo features carefully designed based on pronunciation error data of second-language learners of English. A phonetic-feature-based linear classifier is trained using the Winnow machine learning algorithm. The proposed method is tested with four languages – Arabic, Chinese, Hindi and Korean – with English as the source language, using comparable corpora. There is salient improvement for Hindi and Arabic compared to the baseline system, which was constructed by hand using phonetic knowledge but no training data. The proposed method can be trained using a small amount of data and thus is useful in situations where there is limited training data. Moreover, for some rarely spoken languages, it is practically impossible to collect enough training data. We also demonstrate that the method is effective when training on language pairs other than the target language pair. In summary, the method can be applied both with minimal data, and without target language data, and can achieve comparable results for various languages. This is possible because the method makes use of language-independent phonetic features, as well as language-pair independent features that model common interlanguage substitution errors.

Semantic Transliteration of Personal Names

Haizhou Li, Khe Chai Sim, Jin-Shea Kuo and Minghui Dong

Words of foreign origin are referred to as borrowed words or loanwords. A loanword is usually imported to Chinese by phonetic transliteration if a translation is not easily available. Semantic transliteration is seen as a good tradition that passes on from generation to generation in introducing foreign words to Chinese. Not only does it preserve how a word sounds in the source language, it also carries forward the word’s original semantic attributes. This paper attempts to automate the semantic transliteration process for the first time. We conduct an inquiry into the feasibility of semantic transliteration and propose a probabilistic model for transliterating personal names in Latin script into Chinese. The results show that semantic transliteration substantially and consistently improves accuracy over phonetic transliteration in all the experiments.

Generating Complex Morphology for Machine Translation

Einat Minkov, Kristina Toutanova and Hisami Suzuki

We present a novel method for predicting inflected word forms for generating morphologically rich languages in machine translation. We utilize a rich set of syntactic and morphological knowledge sources from both source and target sentences in a probabilistic model, and evaluate their contribution in generating Russian and Arabic sentences. Our results show that the proposed model substantially outperforms the commonly used baseline of a trigram target language model; in particular, the use of morphological and syntactic features leads to a very large gain in prediction accuracy. We also show that the proposed method is effective with a relatively small amount of data.

Assisting Translators in Indirect Lexical Transfer

Bogdan Babych, Anthony Hartley, Serge Sharoff and Olga Mudraya

We present the design and evaluation of a translator’s amanuensis that uses comparable corpora to propose and rank non-literal solutions to the translation of expressions from the general lexicon. Using distributional similarity and bilingual dictionaries, the method outperforms established techniques for extracting translation equivalents from parallel corpora.

Forest Rescoring: Faster Decoding with Integrated Language Models

Liang Huang and David Chiang

Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve an order of magnitude speed-up over the conventional beam-search method at the same levels of search error and translation accuracy as measured by BLEU.

Statistical Machine Translation through Global Lexical Selection and Sentence Reconstruction

Srinivas Bangalore, Patrick Haffner and Stephan Kanthak

Machine translation of a source language sentence involves selecting appropriate target language words and ordering the selected words to form a well-formed target language sentence. Most of the previous work on statistical machine translation relies on (local) associations of target words/phrases with source words/phrases for lexical selection. In contrast, in this paper, we present a novel approach to lexical selection where the target words are associated with the entire source sentence (global) without the need to compute local associations. Further, we present a technique for reconstructing the target language sentence from the selected words. We compare the results of this approach against those obtained from a finite-state based statistical machine translation system which relies on local lexical associations.

Mildly Context-Sensitive Dependency Languages

Marco Kuhlmann and Mathias Möhl

Dependency-based representations of natural language syntax require a fine balance between structural flexibility and computational complexity, and in recent work, several constraints have been proposed to identify classes of dependency structures that are well-balanced in this sense. All of these constraints are formulated on fully specified structures, which makes it hard to integrate them into models where structures are composed from lexical information. In this paper, we show how two empirically relevant structural constraints can be lexicalized, and how combining the resulting lexicons with a regular means of composition gives rise to a hierarchy of mildly context-sensitive dependency languages. Our results provide fundamental insights into the relation between structural properties of dependency representations and notions of formal power.

Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold

Mark Johnson

This paper shows how to use the Unfold-Fold transformation to transform Projective Bilexical Dependency Grammars (PBDGs) into ambiguity-preserving weakly equivalent Context-Free Grammars (CFGs). These CFGs can be parsed in O(n^3) time using the CKY and other standard algorithms with appropriate indexing, rather than the O(n^5) time required by a naive encoding. Informally, using the CKY algorithm with such a CFG mimics the steps of the Eisner-Satta O(n^3) PBDG parsing algorithm. This transformation makes all of the techniques developed for CFGs available to PBDGs. We demonstrate this by describing a maximum posterior parse decoder for PBDGs.

Parsing and Generation as Datalog Queries

Makoto Kanazawa

We show that the problems of parsing and surface realization for grammar formalisms with "context-free" derivations, coupled with Montague semantics (under a certain restriction), can be reduced in a uniform way to Datalog query evaluation. As well as giving a polynomial-time algorithm for computing all derivation trees (in the form of a shared forest) from an input string or input logical form, this reduction has the following complexity-theoretic consequences for all such formalisms: (i) the decision problem of recognizing grammaticality (surface realizability) of an input string (logical form) is in LOGCFL; and (ii) the search problem of finding one logical form (surface string) from an input string (logical form) is in functional LOGCFL. Moreover, the generalized supplementary magic-sets rewriting of the Datalog program resulting from the reduction yields efficient Earley-style algorithms for both parsing and generation.

Optimizing Grammars for Minimum Dependency Length

Daniel Gildea and David Temperley

We examine the problem of choosing word order for a set of dependency trees so as to minimize total dependency length. We present an algorithm for computing the optimal layout of a single tree as well as a numerical method for optimizing a grammar of orderings over a set of dependency types. A grammar generated by minimizing dependency length in unordered trees from the Penn Treebank is found to agree surprisingly well with English word order, suggesting that dependency length minimization has influenced the evolution of English.

Generalizing semantic role annotations across syntactically similar verbs

Andrew Gordon and Reid Swanson

Large corpora of parsed sentences with semantic role labels (e.g. PropBank) provide training data for use in the creation of high-performance automatic semantic role labeling systems. Despite the size of these corpora, individual verbs (or rolesets) often have only a handful of instances in these corpora, and only a fraction of English verbs have even a single annotation. In this paper, we describe an approach for dealing with this sparse data problem, enabling accurate semantic-role labeling for verbs (rolesets) with only a single training example. Our approach involves the identification of syntactically similar verbs found in PropBank, the alignment of arguments in their corresponding rolesets, and the use of their corresponding annotations in PropBank as surrogate training data.

A Grammar-driven Convolution Tree Kernel for Semantic Role Classification

Min Zhang, Wanxiang Che, Aiti Aw, Chew Lim Tan, Guodong Zhou, Ting Liu and Sheng Li

The convolution tree kernel has shown very promising results in semantic role labeling (SRL). However, this method incorporates little linguistic knowledge and only carries out hard matching between substructures, which may lead to over-fitting and a less accurate similarity measure. To remove these constraints, this paper proposes a grammar-driven convolution tree kernel for semantic role classification by introducing more linguistic knowledge into the convolution tree kernel. The proposed grammar-driven kernel displays two advantages over the previous one: 1) grammar-driven approximate substructure matching and 2) grammar-driven approximate tree node matching. The two improvements enable the proposed kernel to explore more linguistically motivated substructure features than the previous one. Experiments on the CoNLL-2005 SRL shared task show that the proposed grammar-driven tree kernel significantly outperforms the previous non-grammar-driven one in semantic role classification. Moreover, we present a composite kernel to integrate feature-based and tree kernel-based methods. Experimental results show that the composite kernel outperforms the previously best-reported methods.

Learning Predictive Structures for Semantic Role Labeling of NomBank

Chang Liu and Hwee Tou Ng

This paper presents a novel application of Alternating Structure Optimization (ASO) to the task of Semantic Role Labeling (SRL) of noun predicates in NomBank. ASO is a recently proposed linear multi-task learning algorithm, which extracts the common structures of multiple tasks to improve accuracy, via the use of auxiliary problems. In this paper, we explore a number of different auxiliary problems, and we are able to significantly improve the accuracy of the NomBank SRL task using this approach. To our knowledge, our proposed approach achieves the highest accuracy published to date on the English NomBank SRL task.

A Simple, Similarity-based Model for Selectional Preferences

Katrin Erk

We propose a new, simple model for the automatic induction of selectional preferences, using corpus-based semantic similarity metrics. Focusing on the task of semantic role labeling, we compute selectional preferences for semantic roles. The new model is extensively evaluated and compared to both WordNet models and EM-based clustering models of selectional preferences.
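
A minimal sketch of such a similarity-based preference score is given below, with invented count vectors and cosine similarity standing in for real corpus-derived similarities; it illustrates the general idea (score a candidate by its similarity to previously seen role fillers) rather than the paper's exact model.

    import math

    def cosine(u, v):
        keys = set(u) | set(v)
        dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Hypothetical distributional vectors (context-word counts).
    vectors = {
        "bread": {"eat": 5, "bake": 3, "slice": 2},
        "cake":  {"eat": 4, "bake": 5, "slice": 3},
        "idea":  {"discuss": 4, "have": 6, "reject": 2},
        "pizza": {"eat": 6, "bake": 4, "slice": 5},
    }

    # Headwords observed in the role (e.g. the object of "eat"), with counts.
    seen = {"bread": 10, "cake": 4}

    def preference(candidate):
        """Frequency-weighted similarity of the candidate to seen role fillers."""
        total = sum(seen.values())
        return sum((freq / total) * cosine(vectors[candidate], vectors[h])
                   for h, freq in seen.items())

    print(preference("pizza"), preference("idea"))   # "pizza" should score higher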

SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking

Yoav Goldberg and Michael Elhadad

We study the issue of porting a known NLP method to a language with few existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods – Model Tampering and Anchored Learning. These allow fine-grained analysis of the learned SVM models, which provides guidance to identify errors in the training corpus, distinguish the role and interaction of lexical features, and eventually construct a model with 15% error reduction. The resulting chunker is shown to be robust in the presence of noise in the training corpus, relies on fewer lexical features than was previously understood, and achieves an F-measure of 92.2 on automatically PoS-tagged text. The SVM analysis methods also provide general insight into SVM-based chunking.

Fully Unsupervised Discovery of Concept-Specific Relationships by Web Mining

Dmitry Davidov, Ari Rappoport and Moshe Koppel

We present a web mining method for discovering and enhancing relationships in which a specified concept participates. We discover a whole range of relationships focused on the given concept, rather than the generic known relationships addressed by most previous work. Our method is based on clustering patterns that contain concept words and other words related to them. We evaluate the method on three different rich concepts and find that in each case the method generates a broad variety of relationships with good precision.

Adding Noun Phrase Structure to the Penn Treebank

David Vadas and James Curran

The Penn Treebank does not annotate within base noun phrases (NPs), committing only to flat structures that ignore the complexity of English NPs. This means that tools trained on Treebank data cannot learn the correct internal structure of NPs. This paper details the process of adding gold-standard bracketing within each noun phrase in the Penn Treebank. We then examine the consistency and reliability of our annotations. Finally, we use this resource to determine NP structure using several statistical approaches, thus demonstrating the utility of the corpus. This adds detail to the Penn Treebank that is necessary for many NLP applications.

Formalism-Independent Parser Evaluation with CCG and DepBank

Stephen Clark and James R. Curran

A key question facing the parsing community is how to compare parsers which use different grammar formalisms and produce different output. Evaluating a parser on the same resource used to create it can lead to non-comparable accuracy scores and an over-optimistic view of parser performance. In this paper we evaluate a CCG parser on DepBank, and demonstrate the difficulties in converting the parser output into DepBank grammatical relations. In addition we present a method for measuring the effectiveness of the conversion, which provides an upper bound on parsing accuracy. The CCG parser obtains an F-score of over 81% on labelled dependencies, against an upper bound of 84.8%. We compare the CCG parser against the RASP parser, outperforming RASP by over 5% overall and on the majority of dependency types.

Frustratingly Easy Domain Adaptation

Hal Daume III

We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. The technique comes with several simple theoretical guarantees. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.
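
The augmentation trick itself can be sketched in a few lines: each feature is copied into a shared version and a domain-specific version, and any standard supervised learner is then trained on the augmented examples. The Python dictionary encoding below is an illustrative stand-in, not the authors' Perl script.

    def augment(features, domain):
        """Copy each feature into a shared copy and a domain-specific copy."""
        out = {}
        for name, value in features.items():
            out["general:" + name] = value    # active in every domain
            out[domain + ":" + name] = value  # active only in this domain
        return out

    # Source and target training examples are tagged with their own domain
    # and then fed, together, to any off-the-shelf classifier.
    x_src = augment({"word=bank": 1, "suffix=ing": 1}, "source")
    x_tgt = augment({"word=bank": 1, "suffix=ing": 1}, "target")
    print(sorted(x_src), sorted(x_tgt))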

Instance Weighting for Domain Adaptation in NLP

Jing Jiang and ChengXiang Zhai

Domain adaptation is an important problem in natural language processing (NLP) due to the lack of labeled data in novel domains. In this paper, we study the domain adaptation problem from the instance weighting perspective. We formally analyze and characterize the domain adaptation problem from a distributional view, and show that there are two distinct needs for adaptation, corresponding to the different distributions of instances and classification functions in the source and the target domains. We then propose a general instance weighting framework for domain adaptation. Our empirical results on three NLP tasks show that incorporating and exploiting more information from the target domain through instance weighting is effective.
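
One common way to make instance weighting concrete, shown below as a hedged sketch rather than the paper's exact framework, is to train a classifier that separates source from target instances and to use its estimate of how target-like each source instance is as a per-example training weight.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X_src = rng.normal(0.0, 1.0, size=(200, 5))     # labeled source-domain data
    y_src = (X_src[:, 0] > 0).astype(int)
    X_tgt = rng.normal(0.5, 1.0, size=(200, 5))     # unlabeled target-domain data

    # 1) Train a domain classifier: source = 0, target = 1.
    dom_X = np.vstack([X_src, X_tgt])
    dom_y = np.array([0] * len(X_src) + [1] * len(X_tgt))
    dom_clf = LogisticRegression(max_iter=1000).fit(dom_X, dom_y)

    # 2) Weight each source instance by its estimated odds of being target-like.
    p_tgt = dom_clf.predict_proba(X_src)[:, 1]
    weights = p_tgt / (1.0 - p_tgt)

    # 3) Train the actual task classifier on the weighted source data.
    task_clf = LogisticRegression(max_iter=1000).fit(X_src, y_src, sample_weight=weights)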

The Infinite Tree

Jenny Rose Finkel, Trond Grenager and Christopher D. Manning

Historically, unsupervised learning techniques have lacked a principled technique for selecting the number of unseen components. Research into non-parametric priors such as the Dirichlet Process has instead enabled the use of infinite models, in which the number of hidden categories is not fixed, but can instead grow with the amount of training data. In this work we develop an infinite tree model, a new type of infinite model that is capable of representing recursive branching structure over an arbitrarily large set of hidden categories. Specifically, we develop three infinite tree models, each of which enforces different independence assumptions, and for each model we also define simple direct assignment sampling inference procedures suitable for unsupervised learning. We demonstrate the utility of our models by unsupervised learning of part-of-speech tags from treebank dependency skeleton structure, achieving an accuracy of 71%, and from unsupervised splitting of part-of-speech tags, which increases parsing accuracy from 85.11% to 87.40% when used in a generative dependency parser.

Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov and Dan Roth

Over the last few years, two of the main research directions in machine learning for natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when labeled data is scarce, and the study of ways to exploit knowledge and global information in structured learning tasks. In this paper, we suggest incorporating domain knowledge into semi-supervised learning algorithms. We use constraints as a general framework to represent common-sense knowledge and develop a novel learning protocol which unifies and can exploit several kinds of constraints. The experimental results presented in the information extraction domain show that applying constraints helps the model to generate better feedback during learning, and hence the framework allows for high-performance learning with significantly less training data than was possible before on these tasks.

Supertagged Phrase-Based Statistical Machine Translation

Hany Hassan, Khalil Sima'an and Andy Way

Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic structure caused system performance to deteriorate. In this work we show that lexical syntactic descriptions in the form of supertags can yield significantly better PBSMT systems. We describe a novel PBSMT model that incorporates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those of Lexicalized Tree-Adjoining Grammar (LTAG), and Combinatory Categorial Grammar (CCG). Despite the differences between the LTAG and CCG supertaggers, they give similar improvements. As well as supertagging, we also explore the utility of a surface global grammaticality measure based on combinatory operators. We perform various experiments on the Arabic to English NIST 2005 test set addressing issues such as sparseness, scalability and the utility of system subcomponents. Our best result (0.4688 BLEU) improves by 6.1% relative to a state-of-the-art PBSMT system and compares favourably with the best systems on the NIST 2005 task.

Regression for Sentence-Level MT Evaluation with Pseudo References

Joshua S. Albrecht and Rebecca Hwa

Most automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations. However, human references may not always be available. In this paper, we present a machine learning approach that combines a wide range of indicators of fluency and adequacy derived from weaker sources of comparison (pseudo references) to form a composite metric that evaluates MT outputs at the sentence level. We show that regression learning, which optimizes the metric to correlate with human assessment on training examples, is key in leveraging this weaker form of references. Our experimental results suggest the proposed framework creates metrics that rival standard reference-based metrics in terms of correlations with human judgments on new test instances.

Bootstrapping Word Alignment via Word Packing

Yanjun Ma, Nicolas Stroppa and Andy Way

We introduce a simple method to pack words for statistical word alignment. Our goal is to simplify the task of automatic word alignment by packing several consecutive words together when we believe they correspond to a single word in the opposite language. This is done using the word aligner itself, i.e. by bootstrapping on its output. We evaluate the performance of our approach on a Chinese-to-English machine translation task, and report a 12.2% relative increase in BLEU score over a state-of-the-art phrase-based SMT system.

Improved Word-Level System Combination for Machine Translation

Antti-Veikko Rosti, Spyros Matsoukas and Richard Schwartz

Recently, confusion network decoding has been applied in machine translation system combination. Due to errors in the hypothesis alignment, decoding may result in ungrammatical combination outputs. This paper describes an improved confusion network based method to combine outputs from multiple MT systems. In this approach, arbitrary sentence-level features may be added log-linearly into the objective function, thus allowing language model re-scoring. Also, a novel method to automatically select the hypothesis which other hypotheses are aligned against is proposed. A generic weight tuning algorithm may be used to optimize various automatic evaluation metrics including TER, BLEU and METEOR. The experiments using the Arabic to English and Chinese to English NIST MT05 tasks show significant improvements in BLEU scores compared to earlier confusion network decoding based methods.

Generating Constituent Order in German Clauses

Katja Filippova and Michael Strube

We investigate the factors which determine constituent order in German clauses and propose an algorithm which performs the task in two steps: First, the best candidate for the initial sentence position is chosen. Then, the order for the remaining constituents is determined. The first task is more difficult than the second one because of properties of the German sentence-initial position. Experiments show a significant improvement over several baselines and competing approaches. Apart from that, our algorithm is considerably more efficient than these approaches.

A Symbolic Approach to Near-Deterministic Surface Realisation using Tree Adjoining Grammar

Claire Gardent and Eric Kow

Surface realisers divide into those used in generation (NLG geared realisers) and those mirroring the parsing process (Reversible realisers). While the first rely on grammars not easily usable for parsing, it is unclear how the second type of realisers could be parameterised to yield, from among the set of possible paraphrases, the paraphrase appropriate to a given generation context. In this paper, we present a surface realiser which combines a reversible grammar (used for parsing and doing semantic construction) with a symbolic means of selecting paraphrases.

Sentence generation as a planning problem

Alexander Koller and Matthew Stone

In this paper, we translate sentence generation from TAG grammars with semantic and pragmatic information into a planning problem by encoding the contribution of each word declaratively and explicitly. This allows us to tap into the recent performance improvements in off-the-shelf planners. It also opens up new perspectives on referring expression generation and the relationship between language and action.

GLEU: Automatic Evaluation of Sentence-Level Fluency

Andrew Mutton, Mark Dras, Stephen Wan and Robert Dale

In evaluating the output of language technology applications – MT, natural language generation, summarisation – automatic evaluation techniques generally conflate measurement of faithfulness to source content with fluency of the resulting text. In this paper we develop an automatic evaluation metric to estimate fluency alone, by examining the use of parser outputs as metrics, and show that they correlate with human judgements of generated text fluency. We then develop a machine learner based on these, and show that this performs better than the individual parser output metrics, approaching a lower bound on human performance. We finally look at different language models for generating sentences, and show that while individual parser metrics can be `fooled' depending on generation method, the machine learner provides a consistent estimator of fluency.

Conditional Modality Fusion for Coreference Resolution

Jacob Eisenstein and Randall Davis

Non-verbal modalities such as gesture can improve processing of spontaneous spoken language. For example, similar hand gestures tend to predict semantic similarity, so features that quantify gestural similarity can improve tasks such as coreference resolution. However, not all hand movements are informative gestures; psychological research has shown that speakers are more likely to gesture meaningfully when their speech is ambiguous. Ideally, one would attend to gesture only in such circumstances, and ignore other hand movements. We present conditional modality fusion, which formalizes this intuition by treating the informativeness of gesture as a hidden variable to be learned jointly with the class label. Applied to coreference resolution, conditional modality fusion significantly outperforms both early and late modality fusion, which are current state-of-the-art techniques for modality combination.

The Utility of a Graphical Representation of Discourse Structure in Spoken Dialogue Systems

Mihai Rotaru and Diane J. Litman

In this paper we explore the utility of the Navigation Map (NM), a graphical representation of the discourse structure. We run a user study to investigate whether users perceive the NM as helpful in a tutoring spoken dialogue system. From the users’ perspective, our results show that the presence of the NM allows them to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system if the NM was present. Our preliminary analysis on objective metrics further strengthens these findings.

Automated Vocabulary Acquisition and Interpretation in Multimodal Conversational Systems

Yi Liu, Joyce Y. Chai and Rong Jin

Motivated by psycholinguistic findings that eye gaze is tightly linked to human language production, we investigate the use of naturally co-occurring eye gaze and speech utterances during human-machine conversation for automated vocabulary acquisition and interpretation in multimodal conversational systems. In particular, we developed an unsupervised approach based on translation models to automatically learn the mappings between words and objects on the graphic display. The experimental results indicate that user eye gaze can provide reliable information to establish such mappings, which have promising implications in automatically acquiring and interpreting user vocabularies for conversational systems.

A Multimodal Interface for Access to Content in the Home

Michael Johnston, Luis Fernando D'Haro, Michelle Levine and Bernard Renger

In order to effectively access the rapidly increasing range of media content available in the home, new kinds of more natural interfaces are needed. In this paper, we explore the application of multimodal interface technologies to searching and browsing a database of movies. The resulting system allows users to access movies using speech, pen, remote control, and dynamic combinations of these modalities. An experimental evaluation, with more than 40 users, is presented contrasting two variants of the system: one combining speech with traditional remote control input and a second where the user has a tablet display supporting speech and pen input.

Fast Unsupervised Incremental Parsing

Yoav Seginer

This paper describes an incremental parser and an unsupervised learning algorithm for inducing this parser from plain text. The parser uses a representation for syntactic structure similar to dependency links, which is well suited for incremental parsing. In contrast to previous unsupervised parsers, the parser does not use part-of-speech tags and both learning and parsing are local and fast, requiring no explicit clustering or global optimization. The parser is evaluated by converting its output into equivalent bracketing and improves on previously published results for unsupervised parsing from plain text.

K-best Spanning Tree Parsing

Keith Hall

This paper introduces a Maximum Entropy dependency parser based on an efficient k-best Maximum Spanning Tree (MST) algorithm. Although recent work suggests that the edge-factored constraints of the MST algorithm significantly inhibit parsing accuracy, we show that generating the 50-best parses according to an edge-factored model has an oracle performance well above the 1-best performance of the best dependency parsers. This motivates our parsing approach, which is based on reranking the k-best parses generated by an edge-factored model. We present a description of the k-best MST algorithm along with empirical results for a reranker based on tree features. We present oracle parse accuracy results for the edge-factored model and 1-best results for the reranker on eight languages (seven from CoNLL-X and English).

Is the End of Supervised Parsing in Sight?

Rens Bod

How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new efficient algorithm for unsupervised parsing using an all-subtrees model. While previous unsupervised all-subtrees models depended on random sampling of subtrees from the set of all possible binary trees assigned to sentences (Bod 2006), our algorithm converts a packed forest of all binary trees directly into a compact PCFG. We test two models: the U-DOP* estimator which extracts subtrees from trees generated by shortest derivations, and UML-DOP which trains the full PCFG reduction on a held-out corpus. Both estimators are known to be statistically consistent. While UML-DOP slightly outperforms U-DOP*, only the latter can be tested on NANC’s WSJ and LA Times data (which is two orders of magnitude larger than Penn’s WSJ), showing that considerable improvement in unsupervised parsing is possible. This paper presents the first experiments with an unsupervised all-subtrees model without any a priori sampling. This paper also reports the first unsupervised results on the standard WSJ test set (section 23), achieving 70.7% unlabeled f-score.

An Ensemble Method for Selection of High Quality Parses

Roi Reichart and Ari Rappoport

While the average performance of statistical parsers gradually improves, they still produce annotations of rather low quality for many sentences. The number of such sentences grows when the training and test data are taken from different domains, which is the case for major web applications such as information retrieval and question answering. In this paper we present a Sample Ensemble Parse Assessment (SEPA) algorithm for detecting parse quality. We use a function of the agreement among several copies of a parser, each of which is trained on a different sample from the training data, to assess parse quality. We experimented with both generative and reranking parsers (Collins, and Charniak and Johnson, respectively). We show superior results over several baselines, both when the training and test data are from the same domain and when they are from different domains. For a test setting used by previous work, we show an error reduction of 31% as opposed to their 20%.
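
The agreement signal at the heart of such a sample-ensemble assessment can be illustrated simply: several copies of a parser, each trained on a different sample, parse the same sentence, and the mean pairwise bracket F-score among their outputs serves as a confidence estimate. The toy span sets below are invented.

    from itertools import combinations

    def f1(a, b):
        """Bracket F-score between two parses represented as span sets."""
        if not a or not b:
            return 0.0
        overlap = len(a & b)
        p, r = overlap / len(b), overlap / len(a)
        return 2 * p * r / (p + r) if p + r else 0.0

    def agreement(parses):
        """Mean pairwise F-score over the outputs of the ensemble members."""
        pairs = list(combinations(parses, 2))
        return sum(f1(a, b) for a, b in pairs) / len(pairs)

    # Each parse is a set of (label, start, end) constituent spans.
    ensemble = [
        {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)},
        {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)},
        {("NP", 0, 3), ("VP", 3, 5), ("S", 0, 5)},
    ]
    print(agreement(ensemble))   # high agreement suggests a reliable parse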

Opinion Mining using Econometrics: A Case Study on Reputation Systems

Anindya Ghose, Panagiotis G. Ipeirotis and Arun Sundararajan

Deriving the polarity and strength of opinions is an important research topic, attracting significant attention over the last few years. Many existing approaches rely on human annotators to evaluate the polarity and strength of the opinions, a laborious and error-prone task. We take a different approach by considering the economic context in which an opinion is evaluated. We rely on the fact that the text in on-line systems influences the behavior of the readers, and that this effect can be observed using some easy-to-measure economic variables, such as revenues or product prices. Then, by reversing the logic, we infer the semantic orientation and the strength of an opinion by tracing the changes in the associated economic variable. In effect, we combine econometrics with text mining algorithms to identify the "economic value of text" and assign a "dollar value" to each opinion phrase, measuring sentiment effectively and without the need for manual labeling. We argue that by interpreting opinions within an econometric framework, we have the first objective, quantifiable, and context-sensitive evaluation of opinions. We make the discussion concrete by presenting results on the reputation system of the Amazon.com marketplace. We show that user feedback affects the pricing power of merchants and by measuring their pricing power we can infer the polarity and strength of the underlying textual evaluations posted by the buyers.

PageRanking WordNet Synsets: An Application to Opinion Mining

Andrea Esuli and Fabrizio Sebastiani

This paper presents an application of PageRank, a random-walk model originally devised for ranking Web search results, to ranking WordNet synsets in terms of how strongly they possess a given semantic property. The semantic properties we use for exemplifying the approach are positivity and negativity, two properties of central importance in sentiment analysis. The rationale of applying PageRank to detecting the semantic properties of synsets lies in the fact that the space of WordNet synsets may be seen as a graph, in which synsets are connected through the binary relation "a term belonging to synset s_k occurs in the gloss of synset s_i". The data for this relation can be obtained from Extended WordNet, a publicly available sense-disambiguated version of WordNet. We argue that this relation is structurally akin to the relation between hyperlinked Web pages, and thus lends itself to PageRank analysis. We report experimental results supporting our intuitions.
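
The underlying computation is ordinary PageRank over the synset graph; a compact power-iteration sketch over an invented toy graph is shown below. How the walk is tuned toward a particular semantic property, and how the real graph is extracted from Extended WordNet's glosses, are left to the paper.

    def pagerank(links, damping=0.85, iters=50):
        """Plain power-iteration PageRank over an adjacency dict."""
        nodes = list(links)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1.0 - damping) / len(nodes) for n in nodes}
            for n, outs in links.items():
                if outs:
                    share = damping * rank[n] / len(outs)
                    for m in outs:
                        new[m] += share
                else:                      # dangling node: spread mass uniformly
                    for m in nodes:
                        new[m] += damping * rank[n] / len(nodes)
            rank = new
        return rank

    # Toy gloss graph; node names and edge direction are purely illustrative.
    gloss_graph = {
        "good.a.01": ["nice.a.01", "positive.a.01"],
        "nice.a.01": ["good.a.01"],
        "positive.a.01": ["good.a.01", "nice.a.01"],
        "table.n.01": [],
    }
    print(sorted(pagerank(gloss_graph).items(), key=lambda kv: -kv[1]))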

Structured Models for Fine-to-Coarse Sentiment Analysis

Ryan McDonald, Kerry Hannan, Tyler Neylon, Mike Wells and Jeff Reynar

In this paper we investigate a structured model for jointly classifying the sentiment of text at varying levels of granularity. Inference in the model is bottom-up and based on standard sequence classification techniques with constrained Viterbi inference to ensure consistent solutions. The primary advantage of such a model is that it allows classification decisions from one level in the text to influence decisions at another. Experiments show that this method can significantly reduce classification error relative to models trained in isolation.

Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification

John Blitzer, Mark Dredze and Fernando Pereira

Automatic sentiment classification has been extensively studied and applied in recent years. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is impractical. We investigate domain adaptation for sentiment classifiers, focusing on online reviews for different types of products. First, we extend to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline. Second, we identify a measure of domain similarity that correlates well with the potential for adaptation of a classifier from one domain to another. This measure could for instance be used to select a small set of domains to annotate whose trained classifiers would transfer well to many other domains.

Clustering Clauses for High-Level Relation Detection: An Information-theoretic Approach

Samuel Brody

Recently, there has been a rise of interest in unsupervised detection of high-level semantic relations involving complex units, such as phrases and whole sentences. Typically such approaches are faced with two main obstacles: data sparseness and correctly generalizing from the examples. In this work, we describe the Clustered Clause representation, which utilizes information-based clustering and inter-sentence dependencies to create a simplified and generalized representation of the grammatical clause. We implement an algorithm which uses this representation to detect a predefined set of high-level relations, and demonstrate our model's effectiveness in overcoming both the problems mentioned.

Instance-based Evaluation of Entailment Rule Acquisition

Idan Szpektor, Eyal Shnarch and Ido Dagan

Obtaining large volumes of inference knowledge, such as entailment rules, has become a major factor in achieving robust semantic processing. While there has been substantial research on learning algorithms for such knowledge, their evaluation methodology has been problematic, hindering further research. We propose a novel evaluation methodology for entailment rules which explicitly addresses their semantic properties and yields satisfactory human agreement levels. The methodology is used to compare two state of the art learning algorithms, exposing critical issues for future progress.

Statistical Machine Translation for Query Expansion in Answer Retrieval

Stefan Riezler, Alexander Vasserman, Ioannis Tsochantaridis, Vibhu Mittal and Yi Liu

This paper presents a novel approach to query expansion in answer retrieval that uses Statistical Machine Translation (SMT) techniques to bridge the lexical gap between questions and answers. SMT-based query expansion is done by i) applying a full-sentence paraphraser to the query to introduce synonyms in global query context, and ii) translating query terms into answer terms using a full-sentence SMT model trained on question-answer pairs. We compare these global, context-aware query expansion techniques with a tfidf model and local query expansion on a database of 10 million question-answer pairs extracted from FAQ pages. Experimental results show a significant improvement of SMT-based query expansion over both baselines.

A Computational Model of Text Reuse in Ancient Literary Texts

John Lee

We propose a computational model of text reuse tailored for ancient literary texts, available to us often only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.

Finding document topics for improving topic segmentation

Olivier Ferret

Topic segmentation and identification are often tackled as separate problems, whereas they are both part of topic analysis. In this article, we study how topic identification can contribute to improving a topic segmenter based on word reiteration. We first present an unsupervised method for discovering the topics of a text. Then, we detail how these topics are used by segmentation for finding topical similarities between text segments. Finally, we show the interest of the proposed method through the results of an evaluation carried out for both French and English.

The utility of parse-derived features for automatic discourse segmentation

Seeger Fisher and Brian Roark

We investigate different feature sets for performing automatic sentence-level discourse segmentation within a general machine learning approach, including features derived from either finite-state or context-free annotations. We achieve the best reported performance on this task, and demonstrate that our SPADE-inspired context-free features are critical to achieving this level of accuracy. This counters recent results suggesting that purely finite-state approaches can perform competitively.

PERSONAGE: Personality Generation for Dialogue

Francois Mairesse and Marilyn Walker

Over the last fifty years, the "Big Five" model of personality traits has become standard in psychology, and research has systematically documented correlations between a wide range of linguistic variables and Big Five traits. A distinct line of research has explored methods for automatically generating language that varies along personality dimensions. While this work suggests a clear utility for generating personality-rich language, (1) these generation systems have not been evaluated to see whether they produce recognizable personality variation; (2) they have primarily been based on template-based generation with limited paraphrases for different personality settings; (3) the use of psychological findings has been heuristic rather than systematic. We present PERSONAGE (PERSONAlity GEnerator), a language generator with 29 parameters previously shown to correlate with extraversion, an important aspect of personality. We explore two methods for generating personality-rich language: (1) direct generation with particular parameter settings suggested by the psychology literature; and (2) overgeneration and selection using statistical models trained from judges' ratings. An evaluation shows that both methods reliably generate utterances that vary along the extraversion dimension, according to human judges.

Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input

Igor Malioutov, Alex Park, Regina Barzilay and James Glass

We address the task of unsupervised topic segmentation of speech data relying only on raw acoustic information. In contrast to existing algorithms for topic segmentation of speech, our approach does not require input transcripts. Our method predicts topic changes by analyzing the distribution of re-occurring acoustic patterns in the speech signal corresponding to a single speaker. The algorithm robustly handles noise inherent in matching based on acoustic similarity by intelligently aggregating information about distributional similarity from multiple local comparisons. Our experiments show that audio-based segmentation compares favorably with transcript-based segmentation computed over noisy transcripts. These results demonstrate the utility of our method for applications where a speech recognizer is not available, or its output has a high word error rate.

Randomised Language Modelling for Statistical Machine Translation

David Talbot and Miles Osborne

A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it produces false positives with some constant probability. Here we explore the use of BFs for language modelling in statistical machine translation. We investigate how a BF containing n-grams extracted from a large corpus can complement a standard n-gram LM within an SMT system and consider (i) how to include approximate frequency information efficiently and (ii) how to reduce the effective error rate by first checking for lower-order subsequences in candidate n-grams. Our solutions in both cases retain the one-sided error guarantees of the standard BF while taking advantage of the particular characteristics of natural language statistics to reduce the space requirements.
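
To make the data structure concrete, here is a minimal Bloom filter for n-gram membership queries. The size, hash scheme and example strings are illustrative; the quantised frequency information and lower-order sub-sequence checks described above are not included.

    import hashlib

    class BloomFilter:
        def __init__(self, num_bits=1 << 20, num_hashes=4):
            self.m = num_bits
            self.k = num_hashes
            self.bits = bytearray(self.m // 8 + 1)

        def _positions(self, item):
            """Derive k bit positions for an item from salted MD5 digests."""
            for i in range(self.k):
                digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("statistical machine translation")          # insert an extracted n-gram
    print("statistical machine translation" in bf)     # True
    print("purple machine translation" in bf)          # False with high probability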

Bilingual-LSA Based LM Adaptation for Spoken Language Translation

Yik-Cheung Tam, Ian Lane and Tanja Schultz

We propose a novel approach to crosslingual language model (LM) adaptation based on bilingual Latent Semantic Analysis (bLSA). A bLSA model is introduced which enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bLSA framework, crosslingual LM adaptation can be performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to the target-language N-gram LM via marginal adaptation. The proposed framework also enables rapid bootstrapping of LSA models for new languages based on a source LSA model from another language. On Chinese to English speech and text translation, the proposed bLSA framework successfully reduced the word perplexity of the English LM by over 27% for a unigram LM and up to 13.6% for a 4-gram LM. Furthermore, the proposed approach consistently improved machine translation quality.
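
The marginal adaptation step can be illustrated on a toy unigram model: each background probability is scaled by the ratio of the topic-adapted unigram probability to the background one, raised to a tuning exponent, and the result is renormalised. This is one standard form of marginal adaptation; the distributions and exponent below are invented and the paper's exact variant may differ.

    # Toy background LM and bLSA-style topic-adapted unigram distribution.
    background = {"bank": 0.020, "river": 0.010, "loan": 0.005, "water": 0.010}
    topic_lsa  = {"bank": 0.040, "river": 0.002, "loan": 0.020, "water": 0.003}

    beta = 0.7   # tuning exponent controlling the strength of adaptation
    scaled = {w: p * (topic_lsa[w] / p) ** beta for w, p in background.items()}
    z = sum(scaled.values())
    adapted = {w: p / z for w, p in scaled.items()}
    print(adapted)   # topical words ("bank", "loan") gain probability mass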

Coreference Resolution Using Semantic Relatedness Information from Automatically Discovered Patterns

Xiaofeng Yang and Jian Su

Semantic relatedness is a very important factor for the coreference resolution task. To obtain this semantic information, corpus-based approaches commonly leverage patterns that can express a specific semantic relation. The patterns, however, are designed manually and thus are not necessarily the most effective ones in terms of accuracy and breadth. To deal with this problem, in this paper we propose an approach that can automatically find effective patterns for coreference resolution. We explore how to automatically discover and evaluate patterns, and how to exploit the patterns to obtain the semantic relatedness information. The evaluation on the ACE data set shows that the pattern-based semantic information is helpful for coreference resolution.

Semantic Class Induction and Coreference Resolution

Vincent Ng

This paper examines whether a learning-based coreference resolver can be improved using semantic class knowledge that is automatically acquired from a version of the Penn Treebank in which the noun phrases are labeled with their semantic classes. Experiments on the ACE test data show that a coreference resolver that employs such induced semantic class knowledge significantly outperforms (by 2% in F-measure) one that uses heuristically computed semantic class knowledge. More importantly, the induced knowledge improves the accuracy of common noun resolution by 2-6%.

Generating a Table-of-Contents

S. R. K. Branavan, Pawan Deshpande and Regina Barzilay

This paper presents a method for the automatic generation of a table-of-contents. This type of summary could serve as an effective navigation tool for accessing information in long texts, such as books. To generate a coherent table-of-contents, we need to capture both global dependencies across different titles in the table and local constraints within sections. Our algorithm effectively handles these complex dependencies by factoring the model into local and global components, and incrementally constructing the model's output. The results of automatic evaluation and manual assessment confirm the benefits of this design: our system is consistently ranked higher than non-hierarchical baselines.

Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction

Xiaojun Wan, Jianwu Yang and Jianguo Xiao

Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously extracting a summary and keywords from a single document, under the assumption that the summary and keywords of a document can be mutually boosted. The approach naturally makes full use of the reinforcement between sentences and keywords by fusing the homogeneous sentence-to-sentence relationships, the homogeneous word-to-word relationships, and the heterogeneous sentence-to-word relationships. Experimental results show a significant improvement over several solid baselines for both tasks. The corpus-based approach to computing word semantics is validated to work almost as well as the knowledge-based approach.
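
A minimal sketch of a mutual-reinforcement iteration of the kind described is given below; the matrix names and damping factor are assumptions for illustration, and the paper's exact update and normalization may differ.

    import numpy as np

    def reinforce(W_ss, W_ww, W_sw, iters=50, d=0.85):
        """Jointly score sentences and words.
        W_ss: sentence-to-sentence similarity (n_s x n_s),
        W_ww: word-to-word similarity (n_w x n_w),
        W_sw: sentence-to-word association (n_s x n_w); all row-normalized."""
        n_s, n_w = W_sw.shape
        s = np.ones(n_s) / n_s   # sentence saliency scores
        w = np.ones(n_w) / n_w   # word saliency scores
        for _ in range(iters):
            s_new = d * (W_ss.T @ s) + (1 - d) * (W_sw @ w)
            w_new = d * (W_ww.T @ w) + (1 - d) * (W_sw.T @ s)
            s, w = s_new / s_new.sum(), w_new / w_new.sum()
        return s, w   # top-ranked sentences form the summary; top-ranked words are keywords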

Fast Semantic Extraction Using a Novel Neural Network Architecture

Ronan Collobert and Jason Weston

We describe a novel neural network architecture for the problem of semantic role labeling. Many current solutions are complicated, consist of several stages and hand-built features, and are too slow to be applied as part of real applications that require such semantic labels, partly because of their use of a syntactic parser. Our method instead learns a direct mapping from source sentence to semantic tags for a given predicate without the aid of a parser. Our resulting system obtains accuracies comparable to the current state-of-the-art at a fraction of the computational cost.

Improving the Interpretation of Noun Phrases with Cross-linguistic Information

Roxana Girju

This paper addresses the automatic classification of semantic relations in noun phrases based on cross-linguistic evidence from a set of five Romance languages: Spanish, Italian, French, Portuguese, and Romanian. A set of novel semantic and contextual English-Romance NP features is derived based on empirical observations on the distribution of the syntax and meaning of noun phrases in two corpora of different genres (Europarl and CLUVI). The features were employed in a Support Vector Machines algorithm which achieved an accuracy of 76.9% (Europarl) and 74.31% (CLUVI). The results were compared against two state-of-the-art models reported in the literature: a supervised model and a web-based unsupervised model.

Learning to Extract Relations from the Web using Minimal Supervision

Razvan Bunescu and Raymond Mooney

We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.

A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity

Feiyu Xu, Hans Uszkoreit and Hong Li

A minimally supervised machine learning framework is described for extracting relations of various complexity. Bootstrapping starts from a small set of n-ary relation instances as “seeds” in order to automatically learn pattern rules from parsed data, which can then extract new instances of the relation and its projections. We propose a novel rule representation model that enables the composition of n-ary relation rules on top of the rules for projections of the relation. The compositional approach to rule construction is supported by a bottom-up pattern extraction method. Because we only consider linguistic structures that contain arguments in the seed relations, the pattern extraction does not suffer from the computational problems of the subtree model (Sudo et al., 2003). In comparison to other automatic approaches, our rules can not only localize relation arguments but also assign their exact target argument roles. The method is evaluated in two tasks: the extraction of Nobel Prize awards and management succession events. Performance for the new Nobel Prize task is strong. For the management succession task the results compare favorably with those of existing pattern acquisition approaches.

A Multi-resolution Framework for Information Extraction from Free Text

Mstislav Maslennikov and Tat-Seng Chua

Extraction of relations between entities is an important part of IE on free text. Previous methods are mostly based on statistical correlation and dependency relations between entities. This paper re-examines the problem at the multi-resolution layers of phrases, clauses and sentences using dependency and discourse relations. Our multi-resolution framework uses clausal relations in 2 ways: 1) to filter noisy dependency paths; and 2) to increase the reliability of dependency path extraction. The resulting system outperforms previous approaches by 3%, 7%, and 4% on the MUC4, MUC6 and ACE RDC domains respectively.

Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web

Benjamin Rosenfeld and Ronen Feldman

Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of the entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, instead relying on general-purpose shallow parsing. Such systems have greater applicability, because they are able to extract relations that contain attributes of unknown types. However, this generality comes at a cost in accuracy. In this paper we show how to use corpus statistics to validate and correct the arguments of extracted relation instances, improving the overall RE performance. We test the methods on SRES – a self-supervised Web relation extraction system. We also compare the performance of corpus-based methods to the performance of validation and correction methods based on supervised NER components.

Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures

Jiří Havelka

Dependency analysis of natural language has gained importance for its applicability to NLP tasks. Non-projective structures are common in dependency analysis, therefore we need fine-grained means of describing them, especially for the purposes of machine-learning oriented approaches like parsing. We present an evaluation on twelve languages which explores several constraints and measures on non-projective structures. We pursue an edge-based approach concentrating on properties of individual edges as opposed to properties of whole trees. In our evaluation, we include previously unreported measures taking into account levels of nodes in dependency trees. Our empirical results corroborate theoretical results and show that an edge-based approach using levels of nodes provides an accurate and at the same time expressive means for capturing non-projective structures in natural language.

Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets

Roi Reichart and Ari Rappoport

Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. In particular, we achieve a 50% reduction in annotation cost for the in-domain case, yielding an improvement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets has been applied successfully to these tasks. We also provide a characterization of when self-training is valuable.

HPSG Parsing with Shallow Dependency Constraints

Kenji Sagae, Yusuke Miyao and Jun'ichi Tsujii

We present a novel framework that combines strengths from surface syntactic parsing and deep syntactic parsing to increase deep parsing accuracy, specifically by combining dependency and HPSG parsing. We show that by using surface dependencies to constrain the application of wide-coverage HPSG rules, we can benefit from a number of parsing techniques designed for high-accuracy dependency parsing, while actually performing deep syntactic analysis. Our framework results in a 1.4% absolute improvement over a state-of-the-art approach for wide coverage HPSG parsing.

Constituent Parsing with Incremental Sigmoid Belief Networks

Ivan Titov and James Henderson

We introduce a framework for syntactic parsing with latent variables based on dynamic Sigmoid Belief Networks. We demonstrate that a previous feed-forward neural network parsing model can be viewed as a coarse approximation to inference with this class of graphical models. By constructing a more accurate but still tractable approximation, we significantly improve parsing accuracy, suggesting that SBNs provide a good idealization for parsing. This generative model of parsing achieves state-of-the-art results on WSJ text and 8% error reduction over the baseline neural network parser.

Corpus Effects on the Evaluation of Automated Transliteration Systems

Sarvnaz Karimi, Andrew Turpin and Falk Scholer

Machine transliteration systems take a source word as input, and produce a target word in a different language that has the same pronunciation as the source. Most current transliteration systems employ a corpus of known source-target word pairs to train their system, and typically evaluate their systems on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of the human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% (in absolute terms) depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems; and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.

Collapsed Consonant and Vowel Models: New Approaches for English-Persian Transliteration and Back-Transliteration

Sarvnaz Karimi, Falk Scholer and Andrew Turpin

The transliteration of words from a source language to a target language is important for many applications that need to deal with unknown words, including machine translation, cross-lingual information retrieval, and cross-lingual question answering. In this paper, we propose a novel algorithm for English to Persian transliteration. Previous methods proposed for this language pair apply a word alignment tool for training. By contrast, we introduce an alignment algorithm particularly designed for transliteration. Our new model improves the English to Persian transliteration accuracy by 14.2% over an n-gram baseline. We also investigate back-transliteration for this language pair, a previously unstudied problem, and propose a novel method to handle it. Experimental results demonstrate that our algorithm leads to an absolute improvement of 25.1% over standard transliteration approaches.

Alignment-Based Discriminative String Similarity

Shane Bergsma and Grzegorz Kondrak

A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings. This approach achieves exceptional performance; on nine separate cognate identification experiments using six different language pairs, we more than double the average precision of traditional orthographic measures like Longest Common Subsequence Ratio and Dice's Coefficient. We also show improvement over other recent discriminative and heuristic similarity functions.
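
For reference, the two traditional orthographic baselines named above can be computed as follows (standard definitions, not code from the paper); Dice's coefficient is taken here over character-bigram sets, a common simplification.

    def lcs_len(a, b):
        """Longest common subsequence length via dynamic programming."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, ca in enumerate(a, 1):
            for j, cb in enumerate(b, 1):
                dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
        return dp[len(a)][len(b)]

    def lcsr(a, b):
        """Longest Common Subsequence Ratio."""
        return lcs_len(a, b) / max(len(a), len(b))

    def dice(a, b):
        """Dice's coefficient over character bigrams."""
        A = {a[i:i+2] for i in range(len(a) - 1)}
        B = {b[i:i+2] for i in range(len(b) - 1)}
        return 2 * len(A & B) / (len(A) + len(B)) if A or B else 0.0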

Bilingual Terminology Mining - Using Brain, not brawn comparable corpora

Emmanuel Morin, Béatrice Daille, Koichi Takeuchi and Kyo Kageura

Current research in text mining favours the quantity of texts over their quality. But for bilingual terminology mining, and for many language pairs, large comparable corpora are not available. More importantly, as terms are defined vis-a-vis a specific domain with a restricted register, it is expected that the quality of the corpus matters more than its quantity in terminology mining. Our hypothesis, therefore, is that the quality of the corpus is more important than the quantity and ensures the quality of the acquired terminological resources. We show how important the type of discourse is as a characteristic of the comparable corpus.

Unsupervised Language Model Adaptation Incorporating Named Entity Information

Feifan Liu and Yang Liu

Language model (LM) adaptation is important for both speech and language processing. It is often achieved by combining a generic LM with a topic-specific model that is more relevant to the target document. Unlike previous work on unsupervised LM adaptation, this paper investigates how effectively using named entity (NE) information, instead of considering all the words, helps LM adaptation. We evaluate two latent topic analysis approaches in this paper, namely, clustering and Latent Dirichlet Allocation (LDA). In addition, a new dynamically adapted weighting scheme for topic mixture models is proposed based on LDA topic analysis. Our experimental results show that the NE-driven LM adaptation framework outperforms the baseline generic LM. Furthermore, in the LDA-based approach, expanding the named entities with syntactically filtered words, together with an increase of the number of topics, yields a perplexity reduction of 14.23% compared to the baseline generic LM.

Coordinate Noun Phrase Disambiguation in a Generative Parsing Model

Deirdre Hogan

In this paper we present methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a generative history-based parsing model. As well as reducing noise in the data, we look at modeling two main sources of information for disambiguation: symmetry in conjunct structure, and the dependency between conjunct lexical heads. We also alter the head-finding rules for base noun phrases so that the lexical item chosen to head the entire phrase more closely resembles that chosen for other types of coordinate NP. Our changes to the baseline model result in an increase in NP coordination dependency f-score from 69.9% to 73.8%, which represents a relative reduction in f-score error of 13%.

A Unified Tagging Approach to Text Normalization

Conghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng and Tiejun Zhao

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting ‘informally inputted’ text into the canonical form, by eliminating ‘noises’ in the text and detecting paragraph and sentence boundaries in the text. Previously, text normalization issues were often undertaken in an ad-hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach to perform the task using Conditional Random Fields (CRF). The paper shows that with the introduction of a small set of tags, most of the text normalization tasks can be performed within the approach. The accuracy of the proposed method is high, because the subtasks of normalization are interdependent and should be performed together. Experimental results on email data cleaning and named entity recognition show that the proposed method significantly outperforms the approach of using cascaded models and that of employing independent models.

Sparse Information Extraction: Unsupervised Language Models to the Rescue

Doug Downey, Stefan Schoenmackers and Oren Etzioni

Zipf's Law governs the distribution of extractions. Thus, even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of such sparse extractions by utilizing unsupervised language models. The REALM system, which combines HMM-based and N-gram-based language models, ranks candidate extractions by the likelihood that they are correct. Experiments over multiple relations show that REALM reduces extraction error by 39%, on average, when compared with previous work. Because REALM pre-computes language models based on its corpus and does not require any hand-tagged seeds, it is far more scalable than previous approaches that learn models for each individual relation from hand-tagged data. Thus, REALM is ideally suited for open information extraction where the relations of interest are not specified in advance and their number is potentially vast.

Forest-to-String Statistical Translation Rules

Yang Liu, Yun Huang, Qun Liu and Shouxun Lin

In this paper, we propose forest-to-string rules to enhance the expressive power of tree-to-string translation models. A forest-to-string rule is capable of capturing non-syntactic phrase pairs by describing the correspondence between multiple parse trees and one string. To integrate these rules into tree-to-string translation models, auxiliary rules are introduced to provide a generalization level. Experimental results show that, on the NIST 2005 Chinese-English test set, the tree-to-string model augmented with forest-to-string rules achieves a relative improvement of 4.3% in terms of BLEU score over the original model which allows tree-to-string rules only.

Ordering Phrases with Function Words

Hendra Setiawan, Min-Yen Kan and Haizhou Li

This paper presents a Function Word centered, Syntax-based (FWS) solution to phrase ordering. Motivated by the observation that function words often encode the grammatical relationship between phrases within a sentence, we propose a probabilistic synchronous grammar to model the ordering of function words and their left and right arguments. We also extend the standard ITG to accommodate single gaps. By lexicalizing the resulting Single Gap ITG rules in a small number of cases corresponding to function words, we improve phrase ordering performance. The experiments show that the FWS approach consistently outperforms the baseline system in both function word centered ordering and overall BLEU score in perfect and noisy word alignment scenarios.

A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation

Chi-Ho Li, Minghui Li, Dongdong Zhang, Mu Li, Ming Zhou and Yi Guan

Inspired by previous preprocessing approaches to SMT, this paper proposes a novel, probabilistic approach to reordering which combines the merits of syntax-based and phrase-based SMT. Given a source sentence and its parse tree, our method generates, by tree operations, an n-best list of reordered inputs, which are then fed to a standard phrase-based decoder to produce the optimal translation. Experiments show that, for the NIST MT-05 task of Chinese-to-English translation, the proposal leads to a BLEU improvement of 1.57%.

Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora

Trevor Cohn and Mirella Lapata

Current phrase-based SMT systems perform poorly when using small training sets. This is a consequence of unreliable translation estimates and poor coverage over source and target phrases. This paper presents a method that alleviates this problem by exploiting multiple translations of the same source phrase. Central to our approach is triangulation, the process of translating from a source to a target language via an intermediate third language. This allows the use of a much wider range of parallel corpora for training, and can be combined with a standard phrase-table using conventional smoothing methods. Experimental results demonstrate BLEU improvements for triangulated models over a standard phrase-based system.
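
In its generic form, triangulation marginalizes over intermediate-language phrases when estimating the phrase translation distribution; the exact conditioning and smoothing used in the paper may differ.

    \[
    p(\bar{f} \mid \bar{e}) \;=\; \sum_{\bar{i}} p(\bar{f} \mid \bar{i})\, p(\bar{i} \mid \bar{e})
    \]

where \bar{i} ranges over phrases of the intermediate language.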

A Maximum Expected Utility Framework for Binary Sequence Labeling

Martin Jansche

Binary sequence labeling tasks arise frequently in natural language processing. Predictive inference under F-score as utility amounts to finding a sequence of binary labels with maximal expected F-score relative to a simple probabilistic sequence labeling model. We show that the number of hypotheses whose expected F-score needs to be evaluated is linear in the sequence length and present a framework for efficiently evaluating the expectation of many common loss/utility functions, including the F-score. This framework includes both exact and faster inexact calculation methods.
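
As a simplified illustration of the quantity being optimized (not the paper's exact linear-time procedure), the expected F1 of a candidate binary labeling under independent per-position marginals can be estimated by sampling:

    import random

    def expected_f1(candidate, marginals, samples=10000):
        """Monte Carlo estimate of E[F1] for a 0/1 label sequence `candidate`,
        assuming true labels are drawn independently with P(y_i = 1) = marginals[i]."""
        total = 0.0
        for _ in range(samples):
            truth = [1 if random.random() < p else 0 for p in marginals]
            tp = sum(c & t for c, t in zip(candidate, truth))
            denom = sum(candidate) + sum(truth)
            total += 2 * tp / denom if denom else 1.0
        return total / samples

The framework described in the abstract evaluates such expectations exactly and efficiently rather than by sampling.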

A fully Bayesian approach to unsupervised part-of-speech tagging

Sharon Goldwater and Tom Griffiths

Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet achieves tagging accuracy comparable to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), an improvement of up to 15 percentage points over MLE. Moreover, our Bayesian HMM can be successfully trained from data alone, with no tagging dictionary.
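
One standard way to integrate out, for example, the transition parameters under a symmetric Dirichlet(\alpha) prior yields a collapsed conditional of the following generic form (written here for illustration, not quoted from the paper):

    \[
    P(t_i \mid t_{i-1}, \mathbf{t}_{-i}, \alpha) \;=\;
      \frac{n(t_{i-1}, t_i) + \alpha}{n(t_{i-1}) + T\alpha}
    \]

where n(\cdot) counts events elsewhere in the corpus and T is the tagset size; small \alpha favors the sparse distributions mentioned above.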

Computationally Efficient M-Estimation of Log-Linear Structure Models

Noah A. Smith, Douglas L. Vail and John D. Lafferty

We describe a new loss function, due to Jeon and Lin (2006), for estimating structured log-linear models on arbitrary features. The loss function can be seen as a (generative) alternative to maximum likelihood estimation with an interesting information-theoretic interpretation, and it is statistically consistent. It is substantially faster than maximum (conditional) likelihood estimation of conditional random fields (Lafferty et al., 2001), by an order of magnitude or more. We compare its performance and training time to an HMM and a CRF on a shallow parsing task. These experiments cleanly tease apart the contributions of rich features and discriminative training, which are shown to be more than additive.

Guided Learning for Bidirectional Sequence Classification

Libin Shen, Giorgio Satta and Aravind K. Joshi

In this paper, we propose guided learning, a new learning framework for bidirectional sequence classification. The tasks of learning the order of inference and training the local classifier are dynamically incorporated into a single Perceptron-like learning algorithm. We apply this novel learning algorithm to POS tagging. It obtains an error rate of 2.67% on the standard PTB test set, which represents a 3.3% relative error reduction over the previous best result with fewer features on the same data set.

Different Structures for Evaluating Answers to Complex Questions: Pyramids Won't Topple, and Neither Will Human Assessors

Hoa Trang Dang and Jimmy Lin

The idea of "nugget pyramids" has recently been introduced as a refinement to the nugget-based methodology employed to evaluate answers to complex questions in the TREC QA tracks. This work examines data from the TREC 2006 QA track, the first large-scale deployment of the nugget pyramids method, and shows that this method of combining judgments of nugget importance from multiple assessors increases the stability and discriminative power of the evaluation while introducing only a small additional manual assessment cost. We address the desire to maintain a model of real users for the task of question answering, by exploring different ways in which assessor opinions can be combined. We show that the nugget pyramid evaluation is highly correlated with other evaluations that do maintain a user model, and hence is an appropriate method for evaluating an end-user task such as question-answering.

Exploiting Syntactic and Shallow Semantic Kernels for Question Answer Classification

Alessandro Moschitti, Silvia Quarteroni, Roberto Basili and Suresh Manandhar

In this paper, we study the impact of syntactic and shallow semantic information in automatic classification and reranking of questions and answers. We define (a) new tree structures based on shallow semantics encoded in Predicate Argument Structures (PASs) given by PropBank and (b) new kernel functions to exploit the representational power of such structures. Our experiments with Support Vector Machines suggest that syntactic information helps specific tasks such as question/answer classification and that shallow semantics produces a remarkable improvement when a reliable set of PASs can be extracted, e.g. from answers.

Language-independent Probabilistic Answer Ranking for Question Answering

Jeongwoo Ko, Teruko Mitamura and Eric Nyberg

This paper presents a probabilistic answer ranking framework for multilingual question answering. The framework estimates the probability of an individual answer candidate given the degree of answer relevance and the amount of supporting evidence provided in the set of answer candidates for the question. Our approach was evaluated by comparing the candidate answer sets generated by Chinese and Japanese answer extractors with the re-ranked answer sets produced by the answer ranking framework. Empirical results from testing on NTCIR factoid questions show a 40% performance improvement in Chinese answer selection and a 45% improvement in Japanese answer selection.

Learning to Compose Effective Strategies from a Library of Dialogue Components

Martijn Spitters, Marco De Boni, Jakub Zavrel and Remko Bonnema

This paper describes a method for automatically learning effective dialogue strategies, generated from a library of dialogue content, using reinforcement learning from user feedback. This library includes greetings, social dialogue, chit-chat, jokes and relationship building, as well as the more usual clarification and verification components of dialogue. We tested the method through a motivational dialogue system that encourages take-up of exercise and show that it can be used to construct good dialogue strategies with little effort.

On the role of context and prosody in the interpretation of 'okay'

Agustin Gravano, Stefan Benus, Hector Chavez, Julia Hirschberg and Lauren Wilcox

We examine the effect of contextual and acoustic cues in the disambiguation of three discourse-pragmatic functions of the word 'okay'. Results of a perception study show that contextual cues are stronger predictors of discourse function than acoustic cues. However, acoustic features capturing the pitch excursion at the right edge of 'okay' feature prominently in disambiguation, whether other contextual cues are present or not.

Predicting Success in Dialogue

David Reitter and Johanna D. Moore

Task-solving in dialogue depends on the linguistic alignment of the interlocutors, which has been suggested to be based on mechanistic repetition effects (Pickering & Garrod, 2004). In this paper, we seek confirmation of this hypothesis by looking at repetition in corpora and asking whether repetition is correlated with task success. We show that the relevant repetition tendency is based on slow adaptation rather than short-term priming and demonstrate that lexical and syntactic repetition is a reliable predictor of task success given the first five minutes of a task-oriented dialogue.

Resolving It, This, and That in Unrestricted Multi-Party Dialog

Christoph Müller

We present an implemented system for the resolution of it, this, and that in transcribed multi-party dialog. The system handles NP-anaphoric as well as discourse-deictic anaphors, i.e. pronouns with VP antecedents. Selectional preferences for NP or VP antecedents are determined on the basis of corpus counts. Initial results show that the system performs better than a recency-based baseline.

A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing

Jianfeng Gao, Galen Andrew, Mark Johnson and Kristina Toutanova

This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron, and Boosting. We also investigate ME estimation with the increasingly popular L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two reranking tasks: a parse selection task and a language model adaptation task. Then we apply the best of these estimators to two additional tasks involving conditional sequence models: a Conditional Markov Model (CMM) for part of speech (POS) tagging and a Conditional Random Field (CRF) for Chinese word segmentation. Our experiments show that three of the estimators, ME estimation with L1 or L2 regularization, and the Averaged Perceptron, are in a near statistical tie for first place.
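
For concreteness, the two regularized ME objectives take the standard forms below (generic formulations with regularization weight \lambda, not reproduced from the paper):

    \[
    \hat{\theta}_{L_2} = \arg\max_{\theta} \sum_i \log p(y_i \mid x_i; \theta) - \lambda \lVert \theta \rVert_2^2,
    \qquad
    \hat{\theta}_{L_1} = \arg\max_{\theta} \sum_i \log p(y_i \mid x_i; \theta) - \lambda \lVert \theta \rVert_1
    \]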

Grammar Approximation by Representative Sublanguage: A New Model for Language Learning

Smaranda Muresan and Owen Rambow

We propose a new language learning model that learns a syntactic-semantic grammar from a small number of natural language strings annotated with their semantics, along with basic assumptions about natural language syntax. We show that the search space for grammar induction is a complete grammar lattice, which guarantees the uniqueness of the learned grammar.

Chinese Segmentation with a Word-Based Perceptron Algorithm

Yue Zhang and Stephen Clark

Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary. Discriminatively trained models based on local character features are used to make the tagging decisions, with Viterbi decoding finding the highest scoring segmentation. In this paper we propose an alternative, word-based segmentor, which uses features based on complete words and word sequences. The perceptron algorithm is used for discriminative training. Since Viterbi decoding is no longer applicable, we use a beam-search decoder. Closed tests on the first and second SIGHAN bakeoffs show that our system is competitive with the best in the literature, achieving the highest reported F-scores for a number of corpora.

Unsupervised Coreference Resolution in a Nonparametric Bayesian Model

Aria Haghighi and Dan Klein

We present an unsupervised, nonparametric Bayesian approach to coreference resolution which models both global entity references across a corpus as well as the sequential anaphoric structure within each document. While most existing work is driven by pairwise decisions, our model is fully generative, producing each mention from a combination of global entity properties and local attentional state. Despite being unsupervised, our system achieves surprisingly competitive performance on the ACE and MUC data sets. In particular, our best system achieves a 70.3 MUC F1 measure on the MUC-6 test set, broadly in the range of some recent supervised results.

Pivot Language Approach for Phrase-Based Statistical Machine Translation

Hua Wu and Haifeng Wang

This paper proposes a novel method for phrase-based statistical machine translation using a pivot language. To conduct translation between languages Lf and Le with a small bilingual corpus, we bring in a third language Lp, which is named the pivot language. For Lf-Lp and Lp-Le, there exist large bilingual corpora. Using only Lf-Lp and Lp-Le bilingual corpora, we can build a translation model for Lf-Le. The advantage of this method is that we can perform translation between Lf and Le even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language method achieves an absolute improvement of 0.06 (22.13% relative) as compared with the model directly trained with 5,000 Lf-Le sentence pairs for French-Spanish translation. Moreover, with a small Lf-Le bilingual corpus available, our method can further improve the translation quality by using the additional Lf-Lp and Lp-Le bilingual corpora.
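
A minimal sketch of inducing an Lf-Le phrase table through the pivot is given below; the dictionary-of-dictionaries table format is an assumption for illustration, and the paper additionally derives lexical weights, which are omitted here.

    from collections import defaultdict

    def pivot_phrase_table(fp_table, pe_table):
        """fp_table[f][p] = p(p|f) for source-pivot phrase pairs,
        pe_table[p][e] = p(e|p) for pivot-target phrase pairs.
        Returns fe_table with fe_table[f][e] = sum_p p(p|f) * p(e|p)."""
        fe_table = defaultdict(lambda: defaultdict(float))
        for f, pivots in fp_table.items():
            for p, prob_pf in pivots.items():
                for e, prob_ep in pe_table.get(p, {}).items():
                    fe_table[f][e] += prob_pf * prob_ep
        return fe_table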

Bootstrapping a Stochastic Transducer for Arabic-English Transliteration Extraction

Tarek Sherif and Grzegorz Kondrak

We propose a bootstrapping approach to training a memoryless stochastic transducer for the task of extracting transliterations from an English-Arabic bitext. It learns its similarity metric from the data in the bitext, and thus can function directly on strings written in different writing scripts without any additional language knowledge. We show that this bootstrapped transducer performs as well as or better than a model designed specifically to detect Arabic-English transliterations.

Benefits of the Massively Parallel Rosetta Stone: Cross-Language Information Retrieval with over 30 Languages

Peter A. Chew and Ahmed Abdelali

In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach which uses parallel aligned corpora and Latent Semantic Indexing. Most, if not all, previous work which follows this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is ‘massively parallel’ in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each in over 30,000 documents. Given the size of this dataset, a ‘massively parallel’ approach was also necessitated in the more usual computational sense. Our results indicate that, far from adding more noise, more linguistic parallelism is better when it comes to cross-language retrieval precision, in addition to the self-evident benefit that CLIR can be performed on more languages.

A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation

Joshua S. Albrecht and Rebecca Hwa

Machine learning methods have been proposed in the past as a means of developing automatic metrics to evaluate the quality of machine translated sentences. This paper further investigates this idea, analyzing aspects of learning that impact performance. We show that previously proposed approaches that train a Human-Likeness classifier are not as well correlated with human judgments of translation quality. Instead, we argue that regression-based learning produces more reliable metrics. We demonstrate the feasibility of regression-based metrics through empirical analysis of learning curves and generalization studies. Our results suggest that regression-based metrics can achieve higher correlations with human judgments than several standard automatic metrics.

Automatic Acquisition of Ranked Qualia Structures from the Web

Philipp Cimiano and Johanna Wenderoth

This paper presents an approach to automatically learning qualia structures for nouns from the Web and thus opens the possibility to explore the impact of qualia structures for natural language processing at a larger scale. The approach builds on earlier work based on the idea of matching specific lexico-syntactic patterns conveying a certain semantic relation on the World Wide Web using standard search engines. In our approach, the qualia elements are actually ranked for each qualia role with respect to some measure. The specific contribution of the paper lies in the extensive analysis and quantitative comparison of different measures for ranking the qualia elements. Further, for the first time, we present a quantitative evaluation of such an approach for learning qualia structures with respect to a handcrafted gold standard.

A Sequencing Model for Situation Entity Classification

Alexis Palmer, Elias Ponvert, Jason Baldridge and Carlota Smith

Situation entities (SEs) are the events, states, generic statements, and embedded facts and propositions introduced to a discourse by clauses of text. We report on the first data-driven models for SE classification, which is the labeling of clauses according to the type of situation entity they introduce. SE classification is important for discourse mode identification and useful for discourse parsing. We use a sequencing approach to the task that outperforms a simple utterance-based classifier. Linguistically-motivated cooccurrence features and grammatical relation information from deep syntactic analysis improve classification accuracy. In addition, we report on genre effects seen in SE classification which support the analysis of discourse modes having characteristic distributions of SEs and sequences of SEs. Finally, we show that SE classification helps discourse parsing accuracy.

Words and Echoes: Assessing and Mitigating the Non-Randomness Problem in Word Frequency Distribution Modeling

Marco Baroni and Stefan Evert

Frequency distribution models tuned to words and other linguistic events can predict the number and frequency distribution of types in samples of arbitrary sizes. We conduct, for the first time, a rigorous evaluation of these models based on cross-validation and separation of training and test data. Our experiments reveal that the prediction accuracy of the models is marred by serious overfitting problems, due to violations of the random sampling assumption in corpus data. We then propose a simple pre-processing method to alleviate non-randomness problems. Further evaluation confirms the effectiveness of the method, which compares favourably to more complex correction techniques.

A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora

Judita Preiss, Ted Briscoe and Anna Korhonen

This paper describes the first system for large-scale acquisition of subcategorization frames from English corpus data which can be used to acquire comprehensive lexicons for verbs, nouns and adjectives. The system incorporates an extensive rule-based classifier which identifies 168 verbal, 37 adjectival and 31 nominal frames from grammatical relations output by a robust parser. The system achieves state-of-the-art performance on all three sets.

A Language-Independent Unsupervised Model for Morphological Segmentation

Vera Demberg

Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. We also propose a method for detecting variation within stems in an unsupervised fashion. We show that the segmentation quality reached with the new algorithm is good enough to improve on a speech synthesis task.

Using Mazurkiewicz Trace Languages for Partition-Based Morphology

Francois Barthelemy

Partition-based morphology is an approach to finite-state morphology in which a grammar describes a special kind of regular relation, one that splits all the strings of a given tuple into the same number of substrings. These relations are compiled into finite-state machines. In this paper, we address the question of merging grammars that use different partitionings into a single finite-state machine. A morphological description may then be obtained by parallel or sequential application of constraints expressed on different partition notions (e.g. morpheme, phoneme, grapheme). The theory of Mazurkiewicz Trace Languages, a well-known semantics of parallel systems, provides a way of representing and compiling such a description.

Much ado about nothing: A social network model of Russian paradigmatic gaps

Robert Daland, Andrea D. Sims and Janet Pierrehumbert

A number of Russian verbs lack 1sg non-past forms. The persistence of these paradigmatic gaps seemingly contradicts the highly productive nature of inflectional systems. We model the persistence and spread of gaps in Russian with a multi-agent model. We ran three simulations: no grammar learning, learning with arbitrary analogical pressure, and morphophonologically conditioned learning. The results and limit behavior are compared to the attested historical development of the gaps. We propose that the persistence of gaps can be explained in the absence of morphological competition.

Substring-Based Transliteration

Tarek Sherif and Grzegorz Kondrak

Transliteration is the task of converting a word from one alphabetic script to another. We present a novel, substring-based approach to transliteration, inspired by phrase-based models of machine translation. We investigate two implementations of substring-based transliteration: a dynamic programming algorithm, and a finite-state transducer. We show that our substring-based transducer not only outperforms a state-of-the-art letter-based approach by a significant margin, but is also orders of magnitude faster.

Pipeline Iteration

Kristy Hollingshead and Brian Roark

This paper presents pipeline iteration, an approach that uses output from later stages of a pipeline to constrain earlier stages of the same pipeline. We demonstrate significant improvements in a state-of-the-art PCFG parsing pipeline using base-phrase constraints, derived either from later stages of the parsing pipeline or from a finite-state shallow parser. The best performance is achieved by reranking the union of unconstrained parses and relatively heavily-constrained parses.

Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus

Yuk Wah Wong and Raymond J. Mooney

This paper presents the first empirical results on learning synchronous grammars that generate logical forms. Using statistical machine translation techniques, a semantic parser based on a synchronous context-free grammar augmented with lambda-operators is learned given a set of training sentences and their correct logical forms. The resulting parser is shown to be the best-performing system so far in a database query domain.

Generalizing Tree Transformations for Inductive Dependency Parsing

Jens Nilsson, Joakim Nivre and Johan Hall

Previous studies in data-driven dependency parsing have shown that tree transformations can improve parsing accuracy for specific parsers and data sets. We investigate to what extent this can be generalized across languages/treebanks and parsers, focusing on pseudo-projective parsing, as a way of capturing non-projective dependencies, and transformations used to facilitate parsing of coordinate structures and verb groups. The results indicate that the beneficial effect of pseudo-projective parsing is independent of parsing strategy but sensitive to language or treebank specific properties. By contrast, the construction specific transformations appear to be more sensitive to parsing strategy but have a constant positive effect over several languages.

Learning Multilingual Subjective Language via Cross-Lingual Projections

Rada Mihalcea, Carmen Banea and Janyce Wiebe

This paper explores methods for generating subjectivity analysis resources in a new language by leveraging on the tools and resources available in English. Given a bridge between English and the selected target language (e.g., a bilingual dictionary or a parallel corpus), the methods can be used to rapidly create tools for subjectivity analysis in the new language.

Sentiment Polarity Identification in Financial News: A Cohesion-based Approach

Ann Devitt and Khurshid Ahmad

Text is not unadulterated fact. A text can make you laugh or cry but can it also make you short sell your stocks in company A and buy up options in company B? Research in the domain of finance strongly suggests that it can. Studies have shown that both the informational and affective aspects of news text affect the markets in profound ways, impacting on volumes of trades, stock prices, volatility of prices and even future firm earnings. This paper aims to explore a computable metric of positive or negative polarity in financial news text which is consistent with human judgments of polarity in such texts and which can be used in a quantitative analysis of news sentiment impact on financial markets.

Weakly Supervised Learning for Hedge Classification in Scientific Literature

Ben Medlock and Ted Briscoe

We investigate automatic classification of speculative language, or `hedging', in scientific literature from the biomedical domain using weakly supervised machine learning. Our contributions include a precise description of the task with annotation guidelines, analysis and discussion, a probabilistic formulation of the self-training paradigm, and a theoretical and practical evaluation of the learning and classification models presented. We demonstrate experimentally that hedge classification is feasible using weakly supervised ML, while pointing toward avenues for future research.

Text Analysis for Automatic Image Annotation

Koen Deschacht and Marie-Francine Moens

We present a novel approach to automatically annotate images using associated text. We detect and classify all entities (persons and objects) in the text after which we determine the salience (the importance of an entity in a text) and visualness (the extent to which an entity can be perceived visually) of these entities. We combine these measures to compute the probability that an entity is present in the image. The suitability of our approach was successfully tested on 50 image-text pairs of Yahoo! News.

User Requirements Analysis for Meeting Information Retrieval Based on Query Elicitation

Vincenzo Pallotta, Violeta Seretan and Marita Ailomaa

We present a user requirement study for question answering on meeting records that assesses the difficulty of users' questions in terms of what types of information and retrieval techniques are required in order to provide the correct answers. We ground our work on the empirical analysis of elicited user queries. We found that the majority of elicited queries pertain to argumentative processes and outcomes (around 60%). Our analysis also suggests that standard keyword-based Information Retrieval can successfully deal with less than 20% of the queries, and that it must be complemented with other types of metadata and inference.

Combining Multiple Knowledge Sources for Dialogue Segmentation in Multimedia Archives

Pei-Yun Hsueh and Johanna D. Moore

Automatic segmentation is important for making multimedia archives comprehensible, and for developing downstream information retrieval and extraction modules. In this study, we explore approaches that can segment conversational speech by integrating various knowledge sources (e.g., words, audio and video recordings, speaker intention and context). In particular, we evaluate the performance of a Maximum Entropy approach, and evaluate the effectiveness of different multimodal features on the task of automatic segmentation of conversations. We also provide a quantitative account of the effect of using ASR transcription as opposed to human transcripts.

Topic Analysis for Psychiatric Document Retrieval

Liang-Chih Yu, Chung-Hsien Wu, Chin-Yew Lin, Eduard Hovy and Chia-Ling Lin

Psychiatric document retrieval attempts to help people to efficiently and effectively locate the consultation documents relevant to their depressive problems. Individuals can understand how to alleviate their symptoms according to recommendations in the relevant documents. This work proposes the use of high-level topic information extracted from consultation documents to improve the precision of retrieval results. The topic information adopted herein includes negative life events, depressive symptoms and semantic relations between symptoms, which are beneficial for better understanding of users’ queries. Experimental results show that the proposed approach achieves higher precision than the word-based retrieval models, namely the vector space model (VSM) and Okapi model, adopting word-level information alone.

What to be? - Electronic Career Guidance Based on Semantic Relatedness

Iryna Gurevych, Christof Müller and Torsten Zesch

We present a study aimed at investigating the use of semantic information in a novel NLP application, Electronic Career Guidance (ECG), in German. ECG is formulated as an information retrieval (IR) task, whereby textual descriptions of professions (documents) are ranked for their relevance to natural language descriptions of a person's professional interests (the topic). We compare the performance of two semantic IR models: (IR-1) utilizing semantic relatedness (SR) measures based on either wordnet or Wikipedia and a set of heuristics, and (IR-2) measuring the similarity between the topic and documents based on Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007). We evaluate the performance of SR measures intrinsically on the tasks of (T-1) computing semantic relatedness, and (T-2) solving Reader's Digest Word Power (RDWP) problems. We find that the wordnet based measure is superior for capturing semantic similarity, while the Wikipedia based measure is very good at capturing semantic relatedness and non-classical semantic relations. It also performs significantly better both in terms of coverage and correctness for RDWP problems. We find that (IR-2) performs significantly better for longer topics, while (IR-1) utilizing the Wikipedia based SR measure is significantly better for short topics both in MAP and P10.
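
A minimal sketch of the ESA-style comparison used in (IR-2): texts are mapped to weighted vectors over Wikipedia concepts and compared by cosine. The concept_index structure is an assumed input, and summing word vectors is a common simplification rather than the paper's exact setup.

    import math
    from collections import Counter

    def esa_vector(text, concept_index):
        """concept_index maps a word to {concept: tf-idf weight};
        the text vector is the sum of its words' concept vectors."""
        vec = Counter()
        for word in text.lower().split():
            for concept, weight in concept_index.get(word, {}).items():
                vec[concept] += weight
        return vec

    def cosine(u, v):
        dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0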

Extracting Social Networks and Biographical Facts From Conversational Speech Transcripts

Hongyan Jing, Nanda Kambhatla and Salim Roukos

We present a general framework for automatic extraction of social networks and biographical facts from conversational speech transcripts. Our approach relies on fusing the output produced by multiple information extraction modules, including entity recognition and detection, relation detection, and event detection. We describe the specific features and algorithmic improvements effective for conversational speech transcripts. These improvements increase the performance of social network extraction from 0.06 to 0.30 for the development set, and from 0.06 to 0.28 for the test set, as measured by f-measure on the ties within a network. The same framework can be applied to other genres of text: we have built an automatic biography generation system for general domain text using the same approach.

