Thursday, July 15, 2010 |
9:00–9:15
|
Opening Remarks
|
9:15–10:30
|
Session 1: Parsing
9:15–9:40 |
Improvements in Unsupervised Co-Occurrence-Based Parsing
Christian Hänig
This paper presents an algorithm for unsupervised co-occurrence-based parsing that improves and extends existing approaches. The proposed algorithm induces a context-free grammar of the language in question in an iterative manner. The resulting structure of a sentence will be given as a hierarchical arrangement of constituents. Although this algorithm does not use any a priori knowledge about the language, it is able to detect heads, modifiers and a phrase type’s different compound composition possibilities. For evaluation purposes, the algorithm is applied to manually annotated part-of-speech tags (POS tags) as well as to word classes induced by an unsupervised part-of-speech tagger.
|
9:40–10:05 |
Viterbi Training Improves Unsupervised Dependency Parsing
Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jurafsky and Christopher D. Manning
We show that Viterbi (or "hard") EM is well-suited to unsupervised grammar induction. It is more accurate than standard inside-outside re-estimation (classic EM), significantly faster, and simpler. Our experiments with Klein and Manning’s Dependency Model with Valence (DMV) attain state-of-the-art performance — 44.8% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus — without clever initialization; with a good initializer, Viterbi training improves to 47.9%. This generalizes to the Brown corpus, our held-out set, where accuracy reaches 50.8% — a 7.5% gain over previous best results. We find that classic EM learns better from short sentences but cannot cope with longer ones, where Viterbi thrives. However, we explain that both algorithms optimize the wrong objectives and prove that there are fundamental disconnects between the likelihoods of sentences, best parses, and true parses, beyond the well-established discrepancies between likelihood, accuracy and extrinsic performance.
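For readers unfamiliar with the hard/soft distinction above, the toy sketch below contrasts Viterbi ("hard") EM with classic ("soft") EM on a made-up two-coin mixture. It only illustrates the general algorithmic difference, not the DMV grammar-induction model or the paper's experiments; all data, names and settings here are invented.

```python
# Hypothetical illustration: hard vs. soft EM on a two-coin mixture.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each entry is the number of heads in 20 flips of one of two
# coins with unknown biases (the latent variable is which coin was used).
n_flips = 20
true_biases = [0.3, 0.8]
counts = np.array([rng.binomial(n_flips, true_biases[rng.integers(2)])
                   for _ in range(200)])

def log_likelihood(counts, p):
    # Binomial log-likelihood of each observation under each coin, ignoring
    # the binomial coefficient, which is constant across coins.
    k = counts[:, None]
    return k * np.log(p) + (n_flips - k) * np.log(1.0 - p)

def em(counts, hard, n_iter=50):
    p = np.array([0.4, 0.6])              # deliberately weak initializer
    for _ in range(n_iter):
        ll = log_likelihood(counts, p)
        if hard:
            # Viterbi / "hard" EM: commit to the single best latent assignment.
            resp = np.zeros_like(ll)
            resp[np.arange(len(counts)), ll.argmax(axis=1)] = 1.0
        else:
            # Classic / "soft" EM: use posterior probabilities as fractional counts.
            resp = np.exp(ll - ll.max(axis=1, keepdims=True))
            resp /= resp.sum(axis=1, keepdims=True)
        heads = resp.T @ counts            # heads attributed to each coin
        totals = resp.sum(axis=0) * n_flips
        p = (heads + 1e-6) / (totals + 2e-6)   # M-step: re-estimate coin biases
    return np.sort(p)

print("soft EM estimates:", em(counts, hard=False))
print("hard EM estimates:", em(counts, hard=True))
```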
|
10:05–10:30 |
Driving Semantic Parsing from the World’s Response
James Clarke, Dan Goldwasser, Ming-Wei Chang and Dan Roth
Current approaches to semantic parsing, the task of converting text to a formal meaning representation, rely on annotated training data mapping sentences to logical forms. Providing this supervision is a major bottleneck in scaling semantic parsers. This paper presents a new learning paradigm aimed at alleviating the supervision burden. We develop two novel learning algorithms capable of predicting complex structures which rely only on a binary feedback signal based on the context of an external world. In addition we reformulate the semantic parsing problem to reduce the dependency of the model on syntactic patterns, thus allowing our parser to scale better using less supervision. Our results surprisingly show that, without using any annotated meaning representations, learning with a weak feedback signal is capable of producing a parser that is competitive with fully supervised parsers.
|
|
10:30–11:00
|
Break
|
11:00–12:15
|
Session 2: Grammar Induction
11:00–11:25 |
Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages
Alexander Clark
A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. We present a simple algorithm for learning these grammars together with a complete self-contained proof of the correctness and efficiency of the algorithm.
|
11:25–11:50 |
Identifying Patterns for Unsupervised Grammar Induction
Jesús Santamaría and Lourdes Araujo
This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in the texts. Our starting hypothesis is that there exist some classes of words that function as separators, marking the beginning or the end of new constituents. Among these separators we distinguish those which trigger new levels in the parse tree. If we are able to detect these separators we can follow a very simple procedure to identify the constituents of a sentence by taking the classes of words between separators. This paper is devoted to describing the process that we have followed to automatically identify the set of separators from a corpus annotated only with part-of-speech (POS) tags. The proposed approach has allowed us to improve the results of previous proposals when parsing sentences from the Wall Street Journal corpus.
|
11:50–12:15 |
Learning Better Monolingual Models with Unannotated Bilingual Text
David Burkett, Slav Petrov, John Blitzer and Dan Klein
This work shows how to improve state-of-the-art monolingual natural language processing models using unannotated bilingual text. We build a multiview learning objective that enforces agreement between monolingual and bilingual models. In our method the first, monolingual view consists of supervised predictors learned separately for each language. The second, bilingual view consists of log-linear predictors learned over both languages on bilingual text. Our training procedure estimates the parameters of the bilingual model using the output of the monolingual model, and we show how to combine the two models to account for dependence between views. For the task of named entity recognition, using bilingual predictors increases F1 by 16.1% absolute over a supervised monolingual model, and retraining on bilingual predictions increases *monolingual* model F1 by 14.6%. For syntactic parsing, our bilingual predictor increases F1 by 2.1% absolute, and retraining a monolingual model on its output gives an improvement of 2.0%.
|
|
12:15–14:15
|
Lunch
|
14:15–15:30
|
Invited Talk
14:15–15:30 |
Clueless: Explorations in Unsupervised, Knowledge-Lean Extraction of Lexical-Semantic Information
Lillian Lee
I will discuss two current projects on automatically extracting certain types of lexical-semantic information in settings wherein we can rely neither on annotations nor existing knowledge resources to provide us with clues. The name of the game in such settings is to find and leverage auxiliary sources of information. Why is it that if you know I’ll give a silly talk, it follows that you know I’ll give a talk, whereas if you doubt I’ll give a good talk, it doesn’t follow that you doubt I’ll give a talk? This pair of examples shows that the word “doubt” exhibits a special but prevalent kind of behavior known as downward entailingness — the licensing of reasoning from supersets to subsets, so to speak, but not vice versa. The first project I’ll describe is to identify words that are downward entailing, a task that promises to enhance the performance of systems that engage in textual inference, and one that is quite challenging since it is difficult to characterize these items as a class and no corpus with downward-entailingness annotations exists. We are able to surmount these challenges by utilizing some insights from the linguistics literature regarding the relationship between downward entailing operators and what are known as negative polarity items — words such as “ever” or the idiom “have a clue” that tend to occur only in negative contexts. A cross-linguistic analysis indicates some potentially interesting connections to findings in linguistic typology. That previous paragraph was quite a mouthful, wasn’t it? Wouldn’t it be nice if it were written in plain English that was easier to understand? The second project I’ll talk about, which has the eventual aim to make it possible to automatically simplify text, aims to learn lexical-level simplifications, such as “work together” for “collaborate”. (This represents a complement to prior work, which focused on syntactic transformations, such as passive to active voice.) We exploit edit histories in Simple English Wikipedia for this task. This isn’t as simple (ahem) as it might at first seem because Simple English Wikipedia and the usual Wikipedia are far from a perfect parallel corpus and because many edits in Simple Wikipedia do not constitute simplifications. We consider both explicitly modeling different kinds of operations and various types of bootstrapping, including as clues the comments Wikipedians sometimes leave when they edit. Joint work with Cristian Danescu-Niculescu-Mizil, Bo Pang, and Mark Yatskar.
|
|
15:30–16:00
|
Break
|
16:00–17:30
|
Shared Task Session 1: Overview and Oral Presentations
16:00–16:20 |
The CoNLL 2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text
Richárd Farkas, Veronika Vincze, György Móra, János Csirik and György Szarvas
The CoNLL 2010 Shared Task was dedicated to the detection of uncertainty cues and their linguistic scope in natural language texts. The motivation behind this task was that distinguishing factual and uncertain information in texts is of essential importance in information extraction. This paper provides a general overview of the “Learning to detect hedges and their scope in natural language texts” Shared Task, including the annotation protocols of the training and evaluation datasets, the exact task definitions, the evaluation metrics employed and the overall results. The paper concludes with an analysis of the prominent approaches and an overview of the systems submitted to the Shared Task.
|
16:20–16:30 |
A Cascade Method for Detecting Hedges and their Scope in Natural Language Text
Buzhou Tang, Xiaolong Wang, Xuan Wang, Bo Yuan and Shixi Fan
Detecting hedges and their scope in natural language text is very important for information inference. In this paper, we present a system based on a cascade method for the CoNLL-2010 shared task. The system consists of two components: one for detecting hedges and another for detecting their scope. For detecting hedges, we build a cascade subsystem. Firstly, a conditional random field (CRF) model and a large margin-based model are trained respectively. Then, we train another CRF model using the result of the first phase. For detecting the scope of hedges, a CRF model is trained according to the result of the first subtask. The experiments show that our system achieves 86.36% F-measure on the biological corpus and 55.05% F-measure on the Wikipedia corpus for hedge detection, and 49.95% F-measure on the biological corpus for hedge scope detection. Among them, the 86.36% figure is the best result on the biological corpus for hedge detection.
|
16:30–16:40 |
Detecting Speculative Language using Syntactic Dependencies and Logistic Regression
Andreas Vlachos and Mark Craven
In this paper we describe our approach to the CoNLL 2010 shared task on detecting speculative language in biomedical text. We treat the detection of sentences containing uncertain information (Task1) as a token classification task since the existence or absence of cues determines the sentence label. We distinguish words that have speculative and non-speculative meaning by employing syntactic features as a proxy for their semantic content. In order to identify the scope of each cue (Task2), we learn a classifier that predicts whether each token of a sentence belongs to the scope of a given cue. The features in the classifier are based on the syntactic dependency path between the cue and the token. In both tasks, we use a Bayesian logistic regression classifier incorporating a sparsity-enforcing Laplace prior. Overall, the performance achieved is 85.21% F-score and 44.11% F-score in Task1 and Task2, respectively.
|
16:40–16:50 |
A Hedgehop over a Max-margin Framework using Hedge Cues
Maria Georgescul
In this paper, we describe the experimental settings we adopted in the context of the 2010 CoNLL shared task for detecting sentences containing uncertainty. The classification results reported on are obtained using discriminative learning with features essentially incorporating lexical information. Hyper-parameters are tuned for each domain: using BioScope training data for the biomedical domain and Wikipedia training data for the Wikipedia test set. By allowing an efficient handling of combinations of large-scale input features, the discriminative approach we adopted showed highly competitive empirical results for hedge detection on the Wikipedia dataset: our system is ranked first with an F-score of 60.17%.
|
16:50–17:00 |
Detecting Hedge Cues and their Scopes with Average Perceptron
Feng Ji, Xipeng Qiu and Xuanjing Huang
In this paper, we propose a hedge detection method based on the average perceptron, which we used in the closed challenge of the CoNLL 2010 Shared Task. There are two subtasks: (1) detecting uncertain sentences and (2) identifying the in-sentence scopes of hedge cues. We use a unified learning algorithm for both subtasks, since the hedge score of a sentence can be decomposed into scores of the words, especially the hedge words. On the biomedical corpus, our method achieves an F-measure of 77.86% in detecting in-domain uncertain sentences, 77.44% in recognizing hedge cues, and 19.27% in identifying the scopes.
|
17:00–17:10 |
Memory-based Resolution of In-sentence Scopes of Hedge Cues
Roser Morante, Vincent Van Asch and Walter Daelemans
In this paper we describe the machine learning systems that we submitted to the CoNLL-2010 Shared Task on Learning to Detect Hedges and Their Scope in Natural Language Text. Task 1 on detecting uncertain information was performed by an SVM-based system to process the Wikipedia data and by a memory-based system to process the biological data. Task 2 on resolving in-sentence scopes of hedge cues was performed by a memory-based system that relies on information from syntactic dependencies. This system scored the highest F1 (57.32) of Task 2.
|
17:10–17:20 |
Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
Erik Velldal, Lilja Øvrelid and Stephan Oepen
This paper describes a hybrid, two-level approach for resolving hedge cues, the problem of the CoNLL 2010 shared task. First, a maximum entropy classifier is applied to identify cue words, using both syntactic and surface-oriented features. Second, a set of manually crafted rules, operating on dependency representations and the output of the classifier, is applied to resolve the scope of the hedge cues within the sentence. For both Task 1 and Task 2, our system participates in the stricter category of ‘closed’ or ‘in-domain’ systems.
|
17:20–17:30 |
Combining Manual Rules and Supervised Learning for Hedge Cue and Scope Detection
Marek Rei and Ted Briscoe
Hedge cues were detected using a supervised Conditional Random Field (CRF) classifier exploiting features from the RASP parser. The CRF’s predictions were filtered using known cues and unseen instances were removed, increasing precision while retaining recall. Rules for scope detection, based on the grammatical relations of the sentence and the part-of-speech tag of the cue, were manually developed. However, another supervised CRF classifier was used to refine these predictions. As a final step, scopes were constructed from the classifier output using a small set of post-processing rules. Development of the system revealed a number of issues with the annotation scheme adopted by the organisers.
|
|
17:30–18:00
|
Shared Task Discussion Panel
|
Friday, July 16, 2010 |
9:15–10:30
|
Invited Talk
9:15–10:30 |
Bayesian Hidden Markov Models and Extensions
Zoubin Ghahramani
Hidden Markov models (HMMs) are one of the cornerstones of time-series modelling. I will review HMMs, motivations for Bayesian approaches to inference in them, and our work on variational Bayesian learning. I will then focus on recent nonparametric extensions to HMMs. Traditionally, HMMs have a known structure with a fixed number of states and are trained using maximum likelihood techniques. The infinite HMM (iHMM) allows a potentially unbounded number of hidden states, letting the model use as many states as it needs for the data. The recent development of ‘Beam Sampling’ — an efficient inference algorithm for iHMMs based on dynamic programming — makes it possible to apply iHMMs to large problems. I will show some applications of iHMMs to unsupervised POS tagging and experiments with parallel and distributed implementations. I will also describe a factorial generalisation of the iHMM which makes it possible to have an unbounded number of binary state variables, and can be thought of as a time-series generalisation of the Indian buffet process. I will conclude with thoughts on future directions in Bayesian modelling of sequential data.
|
|
10:30–11:00
|
Break
|
11:00–12:30
|
Joint Poster Session: Main conference and shared task posters
|
21 |
Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint
Roi Reichart, Raanan Fattal and Ari Rappoport
Modern unsupervised POS taggers usually apply an optimization procedure to a non-convex function, and tend to converge to local maxima that are sensitive to starting conditions. The quality of the tagging induced by such algorithms is thus highly variable, and researchers report average results over several random initializations. Consequently, applications are not guaranteed to use an induced tagging of the quality reported for the algorithm. In this paper we address this issue using an unsupervised test for intrinsic clustering quality. We run a base tagger with different random initializations, and select the best tagging using the quality test. As a base tagger, we modify a leading unsupervised POS tagger (Clark, 2003) to constrain the distributions of word types across clusters to be Zipfian, allowing us to utilize a perplexity-based quality test. We show that the correlation between our quality test and gold standard-based tagging quality measures is high. Our results are better in most evaluation measures than all results reported in the literature for this task, and are always better than the Clark average results.
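The "run several random initializations and keep the one with the best unsupervised quality score" idea above can be illustrated with a minimal sketch. The sketch below uses scikit-learn KMeans and its inertia as a stand-in quality criterion rather than the perplexity-based test from the paper, and the feature vectors and parameters are made up.

```python
# Hypothetical sketch: pick the best of several random clustering runs by an
# intrinsic (label-free) quality score, analogous to the selection step above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # toy feature vectors for word types

best_model, best_score = None, np.inf
for seed in range(10):                  # ten random initializations
    model = KMeans(n_clusters=20, n_init=1, random_state=seed).fit(X)
    if model.inertia_ < best_score:     # lower inertia = tighter clusters
        best_model, best_score = model, model.inertia_

print("selected the run with inertia", round(best_score, 2))
```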
|
22 |
Syntactic and Semantic Structure for Opinion Expression Detection
Richard Johansson and Alessandro Moschitti
We demonstrate that relational features derived from dependency-syntactic and semantic role structures are useful for the task of detecting opinionated expressions in natural-language text, significantly improving over conventional models based on sequence labeling with local features. These features allow us to model the way opinionated expressions interact in a sentence over arbitrary distances. While the relational features make the prediction task more computationally expensive, we show that it can be tackled effectively by using a reranker. We evaluate a number of machine learning approaches for the reranker, and the best model results in a 10-point absolute improvement in soft recall on the MPQA corpus, while decreasing precision only slightly.
|
23 |
Type Level Clustering Evaluation: New Measures and a POS Induction Case Study
Roi Reichart, Omri Abend and Ari Rappoport
Clustering is a central technique in NLP. Consequently, clustering evaluation is of great importance. Many clustering algorithms are evaluated by their success in tagging corpus tokens. In this paper we discuss type level evaluation, which reflects class membership only and is independent of the token statistics of a particular reference corpus. Type level evaluation casts light on the merits of algorithms, and for some applications is a more natural measure of the algorithm’s quality. We propose new type level evaluation measures that, contrary to existing measures, are applicable when items are polysemous, the common case in NLP. We demonstrate the benefits of our measures using a detailed case study, POS induction. We experiment with seven leading algorithms, obtaining useful insights and showing that token and type level measures can weakly or even negatively correlate, which underscores the fact that these two approaches reveal different aspects of clustering quality.
|
24 |
Recession Segmentation: Simpler Online Word Segmentation Using Limited Resources
Constantine Lignos and Charles Yang
In this paper we present a cognitively plausible approach to word segmentation that segments in an online fashion using only local information and a lexicon of previously segmented words. Unlike popular statistical optimization techniques, the learner uses structural information of the input syllables rather than distributional cues to segment words. We develop a memory model for the learner that, like a child learner, does not recall previously hypothesized words perfectly. The learner attains an F-score of 86.69% in ideal conditions and 85.05% when word recall is unreliable and stress in the input is reduced. These results demonstrate the power that a simple learner can have when paired with appropriate structural constraints on its hypotheses.
|
25 |
Computing Optimal Alignments for the IBM-3 Translation Model
Thomas Schoenemann
Prior work on training the IBM-3 translation model is based on suboptimal methods for computing Viterbi alignments. In this paper, we present the first method guaranteed to produce globally optimal alignments. This not only results in improved alignments, it also gives us the opportunity to evaluate the quality of standard hillclimbing methods. Indeed, hillclimbing works reasonably well in practice but still fails to find the global optimum for between 2% and 12% of all sentence pairs and the probabilities can be several tens of orders of magnitude away from the Viterbi alignment. By reformulating the alignment problem as an Integer Linear Program, we can use standard machinery from global optimization theory to compute the solutions. We use the well-known branch-and-cut method, but also show how it can be customized to the specific problem discussed in this paper. In fact, a large number of alignments can be excluded from the start without losing global optimality.
|
26 |
Semi-Supervised Recognition of Sarcasm in Twitter and Amazon
Dmitry Davidov, Oren Tsur and Ari Rappoport
Sarcasm is a form of speech act in which the speakers convey their message in an implicit way. The inherently ambiguous nature of sarcasm sometimes makes it hard even for humans to decide whether an utterance is sarcastic or not. Recognition of sarcasm can benefit many sentiment analysis NLP applications, such as review summarization, dialogue systems and review ranking systems. In this paper we experiment with semi-supervised sarcasm identification on two very different data sets: a collection of 5.9 million tweets collected from Twitter, and a collection of 66000 product reviews from Amazon. Using the Mechanical Turk we created a gold standard sample in which each sentence was tagged by 3 annotators, obtaining F-scores of 0.78 on the product reviews dataset and 0.83 on the Twitter dataset. We discuss the differences between the datasets and how the algorithm uses them (e.g., for the Amazon dataset the algorithm makes use of structured information). We also discuss the utility of Twitter #sarcasm hashtags for the task.
|
27 |
Learning Probabilistic Synchronous CFGs for Phrase-based Translation
Markos Mylonakis and Khalil Sima’an
Probabilistic phrase-based synchronous grammars are now considered promising devices for statistical machine translation because they can express reordering phenomena between pairs of languages. Learning these hierarchical, probabilistic devices from parallel corpora constitutes a major challenge, because of multiple latent model variables as well as the risk of data overfitting. This paper presents an effective method for learning a family of particular interest to MT, binary Synchronous Context-Free Grammars with inverted/monotone orientation (a.k.a. Binary ITG). A second contribution concerns devising a lexicalized phrase reordering mechanism that has complementary strengths to Chiang’s model. The latter conditions reordering decisions on the surrounding lexical context of phrases, whereas our mechanism works with the lexical content of phrase pairs (akin to standard phrase-based systems). Surprisingly, our experiments on French-English data show that our learning method applied to far simpler models exhibits performance indistinguishable from the Hiero system.
|
28 |
A Semi-Supervised Batch-Mode Active Learning Strategy for Improved Statistical Machine Translation
Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard and Prem Natarajan
The availability of substantial, in-domain parallel corpora is critical for the development of high-performance statistical machine translation (SMT) systems. Such corpora, however, are expensive to produce due to the labor intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences that represent a balance between domain match, translation difficulty, and batch diversity. Simulation experiments on an English-to-Pashto translation task show that the proposed strategy not only outperforms the random selection baseline, but also traditional active learning techniques based on dissimilarity to existing training data. Our approach achieves a relative improvement of 45.9% in BLEU over the seed baseline, while the closest competitor gained only 24.8% with the same number of selected sentences.
|
29 |
Improving Word Alignment by Semi-supervised Ensemble
Shujian Huang, Kangxi Li, Xinyu Dai and Jiajun Chen
Supervised learning has recently been used to improve the performance of word alignment. However, due to the limited amount of labeled data, the performance of "pure" supervised learning, which uses only labeled data, is limited. As a result, many existing methods employ features learnt from a large amount of unlabeled data to assist the task. In this paper, we propose a semi-supervised ensemble method to better incorporate both labeled and unlabeled data during learning. Firstly, we employ an ensemble learning framework, which effectively uses alignment results from different unsupervised alignment models. We then propose to use a semi-supervised learning method, namely Tri-training, to train classifiers using both labeled and unlabeled data collaboratively and further improve the result of ensemble learning. Experimental results show that our methods can substantially improve the quality of word alignment. The final translation quality of a phrase-based translation system is slightly improved, as well.
|
30 |
A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection
Chenghua Lin, Yulan He and Richard Everson
This paper presents a comparative study of three closely related Bayesian models for unsupervised sentiment detection, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment topic detection.
|
31 |
A Hybrid Approach to Emotional Sentence Polarity and Intensity Classification
Jorge Carrillo de Albornoz, Laura Plaza and Pablo Gervás
In this paper, the authors present a new approach to sentence level sentiment analysis. The aim is to determine whether a sentence expresses a positive, negative or neutral sentiment, as well as its intensity. The method performs WSD over the words in the sentence in order to work with concepts rather than terms, and makes use of the knowledge in an affective lexicon to label these concepts with emotional categories. It also deals with the effect of negations and quantifiers on polarity and intensity analysis. An extensive evaluation in two different domains is performed in order to determine how the method behaves in 2-classes (positive and negative), 3-classes (positive, negative and neutral) and 5-classes (strongly negative, weakly negative, neutral, weakly positive and strongly positive) classification tasks. The results obtained compare favorably with those achieved by other systems addressing similar evaluations.
|
32 |
Cross-Caption Coreference Resolution for Automatic Image Understanding
Micah Hodosh, Peter Young, Cyrus Rashtchian and Julia Hockenmaier
In order to “understand” an image, it is necessary to identify not only the depicted entities, but also their attributes, relations between them and the actions they participate in. This information cannot be conveyed by simple keyword annotations. We have collected a corpus of 8108 “action” images, each associated with five simple sentences describing its content, and created a simple ontology of entity categories that appear in these images. In order to obtain a consistent semantic representation of the image content from these sentences, we need to first identify multiple mentions of the same entities. We present a hierarchical Bayesian model for cross-caption coreference resolution. We also evaluate how well the ontological types of the entities can be recovered.
|
33 |
Improved Natural Language Learning via Variance-Regularization Support Vector Machines
Shane Bergsma, Dekang Lin and Dale Schuurmans
We present a simple technique for learning better SVMs using fewer training examples. Rather than using the standard SVM regularization, we regularize toward low weight-variance. Our new SVM objective remains a convex quadratic function of the weights, and is therefore computationally no harder to optimize than a standard SVM. Variance regularization is shown to enable dramatic improvements in the learning rates of SVMs on three lexical disambiguation tasks.
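As a rough illustration of what "regularizing toward low weight-variance" can mean, one assumed formulation (not necessarily the authors' exact objective) replaces the usual squared-norm penalty of a soft-margin SVM with the empirical variance of the weight vector:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}\ge 0}\;
  \lambda \sum_{i=1}^{d}\bigl(w_i-\bar{w}\bigr)^{2}
  \;+\; \sum_{j=1}^{n}\xi_j
\quad\text{s.t.}\quad
  y_j\bigl(\mathbf{w}^{\top}\mathbf{x}_j+b\bigr)\ge 1-\xi_j,
\qquad
  \bar{w}=\tfrac{1}{d}\sum_{i=1}^{d}w_i .
```

Because the variance penalty is a positive semidefinite quadratic form in the weights, such a problem stays a convex quadratic program, consistent with the abstract's claim that the modified objective is no harder to optimize than a standard SVM.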
|
|
37 |
Hedge Detection using the RelHunter Approach
Eraldo Fernandes, Carlos Crestana and Ruy Milidiú
RelHunter is a Machine Learning based method for the extraction of structured information from text. Here, we apply RelHunter to the Hedge Detection task, proposed as the CoNLL 2010 Shared Task. RelHunter’s key design idea is to model the target structures as a relation over entities. The method decomposes the original task into three subtasks: (i) Entity Identification; (ii) Candidate Relation Generation; and (iii) Relation Recognition. In the Hedge Detection task, we define three types of entities: cue chunk, start scope token and end scope token. Hence, the Entity Identification subtask is further decomposed into three token classification subtasks, one for each entity type. In the Candidate Relation Generation subtask, we apply a simple procedure to generate a ternary candidate relation. Each instance in this relation represents a hedge candidate composed of a cue chunk, a start scope token and an end scope token. For the Relation Recognition subtask, we use a binary classifier to discriminate between true and false candidates. The four classifiers are trained with the Entropy Guided Transformation Learning algorithm. When compared to the other hedge detection systems of the CoNLL shared task, our scheme shows a competitive performance. The F-score of our system is 54.05 on the evaluation corpus.
|
38 |
A High-Precision Approach to Detecting Hedges and Their Scopes
Halil Kilicoglu and Sabine Bergler
We extend our prior work on speculative sentence recognition and speculation scope detection in biomedical text to the CoNLL’10 Shared Task on Hedge Detection. In our participation, we sought to assess the extensibility and portability of our prior work, which relies on linguistic categorization and weighting of hedging cues and on syntactic patterns in which these cues play a role. For Task 1a, we tuned our categorization and weighting scheme to recognize hedging in biological text. By accommodating a small number of vagueness quantifiers, we were able to extend our methodology to detecting vague sentences in Wikipedia articles. We exploited constituent parse trees in addition to syntactic dependency relations in resolving hedging scope. Our results are competitive with those of closed-domain trained systems and demonstrate that our high-precision oriented methodology is extensible and portable.
|
39 |
Exploiting Rich Features for Detecting Hedges and Their Scope
Xinxin Li, Jianping Shen, Xiang Gao and Xuan Wang
This paper describes our system for detecting hedges and their scope in natural language texts, developed for our participation in the CoNLL-2010 shared task. We formalize these two tasks as sequence labeling problems, and implement them using a conditional random fields (CRFs) model. In the first task, we use a greedy forward procedure to select features for the classifier. These features include the part-of-speech tag, word form, lemma, and chunk tag of tokens in the sentence. In the second task, our system exploits rich syntactic features about dependency structures and phrase structures, which achieves a better performance than only using the flat sequence features. Our system achieves the third-best score on the biological data set for the first task, and a 0.5265 F1 score for the second task.
|
40 |
Uncertainty Detection as Approximate Max-Margin Sequence Labelling
Oscar Täckström, Sumithra Velupillai, Martin Hassel, Gunnar Eriksson, Hercules Dalianis and Jussi Karlgren
This paper reports experiments for the CoNLL-2010 Shared Task on Learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain were 0.852 F-score, for the Wikipedia set 0.5538 F-score. For Task 2, our official results were 0.0215 for the entire task with a score of 0.6249 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 0.788, Wikipedia: 0.577; Task 2: 0.396 and 0.785 for cues.
|
41 |
Hedge Detection and Scope Finding by Sequence Labeling with Procedural Feature Selection
Shaodian Zhang, Hai Zhao, Guodong Zhou and Bao-liang Lu
This paper presents a system which adopts a standard sequence labeling technique for hedge detection and scope finding. For hedge detection, we formulate it as a hedge labeling problem, while for hedge scope finding, we use a two-step labeling strategy, one for hedge labeling and the other for scope finding. In particular, various kinds of syntactic dependencies are systematically exploited and effectively integrated using a large-scale normalized feature selection method. Evaluation on the CoNLL-2010 shared task shows that our system achieves stable and competitive results for all the closed tasks. Furthermore, post-deadline experiments show that the performance can be much further improved using a sufficient feature selection.
|
42 |
Learning to Detect Hedges and their Scope using CRF
Qi Zhao, Chengjie Sun, Bingquan Liu and Yong Cheng
This paper presents an approach for extracting hedge cues and their scopes in the BioScope corpus using two CRF models for the CoNLL 2010 shared task. In the first task, the HCDic feature is proposed to improve system performance, achieving better performance (84.1% F-score) than the baseline. The HCDic feature is also helpful for making use of cross-domain resources. A comparison of our methods on the BioScope and Wikipedia corpora is given, which shows that they are good at hedge cue detection on the BioScope corpus but fall short on the Wikipedia corpus. To detect the scope of hedge cues, we use rules to post-process the text. For future work, we plan to construct rules for the HCDic to improve our system.
|
43 |
Exploiting Multi-Features to Detect Hedges and Their Scope in Biomedical Texts
Huiwei Zhou, Xiaoyan Li, Degen Huang, Zezhong Li and Yuansheng Yang
In this paper, we present a machine learning approach that detects hedge cues and their scope in biomedical texts. Identifying hedged information in texts is a kind of semantic filtering of texts and it is important since it can separate speculative information from factual information. In order to deal with this semantic analysis problem, various evidential features are proposed and integrated through a Conditional Random Fields (CRFs) model. Hedge cues that appear in the training dataset are regarded as keywords and employed as an important feature in the hedge cue identification system. For the scope finding, we construct a CRF-based system and a syntactic pattern-based system, and compare their performances. Experiments using test data from the CoNLL-2010 shared task show that our proposed method is robust. The F-scores of the biological hedge detection task and the scope finding task reach 86.32% and 54.18%, respectively, in in-domain evaluations.
|
44 |
A Lucene and Maximum Entropy Model Based Hedge Detection System
Lin Chen and Barbara Di Eugenio
This paper describes the approach to hedge detection we developed, in order to participate in the shared task at CoNLL 2010. A supervised learning approach is employed in our implementation. Hedge cue annotations in the training data are used as the seed to build a reliable hedge cue set. A Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of Apache Lucene, we are able to do fuzzy string matching to extract hedge cues, and to incorporate part-of-speech (POS) tags in hedge cues. Not only can our system determine the certainty of the sentence, but it is also able to find all the contained hedges. Our system was ranked third on the Wikipedia dataset. In later experiments with different parameters, we further improved our results, with a 0.612 F-score on the Wikipedia dataset, and a 0.802 F-score on the biological dataset.
|
45 |
HedgeHunter: A System for Hedge Detection and Uncertainty Classification
David Clausen
With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large scale data analysis. Hedge detection and uncertainty classification are important components of a high precision IE system. This paper describes a two part supervised system which classifies words as hedge or non-hedged and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and Part-of-Speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence level features based on the number of hedge cues, the identity of hedge cues, and a Bag-of-Words feature vector to train a logistic regression classifier for sentence level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from these sentences or treat them as appropriately doubtful. We present results for in domain training and testing and cross domain training and testing based on a simple union of training sets.
|
46 |
Exploiting CCG Structures with Tree Kernels for Speculation Detection
Liliana Paola Mamani Sanchez, Baoli Li and Carl Vogel
Our CoNLL-2010 speculative sentence detector disambiguates putative keywords based on the following considerations: a speculative keyword may be composed of one or more word tokens; a speculative sentence may have one or more speculative keywords; and if a sentence contains at least one real speculative keyword, it is deemed speculative. A tree kernel classifier is used to assess whether a potential speculative keyword conveys speculation. We exploit information implicit in tree structures. For prediction efficiency, only a segment of the whole tree around a speculation keyword is considered, along with morphological features inside the segment and information about the containing document. A maximum entropy classifier is used for sentences not covered by the tree kernel classifier. Experiments on the Wikipedia data set show that our system achieves 0.55 F-measure (in-domain).
|
47 |
Uncertainty Learning using SVMs and CRFs
Vinodkumar Prabhakaran
In this work, I explore the use of SVMs and CRFs in the problem of predicting certainty in sentences. I consider this as a task of tagging uncertainty cues in context, for which I used lexical, wordlist-based and deep-syntactic features. Results show that the syntactic context of the tokens in conjunction with the wordlist-based features turned out to be useful in predicting uncertainty cues.
|
48 |
Features for Detecting Hedge Cues
Nobuyuki Shimizu and Hiroshi Nakagawa
We present a sequential labeling approach to hedge cue detection submitted to the CoNLL-2010 shared task, biological portion of task 1. Our main approach is as follows. We make use of partial syntactic information together with features obtained from the unlabeled corpus, and convert the task into a sequential BIO-tagging. If a cue is found, a sentence is classified as uncertain and certain otherwise. To examine a large number of feature combinations, we employ a genetic algorithm. While some obtained features are difficult to interpret, they were shown to improve the performance of the final system.
|
49 |
A Simple Ensemble Method for Hedge Identification
Ferenc Szidarovszky, Illés Solt and Domonkos Tikk
We present in this paper a simple hedge identification method and its application on biomedical text. The problem at hand is a subtask of the CoNLL 2010 shared task. Our solution consists of two classifiers, a statistical one and a CRF model, and a simple combination schema that combines their predictions. We report in detail on each component of our system and discuss the results. We also show that a more sophisticated combination schema could improve the F-score significantly.
|
50 |
A Baseline Approach for Detecting Sentences Containing Uncertainty
Erik Tjong Kim Sang
We apply a baseline approach to the CoNLL-2010 shared task data sets on hedge detection. Weights have been assigned to cue words marked in the training data based on their occurrences in certain and uncertain sentences. New sentences received scores that correspond with those of their best scoring cue word, if present. The best acceptance scores for uncertain sentences were determined using 10-fold cross validation on the training data. This approach performed reasonably on the shared task’s biological (F=82.0) and Wikipedia (F=62.8) data sets.
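The cue-weight baseline described above is simple enough to sketch end to end. The sketch below uses made-up sentences, a made-up cue list and a fixed threshold purely for illustration; the real system tunes the acceptance score by cross-validation on the training data.

```python
# Hypothetical sketch of a cue-weight baseline for uncertainty detection:
# weight each known cue by how often it appears in uncertain vs. certain
# training sentences, score a sentence by its best cue, threshold the score.
from collections import defaultdict

train = [
    ("the protein may bind to the receptor", True),
    ("this suggests a possible interaction", True),
    ("the gene is expressed in the liver", False),
    ("results indicate that binding occurs", False),
]
cues = {"may", "possible", "suggests", "indicate"}

# Estimate P(uncertain | cue) with add-one smoothing.
unc, tot = defaultdict(int), defaultdict(int)
for sentence, uncertain in train:
    for token in sentence.split():
        if token in cues:
            tot[token] += 1
            unc[token] += int(uncertain)
weight = {c: (unc[c] + 1) / (tot[c] + 2) for c in cues}

def score(sentence):
    # Sentence score = weight of its best-scoring cue, 0.0 if no cue occurs.
    return max((weight[t] for t in sentence.split() if t in cues), default=0.0)

threshold = 0.5   # would be tuned by cross-validation in the real system
for s in ["binding may occur", "the gene is active"]:
    print(s, "->", "uncertain" if score(s) > threshold else "certain")
```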
|
51 |
Hedge Classification with Syntactic Dependency Features based on an Ensemble Classifier
Yi Zheng, Qifeng Dai, Qiming Luo and Enhong Chen
We present our CoNLL-2010 Shared Task system in this paper. The system operates in three steps: sequence labeling, syntactic dependency parsing, and classification. We participated in Shared Task 1. Our experimental results measured by the in-domain and cross-domain F-scores on the biological domain are 81.11% and 67.99%, and on the Wikipedia domain 55.48% and 55.41%.
|
|
12:30–14:00
|
Lunch
|
14:00–15:15
|
Session 3: Semantics and Information Extraction
14:00–14:25 |
Online Entropy-based Model of Lexical Category Acquisition
Grzegorz Chrupała and Afra Alishahi
Children learn a robust representation of lexical categories at a young age. We propose an incremental model of this process which efficiently groups words into lexical categories based on their local context using an information-theoretic criterion. We train our model on a corpus of child-directed speech from CHILDES and show that the model learns a fine-grained set of intuitive word categories. Furthermore, we propose a novel evaluation approach by comparing the efficiency of our induced categories against other category sets (including traditional part of speech tags) in a variety of language tasks. We show the categories induced by our model typically outperform the other category sets.
|
14:25–14:50 |
Tagging and Linking Web Forum Posts
Su Nam Kim, Li Wang and Timothy Baldwin
We propose a method for annotating post-to-post discourse structure in online user forum data, in the hopes of improving troubleshooting-oriented information access. We introduce the tasks of: (1) post classification, based on a novel dialogue act tag set; and (2) link classification. We also introduce three feature sets (structural features, post context features and semantic features) and experiment with three discriminative learners (maximum entropy, SVM-HMM and CRF). We achieve above-baseline results for both dialogue act and link classification, with interesting divergences in which feature sets perform well over the two sub-tasks, and go on to perform preliminary investigation of the interaction between post tagging and linking.
|
14:50–15:15 |
Joint Entity and Relation Extraction using Card-Pyramid Parsing
Rohit Kate and Raymond Mooney
Both entity and relation extraction can benefit from being performed jointly, allowing each task to correct the errors of the other. We present a new method for joint entity and relation extraction using a graph we call a “card-pyramid”. This graph compactly encodes all possible entities and relations in a sentence, reducing the task of their joint extraction to jointly labeling its nodes. We give an efficient labeling algorithm that is analogous to parsing using dynamic programming. Experimental results show improved results for our joint extraction method compared to a pipelined approach.
|
|
15:30–16:00
|
Break
|
16:00–17:15
|
Session 4: Machine Learning
16:00–16:25 |
Distributed Asynchronous Online Learning for Natural Language Processing
Kevin Gimpel, Dipanjan Das and Noah A. Smith
Recent speed-ups for training large-scale models like those found in statistical NLP exploit distributed computing (either on multicore or "cloud" architectures) and rapidly converging online learning algorithms. Here we aim to combine the two. We focus on distributed, "mini-batch" learners that make frequent updates asynchronously (Nedic et al., 2001; Langford et al., 2009). We generalize existing asynchronous algorithms and experiment extensively with structured prediction problems from NLP, including discriminative, unsupervised, and non-convex learning scenarios. Our results show asynchronous learning can provide substantial speedups compared to distributed and single-processor mini-batch algorithms with no signs of error arising from the approximate nature of the technique.
|
16:25–16:50 |
On Reverse Feature Engineering of Syntactic Tree Kernels
Daniele Pighin and Alessandro Moschitti
In this paper, we provide a theoretical framework for feature selection in tree kernel spaces based on gradient-vector components of kernel-based machines. We show that a huge number of features can be discarded without a significant decrease in accuracy. Our selection algorithm is as accurate as and much more efficient than those proposed in previous work. Comparative experiments on three interesting and very diverse classification tasks, i.e. Question Classification, Relation Extraction and Semantic Role Labeling, support our theoretical findings and demonstrate the algorithm performance.
|
16:50–17:15 |
Inspecting the Structural Biases of Dependency Parsing Algorithms
Yoav Goldberg and Michael Elhadad
We propose the notion of a *structural bias* inherent in a parsing system with respect to the language it is aiming to parse. This structural bias characterizes the behaviour of a parsing system in terms of structures it tends to under- and over- produce. We propose a Boosting-based method for uncovering some of the structural bias inherent in parsing systems. We then apply our method to four English dependency parsers (Arc-Eager and Arc-Standard transition-based parsers, and first- and second-order graph-based parsers). We show that all four parsers are biased with respect to the kind of annotation they are trained to parse. We present a detailed analysis of the biases that highlights specific differences and commonalities between the parsing systems, and improves our understanding of their strengths and weaknesses.
|
|
17:15–17:45
|
SIGNLL Business Meeting and Best Paper Award
|
Thursday, July 15, 2010 |
8:45–9:00
|
Opening Remarks
|
9:00–9:50
|
Full Paper Session 1
9:00–9:25 |
A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments
Qin Gao, Nguyen Bach and Stephan Vogel
We present a word alignment framework that can incorporate partial manual alignments. The core of the approach is a novel semi-supervised algorithm extending the widely used IBM Models with a constrained EM algorithm. The partial manual alignments can be obtained by human labelling or automatically by high-precision-low-recall heuristics. We demonstrate the use of both methods by selecting alignment links from a manually aligned corpus and by applying links generated from a bilingual dictionary to unlabelled data. For the first method, we conduct controlled experiments on Chinese-English and Arabic-English translation tasks to compare the quality of word alignment, and to measure the effects of the two different methods of selecting alignment links from the manually aligned corpus. For the second method, we experimented with a moderate-scale Chinese-English translation task. The experiment results show an average improvement of 0.33 BLEU point across 8 test sets.
|
9:25–9:50 |
Fast Consensus Hypothesis Regeneration for Machine Translation
Boxing Chen, George Foster and Roland Kuhn
This paper presents a fast consensus hypothesis regeneration approach for machine translation. It combines the advantages of feature-based fast consensus decoding and hypothesis regeneration. Our approach is more efficient than previous work on hypothesis regeneration, and it explores a wider search space than consensus decoding, resulting in improved performance. Experimental results show consistent improvements across language pairs, and an improvement of up to 0.72 BLEU is obtained over a competitive single-pass baseline on the Chinese-to-English NIST task.
|
|
9:50–10:45
|
Shared Translation Task
9:50–10:15 |
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Chris Callison-Burch, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki and Omar Zaidan
This paper presents the results of the WMT10 and MetricsMATR10 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of machine translation systems and system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality. This year we also investigated increasing the number of human judgments by hiring non-expert annotators through Amazon’s Mechanical Turk.
|
10:15–10:45 |
Boaster Session 1: Translation Task
|
|
10:45–11:00
|
Morning Break
|
11:00–12:30
|
Poster Session: Translation Task
101 |
LIMSI’s Statistical Translation Systems for WMT’10
Alexandre Allauzen, Josep M. Crego, İlknur Durgar El-Kahlout and François Yvon
This paper describes our Statistical Machine Translation systems for the WMT10 evaluation, where LIMSI participated for two language pairs (French-English and German-English, in both directions). For German-English, we concentrated on normalizing the German side through proper preprocessing, aimed at reducing lexical redundancy and at splitting complex compounds. For French-English, we studied two extensions of our in-house N-code decoder: first, the effect of integrating a new bilingual reordering model; second, the use of adaptation techniques for the translation model. For both sets of experiments, we report the improvements obtained on the development and test data.
|
102 |
2010 Failures in English-Czech Phrase-Based MT
Ondřej Bojar and Kamil Kos
The paper describes our experiments with English-Czech machine translation for WMT10 in 2010. Focusing primarily on the translation to Czech, our additions to the standard Moses phrase-based MT pipeline include two-step translation to overcome target-side data sparseness and optimization towards SemPOS, a metric better suited for evaluating Czech. Unfortunately, none of the approaches bring a significant improvement over our standard setup.
|
103 |
An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
Hui Cong, Zhao Hai, Lu Bao-Liang and Song Yan
In this paper we describe our system for the WMT10 machine translation shared task and discuss development set selection. Comparing the results obtained using different development sets and batch processing, we think that the choice of development set plays an important role in translation performance; in other words, translation of unseen text is tuning-sensitive. We have found that a combined development set leads to results that are more stable and good enough. The next step is to identify the specific critical factors in development set selection that can guide us in improving translation performance.
|
104 |
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
Vladimir Eidelman, Chris Dyer and Philip Resnik
This paper describes the system we developed to improve German-English translation of News text for the shared task of the Fifth Workshop on Statistical Machine Translation. Working within cdec, an open source modular framework for machine translation, we explore the benefits of several modifications to our hierarchical phrase-based model, including segmentation lattices, minimum Bayes Risk decoding, grammar extraction methods, and varying language models. Furthermore, we analyze decoder speed and memory performance across our set of models and show there is an important trade-off that needs to be made.
|
105 |
Further Experiments with Shallow Hybrid MT Systems
Christian Federmann, Andreas Eisele, Yu Chen, Sabine Hunsicker, Jia Xu and Hans Uszkoreit
We describe our hybrid machine translation system which has been developed for and used in the WMT10 shared task. We compute translations from a rule-based MT system and combine the resulting translation “templates” with partial phrases from a state-of-the-art phrase-based, statistical MT engine. Phrase substitution is guided by several decision factors, a continuation of previous work within our group. For the shared task, we have computed translations for six language pairs including English, German, French and Spanish. Our experiments have shown that our shallow substitution approach can effectively improve the translation result from the RBMT system; however it has also become clear that a deeper integration is needed to further improve translation quality.
|
106 |
Improved Features and Grammar Selection for Syntax-Based MT
Greg Hanneman, Jonathan Clark and Alon Lavie
show abstracthide abstractWe present the Carnegie Mellon University Stat-XFER group submission to the WMT 2010 shared translation task. Updates to our syntax-based SMT system mainly fell in the areas of new feature formulations in the translation model and improved filtering of SCFG rules. Compared to our WMT 2009 submission, we report a gain of 1.73 BLEU by using the new features and decoding environment, and a gain of up to 0.52 BLEU from improved grammar selection.
|
107 |
FBK at WMT 2010: Word Lattices for Morphological Reduction and Chunk-based Reordering
Christian Hardmeier, Arianna Bisazza and Marcello Federico
show abstracthide abstractFBK participated in the WMT~2010 Machine Translation shared task with phrase-based Statistical Machine Translation systems based on the Moses decoder for English-German and German-English translation. Our work concentrates on exploiting the available language modelling resources by using linear mixtures of large 6-gram language models and on addressing linguistic differences between English and German with methods based on word lattices. In particular, we use lattices to integrate a morphological analyser for German into our system, and we present some initial work on rule-based word reordering.
|
109 |
The RWTH Aachen Machine Translation System for WMT 2010
Carmen Heger, Joern Wuebker, Matthias Huck, Gregor Leusch, Saab Mansour, Daniel Stein and Hermann Ney
show abstracthide abstractIn this paper we describe the statistical machine translation system of the RWTH Aachen University developed for the translation task of the Fifth Workshop on Statistical Machine Translation. State-of-the-art phrase-based and hierarchical statistical MT systems are augmented with appropriate morpho-syntactic enhancements, as well as alternative phrase training methods and extended lexicon models. For some tasks, a system combination of the best systems was used to generate a final hypothesis. We participated in the constrained condition of German-English and French-English in each translation direction.
|
110 |
Using Collocation Segmentation to Augment the Phrase Table
Carlos A. Henríquez Q., Marta Ruiz Costa-jussà, Vidas Daudaravicius, Rafael E. Banchs and José B. Mariño
show abstracthide abstractThis paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpora before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets.
|
111 |
The RALI Machine Translation System for WMT 2010
Stéphane Huet, Julien Bourdaillet, Alexandre Patry and Philippe Langlais
show abstracthide abstractWe describe our system for the translation task of WMT 2010. This system, developed for the English-French and French-English directions, is based on Moses and was trained using only the resources supplied for the workshop. We report experiments to enhance it with out-of-domain parallel corpora sub-sampling, N-best list post-processing and a French grammatical checker.
|
112 |
Exodus - Exploring SMT for EU Institutions
Michael Jellinghaus, Alexandros Poulis and David Kolovratník
show abstracthide abstractIn this paper, we describe Exodus, a joint pilot project of the European Commission’s Directorate-General for Translation (DGT) and the European Parliament’s Directorate-General for Translation (DG TRAD) which explores the potential of deploying new approaches to machine translation in European institutions. We have participated in the English-to-French track of this year’s WMT10 shared translation task using a system trained on data previously extracted from large in-house translation memories.
|
113 |
More Linguistic Annotation for Statistical Machine Translation
Philipp Koehn, Barry Haddow, Philip Williams and Hieu Hoang
show abstracthide abstractWe report on efforts to build large-scale translation systems for eight European language pairs. We achieve most gains from the use of larger training corpora and basic modeling, but also show promising results from integrating more linguistic annotation.
|
114 |
LIUM SMT Machine Translation System for WMT 2010
Patrik Lambert, Sadaf Abdul-Rauf and Holger Schwenk
show abstracthide abstractThis paper describes the development of French–English and English–French machine translation systems for the 2010 WMT shared task evaluation. These systems were standard phrase-based statistical systems based on the Moses decoder, trained on the provided data only. Most of our efforts were devoted to the choice and extraction of bilingual data used for training. We filtered out some bilingual corpora and pruned the phrase table. We also investigated the impact of adding two types of additional bilingual texts, extracted automatically from the available monolingual data. We first collected bilingual data by performing automatic translations of monolingual texts. The second type of bilingual text was harvested from comparable corpora with Information Retrieval techniques.
|
115 |
Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin, Boxing Chen, George Foster, Ulrich Germann, Eric Joanis, Howard Johnson and Roland Kuhn
show abstracthide abstractNRC’s Portage system participated in the English-French (E-F) and French-English (F-E) translation tasks of the ACL WMT 2010 evaluation. The most notable improvement over earlier versions of Portage is an efficient implementation of lattice MERT. While Portage has typically performed well in Chinese to English MT evaluations, most recently in the NIST09 evaluation, our participation in WMT 2010 revealed some interesting differences between Chinese-English and E-F/F-E translation, and alerted us to certain weak spots in our system. Most of this paper discusses the problems we found in our system and ways of fixing them. We learned several lessons that we think will be of general interest.
|
116 |
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Ziyuan Wang, Jonathan Weese and Omar Zaidan
show abstracthide abstractWe describe the progress we have made in the past year on Joshua (Li et al., 2009), an open source toolkit for parsing based machine translation. The new functionality includes: support for translation grammars with a rich set of syntactic nonterminals, the ability for external modules to posit constraints on how spans in the input sentence should be translated, lattice parsing for dealing with input uncertainty, a semiring framework that provides a unified way of doing various dynamic programming calculations, variational decoding for approximating the intractable MAP decoding, hypergraph-based discriminative training for better feature engineering, a parallelized MERT module, document-level and tail-based MERT, visualization of the derivation trees, and a cleaner pipeline for MT experiments.
|
117 |
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
Jan Niehues, Teresa Herrmann, Mohammed Mediani and Alex Waibel
show abstracthide abstractThis paper describes our phrase-based Statistical Machine Translation (SMT) system for the WMT10 Translation Task. We submitted translations for the German to English and English to German translation tasks. Compared to state-of-the-art phrase-based systems we performed additional preprocessing and used a discriminative word alignment approach. The word reordering was modeled using POS information and we extended the translation model with additional features.
|
118 |
MATREX: The DCU MT System for WMT 2010
Sergio Penkale, Rejwanul Haque, Sandipan Dandapat, Pratyush Banerjee, Ankit K. Srivastava, Jinhua Du, Pavel Pecina, Sudip Kumar Naskar, Mikel L. Forcada and Andy Way
show abstracthide abstractThis paper describes the DCU machine translation system in the evaluation campaign of the Joint Fifth Workshop on Statistical Machine Translation and Metrics in ACL-2010. We describe the modular design of our multi-engine machine translation (MT) system with particular focus on the components used in this participation. We participated in the English-Spanish and English-Czech translation tasks, in which we employed our multi-engine architecture to translate. We also participated in the system combination task which was carried out by the MBR decoder and confusion network decoder.
|
119 |
The Cunei Machine Translation Platform for WMT ’10
Aaron Phillips
show abstracthide abstractThis paper describes the Cunei Machine Translation Platform and how it was used in the WMT ’10 German to English and Czech to English translation tasks.
|
120 |
The CUED HiFST System for the WMT10 Translation Shared Task
Juan Pino, Gonzalo Iglesias, Adrià de Gispert, Graeme Blackwood, Jamie Brunning and William Byrne
show abstracthide abstractThis paper describes the Cambridge University Engineering Department submission to the Fifth Workshop on Statistical Machine Translation. We report results for the French-English and Spanish-English shared translation tasks in both directions. The CUED system is based on HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers. In the French-English task, we investigate the use of context-dependent alignment models. We also show that lattice minimum Bayes-risk decoding is an effective framework for multi-source translation, leading to large gains in BLEU score.
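For readers unfamiliar with minimum Bayes-risk decoding, the sketch below shows the idea over a plain N-best list, with a crude unigram-overlap gain standing in for BLEU. This is a deliberately simplified picture of the lattice-based MBR used by the CUED system; the gain function, scores and hypotheses are invented for illustration.

```python
import math

def gain(hyp, other):
    """Crude stand-in for sentence-level BLEU: unigram overlap ratio."""
    overlap = len(set(hyp) & set(other))
    return overlap / max(len(set(hyp)), 1)

def mbr_decode(nbest):
    """nbest: list of (tokens, model_log_score); returns the minimum-risk hypothesis."""
    # Turn model scores into a posterior distribution over the N-best list.
    logs = [score for _, score in nbest]
    m = max(logs)
    weights = [math.exp(score - m) for score in logs]
    z = sum(weights)
    posteriors = [w / z for w in weights]
    # Choose the hypothesis with the highest expected gain against all others.
    best_hyp, best_expected = None, float("-inf")
    for hyp, _ in nbest:
        expected = sum(p * gain(hyp, other)
                       for (other, _), p in zip(nbest, posteriors))
        if expected > best_expected:
            best_hyp, best_expected = hyp, expected
    return best_hyp

nbest = [(["the", "plan", "was", "rejected"], -10.2),
         (["the", "plan", "rejected", "was"], -10.5),
         (["a", "plan", "was", "rejected"], -11.0)]
print(mbr_decode(nbest))
```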
|
121 |
The LIG Machine Translation System for WMT 2010
Marion Potet, Laurent Besacier and Hervé Blanchon
show abstracthide abstractThis paper describes the system submitted by the Laboratory of Informatics of Grenoble (LIG) for the fifth Workshop on Statistical Machine Translation. We participated in the news shared translation task for the French-English language pair. We investigated different techniques for dealing simply with Out-Of-Vocabulary words in a statistical phrase-based machine translation system and analyzed their impact on translation quality. The final submission is a combination of a standard phrase-based system using the Moses decoder, with appropriate setups and pre-processing, and a lemmatized one to prevent Out-Of-Vocabulary conjugated verbs.
|
122 |
Linear Inversion Transduction Grammar Alignments as a Second Translation Path
Markus Saers, Joakim Nivre and Dekai Wu
show abstracthide abstractWe explore the possibility of using Stochastic Bracketing Linear Inversion Transduction Grammars for a full-scale German–English translation task, both on their own and in conjunction with alignments induced with GIZA++. The rationale for transduction grammars, the details of the system and some results are presented.
|
123 |
UPV-PRHLT English–Spanish System for WMT10
Germán Sanchis-Trilles, Jesús Andrés-Ferrer, Guillem Gascó, Jesús González Rubio, Pascual Martínez-Gómez, Martha-Alicia Rocha, Joan-Andreu Sánchez and Francisco Casacuberta
show abstracthide abstractIn this paper, the system submitted by the PRHLT group for the Fifth Workshop on Statistical Machine Translation of ACL 2010 is presented. In this evaluation campaign, we worked on the English–Spanish language pair, putting special emphasis on two problems derived from the large amount of data available. The first is how to optimize the use of the monolingual data within the language model, and the second is how to make good use of all the bilingual data provided without consuming unnecessary computational resources.
|
124 |
Reproducible Results in Parsing-Based Machine Translation: The JHU Shared Task Submission
Lane Schwartz
show abstracthide abstractWe present the Johns Hopkins University submission to the 2010 WMT shared translation task. We describe processing steps using open data and open source software used in our submission, and provide the scripts and configurations required to train, tune, and test our machine translation system.
|
125 |
Vs and OOVs: Two Problems for Translation between German and English
Sara Stymne, Maria Holmqvist and Lars Ahrenberg
show abstracthide abstractIn this paper we report on experiments with three preprocessing strategies for improving translation output in a statistical MT system. In training, two reordering strategies were studied: (i) reorder on the basis of the alignments from Giza++, and (ii) reorder by moving all verbs to the end of segments. In translation, out-of-vocabulary words were preprocessed in a knowledge-lite fashion to identify a likely equivalent. All three strategies were implemented for our translation systems between English and German submitted to the WMT10 shared task. Reordering by using Giza++ in two phases had a small, but consistent positive effect on metrics for our systems. Aligning verbs by co-locating them at the end of sentences had a largely negative effect. However, it seems that this strategy produced some useful alignments, since when its output was concatenated with the baseline alignment before extracting the phrase table, there were consistent improvements. Combining reordering in training with the knowledge-lite method for handling out-of-vocabulary words led to significant improvements on Meteor scores for translation between German and English in both directions.
|
126 |
To Cache or not to Cache? Experiments with Adaptive Models in Statistical Machine Translation
Jörg Tiedemann
show abstracthide abstractWe report results of our submissions to the WMT 2010 shared translation task in which we applied a system that includes adaptive language and translation models. Adaptation is implemented using exponentially decaying caches storing previous translations as the history for new predictions. Evidence from the cache is then mixed with the global background model. The main problem in this setup is error propagation and our submissions essentially failed to improve over the competitive baseline. There are slight improvements in lexical choice but the global performance decreases in terms of BLEU scores.
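A minimal sketch of the caching idea described in this abstract, assuming a unigram cache, an invented decay rate and an invented interpolation weight; the actual models in the paper are full adaptive language and translation models, not this toy.

```python
from collections import defaultdict

class DecayingCacheLM:
    """Toy unigram cache mixed with a static background model.

    Illustrative only: the decay rate, mixture weight and unigram
    granularity are assumptions, not the setup used in the paper.
    """

    def __init__(self, background, decay=0.9, cache_weight=0.1):
        self.background = background          # dict: word -> P_bg(word)
        self.decay = decay                    # exponential decay per update
        self.cache_weight = cache_weight      # mixture weight for the cache
        self.cache = defaultdict(float)

    def update(self, translated_words):
        """Decay old evidence, then add the newly produced translation."""
        for w in self.cache:
            self.cache[w] *= self.decay
        for w in translated_words:
            self.cache[w] += 1.0

    def prob(self, word):
        """Mix cache and background estimates: (1 - l) * P_bg + l * P_cache."""
        total = sum(self.cache.values())
        p_cache = self.cache[word] / total if total > 0 else 0.0
        p_bg = self.background.get(word, 1e-6)
        return (1 - self.cache_weight) * p_bg + self.cache_weight * p_cache

# Usage: score words while feeding back each translated sentence as history.
lm = DecayingCacheLM({"house": 0.01, "bank": 0.02})
lm.update(["the", "bank", "was", "closed"])
print(lm.prob("bank"))
```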
|
127 |
Applying Morphological Decompositions to Statistical Machine Translation
Sami Virpioja, Jaakko Väyrynen, Andre Mansikkaniemi and Mikko Kurimo
show abstracthide abstractThis paper describes the Aalto submission for the German-to-English and the Czech-to-English translation tasks of the ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Statistical machine translation has focused on using words, and longer phrases constructed from words, as tokens in the system. In contrast, we apply different morphological decompositions of words using the unsupervised Morfessor algorithms. While translation models trained using the morphological decompositions did not improve the BLEU scores, we show that the Minimum Bayes Risk combination with a word-based translation model produces significant improvements for the German-to-English translation. However, we did not see improvements for the Czech-to-English translations.
|
128 |
Maximum Entropy Translation Model in Dependency-Based MT Framework
Zdeněk Žabokrtský, Martin Popel and David Mareček
show abstracthide abstractMaximum Entropy Principle has been used successfully in various NLP tasks. In this paper we propose a forward translation model consisting of a set of maximum entropy classifiers: a separate classifier is trained for each (sufficiently frequent) source-side lemma. In this way the estimates of translation probabilities can be sensitive to a large number of features derived from the source sentence (including non-local features, features making use of sentence syntactic structure, etc.). When integrated into English-to-Czech dependency-based translation scenario implemented in the TectoMT framework, the new translation model significantly outperforms the baseline model (MLE) in terms of BLEU. The performance is further boosted in a configuration inspired by Hidden Tree Markov Models which combines the maximum entropy translation model with the target-language dependency tree model.
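As a rough illustration of the "one classifier per source lemma" design, the sketch below trains a separate discriminative classifier for each source lemma on context features. Scikit-learn's logistic regression is used here as a stand-in maximum entropy learner, and the feature names, toy examples and Czech targets are assumptions made purely for the example.

```python
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training examples: (source lemma, context features, target lemma).
examples = [
    ("bank", {"parent_lemma": "sit", "near_river": True}, "břeh"),
    ("bank", {"parent_lemma": "rob", "near_river": False}, "banka"),
    ("bank", {"parent_lemma": "enter", "near_river": False}, "banka"),
]

# Group data by source lemma and train a separate classifier for each lemma.
by_lemma = defaultdict(list)
for lemma, feats, target in examples:
    by_lemma[lemma].append((feats, target))

models = {}
for lemma, data in by_lemma.items():
    feats, targets = zip(*data)
    vec = DictVectorizer()
    X = vec.fit_transform(feats)
    clf = LogisticRegression(max_iter=1000).fit(X, targets)
    models[lemma] = (vec, clf)

# Translation probabilities conditioned on rich source-side context features.
vec, clf = models["bank"]
probs = clf.predict_proba(
    vec.transform([{"parent_lemma": "sit", "near_river": True}]))
print(dict(zip(clf.classes_, probs[0])))  # should lean toward "břeh"
```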
|
129 |
UCH-UPV English–Spanish system for WMT10
Francisco Zamora-Martinez and Germán Sanchis-Trilles
show abstracthide abstractThis paper describes the system developed in collaboration between UCH and UPV for the 2010 WMT. For this year’s workshop, we present a system for English-Spanish translation. Output N-best lists were rescored via a target Neural Network Language Model, yielding improvements in the final translation quality as measured by BLEU and TER.
|
130 |
Hierarchical Phrase-Based MT at the Charles University for the WMT 2010 Shared Task
Daniel Zeman
show abstracthide abstractWe describe our experiments with hierarchical phrase-based machine translation for the WMT 2010 Shared Task. We provide a detailed description of our configuration and data so that the results are replicable. For English-to-Czech translation, we experiment with several datasets of various sizes and with various preprocessing sequences. For the other 7 translation directions, we present only the baseline results.
|
|
12:30–14:00
|
Lunch
|
14:00–15:00
|
|
15:05–15:30
|
Full Paper Session 2
15:05–15:30 |
Incremental Decoding for Phrase-based Statistical Machine Translation
Baskaran Sankaran, Ajeet Grewal and Anoop Sarkar
show abstracthide abstractIn this paper we focus on incremental decoding for a statistical machine translation system. In incremental decoding, translations are generated incrementally for every word typed by a user, instead of waiting for the entire sentence as input. We propose a novel modification to the beam-search decoder to address this issue in a phrase-based setting, aimed at efficient computation of future costs and avoiding search errors. Our objective is faster translation during incremental decoding without a significant reduction in translation quality as measured by BLEU.
|
|
15:30–16:00
|
Afternoon Break
|
16:00–17:40
|
Full Paper Session 3
16:00–16:25 |
How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing
Fabienne Fritzinger and Alexander Fraser
show abstracthide abstractCompound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting more often produce a correct splitting, but corpus-based approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-based statistics and obtaining better results in terms of both producing splittings according to a gold standard and statistical machine translation performance.
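For background on the corpus-based side of the comparison, the sketch below implements the common frequency-driven splitting baseline (in the spirit of Koehn and Knight, 2003): choose the split whose parts have the highest geometric mean of corpus frequencies. It is a generic baseline, not the combined linguistic/statistical method the paper proposes, and the toy frequencies are invented.

```python
from itertools import combinations

def candidate_splits(word, min_part=3):
    """All ways to cut `word` into parts of at least `min_part` characters."""
    yield (word,)  # the unsplit word is always a candidate
    positions = range(min_part, len(word) - min_part + 1)
    for k in range(1, 3):  # allow up to two cut points for simplicity
        for cuts in combinations(positions, k):
            parts, prev = [], 0
            for c in cuts:
                parts.append(word[prev:c])
                prev = c
            parts.append(word[prev:])
            if all(len(p) >= min_part for p in parts):
                yield tuple(parts)

def best_split(word, freq):
    """Pick the split maximizing the geometric mean of part frequencies."""
    def score(parts):
        prod = 1.0
        for p in parts:
            prod *= freq.get(p.lower(), 0)
        return prod ** (1.0 / len(parts))
    return max(candidate_splits(word), key=score)

# Toy corpus counts; real systems use counts from large monolingual data.
freq = {"aktion": 1200, "aktions": 400, "plan": 5000, "aktionsplan": 30}
print(best_split("Aktionsplan", freq))  # -> ('Aktions', 'plan')
```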
|
16:25–16:50 |
Chunk-based Verb Reordering in VSO Sentences for Arabic-English Statistical Machine Translation
Arianna Bisazza and Marcello Federico
show abstracthide abstractIn Arabic-to-English phrase-based statistical machine translation, a large number of syntactic disfluencies are due to wrong long-range reordering of the verb in VSO sentences, where the verb is anticipated with respect to the English word order. In this paper, we propose a chunk-based reordering technique to automatically detect and displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is applied to preprocess the training data, and to collect statistics about verb movements. From this analysis, specific verb reordering lattices are then built on the test sentences before decoding them. The application of our reordering methods on the training and test sets results in consistent BLEU score improvements on the NIST-MT 2009 Arabic-English benchmark.
|
16:50–17:15 |
Head Finalization: A Simple Reordering Rule for SOV Languages
Hideki Isozaki, Katsuhito Sudoh, Hajime Tsukada and Kevin Duh
show abstracthide abstractEnglish is a typical SVO (Subject-Verb-Object) language, while Japanese is a typical SOV language. Conventional Statistical Machine Translation (SMT) systems work well within each of these language families. However, SMT-based translation from an SVO language to an SOV language does not work well because their word orders are completely different. Recently, a few groups have proposed rule-based preprocessing methods to mitigate this problem (Xu et al., 2009; Hong et al., 2009). These methods rewrite SVO sentences to derive more SOV-like sentences by using a set of handcrafted rules. In this paper, we propose an alternative single reordering rule: Head Finalization. This is a syntax-based preprocessing approach that offers the advantage of simplicity. We do not have to be concerned about part-of-speech tags or rule weights because the powerful Enju parser allows us to implement the rule at a general level. Our experiments show that its result, Head Final English (HFE), follows almost the same order as Japanese. We also show that this rule improves automatic evaluation scores.
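To make the single reordering rule concrete, here is a toy rendering of head finalization: every head is emitted after all of its dependents. The dictionary-based tree format and the example sentence are assumptions for illustration; the paper applies the rule to Enju (HPSG) parses and includes refinements not shown here.

```python
def head_finalize(tree):
    """Recursively order a node's dependents before the node itself.

    `tree` is a dict {"word": str, "deps": [subtrees]} -- a toy format,
    not the Enju representation used in the paper.
    """
    words = []
    for dep in tree.get("deps", []):
        words.extend(head_finalize(dep))
    words.append(tree["word"])
    return words

# "John hit the ball": 'hit' heads 'John' and 'ball', 'the' depends on 'ball'.
sent = {"word": "hit",
        "deps": [{"word": "John", "deps": []},
                 {"word": "ball", "deps": [{"word": "the", "deps": []}]}]}
print(" ".join(head_finalize(sent)))  # -> "John the ball hit" (SOV-like order)
```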
|
17:15–17:40 |
Aiding Pronoun Translation with Co-Reference Resolution
Ronan Le Nagard and Philipp Koehn
show abstracthide abstractWe propose a method to improve the translation of pronouns by resolving their co-reference to prior mentions. We report results using two different co-reference resolution methods and point to remaining challenges.
|
|
Friday, July 16, 2010 |
9:00–11:00
|
Shared Task Presentations
9:00–10:00 |
|
10:00–10:30 |
|
10:30–10:45 |
|
10:45–11:00 |
|
|
11:00–12:30
|
Poster Sessions
|
101 |
Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models
David Vilar, Daniel Stein, Matthias Huck and Hermann Ney
show abstracthide abstractWe present Jane, RWTH’s hierarchical phrase-based translation system, which has been open sourced for the scientific community. This system has been in development at RWTH for the last two years and has been successfully applied in different machine translation evaluations. It includes extensions to the hierarchical approach developed by RWTH as well as other research institutions. In this paper we give an overview of its main features. We also introduce a novel reordering model for the hierarchical phrase-based approach which further enhances translation performance, and analyze the effect some recent extended lexicon models have on the performance of the system.
|
|
102 |
MANY: Open Source MT System Combination at WMT’10
Loïc Barrault
show abstracthide abstractLIUM participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). Hypotheses from 5 French/English MT systems were combined with MANY, an open source system combination software based on confusion networks currently developed at LIUM. The system combination yielded significant improvements in BLEU score when applied on WMT’09 data. The same behavior has been observed when tuning is performed on the development data of this year’s evaluation.
|
103 |
Adaptive Model Weighting and Transductive Regression for Predicting Best System Combinations
Ergun Bicici and S. Serdar Kozat
show abstracthide abstractWe analyze adaptive model weighting techniques for reranking using instance scores obtained by L1 regularized transductive regression. Competitive statistical machine translation is an on-line learning technique for sequential translation tasks where we try to select the best among competing statistical machine translators. The competitive predictor assigns a probability per model weighted by the sequential performance. We define additive, multiplicative, and loss-based weight updates with exponential loss functions for competitive statistical machine translation. Without any prior knowledge of the performance of the translation models, we succeed in achieving the performance of the best model in all systems and surpass their performance in most of the language pairs we considered.
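A minimal sketch of a multiplicative weight update with an exponential loss, which is the general flavour of update described above; the learning rate, loss values and normalization are invented for the example and do not reproduce the paper's exact update rules.

```python
import math

def update_weights(weights, losses, eta=0.5):
    """Multiplicative update with exponential loss: downweight models that did badly."""
    new = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

# Three competing MT systems; a loss could be e.g. 1 - sentence-level BLEU.
weights = [1 / 3, 1 / 3, 1 / 3]
for losses in [[0.4, 0.7, 0.6], [0.3, 0.8, 0.5], [0.2, 0.9, 0.6]]:
    weights = update_weights(weights, losses)
print(weights)  # probability mass shifts toward the consistently best system
```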
|
104 |
L1 Regularized Regression for Reranking and System Combination in Machine Translation
Ergun Bicici and Deniz Yuret
show abstracthide abstractWe use L1 regularized transductive regression to learn mappings between source and target features of the training sets derived for each test sentence and use these mappings to rerank translation outputs. We compare the effectiveness of L1 regularization techniques for regression to learn mappings between features given in a sparse feature matrix. The results show the effectiveness of using L1 regularization versus the L2 regularization used in ridge regression. We show that regression mapping is effective in reranking translation outputs and in selecting the best system combinations, with encouraging results on different language pairs.
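As a simplified illustration of reranking with L1-regularized regression, the sketch below fits scikit-learn's Lasso to predict a quality score from hypothesis features and picks the highest-scoring hypothesis. The feature vectors, targets and the use of Lasso as the learner are assumptions; the transductive, per-test-sentence training described in the abstract is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: each row is a feature vector for one hypothesis, each y value a
# quality score (e.g. sentence-level BLEU) observed on held-out data.
X_train = np.array([[0.2, 3.0, 1.0],
                    [0.5, 2.0, 0.0],
                    [0.9, 1.0, 1.0],
                    [0.4, 4.0, 0.0]])
y_train = np.array([0.21, 0.35, 0.62, 0.18])

model = Lasso(alpha=0.01).fit(X_train, y_train)  # L1 penalty -> sparse weights

# Rerank an N-best list: pick the hypothesis with the highest predicted score.
nbest_feats = np.array([[0.3, 2.5, 1.0],
                        [0.8, 1.2, 0.0],
                        [0.6, 1.0, 1.0]])
scores = model.predict(nbest_feats)
print(int(np.argmax(scores)), scores)
```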
|
105 |
An Augmented Three-Pass System Combination Framework: DCU Combination System for WMT 2010
Jinhua Du, Pavel Pecina and Andy Way
show abstracthide abstractThis paper describes the augmented three-pass system combination framework of the Dublin City University (DCU) MT group for the WMT 2010 system combination task. The basic three-pass framework includes building individual confusion networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR) decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring component which is used to re-rank the N-best lists generated from the individual CNs and the super network, 2) a new hypothesis alignment metric – TERp – that is used to carry out English-targeted hypothesis alignment, and 3) additional backbone-based CNs which are employed to increase the diversity of the mConMBR decoding phase. We took part in the combination tasks of English-to-Czech and French-to-English. Experimental results show that our proposed combination framework achieved improvements of 2.17 absolute BLEU points (13.36 relative) on the English-to-Czech task and 1.52 absolute BLEU points (5.37 relative) on the French-to-English task over the best single system. We also achieved better performance in the human evaluation.
|
106 |
The UPV-PRHLT Combination System for WMT 2010
Jesús González Rubio, Germán Sanchis-Trilles, Joan-Andreu Sánchez, Jesús Andrés-Ferrer, Guillem Gascó, Pascual Martínez-Gómez, Martha-Alicia Rocha and Francisco Casacuberta
show abstracthide abstractUPV-PRHLT participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). On each translation direction, all the submitted systems were combined into a consensus translation. These consensus translations always improve translation quality of the best individual system.
|
107 |
CMU Multi-Engine Machine Translation for WMT 2010
Kenneth Heafield and Alon Lavie
show abstracthide abstractThis paper describes our submission, cmu-heafield-combo, to the WMT 2010 machine translation system combination task. Using constrained resources, we participated in all nine language pairs, namely translating English to and from Czech, French, German, and Spanish as well as combining English translations from multiple languages. Combination proceeds by aligning all pairs of system outputs then navigating the aligned outputs from left to right where each path is a candidate combination. Candidate combinations are scored by their length, agreement with the underlying systems, and a language model. On tuning data, improvement in BLEU over the best system depends on the language pair and ranges from 0.89% to 5.57% with mean 2.37%.
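The scoring of candidate combinations described above (length, agreement with the underlying systems, and a language model) can be pictured as a weighted sum of those components. The sketch below is a schematic scorer with invented weights and a stub language model, not the cmu-heafield-combo implementation.

```python
def agreement(candidate_tokens, system_outputs):
    """Average fraction of candidate tokens also present in each system output."""
    scores = []
    for out in system_outputs:
        bag = set(out)
        hits = sum(tok in bag for tok in candidate_tokens)
        scores.append(hits / max(len(candidate_tokens), 1))
    return sum(scores) / len(scores)

def lm_logprob(tokens):
    """Stub language model: replace with a real n-gram LM in practice."""
    return -2.0 * len(tokens)

def combo_score(candidate_tokens, system_outputs,
                w_len=-0.5, w_agree=3.0, w_lm=1.0):
    """Weighted sum of length penalty, system agreement and LM score (toy weights)."""
    return (w_len * len(candidate_tokens)
            + w_agree * agreement(candidate_tokens, system_outputs)
            + w_lm * lm_logprob(candidate_tokens))

systems = [["the", "vote", "was", "postponed"],
           ["the", "vote", "has", "been", "delayed"]]
candidate = ["the", "vote", "was", "delayed"]
print(combo_score(candidate, systems))
```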
|
108 |
CMU System Combination via Hypothesis Selection for WMT’10
Almut Silja Hildebrand and Stephan Vogel
show abstracthide abstractThis paper describes the CMU entry for the system combination shared task at WMT’10. Our combination method is hypothesis selection, which uses information from n-best lists from the input MT systems, where available. The sentence level features used are independent of the MT systems involved. Compared to the baseline we added source-to-target word alignment based features and trained system weights to our feature set. We combined MT systems for French-English and German-English using provided data only.
|
109 |
JHU System Combination Scheme for WMT 2010
Sushant Narsale
show abstracthide abstractThis paper describes the JHU system combination scheme that was used in the WMT 2010 submission.
|
110 |
The RWTH System Combination System for WMT 2010
Gregor Leusch and Hermann Ney
show abstracthide abstractRWTH participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). For 7 of the 8 language pairs, we combine 5 to 13 systems into a single consensus translation, using additional n-best reranking techniques in two of these language pairs. Depending on the language pair, improvements over the best single system range between +0.5 and +1.7 on BLEU, and between -0.4 and -2.3 on TER. Novel techniques compared with RWTH’s submission to WMT 2009 include the utilization of n-best reranking techniques, a consensus true-casing approach, a different tuning algorithm, and the separate selection of input systems for CN construction, primary/skeleton hypotheses, HypLM, and true casing.
|
111 |
BBN System Description for WMT10 System Combination Task
Antti-Veikko Rosti, Bing Zhang, Spyros Matsoukas and Richard Schwartz
show abstracthide abstractBBN submitted system combination outputs for Czech-English, German-English, Spanish-English, French-English, and All-English language pairs. All combinations were based on confusion network decoding. An incremental hypothesis alignment algorithm with flexible matching was used to build the networks. The bi-gram decoding weights for the single source language translations were tuned directly to maximize the BLEU score of the decoding output. Approximate expected BLEU was used as the objective function in gradient based optimization of the combination weights for a 44-system multi-source language combination (All-English). The system combination gained around 0.4-2.0 BLEU points over the best individual systems on the single source conditions. On the multi-source condition, the system combination gained 6.6 BLEU points.
|
|
112 |
LRscore for Evaluating Lexical and Reordering Quality in MT
Alexandra Birch and Miles Osborne
show abstracthide abstractThe ability to measure the quality of word order in translations is an important goal for research in machine translation. Current machine translation metrics do not adequately measure the reordering performance of translation systems. We present a novel metric, the LRscore, which directly measures reordering success. The reordering component is balanced by a lexical metric. Capturing the two most important elements of translation success in a simple combined metric with only one parameter results in an intuitive, shallow, language independent metric.
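A minimal sketch of a single-parameter interpolation between a reordering score and a lexical score, in the spirit of the metric described above; the Kendall-tau-style reordering component, the unigram-precision stand-in for the lexical component, and the parameter value are all assumptions for illustration.

```python
def reordering_score(target_positions):
    """Share of concordant pairs in a permutation (1.0 = fully monotone)."""
    n = len(target_positions)
    if n < 2:
        return 1.0
    concordant = sum(1
                     for i in range(n)
                     for j in range(i + 1, n)
                     if target_positions[i] < target_positions[j])
    return concordant / (n * (n - 1) / 2)

def unigram_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision as a crude lexical component."""
    remaining = list(ref_tokens)
    hits = 0
    for tok in hyp_tokens:
        if tok in remaining:
            remaining.remove(tok)
            hits += 1
    return hits / max(len(hyp_tokens), 1)

def lr_style_score(reorder, lexical, alpha=0.5):
    """Single-parameter linear interpolation of reordering and lexical quality."""
    return alpha * reorder + (1 - alpha) * lexical

print(lr_style_score(reordering_score([0, 2, 1, 3]),
                     unigram_precision(["the", "vote", "was", "delayed"],
                                       ["the", "vote", "was", "postponed"])))
```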
|
113 |
Document-level Automatic MT Evaluation based on Discourse Representations
Elisabet Comelles, Jesus Gimenez, Lluis Marquez, Irene Castellon and Victoria Arranz
show abstracthide abstractThis paper describes the joint submission of Universitat Politècnica de Catalunya and Universitat de Barcelona to the Metrics MaTr 2010 evaluation challenge, in collaboration with ELDA/ELRA. Our work is aimed at widening the scope of current automatic evaluation measures from sentence to document level. Preliminary experiments, based on an extension of the metrics by Gimenez and Marquez (2009) operating over discourse representations, are presented.
|
114 |
METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support for Five Target Languages
Michael Denkowski and Alon Lavie
show abstracthide abstractThis paper describes our submission to the WMT10 Shared Evaluation Task and MetricsMATR10. We present a version of the METEOR-NEXT metric with paraphrase tables for five target languages. We describe the creation of these paraphrase tables and conduct a tuning experiment that demonstrates consistent improvement across all languages over baseline versions of the metric without paraphrase resources.
|
115 |
Normalized Compression Distance Based Measures for MetricsMATR 2010
Marcus Dobrinkat, Tero Tapiovaara, Jaakko Väyrynen and Kimmo Kettunen
show abstracthide abstractWe present the MT-NCD and MT-mNCD machine translation evaluation metrics as a submission to the machine translation evaluation shared task (MetricsMATR 2010). The metrics are based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and are evaluated against human judgments from the WMT08 shared task. The experiments show that 1) our metric improves correlation to human judgments by using flexible matching, 2) segment replication is effective, and 3) our NCD-inspired method for multiple references indicates improved results. Generally, the proposed MT-NCD and MT-mNCD methods correlate with human judgments competitively compared to commonly used machine translation evaluation metrics, for instance, BLEU.
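Normalized compression distance has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The sketch below computes it with zlib as the compressor; it illustrates generic NCD only, not the full MT-NCD/MT-mNCD scoring pipeline with flexible matching and segment replication.

```python
import zlib

def c(s: str) -> int:
    """Compressed length in bytes, a practical proxy for Kolmogorov complexity."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: smaller means more similar strings."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# On realistic segment lengths, a hypothesis closer to the reference tends to
# receive a lower distance; very short strings are noisy due to compressor overhead.
ref = "the committee approved the proposal yesterday"
hyp_good = "the committee approved the proposal on yesterday"
hyp_bad = "proposal committee the approve"
print(ncd(ref, hyp_good), ncd(ref, hyp_bad))
```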
|
116 |
The DCU Dependency-Based Metric in WMT-MetricsMATR 2010
Yifan He, Jinhua Du, Andy Way and Josef van Genabith
show abstracthide abstractWe describe DCU’s LFG dependency-based metric submitted to the shared evaluation task of WMT-MetricsMATR 2010. The metric is built on the LFG F-structure-based approach presented in (Owczarzak et al., 2007). We explore the following improvements on the original metric: 1) we replace the in-house LFG parser with an open source dependency parser that directly parses strings into LFG dependencies; 2) we add a stemming module and unigram paraphrases to strengthen the aligner; 3) we introduce a chunk penalty following the practice of Meteor to reward continuous matches; and 4) we introduce and tune parameters to maximize the correlation with human judgement. Experiments show that these enhancements improve the dependency-based metric’s correlation with human judgement.
|
117 |
TESLA: Translation Evaluation of Sentences with Linear-programming-based Analysis
Chang Liu, Daniel Dahlmeier and Hwee Tou Ng
show abstracthide abstractWe present TESLA-M and TESLA, two novel automatic machine translation evaluation metrics with state-of-the-art performances. TESLA-M builds on the success of METEOR and MaxSim, but employs a more expressive linear programming framework. TESLA further exploits parallel texts to build a shallow semantic representation. We evaluate both on the WMT 2009 shared evaluation task and show that they outperform all participating systems in most tasks.
|
118 |
The Parameter-optimized ATEC Metric for MT Evaluation
Billy Wong and Chunyu Kit
show abstracthide abstractThis paper describes the latest version of the ATEC metric for automatic MT evaluation, with parameters optimized for word choice and word order, the two fundamental features of language that the metric relies on. The former is assessed by matching at various linguistic levels and weighting the informativeness of both matched and unmatched words. The latter is quantified in terms of word position and information flow. We also discuss those aspects of language not yet covered by other existing evaluation metrics but carefully considered in the formulation of our metric.
|
|
12:30–14:00
|
Lunch
|
14:00–15:40
|
Full Paper Session 4
14:00–14:25 |
A Unified Approach to Minimum Risk Training and Decoding
Abhishek Arun, Barry Haddow and Philipp Koehn
show abstracthide abstractWe present a unified approach to performing minimum risk training and minimum Bayes risk (MBR) decoding with BLEU in a phrase-based model. Key to our approach is the use of a Gibbs sampler that allows us to explore the entire probability distribution and maintain a strict probabilistic formulation across the pipeline. We also describe a new sampling algorithm called corpus sampling which allows us at training time to use BLEU instead of an approximation thereof. Our approach is theoretically sound and gives better (up to +0.6% BLEU) and more stable results than the standard MERT optimization algorithm. By comparing our approach to lattice MBR, we are also able to gain crucial insights about both methods.
|
14:25–14:50 |
N-best Reranking by Multitask Learning
Kevin Duh, Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki and Masaaki Nagata
show abstracthide abstractWe propose a new framework for N-best reranking on sparse feature sets. The idea is to reformulate the reranking problem as a Multitask Learning problem, where each N-best list corresponds to a distinct task. This is motivated by the observation that N-best lists often show significant differences in feature distributions. Training a single reranker directly on this heterogeneous data can be difficult. Our proposed meta-algorithm solves this challenge by using multitask learning (such as l1/l2 regularization) to discover common feature representations across N-best lists. This meta-algorithm is simple to implement, and its modular approach allows one to plug in different learning algorithms from the existing literature. As a proof of concept, we show statistically significant improvements on a machine translation system involving millions of features.
|
14:50–15:15 |
Taming Structured Perceptrons on Wild Feature Vectors
Ralf Brown
show abstracthide abstractStructured perceptrons are attractive due to their simplicity and speed, and have been used successfully for tuning the weights of binary features in a machine translation system. When we attempted to apply them to tuning the weights of real-valued features with highly skewed distributions, we found that they did not work well. This paper describes a modification to the update step and compares the performance of the resulting algorithm to standard minimum error-rate training. In addition, preliminary results for combining MERT or structured-perceptron tuning of the log-linear feature weights with coordinate ascent of other translation system parameters are presented.
|
15:15–15:40 |
Translation Model Adaptation by Resampling
Kashif Shah, Loïc Barrault and Holger Schwenk
show abstracthide abstractThe translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model, for which well known techniques are used to weight the various sources of texts. At a smaller granularity, the automatically calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed.
|
|
15:40–16:00
|
Afternoon Break
|
16:00–17:40
|
Full Paper Session 5
16:00–16:25 |
Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
Michael Paul, Andrew Finch and Eiichiro Sumita
show abstracthide abstractThis paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair where the source language is unsegmented and the target language segmentation is known. First, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available state-of-the-art monolingually-built segmentation tools.
|
16:25–16:50 |
Improved Translation with Source Syntax Labels
Hieu Hoang and Philipp Koehn
show abstracthide abstractWe present a new translation model that includes undecorated hierarchical-style phrase rules, decorated source-syntax rules, and partially decorated rules. Results show an increase in translation performance of up to 0.8% BLEU for German-English translation when trained on the news-commentary corpus, using syntactic annotation from a source language parser. We also experimented with annotation from shallow taggers and found this increased performance by 0.5% BLEU.
|
16:50–17:15 |
Divide and Translate: Improving Long Distance Reordering in Statistical Machine Translation
Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Tsutomu Hirao and Masaaki Nagata
show abstracthide abstractThis paper proposes a novel method for long distance, clause-level reordering in statistical machine translation (SMT). The proposed method separately translates clauses in the source sentence and reconstructs the target sentence using the clause translations with non-terminals. The non-terminals are placeholders of embedded clauses, by which we reduce complicated clause-level reordering into simple word-level reordering. Its translation model is trained using a bilingual corpus with clause-level alignment, which can be automatically annotated by our alignment algorithm with a syntactic parser in the source language. We achieved significant improvements of 1.4% in BLEU and 1.3% in TER by using Moses, and 2.2% in BLEU and 3.5% in TER by using our hierarchical phrase-based SMT, for the English-to-Japanese translation of research paper abstracts in the medical domain.
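A toy sketch of the divide-and-translate idea: embedded clauses are replaced by non-terminal placeholders, translated separately, and substituted back into the translated main clause. The span format, placeholder naming and the dummy "translator" are assumptions for illustration, not the clause alignment or SMT components used in the paper.

```python
def divide_translate_reassemble(sentence, clause_spans, translate):
    """Translate embedded clauses separately, then the main clause with placeholders.

    `clause_spans` are (start, end) character offsets of embedded clauses and
    `translate` is any sentence-level MT function; both are simplifications of
    the clause segmentation and SMT systems described in the paper.
    """
    placeholders, skeleton, prev = {}, [], 0
    for i, (start, end) in enumerate(sorted(clause_spans)):
        label = f"__S{i}__"
        placeholders[label] = translate(sentence[start:end])
        skeleton.append(sentence[prev:start] + label)
        prev = end
    skeleton.append(sentence[prev:])
    main_translation = translate("".join(skeleton))
    # Substitute translated clauses back in place of the non-terminals.
    for label, text in placeholders.items():
        main_translation = main_translation.replace(label, text)
    return main_translation

# Toy "translator" that just upper-cases, to show the mechanics only.
print(divide_translate_reassemble(
    "the bill, which the senate amended, passed easily",
    [(10, 34)], lambda s: s.upper()))
```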
|
17:15–17:40 |
Decision Trees for Lexical Smoothing in Statistical Machine Translation
Rabih Zbib, Spyros Matsoukas, Richard Schwartz and John Makhoul
show abstracthide abstractWe present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-qualified source words, and smoothing their word translation lexical probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute-qualified source word clusters directly, or by using attribute-dependent lexical probabilities that are obtained from the trees, as a lexical smoothing feature in the decoder model. We present experiments using Arabic-to-English newswire data, and using Arabic diacritics and part-of-speech as source word attributes, and show that the proposed method improves on a state-of-the-art translation system.
|
|
July 15, 2010 |
09:00–10:40
|
Task description papers
09:00–09:20 |
SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
Marta Recasens, Lluís Màrquez, Emili Sapena, M. Antònia Martí, Mariona Taulé, Véronique Hoste, Massimo Poesio and Yannick Versley
show abstracthide abstractThis paper presents the SemEval-2010 task on "Coreference Resolution in Multiple Languages." The goal was to evaluate and compare automatic coreference resolution systems for six different languages (Catalan, Dutch, English, German, Italian, and Spanish) in four evaluation settings and using four different metrics. Such a rich scenario had the potential to provide insight into key issues concerning coreference resolution: (i) the portability of systems across languages, (ii) the relevance of different levels of linguistic information, and (iii) the behavior of scoring metrics.
|
09:20–09:40 |
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Rada Mihalcea, Ravi Sinha and Diana McCarthy
show abstracthide abstractIn this paper we describe the SemEval-2010 Cross-Lingual Lexical Substitution task, where given an English target word in context, participating systems had to find an alternative substitute word or phrase in Spanish. The task is based on the English Lexical Substitution task run at SemEval-2007. In this paper we provide background and motivation for the task, we describe the data annotation process and the scoring system, and present the results of the participating systems.
|
09:40–10:00 |
SemEval-2010 Task 3: Cross-Lingual Word Sense Disambiguation
Els Lefever and Véronique Hoste
show abstracthide abstractThe goal of this task is to evaluate the feasibility of multilingual WSD on a newly developed multilingual lexical sample data set. Participants were asked to automatically determine the contextually appropriate translation of a given English noun in five languages, viz. Dutch, German, Italian, Spanish and French. This paper reports on the sixteen submissions from the five different participating teams.
|
10:00–10:20 |
SemEval-2010 Task 5 : Automatic Keyphrase Extraction from Scientific Articles
Su Nam Kim, Olena Medelyan, Min-Yen Kan and Timothy Baldwin
show abstracthide abstractThis paper describes Task 5 of the Workshop on Semantic Evaluation 2010 (SemEval-2010). Systems are to automatically assign keyphrases or keywords to given scientific articles. The participating systems were evaluated by matching their extracted keyphrases against manually assigned ones. We present the overall ranking of the submitted systems and discuss our findings to suggest future directions for this task.
|
10:20–10:40 |
SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky, Anna Rumshisky, Alex Plotnick, Elisabetta Jezek, Olga Batiukova and Valeria Quochi
show abstracthide abstractWe describe the argument selection and coercion task for the SemEval-2010 evaluation exercise. This task involves characterizing the type of compositional operation that exists between a predicate and the arguments it selects. Specifically, the goal is to identify whether the type that a verb selects is satisfied directly by the argument, or whether the argument must change type to satisfy the verb typing. We discuss the problem in detail, describe the data preparation for the task, and analyze the results of the submissions.
|
|
10:40–11:00
|
Coffee/Tea Break
|
11:00–12:40
|
Task description papers
11:00–11:20 |
SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Pado, Marco Pennacchiotti, Lorenza Romano and Stan Szpakowicz
show abstracthide abstractSemEval-2 Task 8 focuses on Multi-way classification of semantic relations between pairs of nominals. The task was designed to compare different approaches to semantic relation classification and to provide a standard testbed for future research. This paper defines the task, describes the training and test data and the process of their creation, lists the participating systems (10 teams, 28 runs), and discusses their results.
|
11:20–11:40 |
SemEval-2 Task 9: The Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions
Cristina Butnariu, Su Nam Kim, Preslav Nakov, Diarmuid Ó Séaghdha, Stan Szpakowicz and Tony Veale
show abstracthide abstractPrevious research has shown that the meaning of many noun-noun compounds "N1 N2" can be approximated reasonably well by paraphrasing clauses of the form "N2 that ... N1", where "..." stands for a verb with or without a preposition. For example, "malaria mosquito" is a "mosquito that carries malaria". Evaluating the quality of such paraphrases is the theme of Task 9 at SemEval-2. This paper describes some background, the task definition, the process of data collection and the task results. We also venture a few general conclusions before the participating teams present their systems at the SemEval-2 workshop. There were 5 teams who submitted 7 systems.
|
11:40–12:00 |
SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer, Caroline Sporleder, Roser Morante, Collin Baker and Martha Palmer
show abstracthide abstractWe describe the SemEval-2010 shared task on “Linking Events and Their Participants in Discourse”. This task is an extension to the classical semantic role labeling task. While semantic role labeling is traditionally viewed as a sentence-internal task, it is clear that local semantic argument structures also interact with each other in a larger context, e.g., by sharing references to specific discourse entities or events. In the shared task we looked at one particular aspect of cross-sentence links between argument structures, namely linking locally uninstantiated roles to their co-referents in the wider discourse context (if such co-referents exist). This task is potentially beneficial for a number of NLP applications, such as information extraction, question answering or text summarization.
|
12:00–12:20 |
SemEval-2010 Task 12: Parser Evaluation using Textual Entailments
Deniz Yuret, Aydin Han and Zehra Turgut
show abstracthide abstractParser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments based on syntactic information alone. PETE introduces a new parser evaluation scheme that is formalism independent, less prone to annotation error, and focused on semantically relevant distinctions.
|
12:20–12:40 |
SemEval-2010 Task 13: TempEval-2
Marc Verhagen, Roser Sauri, Tommaso Caselli and James Pustejovsky
show abstracthide abstractTempEval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split into four subtasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier. Manually annotated data were provided for six languages: Chinese, English, French, Italian, Korean and Spanish.
|
|
12:40–14:00
|
Lunch
|
14:00–15:20
|
Task description papers
14:00–14:20 |
SemEval-2010 Task 14: Word Sense Induction & Disambiguation
Suresh Manandhar, Ioannis Klapaftis, Dmitriy Dligach and Sameer Pradhan
show abstracthide abstractThis paper presents the description and evaluation framework of SemEval-2010 Word Sense Induction & Disambiguation task, as well as the evaluation results of 26 participating systems. In this task, participants were required to induce the senses of 100 target words using a training set, and then disambiguate unseen instances of the same words using the induced senses. Systems’ answers were evaluated in: (1) an unsupervised manner by using two clustering evaluation measures, and (2) a supervised manner, i.e. in a WSD task.
|
14:20–14:40 |
SemEval-2010 Task: Japanese WSD
Manabu Okumura, Kiyoaki Shirai, Kanako Komiya and Hikaru Yokono
show abstracthide abstractAn overview of the SemEval-2 Japanese WSD task is presented. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. The number of target words was 50, with 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, consisting of a total of 2,500 instances for the evaluation. Nine systems from four organizations participated in the task.
|
14:40–15:00 |
SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Eneko Agirre, Oier Lopez de Lacalle, Christiane Fellbaum, Shu-Kai Hsieh, Maurizio Tesconi, Monica Monachini, Piek Vossen and Roxanne Segers
show abstracthide abstractDomain portability and adaptation of NLP components and Word Sense Disambiguation systems present new challenges. The difficulties supervised systems have in adapting might change the way we assess the strengths and weaknesses of supervised and knowledge-based WSD systems. Unfortunately, all existing evaluation datasets for specific domains are lexical-sample corpora. This task presented all-words datasets on the environment domain for WSD in four languages (Chinese, Dutch, English, Italian). 11 teams participated, with supervised and knowledge-based systems, mainly in the English dataset. The results show that in all languages the participants were able to beat the most frequent sense heuristic as estimated from general corpora. The most successful approaches used some sort of supervision in the form of hand-tagged examples from the domain.
|
15:00–15:20 |
SemEval-2010 Task 18: Disambiguating Sentiment Ambiguous Adjectives
Yunfang Wu and Peng Jin
show abstracthide abstractSentiment ambiguous adjectives cause major difficulties for existing algorithms of sentiment analysis. We present an evaluation task designed to provide a framework for comparing different approaches in this problem. We define the task, describe the data creation, list the participating systems and discuss their results. There are 8 teams and 16 systems.
|
|
15:20–16:00
|
Coffee/Tea Break
|
16:00–17:30
|
Poster Session
101 |
RelaxCor: A Global Relaxation Labeling Approach to Coreference Resolution
Emili Sapena, Lluís Padró and Jordi Turmo
show abstracthide abstractThis paper describes the participation of RelaxCor in the Semeval-2010 task number 1: "Coreference Resolution in Multiple Languages". RelaxCor is a constraint-based graph partitioning approach to coreference resolution solved by relaxation labeling. The approach combines the strengths of groupwise classifiers and chain formation methods in one global method.
|
102 |
SUCRE: A Modular System for Coreference Resolution
Hamidreza Kobdani and Hinrich Schütze
show abstracthide abstractThis paper presents SUCRE, a new software tool for coreference resolution and its feature engineering. It is able to separately do noun, pronoun and full coreference resolution. SUCRE introduces a new approach to the feature engineering of coreference resolution based on a relational database model and a regular feature definition language. SUCRE successfully participated in SemEval-2010 Task 1 on Coreference Resolution in Multiple Languages for gold and regular closed annotation tracks of six languages. It obtained the best results in several categories, including the regular closed annotation tracks of English and German.
|
103 |
UBIU: A Language-Independent System for Coreference Resolution
Desislava Zhekova and Sandra Kübler
show abstracthide abstractWe present UBIU, a language independent system for detecting full coreference chains, composed of named entities, pronouns, and full noun phrases which makes use of memory based learning and a feature model following Rahman and Ng (2009). UBIU is evaluated on the task "Coreference Resolution in Multiple Languages" (SemEval Task 1 (Recasens et al., 2010)) in the context of the 5th International Workshop on Semantic Evaluation.
|
104 |
Corry: a System for Coreference Resolution
Olga Uryupina
show abstracthide abstractCorry is a system for coreference resolution in English. It supports both local and global (ILP) models of coreference. The backbone of the system is a family of SVM classifiers for pairs of mentions: each mention type receives its own classifier. A separate anaphoricity classifier is learned for the ILP setting. Corry relies on a rich linguistically motivated feature set, which has, however, been manually reduced to 64 features for efficiency reasons. The system uses the Stanford NLP toolkit for parsing and NE-tagging, Wordnet for semantic classes and the U.S. census data for assigning gender values to person names. Three runs have been submitted for the SemEval task 1, optimizing Corry’s performance for BLANC, MUC and CEAF. The runs differ with respect to the model (local for BLANC, global for MUC and CEAF) and the definition of mention types. Corry runs have shown the best performance level among all the systems in their track for the corresponding metric.
|
105 |
BART: A Multilingual Anaphora Resolution System
Samuel Broscheit, Massimo Poesio, Simone Paolo Ponzetto, Kepa Joseba Rodriguez, Lorenza Romano, Olga Uryupina, Yannick Versley and Roberto Zanoli
show abstracthide abstractBART is a highly modular toolkit for coreference resolution that supports state-of-the-art statistical approaches to the task and enables efficient feature engineering. BART has originally been created and tested for English, but its flexible architecture ensures its portability to other languages and domains. At the SemEval task 1 on Coreference Resolution, BART runs have been submitted for German, English, and Italian. BART relies on a maximum entropy-based classifier for pairs of mentions. A novel entity-mention approach based on Semantic Trees is at the moment only supported for English. For German and English, BART relies on Wordnet/Germanet for determining semantic classes and a list of names pre-classified for gender (extracted from Wikipedia). Mention boundaries are derived from parse trees. For Italian, mention boundaries and semantic types are provided by our mention tagger – it relies on Wikipedia and a gazetteer extracted from the ICab dataset.
|
106 |
TANL-1: Coreference Resolution by Parse Analysis and Similarity Clustering
Giuseppe Attardi, Maria Simi and Stefano Dei Rossi
show abstracthide abstractThis paper describes our submission to the Semeval 2010 task on coreference resolution in multiple languages. The system uses a binary classifier, based on Maximum Entropy, to decide whether or not there is a relationship between each pair of mentions extracted from a textual document. Mention detection is based on the analysis of the dependency parse tree.
|
107 |
FCC: Modeling Probabilities with GIZA++ for Task #2 and #3 of SemEval-2
Darnes Vilariño Ayala, Carlos Balderas Posada, David Eduardo Pinto Avendaño, Miguel Rodríguez Hernández and Saul León Silverio
show abstracthide abstractIn this paper we present a naïve approach to tackle the problem of cross-lingual WSD and cross-lingual lexical substitution, which correspond to Tasks #2 and #3 of the SemEval-2 competition. We used a bilingual statistical dictionary, which is estimated with Giza++ using the EUROPARL parallel corpus, in order to calculate the probability of a source word to be translated to a target word (which is assumed to be the correct sense of the source word but in a different language). Two versions of the probabilistic model are tested: unweighted and weighted. The obtained values show that the unweighted version performs better than the weighted one.
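For illustration, a minimal sketch of the selection step such a probabilistic dictionary enables; the toy probability table, word forms and optional reweighting factor are assumptions standing in for a real GIZA++ lexical translation model:

```python
# Toy stand-in for a GIZA++ lexical translation table P(target | source);
# a real table would be estimated from a parallel corpus such as Europarl.
translation_probs = {
    "bank": {"banco": 0.62, "orilla": 0.30, "ribera": 0.08},
}

def best_translation(source_word, weights=None):
    """Pick the target word with the highest (optionally reweighted) probability.
    `weights` is a hypothetical hook standing in for the 'weighted' variant."""
    candidates = dict(translation_probs.get(source_word, {}))
    if weights:
        candidates = {t: p * weights.get(t, 1.0) for t, p in candidates.items()}
    return max(candidates, key=candidates.get) if candidates else None

print(best_translation("bank"))  # -> 'banco'
```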
|
108 |
Combining Dictionaries and Contextual Information for Cross-Lingual Lexical Substitution
Wilker Aziz and Lucia Specia
show abstracthide abstractWe describe two systems participating in Semeval-2010’s Cross-Lingual Lexical Substitution task: USPwlv and WLVusp. Both systems are based on two main components: (i) a dictionary to provide a number of possible translations for each source word, and (ii) a contextual model to select the best translation according to the context where the source word occurs. These components and the way they are integrated are different in the two systems: they exploit corpus-based and linguistic resources, and supervised and unsupervised learning methods. Among the 14 participants in the subtask to identify the best translation, our systems were ranked 2nd and 4th in terms of recall, 3rd and 4th in terms of precision.
|
110 |
COLEPL and COLSLM: An Unsupervised WSD Approach to Multilingual Lexical Substitution, Tasks 2 and 3 SemEval 2010
Weiwei Guo and Mona Diab
show abstracthide abstractIn this paper, we present a word sense disambiguation (WSD) based system for multilingual lexical substitution. Our method depends on having a WSD system for English and an automatic word alignment method. Crucially the approach relies on having parallel corpora. For Task 2 we apply a supervised WSD system to derive the English word senses. For Task 3, we apply an unsupervised approach to the training and test data. Both of our systems that participated in Task 2 achieve a decent ranking among the participating systems. For Task 3 we achieve the highest ranking on several of the language pairs: French, German and Italian.
|
111 |
UHD: Cross-Lingual Word Sense Disambiguation Using Multilingual Co-occurrence Graphs
Carina Silberer and Simone Paolo Ponzetto
show abstracthide abstractWe describe the University of Heidelberg (UHD) system for the Cross-Lingual Word Sense Disambiguation SemEval-2010 task (CL-WSD). The system performs CL-WSD by applying graph algorithms previously developed for monolingual Word Sense Disambiguation to multilingual co-occurrence graphs. UHD has participated in the Best and out-of-five (OOF) evaluations and ranked among the most competitive systems for this task, thus indicating that graph-based approaches represent a powerful alternative for this task.
|
112 |
OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures
Lipta Mahapatra, Meera Mohan, Mitesh Khapra and Pushpak Bhattacharyya
show abstracthide abstractWe report here our work on English French Cross-lingual Word Sense Disambiguation where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two sentences by calculating a weighted sum of the word overlap and the semantic overlap between them. The semantic overlap is calculated using standard Wordnet Similarity measures. Once the nearest neighbors have been identified, the best translation is found by taking a majority vote over the French translations of the nearest neighbors.
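A rough sketch of the nearest-neighbour voting idea described above; the plain set overlap below is an assumed stand-in for the weighted word + WordNet semantic overlap, and the data layout is illustrative only:

```python
from collections import Counter

def similarity(a, b):
    """Plain word overlap between two tokenised sentences; a stand-in for the
    weighted sum of word overlap and WordNet-based semantic overlap."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a | b), 1)

def best_french_translation(test_tokens, training_data, k=5):
    """training_data: list of (context_tokens, french_translation) pairs.
    Take the k nearest neighbours and return the majority-vote translation."""
    neighbours = sorted(training_data,
                        key=lambda ex: similarity(test_tokens, ex[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0] if votes else None
```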
|
113 |
273. Task 5. Keyphrase Extraction Based on Core Word Identification and Word Expansion
You Ouyang, Wenjie Li and Renxian Zhang
show abstracthide abstractThis paper provides a description of the Hong Kong Polytechnic University (PolyU) System that participated in the task #5 of SemEval-2, i.e., the Automatic Keyphrase Extraction from Scientific Articles task. We followed a novel framework to develop our keyphrase extraction system, motivated by differentiating the roles of the words in a keyphrase. We first identified the core words which are defined as the most essential words in the article, and then expanded the identified core words to the target keyphrases by a word expansion approach.
|
114 |
DERIUNLP: A Context Based Approach to Automatic Keyphrase Extraction
Georgeta Bordea and Paul Buitelaar
show abstracthide abstractThe DERI UNLP team participated in the SemEval 2010 Task #5 with an unsupervised system that automatically extracts keyphrases from scientific articles. Our approach does not consider only a general description of a term to select keyphrase candidates but also context information in the form of "skill types". Even though our system analysed a restricted list of candidates, our team was able to outperform baseline unsupervised and supervised approaches.
|
115 |
DFKI KeyWE: Ranking keyphrases extracted from scientific articles
Kathrin Eichler and Günter Neumann
show abstracthide abstractA central issue for making the content of a scientific document quickly accessible to a potential reader is the extraction of keyphrases, which capture the main topic of the document. Keyphrases can be extracted automatically by generating a list of keyphrase candidates, ranking these candidates, and selecting the top-ranked candidates as keyphrases. We present the KeyWE system, which uses an adapted nominal group chunker for candidate extraction and a supervised ranking algorithm based on support vector machines for ranking the extracted candidates. The system was evaluated on data provided for the SemEval 2010 Shared Task on Keyphrase Extraction.
|
116 |
Single Document Keyphrase Extraction Using Sentence Clustering and Latent Dirichlet Allocation
Claude Pasquier
show abstracthide abstractThis paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster sentences of the document in order to highlight parts of text that are semantically related. The clusters of sentences, which reflect the themes of the document, are then analyzed to find the main topics of the text. Finally, the most important words, or groups of words, from these topics are proposed as keyphrases. This method is evaluated on task number 5 (Automatic Keyphrase Extraction from Scientific Articles) of SemEval-2010: the 5th International Workshop on Semantic Evaluations.
|
117 |
SJTULTLAB: Chunk Based Method for Keyphrase Extraction
Letian Wang and Fang Li
show abstracthide abstractIn this paper we present a chunk-based keyphrase extraction method for scientific articles. Unlike most previous systems, ours does not use supervised machine learning algorithms. Instead, document structure information is used to remove unimportant content, chunk extraction and filtering is used to reduce the number of candidates, and keywords are used to filter the candidates before generating the final keyphrases. Our experimental results on the test data show that the method works better than the baseline systems and is comparable with other known algorithms.
|
118 |
Likey: Unsupervised Language-independent Keyphrase Extraction
Mari-Sanna Paukkeri and Timo Honkela
show abstracthide abstractLikey is an unsupervised statistical approach for keyphrase extraction. The method is language-independent, and the only language-dependent component is the reference corpus with which the documents to be analyzed are compared. In this study, we have also used another language-dependent component: an English-specific Porter stemmer as a preprocessing step. In our experiments on keyphrase extraction from scientific articles, the Likey method outperforms both supervised and unsupervised baseline methods.
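For illustration, a simplified reconstruction of a reference-corpus keyness score of this kind; the rank-ratio formulation, whitespace tokenisation and toy corpora are assumptions and may differ from the actual Likey scoring:

```python
from collections import Counter

def frequency_ranks(tokens):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {w: i + 1 for i, w in enumerate(ordered)}

def keyness(doc_tokens, reference_tokens):
    """Lower ratio = more document-specific, hence a better keyphrase candidate.
    Words missing from the reference corpus get the worst possible rank."""
    doc_rank = frequency_ranks(doc_tokens)
    ref_rank = frequency_ranks(reference_tokens)
    worst = len(ref_rank) + 1
    return {w: doc_rank[w] / ref_rank.get(w, worst) for w in doc_rank}
```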
|
119 |
WINGNUS: Keyphrase Extraction Utilizing Document Logical Structure
Thuy Dung Nguyen and Minh-Thang Luong
show abstracthide abstractWe present a system description of the WINGNUS team’s work for the SemEval-2010 task #5, Automatic Keyphrase Extraction from Scientific Articles. A key feature of our system is that it utilizes an inferred document logical structure in our candidate identification process, to limit the number of phrases in the candidate list while maintaining its coverage of important phrases. Our top performing system achieves an F1 of 25.22% for the combined keyphrases (author and reader assigned) in the final test data. We note that the method we report here is novel and orthogonal to other systems, so it can be combined with other techniques to potentially achieve higher performance.
|
120 |
KX: A flexible system for Keyphrase eXtraction
Emanuele Pianta and Sara Tonelli
show abstracthide abstractIn this paper we present KX, a system for keyphrase extraction developed at FBK-IRST, which exploits basic linguistic annotation combined with simple statistical measures to select a list of weighted keywords from a document. The system is flexible in that it lets the user set parameters such as frequency thresholds for collocation extraction and indicators for keyphrase relevance, and it allows for domain adaptation by exploiting a corpus of documents in an unsupervised way. KX is also easily adaptable to new languages in that it requires only a PoS-Tagger to derive lexical patterns. In the SemEval task 5 “Automatic Keyphrase Extraction from Scientific Articles”, KX achieved satisfactory results both in finding reader-assigned keywords and in the combined keywords subtask.
|
121 |
BUAP: An Unsupervised Approach to Automatic Keyphrase Extraction from Scientific Articles
Roberto Ortiz, David Pinto, Mireya Tovar and Héctor Jiménez-Salazar
show abstracthide abstractThis paper presents an unsupervised approach to automatically discovering the latent keyphrases contained in scientific articles. The proposed technique is constructed by combining two techniques: maximal frequent sequences and PageRank. We evaluated the obtained results by using micro-averaged precision, recall and F-scores with respect to two different gold standards: 1) the reader’s keyphrases, and 2) a combined set of the author’s and reader’s keyphrases. The obtained results were also compared against three different baselines: one unsupervised (TF-IDF based) and two supervised (Naïve Bayes and Maximum Entropy).
|
122 |
UNPMC: Naive Approach to Extract Keyphrases from Scientific Articles
Jungyeul Park, Jong Gun Lee and Béatrice Daille
show abstracthide abstractWe describe our method for extracting keyphrases from scientific articles, with which we participated in the shared task of the SemEval-2 Evaluation Exercise. Even though general-purpose term extractors along with linguistically-motivated analysis allow us to extract elaborated morpho-syntactic variation forms of terms, the naive statistical approach proposed in this paper is very simple and quite efficient for extracting keyphrases, especially from well-structured scientific articles. Based on the characteristics of keyphrases with section information, we obtain an F-measure of 18.34% using the top 15 candidates. We also show further improvement without any complications and discuss this at the end of the paper.
|
123 |
SEERLAB: A System for Extracting Keyphrases from Scholarly Documents
Pucktada Treeratpituk, Pradeep Teregowda, Jian Huang and C. Lee Giles
show abstracthide abstractWe describe the SEERLAB system that participated in the SemEval 2010’s Keyphrase Extraction Task. SEERLAB utilizes the DBLP corpus for generating a set of candidate keyphrases from a document. Random Forest, a supervised ensemble classifier, is then used to select the top keyphrases from the candidate set. SEERLAB achieved a 0.24 F-score in generating the top 15 keyphrases, which places it sixth among 19 participating systems. Additionally, SEERLAB performed particularly well in generating the top 5 keyphrases with an F-score that ranked third.
|
124 |
SZTERGAK : Feature Engineering for Keyphrase Extraction
Gábor Berend and Richárd Farkas
show abstracthide abstractAutomatically assigning keyphrases to documents has a great variety of applications. Here we focus on the keyphrase extraction of scientific publications and present a novel set of features for the supervised learning of keyphraseness. Although these features are intended for extracting keyphrases from scientific papers, because of their generality and robustness, they should have uses in other domains as well. With the help of these features SZTERGAK achieved top results on the SemEval-2 shared task on Automatic Keyphrase Extraction from Scientific Articles and exceeded its baseline by 10%.
|
125 |
KP-Miner: Participation in SemEval-2
Samhaa R. El-Beltagy and Ahmed Rafea
show abstracthide abstractThis paper briefly describes the KP-Miner system, which was developed for the extraction of keyphrases from English and Arabic documents, irrespective of their nature. The paper also outlines the performance of the system in the “Automatic Keyphrase Extraction from Scientific Articles” task which is part of SemEval-2.
|
126 |
UvT: The UvT Term Extraction System in the Keyphrase Extraction task
Kalliopi Zervanou
show abstracthide abstractThe UvT system is based on a hybrid, linguistic and statistical approach, originally proposed for the recognition of multi-word terminological phrases, the C-value method (Frantzi et al., 2000). In the UvT implementation, we use an extended noun phrase rule set and take into consideration orthographic and morphological variation, term abbreviations and acronyms, and basic document structure information.
|
127 |
UNITN: Part-Of-Speech Counting in Relation Extraction
Fabio Celli
show abstracthide abstractThis report describes the UNITN system, a Part-Of-Speech Context Counter, which participated in SemEval 2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. Given a text annotated with Part-of-Speech tags, the system outputs a vector representation of a sentence containing 20 features in total. There are three steps in the system’s pipeline: first the system produces an estimation of the entities’ position in the relation, then an estimation of the semantic relation type by means of decision trees, and finally it gives a prediction of the semantic relation plus the entities’ position. The system obtained good results in the estimation of entities’ position (F1=98.3%) but a critically poor performance in relation classification (F1=26.6%), indicating that lexical and semantic information is essential in relation extraction. The system can be integrated with other systems or used for purposes other than relation extraction.
|
128 |
FBK_NK: a WordNet-based System for Multi-Way Classification of Semantic Relations
Matteo Negri and Milen Kouylekov
show abstracthide abstractWe describe a WordNet-based system for the extraction of semantic relations between pairs of nominals appearing in English texts. The system adopts a lightweight approach, based on training a Bayesian Network classifier using large sets of binary features. Our features consider: i) the context surrounding the nominals involved in the relation, and ii) different types of knowledge extracted from WordNet, including direct and explicit relations between the annotated nominals, and more general and implicit evidence (e.g. semantic boundary collocations). The system achieved a Macro-averaged F1 of 68.02% on the “Multi-Way Classification of Semantic Relations Between Pairs of Nominals” task (Task #8) at SemEval-2010.
|
129 |
JU: A Supervised Approach to Identify Semantic Relations from Paired Nominals
Santanu Pal, Partha Pakray, Dipankar Das and Sivaji Bandyopadhyay
show abstracthide abstractThis article presents the experiments carried out at Jadavpur University as part of the participation in Multi-Way Classification of Semantic Relations between Pairs of Nominals in the SemEval 2010 exercise. Separate rules for each type of relation are identified in the baseline model based on the verbs and prepositions present in the segment between each pair of nominals. Inclusion of WordNet features associated with the paired nominals plays an important role in distinguishing the relations from each other. The Conditional Random Field (CRF) based machine-learning framework is adopted for classifying the pair of nominals. Application of dependency relations, Named Entities (NE) and various types of WordNet features along with several combinations of these features helps to improve the performance of the system. Error analysis suggests that the performance can be improved by applying suitable strategies to differentiate each paired nominal in an already identified relation. The evaluation result gives an overall macro-averaged F1 score of 52.16%.
|
131 |
FBK-IRST: Semantic Relation Extraction using Cyc
Kateryna Tymoshenko and Claudio Giuliano
show abstracthide abstractWe present an approach for semantic relation extraction between nominals that combines semantic information with shallow syntactic processing. We propose to use the ResearchCyc knowledge base as a source of semantic information about nominals. Each kind of information is represented by kernel functions. The experiments were carried out using support vector machines as a classifier. The system achieves an overall F1 of 77.62 on the "Multi-Way Classification of Semantic Relations Between Pairs of Nominals" task at SemEval-2010.
|
132 |
ISTI@SemEval-2 Task #8: Boosting-Based Multiway Relation Classification
Andrea Esuli, Diego Marcheggiani and Fabrizio Sebastiani
show abstracthide abstractWe describe a boosting-based supervised learning approach to the “Multi-Way Classification of Semantic Relations between Pairs of Nominals” task #8 of SemEval-2. Participants were asked to determine which relation, from a set of nine relations plus “Other”, exists between two nominals, and also to determine the roles of the two nominals in the relation. Our participation has focused, rather than on the choice of a rich set of features, on the classification model adopted to determine the correct assignment of relation and roles.
|
133 |
ISI: Automatic Classification of Relations Between Nominals Using a Maximum Entropy Classifier
Stephen Tratz and Eduard Hovy
show abstracthide abstractThe automatic interpretation of semantic relations between nominals is an important subproblem within natural language understanding applications and is an area of increasing interest. In this paper, we present the system we used to participate in the SemEval 2010 Task 8 Multi-Way Classification of Semantic Relations between Pairs of Nominals. Our system, based upon a Maximum Entropy classifier trained using a large number of boolean features, received the third highest score.
|
134 |
ECNU: Effective Semantic Relations Classification without Complicated Features or Multiple External Corpora
Yuan Chen, Man Lan, Jian Su, Zhi Min Zhou and Yu Xu
show abstracthide abstractThis paper describes our approach to the automatic identification of semantic relations between nominals in English sentences. The basic idea of our strategy is to develop machine-learning classifiers which: (1) make use of class-independent features and classifier; (2) make use of a simple and effective feature set without high computational cost; (3) make no use of external annotated or unannotated corpora at all. At SemEval 2010 Task 8 our system achieved an F-measure of 75.43% and an accuracy of 70.22%.
|
135 |
UCD-Goggle: A Hybrid System for Noun Compound Paraphrasing
Guofu Li, Alejandra Lopez-Fernandez and Tony Veale
show abstracthide abstractThis paper addresses the problem of ranking a list of paraphrases associated with a noun-noun compound as closely as possible to the judgments of human raters. UCD-Goggle tackles this task using semantic knowledge learnt from the Google n-grams together with human-preferences for paraphrases mined from training data. Empirical evaluation shows that UCD-Goggle achieves 0.432 Spearman correlation with human judgments.
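As a side note, the Spearman correlation used to score such paraphrase rankings can be computed directly; the system scores and human counts below are invented purely for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical system scores and gold human-provided counts for four paraphrases
system_scores = [0.91, 0.40, 0.75, 0.10]
human_counts = [12, 3, 9, 1]

rho, p_value = spearmanr(system_scores, human_counts)
print(f"Spearman rho = {rho:.3f}")
```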
|
136 |
UCD-PN: Selecting General Paraphrases Using Conditional Probability
Paul Nulty and Fintan Costello
show abstracthide abstractWe describe a system which ranks human-provided paraphrases of noun compounds, where the frequency with which a given paraphrase was provided by human volunteers is the gold standard for ranking. Our system assigns a score to a paraphrase of a given compound according to the number of times it has co-occurred with other paraphrases given in the rest of the dataset. We use these co-occurrence statistics to compute conditional probabilities which cluster together paraphrases which have similar meanings and also favour frequent, general paraphrases rather than infrequent paraphrases with more specific meanings.
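A minimal sketch of the co-occurrence-based conditional-probability scoring described above; the toy compounds, paraphrases and the exact scoring function are assumptions for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Toy mapping: noun compound -> set of paraphrases volunteers provided for it
dataset = {
    "olive oil": {"made from", "extracted from", "containing"},
    "flu virus": {"causing", "that causes", "associated with"},
    "steel knife": {"made of", "made from", "containing"},
}

cooc = defaultdict(int)   # co-occurrence counts of paraphrase pairs
freq = defaultdict(int)   # how many compounds each paraphrase was given for
for paraphrases in dataset.values():
    for p in paraphrases:
        freq[p] += 1
    for a, b in combinations(sorted(paraphrases), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def score(paraphrase, others):
    """Sum of estimated P(paraphrase | other) over the other paraphrases seen
    for the same compound; general paraphrases co-occur widely and score high."""
    return sum(cooc[(o, paraphrase)] / freq[o] for o in others if freq[o])

print(score("made from", {"containing"}))  # -> 1.0 on this toy data
```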
|
|
July 16, 2010 |
09:00–10:30
|
System papers
09:00–09:15 |
COLEPL and COLSLM: An Unsupervised WSD Approach to Multilingual Lexical Substitution, Tasks 2 and 3 SemEval 2010
Weiwei Guo and Mona Diab
show abstracthide abstractIn this paper, we present a word sense disambiguation (WSD) based system for multilingual lexical substitution. Our method depends on having a WSD system for English and an automatic word alignment method. Crucially the approach relies on having parallel corpora. For Task 2 we apply a supervised WSD system to derive the English word senses. For Task 3, we apply an unsupervised approach to the training and test data. Both of our systems that participated in Task 2 achieve a decent ranking among the participating systems. For Task 3 we achieve the highest ranking on several of the language pairs: French, German and Italian.
|
09:15–09:30 |
UBA: Using Automatic Translation and Wikipedia for Cross-Lingual Lexical Substitution
Pierpaolo Basile and Giovanni Semeraro
show abstracthide abstractThis paper presents the participation of the University of Bari (UBA) at the SemEval-2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, but there are some differences: Cross-lingual lexical substitution targets one word at a time and the main goal is to find as many good translations as possible for the given target word. Moreover, there are some connections with Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to find the best substitutions. An important aspect of this kind of task is the possibility of finding synonyms without using a particular sense inventory or a specific parallel corpus, thus allowing the participation of unsupervised approaches. UBA proposes two systems: the former is based on an automatic translation system which exploits Google Translator, the latter is based on a parallel corpus approach which relies on Wikipedia in order to find the best substitutions.
|
09:30–09:45 |
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID
Patrice Lopez and Laurent Romary
show abstracthide abstractThe Semeval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID’s facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally a post ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
|
09:45–10:00 |
UTDMet: Combining WordNet and Corpus Data for Argument Coercion Detection
Kirk Roberts and Sanda Harabagiu
show abstracthide abstractThis paper describes our system for the classification of argument coercion for SemEval-2010 Task 7. We present two approaches to classifying an argument’s semantic class, which is then compared to the predicate’s expected semantic class to detect coercions. The first approach is based on learning the members of an arbitrary semantic class using WordNet’s hypernymy structure. The second approach leverages automatically extracted semantic parse information from a large corpus to identify similar arguments by the predicates that select them. We show the results these approaches obtain on the task as well as how they can improve a traditional feature-based approach.
|
10:00–10:15 |
UTD: Classifying Semantic Relations by Combining Lexical and Semantic Resources
Bryan Rink and Sanda Harabagiu
show abstracthide abstractThis paper describes our system for SemEval-2010 Task 8 on multi-way classification of semantic relations between nominals. First, the type of semantic relation is classified. Then a relation type-specific classifier determines the relation direction. Classification is performed using SVM classifiers and a number of features that capture the context, semantic role affiliation, and possible pre-existing relations of the nominals. This approach achieved an F1 score of 82.19% and an accuracy of 77.92%.
|
10:15–10:30 |
UvT: Memory-based pairwise ranking of paraphrasing verbs
Sander Wubben
show abstracthide abstractIn this paper we describe Mephisto, our system for Task 9 of the SemEval-2 workshop. Our approach to this task is to develop a machine learning classifier which determines for each verb pair describing a noun compound which verb should be ranked higher. These classifications are then combined into one ranking. Our classifier uses features from the Google N-gram Corpus, WordNet and the provided training data.
|
|
10:40–11:00
|
Coffee/Tea Break
|
11:00–12:30
|
System papers
11:00–11:15 |
SEMAFOR: Frame Argument Resolution with Log-Linear Models
Desai Chen, Nathan Schneider, Dipanjan Das and Noah A. Smith
show abstracthide abstractThis paper describes the SEMAFOR system’s performance in the SemEval 2010 task on linking events and their participants in discourse. Our entry is based upon SEMAFOR 1.0 (Das et al., 2010), a frame-semantic probabilistic parser built from log-linear models. The extended system models null instantiations, including non-local argument reference. Performance is evaluated on the task data with and without gold-standard overt arguments. In both settings, it fares the best of the submitted systems with respect to recall and F1.
|
11:15–11:30 |
Cambridge: Parser Evaluation using Textual Entailment by Grammatical Relation Comparison
Laura Rimell and Stephen Clark
show abstracthide abstractThis paper describes the Cambridge submission to the SemEval-2010 Parser Evaluation using Textual Entailment (PETE) task. We used a simple definition of entailment, parsing both T and H with the C&C parser and checking whether the core grammatical relations (subject and object) produced for H were a subset of those for T. This simple system achieved the top score for the task out of those systems submitted. We analyze the errors made by the system and the potential role of the task in parser evaluation.
|
11:30–11:45 |
MARS: A Specialized RTE System for Parser Evaluation
Rui Wang and Yi Zhang
show abstracthide abstractThis paper describes our participation in the SemEval-2010 Task #12, Parser Evaluation using Textual Entailment. Our system incorporated two dependency parsers, one semantic role labeler, and a deep parser based on hand-crafted grammars. The shortest path algorithm is applied on the graph representation of the parser outputs. Then, different types of features are extracted and entailment recognition is cast as a machine-learning-based classification task. The best setting of the system achieves 66.78% accuracy, which ranks in 3rd place.
|
11:45–12:00 |
TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text
Naushad UzZaman and James Allen
show abstracthide abstractExtracting temporal information from raw text is fundamental for deep language understanding, and key to many applications like question answering, information extraction, and document summarization. In this paper, we describe two systems we submitted to the TempEval 2 challenge for extracting temporal information from raw text. The systems use a combination of deep semantic parsing, Markov Logic Networks and Conditional Random Field classifiers. Our two submitted systems, TRIPS and TRIOS, approached all tasks and outperformed all teams in two tasks. Furthermore, TRIOS mostly had second-best performances in the other tasks. TRIOS also outperformed the other teams that attempted all the tasks. Our systems are notable in that, for tasks C–F, they operated on raw text while all other systems used tagged events and temporal expressions in the corpus as input.
|
12:00–12:15 |
TIPSem (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2
Hector Llorens, Estela Saquete Boro and Borja Navarro
show abstracthide abstractThis paper presents TIPSem, a system to extract temporal information from natural language texts for English and Spanish. TIPSem learns CRF models from training data. Although the features used cover different levels of language analysis, the approach focuses on semantic information. For Spanish, TIPSem achieved the best F1 score in all the tasks. For English, it obtained the best F1 in tasks B (events) and D (event-dct links), and was among the best systems in the rest.
|
12:15–12:30 |
CityU-DAC: Disambiguating Sentiment-Ambiguous Adjectives within Context
Bin Lu and Benjamin K. Tsou
show abstracthide abstractThis paper describes our system participating in task 18 of SemEval-2010, i.e. disambiguating Sentiment-Ambiguous Adjectives (SAAs). To disambiguate SAAs, we compare machine-learning-based and lexicon-based methods in our submissions: 1) Maximum entropy is used to train classifiers based on the annotated Chinese data from the NTCIR opinion analysis tasks, and the clause-level and sentence-level classifiers are compared; 2) For the lexicon-based method, we first classify the adjectives into two classes: intensifiers (i.e. adjectives intensifying the intensity of the context) and suppressors (i.e. adjectives decreasing the intensity of the context), and then use the polarity of the context to get the SAAs’ contextual polarity based on a sentiment lexicon. The results show that the performance of maximum entropy is not very high due to the small amount of training data; on the other hand, the lexicon-based method can improve the precision by considering the polarity of the context.
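A toy illustration of the lexicon-based intensifier/suppressor rule sketched above; the polarity lists, the word classes and the example are invented for the sketch and are not the system's actual lexicon:

```python
POSITIVE = {"growth", "profit", "improve"}
NEGATIVE = {"loss", "pollution", "decline"}
INTENSIFIERS = {"high", "large"}   # keep the context polarity
SUPPRESSORS = {"low", "small"}     # flip the context polarity

def context_polarity(tokens):
    score = sum(w in POSITIVE for w in tokens) - sum(w in NEGATIVE for w in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def adjective_polarity(adjective, tokens):
    """Assign the ambiguous adjective a contextual polarity."""
    polarity = context_polarity(tokens)
    if polarity == "neutral":
        return "neutral"
    if adjective in SUPPRESSORS:
        return "negative" if polarity == "positive" else "positive"
    return polarity  # intensifiers inherit the context polarity

print(adjective_polarity("low", ["pollution", "is", "low"]))  # -> 'positive'
```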
|
|
12:30–14:00
|
Lunch
|
14:00–15:30
|
Panel
|
15:30–16:00
|
Coffee/Tea Break
|
16:00–17:30
|
Posters Session
101 |
VENSES++: Adapting a deep semantic processing system to the identification of null instantiations
Sara Tonelli and Rodolfo Delmonte
show abstracthide abstractIn this paper we present VENSES++, a system to spot null instantiations and their antecedents, if available, as required by the "NIs-only" subtask of the SemEval 2010 Task 10 "Linking events and their participants in discourse". Our application is an adaptation of VENSES, a system for semantic evaluation that has been used for RTE challenges in the last 6 years. The new version exploits three modules of VENSES, namely the lexico-semantic module, the anaphora resolution module and the semantic module, in order to represent and analyse the document information. Then, two further procedures have been added: one identifies null instantiated roles of verbal predicates, while the other deals with nominal predicates. The first is based on the valence patterns extracted for every verbal lexical unit from FrameNet v. 1.4 and from the training data. The second procedure, instead, relies on a History List created by VENSES containing all events, spatial and temporal locations and body parts found in the document. Another useful resource employed to find antecedents is ConceptNet 2.0. Even if the preliminary results are far from satisfactory, we were able to devise a robust, knowledge-based system and a general strategy for dealing with the task.
|
102 |
CLR: Linking Events and Their Participants in Discourse Using a Comprehensive FrameNet Dictionary
Ken Litkowski
show abstracthide abstractThe CL Research system for SemEval-2 Task 10 for linking events and their participants in discourse is an exploration of the use of a specially created FrameNet dictionary that captures all FrameNet information about frames, lexical units, and frame-to-frame relations. This system is embedded in a specially designed interface, the Linguistic Task Analyzer. The implementation of this system was quite minimal at the time of submission, allowing only an initial completion of the role recognition and labeling task, with recall of 0.112, precision of 0.670, and F-score of 0.192. We describe the design of the system and the continuing efforts to determine how much of this task can be performed with the available lexical resources. Changes since the official submission have improved the F-score to 0.266.
|
103 |
PKU_HIT: An Event Detection System Based on Instances Expansion and Rich Syntactic Features
Shiqi Li, Peng-Yuan Liu, Tiejun Zhao, Qin Lu and Hanjing Li
show abstracthide abstractThis paper describes the PKU_HIT system on event detection in the SemEval-2010 Task. We construct three modules for the three sub-tasks of this evaluation. For target verb WSD, we build a Naïve Bayesian classifier which uses additional training instances expanded from an untagged Chinese corpus automatically. For sentence SRL and event detection, we use a feature-based machine learning method which makes combined use of both constituent-based and dependency-based features. Experimental results show that the Macro Accuracy of the WSD module reaches 83.81% and F-Score of the SRL module is 55.71%.
|
104 |
372:Comparing the Benefit of Different Dependency Parsers for Textual Entailment Using Syntactic Constraints Only
Alexander Volokh and Günter Neumann
show abstracthide abstractWe compare several state-of-the-art dependency parsers with our own parser based on a linear classification technique. Our primary goal is therefore to use syntactic information only, in order to keep the comparison of the parsers as fair as possible. We demonstrate that, despite the inferior results using the standard evaluation metrics for parsers like UAS or LAS on standard test data, our system achieves comparable results when used in an application, such as the PETE shared task. Our submission achieved the 4th position out of 19 participating systems. However, since it only uses a linear classifier, it works 17-20 times faster than other state-of-the-art parsers, such as MaltParser or the Stanford Parser.
|
105 |
SCHWA: PETE using CCG Dependencies with the C&C Parser
Dominick Ng, James W.D. Constable, Matthew Honnibal and James R. Curran
show abstracthide abstractThis paper describes the SCHWA system entered by the University of Sydney in SemEval 2010 Task 12 – Parser Evaluation using Textual Entailments (Yuret et al., 2010). Our system achieved an overall accuracy of 70% in the task evaluation. We used the C&C parser to build CCG dependency parses of the truth and hypothesis sentences. We then used partial match heuristics to determine whether the system should predict entailment. Heuristics were used because the dependencies generated by the parser are construction specific, making full compatibility unlikely. We also manually annotated the development set with CCG analyses, establishing an upper bound for our entailment system of 87%.
|
106 |
ID 392: TERSEO + T2T3 Transducer. A system for recognizing and normalizing TIMEX3
Estela Saquete Boro
show abstracthide abstractThe system described in this paper has participated in the TempEval-2 competition, specifically in Task A, whose aim is to determine the extent of the time expressions in a text as defined by the TimeML TIMEX3 tag, and the value of the features type and val. For this purpose, a combination of the TERSEO system and the T2T3 transducer was used. The TERSEO system is able to annotate text with TIDES TIMEX2 tags, and the T2T3 transducer performs the translation from these TIMEX2 tags to TIMEX3 tags.
|
107 |
HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions
Jannik Strötgen and Michael Gertz
show abstracthide abstractIn this paper, we describe HeidelTime, a system for the extraction and normalization of temporal expressions. HeidelTime is a rule-based system mainly using regular expression patterns for the extraction of temporal expressions and knowledge resources as well as linguistic clues for their normalization. In the TempEval-2 challenge, HeidelTime achieved the highest F-Score (86%) for the extraction and the best results in assigning the correct value attribute, i.e., in understanding the semantics of the temporal expressions.
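A minimal illustration of the general rule-based idea (not HeidelTime's actual rule set): a single regular expression extracts one class of explicit dates and normalises them to an ISO-style value; the pattern and example are assumptions for the sketch:

```python
import re

DATE = re.compile(r"\b(\d{1,2}) (January|February|March|April|May|June|July|"
                  r"August|September|October|November|December) (\d{4})\b")
MONTHS = {m: i + 1 for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"])}

def extract_dates(text):
    """Return (surface form, normalised ISO value) pairs for explicit dates."""
    out = []
    for day, month, year in DATE.findall(text):
        out.append((f"{day} {month} {year}",
                    f"{year}-{MONTHS[month]:02d}-{int(day):02d}"))
    return out

print(extract_dates("The workshop takes place on 16 July 2010 in Uppsala."))
# -> [('16 July 2010', '2010-07-16')]
```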
|
108 |
KUL: Recognition and Normalization of Temporal Expressions
Oleksandr Kolomiyets and Marie-Francine Moens
show abstracthide abstractIn this paper we describe a system for the recognition and normalization of temporal expressions (Task 13: TempEval-2, Task A). The recognition task is approached as a classification problem of sentence constituents and the normalization is implemented in a rule-based manner. One of the system features is extending positive annotations in the corpus by semantically similar words automatically obtained from a large unannotated textual corpus. The best results obtained by the system are 0.85 and 0.84 for precision and recall respectively for recognition of temporal expressions; the accuracy values of 0.91 and 0.55 were obtained for the feature values TYPE and VAL respectively.
|
109 |
UC3M system: Determining the Extent, Type and Value of Time Expressions in TempEval-2
María Teresa Vicente-Díez, Julián Moreno-Schneider and Paloma Martínez
show abstracthide abstractThis paper describes the participation of Universidad Carlos III de Madrid in Task A of the TempEval-2 evaluation. The UC3M system was originally developed for temporal expression recognition and normalization (the TERN task) in Spanish texts, according to the TIDES standard. The current version is an almost total refactoring of the earlier system. Additionally, it has been adapted to the TimeML annotation schema and a considerable effort has been made to increase its coverage. It takes a rule-based design in both the identification and the resolution phases. It adopts an inductive approach based on an empirical study of the frequency of temporal expressions in Spanish corpora. In detecting the extent of temporal expressions the system achieved a Precision/Recall of 0.90/0.87, whereas in determining the TYPE and VALUE of those expressions, system results were 0.91 and 0.83, respectively.
|
110 |
Edinburgh-LTG: TempEval-2 System Description
Claire Grover, Richard Tobin, Beatrice Alex and Kate Byrne
show abstracthide abstractWe describe the Edinburgh information extraction system which we are currently adapting for analysis of newspaper text as part of the SYNC3 project. Our most recent focus is geospatial and temporal grounding of entities and it has been useful to participate in TempEval-2 to measure the performance of our system and to guide further development. We took part in Tasks A and B for English.
|
111 |
USFD2: Annotating Temporal Expressions and TLINKs for TempEval-2
Leon Derczynski and Robert Gaizauskas
show abstracthide abstractWe describe the University of Sheffield system used in the TempEval-2 challenge, USFD2. The challenge requires the automatic identification of temporal entities and relations in text. USFD2 identifies and anchors temporal expressions, and also attempts two of the four temporal relation assignment tasks. A rule-based system picks out and anchors temporal expressions, and a maximum entropy classifier assigns temporal link labels, based on features that include descriptions of associated temporal signal words. USFD2 identified temporal expressions successfully, and correctly classified their type in 90% of cases. Determining the relation between an event and a time expression in the same sentence was performed at 63% accuracy, the second highest score in this part of the challenge.
|
112 |
NCSU: Modeling Temporal Relations with Markov Logic and Lexical Ontology
Eun Ha, Alok Baikadi, Carlyle Licata and James Lester
show abstracthide abstractAs a participant in TempEval-2, we address the temporal relations task consisting of four related subtasks. We take a supervised machine-learning technique using Markov Logic in combination with rich lexical relations beyond basic and syntactic features. One of our two submitted systems achieved the highest score for the Task F (66% precision), untied, and the second highest score (63% precision) for the Task C, which tied with three other systems.
|
113 |
JU_CSE_TEMP: A First Step towards Evaluating Events, Time Expressions and Temporal Relations
Anup Kumar Kolya, Asif Ekbal and Sivaji Bandyopadhyay
show abstracthide abstractTemporal information extraction is a popular and interesting research field in the area of Natural Language Processing (NLP). In this paper, we report our work on the TempEval-2 shared task. This is our first participation and we participated in Tasks A, B, C, D, E and F. We develop rule-based systems for Tasks A and B, whereas the remaining tasks are based on a machine learning approach, namely Conditional Random Fields (CRF). All our systems are still in their development stages, and we report very initial results. Evaluation results on the shared task English datasets yield precision, recall and F-measure values of 55%, 17% and 26%, respectively, for Task A and 48%, 56% and 52%, respectively, for Task B (event recognition). The rest of the tasks, namely C, D, E and F, were evaluated with a relatively simpler metric: the number of correct answers divided by the number of answers. Experiments on the English datasets yield accuracies of 63%, 80%, 56% and 56% for tasks C, D, E and F, respectively.
|
114 |
KCDC: Word Sense Induction by Using Grammatical Dependencies and Sentence Phrase Structure
Roman Kern, Markus Muhr and Michael Granitzer
show abstracthide abstractWord sense induction and discrimination (WSID) identifies the senses of an ambiguous word and assigns instances of this word to one of these senses. We have built a WSID system that exploits syntactic and semantic features based on the results of a natural language parser component. To achieve high robustness and good generalization capabilities, we designed our system to work on a restricted, but grammatically rich set of features. Based on the results of the evaluations, our system provides a promising performance and robustness.
|
115 |
UoY: Graphs of Unambiguous Vertices for Word Sense Induction and Disambiguation
Ioannis Korkontzelos and Suresh Manandhar
show abstracthide abstractThis paper presents an unsupervised graph-based method for automatic word sense induction and disambiguation. The innovative part of our method is the assignment of either a word or a word pair to each vertex of the constructed graph. Word senses are induced by clustering the constructed graph. In the disambiguation stage, each induced cluster is scored according to the number of its vertices found in the context of the target word. Our system participated in SemEval-2010 word sense induction and disambiguation task.
|
116 |
HERMIT: Flexible Clustering for the SemEval-2 WSI Task
David Jurgens and Keith Stevens
show abstracthide abstractA single word may have multiple unspecified meanings in a corpus. Word sense induction aims to discover these different meanings through word use, and knowledge-poor algorithms attempt this without using external lexical resources. We propose a new method for identifying the different senses that uses a flexible clustering strategy to automatically determine the number of senses, rather than predefining it. We demonstrate the effectiveness using the SemEval-2 WSI task, achieving competitive scores on both the V-Measure and Recall metrics, depending on the parameter configuration.
|
117 |
Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2
Ted Pedersen
show abstracthide abstractThe Duluth-WSI systems in SemEval-2 built word co-occurrence matrices from the task test data to create a second order co-occurrence representation of those test instances. The senses of words were induced by clustering these instances, where the number of clusters was automatically predicted. The Duluth-Mix system was a variation of WSI that used the combination of training and test data to create the co-occurrence matrix. The Duluth-R system was a series of random baselines.
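A compressed sketch of a second-order co-occurrence representation followed by clustering; the toy co-occurrence vectors are invented, and the number of clusters is fixed here, whereas the system above predicts it automatically:

```python
import numpy as np
from sklearn.cluster import KMeans

def second_order_vectors(instances, word_vectors):
    """Represent each instance (list of context tokens) as the average of the
    first-order co-occurrence vectors of its context words."""
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for tokens in instances:
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(rows)

# word -> co-occurrence counts against a fixed feature vocabulary (toy data)
word_vectors = {"money": np.array([5., 0., 0.]), "bank": np.array([3., 0., 1.]),
                "river": np.array([0., 4., 2.]), "water": np.array([0., 3., 1.])}
instances = [["money", "bank"], ["river", "water"], ["bank", "water"]]

X = second_order_vectors(instances, word_vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment per instance
```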
|
118 |
KSU KDD: Word Sense Induction by Clustering in Topic Space
Wesam Elshamy, Doina Caragea and William Hsu
show abstracthide abstractWe describe our language-independent unsupervised word sense induction system. This system uses only topic features to cluster different word senses in their global context topic space. Using unlabeled data, this system trains a latent Dirichlet allocation (LDA) topic model and then uses it to infer the topic distributions of the test instances. By clustering these topic distributions in topic space, we group the instances into different senses. Our hypothesis is that closeness in topic space reflects similarity between different word senses. This system participated in the SemEval-2 word sense induction and disambiguation task and achieved the second highest V-measure score among all other systems.
|
119 |
PengYuan@PKU: Extracting Infrequent Sense Instance with the Same N-gram Pattern for the SemEval-2010 Task 15
Peng-Yuan Liu, Shi-Wen Yu, Shui Liu and Tiejun Zhao
show abstracthide abstractThis paper describes our infrequent sense identification system participating in the SemEval-2010 task 15 on Infrequent Sense Identification for Mandarin Text to Speech Systems. The core system is a supervised system based on ensembles of Naïve Bayesian classifiers. In order to address the problem of unbalanced sense distribution, we intentionally extract only instances of the infrequent sense with the same N-gram pattern as complementary training data from an untagged Chinese corpus – People’s Daily of the year 2001. At the same time, we adjusted the prior probability to adapt to the distribution of the test data and tuned the smoothing coefficient to take data sparseness into account. The official results show that our system ranked first with the best Macro Accuracy of 0.952. We briefly describe the system, its configuration options and the features used for this task, and present some discussion of the results.
|
120 |
RALI: Automatic weighting of text window distances
Bernard Brosseau-Villeneuve, Noriko Kando and Jian-Yun Nie
show abstracthide abstractSystems using text windows to model word contexts have mostly been using fixed-sized windows and uniform weights. The window size is often selected by trial and error to maximize task results. We propose a non-supervised method for selecting weights for each window distance, effectively removing the need to limit window sizes, by maximizing the mutual generation of two sets of samples of the same word. Experiments on Semeval Word Sense Disambiguation tasks showed considerable improvements.
|
121 |
JAIST: Clustering and Classification based Approaches for Japanese WSD
Kiyoaki Shirai and Makoto Nakamura
show abstracthide abstractThis paper reports on our three participating systems in the SemEval-2 Japanese WSD task. The first one is a clustering-based method, which chooses a sense not for individual instances but for automatically constructed clusters of instances. The second one is a classification method, an ordinary SVM classifier with simple domain adaptation techniques. The last is an ensemble of these two systems. Results of the formal run show that the second system is the best. Its precision is 0.7476.
|
122 |
MSS: Investigating the Effectiveness of Domain Combinations and Topic Features for Word Sense Disambiguation
Sanae Fujita, Kevin Duh, Akinori Fujino, Hirotoshi Taira and Hiroyuki Shindo
show abstracthide abstractWe participated in the SemEval-2010 Japanese Word Sense Disambiguation (WSD) task (Task 16). Our focus was on (1) investigating domain differences, (2) incorporating topic features, (3) predicting new unknown senses. We experimented with Support Vector Machines (SVM) and Maximum Entropy (MEM) classifiers. We achieved an accuracy of 80.1 % in our experiments.
|
123 |
IIITH: Domain Specific Word Sense Disambiguation
Siva Reddy, Abhilash Inumella, Diana McCarthy and Mark Stevenson
show abstracthide abstractWe describe two systems that participated in SemEval-2010 task 17 (All-words Word Sense Disambiguation on a Specific Domain) and were ranked in the third and fourth positions in the formal evaluation. Domain adaptation techniques using the background documents released in the task were used to assign ranking scores to the words and their senses. The test data was disambiguated using the Personalised PageRank algorithm, which was applied to a graph constructed from the whole of WordNet in which nodes are initialised with the ranking scores of words and their senses. Our systems achieved comparable accuracies of 53.4 and 52.2, which outperform the most frequent sense baseline (50.5).
|
124 |
UCF-WS: Domain Word Sense Disambiguation using Web Selectors
Hansen A. Schwartz and Fernando Gomez
show abstracthide abstractThis paper studies the application of the Web Selectors word sense disambiguation system on a specific domain. The system was primarily applied without any domain tuning, but the incorporation of domain predominant sense information was explored. Results indicated that the system performs relatively the same with domain predominant sense information as without, scoring well above a random baseline, but still 5 percentage points below results of using the most frequent sense.
|
125 |
TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on a Specific Domain
Andrew Tran, Chris Bowes, David Brown, Ping Chen, Max Choly and Wei Ding
show abstracthide abstractWord sense disambiguation (WSD) is one of the main challenges of applications in Natural Language Processing. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency knowledge drawn from a domain-specific knowledge base that was built for this task. When evaluated on the task, the system’s precision is above the First Sense Baseline.
|
126 |
GPLSI-IXA: Using Semantic Classes to Acquire Monosemous Training Examples from Domain Texts
Rubén Izquierdo, Armando Suárez and German Rigau
show abstracthide abstractThis paper summarizes our participation in task #17 of SemEval-2 (All-words WSD on a specific domain) using a supervised class-based Word Sense Disambiguation system. Basically, we use Support Vector Machines (SVM) as the learning algorithm and a set of simple features to build three different models. Each model considers a different training corpus: SemCor (SC), examples from monosemous words extracted automatically from background data (BG), and both SC and BG (SCBG). Our system exploits the monosemous words appearing as members of a particular WordNet semantic class to automatically acquire class-based annotated examples from the domain text. We use the class-based examples gathered from the domain corpus to adapt our traditional system trained on SemCor. The evaluation reveals that the best results are achieved training with SemCor and the background examples from monosemous words, obtaining results above the most frequent sense baseline and the fifth best position in the competition ranking.
|
127 |
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
Yuhang Guo, Wanxiang Che, Wei He, Ting Liu and Sheng Li
show abstracthide abstractThis paper presents an unsupervised system for the all-words domain-specific word sense disambiguation task. This system tags each target word with the most frequent sense, which is estimated using a thesaurus and word distribution information in the domain. The thesaurus is automatically constructed from a bilingual parallel corpus using paraphrase techniques. The recall of this system is 43.5% on the SemEval-2 task 17 English data set.
|
128 |
RACAI: Unsupervised WSD experiments @ SemEval-2, Task #17
Radu Ion and Dan Ştefănescu
show abstracthide abstractThis paper documents the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) to the Task 17 – All-words Word Sense Disambiguation on a Specific Domain, of the SemEval-2 competition. We describe three unsupervised WSD systems that make extensive use of the Princeton WordNet (WN) structure and WordNet Domains in order to perform the disambiguation. The best of them has been ranked the 12th by the task organizers out of 29 judged runs.
|
129 |
Kyoto: An Integrated System for Specific Domain WSD
Aitor Soroa, Eneko Agirre, Oier López de Lacalle, Wauter Bosma, Piek Vossen, Monica Monachini, Jessie Lo and Shu-Kai Hsieh
show abstracthide abstractThis document describes the preliminary release of the integrated Kyoto system for specific domain WSD. The system uses concept miners (Tybots) to extract domain-related terms and produces a domain-related thesaurus, followed by knowledge-based WSD based on wordnet graphs (UKB). The resulting system can be applied to any language with a lexical knowledge base, and is based on publicly available software and resources. Our participation in Semeval task #17 focused on producing running systems for all languages in the task, and we attained good results in all except Chinese. Due to the pressure of the time-constraints in the competition, the system is still under development, and we expect results to improve in the near future.
|
130 |
CFILT: Resource Conscious Approaches for All-Words Domain Specific WSD
Anup Kulkarni, Mitesh Khapra, Saurabh Sohoney and Pushpak Bhattacharyya
show abstracthide abstractWe describe two approaches for All-words Word Sense Disambiguation on a Specific Domain. The first approach is a knowledge-based approach which extracts domain-specific largest connected components from the Wordnet graph by exploiting the semantic relations between all candidate synsets appearing in a domain-specific untagged corpus. Given a test word, disambiguation is performed by considering only those candidate synsets that belong to the top-k largest connected components. The second approach is a weakly supervised approach which relies on the "One Sense Per Domain" heuristic and uses a few hand-labeled examples for the most frequently appearing words in the target domain. Once the most frequent words have been disambiguated they can provide strong clues for disambiguating other words in the sentence using an iterative disambiguation algorithm. Our weakly supervised system gave the best performance across all systems that participated in the task even when it used as few as 100 hand-labeled examples from the target domain.
|
131 |
UMCC-DLSI: Integrative Resource for Disambiguation Task
Yoan Gutiérrez Vázquez, Antonio Fernandez Orquín, Andrés Montoyo Guijarro and Sonia Vázquez Pérez
show abstracthide abstractThis paper describes the UMCC-DLSI system in SemEval-2010 task number 17 (All-words Word Sense Disambiguation on a Specific Domain). The main purpose of this work is to evaluate and compare our computational resource of WordNet mappings using three different methods: Relevant Semantic Tree, Relevant Semantic Tree 2, and an adaptation of the k-clique technique. Our proposal is an unsupervised, knowledge-based system that uses the Domains Ontology and SUMO.
|
132 |
HR-WSD: System Description for All-words Word Sense Disambiguation on a Specific Domain at SemEval-2010
Meng-Hsien Shih
show abstracthide abstractThis document describes a knowledge-based domain WSD system that uses heuristic rules as its knowledge base. This HR-WSD system delivered the best performance (55.9%) among all Chinese systems in SemEval-2010 Task 17: All-words WSD on a specific domain.
|
133 |
Twitter Based System: Using Twitter for Disambiguating Sentiment Ambiguous Adjectives
Alexander Pak and Patrick Paroubek
show abstracthide abstractIn this paper, we describe our system which participated in the SemEval 2010 task of disambiguating sentiment ambiguous adjectives for Chinese. Our system uses text messages from Twitter, a popular microblogging platform, for building a dataset of emotional texts. Using the built dataset, the system classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context. Our approach is fully automatic. It does not require any additional hand-built language resources and it is language independent.
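The general recipe can be sketched as follows. This is an illustration under stated assumptions (emoticon-based noisy labeling and a simple co-occurrence score), not necessarily the authors' implementation.

    # Sketch: noisy polarity labels from tweets, then context-based adjective polarity.
    from collections import Counter

    def noisy_label(tweet):
        """Use emoticons as weak polarity labels; return None if no signal."""
        if ":)" in tweet or ":-)" in tweet:
            return "positive"
        if ":(" in tweet or ":-(" in tweet:
            return "negative"
        return None

    def polarity_counts(tweets):
        counts = {"positive": Counter(), "negative": Counter()}
        for tweet in tweets:
            label = noisy_label(tweet)
            if label:
                counts[label].update(tweet.lower().split())
        return counts

    def classify_adjective(context_words, counts):
        """Score the context by how often its words co-occur with each polarity."""
        scores = {pol: sum(c[w] for w in context_words) for pol, c in counts.items()}
        return max(scores, key=scores.get)

    counts = polarity_counts(["great battery life :)", "screen is huge and bright :)",
                              "huge repair bill :(", "battery died again :("])
    print(classify_adjective(["screen", "is", "huge"], counts))  # "huge" positive here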
|
134 |
YSC-DSAA: An Approach to Disambiguate Sentiment Ambiguous Adjectives Based On SAAOL
Shi-Cai Yang and Mei-Juan Liu
show abstracthide abstractIn this paper, we describe the system we developed for the SemEval-2010 task of Disambiguating Sentiment Ambiguous Adjectives (hereinafter referred to as SAA). Our system created a new word library named the SAA-Oriented Library, consisting of positive words, negative words, negative words related to SAA, positive words related to SAA, inverse words, etc. Based on syntactic parsing, we analyzed the relationship between SAA and the keywords, and handled other special processes by extracting such words in the relevant sentences to disambiguate sentiment ambiguous adjectives. Our micro average accuracy is 0.942, which puts our system in first place.
|
135 |
OpAL: Applying Opinion Mining Techniques for the Disambiguation of Sentiment Ambiguous Adjectives in SemEval-2 Task 18
Alexandra Balahur and Andrés Montoyo Guijarro
show abstracthide abstractThe task of extracting the opinion expressed in text is challenging due to different reasons. One of them is that the same word (in particular, adjectives) can have different polarities depending on the context. This paper presents the experiments carried out by the OpAL team for the participation in the SemEval 2010 Task 18 – Disambiguation of Sentiment Ambiguous Adjectives. Our approach is based on three different strategies: a) the evaluation of the polarity of the whole context using an opinion mining system; b) the assessment of the polarity of the local context, given by the combinations between the closest nouns and the adjective to be classified; c) rules aiming at refining the local semantics through the spotting of modifiers. The final decision for classification is taken according to the output of the majority of these three approaches. The method used yielded good results, the OpAL system run ranking fifth among 16.
|
136 |
HITSZ_CITYU: Combine Collocation, Context Words and Neighboring Sentence Sentiment in Sentiment Adjectives Disambiguation
Ruifeng Xu, Jun Xu and Chunyu Kit
show abstracthide abstractThis paper presents the HITSZ_CITYU systems in Semeval-2 Task 18, namely, disambiguating sentiment ambiguous adjectives. The baseline system (HITSZ_CITYU_3) incorporates bi-gram and n-gram collocations of sentiment adjectives, and other context words, as features in a one-class Support Vector Machine (SVM) classifier. To enhance the baseline system, collocation set expansion and characteristics learning based on word similarity and semi-supervised learning are investigated, respectively. The final system (HITSZ_CITYU_1/2) combines collocations, context words and neighboring sentence sentiment in a two-class SVM classifier to determine the polarity of sentiment adjectives. The final systems achieved 0.957 and 0.953 (ranked 1st and 2nd) macro accuracy, and 0.936 and 0.933 (ranked 2nd and 3rd) micro accuracy, respectively.
|
137 |
SWAT: Cross-Lingual Lexical Substitution using Local Context Matching, Bilingual Dictionaries and Machine Translation
Richard Wicentowski, Maria Kelly and Rachel Lee
show abstracthide abstractWe present two systems that select the most appropriate Spanish substitutes for a marked word in an English test sentence. These systems were official entries to the SemEval-2010 Cross-Lingual Lexical Substitution task. The first system, Swat-E, finds Spanish substitutions by first finding English substitutions in the English sentence and then translating these substitutions into Spanish using an English-Spanish dictionary. The second system, Swat-S, translates each English sentence into Spanish and then finds the Spanish substitutions in the Spanish sentence. Both systems exceeded the baseline and all other participating systems by a wide margin using one of the two official scoring metrics.
|
138 |
TUD: semantic relatedness for relation classification
György Szarvas and Iryna Gurevych
show abstracthide abstractIn this paper, we describe the system submitted by the team TUD to Task 8 at SemEval 2010. The challenge focused on the identification of semantic relations between pairs of nominals in sentences collected from the web. We applied maximum entropy classification using both lexical and syntactic features to describe the nominals and their context. In addition, we experimented with features describing the semantic relatedness (SR) between the target nominals and a set of clue words characteristic of the relations. Our best submission with SR features achieved 69.23% macro-averaged F-measure, providing an 8.73% improvement over our baseline system. Thus, we think SR can serve as a natural way to incorporate external knowledge into relation classification.
|
|
Thursday, July 15, 2010 |
08:40–08:50
|
Opening remarks
|
08:50–10:30
|
Session I
08:50–09:15 |
EmotiBlog: a Finer-Grained and More Precise Learning of Subjectivity Expression Models
Ester Boldrini, Alexandra Balahur, Patricio Martínez-Barco and Andrés Montoyo Guijarro
show abstracthide abstractThe exponential growth of subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These applications require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog — a fine-grained annotation scheme for subjectivity. We show the manner in which it is built and demonstrate the benefits it brings to the systems using it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres — a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from the SemEval 2007 (Task 14) and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high quality training data for systems dealing with both opinion mining and emotion detection.
|
09:15–09:40 |
Error-tagged Learner Corpus of Czech
Jirka Hana, Alexandr Rosen, Svatava Škodová and Barbora Štindlová
show abstracthide abstractThe paper describes a Learner corpus of Czech, currently under development. The corpus captures Czech as used by non-native speakers. We discuss its structure, the layered annotation of errors and the annotation process.
|
09:40–10:05 |
Annotation Scheme for Social Network Extraction from Text
Apoorv Agarwal, Owen Rambow and Rebecca Passonneau
show abstracthide abstractIn this paper we present a novel annotation scheme that facilitates the extraction of social networks from text. We focus on a new type of event, called social event, in which two people participate and either both are cognizant of each other or only one is cognizant of the other. We define four types of social events: Interaction, Cognition, Physical Proximity and Perception. Since our annotation task is complex and layered, we present confusion matrices, Cohen’s Kappa, and F-measure values for each of the decision points that the annotators go through in the process of selecting a type and subtype for an event. For a set of documents from the ACE-2005 corpus, we achieve high Kappa (0.66-0.86) and F-measure (0.8-0.9) values which indicate that our annotation scheme is reliable. We also implement a global agreement measure which is inspired by the Automated Content Extraction (ACE) inter-annotator agreement measure. We get about 70% agreement that compares favorably to the ACE annotation effort.
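For readers unfamiliar with the agreement measure mentioned above, the following minimal sketch computes Cohen's Kappa for two annotators; the label set and toy annotations are invented, not taken from the paper.

    # Sketch: Cohen's Kappa over two annotators' categorical decisions.
    from collections import Counter

    def cohen_kappa(ann1, ann2):
        assert len(ann1) == len(ann2)
        n = len(ann1)
        observed = sum(a == b for a, b in zip(ann1, ann2)) / n
        c1, c2 = Counter(ann1), Counter(ann2)
        labels = set(ann1) | set(ann2)
        expected = sum((c1[l] / n) * (c2[l] / n) for l in labels)
        return (observed - expected) / (1 - expected)

    a1 = ["Interaction", "Cognition", "Perception", "Interaction", "Cognition"]
    a2 = ["Interaction", "Cognition", "Interaction", "Interaction", "Perception"]
    print(round(cohen_kappa(a1, a2), 3))  # 0.375 on this toy data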
|
10:05–10:30 |
Agile Corpus Annotation in Practice: An Overview of Manual and Automatic Annotation of CVs
Beatrice Alex, Claire Grover, Rongzhou Shen and Mijail Kabadjov
show abstracthide abstractAnnotated data sets are important resources for various research fields, including natural language processing (NLP) and text mining (TM). While the detection of annotation inconsistencies in different data sets has been investigated and their effect on NLP performance has been studied, very little work has been done on deriving better methods for the annotation process as a whole in order to maximize both the quality and quantity of annotated data. This paper describes our annotation project, in which we tested the relatively new approach of agile corpus annotation: moving away from the traditional, linear phases of corpus creation towards iterative ones, and recognizing that sources of error can occur throughout the annotation process. The paper also summarizes the performance of the machine-learning (ML)-based TM components which were trained and evaluated on the annotated data of CVs of software developers and programmers.
|
|
10:30–11:00
|
Break
|
11:00–12:40
|
Session II
11:00–11:25 |
Consistency Checking for Treebank Alignment
Markus Dickinson and Yvonne Samuelsson
show abstracthide abstractThis paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
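A minimal sketch of the first method described above, treating each aligned target string as a label of its source string and flagging source strings that receive more than one label, might look as follows; the toy alignment pairs are invented.

    # Sketch: flag inconsistently aligned source strings for manual inspection.
    from collections import defaultdict

    def inconsistent_alignments(aligned_pairs):
        """aligned_pairs: iterable of (source_string, target_string)."""
        labels = defaultdict(set)
        for src, tgt in aligned_pairs:
            labels[src].add(tgt)
        return {src: tgts for src, tgts in labels.items() if len(tgts) > 1}

    pairs = [("the house", "das Haus"),
             ("the house", "das Haus"),
             ("the house", "Haus"),     # legitimate variation or error? flag it
             ("a car", "ein Auto")]
    for src, tgts in inconsistent_alignments(pairs).items():
        print(src, "->", sorted(tgts))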
|
11:25–11:50 |
Anveshan: A Framework for Analysis of Multiple Annotators’ Labeling Behavior
Vikas Bhardwaj, Rebecca Passonneau, Ansaf Salleb-Aouissi and Nancy Ide
show abstracthide abstractManual annotation of natural language to capture linguistic information is essential for NLP tasks involving supervised machine learning of semantic knowledge. Judgements of meaning can be more or less subjective, in which case instead of a single correct label, the labels assigned might vary among annotators based on the annotators’ knowledge, age, gender, intuitions, background, and so on. We introduce a framework, “Anveshan”, in which we investigate annotator behavior to find outliers, cluster annotators by behavior, and identify confusable labels. We also investigate the effectiveness of using trained annotators versus a larger number of untrained annotators on a word sense annotation task. The annotation data come from a word sense disambiguation task for polysemous words, annotated by both trained annotators and untrained annotators from Amazon’s Mechanical Turk. Our results show that Anveshan is effective in uncovering patterns in annotator behavior, and we also show that trained annotators are superior to a larger number of untrained annotators for this task.
|
11:50–12:15 |
Influence of Pre-annotation on POS-tagged Corpus Development
Karën Fort and Benoît Sagot
show abstracthide abstractThis article details a series of carefully designed experiments aimed at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, both in terms of quality and annotation time, with specific attention paid to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus (Marcus et al., 1993) under various experimental setups, either from scratch or using various pre-annotations. These experiments confirm and detail the gain in quality observed before (Marcus et al., 1993; Dandapat et al., 2009; Rehbein et al., 2009), while showing that biases do appear and should be taken into account. They finally demonstrate that even a not-so-accurate tagger can help improve annotation speed.
|
12:15–12:40 |
To Annotate More Accurately or to Annotate More
Dmitriy Dligach, Rodney Nielsen and Martha Palmer
show abstracthide abstractThe commonly accepted wisdom is that blind double annotation followed by adjudication of disagreements is necessary to create training and test corpora that result in the best possible performance. We provide evidence that this is unlikely to be the case. Rather, the greatest value for your annotation dollar lies in single-annotating more data.
|
|
12:40–13:50
|
Lunch
|
13:50–15:30
|
Session III
13:50–14:15 |
Annotating Underquantification
Aurelie Herbelot and Ann Copestake
show abstracthide abstractMany noun phrases in text are ambiguously quantified: syntax doesn’t explicitly tell us whether they refer to a single entity or to several, and what portion of the set denoted by the Nbar actually takes part in the event expressed by the verb. We describe this ambiguity phenomenon in terms of underspecification, or rather ‘underquantification’. We attempt to validate the underquantification hypothesis by producing and testing an annotation scheme for quantification resolution, the aim of which is to associate a single quantifier with each noun phrase in our corpus.
|
14:15–14:40 |
PropBank Annotation of Multilingual Light Verb Constructions
Jena D. Hwang, Archna Bhatia, Claire Bonial, Aous Mansouri, Ashwini Vaidya, Nianwen Xue and Martha Palmer
show abstracthide abstractIn this paper, we have addressed the task of PropBank annotation of light verb constructions, which like multi-word expressions pose special problems. To arrive at a solution, we have evaluated 3 different possible methods of annotation. The final method involves three passes: (1) manual identification of a light verb construction, (2) annotation based on the light verb construction’s Frame File, and (3) a deterministic merging of the first two passes. We also discuss how in various languages the light verb constructions are identified and can be distinguished from the non-light verb word groupings.
|
14:40–15:05 |
Retrieving Correct Semantic Boundaries in Dependency Structure
Jinho Choi and Martha Palmer
show abstracthide abstractThis paper describes the retrieval of correct semantic boundaries for predicate-argument structures annotated by dependency structure. Unlike phrase structure, in which arguments are annotated at the phrase level, dependency structure does not have phrases so the argument labels are associated with head words instead: the subtree of each head word is assumed to include the same set of words as the annotated phrase does in phrase structure. However, at least in English, retrieving such subtrees does not always guarantee retrieval of the correct phrase boundaries. In this paper, we present heuristics that retrieve correct phrase boundaries for semantic arguments, called semantic boundaries, from dependency trees. By applying heuristics, we achieved an F1-score of 99.54% for correct representation of semantic boundaries. Furthermore, error analysis showed that some of the errors could also be considered correct, depending on the interpretation of the annotation.
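The subtree-based assumption that the heuristics improve on can be sketched as follows: the argument span of a head word is taken to be the yield of its subtree. The token and head encoding below is assumed for illustration.

    # Sketch: compute the yield (token span) of a head word's dependency subtree.
    def subtree_yield(heads, head_idx):
        """heads[i] is the head of token i+1 (1-based indices, 0 = root);
        return the sorted token indices dominated by head_idx, inclusive."""
        children = {}
        for i, h in enumerate(heads, start=1):
            children.setdefault(h, []).append(i)
        span, stack = [], [head_idx]
        while stack:
            node = stack.pop()
            span.append(node)
            stack.extend(children.get(node, []))
        return sorted(span)

    # "She read the long report": She<-read, read=root, the<-report, long<-report, report<-read
    heads = [2, 0, 5, 5, 2]
    print(subtree_yield(heads, 5))   # argument span of "report" -> [3, 4, 5]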
|
15:05–15:30 |
Complex Predicates Annotation in a Corpus of Portuguese
Iris Hendrickx, Amália Mendes, Sílvia Pereira, Anabela Gonçalves and Inês Duarte
show abstracthide abstractWe present an annotation scheme for the annotation of complex predicates, understood as constructions with more than one lexical unit, each contributing part of the information normally associated with a single predicate. We discuss our annotation guidelines of four types of complex predicates, and the treatment of several difficult cases, related to ambiguity, overlap and coordination. We then discuss the process of marking up the Portuguese CINTIL corpus of 1M tokens (written and spoken) with a new layer of information regarding complex predicates. We also present the outcomes of the annotation work and statistics on the types of CPs that we found in the corpus.
|
|
15:30–16:00
|
Break
|
16:00–17:30
|
Poster session
1 |
Using an Online Tool for the Documentation of Edo Language
Ota Ogie
show abstracthide abstractLanguage documentation is important as a tool for preservation of endangered languages and for making data available to speakers and researchers of a language. A database such as TypeCraft is important for typology studies of both well-documented and little-documented languages, and is a valid tool for comparison of languages. This requires that linguistic elements be coded in a manner that allows comparability across widely varying language data. In this paper, I discuss how I have used the coding system in TypeCraft for the documentation of data from the Èdó language, a language belonging to the Edoid group of the Benue-Congo subfamily of the Volta-Congo language family and spoken in Mid-Western Nigeria, West Africa. The study shows how syntactic, semantic and morphological properties of multi-verb constructions in Èdó (Benue-Congo) can be represented in a relational database.
|
2 |
Cross-Lingual Validity of PropBank in the Manual Annotation of French
Lonneke van der Plas, Tanja Samardzic and Paola Merlo
show abstracthide abstractMethods that re-use existing mono-lingual semantic annotation resources to annotate a new language rely on the hypothesis that the semantic annotation scheme used is cross-lingually valid. We test this hypothesis in an annotation agreement study. We show that the annotation scheme can be applied cross-lingually.
|
3 |
Characteristics of High Agreement Affect Annotation in Text
Cecilia Ovesdotter Alm
show abstracthide abstractThe purpose of this paper is to present an unusual English dataset for affect exploration in text. It describes a corpus of fairy tales from three sources that have been annotated for affect at the sentence level. Special attention is given to data marked by high annotator agreement. A qualitative analysis of characteristics of high agreement sentences from H. C. Andersen reveals several interesting trends, illustrated by examples.
|
4 |
The Deep Re-annotation in a Chinese Scientific Treebank
Kun Yu, Xiangli Wang, Yusuke Miyao, Takuya Matsuzaki and Jun’ichi Tsujii
show abstracthide abstractIn this paper, we introduce our recent work on re-annotating the deep information, which includes both the grammatical functional tags and the traces, in a Chinese scientific treebank. The issues with regard to re-annotation and its corresponding solutions are discussed. Furthermore, the process of the re-annotation work is described.
|
5 |
The Unified Annotation of Syntax and Discourse in the Copenhagen Dependency Treebanks
Matthias Buch-Kromann and Iørn Korzen
show abstracthide abstractWe propose a unified model of syntax and discourse in which text structure is viewed as a tree structure augmented with anaphoric relations and other secondary relations. We describe how the model accounts for discourse connectives and the syntax-discourse-semantics interface. Our model is dependency-based, i.e., words are the basic building blocks in our analyses. The analyses have been applied cross-linguistically in the Copenhagen Dependency Treebanks, a set of parallel treebanks for Danish, English, German, Italian, and Spanish which are currently being annotated with respect to discourse, anaphora, syntax, morphology, and translational equivalence.
|
6 |
Identifying Sources of Inter-Annotator Variation: Evaluating Two Models of Argument Analysis
Barbara White
show abstracthide abstractAn analysis of an article’s argument (rhetorical) structure can serve to identify elements that biomedical researchers wish to access. Human-annotated data are needed to train such automated systems for Information Extraction. This paper reports on a study where two Models of argument were applied to the Discussion sections of a corpus of twelve biomedical research articles downloaded from the BMC-series of journals. The three annotators were the study director and current author, and two fourth-year Medical Science students. The goals were to evaluate and compare the performance of the Models and to identify sources of inter-annotator variation as diagnostics for improving either or both Models. The first Model applied was based on previous work – Argumentative Zoning, Teufel et al. 1999; Zone Analysis, Mizuta et al. 2005 – but the second was developed from Toulmin’s Claims-based argument structure (1958/2003). The results exhibited a mixture of systematic and random (noise-like) inter-annotator disagreements. The patterns in the systematic variation showed that there are problems with particular argument categories under both Models as well as notable annotator bias toward certain categories in some instances. In addition, there was a surprisingly wide range in percentage of three-way inter-annotator agreement under both Models among the twelve corpus articles. This ‘inter-article’ variation brings to light the importance of another factor in these annotation results: the quality and clarity of the writing and exposition of the corpus data. The results of this study indicate a need to revise both Models of argument to ensure that categories are clearly distinguished. Based on the technical complexity of the corpus data and the importance of understanding how authors present their arguments, it is recommended that in the future annotators should work in pairs – a biomedical domain expert together with an expert in rhetoric.
|
7 |
Dependency-Based PropBanking of Clinical Finnish
Katri Haverinen, Filip Ginter, Timo Viljanen, Veronika Laippala and Tapio Salakoski
show abstracthide abstractIn this paper, we present a PropBank of clinical Finnish, an annotated corpus of verbal propositions and arguments. The clinical PropBank is created on top of a previously existing dependency treebank annotated in the Stanford Dependency (SD) scheme and covers 90% of all verb occurrences in the treebank. We establish that the PropBank scheme is applicable to clinical Finnish as well as compatible with the SD scheme, with an overwhelming proportion of arguments being governed by the verb. This allows argument candidates to be restricted to direct verb dependents, substantially simplifying the PropBank construction. The clinical Finnish PropBank is freely available at the address http://bionlp.utu.fi.
|
8 |
Building the Syntactic Reference Corpus of Medieval French Using NotaBene RDF Annotation Tool
Nicolas Mazziotta
show abstracthide abstractIn this paper, we introduce the NotaBene RDF Annotation Tool free software used to build the Syntactic Reference Corpus of Medieval French. It relies on a dependency-based model to manually annotate Old French texts from the Base de Français Médiéval and the Nouveau Corpus d’Amsterdam. NotaBene uses OWL ontologies to frame the terminology used in the annotation, which is displayed in a tree-like view of the annotation. This tree widget allows easy grouping and tagging of words and structures. To increase the quality of the annotation, two annotators work independently on the same texts at the same time and NotaBene can also generate automatic comparisons between both analyses. The RDF format can be used to export the data to several other formats: namely, TigerXML (for querying the data and extracting structures) and graphviz dot format (for quoting syntactic description in research papers).
|
9 |
Chunking German: An Unsolved Problem
Sandra Kübler, Kathrin Beck, Erhard Hinrichs and Heike Telljohann
show abstracthide abstractThis paper describes a CoNLL-style chunk representation for the Tübingen Treebank of Written German, which assumes a flat chunk structure so that each word belongs to at most one chunk. For German, such a chunk definition causes problems in cases of complex prenominal modification. We introduce a flat annotation that can handle these structures via a stranded noun chunk.
|
10 |
Proposal for MWE Annotation in Running Text
Iris Hendrickx, Amália Mendes and Sandra Antunes
show abstracthide abstractWe present a proposal for the annotation of multi-word expressions in a 1M corpus of contemporary Portuguese. Our aim is to create a resource that allows us to study multi-word expressions (MWEs) in their context. The corpus will be a valuable additional resource next to the already existing MWE lexicon that was based on a much larger corpus of 50M words. In this paper we discuss the problematic cases for annotation and the proposed solutions, focusing on the variational properties of MWEs.
|
11 |
A Feature Type Classification for Therapeutic Purposes: a preliminary evaluation with non-expert speakers
Gianluca E. Lebani and Emanuele Pianta
show abstracthide abstractWe propose a feature type classification intended to be used in a therapeutic context. Such a scenario lies behind our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both a practical and a theoretical outcome, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strength of our proposal and to conceive some improvements for the future.
|
12 |
Annotating Korean Demonstratives
Sun-Hee Lee and Jae-young Song
show abstracthide abstractThis paper presents preliminary work on a corpus-based study of Korean demonstratives. Through the development of an annotation scheme and the use of spoken and written corpora, we aim to determine different functions of demonstratives and to examine their distributional properties. Our corpus study adopts similar features of annotation used in Botley and McEnery (2001) and provides some linguistic hypotheses on grammatical functions of Korean demonstratives to be further explored.
|
21 |
Creating and Exploiting a Resource of Parallel Parses
Christian Chiarcos, Kerstin Eckart and Julia Ritz
show abstracthide abstractThis paper describes the creation of a resource of German sentences with multiple automatically created alternative syntactic analyses (parses) for the same text, and how qualitative and quantitative investigations of this resource can be performed using ANNIS, a tool for corpus querying and visualization. Using the example of PP attachment, we show how parsing can benefit from the use of such a resource.
|
22 |
From Descriptive Annotation to Grammar Specification
Lars Hellan
show abstracthide abstractThe paper presents an architecture for connecting annotated linguistic data with a computational grammar system. Pivotal to the architecture is an annotational interlingua – called the Construction Labeling system (CL) – which is notationally very simple, descriptively fine-grained, cross-typologically applicable, and formally well-defined enough to map to a state-of-the-art computational model of grammar. In the present instantiation of the architecture, the computational grammar is an HPSG-based system called TypeGram. Underlying the architecture is a research program of enhancing the interconnectivity between linguistic analytic subsystems such as grammar formalisms and text annotation systems.
|
23 |
An Annotation Schema for Preposition Senses in German
Antje Müller, Olaf Hülscher, Claudia Roch, Katja Kesselmeier, Tobias Stadtfeld, Jan Strunk and Tibor Kiss
show abstracthide abstractPrepositions are highly polysemous. Yet, little effort has been spent to develop language-specific annotation schemata for preposition senses to systematically represent and analyze the polysemy of prepositions in large corpora. In this paper, we present an annotation schema for preposition senses in German. The annotation schema includes a hierarchical taxonomy and also allows multiple annotations for individual tokens. It is based on an analysis of usage-based dictionaries and grammars and has been evaluated in an inter-annotator-agreement study.
|
24 |
OTTO: A Transcription and Management Tool for Historical Texts
Stefanie Dipper, Lara Kresse, Martin Schnurrenberger and Seong-Eun Cho
show abstracthide abstractThis paper presents OTTO, a transcription tool designed for diplomatic transcription of historical language data. The tool supports easy and fast typing and instant rendering of transcription in order to gain a look as close to the original manuscript as possible. In addition, the tool provides support for the management of transcription projects which involve distributed, collaborative working of multiple parties on collections of documents.
|
25 |
Multimodal Annotation of Conversational Data
Philippe Blache, Roxane Bertrand, Emmanuel Bruno, Brigitte Bigi, Robert Espesser, Gaelle Ferre, Mathilde Guardiola, Daniel Hirst, Ning Tan, Edlira Cela, Jean-Claude Martin, Stéphane Rauzy, Mary-Annick Morel, Elisabeth Murisasco and Irina Nesterenko
show abstracthide abstractWe propose in this paper a broad-coverage approach for multimodal annotation of conversational data. Large annotation projects addressing the question of multimodal annotation bring together many different kinds of information from different domains, with different levels of granularity. We present in this paper the first results of the OTIM project aiming at developing conventions and tools for multimodal annotation.
|
26 |
Combining Parallel Treebanks and Geo-Tagging
Martin Volk, Anne Goehring and Torsten Marek
show abstracthide abstractThis paper describes a new kind of semantic annotation in parallel treebanks. We build French-German parallel treebanks of mountaineering reports, a text genre that abounds with geographical names which we classify and ground with reference to a large gazetteer of Swiss toponyms. We discuss the challenges in obtaining a high recall and precision in automatic grounding, and sketch how we represent the grounding information in our treebank.
|
27 |
Challenges of Cheap Resource Creation
Jirka Hana and Anna Feldman
show abstracthide abstractWe describe the challenges of resource creation for a resource-light system for morphological tagging of fusional languages (Feldman and Hana, 2010). The constraints on resources (time, expertise, and money) introduce challenges that are not present in development of morphological tools and corpora in the usual, resource intensive way.
|
28 |
Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş, Cem Bozşahin and Deniz Zeyrek
show abstracthide abstractIn this paper, we describe an annotation environment developed for the marking of discourse structures in Turkish, and the kinds of discourse relation configurations that led to its design.
|
29 |
An Overview of the CRAFT Concept Annotation Guidelines
Michael Bada, Miriam Eckert, Martha Palmer and Lawrence Hunter
show abstracthide abstractWe present our concept-annotation guidelines for a large multi-institutional effort to create a gold-standard manually annotated corpus of full-text biomedical journal articles. We are semantically annotating these documents with the full term sets of eight large biomedical ontologies and controlled terminologies ranging from approximately 1,000 to millions of terms, and, using these guidelines, we have been able to perform this extremely challenging task with a high degree of interannotator agreement. The guidelines have been designed to be usable with any terminology employed to semantically annotate concept mentions in text and are available for external use.
|
30 |
Syntactic Tree Queries in Prolog
Gerlof Bouma
show abstracthide abstractIn this paper, we argue for and demonstrate the use of Prolog as a tool to query annotated corpora. We present a case study based on the German TüBa-D/Z Treebank to show that flexible and efficient corpus querying can be started with a minimal amount of effort. We end this paper with a brief discussion of performance, which suggests that the approach is both fast enough and scalable.
|
31 |
An Integrated Tool for Annotating Historical Corpora
Pablo Picasso Feliciano de Faria, Fabio Natanael Kepler and Maria Clara Paixão de Sousa
show abstracthide abstractE-Dictor is a tool for encoding, applying levels of editions, and assigning part-of-speech tags to ancient texts. In short, it works as a WYSIWYG interface to encode text in XML format. It comes from the experience during the building of the Tycho Brahe Parsed Corpus of Historical Portuguese and from consortium activities with other research groups. Preliminary results show a decrease of at least 50% on the overall time taken on the editing process.
|
32 |
The Revised Arabic PropBank
Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan and Martha Palmer
show abstracthide abstractThe revised Arabic PropBank (APB) reflects a number of changes to the data and the process of PropBanking. Several changes stem from Treebank revisions, and an automatic process was put in place to map existing annotation to the new trees. We have revised the original 493 Frame Files from the Pilot APB and added 1462 new files for a total of 1955 Frame Files with 2446 framesets. In addition to a heightened attention to sense distinctions this cycle includes a greater attempt to address complicated predicates such as light verb constructions and multi-word expressions. New tools facilitate the data tagging and also simplify frame creation.
|
|
Friday, July 16, 2010 |
08:50–10:30
|
Session IV
08:50–09:15 |
PackPlay: Mining Semantic Data in Collaborative Games
Nathan Green, Paul Breimyer, Vinay Kumar and Nagiza Samatova
show abstracthide abstractBuilding training data is labor-intensive and presents a major obstacle to advancing machine learning technologies such as machine translators, named entity recognizers (NER), part-of-speech taggers, etc. Training data are often specialized for a particular language or Natural Language Processing (NLP) task. Knowledge captured by a specific set of training data is not easily transferable, even to the same NLP task in another language. Emerging technologies, such as social networks and serious games, offer a unique opportunity to change how we construct training data. While collaborative games have been used in information retrieval, it is an open issue whether users can contribute accurate annotations in a collaborative game context for a problem that requires an exact answer, such as games that would create named entity recognition training data. We present PackPlay, a collaborative game framework that empirically shows players’ ability to mimic annotation accuracy and thoroughness seen in gold standard annotated corpora.
|
09:15–09:40 |
A Proposal for a Configurable Silver Standard
Udo Hahn, Katrin Tomanek, Elena Beisswanger and Erik Faessler
show abstracthide abstractAmong the many proposals promoting alternatives to costly-to-create gold standards, the idea of a fully automatically, and thus cheaply, constructed silver standard has recently been launched. However, the current construction policy for such a silver standard requires crucial parameters (such as similarity thresholds and agreement cut-offs) to be set a priori, based on extensive testing, at corpus compile time. Accordingly, such a corpus is static once it is released. We here propose an alternative policy where silver standards can be dynamically optimized and customized on demand (given a specific goal function) using a gold standard as an oracle.
|
09:40–10:05 |
A Hybrid Model for Annotating Named Entity Training Corpora
Robert Voyer, Valerie Nygaard, Will Fitzgerald and Hannah Copperman
show abstracthide abstractIn this paper, we present a two-phase, hybrid model for generating training data for Named Entity Recognition systems. In the first phase, a trained annotator labels all named entities in a text irrespective of type. In the second phase, naïve crowdsourcing workers complete binary judgment tasks to indicate the type(s) of each entity. Decomposing the data generation task in this way results in a flexible, reusable corpus that accommodates changes to entity type taxonomies. In addition, it makes efficient use of precious trained annotator resources by leveraging highly available and cost effective crowdsourcing worker pools in a way that does not sacrifice quality.
|
10:05–10:30 |
Anatomy of Annotation Schemes: Mapping to GrAF
Nancy Ide and Harry Bunt
show abstracthide abstractIn this paper, we apply the annotation scheme design methodology defined in (Bunt, 2010) and demonstrate its use for generating a mapping from an existing annotation scheme to a representation in GrAF format. By way of illustration, we apply the mapping strategy to annotations from ISO-TimeML (Mani et al., 2004), PropBank (Palmer et al., 2005), and FrameNet (Baker et al., 1998).
|
|
10:30–11:00
|
Break
|
11:00–12:40
|
Session V
11:00–11:25 |
Annotating Participant Reference in English Spoken Conversation
John Niekrasz and Johanna D. Moore
show abstracthide abstractIn conversational language, references to people (especially to the conversation participants, e.g., I, you, and we) are an essential part of many expressed meanings. In most conversational settings, however, many such expressions have numerous potential meanings, are frequently vague, and are highly dependent on social and situational context. This is a significant challenge to conversational language understanding systems — one which has seen little attention in annotation studies. In this paper, we present a method for annotating verbal reference to *people* in conversational speech, with a focus on reference to conversation *participants*. Our goal is to provide a resource that tackles the issues of vagueness, ambiguity, and contextual dependency in a nuanced yet reliable way, with the ultimate aim of supporting work on summarization and information extraction for conversation.
|
11:25–11:50 |
Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation
Anne Lacheret-Dujour, Nicolas Obin and Mathieu Avanzi
show abstracthide abstractIn the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and research; (2) based on this NEMA, how to establish reference prosodic corpora (RPC) for different discourse genres (Cresti and Moneglia, 2005); (3) how to use the RPC to develop corpus-based learning methods for automatic prosodic labelling in spontaneous speech (Buhman et al., 2002; Avanzi et al., 2010). This paper presents two pilot experiments conducted with a consortium of 15 French experts in prosody in order to provide a prosodic transcription framework (transcription methodology and transcription reliability measures) and to establish reference prosodic corpora in French.
|
11:50–12:15 |
Depends on What the French Say - Spoken Corpus Annotation With and Beyond Syntactic Functions
José Deulofeu, Lucie Duffort, Kim Gerdes, Sylvain Kahane and Paola Pietrandrea
show abstracthide abstractWe present a syntactic annotation scheme for spoken French that is currently used in the Rhapsodie project. This annotation is dependency-based and includes coordination and disfluency as analogously encoded types of paradigmatic phenomena. Furthermore, we attempt a thorough definition of the discourse units required by the systematic annotation of other phenomena beyond usual sentence boundaries, which are typical for spoken language. This includes so-called "macrosyntactic" phenomena such as dislocation, parataxis, insertions, grafts, and epexegesis.
|
12:15–12:40 |
The Annotation Scheme of the Turkish Discourse Bank and An Evaluation of Inconsistent Annotations
Deniz Zeyrek, Işin Demirşahin, Ayişiǧi Sevdik-Çalli, Hale Ögel Balaban, Ihsan Yalçinkaya and Ümit Deniz Turan
show abstracthide abstractIn this paper, we report on the annotation procedures we developed for annotating the Turkish Discourse Bank (TDB), an effort that extends the Penn Discourse Tree Bank (PDTB) annotation style by using it for annotating Turkish discourse. After a brief introduction to the TDB, we describe the annotation cycle and the annotation scheme we developed, defining which parts of the scheme are an extension of the PDTB and which parts are different. We provide inter-coder reliability tests on the first and second arguments of some connectives and discuss the most important sources of disagreement among annotators.
|
|
12:40–13:00
|
Closing remarks
|
Thursday, July 15, 2010 |
9:00–9:15
|
Opening Remarks
|
9:15–10:30
|
Session 1: Extraction
9:15–9:40 |
Two Strong Baselines for the BioNLP 2009 Event Extraction Task
Andreas Vlachos
show abstracthide abstractThis paper presents two strong baselines for the BioNLP 2009 shared task on event extraction. First we re-implement a rule-based approach which allows us to explore the task and the effect of domain-adapted parsing on it. We then replace the rule-based component with support vector machine classifiers and achieve performance near the state-of-the-art without using any external resources. The good performances achieved and the relative simplicity of both approaches make them reproducible baselines. We conclude with suggestions for future work with respect to the task representation.
|
9:40–10:05 |
Recognizing Biomedical Named Entities Using Skip-Chain Conditional Random Fields
Jingchen Liu, Minlie Huang and Xiaoyan Zhu
show abstracthide abstractLinear-chain Conditional Random Fields (CRFs) have been applied to perform the Named Entity Recognition (NER) task in many biomedical text mining and information extraction systems. However, the linear-chain CRF cannot capture long-distance dependency, which is very common in the biomedical literature. In this paper, we propose a novel study of capturing such long-distance dependency by defining two principles for constructing skip-edges for a skip-chain CRF: linking similar words and linking words having typed dependencies. The approach is applied to recognize gene/protein mentions in the literature. When tested on the BioCreAtIvE II Gene Mention dataset and the GENIA corpus, the approach contributes significant improvements over the linear-chain CRF. We also present in-depth error analysis on inconsistent labeling and study the influence of the quality of skip edges on the labeling performance.
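As an illustration only (not the authors' implementation), skip edges following the two principles above might be constructed roughly as follows; the token indices and dependency triples are toy data.

    # Sketch: constructing skip edges for a skip-chain CRF.
    def skip_edges_similar(tokens):
        """Principle 1: link later occurrences of a word to its first occurrence."""
        first_seen, edges = {}, []
        for i, tok in enumerate(tokens):
            key = tok.lower()
            if key in first_seen:
                edges.append((first_seen[key], i))
            else:
                first_seen[key] = i
        return edges

    def skip_edges_dependencies(dependencies):
        """Principle 2: dependencies are (head_index, relation, dependent_index)
        triples from a parser; keep only the long-range pairs that a linear
        chain cannot capture."""
        return [(h, d) for h, _, d in dependencies if abs(h - d) > 1]

    tokens = "IL-2 activates STAT5 and IL-2 expression rises".split()
    print(skip_edges_similar(tokens))   # [(0, 4)] links the two IL-2 mentions
    print(skip_edges_dependencies([(1, "nsubj", 0), (1, "dobj", 2),
                                   (6, "nsubj", 5), (6, "advcl", 1)]))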
|
10:05–10:30 |
Event Extraction for Post-Translational Modifications
Tomoko Ohta, Sampo Pyysalo, Makoto Miwa, Jin-Dong Kim and Jun’ichi Tsujii
show abstracthide abstractWe consider the task of automatically extracting post-translational modification events from biomedical scientific publications. Building on the success of event extraction for phosphorylation events in the BioNLP’09 shared task, we extend the event annotation approach to four major new post-translational modification event types. We present a new targeted corpus of 157 PubMed abstracts annotated for over 1000 proteins and 400 post-translational modification events identifying the modified proteins and sites. Experiments with a state-of-the-art event extraction system show that the events can be extracted with 52% precision and 36% recall (42% F-score), suggesting remaining challenges in the extraction of the events. The annotated corpus is freely available in the BioNLP’09 shared task format at the GENIA project homepage.
|
|
10:30–11:00
|
Morning coffee break
|
11:00–12:30
|
Session 2
11:00–12:00 |
Keynote speaker, W. John Wilbur: Text Mining and Intelligence
W. John Wilbur
|
12:05–12:30 |
Scaling up Biomedical Event Extraction to the Entire PubMed
Jari Björne, Filip Ginter, Sampo Pyysalo, Jun’ichi Tsujii and Tapio Salakoski
show abstracthide abstractWe present the first full-scale event extraction experiment covering the titles and abstracts of all PubMed citations. Extraction is performed using a pipeline composed of state-of-the-art methods: the BANNER named entity recognizer, the McClosky-Charniak domain-adapted parser, and the Turku Event Extraction System. We analyze the statistical properties of the resulting dataset and present evaluations of the core event extraction as well as negation and speculation detection components of the system. Further, we study in detail the set of extracted events relevant to the apoptosis pathway to gain insight into the biological relevance of the result. The dataset, consisting of 19.2 million occurrences of 4.5 million unique events, is freely available for use in research at http://bionlp.utu.fi/.
|
|
12:30–14:00
|
Lunch break
|
14:00–14:50
|
Session 3: Foundations
14:00–14:25 |
A Comparative Study of Syntactic Parsers for Event Extraction
Makoto Miwa, Sampo Pyysalo, Tadayoshi Hara and Jun’ichi Tsujii
show abstracthide abstractThe extraction of bio-molecular events from text is an important task for a number of domain applications such as pathway construction. Several syntactic parsers have been used in Biomedical Natural Language Processing (BioNLP) applications, and the BioNLP 2009 Shared Task results suggest that incorporation of syntactic analysis is important to achieving state-of-the-art performance. Direct comparison of parsers is complicated by differences such as the division between phrase structure- and dependency-based analyses and the variety of output formats, structures and representations applied. In this paper, we present a task-oriented comparison of five parsers, measuring their contribution to bio-molecular event extraction using a state-of-the-art event extraction system. The results show that the parsers with domain models using dependency formats provide very similar performance, and that an ensemble of different parsers in different formats can improve the event extraction system.
|
14:25–14:50 |
Arguments of Nominals in Semantic Interpretation of Biomedical Text
Halil Kilicoglu, Marcelo Fiszman, Graciela Rosemblat, Sean Marimpietri and Thomas Rindflesch
show abstracthide abstractBased on linguistic generalizations, we enhanced an existing semantic processor, SemRep, for effective interpretation of a wide range of patterns used to express arguments of nominalization in clinically oriented biomedical text. Nominalizations are pervasive in the scientific literature, yet few text mining systems adequately address them, thus missing a wealth of information. We evaluated the system by assessing the algorithm independently and by determining its contribution to SemRep generally. The first evaluation demonstrated the strength of the method through an F-score of 0.646 (P=0.743, R=0.569), which is more than 20 points higher than the baseline. The second evaluation showed that overall SemRep results were increased to F-score 0.689 (P=0.745, R=0.640), approximately 25 points better than processing without nominalizations.
|
|
14:50–15:15
|
Session 4: High-level tasks
14:50–15:15 |
Improving Summarization of Biomedical Documents Using Word Sense Disambiguation
Laura Plaza, Mark Stevenson and Alberto Díaz
show abstracthide abstractWe describe a concept-based summarization system for biomedical documents and show that its performance can be improved using Word Sense Disambiguation. The system represents the documents as graphs formed from concepts and relations from the UMLS. A degree-based clustering algorithm is applied to these graphs to discover different themes or topics within the document. To create the graphs, the MetaMap program is used to map the text onto concepts in the UMLS Metathesaurus. This paper shows that applying a graph-based Word Sense Disambiguation algorithm to the output of MetaMap improves the quality of the summaries that are generated.
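Purely as an illustrative sketch (not the system's algorithm), a simple degree-based clustering over a concept co-occurrence graph could look like this; the toy concept graph stands in for MetaMap/UMLS output.

    # Sketch: pick high-degree concepts as hubs, attach the rest to the most connected hub.
    from collections import defaultdict

    def degree_clusters(edges, n_hubs=2):
        graph = defaultdict(set)
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
        hubs = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:n_hubs]
        clusters = {h: {h} for h in hubs}
        for node in graph:
            if node in hubs:
                continue
            # attach each remaining concept to the hub it shares most neighbours with
            best = max(hubs, key=lambda h: len(graph[node] & (graph[h] | {h})))
            clusters[best].add(node)
        return clusters

    edges = [("insulin", "diabetes"), ("diabetes", "glucose"), ("insulin", "glucose"),
             ("insulin", "pancreas"), ("tumor", "biopsy"), ("tumor", "metastasis"),
             ("biopsy", "metastasis"), ("tumor", "lung")]
    print(degree_clusters(edges))   # two topical clusters around "insulin" and "tumor"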
|
|
15:30–16:00
|
Afternoon coffee break
|
16:00–16:50
|
Session 4: High-level tasks, continued
16:00–16:25 |
Cancer Stage Prediction Based on Patient Online Discourse
Mukund Jha and Noemie Elhadad
show abstracthide abstractForums and mailing lists dedicated to particular diseases are increasingly popular online. Automatically inferring the health status of a patient can be useful for both forum users and health researchers who study patients’ online behaviors. In this paper, we focus on breast cancer forums and present a method to predict the stage of patients’ cancers from their online discourse. We show that what the patients talk about (content-based features) and whom they interact with (social network-based features) provide complementary cues to predicting cancer stage and can be leveraged for better prediction. Our methods are extendable and can be applied to other tasks of acquiring contextual information about online health forum participants.
|
16:25–16:50 |
An Exploration of Mining Gene Expression Mentions and Their Anatomical Locations from Biomedical Text
Martin Gerner, Goran Nenadic and Casey M. Bergman
show abstracthide abstractHere we explore mining data on gene expression from the biomedical literature and present Gene Expression Text Miner (GETM), a tool for extraction of information about the expression of genes and their anatomical locations from text. Provided with recognized gene mentions, GETM identifies mentions of anatomical locations and cell lines, and extracts text passages where authors discuss the expression of a particular gene in specific anatomical locations or cell lines. This enables the automatic construction of expression profiles for both genes and anatomical locations. Evaluated against a manually extended version of the BioNLP ’09 corpus, GETM achieved precision and recall levels of 58.8% and 23.8%, respectively. Application of GETM to MEDLINE and PubMed Central yielded over 700,000 gene expression mentions. This data set may be queried through a web interface, and should prove useful not only for researchers who are interested in the developmental regulation of specific genes of interest, but also for database curators aiming to create structured repositories of gene expression information. The compiled tool, its source code, the manually annotated evaluation corpus and a search query interface to the data set extracted from MEDLINE and PubMed Central are available at http://getm-project.sourceforge.net/.
|
|
16:50–17:00
|
Poster Boaster Session and Conclusions
|
17:00–17:30
|
Poster Session
37 |
Exploring Surface-Level Heuristics for Negation and Speculation Discovery in Clinical Texts
Emilia Apostolova and Noriko Tomuro
show abstracthide abstractWe investigate the automatic identification of negated and speculative statements in biomedical texts, focusing on the clinical domain. Our goal is to evaluate the performance of simple, Regex-based algorithms that have the advantages of low computational cost and simple implementation, and that avoid the problems associated with the accurate computation of deep linguistic features of idiosyncratic clinical texts. The performance of the NegEx algorithm with an additional set of Regex-based rules reveals promising results (evaluated on the BioScope corpus). Current and future work focuses on a bootstrapping algorithm for the discovery of new rules from unannotated clinical texts.
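A minimal sketch of the surface-level strategy discussed above, a NegEx-style rule that marks a concept as negated when a trigger phrase occurs within a small window before it, is shown below; the trigger list and window size are simplified assumptions.

    # Sketch: NegEx-style negation detection with a small trigger list.
    import re

    NEG_TRIGGERS = r"\b(no|denies|denied|without|negative for|ruled out)\b"

    def is_negated(sentence, concept, window=6):
        """Return True if a negation trigger appears within `window` tokens
        before the concept mention."""
        m = re.search(re.escape(concept), sentence, flags=re.IGNORECASE)
        if not m:
            return False
        preceding = sentence[:m.start()].split()[-window:]
        return re.search(NEG_TRIGGERS, " ".join(preceding), flags=re.IGNORECASE) is not None

    print(is_negated("The patient denies chest pain on exertion.", "chest pain"))  # True
    print(is_negated("Chest pain has worsened since admission.", "chest pain"))    # False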
|
38 |
Disease Mention Recognition with Specific Features
Md. Faisal Mahbub Chowdhury and Alberto Lavelli
show abstracthide abstractDespite an increasing amount of research on biomedical named entity recognition, not enough work has been done on disease mention recognition. The difficulty of obtaining adequate corpora is one of the key reasons that has hindered this particular line of research. Previous studies argue that correct identification of disease mentions is the key issue for further improvement of disease-centric knowledge extraction tasks. In this paper, we present a machine learning based approach that uses a feature set tailored for disease mention recognition and outperforms the state-of-the-art results. The paper also discusses why a feature set for the well studied gene/protein mention recognition task is not necessarily equally effective for other biomedical semantic types such as diseases.
|
39 |
Extraction of Disease-Treatment Semantic Relations from Biomedical Sentences
Oana Frunza and Diana Inkpen
show abstracthide abstractThis paper describes our study on identifying semantic relations that exist between diseases and treatments in biomedical sentences. We focus on three semantic relations: Cure, Prevent, and Side Effect. The contributions of this paper are that better results are obtained compared to previous studies, and that our research settings allow the integration of biomedical and medical knowledge. We obtain 98.55% F-measure for the Cure relation, 100% F-measure for the Prevent relation, and 88.89% F-measure for the Side Effect relation.
|
40 |
Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Lin Sun and Ulla Stenius
show abstracthide abstractMany practical tasks require accessing specific types of information in scientific literature; e.g. information about the objective, methods, results or conclusions of the study in question. Several schemes have been developed to characterize such information in full journal papers. Yet many tasks focus on abstracts instead. We take three schemes of different type and granularity (those based on section names, argumentative zones and conceptual structure of documents) and investigate their applicability to biomedical abstracts. We show that even for the finest-grained of these schemes, the majority of categories appear in abstracts and can be identified relatively reliably using machine learning. We discuss the impact of our results and the need for subsequent task-based evaluation of the schemes.
|
41 |
Reconstruction of Semantic Relationships from Their Projections in Biomolecular Domain
Juho Heimonen, Jari Björne and Tapio Salakoski
show abstracthide abstractThe extraction of nested, semantically rich relationships of biological entities has recently gained popularity in the biomedical text mining community. To move toward this objective, a method is proposed for reconstructing original semantic relationship graphs from projections, where each node and edge is mapped to the representative of its equivalence class, by determining the relationship argument combinations that represent real relationships. It generalises the limited postprocessing step of the best-performing system in the BioNLP’09 Shared Task on Event Extraction and hence extends this extraction method to arbitrarily deep relationships with unrestricted primary argument combinations. The viability of the method is shown by successfully extracting nested relationships in BioInfer and the corpus of the BioNLP’09 Shared Task on Event Extraction. The reported results, to the best of our knowledge, are the first for the nested relationships in BioInfer on a task in which only named entities are given.
|
42 |
Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts in Health-Related Social Networks
Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian Yang and Graciela Gonzalez
Adverse reactions to drugs are among the most common causes of death in industrialized nations. Expensive clinical trials are not sufficient to uncover all of the adverse reactions a drug may cause, necessitating systems for post-marketing surveillance, or pharmacovigilance. These systems have typically relied on voluntary reporting by health care professionals. However, self-reported patient data has become an increasingly important resource, with efforts such as MedWatch from the FDA allowing reports directly from the consumer. In this paper, we propose mining the relationships between drugs and adverse reactions as reported by the patients themselves in user comments to health-related websites. We evaluate our system on a manually annotated set of user comments, with promising performance. We also report correlations between the frequency of adverse drug reactions found by our system in unlabeled data and the frequency of documented adverse drug reactions. We conclude that user comments pose a significant natural language processing challenge, but do contain useful extractable information which merits further exploration.
|
43 |
Semantic Role Labeling of Gene Regulation Events: Preliminary Results
Roser Morante
This abstract describes work in progress on semantic role labeling of gene regulation events. We present preliminary results of a supervised semantic role labeler that has been trained and tested on the GREC corpus.
|
44 |
Ontology-Based Extraction and Summarization of Protein Mutation Impact Information
Nona Naderi and René Witte
Considerable effort has been expended in the study of modifications in genetic material, known as mutations. Mutations can have far-ranging consequences in medical, agricultural, and industrial domains; a significant and increasing number of publications describe the impacts of specific mutations. As manually curated databases, like the Protein Mutant Database (PMD), cannot keep up with the rapid pace of mutation research, NLP methods for extracting mutation information from the bibliome have become an important new research area within bio-NLP. A large number of systems now attempt to detect mutation information and extract it into structured formats. However, while significant progress has been made with respect to mutation detection, the automated extraction of the impacts of these mutations has so far not been targeted. In this paper, we describe the first work to automatically summarize impact information from protein mutations. Our approach is based on populating an OWL-DL ontology with impact information, which can then be queried to provide structured information, including a summary.
|
45 |
Extracting Distinctive Features of Swine (H1N1) Flu through Data Mining Clinical Documents
Heekyong Park and Jinwook Choi
Early recognition of the distinguishing patterns of a novel pandemic disease is important. We introduce a methodological approach based on popular data mining techniques to extract key features and temporal patterns of swine (H1N1) flu that discriminate it from swine-flu-like symptoms.
|
46 |
Towards Event Extraction from Full Texts on Infectious Diseases
Sampo Pyysalo, Tomoko Ohta, Han-Cheol Cho, Dan Sullivan, Chunhong Mao, Bruno Sobral, Jun’ichi Tsujii and Sophia Ananiadou
Event extraction approaches based on expressive structured representations of extracted information have been a significant focus of research in recent biomedical natural language processing studies. However, event extraction efforts have so far been limited to publication abstracts, with most studies further considering only the specific transcription-factor-related subdomain of molecular biology covered by the GENIA corpus. To establish the broader relevance of the event extraction approach and the proposed methods, it is necessary to move beyond these constraints. In this study, we propose an adaptation of the event extraction approach to a subdomain related to infectious diseases and present an analysis and initial experiments on the feasibility of event extraction from full-text publications in the domain.
|
47 |
Applying the TARSQI Toolkit to Augment Text Mining of EHRs
Amber Stubbs and Benjamin Harshfield
We present a preliminary attempt to apply the TARSQI Toolkit to the medical domain, specifically electronic health records, for use in answering temporally motivated questions.
|
48 |
Integration of Static Relations to Enhance Event Extraction from Text
Sofie Van Landeghem, Sampo Pyysalo, Tomoko Ohta and Yves Van de Peer
As research on biomedical text mining shifts focus from simple binary relations to more expressive event representations, extraction performance drops due to the increase in complexity. Recently introduced data sets specifically targeting static relations between named entities and domain terms have been suggested to enable a better representation of the biological processes underlying annotated events and to offer opportunities for addressing their complexity. In this paper, we present the first study of integrating these static relations with event data with the aim of enhancing event extraction performance. While we obtain promising results, we argue that an event extraction framework will benefit most from this new data when intrinsic differences between the various event types are taken into account.
|
|
Friday, July 16, 2010 |
09:00–09:10
|
Welcome to TextGraphs 5
|
09:10–10:30
|
Session 1: Lexical Clustering and Disambiguation
09:10–09:30 |
Graph-based Clustering for Computational Linguistics: a Survey
Zheng Chen and Heng Ji
In this survey we overview graph-based clustering and its applications in computational linguistics. We summarize graph-based clustering as a five-part story: hypothesis, modeling, measure, algorithm and evaluation. We then survey three typical NLP problems in which graph-based clustering approaches have been successfully applied. Finally, we comment on the strengths and weaknesses of graph-based clustering and envision that graph-based clustering is a promising solution for some emerging NLP problems.
|
09:30–09:50 |
Towards the Automatic Creation of a Wordnet from a Term-based Lexical Network
Hugo Gonçalo Oliveira and Paulo Gomes
The work described here aims to create a wordnet automatically from a semantic network based on terms. To this end, a clustering procedure is run over a synonymy network in order to obtain synsets. Then, the term arguments of each relational triple are assigned to these synsets, yielding a wordnet. Experiments towards our goal are reported and their results validated.
|
09:50–10:10 |
An Investigation on the Influence of Frequency on the Lexical Organization of Verbs
Daniel German, Aline Villavicencio and Maity Siqueira
This work extends the study of Germann et al. (2010) investigating the lexical organization of verbs. In particular, we look at the influence of frequency on the process of lexical acquisition and use. We examine data obtained from psycholinguistic action-naming tasks performed by children and adults (speakers of Brazilian Portuguese), and analyze some characteristics of the verbs used by each group in terms of similarity of content, using Jaccard's coefficient, and of topology, using graph theory. The experiments suggest that younger children tend to use more frequent verbs than adults to describe events in the world.
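To make the similarity measure concrete, the sketch below computes Jaccard's coefficient over two sets of verbs. The verb sets are invented for illustration and do not come from the paper's action-naming data; only the formula |A ∩ B| / |A ∪ B| is assumed.

def jaccard(a, b):
    # Jaccard's coefficient: size of the intersection over size of the union.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical verbs produced by two groups for the same action-naming item.
children = {"pegar", "tirar", "puxar"}
adults = {"pegar", "retirar", "puxar", "arrancar"}
print(jaccard(children, adults))  # 2 shared verbs / 5 distinct verbs = 0.4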
|
10:10–10:30 |
Robust and Efficient Page Rank for Word Sense Disambiguation
Diego De Cao, Roberto Basili, Matteo Luciani, Francesco Mesiano and Riccardo Rossi
Graph-based methods that are en vogue in the social network analysis area, such as centrality models, have recently been applied to linguistic knowledge bases, including for unsupervised Word Sense Disambiguation. Although the achievable accuracy is rather high, the main drawback of these methods is their high computational demand when applied to large-scale sense repositories. In this paper, an adaptation of the PageRank algorithm recently proposed for Word Sense Disambiguation is presented that preserves the achievable accuracy while significantly reducing the required processing time. Experimental analysis over well-known benchmarks is presented, and the results confirm our hypothesis.
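As a rough illustration of PageRank-based WSD in general (not the authors' specific adaptation), the sketch below runs personalized PageRank over a tiny hand-made sense graph with networkx; the graph, sense labels and context words are invented for illustration.

import networkx as nx

# Toy sense graph; a real system would use a large sense repository such as WordNet.
G = nx.Graph([
    ("bank#finance", "money#n"), ("bank#finance", "deposit#n"),
    ("deposit#n", "money#n"),
    ("bank#river", "water#n"), ("bank#river", "shore#n"),
])

# Teleport probability mass concentrated on the senses of the context words.
personalization = {n: 0.0 for n in G}
personalization["money#n"] = 1.0
personalization["deposit#n"] = 1.0

scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
best = max((s for s in scores if s.startswith("bank#")), key=scores.get)
print(best)  # the financial context pulls the ranking towards bank#finance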
|
|
10:30–11:00
|
Coffee Break
|
11:00–11:40
|
Session 2: Clustering Languages and Dialects
11:00–11:20 |
Hierarchical Spectral Partitioning of Bipartite Graphs to Cluster Dialects and Identify Distinguishing Features
Martijn Wieling and John Nerbonne
In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectology. Besides showing that the results of the hierarchical clustering improve over the flat spectral clustering method used in an earlier study (Wieling and Nerbonne, 2009), the values of the second singular vector used to generate the two-way clustering can be used to identify the most important sound correspondences for each cluster. This is an important advantage of the hierarchical method as it obviates the need for external methods to determine the most important sound correspondences for a geographical cluster.
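The core two-way split can be sketched as follows: normalize the variety-by-correspondence count matrix, take its SVD, and split rows and columns by the sign of the second singular vectors, whose values also indicate how strongly each item is associated with its cluster. The counts below are invented, and a hierarchical version would simply reapply this step within each resulting cluster; this is a sketch, not the authors' exact procedure.

import numpy as np

# Hypothetical counts: rows are dialect varieties, columns are sound correspondences.
A = np.array([
    [5., 4., 0., 1.],
    [4., 5., 1., 0.],
    [0., 1., 6., 5.],
    [1., 0., 5., 6.],
])

# Degree-normalize both sides before the SVD.
An = A / np.sqrt(A.sum(axis=1))[:, None] / np.sqrt(A.sum(axis=0))[None, :]
U, s, Vt = np.linalg.svd(An)

varieties_in_cluster_1 = U[:, 1] >= 0       # two-way split of the varieties
correspondences_in_cluster_1 = Vt[1] >= 0   # and of the sound correspondences
print(varieties_in_cluster_1, correspondences_in_cluster_1)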
|
11:20–11:40 |
A Character-Based Intersection Graph Approach to Linguistic Phylogeny
Jessica Enright
Linguists use phylogenetic methods to build evolutionary trees of languages given lexical, phonological, and morphological data. Perfect phylogeny is too restrictive to explain most data sets. Conservative Dollo phylogeny is more permissive, and has been used in biological applications. We propose the use of conservative Dollo phylogeny as an alternative or complementary approach for linguistic phylogenetics. We test this approach on an Indo-European dataset.
|
|
11:40–12:40
|
Invited Talk
11:40–12:40 |
Spectral Approaches to Learning in the Graph Domain
Edwin Hancock
This talk will commence by discussing some of the problems that arise if machine learning is attempted with graphs. Based on this discussion, the talk will define a taxonomy of different methods organised around a) clustering, b) characterisation, and c) constructing generative models in the graph domain. With this taxonomy to hand, I will describe a number of graph-spectral algorithms that can be applied to solve these different problems. The talk will be furnished with examples from computer vision.
|
|
12:50–13:50
|
Lunch break
|
13:50–15:30
|
Session 3: Lexical Similarity and Its Application
13:50–14:10 |
Cross-lingual Comparison between Distributionally Determined Word Similarity Networks
Olof Görnerup and Jussi Karlgren
As an initial effort to identify universal and language-specific factors that influence the behavior of distributional models, we have formulated a distributionally determined word similarity network model, implemented it for eleven different languages, and compared the resulting networks. In the model, vertices constitute words and two words are linked if they occur in similar contexts. The model is found to capture clear isomorphisms across languages in terms of syntactic and semantic classes, as well as functional categories of abstract discourse markers. Language specific morphology is found to be a dominating factor for the accuracy of the model.
|
14:10–14:30 |
Co-occurrence Cluster Features for Lexical Substitutions in Context
Chris Biemann
This paper examines the influence of features based on clusters of co-occurrences for supervised Word Sense Disambiguation and Lexical Substitution. Co-occurrence cluster features are derived from clustering the local neighborhood of a target word in a co-occurrence graph based on a corpus in a completely unsupervised fashion. Clusters can be assigned in context and are used as features in a supervised WSD system. Experiments fitting a strong baseline system with these additional features are conducted on two datasets, showing improvements. Co-occurrence features are a simple way to mimic Topic Signatures (Martínez et al., 2008) without needing to construct resources manually. Further, a system is described that produces lexical substitutions in context with very high precision.
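The clustering step might look roughly like the sketch below, which builds the co-occurrence neighborhood of a target word and partitions it with a simple Chinese-Whispers-style label propagation; the algorithm choice, edge weights and words are illustrative assumptions, not the paper's exact setup. The resulting cluster identifiers are the kind of information that would be fed to the supervised system as features.

import random
import networkx as nx

def label_propagation(G, iterations=20, seed=0):
    # Chinese-Whispers-style clustering: every node repeatedly adopts the label
    # that is strongest (by total edge weight) among its neighbours.
    rng = random.Random(seed)
    labels = {n: i for i, n in enumerate(G)}
    for _ in range(iterations):
        nodes = list(G)
        rng.shuffle(nodes)
        for n in nodes:
            strength = {}
            for nb in G[n]:
                w = G[n][nb].get("weight", 1.0)
                strength[labels[nb]] = strength.get(labels[nb], 0.0) + w
            if strength:
                labels[n] = max(strength, key=strength.get)
    return labels

# Hypothetical co-occurrence neighbourhood of the target word "bass".
G = nx.Graph()
G.add_weighted_edges_from([
    ("guitar", "amplifier", 3), ("guitar", "player", 2), ("amplifier", "player", 1),
    ("fish", "lake", 3), ("fish", "angler", 2), ("lake", "angler", 1),
])
print(label_propagation(G))  # two clusters, one per usage of "bass"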
|
14:30–14:50 |
Contextually-Mediated Semantic Similarity Graphs for Topic Segmentation
Geetu Ambwani and Anthony Davis
We present a representation of documents as directed, weighted graphs, modeling the range of influence of terms within the document as well as contextually determined semantic relatedness among terms. We then show the usefulness of this kind of representation in topic segmentation. Our boundary detection algorithm uses this graph to determine topical coherence and potential topic shifts, and does not require labeled data or training of parameters. We show that this method yields improved results on both concatenated pseudo-documents and on closed-captions for television programs.
|
14:50–15:10 |
MuLLinG: MultiLevel Linguistic Graphs for Knowledge Extraction
Vincent Archer
MuLLinG is a model for knowledge extraction (especially lexical extraction from corpora) based on multilevel graphs. Its aim is to allow large-scale data acquisition, by making the process easy to run automatically and simple for linguists with limited programming knowledge to configure. In MuLLinG, each new level represents the information in a different, increasingly abstract manner. We also introduce several associated operators, written to be as generic as possible. They are independent of what nodes and edges represent, and of the task to achieve. Consequently, they allow a complex extraction process to be described as a succession of simple graph manipulations. Finally, we present an experiment on collocation extraction using the MuLLinG model.
|
15:10–15:30 |
Experiments with CST-based Multidocument Summarization
Maria Lucia Castro Jorge and Thiago Pardo
With the huge and growing amount of information on the web and the little time available to read and process it all, automatic summaries have become very important resources. In this work, we evaluate deep content selection methods for multidocument summarization based on the CST model (Cross-document Structure Theory). Our methods consider summarization preferences and focus on the main problems of multidocument treatment: redundancy, complementarity, and contradiction among different information sources. We also evaluate the impact of the CST model on superficial summarization systems. Our results show that the use of the CST model helps to improve informativeness and quality in automatic summaries.
|
|
15:30–16:00
|
Coffee Break
|
16:00–17:00
|
Special Session on Opinion Mining
16:00–16:20 |
Distinguishing between Positive and Negative Opinions with Complex Network Features
Diego Raphael Amancio, Renato Fabbri, Osvaldo Novais Oliveira Jr., Maria das Graças Volpe Nunes and Luciano da Fontoura Costa
Topological and dynamic features of complex networks have proven suitable for capturing text characteristics in recent years, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinction was possible by obtaining several metrics of the networks, including the in-degree, out-degree, shortest paths, clustering coefficient, betweenness and global efficiency. For visualization, the obtained multidimensional dataset was projected into a 2-dimensional space with canonical variable analysis. The distinction was quantified using machine learning algorithms, which allowed a recall of 70% in the automatic discrimination of negative opinions, even without attempts to optimize the pattern recognition process.
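A minimal version of such a feature extractor is sketched below: a text is turned into a directed word-adjacency network and a handful of the metrics listed above are computed with networkx. The graph construction and the example sentence are assumptions for illustration; the paper's actual network model and feature set may differ.

import networkx as nx

def text_to_network(tokens):
    # Directed word-adjacency network: an edge links each token to the next one.
    G = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):
        G.add_edge(a, b)
    return G

def network_features(G):
    # Averages of a few of the metrics mentioned in the abstract.
    n = G.number_of_nodes()
    und = G.to_undirected()
    return {
        "avg_in_degree": sum(d for _, d in G.in_degree()) / n,
        "avg_out_degree": sum(d for _, d in G.out_degree()) / n,
        "avg_clustering": nx.average_clustering(und),
        "avg_betweenness": sum(nx.betweenness_centrality(G).values()) / n,
        "global_efficiency": nx.global_efficiency(und),
    }

features = network_features(text_to_network("the plot is weak and the acting is worse".split()))
print(features)  # one feature vector per text, ready for a classifier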
|
16:20–16:40 |
Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis
Pamela Zontone, Giulia Boato, Jonathon Hare, Paul Lewis, Stefan Siersdorfer and Enrico Minack
We present a brief overview of the way in which image analysis, coupled with associated collateral text, is being used for auto-annotation and sentiment analysis. In particular, we describe our approach to auto-annotation using the graph-theoretic dominant set clustering algorithm and the annotation of images with sentiment scores from SentiWordNet. Preliminary results are given for both, and our planned work aims to explore synergies between the two approaches.
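Dominant-set clustering is typically driven by replicator dynamics on a pairwise affinity matrix; the sketch below shows that core step on invented affinities (a real system would compute the affinities from image features). The support of the converged vector is one dominant set, i.e. one cluster; this is a generic sketch, not the authors' implementation.

import numpy as np

def dominant_set(A, iterations=200, tol=1e-8):
    # Replicator dynamics: x_i <- x_i * (A x)_i / (x^T A x), iterated to convergence.
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iterations):
        x_new = x * (A @ x)
        total = x_new.sum()
        if total == 0:
            break
        x_new /= total
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Hypothetical pairwise affinities between four image regions (zero diagonal).
A = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
])
x = dominant_set(A)
print(np.flatnonzero(x > 1e-3))  # regions 0, 1 and 2 form the dominant cluster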
|
16:40–17:00 |
Aggregating Opinions: Explorations into Graphs and Media Content Analysis
Gabriele Tatzl and Christoph Waldhauser
Understanding, as opposed to reading, is vital for the extraction of opinions from a text. This is especially true, as an author’s opinion is not always clearly marked. Finding the overall opinion in a text can be challenging to human readers and computers alike. Media Content Analysis is a popular method of extracting information from a text by means of human coders. We describe the difficulties humans face and the process they use to extract opinions, and offer a formalization that could help to automate opinion extraction within the Media Content Analysis framework.
|
|
17:00–17:40
|
Session 5: Spectral Approaches
17:00–17:20 |
Eliminating Redundancy by Spectral Relaxation for Multi-Document Summarization
Fumiyo Fukumoto, Akina Sakai and Yoshimi Suzuki
This paper focuses on redundancy, i.e., overlapping information across multiple documents, and presents a method for detecting salient, key sentences from documents that discuss the same event. To eliminate redundancy, we used spectral clustering to classify each sentence into groups, each of which consists of semantically related sentences. Then, we applied link analysis, the Markov Random Walk (MRW) model, to decide the importance of a sentence within the documents. The method was tested on the NTCIR evaluation data, and the results show the effectiveness of the method.
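The pipeline can be sketched roughly as follows: sentences are clustered spectrally on a similarity matrix, a PageRank-style random walk (standing in here for the MRW model) scores their importance, and one representative per cluster is kept. The sentences, the similarity measure (tf-idf cosine) and the number of clusters are all illustrative assumptions rather than the paper's configuration.

import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sentences standing in for several documents about the same event.
sentences = [
    "An earthquake struck the coastal city early on Monday.",
    "The quake hit the coast in the early hours of Monday.",
    "Rescue teams were dispatched to the affected areas.",
    "Emergency crews arrived to help the affected regions.",
]

sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
np.fill_diagonal(sim, 0.0)

# Spectral clustering groups near-paraphrases, so redundant sentences share a cluster.
labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(sim)

# A random-walk score over the similarity graph approximates sentence importance.
scores = nx.pagerank(nx.from_numpy_array(sim))

for c in set(labels):
    best = max((i for i in range(len(sentences)) if labels[i] == c), key=scores.get)
    print(sentences[best])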
|
17:20–17:40 |
Computing Word Senses by Semantic Mirroring and Spectral Graph Partitioning
Martin Fagerlund, Magnus Merkel, Lars Eldén and Lars Ahrenberg
Using the technique of "semantic mirroring", a graph is obtained that represents words and their translations from a parallel corpus or a bilingual lexicon. The connectedness of the graph holds information about the different meanings of the words that occur in the translations. Spectral graph theory is used to partition the graph, which leads to a grouping of the words according to different senses. We also report results from an evaluation using a small sample of seed words from a lexicon of Swedish and English adjectives.
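For intuition, the sketch below partitions a tiny hand-made translation graph by the sign of the Fiedler vector (the Laplacian eigenvector for the second-smallest eigenvalue). The Swedish-English word pairs are invented, and the paper derives its graph via semantic mirroring from a real lexicon and may partition it differently; this is only a generic spectral-partitioning illustration.

import numpy as np
import networkx as nx

# Hypothetical translation graph: words linked to their translations.
G = nx.Graph([
    ("frisk", "healthy"), ("frisk", "fresh"),
    ("sund", "healthy"), ("sund", "sound"),
    ("fräsch", "fresh"), ("ny", "fresh"),
])
nodes = list(G)

# Fiedler vector of the graph Laplacian; its sign pattern splits the graph
# into two groups corresponding to different senses.
L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
sense_a = [n for n, v in zip(nodes, fiedler) if v >= 0]
sense_b = [n for n, v in zip(nodes, fiedler) if v < 0]
print(sense_a)
print(sense_b)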
|
|
17:40–18:00
|
Final Wrap-up
|