Thursday, July 15, 2010 |
9:00–9:15
|
Opening Remarks
|
9:15–10:30
|
Session 1: Parsing
9:15–9:40 |
Improvements in Unsupervised Co-Occurrence-Based Parsing
Christian Hänig
This paper presents an algorithm for unsupervised co-occurrence-based parsing that improves and extends existing approaches. The proposed algorithm induces a context-free grammar of the language in question in an iterative manner. The resulting structure of a sentence will be given as a hierarchical arrangement of constituents. Although this algorithm does not use any a priori knowledge about the language, it is able to detect heads, modifiers and a phrase type’s different compound composition possibilities. For evaluation purposes, the algorithm is applied to manually annotated part-of-speech tags (POS tags) as well as to word classes induced by an unsupervised part-of-speech tagger.
|
9:40–10:05 |
Viterbi Training Improves Unsupervised Dependency Parsing
Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jurafsky and Christopher D. Manning
We show that Viterbi (or "hard") EM is well-suited to unsupervised grammar induction. It is more accurate than standard inside-outside re-estimation (classic EM), significantly faster, and simpler. Our experiments with Klein and Manning’s Dependency Model with Valence (DMV) attain state-of-the-art performance — 44.8% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus — without clever initialization; with a good initializer, Viterbi training improves to 47.9%. This generalizes to the Brown corpus, our held-out set, where accuracy reaches 50.8% — a 7.5% gain over previous best results. We find that classic EM learns better from short sentences but cannot cope with longer ones, where Viterbi thrives. However, we explain that both algorithms optimize the wrong objectives and prove that there are fundamental disconnects between the likelihoods of sentences, best parses, and true parses, beyond the well-established discrepancies between likelihood, accuracy and extrinsic performance.
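For readers unfamiliar with the hard/soft distinction above, the toy sketch below contrasts Viterbi ("hard") EM with classic ("soft") EM on a made-up two-coin mixture. It only illustrates the general algorithmic difference, not the DMV grammar-induction model or the paper's experiments; all data, names and settings here are invented.

```python
# Hypothetical illustration: hard vs. soft EM on a two-coin mixture.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each entry is the number of heads in 20 flips of one of two
# coins with unknown biases (the latent variable is which coin was used).
n_flips = 20
true_biases = [0.3, 0.8]
counts = np.array([rng.binomial(n_flips, true_biases[rng.integers(2)])
                   for _ in range(200)])

def log_likelihood(counts, p):
    # Binomial log-likelihood of each observation under each coin, ignoring
    # the binomial coefficient, which is constant across coins.
    k = counts[:, None]
    return k * np.log(p) + (n_flips - k) * np.log(1.0 - p)

def em(counts, hard, n_iter=50):
    p = np.array([0.4, 0.6])              # deliberately weak initializer
    for _ in range(n_iter):
        ll = log_likelihood(counts, p)
        if hard:
            # Viterbi / "hard" EM: commit to the single best latent assignment.
            resp = np.zeros_like(ll)
            resp[np.arange(len(counts)), ll.argmax(axis=1)] = 1.0
        else:
            # Classic / "soft" EM: use posterior probabilities as fractional counts.
            resp = np.exp(ll - ll.max(axis=1, keepdims=True))
            resp /= resp.sum(axis=1, keepdims=True)
        heads = resp.T @ counts            # heads attributed to each coin
        totals = resp.sum(axis=0) * n_flips
        p = (heads + 1e-6) / (totals + 2e-6)   # M-step: re-estimate coin biases
    return np.sort(p)

print("soft EM estimates:", em(counts, hard=False))
print("hard EM estimates:", em(counts, hard=True))
```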
|
10:05–10:30 |
Driving Semantic Parsing from the World’s Response
James Clarke, Dan Goldwasser, Ming-Wei Chang and Dan Roth
Current approaches to semantic parsing, the task of converting text to a formal meaning representation, rely on annotated training data mapping sentences to logical forms. Providing this supervision is a major bottleneck in scaling semantic parsers. This paper presents a new learning paradigm aimed at alleviating the supervision burden. We develop two novel learning algorithms capable of predicting complex structures which rely only on a binary feedback signal based on the context of an external world. In addition we reformulate the semantic parsing problem to reduce the dependency of the model on syntactic patterns, thus allowing our parser to scale better using less supervision. Our results surprisingly show that, without using any annotated meaning representations, learning with a weak feedback signal is capable of producing a parser that is competitive with fully supervised parsers.
|
|
10:30–11:00
|
Break
|
11:00–12:15
|
Session 2: Grammar Induction
11:00–11:25 |
Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages
Alexander Clark
A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. We present a simple algorithm for learning these grammars together with a complete self-contained proof of the correctness and efficiency of the algorithm.
|
11:25–11:50 |
Identifying Patterns for Unsupervised Grammar Induction
Jesús Santamaría and Lourdes Araujo
This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in the texts. Our starting hypothesis is that there exist some classes of words that function as separators, marking the beginning or the end of new constituents. Among these separators we distinguish those which trigger new levels in the parse tree. If we are able to detect these separators we can follow a very simple procedure to identify the constituents of a sentence by taking the classes of words between separators. This paper is devoted to describing the process that we have followed to automatically identify the set of separators from a corpus annotated only with part-of-speech (POS) tags. The proposed approach has allowed us to improve the results of previous proposals when parsing sentences from the Wall Street Journal corpus.
|
11:50–12:15 |
Learning Better Monolingual Models with Unannotated Bilingual Text
David Burkett, Slav Petrov, John Blitzer and Dan Klein
This work shows how to improve state-of-the-art monolingual natural language processing models using unannotated bilingual text. We build a multiview learning objective that enforces agreement between monolingual and bilingual models. In our method the first, monolingual view consists of supervised predictors learned separately for each language. The second, bilingual view consists of log-linear predictors learned over both languages on bilingual text. Our training procedure estimates the parameters of the bilingual model using the output of the monolingual model, and we show how to combine the two models to account for dependence between views. For the task of named entity recognition, using bilingual predictors increases F1 by 16.1% absolute over a supervised monolingual model, and retraining on bilingual predictions increases *monolingual* model F1 by 14.6%. For syntactic parsing, our bilingual predictor increases F1 by 2.1% absolute, and retraining a monolingual model on its output gives an improvement of 2.0%.
|
|
12:15–14:15
|
Lunch
|
14:15–15:30
|
Invited Talk
14:15–15:30 |
Clueless: Explorations in Unsupervised, Knowledge-Lean Extraction of Lexical-Semantic Information
Lillian Lee
I will discuss two current projects on automatically extracting certain types of lexical-semantic information in settings wherein we can rely neither on annotations nor existing knowledge resources to provide us with clues. The name of the game in such settings is to find and leverage auxiliary sources of information. Why is it that if you know I’ll give a silly talk, it follows that you know I’ll give a talk, whereas if you doubt I’ll give a good talk, it doesn’t follow that you doubt I’ll give a talk? This pair of examples shows that the word “doubt” exhibits a special but prevalent kind of behavior known as downward entailingness — the licensing of reasoning from supersets to subsets, so to speak, but not vice versa. The first project I’ll describe is to identify words that are downward entailing, a task that promises to enhance the performance of systems that engage in textual inference, and one that is quite challenging since it is difficult to characterize these items as a class and no corpus with downward-entailingness annotations exists. We are able to surmount these challenges by utilizing some insights from the linguistics literature regarding the relationship between downward entailing operators and what are known as negative polarity items — words such as “ever” or the idiom “have a clue” that tend to occur only in negative contexts. A cross-linguistic analysis indicates some potentially interesting connections to findings in linguistic typology. That previous paragraph was quite a mouthful, wasn’t it? Wouldn’t it be nice if it were written in plain English that was easier to understand? The second project I’ll talk about, which has the eventual aim to make it possible to automatically simplify text, aims to learn lexical-level simplifications, such as “work together” for “collaborate”. (This represents a complement to prior work, which focused on syntactic transformations, such as passive to active voice.) We exploit edit histories in Simple English Wikipedia for this task. This isn’t as simple (ahem) as it might at first seem because Simple English Wikipedia and the usual Wikipedia are far from a perfect parallel corpus and because many edits in Simple Wikipedia do not constitute simplifications. We consider both explicitly modeling different kinds of operations and various types of bootstrapping, including as clues the comments Wikipedians sometimes leave when they edit. Joint work with Cristian Danescu-Niculescu-Mizil, Bo Pang, and Mark Yatskar.
|
|
15:30–16:00
|
Break
|
16:00–17:30
|
Shared Task Session 1: Overview and Oral Presentations
16:00–16:20 |
The CoNLL 2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text
Richárd Farkas, Veronika Vincze, György Móra, János Csirik and György Szarvas
The CoNLL 2010 Shared Task was dedicated to the detection of uncertainty cues and their linguistic scope in natural language texts. The motivation behind this task was that distinguishing factual and uncertain information in texts is of essential importance in information extraction. This paper provides a general overview of the “Learning to detect hedges and their scope in natural language texts” Shared Task, including the annotation protocols of the training and evaluation datasets, the exact task definitions, the evaluation metrics employed and the overall results. The paper concludes with an analysis of the prominent approaches and an overview of the systems submitted to the Shared Task.
|
16:20–16:30 |
A Cascade Method for Detecting Hedges and their Scope in Natural Language Text
Buzhou Tang, Xiaolong Wang, Xuan Wang, Bo Yuan and Shixi Fan
Detecting hedges and their scope in natural language text is very important for information inference. In this paper, we present a system based on a cascade method for the CoNLL-2010 shared task. The system consists of two components: one for detecting hedges and another for detecting their scope. For detecting hedges, we build a cascade subsystem. Firstly, a conditional random field (CRF) model and a large margin-based model are trained respectively. Then, we train another CRF model using the result of the first phase. For detecting the scope of hedges, a CRF model is trained according to the result of the first subtask. The experiments show that our system achieves 86.36% F-measure on the biological corpus and 55.05% F-measure on the Wikipedia corpus for hedge detection, and 49.95% F-measure on the biological corpus for hedge scope detection. Among them, the 86.36% figure is the best result on the biological corpus for hedge detection.
|
16:30–16:40 |
Detecting Speculative Language using Syntactic Dependencies and Logistic Regression
Andreas Vlachos and Mark Craven
In this paper we describe our approach to the CoNLL 2010 shared task on detecting speculative language in biomedical text. We treat the detection of sentences containing uncertain information (Task1) as a token classification task since the existence or absence of cues determines the sentence label. We distinguish words that have speculative and non-speculative meaning by employing syntactic features as a proxy for their semantic content. In order to identify the scope of each cue (Task2), we learn a classifier that predicts whether each token of a sentence belongs to the scope of a given cue. The features in the classifier are based on the syntactic dependency path between the cue and the token. In both tasks, we use a Bayesian logistic regression classifier incorporating a sparsity-enforcing Laplace prior. Overall, the performance achieved is 85.21% F-score and 44.11% F-score in Task1 and Task2, respectively.
|
16:40–16:50 |
A Hedgehop over a Max-margin Framework using Hedge Cues
Maria Georgescul
In this paper, we describe the experimental settings we adopted in the context of the 2010 CoNLL shared task for detecting sentences containing uncertainty. The classification results reported on are obtained using discriminative learning with features essentially incorporating lexical information. Hyper-parameters are tuned for each domain: using BioScope training data for the biomedical domain and Wikipedia training data for the Wikipedia test set. By allowing an efficient handling of combinations of large-scale input features, the discriminative approach we adopted showed highly competitive empirical results for hedge detection on the Wikipedia dataset: our system is ranked first with an F-score of 60.17%.
|
16:50–17:00 |
Detecting Hedge Cues and their Scopes with Average Perceptron
Feng Ji, Xipeng Qiu and Xuanjing Huang
In this paper, we propose a hedge detection method based on the average perceptron, which we used in the closed challenge of the CoNLL 2010 Shared Task. There are two subtasks: (1) detecting uncertain sentences and (2) identifying the in-sentence scopes of hedge cues. We use a unified learning algorithm for both subtasks, since the hedge score of a sentence can be decomposed into scores of the words, especially the hedge words. On the biomedical corpus, our method achieves an F-measure of 77.86% in detecting in-domain uncertain sentences, 77.44% in recognizing hedge cues, and 19.27% in identifying the scopes.
|
17:00–17:10 |
Memory-based Resolution of In-sentence Scopes of Hedge Cues
Roser Morante, Vincent Van Asch and Walter Daelemans
In this paper we describe the machine learning systems that we submitted to the CoNLL-2010 Shared Task on Learning to Detect Hedges and Their Scope in Natural Language Text. Task 1 on detecting uncertain information was performed by an SVM-based system to process the Wikipedia data and by a memory-based system to process the biological data. Task 2 on resolving in-sentence scopes of hedge cues was performed by a memory-based system that relies on information from syntactic dependencies. This system scored the highest F1 (57.32) of Task 2.
|
17:10–17:20 |
Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
Erik Velldal, Lilja Øvrelid and Stephan Oepen
This paper describes a hybrid, two-level approach for resolving hedge cues, the problem of the CoNLL 2010 shared task. First, a maximum entropy classifier is applied to identify cue words, using both syntactic and surface-oriented features. Second, a set of manually crafted rules, operating on dependency representations and the output of the classifier, is applied to resolve the scope of the hedge cues within the sentence. For both Task 1 and Task 2, our system participates in the stricter category of ‘closed’ or ‘in-domain’ systems.
|
17:20–17:30 |
Combining Manual Rules and Supervised Learning for Hedge Cue and Scope Detection
Marek Rei and Ted Briscoe
Hedge cues were detected using a supervised Conditional Random Field (CRF) classifier exploiting features from the RASP parser. The CRF’s predictions were filtered using known cues and unseen instances were removed, increasing precision while retaining recall. Rules for scope detection, based on the grammatical relations of the sentence and the part-of-speech tag of the cue, were manually developed. However, another supervised CRF classifier was used to refine these predictions. As a final step, scopes were constructed from the classifier output using a small set of post-processing rules. Development of the system revealed a number of issues with the annotation scheme adopted by the organisers.
|
|
17:30–18:00
|
Shared Task Discussion Panel
|
Friday, July 16, 2010 |
9:15–10:30
|
Invited Talk
9:15–10:30 |
Bayesian Hidden Markov Models and Extensions
Zoubin Ghahramani
Hidden Markov models (HMMs) are one of the cornerstones of time-series modelling. I will review HMMs, motivations for Bayesian approaches to inference in them, and our work on variational Bayesian learning. I will then focus on recent nonparametric extensions to HMMs. Traditionally, HMMs have a known structure with a fixed number of states and are trained using maximum likelihood techniques. The infinite HMM (iHMM) allows a potentially unbounded number of hidden states, letting the model use as many states as it needs for the data. The recent development of ‘Beam Sampling’ — an efficient inference algorithm for iHMMs based on dynamic programming — makes it possible to apply iHMMs to large problems. I will show some applications of iHMMs to unsupervised POS tagging and experiments with parallel and distributed implementations. I will also describe a factorial generalisation of the iHMM which makes it possible to have an unbounded number of binary state variables, and can be thought of as a time-series generalisation of the Indian buffet process. I will conclude with thoughts on future directions in Bayesian modelling of sequential data.
|
|
10:30–11:00
|
Break
|
11:00–12:30
|
Joint Poster Session: Main conference and shared task posters
|
21 |
Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint
Roi Reichart, Raanan Fattal and Ari Rappoport
Modern unsupervised POS taggers usually apply an optimization procedure to a non-convex function, and tend to converge to local maxima that are sensitive to starting conditions. The quality of the tagging induced by such algorithms is thus highly variable, and researchers report average results over several random initializations. Consequently, applications are not guaranteed to use an induced tagging of the quality reported for the algorithm. In this paper we address this issue using an unsupervised test for intrinsic clustering quality. We run a base tagger with different random initializations, and select the best tagging using the quality test. As a base tagger, we modify a leading unsupervised POS tagger (Clark, 2003) to constrain the distributions of word types across clusters to be Zipfian, allowing us to utilize a perplexity-based quality test. We show that the correlation between our quality test and gold standard-based tagging quality measures is high. Our results are better in most evaluation measures than all results reported in the literature for this task, and are always better than the Clark average results.
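The "run several random initializations and keep the one with the best unsupervised quality score" idea above can be illustrated with a minimal sketch. The sketch below uses scikit-learn KMeans and its inertia as a stand-in quality criterion rather than the perplexity-based test from the paper, and the feature vectors and parameters are made up.

```python
# Hypothetical sketch: pick the best of several random clustering runs by an
# intrinsic (label-free) quality score, analogous to the selection step above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # toy feature vectors for word types

best_model, best_score = None, np.inf
for seed in range(10):                  # ten random initializations
    model = KMeans(n_clusters=20, n_init=1, random_state=seed).fit(X)
    if model.inertia_ < best_score:     # lower inertia = tighter clusters
        best_model, best_score = model, model.inertia_

print("selected the run with inertia", round(best_score, 2))
```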
|
22 |
Syntactic and Semantic Structure for Opinion Expression Detection
Richard Johansson and Alessandro Moschitti
We demonstrate that relational features derived from dependency-syntactic and semantic role structures are useful for the task of detecting opinionated expressions in natural-language text, significantly improving over conventional models based on sequence labeling with local features. These features allow us to model the way opinionated expressions interact in a sentence over arbitrary distances. While the relational features make the prediction task more computationally expensive, we show that it can be tackled effectively by using a reranker. We evaluate a number of machine learning approaches for the reranker, and the best model results in a 10-point absolute improvement in soft recall on the MPQA corpus, while decreasing precision only slightly.
|
23 |
Type Level Clustering Evaluation: New Measures and a POS Induction Case Study
Roi Reichart, Omri Abend and Ari Rappoport
Clustering is a central technique in NLP. Consequently, clustering evaluation is of great importance. Many clustering algorithms are evaluated by their success in tagging corpus tokens. In this paper we discuss type level evaluation, which reflects class membership only and is independent of the token statistics of a particular reference corpus. Type level evaluation casts light on the merits of algorithms, and for some applications is a more natural measure of the algorithm’s quality. We propose new type level evaluation measures that, contrary to existing measures, are applicable when items are polysemous, the common case in NLP. We demonstrate the benefits of our measures using a detailed case study, POS induction. We experiment with seven leading algorithms, obtaining useful insights and showing that token and type level measures can weakly or even negatively correlate, which underscores the fact that these two approaches reveal different aspects of clustering quality.
|
24 |
Recession Segmentation: Simpler Online Word Segmentation Using Limited Resources
Constantine Lignos and Charles Yang
In this paper we present a cognitively plausible approach to word segmentation that segments in an online fashion using only local information and a lexicon of previously segmented words. Unlike popular statistical optimization techniques, the learner uses structural information of the input syllables rather than distributional cues to segment words. We develop a memory model for the learner that, like a child learner, does not recall previously hypothesized words perfectly. The learner attains an F-score of 86.69% in ideal conditions and 85.05% when word recall is unreliable and stress in the input is reduced. These results demonstrate the power that a simple learner can have when paired with appropriate structural constraints on its hypotheses.
|
25 |
Computing Optimal Alignments for the IBM-3 Translation Model
Thomas Schoenemann
Prior work on training the IBM-3 translation model is based on suboptimal methods for computing Viterbi alignments. In this paper, we present the first method guaranteed to produce globally optimal alignments. This not only results in improved alignments, it also gives us the opportunity to evaluate the quality of standard hillclimbing methods. Indeed, hillclimbing works reasonably well in practice but still fails to find the global optimum for between 2% and 12% of all sentence pairs and the probabilities can be several tens of orders of magnitude away from the Viterbi alignment. By reformulating the alignment problem as an Integer Linear Program, we can use standard machinery from global optimization theory to compute the solutions. We use the well-known branch-and-cut method, but also show how it can be customized to the specific problem discussed in this paper. In fact, a large number of alignments can be excluded from the start without losing global optimality.
|
26 |
Semi-Supervised Recognition of Sarcasm in Twitter and Amazon
Dmitry Davidov, Oren Tsur and Ari Rappoport
Sarcasm is a form of speech act in which the speakers convey their message in an implicit way. The inherently ambiguous nature of sarcasm sometimes makes it hard even for humans to decide whether an utterance is sarcastic or not. Recognition of sarcasm can benefit many sentiment analysis NLP applications, such as review summarization, dialogue systems and review ranking systems. In this paper we experiment with semi-supervised sarcasm identification on two very different data sets: a collection of 5.9 million tweets collected from Twitter, and a collection of 66000 product reviews from Amazon. Using the Mechanical Turk we created a gold standard sample in which each sentence was tagged by 3 annotators, obtaining F-scores of 0.78 on the product reviews dataset and 0.83 on the Twitter dataset. We discuss the differences between the datasets and how the algorithm uses them (e.g., for the Amazon dataset the algorithm makes use of structured information). We also discuss the utility of Twitter #sarcasm hashtags for the task.
|
27 |
Learning Probabilistic Synchronous CFGs for Phrase-based Translation
Markos Mylonakis and Khalil Sima’an
Probabilistic phrase-based synchronous grammars are now considered promising devices for statistical machine translation because they can express reordering phenomena between pairs of languages. Learning these hierarchical, probabilistic devices from parallel corpora constitutes a major challenge, because of multiple latent model variables as well as the risk of data overfitting. This paper presents an effective method for learning a family of particular interest to MT, binary Synchronous Context-Free Grammars with inverted/monotone orientation (a.k.a. Binary ITG). A second contribution concerns devising a lexicalized phrase reordering mechanism that has complementary strengths to Chiang’s model. The latter conditions reordering decisions on the surrounding lexical context of phrases, whereas our mechanism works with the lexical content of phrase pairs (akin to standard phrase-based systems). Surprisingly, our experiments on French-English data show that our learning method applied to far simpler models exhibits performance indistinguishable from the Hiero system.
|
28 |
A Semi-Supervised Batch-Mode Active Learning Strategy for Improved Statistical Machine Translation
Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard and Prem Natarajan
The availability of substantial, in-domain parallel corpora is critical for the development of high-performance statistical machine translation (SMT) systems. Such corpora, however, are expensive to produce due to the labor intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences that represent a balance between domain match, translation difficulty, and batch diversity. Simulation experiments on an English-to-Pashto translation task show that the proposed strategy not only outperforms the random selection baseline, but also traditional active learning techniques based on dissimilarity to existing training data. Our approach achieves a relative improvement of 45.9% in BLEU over the seed baseline, while the closest competitor gained only 24.8% with the same number of selected sentences.
|
29 |
Improving Word Alignment by Semi-supervised Ensemble
Shujian Huang, Kangxi Li, Xinyu Dai and Jiajun Chen
Supervised learning has recently been used to improve the performance of word alignment. However, due to the limited amount of labeled data, the performance of "pure" supervised learning, which uses only labeled data, is limited. As a result, many existing methods employ features learnt from a large amount of unlabeled data to assist the task. In this paper, we propose a semi-supervised ensemble method to better incorporate both labeled and unlabeled data during learning. Firstly, we employ an ensemble learning framework, which effectively uses alignment results from different unsupervised alignment models. We then propose to use a semi-supervised learning method, namely Tri-training, to train classifiers using both labeled and unlabeled data collaboratively and further improve the result of ensemble learning. Experimental results show that our methods can substantially improve the quality of word alignment. The final translation quality of a phrase-based translation system is slightly improved, as well.
|
30 |
A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection
Chenghua Lin, Yulan He and Richard Everson
This paper presents a comparative study of three closely related Bayesian models for unsupervised sentiment detection, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment topic detection.
|
31 |
A Hybrid Approach to Emotional Sentence Polarity and Intensity Classification
Jorge Carrillo de Albornoz, Laura Plaza and Pablo Gervás
In this paper, the authors present a new approach to sentence level sentiment analysis. The aim is to determine whether a sentence expresses a positive, negative or neutral sentiment, as well as its intensity. The method performs WSD over the words in the sentence in order to work with concepts rather than terms, and makes use of the knowledge in an affective lexicon to label these concepts with emotional categories. It also deals with the effect of negations and quantifiers on polarity and intensity analysis. An extensive evaluation in two different domains is performed in order to determine how the method behaves in 2-classes (positive and negative), 3-classes (positive, negative and neutral) and 5-classes (strongly negative, weakly negative, neutral, weakly positive and strongly positive) classification tasks. The results obtained compare favorably with those achieved by other systems addressing similar evaluations.
|
32 |
Cross-Caption Coreference Resolution for Automatic Image Understanding
Micah Hodosh, Peter Young, Cyrus Rashtchian and Julia Hockenmaier
In order to “understand” an image, it is necessary to identify not only the depicted entities, but also their attributes, relations between them and the actions they participate in. This information cannot be conveyed by simple keyword annotations. We have collected a corpus of 8108 “action” images, each associated with five simple sentences describing its content, and created a simple ontology of entity categories that appear in these images. In order to obtain a consistent semantic representation of the image content from these sentences, we need to first identify multiple mentions of the same entities. We present a hierarchical Bayesian model for cross-caption coreference resolution. We also evaluate how well the ontological types of the entities can be recovered.
|
33 |
Improved Natural Language Learning via Variance-Regularization Support Vector Machines
Shane Bergsma, Dekang Lin and Dale Schuurmans
We present a simple technique for learning better SVMs using fewer training examples. Rather than using the standard SVM regularization, we regularize toward low weight-variance. Our new SVM objective remains a convex quadratic function of the weights, and is therefore computationally no harder to optimize than a standard SVM. Variance regularization is shown to enable dramatic improvements in the learning rates of SVMs on three lexical disambiguation tasks.
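As a rough illustration of what "regularizing toward low weight-variance" can mean, one assumed formulation (not necessarily the authors' exact objective) replaces the usual squared-norm penalty of a soft-margin SVM with the empirical variance of the weight vector:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}\ge 0}\;
  \lambda \sum_{i=1}^{d}\bigl(w_i-\bar{w}\bigr)^{2}
  \;+\; \sum_{j=1}^{n}\xi_j
\quad\text{s.t.}\quad
  y_j\bigl(\mathbf{w}^{\top}\mathbf{x}_j+b\bigr)\ge 1-\xi_j,
\qquad
  \bar{w}=\tfrac{1}{d}\sum_{i=1}^{d}w_i .
```

Because the variance penalty is a positive semidefinite quadratic form in the weights, such a problem stays a convex quadratic program, consistent with the abstract's claim that the modified objective is no harder to optimize than a standard SVM.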
|
|
37 |
Hedge Detection using the RelHunter Approach
Eraldo Fernandes, Carlos Crestana and Ruy Milidiú
RelHunter is a Machine Learning based method for the extraction of structured information from text. Here, we apply RelHunter to the Hedge Detection task, proposed as the CoNLL 2010 Shared Task. RelHunter’s key design idea is to model the target structures as a relation over entities. The method decomposes the original task into three subtasks: (i) Entity Identification; (ii) Candidate Relation Generation; and (iii) Relation Recognition. In the Hedge Detection task, we define three types of entities: cue chunk, start scope token and end scope token. Hence, the Entity Identification subtask is further decomposed into three token classification subtasks, one for each entity type. In the Candidate Relation Generation subtask, we apply a simple procedure to generate a ternary candidate relation. Each instance in this relation represents a hedge candidate composed of a cue chunk, a start scope token and an end scope token. For the Relation Recognition subtask, we use a binary classifier to discriminate between true and false candidates. The four classifiers are trained with the Entropy Guided Transformation Learning algorithm. When compared to the other hedge detection systems of the CoNLL shared task, our scheme shows a competitive performance. The F-score of our system is 54.05 on the evaluation corpus.
|
38 |
A High-Precision Approach to Detecting Hedges and Their Scopes
Halil Kilicoglu and Sabine Bergler
We extend our prior work on speculative sentence recognition and speculation scope detection in biomedical text to the CoNLL’10 Shared Task on Hedge Detection. In our participation, we sought to assess the extensibility and portability of our prior work, which relies on linguistic categorization and weighting of hedging cues and on syntactic patterns in which these cues play a role. For Task 1a, we tuned our categorization and weighting scheme to recognize hedging in biological text. By accommodating a small number of vagueness quantifiers, we were able to extend our methodology to detecting vague sentences in Wikipedia articles. We exploited constituent parse trees in addition to syntactic dependency relations in resolving hedging scope. Our results are competitive with those of closed-domain trained systems and demonstrate that our high-precision oriented methodology is extensible and portable.
|
39 |
Exploiting Rich Features for Detecting Hedges and Their Scope
Xinxin Li, Jianping Shen, Xiang Gao and Xuan Wang
This paper describes our system for detecting hedges and their scope in natural language texts, developed for our participation in the CoNLL-2010 shared task. We formalize these two tasks as sequence labeling problems, and implement them using a conditional random fields (CRFs) model. In the first task, we use a greedy forward procedure to select features for the classifier. These features include the part-of-speech tag, word form, lemma, and chunk tag of tokens in the sentence. In the second task, our system exploits rich syntactic features about dependency structures and phrase structures, which achieves a better performance than only using the flat sequence features. Our system achieves the third-best score on the biological data set for the first task, and a 0.5265 F1 score for the second task.
|
40 |
Uncertainty Detection as Approximate Max-Margin Sequence Labelling
Oscar Täckström, Sumithra Velupillai, Martin Hassel, Gunnar Eriksson, Hercules Dalianis and Jussi Karlgren
This paper reports experiments for the CoNLL-2010 Shared Task on Learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain were 0.852 F-score, for the Wikipedia set 0.5538 F-score. For Task 2, our official results were 0.0215 for the entire task with a score of 0.6249 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 0.788, Wikipedia: 0.577; Task 2: 0.396 and 0.785 for cues.
|
41 |
Hedge Detection and Scope Finding by Sequence Labeling with Procedural Feature Selection
Shaodian Zhang, Hai Zhao, Guodong Zhou and Bao-liang Lu
This paper presents a system which adopts a standard sequence labeling technique for hedge detection and scope finding. For hedge detection, we formulate it as a hedge labeling problem, while for hedge scope finding, we use a two-step labeling strategy, one for hedge labeling and the other for scope finding. In particular, various kinds of syntactic dependencies are systematically exploited and effectively integrated using a large-scale normalized feature selection method. Evaluation on the CoNLL-2010 shared task shows that our system achieves stable and competitive results for all the closed tasks. Furthermore, post-deadline experiments show that the performance can be much further improved using a sufficient feature selection.
|
42 |
Learning to Detect Hedges and their Scope using CRF
Qi Zhao, Chengjie Sun, Bingquan Liu and Yong Cheng
This paper presents an approach for extracting hedge cues and their scopes in the BioScope corpus using two CRF models for the CoNLL 2010 shared task. In the first task, the HCDic feature is proposed to improve system performance, achieving better performance (84.1% F-score) than the baseline. The HCDic feature is also helpful for making use of cross-domain resources. A comparison of our methods on the BioScope and Wikipedia corpora is given, which shows that they are good at hedge cue detection on the BioScope corpus but fall short on the Wikipedia corpus. To detect the scope of hedge cues, we use rules to post-process the text. For future work, we plan to construct rules for the HCDic to improve our system.
|
43 |
Exploiting Multi-Features to Detect Hedges and Their Scope in Biomedical Texts
Huiwei Zhou, Xiaoyan Li, Degen Huang, Zezhong Li and Yuansheng Yang
In this paper, we present a machine learning approach that detects hedge cues and their scope in biomedical texts. Identifying hedged information in texts is a kind of semantic filtering of texts and it is important since it can separate speculative information from factual information. In order to deal with this semantic analysis problem, various evidential features are proposed and integrated through a Conditional Random Fields (CRFs) model. Hedge cues that appear in the training dataset are regarded as keywords and employed as an important feature in the hedge cue identification system. For the scope finding, we construct a CRF-based system and a syntactic pattern-based system, and compare their performances. Experiments using test data from the CoNLL-2010 shared task show that our proposed method is robust. The F-scores of the biological hedge detection task and the scope finding task reach 86.32% and 54.18%, respectively, in in-domain evaluations.
|
44 |
A Lucene and Maximum Entropy Model Based Hedge Detection System
Lin Chen and Barbara Di Eugenio
This paper describes the approach to hedge detection we developed, in order to participate in the shared task at CoNLL 2010. A supervised learning approach is employed in our implementation. Hedge cue annotations in the training data are used as the seed to build a reliable hedge cue set. A Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of Apache Lucene, we are able to do fuzzy string matching to extract hedge cues, and to incorporate part-of-speech (POS) tags in hedge cues. Not only can our system determine the certainty of the sentence, but it is also able to find all the contained hedges. Our system was ranked third on the Wikipedia dataset. In later experiments with different parameters, we further improved our results, with a 0.612 F-score on the Wikipedia dataset, and a 0.802 F-score on the biological dataset.
|
45 |
HedgeHunter: A System for Hedge Detection and Uncertainty Classification
David Clausen
With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large scale data analysis. Hedge detection and uncertainty classification are important components of a high precision IE system. This paper describes a two part supervised system which classifies words as hedge or non-hedged and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and Part-of-Speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence level features based on the number of hedge cues, the identity of hedge cues, and a Bag-of-Words feature vector to train a logistic regression classifier for sentence level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from these sentences or treat them as appropriately doubtful. We present results for in domain training and testing and cross domain training and testing based on a simple union of training sets.
|
46 |
Exploiting CCG Structures with Tree Kernels for Speculation Detection
Liliana Paola Mamani Sanchez, Baoli Li and Carl Vogel
Our CoNLL-2010 speculative sentence detector disambiguates putative keywords based on the following considerations: a speculative keyword may be composed of one or more word tokens; a speculative sentence may have one or more speculative keywords; and if a sentence contains at least one real speculative keyword, it is deemed speculative. A tree kernel classifier is used to assess whether a potential speculative keyword conveys speculation. We exploit information implicit in tree structures. For prediction efficiency, only a segment of the whole tree around a speculation keyword is considered, along with morphological features inside the segment and information about the containing document. A maximum entropy classifier is used for sentences not covered by the tree kernel classifier. Experiments on the Wikipedia data set show that our system achieves 0.55 F-measure (in-domain).
|
47 |
Uncertainty Learning using SVMs and CRFs
Vinodkumar Prabhakaran
In this work, I explore the use of SVMs and CRFs in the problem of predicting certainty in sentences. I consider this as a task of tagging uncertainty cues in context, for which I used lexical, wordlist-based and deep-syntactic features. Results show that the syntactic context of the tokens in conjunction with the wordlist-based features turned out to be useful in predicting uncertainty cues.
|
48 |
Features for Detecting Hedge Cues
Nobuyuki Shimizu and Hiroshi Nakagawa
We present a sequential labeling approach to hedge cue detection submitted to the CoNLL-2010 shared task, biological portion of task 1. Our main approach is as follows. We make use of partial syntactic information together with features obtained from the unlabeled corpus, and convert the task into a sequential BIO-tagging. If a cue is found, a sentence is classified as uncertain and certain otherwise. To examine a large number of feature combinations, we employ a genetic algorithm. While some obtained features are difficult to interpret, they were shown to improve the performance of the final system.
|
49 |
A Simple Ensemble Method for Hedge Identification
Ferenc Szidarovszky, Illés Solt and Domonkos Tikk
We present in this paper a simple hedge identification method and its application on biomedical text. The problem at hand is a subtask of the CoNLL 2010 shared task. Our solution consists of two classifiers, a statistical one and a CRF model, and a simple combination schema that combines their predictions. We report in detail on each component of our system and discuss the results. We also show that a more sophisticated combination schema could improve the F-score significantly.
|
50 |
A Baseline Approach for Detecting Sentences Containing Uncertainty
Erik Tjong Kim Sang
We apply a baseline approach to the CoNLL-2010 shared task data sets on hedge detection. Weights have been assigned to cue words marked in the training data based on their occurrences in certain and uncertain sentences. New sentences received scores that correspond with those of their best scoring cue word, if present. The best acceptance scores for uncertain sentences were determined using 10-fold cross validation on the training data. This approach performed reasonably on the shared task’s biological (F=82.0) and Wikipedia (F=62.8) data sets.
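The cue-weight baseline described above is simple enough to sketch end to end. The sketch below uses made-up sentences, a made-up cue list and a fixed threshold purely for illustration; the real system tunes the acceptance score by cross-validation on the training data.

```python
# Hypothetical sketch of a cue-weight baseline for uncertainty detection:
# weight each known cue by how often it appears in uncertain vs. certain
# training sentences, score a sentence by its best cue, threshold the score.
from collections import defaultdict

train = [
    ("the protein may bind to the receptor", True),
    ("this suggests a possible interaction", True),
    ("the gene is expressed in the liver", False),
    ("results indicate that binding occurs", False),
]
cues = {"may", "possible", "suggests", "indicate"}

# Estimate P(uncertain | cue) with add-one smoothing.
unc, tot = defaultdict(int), defaultdict(int)
for sentence, uncertain in train:
    for token in sentence.split():
        if token in cues:
            tot[token] += 1
            unc[token] += int(uncertain)
weight = {c: (unc[c] + 1) / (tot[c] + 2) for c in cues}

def score(sentence):
    # Sentence score = weight of its best-scoring cue, 0.0 if no cue occurs.
    return max((weight[t] for t in sentence.split() if t in cues), default=0.0)

threshold = 0.5   # would be tuned by cross-validation in the real system
for s in ["binding may occur", "the gene is active"]:
    print(s, "->", "uncertain" if score(s) > threshold else "certain")
```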
|
51 |
Hedge Classification with Syntactic Dependency Features based on an Ensemble Classifier
Yi Zheng, Qifeng Dai, Qiming Luo and Enhong Chen
We present our CoNLL-2010 Shared Task system in this paper. The system operates in three steps: sequence labeling, syntactic dependency parsing, and classification. We participated in Shared Task 1. Our experimental results measured by the in-domain and cross-domain F-scores on the biological domain are 81.11% and 67.99%, and on the Wikipedia domain 55.48% and 55.41%.
|
|
12:30–14:00
|
Lunch
|
14:00–15:15
|
Session 3: Semantics and Information Extraction
14:00–14:25 |
Online Entropy-based Model of Lexical Category Acquisition
Grzegorz Chrupała and Afra Alishahi
Children learn a robust representation of lexical categories at a young age. We propose an incremental model of this process which efficiently groups words into lexical categories based on their local context using an information-theoretic criterion. We train our model on a corpus of child-directed speech from CHILDES and show that the model learns a fine-grained set of intuitive word categories. Furthermore, we propose a novel evaluation approach by comparing the efficiency of our induced categories against other category sets (including traditional part of speech tags) in a variety of language tasks. We show the categories induced by our model typically outperform the other category sets.
|
14:25–14:50 |
Tagging and Linking Web Forum Posts
Su Nam Kim, Li Wang and Timothy Baldwin
We propose a method for annotating post-to-post discourse structure in online user forum data, in the hopes of improving troubleshooting-oriented information access. We introduce the tasks of: (1) post classification, based on a novel dialogue act tag set; and (2) link classification. We also introduce three feature sets (structural features, post context features and semantic features) and experiment with three discriminative learners (maximum entropy, SVM-HMM and CRF). We achieve above-baseline results for both dialogue act and link classification, with interesting divergences in which feature sets perform well over the two sub-tasks, and go on to perform preliminary investigation of the interaction between post tagging and linking.
|
14:50–15:15 |
Joint Entity and Relation Extraction using Card-Pyramid Parsing
Rohit Kate and Raymond Mooney
Both entity and relation extraction can benefit from being performed jointly, allowing each task to correct the errors of the other. We present a new method for joint entity and relation extraction using a graph we call a “card-pyramid”. This graph compactly encodes all possible entities and relations in a sentence, reducing the task of their joint extraction to jointly labeling its nodes. We give an efficient labeling algorithm that is analogous to parsing using dynamic programming. Experimental results show improved results for our joint extraction method compared to a pipelined approach.
|
|
15:30–16:00
|
Break
|
16:00–17:15
|
Session 4: Machine Learning
16:00–16:25 |
Distributed Asynchronous Online Learning for Natural Language Processing
Kevin Gimpel, Dipanjan Das and Noah A. Smith
Recent speed-ups for training large-scale models like those found in statistical NLP exploit distributed computing (either on multicore or "cloud" architectures) and rapidly converging online learning algorithms. Here we aim to combine the two. We focus on distributed, "mini-batch" learners that make frequent updates asynchronously (Nedic et al., 2001; Langford et al., 2009). We generalize existing asynchronous algorithms and experiment extensively with structured prediction problems from NLP, including discriminative, unsupervised, and non-convex learning scenarios. Our results show asynchronous learning can provide substantial speedups compared to distributed and single-processor mini-batch algorithms with no signs of error arising from the approximate nature of the technique.
|
16:25–16:50 |
On Reverse Feature Engineering of Syntactic Tree Kernels
Daniele Pighin and Alessandro Moschitti
In this paper, we provide a theoretical framework for feature selection in tree kernel spaces based on gradient-vector components of kernel-based machines. We show that a huge number of features can be discarded without a significant decrease in accuracy. Our selection algorithm is as accurate as and much more efficient than those proposed in previous work. Comparative experiments on three interesting and very diverse classification tasks, i.e. Question Classification, Relation Extraction and Semantic Role Labeling, support our theoretical findings and demonstrate the algorithm performance.
|
16:50–17:15 |
Inspecting the Structural Biases of Dependency Parsing Algorithms
Yoav Goldberg and Michael Elhadad
We propose the notion of a *structural bias* inherent in a parsing system with respect to the language it is aiming to parse. This structural bias characterizes the behaviour of a parsing system in terms of structures it tends to under- and over- produce. We propose a Boosting-based method for uncovering some of the structural bias inherent in parsing systems. We then apply our method to four English dependency parsers (Arc-Eager and Arc-Standard transition-based parsers, and first- and second-order graph-based parsers). We show that all four parsers are biased with respect to the kind of annotation they are trained to parse. We present a detailed analysis of the biases that highlights specific differences and commonalities between the parsing systems, and improves our understanding of their strengths and weaknesses.
|
|
17:15–17:45
|
SIGNLL Business Meeting and Best Paper Award
|
Thursday, July 15, 2010 |
8:45–9:00
|
Opening Remarks
|
9:00–9:50
|
Full Paper Session 1
9:00–9:25 |
A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments
Qin Gao, Nguyen Bach and Stephan Vogel
We present a word alignment framework that can incorporate partial manual alignments. The core of the approach is a novel semi-supervised algorithm extending the widely used IBM Models with a constrained EM algorithm. The partial manual alignments can be obtained by human labelling or automatically by high-precision-low-recall heuristics. We demonstrate the use of both methods by selecting alignment links from a manually aligned corpus and by applying links generated from a bilingual dictionary to unlabelled data. For the first method, we conduct controlled experiments on Chinese-English and Arabic-English translation tasks to compare the quality of word alignment, and to measure the effects of the two different methods of selecting alignment links from the manually aligned corpus. For the second method, we experimented with a moderate-scale Chinese-English translation task. The experiment results show an average improvement of 0.33 BLEU point across 8 test sets.
|
9:25–9:50 |
Fast Consensus Hypothesis Regeneration for Machine Translation
Boxing Chen, George Foster and Roland Kuhn
This paper presents a fast consensus hypothesis regeneration approach for machine translation. It combines the advantages of feature-based fast consensus decoding and hypothesis regeneration. Our approach is more efficient than previous work on hypothesis regeneration, and it explores a wider search space than consensus decoding, resulting in improved performance. Experimental results show consistent improvements across language pairs, and an improvement of up to 0.72 BLEU is obtained over a competitive single-pass baseline on the Chinese-to-English NIST task.
|
|
9:50–10:45
|
Shared Translation Task
9:50–10:15 |
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Chris Callison-Burch, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki and Omar Zaidan
This paper presents the results of the WMT10 and MetricsMATR10 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of machine translation systems and system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality. This year we also investigated increasing the number of human judgments by hiring non-expert annotators through Amazon’s Mechanical Turk.
|
10:15–10:45 |
Boaster Session 1: Translation Task
|
|
10:45–11:00
|
Morning Break
|
11:00–12:30
|
Poster Session: Translation Task
101 |
LIMSI’s Statistical Translation Systems for WMT’10
Alexandre Allauzen, Josep M. Crego, İlknur Durgar El-Kahlout and François Yvon
This paper describes our Statistical Machine Translation systems for the WMT10 evaluation, where LIMSI participated for two language pairs (French-English and German-English, in both directions). For German-English, we concentrated on normalizing the German side through proper preprocessing, aimed at reducing lexical redundancy and at splitting complex compounds. For French-English, we studied two extensions of our in-house N-code decoder: first, the effect of integrating a new bilingual reordering model; second, the use of adaptation techniques for the translation model. For both sets of experiments, we report the improvements obtained on the development and test data.
|
102 |
2010 Failures in English-Czech Phrase-Based MT
Ondřej Bojar and Kamil Kos
The paper describes our experiments with English-Czech machine translation for WMT10 in 2010. Focusing primarily on the translation to Czech, our additions to the standard Moses phrase-based MT pipeline include two-step translation to overcome target-side data sparseness and optimization towards SemPOS, a metric better suited for evaluating Czech. Unfortunately, none of the approaches bring a significant improvement over our standard setup.
|
103 |
An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
Hui Cong, Zhao Hai, Lu Bao-Liang and Song Yan
In this paper we describe our system for the WMT10 machine translation shared task and discuss development set selection. Comparing the results obtained using different development sets and batch processing, we think that the choice of development set plays an important role in translation performance; in other words, translation of unseen text is tuning-sensitive. We have found that a combined development set leads to results that are more stable and good enough. The next step is to identify the specific critical factors in development set selection that can guide us in improving translation performance.
|
104 |
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
Vladimir Eidelman, Chris Dyer and Philip Resnik
This paper describes the system we developed to improve German-English translation of News text for the shared task of the Fifth Workshop on Statistical Machine Translation. Working within cdec, an open source modular framework for machine translation, we explore the benefits of several modifications to our hierarchical phrase-based model, including segmentation lattices, minimum Bayes Risk decoding, grammar extraction methods, and varying language models. Furthermore, we analyze decoder speed and memory performance across our set of models and show there is an important trade-off that needs to be made.
|
105 |
Further Experiments with Shallow Hybrid MT Systems
Christian Federmann, Andreas Eisele, Yu Chen, Sabine Hunsicker, Jia Xu and Hans Uszkoreit
We describe our hybrid machine translation system which has been developed for and used in the WMT10 shared task. We compute translations from a rule-based MT system and combine the resulting translation “templates” with partial phrases from a state-of-the-art phrase-based, statistical MT engine. Phrase substitution is guided by several decision factors, a continuation of previous work within our group. For the shared task, we have computed translations for six language pairs including English, German, French and Spanish. Our experiments have shown that our shallow substitution approach can effectively improve the translation result from the RBMT system; however it has also become clear that a deeper integration is needed to further improve translation quality.
|
106 |
Improved Features and Grammar Selection for Syntax-Based MT
Greg Hanneman, Jonathan Clark and Alon Lavie
show abstracthide abstractWe present the Carnegie Mellon University Stat-XFER group submission to the WMT 2010 shared translation task. Updates to our syntax-based SMT system mainly fell in the areas of new feature formulations in the translation model and improved filtering of SCFG rules. Compared to our WMT 2009 submission, we report a gain of 1.73 BLEU by using the new features and decoding environment, and a gain of up to 0.52 BLEU from improved grammar selection.
|
107 |
FBK at WMT 2010: Word Lattices for Morphological Reduction and Chunk-based Reordering
Christian Hardmeier, Arianna Bisazza and Marcello Federico
show abstracthide abstractFBK participated in the WMT~2010 Machine Translation shared task with phrase-based Statistical Machine Translation systems based on the Moses decoder for English-German and German-English translation. Our work concentrates on exploiting the available language modelling resources by using linear mixtures of large 6-gram language models and on addressing linguistic differences between English and German with methods based on word lattices. In particular, we use lattices to integrate a morphological analyser for German into our system, and we present some initial work on rule-based word reordering.
|
109 |
The RWTH Aachen Machine Translation System for WMT 2010
Carmen Heger, Joern Wuebker, Matthias Huck, Gregor Leusch, Saab Mansour, Daniel Stein and Hermann Ney
show abstracthide abstractIn this paper we describe the statistical machine translation system of the RWTH Aachen University developed for the translation task of the Fifth Workshop on Statistical Machine Translation. State-of-the-art phrase-based and hierarchical statistical MT systems are augmented with appropriate morpho-syntactic enhancements, as well as alternative phrase training methods and extended lexicon models. For some tasks, a system combination of the best systems was used to generate a final hypothesis. We participated in the constrained condition of German-English and French-English in each translation direction.
|
110 |
Using Collocation Segmentation to Augment the Phrase Table
Carlos A. Henríquez Q., Marta Ruiz Costa-jussà, Vidas Daudaravicius, Rafael E. Banchs and José B. Mariño
show abstracthide abstractThis paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpora before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets.
|
111 |
The RALI Machine Translation System for WMT 2010
Stéphane Huet, Julien Bourdaillet, Alexandre Patry and Philippe Langlais
show abstracthide abstractWe describe our system for the translation task of WMT 2010. This system, developed for the English-French and French-English directions, is based on Moses and was trained using only the resources supplied for the workshop. We report experiments to enhance it with out-of-domain parallel corpora sub-sampling, N-best list post-processing and a French grammatical checker.
|
112 |
Exodus - Exploring SMT for EU Institutions
Michael Jellinghaus, Alexandros Poulis and David Kolovratník
show abstracthide abstractIn this paper, we describe Exodus, a joint pilot project of the European Commission’s Directorate-General for Translation (DGT) and the European Parliament’s Directorate-General for Translation (DG TRAD) which explores the potential of deploying new approaches to machine translation in European institutions. We have participated in the English-to-French track of this year’s WMT10 shared translation task using a system trained on data previously extracted from large in-house translation memories.
|
113 |
More Linguistic Annotation for Statistical Machine Translation
Philipp Koehn, Barry Haddow, Philip Williams and Hieu Hoang
show abstracthide abstractWe report on efforts to build large-scale translation systems for eight European language pairs. We achieve most gains from the use of larger training corpora and basic modeling, but also show promising results from integrating more linguistic annotation.
|
114 |
LIUM SMT Machine Translation System for WMT 2010
Patrik Lambert, Sadaf Abdul-Rauf and Holger Schwenk
show abstracthide abstractThis paper describes the development of French–English and English–French machine translation systems for the 2010 WMT shared task evaluation. These systems were standard phrase-based statistical systems based on the Moses decoder, trained on the provided data only. Most of our efforts were devoted to the choice and extraction of bilingual data used for training. We filtered out some bilingual corpora and pruned the phrase table. We also investigated the impact of adding two types of additional bilingual texts, extracted automatically from the available monolingual data. We first collected bilingual data by performing automatic translations of monolingual texts. The second type of bilingual text was harvested from comparable corpora with Information Retrieval techniques.
|
115 |
Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin, Boxing Chen, George Foster, Ulrich Germann, Eric Joanis, Howard Johnson and Roland Kuhn
show abstracthide abstractNRC’s Portage system participated in the English-French (E-F) and French-English (F-E) translation tasks of the ACL WMT 2010 evaluation. The most notable improvement over earlier versions of Portage is an efficient implementation of lattice MERT. While Portage has typically performed well in Chinese to English MT evaluations, most recently in the NIST09 evaluation, our participation in WMT 2010 revealed some interesting differences between Chinese-English and E-F/F-E translation, and alerted us to certain weak spots in our system. Most of this paper discusses the problems we found in our system and ways of fixing them. We learned several lessons that we think will be of general interest.
|
116 |
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Ziyuan Wang, Jonathan Weese and Omar Zaidan
show abstracthide abstractWe describe the progress we have made in the past year on Joshua (Li et al., 2009), an open source toolkit for parsing based machine translation. The new functionality includes: support for translation grammars with a rich set of syntactic nonterminals, the ability for external modules to posit constraints on how spans in the input sentence should be translated, lattice parsing for dealing with input uncertainty, a semiring framework that provides a unified way of doing various dynamic programming calculations, variational decoding for approximating the intractable MAP decoding, hypergraph-based discriminative training for better feature engineering, a parallelized MERT module, document-level and tail-based MERT, visualization of the derivation trees, and a cleaner pipeline for MT experiments.
|
117 |
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
Jan Niehues, Teresa Herrmann, Mohammed Mediani and Alex Waibel
show abstracthide abstractThis paper describes our phrase-based Statistical Machine Translation (SMT) system for the WMT10 Translation Task. We submitted translations for the German to English and English to German translation tasks. Compared to state-of-the-art phrase-based systems we performed additional preprocessing and used a discriminative word alignment approach. The word reordering was modeled using POS information and we extended the translation model with additional features.
|
118 |
MATREX: The DCU MT System for WMT 2010
Sergio Penkale, Rejwanul Haque, Sandipan Dandapat, Pratyush Banerjee, Ankit K. Srivastava, Jinhua Du, Pavel Pecina, Sudip Kumar Naskar, Mikel L. Forcada and Andy Way
show abstracthide abstractThis paper describes the DCU machine translation system in the evaluation campaign of the Joint Fifth Workshop on Statistical Machine Translation and Metrics in ACL-2010. We describe the modular design of our multi-engine machine translation (MT) system with particular focus on the components used in this participation. We participated in the English-Spanish and English-Czech translation tasks, in which we employed our multi-engine architecture to translate. We also participated in the system combination task which was carried out by the MBR decoder and confusion network decoder.
|
119 |
The Cunei Machine Translation Platform for WMT ’10
Aaron Phillips
show abstracthide abstractThis paper describes the Cunei Machine Translation Platform and how it was used in the WMT ’10 German to English and Czech to English translation tasks.
|
120 |
The CUED HiFST System for the WMT10 Translation Shared Task
Juan Pino, Gonzalo Iglesias, Adrià de Gispert, Graeme Blackwood, Jamie Brunning and William Byrne
show abstracthide abstractThis paper describes the Cambridge University Engineering Department submission to the Fifth Workshop on Statistical Machine Translation. We report results for the French-English and Spanish-English shared translation tasks in both directions. The CUED system is based on HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers. In the French-English task, we investigate the use of context-dependent alignment models. We also show that lattice minimum Bayes-risk decoding is an effective framework for multi-source translation, leading to large gains in BLEU score.
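For readers unfamiliar with minimum Bayes-risk decoding, the sketch below shows the idea over a plain N-best list, with a crude unigram-overlap gain standing in for BLEU. This is a deliberately simplified picture of the lattice-based MBR used by the CUED system; the gain function, scores and hypotheses are invented for illustration.

```python
import math

def gain(hyp, other):
    """Crude stand-in for sentence-level BLEU: unigram overlap ratio."""
    overlap = len(set(hyp) & set(other))
    return overlap / max(len(set(hyp)), 1)

def mbr_decode(nbest):
    """nbest: list of (tokens, model_log_score); returns the minimum-risk hypothesis."""
    # Turn model scores into a posterior distribution over the N-best list.
    logs = [score for _, score in nbest]
    m = max(logs)
    weights = [math.exp(score - m) for score in logs]
    z = sum(weights)
    posteriors = [w / z for w in weights]
    # Choose the hypothesis with the highest expected gain against all others.
    best_hyp, best_expected = None, float("-inf")
    for hyp, _ in nbest:
        expected = sum(p * gain(hyp, other)
                       for (other, _), p in zip(nbest, posteriors))
        if expected > best_expected:
            best_hyp, best_expected = hyp, expected
    return best_hyp

nbest = [(["the", "plan", "was", "rejected"], -10.2),
         (["the", "plan", "rejected", "was"], -10.5),
         (["a", "plan", "was", "rejected"], -11.0)]
print(mbr_decode(nbest))
```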
|
121 |
The LIG Machine Translation System for WMT 2010
Marion Potet, Laurent Besacier and Hervé Blanchon
show abstracthide abstractThis paper describes the system submitted by the Laboratory of Informatics of Grenoble (LIG) for the fifth Workshop on Statistical Machine Translation. We participated in the news shared translation task for the French-English language pair. We investigated different techniques for dealing simply with Out-Of-Vocabulary words in a statistical phrase-based machine translation system and analyzed their impact on translation quality. The final submission is a combination of a standard phrase-based system using the Moses decoder, with appropriate setups and pre-processing, and a lemmatized one to prevent Out-Of-Vocabulary conjugated verbs.
|
122 |
Linear Inversion Transduction Grammar Alignments as a Second Translation Path
Markus Saers, Joakim Nivre and Dekai Wu
show abstracthide abstractWe explore the possibility of using Stochastic Bracketing Linear Inversion Transduction Grammars for a full-scale German–English translation task, both on their own and in conjunction with alignments induced with GIZA++. The rationale for transduction grammars, the details of the system and some results are presented.
|
123 |
UPV-PRHLT English–Spanish System for WMT10
Germán Sanchis-Trilles, Jesús Andrés-Ferrer, Guillem Gascó, Jesús González Rubio, Pascual Martínez-Gómez, Martha-Alicia Rocha, Joan-Andreu Sánchez and Francisco Casacuberta
show abstracthide abstractIn this paper, the system submitted by the PRHLT group for the Fifth Workshop on Statistical Machine Translation of ACL 2010 is presented. In this evaluation campaign, we worked on the English–Spanish language pair, putting special emphasis on two problems derived from the large amount of data available. The first is how to optimize the use of the monolingual data within the language model, and the second is how to make good use of all the bilingual data provided without consuming unnecessary computational resources.
|
124 |
Reproducible Results in Parsing-Based Machine Translation: The JHU Shared Task Submission
Lane Schwartz
show abstracthide abstractWe present the Johns Hopkins University submission to the 2010 WMT shared translation task. We describe processing steps using open data and open source software used in our submission, and provide the scripts and configurations required to train, tune, and test our machine translation system.
|
125 |
Vs and OOVs: Two Problems for Translation between German and English
Sara Stymne, Maria Holmqvist and Lars Ahrenberg
show abstracthide abstractIn this paper we report on experiments with three preprocessing strategies for improving translation output in a statistical MT system. In training, two reordering strategies were studied: (i) reorder on the basis of the alignments from Giza++, and (ii) reorder by moving all verbs to the end of segments. In translation, out-of-vocabulary words were preprocessed in a knowledge-lite fashion to identify a likely equivalent. All three strategies were implemented for our translation systems between English and German submitted to the WMT10 shared task. Reordering by using Giza++ in two phases had a small, but consistent positive effect on metrics for our systems. Aligning verbs by co-locating them at the end of sentences had a largely negative effect. However, it seems that this strategy produced some useful alignments, since when its output was concatenated with the baseline alignment before extracting the phrase table, there were consistent improvements. Combining reordering in training with the knowledge-lite method for handling out-of-vocabulary words led to significant improvements on Meteor scores for translation between German and English in both directions.
|
126 |
To Cache or not to Cache? Experiments with Adaptive Models in Statistical Machine Translation
Jörg Tiedemann
show abstracthide abstractWe report results of our submissions to the WMT 2010 shared translation task in which we applied a system that includes adaptive language and translation models. Adaptation is implemented using exponentially decaying caches storing previous translations as the history for new predictions. Evidence from the cache is then mixed with the global background model. The main problem in this setup is error propagation and our submissions essentially failed to improve over the competitive baseline. There are slight improvements in lexical choice but the global performance decreases in terms of BLEU scores.
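A minimal sketch of the caching idea described in this abstract, assuming a unigram cache, an invented decay rate and an invented interpolation weight; the actual models in the paper are full adaptive language and translation models, not this toy.

```python
from collections import defaultdict

class DecayingCacheLM:
    """Toy unigram cache mixed with a static background model.

    Illustrative only: the decay rate, mixture weight and unigram
    granularity are assumptions, not the setup used in the paper.
    """

    def __init__(self, background, decay=0.9, cache_weight=0.1):
        self.background = background          # dict: word -> P_bg(word)
        self.decay = decay                    # exponential decay per update
        self.cache_weight = cache_weight      # mixture weight for the cache
        self.cache = defaultdict(float)

    def update(self, translated_words):
        """Decay old evidence, then add the newly produced translation."""
        for w in self.cache:
            self.cache[w] *= self.decay
        for w in translated_words:
            self.cache[w] += 1.0

    def prob(self, word):
        """Mix cache and background estimates: (1 - l) * P_bg + l * P_cache."""
        total = sum(self.cache.values())
        p_cache = self.cache[word] / total if total > 0 else 0.0
        p_bg = self.background.get(word, 1e-6)
        return (1 - self.cache_weight) * p_bg + self.cache_weight * p_cache

# Usage: score words while feeding back each translated sentence as history.
lm = DecayingCacheLM({"house": 0.01, "bank": 0.02})
lm.update(["the", "bank", "was", "closed"])
print(lm.prob("bank"))
```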
|
127 |
Applying Morphological Decompositions to Statistical Machine Translation
Sami Virpioja, Jaakko Väyrynen, Andre Mansikkaniemi and Mikko Kurimo
show abstracthide abstractThis paper describes the Aalto submission for the German-to-English and the Czech-to-English translation tasks of the ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Statistical machine translation has focused on using words, and longer phrases constructed from words, as tokens in the system. In contrast, we apply different morphological decompositions of words using the unsupervised Morfessor algorithms. While translation models trained using the morphological decompositions did not improve the BLEU scores, we show that the Minimum Bayes Risk combination with a word-based translation model produces significant improvements for the German-to-English translation. However, we did not see improvements for the Czech-to-English translations.
|
128 |
Maximum Entropy Translation Model in Dependency-Based MT Framework
Zdeněk Žabokrtský, Martin Popel and David Mareček
show abstracthide abstractMaximum Entropy Principle has been used successfully in various NLP tasks. In this paper we propose a forward translation model consisting of a set of maximum entropy classifiers: a separate classifier is trained for each (sufficiently frequent) source-side lemma. In this way the estimates of translation probabilities can be sensitive to a large number of features derived from the source sentence (including non-local features, features making use of sentence syntactic structure, etc.). When integrated into English-to-Czech dependency-based translation scenario implemented in the TectoMT framework, the new translation model significantly outperforms the baseline model (MLE) in terms of BLEU. The performance is further boosted in a configuration inspired by Hidden Tree Markov Models which combines the maximum entropy translation model with the target-language dependency tree model.
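As a rough illustration of the "one classifier per source lemma" design, the sketch below trains a separate discriminative classifier for each source lemma on context features. Scikit-learn's logistic regression is used here as a stand-in maximum entropy learner, and the feature names, toy examples and Czech targets are assumptions made purely for the example.

```python
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training examples: (source lemma, context features, target lemma).
examples = [
    ("bank", {"parent_lemma": "sit", "near_river": True}, "břeh"),
    ("bank", {"parent_lemma": "rob", "near_river": False}, "banka"),
    ("bank", {"parent_lemma": "enter", "near_river": False}, "banka"),
]

# Group data by source lemma and train a separate classifier for each lemma.
by_lemma = defaultdict(list)
for lemma, feats, target in examples:
    by_lemma[lemma].append((feats, target))

models = {}
for lemma, data in by_lemma.items():
    feats, targets = zip(*data)
    vec = DictVectorizer()
    X = vec.fit_transform(feats)
    clf = LogisticRegression(max_iter=1000).fit(X, targets)
    models[lemma] = (vec, clf)

# Translation probabilities conditioned on rich source-side context features.
vec, clf = models["bank"]
probs = clf.predict_proba(
    vec.transform([{"parent_lemma": "sit", "near_river": True}]))
print(dict(zip(clf.classes_, probs[0])))  # should lean toward "břeh"
```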
|
129 |
UCH-UPV English–Spanish system for WMT10
Francisco Zamora-Martinez and Germán Sanchis-Trilles
show abstracthide abstractThis paper describes the system developed in collaboration between UCH and UPV for the 2010 WMT. For this year’s workshop, we present a system for English-Spanish translation. Output N-best lists were rescored via a target Neural Network Language Model, yielding improvements in the final translation quality as measured by BLEU and TER.
|
130 |
Hierarchical Phrase-Based MT at the Charles University for the WMT 2010 Shared Task
Daniel Zeman
show abstracthide abstractWe describe our experiments with hierarchical phrase-based machine translation for the WMT 2010 Shared Task. We provide a detailed description of our configuration and data so that the results are replicable. For English-to-Czech translation, we experiment with several datasets of various sizes and with various preprocessing sequences. For the other 7 translation directions, we present only the baseline results.
|
|
12:30–14:00
|
Lunch
|
14:00–15:00
|
|
15:05–15:30
|
Full Paper Session 2
15:05–15:30 |
Incremental Decoding for Phrase-based Statistical Machine Translation
Baskaran Sankaran, Ajeet Grewal and Anoop Sarkar
show abstracthide abstractIn this paper we focus on incremental decoding for a statistical machine translation system. In incremental decoding, translations are generated incrementally for every word typed by a user, instead of waiting for the entire sentence as input. We propose a novel modification to the beam-search decoder to address this issue in a phrase-based setting, aimed at efficient computation of future costs and avoiding search errors. Our objective is faster translation during incremental decoding without a significant reduction in translation quality as measured by BLEU.
|
|
15:30–16:00
|
Afternoon Break
|
16:00–17:40
|
Full Paper Session 3
16:00–16:25 |
How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing
Fabienne Fritzinger and Alexander Fraser
show abstracthide abstractCompound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting more often produce a correct splitting, but corpus-based approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-based statistics and obtaining better results in terms of both producing splittings according to a gold standard and statistical machine translation performance.
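For background on the corpus-based side of the comparison, the sketch below implements the common frequency-driven splitting baseline (in the spirit of Koehn and Knight, 2003): choose the split whose parts have the highest geometric mean of corpus frequencies. It is a generic baseline, not the combined linguistic/statistical method the paper proposes, and the toy frequencies are invented.

```python
from itertools import combinations

def candidate_splits(word, min_part=3):
    """All ways to cut `word` into parts of at least `min_part` characters."""
    yield (word,)  # the unsplit word is always a candidate
    positions = range(min_part, len(word) - min_part + 1)
    for k in range(1, 3):  # allow up to two cut points for simplicity
        for cuts in combinations(positions, k):
            parts, prev = [], 0
            for c in cuts:
                parts.append(word[prev:c])
                prev = c
            parts.append(word[prev:])
            if all(len(p) >= min_part for p in parts):
                yield tuple(parts)

def best_split(word, freq):
    """Pick the split maximizing the geometric mean of part frequencies."""
    def score(parts):
        prod = 1.0
        for p in parts:
            prod *= freq.get(p.lower(), 0)
        return prod ** (1.0 / len(parts))
    return max(candidate_splits(word), key=score)

# Toy corpus counts; real systems use counts from large monolingual data.
freq = {"aktion": 1200, "aktions": 400, "plan": 5000, "aktionsplan": 30}
print(best_split("Aktionsplan", freq))  # -> ('Aktions', 'plan')
```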
|
16:25–16:50 |
Chunk-based Verb Reordering in VSO Sentences for Arabic-English Statistical Machine Translation
Arianna Bisazza and Marcello Federico
show abstracthide abstractIn Arabic-to-English phrase-based statistical machine translation, a large number of syntactic disfluencies are due to wrong long-range reordering of the verb in VSO sentences, where the verb is anticipated with respect to the English word order. In this paper, we propose a chunk-based reordering technique to automatically detect and displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is applied to preprocess the training data, and to collect statistics about verb movements. From this analysis, specific verb reordering lattices are then built on the test sentences before decoding them. The application of our reordering methods on the training and test sets results in consistent BLEU score improvements on the NIST-MT 2009 Arabic-English benchmark.
|
16:50–17:15 |
Head Finalization: A Simple Reordering Rule for SOV Languages
Hideki Isozaki, Katsuhito Sudoh, Hajime Tsukada and Kevin Duh
show abstracthide abstractEnglish is a typical SVO (Subject-Verb-Object) language, while Japanese is a typical SOV language. Conventional Statistical Machine Translation (SMT) systems work well within each of these language families. However, SMT-based translation from an SVO language to an SOV language does not work well because their word orders are completely different. Recently, a few groups have proposed rule-based preprocessing methods to mitigate this problem (Xu et al., 2009; Hong et al., 2009). These methods rewrite SVO sentences to derive more SOV-like sentences by using a set of handcrafted rules. In this paper, we propose an alternative single reordering rule: Head Finalization. This is a syntax-based preprocessing approach that offers the advantage of simplicity. We do not have to be concerned about part-of-speech tags or rule weights because the powerful Enju parser allows us to implement the rule at a general level. Our experiments show that its result, Head Final English (HFE), follows almost the same order as Japanese. We also show that this rule improves automatic evaluation scores.
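To make the single reordering rule concrete, here is a toy rendering of head finalization: every head is emitted after all of its dependents. The dictionary-based tree format and the example sentence are assumptions for illustration; the paper applies the rule to Enju (HPSG) parses and includes refinements not shown here.

```python
def head_finalize(tree):
    """Recursively order a node's dependents before the node itself.

    `tree` is a dict {"word": str, "deps": [subtrees]} -- a toy format,
    not the Enju representation used in the paper.
    """
    words = []
    for dep in tree.get("deps", []):
        words.extend(head_finalize(dep))
    words.append(tree["word"])
    return words

# "John hit the ball": 'hit' heads 'John' and 'ball', 'the' depends on 'ball'.
sent = {"word": "hit",
        "deps": [{"word": "John", "deps": []},
                 {"word": "ball", "deps": [{"word": "the", "deps": []}]}]}
print(" ".join(head_finalize(sent)))  # -> "John the ball hit" (SOV-like order)
```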
|
17:15–17:40 |
Aiding Pronoun Translation with Co-Reference Resolution
Ronan Le Nagard and Philipp Koehn
show abstracthide abstractWe propose a method to improve the translation of pronouns by resolving their co-reference to prior mentions. We report results using two different co-reference resolution methods and point to remaining challenges.
|
|
Friday, July 16, 2010 |
9:00–11:00
|
Shared Task Presentations
9:00–10:00 |
|
10:00–10:30 |
|
10:30–10:45 |
|
10:45–11:00 |
|
|
11:00–12:30
|
Poster Sessions
|
101 |
Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models
David Vilar, Daniel Stein, Matthias Huck and Hermann Ney
show abstracthide abstractWe present Jane, RWTH’s hierarchical phrase-based translation system, which has been open sourced for the scientific community. This system has been in development at RWTH for the last two years and has been successfully applied in different machine translation evaluations. It includes extensions to the hierarchical approach developed by RWTH as well as other research institutions. In this paper we give an overview of its main features. We also introduce a novel reordering model for the hierarchical phrase-based approach which further enhances translation performance, and analyze the effect some recent extended lexicon models have on the performance of the system.
|
|
102 |
MANY: Open Source MT System Combination at WMT’10
Loïc Barrault
show abstracthide abstractLIUM participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). Hypotheses from 5 French/English MT systems were combined with MANY, an open source system combination software based on confusion networks currently developed at LIUM. The system combination yielded significant improvements in BLEU score when applied on WMT’09 data. The same behavior has been observed when tuning is performed on the development data of this year’s evaluation.
|
103 |
Adaptive Model Weighting and Transductive Regression for Predicting Best System Combinations
Ergun Bicici and S. Serdar Kozat
show abstracthide abstractWe analyze adaptive model weighting techniques for reranking using instance scores obtained by L1 regularized transductive regression. Competitive statistical machine translation is an on-line learning technique for sequential translation tasks where we try to select the best among competing statistical machine translators. The competitive predictor assigns a probability per model weighted by the sequential performance. We define additive, multiplicative, and loss-based weight updates with exponential loss functions for competitive statistical machine translation. Without any prior knowledge of the performance of the translation models, we succeed in achieving the performance of the best model in all systems and surpass their performance in most of the language pairs we considered.
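A minimal sketch of a multiplicative weight update with an exponential loss, which is the general flavour of update described above; the learning rate, loss values and normalization are invented for the example and do not reproduce the paper's exact update rules.

```python
import math

def update_weights(weights, losses, eta=0.5):
    """Multiplicative update with exponential loss: downweight models that did badly."""
    new = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

# Three competing MT systems; a loss could be e.g. 1 - sentence-level BLEU.
weights = [1 / 3, 1 / 3, 1 / 3]
for losses in [[0.4, 0.7, 0.6], [0.3, 0.8, 0.5], [0.2, 0.9, 0.6]]:
    weights = update_weights(weights, losses)
print(weights)  # probability mass shifts toward the consistently best system
```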
|
104 |
L1 Regularized Regression for Reranking and System Combination in Machine Translation
Ergun Bicici and Deniz Yuret
show abstracthide abstractWe use L1 regularized transductive regression to learn mappings between source and target features of the training sets derived for each test sentence and use these mappings to rerank translation outputs. We compare the effectiveness of L1 regularization techniques for regression to learn mappings between features given in a sparse feature matrix. The results show the effectiveness of using L1 regularization versus the L2 regularization used in ridge regression. We show that regression mapping is effective in reranking translation outputs and in selecting the best system combinations, with encouraging results on different language pairs.
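As a simplified illustration of reranking with L1-regularized regression, the sketch below fits scikit-learn's Lasso to predict a quality score from hypothesis features and picks the highest-scoring hypothesis. The feature vectors, targets and the use of Lasso as the learner are assumptions; the transductive, per-test-sentence training described in the abstract is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: each row is a feature vector for one hypothesis, each y value a
# quality score (e.g. sentence-level BLEU) observed on held-out data.
X_train = np.array([[0.2, 3.0, 1.0],
                    [0.5, 2.0, 0.0],
                    [0.9, 1.0, 1.0],
                    [0.4, 4.0, 0.0]])
y_train = np.array([0.21, 0.35, 0.62, 0.18])

model = Lasso(alpha=0.01).fit(X_train, y_train)  # L1 penalty -> sparse weights

# Rerank an N-best list: pick the hypothesis with the highest predicted score.
nbest_feats = np.array([[0.3, 2.5, 1.0],
                        [0.8, 1.2, 0.0],
                        [0.6, 1.0, 1.0]])
scores = model.predict(nbest_feats)
print(int(np.argmax(scores)), scores)
```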
|
105 |
An Augmented Three-Pass System Combination Framework: DCU Combination System for WMT 2010
Jinhua Du, Pavel Pecina and Andy Way
show abstracthide abstractThis paper describes the augmented three-pass system combination framework of the Dublin City University (DCU) MT group for the WMT 2010 system combination task. The basic three-pass framework includes building individual confusion networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR) decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring component which is used to re-rank the N-best lists generated from the individual CNs and the super network, 2) a new hypothesis alignment metric – TERp – that is used to carry out English-targeted hypothesis alignment, and 3) additional backbone-based CNs which are employed to increase the diversity of the mConMBR decoding phase. We took part in the combination tasks of English-to-Czech and French-to-English. Experimental results show that our proposed combination framework achieved improvements of 2.17 absolute BLEU points (13.36 relative) on the English-to-Czech task and 1.52 absolute BLEU points (5.37 relative) on the French-to-English task over the best single system. We also achieved better performance in the human evaluation.
|
106 |
The UPV-PRHLT Combination System for WMT 2010
Jesús González Rubio, Germán Sanchis-Trilles, Joan-Andreu Sánchez, Jesús Andrés-Ferrer, Guillem Gascó, Pascual Martínez-Gómez, Martha-Alicia Rocha and Francisco Casacuberta
show abstracthide abstractUPV-PRHLT participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). On each translation direction, all the submitted systems were combined into a consensus translation. These consensus translations always improve translation quality of the best individual system.
|
107 |
CMU Multi-Engine Machine Translation for WMT 2010
Kenneth Heafield and Alon Lavie
show abstracthide abstractThis paper describes our submission, cmu-heafield-combo, to the WMT 2010 machine translation system combination task. Using constrained resources, we participated in all nine language pairs, namely translating English to and from Czech, French, German, and Spanish as well as combining English translations from multiple languages. Combination proceeds by aligning all pairs of system outputs then navigating the aligned outputs from left to right where each path is a candidate combination. Candidate combinations are scored by their length, agreement with the underlying systems, and a language model. On tuning data, improvement in BLEU over the best system depends on the language pair and ranges from 0.89% to 5.57% with mean 2.37%.
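The scoring of candidate combinations described above (length, agreement with the underlying systems, and a language model) can be pictured as a weighted sum of those components. The sketch below is a schematic scorer with invented weights and a stub language model, not the cmu-heafield-combo implementation.

```python
def agreement(candidate_tokens, system_outputs):
    """Average fraction of candidate tokens also present in each system output."""
    scores = []
    for out in system_outputs:
        bag = set(out)
        hits = sum(tok in bag for tok in candidate_tokens)
        scores.append(hits / max(len(candidate_tokens), 1))
    return sum(scores) / len(scores)

def lm_logprob(tokens):
    """Stub language model: replace with a real n-gram LM in practice."""
    return -2.0 * len(tokens)

def combo_score(candidate_tokens, system_outputs,
                w_len=-0.5, w_agree=3.0, w_lm=1.0):
    """Weighted sum of length penalty, system agreement and LM score (toy weights)."""
    return (w_len * len(candidate_tokens)
            + w_agree * agreement(candidate_tokens, system_outputs)
            + w_lm * lm_logprob(candidate_tokens))

systems = [["the", "vote", "was", "postponed"],
           ["the", "vote", "has", "been", "delayed"]]
candidate = ["the", "vote", "was", "delayed"]
print(combo_score(candidate, systems))
```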
|
108 |
CMU System Combination via Hypothesis Selection for WMT’10
Almut Silja Hildebrand and Stephan Vogel
show abstracthide abstractThis paper describes the CMU entry for the system combination shared task at WMT’10. Our combination method is hypothesis selection, which uses information from n-best lists from the input MT systems, where available. The sentence level features used are independent of the MT systems involved. Compared to the baseline we added source-to-target word alignment based features and trained system weights to our feature set. We combined MT systems for French-English and German-English using provided data only.
|
109 |
JHU System Combination Scheme for WMT 2010
Sushant Narsale
show abstracthide abstractThis paper describes the JHU system combination scheme that was used in the WMT 2010 submission.
|
110 |
The RWTH System Combination System for WMT 2010
Gregor Leusch and Hermann Ney
show abstracthide abstractRWTH participated in the System Combination task of the Fifth Workshop on Statistical Machine Translation (WMT 2010). For 7 of the 8 language pairs, we combine 5 to 13 systems into a single consensus translation, using additional n-best reranking techniques in two of these language pairs. Depending on the language pair, improvements over the best single system range between +0.5 and +1.7 on BLEU, and between -0.4 and -2.3 on TER. Novel techniques compared with RWTH’s submission to WMT 2009 include the utilization of n-best reranking techniques, a consensus true-casing approach, a different tuning algorithm, and the separate selection of input systems for CN construction, primary/skeleton hypotheses, HypLM, and true casing.
|
111 |
BBN System Description for WMT10 System Combination Task
Antti-Veikko Rosti, Bing Zhang, Spyros Matsoukas and Richard Schwartz
show abstracthide abstractBBN submitted system combination outputs for Czech-English, German-English, Spanish-English, French-English, and All-English language pairs. All combinations were based on confusion network decoding. An incremental hypothesis alignment algorithm with flexible matching was used to build the networks. The bi-gram decoding weights for the single source language translations were tuned directly to maximize the BLEU score of the decoding output. Approximate expected BLEU was used as the objective function in gradient based optimization of the combination weights for a 44-system multi-source language combination (All-English). The system combination gained around 0.4-2.0 BLEU points over the best individual systems on the single source conditions. On the multi-source condition, the system combination gained 6.6 BLEU points.
|
|
112 |
LRscore for Evaluating Lexical and Reordering Quality in MT
Alexandra Birch and Miles Osborne
show abstracthide abstractThe ability to measure the quality of word order in translations is an important goal for research in machine translation. Current machine translation metrics do not adequately measure the reordering performance of translation systems. We present a novel metric, the LRscore, which directly measures reordering success. The reordering component is balanced by a lexical metric. Capturing the two most important elements of translation success in a simple combined metric with only one parameter results in an intuitive, shallow, language independent metric.
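A minimal sketch of a single-parameter interpolation between a reordering score and a lexical score, in the spirit of the metric described above; the Kendall-tau-style reordering component, the unigram-precision stand-in for the lexical component, and the parameter value are all assumptions for illustration.

```python
def reordering_score(target_positions):
    """Share of concordant pairs in a permutation (1.0 = fully monotone)."""
    n = len(target_positions)
    if n < 2:
        return 1.0
    concordant = sum(1
                     for i in range(n)
                     for j in range(i + 1, n)
                     if target_positions[i] < target_positions[j])
    return concordant / (n * (n - 1) / 2)

def unigram_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision as a crude lexical component."""
    remaining = list(ref_tokens)
    hits = 0
    for tok in hyp_tokens:
        if tok in remaining:
            remaining.remove(tok)
            hits += 1
    return hits / max(len(hyp_tokens), 1)

def lr_style_score(reorder, lexical, alpha=0.5):
    """Single-parameter linear interpolation of reordering and lexical quality."""
    return alpha * reorder + (1 - alpha) * lexical

print(lr_style_score(reordering_score([0, 2, 1, 3]),
                     unigram_precision(["the", "vote", "was", "delayed"],
                                       ["the", "vote", "was", "postponed"])))
```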
|
113 |
Document-level Automatic MT Evaluation based on Discourse Representations
Elisabet Comelles, Jesus Gimenez, Lluis Marquez, Irene Castellon and Victoria Arranz
show abstracthide abstractThis paper describes the joint submission of Universitat Politècnica de Catalunya and Universitat de Barcelona to the Metrics MaTr 2010 evaluation challenge, in collaboration with ELDA/ELRA. Our work is aimed at widening the scope of current automatic evaluation measures from sentence to document level. Preliminary experiments, based on an extension of the metrics by Gimenez and Marquez (2009) operating over discourse representations, are presented.
|
114 |
METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support for Five Target Languages
Michael Denkowski and Alon Lavie
show abstracthide abstractThis paper describes our submission to the WMT10 Shared Evaluation Task and MetricsMATR10. We present a version of the METEOR-NEXT metric with paraphrase tables for five target languages. We describe the creation of these paraphrase tables and conduct a tuning experiment that demonstrates consistent improvement across all languages over baseline versions of the metric without paraphrase resources.
|
115 |
Normalized Compression Distance Based Measures for MetricsMATR 2010
Marcus Dobrinkat, Tero Tapiovaara, Jaakko Väyrynen and Kimmo Kettunen
show abstracthide abstractWe present the MT-NCD and MT-mNCD machine translation evaluation metrics as a submission to the machine translation evaluation shared task (MetricsMATR 2010). The metrics are based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and are evaluated against human judgments from the WMT08 shared task. The experiments show that 1) our metric improves correlation to human judgments by using flexible matching, 2) segment replication is effective, and 3) our NCD-inspired method for multiple references indicates improved results. Generally, the proposed MT-NCD and MT-mNCD methods correlate with human judgments competitively compared to commonly used machine translation evaluation metrics, for instance, BLEU.
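Normalized compression distance has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The sketch below computes it with zlib as the compressor; it illustrates generic NCD only, not the full MT-NCD/MT-mNCD scoring pipeline with flexible matching and segment replication.

```python
import zlib

def c(s: str) -> int:
    """Compressed length in bytes, a practical proxy for Kolmogorov complexity."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: smaller means more similar strings."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# On realistic segment lengths, a hypothesis closer to the reference tends to
# receive a lower distance; very short strings are noisy due to compressor overhead.
ref = "the committee approved the proposal yesterday"
hyp_good = "the committee approved the proposal on yesterday"
hyp_bad = "proposal committee the approve"
print(ncd(ref, hyp_good), ncd(ref, hyp_bad))
```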
|
116 |
The DCU Dependency-Based Metric in WMT-MetricsMATR 2010
Yifan He, Jinhua Du, Andy Way and Josef van Genabith
show abstracthide abstractWe describe DCU’s LFG dependency-based metric submitted to the shared evaluation task of WMT-MetricsMATR 2010. The metric is built on the LFG F-structure-based approach presented in (Owczarzak et al., 2007). We explore the following improvements on the original metric: 1) we replace the in-house LFG parser with an open source dependency parser that directly parses strings into LFG dependencies; 2) we add a stemming module and unigram paraphrases to strengthen the aligner; 3) we introduce a chunk penalty following the practice of Meteor to reward continuous matches; and 4) we introduce and tune parameters to maximize the correlation with human judgement. Experiments show that these enhancements improve the dependency-based metric’s correlation with human judgement.
|
117 |
TESLA: Translation Evaluation of Sentences with Linear-programming-based Analysis
Chang Liu, Daniel Dahlmeier and Hwee Tou Ng
show abstracthide abstractWe present TESLA-M and TESLA, two novel automatic machine translation evaluation metrics with state-of-the-art performances. TESLA-M builds on the success of METEOR and MaxSim, but employs a more expressive linear programming framework. TESLA further exploits parallel texts to build a shallow semantic representation. We evaluate both on the WMT 2009 shared evaluation task and show that they outperform all participating systems in most tasks.
|
118 |
The Parameter-optimized ATEC Metric for MT Evaluation
Billy Wong and Chunyu Kit
show abstracthide abstractThis paper describes the latest version of the ATEC metric for automatic MT evaluation, with parameters optimized for word choice and word order, the two fundamental features of language that the metric relies on. The former is assessed by matching at various linguistic levels and weighting the informativeness of both matched and unmatched words. The latter is quantified in terms of word position and information flow. We also discuss those aspects of language not yet covered by other existing evaluation metrics but carefully considered in the formulation of our metric.
|
|
12:30–14:00
|
Lunch
|
14:00–15:40
|
Full Paper Session 4
14:00–14:25 |
A Unified Approach to Minimum Risk Training and Decoding
Abhishek Arun, Barry Haddow and Philipp Koehn
show abstracthide abstractWe present a unified approach to performing minimum risk training and minimum Bayes risk (MBR) decoding with BLEU in a phrase-based model. Key to our approach is the use of a Gibbs sampler that allows us to explore the entire probability distribution and maintain a strict probabilistic formulation across the pipeline. We also describe a new sampling algorithm called corpus sampling which allows us at training time to use BLEU instead of an approximation thereof. Our approach is theoretically sound and gives better (up to +0.6% BLEU) and more stable results than the standard MERT optimization algorithm. By comparing our approach to lattice MBR, we are also able to gain crucial insights about both methods.
|
14:25–14:50 |
N-best Reranking by Multitask Learning
Kevin Duh, Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki and Masaaki Nagata
show abstracthide abstractWe propose a new framework for N-best reranking on sparse feature sets. The idea is to reformulate the reranking problem as a Multitask Learning problem, where each N-best list corresponds to a distinct task. This is motivated by the observation that N-best lists often show significant differences in feature distributions. Training a single reranker directly on this heterogeneous data can be difficult. Our proposed meta-algorithm solves this challenge by using multitask learning (such as l1/l2 regularization) to discover common feature representations across N-best lists. This meta-algorithm is simple to implement, and its modular approach allows one to plug in different learning algorithms from the existing literature. As a proof of concept, we show statistically significant improvements on a machine translation system involving millions of features.
|
14:50–15:15 |
Taming Structured Perceptrons on Wild Feature Vectors
Ralf Brown
show abstracthide abstractStructured perceptrons are attractive due to their simplicity and speed, and have been used successfully for tuning the weights of binary features in a machine translation system. When we attempted to apply them to tuning the weights of real-valued features with highly skewed distributions, we found that they did not work well. This paper describes a modification to the update step and compares the performance of the resulting algorithm to standard minimum error-rate training. In addition, preliminary results for combining MERT or structured-perceptron tuning of the log-linear feature weights with coordinate ascent of other translation system parameters are presented.
|
15:15–15:40 |
Translation Model Adaptation by Resampling
Kashif Shah, Loïc Barrault and Holger Schwenk
show abstracthide abstractThe translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model, for which well known techniques are used to weight the various sources of texts. At a smaller granularity, the automatically calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed.
|
|
15:40–16:00
|
Afternoon Break
|
16:00–17:40
|
Full Paper Session 5
16:00–16:25 |
Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
Michael Paul, Andrew Finch and Eiichiro Sumita
show abstracthide abstractThis paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair where the source language is unsegmented and the target language segmentation is known. First, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available state-of-the-art monolingually-built segmentation tools.
|
16:25–16:50 |
Improved Translation with Source Syntax Labels
Hieu Hoang and Philipp Koehn
show abstracthide abstractWe present a new translation model that includes undecorated hierarchical-style phrase rules, decorated source-syntax rules, and partially decorated rules. Results show an increase in translation performance of up to 0.8% BLEU for German-English translation when trained on the news-commentary corpus, using syntactic annotation from a source language parser. We also experimented with annotation from shallow taggers and found this increased performance by 0.5% BLEU.
|
16:50–17:15 |
Divide and Translate: Improving Long Distance Reordering in Statistical Machine Translation
Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Tsutomu Hirao and Masaaki Nagata
show abstracthide abstractThis paper proposes a novel method for long distance, clause-level reordering in statistical machine translation (SMT). The proposed method separately translates clauses in the source sentence and reconstructs the target sentence using the clause translations with non-terminals. The non-terminals are placeholders of embedded clauses, by which we reduce complicated clause-level reordering into simple word-level reordering. Its translation model is trained using a bilingual corpus with clause-level alignment, which can be automatically annotated by our alignment algorithm with a syntactic parser in the source language. We achieved significant improvements of 1.4% in BLEU and 1.3% in TER by using Moses, and 2.2% in BLEU and 3.5% in TER by using our hierarchical phrase-based SMT, for the English-to-Japanese translation of research paper abstracts in the medical domain.
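A toy sketch of the divide-and-translate idea: embedded clauses are replaced by non-terminal placeholders, translated separately, and substituted back into the translated main clause. The span format, placeholder naming and the dummy "translator" are assumptions for illustration, not the clause alignment or SMT components used in the paper.

```python
def divide_translate_reassemble(sentence, clause_spans, translate):
    """Translate embedded clauses separately, then the main clause with placeholders.

    `clause_spans` are (start, end) character offsets of embedded clauses and
    `translate` is any sentence-level MT function; both are simplifications of
    the clause segmentation and SMT systems described in the paper.
    """
    placeholders, skeleton, prev = {}, [], 0
    for i, (start, end) in enumerate(sorted(clause_spans)):
        label = f"__S{i}__"
        placeholders[label] = translate(sentence[start:end])
        skeleton.append(sentence[prev:start] + label)
        prev = end
    skeleton.append(sentence[prev:])
    main_translation = translate("".join(skeleton))
    # Substitute translated clauses back in place of the non-terminals.
    for label, text in placeholders.items():
        main_translation = main_translation.replace(label, text)
    return main_translation

# Toy "translator" that just upper-cases, to show the mechanics only.
print(divide_translate_reassemble(
    "the bill, which the senate amended, passed easily",
    [(10, 34)], lambda s: s.upper()))
```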
|
17:15–17:40 |
Decision Trees for Lexical Smoothing in Statistical Machine Translation
Rabih Zbib, Spyros Matsoukas, Richard Schwartz and John Makhoul
show abstracthide abstractWe present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-qualified source words, and smoothing their word translation lexical probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute-qualified source word clusters directly, or by using attribute-dependent lexical probabilities that are obtained from the trees, as a lexical smoothing feature in the decoder model. We present experiments using Arabic-to-English newswire data, and using Arabic diacritics and part-of-speech as source word attributes, and show that the proposed method improves on a state-of-the-art translation system.
|
|
July 15, 2010 |
09:00–10:40
|
Task description papers
09:00–09:20 |
SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
Marta Recasens, Lluís Màrquez, Emili Sapena, M. Antònia Martí, Mariona Taulé, Véronique Hoste, Massimo Poesio and Yannick Versley
show abstracthide abstractThis paper presents the SemEval-2010 task on "Coreference Resolution in Multiple Languages." The goal was to evaluate and compare automatic coreference resolution systems for six different languages (Catalan, Dutch, English, German, Italian, and Spanish) in four evaluation settings and using four different metrics. Such a rich scenario had the potential to provide insight into key issues concerning coreference resolution: (i) the portability of systems across languages, (ii) the relevance of different levels of linguistic information, and (iii) the behavior of scoring metrics.
|
09:20–09:40 |
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Rada Mihalcea, Ravi Sinha and Diana McCarthy
show abstracthide abstractIn this paper we describe the SemEval-2010 Cross-Lingual Lexical Substitution task, where given an English target word in context, participating systems had to find an alternative substitute word or phrase in Spanish. The task is based on the English Lexical Substitution task run at SemEval-2007. In this paper we provide background and motivation for the task, we describe the data annotation process and the scoring system, and present the results of the participating systems.
|
09:40–10:00 |
SemEval-2010 Task 3: Cross-Lingual Word Sense Disambiguation
Els Lefever and Véronique Hoste
show abstracthide abstractThe goal of this task is to evaluate the feasibility of multilingual WSD on a newly developed multilingual lexical sample data set. Participants were asked to automatically determine the contextually appropriate translation of a given English noun in five languages, viz. Dutch, German, Italian, Spanish and French. This paper reports on the sixteen submissions from the five different participating teams.
|
10:00–10:20 |
SemEval-2010 Task 5 : Automatic Keyphrase Extraction from Scientific Articles
Su Nam Kim, Olena Medelyan, Min-Yen Kan and Timothy Baldwin
show abstracthide abstractThis paper describes Task 5 of the Workshop on Semantic Evaluation 2010 (SemEval-2010). Systems are to automatically assign keyphrases or keywords to given scientific articles. The participating systems were evaluated by matching their extracted keyphrases against manually assigned ones. We present the overall ranking of the submitted systems and discuss our findings to suggest future directions for this task.
|
10:20–10:40 |
SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky, Anna Rumshisky, Alex Plotnick, Elisabetta Jezek, Olga Batiukova and Valeria Quochi
show abstracthide abstractWe describe the argument selection and coercion task for the SemEval-2010 evaluation exercise. This task involves characterizing the type of compositional operation that exists between a predicate and the arguments it selects. Specifically, the goal is to identify whether the type that a verb selects is satisfied directly by the argument, or whether the argument must change type to satisfy the verb typing. We discuss the problem in detail, describe the data preparation for the task, and analyze the results of the submissions.
|
|
10:40–11:00
|
Coffee/Tea Break
|
11:00–12:40
|
Task description papers
11:00–11:20 |
SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Pado, Marco Pennacchiotti, Lorenza Romano and Stan Szpakowicz
show abstracthide abstractSemEval-2 Task 8 focuses on Multi-way classification of semantic relations between pairs of nominals. The task was designed to compare different approaches to semantic relation classification and to provide a standard testbed for future research. This paper defines the task, describes the training and test data and the process of their creation, lists the participating systems (10 teams, 28 runs), and discusses their results.
|
11:20–11:40 |
SemEval-2 Task 9: The Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions
Cristina Butnariu, Su Nam Kim, Preslav Nakov, Diarmuid Ó Séaghdha, Stan Szpakowicz and Tony Veale
show abstracthide abstractPrevious research has shown that the meaning of many noun-noun compounds "N1 N2" can be approximated reasonably well by paraphrasing clauses of the form "N2 that ... N1", where "..." stands for a verb with or without a preposition. For example, "malaria mosquito" is a "mosquito that carries malaria". Evaluating the quality of such paraphrases is the theme of Task 9 at SemEval-2. This paper describes some background, the task definition, the process of data collection and the task results. We also venture a few general conclusions before the participating teams present their systems at the SemEval-2 workshop. There were 5 teams who submitted 7 systems.
|
11:40–12:00 |
SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer, Caroline Sporleder, Roser Morante, Collin Baker and Martha Palmer
show abstracthide abstractWe describe the SemEval-2010 shared task on “Linking Events and Their Participants in Discourse”. This task is an extension to the classical semantic role labeling task. While semantic role labeling is traditionally viewed as a sentence-internal task, it is clear that local semantic argument structures also interact with each other in a larger context, e.g., by sharing references to specific discourse entities or events. In the shared task we looked at one particular aspect of cross-sentence links between argument structures, namely linking locally uninstantiated roles to their co-referents in the wider discourse context (if such co-referents exist). This task is potentially beneficial for a number of NLP applications, such as information extraction, question answering or text summarization.
|
12:00–12:20 |
SemEval-2010 Task 12: Parser Evaluation using Textual Entailments
Deniz Yuret, Aydin Han and Zehra Turgut
show abstracthide abstractParser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments based on syntactic information alone. PETE introduces a new parser evaluation scheme that is formalism independent, less prone to annotation error, and focused on semantically relevant distinctions.
|
12:20–12:40 |
SemEval-2010 Task 13: TempEval-2
Marc Verhagen, Roser Sauri, Tommaso Caselli and James Pustejovsky
show abstracthide abstractTempEval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split into four subtasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier. Manually annotated data were provided for six languages: Chinese, English, French, Italian, Korean and Spanish.
|
|
12:40–14:00
|
Lunch
|
14:00–15:20
|
Task description papers
14:00–14:20 |
SemEval-2010 Task 14: Word Sense Induction & Disambiguation
Suresh Manandhar, Ioannis Klapaftis, Dmitriy Dligach and Sameer Pradhan
show abstracthide abstractThis paper presents the description and evaluation framework of SemEval-2010 Word Sense Induction & Disambiguation task, as well as the evaluation results of 26 participating systems. In this task, participants were required to induce the senses of 100 target words using a training set, and then disambiguate unseen instances of the same words using the induced senses. Systems’ answers were evaluated in: (1) an unsupervised manner by using two clustering evaluation measures, and (2) a supervised manner, i.e. in a WSD task.
|
14:20–14:40 |
SemEval-2010 Task: Japanese WSD
Manabu Okumura, Kiyoaki Shirai, Kanako Komiya and Hikaru Yokono
show abstracthide abstractAn overview of the SemEval-2 Japanese WSD task is presented. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. The number of target words was 50, with 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, consisting of a total of 2,500 instances for the evaluation. Nine systems from four organizations participated in the task.
|
14:40–15:00 |
SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Eneko Agirre, Oier Lopez de Lacalle, Christiane Fellbaum, Shu-Kai Hsieh, Maurizio Tesconi, Monica Monachini, Piek Vossen and Roxanne Segers
show abstracthide abstractDomain portability and adaptation of NLP components and Word Sense Disambiguation systems present new challenges. The difficulties supervised systems have in adapting might change the way we assess the strengths and weaknesses of supervised and knowledge-based WSD systems. Unfortunately, all existing evaluation datasets for specific domains are lexical-sample corpora. This task presented all-words datasets on the environment domain for WSD in four languages (Chinese, Dutch, English, Italian). 11 teams participated, with supervised and knowledge-based systems, mainly in the English dataset. The results show that in all languages the participants were able to beat the most frequent sense heuristic as estimated from general corpora. The most successful approaches used some sort of supervision in the form of hand-tagged examples from the domain.
|
15:00–15:20 |
SemEval-2010 Task 18: Disambiguating Sentiment Ambiguous Adjectives
Yunfang Wu and Peng Jin
show abstracthide abstractSentiment ambiguous adjectives cause major difficulties for existing algorithms of sentiment analysis. We present an evaluation task designed to provide a framework for comparing different approaches in this problem. We define the task, describe the data creation, list the participating systems and discuss their results. There are 8 teams and 16 systems.
|
|
15:20–16:00
|
Coffee/Tea Break
|
16:00–17:30
|
Poster Session
101 |
RelaxCor: A Global Relaxation Labeling Approach to Coreference Resolution
Emili Sapena, Lluís Padró and Jordi Turmo
show abstracthide abstractThis paper describes the participation of RelaxCor in the Semeval-2010 task number 1: "Coreference Resolution in Multiple Languages". RelaxCor is a constraint-based graph partitioning approach to coreference resolution solved by relaxation labeling. The approach combines the strengths of groupwise classifiers and chain formation methods in one global method.
|
102 |
SUCRE: A Modular System for Coreference Resolution
Hamidreza Kobdani and Hinrich Schütze
show abstracthide abstractThis paper presents SUCRE, a new software tool for coreference resolution and its feature engineering. It is able to separately do noun, pronoun and full coreference resolution. SUCRE introduces a new approach to the feature engineering of coreference resolution based on a relational database model and a regular feature definition language. SUCRE successfully participated in SemEval-2010 Task 1 on Coreference Resolution in Multiple Languages for gold and regular closed annotation tracks of six languages. It obtained the best results in several categories, including the regular closed annotation tracks of English and German.
|
103 |
UBIU: A Language-Independent System for Coreference Resolution
Desislava Zhekova and Sandra Kübler
show abstracthide abstractWe present UBIU, a language independent system for detecting full coreference chains, composed of named entities, pronouns, and full noun phrases which makes use of memory based learning and a feature model following Rahman and Ng (2009). UBIU is evaluated on the task "Coreference Resolution in Multiple Languages" (SemEval Task 1 (Recasens et al., 2010)) in the context of the 5th International Workshop on Semantic Evaluation.
|
104 |
Corry: a System for Coreference Resolution
Olga Uryupina
show abstracthide abstractCorry is a system for coreference resolution in English. It supports both local and global (ILP) models of coreference. The backbone of the system is a family of SVM classifiers for pairs of mentions: each mention type receives its own classifier. A separate anaphoricity classifier is learned for the ILP setting. Corry relies on a rich linguistically motivated feature set, which has, however, been manually reduced to 64 features for efficiency reasons. The system uses the Stanford NLP toolkit for parsing and NE-tagging, Wordnet for semantic classes and the U.S. census data for assigning gender values to person names. Three runs have been submitted for the SemEval task 1, optimizing Corry’s performance for BLANC, MUC and CEAF. The runs differ with respect to the model (local for BLANC, global for MUC and CEAF) and the definition of mention types. Corry runs have shown the best performance level among all the systems in their track for the corresponding metric.
|
105 |
BART: A Multilingual Anaphora Resolution System
Samuel Broscheit, Massimo Poesio, Simone Paolo Ponzetto, Kepa Joseba Rodriguez, Lorenza Romano, Olga Uryupina, Yannick Versley and Roberto Zanoli
show abstracthide abstractBART is a highly modular toolkit for coreference resolution that supports state-of-the-art statistical approaches to the task and enables efficient feature engineering. BART has originally been created and tested for English, but its flexible architecture ensures its portability to other languages and domains. At the SemEval task 1 on Coreference Resolution, BART runs have been submitted for German, English, and Italian. BART relies on a maximum entropy-based classifier for pairs of mentions. A novel entity-mention approach based on Semantic Trees is at the moment only supported for English. For German and English, BART relies on Wordnet/Germanet for determining semantic classes and a list of names pre-classified for gender (extracted from Wikipedia). Mention boundaries are derived from parse trees. For Italian, mention boundaries and semantic types are provided by our mention tagger – it relies on Wikipedia and a gazetteer extracted from the ICab dataset.
|
106 |
TANL-1: Coreference Resolution by Parse Analysis and Similarity Clustering
Giuseppe Attardi, Maria Simi and Stefano Dei Rossi
show abstracthide abstractThis paper describes our submission to the Semeval 2010 task on coreference resolution in multiple languages. The system uses a binary classifier, based on Maximum Entropy, to decide whether or not there is a relationship between each pair of mentions extracted from a textual document. Mention detection is based on the analysis of the dependency parse tree.
|
107 |
FCC: Modeling Probabilities with GIZA++ for Task #2 and #3 of SemEval-2
Darnes Vilariño Ayala, Carlos Balderas Posada, David Eduardo Pinto Avendaño, Miguel Rodríguez Hernández and Saul León Silverio
show abstracthide abstractIn this paper we present a naïve approach to tackle the problem of cross-lingual WSD and cross-lingual lexical substitution, which correspond to Tasks #2 and #3 of the SemEval-2 competition. We used a bilingual statistical dictionary, which is estimated with Giza++ using the EUROPARL parallel corpus, in order to calculate the probability of a source word to be translated to a target word (which is assumed to be the correct sense of the source word but in a different language). Two versions of the probabilistic model are tested: unweighted and weighted. The obtained values show that the unweighted version performs better than the weighted one.
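For illustration, a minimal sketch of the selection step such a probabilistic dictionary enables; the toy probability table, word forms and optional reweighting factor are assumptions standing in for a real GIZA++ lexical translation model:

```python
# Toy stand-in for a GIZA++ lexical translation table P(target | source);
# a real table would be estimated from a parallel corpus such as Europarl.
translation_probs = {
    "bank": {"banco": 0.62, "orilla": 0.30, "ribera": 0.08},
}

def best_translation(source_word, weights=None):
    """Pick the target word with the highest (optionally reweighted) probability.
    `weights` is a hypothetical hook standing in for the 'weighted' variant."""
    candidates = dict(translation_probs.get(source_word, {}))
    if weights:
        candidates = {t: p * weights.get(t, 1.0) for t, p in candidates.items()}
    return max(candidates, key=candidates.get) if candidates else None

print(best_translation("bank"))  # -> 'banco'
```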
|
108 |
Combining Dictionaries and Contextual Information for Cross-Lingual Lexical Substitution
Wilker Aziz and Lucia Specia
show abstracthide abstractWe describe two systems participating in Semeval-2010’s Cross-Lingual Lexical Substitution task: USPwlv and WLVusp. Both systems are based on two main components: (i) a dictionary to provide a number of possible translations for each source word, and (ii) a contextual model to select the best translation according to the context where the source word occurs. These components and the way they are integrated are different in the two systems: they exploit corpus-based and linguistic resources, and supervised and unsupervised learning methods. Among the 14 participants in the subtask to identify the best translation, our systems were ranked 2nd and 4th in terms of recall, 3rd and 4th in terms of precision.
|
110 |
COLEPL and COLSLM: An Unsupervised WSD Approach to Multilingual Lexical Substitution, Tasks 2 and 3 SemEval 2010
Weiwei Guo and Mona Diab
show abstracthide abstractIn this paper, we present a word sense disambiguation (WSD) based system for multilingual lexical substitution. Our method depends on having a WSD system for English and an automatic word alignment method. Crucially the approach relies on having parallel corpora. For Task 2 we apply a supervised WSD system to derive the English word senses. For Task 3, we apply an unsupervised approach to the training and test data. Both of our systems that participated in Task 2 achieve a decent ranking among the participating systems. For Task 3 we achieve the highest ranking on several of the language pairs: French, German and Italian.
|
111 |
UHD: Cross-Lingual Word Sense Disambiguation Using Multilingual Co-occurrence Graphs
Carina Silberer and Simone Paolo Ponzetto
show abstracthide abstractWe describe the University of Heidelberg (UHD) system for the Cross-Lingual Word Sense Disambiguation SemEval-2010 task (CL-WSD). The system performs CL-WSD by applying graph algorithms previously developed for monolingual Word Sense Disambiguation to multilingual co-occurrence graphs. UHD has participated in the Best and out-of-five (OOF) evaluations and ranked among the most competitive systems for this task, thus indicating that graph-based approaches represent a powerful alternative for this task.
|
112 |
OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures
Lipta Mahapatra, Meera Mohan, Mitesh Khapra and Pushpak Bhattacharyya
show abstracthide abstractWe report here our work on English French Cross-lingual Word Sense Disambiguation where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two sentences by calculating a weighted sum of the word overlap and the semantic overlap between them. The semantic overlap is calculated using standard Wordnet Similarity measures. Once the nearest neighbors have been identified, the best translation is found by taking a majority vote over the French translations of the nearest neighbors.
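A rough sketch of the nearest-neighbour voting idea described above; the plain set overlap below is an assumed stand-in for the weighted word + WordNet semantic overlap, and the data layout is illustrative only:

```python
from collections import Counter

def similarity(a, b):
    """Plain word overlap between two tokenised sentences; a stand-in for the
    weighted sum of word overlap and WordNet-based semantic overlap."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a | b), 1)

def best_french_translation(test_tokens, training_data, k=5):
    """training_data: list of (context_tokens, french_translation) pairs.
    Take the k nearest neighbours and return the majority-vote translation."""
    neighbours = sorted(training_data,
                        key=lambda ex: similarity(test_tokens, ex[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0] if votes else None
```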
|
113 |
273. Task 5. Keyphrase Extraction Based on Core Word Identification and Word Expansion
You Ouyang, Wenjie Li and Renxian Zhang
show abstracthide abstractThis paper provides a description of the Hong Kong Polytechnic University (PolyU) System that participated in the task #5 of SemEval-2, i.e., the Automatic Keyphrase Extraction from Scientific Articles task. We followed a novel framework to develop our keyphrase extraction system, motivated by differentiating the roles of the words in a keyphrase. We first identified the core words which are defined as the most essential words in the article, and then expanded the identified core words to the target keyphrases by a word expansion approach.
|
114 |
DERIUNLP: A Context Based Approach to Automatic Keyphrase Extraction
Georgeta Bordea and Paul Buitelaar
show abstracthide abstractThe DERI UNLP team participated in the SemEval 2010 Task #5 with an unsupervised system that automatically extracts keyphrases from scientific articles. Our approach does not consider only a general description of a term to select keyphrase candidates but also context information in the form of "skill types". Even though our system analysed a restricted list of candidates, our team was able to outperform baseline unsupervised and supervised approaches.
|
115 |
DFKI KeyWE: Ranking keyphrases extracted from scientific articles
Kathrin Eichler and Günter Neumann
show abstracthide abstractA central issue for making the content of a scientific document quickly accessible to a potential reader is the extraction of keyphrases, which capture the main topic of the document. Keyphrases can be extracted automatically by generating a list of keyphrase candidates, ranking these candidates, and selecting the top-ranked candidates as keyphrases. We present the KeyWE system, which uses an adapted nominal group chunker for candidate extraction and a supervised ranking algorithm based on support vector machines for ranking the extracted candidates. The system was evaluated on data provided for the SemEval 2010 Shared Task on Keyphrase Extraction.
|
116 |
Single Document Keyphrase Extraction Using Sentence Clustering and Latent Dirichlet Allocation
Claude Pasquier
show abstracthide abstractThis paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster sentences of the document in order to highlight parts of text that are semantically related. The clusters of sentences, which reflect the themes of the document, are then analyzed to find the main topics of the text. Finally, the most important words, or groups of words, from these topics are proposed as keyphrases. This method is evaluated on task number 5 (Automatic Keyphrase Extraction from Scientific Articles) of SemEval-2010: the 5th International Workshop on Semantic Evaluations.
|
117 |
SJTULTLAB: Chunk Based Method for Keyphrase Extraction
Letian Wang and Fang Li
show abstracthide abstractIn this paper we present a chunk-based keyphrase extraction method for scientific articles. Unlike most previous systems, ours does not use supervised machine learning algorithms. Instead, document structure information is used to remove unimportant content, chunk extraction and filtering is used to reduce the number of candidates, and keywords are used to filter the candidates before generating the final keyphrases. Our experimental results on the test data show that the method works better than the baseline systems and is comparable with other known algorithms.
|
118 |
Likey: Unsupervised Language-independent Keyphrase Extraction
Mari-Sanna Paukkeri and Timo Honkela
show abstracthide abstractLikey is an unsupervised statistical approach for keyphrase extraction. The method is language-independent, and the only language-dependent component is the reference corpus with which the documents to be analyzed are compared. In this study, we have also used another language-dependent component: an English-specific Porter stemmer as a preprocessing step. In our experiments on keyphrase extraction from scientific articles, the Likey method outperforms both supervised and unsupervised baseline methods.
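For illustration, a simplified reconstruction of a reference-corpus keyness score of this kind; the rank-ratio formulation, whitespace tokenisation and toy corpora are assumptions and may differ from the actual Likey scoring:

```python
from collections import Counter

def frequency_ranks(tokens):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {w: i + 1 for i, w in enumerate(ordered)}

def keyness(doc_tokens, reference_tokens):
    """Lower ratio = more document-specific, hence a better keyphrase candidate.
    Words missing from the reference corpus get the worst possible rank."""
    doc_rank = frequency_ranks(doc_tokens)
    ref_rank = frequency_ranks(reference_tokens)
    worst = len(ref_rank) + 1
    return {w: doc_rank[w] / ref_rank.get(w, worst) for w in doc_rank}
```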
|
119 |
WINGNUS: Keyphrase Extraction Utilizing Document Logical Structure
Thuy Dung Nguyen and Minh-Thang Luong
show abstracthide abstractWe present a system description of the WINGNUS team’s work for the SemEval-2010 task #5, Automatic Keyphrase Extraction from Scientific Articles. A key feature of our system is that it utilizes an inferred document logical structure in our candidate identification process, to limit the number of phrases in the candidate list while maintaining its coverage of important phrases. Our top performing system achieves an F1 of 25.22% for the combined keyphrases (author and reader assigned) in the final test data. We note that the method we report here is novel and orthogonal to other systems, so it can be combined with other techniques to potentially achieve higher performance.
|
120 |
KX: A flexible system for Keyphrase eXtraction
Emanuele Pianta and Sara Tonelli
show abstracthide abstractIn this paper we present KX, a system for keyphrase extraction developed at FBK-IRST, which exploits basic linguistic annotation combined with simple statistical measures to select a list of weighted keywords from a document. The system is flexible in that it lets the user set parameters such as frequency thresholds for collocation extraction and indicators for keyphrase relevance, and it allows for domain adaptation by exploiting a corpus of documents in an unsupervised way. KX is also easily adaptable to new languages in that it requires only a PoS-Tagger to derive lexical patterns. In the SemEval task 5 “Automatic Keyphrase Extraction from Scientific Articles”, KX achieved satisfactory results both in finding reader-assigned keywords and in the combined keywords subtask.
|
121 |
BUAP: An Unsupervised Approach to Automatic Keyphrase Extraction from Scientific Articles
Roberto Ortiz, David Pinto, Mireya Tovar and Héctor Jiménez-Salazar
show abstracthide abstractThis paper presents an unsupervised approach to automatically discovering the latent keyphrases contained in scientific articles. The proposed technique is constructed by combining two techniques: maximal frequent sequences and PageRank. We evaluated the obtained results by using micro-averaged precision, recall and F-scores with respect to two different gold standards: 1) the reader’s keyphrases, and 2) a combined set of the author’s and reader’s keyphrases. The obtained results were also compared against three different baselines: one unsupervised (TF-IDF based) and two supervised (Naïve Bayes and Maximum Entropy).
|
122 |
UNPMC: Naive Approach to Extract Keyphrases from Scientific Articles
Jungyeul Park, Jong Gun Lee and Béatrice Daille
show abstracthide abstractWe describe our method for extracting keyphrases from scientific articles, with which we participated in the shared task of the SemEval-2 Evaluation Exercise. Even though general-purpose term extractors along with linguistically-motivated analysis allow us to extract elaborated morpho-syntactic variation forms of terms, the naive statistical approach proposed in this paper is very simple and quite efficient for extracting keyphrases, especially from well-structured scientific articles. Based on the characteristics of keyphrases with section information, we obtain an F-measure of 18.34% using the top 15 candidates. We also show further improvement without any complications and discuss this at the end of the paper.
|
123 |
SEERLAB: A System for Extracting Keyphrases from Scholarly Documents
Pucktada Treeratpituk, Pradeep Teregowda, Jian Huang and C. Lee Giles
show abstracthide abstractWe describe the SEERLAB system that participated in the SemEval 2010’s Keyphrase Extraction Task. SEERLAB utilizes the DBLP corpus for generating a set of candidate keyphrases from a document. Random Forest, a supervised ensemble classifier, is then used to select the top keyphrases from the candidate set. SEERLAB achieved a 0.24 F-score in generating the top 15 keyphrases, which places it sixth among 19 participating systems. Additionally, SEERLAB performed particularly well in generating the top 5 keyphrases with an F-score that ranked third.
|
124 |
SZTERGAK : Feature Engineering for Keyphrase Extraction
Gábor Berend and Richárd Farkas
show abstracthide abstractAutomatically assigning keyphrases to documents has a great variety of applications. Here we focus on the keyphrase extraction of scientific publications and present a novel set of features for the supervised learning of keyphraseness. Although these features are intended for extracting keyphrases from scientific papers, because of their generality and robustness, they should have uses in other domains as well. With the help of these features SZTERGAK achieved top results on the SemEval-2 shared task on Automatic Keyphrase Extraction from Scientific Articles and exceeded its baseline by 10%.
|
125 |
KP-Miner: Participation in SemEval-2
Samhaa R. El-Beltagy and Ahmed Rafea
show abstracthide abstractThis paper briefly describes the KP-Miner system, which was developed for the extraction of keyphrases from English and Arabic documents, irrespective of their nature. The paper also outlines the performance of the system in the “Automatic Keyphrase Extraction from Scientific Articles” task which is part of SemEval-2.
|
126 |
UvT: The UvT Term Extraction System in the Keyphrase Extraction task
Kalliopi Zervanou
show abstracthide abstractThe UvT system is based on a hybrid, linguistic and statistical approach, originally proposed for the recognition of multi-word terminological phrases, the C-value method (Frantzi et al., 2000). In the UvT implementation, we use an extended noun phrase rule set and take into consideration orthographic and morphological variation, term abbreviations and acronyms, and basic document structure information.
|
127 |
UNITN: Part-Of-Speech Counting in Relation Extraction
Fabio Celli
show abstracthide abstractThis report describes the UNITN system, a Part-Of-Speech Context Counter, which participated in SemEval 2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. Given a text annotated with Part-of-Speech tags, the system outputs a vector representation of a sentence containing 20 features in total. There are three steps in the system’s pipeline: first the system produces an estimation of the entities’ position in the relation, then an estimation of the semantic relation type by means of decision trees, and finally it gives a prediction of the semantic relation plus the entities’ position. The system obtained good results in the estimation of entities’ position (F1=98.3%) but a critically poor performance in relation classification (F1=26.6%), indicating that lexical and semantic information is essential in relation extraction. The system can be integrated with other systems or used for purposes other than relation extraction.
|
128 |
FBK_NK: a WordNet-based System for Multi-Way Classification of Semantic Relations
Matteo Negri and Milen Kouylekov
show abstracthide abstractWe describe a WordNet-based system for the extraction of semantic relations between pairs of nominals appearing in English texts. The system adopts a lightweight approach, based on training a Bayesian Network classifier using large sets of binary features. Our features consider: i) the context surrounding the nominals involved in the relation, and ii) different types of knowledge extracted from WordNet, including direct and explicit relations between the annotated nominals, and more general and implicit evidence (e.g. semantic boundary collocations). The system achieved a Macro-averaged F1 of 68.02% on the “Multi-Way Classification of Semantic Relations Between Pairs of Nominals” task (Task #8) at SemEval-2010.
|
129 |
JU: A Supervised Approach to Identify Semantic Relations from Paired Nominals
Santanu Pal, Partha Pakray, Dipankar Das and Sivaji Bandyopadhyay
show abstracthide abstractThis article presents the experiments carried out at Jadavpur University as part of the participation in Multi-Way Classification of Semantic Relations between Pairs of Nominals in the SemEval 2010 exercise. Separate rules for each type of relation are identified in the baseline model based on the verbs and prepositions present in the segment between each pair of nominals. Inclusion of WordNet features associated with the paired nominals plays an important role in distinguishing the relations from each other. The Conditional Random Field (CRF) based machine-learning framework is adopted for classifying the pair of nominals. Application of dependency relations, Named Entities (NE) and various types of WordNet features along with several combinations of these features helps to improve the performance of the system. Error analysis suggests that the performance can be improved by applying suitable strategies to differentiate each paired nominal in an already identified relation. The evaluation result gives an overall macro-averaged F1 score of 52.16%.
|
131 |
FBK-IRST: Semantic Relation Extraction using Cyc
Kateryna Tymoshenko and Claudio Giuliano
show abstracthide abstractWe present an approach for semantic relation extraction between nominals that combines semantic information with shallow syntactic processing. We propose to use the ResearchCyc knowledge base as a source of semantic information about nominals. Each kind of information is represented by kernel functions. The experiments were carried out using support vector machines as a classifier. The system achieves an overall F1 of 77.62 on the "Multi-Way Classification of Semantic Relations Between Pairs of Nominals" task at SemEval-2010.
|
132 |
ISTI@SemEval-2 Task #8: Boosting-Based Multiway Relation Classification
Andrea Esuli, Diego Marcheggiani and Fabrizio Sebastiani
show abstracthide abstractWe describe a boosting-based supervised learning approach to the “Multi-Way Classification of Semantic Relations between Pairs of Nominals” task #8 of SemEval-2. Participants were asked to determine which relation, from a set of nine relations plus “Other”, exists between two nominals, and also to determine the roles of the two nominals in the relation. Our participation has focused, rather than on the choice of a rich set of features, on the classification model adopted to determine the correct assignment of relation and roles.
|
133 |
ISI: Automatic Classification of Relations Between Nominals Using a Maximum Entropy Classifier
Stephen Tratz and Eduard Hovy
show abstracthide abstractThe automatic interpretation of semantic relations between nominals is an important subproblem within natural language understanding applications and is an area of increasing interest. In this paper, we present the system we used to participate in the SemEval 2010 Task 8 Multi-Way Classification of Semantic Relations between Pairs of Nominals. Our system, based upon a Maximum Entropy classifier trained using a large number of boolean features, received the third highest score.
|
134 |
ECNU: Effective Semantic Relations Classification without Complicated Features or Multiple External Corpora
Yuan Chen, Man Lan, Jian Su, Zhi Min Zhou and Yu Xu
show abstracthide abstractThis paper describes our approach to the automatic identification of semantic relations between nominals in English sentences. The basic idea of our strategy is to develop machine-learning classifiers which: (1) make use of class-independent features and classifier; (2) make use of a simple and effective feature set without high computational cost; (3) make no use of external annotated or unannotated corpora at all. At SemEval 2010 Task 8 our system achieved an F-measure of 75.43% and an accuracy of 70.22%.
|
135 |
UCD-Goggle: A Hybrid System for Noun Compound Paraphrasing
Guofu Li, Alejandra Lopez-Fernandez and Tony Veale
show abstracthide abstractThis paper addresses the problem of ranking a list of paraphrases associated with a noun-noun compound as closely as possible to the judgments of human raters. UCD-Goggle tackles this task using semantic knowledge learnt from the Google n-grams together with human-preferences for paraphrases mined from training data. Empirical evaluation shows that UCD-Goggle achieves 0.432 Spearman correlation with human judgments.
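As a side note, the Spearman correlation used to score such paraphrase rankings can be computed directly; the system scores and human counts below are invented purely for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical system scores and gold human-provided counts for four paraphrases
system_scores = [0.91, 0.40, 0.75, 0.10]
human_counts = [12, 3, 9, 1]

rho, p_value = spearmanr(system_scores, human_counts)
print(f"Spearman rho = {rho:.3f}")
```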
|
136 |
UCD-PN: Selecting General Paraphrases Using Conditional Probability
Paul Nulty and Fintan Costello
show abstracthide abstractWe describe a system which ranks human-provided paraphrases of noun compounds, where the frequency with which a given paraphrase was provided by human volunteers is the gold standard for ranking. Our system assigns a score to a paraphrase of a given compound according to the number of times it has co-occurred with other paraphrases given in the rest of the dataset. We use these co-occurrence statistics to compute conditional probabilities which cluster together paraphrases which have similar meanings and also favour frequent, general paraphrases rather than infrequent paraphrases with more specific meanings.
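A minimal sketch of the co-occurrence-based conditional-probability scoring described above; the toy compounds, paraphrases and the exact scoring function are assumptions for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Toy mapping: noun compound -> set of paraphrases volunteers provided for it
dataset = {
    "olive oil": {"made from", "extracted from", "containing"},
    "flu virus": {"causing", "that causes", "associated with"},
    "steel knife": {"made of", "made from", "containing"},
}

cooc = defaultdict(int)   # co-occurrence counts of paraphrase pairs
freq = defaultdict(int)   # how many compounds each paraphrase was given for
for paraphrases in dataset.values():
    for p in paraphrases:
        freq[p] += 1
    for a, b in combinations(sorted(paraphrases), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def score(paraphrase, others):
    """Sum of estimated P(paraphrase | other) over the other paraphrases seen
    for the same compound; general paraphrases co-occur widely and score high."""
    return sum(cooc[(o, paraphrase)] / freq[o] for o in others if freq[o])

print(score("made from", {"containing"}))  # -> 1.0 on this toy data
```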
|
|
July 16, 2010 |
09:00–10:30
|
System papers
09:00–09:15 |
COLEPL and COLSLM: An Unsupervised WSD Approach to Multilingual Lexical Substitution, Tasks 2 and 3 SemEval 2010
Weiwei Guo and Mona Diab
show abstracthide abstractIn this paper, we present a word sense disambiguation (WSD) based system for multilingual lexical substitution. Our method depends on having a WSD system for English and an automatic word alignment method. Crucially the approach relies on having parallel corpora. For Task 2 we apply a supervised WSD system to derive the English word senses. For Task 3, we apply an unsupervised approach to the training and test data. Both of our systems that participated in Task 2 achieve a decent ranking among the participating systems. For Task 3 we achieve the highest ranking on several of the language pairs: French, German and Italian.
|
09:15–09:30 |
UBA: Using Automatic Translation and Wikipedia for Cross-Lingual Lexical Substitution
Pierpaolo Basile and Giovanni Semeraro
show abstracthide abstractThis paper presents the participation of the University of Bari (UBA) at the SemEval-2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, but there are some differences: Cross-lingual lexical substitution targets one word at a time and the main goal is to find as many good translations as possible for the given target word. Moreover, there are some connections with Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to find the best substitutions. An important aspect of this kind of task is the possibility of finding synonyms without using a particular sense inventory or a specific parallel corpus, thus allowing the participation of unsupervised approaches. UBA proposes two systems: the former is based on an automatic translation system which exploits Google Translator, the latter is based on a parallel corpus approach which relies on Wikipedia in order to find the best substitutions.
|
09:30–09:45 |
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID
Patrice Lopez and Laurent Romary
show abstracthide abstractThe Semeval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID’s facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally a post ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
|
09:45–10:00 |
UTDMet: Combining WordNet and Corpus Data for Argument Coercion Detection
Kirk Roberts and Sanda Harabagiu
show abstracthide abstractThis paper describes our system for the classification of argument coercion for SemEval-2010 Task 7. We present two approaches to classifying an argument’s semantic class, which is then compared to the predicate’s expected semantic class to detect coercions. The first approach is based on learning the members of an arbitrary semantic class using WordNet’s hypernymy structure. The second approach leverages automatically extracted semantic parse information from a large corpus to identify similar arguments by the predicates that select them. We show the results these approaches obtain on the task as well as how they can improve a traditional feature-based approach.
|
10:00–10:15 |
UTD: Classifying Semantic Relations by Combining Lexical and Semantic Resources
Bryan Rink and Sanda Harabagiu
show abstracthide abstractThis paper describes our system for SemEval-2010 Task 8 on multi-way classification of semantic relations between nominals. First, the type of semantic relation is classified. Then a relation type-specific classifier determines the relation direction. Classification is performed using SVM classifiers and a number of features that capture the context, semantic role affiliation, and possible pre-existing relations of the nominals. This approach achieved an F1 score of 82.19% and an accuracy of 77.92%.
|
10:15–10:30 |
UvT: Memory-based pairwise ranking of paraphrasing verbs
Sander Wubben
show abstracthide abstractIn this paper we describe Mephisto, our system for Task 9 of the SemEval-2 workshop. Our approach to this task is to develop a machine learning classifier which determines for each verb pair describing a noun compound which verb should be ranked higher. These classifications are then combined into one ranking. Our classifier uses features from the Google N-gram Corpus, WordNet and the provided training data.
|
|
10:40–11:00
|
Coffee/Tea Break
|
11:00–12:30
|
System papers
11:00–11:15 |
SEMAFOR: Frame Argument Resolution with Log-Linear Models
Desai Chen, Nathan Schneider, Dipanjan Das and Noah A. Smith
show abstracthide abstractThis paper describes the SEMAFOR system’s performance in the SemEval 2010 task on linking events and their participants in discourse. Our entry is based upon SEMAFOR 1.0 (Das et al., 2010), a frame-semantic probabilistic parser built from log-linear models. The extended system models null instantiations, including non-local argument reference. Performance is evaluated on the task data with and without gold-standard overt arguments. In both settings, it fares the best of the submitted systems with respect to recall and F1.
|
11:15–11:30 |
Cambridge: Parser Evaluation using Textual Entailment by Grammatical Relation Comparison
Laura Rimell and Stephen Clark
show abstracthide abstractThis paper describes the Cambridge submission to the SemEval-2010 Parser Evaluation using Textual Entailment (PETE) task. We used a simple definition of entailment, parsing both T and H with the C&C parser and checking whether the core grammatical relations (subject and object) produced for H were a subset of those for T. This simple system achieved the top score for the task out of those systems submitted. We analyze the errors made by the system and the potential role of the task in parser evaluation.
|
11:30–11:45 |
MARS: A Specialized RTE System for Parser Evaluation
Rui Wang and Yi Zhang
show abstracthide abstractThis paper describes our participation in the SemEval-2010 Task #12, Parser Evaluation using Textual Entailment. Our system incorporated two dependency parsers, one semantic role labeler, and a deep parser based on hand-crafted grammars. The shortest path algorithm is applied on the graph representation of the parser outputs. Then, different types of features are extracted and entailment recognition is cast as a machine-learning-based classification task. The best setting of the system achieves 66.78% accuracy, which ranks in 3rd place.
|
11:45–12:00 |
TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text
Naushad UzZaman and James Allen
show abstracthide abstractExtracting temporal information from raw text is fundamental for deep language understanding, and key to many applications like question answering, information extraction, and document summarization. In this paper, we describe two systems we submitted to the TempEval 2 challenge for extracting temporal information from raw text. The systems use a combination of deep semantic parsing, Markov Logic Networks and Conditional Random Field classifiers. Our two submitted systems, TRIPS and TRIOS, approached all tasks and outperformed all teams in two tasks. Furthermore, TRIOS mostly had second-best performances in the other tasks. TRIOS also outperformed the other teams that attempted all the tasks. Our systems are notable in that, for tasks C–F, they operated on raw text while all other systems used tagged events and temporal expressions in the corpus as input.
|
12:00–12:15 |
TIPSem (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2
Hector Llorens, Estela Saquete Boro and Borja Navarro
show abstracthide abstractThis paper presents TIPSem, a system to extract temporal information from natural language texts for English and Spanish. TIPSem learns CRF models from training data. Although the features used cover different levels of language analysis, the approach focuses on semantic information. For Spanish, TIPSem achieved the best F1 score in all the tasks. For English, it obtained the best F1 in tasks B (events) and D (event-dct links), and was among the best systems in the rest.
|
12:15–12:30 |
CityU-DAC: Disambiguating Sentiment-Ambiguous Adjectives within Context
Bin Lu and Benjamin K. Tsou
show abstracthide abstractThis paper describes our system participating in task 18 of SemEval-2010, i.e. disambiguating Sentiment-Ambiguous Adjectives (SAAs). To disambiguate SAAs, we compare machine-learning-based and lexicon-based methods in our submissions: 1) Maximum entropy is used to train classifiers based on the annotated Chinese data from the NTCIR opinion analysis tasks, and the clause-level and sentence-level classifiers are compared; 2) For the lexicon-based method, we first classify the adjectives into two classes: intensifiers (i.e. adjectives intensifying the intensity of the context) and suppressors (i.e. adjectives decreasing the intensity of the context), and then use the polarity of the context to get the SAAs’ contextual polarity based on a sentiment lexicon. The results show that the performance of maximum entropy is not very high due to the small amount of training data; on the other hand, the lexicon-based method can improve the precision by considering the polarity of the context.
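A toy illustration of the lexicon-based intensifier/suppressor rule sketched above; the polarity lists, the word classes and the example are invented for the sketch and are not the system's actual lexicon:

```python
POSITIVE = {"growth", "profit", "improve"}
NEGATIVE = {"loss", "pollution", "decline"}
INTENSIFIERS = {"high", "large"}   # keep the context polarity
SUPPRESSORS = {"low", "small"}     # flip the context polarity

def context_polarity(tokens):
    score = sum(w in POSITIVE for w in tokens) - sum(w in NEGATIVE for w in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def adjective_polarity(adjective, tokens):
    """Assign the ambiguous adjective a contextual polarity."""
    polarity = context_polarity(tokens)
    if polarity == "neutral":
        return "neutral"
    if adjective in SUPPRESSORS:
        return "negative" if polarity == "positive" else "positive"
    return polarity  # intensifiers inherit the context polarity

print(adjective_polarity("low", ["pollution", "is", "low"]))  # -> 'positive'
```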
|
|
12:30–14:00
|
Lunch
|
14:00–15:30
|
Panel
|
15:30–16:00
|
Coffee/Tea Break
|
16:00–17:30
|
Posters Session
101 |
VENSES++: Adapting a deep semantic processing system to the identification of null instantiations
Sara Tonelli and Rodolfo Delmonte
show abstracthide abstractIn this paper we present VENSES++, a system to spot null instantiations and their antecedents, if available, as required by the "NIs-only" subtask of the SemEval 2010 Task 10 "Linking events and their participants in discourse". Our application is an adaptation of VENSES, a system for semantic evaluation that has been used for RTE challenges in the last 6 years. The new version exploits three modules of VENSES, namely the lexico-semantic module, the anaphora resolution module and the semantic module, in order to represent and analyse the document information. Then, two further procedures have been added: one identifies null instantiated roles of verbal predicates, while the other deals with nominal predicates. The first is based on the valence patterns extracted for every verbal lexical unit from FrameNet v. 1.4 and from the training data. The second procedure, instead, relies on a History List created by VENSES containing all events, spatial and temporal locations and body parts found in the document. Another useful resource employed to find antecedents is ConceptNet 2.0. Even if the preliminary results are far from satisfactory, we were able to devise a robust, knowledge-based system and a general strategy for dealing with the task.
|
102 |
CLR: Linking Events and Their Participants in Discourse Using a Comprehensive FrameNet Dictionary
Ken Litkowski
show abstracthide abstractThe CL Research system for SemEval-2 Task 10 for linking events and their participants in discourse is an exploration of the use of a specially created FrameNet dictionary that captures all FrameNet information about frames, lexical units, and frame-to-frame relations. This system is embedded in a specially designed interface, the Linguistic Task Analyzer. The implementation of this system was quite minimal at the time of submission, allowing only an initial completion of the role recognition and labeling task, with recall of 0.112, precision of 0.670, and F-score of 0.192. We describe the design of the system and the continuing efforts to determine how much of this task can be performed with the available lexical resources. Changes since the official submission have improved the F-score to 0.266.
|
103 |
PKU_HIT: An Event Detection System Based on Instances Expansion and Rich Syntactic Features
Shiqi Li, Peng-Yuan Liu, Tiejun Zhao, Qin Lu and Hanjing Li
show abstracthide abstractThis paper describes the PKU_HIT system on event detection in the SemEval-2010 Task. We construct three modules for the three sub-tasks of this evaluation. For target verb WSD, we build a Naïve Bayesian classifier which uses additional training instances expanded from an untagged Chinese corpus automatically. For sentence SRL and event detection, we use a feature-based machine learning method which makes combined use of both constituent-based and dependency-based features. Experimental results show that the Macro Accuracy of the WSD module reaches 83.81% and F-Score of the SRL module is 55.71%.
|
104 |
372:Comparing the Benefit of Different Dependency Parsers for Textual Entailment Using Syntactic Constraints Only
Alexander Volokh and Günter Neumann
show abstracthide abstractWe compare several state-of-the-art dependency parsers with our own parser based on a linear classification technique. Our primary goal is therefore to use syntactic information only, in order to keep the comparison of the parsers as fair as possible. We demonstrate that, despite the inferior results using the standard evaluation metrics for parsers like UAS or LAS on standard test data, our system achieves comparable results when used in an application, such as the PETE shared task. Our submission achieved the 4th position out of 19 participating systems. However, since it only uses a linear classifier, it works 17-20 times faster than other state-of-the-art parsers, such as MaltParser or the Stanford Parser.
|
105 |
SCHWA: PETE using CCG Dependencies with the C&C Parser
Dominick Ng, James W.D. Constable, Matthew Honnibal and James R. Curran
show abstracthide abstractThis paper describes the SCHWA system entered by the University of Sydney in SemEval 2010 Task 12 – Parser Evaluation using Textual Entailments (Yuret et al., 2010). Our system achieved an overall accuracy of 70% in the task evaluation. We used the C&C parser to build CCG dependency parses of the truth and hypothesis sentences. We then used partial match heuristics to determine whether the system should predict entailment. Heuristics were used because the dependencies generated by the parser are construction specific, making full compatibility unlikely. We also manually annotated the development set with CCG analyses, establishing an upper bound for our entailment system of 87%.
|
106 |
ID 392: TERSEO + T2T3 Transducer. A system for recognizing and normalizing TIMEX3
Estela Saquete Boro
show abstracthide abstractThe system described in this paper has participated in the TempEval-2 competition, specifically in Task A, whose aim is to determine the extent of the time expressions in a text as defined by the TimeML TIMEX3 tag, and the value of the features type and val. For this purpose, a combination of the TERSEO system and the T2T3 transducer was used. The TERSEO system is able to annotate text with TIDES TIMEX2 tags, and the T2T3 transducer performs the translation from these TIMEX2 tags to TIMEX3 tags.
|
107 |
HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions
Jannik Strötgen and Michael Gertz
show abstracthide abstractIn this paper, we describe HeidelTime, a system for the extraction and normalization of temporal expressions. HeidelTime is a rule-based system mainly using regular expression patterns for the extraction of temporal expressions and knowledge resources as well as linguistic clues for their normalization. In the TempEval-2 challenge, HeidelTime achieved the highest F-Score (86%) for the extraction and the best results in assigning the correct value attribute, i.e., in understanding the semantics of the temporal expressions.
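A minimal illustration of the general rule-based idea (not HeidelTime's actual rule set): a single regular expression extracts one class of explicit dates and normalises them to an ISO-style value; the pattern and example are assumptions for the sketch:

```python
import re

DATE = re.compile(r"\b(\d{1,2}) (January|February|March|April|May|June|July|"
                  r"August|September|October|November|December) (\d{4})\b")
MONTHS = {m: i + 1 for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"])}

def extract_dates(text):
    """Return (surface form, normalised ISO value) pairs for explicit dates."""
    out = []
    for day, month, year in DATE.findall(text):
        out.append((f"{day} {month} {year}",
                    f"{year}-{MONTHS[month]:02d}-{int(day):02d}"))
    return out

print(extract_dates("The workshop takes place on 16 July 2010 in Uppsala."))
# -> [('16 July 2010', '2010-07-16')]
```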
|
108 |
KUL: Recognition and Normalization of Temporal Expressions
Oleksandr Kolomiyets and Marie-Francine Moens
show abstracthide abstractIn this paper we describe a system for the recognition and normalization of temporal expressions (Task 13: TempEval-2, Task A). The recognition task is approached as a classification problem of sentence constituents and the normalization is implemented in a rule-based manner. One of the system features is extending positive annotations in the corpus by semantically similar words automatically obtained from a large unannotated textual corpus. The best results obtained by the system are 0.85 and 0.84 for precision and recall respectively for recognition of temporal expressions; the accuracy values of 0.91 and 0.55 were obtained for the feature values TYPE and VAL respectively.
|
109 |
UC3M system: Determining the Extent, Type and Value of Time Expressions in TempEval-2
María Teresa Vicente-Díez, Julián Moreno-Schneider and Paloma Martínez
show abstracthide abstractThis paper describes the participation of Universidad Carlos III de Madrid in Task A of the TempEval-2 evaluation. The UC3M system was originally developed for temporal expression recognition and normalization (the TERN task) in Spanish texts, according to the TIDES standard. The current version is an almost total refactoring of the earlier system. Additionally, it has been adapted to the TimeML annotation schema and a considerable effort has been made to increase its coverage. It takes a rule-based design in both the identification and the resolution phases. It adopts an inductive approach based on an empirical study of the frequency of temporal expressions in Spanish corpora. In detecting the extent of temporal expressions the system achieved a Precision/Recall of 0.90/0.87, whereas in determining the TYPE and VALUE of those expressions, system results were 0.91 and 0.83, respectively.
|
110 |
Edinburgh-LTG: TempEval-2 System Description
Claire Grover, Richard Tobin, Beatrice Alex and Kate Byrne
show abstracthide abstractWe describe the Edinburgh information extraction system which we are currently adapting for analysis of newspaper text as part of the SYNC3 project. Our most recent focus is geospatial and temporal grounding of entities and it has been useful to participate in TempEval-2 to measure the performance of our system and to guide further development. We took part in Tasks A and B for English.
|
111 |
USFD2: Annotating Temporal Expressions and TLINKs for TempEval-2
Leon Derczynski and Robert Gaizauskas
show abstracthide abstractWe describe the University of Sheffield system used in the TempEval-2 challenge, USFD2. The challenge requires the automatic identification of temporal entities and relations in text. USFD2 identifies and anchors temporal expressions, and also attempts two of the four temporal relation assignment tasks. A rule-based system picks out and anchors temporal expressions, and a maximum entropy classifier assigns temporal link labels, based on features that include descriptions of associated temporal signal words. USFD2 identified temporal expressions successfully, and correctly classified their type in 90% of cases. Determining the relation between an event and a time expression in the same sentence was performed at 63% accuracy, the second highest score in this part of the challenge.
|
112 |
NCSU: Modeling Temporal Relations with Markov Logic and Lexical Ontology
Eun Ha, Alok Baikadi, Carlyle Licata and James Lester
show abstracthide abstractAs a participant in TempEval-2, we address the temporal relations task consisting of four related subtasks. We take a supervised machine-learning technique using Markov Logic in combination with rich lexical relations beyond basic and syntactic features. One of our two submitted systems achieved the highest score for the Task F (66% precision), untied, and the second highest score (63% precision) for the Task C, which tied with three other systems.
|
113 |
JU_CSE_TEMP: A First Step towards Evaluating Events, Time Expressions and Temporal Relations
Anup Kumar Kolya, Asif Ekbal and Sivaji Bandyopadhyay
show abstracthide abstractTemporal information extraction is a popular and interesting research field in the area of Natural Language Processing (NLP). In this paper, we report our work on the TempEval-2 shared task. This is our first participation and we participated in Tasks A, B, C, D, E and F. We develop rule-based systems for Tasks A and B, whereas the remaining tasks are based on a machine learning approach, namely Conditional Random Fields (CRF). All our systems are still in their development stages, and we report very initial results. Evaluation results on the shared task English datasets yield precision, recall and F-measure values of 55%, 17% and 26%, respectively, for Task A and 48%, 56% and 52%, respectively, for Task B (event recognition). The rest of the tasks, namely C, D, E and F, were evaluated with a relatively simpler metric: the number of correct answers divided by the number of answers. Experiments on the English datasets yield accuracies of 63%, 80%, 56% and 56% for tasks C, D, E and F, respectively.
|
114 |
KCDC: Word Sense Induction by Using Grammatical Dependencies and Sentence Phrase Structure
Roman Kern, Markus Muhr and Michael Granitzer
show abstracthide abstractWord sense induction and discrimination (WSID) identifies the senses of an ambiguous word and assigns instances of this word to one of these senses. We have built a WSID system that exploits syntactic and semantic features based on the results of a natural language parser component. To achieve high robustness and good generalization capabilities, we designed our system to work on a restricted, but grammatically rich set of features. Based on the results of the evaluations, our system provides a promising performance and robustness.
|
115 |
UoY: Graphs of Unambiguous Vertices for Word Sense Induction and Disambiguation
Ioannis Korkontzelos and Suresh Manandhar
show abstracthide abstractThis paper presents an unsupervised graph-based method for automatic word sense induction and disambiguation. The innovative part of our method is the assignment of either a word or a word pair to each vertex of the constructed graph. Word senses are induced by clustering the constructed graph. In the disambiguation stage, each induced cluster is scored according to the number of its vertices found in the context of the target word. Our system participated in SemEval-2010 word sense induction and disambiguation task.
|
116 |
HERMIT: Flexible Clustering for the SemEval-2 WSI Task
David Jurgens and Keith Stevens
show abstracthide abstractA single word may have multiple unspecified meanings in a corpus. Word sense induction aims to discover these different meanings through word use, and knowledge-poor algorithms attempt this without using external lexical resources. We propose a new method for identifying the different senses that uses a flexible clustering strategy to automatically determine the number of senses, rather than predefining it. We demonstrate the effectiveness using the SemEval-2 WSI task, achieving competitive scores on both the V-Measure and Recall metrics, depending on the parameter configuration.
|
117 |
Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2
Ted Pedersen
show abstracthide abstractThe Duluth-WSI systems in SemEval-2 built word co-occurrence matrices from the task test data to create a second order co-occurrence representation of those test instances. The senses of words were induced by clustering these instances, where the number of clusters was automatically predicted. The Duluth-Mix system was a variation of WSI that used the combination of training and test data to create the co-occurrence matrix. The Duluth-R system was a series of random baselines.
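A compressed sketch of a second-order co-occurrence representation followed by clustering; the toy co-occurrence vectors are invented, and the number of clusters is fixed here, whereas the system above predicts it automatically:

```python
import numpy as np
from sklearn.cluster import KMeans

def second_order_vectors(instances, word_vectors):
    """Represent each instance (list of context tokens) as the average of the
    first-order co-occurrence vectors of its context words."""
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for tokens in instances:
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(rows)

# word -> co-occurrence counts against a fixed feature vocabulary (toy data)
word_vectors = {"money": np.array([5., 0., 0.]), "bank": np.array([3., 0., 1.]),
                "river": np.array([0., 4., 2.]), "water": np.array([0., 3., 1.])}
instances = [["money", "bank"], ["river", "water"], ["bank", "water"]]

X = second_order_vectors(instances, word_vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment per instance
```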
|
118 |
KSU KDD: Word Sense Induction by Clustering in Topic Space
Wesam Elshamy, Doina Caragea and William Hsu
show abstracthide abstractWe describe our language-independent unsupervised word sense induction system. This system uses only topic features to cluster different word senses in their global context topic space. Using unlabeled data, this system trains a latent Dirichlet allocation (LDA) topic model and then uses it to infer the topic distributions of the test instances. By clustering these topic distributions in topic space, we group the instances into different senses. Our hypothesis is that closeness in topic space reflects similarity between different word senses. This system participated in the SemEval-2 word sense induction and disambiguation task and achieved the second highest V-measure score among all other systems.
|
119 |
PengYuan@PKU: Extracting Infrequent Sense Instance with the Same N-gram Pattern for the SemEval-2010 Task 15
Peng-Yuan Liu, Shi-Wen Yu, Shui Liu and Tiejun Zhao
show abstracthide abstractThis paper describes our infrequent sense identification system participating in the SemEval-2010 task 15 on Infrequent Sense Identification for Mandarin Text to Speech Systems. The core system is a supervised system based on ensembles of Naïve Bayesian classifiers. In order to address the problem of unbalanced sense distribution, we intentionally extract only instances of the infrequent sense with the same N-gram pattern as complementary training data from an untagged Chinese corpus – People’s Daily of the year 2001. At the same time, we adjusted the prior probability to adapt to the distribution of the test data and tuned the smoothing coefficient to take data sparseness into account. The official results show that our system ranked first with the best Macro Accuracy of 0.952. We briefly describe the system, its configuration options and the features used for this task, and present some discussion of the results.
|
120 |
RALI: Automatic weighting of text window distances
Bernard Brosseau-Villeneuve, Noriko Kando and Jian-Yun Nie
show abstracthide abstractSystems using text windows to model word contexts have mostly been using fixed-sized windows and uniform weights. The window size is often selected by trial and error to maximize task results. We propose a non-supervised method for selecting weights for each window distance, effectively removing the need to limit window sizes, by maximizing the mutual generation of two sets of samples of the same word. Experiments on Semeval Word Sense Disambiguation tasks showed considerable improvements.
|
121 |
JAIST: Clustering and Classification based Approaches for Japanese WSD
Kiyoaki Shirai and Makoto Nakamura
show abstracthide abstractThis paper reports on our three participating systems in the SemEval-2 Japanese WSD task. The first one is a clustering-based method, which chooses a sense not for individual instances but for automatically constructed clusters of instances. The second one is a classification method, an ordinary SVM classifier with simple domain adaptation techniques. The last is an ensemble of these two systems. Results of the formal run show that the second system is the best. Its precision is 0.7476.
|
122 |
MSS: Investigating the Effectiveness of Domain Combinations and Topic Features for Word Sense Disambiguation
Sanae Fujita, Kevin Duh, Akinori Fujino, Hirotoshi Taira and Hiroyuki Shindo
show abstracthide abstractWe participated in the SemEval-2010 Japanese Word Sense Disambiguation (WSD) task (Task 16). Our focus was on (1) investigating domain differences, (2) incorporating topic features, (3) predicting new unknown senses. We experimented with Support Vector Machines (SVM) and Maximum Entropy (MEM) classifiers. We achieved an accuracy of 80.1 % in our experiments.
|
123 |
IIITH: Domain Specific Word Sense Disambiguation
Siva Reddy, Abhilash Inumella, Diana McCarthy and Mark Stevenson
show abstracthide abstractWe describe two systems that participated in SemEval-2010 task 17 (All-words Word Sense Disambiguation on a Specific Domain) and were ranked in the third and fourth positions in the formal evaluation. Domain adaptation techniques using the background documents released in the task were used to assign ranking scores to the words and their senses. The test data was disambiguated using the Personalised PageRank algorithm, which was applied to a graph constructed from the whole of WordNet in which nodes are initialised with the ranking scores of words and their senses. Our systems achieved comparable accuracies of 53.4 and 52.2, which outperform the most frequent sense baseline (50.5).
|
124 |
UCF-WS: Domain Word Sense Disambiguation using Web Selectors
Hansen A. Schwartz and Fernando Gomez
show abstracthide abstractThis paper studies the application of the Web Selectors word sense disambiguation system on a specific domain. The system was primarily applied without any domain tuning, but the incorporation of domain predominant sense information was explored. Results indicated that the system performs relatively the same with domain predominant sense information as without, scoring well above a random baseline, but still 5 percentage points below results of using the most frequent sense.
|
125 |
TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on a Specific Domain
Andrew Tran, Chris Bowes, David Brown, Ping Chen, Max Choly and Wei Ding
show abstracthide abstractWord sense disambiguation (WSD) is one of the main challenges of applications in Natural Language Processing. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency knowledge drawn from a domain-specific knowledge base that was built for this task. When evaluated on the task, the system’s precision is above the First Sense Baseline.
|
126 |
GPLSI-IXA: Using Semantic Classes to Acquire Monosemous Training Examples from Domain Texts
Rubén Izquierdo, Armando Suárez and German Rigau
show abstracthide abstractThis paper summarizes our participation in task #17 of SemEval-2 (All-words WSD on a specific domain) using a supervised class-based Word Sense Disambiguation system. Basically, we use Support Vector Machines (SVM) as the learning algorithm and a set of simple features to build three different models. Each model considers a different training corpus: SemCor (SC), examples from monosemous words extracted automatically from background data (BG), and both SC and BG (SCBG). Our system exploits the monosemous words appearing as members of a particular WordNet semantic class to automatically acquire class-based annotated examples from the domain text. We use the class-based examples gathered from the domain corpus to adapt our traditional system trained on SemCor. The evaluation reveals that the best results are achieved training with SemCor and the background examples from monosemous words, obtaining results above the most frequent sense baseline and the fifth best position in the competition ranking.
|
127 |
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
Yuhang Guo, Wanxiang Che, Wei He, Ting Liu and Sheng Li
show abstracthide abstractThis paper presents an unsupervised system for the all-words domain-specific word sense disambiguation task. This system tags each target word with the most frequent sense, which is estimated using a thesaurus and word distribution information in the domain. The thesaurus is automatically constructed from a bilingual parallel corpus using paraphrase techniques. The recall of this system is 43.5% on the SemEval-2 task 17 English data set.
|
128 |
RACAI: Unsupervised WSD experiments @ SemEval-2, Task #17
Radu Ion and Dan Ştefănescu
show abstracthide abstractThis paper documents the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) to the Task 17 – All-words Word Sense Disambiguation on a Specific Domain, of the SemEval-2 competition. We describe three unsupervised WSD systems that make extensive use of the Princeton WordNet (WN) structure and WordNet Domains in order to perform the disambiguation. The best of them has been ranked the 12th by the task organizers out of 29 judged runs.
|
129 |
Kyoto: An Integrated System for Specific Domain WSD
Aitor Soroa, Eneko Agirre, Oier López de Lacalle, Wauter Bosma, Piek Vossen, Monica Monachini, Jessie Lo and Shu-Kai Hsieh
show abstracthide abstractThis document describes the preliminary release of the integrated Kyoto system for specific domain WSD. The system uses concept miners (Tybots) to extract domain-related terms and produces a domain-related thesaurus, followed by knowledge-based WSD based on wordnet graphs (UKB). The resulting system can be applied to any language with a lexical knowledge base, and is based on publicly available software and resources. Our participation in Semeval task #17 focused on producing running systems for all languages in the task, and we attained good results in all except Chinese. Due to the pressure of the time-constraints in the competition, the system is still under development, and we expect results to improve in the near future.
|
130 |
CFILT: Resource Conscious Approaches for All-Words Domain Specific WSD
Anup Kulkarni, Mitesh Khapra, Saurabh Sohoney and Pushpak Bhattacharyya
show abstracthide abstractWe describe two approaches for All-words Word Sense Disambiguation on a Specific Domain. The first approach is a knowledge-based approach which extracts domain-specific largest connected components from the Wordnet graph by exploiting the semantic relations between all candidate synsets appearing in a domain-specific untagged corpus. Given a test word, disambiguation is performed by considering only those candidate synsets that belong to the top-k largest connected components. The second approach is a weakly supervised approach which relies on the "One Sense Per Domain" heuristic and uses a few hand-labeled examples for the most frequently appearing words in the target domain. Once the most frequent words have been disambiguated they can provide strong clues for disambiguating other words in the sentence using an iterative disambiguation algorithm. Our weakly supervised system gave the best performance across all systems that participated in the task even when it used as few as 100 hand-labeled examples from the target domain.
|
131 |
UMCC-DLSI: Integrative Resource for Disambiguation Task
Yoan Gutiérrez Vázquez, Antonio Fernandez Orquín, Andrés Montoyo Guijarro and Sonia Vázquez Pérez
show abstracthide abstractThis paper describes the UMCC-DLSI system in SemEval-2010 task number 17 (All-words Word Sense Disambiguation on a Specific Domain). The main purpose of this work is to evaluate and compare our computational resource of WordNet mappings using three different methods: Relevant Semantic Tree, Relevant Semantic Tree 2, and an adaptation of the k-clique technique. Our proposal is an unsupervised, knowledge-based system that uses the Domains Ontology and SUMO.
|
132 |
HR-WSD: System Description for All-words Word Sense Disambiguation on a Specific Domain at SemEval-2010
Meng-Hsien Shih
show abstracthide abstractThis document describes a knowledge-based domain WSD system that uses heuristic rules as its knowledge base. This HR-WSD system delivered the best performance (55.9%) among all Chinese systems in SemEval-2010 Task 17: All-words WSD on a specific domain.
|
133 |
Twitter Based System: Using Twitter for Disambiguating Sentiment Ambiguous Adjectives
Alexander Pak and Patrick Paroubek
show abstracthide abstractIn this paper, we describe our system which participated in the SemEval 2010 task of disambiguating sentiment ambiguous adjectives for Chinese. Our system uses text messages from Twitter, a popular microblogging platform, for building a dataset of emotional texts. Using the built dataset, the system classifies the meaning of adjectives into positive or negative sentiment polarity according to the given context. Our approach is fully automatic. It does not require any additional hand-built language resources and it is language independent.
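The general recipe can be sketched as follows. This is an illustration under stated assumptions (emoticon-based noisy labeling and a simple co-occurrence score), not necessarily the authors' implementation.

    # Sketch: noisy polarity labels from tweets, then context-based adjective polarity.
    from collections import Counter

    def noisy_label(tweet):
        """Use emoticons as weak polarity labels; return None if no signal."""
        if ":)" in tweet or ":-)" in tweet:
            return "positive"
        if ":(" in tweet or ":-(" in tweet:
            return "negative"
        return None

    def polarity_counts(tweets):
        counts = {"positive": Counter(), "negative": Counter()}
        for tweet in tweets:
            label = noisy_label(tweet)
            if label:
                counts[label].update(tweet.lower().split())
        return counts

    def classify_adjective(context_words, counts):
        """Score the context by how often its words co-occur with each polarity."""
        scores = {pol: sum(c[w] for w in context_words) for pol, c in counts.items()}
        return max(scores, key=scores.get)

    counts = polarity_counts(["great battery life :)", "screen is huge and bright :)",
                              "huge repair bill :(", "battery died again :("])
    print(classify_adjective(["screen", "is", "huge"], counts))  # "huge" positive here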
|
134 |
YSC-DSAA: An Approach to Disambiguate Sentiment Ambiguous Adjectives Based On SAAOL
Shi-Cai Yang and Mei-Juan Liu
show abstracthide abstractIn this paper, we describe the system we developed for the SemEval-2010 task of Disambiguating Sentiment Ambiguous Adjectives (hereinafter referred to as SAA). Our system created a new word library named the SAA-Oriented Library, consisting of positive words, negative words, negative words related to SAA, positive words related to SAA, inverse words, etc. Based on syntactic parsing, we analyzed the relationship between SAA and the keywords, and handled other special processes by extracting such words in the relevant sentences to disambiguate sentiment ambiguous adjectives. Our micro average accuracy is 0.942, which puts our system in first place.
|
135 |
OpAL: Applying Opinion Mining Techniques for the Disambiguation of Sentiment Ambiguous Adjectives in SemEval-2 Task 18
Alexandra Balahur and Andrés Montoyo Guijarro
show abstracthide abstractThe task of extracting the opinion expressed in text is challenging due to different reasons. One of them is that the same word (in particular, adjectives) can have different polarities depending on the context. This paper presents the experiments carried out by the OpAL team for the participation in the SemEval 2010 Task 18 – Disambiguation of Sentiment Ambiguous Adjectives. Our approach is based on three different strategies: a) the evaluation of the polarity of the whole context using an opinion mining system; b) the assessment of the polarity of the local context, given by the combinations between the closest nouns and the adjective to be classified; c) rules aiming at refining the local semantics through the spotting of modifiers. The final decision for classification is taken according to the output of the majority of these three approaches. The method used yielded good results, the OpAL system run ranking fifth among 16.
|
136 |
HITSZ_CITYU: Combine Collocation, Context Words and Neighboring Sentence Sentiment in Sentiment Adjectives Disambiguation
Ruifeng Xu, Jun Xu and Chunyu Kit
show abstracthide abstractThis paper presents the HITSZ_CITYU systems in Semeval-2 Task 18, namely, disambiguating sentiment ambiguous adjectives. The baseline system (HITSZ_CITYU_3) incorporates bi-gram and n-gram collocations of sentiment adjectives, and other context words, as features in a one-class Support Vector Machine (SVM) classifier. To enhance the baseline system, collocation set expansion and characteristics learning based on word similarity and semi-supervised learning are investigated, respectively. The final system (HITSZ_CITYU_1/2) combines collocations, context words and neighboring sentence sentiment in a two-class SVM classifier to determine the polarity of sentiment adjectives. The final systems achieved 0.957 and 0.953 (ranked 1st and 2nd) macro accuracy, and 0.936 and 0.933 (ranked 2nd and 3rd) micro accuracy, respectively.
|
137 |
SWAT: Cross-Lingual Lexical Substitution using Local Context Matching, Bilingual Dictionaries and Machine Translation
Richard Wicentowski, Maria Kelly and Rachel Lee
show abstracthide abstractWe present two systems that select the most appropriate Spanish substitutes for a marked word in an English test sentence. These systems were official entries to the SemEval-2010 Cross-Lingual Lexical Substitution task. The first system, Swat-E, finds Spanish substitutions by first finding English substitutions in the English sentence and then translating these substitutions into Spanish using an English-Spanish dictionary. The second system, Swat-S, translates each English sentence into Spanish and then finds the Spanish substitutions in the Spanish sentence. Both systems exceeded the baseline and all other participating systems by a wide margin using one of the two official scoring metrics.
|
138 |
TUD: semantic relatedness for relation classification
György Szarvas and Iryna Gurevych
show abstracthide abstractIn this paper, we describe the system submitted by the team TUD to Task 8 at SemEval 2010. The challenge focused on the identification of semantic relations between pairs of nominals in sentences collected from the web. We applied maximum entropy classification using both lexical and syntactic features to describe the nominals and their context. In addition, we experimented with features describing the semantic relatedness (SR) between the target nominals and a set of clue words characteristic of the relations. Our best submission with SR features achieved 69.23% macro-averaged F-measure, providing an 8.73% improvement over our baseline system. Thus, we think SR can serve as a natural way to incorporate external knowledge into relation classification.
|
|
Thursday, July 15, 2010 |
08:40–08:50
|
Opening remarks
|
08:50–10:30
|
Session I
08:50–09:15 |
EmotiBlog: a Finer-Grained and More Precise Learning of Subjectivity Expression Models
Ester Boldrini, Alexandra Balahur, Patricio Martínez-Barco and Andrés Montoyo Guijarro
show abstracthide abstractThe exponential growth of subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These applications require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog — a fine-grained annotation scheme for subjectivity. We show the manner in which it is built and demonstrate the benefits it brings to the systems using it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres — a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from the SemEval 2007 (Task 14) and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high quality training data for systems dealing with both opinion mining and emotion detection.
|
09:15–09:40 |
Error-tagged Learner Corpus of Czech
Jirka Hana, Alexandr Rosen, Svatava Škodová and Barbora Štindlová
show abstracthide abstractThe paper describes a Learner corpus of Czech, currently under development. The corpus captures Czech as used by non-native speakers. We discuss its structure, the layered annotation of errors and the annotation process.
|
09:40–10:05 |
Annotation Scheme for Social Network Extraction from Text
Apoorv Agarwal, Owen Rambow and Rebecca Passonneau
show abstracthide abstractIn this paper we present a novel annotation scheme that facilitates the extraction of social networks from text. We focus on a new type of event, called social event, in which two people participate and either both are cognizant of each other or only one is cognizant of the other. We define four types of social events: Interaction, Cognition, Physical Proximity and Perception. Since our annotation task is complex and layered, we present confusion matrices, Cohen’s Kappa, and F-measure values for each of the decision points that the annotators go through in the process of selecting a type and subtype for an event. For a set of documents from the ACE-2005 corpus, we achieve high Kappa (0.66-0.86) and F-measure (0.8-0.9) values which indicate that our annotation scheme is reliable. We also implement a global agreement measure which is inspired by the Automated Content Extraction (ACE) inter-annotator agreement measure. We get about 70% agreement that compares favorably to the ACE annotation effort.
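For readers unfamiliar with the agreement measure mentioned above, the following minimal sketch computes Cohen's Kappa for two annotators; the label set and toy annotations are invented, not taken from the paper.

    # Sketch: Cohen's Kappa over two annotators' categorical decisions.
    from collections import Counter

    def cohen_kappa(ann1, ann2):
        assert len(ann1) == len(ann2)
        n = len(ann1)
        observed = sum(a == b for a, b in zip(ann1, ann2)) / n
        c1, c2 = Counter(ann1), Counter(ann2)
        labels = set(ann1) | set(ann2)
        expected = sum((c1[l] / n) * (c2[l] / n) for l in labels)
        return (observed - expected) / (1 - expected)

    a1 = ["Interaction", "Cognition", "Perception", "Interaction", "Cognition"]
    a2 = ["Interaction", "Cognition", "Interaction", "Interaction", "Perception"]
    print(round(cohen_kappa(a1, a2), 3))  # 0.375 on this toy data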
|
10:05–10:30 |
Agile Corpus Annotation in Practice: An Overview of Manual and Automatic Annotation of CVs
Beatrice Alex, Claire Grover, Rongzhou Shen and Mijail Kabadjov
show abstracthide abstractAnnotated data sets are important resources for various research fields, including natural language processing (NLP) and text mining (TM). While the detection of annotation inconsistencies in different data sets has been investigated and their effect on NLP performance has been studied, very little work has been done on deriving better methods for the annotation process as a whole in order to maximize both the quality and quantity of annotated data. This paper describes our annotation project, in which we tested the relatively new approach of agile corpus annotation: moving away from the traditional, linear phases of corpus creation towards iterative ones, and recognizing that sources of error can occur throughout the annotation process. The paper also summarizes the performance of the machine-learning (ML)-based TM components which were trained and evaluated on the annotated data of CVs of software developers and programmers.
|
|
10:30–11:00
|
Break
|
11:00–12:40
|
Session II
11:00–11:25 |
Consistency Checking for Treebank Alignment
Markus Dickinson and Yvonne Samuelsson
show abstracthide abstractThis paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
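A minimal sketch of the first method described above, treating each aligned target string as a label of its source string and flagging source strings that receive more than one label, might look as follows; the toy alignment pairs are invented.

    # Sketch: flag inconsistently aligned source strings for manual inspection.
    from collections import defaultdict

    def inconsistent_alignments(aligned_pairs):
        """aligned_pairs: iterable of (source_string, target_string)."""
        labels = defaultdict(set)
        for src, tgt in aligned_pairs:
            labels[src].add(tgt)
        return {src: tgts for src, tgts in labels.items() if len(tgts) > 1}

    pairs = [("the house", "das Haus"),
             ("the house", "das Haus"),
             ("the house", "Haus"),     # legitimate variation or error? flag it
             ("a car", "ein Auto")]
    for src, tgts in inconsistent_alignments(pairs).items():
        print(src, "->", sorted(tgts))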
|
11:25–11:50 |
Anveshan: A Framework for Analysis of Multiple Annotators’ Labeling Behavior
Vikas Bhardwaj, Rebecca Passonneau, Ansaf Salleb-Aouissi and Nancy Ide
show abstracthide abstractManual annotation of natural language to capture linguistic information is essential for NLP tasks involving supervised machine learning of semantic knowledge. Judgements of meaning can be more or less subjective, in which case instead of a single correct label, the labels assigned might vary among annotators based on the annotators’ knowledge, age, gender, intuitions, background, and so on. We introduce a framework, “Anveshan”, in which we investigate annotator behavior to find outliers, cluster annotators by behavior, and identify confusable labels. We also investigate the effectiveness of using trained annotators versus a larger number of untrained annotators on a word sense annotation task. The annotation data come from a word sense disambiguation task for polysemous words, annotated by both trained annotators and untrained annotators from Amazon’s Mechanical Turk. Our results show that Anveshan is effective in uncovering patterns in annotator behavior, and we also show that trained annotators are superior to a larger number of untrained annotators for this task.
|
11:50–12:15 |
Influence of Pre-annotation on POS-tagged Corpus Development
Karën Fort and Benoît Sagot
show abstracthide abstractThis article details a series of carefully designed experiments aimed at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, both in terms of quality and annotation time, with specific attention paid to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus (Marcus et al., 1993) under various experimental setups, either from scratch or using various pre-annotations. These experiments confirm and detail the gain in quality observed before (Marcus et al., 1993; Dandapat et al., 2009; Rehbein et al., 2009), while showing that biases do appear and should be taken into account. They finally demonstrate that even a not-so-accurate tagger can help improve annotation speed.
|
12:15–12:40 |
To Annotate More Accurately or to Annotate More
Dmitriy Dligach, Rodney Nielsen and Martha Palmer
show abstracthide abstractThe commonly accepted wisdom is that blind double annotation followed by adjudication of disagreements is necessary to create training and test corpora that result in the best possible performance. We provide evidence that this is unlikely to be the case. Rather, the greatest value for your annotation dollar lies in single-annotating more data.
|
|
12:40–13:50
|
Lunch
|
13:50–15:30
|
Session III
13:50–14:15 |
Annotating Underquantification
Aurelie Herbelot and Ann Copestake
show abstracthide abstractMany noun phrases in text are ambiguously quantified: syntax doesn’t explicitly tell us whether they refer to a single entity or to several, and what portion of the set denoted by the Nbar actually takes part in the event expressed by the verb. We describe this ambiguity phenomenon in terms of underspecification, or rather ‘underquantification’. We attempt to validate the underquantification hypothesis by producing and testing an annotation scheme for quantification resolution, the aim of which is to associate a single quantifier with each noun phrase in our corpus.
|
14:15–14:40 |
PropBank Annotation of Multilingual Light Verb Constructions
Jena D. Hwang, Archna Bhatia, Claire Bonial, Aous Mansouri, Ashwini Vaidya, Nianwen Xue and Martha Palmer
show abstracthide abstractIn this paper, we have addressed the task of PropBank annotation of light verb constructions, which like multi-word expressions pose special problems. To arrive at a solution, we have evaluated 3 different possible methods of annotation. The final method involves three passes: (1) manual identification of a light verb construction, (2) annotation based on the light verb construction’s Frame File, and (3) a deterministic merging of the first two passes. We also discuss how in various languages the light verb constructions are identified and can be distinguished from the non-light verb word groupings.
|
14:40–15:05 |
Retrieving Correct Semantic Boundaries in Dependency Structure
Jinho Choi and Martha Palmer
show abstracthide abstractThis paper describes the retrieval of correct semantic boundaries for predicate-argument structures annotated by dependency structure. Unlike phrase structure, in which arguments are annotated at the phrase level, dependency structure does not have phrases so the argument labels are associated with head words instead: the subtree of each head word is assumed to include the same set of words as the annotated phrase does in phrase structure. However, at least in English, retrieving such subtrees does not always guarantee retrieval of the correct phrase boundaries. In this paper, we present heuristics that retrieve correct phrase boundaries for semantic arguments, called semantic boundaries, from dependency trees. By applying heuristics, we achieved an F1-score of 99.54% for correct representation of semantic boundaries. Furthermore, error analysis showed that some of the errors could also be considered correct, depending on the interpretation of the annotation.
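The subtree-based assumption that the heuristics improve on can be sketched as follows: the argument span of a head word is taken to be the yield of its subtree. The token and head encoding below is assumed for illustration.

    # Sketch: compute the yield (token span) of a head word's dependency subtree.
    def subtree_yield(heads, head_idx):
        """heads[i] is the head of token i+1 (1-based indices, 0 = root);
        return the sorted token indices dominated by head_idx, inclusive."""
        children = {}
        for i, h in enumerate(heads, start=1):
            children.setdefault(h, []).append(i)
        span, stack = [], [head_idx]
        while stack:
            node = stack.pop()
            span.append(node)
            stack.extend(children.get(node, []))
        return sorted(span)

    # "She read the long report": She<-read, read=root, the<-report, long<-report, report<-read
    heads = [2, 0, 5, 5, 2]
    print(subtree_yield(heads, 5))   # argument span of "report" -> [3, 4, 5]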
|
15:05–15:30 |
Complex Predicates Annotation in a Corpus of Portuguese
Iris Hendrickx, Amália Mendes, Sílvia Pereira, Anabela Gonçalves and Inês Duarte
show abstracthide abstractWe present an annotation scheme for the annotation of complex predicates, understood as constructions with more than one lexical unit, each contributing part of the information normally associated with a single predicate. We discuss our annotation guidelines of four types of complex predicates, and the treatment of several difficult cases, related to ambiguity, overlap and coordination. We then discuss the process of marking up the Portuguese CINTIL corpus of 1M tokens (written and spoken) with a new layer of information regarding complex predicates. We also present the outcomes of the annotation work and statistics on the types of CPs that we found in the corpus.
|
|
15:30–16:00
|
Break
|
16:00–17:30
|
Poster session
1 |
Using an Online Tool for the Documentation of Edo Language
Ota Ogie
show abstracthide abstractLanguage documentation is important as a tool for preservation of endangered languages and for making data available to speakers and researchers of a language. A database such as TypeCraft is important for typology studies of both well-documented and little-documented languages, and is a valid tool for comparison of languages. This requires that linguistic elements be coded in a manner that allows comparability across widely varying language data. In this paper, I discuss how I have used the coding system in TypeCraft for the documentation of data from the Èdó language, a language belonging to the Edoid group of the Benue-Congo subfamily of the Volta-Congo language family and spoken in Mid-Western Nigeria, West Africa. The study shows how syntactic, semantic and morphological properties of multi-verb constructions in Èdó (Benue-Congo) can be represented in a relational database.
|
2 |
Cross-Lingual Validity of PropBank in the Manual Annotation of French
Lonneke van der Plas, Tanja Samardzic and Paola Merlo
show abstracthide abstractMethods that re-use existing mono-lingual semantic annotation resources to annotate a new language rely on the hypothesis that the semantic annotation scheme used is cross-lingually valid. We test this hypothesis in an annotation agreement study. We show that the annotation scheme can be applied cross-lingually.
|
3 |
Characteristics of High Agreement Affect Annotation in Text
Cecilia Ovesdotter Alm
show abstracthide abstractThe purpose of this paper is to present an unusual English dataset for affect exploration in text. It describes a corpus of fairy tales from three sources that have been annotated for affect at the sentence level. Special attention is given to data marked by high annotator agreement. A qualitative analysis of characteristics of high agreement sentences from H. C. Andersen reveals several interesting trends, illustrated by examples.
|
4 |
The Deep Re-annotation in a Chinese Scientific Treebank
Kun Yu, Xiangli Wang, Yusuke Miyao, Takuya Matsuzaki and Jun’ichi Tsujii
show abstracthide abstractIn this paper, we introduce our recent work on re-annotating the deep information, which includes both the grammatical functional tags and the traces, in a Chinese scientific treebank. The issues with regard to re-annotation and its corresponding solutions are discussed. Furthermore, the process of the re-annotation work is described.
|
5 |
The Unified Annotation of Syntax and Discourse in the Copenhagen Dependency Treebanks
Matthias Buch-Kromann and Iørn Korzen
show abstracthide abstractWe propose a unified model of syntax and discourse in which text structure is viewed as a tree structure augmented with anaphoric relations and other secondary relations. We describe how the model accounts for discourse connectives and the syntax-discourse-semantics interface. Our model is dependency-based, i.e., words are the basic building blocks in our analyses. The analyses have been applied cross-linguistically in the Copenhagen Dependency Treebanks, a set of parallel treebanks for Danish, English, German, Italian, and Spanish which are currently being annotated with respect to discourse, anaphora, syntax, morphology, and translational equivalence.
|
6 |
Identifying Sources of Inter-Annotator Variation: Evaluating Two Models of Argument Analysis
Barbara White
show abstracthide abstractAn analysis of an article’s argument (rhetorical) structure can serve to identify elements that biomedical researchers wish to access. Human-annotated data are needed to train such automated systems for Information Extraction. This paper reports on a study where two Models of argument were applied to the Discussion sections of a corpus of twelve biomedical research articles downloaded from the BMC-series of journals. The three annotators were the study director and current author, and two fourth-year Medical Science students. The goals were to evaluate and compare the performance of the Models and to identify sources of inter-annotator variation as diagnostics for improving either or both Models. The first Model applied was based on previous work – Argumentative Zoning, Teufel et al. 1999; Zone Analysis, Mizuta et al. 2005 – but the second was developed from Toulmin’s Claims-based argument structure (1958/2003). The results exhibited a mixture of systematic and random (noise-like) inter-annotator disagreements. The patterns in the systematic variation showed that there are problems with particular argument categories under both Models as well as notable annotator bias toward certain categories in some instances. In addition, there was a surprisingly wide range in percentage of three-way inter-annotator agreement under both Models among the twelve corpus articles. This ‘inter-article’ variation brings to light the importance of another factor in these annotation results: the quality and clarity of the writing and exposition of the corpus data. The results of this study indicate a need to revise both Models of argument to ensure that categories are clearly distinguished. Based on the technical complexity of the corpus data and the importance of understanding how authors present their arguments, it is recommended that in the future annotators should work in pairs – a biomedical domain expert together with an expert in rhetoric.
|
7 |
Dependency-Based PropBanking of Clinical Finnish
Katri Haverinen, Filip Ginter, Timo Viljanen, Veronika Laippala and Tapio Salakoski
show abstracthide abstractIn this paper, we present a PropBank of clinical Finnish, an annotated corpus of verbal propositions and arguments. The clinical PropBank is created on top of a previously existing dependency treebank annotated in the Stanford Dependency (SD) scheme and covers 90% of all verb occurrences in the treebank. We establish that the PropBank scheme is applicable to clinical Finnish as well as compatible with the SD scheme, with an overwhelming proportion of arguments being governed by the verb. This allows argument candidates to be restricted to direct verb dependents, substantially simplifying the PropBank construction. The clinical Finnish PropBank is freely available at the address http://bionlp.utu.fi.
|
8 |
Building the Syntactic Reference Corpus of Medieval French Using NotaBene RDF Annotation Tool
Nicolas Mazziotta
show abstracthide abstractIn this paper, we introduce the NotaBene RDF Annotation Tool free software used to build the Syntactic Reference Corpus of Medieval French. It relies on a dependency-based model to manually annotate Old French texts from the Base de Français Médiéval and the Nouveau Corpus d’Amsterdam. NotaBene uses OWL ontologies to frame the terminology used in the annotation, which is displayed in a tree-like view of the annotation. This tree widget allows easy grouping and tagging of words and structures. To increase the quality of the annotation, two annotators work independently on the same texts at the same time and NotaBene can also generate automatic comparisons between both analyses. The RDF format can be used to export the data to several other formats: namely, TigerXML (for querying the data and extracting structures) and graphviz dot format (for quoting syntactic description in research papers).
|
9 |
Chunking German: An Unsolved Problem
Sandra Kübler, Kathrin Beck, Erhard Hinrichs and Heike Telljohann
show abstracthide abstractThis paper describes a CoNLL-style chunk representation for the Tübingen Treebank of Written German, which assumes a flat chunk structure so that each word belongs to at most one chunk. For German, such a chunk definition causes problems in cases of complex prenominal modification. We introduce a flat annotation that can handle these structures via a stranded noun chunk.
|
10 |
Proposal for MWE Annotation in Running Text
Iris Hendrickx, Amália Mendes and Sandra Antunes
show abstracthide abstractWe present a proposal for the annotation of multi-word expressions in a 1M corpus of contemporary Portuguese. Our aim is to create a resource that allows us to study multi-word expressions (MWEs) in their context. The corpus will be a valuable additional resource next to the already existing MWE lexicon that was based on a much larger corpus of 50M words. In this paper we discuss the problematic cases for annotation and the proposed solutions, focusing on the variational properties of MWEs.
|
11 |
A Feature Type Classification for Therapeutic Purposes: a preliminary evaluation with non-expert speakers
Gianluca E. Lebani and Emanuele Pianta
show abstracthide abstractWe propose a feature type classification intended to be used in a therapeutic context. Such a scenario lies behind our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both a practical and a theoretical outcome, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strength of our proposal and to conceive some improvements for the future.
|
12 |
Annotating Korean Demonstratives
Sun-Hee Lee and Jae-young Song
show abstracthide abstractThis paper presents preliminary work on a corpus-based study of Korean demonstratives. Through the development of an annotation scheme and the use of spoken and written corpora, we aim to determine different functions of demonstratives and to examine their distributional properties. Our corpus study adopts similar features of annotation used in Botley and McEnery (2001) and provides some linguistic hypotheses on grammatical functions of Korean demonstratives to be further explored.
|
21 |
Creating and Exploiting a Resource of Parallel Parses
Christian Chiarcos, Kerstin Eckart and Julia Ritz
show abstracthide abstractThis paper describes the creation of a resource of German sentences with multiple automatically created alternative syntactic analyses (parses) for the same text, and how qualitative and quantitative investigations of this resource can be performed using ANNIS, a tool for corpus querying and visualization. Using the example of PP attachment, we show how parsing can benefit from the use of such a resource.
|
22 |
From Descriptive Annotation to Grammar Specification
Lars Hellan
show abstracthide abstractThe paper presents an architecture for connecting annotated linguistic data with a computational grammar system. Pivotal to the architecture is an annotational interlingua – called the Construction Labeling system (CL) – which is notationally very simple, descriptively fine-grained, cross-typologically applicable, and formally well-defined enough to map to a state-of-the-art computational model of grammar. In the present instantiation of the architecture, the computational grammar is an HPSG-based system called TypeGram. Underlying the architecture is a research program of enhancing the interconnectivity between linguistic analytic subsystems such as grammar formalisms and text annotation systems.
|
23 |
An Annotation Schema for Preposition Senses in German
Antje Müller, Olaf Hülscher, Claudia Roch, Katja Kesselmeier, Tobias Stadtfeld, Jan Strunk and Tibor Kiss
show abstracthide abstractPrepositions are highly polysemous. Yet, little effort has been spent to develop language-specific annotation schemata for preposition senses to systematically represent and analyze the polysemy of prepositions in large corpora. In this paper, we present an annotation schema for preposition senses in German. The annotation schema includes a hierarchical taxonomy and also allows multiple annotations for individual tokens. It is based on an analysis of usage-based dictionaries and grammars and has been evaluated in an inter-annotator-agreement study.
|
24 |
OTTO: A Transcription and Management Tool for Historical Texts
Stefanie Dipper, Lara Kresse, Martin Schnurrenberger and Seong-Eun Cho
show abstracthide abstractThis paper presents OTTO, a transcription tool designed for diplomatic transcription of historical language data. The tool supports easy and fast typing and instant rendering of transcription in order to gain a look as close to the original manuscript as possible. In addition, the tool provides support for the management of transcription projects which involve distributed, collaborative working of multiple parties on collections of documents.
|
25 |
Multimodal Annotation of Conversational Data
Philippe Blache, Roxane Bertrand, Emmanuel Bruno, Brigitte Bigi, Robert Espesser, Gaelle Ferre, Mathilde Guardiola, Daniel Hirst, Ning Tan, Edlira Cela, Jean-Claude Martin, Stéphane Rauzy, Mary-Annick Morel, Elisabeth Murisasco and Irina Nesterenko
show abstracthide abstractWe propose in this paper a broad-coverage approach for multimodal annotation of conversational data. Large annotation projects addressing the question of multimodal annotation bring together many different kinds of information from different domains, with different levels of granularity. We present in this paper the first results of the OTIM project aiming at developing conventions and tools for multimodal annotation.
|
26 |
Combining Parallel Treebanks and Geo-Tagging
Martin Volk, Anne Goehring and Torsten Marek
show abstracthide abstractThis paper describes a new kind of semantic annotation in parallel treebanks. We build French-German parallel treebanks of mountaineering reports, a text genre that abounds with geographical names which we classify and ground with reference to a large gazetteer of Swiss toponyms. We discuss the challenges in obtaining a high recall and precision in automatic grounding, and sketch how we represent the grounding information in our treebank.
|
27 |
Challenges of Cheap Resource Creation
Jirka Hana and Anna Feldman
show abstracthide abstractWe describe the challenges of resource creation for a resource-light system for morphological tagging of fusional languages (Feldman and Hana, 2010). The constraints on resources (time, expertise, and money) introduce challenges that are not present in development of morphological tools and corpora in the usual, resource intensive way.
|
28 |
Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş, Cem Bozşahin and Deniz Zeyrek
show abstracthide abstractIn this paper, we describe an annotation environment developed for the marking of discourse structures in Turkish, and the kinds of discourse relation configurations that led to its design.
|
29 |
An Overview of the CRAFT Concept Annotation Guidelines
Michael Bada, Miriam Eckert, Martha Palmer and Lawrence Hunter
show abstracthide abstractWe present our concept-annotation guidelines for a large multi-institutional effort to create a gold-standard manually annotated corpus of full-text biomedical journal articles. We are semantically annotating these documents with the full term sets of eight large biomedical ontologies and controlled terminologies ranging from approximately 1,000 to millions of terms, and, using these guidelines, we have been able to perform this extremely challenging task with a high degree of interannotator agreement. The guidelines have been designed to be usable with any terminology employed to semantically annotate concept mentions in text and are available for external use.
|
30 |
Syntactic Tree Queries in Prolog
Gerlof Bouma
show abstracthide abstractIn this paper, we argue for and demonstrate the use of Prolog as a tool to query annotated corpora. We present a case study based on the German TüBa-D/Z Treebank to show that flexible and efficient corpus querying can be started with a minimal amount of effort. We end this paper with a brief discussion of performance, which suggests that the approach is both fast enough and scalable.
|
31 |
An Integrated Tool for Annotating Historical Corpora
Pablo Picasso Feliciano de Faria, Fabio Natanael Kepler and Maria Clara Paixão de Sousa
show abstracthide abstractE-Dictor is a tool for encoding, applying levels of editions, and assigning part-of-speech tags to ancient texts. In short, it works as a WYSIWYG interface to encode text in XML format. It comes from the experience during the building of the Tycho Brahe Parsed Corpus of Historical Portuguese and from consortium activities with other research groups. Preliminary results show a decrease of at least 50% on the overall time taken on the editing process.
|
32 |
The Revised Arabic PropBank
Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan and Martha Palmer
show abstracthide abstractThe revised Arabic PropBank (APB) reflects a number of changes to the data and the process of PropBanking. Several changes stem from Treebank revisions, and an automatic process was put in place to map existing annotation to the new trees. We have revised the original 493 Frame Files from the Pilot APB and added 1462 new files for a total of 1955 Frame Files with 2446 framesets. In addition to a heightened attention to sense distinctions this cycle includes a greater attempt to address complicated predicates such as light verb constructions and multi-word expressions. New tools facilitate the data tagging and also simplify frame creation.
|
|
Friday, July 16, 2010 |
08:50–10:30
|
Session IV
08:50–09:15 |
PackPlay: Mining Semantic Data in Collaborative Games
Nathan Green, Paul Breimyer, Vinay Kumar and Nagiza Samatova
show abstracthide abstractBuilding training data is labor-intensive and presents a major obstacle to advancing machine learning technologies such as machine translators, named entity recognizers (NER), part-of-speech taggers, etc. Training data are often specialized for a particular language or Natural Language Processing (NLP) task. Knowledge captured by a specific set of training data is not easily transferable, even to the same NLP task in another language. Emerging technologies, such as social networks and serious games, offer a unique opportunity to change how we construct training data. While collaborative games have been used in information retrieval, it is an open issue whether users can contribute accurate annotations in a collaborative game context for a problem that requires an exact answer, such as games that would create named entity recognition training data. We present PackPlay, a collaborative game framework that empirically shows players’ ability to mimic annotation accuracy and thoroughness seen in gold standard annotated corpora.
|
09:15–09:40 |
A Proposal for a Configurable Silver Standard
Udo Hahn, Katrin Tomanek, Elena Beisswanger and Erik Faessler
show abstracthide abstractAmong the many proposals promoting alternatives to costly-to-create gold standards, the idea of a fully automatically, and thus cheaply, constructed silver standard has recently been launched. However, the current construction policy for such a silver standard requires crucial parameters (such as similarity thresholds and agreement cut-offs) to be set a priori, based on extensive testing, at corpus compile time. Accordingly, such a corpus is static once it is released. We here propose an alternative policy where silver standards can be dynamically optimized and customized on demand (given a specific goal function) using a gold standard as an oracle.
|
09:40–10:05 |
A Hybrid Model for Annotating Named Entity Training Corpora
Robert Voyer, Valerie Nygaard, Will Fitzgerald and Hannah Copperman
show abstracthide abstractIn this paper, we present a two-phase, hybrid model for generating training data for Named Entity Recognition systems. In the first phase, a trained annotator labels all named entities in a text irrespective of type. In the second phase, naïve crowdsourcing workers complete binary judgment tasks to indicate the type(s) of each entity. Decomposing the data generation task in this way results in a flexible, reusable corpus that accommodates changes to entity type taxonomies. In addition, it makes efficient use of precious trained annotator resources by leveraging highly available and cost effective crowdsourcing worker pools in a way that does not sacrifice quality.
|
10:05–10:30 |
Anatomy of Annotation Schemes: Mapping to GrAF
Nancy Ide and Harry Bunt
show abstracthide abstractIn this paper, we apply the annotation scheme design methodology defined in (Bunt, 2010) and demonstrate its use for generating a mapping from an existing annotation scheme to a representation in GrAF format. By way of illustration, we apply the mapping strategy to annotations from ISO-TimeML (Mani et al., 2004), PropBank (Palmer et al., 2005), and FrameNet (Baker et al., 1998).
|
|
10:30–11:00
|
Break
|
11:00–12:40
|
Session V
11:00–11:25 |
Annotating Participant Reference in English Spoken Conversation
John Niekrasz and Johanna D. Moore
show abstracthide abstractIn conversational language, references to people (especially to the conversation participants, e.g., I, you, and we) are an essential part of many expressed meanings. In most conversational settings, however, many such expressions have numerous potential meanings, are frequently vague, and are highly dependent on social and situational context. This is a significant challenge to conversational language understanding systems — one which has seen little attention in annotation studies. In this paper, we present a method for annotating verbal reference to *people* in conversational speech, with a focus on reference to conversation *participants*. Our goal is to provide a resource that tackles the issues of vagueness, ambiguity, and contextual dependency in a nuanced yet reliable way, with the ultimate aim of supporting work on summarization and information extraction for conversation.
|
11:25–11:50 |
Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation
Anne Lacheret-Dujour, Nicolas Obin and Mathieu Avanzi
show abstracthide abstractIn the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and research; (2) based on this NEMA, how to establish reference prosodic corpora (RPC) for different discourse genres (Cresti and Moneglia, 2005); (3) how to use the RPC to develop corpus-based learning methods for automatic prosodic labelling in spontaneous speech (Buhman et al., 2002; Avanzi et al., 2010). This paper presents two pilot experiments conducted with a consortium of 15 French experts in prosody in order to provide a prosodic transcription framework (transcription methodology and transcription reliability measures) and to establish reference prosodic corpora in French.
|
11:50–12:15 |
Depends on What the French Say - Spoken Corpus Annotation With and Beyond Syntactic Functions
José Deulofeu, Lucie Duffort, Kim Gerdes, Sylvain Kahane and Paola Pietrandrea
show abstracthide abstractWe present a syntactic annotation scheme for spoken French that is currently used in the Rhapsodie project. This annotation is dependency-based and includes coordination and disfluency as analogously encoded types of paradigmatic phenomena. Furthermore, we attempt a thorough definition of the discourse units required by the systematic annotation of other phenomena beyond usual sentence boundaries, which are typical for spoken language. This includes so-called "macrosyntactic" phenomena such as dislocation, parataxis, insertions, grafts, and epexegesis.
|
12:15–12:40 |
The Annotation Scheme of the Turkish Discourse Bank and An Evaluation of Inconsistent Annotations
Deniz Zeyrek, Işin Demirşahin, Ayişiǧi Sevdik-Çalli, Hale Ögel Balaban, Ihsan Yalçinkaya and Ümit Deniz Turan
show abstracthide abstractIn this paper, we report on the annotation procedures we developed for annotating the Turkish Discourse Bank (TDB), an effort that extends the Penn Discourse Tree Bank (PDTB) annotation style by using it for annotating Turkish discourse. After a brief introduction to the TDB, we describe the annotation cycle and the annotation scheme we developed, defining which parts of the scheme are an extension of the PDTB and which parts are different. We provide inter-coder reliability tests on the first and second arguments of some connectives and discuss the most important sources of disagreement among annotators.
|
|
12:40–13:00
|
Closing remarks
|
Thursday, July 15, 2010 |
9:00–9:15
|
Opening Remarks
|
9:15–10:30
|
Session 1: Extraction
9:15–9:40 |
Two Strong Baselines for the BioNLP 2009 Event Extraction Task
Andreas Vlachos
show abstracthide abstractThis paper presents two strong baselines for the BioNLP 2009 shared task on event extraction. First we re-implement a rule-based approach which allows us to explore the task and the effect of domain-adapted parsing on it. We then replace the rule-based component with support vector machine classifiers and achieve performance near the state-of-the-art without using any external resources. The good performances achieved and the relative simplicity of both approaches make them reproducible baselines. We conclude with suggestions for future work with respect to the task representation.
|
9:40–10:05 |
Recognizing Biomedical Named Entities Using Skip-Chain Conditional Random Fields
Jingchen Liu, Minlie Huang and Xiaoyan Zhu
show abstracthide abstractLinear-chain Conditional Random Fields (CRFs) have been applied to perform the Named Entity Recognition (NER) task in many biomedical text mining and information extraction systems. However, the linear-chain CRF cannot capture long-distance dependency, which is very common in the biomedical literature. In this paper, we propose a novel study of capturing such long-distance dependency by defining two principles for constructing skip-edges for a skip-chain CRF: linking similar words and linking words having typed dependencies. The approach is applied to recognize gene/protein mentions in the literature. When tested on the BioCreAtIvE II Gene Mention dataset and the GENIA corpus, the approach contributes significant improvements over the linear-chain CRF. We also present in-depth error analysis on inconsistent labeling and study the influence of the quality of skip edges on the labeling performance.
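As an illustration only (not the authors' implementation), skip edges following the two principles above might be constructed roughly as follows; the token indices and dependency triples are toy data.

    # Sketch: constructing skip edges for a skip-chain CRF.
    def skip_edges_similar(tokens):
        """Principle 1: link later occurrences of a word to its first occurrence."""
        first_seen, edges = {}, []
        for i, tok in enumerate(tokens):
            key = tok.lower()
            if key in first_seen:
                edges.append((first_seen[key], i))
            else:
                first_seen[key] = i
        return edges

    def skip_edges_dependencies(dependencies):
        """Principle 2: dependencies are (head_index, relation, dependent_index)
        triples from a parser; keep only the long-range pairs that a linear
        chain cannot capture."""
        return [(h, d) for h, _, d in dependencies if abs(h - d) > 1]

    tokens = "IL-2 activates STAT5 and IL-2 expression rises".split()
    print(skip_edges_similar(tokens))   # [(0, 4)] links the two IL-2 mentions
    print(skip_edges_dependencies([(1, "nsubj", 0), (1, "dobj", 2),
                                   (6, "nsubj", 5), (6, "advcl", 1)]))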
|
10:05–10:30 |
Event Extraction for Post-Translational Modifications
Tomoko Ohta, Sampo Pyysalo, Makoto Miwa, Jin-Dong Kim and Jun’ichi Tsujii
show abstracthide abstractWe consider the task of automatically extracting post-translational modification events from biomedical scientific publications. Building on the success of event extraction for phosphorylation events in the BioNLP’09 shared task, we extend the event annotation approach to four major new post-translational modification event types. We present a new targeted corpus of 157 PubMed abstracts annotated for over 1000 proteins and 400 post-translational modification events identifying the modified proteins and sites. Experiments with a state-of-the-art event extraction system show that the events can be extracted with 52% precision and 36% recall (42% F-score), suggesting remaining challenges in the extraction of the events. The annotated corpus is freely available in the BioNLP’09 shared task format at the GENIA project homepage.
|
|
10:30–11:00
|
Morning coffee break
|
11:00–12:30
|
Session 2
11:00–12:00 |
Keynote speaker, W. John Wilbur: Text Mining and Intelligence
W. John Wilbur
|
12:05–12:30 |
Scaling up Biomedical Event Extraction to the Entire PubMed
Jari Björne, Filip Ginter, Sampo Pyysalo, Jun’ichi Tsujii and Tapio Salakoski
show abstracthide abstractWe present the first full-scale event extraction experiment covering the titles and abstracts of all PubMed citations. Extraction is performed using a pipeline composed of state-of-the-art methods: the BANNER named entity recognizer, the McClosky-Charniak domain-adapted parser, and the Turku Event Extraction System. We analyze the statistical properties of the resulting dataset and present evaluations of the core event extraction as well as negation and speculation detection components of the system. Further, we study in detail the set of extracted events relevant to the apoptosis pathway to gain insight into the biological relevance of the result. The dataset, consisting of 19.2 million occurrences of 4.5 million unique events, is freely available for use in research at http://bionlp.utu.fi/.
|
|
12:30–14:00
|
Lunch break
|
14:00–14:50
|
Session 3: Foundations
14:00–14:25 |
A Comparative Study of Syntactic Parsers for Event Extraction
Makoto Miwa, Sampo Pyysalo, Tadayoshi Hara and Jun’ichi Tsujii
show abstracthide abstractThe extraction of bio-molecular events from text is an important task for a number of domain applications such as pathway construction. Several syntactic parsers have been used in Biomedical Natural Language Processing (BioNLP) applications, and the BioNLP 2009 Shared Task results suggest that incorporation of syntactic analysis is important to achieving state-of-the-art performance. Direct comparison of parsers is complicated by differences such as the division between phrase structure- and dependency-based analyses and the variety of output formats, structures and representations applied. In this paper, we present a task-oriented comparison of five parsers, measuring their contribution to bio-molecular event extraction using a state-of-the-art event extraction system. The results show that the parsers with domain models using dependency formats provide very similar performance, and that an ensemble of different parsers in different formats can improve the event extraction system.
|
14:25–14:50 |
Arguments of Nominals in Semantic Interpretation of Biomedical Text
Halil Kilicoglu, Marcelo Fiszman, Graciela Rosemblat, Sean Marimpietri and Thomas Rindflesch
show abstracthide abstractBased on linguistic generalizations, we enhanced an existing semantic processor, SemRep, for effective interpretation of a wide range of patterns used to express arguments of nominalization in clinically oriented biomedical text. Nominalizations are pervasive in the scientific literature, yet few text mining systems adequately address them, thus missing a wealth of information. We evaluated the system by assessing the algorithm independently and by determining its contribution to SemRep generally. The first evaluation demonstrated the strength of the method through an F-score of 0.646 (P=0.743, R=0.569), which is more than 20 points higher than the baseline. The second evaluation showed that overall SemRep results were increased to F-score 0.689 (P=0.745, R=0.640), approximately 25 points better than processing without nominalizations.
|
|
14:50–15:15
|
Session 4: High-level tasks
14:50–15:15 |
Improving Summarization of Biomedical Documents Using Word Sense Disambiguation
Laura Plaza, Mark Stevenson and Alberto Díaz
show abstracthide abstractWe describe a concept-based summarization system for biomedical documents and show that its performance can be improved using Word Sense Disambiguation. The system represents the documents as graphs formed from concepts and relations from the UMLS. A degree-based clustering algorithm is applied to these graphs to discover different themes or topics within the document. To create the graphs, the MetaMap program is used to map the text onto concepts in the UMLS Metathesaurus. This paper shows that applying a graph-based Word Sense Disambiguation algorithm to the output of MetaMap improves the quality of the summaries that are generated.
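Purely as an illustrative sketch (not the system's algorithm), a simple degree-based clustering over a concept co-occurrence graph could look like this; the toy concept graph stands in for MetaMap/UMLS output.

    # Sketch: pick high-degree concepts as hubs, attach the rest to the most connected hub.
    from collections import defaultdict

    def degree_clusters(edges, n_hubs=2):
        graph = defaultdict(set)
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
        hubs = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:n_hubs]
        clusters = {h: {h} for h in hubs}
        for node in graph:
            if node in hubs:
                continue
            # attach each remaining concept to the hub it shares most neighbours with
            best = max(hubs, key=lambda h: len(graph[node] & (graph[h] | {h})))
            clusters[best].add(node)
        return clusters

    edges = [("insulin", "diabetes"), ("diabetes", "glucose"), ("insulin", "glucose"),
             ("insulin", "pancreas"), ("tumor", "biopsy"), ("tumor", "metastasis"),
             ("biopsy", "metastasis"), ("tumor", "lung")]
    print(degree_clusters(edges))   # two topical clusters around "insulin" and "tumor"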
|
|
15:30–16:00
|
Afternoon coffee break
|
16:00–16:50
|
Session 4: High-level tasks, continued
16:00–16:25 |
Cancer Stage Prediction Based on Patient Online Discourse
Mukund Jha and Noemie Elhadad
show abstracthide abstractForums and mailing lists dedicated to particular diseases are increasingly popular online. Automatically inferring the health status of a patient can be useful for both forum users and health researchers who study patients’ online behaviors. In this paper, we focus on breast cancer forums and present a method to predict the stage of patients’ cancers from their online discourse. We show that what the patients talk about (content-based features) and whom they interact with (social network-based features) provide complementary cues to predicting cancer stage and can be leveraged for better prediction. Our methods are extendable and can be applied to other tasks of acquiring contextual information about online health forum participants.
|
16:25–16:50 |
An Exploration of Mining Gene Expression Mentions and Their Anatomical Locations from Biomedical Text
Martin Gerner, Goran Nenadic and Casey M. Bergman
show abstracthide abstractHere we explore mining data on gene expression from the biomedical literature and present Gene Expression Text Miner (GETM), a tool for extraction of information about the expression of genes and their anatomical locations from text. Provided with recognized gene mentions, GETM identifies mentions of anatomical locations and cell lines, and extracts text passages where authors discuss the expression of a particular gene in specific anatomical locations or cell lines. This enables the automatic construction of expression profiles for both genes and anatomical locations. Evaluated against a manually extended version of the BioNLP ’09 corpus, GETM achieved precision and recall levels of 58.8% and 23.8%, respectively. Application of GETM to MEDLINE and PubMed Central yielded over 700,000 gene expression mentions. This data set may be queried through a web interface, and should prove useful not only for researchers who are interested in the developmental regulation of specific genes of interest, but also for database curators aiming to create structured repositories of gene expression information. The compiled tool, its source code, the manually annotated evaluation corpus and a search query interface to the data set extracted from MEDLINE and PubMed Central are available at http://getm-project.sourceforge.net/.
|
|
16:50–17:00
|
Poster Boaster Session and Conclusions
|
17:00–17:30
|
Poster Session
37 |
Exploring Surface-Level Heuristics for Negation and Speculation Discovery in Clinical Texts
Emilia Apostolova and Noriko Tomuro
show abstracthide abstractWe investigate the automatic identification of negated and speculative statements in biomedical texts, focusing on the clinical domain. Our goal is to evaluate the performance of simple, Regex-based algorithms that have the advantages of low computational cost and simple implementation, and that avoid the problems associated with the accurate computation of deep linguistic features of idiosyncratic clinical texts. The performance of the NegEx algorithm with an additional set of Regex-based rules reveals promising results (evaluated on the BioScope corpus). Current and future work focuses on a bootstrapping algorithm for the discovery of new rules from unannotated clinical texts.
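A minimal sketch of the surface-level strategy discussed above, a NegEx-style rule that marks a concept as negated when a trigger phrase occurs within a small window before it, is shown below; the trigger list and window size are simplified assumptions.

    # Sketch: NegEx-style negation detection with a small trigger list.
    import re

    NEG_TRIGGERS = r"\b(no|denies|denied|without|negative for|ruled out)\b"

    def is_negated(sentence, concept, window=6):
        """Return True if a negation trigger appears within `window` tokens
        before the concept mention."""
        m = re.search(re.escape(concept), sentence, flags=re.IGNORECASE)
        if not m:
            return False
        preceding = sentence[:m.start()].split()[-window:]
        return re.search(NEG_TRIGGERS, " ".join(preceding), flags=re.IGNORECASE) is not None

    print(is_negated("The patient denies chest pain on exertion.", "chest pain"))  # True
    print(is_negated("Chest pain has worsened since admission.", "chest pain"))    # False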
|
38 |
Disease Mention Recognition with Specific Features
Md. Faisal Mahbub Chowdhury and Alberto Lavelli
show abstracthide abstractDespite an increasing amount of research on biomedical named entity recognition, not enough work has been done on disease mention recognition. The difficulty of obtaining adequate corpora is one of the key reasons that has hindered this particular line of research. Previous studies argue that correct identification of disease mentions is the key issue for further improvement of disease-centric knowledge extraction tasks. In this paper, we present a machine learning based approach that uses a feature set tailored for disease mention recognition and outperforms the state-of-the-art results. The paper also discusses why a feature set for the well studied gene/protein mention recognition task is not necessarily equally effective for other biomedical semantic types such as diseases.
|
39 |
Extraction of Disease-Treatment Semantic Relations from Biomedical Sentences
Oana Frunza and Diana Inkpen
show abstracthide abstractThis paper describes our study on identifying semantic relations that exist between diseases and treatments in biomedical sentences. We focus on three semantic relations: Cure, Prevent, and Side Effect. The contributions of this paper are that better results are obtained compared to previous studies, and that our research settings allow the integration of biomedical and medical knowledge. We obtain 98.55% F-measure for the Cure relation, 100% F-measure for the Prevent relation, and 88.89% F-measure for the Side Effect relation.
|
40 |
Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Lin Sun and Ulla Stenius
show abstracthide abstractMany practical tasks require accessing specific types of information in scientific literature; e.g. information about the objective, methods, results or conclusions of the study in question. Several schemes have been developed to characterize such information in full journal papers. Yet many tasks focus on abstracts instead. We take three schemes of different type and granularity (those based on section names, argumentative zones and conceptual structure of documents) and investigate their applicability to biomedical abstracts. We show that even for the finest-grained of these schemes, the majority of categories appear in abstracts and can be identified relatively reliably using machine learning. We discuss the impact of our results and the need for subsequent task-based evaluation of the schemes.
|
41 |
Reconstruction of Semantic Relationships from Their Projections in Biomolecular Domain
Juho Heimonen, Jari Björne and Tapio Salakoski
show abstracthide abstractThe extraction of nested, semantically rich relationships of biological entities has recently gained popularity in the biomedical text mining community. To move toward this objective, a method is proposed for reconstructing original semantic relationship graphs from projections, where each node and edge is mapped to the representative of its equivalence class, by determining the relationship argument combinations that represent real relationships. It generalises the limited postprocessing step of the best-performing system in the BioNLP’09 Shared Task on Event Extraction and hence extends this extraction method to arbitrarily deep relationships with unrestricted primary argument combinations. The viability of the method is shown by successfully extracting nested relationships in BioInfer and the corpus of the BioNLP’09 Shared Task on Event Extraction. The reported results, to the best of our knowledge, are the first for the nested relationships in BioInfer on a task in which only named entities are given.
|
42 |
Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts in Health-Related Social Networks
Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian Yang and Graciela Gonzalez
Adverse reactions to drugs are among the most common causes of death in industrialized nations. Expensive clinical trials are not sufficient to uncover all of the adverse reactions a drug may cause, necessitating systems for post-marketing surveillance, or pharmacovigilance. These systems have typically relied on voluntary reporting by health care professionals. However, self-reported patient data has become an increasingly important resource, with efforts such as MedWatch from the FDA allowing reports directly from the consumer. In this paper, we propose mining the relationships between drugs and adverse reactions as reported by the patients themselves in user comments to health-related websites. We evaluate our system on a manually annotated set of user comments, with promising performance. We also report correlations between the frequency of adverse drug reactions found by our system in unlabeled data and the frequency of documented adverse drug reactions. We conclude that user comments pose a significant natural language processing challenge, but do contain useful extractable information which merits further exploration.
|
43 |
Semantic Role Labeling of Gene Regulation Events: Preliminary Results
Roser Morante
This abstract describes work in progress on semantic role labeling of gene regulation events. We present preliminary results of a supervised semantic role labeler that has been trained and tested on the GREC corpus.
|
44 |
Ontology-Based Extraction and Summarization of Protein Mutation Impact Information
Nona Naderi and René Witte
Considerable effort has been expended in the study of modifications in genetic material, known as mutations. Mutations can have far-ranging consequences in medical, agricultural, and industrial domains; a significant and increasing number of publications describe the impacts of specific mutations. As manually curated databases, like the Protein Mutant Database (PMD), cannot keep up with the rapid pace of mutation research, NLP methods for extracting mutation information from the bibliome have become an important new research area within bio-NLP. A large number of systems now attempt to detect mutation information and extract it into structured formats. However, while significant progress has been made with respect to mutation detection, the automated extraction of the impacts of these mutations has so far not been targeted. In this paper, we describe the first work to automatically summarize impact information from protein mutations. Our approach is based on populating an OWL-DL ontology with impact information, which can then be queried to provide structured information, including a summary.
|
45 |
Extracting Distinctive Features of Swine (H1N1) Flu through Data Mining Clinical Documents
Heekyong Park and Jinwook Choi
Early recognition of the distinguishing patterns of a novel pandemic disease is important. We introduce a methodological approach based on popular data mining techniques to extract key features and temporal patterns of swine (H1N1) flu that discriminate it from swine-flu-like symptoms.
|
46 |
Towards Event Extraction from Full Texts on Infectious Diseases
Sampo Pyysalo, Tomoko Ohta, Han-Cheol Cho, Dan Sullivan, Chunhong Mao, Bruno Sobral, Jun’ichi Tsujii and Sophia Ananiadou
Event extraction approaches based on expressive structured representations of extracted information have been a significant focus of research in recent biomedical natural language processing studies. However, event extraction efforts have so far been limited to publication abstracts, with most studies further considering only the specific transcription-factor-related subdomain of molecular biology covered by the GENIA corpus. To establish the broader relevance of the event extraction approach and the proposed methods, it is necessary to move beyond these constraints. In this study, we propose an adaptation of the event extraction approach to a subdomain related to infectious diseases and present an analysis and initial experiments on the feasibility of event extraction from full-text publications in the domain.
|
47 |
Applying the TARSQI Toolkit to Augment Text Mining of EHRs
Amber Stubbs and Benjamin Harshfield
We present a preliminary attempt to apply the TARSQI Toolkit to the medical domain, specifically electronic health records, for use in answering temporally motivated questions.
|
48 |
Integration of Static Relations to Enhance Event Extraction from Text
Sofie Van Landeghem, Sampo Pyysalo, Tomoko Ohta and Yves Van de Peer
As research on biomedical text mining shifts focus from simple binary relations to more expressive event representations, extraction performance drops due to the increase in complexity. Recently introduced data sets specifically targeting static relations between named entities and domain terms have been suggested to enable a better representation of the biological processes underlying annotated events and to offer opportunities for addressing their complexity. In this paper, we present the first study of integrating these static relations with event data with the aim of enhancing event extraction performance. While we obtain promising results, we argue that an event extraction framework will benefit most from this new data when intrinsic differences between the various event types are taken into account.
|
|
Friday, July 16, 2010 |
09:00–09:10
|
Welcome to TextGraphs 5
|
09:10–10:30
|
Session 1: Lexical Clustering and Disambiguation
09:10–09:30 |
Graph-based Clustering for Computational Linguistics: a Survey
Zheng Chen and Heng Ji
In this survey we overview graph-based clustering and its applications in computational linguistics. We summarize graph-based clustering as a five-part story: hypothesis, modeling, measure, algorithm and evaluation. We then survey three typical NLP problems in which graph-based clustering approaches have been successfully applied. Finally, we comment on the strengths and weaknesses of graph-based clustering and envision that graph-based clustering is a promising solution for some emerging NLP problems.
|
09:30–09:50 |
Towards the Automatic Creation of a Wordnet from a Term-based Lexical Network
Hugo Gonçalo Oliveira and Paulo Gomes
The work described here aims to create a wordnet automatically from a semantic network based on terms. To this end, a clustering procedure is run over a synonymy network in order to obtain synsets. Then, the term arguments of each relational triple are assigned to these synsets, yielding a wordnet. Experiments towards our goal are reported and their results validated.
|
09:50–10:10 |
An Investigation on the Influence of Frequency on the Lexical Organization of Verbs
Daniel German, Aline Villavicencio and Maity Siqueira
This work extends the study of Germann et al. (2010) investigating the lexical organization of verbs. In particular, we look at the influence of frequency on the process of lexical acquisition and use. We examine data obtained from psycholinguistic action-naming tasks performed by children and adults (speakers of Brazilian Portuguese), and analyze some characteristics of the verbs used by each group in terms of similarity of content, using Jaccard's coefficient, and of topology, using graph theory. The experiments suggest that younger children tend to use more frequent verbs than adults to describe events in the world.
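To make the similarity measure concrete, the sketch below computes Jaccard's coefficient over two sets of verbs. The verb sets are invented for illustration and do not come from the paper's action-naming data; only the formula |A ∩ B| / |A ∪ B| is assumed.

def jaccard(a, b):
    # Jaccard's coefficient: size of the intersection over size of the union.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical verbs produced by two groups for the same action-naming item.
children = {"pegar", "tirar", "puxar"}
adults = {"pegar", "retirar", "puxar", "arrancar"}
print(jaccard(children, adults))  # 2 shared verbs / 5 distinct verbs = 0.4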
|
10:10–10:30 |
Robust and Efficient Page Rank for Word Sense Disambiguation
Diego De Cao, Roberto Basili, Matteo Luciani, Francesco Mesiano and Riccardo Rossi
Graph-based methods that are en vogue in the social network analysis area, such as centrality models, have recently been applied to linguistic knowledge bases, including for unsupervised Word Sense Disambiguation. Although the achievable accuracy is rather high, the main drawback of these methods is their high computational demand when applied to large-scale sense repositories. In this paper, an adaptation of the PageRank algorithm recently proposed for Word Sense Disambiguation is presented that preserves the achievable accuracy while significantly reducing the required processing time. Experimental analysis over well-known benchmarks is presented, and the results confirm our hypothesis.
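As a rough illustration of PageRank-based WSD in general (not the authors' specific adaptation), the sketch below runs personalized PageRank over a tiny hand-made sense graph with networkx; the graph, sense labels and context words are invented for illustration.

import networkx as nx

# Toy sense graph; a real system would use a large sense repository such as WordNet.
G = nx.Graph([
    ("bank#finance", "money#n"), ("bank#finance", "deposit#n"),
    ("deposit#n", "money#n"),
    ("bank#river", "water#n"), ("bank#river", "shore#n"),
])

# Teleport probability mass concentrated on the senses of the context words.
personalization = {n: 0.0 for n in G}
personalization["money#n"] = 1.0
personalization["deposit#n"] = 1.0

scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
best = max((s for s in scores if s.startswith("bank#")), key=scores.get)
print(best)  # the financial context pulls the ranking towards bank#finance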
|
|
10:30–11:00
|
Coffee Break
|
11:00–11:40
|
Session 2: Clustering Languages and Dialects
11:00–11:20 |
Hierarchical Spectral Partitioning of Bipartite Graphs to Cluster Dialects and Identify Distinguishing Features
Martijn Wieling and John Nerbonne
In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectology. Besides showing that the results of the hierarchical clustering improve over the flat spectral clustering method used in an earlier study (Wieling and Nerbonne, 2009), the values of the second singular vector used to generate the two-way clustering can be used to identify the most important sound correspondences for each cluster. This is an important advantage of the hierarchical method as it obviates the need for external methods to determine the most important sound correspondences for a geographical cluster.
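The core two-way split can be sketched as follows: normalize the variety-by-correspondence count matrix, take its SVD, and split rows and columns by the sign of the second singular vectors, whose values also indicate how strongly each item is associated with its cluster. The counts below are invented, and a hierarchical version would simply reapply this step within each resulting cluster; this is a sketch, not the authors' exact procedure.

import numpy as np

# Hypothetical counts: rows are dialect varieties, columns are sound correspondences.
A = np.array([
    [5., 4., 0., 1.],
    [4., 5., 1., 0.],
    [0., 1., 6., 5.],
    [1., 0., 5., 6.],
])

# Degree-normalize both sides before the SVD.
An = A / np.sqrt(A.sum(axis=1))[:, None] / np.sqrt(A.sum(axis=0))[None, :]
U, s, Vt = np.linalg.svd(An)

varieties_in_cluster_1 = U[:, 1] >= 0       # two-way split of the varieties
correspondences_in_cluster_1 = Vt[1] >= 0   # and of the sound correspondences
print(varieties_in_cluster_1, correspondences_in_cluster_1)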
|
11:20–11:40 |
A Character-Based Intersection Graph Approach to Linguistic Phylogeny
Jessica Enright
Linguists use phylogenetic methods to build evolutionary trees of languages given lexical, phonological, and morphological data. Perfect phylogeny is too restrictive to explain most data sets. Conservative Dollo phylogeny is more permissive, and has been used in biological applications. We propose the use of conservative Dollo phylogeny as an alternative or complementary approach for linguistic phylogenetics. We test this approach on an Indo-European dataset.
|
|
11:40–12:40
|
Invited Talk
11:40–12:40 |
Spectral Approaches to Learning in the Graph Domain
Edwin Hancock
This talk will commence by discussing some of the problems that arise if machine learning is attempted with graphs. Based on this discussion, the talk will define a taxonomy of different methods organised around a) clustering, b) characterisation, and c) constructing generative models in the graph domain. With this taxonomy to hand, I will describe a number of graph-spectral algorithms that can be applied to solve these different problems. The talk will be furnished with examples from computer vision.
|
|
12:50–13:50
|
Lunch break
|
13:50–15:30
|
Session 3: Lexical Similarity and Its Application
13:50–14:10 |
Cross-lingual Comparison between Distributionally Determined Word Similarity Networks
Olof Görnerup and Jussi Karlgren
As an initial effort to identify universal and language-specific factors that influence the behavior of distributional models, we have formulated a distributionally determined word similarity network model, implemented it for eleven different languages, and compared the resulting networks. In the model, vertices constitute words and two words are linked if they occur in similar contexts. The model is found to capture clear isomorphisms across languages in terms of syntactic and semantic classes, as well as functional categories of abstract discourse markers. Language specific morphology is found to be a dominating factor for the accuracy of the model.
|
14:10–14:30 |
Co-occurrence Cluster Features for Lexical Substitutions in Context
Chris Biemann
This paper examines the influence of features based on clusters of co-occurrences for supervised Word Sense Disambiguation and Lexical Substitution. Co-occurrence cluster features are derived from clustering the local neighborhood of a target word in a co-occurrence graph based on a corpus in a completely unsupervised fashion. Clusters can be assigned in context and are used as features in a supervised WSD system. Experiments fitting a strong baseline system with these additional features are conducted on two datasets, showing improvements. Co-occurrence features are a simple way to mimic Topic Signatures (Martínez et al., 2008) without needing to construct resources manually. Further, a system is described that produces lexical substitutions in context with very high precision.
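The clustering step might look roughly like the sketch below, which builds the co-occurrence neighborhood of a target word and partitions it with a simple Chinese-Whispers-style label propagation; the algorithm choice, edge weights and words are illustrative assumptions, not the paper's exact setup. The resulting cluster identifiers are the kind of information that would be fed to the supervised system as features.

import random
import networkx as nx

def label_propagation(G, iterations=20, seed=0):
    # Chinese-Whispers-style clustering: every node repeatedly adopts the label
    # that is strongest (by total edge weight) among its neighbours.
    rng = random.Random(seed)
    labels = {n: i for i, n in enumerate(G)}
    for _ in range(iterations):
        nodes = list(G)
        rng.shuffle(nodes)
        for n in nodes:
            strength = {}
            for nb in G[n]:
                w = G[n][nb].get("weight", 1.0)
                strength[labels[nb]] = strength.get(labels[nb], 0.0) + w
            if strength:
                labels[n] = max(strength, key=strength.get)
    return labels

# Hypothetical co-occurrence neighbourhood of the target word "bass".
G = nx.Graph()
G.add_weighted_edges_from([
    ("guitar", "amplifier", 3), ("guitar", "player", 2), ("amplifier", "player", 1),
    ("fish", "lake", 3), ("fish", "angler", 2), ("lake", "angler", 1),
])
print(label_propagation(G))  # two clusters, one per usage of "bass"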
|
14:30–14:50 |
Contextually-Mediated Semantic Similarity Graphs for Topic Segmentation
Geetu Ambwani and Anthony Davis
We present a representation of documents as directed, weighted graphs, modeling the range of influence of terms within the document as well as contextually determined semantic relatedness among terms. We then show the usefulness of this kind of representation in topic segmentation. Our boundary detection algorithm uses this graph to determine topical coherence and potential topic shifts, and does not require labeled data or training of parameters. We show that this method yields improved results on both concatenated pseudo-documents and on closed-captions for television programs.
|
14:50–15:10 |
MuLLinG: MultiLevel Linguistic Graphs for Knowledge Extraction
Vincent Archer
MuLLinG is a model for knowledge extraction (especially lexical extraction from corpora) based on multilevel graphs. Its aim is to allow large-scale data acquisition, by making the process easy to run automatically and simple for linguists with limited programming knowledge to configure. In MuLLinG, each new level represents the information in a different, increasingly abstract manner. We also introduce several associated operators, written to be as generic as possible. They are independent of what nodes and edges represent, and of the task to achieve. Consequently, they allow a complex extraction process to be described as a succession of simple graph manipulations. Finally, we present an experiment on collocation extraction using the MuLLinG model.
|
15:10–15:30 |
Experiments with CST-based Multidocument Summarization
Maria Lucia Castro Jorge and Thiago Pardo
With the huge and growing amount of information on the web and the little time available to read and process it all, automatic summaries have become very important resources. In this work, we evaluate deep content selection methods for multidocument summarization based on the CST model (Cross-document Structure Theory). Our methods consider summarization preferences and focus on the main problems of multidocument treatment: redundancy, complementarity, and contradiction among different information sources. We also evaluate the impact of the CST model on superficial summarization systems. Our results show that the use of the CST model helps to improve informativeness and quality in automatic summaries.
|
|
15:30–16:00
|
Coffee Break
|
16:00–17:00
|
Special Session on Opinion Mining
16:00–16:20 |
Distinguishing between Positive and Negative Opinions with Complex Network Features
Diego Raphael Amancio, Renato Fabbri, Osvaldo Novais Oliveira Jr., Maria das Graças Volpe Nunes and Luciano da Fontoura Costa
Topological and dynamic features of complex networks have proven suitable for capturing text characteristics in recent years, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinction was possible by obtaining several metrics of the networks, including the in-degree, out-degree, shortest paths, clustering coefficient, betweenness and global efficiency. For visualization, the obtained multidimensional dataset was projected into a 2-dimensional space with canonical variable analysis. The distinction was quantified using machine learning algorithms, which allowed a recall of 70% in the automatic discrimination of negative opinions, even without attempts to optimize the pattern recognition process.
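A minimal version of such a feature extractor is sketched below: a text is turned into a directed word-adjacency network and a handful of the metrics listed above are computed with networkx. The graph construction and the example sentence are assumptions for illustration; the paper's actual network model and feature set may differ.

import networkx as nx

def text_to_network(tokens):
    # Directed word-adjacency network: an edge links each token to the next one.
    G = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):
        G.add_edge(a, b)
    return G

def network_features(G):
    # Averages of a few of the metrics mentioned in the abstract.
    n = G.number_of_nodes()
    und = G.to_undirected()
    return {
        "avg_in_degree": sum(d for _, d in G.in_degree()) / n,
        "avg_out_degree": sum(d for _, d in G.out_degree()) / n,
        "avg_clustering": nx.average_clustering(und),
        "avg_betweenness": sum(nx.betweenness_centrality(G).values()) / n,
        "global_efficiency": nx.global_efficiency(und),
    }

features = network_features(text_to_network("the plot is weak and the acting is worse".split()))
print(features)  # one feature vector per text, ready for a classifier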
|
16:20–16:40 |
Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis
Pamela Zontone, Giulia Boato, Jonathon Hare, Paul Lewis, Stefan Siersdorfer and Enrico Minack
We present a brief overview of the way in which image analysis, coupled with associated collateral text, is being used for auto-annotation and sentiment analysis. In particular, we describe our approach to auto-annotation using the graph-theoretic dominant set clustering algorithm and the annotation of images with sentiment scores from SentiWordNet. Preliminary results are given for both, and our planned work aims to explore synergies between the two approaches.
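Dominant-set clustering is typically driven by replicator dynamics on a pairwise affinity matrix; the sketch below shows that core step on invented affinities (a real system would compute the affinities from image features). The support of the converged vector is one dominant set, i.e. one cluster; this is a generic sketch, not the authors' implementation.

import numpy as np

def dominant_set(A, iterations=200, tol=1e-8):
    # Replicator dynamics: x_i <- x_i * (A x)_i / (x^T A x), iterated to convergence.
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iterations):
        x_new = x * (A @ x)
        total = x_new.sum()
        if total == 0:
            break
        x_new /= total
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Hypothetical pairwise affinities between four image regions (zero diagonal).
A = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
])
x = dominant_set(A)
print(np.flatnonzero(x > 1e-3))  # regions 0, 1 and 2 form the dominant cluster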
|
16:40–17:00 |
Aggregating Opinions: Explorations into Graphs and Media Content Analysis
Gabriele Tatzl and Christoph Waldhauser
Understanding, as opposed to reading, is vital for the extraction of opinions from a text. This is especially true, as an author’s opinion is not always clearly marked. Finding the overall opinion in a text can be challenging to human readers and computers alike. Media Content Analysis is a popular method of extracting information from a text by means of human coders. We describe the difficulties humans face and the process they use to extract opinions, and offer a formalization that could help to automate opinion extraction within the Media Content Analysis framework.
|
|
17:00–17:40
|
Session 5: Spectral Approaches
17:00–17:20 |
Eliminating Redundancy by Spectral Relaxation for Multi-Document Summarization
Fumiyo Fukumoto, Akina Sakai and Yoshimi Suzuki
This paper focuses on redundancy, i.e., overlapping information across multiple documents, and presents a method for detecting salient, key sentences from documents that discuss the same event. To eliminate redundancy, we used spectral clustering to classify each sentence into groups, each of which consists of semantically related sentences. Then, we applied link analysis, the Markov Random Walk (MRW) model, to decide the importance of a sentence within the documents. The method was tested on the NTCIR evaluation data, and the results show the effectiveness of the method.
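The pipeline can be sketched roughly as follows: sentences are clustered spectrally on a similarity matrix, a PageRank-style random walk (standing in here for the MRW model) scores their importance, and one representative per cluster is kept. The sentences, the similarity measure (tf-idf cosine) and the number of clusters are all illustrative assumptions rather than the paper's configuration.

import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sentences standing in for several documents about the same event.
sentences = [
    "An earthquake struck the coastal city early on Monday.",
    "The quake hit the coast in the early hours of Monday.",
    "Rescue teams were dispatched to the affected areas.",
    "Emergency crews arrived to help the affected regions.",
]

sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
np.fill_diagonal(sim, 0.0)

# Spectral clustering groups near-paraphrases, so redundant sentences share a cluster.
labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(sim)

# A random-walk score over the similarity graph approximates sentence importance.
scores = nx.pagerank(nx.from_numpy_array(sim))

for c in set(labels):
    best = max((i for i in range(len(sentences)) if labels[i] == c), key=scores.get)
    print(sentences[best])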
|
17:20–17:40 |
Computing Word Senses by Semantic Mirroring and Spectral Graph Partitioning
Martin Fagerlund, Magnus Merkel, Lars Eldén and Lars Ahrenberg
Using the technique of "semantic mirroring", a graph is obtained that represents words and their translations from a parallel corpus or a bilingual lexicon. The connectedness of the graph holds information about the different meanings of the words that occur in the translations. Spectral graph theory is used to partition the graph, which leads to a grouping of the words according to different senses. We also report results from an evaluation using a small sample of seed words from a lexicon of Swedish and English adjectives.
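For intuition, the sketch below partitions a tiny hand-made translation graph by the sign of the Fiedler vector (the Laplacian eigenvector for the second-smallest eigenvalue). The Swedish-English word pairs are invented, and the paper derives its graph via semantic mirroring from a real lexicon and may partition it differently; this is only a generic spectral-partitioning illustration.

import numpy as np
import networkx as nx

# Hypothetical translation graph: words linked to their translations.
G = nx.Graph([
    ("frisk", "healthy"), ("frisk", "fresh"),
    ("sund", "healthy"), ("sund", "sound"),
    ("fräsch", "fresh"), ("ny", "fresh"),
])
nodes = list(G)

# Fiedler vector of the graph Laplacian; its sign pattern splits the graph
# into two groups corresponding to different senses.
L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
sense_a = [n for n, v in zip(nodes, fiedler) if v >= 0]
sense_b = [n for n, v in zip(nodes, fiedler) if v < 0]
print(sense_a)
print(sense_b)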
|
|
17:40–18:00
|
Final Wrap-up
|