|
TITLE |
Multilingual Subjectivity and Sentiment Analysis
|
PRESENTERS |
Rada Mihalcea, Carmen Banea and Janyce Wiebe |
ABSTRACT |
Subjectivity and sentiment analysis focuses on the automatic
identification of private states, such as opinions, emotions,
sentiments, evaluations, beliefs, and speculations in natural
language. While subjectivity classification labels text as either
subjective or objective, sentiment classification adds a further
level of granularity by classifying subjective text as
positive, negative, or neutral.
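The two-level labeling scheme described above can be illustrated with a
minimal sketch. The toy lexicon, the function names, and the simple
word-counting rule are all assumptions made for illustration; they are
not a method from the tutorial, and real systems use much richer
lexicons and classifiers.

```python
from typing import Optional

# Hypothetical toy lexicon (illustration only): subjective clue -> polarity.
# "neutral" marks words that signal a private state without a polarity.
TOY_LEXICON = {
    "love": "positive",
    "great": "positive",
    "hate": "negative",
    "awful": "negative",
    "think": "neutral",
    "believe": "neutral",
}


def tokenize(sentence: str) -> list:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [tok.strip(".,!?") for tok in sentence.lower().split()]


def classify_subjectivity(sentence: str) -> str:
    """Level 1: subjective if any token is a known subjective clue."""
    has_clue = any(tok in TOY_LEXICON for tok in tokenize(sentence))
    return "subjective" if has_clue else "objective"


def classify_sentiment(sentence: str) -> Optional[str]:
    """Level 2: polarity for subjective text only; None for objective text."""
    if classify_subjectivity(sentence) == "objective":
        return None
    tokens = tokenize(sentence)
    positives = sum(TOY_LEXICON.get(tok) == "positive" for tok in tokens)
    negatives = sum(TOY_LEXICON.get(tok) == "negative" for tok in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"
```

In this sketch an objective sentence receives no polarity at all, which
mirrors the idea that sentiment labels refine the subjective class
rather than replace the subjective/objective distinction.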
Although much of the research in this area has been applied to
English, research on other languages is growing, including Japanese,
Chinese, German, Spanish, and Romanian. Most researchers in
the field are familiar with the methods applied to English, but few of
them have looked closely at the original research carried out in other
languages. For example, in languages such as Chinese, researchers have
been looking at the ability of characters to carry sentiment
information. In Romanian, due to markers of politeness and additional
verbal modes embedded in the language, experiments have hinted that
subjectivity detection may be easier to achieve. These additional
sources of information may not be available in all languages, yet
various articles have pointed out that a synergistic
approach to detecting subjectivity and sentiment in multiple
languages at the same time can yield improvements not only in
other languages, but in English as well. The development of, and interest
in, these methods is also strongly motivated by the fact that only 27% of
Internet users speak English (www.internetworldstats.com/stats.htm,
Oct 11, 2011), and that number diminishes further every year, as more
people across the globe gain Internet access.
The aim of this tutorial is to familiarize the attendees with the
subjectivity and sentiment research carried out on languages other
than English in order to enable and promote
cross-fertilization. Specifically, we will review work along three
main directions. First, we will present methods where the resources
and tools have been specifically developed for a given target
language. In this category, we will also briefly overview the main
methods that have been proposed for English, but which can be easily
ported to other languages. Second, we will describe cross-lingual
approaches, including several methods that have been proposed to
leverage the resources and tools available in English by using
cross-lingual projections. Third, we will show how the
expression of opinions and polarity cuts across language boundaries, so
that methods that holistically explore multiple languages at the same
time can be effective.
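As a rough illustration of the second direction, one simple form of
cross-lingual projection is to carry English lexicon labels over to a
target language through a bilingual dictionary. The sketch below is
only an assumption about how such a projection could look; the function
name, the tiny English lexicon, and the English-Romanian dictionary are
hypothetical examples, not resources from the tutorial.

```python
def project_lexicon(source_lexicon: dict, bilingual_dict: dict) -> dict:
    """Project word-level labels from a source-language lexicon onto
    target-language words via a bilingual dictionary.

    source_lexicon: source word -> label (e.g. "positive")
    bilingual_dict: source word -> list of target-language translations
    Source words without a dictionary entry are skipped. If a target
    word is reached from several source words, the first label wins
    (a simplifying assumption of this sketch).
    """
    projected = {}
    for word, label in source_lexicon.items():
        for translation in bilingual_dict.get(word, []):
            projected.setdefault(translation, label)
    return projected


# Hypothetical toy resources, for illustration only.
english_lexicon = {"good": "positive", "beautiful": "positive", "bad": "negative"}
english_to_romanian = {"good": ["bun"], "bad": ["rău"], "happy": ["fericit"]}

romanian_lexicon = project_lexicon(english_lexicon, english_to_romanian)
```

Here "beautiful" is dropped because the toy dictionary has no entry for
it, and "fericit" is not added because "happy" carries no label in the
toy lexicon. Gaps and ambiguity of this kind are among the practical
challenges that projection-based methods have to handle.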
|
OUTLINE |
I. Sentiment and subjectivity analysis
• Definitions
• Examples
• Applications
II. Sentiment and subjectivity analysis on English
• Lexicons: words and phrases; methods to build lexicons
• Corpora: manually annotated corpora; online corpora
• Tools: rule-based and statistical approaches
SENTIMENT AND SUBJECTIVITY ANALYSIS ON OTHER LANGUAGES:
III. Word- and phrase-level annotations
• Dictionary-based methods: manual and automatic annotations
• Corpus-based methods: inferring word and phrase polarity from corpora
• Hybrid
IV. Sentence-level annotations
• Dictionary-based: rule-based systems and bootstrapping
• Corpus-based: cross-lingual projections
V. Document-level annotations
• Dictionary-based: rule-based systems and bootstrapping
• Corpus-based: collections of online reviews, multilingual co-training
VI. What works, what doesn't
• An overview of the main methods with evaluations and
comparative analyses of the main benefits and challenges
|
PRESENTER BIOS |
• RADA MIHALCEA is an Associate Professor in the Department of Computer
Science and Engineering at the University of North Texas. Her research
interests are in computational linguistics, with a focus on lexical
semantics, graph-based algorithms for natural language processing, and
multilingual natural language processing. She is currently involved in
a number of research projects, including subjectivity, sentiment, and
emotion analysis, word sense disambiguation, and monolingual and
cross-lingual semantic similarity. She serves or has served on the
editorial boards of the journals Computational Linguistics,
Language Resources and Evaluation, Natural Language Engineering,
Research on Language and Computation, IEEE Transactions on Affective
Computing, and Transactions of the Association for Computational
Linguistics. She is the recipient of a National Science Foundation
CAREER award (2008) and a Presidential Early Career Award for
Scientists and Engineers (2009). Together with collaborators, she
has presented several tutorials, at ACL (2005), AAAI (2005),
RANLP (2005), EUROLAN (2005), NAACL (2006), ESSLLI (2006), EUROLAN
(2007), and IJCNLP (2008).
• CARMEN BANEA is a doctoral student in the Department of Computer
Science at the University of North Texas, working on research in the
field of Natural Language Processing. Her research work focuses
primarily on multilingual approaches to subjectivity and sentiment
analysis, where she has developed both dictionary- and corpus-based methods
that leverage languages with rich resources to create tools and
data in other languages. She received her Master's degree in Computer
Science from the University of North Texas in 2009. She has published
eight research papers in major Natural Language Processing conferences
(e.g., ACL, EMNLP, LREC), including a paper that received the IEEE best
student paper award, and co-authored a chapter on multilingual
sentiment and subjectivity in the book entitled "Multilingual Natural
Language Applications: From Theory to Practice" (Prentice Hall,
2010). She was one of the organizers of the University of North Texas
site of the North American Computational Linguistics Olympiad.
• JANYCE WIEBE is Professor of Computer Science and Director of the
Intelligent Systems Program at the University of Pittsburgh. Her
research with students and colleagues has been in discourse
processing, pragmatics, and word-sense disambiguation. A major
concentration of her research is "subjectivity analysis", recognizing
and interpreting expressions of opinions and sentiments in text, to
support NLP applications such as question answering, information
extraction, text categorization, and summarization. Her professional
roles have included ACL Program Co-Chair, NAACL Program Chair, NAACL
Executive Board member, Computational Linguistics and Language
Resources and Evaluation Editorial Board member, AAAI Workshop
Co-Chair, ACM Special Interest Group on Artificial Intelligence
(SIGART) Vice-Chair, and ACM-SIGART/AAAI Doctoral Consortium Chair.
|
|
|