TITLE

Multilingual Subjectivity and Sentiment Analysis

PRESENTERS

Rada Mihalcea, Carmen Banea and Janyce Wiebe

ABSTRACT
Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations, in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds a further level of granularity by classifying subjective text as positive, negative, or neutral. Much of the research in this area has been applied to English, but work on other languages is growing, including Japanese, Chinese, German, Spanish, and Romanian. Although most researchers in the field are familiar with the methods applied to English, few have looked closely at the original research carried out in other languages. For example, researchers working on Chinese have examined the ability of characters to carry sentiment information. In Romanian, experiments have hinted that subjectivity detection may be easier to achieve because of markers of politeness and additional verbal modes embedded in the language. These additional sources of information may not be available in all languages, yet various articles have pointed out that a synergistic approach that detects subjectivity and sentiment in multiple languages at the same time can yield improvements not only in other languages but in English as well. Development of and interest in these methods are also strongly motivated by the fact that only 27% of Internet users speak English (www.internetworldstats.com/stats.htm, Oct 11, 2011), and that share diminishes further every year as more people across the globe gain Internet access.

The aim of this tutorial is to familiarize attendees with the subjectivity and sentiment research carried out on languages other than English, in order to enable and promote cross-fertilization. Specifically, we will review work along three main directions. First, we will present methods where the resources and tools have been developed specifically for a given target language. In this category, we will also briefly overview the main methods that have been proposed for English but can be easily ported to other languages. Second, we will describe cross-lingual approaches, including several methods that leverage the resources and tools available in English by using cross-lingual projections. Third, we will show how the expression of opinions and polarity crosses language boundaries, and thus how methods that holistically explore multiple languages at the same time can be effective.

OUTLINE
  I. Sentiment and subjectivity analysis
    • Definitions
    • Examples
    • Applications

  II. Sentiment and subjectivity analysis on English
    • Lexicons: words and phrases; methods to build lexicons
    • Corpora: manually annotated corpora; online corpora
    • Tools: rule-based and statistical approaches

  SENTIMENT AND SUBJECTIVITY ANALYSIS ON OTHER LANGUAGES:

  III. Word- and phrase-level annotations
    • Dictionary-based methods: manual and automatic annotations
    • Corpus-based methods: inferring word and phrase polarity from corpora
    • Hybrid

  IV. Sentence-level annotations
    • Dictionary-based: rule-based systems and bootstrapping
    • Corpus-based: cross-lingual projections

  V. Document-level annotations
    • Dictionary-based: rule-based systems and bootstrapping
    • Corpus-based: collections of online reviews, multilingual co-training

  VI. What works, what doesn't
    • An overview of the main methods with evaluations and comparative analyses of the main benefits and challenges

PRESENTER BIOS
• RADA MIHALCEA is an Associate Professor in the Department of Computer Science and Engineering at the University of North Texas. Her research interests are in computational linguistics, with a focus on lexical semantics, graph-based algorithms for natural language processing, and multilingual natural language processing. She is currently involved in a number of research projects, including subjectivity, sentiment, and emotion analysis, word sense disambiguation, and monolingual and cross-lingual semantic similarity. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009). Together with collaborators, she has presented several tutorials, at ACL (2005), AAAI (2005), RANLP (2005), EUROLAN (2005), NAACL (2006), ESSLLI (2006), EUROLAN (2007), and IJCNLP (2008).

• CARMEN BANEA is a doctoral student in the Department of Computer Science at the University of North Texas, working on research in the field of Natural Language Processing. Her research focuses primarily on multilingual approaches to subjectivity and sentiment analysis, where she has developed both dictionary-based and corpus-based methods that leverage languages with rich resources to create tools and data in other languages. She received her Master's degree in Computer Science from the University of North Texas in 2009. She has published eight research papers in major Natural Language Processing conferences (e.g., ACL, EMNLP, LREC), including a paper that received the IEEE best student paper award, and co-authored a chapter on multilingual sentiment and subjectivity in the book "Multilingual Natural Language Applications: From Theory to Practice" (Prentice Hall, 2010). She was one of the organizers of the University of North Texas site of the North American Computational Linguistics Olympiad.

• JANYCE WIEBE is Professor of Computer Science and Director of the Intelligent Systems Program at the University of Pittsburgh. Her research with students and colleagues has been in discourse processing, pragmatics, and word-sense disambiguation. A major concentration of her research is "subjectivity analysis", recognizing and interpreting expressions of opinions and sentiments in text, to support NLP applications such as question answering, information extraction, text categorization, and summarization. Her professional roles have included ACL Program Co-Chair, NAACL Program Chair, NAACL Executive Board member, Computational Linguistics and Language Resources and Evaluation Editorial Board member, AAAI Workshop Co-Chair, ACM Special Interest Group on Artificial Intelligence (SIGART) Vice-Chair, and ACM-SIGART/AAAI Doctoral Consortium Chair.