Over the last ten years, there has been an explosion of interest in sentiment analysis, with many interesting and impressive results. For example, the first twenty publications on Google Scholar returned for the query "sentiment analysis" all date from 2003 or later, and have a total citation count of 12,140. The total number of publications is in the thousands. Partly, this interest is driven by the immediate commercial applications of sentiment analysis.
Sentiment is a "private state" (Wiebe 1990). However, it is not the only private state that has received attention in the computational literature; others include belief and intention. In this tutorial, we propose to provide a deeper understanding of what a private state is. We will concentrate on sentiment and belief. Belief is very closely related to factuality, and also to notions such as veridicality, modality, and hedging. We will provide background that will allow the tutorial participants to understand the notion of a private state as a cognitive phenomenon, which can be manifested linguistically in various ways. We will explain the formalization in terms of a triple of state, source, and target. We will discuss how to model the source and the target. We will then explain in some detail the annotations that have been made. The issue of annotation is crucial for private states: while the MPQA corpus (Wiebe et al. 2005) has been around for some time, most research using it does not make use of many of its features. We believe this is because the MPQA annotation is quite complex and requires a deeper understanding of the phenomenon of "private state", which is what the annotation is getting at. Furthermore, several efforts are currently underway to create new versions of the annotations, which we will also present.
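The triple of state, source, and target mentioned above can be pictured as a small data structure. The following is a minimal sketch, not the format of the MPQA corpus or of any other annotated resource: the class name, the field names, and the example sentence are all hypothetical, and representing the source as a chain (to accommodate nested sources) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrivateState:
    """Illustrative (state, source, target) triple; all names are assumptions."""
    state: str                # kind of private state, e.g. "sentiment" or "belief"
    source: Tuple[str, ...]   # nested source chain, outermost source first
    target: str               # what the private state is about

    def holder(self) -> str:
        """Return the innermost source, i.e. the entity holding the state."""
        return self.source[-1]


# Hypothetical sentence: "Mary said that John hates spinach."
# The writer reports Mary's report of John's sentiment toward spinach.
example = PrivateState(
    state="sentiment",
    source=("writer", "Mary", "John"),
    target="spinach",
)
print(example.holder())
```

Keeping the source as an ordered chain rather than a single name makes it possible to record who attributes the state to whom, which a flat label would lose.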
The larger goal of this tutorial is to allow the tutorial participants to gain a deeper understanding of the role of private states in human communication, and to encourage them to use this deeper understanding in their computational work. The immediate goal of this tutorial is to allow the participants to make more complete use of available annotated resources. These include the MPQA corpus, the LU corpus (Diab et al. 2009), FactBank (Saurí and Pustejovsky 2009), and the corpora under development at the LDC, which include sentiment and belief. We propose to achieve these goals by concentrating on annotated corpora, since this will allow participants to understand both the underlying content (achieving the larger goal) and the technical details of the annotations (achieving the immediate goal).
Tutorial Contents
1. Introduction: an overview of the issue of private states, and how they relate to other well-known concepts such as the BDI (belief-desire-intention) model (Bratman 1987), related work in NLP (such as RST (Mann and Thompson 1987) and dialog act tagging), linguistic semantics (for example, the notion of veridicity (Karttunen 1971) and modality), and cognitive science. (45 minutes)
2. Representing sentiment: a presentation of early work, of MPQA V2 (with nested sources, and attitude, expressive-subjective element, and target span annotations), and of MPQA Version 3 (extension of MPQA V2 to eTargets). (45 minutes)
Break (15 minutes)
3. Representing belief: a presentation of FactBank, the LU corpus, and the ongoing LDC annotation under the DARPA DEFT program. (30 minutes)
4. Integration and looking forward: a discussion of how sentiment and belief interact, and how we can integrate their annotations, including a discussion of a General Modality Annotation Scheme. (45 minutes)
Tutorial Instructors
Owen Rambow is a Senior Research Scientist at the Center for Computational Learning Systems at Columbia University. He is also the co-chair of the Center for New Media at the Data Science Institute at Columbia University. He has been interested in modeling cognitive states in relation to language for a long time, initially in the context of natural language generation (Rambow 1993, Walker and Rambow 1994). More recently, he has studied belief in the context of recognizing beliefs in language (Diab et al. 2009, Prabhakaran et al. 2010, Danlos and Rambow 2011, Prabhakaran et al. 2012). He is currently leading the DARPA DEFT Belief group, working with other researchers and with the LDC to define annotation standards and evaluations. He was recently involved in the pilot evaluation for belief recognition (in English) in the DARPA DEFT program.
Janyce Wiebe is Professor of Computer Science and Professor and Co-Director of the Intelligent Systems Program at the University of Pittsburgh. She has worked on issues related to private states for some time, originally in the context of tracking point of view in narrative (Wiebe 1994), and later in the context of recognizing sentiment in other genres such as news articles (Wilson et al. 2005). She has approached the area from the perspective of corpus annotation (Wiebe et al. 2005, Deng et al. 2013), lexical semantics (Wiebe and Mihalcea 2006), and discourse (Somasundaran et al. 2009). In addition to continuing these lines of research, she has recently begun investigating implicatures in opinion analysis (Deng and Wiebe 2014).
http://people.cs.pitt.edu/~wiebe/