We introduce ConnotationWordNet, a connotation lexicon over the network of words in conjunction with senses. We formulate the lexicon induction problem as collective inference over pairwise-Markov Random Fields, and present a loopy belief propagation algorithm for inference. The key aspect of our method is that it is the first unified approach that assigns the polarity of both word- and sense-level connotations, exploiting the innate bipartite graph structure encoded in WordNet. We present a comprehensive evaluation to demonstrate the quality and utility of the resulting lexicon in comparison to existing connotation and sentiment lexicons.
We introduce ConnotationWordNet, a connotation lexicon over the network of words in conjunction with senses, as defined in WordNet. A connotation lexicon, first introduced by Feng et al. (2011), aims to encompass the subtle shades of sentiment a word may conjure, even for seemingly objective words such as “sculpture”, “Ph.D.”, and “rosettes”. Understanding the rich and complex layers of connotation remains a challenging task. As a starting point, we study the more feasible task of learning the polarity of connotation.
For non-polysemous words, which constitute a significant portion of the English vocabulary, learning the general connotation at the word level (rather than at the sense level) is a natural operational choice. However, for polysemous words, which include many of the most frequently used words, it would be an overly crude assumption that the same connotative polarity should be assigned to all senses of a given word. For example, consider “abound”, for which the lexicographers of WordNet prescribe two different senses:
(v) abound: (be abundant or plentiful; exist in large quantities)
(v) abound, burst, bristle: (be in a state of movement or action) “The room abounded with screaming children”; “The garden bristled with toddlers”
For the first sense, which is the most commonly used sense for “abound”, the general overtone of the connotation would seem positive. That is, although one can use this sense in both positive and negative contexts, this sense of “abound” seems to collocate more often with items that are good to be abundant (e.g., “resources”), than unfortunate items being abundant (e.g., “complaints”).
However, as for the second sense, for which “burst” and “bristle” can be used interchangeably (a sense in WordNet is defined by a synset, i.e., the set of words sharing that sense), the general overtone is slightly more negative with a touch of unpleasantness, or at least not as positive as that of the first sense. In particular, if we look up the WordNet entry for “bristle”, there are noticeably more negatively connotative words involved in its gloss and examples.
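To inspect such entries programmatically, the WordNet interface in NLTK can be used (a minimal sketch; it assumes the NLTK WordNet corpus has been downloaded):

```python
from nltk.corpus import wordnet as wn

# list all verb senses of "abound", with their synonym sets and glosses
for synset in wn.synsets("abound", pos=wn.VERB):
    print(synset.name(), "-", synset.lemma_names())
    print("  gloss:   ", synset.definition())
    print("  examples:", synset.examples())
```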
This word-sense issue has been a universal challenge for a range of Natural Language Processing applications, including sentiment analysis. Recent studies have shown that it is fruitful to tease out the subjectivity and objectivity corresponding to different senses of the same word in order to improve computational approaches to sentiment analysis (e.g., Pestian et al. (2012), Mihalcea et al. (2012), Balahur et al. (2014)). Encouraged by these recent successes, in this study we investigate whether we can attain similar gains by modeling the connotative polarity of senses separately.
There is one potential practical issue we would like to point out in building a sense-level lexical resource, however. End-users of such a lexicon may not wish to deal with Word Sense Disambiguation (WSD), which is often too noisy to be incorporated into the pipelines of other NLP tasks. As a result, researchers often need to aggregate labels across different senses to derive a word-level label. Although such aggregation is not entirely unreasonable, it does not seem to be the most principled way of integrating the available resources.
Therefore, in this work, we present the first unified approach that learns both sense- and word-level connotations simultaneously. This way, end-users will have access to more accurate sense-level connotation labels if needed, while also having access to more general word-level connotation labels. We formulate the lexicon induction problem as collective inference over pairwise-Markov Random Fields (pairwise-MRF) and derive a loopy belief propagation algorithm for inference.
The key aspect of our approach is that we exploit the innate bipartite graph structure between words and senses encoded in WordNet. Although our approach seems conceptually natural, previous approaches, to the best of our knowledge, have not directly exploited these relations between words and senses for the purpose of deriving lexical knowledge over words and senses collectively. In addition, previous studies (for both sentiment and connotation lexicons) aimed to produce only one of the two aspects of the polarity, word-level or sense-level, while we address both.
Another contribution of our work is the introduction of loopy belief propagation (loopy-BP) as a lexicon induction algorithm. Loopy-BP in our study achieves statistically significantly better performance than the constraint optimization approaches previously explored. In addition, it runs much faster and is considerably easier to implement. Last but not least, by using the probabilistic representation of pairwise-MRFs in conjunction with loopy-BP for inference, the resulting solution has a natural interpretation as the intensity of connotation. This contrasts with approaches that seek discrete solutions, such as Integer Linear Programming [21].
ConnotationWordNet, the final outcome of our study, is a new lexical resource with connotation labels over both words and senses, following the structure of WordNet. The lexicon is publicly available at http://www.cs.sunysb.edu/~junkang/connotation_wordnet.
In what follows, we first describe the network of words and senses (Section 2), then introduce the representation of the network structure as pairwise Markov Random Fields and a loopy belief propagation algorithm for collective inference (Section 3). We then present a comprehensive evaluation (Sections 4-6), followed by related work (Section 7) and our conclusions (Section 8).
The connotation graph, called $G$, is a heterogeneous graph with multiple types of nodes and edges. As shown in Figure 1, it contains two types of nodes: (i) lemmas (i.e., words, 115K) and (ii) synsets (63K); and four types of edges: ($t_1$) predicate-argument (179K), ($t_2$) argument-argument (144K), ($t_3$) argument-synset (126K), and ($t_4$) synset-synset (3.4K) edges.
The predicate-argument edges, first introduced by Feng et al. (2011), depict the selectional preference of connotative predicates (i.e., the polarity of a predicate indicates the polarity of its arguments) and encode their co-occurrence relations based on the Google Web 1T corpus. The argument-argument edges are based on the distributional similarities among the arguments. The argument-synset edges capture the synonymy between argument nodes through the corresponding synsets. Finally, the synset-synset edges depict the antonym relations between synset pairs.
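For illustration, such a heterogeneous graph can be assembled along these lines (a sketch using networkx; the node names and edges are hypothetical examples, not entries from the actual resource):

```python
import networkx as nx

G = nx.Graph()
# word (lemma) nodes and synset nodes
for w in ("enjoy", "prosperity", "wealth"):
    G.add_node(w, kind="word")
for s in ("enjoy.v.01", "suffer.v.01"):
    G.add_node(s, kind="synset")

G.add_edge("enjoy", "prosperity", etype="t1")        # predicate-argument (co-occurrence)
G.add_edge("prosperity", "wealth", etype="t2")       # argument-argument (distributional sim.)
G.add_edge("enjoy", "enjoy.v.01", etype="t3")        # argument-synset (membership)
G.add_edge("enjoy.v.01", "suffer.v.01", etype="t4")  # synset-synset (antonym)
```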
In general, our graph construction is similar to that of Feng et al. (2013), but there are a few important differences. Most notably, we model both words and synsets explicitly, and exploit the membership relations between words and senses. We expect that edges between words and senses will encourage senses that belong to the same word to receive the same connotation label. Conversely, we expect that these edges will also encourage words that belong to the same sense (i.e., synset definition) to receive the same connotation label.
Another benefit of our approach is that for various WordNet relations (e.g., antonym relations), which are defined over synsets (not over words), we can add edges directly between corresponding synsets, rather than projecting (i.e., approximating) those relations over words. Note that the latter, which has been employed by several previous studies (e.g., Kamps et al. (2004), Takamura et al. (2005), Andreevskaia and Bergler (2006), Su and Markert (2009), Lu et al. (2011), Kaji and Kitsuregawa (2007), Feng et al. (2013)), could be a source of noise, as one needs to assume that the semantic relation between a pair of synsets transfers over the pair of words corresponding to that pair of synsets. For polysemous words, this assumption may be overly strong.
We formulate the task of learning a sense- and word-level connotation lexicon as a graph-based classification task [26]. More formally, we denote the connotation graph by $G = (V, E)$, in which a total of $n$ word and synset nodes $V = \{v_1, \ldots, v_n\}$ are connected with typed edges $E = \{e(v_i, v_j, t)\}$, where the edge types $t \in \{t_1, t_2, t_3, t_4\}$ depict the four edge types described in Section 2. A neighborhood function $N$, where $N_i \subseteq V$ denotes the neighbors of node $i$, describes the underlying network structure.
In our collective classification formulation, each node in $V$ is represented as a random variable that takes a value from an appropriate class label domain; in our case, $\mathcal{L} = \{+, -\}$ for positive and negative connotation. In this classification task, we denote by $\mathcal{Y}$ the nodes whose labels need to be assigned, and let $y_i$ refer to $Y_i$'s label.
We next define our objective function. We propose to use an objective formulation that utilizes pairwise Markov Random Fields (MRFs) [14], which we adapt to our problem setting. MRFs are a class of probabilistic graphical models suited for solving inference problems over networked data. An MRF consists of an undirected graph where each node can be in any of a finite number of states (i.e., class labels). The state of a node is assumed to depend on each of its neighbors and to be independent of the other nodes in the graph; this assumption yields a pairwise Markov Random Field, a special case of general MRFs [37]. In pairwise MRFs, the joint probability of the graph can be written as a product of pairwise factors, parameterized over the edges. These factors are referred to as clique potentials in general MRFs; they are functions that collectively determine the graph's joint probability.
Specifically, let $G = (V, E)$ denote a network of random variables, where $V$ consists of the unobserved variables $\mathcal{Y}$ that need to be assigned values from the label set $\mathcal{L}$. Let $\Psi$ denote a set of clique potentials that consists of two types of factors:

For each $Y_i \in \mathcal{Y}$, $\phi_i \in \Psi$ is a prior mapping $\phi_i: \mathcal{L} \rightarrow \mathbb{R}_{\geq 0}$, where $\mathbb{R}_{\geq 0}$ denotes the non-negative real numbers.

For each edge $e(Y_i, Y_j, t) \in E$, $\psi_{ij}^{t} \in \Psi$ is a compatibility mapping $\psi_{ij}^{t}: \mathcal{L} \times \mathcal{L} \rightarrow \mathbb{R}_{\geq 0}$.
Given an assignment $y$ to all the unobserved variables $\mathcal{Y}$ and $x$ to the observed ones (variables with known labels, if any), our objective function is associated with the following joint probability distribution

$$P(y \mid x) = \frac{1}{Z(x)} \prod_{Y_i \in \mathcal{Y}} \phi_i(y_i) \prod_{e(Y_i, Y_j, t) \in E} \psi_{ij}^{t}(y_i, y_j) \qquad (1)$$

where $Z(x)$ is the normalization function. Our goal is then to infer the maximum likelihood assignment of states (i.e., labels) to the unobserved variables (i.e., nodes) that maximizes Equation (1).
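For concreteness, consider a toy instance of Equation (1) with one seed word $Y_1$ linked to a single synset $Y_2$ by a $t_3$ edge (illustrative numbers, not values from our experiments):

$$P(y_1, y_2) = \frac{1}{Z}\,\phi_1(y_1)\,\phi_2(y_2)\,\psi^{t_3}(y_1, y_2).$$

With $\phi_1 = (0.9, 0.1)$ (a positive seed), $\phi_2 = (0.5, 0.5)$ (unknown), and $\psi^{t_3}$ assigning $0.75$ to matching labels and $0.25$ otherwise, the assignment $(y_1, y_2) = (+, +)$ attains the largest unnormalized score, $0.9 \cdot 0.5 \cdot 0.75 = 0.3375$, so the synset inherits the seed's positive label.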
Having introduced our graph-based classification task and objective formulation, we define our problem more formally.
Given
a connotation graph $G = (V, E)$ of words and synsets connected with typed edges,
prior knowledge (i.e., probabilities) of (some or all) nodes belonging to each class,
compatibility of two nodes with a given pair of labels being connected to each other;
Classify the nodes $Y_i \in \mathcal{Y}$ into one of two classes, $\{+, -\}$, such that the class assignments maximize our objective in Equation (1).
We can further rank the network objects by the probability of their connotation polarity.
Finding the best assignments to unobserved variables in our objective function is the inference problem. The brute force approach through enumeration of all possible assignments is exponential and thus intractable. In general, exact inference is known to be NP-hard and there is no known algorithm which can be theoretically shown to solve the inference problem for general MRFs. Therefore in this work, we employ a computationally tractable (in fact linearly scalable with network size) approximate inference algorithm called Loopy Belief Propagation (LBP) [37], which we extend to handle typed graphs like our connotation graph.
Our inference algorithm is based on iterative message passing and the core of it can be concisely expressed as the following two equations:
$$m_{i \rightarrow j}(y_j) = \alpha \sum_{y_i \in \mathcal{L}} \psi_{ij}^{t}(y_i, y_j)\, \phi_i(y_i) \prod_{Y_k \in N_i \setminus Y_j} m_{k \rightarrow i}(y_i), \quad \forall y_j \in \mathcal{L} \qquad (2)$$

$$b_i(y_i) = \beta\, \phi_i(y_i) \prod_{Y_j \in N_i} m_{j \rightarrow i}(y_i), \quad \forall y_i \in \mathcal{L} \qquad (3)$$
A message $m_{i \rightarrow j}$ is sent from node $i$ to node $j$ and captures the belief of $i$ about $j$, which is the probability distribution over the labels of $j$; i.e., what $i$ “thinks” $j$'s label is, given the current label of $i$ and the type $t$ of the edge that connects $i$ and $j$. Beliefs refer to the marginal probability distributions of nodes over labels; for example, $b_i(y_i)$ denotes the belief of node $i$ having label $y_i$. $\alpha$ and $\beta$ are normalization constants, which respectively ensure that each message and each set of marginal probabilities sum to $1$. At every iteration, each node computes its belief based on the messages received from its neighbors, and uses the compatibility mappings to transform its belief into messages for its neighbors. The key idea is that after enough iterations of message passing between the nodes, these “conversations” are likely to come to a consensus, which determines the marginal probabilities of all the unknown variables.
Algorithm 3.2: Loopy belief propagation for connotation inference
Input: connotation graph $G = (V, E)$, prior potentials for the seed words, and compatibility potentials
Output: connotation label probabilities $b_i(\cdot)$ for each node $Y_i \in V$

    1:  foreach edge e(Y_i, Y_j, t) ∈ E do        // initialize messages
    2:      foreach y_j ∈ L do
    3:          m_{i→j}(y_j) ← 1
    4:  foreach node Y_i ∈ V do                   // initialize priors
    5:      foreach y_i ∈ L do
    6:          if Y_i is a seed word then
    7:              φ_i(y_i) ← prior belief
    8:          else
    9:              φ_i(y_i) ← 1/|L|
    10: repeat                                    // iterative message passing
    11:     foreach edge e(Y_i, Y_j, t) ∈ E do
    12:         foreach y_j ∈ L do
    13:             update m_{i→j}(y_j) using Equation (2)
    14: until all messages stop changing
    15: foreach node Y_i ∈ V do                   // compute final beliefs
    16:     foreach y_i ∈ L do
    17:         compute b_i(y_i) using Equation (3)
The pseudo-code of our method is given in Algorithm 3.2. It first initializes all messages to $1$ and all priors to unbiased (i.e., equal) probabilities, except for the seed nodes whose sentiment is known (lines 3-9). It then proceeds by making each node communicate messages with its neighbors in an iterative fashion until the messages stabilize (lines 10-14), i.e., convergence is reached. (Although convergence is not theoretically guaranteed, in practice LBP converges to beliefs within a small threshold of change fairly quickly, with accurate results [20, 16, 2].) At convergence, we calculate the marginal probabilities, that is, $b_i(y_i)$ of assigning $Y_i$ the label $y_i$, by computing the final beliefs (lines 15-17). We use these maximum likelihood probabilities for label assignment; for each node $Y_i$, we assign the label $\arg\max_{y_i \in \mathcal{L}} b_i(y_i)$.
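A compact Python sketch of this procedure follows (an illustration in our notation, not the released implementation; the node ids, edge types, and seed values in the usage example are hypothetical):

```python
import numpy as np

def loopy_bp(nodes, edges, psi, priors, max_iter=100, tol=1e-4):
    """Loopy belief propagation over a typed graph (Equations (2)-(3)).

    nodes  : list of node ids
    edges  : list of (i, j, t) undirected typed edges (at most one per node pair)
    psi    : dict mapping edge type t -> 2x2 compatibility matrix
    priors : dict mapping node -> length-2 prior phi_i (label order: +, -)
    """
    nbrs = {u: [] for u in nodes}
    msgs = {}
    for i, j, t in edges:                      # initialize all messages to 1
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = np.ones(2)
        msgs[(j, i)] = np.ones(2)

    for _ in range(max_iter):                  # iterative message passing
        delta = 0.0
        for i, j, t in edges:
            for src, dst in ((i, j), (j, i)):
                # phi_src times all incoming messages except the one from dst
                prod = priors[src].copy()
                for k in nbrs[src]:
                    if k != dst:
                        prod *= msgs[(k, src)]
                m = psi[t].T @ prod            # Equation (2): sum over y_src
                m /= m.sum()                   # normalization constant alpha
                delta = max(delta, np.abs(m - msgs[(src, dst)]).max())
                msgs[(src, dst)] = m
        if delta < tol:                        # messages stopped changing
            break

    beliefs = {}                               # final beliefs, Equation (3)
    for u in nodes:
        b = priors[u].copy()
        for k in nbrs[u]:
            b *= msgs[(k, u)]
        beliefs[u] = b / b.sum()               # normalization constant beta
    return beliefs

# toy usage: two seed words, two synsets, an antonym edge between the synsets
eps = 0.25
hom = np.array([[0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps]])  # homophily
ant = hom[::-1]                                # antonym: reversed compatibilities
beliefs = loopy_bp(
    nodes=["enjoy", "enjoy.v.01", "suffer.v.01", "suffer"],
    edges=[("enjoy", "enjoy.v.01", "t3"),
           ("suffer", "suffer.v.01", "t3"),
           ("enjoy.v.01", "suffer.v.01", "t4")],
    psi={"t3": hom, "t4": ant},                # symmetric matrices, so .T is safe
    priors={"enjoy": np.array([0.9, 0.1]),     # positive seed
            "suffer": np.array([0.1, 0.9]),    # negative seed
            "enjoy.v.01": np.array([0.5, 0.5]),
            "suffer.v.01": np.array([0.5, 0.5])})
# each belief is (P(+), P(-)); the argmax gives the connotation label
```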
To completely define our algorithm, we need to instantiate the potentials, in particular the priors $\phi$ and the compatibilities $\psi$, which we discuss next.
The prior beliefs of nodes can be suitably initialized if there is any prior knowledge of their connotation sentiment (e.g., enjoy is positive, suffer is negative). As such, our method is flexible enough to integrate available side information. In case there is no prior knowledge available, each node is initialized to be equally likely to have any of the possible labels, i.e., $\phi_i(y_i) = 1/|\mathcal{L}|$, as in Algorithm 3.2 (line 9).
The compatibility potentials can be thought of as $|\mathcal{L}| \times |\mathcal{L}|$ matrices, with entries $\psi^{t}(y_i, y_j)$ that give the likelihood of a node having label $y_i$, given that it has a neighbor with label $y_j$ to which it is connected through a type-$t$ edge. A key difference of our method from earlier models is that we use clique potentials that differ across edge types, since the connotation graph is heterogeneous. This is exactly because the compatibility of the class labels of two adjacent nodes depends on the type of the edge connecting them: e.g., a $(+, +)$ assignment is highly compatible across a $t_3$ edge but unlikely across a $t_4$ edge, as $t_3$ edges capture synonymy (i.e., word-sense memberships), while $t_4$ edges depict antonym relations.
A sample instantiation of the compatibilities is shown in Table 1. Notice that the potentials for $t_1$, $t_2$, and $t_3$ capture homophily, i.e., nodes with the same label are likely to connect to each other through these types of edges. ($t_1$ edges are based on co-occurrence (see Section 2), which does not carry as strong an indication of the same connotation as, e.g., synonymy; thus, we enforce less homophily for nodes connected through edges of type $t_1$.) On the other hand, $t_4$ edges connect nodes that are antonyms of each other, and thus their compatibilities capture the reverse relationship among the labels.
Table 1: Sample instantiation of the compatibility potentials $\psi^{t}(y_i, y_j)$, with tunable parameter $\epsilon$.

Synonym-like relations ($t_1$, $t_2$, $t_3$):

| | $+$ | $-$ |
|---|---|---|
| $+$ | $0.5 + \epsilon$ | $0.5 - \epsilon$ |
| $-$ | $0.5 - \epsilon$ | $0.5 + \epsilon$ |

Antonym relations ($t_4$):

| | $+$ | $-$ |
|---|---|---|
| $+$ | $0.5 - \epsilon$ | $0.5 + \epsilon$ |
| $-$ | $0.5 + \epsilon$ | $0.5 - \epsilon$ |
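A small helper mirroring Table 1 (a sketch; the per-type choice of $\epsilon$, including the weaker value for $t_1$, is an assumption for illustration):

```python
import numpy as np

def make_psi(eps, antonym=False):
    """2x2 compatibility matrix over labels (+, -):
    0.5+eps on the diagonal for homophily, reversed for antonym edges."""
    hi, lo = 0.5 + eps, 0.5 - eps
    same = np.array([[hi, lo], [lo, hi]])
    return same[::-1] if antonym else same

EPS = 0.25
psi = {
    "t1": make_psi(EPS / 2),            # co-occurrence: weaker homophily
    "t2": make_psi(EPS),                # distributional similarity
    "t3": make_psi(EPS),                # word-sense membership (synonymy)
    "t4": make_psi(EPS, antonym=True),  # antonym: reversed compatibilities
}
```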
The most demanding component of Algorithm 3.2 is the iterative message passing over the edges (lines 10-14), with time complexity $O(k \cdot |\mathcal{L}|^2 \cdot |E|)$, where $|E|$ is the number of edges in the connotation graph, $|\mathcal{L}|$ the number of classes, and $k$ the number of iterations until convergence. Often, $|\mathcal{L}|$ is quite small (in our case, $|\mathcal{L}| = 2$) and $k \ll |E|$. Thus the running time grows linearly with the number of edges, and the algorithm is scalable to large datasets.
ConnotationWordNet is expected to be the superset of a sentiment lexicon, as it is highly likely for any word with positive/negative sentiment to carry connotation of the same polarity. Thus, we use two conventional sentiment lexicons, General Inquirer (GenInq) [27] and MPQA [36], as surrogates to measure the performance of our inference algorithm.
The construction of the connotation graph $G$, which includes both words and synsets, was described in Section 2. In addition to this graph, we tried several other graph constructions, the first three of which have previously been used in [10]. We briefly describe these graphs below, and compare performance on all of them in what follows.
$G$ w/ Pred-Arg: This is a (bipartite) subgraph of $G$ which only includes the connotative predicates and their arguments. As such, it contains only $t_1$-type edges. The edges between the predicates and the arguments can be weighted by their Point-wise Mutual Information (PMI), based on the Google Web 1T corpus. (PMI scores are widely used in previous studies to measure association between words; e.g., [7], [31], [19].)
$G$ w/ Overlay: The second graph is also a proper subgraph of $G$, which includes the predicates and all the argument words. Predicate words are connected to their arguments as before. In addition, argument pairs ($a_i$, $a_j$) are connected if they occurred together in the “$a_i$ and $a_j$” or “$a_j$ and $a_i$” coordination [11, 24]. This graph contains both $t_1$- and $t_2$-type edges. The edges can also be weighted based on the distributional similarities of the word pairs.
$G$ w/ Syn-Ant: The third graph is a super-graph of $G$ w/ Overlay, with additional edges where argument pairs in a synonym or antonym relation are connected to each other. Note that unlike the connotation graph $G$, it does not contain any synset nodes; rather, the words that are synonyms or antonyms of each other are directly linked in the graph. As such, this graph contains word-level analogues of all edge types $t_1$ through $t_4$.
$G$ w/ SynSim: This is a super-graph of our original graph; that is, it has all the predicate, argument, and synset nodes, as well as the four types of edges between them. In addition, we add edges of a fifth type, $t_5$, between the synset nodes to capture their similarity. To define similarity, we use the glossary definitions of the synsets and derive three different scores. Each score utilizes the count of overlapping nouns, verbs, and adjectives/adverbs among the glosses of the two synsets $s_i$ and $s_j$.
w/ SynSim1: We discard edges with count less than 3. The weighted version has the counts normalized between 0 and 1.
w/ SynSim2: We normalize the counts by the length of the gloss (the average of the two lengths), that is, count / avg(len_gloss($s_i$), len_gloss($s_j$)), and discard edges with scores below a threshold. The weighted version contains these values as edge weights.
w/ SynSim3: To further sparsify the graph, we discard edges with scores below a higher threshold. To weigh the edges, we use the cosine similarity between the gloss vectors of the synsets, based on the TF-IDF values of the words the glosses contain.
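The three gloss-similarity scores can be sketched as follows (tokenization and POS filtering are simplified, and the cutoff values are assumptions, since the exact thresholds are not reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def synsim1(tokens_i, tokens_j, min_count=3):
    """Raw count of overlapping content words; edge dropped below min_count."""
    c = len(set(tokens_i) & set(tokens_j))
    return c if c >= min_count else None    # weighted version: normalize to [0, 1]

def synsim2(tokens_i, tokens_j, threshold=0.1):
    """Overlap count normalized by the average gloss length."""
    c = len(set(tokens_i) & set(tokens_j))
    score = c / ((len(tokens_i) + len(tokens_j)) / 2.0)
    return score if score >= threshold else None

def synsim3_weights(glosses):
    """Cosine similarity between TF-IDF gloss vectors (SynSim3 edge weights)."""
    tfidf = TfidfVectorizer().fit_transform(glosses)
    return cosine_similarity(tfidf)         # pairwise synset-synset similarities
```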
Note that the connotation inference algorithm, as given in Algorithm 3.2, remains exactly the same for all the graphs described above. The only difference is the set of parameters used: while $G$ w/ Pred-Arg and $G$ w/ Overlay contain one and two edge types, respectively, and thus only use the compatibilities $\psi^{t_1}$ and $\psi^{t_2}$, $G$ uses all four as given in Table 1. The w/ SynSim graphs use an additional compatibility matrix for the synset similarity edges of type $t_5$, which is the same as the one used for the synonym-like edges; i.e., similar synsets are likely to have the same connotation label. This flexibility is one of the key advantages of our algorithm, as new types of nodes and edges can be added to the graph seamlessly.
In this section, we first compare the performance of our connotation graph to graphs that do not include synset nodes but only words. Then we analyze the performance when the additional synset similarity edges are added. First, we briefly describe our performance measures.
The sentiment lexicons we use as gold standard are small, compared to the size (i.e., number of words) our graphs contain. Thus, we first find the overlap between each graph and a sentiment lexicon. Note that the overlap size may be smaller than the lexicon size, as some sentiment words may be missing from our graphs. Then, we calculate the number of correct label assignments. As such, precision is defined as (correct / overlap), and recall as (correct / lexicon size). Finally, F1-score is their harmonic mean and reflects the overall accuracy.
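These measures translate directly into code (a sketch; `pred` holds our inferred labels and `gold` a sentiment lexicon):

```python
def evaluate(pred, gold):
    """Precision = correct/overlap, recall = correct/|lexicon|, F1 = harmonic mean."""
    overlap = [w for w in gold if w in pred]   # lexicon words present in the graph
    correct = sum(pred[w] == gold[w] for w in overlap)
    precision = correct / len(overlap)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```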
As shown in Table 2 (top), we first observe that including the synonym and antonym relations in the graph, as with w/ Syn-Ant and $G$, improves the performance significantly (by roughly 10 points in F-score) over the graphs w/ Pred-Arg and w/ Overlay that do not contain those relation types. Furthermore, we notice that the performances on the $G$ graph are better than those on the word-only graphs. This shows that including the synset nodes explicitly in the graph structure is beneficial. What is more, it gives us a means to obtain connotation labels for the synsets themselves, which we use in the evaluations in the next sections. Finally, we note that the unweighted versions of the graphs provide relatively more robust performance, potentially due to noise in the relative edge weights.
Table 2: Performance (%) against the GenInq and MPQA lexicons.

| Graph | GenInq P | GenInq R | GenInq F | MPQA F |
|---|---|---|---|---|
| Variations of word-only graphs | | | | |
| w/ Pred-Arg | 88.0 | 67.6 | 76.5 | 57.3 |
| w/ Pred-Arg-w | 84.9 | 68.9 | 76.1 | 57.8 |
| w/ Overlay | 87.8 | 70.4 | 78.1 | 58.4 |
| w/ Overlay-w | 82.2 | 67.7 | 74.2 | 54.2 |
| w/ Syn-Ant | 88.5 | 83.1 | 85.7 | 69.7 |
| w/ Syn-Ant-w | 75.5 | 71.5 | 73.4 | 53.2 |
| Variations of the connotation graph $G$ | | | | |
| $G$ | 88.8 | 84.1 | 86.4 | 70.0 |
| $G$-w | 76.8 | 73.0 | 74.9 | 54.6 |
| w/ SynSim1 | 87.2 | 83.3 | 85.2 | 67.9 |
| w/ SynSim2 | 83.9 | 80.8 | 82.3 | 65.1 |
| w/ SynSim3 | 86.5 | 83.2 | 84.8 | 67.8 |
| w/ SynSim1-w | 88.0 | 84.3 | 86.1 | 69.2 |
| w/ SynSim2-w | 86.4 | 83.7 | 85.0 | 68.5 |
| w/ SynSim3-w | 86.7 | 83.4 | 85.0 | 68.2 |
Next, we analyze the performance when the new edges between synsets are introduced, as given in Table 2 (bottom). We observe that connecting the synset nodes by their gloss similarity (at least in the ways we tried) does not yield better performance than on our original graph. In contrast to the earlier results, the weighted versions of the similarity-based graphs provide better performance than their unweighted counterparts. This suggests that glossary similarity may be a more robust means of correlating nodes; we leave it as future work to explore this direction for the predicate-argument and argument-argument relations.
Our belief propagation based connotation inference algorithm has one user-specified parameter, $\epsilon$ (see Table 1). To study the sensitivity of its performance to the choice of $\epsilon$, we reran our experiments for a range of $\epsilon$ values up to the maximum of $0.5$ (note that for $t_4$ edges, the compatibilities in Table 1 are reversed), and report the accuracy results on our connotation graph $G$ in Figure 2 for the two lexicons. The results indicate that the performance remains quite stable across a wide range of parameter choices.
Figure 2: Sensitivity of performance to the choice of $\epsilon$: (a) GenInq evaluation, (b) MPQA evaluation.
In this section, we present the results of the human evaluation we conducted using Amazon Mechanical Turk (AMT). We collected two separate sets of labels: one at the word level and another at the sense level. We first describe the labeling process for sense-level connotation. We selected 350 polysemous words and one of their senses, and each Turker was asked to rate the connotative polarity of a given word (or of a given sense) from -5 to 5, with 0 being neutral. Because senses in WordNet can be tricky to understand, care should be taken in designing the task so that the Turkers focus only on the corresponding sense of a word; we therefore provided the part-of-speech tag, the WordNet gloss of the selected sense, and a few examples as given in WordNet. As an incentive, each Turker was rewarded $0.07 per HIT, which consists of 10 words to label. For each word, we asked 5 Turkers to rate it, and we took the average of the 5 ratings as the connotative intensity score of the word. We labeled a word as negative if its intensity score was less than 0 and positive otherwise. For the word-level labels, we applied a similar procedure.
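The aggregation step amounts to averaging the five ratings and thresholding at zero (a sketch of the procedure just described):

```python
def aggregate_label(ratings):
    """ratings: five Turker scores in [-5, 5]; returns (intensity, polarity label)."""
    intensity = sum(ratings) / len(ratings)
    return intensity, ("negative" if intensity < 0 else "positive")

# e.g., aggregate_label([2, 1, 3, 0, 1]) -> (1.4, "positive")
```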
We first evaluate the word-level assignment of connotation, as shown in Table 3. The agreement between the new lexicon and the human judges varies between 84% and 86.98%. Sentiment lexicons such as SentiWordNet (Baccianella et al. (2010)) and OpinionFinder (Wilson et al. (2005a)) show low agreement rates with the human judges, which is somewhat expected: the human judges in this study are labeling for subtle connotation, not for more explicit sentiment. OpinionFinder's low agreement rate was mainly due to its low hit rate (successful look-up rate) of 33.43%. Feng2013 is the lexicon presented in [10]; it showed a relatively higher hit rate of 72.13%.
Note that belief propagation was run until 95% or 99% of the nodes had converged in their beliefs (two settings, shown separately in Table 3). In addition, the seed words with known connotation labels originally consist of 20 positive and 20 negative predicates. We also extended the seed set with the sentiment lexicon words, and denote these runs with the prefix e- for 'extended'.
Table 3: Agreement (%) with human judgments at the word and sense level.

| Lexicon | Word-level | Sense-level |
|---|---|---|
| SentiWordNet | 27.22 | 14.29 |
| OpinionFinder | 31.95 | - |
| Feng2013 | 62.72 | - |
| $G$ (95%) | 84.91 | 83.43 |
| $G$ (99%) | 84.91 | 83.71 |
| e-$G$ (95%) | 86.98 | 86.29 |
| e-$G$ (99%) | 86.69 | 85.71 |
We also examined the agreement rates on the sense-level. Since OpinionFinder and Feng2013 do not provide the polarity scores at the sense-level, we excluded them from this evaluation. Because sense-level polarity assignment is a harder (more subtle) task, the performance of all lexicons decreased to some degree in comparison to that of word-level evaluations.
A notable strength of our induction algorithm is that its output can be interpreted as an intensity of the corresponding connotation. But are these values meaningful? We answer this question in this section. We formulate a pair-wise ranking task as a binary decision task: given a pair of words, we ask which one is more positive (or more negative) than the other. Since we collected human labels on a numerical scale, we already have this information at hand. However, because different human judges have different notions of scale, subtle differences are more likely to be noisy. Therefore, we experiment with varying degrees of difference in the scales, as shown in Figure 3. Threshold values (ranging from 0.5 to 3.0) indicate the minimum difference in scale required between a pair of words for the pair to be included in the test set. As expected, we observe that the performance improves as we increase the threshold (as pairs become better separated). Within the range [0.5, 1.5] (249 pairs examined), the accuracies are as high as 68.27%, which shows that even the subtle differences of connotative intensity are relatively well reflected in the new lexicons.
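The pair construction and accuracy computation can be sketched as follows (hypothetical dictionaries: `human` maps words to averaged AMT intensities, `lex` to our inferred connotation values):

```python
from itertools import combinations

def ranking_pairs(human, threshold):
    """Keep only word pairs whose human intensity scores differ by >= threshold."""
    return [(a, b) for a, b in combinations(human, 2)
            if abs(human[a] - human[b]) >= threshold]

def pairwise_accuracy(pairs, human, lex):
    """Fraction of pairs the lexicon orders as the humans do;
    pairs with tied lexicon scores count as undecided."""
    correct = undecided = 0
    for a, b in pairs:
        if lex[a] == lex[b]:
            undecided += 1
        elif (human[a] > human[b]) == (lex[a] > lex[b]):
            correct += 1
    return correct / len(pairs), undecided / len(pairs)
```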
The results for the pair-wise intensity evaluation (threshold = 2.0; 1,208 pairs) are given in Table 4. Although intensity is generally a harder property to measure than the coarser binary categorization of polarity, our connotation lexicons perform surprisingly well, reaching up to 74.83% accuracy. A further study of the incorrect cases reveals that SentiWordNet has many pairs of words with the same polarity score (23.34%). Such cases seem to be due to the limited score patterns of SentiWordNet. The ratio of such cases is reported as Undecided in Table 4.
Table 4: Pair-wise intensity evaluation (%): accuracy and ratio of undecided pairs.

| Lexicon | Correct | Undecided |
|---|---|---|
| SentiWordNet | 33.77 | 23.34 |
| $G$ (95%) | 74.83 | 0.58 |
| $G$ (99%) | 73.01 | 0.58 |
| e-$G$ (95%) | 73.84 | 1.16 |
| e-$G$ (99%) | 74.01 | 1.16 |
Finally, to show the utility of the resulting lexicon in the context of a concrete sentiment analysis task, we perform lexicon-based sentiment analysis. We experiment with the SemEval dataset [28], which includes human-labeled data for predicting whether a news headline is good news or bad news; we expect this to correlate with the use of the connotative words we focus on in this paper. The good/bad news items are annotated with scores ranging from -100 to 87. We construct several datasets by applying different thresholds on the scores; for example, with the threshold set to 60, we discard the instances whose scores lie between -60 and 60. For comparison, we also test the connotation lexicon from [10] and the combined sentiment lexicon GenInq+MPQA.
Note that there is a difference in how humans judge the orientation and the degree of connotation for a given word out of context, and how the use of such words in context can be perceived as good/bad news. In particular, we conjecture that humans may have a bias toward the use of positive words, which in turn requires calibration from the readers’ minds [22]. That is, we might need to tone down the level of positiveness in order to correctly measure the actual intended positiveness of the message.
With this in mind, we tune the appropriate calibration from a small amount of training data, using one fold from cross-validation for tuning and the remaining folds for testing. We simply learn a mixture coefficient to scale the relative contribution of the positive and negative connotation values. We tune this parameter for the other lexicons we compare against as well. (A more detailed parameter search does not change the results much.) Note that due to this parameter learning, we are able to report better performance for the connotation lexicon of [10] than what the authors reported in their paper (labeled with * in Table 5).
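A sketch of this calibration (the scoring rule and search grid are illustrative assumptions; `pos` and `neg` map words to positive/negative connotation values):

```python
def headline_score(tokens, pos, neg, lam):
    """Tone down positive contributions by the mixture coefficient lam."""
    p = sum(pos.get(w, 0.0) for w in tokens)
    n = sum(neg.get(w, 0.0) for w in tokens)
    return lam * p - n                      # > 0 predicts good news

def tune_lambda(train_fold, pos, neg, grid=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Pick the lam maximizing accuracy on the single held-out training fold."""
    def accuracy(lam):
        return sum((headline_score(toks, pos, neg, lam) > 0) == is_good
                   for toks, is_good in train_fold) / len(train_fold)
    return max(grid, key=accuracy)
```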
Table 5 shows the results, where the new lexicon consistently outperforms the other competitive lexicons. In addition, Figure 4 shows that the performance does not change much with the size of the training data used for parameter tuning.
Table 5: Accuracy (%) on the SemEval good/bad news task at different score thresholds.

| Lexicon | 20 | 40 | 60 | 80 |
|---|---|---|---|---|
| Instance Size | 955 | 649 | 341 | 86 |
| Feng2013 | 71.5 | 77.1 | 81.6 | 90.5 |
| GenInq+MPQA | 72.8 | 77.2 | 80.4 | 86.7 |
| $G$ (95%) | 74.5 | 79.4 | 86.5 | 91.9 |
| $G$ (99%) | 74.6 | 79.4 | 86.8 | 91.9 |
| e-$G$ (95%) | 72.5 | 76.8 | 82.3 | 87.2 |
| e-$G$ (99%) | 72.6 | 76.9 | 82.5 | 87.2 |
| Feng2013* | 70.8 | 74.6 | 80.8 | 93.5 |
| GenInq+MPQA* | 64.5 | 69.0 | 74.0 | 80.5 |
Several previous approaches explored the use of graph propagation for sentiment lexicon induction [32] and connotation lexicon induction [10]. Our work introduces the use of loopy belief propagation over pairwise-MRF as an alternative solution to these tasks. At a high-level, both approaches share the general idea of propagating confidence or belief over the graph connectivity. The key difference, however, is that in our MRF representation, we can explicitly model various types of word-word, sense-sense and word-sense relations as edge potentials. In particular, we can naturally encode relations that encourage the same assignment (e.g., synonym) as well as the opposite assignment (e.g., antonym) of the polarity labels. Note that integration of the latter is not straightforward in the graph propagation framework.
There have been a number of previous studies that aim to construct a word-level sentiment lexicon [34, 25] or a sense-level sentiment lexicon [8]. But none of these approaches considered inducing the polarity labels at both the word level and the sense level. Although we focus on learning the connotative polarity of words and senses in this paper, the same approach would be applicable to constructing a sentiment lexicon as well.
There have been recent studies that address word sense disambiguation issues for sentiment analysis. SentiWordNet [8] was the very first lexicon developed for sense-level labels of sentiment polarity. In recent years, Akkaya et al. (2009) reported a successful empirical result in which WSD helps improve sentiment analysis, while Wiebe and Mihalcea (2006) studied the distinction between objectivity and subjectivity in different senses of a word, and its empirical effects in the context of sentiment analysis. Our work shares the high-level spirit of assessing sense-level polarity, while also deriving word-level polarity.
In recent years, there has been a growing research interest in investigating more fine-grained aspects of lexical sentiment beyond positive and negative sentiment. For example, Mohammad and Turney (2010) study the affects words can evoke in people’s minds, while Bollen et al. (2011) study various moods, e.g., “tension”, “depression”, beyond simple dichotomy of positive and negative sentiment. Our work, and some recent work by Feng et al. (2011) and Feng et al. (2013) share this spirit by targeting more subtle, nuanced sentiment even from those words that would be considered as objective in early studies of sentiment analysis.
We have introduced a novel formulation of lexicon induction operating over both words and senses, by exploiting the innate structure between words and senses as encoded in WordNet. In addition, we introduced the use of loopy belief propagation over pairwise-Markov Random Fields as an effective lexicon induction algorithm. A notable strength of our approach is its expressiveness: various types of prior knowledge and lexical relations can be encoded as node potentials and edge potentials. In addition, it leads to a lexicon of better quality while also offering faster run-time and ease of implementation. The resulting lexicon, called ConnotationWordNet, is the first lexicon that has polarity labels over both words and senses. ConnotationWordNet is publicly available for research and practical use.
This research was supported by the Army Research Office under Contract No. W911NF-14-1-0029, Stony Brook University Office of Vice President for Research, and gifts from Northrop Grumman Aerospace Systems and Google. We thank reviewers for many insightful comments and suggestions.