An individual’s words often reveal their political ideology. Existing automated techniques to identify ideology from text focus on bags of words or wordlists, ignoring syntax. Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. To show the importance of modeling subsentential elements, we crowdsource political annotations at a phrase and sentence level. Our model outperforms existing models on our newly annotated dataset and an existing dataset.
Many of the issues discussed by politicians and the media are so nuanced that even word choice entails choosing an ideological position. For example, what liberals call the “estate tax” conservatives call the “death tax”; there are no ideologically neutral alternatives [14]. While objectivity remains an important principle of journalistic professionalism, scholars and watchdog groups claim that the media are biased [10, 6, 18], backing up their assertions by publishing examples of obviously biased articles on their websites. Whether or not it reflects an underlying lack of objectivity, quantitative changes in the popular framing of an issue over time—favoring one ideologically-based position over another—can have a substantial effect on the evolution of policy [5].
Manually identifying ideological bias in political text, especially in the age of big data, is an impractical and expensive process. Moreover, bias may be localized to a small portion of a document, undetectable by coarse-grained methods. In this paper, we examine the problem of detecting ideological bias on the sentence level. We say a sentence contains ideological bias if its author’s political position (here liberal or conservative, in the sense of U.S. politics) is evident from the text.
Ideological bias is difficult to detect, even for humans—the task relies not only on political knowledge but also on the annotator’s ability to pick up on subtle elements of language use. For example, the sentence in Figure 1 includes phrases typically associated with conservatives, such as “small businesses” and “death tax”. When we take more of the structure into account, however, we find that scare quotes and a negative propositional attitude (a lie about X) yield an evident liberal bias.
Existing approaches toward bias detection have not gone far beyond “bag of words” classifiers, thus ignoring richer linguistic context of this kind and often operating at the level of whole documents. In contrast, recent work in sentiment analysis has used deep learning to discover compositional effects [27, 28].
Building from those insights, we introduce a recursive neural network (rnn) to detect ideological bias on the sentence level. This model requires richer data than currently available, so we develop a new political ideology dataset annotated at the phrase level. With this new dataset we show that rnns not only label sentences well but also improve further when given additional phrase-level annotations. rnns are quantitatively more effective than existing methods that use syntactic and semantic features separately, and we also illustrate how our model correctly identifies ideological bias in complex syntactic constructions.
Recursive neural networks (rnns) are machine learning models that capture syntactic and semantic composition. They have achieved state-of-the-art performance on a variety of sentence-level nlp tasks, including sentiment analysis, paraphrase detection, and parsing [26, 13]. rnn models represent a shift from previous research on ideological bias detection in that they do not rely on hand-made lexicons, dictionaries, or rule sets. In this section, we describe a supervised rnn model for bias detection and highlight differences from previous work in training procedure and initialization.
By taking into account the hierarchical nature of language, rnns can model semantic composition, which is the principle that a phrase’s meaning is a combination of the meaning of the words within that phrase and the syntax that combines those words. While semantic composition does not apply universally (e.g., sarcasm and idioms), most language follows this principle. Since most ideological bias becomes identifiable only at higher levels of sentence trees (as verified by our annotation, Figure 4), models relying primarily on word-level distributional statistics are not desirable for our problem.
The basic idea behind the standard rnn model is that each word $w$ in a sentence is associated with a vector representation $x_w \in \mathbb{R}^d$. Based on a parse tree, these words form phrases (Figure 2). Each of these phrases also has an associated vector of the same dimension as the word vectors. These phrase vectors should represent the meaning of the phrases composed of individual words. As phrases themselves merge into complete sentences, the underlying vector representation is trained to retain the sentence’s whole meaning.
The challenge is to describe how vectors combine to form complete representations. If two words $a$ and $b$ merge to form phrase $p$, we posit that the phrase-level vector $x_p$ is
$$x_p = f(W_L \cdot x_a + W_R \cdot x_b + b_1), \qquad (2)$$
where $W_L$ and $W_R$ are $d \times d$ left and right composition matrices shared across all nodes in the tree, $b_1$ is a bias term, and $f$ is a nonlinear activation function such as $\tanh$. The word-level vectors $x_a$ and $x_b$ come from a $d \times V$ word embedding matrix $W_e$, where $V$ is the size of the vocabulary.
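To make the composition step concrete, the following is a minimal numpy sketch of Equation (2). The matrix names follow the text; the dimensions, initialization scale, and toy word indices are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

d, V = 300, 10000                          # embedding dimension and vocabulary size (illustrative)
rng = np.random.default_rng(0)

W_e = 0.01 * rng.standard_normal((d, V))   # word embedding matrix W_e
W_L = 0.01 * rng.standard_normal((d, d))   # left composition matrix W_L
W_R = 0.01 * rng.standard_normal((d, d))   # right composition matrix W_R
b_1 = np.zeros(d)                          # bias term b_1

def compose(x_a, x_b):
    """Equation (2): merge left child x_a and right child x_b into a phrase vector."""
    return np.tanh(W_L @ x_a + W_R @ x_b + b_1)

# Toy example: two word vectors combine into a phrase vector of the same dimension.
x_a, x_b = W_e[:, 42], W_e[:, 43]          # hypothetical word indices
x_p = compose(x_a, x_b)                    # shape (300,)
```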
We are interested in learning representations that can distinguish political polarities given labeled data. If an element of this vector space, $x_d$, represents a sentence with liberal bias, its vector should be distinct from the vector of a conservative-leaning sentence.
Supervised rnns achieve this distinction by applying a regression that takes the node’s vector $x_d$ as input and produces a prediction $\hat{y}_d$. This is a softmax layer
$$\hat{y}_d = \operatorname{softmax}(W_{cat} \cdot x_d + b_2), \qquad (3)$$
where the softmax function is
$$\operatorname{softmax}(q)_n = \frac{\exp q_n}{\sum_{j=1}^{k} \exp q_j}, \qquad (4)$$
and $W_{cat}$ is a $k \times d$ matrix for a dataset with $k$-dimensional labels.
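A corresponding sketch of the prediction layer in Equations (3) and (4); the two-class setup matches the liberal/conservative task, while the parameter values here are placeholders.

```python
import numpy as np

d, k = 300, 2                                # k = 2: liberal vs. conservative
rng = np.random.default_rng(0)
W_cat = 0.01 * rng.standard_normal((k, d))   # category matrix from Equation (3)
b_2 = np.zeros(k)

def softmax(q):
    """Equation (4): exponentiate and normalize (shifted for numerical stability)."""
    e = np.exp(q - q.max())
    return e / e.sum()

def predict(x_d):
    """Equation (3): predicted label distribution for a node vector x_d."""
    return softmax(W_cat @ x_d + b_2)

y_hat = predict(rng.standard_normal(d))      # e.g., array([p_liberal, p_conservative])
```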
We want the predictions of the softmax layer to match our annotated data; the discrepancy between categorical predictions and annotations is measured through the cross-entropy loss. We optimize the model parameters to minimize the cross-entropy loss over all sentences in the corpus. The cross-entropy loss of a single sentence is the sum over the true labels $y_d$ in the sentence,
$$\ell(\hat{y}_s) = -\sum_{d \in s} \sum_{j=1}^{k} y_{d,j} \log \hat{y}_{d,j}. \qquad (5)$$
This induces a supervised objective function over all sentences: a regularized sum over all node losses normalized by the number of nodes in the training set,
$$C(\theta) = \frac{1}{N} \sum_{s} \ell(\hat{y}_s) + \frac{\lambda}{2} \lVert \theta \rVert_2^2, \qquad (6)$$

where $N$ is the number of nodes in the training set and $\theta$ denotes all model parameters.
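The loss and objective in Equations (5) and (6) can be sketched as follows; the regularization strength and the flat parameter list are illustrative, and in practice the gradient of this objective would be computed by backpropagating through the tree structure.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Per-node term of Equation (5): negative log-likelihood of the annotated label."""
    return -np.sum(y * np.log(y_hat + 1e-12))

def objective(node_predictions, node_labels, params, lam=1e-4):
    """Equation (6) sketch: node losses normalized by the number of nodes, plus L2 regularization.

    node_predictions, node_labels: lists of k-dimensional arrays, one per tree node.
    params: list of parameter arrays (W_e, W_L, W_R, W_cat, and biases).
    lam: regularization strength (illustrative value).
    """
    data_term = sum(cross_entropy(p, y) for p, y in zip(node_predictions, node_labels))
    data_term /= len(node_labels)
    reg_term = lam / 2.0 * sum(np.sum(w ** 2) for w in params)
    return data_term + reg_term
```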
When initializing our model, we have two choices: we can initialize all of our parameters randomly or provide the model some prior knowledge. As we see in Section 4, these choices have a significant effect on final performance.
The most straightforward choice is to initialize the word embedding matrix $W_e$ and composition matrices $W_L$ and $W_R$ randomly, such that without any training, representations for words and phrases are arbitrarily projected into the vector space.
The other alternative is to initialize the word embedding matrix $W_e$ with values that reflect the meanings of the associated word types. This improves the performance of rnn models over random initializations [4, 26]. We initialize our model with 300-dimensional word2vec toolkit vectors generated by a continuous skip-gram model trained on around 100 billion words from the Google News corpus [16].
The word2vec embeddings have linear relationships (e.g., the closest vectors to the average of “green” and “energy” include phrases such as “renewable energy”, “eco-friendly”, and “efficient lightbulbs”). To preserve these relationships as phrases are formed in our sentences, we initialize our left and right composition matrices such that the parent vector is computed by taking the average of the two children ($W_L = W_R = 0.5 \cdot I_{d \times d}$). This initialization of the composition matrices has previously been effective for parsing [25].
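A sketch of the two initialization options; the gensim call is one common way to load the pretrained GoogleNews vectors (the file path is a placeholder), and the vocabulary here is a toy example.

```python
import numpy as np
# from gensim.models import KeyedVectors   # one common way to load pretrained word2vec vectors

d = 300
vocab = ["estate", "tax", "death", "small", "businesses"]   # toy vocabulary

# Option 1 (random): arbitrary projections for words and phrases before training.
rng = np.random.default_rng(0)
W_e = 0.01 * rng.standard_normal((d, len(vocab)))

# Option 2 (word2vec, Section 2.2): copy pretrained vectors into the embedding matrix.
# kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
# W_e = np.stack([kv[w] if w in kv else np.zeros(d) for w in vocab], axis=1)

# Composition matrices start as averaging, so an untrained parent vector is
# (up to the nonlinearity) the mean of its two children.
W_L = 0.5 * np.eye(d)
W_R = 0.5 * np.eye(d)

parent = np.tanh(W_L @ W_e[:, 0] + W_R @ W_e[:, 1])   # ≈ average of the two child vectors
```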
We performed initial experiments on a dataset of Congressional debates that has annotations on the author level for partisanship, not ideology. While the two terms are highly correlated (e.g., a member of the Republican party likely agrees with conservative stances on most issues), they are not identical. For example, a moderate Republican might agree with the liberal position on increased gun control but take conservative positions on other issues. To avoid conflating partisanship and ideology, we create a new dataset annotated for ideological bias on the sentence and phrase level. In this section we describe our initial dataset (Convote) and explain the procedure we followed for creating our new dataset (ibc), which is available at http://cs.umd.edu/~miyyer/ibc.
The Convote dataset [29] consists of U.S. Congressional floor debate transcripts from 2005 in which all speakers have been labeled with their political party (Democrat, Republican, or independent). We propagate party labels down from the speaker to all of their individual sentences and map from party label to ideology label (Democrat → liberal, Republican → conservative). This is an expedient choice; in future work we plan to make use of work in political science characterizing candidates’ ideological positions empirically based on their behavior [3].
While the Convote dataset has seen widespread use for document-level political classification, we are unaware of similar efforts at the sentence level.
The strong correlation between U.S. political parties and political ideologies (Democrats with liberal, Republicans with conservative) lends confidence that this dataset contains a rich mix of ideological statements. However, the raw Convote dataset contains a low percentage of sentences with explicit ideological bias (many sentences in Convote are variations on “I think this is a good/bad bill”, and there is also substantial parliamentary boilerplate language). We therefore use the features in Yano et al. [32], which correlate with political bias, to select sentences for annotation that have a higher likelihood of containing bias. Their features come from the Linguistic Inquiry and Word Count lexicon (liwc) [20], as well as from lists of “sticky bigrams” [2] strongly associated with one party or another (e.g., “illegal aliens” implies conservative, “universal healthcare” implies liberal).
We first extract the subset of sentences that contain any words in the liwc categories of Negative Emotion, Positive Emotion, Causation, Anger, and Kill verbs. (While Kill verbs are not a category in liwc, Yano et al. [32] adopted the list from Greene and Resnik (2009) and showed it to be a useful predictor of political bias; it includes words such as “slaughter” and “starve”.) We then compute a list of the top 100 sticky bigrams for each category, ranked by log-likelihood ratio, and select another subset from the original data that includes only sentences containing at least one sticky bigram; we take the union of the two subsets. Finally, we balance the resulting dataset so that it contains an equal number of sentences from Democrats and Republicans, leaving us with a total of 7,816 sentences.
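A rough sketch of this filtering pipeline; the word list, the party-labeled sentences, and the count-ratio scoring (a stand-in for the log-likelihood-ratio ranking of sticky bigrams) are all illustrative.

```python
from collections import Counter

# Illustrative stand-ins for the LIWC-derived word list and party-labeled sentences.
LIWC_WORDS = {"anger", "kill", "slaughter", "starve", "cause", "hate", "happy"}
sentences = [("we must not starve our schools of funding", "D"),
             ("illegal aliens burden hardworking taxpayers", "R")]

def bigrams(text):
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))

def sticky_bigrams(labeled, party, top_n=100):
    """Rank bigrams strongly associated with one party (a count-ratio proxy for
    the log-likelihood-ratio scoring used in the paper)."""
    ours, theirs = Counter(), Counter()
    for text, label in labeled:
        (ours if label == party else theirs).update(bigrams(text))
    scored = {bg: c / (theirs[bg] + 1) for bg, c in ours.items()}
    return set(sorted(scored, key=scored.get, reverse=True)[:top_n])

liwc_subset = {s for s in sentences
               if any(tok in LIWC_WORDS for tok in s[0].lower().split())}
sticky = sticky_bigrams(sentences, "D") | sticky_bigrams(sentences, "R")
bigram_subset = {s for s in sentences if any(bg in sticky for bg in bigrams(s[0]))}
candidates = liwc_subset | bigram_subset   # union of the two subsets, later balanced by party
```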
In addition to Convote, we use the Ideological Books Corpus (ibc) developed by Gross et al. [11]. This is a collection of books and magazine articles written between 2008 and 2012 by authors with well-known political leanings. Each document in the ibc has been manually labeled with coarse-grained ideologies (right, left, and center) as well as fine-grained ideologies (e.g., religious-right, libertarian-right) by political science experts.
There are over a million sentences in the ibc, most of which have no noticeable political bias. Therefore we use the filtering procedure outlined in Section 3.1.1 to obtain a subset of 55,932 sentences. Compared to our final Convote dataset, an even larger percentage of the ibc sentences exhibit no noticeable political bias. (This difference can mainly be attributed to historical topics in the ibc, e.g., the Crusades and the American Civil War; in Convote, every sentence is part of a debate about 2005 political policy.) Because our goal is to distinguish between liberal and conservative bias, rather than the more general task of classifying sentences as “neutral” or “biased”, we filter the dataset further using dualist [23], an active learning tool, to reduce the proportion of neutral sentences in our dataset. To train the dualist classifier, we manually assigned class labels of “neutral” or “biased” to 200 sentences and selected typical partisan unigrams to represent the “biased” class. dualist labels 11,555 sentences as politically biased, 5,434 of which come from conservative authors and 6,121 of which come from liberal authors.
For purposes of annotation, we define the task of political ideology detection as identifying, if possible, the political position of a given sentence’s author, where position is either liberal or conservative (a simplification, as the ideological hierarchy in the ibc makes clear). We used the Crowdflower crowdsourcing platform (crowdflower.com), which has previously been used for subsentential sentiment annotation [22], to obtain human annotations of the filtered ibc dataset for political bias on both the sentence and phrase level. While members of the Crowdflower workforce are certainly not experts in political science, our simple task and the ubiquity of political bias allow us to acquire useful annotations.
First, we parse the filtered ibc sentences using the Stanford constituency parser [25]. Because of the expense of labeling every node in a sentence, we label only one path in each sentence. The process for selecting paths is as follows: first, if any path contains one of the top ten partisan unigrams (the words that the multinomial naïve Bayes classifier in dualist marked as highest probability given a polarity: market, abortion, economy, rich, liberal, tea, economic, taxes, gun, abortion), we select the longest such path; otherwise, we select the path with the most open-class constituencies (np, vp, adjp). The root node of a sentence is always included in a path.
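The path-selection heuristic can be sketched as below; the tree representation is a toy stand-in for the parser output, and the partisan unigram set mirrors the list above (duplicates removed).

```python
PARTISAN = {"market", "abortion", "economy", "rich", "liberal",
            "tea", "economic", "taxes", "gun"}
OPEN_CLASS = {"NP", "VP", "ADJP"}

class Node:
    """Toy constituency-tree node; leaves carry a word, internal nodes a phrase label."""
    def __init__(self, label, word=None, children=()):
        self.label, self.word, self.children = label, word, list(children)

def root_to_leaf_paths(node, prefix=()):
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from root_to_leaf_paths(child, path)

def select_path(root):
    """Pick one path per sentence: the longest path containing a partisan unigram,
    or, failing that, the path with the most open-class constituencies."""
    paths = list(root_to_leaf_paths(root))
    partisan = [p for p in paths if any(n.word in PARTISAN for n in p if n.word)]
    if partisan:
        return max(partisan, key=len)
    return max(paths, key=lambda p: sum(n.label in OPEN_CLASS for n in p))
```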
Our task is shown in Figure 3. Open class constituencies are revealed to the worker incrementally, starting with the np, vp, or adjp furthest from the root and progressing up the tree. We choose this design to prevent workers from changing their lower-level phrase annotations after reading the full sentence.
To ensure that our annotators have a basic understanding of U.S. politics, we restrict workers to U.S. IP addresses and require them to manually annotate one node from 60 different “gold” paths annotated by the authors. We select these nodes such that the associated phrase is either obviously biased or obviously neutral. Workers must correctly annotate at least six of eight gold paths before they are granted access to the full task. In addition, workers must maintain 75% accuracy on gold paths that randomly appear alongside normal paths. Gold paths dramatically improve the quality of our workforce: 60% of contributors passed the initial quiz (the 40% that failed were barred from working on the task), while only 10% of workers who passed the quiz were later removed for mislabeling subsequent gold paths.
Workers receive the following instructions:
Each task on this page contains a set of phrases from a single sentence. For each phrase, decide whether or not the author favors a political position to the left (Liberal) or right (Conservative) of center.
- If the phrase is indicative of a position to the left of center, please choose Liberal.
- If the phrase is indicative of a position to the right of center, please choose Conservative.
- If you feel like the phrase indicates some position to the left or right of the political center, but you’re not sure which direction, please mark Not neutral, but I’m unsure of which direction.
- If the phrase is not indicative of a position to the left or right of center, please mark Neutral.
We had workers annotate 7,000 randomly selected paths from the filtered ibc dataset, with half of the paths coming from conservative authors and the other half from liberal authors, as annotated by Gross et al. [11]. Three workers annotated each path in the dataset, and we paid $0.03 per sentence. Since identifying political bias is a relatively difficult and subjective task, we include in our final dataset all sentences where at least two workers agree on a label for the root node, except when that label is “Not neutral, but I’m unsure of which direction”. We only keep phrase-level annotations where at least two workers agree on the label: 70.4% of all annotated nodes fit this definition of agreement. All unannotated nodes receive the label of their closest annotated ancestor. Since the root of each sentence is always annotated, this strategy ensures that every node in the tree has a label. Our final balanced ibc dataset consists of 3,412 sentences (4,062 before balancing and removing neutral sentences) with a total of 13,640 annotated nodes. Of these sentences, 543 switch polarity (liberal → conservative or vice versa) on an annotated path.
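The label aggregation and propagation steps amount to something like the following sketch, where trees are represented as nested dictionaries purely for illustration.

```python
from collections import Counter

def majority_label(votes, min_agree=2):
    """Keep a node's annotation only if at least two of its three workers agree."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None

def propagate_labels(node, inherited=None):
    """Unannotated nodes receive the label of their closest annotated ancestor.

    `node` is a dict with optional keys "label" (agreed-upon worker annotation)
    and "children"; the root is always annotated, so every node ends up labeled.
    """
    label = node.get("label") or inherited
    node["label"] = label
    for child in node.get("children", []):
        propagate_labels(child, inherited=label)

tree = {"label": "liberal", "children": [{"children": [{"label": "neutral"}]}]}
propagate_labels(tree)   # the middle node inherits "liberal"; the leaf keeps "neutral"
```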
While we initially wanted to incorporate neutral labels into our model, we observed that lower-level phrases are almost always neutral while full sentences are much more likely to be biased (Figure 4). Due to this discrepancy, the objective function in Eq. (6) was minimized by making neutral predictions for almost every node in the dataset.
In this section we describe our experimental framework. We discuss strong baselines that use lexical and syntactic information (including framing-specific features from previous work) as well as multiple rnn configurations. Each of these models has the same task: to predict sentence-level ideology labels for sentences in a test set. To account for label imbalance, we subsample the data so that there are an equal number of labels and report accuracy over this balanced dataset.
The random baseline chooses a label at random from {liberal, conservative}.
lr1, our most basic logistic regression baseline, uses only bag of words (BoW) features.
lr2 uses the same BoW features as lr1 but also includes phrase-level annotations as separate training instances. (The Convote dataset was not annotated on the phrase level, so we report a result only for the ibc dataset.)
lr3 uses BoW features as well as syntactic pseudo-word features from Greene and Resnik [9]. These features, derived from dependency relations, specify properties of verbs (e.g., transitivity or nominalization). (We do not include phrase-level annotations in the lr3 feature set because the pseudo-word features can only be computed from full sentence parses.)
lr-(w2v) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
The lr-(w2v) baseline allows us to compare against a strong lexical representation that encodes syntactic and semantic information without the rnn tree structure. The lr1 and lr2 baselines offer a comparison to simple bag-of-words models, while the lr3 baseline contrasts traditional syntactic features with those learned by rnn models.
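As a concrete illustration of the lr-(w2v) baseline, here is a small sketch using scikit-learn; the random embedding lookup stands in for the pretrained word2vec vectors, and the toy sentences and labels are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

d = 300
rng = np.random.default_rng(0)
# Stand-in lookup table; in the paper these would be the pretrained word2vec vectors.
embeddings = {w: rng.standard_normal(d)
              for w in "the estate tax hurts small businesses should be repealed".split()}

def sentence_vector(sentence):
    """lr-(w2v) features: the average of the embeddings of the words in a sentence."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(d)

train_sentences = ["the estate tax should be repealed", "tax the estate"]
train_labels = [1, 0]                       # toy labels, e.g., conservative = 1, liberal = 0
X = np.stack([sentence_vector(s) for s in train_sentences])
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
```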
| Model | Convote | IBC |
|---|---|---|
| random | 50% | 50% |
| lr1 | 64.7% | 62.1% |
| lr2 | – | 61.9% |
| lr3 | 66.9% | 62.6% |
| lr-(w2v) | 66.6% | 63.7% |
| rnn1 | 69.4% | 66.2% |
| rnn1-(w2v) | 70.2% | 67.1% |
| rnn2-(w2v) | – | 69.3% |
For rnn models, we generate a feature vector for every node in the tree. Equation (2) allows us to percolate the representations up to the root of the tree. We generate the final instance representation by concatenating the root vector and the average of all other node vectors [27]. We then train an $\ell_2$-regularized logistic regression model over these concatenated vectors to obtain final sentence-level accuracy numbers.
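A small sketch of how the final instance representation is assembled from the node vectors before the logistic regression step; the node vectors themselves would come from running Equation (2) bottom-up over a sentence's parse tree.

```python
import numpy as np

def sentence_features(node_vectors):
    """Concatenate the root vector with the average of all other node vectors.

    `node_vectors` is a list of d-dimensional arrays with the root vector first.
    The result is the 2d-dimensional representation fed to an L2-regularized
    logistic regression for sentence-level prediction.
    """
    root, rest = node_vectors[0], node_vectors[1:]
    others = np.mean(rest, axis=0) if rest else np.zeros_like(root)
    return np.concatenate([root, others])
```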
To analyze the effects of initialization and phrase-level annotations, we report results for three different rnn settings. All three models were implemented as described in Section 2 with the nonlinearity $f$ set to the normalized $\tanh$ function,
$$f(v) = \frac{\tanh(v)}{\lVert \tanh(v) \rVert}. \qquad (8)$$
We perform 10-fold cross-validation on the training data to find the best rnn hyperparameters (the cross-validated regularization strengths were 1e-6, 1e-4, and 1e-3).
We report results for rnn models with the following configurations:
- rnn1 initializes all parameters randomly and uses only sentence-level labels for training.
- rnn1-(w2v) uses the word2vec initialization described in Section 2.2 but is also trained on only sentence-level labels.
- rnn2-(w2v) is initialized using word2vec embeddings and also includes annotated phrase labels in its training. For this model, we also introduce a hyperparameter $\beta$ that weights the error at annotated nodes ($\beta$) higher than the error at unannotated nodes ($1-\beta$); since we have more confidence in the annotated labels, we want them to contribute more towards the objective function (a small sketch of this weighting follows the list).
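A minimal sketch of the $\beta$-weighted error used by rnn2-(w2v); the weighting scheme follows the description above, while the particular value of $\beta$ is an illustrative placeholder.

```python
def weighted_node_loss(node_losses, annotated_mask, beta=0.7):
    """Weight errors at annotated nodes (beta) above errors at unannotated nodes (1 - beta).

    node_losses: per-node cross-entropy values; annotated_mask: True where workers
    supplied the node's label; beta is tuned by cross-validation (0.7 is illustrative).
    """
    return sum(beta * loss if annotated else (1.0 - beta) * loss
               for loss, annotated in zip(node_losses, annotated_mask))
```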
For all rnn models, we set the word vector dimension to 300 to facilitate direct comparison against the lr-(w2v) baseline. (Using smaller vector sizes, as in previous work, does not significantly change accuracy.)
In this section, we examine the rnn models to see why they improve over our baselines. We also give examples of sentences that are correctly classified by our best rnn model but incorrectly classified by all of the baselines. Finally, we investigate sentence constructions that our model cannot handle and offer possible explanations for these errors.
Table 1 shows the rnn models outperforming the bag-of-words baselines as well as the word2vec baseline on both datasets. The increased accuracy suggests that the trained rnns are capable of detecting bias polarity switches at higher levels in parse trees. While phrase-level annotations do not improve baseline performance, the rnn model significantly benefits from these annotations because the phrases are themselves derived from nodes in the network structure. In particular, the phrase annotations allow our best model to detect bias accurately in complex sentences that the baseline models cannot handle.
Initializing the rnn word embedding matrix with word2vec embeddings improves accuracy over random initialization by 1%. This is similar to the improvements obtained from pretrained neural language model vectors [27].
We obtain better results on Convote than on ibc with both bag-of-words and rnn models. This result was unexpected since the Convote labels are noisier than the annotated ibc labels; however, there are three possible explanations for the discrepancy. First, Convote has twice as many sentences as ibc, and the extra training data might help the model more than ibc’s better-quality labels. Second, since the sentences in Convote were originally spoken, they are only about half as long (21.3 words per sentence) as those in the ibc (42.2 words per sentence). Finally, some information is lost at every propagation step, so rnns are able to model the shorter sentences in Convote more effectively than the longer ibc sentences.
As in previous work [27], we visualize the learned vector space by listing the most probable n-grams for each political affiliation in Table 2. As expected, conservatives emphasize values such as freedom and religion while disparaging excess government spending and their liberal opposition. Meanwhile, liberals inveigh against the gap between the rich and the poor while expressing concern for minority groups and the working class.
Our best model is able to accurately model the compositional effects of bias in sentences with complex syntactic structures. The first three sentences in Figure 5 were correctly classified by our best model (rnn2-(w2v)) and incorrectly classified by all of the baselines. Figures 5A and C show traditional conservative phrases, “free market ideology” and “huge amounts of taxpayer money”, that switch polarities higher up in the tree when combined with phrases such as “made worse by” and “saved by”. Figure 5B shows an example of a bias polarity switch in the opposite direction: the sentence negatively portrays supporters of nationalized health care, which our model picks up on.
Our model often makes errors when polarity switches occur at nodes that are high up in the tree. In Figure 5D, “be used as an instrument to achieve charitable or social ends” reflects a liberal ideology, which the model predicts correctly. However, our model is unable to detect the polarity switch when this phrase is negated with “should not”. Since many different issues are discussed in the ibc, it is likely that our dataset has too few examples of some of these issues for the model to adequately learn the appropriate ideological positions, and more training data would resolve many of these errors.
| n | Most conservative n-grams | Most liberal n-grams |
|---|---|---|
| 1 | Salt, Mexico, housework, speculated, consensus, lawyer, pharmaceuticals, ruthless, deadly, Clinton, redistribution | rich, antipsychotic, malaria, biodiversity, richest, gene, pesticides, desertification, Net, wealthiest, labor, fertilizer, nuclear, HIV |
| 3 | prize individual liberty, original liberal idiots, stock market crash, God gives freedom, federal government interference, federal oppression nullification, respect individual liberty, Tea Party patriots, radical Sunni Islamists, Obama stimulus programs | rich and poor, “corporate greed”, super rich pay, carrying the rich, corporate interest groups, young women workers, the very rich, for the rich, by the rich, soaking the rich, getting rich often, great and rich, the working poor, corporate income tax, the poor migrants |
| 5 | spending on popular government programs, bailouts and unfunded government promises, North America from external threats, government regulations place on businesses, strong Church of Christ convictions, radical Islamism and other threats | the rich are really rich, effective forms of worker participation, the pensions of the poor, tax cuts for the rich, the ecological services of biodiversity, poor children and pregnant women, vacation time for overtime pay |
| 7 | government intervention helped make the Depression Great, by God in His image and likeness, producing wealth instead of stunting capital creation, the traditional American values of limited government, trillions of dollars to overseas oil producers, its troubled assets to federal sugar daddies, Obama and his party as racialist fanatics | African Americans and other disproportionately poor groups; the growing gap between rich and poor; the Bush tax cuts for the rich; public outrage at corporate and societal greed; sexually transmitted diseases, most notably AIDS; organize unions or fight for better conditions; the biggest hope for health care reform |
A growing nlp subfield detects private states such as opinions, sentiment, and beliefs [31, 19] from text. In general, work in this category tends to combine traditional surface lexical modeling (e.g., bag-of-words) with hand-designed syntactic features or lexicons. Here we review the most salient literature related to the present paper.
Most previous work on ideology detection ignores the syntactic structure of the language in use in favor of familiar bag-of-words representations for the sake of simplicity. For example, Gentzkow and Shapiro [6] derive a “slant index” to rate the ideological leaning of newspapers; a newspaper’s slant index is governed by the frequency of use of partisan collocations of 2–3 tokens. Similarly, other authors have relied on simple models of language when leveraging inferred ideological positions; for example, Gerrish and Blei (2011) predict the voting patterns of Congress members based on bag-of-words representations of bills and the inferred political leanings of those members.
Recently, Sim et al. (2013) have proposed a model to infer mixtures of ideological positions in documents, applied to understanding the evolution of ideological rhetoric used by political candidates during the campaign cycle. They use an hmm-based model, defining the states as a set of fine-grained political ideologies, and rely on a closed set of lexical bigram features associated with each ideology, inferred from a manually labeled ideological books corpus. Although it takes elements of discourse structure into account (capturing the “burstiness” of ideological terminology usage), their model explicitly ignores intrasentential contextual influences of the kind seen in Figure 1. Other approaches on the document level use topic models to analyze bias in news articles, blogs, and political speeches [1, 15, 17].
Detecting subjective language, which conveys opinion or speculation, is a related nlp problem. While sentences lacking subjective language may still contain ideological bias (e.g., through the topic of the sentence), highly opinionated sentences likely have obvious ideological leanings. In addition, sentiment and subjectivity analysis offers methodological approaches that can be applied to automatic bias detection.
Wiebe et al. (2004) show that low-frequency words and some collocations are good indicators of subjectivity. More recently, Recasens et al. (2013) detect biased words in sentences using indicator features for bias cues such as hedges and factive verbs, in addition to standard bag-of-words and part-of-speech features. They show that this type of linguistic information dramatically improves performance over several standard baselines.
Greene and Resnik (2009) also emphasize the connection between syntactic and semantic relationships in their work on “implicit sentiment”, which refers to sentiment carried by sentence structure rather than word choice. They use syntactic dependency relation features combined with lexical information to achieve what was then state-of-the-art performance on standard sentiment analysis datasets. However, these syntactic features are only computed for a thresholded list of domain-specific verbs. This work extends their insight of modeling sentiment as an interaction between syntax and semantics to ideological bias.
There are a few obvious directions in which this work can be expanded. First, we can consider more nuanced political ideologies beyond liberal and conservative. We show that it is possible to detect ideological bias given this binary problem; however, a finer-grained study that also includes neutral annotations may reveal more subtle distinctions between ideologies. While acquiring data with obscure political biases from the ibc or Convote is infeasible, we can apply a similar analysis to social media (e.g., Twitter or Facebook updates) to discover how many different ideologies propagate in these networks.
Another direction is to implement more sophisticated rnn models (along with more training data) for bias detection. We attempted to apply syntactically-untied rnns [25] to our data with the idea that associating separate matrices for phrasal categories would improve representations at high-level nodes. While there were too many parameters for this model to work well here, other variations might prove successful, especially with more data. Finally, combining sentence-level and document-level models might improve bias detection at both levels.
In this paper we apply recursive neural networks to political ideology detection, a problem where previous work relies heavily on bag-of-words models and hand-designed lexica. We show that our approach detects bias more accurately than existing methods on two different datasets. In addition, we describe an approach to crowdsourcing ideological bias annotations. We use this approach to create a new dataset from the ibc, which is labeled at both the sentence and phrase level.
We thank the anonymous reviewers, Hal Daumé, Yuening Hu, Yasuhiro Takayama, and Jyothi Vinjumur for their insightful comments. We also want to thank Justin Gross for providing the ibc and Asad Sayeed for help with the Crowdflower task design, as well as Richard Socher and Karl Moritz Hermann for assisting us with our model implementations. This work was supported by nsf Grant CCF-1018625. Boyd-Graber is also supported by nsf Grant IIS-1320538. Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the view of the sponsor.