Beyond Canonical Texts: A Computational Analysis of Fanfiction

Smitha Milli¹ and David Bamman²
¹UC Berkeley, ²University of California, Berkeley


Abstract

While much computational work on fiction has focused on works in the literary canon, user-created fanfiction presents a unique opportunity to study an ecosystem of literary production and consumption, embodying qualities of both large-scale literary data (55 billion tokens) and a social network (over 2 million users). We present several empirical analyses of this data to illustrate the range of affordances it offers to research in NLP, computational social science, and the digital humanities. We find that fanfiction deprioritizes main protagonists in comparison to canonical texts, differs to a statistically significant degree in the attention it allocates to female characters, and offers promise for predicting reader reactions to stories.