In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional-thesaurus-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare the sense clusters of two different time points to determine whether (i) a new sense has been born, (ii) an older sense has split into more than one sense, (iii) a newer sense has been formed by the joining of older senses, or (iv) a particular sense has died. We conduct a thorough evaluation of the proposed methodology, both manually and through comparison with WordNet. Manual evaluation indicates that the algorithm correctly identified 60.4% of the birth cases from a set of 48 randomly picked samples and 57% of the split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% of the cases the birth of a novel sense is attested by WordNet, while split and join are confirmed by WordNet in 46% and 43% of the cases, respectively. Our approach can be applied to lexicography, as well as to applications such as word sense disambiguation or semantic search.
Two of the fundamental components of a natural language communication are word sense discovery [Jones1986] and word sense disambiguation [Ide and Veronis1998]. While discovery corresponds to acquisition of vocabulary, disambiguation forms the basis of understanding. These two aspects are not only important from the perspective of developing computer applications for natural languages but also form the key components of language evolution and change.
Words take different senses in different contexts while appearing with other words. Context plays a vital role in the disambiguation of word senses as well as in the interpretation of the actual meaning of words. For instance, the word “bank” has several distinct interpretations, including that of a “financial institution” and the “shore of a river.” Automatic discovery and disambiguation of word senses from a given text is an important and challenging problem which has been extensively studied in the literature [Jones1986, Ide and Veronis1998, Schütze1998, Navigli2009]. However, another equally important aspect that has so far not been well investigated concerns the changes that a word might undergo in its sense. Studying this aspect is becoming increasingly feasible as more and more time-varying text data become available in the form of millions of digitized books [Goldberg and Orwant2013] gathered over the last centuries. As a motivating example one could consider the word “sick”: while according to the standard English dictionaries the word is normally used to refer to some sort of illness, a new meaning of “sick” referring to something that is “crazy” or “cool” is currently getting popular in the English vernacular. This change is all the more interesting because while “sick” has traditionally been associated with something negative in general, the current meaning associates positivity with it. In fact, a rock band by the name of “Sick Puppies” has been founded, which is probably inspired by the newer sense of the word sick. The title of this paper has been motivated by the above observation.
Note that this phenomenon of change in word senses has existed ever since the beginning of human communication [Bamman and Crane2011, Michel et al.2011, Wijaya and Yeniterzi2011, Mihalcea and Nastase2012]; however, with the advent of modern technology and the availability of huge volumes of time-varying data, it has now become possible to automatically track such changes and thereby help lexicographers in word sense discovery, and design engineers in enhancing various NLP/IR applications (e.g., disambiguation, semantic search etc.) that are naturally sensitive to change in word senses.
The above motivation forms the basis of the central objective set in this paper, which is to devise a completely unsupervised approach to track noun sense changes in large texts available over multiple timescales. Toward this objective we make the following contributions: (a) we devise a time-varying graph clustering based sense induction algorithm, (b) we use the time-varying sense clusters to develop a split-join based approach for identifying new senses of a word, and (c) we evaluate the performance of the algorithms on various datasets using different suitable approaches along with a detailed error analysis. Remarkably, comparison with the English WordNet indicates that in 44% of the cases identified by our algorithm there has been a birth of a completely novel sense, in 46% of the cases a new sense has split off from an older sense, and in 43% of the cases two or more older senses have merged to form a new sense.
The remainder of the paper is organized as follows. In the next section we present a short review of the literature. In Section 3 we briefly describe the datasets and outline the process of co-occurrence graph construction. In Section 4 we present an approach based on graph clustering to identify the time-varying sense clusters and in Section 5 we present the split-merge based approach for tracking word sense changes. Evaluation methods are summarized in Section 6. Finally, conclusions and further research directions are outlined in Section 7.
Word sense disambiguation as well as word sense discovery have both remained key areas of research right from the very early initiatives in natural language processing research. Ide and Veronis [Ide and Veronis1998] present a very concise survey of the history of ideas used in word sense disambiguation; for a recent survey of the state-of-the-art one can refer to [Navigli2009]. Some of the first attempts at automatic word sense discovery were made by Karen Spärck Jones [Jones1986]; later, in lexicography, it has been extensively used as a pre-processing step for preparing mono- and multi-lingual dictionaries [Kilgarriff and Tugwell2001, Kilgarriff2004]. However, as we have already pointed out, none of these works considers the temporal aspect of the problem.
In contrast, the current study is inspired by works on language dynamics and opinion spreading [Mukherjee et al.2011, Maity et al.2012, Loreto et al.2012] and automatic topic detection and tracking [Allan et al.1998]. However, our work differs significantly from those proposed in the above studies. Opinion formation deals with the self-organisation and emergence of shared vocabularies, whereas our work focuses on how the different senses of these vocabulary words change over time and thus become “out-of-vocabulary”. Topic detection involves detecting the occurrence of a new event such as a plane crash, a murder, a jury trial result, or a political scandal in a stream of news stories from multiple sources, and tracking is the process of monitoring a stream of news stories to find those that track (or discuss) the same event. This is done on shorter timescales (hours, days), whereas our study focuses on larger timescales (decades, centuries), and we are interested in common nouns, verbs and adjectives as opposed to events that are characterized mostly by named entities. Other similar works on dynamic topic modelling can be found in [Blei and Lafferty2006, Wang and McCallum2006]. The Google books n-gram viewer (https://books.google.com/ngrams) is a phrase-usage graphing tool which charts the yearly count of selected letter combinations, words, or phrases as found in over 5.2 million digitized books. It only reports the frequency of word usage over the years, but does not give any correlation among them as, e.g., in [Heyer et al.2009], and does not analyze their senses.
A few approaches suggested by [Bond et al.2009, Pääkkö and Lindén2012] attempt to augment WordNet synsets primarily using methods of annotation. Another recent work by Cook et al. [Cook et al.2013] attempts to induce word senses and then identify novel senses by comparing two different corpora: the “focus corpora” (i.e., a recent version of the corpora) and the “reference corpora” (older version of the corpora). However, this method is limited as it only considers two time points to identify sense changes, as opposed to our approach, which covers a much larger timescale and thereby allows us to track the points of change and the underlying causes. One of the closest works to what we present here has been put forward by [Tahmasebi et al.2011], where the authors analyze a newspaper corpus containing articles between 1785 and 1985. The authors mainly report the frequency patterns of certain words that they found to be candidates for change; however, a detailed cause analysis as to why and how a particular word underwent a sense change has not been demonstrated. Further, a systematic evaluation of the results obtained by the authors has not been provided.
All the above points together motivated us to undertake the current work where we introduce, for the first time, a completely unsupervised and automatic method to identify the change of a word sense and the cause for the same. Further, we also present an extensive evaluation of the proposed algorithm in order to test its overall accuracy and performance.
In this section, we briefly describe the dataset used for our experiments and the graph construction procedure. The primary source of data has been the millions of digitized books made available through the Google Book project [Goldberg and Orwant2013]. The Google Book syntactic n-grams dataset provides dependency fragment counts by year. However, instead of using the plain syntactic n-grams, we use a far richer representation of the data in the form of a distributional thesaurus [Lin1997, Rychlý and Kilgarriff2007]. Specifically, we prepare a distributional thesaurus (DT) for each of the time periods separately and subsequently construct the required networks. We briefly outline the procedure of thesauri construction here, referring the reader to [Riedl and Biemann2013] for further details. In this approach, we first extract each word and a set of its context features, which are formed by labeled and directed dependency parse edges as provided in the dataset. Following this, we compute the frequencies of the word, the context and the words along with their context. Next we calculate the lexicographer’s mutual information (LMI) [Kilgarriff2004] between a word and its features and retain only the top ranked features for every word. Finally, we construct the DT network as follows: each word is a node in the network and the edge weight between two nodes is defined as the number of features that the two corresponding words share in common.
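The DT network construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the experiments: the function name `build_dt_network`, the `top_k` cutoff and the toy dependency features are hypothetical, and the LMI score is assumed to take the common form $f(w,c)\,\log_2\frac{N\,f(w,c)}{f(w)\,f(c)}$, where $N$ is the total number of observations.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_dt_network(pairs, top_k=2):
    """Build a toy DT network from (word, context_feature) observations.

    Returns a dict mapping (word_a, word_b) -> number of shared top features.
    The LMI form and the top_k cutoff are illustrative assumptions.
    """
    pair_freq = Counter(pairs)
    word_freq, feat_freq = Counter(), Counter()
    for (word, feat), f in pair_freq.items():
        word_freq[word] += f
        feat_freq[feat] += f
    total = sum(pair_freq.values())

    # Score every (word, feature) pair with LMI.
    scored = defaultdict(list)
    for (word, feat), f in pair_freq.items():
        lmi = f * math.log2(f * total / (word_freq[word] * feat_freq[feat]))
        scored[word].append((lmi, feat))

    # Retain only the top_k highest-scoring features of each word.
    top_feats = {
        word: {feat for _, feat in sorted(fs, key=lambda x: (-x[0], x[1]))[:top_k]}
        for word, fs in scored.items()
    }

    # Edge weight = number of retained features two words have in common.
    edges = {}
    for u, v in combinations(sorted(top_feats), 2):
        shared = len(top_feats[u] & top_feats[v])
        if shared:
            edges[(u, v)] = shared
    return edges
```

Words that share no retained feature simply get no edge, so the resulting network is sparse by construction.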
The basic idea of our algorithm for tracking sense changes is as follows. If a word undergoes a sense change, this can be detected by comparing its senses obtained from two different time periods. Since we aim to detect this change automatically, we require distributional representations corresponding to word senses for different time periods. We, therefore, utilize the basic hypothesis of unsupervised sense induction to induce the sense clusters over various time periods and then compare these clusters to detect sense change. The basic premises of the ‘unsupervised sense induction’ are briefly described below.
We use the co-occurrence based graph clustering framework introduced in [Biemann2006]. The algorithm proceeds in three basic steps. Firstly, a co-occurrence graph is created for every target word found in DT. Next, the neighbourhood/ego graph is clustered using the Chinese Whispers (CW) algorithm (see [McAuley and Leskovec2012] for similar approaches). The algorithm, in particular, produces a set of clusters for each target word by decomposing its open neighborhood. We hypothesize that each different cluster corresponds to a particular sense of the target word. For a detailed description, the reader is referred to [Biemann2011].
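To make the clustering step concrete, here is a simplified sketch of Chinese Whispers applied to the open neighbourhood of a target word. It assumes the DT is given as a dict of weighted undirected edges; the function names, the fixed iteration count and the seeded random generator are illustrative choices, and refinements of the original algorithm (such as hub downweighting) are omitted.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Simplified Chinese Whispers on an undirected weighted graph.

    edges: dict mapping (u, v) -> weight. Returns a list of node sets.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    nodes = sorted(neighbours)
    label = {n: i for i, n in enumerate(nodes)}  # each node starts in its own class

    for _ in range(iterations):
        rng.shuffle(nodes)  # random processing order makes the result non-deterministic
        for n in nodes:
            votes = defaultdict(float)
            for m, w in neighbours[n].items():
                votes[label[m]] += w
            best = max(votes.values())
            # Adopt the strongest class among the neighbours; break ties randomly.
            label[n] = rng.choice(sorted(l for l, s in votes.items() if s == best))

    clusters = defaultdict(set)
    for n, l in label.items():
        clusters[l].add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


def sense_clusters(target, dt_edges):
    """Cluster the open neighbourhood of `target` (the target itself is removed)."""
    nbrs = {v if u == target else u for (u, v) in dt_edges if target in (u, v)}
    # Keep only edges among the neighbours; neighbours with no such edge are dropped.
    ego = {(u, v): w for (u, v), w in dt_edges.items() if u in nbrs and v in nbrs}
    return chinese_whispers(ego)
```

Removing the target word before clustering is what lets its neighbours fall apart into separate groups, each of which is then read as one sense.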
If a word undergoes sense change, this can be detected by comparing the sense clusters obtained from two different time periods by the algorithm outlined above. For this purpose, we use statistics from the DT corresponding to two different time intervals, say $tv_i$ and $tv_j$. We then run the sense induction algorithm over these two different datasets. Now, for a given word $w$ that appears in both the datasets, we get two different sets of clusters, say $C_i$ and $C_j$. Without loss of generality, let us assume that our algorithm detects $m$ sense clusters for the word $w$ in $tv_i$ and $n$ sense clusters in $tv_j$. Let $C_i = \{s_{i1}, s_{i2}, \ldots, s_{im}\}$ and $C_j = \{s_{j1}, s_{j2}, \ldots, s_{jn}\}$, where $s_{kz}$ denotes the $z$-th sense cluster for word $w$ during time interval $tv_k$. We next describe our algorithm for detecting sense change from these sets of sense clusters.
We hypothesize that word $w$ can undergo sense change from one time interval ($tv_i$) to another ($tv_j$) as per one of the following scenarios:
Split: A sense cluster $s_{iz}$ in $tv_i$ splits into two (or more) sense clusters, $s_{jp_1}$ and $s_{jp_2}$, in $tv_j$.
Join: Two sense clusters $s_{iz_1}$ and $s_{iz_2}$ in $tv_i$ join to make a single cluster $s_{jp}$ in $tv_j$.
Birth: A new sense cluster $s_{jp}$ appears in $tv_j$, which was absent in $tv_i$.
Death: A sense cluster $s_{iz}$ in $tv_i$ dies out and does not appear in $tv_j$.
To detect split, join, birth or death, we build an $(m+1) \times (n+1)$ matrix $A$ to capture the intersection between sense clusters of two different time periods. The first $m$ rows and $n$ columns correspond to the sense clusters in $tv_i$ and $tv_j$ respectively. We append an additional row and column to capture the fraction of words which did not show up in any of the sense clusters in the other time interval. So, an element $A_{kl}$ of the matrix is defined as follows:
$1 \le k \le m$, $1 \le l \le n$: $A_{kl}$ denotes the fraction of words in a newer sense cluster $s_{jl}$ that were also present in an older sense cluster $s_{ik}$.
$k = m+1$, $1 \le l \le n$: $A_{kl}$ denotes the fraction of words in the sense cluster $s_{jl}$ that were not present in any of the $m$ clusters in $tv_i$.
$1 \le k \le m$, $l = n+1$: $A_{kl}$ denotes the fraction of words in the sense cluster $s_{ik}$ that did not show up in any of the $n$ clusters in $tv_j$.
Thus, the matrix captures all the four possible scenarios for sense change. Since we can not expect a perfect split, birth etc., we used certain threshold values to detect if a candidate word is undergoing sense change via one of these four cases. In Figure 1, as an example, we illustrate the birth of a new sense for the word ‘compiler’.
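The intersection matrix can be illustrated with a short sketch. It assumes that each sense cluster is a non-empty Python set of words, that the extra last row holds the fraction of each newer cluster that is absent from all older clusters, and that the extra last column holds the fraction of each older cluster that is absent from all newer clusters; the function name is hypothetical.

```python
def intersection_matrix(old_clusters, new_clusters):
    """Return an (m+1) x (n+1) matrix as a list of lists.

    old_clusters: m sense clusters (sets of words) from the earlier period.
    new_clusters: n sense clusters (sets of words) from the later period.
    """
    m, n = len(old_clusters), len(new_clusters)
    old_all = set().union(*old_clusters)
    new_all = set().union(*new_clusters)
    matrix = [[0.0] * (n + 1) for _ in range(m + 1)]

    for k, old in enumerate(old_clusters):
        for l, new in enumerate(new_clusters):
            # Fraction of the newer cluster that was already in the older cluster.
            matrix[k][l] = len(old & new) / len(new)
        # Extra column: fraction of the older cluster missing from every newer cluster.
        matrix[k][n] = len(old - new_all) / len(old)

    for l, new in enumerate(new_clusters):
        # Extra row: fraction of the newer cluster missing from every older cluster.
        matrix[m][l] = len(new - old_all) / len(new)
    return matrix
```

A high value in the extra row points to a candidate birth, and a high value in the extra column to a candidate death; split and join show up as one row or one column with several sizeable entries.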
To make sure that the candidate words obtained via our algorithm are meaningful, we applied multi-stage filtering to prune the candidate word list. The following criteria were used for the filtering:
Stage 1: We utilize the fact that the CW algorithm is non-deterministic in nature. We apply CW three times over the source and target time intervals. We obtain the candidate word lists using our algorithm for the three runs, then take the intersection to output those words which came up in all three runs.
Stage 2: From the above list, we retain only those candidate words which have a part-of-speech tag ‘NN’ or ‘NNS’, as we focus on nouns for this work.
Stage 3: We sort the candidate list obtained in Stage 2 as per their occurrence in the first time period. Then, we remove the top and the bottom words from this list. Therefore, we consider the torso of the frequency distribution, which is the most informative part for this type of analysis.
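The three filtering stages can be sketched as one function. The names `filter_candidates` and `trim_fraction` are hypothetical, and the fraction of words trimmed from each end of the frequency ranking is left as a parameter because its value is not stated here.

```python
def filter_candidates(runs, pos_tags, frequency, trim_fraction):
    """Apply the three filtering stages to candidate word lists.

    runs: list of candidate word collections, one per clustering run.
    pos_tags: dict word -> part-of-speech tag.
    frequency: dict word -> occurrence count in the first time period.
    trim_fraction: share of words dropped from each end of the ranking
        (an assumed parameter, not a value taken from the text).
    """
    # Stage 1: keep only words flagged in every run.
    in_all_runs = set.intersection(*map(set, runs))

    # Stage 2: keep nouns only.
    nouns = [w for w in in_all_runs if pos_tags.get(w) in {"NN", "NNS"}]

    # Stage 3: rank by frequency and keep the torso of the distribution.
    ranked = sorted(nouns, key=lambda w: (-frequency.get(w, 0), w))
    cut = int(len(ranked) * trim_fraction)
    return ranked[cut:len(ranked) - cut]
```

Taking the intersection over runs is a cheap way to turn the randomness of the clustering into a robustness check: a candidate that appears only in some runs is treated as unreliable.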
For our experiments, we utilized DTs created for 8 different time periods: 1520-1908, 1909-1953, 1954-1972, 1973-1986, 1987-1995, 1996-2001, 2002-2005 and 2006-2008 [Riedl et al.2014]. The time periods were set such that the amount of data in each time period is roughly the same. We will also use $t_1$ to $t_8$ to denote these time periods. The parameters for CW clustering were set as follows. The size of the neighbourhood () to be clustered was set to . The parameter regulating the edge density in this neighbourhood was set to as well. The parameter was set to , which corresponds to favouring smaller clusters by hub downweighing. The data are available at http://sf.net/p/jobimtext/wiki/LREC2014_Google_DT/. The threshold values used to detect the sense changes were as follows. For birth, at least words of the target cluster should be novel. For split, each split cluster should have at least words of the source cluster and the total intersection of all the split clusters should be . The same parameters were used for the join and death case with the interchange of source and target clusters.
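The threshold rules can be sketched as follows. This is one possible reading, stated as an assumption: a target cluster is a birth if the share of its words absent from all source clusters reaches `birth_min`; a source cluster is split if at least two target clusters each contain at least a share `part_min` of its words and together cover at least a share `total_min` of it. The function and parameter names are hypothetical, and the thresholds are passed in by the caller rather than fixed.

```python
def detect_births(source_clusters, target_clusters, birth_min):
    """Indices of target clusters whose share of novel words is >= birth_min."""
    source_all = set().union(*source_clusters)
    return [
        l for l, target in enumerate(target_clusters)
        if len(target - source_all) / len(target) >= birth_min
    ]


def detect_splits(source_clusters, target_clusters, part_min, total_min):
    """Pairs (source index, [target indices]) that satisfy the split rule."""
    splits = []
    for k, source in enumerate(source_clusters):
        # Target clusters holding a large enough share of the source cluster.
        parts = [
            l for l, target in enumerate(target_clusters)
            if len(source & target) / len(source) >= part_min
        ]
        covered = set().union(*(source & target_clusters[l] for l in parts))
        if len(parts) >= 2 and len(covered) / len(source) >= total_min:
            splits.append((k, parts))
    return splits
```

Because join and death use the same rules with source and target interchanged, `detect_splits(target_clusters, source_clusters, ...)` yields join candidates and `detect_births(target_clusters, source_clusters, ...)` yields death candidates.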
Making comparisons between all the pairs of time periods gave us 28 candidate word lists. For each of these comparisons, we applied the multi-stage filtering to obtain the pruned list of candidate words. Table 1 provides some statistics about the number of candidate words obtained corresponding to the birth case. The rows correspond to the source time periods and the columns correspond to the target time periods. An element of the table shows the number of candidate words obtained by comparing the corresponding source and target time periods.
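Source \ Target | 1909-1953 | 1954-1972 | 1973-1986 | 1987-1995 | 1996-2001 | 2002-2005 | 2006-2008
---|---|---|---|---|---|---|---
1520-1908 | 2498 | 3319 | 3901 | 4220 | 4238 | 4092 | 3578
1909-1953 | | 1451 | 2330 | 2789 | 2834 | 2789 | 2468
1954-1972 | | | 917 | 1460 | 1660 | 1827 | 1815
1973-1986 | | | | 517 | 769 | 1099 | 1416
1987-1995 | | | | | 401 | 818 | 1243
1996-2001 | | | | | | 682 | 1107
2002-2005 | | | | | | | 609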
The table clearly shows a trend. In most cases, the number of candidate birth senses tends to increase as we go from left to right. Similarly, this number decreases as we go down in the table. This is quite intuitive, since going from left to right corresponds to increasing the gap between the two time periods while going down corresponds to decreasing this gap. As the gap increases (decreases), one would expect more (fewer) new senses coming in. Even while moving diagonally, the number of candidate words tends to decrease as we move downwards. This corresponds to the fact that the number of years in the time periods decreases as we move downwards, and therefore the gap also decreases.
Formally, we consider a sense change from a time period $tv_i$ to a later time period $tv_j$ stable if it was also detected while comparing $tv_i$ with the following time periods $tv_k$ ($k > j$). The number of subsequent time periods in which the same sense change is detected helps us to determine the age of a new sense. Similarly, for a candidate sense change from $tv_i$ to $tv_j$, we say that the location of the sense change is $tv_j$ if and only if that sense change is not detected by comparing $tv_i$ with any time interval $tv_k$ intermediate between $tv_i$ and $tv_j$.
Table 1 gives a lot of candidate words for sense change. However, not all of these candidates were stable, so it was important to prune the results using stability analysis. Note also that these results do not pinpoint the exact time period in which the sense change might have taken place. For instance, among the candidate birth senses detected by comparing a source period with a much later target period, many might already have come up in one of the intermediate periods. We therefore prune these lists further based on the stability of the sense, and also locate the approximate time interval in which the sense change might have occurred.
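The stability and location checks can be sketched at the level of candidate words. The sketch assumes that time periods are numbered with integers, that `detected[(i, j)]` holds the set of words flagged when comparing source period `i` with target period `j`, and that a change counts as stable once it reappears in at least `min_later` later comparisons; these names and the default value are illustrative assumptions.

```python
def is_stable(word, i, j, detected, min_later=1):
    """True if the change flagged for (i, j) reappears for later targets k > j."""
    if word not in detected.get((i, j), set()):
        return False
    later_targets = [k for (a, k) in detected if a == i and k > j]
    hits = sum(1 for k in later_targets if word in detected[(i, k)])
    return hits >= min_later


def is_located(word, i, j, detected):
    """True if the change is flagged for (i, j) but for no intermediate target."""
    if word not in detected.get((i, j), set()):
        return False
    return all(
        word not in detected.get((i, k), set()) for k in range(i + 1, j)
    )
```

Counting the later comparisons in which the word is flagged, instead of only thresholding that count, gives the age of the new sense in the sense described above.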
Table 2 shows the number of stable (at least twice) senses as well as the number of stable sense changes located in that particular time period. While this decreases recall, we found this to be beneficial for the accuracy of the method.
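Source period | Measure | 1909-1953 | 1954-1972 | 1973-1986 | 1987-1995 | 1996-2001 | 2002-2005
---|---|---|---|---|---|---|---
1520-1908 | candidates | 2498 | 3319 | 3901 | 4220 | 4238 | 4092
 | stable | 537 | 989 | 1368 | 1627 | 1540 | 1299
 | located | 537 | 754 | 772 | 686 | 420 | 300
1909-1953 | candidates | | 1451 | 2330 | 2789 | 2834 | 2789
 | stable | | 343 | 718 | 938 | 963 | 810
 | located | | 343 | 561 | 517 | 357 | 227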
Once we were able to locate the senses as well as to find the age of the senses, we attempted to select some representative words and plotted them on a timeline as per the birth period and their age in Figure 2. The source time period here is 1909-1953.
During evaluation, we considered the clusters obtained using the 1909-1953 time-slice as our reference and attempted to track sense change by comparing these with the clusters obtained for 2002-2005. Each detected sense change was categorized as a new sense (birth), a single sense splitting into two or more senses (split), two or more senses merging (join), or a particular sense dying (death). We present a few instances of the resulting clusters in the paper and refer the reader to the supplementary material (http://cse.iitkgp.ac.in/resgrp/cnerg/acl2014_wordsense/) for the rest of the results.
The algorithm detected a lot of candidate words for the cases of birth, split/join as well as death. Since it was difficult to go through all the candidate sense changes for all the comparisons manually, we decided to randomly select some candidate words, which were flagged by our algorithm as undergoing sense change, while comparing 1909-1953 and 2002-2005 DT. We selected 48 random samples of candidate words for birth cases and 21 random samples for split/join cases. One of the authors annotated each of the birth cases identifying whether or not the algorithm signalled a true sense change while another author did the same task for the split/join cases. The accuracy as per manual evaluation was found to be 60.4% for the birth cases and 57% for the split/join cases.
Table 3 shows the evaluation results for a few candidate words flagged due to birth. The columns give the candidate word, the words in the cluster that indicated a new sense (henceforth, the ‘birth cluster’), and the manual evaluation judgement along with the possible sense this birth cluster denotes.
Sl No. | Candidate word | Birth cluster | Evaluation judgement, comments
---|---|---|---
1 | implant | gel, fibre, coatings, cement, materials, metal, filler, silicone, composite, titanium, polymer, coating | No, New set of words but similar sense already existed
2 | passwords | browsers, server, functionality, clients, workstation, printers, software, protocols, hosts, settings, utilities | Yes, New sense related to ‘a computer sense’
3 | giants | multinationals, conglomerates, manufacturers, corporations, competitors, enterprises, companies, businesses, brands, firms | Yes, New sense as ‘an organization with very great size or force’
4 | donation | transplantation, donation, fertilization, transfusions, transplant, transplants, insemination, donors, donor … | Yes, The new usage of donation associated with body organs etc.
5 | novice | negro, fellow, emigre, yankee, realist, quaker, teen, male, zen, lady, admiring, celebrity, thai, millionaire … | No, this looks like a false positive
6 | partitions | server, printers, workstation, platforms, arrays, modules, computers, workstations, kernel … | Yes, New usage related to the ‘computing’ domain
7 | yankees | athletics, cubs, tigers, sox, bears, braves, pirates, cardinals, dodgers, yankees, giants, cardinals … | Yes, related to the ‘New York Yankees’ team
Table 4 shows the corresponding evaluation results for a few candidate words, flagged due to split or join.
Sl No. | Candidate word | Cluster | Cluster words
---|---|---|---
1 | intonation (split) | source | whisper, glance, idioms, gesture, chant, sob, inflection, diction, sneer, rhythm, accents …
 | | target 1 | nod, tone, grimace, finality, gestures, twang, shake, shrug, irony, scowl, twinkle …
 | | target 2 | accents, phrase, rhythm, style, phonology, diction, utterance, cadence, harmonies …
 | | judgement | Yes, target 1 corresponds to intonation in normal conversations while target 2 corresponds to the use of accents in formal and research literature
2 | diagonal (split) | source | coast, edge, shoreline, coastline, border, surface, crease, edges, slope, sides, seaboard …
 | | target 1 | circumference, center, slant, vertex, grid, clavicle, margin, perimeter, row, boundary …
 | | target 2 | border, coast, seaboard, seashore, shoreline, waterfront, shore, shores, coastline, coasts
 | | judgement | Yes, the split target 1 is based on mathematics whereas target 2 is based on geography
3 | mantra (join) | source 1 | sutra, stanza, chanting, chants, commandments, monologue, litany, verse, verses …
 | | source 2 | praise, imprecation, benediction, praises, curse, salutation, benedictions, eulogy …
 | | target | blessings, spell, curses, spells, rosary, prayers, blessing, prayer, benediction …
 | | judgement | Yes, the two seemingly distinct senses of mantra, a contextual usage for chanting and prayer (source 1) and another usage in its effect, salutations and benedictions (source 2), have now merged in the target cluster
4 | continuum (split) | source | circumference, ordinate, abscissa, coasts, axis, path, perimeter, arc, plane, axis …
 | | target 1 | roadsides, corridors, frontier, trajectories, coast, shore, trail, escarpment, highways …
 | | target 2 | arc, ellipse, meridians, equator, axis, axis, plane, abscissa, ordinate, axis, meridian …
 | | judgement | Yes, the split target 1 denotes the usage of ‘continuum’ with physical objects while the split target 2 corresponds to its usages in the mathematics domain
5 | headmaster (join) | source 1 | master, overseer, councillor, chancellor, tutors, captain, general, principal …
 | | source 2 | mentor, confessor, tutor, founder, rector, vicar, graduate, counselor, lawyer …
 | | target | chaplain, commander, surveyor, coordinator, consultant, lecturer, inspector …
 | | judgement | No, it seems a false positive
A further analysis of the words marked due to birth in the random samples indicates that there are 22 technology-related words, 2 slang words, 3 economics-related words and 2 general words. For the split-join case we found that there are 3 technology-related words while the rest of the words are general. Therefore, one of the key observations is that most of the technology-related words (where the neighborhood is completely new) could be extracted from our birth results. In contrast, for the split-join instances most of the results are from the general category, since the neighborhood did not change much here; it either got split or merged from what it was earlier.
In addition to manual evaluation, we also performed automated evaluation for the candidate words. We chose WordNet for automated evaluation because it not only has a wide coverage of word senses but is also maintained and updated regularly to incorporate new senses. We did this evaluation for the candidate birth, join and split sense clusters obtained by comparing the 1909-1953 time period with 2002-2005. For our evaluation, we developed an aligner to align the word clusters obtained with WordNet senses. The aligner constructs a WordNet dictionary for the purpose of synset alignment. The CW cluster is then aligned to WordNet synsets by comparing the cluster with the WordNet graph, and the synset with the maximum alignment score is returned as the output. In summary, the aligner tool takes as input the CW cluster and returns a WordNet synset id that corresponds to the cluster words. The evaluation settings were as follows:
For a candidate word flagged as birth, we first find out the set of all WordNet synset ids for its CW clusters in the source time period (1909-1953 in this case). Let $S_{init}$ denote the union of these synset ids. We then find the WordNet synset id for its birth cluster, say $s_{new}$. Then, if $s_{new} \notin S_{init}$, it implies that this is a new sense that was not present in the source clusters and we call it a ‘success’ as per WordNet.
For the join case, we find WordNet synset ids $s_1$ and $s_2$ for the clusters obtained in the source time period and $s_{new}$ for the join cluster in the target time period. If $s_1 \neq s_2$ and $s_{new}$ is either $s_1$ or $s_2$, we call it a ‘success’.
For the split case, we find the WordNet synset id $s_{old}$ for the source cluster and synset ids $s_{new1}$ and $s_{new2}$ for the target split clusters. If $s_{new1} \neq s_{new2}$ and either $s_{new1}$ or $s_{new2}$ retains the id $s_{old}$, we call it a ‘success’.
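The three success criteria reduce to simple checks on synset ids, sketched below. The aligner itself is not reproduced; the sketch assumes it has already mapped each cluster to one synset id, and the function names and the ids used in any call are hypothetical.

```python
def birth_success(source_synsets, birth_synset):
    """Birth: the birth cluster maps to a synset not seen among the source clusters."""
    return birth_synset not in set(source_synsets)


def join_success(source_synset_1, source_synset_2, joined_synset):
    """Join: two distinct source synsets, one of which is kept by the joined cluster."""
    return (
        source_synset_1 != source_synset_2
        and joined_synset in (source_synset_1, source_synset_2)
    )


def split_success(source_synset, split_synset_1, split_synset_2):
    """Split: two distinct target synsets, one of which retains the source synset."""
    return (
        split_synset_1 != split_synset_2
        and source_synset in (split_synset_1, split_synset_2)
    )
```

Note that these checks depend entirely on the quality of the alignment: if two genuinely different clusters are mapped to the same synset, a true split or join is counted as a failure.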
Table 5 shows the results of the WordNet based evaluation. In the case of birth we observe a success rate of 44%, while for split and join we observe success rates of 46% and 43% respectively.
Category | No. of Candidate Words | Success Cases (%)
---|---|---
Birth | 810 | 44
Split | 24 | 46
Join | 28 | 43
We then manually verified some of the words that were deemed successes, and also investigated the WordNet senses they were mapped to. Table 6 shows some of the words for which the evaluation detected success, along with their WordNet senses. Clearly, the cluster words correspond to a newer sense for these words and the mapped WordNet synset matches the birth cluster to a very high degree.
Sl No. | Candidate word | Birth cluster | Synset id, WordNet sense
---|---|---|---
1 | macro | code, query, handler, program, procedure, subroutine, module, script | 6582403, a set sequence of steps, part of larger computer program
2 | caller | browser, compiler, sender, routers, workstation, cpu, host, modem, router, server | 4175147, a computer that provides client stations with access to files
3 | searching | coding, processing, learning, computing, scheduling, planning, retrieval, routing, networking, navigation | 1144355, programming: setting an order and time for planned events
4 | hooker | bitch, whore, stripper, woman, slut, prostitute, girl, dancer … | 10485440, a woman who engages in sexual intercourse for money
5 | drones | helicopters, fighters, rockets, flights, planes, vehicles, bomber, missions, submarines … | 4264914, a craft capable of traveling in outer space
6 | amps | inverters, capacitor, oscillators, switches, mixer, transformer, windings, capacitors, circuits … | 2955247, electrical device characterized by its capacity to store an electric charge
7 | compilers | interfaces, algorithms, programming, software, modules, libraries, routines, tools, utilities … | 6566077, written programs pertaining to the operation of a computer system
Sl No. | Candidate word | Death cluster | Vanished meaning
---|---|---|---
1 | slop | jeans, velveteen, tweed, woollen, rubber, sealskin, wear, oilskin, sheepskin, velvet, calico, deerskin, goatskin, cloth … | clothes and bedding supplied to sailors by the navy
2 | blackmail | subsidy, rent, presents, tributes, money, fine, bribes, dues, tolls, contributions, contribution, customs, duties … | Origin: denoting protection money levied by Scottish chiefs
3 | repertory | dictionary, study, compendium, bibliography, lore, directory, catalogues, science, catalog, annals, digest, literature … | Origin: denoting an index or catalog: from late Latin repertorium
4 | phrasing | contour, outline, construction, handling, grouping, arrangement, structure, modelling, selection, form … | in the sense ‘style or manner of expression’: via late Latin Greek phrasis
Slang words and phrases are regarded as very informal, and are typically restricted to a particular context. New slang words come up every now and then, and this plays an integral part in the phenomenon of sense change. We therefore decided to evaluate how many slang words were being detected by our candidate birth clusters. We used a list of slang words available from the slangcity website (http://slangcity.com/email_archive/index_2003.htm). We collected slang words for the years 2002-2005 and found the intersection with our candidate birth words. Note that the website had a large number of multi-word expressions that we did not consider in our study. Further, some of the words appeared to be either erroneous or very transient (not existing for more than a few months) entries, which had to be removed from the list. All these removals left us with very little room for comparison; however, despite this we found 25 slang words from the website that were present in our birth results, e.g. ‘bum’, ‘sissy’, ‘thug’, ‘dude’ etc.
Much of our evaluation was focussed on the birth sense clusters, mainly because these are more interesting from a lexicographic perspective. Additionally, the main theme of this work was to detect new senses for a given word. To detect a true death of a sense, persistence analysis was required, that is, to verify if the sense was persisting earlier and vanished after a certain time period. While such an analysis goes beyond the scope of this paper, we selected some interesting candidate “death” senses. Table 7 shows some of these interesting candidate words, their death cluster along with the possible vanished meaning, identified by the authors. While these words are still used in a related sense, the original meaning does not exist in the modern usage.
In this paper, we presented a completely unsupervised method to detect word sense changes by analyzing millions of digitized books spanning several centuries. In particular, we constructed DT networks over eight different time windows, clustered these networks and compared these clusters to identify the emergence of novel senses. The performance of our method has been evaluated manually as well as by comparison with WordNet and a list of slang words. Through manual evaluation we found that the algorithm could correctly identify 60.4% of the birth cases from a set of 48 random samples and 57% of the split/join cases from a set of 21 randomly picked samples. Quite strikingly, we observe that (i) in 44% of the cases the birth of a novel sense is attested by WordNet, (ii) in 46% of the cases the split of an older sense is signalled on comparison with WordNet and (iii) in 43% of the cases the join of two senses is attested by WordNet. These results might have strong lexicographic implications: even if one goes by very moderate estimates, almost half of the words would be candidate entries in WordNet if they were not already part of it. This method can be extremely useful in the construction of lexico-semantic networks for low-resource languages, as well as for keeping lexico-semantic resources up to date in general.
Future research directions based on this work are manifold. On one hand, our method can be used by lexicographers in designing new dictionaries where candidate new senses can be semi-automatically detected and included, thus greatly reducing the otherwise required manual effort. On the other hand, this method can be directly used for various NLP/IR applications like semantic search, automatic word sense discovery as well as disambiguation. For semantic search, taking into account the newer senses of the word can increase the relevance of the query result. Similarly, a disambiguation engine informed with the newer senses of a word can increase the efficiency of disambiguation, and recognize senses uncovered by the inventory that would otherwise have to be wrongly assigned to covered senses. In addition, this method can be also extended to the ‘NNP’ part-of-speech (i.e., named entities) to identify changes in role of a person/place. Furthermore, it would be interesting to apply this method to languages other than English and to try to align new senses of cognates across languages.
AM would like to thank DAAD for supporting the faculty exchange programme to TU Darmstadt. PG would like to thank Google India Private Ltd. for extending travel support to attend the conference. MR and CB have been supported by an IBM SUR award and by LOEWE as part of the research center Digital Humanities.