Dialog topic tracking aims to analyze and maintain topic transitions in ongoing dialogs. This paper proposes a composite kernel approach for dialog topic tracking that utilizes various types of domain knowledge obtained from Wikipedia. Two kernels are defined based on history sequences and domain context trees constructed from the extracted features. The experimental results show that our composite kernel approach can significantly improve topic tracking performance in mixed-initiative human-human dialogs.
Human communications in real world situations interlace multiple topics that are related to each other in conversational contexts. This suggests that a dialog system should likewise be capable of conducting multi-topic conversations with users to provide a more natural interaction. However, the majority of previous work on dialog interfaces has focused on dealing with only a single target task. Although some multi-task dialog systems have been proposed [10, 6, 4], they have aimed only at selecting the most probable sub-system for each input, where each sub-system operates independently of the others.
To analyze and maintain dialog topics from a more systematic perspective in a given dialog flow, some researchers [12, 8, 1] have treated dialog topic identification as a separate sub-problem of dialog management and attempted to solve it with text categorization approaches applied to the recognized utterances in a given turn. The major obstacle to the success of these approaches is the difference between written texts and spoken utterances. In most text categorization tasks, the proper category for each textual unit can be assigned based only on its own content. However, the dialog topic at each turn can be determined not only by the user's intentions captured from the given utterances, but also by the system's decisions made for dialog management purposes. Thus, text categorization approaches can only be effective in user-initiative cases, where users tend to mention topic-related expressions explicitly in their utterances.
Another line of dialog topic tracking approaches has made use of external knowledge sources, including domain models [13], heuristics [15], and agendas [2, 9]. These knowledge-based methods have the advantage of handling system-initiative dialogs, because dialog flows can be controlled by the system based on the given resources. However, this aspect can limit the flexibility to handle user responses that contradict the system's suggestions. Moreover, these approaches face cost problems in building a sufficient amount of resources to cover the broad states of complex dialogs, because the resources must be manually prepared by human experts for each specific domain.
In this paper, we propose a composite kernel that exploits various types of information obtained from Wikipedia for mixed-initiative dialog topic tracking, without significant costs for building resources. Composite kernels have been successfully applied to improve performance on other NLP problems [17, 16] by integrating multiple individual kernels, with the aim of overcoming errors occurring at one level using information from other levels. Our composite kernel consists of a history sequence kernel and a domain context tree kernel, both of which are composed from the textual units in Wikipedia articles that are similar to a given dialog context.
Figure 1: An example of dialog topic tracking in the Singapore tour guide domain between a tourist and a guide.

| Turn | Speaker | Utterance | Topic Transition |
|---|---|---|---|
| 0 | Guide | How can I help you? | NONE → NONE |
| 1 | Tourist | Can you recommend some good places to visit in Singapore? | NONE → ATTR |
| | Guide | Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. | |
| 2 | Tourist | Merlion is a symbol for Singapore, right? | ATTR → ATTR |
| | Guide | Yes, we use that to symbolise Singapore. | |
| 3 | Tourist | Okay. | ATTR → ATTR |
| | Guide | The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. | |
| 4 | Tourist | How can I get there from Orchard Road? | ATTR → TRSP |
| | Guide | You can take the north-south line train from Orchard Road and stop at Raffles Place station. | |
| 5 | Tourist | Is this walking distance from the station to the destination? | TRSP → TRSP |
| | Guide | Yes, it’ll take only ten minutes on foot. | |
| 6 | Tourist | Alright. | TRSP → FOOD |
| | Guide | Well, you can also enjoy some seafoods at the riverside near the place. | |
| 7 | Tourist | What food do you have any recommendations to try there? | FOOD → FOOD |
| | Guide | If you like spicy foods, you must try chilli crab which is one of our favourite dishes here in Singapore. | |
| 8 | Tourist | Great! I’ll try that. | FOOD → FOOD |
Dialog topic tracking can be considered as a classification problem to detect topic transitions. The most probable pair of topics just before and after each turn is predicted by the following classifier:

$$f(x_t) = (y_{t-1}, y_t),$$

where $x_t$ contains the input features obtained at a turn $t$, $y_{t-1}, y_t \in C$, and $C$ is a closed set of topic categories. If a topic transition occurs at $t$, $y_{t-1}$ should be different from $y_t$. Otherwise, both $y_{t-1}$ and $y_t$ have the same value.
Figure 1 shows an example of dialog topic tracking on a dialog fragment in the Singapore tour guide domain between a tourist and a guide. This conversation is divided into three segments, since $f$ detects three topic transitions, at $t_1$, $t_4$, and $t_6$. From these results, the topic sequence 'Attraction', 'Transportation', and 'Food' is obtained.
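To make the formulation concrete, the minimal Python sketch below rewrites the topic annotations of Figure 1 as (previous topic, current topic) pairs, one per tourist turn, and recovers the segment boundaries as the turns where the two labels differ; the label abbreviations are those used in Figure 1.

```python
# Topic label pairs (y_{t-1}, y_t) for turns t_1 ... t_8 of Figure 1.
# A topic transition occurs at any turn where the two labels differ.
pairs = [
    ("NONE", "ATTR"),  # t1: attraction topic introduced
    ("ATTR", "ATTR"),  # t2
    ("ATTR", "ATTR"),  # t3
    ("ATTR", "TRSP"),  # t4: transportation
    ("TRSP", "TRSP"),  # t5
    ("TRSP", "FOOD"),  # t6: food
    ("FOOD", "FOOD"),  # t7
    ("FOOD", "FOOD"),  # t8
]

transition_turns = [t + 1 for t, (prev, curr) in enumerate(pairs) if prev != curr]
print(transition_turns)  # [1, 4, 6] -> Attraction, Transportation, Food segments
```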
The classifier $f$ can be built on training examples annotated with topic labels using supervised machine learning techniques. Although some fundamental features extracted from the utterances mentioned at a given turn, or within a certain number of previous turns, can be used for training the model, this information obtained solely from the ongoing dialog is not sufficient to identify topic transitions, whether user-initiative or system-initiative.
To overcome this limitation, we propose to leverage Wikipedia as an external knowledge source that can be exploited without significant effort in building resources for topic tracking. Recently, some researchers [14, 3] have shown the feasibility of using Wikipedia knowledge to build dialog systems. While each of these studies focuses mainly on a single type of information, such as category relatedness or hyperlink connectedness, this work incorporates various types of knowledge obtained from Wikipedia into the model using a composite kernel method.
Our composite kernel consists of two different kernels: a history sequence kernel and a domain context tree kernel. Both represent the current dialog context at a given turn with a set of relevant Wikipedia paragraphs, which are selected based on the cosine similarity between the term vectors of the recently mentioned utterances and each paragraph in the Wikipedia collection, as follows:

$$\operatorname{sim}(x_t, p_i) = \frac{\Phi(x_t) \cdot \Phi(p_i)}{\left|\Phi(x_t)\right| \left|\Phi(p_i)\right|},$$

where $x_t$ is the input at turn $t$, $p_i$ is the $i$-th paragraph in the Wikipedia collection, and $\Phi(x)$ is the term vector extracted from $x$. The term vector for the input $x_t$ is computed by accumulating the weights over the previous turns as follows:

$$\Phi(x_t) = \left(\phi_t^1, \phi_t^2, \ldots, \phi_t^{|W|}\right), \qquad \phi_t^j = \sum_{i=0}^{h} \lambda^i \cdot \operatorname{tfidf}(w_j, u_{t-i}),$$

where $u_t$ is the utterance mentioned in a turn $t$, $\operatorname{tfidf}(w_j, u_t)$ is the product of the term frequency of a word $w_j$ in $u_t$ and the inverse document frequency of $w_j$, $\lambda \in [0, 1]$ is a decay factor for giving more importance to more recent turns, $|W|$ is the size of the word dictionary, and $h$ is the number of previous turns considered as dialog history features.
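A minimal sketch of this retrieval step is given below, assuming utterances are already tokenized and an idf table has been precomputed over the Wikipedia collection; the function and parameter names (dialog_context_vector, lam, h) are illustrative rather than the authors'.

```python
import math
from collections import Counter

def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector (dict) for one tokenized utterance."""
    tf = Counter(tokens)
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}

def dialog_context_vector(utterances, idf, lam=0.5, h=10):
    """Phi(x_t): decayed accumulation of tf-idf weights over the last h+1 turns.

    `utterances` is the list of tokenized turns so far; utterances[-1] is the
    current turn, and lam is the decay factor that favours recent turns.
    """
    phi = {}
    for age, tokens in enumerate(reversed(utterances[-(h + 1):])):
        for w, v in tfidf_vector(tokens, idf).items():
            phi[w] = phi.get(w, 0.0) + (lam ** age) * v
    return phi

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(val * v.get(w, 0.0) for w, val in u.items())
    nu = math.sqrt(sum(val * val for val in u.values()))
    nv = math.sqrt(sum(val * val for val in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```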
After computing this relatedness between the current dialog context and every paragraph in the Wikipedia collection, two kernel structures are constructed using the information obtained from the highly-ranked paragraphs in Wikipedia.
The first structure constructed for our composite kernel is the sequence of the most similar paragraph IDs for each turn, from the beginning of the session to the current turn. Formally, the sequence at a given turn $t$ is defined as:

$$S(x_t) = \left(s_1, s_2, \ldots, s_t\right), \quad \text{where } s_i = \operatorname*{argmax}_{j} \operatorname{sim}(x_i, p_j).$$
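Continuing the sketch above (and reusing its cosine helper), the history sequence can be built by taking, at each turn, the ID of the highest-scoring paragraph; the names are again illustrative.

```python
def history_sequence(context_vectors, paragraph_vectors):
    """S(x_t): the most similar paragraph ID for each turn up to t.

    `context_vectors` is a list of Phi(x_1) ... Phi(x_t); `paragraph_vectors`
    maps each paragraph ID to its precomputed tf-idf vector.
    """
    return [
        max(paragraph_vectors, key=lambda pid: cosine(cv, paragraph_vectors[pid]))
        for cv in context_vectors
    ]
```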
Our hypothesis is that the more similar the dialog histories of two inputs are, the more similar the topic transitions that occur for them. We therefore adopt the sub-sequence kernel [11] to map the data into a new feature space defined on the similarity of each pair of history sequences:

$$K_H(S_1, S_2) = \sum_{u \in \Sigma^n} \sum_{\mathbf{i}: u = S_1[\mathbf{i}]} \sum_{\mathbf{j}: u = S_2[\mathbf{j}]} \lambda^{l(\mathbf{i}) + l(\mathbf{j})},$$

where $\Sigma$ is a finite set of paragraph IDs, $S$ is a finite sequence of paragraph IDs, $u$ is a subsequence of $S$, $S[\mathbf{i}]$ is the subsequence of $S$ given by the index vector $\mathbf{i}$, $l(\mathbf{i})$ is the length (span) of that subsequence in $S$, and $\lambda$ is a decay factor.
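The gap-weighted subsequence kernel is usually computed with dynamic programming [11]; the brute-force sketch below, which enumerates every length-n subsequence and weights it by lambda raised to its span, implements the same definition and is sufficient for short history sequences.

```python
from collections import defaultdict
from itertools import combinations

def subseq_features(seq, n, lam):
    """Gap-weighted features: each length-n subsequence u of `seq` gets
    lam**span added, where span runs from its first to its last position."""
    phi = defaultdict(float)
    for idx in combinations(range(len(seq)), n):
        u = tuple(seq[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        phi[u] += lam ** span
    return phi

def history_sequence_kernel(s1, s2, n=2, lam=0.5):
    """K_H(S1, S2): inner product of the gap-weighted subsequence features."""
    phi1, phi2 = subseq_features(s1, n, lam), subseq_features(s2, n, lam)
    return sum(v * phi2[u] for u, v in phi1.items() if u in phi2)
```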
The other kernel incorporates a wider variety of domain knowledge obtained from Wikipedia into the feature space. In this method, each instance is encoded as a tree structure constructed following the rules in Figure 2. The root node of a tree has child nodes, each of which is a subtree rooted at a paragraph node in:

$$P(x_t) = \left\{\, p_i \mid \operatorname{sim}(x_t, p_i) > \theta \,\right\},$$

where $\theta$ is a threshold value for selecting the relevant paragraphs. Each subtree represents a set of features of the corresponding paragraph in the Wikipedia collection in a hierarchical structure. Figure 3 shows an example of a constructed tree.
Figure 2: Rules for constructing the domain context tree.

<TREE>       := (ROOT <PAR> ... <PAR>)
<PAR>        := (PAR_ID <PARENTS> <PREV_PAR> <NEXT_PAR> <LINKS>)
<PARENTS>    := ('PARENTS' <ART> <SEC>)
<ART>        := (ART_ID <ART_NAME> <CAT_LIST>)
<ART_NAME>   := ('ART_NAME' ART_NAME)
<CAT_LIST>   := ('CAT' <CAT> ... <CAT>)
<CAT>        := (CAT_ID *)
<SEC>        := (SEC_ID <SEC_NAME> <PARENT_SEC> <PREV_SEC> <NEXT_SEC>)
<SEC_NAME>   := ('SEC_NAME' SEC_NAME)
<PARENT_SEC> := ('PRN_SEC', PRN_SEC_ID)
<PREV_SEC>   := ('PREV_SEC', PREV_SEC_NAME)
<NEXT_SEC>   := ('NEXT_SEC', NEXT_SEC_NAME)
<PREV_PAR>   := ('PREV_PAR', PREV_PAR_ID)
<NEXT_PAR>   := ('NEXT_PAR', NEXT_PAR_ID)
<LINKS>      := ('LINKS' <LINK> ... <LINK>)
<LINK>       := (LINK_NAME *)
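To make the grammar concrete, the sketch below writes a small, hypothetical context tree as nested (label, child, ...) tuples; it covers only part of the Figure 2 grammar (article, categories, section, neighbouring paragraphs, links), and all IDs and names are invented for illustration.

```python
# A hypothetical domain context tree for a single retrieved paragraph.
# Every node is a tuple whose first element is its label and whose remaining
# elements are child nodes; leaves are one-element tuples.
context_tree = (
    "ROOT",
    ("PAR_1203",
     ("PARENTS",
      ("ART_55",
       ("ART_NAME", ("Merlion",)),
       ("CAT", ("Singapore_landmarks",), ("National_symbols",))),
      ("SEC_2",
       ("SEC_NAME", ("Merlion Park",)),
       ("NEXT_SEC", ("History",)))),
     ("PREV_PAR", ("PAR_1202",)),
     ("NEXT_PAR", ("PAR_1204",)),
     ("LINKS", ("Singapore River",), ("Raffles Place",))),
)
```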
Since this constructed tree structure represents semantic, discourse, and structural information extracted from the Wikipedia paragraphs that are similar to a given instance, we can exploit these enriched features to build the topic tracking model using a subset tree kernel [5], which computes the similarity between each pair of trees in the feature space as follows:

$$K_T(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2), \qquad \Delta(n_1, n_2) = \sum_{i} I_i(n_1)\, I_i(n_2),$$

where $N_T$ is the set of $T$'s nodes and $I_i(n)$ is a function that is 1 iff the $i$-th tree fragment occurs with its root at node $n$ and 0 otherwise.
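A compact recursive sketch of a subset-tree-style kernel over trees in the tuple encoding above is given below; it is a simplified variant (matching leaves also count as shared fragments) rather than the exact formulation of [5], and an efficient implementation would additionally sort node pairs by production as described there.

```python
def production(node):
    """Label of a node plus the ordered labels of its children."""
    return (node[0],) + tuple(child[0] for child in node[1:])

def nodes(tree):
    """Yield every node of a (label, child, ...) tuple tree."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def delta(n1, n2):
    """Number of common tree fragments rooted at n1 and n2 (simplified)."""
    if production(n1) != production(n2):
        return 0.0
    children1, children2 = n1[1:], n2[1:]
    if not children1:          # matching leaves count as one shared fragment
        return 1.0
    result = 1.0
    for c1, c2 in zip(children1, children2):
        result *= 1.0 + delta(c1, c2)
    return result

def domain_context_tree_kernel(t1, t2):
    """K_T(T1, T2): sum of delta over all node pairs of the two trees."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))
```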
In this work, a composite kernel is defined by combining the individual kernels: the history sequence kernel and the domain context tree kernel, together with a linear kernel over the vectors of fundamental features extracted from the utterances themselves and from the results of linguistic preprocessors. The composition is performed by linear combination as follows:

$$K_C(x_1, x_2) = \alpha\, K_L\!\left(\Phi(x_1), \Phi(x_2)\right) + \beta\, K_H\!\left(S(x_1), S(x_2)\right) + \gamma\, K_T\!\left(T(x_1), T(x_2)\right),$$

where $\Phi(x)$, $S(x)$, and $T(x)$ are the feature vector, history sequence, and domain context tree of $x$, respectively, $K_L$ is the linear kernel computed as the inner product of the vectors, $\alpha$, $\beta$, and $\gamma$ are the coefficients of the linear combination of the three kernels, and $\alpha + \beta + \gamma = 1$.
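Reusing the history_sequence_kernel and domain_context_tree_kernel sketches above, the linear combination can be written directly; the instance layout (a dict holding the feature vector, history sequence, and tree) and the equal default coefficients are assumptions for illustration.

```python
def linear_kernel(v1, v2):
    """K_L: inner product of two sparse feature vectors (dicts)."""
    return sum(val * v2.get(w, 0.0) for w, val in v1.items())

def composite_kernel(x1, x2, alpha=1/3, beta=1/3, gamma=1/3):
    """K_C = alpha*K_L + beta*K_H + gamma*K_T, with alpha + beta + gamma = 1."""
    return (alpha * linear_kernel(x1["features"], x2["features"])
            + beta * history_sequence_kernel(x1["history"], x2["history"])
            + gamma * domain_context_tree_kernel(x1["tree"], x2["tree"]))
```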
To demonstrate the effectiveness of our proposed kernel method for dialog topic tracking, we performed experiments on the Singapore tour guide dialogs, a corpus of 35 dialog sessions collected from real mixed-initiative human-human conversations between guides and tourists about Singapore. All recorded dialogs, 21 hours in total, were manually transcribed, and the resulting 19,651 utterances were manually annotated with the following nine topic categories: Opening, Closing, Itinerary, Accommodation, Attraction, Food, Transportation, Shopping, and Other.
Since we aim to develop a system that acts as a guide communicating with tourist users, an instance for both training and prediction of topic transitions was created for each tourist turn. Each instance is labeled with a pair of previous and current topics, and the actual number of label pairs occurring in the dataset is 65.
For each instance, the term vector was generated from the utterances in the current user turn, the previous system turn, and the history turns within the window of the $h$ most recent turns. Then, the history sequence and domain context tree structures for our composite kernel were constructed based on 3,155 Singapore-related articles collected from the Wikipedia database dump as of February 2013. For the linear kernel baseline, we used the following features: n-gram words, previous system actions, and current user acts which were manually annotated. Finally, 8,318 instances were used for training the model.
We trained the SVM models using SVMlight (http://svmlight.joachims.org/) [7] with the following five combinations of kernels: $K_L$ only, $K_L$ with $S(x_t)$ as additional features, $K_L + K_H$, $K_L + K_T$, and $K_L + K_H + K_T$. The threshold $\theta$ for selecting $P(x_t)$ was set to 0.5, and the kernels were combined with equal coefficient values $\alpha$, $\beta$, and $\gamma$. All evaluations were performed with five-fold cross validation against the manual annotations, using two different metrics: the accuracy of the predicted topic label at every turn, and the precision/recall/F-measure of topic transition events occurring in either the reference annotation or the predicted result.
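The experiments used SVMlight with user-defined kernels; as a rough stand-in, the sketch below precomputes Gram matrices for the composite kernel and trains a scikit-learn SVC on them. The data variables are placeholders, not part of the released setup.

```python
import numpy as np
from sklearn.svm import SVC

def gram_matrix(rows, cols, kernel):
    """Precompute the kernel matrix between two lists of instances."""
    return np.array([[kernel(a, b) for b in cols] for a in rows])

# train_x / test_x: lists of instance dicts as used by composite_kernel above;
# train_y: integer-encoded (previous topic, current topic) label pairs.
# K_train = gram_matrix(train_x, train_x, composite_kernel)
# K_test = gram_matrix(test_x, train_x, composite_kernel)
# clf = SVC(kernel="precomputed").fit(K_train, train_y)
# predicted_pairs = clf.predict(K_test)
```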
Table 1: Performance comparison of the five kernel combinations.

| Kernels | Turn-level Accuracy (%) | Transition-level P (%) | Transition-level R (%) | Transition-level F (%) |
|---|---|---|---|---|
| $K_L$ only | 62.45 | 42.77 | 24.77 | 31.37 |
| $K_L$ with $S(x_t)$ as features | 62.44 | 42.76 | 24.77 | 31.37 |
| $K_L + K_H$ | 67.19 | 39.94 | 40.59 | 40.26 |
| $K_L + K_T$ | 68.54 | 45.55 | 35.69 | 40.02 |
| All ($K_L + K_H + K_T$) | 69.98 | 44.82 | 39.83 | 42.18 |
Table 1 compares the performances of the five combinations of kernels. When just the paragraph IDs were included as additional features, the model failed to improve over the baseline without any external features. However, our proposed kernels using history sequences and domain context trees achieved significant performance improvements on both evaluation metrics. While the history sequence kernel enhanced the coverage of the model in detecting topic transitions, the domain context tree kernel contributed to producing more precise outputs. Finally, the model combining all the kernels outperformed the baseline by 7.53% in turn-level accuracy and 10.81% in transition-level F-measure.
The error distributions in Figure 4 indicate that these performance improvements were achieved by resolving errors not only on user-initiative topic transitions, but also on system-initiative cases, which demonstrates the effectiveness of the structured knowledge from Wikipedia for tracking topics in mixed-initiative dialogs.
This paper presented a composite kernel approach for dialog topic tracking. The approach represents various types of domain knowledge obtained from Wikipedia as two structures, history sequences and domain context trees, and incorporates them into the model with kernel methods. Experimental results show that the proposed approach improves topic tracking performance on mixed-initiative human-human dialogs with respect to the baseline model.