Understanding Relation Temporality of Entities

Taesung Lee and Seung-won Hwang
Department of Computer Science and Engineering
Pohang University of Science and Technology (POSTECH)
Pohang, Republic of Korea
{elca4u, swhwang}@postech.edu

Abstract

This paper demonstrates the importance of relation equivalence for entity translation pair discovery. Existing approach of understanding relation equivalence has focused on using explicit features of co-occurring entities. In this paper, we explore latent features of temporality for understanding relation equivalence, and empirically show that the explicit and latent features complement each other. Our proposed hybrid approach of using both explicit and latent features improves relation translation by 0.16 F1-score, and in turn improves entity translation by 0.02.

1 Introduction

Understanding relations is important in entity tasks. In this paper, we illustrate such importance using named entity (NE) translation mining problem. Early research on NE translation used phonetic similarities, for example, to mine the translation ‘Mandelson’ $\rightarrow$ ‘{CJK*}UTF8zhfsæ¼å¾·å°æ£®’[ManDeErSen] with similar sounds [9, 15]. However, not all NE translations are based on transliterations, but they might be based on semantics (e.g., ‘WTO’ $\rightarrow$ ‘{CJK*}UTF8zhfsä¸è´¸ç»ç»’[ShiMaoZuZhi]), or even arbitrary (e.g., ‘Jackie Chan’ $\rightarrow$ ‘{CJK*}UTF8zhfsæé¾’[ChengLong]).

To address this challenge, current state-of-the-art approaches build an entity graph for each language corpus, and align the two graphs by propagating the seed translation similarities (Figure 1) [7, 17]. For example, arbitrary translation pair such as (Jackie Chan, {CJK*}UTF8zhfsæé¾) can be obtained, if he is connected to his film ‘Drunken Master’ ({CJK*}UTF8zhfséæ³) in both graphs. That is, we can propagate the seed translation similarity of (Drunken Master, {CJK*}UTF8zhfséæ³) to neighbor entities ‘Jackie Chan’ and ‘{CJK*}UTF8zhfsæé¾’ in each graph.

When two graphs are obtained from parallel corpora, graphs are symmetric and “blind propagation” described above is effective. In contrast, Lee and Hwang () propose “selective propagation” for asymmetric graphs, of comparing the semantics of relations. A key contribution of this paper is using relation temporality for determining relation equivalence. Existing work [14, 12, 11] uses only co-occurring entity pairs, or explicit features (EF). For example, for a relation

pay an official visit to

, with a statement (Bush,

pay an official visit to

, China), an entity pair (Bush, China) is in the “support set”, which is a set of co-occurring entity pairs of

pay an official visit to

. When its support set is {(Bush, China), (Mandelson, Moscow), (Rice, Israel)}, and that of visit is {(Bush, China), (Rice, Israel), (Medvedev, Cuba)}, we can infer their semantic equivalence based on the set intersection: {(Bush, China), (Rice, Israel)}.

Figure 1: Entity translation by propagation.

In contrast, we propose to explore corpus latent features (LF), to complement the sparsity problem of EF: Out of 158 randomly chosen correct relation translation pairs we labeled, 64% has only one co-occurring entity pair, which makes EF not very effective to identify these relation translations. Therefore, we leverage relation temporality, which is both orthogonal and complementary to existing efforts leveraging entity temporality [8, 6, 16]. In particular, we discover three new challenges on using temporality for relation understanding in comparable corpora, which we discuss in detail in Section 3.2. Based on these challenges, we identify three new features for LF.

We observe the complementary nature of EF and LF, then propose a hybrid approach combining both features. Our new hybrid approach significantly improves the relation translation (0.16 higher F1-score than EF), and in turn improves the entity translation (0.02 higher F1-score).

2 Preliminary: Entity Translation by Selective Propagation

Selective propagation, leveraging the statements extracted from bilingual comparable corpora, can be summarized by several steps.

Step 1

Initialize entity translation function $T_{N}^{(0)}$ .
Step 2

Build relation translation function $T_{R}^{(t)}$ using $T_{N}^{(t)}$ .
Step 3

Update entity translation function to acquire $T_{N}^{(t+1)}$ using $T_{R}^{(t)}$ .
Step 4

Repeat Step 2 and Step 3.

For Step 1, an existing method for entity translation is adopted. In our experiments, we use a non-selective (hence not requiring relation translations) propagation approach [17] with [10] for a base translation matrix. The focus of this paper is Step 2, building the translation score $T_{R}^{(t)}(r_{E},r_{C})$ of English relation $r_{E}$ and Chinese relation $r_{C}$ : We will discuss the detailed procedure of Step 2 and propose how to improve it in Section 3. Step 3 is the stage that selective propagation takes place.

Step 2 and Step 3 reinforce each other to improve the final entity translation function. While Step 3 is well-defined in [11], to propagate entity translation scores when the relation semantics of the edges are equivalent, Step 2 has been restricted to the explicit feature, i.e., co-occurring entities or shared context. In clear contrast, by discovering novel latent features based on temporal properties, we can increase the accuracy of both entity and relation translations. Note that we omit $t$ for readability in the following sections.

3 Relation Translation

In this section, we present our approaches to obtain relations of equivalent semantics across languages (e.g., visit $\rightarrow$ {CJK*}UTF8zhfsè®¿é®). Formally, our goal is to build the relation translation score function $T_{R}(r_{E},r_{C})$ for English relation $r_{E}$ and Chinese relation $r_{C}$ .

3.1 Baseline: Explicit Feature Approach (EF)

In this section, we briefly illustrate a baseline method EF [11]. As we mentioned in the introduction, traditional approaches leverage common co-occurring entity-pairs. This observation also holds in the bilingual environment by exploiting seed entity translations. For example, let us say that we have two extracted statements: (Bruce Willis, star in, The Sixth Sense) and ({CJK*}UTF8zhfså¸é²æ¯ $\cdot$ {CJK*}UTF8zhfså¨å©æ¯ (Bruce Willis), {CJK*}UTF8zhfsä¸»æ¼ (star in), {CJK*}UTF8zhfsç¬¬åæ (The Sixth Sense)). Knowing a few seed entity translations using $T_{N}$ , ‘Bruce Willis’ $\rightarrow$ ‘{CJK*}UTF8zhfså¸é²æ¯ $\cdot$ {CJK*}UTF8zhfså¨å©æ¯’ and ‘The Sixth Sense’ $\rightarrow$ ‘{CJK*}UTF8zhfsç¬¬åæ’, we can find star in and {CJK*}UTF8zhfsä¸»æ¼ are semantically similar.

Specifically, we quantify this similarity based on the number of such common entity pairs that we denote as $|H(r_{E},r_{C})|$ for an English relation $r_{E}$ and a Chinese relation $r_{C}$ . The existing approaches are variations of using $|H(r_{E},r_{C})|$ . Our baseline implementation uses the one by [11], and we refer the reader to the paper for formal definitions and processing steps we omitted due to the page limit.

Unfortunately, this approach suffers from sparsity of the common entity pairs due to the incomparability of the corpora and those entities that cannot be translated by $T_{N}$ . Therefore, we leverage corpus latent features as an additional signal to overcome this problem.

3.2 Latent Feature Approach (LF)

Temporal Feature Discovery

We exploit the temporal distribution $d[x](t)$ of textual element $x$ during $t$ -th week in statements; we count the occurrences of the element $x$ on a weekly basis, and normalize them to observe $\sum_{t}d[x](t)=1$ . For example, Figure 5 shows the relation temporal distribution $d[$ visit $](t)$ against week $t$ . Unlike entities, we can easily observe the dissimilarity of the temporal distributions of semantically equivalent relations. We identify the three big challenges in exploiting the temporality in relation translations.

[C1] Considering temporal distributions $d[r]$ of relations alone is not sufficient. For relations, such as visit, that involves diverse entities, the temporal distributions are highly noisy (Figure 5).

To address the first challenge, we use a finer-granularity unit for observing the temporality. More specifically, we exploit a coupling of a relation and an entity: $d[e,r,*]$ where $e$ is an entity, $r$ a relation, and * is a placeholder indicating that any noun phrase is accepted for the second argument of a statement.¹¹ We use both $d[e,r,*]$ and $d[*,r,e]$ to measure the relation translation scores and leverage the average score. But in this section, we only use $d[e,r,*]$ for readability. As shown in Figure 5, $d[e,r,*]$ is more distinctive and hence a key clue to find semantically equivalent relations.

{subfigure}

[b]0.9

Figure 2: Atemporality of equivalent relations:

d[

visit

]

and

d[

{CJK*}UTF8zhfsè®¿é®

]

{subfigure}

[b]0.9

Figure 3: Temporality of equivalent entity-relation couplings:

d[

Bush, visit, *

]

and

d[

{CJK*}UTF8zhfså¸ä», {CJK*}UTF8zhfsè®¿é®, *

]

{subfigure}

[b]0.9

Figure 4: Temporality of non-equivalent relations:

d[

deploy

]

and

d[

{CJK*}

UTF8zhfså¨…é¨ç½²

(deploy at)

]

Figure 5: Temporal distributions of a relation, and a coupling.

{subfigure}

[t]0.47

Figure 6: Temporal distribution of an entity having a peak.

{subfigure}

[t]0.47

Figure 7: Temporal distribution of a coupling of a relation and the entity.

Figure 8: False positive peak of an entity-relation coupling.

[C2] Considering entity-relation coupling distribution $d[e,r,*]$ alone is not sufficient due to the domination of individual temporality. For example, Figure 8 shows entity-dominating entity-relation temporality. If an entity has a peak at some period (Figure 8), most relations that are coupled with the entity would have a peak at the very same period (Figure 8). This makes all relations that appear with this entity very similar to each other regardlessly of semantics. To address this challenge, we use features to measure whether $d[e,r,*]$ is too close to either of $d[e]$ or $d[r]$ .

[C3] Lastly, we have to eliminate false positives in relation temporality. To illustrate, two relations deploy and {CJK*}UTF8zhfså¨…é¨ç½² (deploy at) have similar temporal behaviors (Figure 5). However, the first relation takes [person], but the second relation [location] for the second argument.

To address this, we check the common co-occurring entity pair of the relations. For example, we can obtain “Russia deployed an aircraft carrier”, but not “Russia deployed at ({CJK*}UTF8zhfså¨…é¨ç½²) an aircraft carrier”. Thus, we cannot acquire any common entity pair like (Russia, aircraft carrier) for deploy and {CJK*}UTF8zhfså¨…é¨ç½² (deploy at).

Relation Similarity Computation

We compute the similarity of two relations $r_{E}$ in English and $r_{C}$ in Chinese using the following 2-steps.

$\bullet$

Compute the similarity $S_{CP}(r_{E},r_{C},e_{E},e_{C})$ of temporal distributions of entity-relation couplings for each bilingual entity pair $(e_{E},e_{C})$ .
$\bullet$

Compute the translation score $T_{R}^{LF}(r_{E},r_{C})$ by aggregating the coupling similarities.

Considering the three challenges, we produce a list of features $\{f_{x}(r_{E},r_{C},e_{E},e_{C})\}$ to measure the coupling similarity $S_{CP}(r_{E},r_{C},e_{E},e_{C})$ as follows.

$\bullet$

[Base feature] $f_{ET}$ : $T_{N}(e_{E},e_{C})$ . The entity translation score obtained in the previous iteration or the seed entity translation score.
$\bullet$

[C1] $f_{ER}$ : $1-JSD(d[e_{E},r_{E},*],d[e_{C},r_{C},*])$ . The temporal similarity of the couplings, where $JSD(P,Q)$ is the Jensen-Shannon divergence of two distributions $P$ and $Q$ , defined as $JSD(P,Q)=\frac{1}{2}D(P||M)+\frac{1}{2}D(Q||M)$ , with $M=\frac{1}{2}(P+Q)$ and $D(P||M)=\sum_{i}P(i)\log\frac{P(i)}{M(i)}$ .
$\bullet$

[C2] $f_{D1,E}$ , $f_{D2,E}$ , $f_{D1,C}$ , $f_{D2,C}$ :

$\displaystyle JSD(d[e_{E}],d[\scalebox{0.7}[1.0]{$e_{E},r_{E},*$}]),\,JSD(d[r_% {E}],d[\scalebox{0.7}[1.0]{$e_{E},r_{E},*$}])$

$\displaystyle JSD(d[e_{C}],d[\scalebox{0.7}[1.0]{$e_{C},r_{C},*$}]),\,JSD(d[r_% {C}],d[\scalebox{0.7}[1.0]{$e_{C},r_{C},*$}])$

Entity to entity-relation distribution difference (D1) and relation to entity-relation distribution difference (D2), for English and Chinese respectively.
$\bullet$

[C3] $f_{EX}$ : The existence of a common entity pair using the seed entity translations (boolean). That is, $f_{EX}=1$ if $|H(r_{E},r_{C})|\geq 1$ , and $f_{EX}=0$ otherwise.

Additionally, we use the following features to consider absolute frequencies $freq(\cdot)$ of textual elements as well because 1) we are more confident with more evidence and 2) in the comparable corpora, the equivalent elements are likely to show similar frequencies.

$\bullet$

$f_{FW,E}$ , $f_{FW,C}$ : $\mathcal{S}(freq(e_{E},r_{E}))$ and $\mathcal{S}(freq(e_{C},r_{C}))$ . $\mathcal{S}(x)$ is a normalization function, for which we use a sigmoid function over a linear transformation of $x$ .
$\bullet$

$f_{FS1}$ and $f_{FS2}$ :

$\frac{\min(freq(e_{E},r_{E}),freq(e_{C},r_{C}))}{\max(freq(e_{E},r_{E}),freq(e% _{C},r_{C}))},$

$\frac{\min(freq(r_{E}),freq(r_{C}))}{\max(freq(r_{E}),freq(r_{C}))}$

With these features, we measure the similarity of a pair of couplings as follows.

S_{CP}(r_{E},r_{C},e_{E},e_{C})=\prod_{x}f_{x}(r_{E},r_{C},e_{E},e_{C})

(1)

By aggregating coupling similarities, we measure the translation score of two relations:

T_{R}^{\mathrm{LF}}(r_{E},r_{C})=\sum_{(e_{E},e_{C})\in T}S_{CP}(r_{E},r_{C},e% _{E},e_{C})

(2)

where $T=\{(e_{E},e_{C})|T_{N}(e_{E},e_{C})\geq\theta\}$ with $\theta=0.6$ , a set of translation pairs obtained in the seeds or previous iteration such as (Bush, {CJK*}UTF8zhfså¸ä»).

We normalize the obtained function values for each English relations using the top- $k$ Chinese translations. That is, for $(r_{E},r_{C})$ , we redefine the score as $T_{R}^{LF}(r_{E},r_{C})/\sum_{i\in[1,k]}T_{R}^{LF}(r_{E},r_{C}^{rank_{i}})$ where $r_{C}^{rank_{i}}$ is the $i$ -th rank Chinese relation for $r_{E}$ by Equation 2. We empirically set $k=4$ .

3.3 Hybrid Approach LF+EF

We find that LF and EF are complementary. Table 1 shows the examples of relations and their translations. In general, LF can translate more relations (e.g., support and capture). However, in some cases like ratify, highly related relations may induce noise. That is, we always {CJK*}UTF8zhfsè®¨è®º (discuss) before we {CJK*}UTF8zhfsæ¹å (ratify) something and hence the temporal behavior of {CJK*}UTF8zhfsè®¨è®º (discuss) is also very similar to that of ratify. On the other hand, it can be correctly translated using EF.

Thus, we produce the hybrid relation translation, and we empirically set $\lambda=0.4$ :

T_{R}^{\scalebox{0.6}[0.7]{LF+EF}}(r_{E},r_{C}){\,=\,}\lambda T_{R}^{\scalebox% {0.6}[0.7]{LF}}(r_{E},r_{C}){\scalebox{0.8}[1.0]{$+(1-\lambda)$}}T_{R}^{% \scalebox{0.6}[0.7]{EF}}(r_{E},r_{C})

(3)

²²footnotetext: The correct translation {CJK*}UTF8zhfsæ¹å (ratify) is ranked second.

English	LF	EF
visit	{CJK*}UTF8zhfsè®¿é® (visit)	{CJK*}UTF8zhfsè®¿é® (visit)
support	{CJK*}UTF8zhfså…æä¾ (provide to …)	-
ratify	{CJK*}UTF8zhfsè®¨è®º (discuss) ${}^{\decimal{footnote}}$	{CJK*}UTF8zhfsæ¹å (ratify)

Table 1: Examples of relation translations.

4 Evaluation

In this section, we evaluate the proposed approach on the entity translation task and the relation translation task. We extract English and Chinese statements from news articles in 2008 by Xinhua news who publishes news in both English and Chinese, which were also used by Lee and Hwang (). The number of English articles is 100,746, and that of Chinese articles is 88,031. As we can see from the difference in the numbers of the documents, the news corpora are not direct translations, but they have asymmetry of entities and relations.

4.1 Entity Translation

	Person			Organization
Method	P.	R.	F1	P.	R.	F1
LF+EF	0.84	0.80	0.82	0.60	0.52	0.56
EF	0.81	0.79	0.80	0.56	0.52	0.54
Seed	0.80	0.77	0.78	0.49	0.44	0.46
PH+SM	0.59	0.59	0.59	0.29	0.29	0.29

Table 2: Entity translation comparison.

In this section, we present experimental settings and results on translating entities using our proposed approaches. To measure the effectiveness, we use a set of gold standard entity translation pairs which consist of 221 person entities and 52 organization entities. We measure precision, recall, and F1-score based on the returned translation pairs for each English entity as it is done in [11].

We compare our hybrid approach, which is denoted by LF+EF with EF [11], a combined approach PH+SM of phonetic similarity and letter-wise semantic translation for better accuracy for organizations [10], and the seed translations Seed that we adopt [17] with PH+SM as a base translation matrix.³³Our results leveraging relational temporality outperforms the reported results using entity temporality on the same data set. The two approaches using temporality are orthogonal and can be aggregated, which we leave as our future directions. We process one iteration of the entire framework (Step 1-3) for both LF+EF and EF.

Table 2 shows the comparison of the methods. Our proposed approach LF+EF shows higher F1-score than the baselines. In particular, our approach outperforms EF. For example, ‘Matthew Emmons’ is a lesser known entity, and we have only few statements mentioning him in the corpora. The corpus explicit feature EF alone cannot translate the relation win and, in turn, ‘Matthew Emmons’. However, LF+EF translates him correctly into {CJK*}UTF8zhfsé©¬ä¿®Â·åèæ¯ through the relation win.

4.2 Relation Translation

This section considers the relation translation task. Each relation translation method translates an English relation $r_{E}$ into a list of Chinese relations, and we check whether a Chinese relation $r_{C}$ with the highest translation score is the correct translation. We consider the relation translation is correct when the semantics are equivalent. For example, {CJK*}UTF8zhfså» (leave for/go to) is a correct translation of leave for, but {CJK*}UTF8zhfsç¦»å¼ (leave) is not. Total 3342 English-Chinese relation translation pairs returned by our method and the baselines are randomly shown and labeled. Out of 3342 pairs, 399 are labeled as correct.

Methods	Precision	Recall	F1
LF+EF	0.37	0.44	0.40
LF	0.26	0.25	0.26
EF	0.41	0.17	0.24

Table 3: Relation translation comparison.

Table 3 shows the comparisons of LF, EF and their hybrid LF+EF. We can clearly see that LF shows higher recall than EF while EF shows higher precision. As we emphasized in Section 3.3, we can see their complementary property. Their hybrid LF+EF has both high precision and recall, thus has the highest F1-score.

Note that the absolute numbers (due to the harsh evaluation criteria) may look low. But the top translations are still relevant (e.g., fight is translated to {CJK*}UTF8zhfsé©» (deploy troops)). In addition, the lower ranked but correct relation translations also affect entity translation. Therefore, even lower-performing EF boosted the entity translations, and in effect, our approach could achieve higher F1-score in the entity translation task.

Eng. Rel.	C1	C1+C2	C1+C2+C3	EF
visit	15	4	1	1
drop	21	14	1	-
capture	6	4	1	-

Table 4: Rank of correct relation translation. The symbol ‘-’ indicates no correct translation.

To illustrate the detailed effects of the corpus latent features, Table 4 shows the ranks of correct Chinese translations for English relations by methods using selected features for the challenges. For comparison, the ranks of the correct translations when using EF are shown. Our approach using the entity-relation coupling similarity feature for [C1] alone often cannot find the correct translations. But using all features removes such noise.

5 Conclusion

This paper studied temporality features for relation equivalence. With the proposed features, we devised a hybrid approach combining corpus latent and explicit features with complementary strength. We empirically showed the effectiveness of our hybrid approach on relation translation, and it, in turn, improved entity translation.

Acknowledgments

This research was supported by the MSIP (The Ministry of Science, ICT and Future Planning), Korea and Microsoft Research, under IT/SW Creative research program supervised by the NIPA(National IT Industry Promotion Agency). (NIPA-2013-H0503-13-1009).

References

[1] (1998) Cited by: 15.
[2] (2013) Cited by: 16, 11.
[3] (2011) Cited by: 7.
[4] (2011) Cited by: 12.
[5] (2012) Cited by: 14.
[6] J. Kim, S. Hwang, L. Jiang, Y. Song and M. Zhou(2012) Entity translation mining from comparable corpora: combining graph mapping with corpus latent features. pp. 1787–1800. Cited by: 1.
[7] J. Kim, L. Jiang, S. Hwang, Y. Song and M. Zhou Mining entity translations from comparable corpora: a holistic graph mapping approach. See 3, pp. 1295–1304. Cited by: 1.
[8] A. Klementiev and D. Roth Named entity transliteration and discovery from multilingual comparable corpora. See 13, pp. 82–88. Cited by: 1.
[9] K. Knight and J. Graehl(1998-12) Machine transliteration. Computational Linguistics 24 (4), pp. 599–612. External Links: ISSN 0891-2017 Cited by: 1.
[10] W. Lam, S. Chan and R. Huang(2007-02) Named entity translation matching and learning: with application for mining unseen translations. ACM Transactions on Information Systems 25 (1). External Links: ISSN 1046-8188 Cited by: 2, 4.1.
[11] T. Lee and S. Hwang Bootstrapping entity translation on weakly comparable corpora. See 2, Cited by: 1, 2, 3.1, 3.1, 4.1, 4.1, 4.
[12] T. Mohamed, E. Hruschka and T. Mitchell Discovering relations between noun categories. See 4, pp. 1447–1455. Cited by: 1.
[13] (2006) Cited by: 8.
[14] N. Nakashole, G. Weikum and F. M. Suchanek PATTY: A Taxonomy of Relational Patterns with Semantic Types. See 5, Cited by: 1.
[15] S. Wan and C. M. Verspoor Automatic English-Chinese name transliteration for development of multilingual resources. See 1, pp. 1352–1356. Cited by: 1.
[16] G. You, Y. Cha, J. Kim and S. Hwang Enriching entity translation discovery using selective temporality.. See 2, pp. 201–205. Cited by: 1.
[17] G. You, S. Hwang, Y. Song, L. Jiang and Z. Nie(2012-11) Efficient entity translation mining: a parallelized graph alignment approach. ACM Transactions on Information Systems 30 (4), pp. 25:1–25:23. Cited by: 1, 2, 4.1.

Generated on Wed Jun 11 18:39:10 2014 by LaTeXML [LOGO]

	$\displaystyle JSD(d[e_{E}],d[\scalebox{0.7}[1.0]{$e_{E},r_{E},$}]),\,JSD(d[r_% {E}],d[\scalebox{0.7}[1.0]{$e_{E},r_{E},$}])$
	$\displaystyle JSD(d[e_{C}],d[\scalebox{0.7}[1.0]{$e_{C},r_{C},$}]),\,JSD(d[r_% {C}],d[\scalebox{0.7}[1.0]{$e_{C},r_{C},$}])$