This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context-free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on both models and use them as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. However, the gain achieved by the semantic reordering model is limited in the presence of the syntactic reordering model, and we therefore provide a detailed analysis of the behavior differences between the two.
Reordering models in statistical machine translation (SMT) model the word order difference when translating from one language to another. The popular distortion and lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering at the word level, while the synchronous context-free grammars in hierarchical phrase-based (HPB) translation models are capable of handling non-local reordering at the translation phrase level. However, reordering, especially without the help of external knowledge, remains a great challenge, because accurate reordering is usually beyond the ability of these word-level or phrase-level reordering models. In addition, these translation models often fail to respect linguistically motivated syntax and semantics, and as a result tend to produce translations containing both syntactic and semantic reordering confusions. In this paper, our goal is to take advantage of syntactic and semantic parsing to improve translation quality. Rather than introducing reordering models at either the word level or the translation phrase level, we propose a unified approach to modeling reordering at the level of linguistic units, e.g., syntactic constituents and semantic roles. The reordering units come in multiple granularities, from single words to more complex constituents and semantic roles, and often cross translation phrases. To show the effectiveness of our reordering models, we integrate both syntactic constituent reordering models and semantic role reordering models into a state-of-the-art HPB system [4, 7]. We further contrast them with a stronger baseline that already includes fine-grained soft syntactic constraint features [24, 3]. The general ideas, however, are applicable to other translation models, e.g., phrase-based models, as well.
Our syntactic constituent reordering model considers context-free grammar (CFG) rules in the source language and predicts the reordering of their elements on the target side, using word alignment information. Because a constituent, especially a long one, usually maps onto multiple discontinuous blocks in the target language, there is more than one way to describe the monotonicity or swapping patterns; we therefore design two reordering models: one based on the leftmost aligned target word and the other on the rightmost aligned target word.
While there has recently been some encouraging work on incorporating semantic structure (more specifically, predicate-argument structure, or PAS) reordering in SMT, it is still an open question whether semantic structure reordering strongly overlaps with syntactic structure reordering, since semantic structure is closely tied to syntax. To this end, we employ the same reordering framework as for syntactic constituent reordering, focusing on the semantic roles in a PAS, and then analyze the differences between the syntactic and semantic features.
The contributions of this paper include the following:
We introduce novel soft reordering constraints, using syntactic constituents or semantic roles, computed over the word alignment information in the translation rules used at decoding time;
We introduce a unified framework to incorporate syntactic and semantic reordering constraints;
We provide a detailed analysis providing insight into why the semantic reordering model is significantly less effective when syntactic reordering features are also present.
The rest of the paper is organized as follows. Section 2 provides an overview of the HPB translation model. Section 3 describes the details of our unified reordering models. Section 4 gives our experimental results, and Section 5 discusses the behavior differences between syntactic constituent reordering and semantic role reordering. Section 6 reviews related work and, finally, Section 7 concludes the paper.
In HPB models [4], synchronous rules take the form $X \rightarrow \langle \gamma, \alpha, \sim \rangle$, where $X$ is the non-terminal symbol, $\gamma$ and $\alpha$ are strings of lexical items and non-terminals on the source and target side, respectively, and $\sim$ indicates the one-to-one correspondence between non-terminals in $\gamma$ and $\alpha$. Each such rule is associated with a set of translation model features $\phi_i$, such as the phrase translation probability $p(\gamma \mid \alpha)$ and its inverse $p(\alpha \mid \gamma)$, the lexical translation probability $p_{lex}(\gamma \mid \alpha)$ and its inverse $p_{lex}(\alpha \mid \gamma)$, and a rule penalty that affects preference for longer or shorter derivations. Two other widely used features are a target language model feature and a target word penalty.
Given a derivation $D$, its translation log-probability is estimated as:

$$\log P(D) \propto \sum_i \lambda_i \log \phi_i(D) \qquad (1)$$

where $\lambda_i$ is the corresponding weight of feature $\phi_i$. See [4] for more details.
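To make the formalism concrete, the following minimal sketch (in Python; the record layout and feature names are illustrative, not cdec's internal representation) shows a synchronous rule carrying its feature values and a derivation scored by Eq. 1:

```python
import math
from dataclasses import dataclass, field

@dataclass
class SyncRule:
    # X -> <gamma, alpha, ~>: co-indexed "[X,k]" tokens realize the one-to-one
    # correspondence between source and target non-terminals.
    source: list                              # gamma
    target: list                              # alpha
    phi: dict = field(default_factory=dict)   # feature values phi_i of the rule

def derivation_log_score(rules, weights, lm_prob):
    """Eq. 1: log P(D) ~ sum_i lambda_i * log phi_i(D); rule-local features
    are accumulated over all rules in the derivation."""
    score = weights["lm"] * math.log(lm_prob)
    for r in rules:
        score += sum(weights[k] * math.log(v) for k, v in r.phi.items())
    return score

r = SyncRule(["[X,1]", "的", "[X,2]"], ["[X,2]", "of", "[X,1]"],
             {"p_fe": 0.30, "lex_fe": 0.10})
print(derivation_log_score([r], {"lm": 0.8, "p_fe": 1.0, "lex_fe": 0.5}, 1e-4))
```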
As mentioned earlier, the linguistic reordering unit is the syntactic constituent for syntactic reordering, and the semantic role for semantic reordering. The syntactic reordering model takes a CFG rule cfg: XP → XP_1 … XP_n and models the reordering of the constituents XP_1 … XP_n on its right-hand side by examining their translation or visit order according to the target language. The semantic reordering model, in turn, takes a PAS and models its reordering on the target side. Figure 1 shows an example of a PAS in which the predicate (Pred) has two core arguments (A0 and A1) and one adjunct (TMP). Note that we refer to all core arguments, adjuncts, and predicates as semantic roles; thus we say the PAS in Figure 1 has 4 roles. According to the annotation principles of (Chinese) PropBank [28, 42], all the roles in a PAS map to a corresponding constituent in the parse tree, and these constituents (e.g., NPs and VBD in Figure 1) do not overlap with each other.
Next, we use a CFG rule to describe our syntactic reordering model. Because we treat the two forms of reordering in a unified way, the semantic reordering model can then be obtained by regarding a PAS as a CFG rule and a semantic role as a constituent.
Because the translation of a source constituent might result in multiple discontinuous blocks, there can be several ways to describe or group the reordering patterns. We therefore design two general constituent reordering sub-models: one based on the leftmost aligned target word (the leftmost reordering model) and the other based on the rightmost aligned target word (the rightmost reordering model). Figure 2 shows the modeling steps for the leftmost reordering model. Figure 2(a) is an example of a CFG rule in the source parse tree together with its word alignment links to the target language; note that one of its constituents covers only an unaligned source word. For each constituent XP_i, we find the leftmost target word that is aligned to a source word covered by XP_i, as shown in Figure 2(b); a constituent covering only unaligned words has no such target word. We then obtain the visit order o_i for each XP_i in the transformation from Figure 2(b) to Figure 2(c), with the following strategies for special cases:

if the first constituent XP_1 is unaligned, we add a NULL word at the beginning of the target side and link XP_1 to the NULL word;

if a constituent XP_i (i > 1) is unaligned, we add a link to the target word to which XP_{i-1} is linked; and

if two constituents XP_i and XP_j (i < j) are linked to the same target word, then we set o_i < o_j, breaking the tie by source order.
Finally, Figure 2(d) converts the visit order into a sequence of leftmost reordering types LRT = lrt_1 … lrt_{n-1}. For every two adjacent constituents XP_i and XP_{i+1} with corresponding visit orders o_i and o_{i+1}, their reordering type lrt_i is one of the following (a code sketch follows this list):

Monotone (M) if o_{i+1} = o_i + 1;

Discontinuous Monotone (DM) if o_{i+1} > o_i + 1;

Swap (S) if o_{i+1} = o_i - 1;

Discontinuous Swap (DS) if o_{i+1} < o_i - 1.
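The following sketch (a minimal illustration under the assumptions above: span-based constituents, the NULL-word and inheritance strategies, and source-order tie-breaking) computes visit orders and reordering types; it is not the exact procedure behind Figure 2:

```python
def visit_orders(constituent_spans, alignment, side="left"):
    """Target-side visit orders o_1..o_n for constituents XP_1..XP_n.

    constituent_spans: list of (start, end) source spans, one per XP_i
    alignment: set of (src_pos, tgt_pos) links
    side: "left" anchors each XP_i at its leftmost aligned target word,
          "right" at its rightmost
    """
    anchors = []
    for (s, e) in constituent_spans:
        tgt = [t for (f, t) in alignment if s <= f <= e]
        anchors.append((min(tgt) if side == "left" else max(tgt)) if tgt else None)
    for i, anc in enumerate(anchors):
        if anc is None:
            # unaligned XP_1 links to a virtual NULL word before the target;
            # any other unaligned XP_i inherits the anchor of XP_{i-1}
            anchors[i] = -1 if i == 0 else anchors[i - 1]
    # rank anchors to obtain visit orders; ties are broken by source order
    ranked = sorted(range(len(anchors)), key=lambda i: (anchors[i], i))
    orders = [0] * len(anchors)
    for rank, i in enumerate(ranked, start=1):
        orders[i] = rank
    return orders

def reordering_type(o_i, o_next):
    """Map adjacent visit orders to one of {M, DM, S, DS}."""
    if o_next == o_i + 1: return "M"
    if o_next > o_i + 1:  return "DM"
    if o_next == o_i - 1: return "S"
    return "DS"           # o_next < o_i - 1

orders = visit_orders([(0, 1), (2, 2), (3, 4)], {(0, 2), (4, 0)})
types = [reordering_type(orders[k], orders[k + 1]) for k in range(len(orders) - 1)]
# orders == [2, 3, 1]; types == ["M", "DS"]
```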
Up to this point, we have generated a sequence of leftmost reordering types LRT = lrt_1 … lrt_{n-1} for a given CFG rule cfg: XP → XP_1 … XP_n. The leftmost reordering model takes the following form:

$$P_l(LRT \mid cfg, C) \qquad (2)$$

where $C$ indicates the surrounding context of the CFG rule. By assuming that any two reordering types in LRT are independent of each other, we reformulate Eq. 2 as:

$$P_l(LRT \mid cfg, C) = \prod_{i=1}^{n-1} P(lrt_i \mid cfg, C) \qquad (3)$$
Similarly, the sequence of rightmost reordering types RRT = rrt_1 … rrt_{n-1} can be obtained for a CFG rule cfg.
Accordingly, for a PAS pas: R_1 … R_m, we can obtain its sequences of leftmost and rightmost reordering types in the same way as described above.
In order to predict either the leftmost or the rightmost reordering type for two adjacent constituents, we use a maximum entropy classifier to estimate the probability of each reordering type $rt$ as follows:

$$P(rt \mid cfg, C) = \frac{\exp\big(\sum_j \theta_j f_j(rt, cfg, C)\big)}{\sum_{rt'} \exp\big(\sum_j \theta_j f_j(rt', cfg, C)\big)} \qquad (4)$$

where the $f_j$ are binary features and the $\theta_j$ are the weights of these features. Most of our features are syntax-based. For XP_i and XP_{i+1} in cfg, the features aim to examine which of the two should be translated first; therefore, most features share two common components: the syntactic categories of XP_i and XP_{i+1}. Table 1 shows the features used in the syntactic leftmost and rightmost reordering models; note that we use the same features for both.
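A direct transcription of Eq. 4 (with the feature extractor and weights assumed given), and Eq. 3's factorization noted as usage:

```python
import math

TYPES = ("M", "DM", "S", "DS")

def maxent_prob(rt, active_features, theta):
    """Eq. 4: P(rt | cfg, C) under a maxent model with binary features;
    active_features(t) returns the names of features firing for type t."""
    score = lambda t: math.exp(sum(theta.get(f, 0.0) for f in active_features(t)))
    return score(rt) / sum(score(t) for t in TYPES)

# Eq. 3: log P(LRT | cfg, C) = sum_i log P(lrt_i | cfg, C)
```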
[Table 1: Feature templates cf1–cf11 for the syntactic leftmost and rightmost reordering models. Each template is a conjunction (&) over properties of the constituent pair, most combining the syntactic categories of XP_i, XP_{i+1}, and their parent XP with surrounding context.]
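The templates in Table 1 are conjunctions of this general shape; the sketch below is hypothetical (the template names are invented) and shows only the pattern, not the actual cf1–cf11 templates:

```python
from collections import namedtuple

CFGRule = namedtuple("CFGRule", ["lhs", "rhs"])   # e.g. IP -> NP PP VP

def pair_features(cfg, k):
    """Hypothetical templates for the pair (XP_k, XP_{k+1}): conjunctions
    over the categories of the pair and of their parent constituent."""
    a, b = cfg.rhs[k], cfg.rhs[k + 1]
    return [f"cat:{a}&{b}",
            f"cat+parent:{a}&{b}&{cfg.lhs}",
            f"cat+pos:{a}&{b}&{k}"]

print(pair_features(CFGRule("IP", ["NP", "PP", "VP"]), 1))
# ['cat:PP&VP', 'cat+parent:PP&VP&IP', 'cat+pos:PP&VP&1']
```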
Although the semantic reordering model is structured in precisely the same way, we use different feature sets to predict the reordering between two semantic roles. Given two adjacent roles R_i and R_{i+1} in a PAS pas, Table 2 shows the features used in the semantic leftmost and rightmost reordering models.
[Table 2: Feature templates rf1–rf13 for the semantic leftmost and rightmost reordering models. Each template is a conjunction (&) over properties of the role pair R_i and R_{i+1}, such as their role labels and the predicate.]
For models with syntactic reordering, we add two new features (one for the leftmost reordering model and one for the rightmost reordering model) to the log-linear translation model of Eq. 1. Unlike the conventional phrase and lexical translation features, whose values are determined by the phrase pair and thus can be calculated offline, the values of the reordering features can only be obtained at decoding time, and require word alignment information as well. Before presenting the algorithm that integrates the reordering models, we define the following functions, assuming that XP_k and XP_{k+1} are the adjacent constituent pair of interest in CFG rule cfg, and that h is the translation hypothesis, spanning source words i to j, with word alignment a:
Within(XP, i, j): returns true if constituent XP is within the span from word i to word j; otherwise returns false.

Uncalculated(cfg, XP_k, XP_{k+1}): returns true if the reordering of the pair ⟨XP_k, XP_{k+1}⟩ in rule cfg has not been calculated yet; otherwise returns false.

ReorderingTypes(XP_k, XP_{k+1}, a): returns the leftmost and rightmost reordering types for the constituent pair ⟨XP_k, XP_{k+1}⟩, given alignment a, according to Section 3.

P_l(l_type, XP_k, XP_{k+1}, cfg): returns the probability of leftmost reordering type l_type for the constituent pair ⟨XP_k, XP_{k+1}⟩ in rule cfg.

P_r(r_type, XP_k, XP_{k+1}, cfg): returns the probability of rightmost reordering type r_type for the constituent pair ⟨XP_k, XP_{k+1}⟩ in rule cfg.
Algorithm 1 integrates the syntactic leftmost and rightmost reordering models into a CKY-style decoder, and is invoked whenever a new hypothesis is generated. Given a hypothesis h with its alignment a, it traverses all CFG rules in the parse tree and checks whether two adjacent constituents meet the conditions for triggering the reordering models (lines 2-4). For each qualifying pair of constituents, it first extracts the pair's leftmost and rightmost reordering types (line 6) and then obtains their respective probabilities from the maximum entropy classifiers defined in Section 3.1 (lines 7-8). Finally, the algorithm returns the two log-probabilities of the syntactic reordering models. Note that Within returns true if hypothesis h fully covers, or fully contains, constituent XP, regardless of the reordering type of XP. Do not confuse a parsing tag such as XP with the nameless variable X in Hiero or cdec rules.
For the semantic reordering models, we likewise add two new features to the log-linear translation model. To obtain the two semantic reordering feature values, we simply reuse Algorithm 1 and its associated functions, replacing the CFG rule cfg with a PAS pas and each constituent XP with a semantic role R. Algorithm 1 therefore permits a unified treatment of syntactic and PAS-based reordering, even though it is expressed in terms of syntactic reordering here for ease of presentation.
Algorithm 1: Integrating the syntactic reordering models into a CKY-style decoder

Input: sentence f in the source language;
    parse tree T of f;
    all CFG rules CFG in T;
    hypothesis h spanning from word i to word j;
    alignment a of h
Output: log-probabilities P_left and P_right of the syntactic leftmost and rightmost reordering models

1. set P_left = P_right = 0.0
2. foreach cfg in CFG
3.   foreach adjacent pair XP_k and XP_{k+1} in cfg
4.     if Within(XP_k, i, j) = false or Within(XP_{k+1}, i, j) = false or Uncalculated(cfg, XP_k, XP_{k+1}) = false
5.       continue
6.     (l_type, r_type) = ReorderingTypes(XP_k, XP_{k+1}, a)
7.     P_left += log P_l(l_type, XP_k, XP_{k+1}, cfg)
8.     P_right += log P_r(r_type, XP_k, XP_{k+1}, cfg)
9. return (P_left, P_right)
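Read as executable pseudocode, Algorithm 1 amounts to the following sketch (data structures and names are hypothetical; reordering_types would build on the visit-order sketch of Section 3, and p_l/p_r are the maxent models of Section 3.1):

```python
import math

def syntactic_reordering_features(cfg_rules, i, j, a, p_l, p_r,
                                  reordering_types, cache):
    """Accumulate P_left / P_right for a hypothesis spanning words i..j.

    cfg_rules: rules with .children, each child carrying .span = (start, end)
    a: word alignment of the hypothesis
    cache: set of (rule id, pair index) already scored (the Uncalculated check)
    """
    P_left = P_right = 0.0
    for cfg in cfg_rules:                                   # line 2
        for k in range(len(cfg.children) - 1):              # line 3
            xp, xp1 = cfg.children[k], cfg.children[k + 1]
            within = i <= xp.span[0] and xp1.span[1] <= j
            if not within or (id(cfg), k) in cache:         # line 4
                continue                                    # line 5
            cache.add((id(cfg), k))
            l_type, r_type = reordering_types(xp, xp1, a)   # line 6
            P_left += math.log(p_l(l_type, xp, xp1, cfg))   # line 7
            P_right += math.log(p_r(r_type, xp, xp1, cfg))  # line 8
    return P_left, P_right                                  # line 9
```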
We have presented our unified approach to incorporating syntactic and semantic soft reordering constraints in an HPB system. In this section, we test its effectiveness in Chinese-English translation.
For training, we use 1.6M sentence pairs from the non-UN and non-HK Hansards portions of the NIST MT training corpora, segmented with the Stanford segmenter [33]. The English data is lowercased, tokenized, and aligned with GIZA++ [27] to obtain bidirectional alignments, which are symmetrized using the grow-diag-final-and method [16]. We train a 4-gram LM on the English side of the corpus plus 600M additional words randomly selected from the non-NYT and non-LAT portions of the Gigaword v4 corpus, using modified Kneser-Ney smoothing [1]. We use the HPB decoder cdec [7], and tune the parameters of the system with Mr. Mira [8], a k-best variant of MIRA [3].
We use the NIST MT 06 dataset (1664 sentence pairs) for tuning, and the NIST MT 03, 05, and 08 datasets (919, 1082, and 1357 sentence pairs, respectively) for evaluation (http://www.itl.nist.gov/iad/mig//tests/mt). We use BLEU [29] for both tuning and evaluation.
To obtain syntactic parse trees and semantic roles on the tuning and test datasets, we first parse the source sentences with the Berkeley Parser [30], trained on the Chinese Treebank 7.0 [43]. We then pass the parses to a Chinese semantic role labeler [22], trained on the Chinese PropBank 3.0 [42], to annotate semantic roles for all verbal predicates (part-of-speech tag VV, VE, or VC).
Our basic baseline system employs 19 basic features: a language model feature, 7 translation model features, word penalty, unknown word penalty, the glue rule, date, number and 6 pass-through features. Our stronger baseline employs, in addition, the fine-grained syntactic soft constraint features of Marton and Resnik [24], hereafter MR08. The syntactic soft constraint features include both MR08 exact-matching and cross-boundary constraints (denoted XP= and XP+). Since the syntactic parses of the tuning and test data contain 29 types of constituent labels and 35 types of POS tags, we have 29 types of XP+ features and 64 types of XP= features.
To train the syntactic and semantic reordering models, we use a gold alignment dataset comprising LDC2006E86 and the newswire portions of LDC2012T16, LDC2012T20, LDC2012T24, and LDC2013T05. The reordering models could instead be trained on the MT training data with its automatic alignments; however, our preliminary experiments showed that models trained on gold alignments yielded larger improvements. The dataset contains 7,870 sentences with 191,364 Chinese words and 261,399 English words. We first run syntactic parsing and semantic role labeling on the Chinese sentences, then train the models using a MaxEnt toolkit with an L1 regularizer [34] (http://www.logos.ic.i.u-tokyo.ac.jp/tsuruoka/maxent/). Table 3 shows the reordering type distribution over the training data. Interestingly, about 17% of the syntactic instances and 16% of the semantic instances differ in their leftmost and rightmost reordering types, indicating that the leftmost/rightmost distinction is informative. We also see that the number of semantic instances is about 1/3 that of syntactic instances, but the entropy of the semantic reordering classes is higher, indicating that the reordering of semantic roles is harder than that of syntactic constituents.
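As a stand-in for the Tsuruoka MaxEnt toolkit, the same kind of classifier can be trained with scikit-learn's L1-regularized logistic regression over the binary feature strings (toy data below; the real instances come from the gold-aligned, parsed sentences):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# one instance per adjacent pair: active binary features -> reordering type
X_raw = [{"cat:PP&VP": 1, "cat+parent:PP&VP&IP": 1},
         {"cat:NP&VP": 1, "cat+parent:NP&VP&IP": 1}]
y = ["S", "M"]

vec = DictVectorizer()
clf = LogisticRegression(penalty="l1", solver="liblinear")  # L1 regularizer
clf.fit(vec.fit_transform(X_raw), y)
print(clf.predict_proba(vec.transform([{"cat:PP&VP": 1}])))
```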
Table 3: Reordering type distribution (%) over the reordering model training data.

| Type | Syntactic l-m | Syntactic r-m | Semantic l-m | Semantic r-m |
|---|---|---|---|---|
| M | 73.5 | 80.6 | 63.8 | 67.9 |
| DM | 3.9 | 3.3 | 14.0 | 12.0 |
| S | 19.5 | 13.2 | 13.1 | 10.7 |
| DS | 3.2 | 3.0 | 9.1 | 9.5 |

#instances: 199,234 syntactic, 66,757 semantic.
A deeper examination of the reordering models' training data reveals that some constituent pairs and semantic role pairs have a preference for a specific reordering type (monotone or swap). In order to understand how well the MR08 system respects these reordering preferences, we use the gold alignment dataset LDC2006E86, in which the source sentences are from the Chinese Treebank, so that both gold parse trees and gold predicate-argument structures are available. Table 4 presents examples comparing the reordering distribution between the gold alignment and the output of the MR08 system. For example, the first row shows that based on the gold alignment, 16% of ⟨PP, VP⟩ pairs are in monotone and 76% in swap order, whereas the MR08 system outputs 46% of them in monotone and 50% in swap order; hence the reordering accuracy for ⟨PP, VP⟩ is 54%. Table 4 also shows that the semantic reordering between core arguments and predicates (e.g., ⟨Pred, A1⟩, ⟨A0, Pred⟩) has a less ambiguous pattern than that between adjuncts and other roles (e.g., ⟨LOC, Pred⟩, ⟨A0, TMP⟩), indicating the higher reordering flexibility of adjuncts.
Table 4: Reordering distribution (%) of constituent pairs (top) and semantic role pairs (bottom) under the gold alignment and in the MR08 output, with MR08 reordering accuracy (%).

| Pair | Gold M | Gold S | MR08 M | MR08 S | acc. |
|---|---|---|---|---|---|
| ⟨PP, VP⟩ | 16 | 76 | 46 | 50 | 54 |
| ⟨NP, LC⟩ | 26 | 74 | 58 | 42 | 50 |
| ⟨DNP, NP⟩ | 24 | 72 | 78 | 19 | 39 |
| ⟨CP, NP⟩ | 26 | 67 | 84 | 10 | 33 |
| ⟨NP, DEG⟩ | 39 | 61 | 31 | 69 | 66 |
| … | … | … | … | … | … |
| all | 81 | 13 | 79 | 14 | 80 |

| Pair | Gold M | Gold S | MR08 M | MR08 S | acc. |
|---|---|---|---|---|---|
| ⟨Pred, A1⟩ | 84 | 6 | 82 | 9 | 72 |
| ⟨A0, Pred⟩ | 82 | 11 | 79 | 8 | 75 |
| ⟨LOC, Pred⟩ | 17 | 30 | 36 | 25 | 49 |
| ⟨A0, TMP⟩ | 35 | 25 | 61 | 6 | 45 |
| ⟨TMP, Pred⟩ | 30 | 22 | 49 | 19 | 43 |
| … | … | … | … | … | … |
| all | 63 | 13 | 73 | 9 | 64 |
Our first group of experiments investigates whether the syntactic reordering models are able to improve translation quality in terms of BLEU. To this end, we respectively add our syntactic reordering models into both the baseline and MR08 systems. The effect is shown in the rows of “+ syn-reorder” in Table 5. From the table, we have the following two observations.
Although the HPB model is capable of handling non-local phrase reordering using synchronous context free grammars, both our syntactic leftmost reordering model and rightmost model are still able to achieve improvement over both the baseline and MR08. This suggests that our syntactic reordering features interact well with the MR08 syntactic soft constraints: the XP+ and XP= features focus on a single constituent each, while our reordering features focus on a pair of constituents each.
There is no clear indication of whether the leftmost or the rightmost reordering model works better. In addition, integrating both the leftmost and rightmost reordering models yields limited improvement over either single model.
Table 5: BLEU scores of the baseline and MR08 systems with the syntactic and/or semantic reordering models.

| System | | MT06 (tune) | MT03 | MT05 | MT08 | Test Avg. |
|---|---|---|---|---|---|---|
| Baseline | | 34.1 | 36.1 | 32.3 | 27.4 | 31.9 |
| + syn-reorder | l-m | 35.2 | 36.9 | 33.6 | 28.4 | 33.0 |
| | r-m | 35.2 | 37.2 | 33.7 | 28.6 | 33.2 |
| | both | 35.6 | 37.1 | 33.6 | 28.8 | 33.1 |
| + sem-reorder | l-m | 34.4 | 36.7 | 33.0 | 27.8 | 32.5 |
| | r-m | 34.5 | 36.7 | 33.1 | 27.8 | 32.5 |
| | both | 34.5 | 37.0 | 33.6 | 27.7 | 32.8 |
| + syn + sem | | 35.5 | 37.3 | 33.7 | 29.0 | 33.3 |
| MR08 | | 35.6 | 37.4 | 34.2 | 28.7 | 33.4 |
| + syn-reorder | l-m | 36.0 | 38.2 | 35.0 | 29.2 | 34.1 |
| | r-m | 36.0 | 38.1 | 34.8 | 29.2 | 34.0 |
| | both | 35.9 | 38.2 | 35.3 | 29.5 | 34.3 |
| + sem-reorder | l-m | 35.8 | 37.6 | 34.7 | 28.7 | 33.7 |
| | r-m | 35.8 | 37.4 | 34.5 | 28.8 | 33.6 |
| | both | 35.8 | 37.6 | 34.7 | 28.8 | 33.7 |
| + syn + sem | | 36.1 | 38.4 | 35.2 | 29.5 | 34.4 |
Our second group of experiments is to validate the semantic reordering models. Results are shown in the rows of “+ sem-reorder” in Table 5. Here we observe:
The semantic reordering models also achieve a significant gain of 0.8 BLEU on average over the baseline system, demonstrating the effectiveness of PAS-based reordering. However, the gain diminishes to 0.3 BLEU over the MR08 system.
The syntactic reordering models outperform the semantic reordering models on both the baseline and MR08 systems.
Finally, we integrate both the syntactic and semantic reordering models into one system. As shown in the "+syn+sem" rows of Table 5, the two models collectively achieve average gains of 1.4 BLEU over the baseline and 1.0 BLEU over MR08.
The trend of the results, summarized as performance gains over the baseline and MR08 systems averaged over all test sets, is presented in Table 6. The syntactic reordering models outperform the semantic reordering models, and the gain achieved by the semantic reordering models is limited in the presence of the MR08 syntactic features. In this section, we look at the MR08 system and the systems that improve upon it to explore the behavior differences between the two reordering models.
Coverage analysis: Our statistics show that the syntactic reordering features (either leftmost or rightmost) fire 24 times per sentence on average, compared to only 9 times per sentence for the semantic reordering features. This is not surprising, since the semantic reordering features are exclusively attached to predicates, and the span set of the semantic roles is a strict subset of the span set of the syntactic constituents: only 22% of syntactic constituents are semantic roles. On average, a sentence has 4 PASs and each PAS contains 3 semantic roles. Of all the semantic role pairs, 44% fall within a single CFG rule, indicating that this portion of semantic reordering overlaps with syntactic reordering. The PAS model therefore has fewer opportunities to influence reordering.
Table 6: Average BLEU gains over the baseline and MR08 systems on the test sets.

| System | over Baseline | over MR08 |
|---|---|---|
| + syn-reorder | 1.2 | 0.9 |
| + sem-reorder | 0.8 | 0.3 |
| + both | 1.4 | 1.0 |
Reordering accuracy analysis: The reordering type distribution over the reordering model training data in Table 3 suggests that semantic reordering is more difficult than syntactic reordering. To validate this conjecture on our translation test data, we compare the reordering performance of the MR08 system, the improved systems, and the maximum entropy classifiers. For each test set we have four reference translations. We run GIZA++ on the combination of our translation training data and the test data to obtain alignments between the test data and each reference translation. Given this (semi-)gold alignment, we compute the gold reordering types between two adjacent syntactic constituents or semantic roles, and then evaluate the automatic reordering outputs generated by both our translation systems and the maximum entropy classifiers. Table 7 shows the accuracy averaged over the four gold reordering sets (one per reference translation). It shows that: 1) as expected, our classifiers do worse on the harder semantic reordering prediction than on syntactic reordering prediction; 2) thanks to the high accuracy of the maxent classifiers, integrating either the syntactic or the semantic reordering constraints results in better reordering performance from both the syntactic and the semantic perspectives; 3) in terms of mutual impact, the syntactic reordering models help improve semantic reordering more than the semantic reordering models help improve syntactic reordering; and 4) the rightmost models have a learnability advantage over the leftmost models, achieving higher accuracy across the board.
Table 7: Reordering accuracy (%), averaged over the four reference-derived gold reordering sets.

| System | Syn l-m | Syn r-m | Sem l-m | Sem r-m |
|---|---|---|---|---|
| MR08 | 75.0 | 78.0 | 66.3 | 68.5 |
| + syn-reorder | 78.4 | 80.9 | 69.0 | 70.2 |
| + sem-reorder | 76.0 | 78.8 | 70.7 | 72.7 |
| + both | 78.6 | 81.7 | 70.6 | 72.1 |
| MaxEnt classifier | 80.7 | 85.6 | 70.9 | 73.5 |
Feature weight analysis: Table 8 shows the tuned weights of the syntactic and semantic reordering features. The semantic feature weights decrease in the presence of the syntactic features, indicating that the decoder learns to trust the semantic features less when the more accurate syntactic features are available. This is consistent with our observation that semantic reordering is harder than syntactic reordering (Tables 3 and 7).
Table 8: Tuned weights of the reordering features.

| System | Syn l-m | Syn r-m | Sem l-m | Sem r-m |
|---|---|---|---|---|
| + syn-reorder | 1.2 | 1.2 | – | – |
| + sem-reorder | – | – | 0.7 | 0.9 |
| + both | 1.2 | 1.0 | 0.5 | 0.4 |
Table 9: BLEU scores in the non-oracle vs. oracle settings (all systems extend MR08).

| | System | MT03 | MT05 | MT08 | Avg. |
|---|---|---|---|---|---|
| Non-oracle | MR08 | 37.4 | 34.2 | 28.7 | 33.4 |
| | + syn-reorder | 38.2 | 35.3 | 29.5 | 34.3 |
| | + sem-reorder | 37.6 | 34.7 | 28.8 | 33.7 |
| | + both | 38.4 | 35.2 | 29.5 | 34.4 |
| Oracle | + syn-reorder | 39.2 | 35.9 | 29.6 | 34.9 |
| | + sem-reorder | 37.9 | 34.8 | 28.9 | 33.9 |
| | + both | 39.1 | 36.0 | 29.8 | 35.0 |
Potential improvement analysis: Table 7 also shows that our current maximum entropy classifiers leave room for improvement, especially for semantic reordering. In order to isolate the error propagated from the classifiers themselves and to explore the upper bound on improvement from the reordering models, we perform an "oracle" study: we let the classifiers be aware of the "gold" reordering type between two syntactic constituents or two semantic roles, returning a higher probability for the gold reordering type and smaller ones for the others (specifically, we assign 0.9 to the gold reordering type and let the other three types share the remaining 0.1). Again, to obtain the gold reordering types, we run GIZA++ to align the tuning/test source sentences with each of the four reference translations, and we report performance averaged over the gold reordering types extracted from the four reference translations. Table 9 compares the performance of the non-oracle and oracle settings. We clearly see that using gold syntactic reordering types significantly improves performance (e.g., 34.9 vs. 33.4 on average), so there is still some room for improvement from building better maximum entropy classifiers (e.g., 34.9 vs. 34.3). To our surprise, however, the improvement achieved by gold semantic reordering types is still small (e.g., 33.9 vs. 33.4), suggesting that the potential improvement from semantic reordering models is much more limited. We again see that the improvement achieved by the semantic reordering models is limited in the presence of the syntactic reordering models.
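Concretely, the oracle replaces the classifier output as follows (a one-line sketch; the even split of the remaining 0.1 follows the description above):

```python
def oracle_prob(rt, gold_rt, n_types=4):
    """0.9 for the gold reordering type; the other three types share 0.1."""
    return 0.9 if rt == gold_rt else 0.1 / (n_types - 1)
```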
Syntax-based reordering: Some previous work pre-orders words in the source sentence so that the word order of the source and target sentences becomes similar. The reordering rules were either manually designed [6, 36, 41, 18] or automatically learned [39, 12, 35, 15, 19] using syntactic parses. Li et al. [20] focused on finding the n-best pre-ordered source sentences by predicting the reordering of sibling constituents, while Yang et al. [44] obtained word order by using a reranking approach to reposition nodes in syntactic parse trees. Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding.
Another line of previous work adds soft constraints as weighted features in the SMT decoder to reward good reorderings and penalize bad ones. Marton and Resnik [24] employed soft syntactic constraints with weighted binary features and no MaxEnt model; they did not explicitly target reordering (beyond applying constraints on HPB rules). Although SCFGs employing linguistically motivated labels are capable of capturing constituent reordering [5, 25], their rules are sparser than those of an SCFG with nameless non-terminals (i.e., Xs) plus soft constraints. Ge [11] presented a syntax-driven maximum entropy reordering model that predicted the source word translation order. Gao et al. [10] employed dependency trees to predict the translation order of a word and its head word. Huang et al. [13] predicted the translation order of two source words; note that they obtained this order by predicting the reordering of adjacent constituents, which is quite close to our work. Our work, which shares this approach, differs from theirs primarily in that our syntactic reordering models operate on the constituent level rather than the word level.
Semantics-based reordering: Semantics-based reordering has also seen an increase in activity recently. In the pre-ordering approach, Wu et al. [38] automatically learned pre-ordering rules from PAS. In the soft constraint or reordering model approach, Liu and Gildea [23] modeled the reordering/deletion of source-side semantic roles in a tree-to-string translation model, while Xiong et al. [40] and Li et al. [21] predicted the translation order between either two arguments or an argument and its predicate. Instead of decomposing a PAS into individual units, Zhai et al. [45] constructed a classifier for each source-side PAS. Finally, in the post-processing category, Wu and Fung [37] performed semantic role labeling on translation output and reordered arguments to maximize the cross-lingual match of the semantic frames between the source sentence and the target translation. To our knowledge, the semantic reordering models in these works were PAS-specific. In contrast, our model is universal and can easily be adapted to model the reordering of other linguistic units (e.g., syntactic constituents). Moreover, we have studied the effectiveness of the semantic reordering model in different scenarios.
Non-syntax-based reordering in HPB: Recently we have also seen work on lexicalized reordering models without syntactic information in HPB [31, 14, 26]. The non-syntax-based reordering approach models the reordering of translation words/phrases, while the syntax-based approach models the reordering of syntactic constituents. Although translation phrases and syntactic constituents overlap, it is reasonable to expect that the two reordering approaches can work well together and even complement each other, as the linguistic patterns they capture differ substantially. Setiawan et al. [32] modeled the orientation decisions between anchors and two neighboring multi-unit chunks that might cross phrase or rule boundaries. Finally, we note that recent work on non-syntax-based reordering in (flat) phrase-based models [2, 9] could potentially be adapted to HPB models as well.
In this paper, we have presented a unified reordering framework to incorporate soft linguistic constraints (of syntactic or semantic nature) into the HPB translation model. The syntactic reordering models take CFG rules and model their reordering on the target side, while the semantic reordering models work with PAS. Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. We have also discussed the differences between the two linguistic reordering models.
There are many directions in which this work can be continued. First, the syntactic reordering model can be extended to model reordering among constituents that cross CFG rules. Second, although we do not see obvious gain from the semantic reordering model when the syntactic model is adopted, it might be beneficial to further jointly consider the two reordering models, focusing on where each one does well. Third, to better examine the overlap or synergy between our approach and the non-syntax-based reordering approach, we will conduct direct comparisons and combinations with the latter.
This research was supported in part by the BOLT program of the Defense Advanced Research Projects Agency, Contract No. HR0012-12-C-0015. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the view of DARPA. The authors would like to thank three anonymous reviewers for providing helpful comments, and also acknowledge Ke Wu, Vladimir Eidelman, Hua He, Doug Oard, Yuening Hu, Jordan Boyd-Graber, and Jyothi Vinjumur for useful discussions.