A central challenge in semantic parsing is handling the myriad ways in which knowledge base predicates can be expressed. Traditionally, semantic parsers are trained primarily from text paired with knowledge base information. Our goal is to exploit the much larger amounts of raw text not tied to any knowledge base. In this paper, we turn semantic parsing on its head. Given an input utterance, we first use a simple method to deterministically generate a set of candidate logical forms with a canonical realization in natural language for each. Then, we use a paraphrase model to choose the realization that best paraphrases the input, and output the corresponding logical form. We present two simple paraphrase models, an association model and a vector space model, and train them jointly from question-answer pairs. Our system ParaSempre improves state-of-the-art accuracies on two recently released question-answering datasets.
We consider the semantic parsing problem of mapping natural language utterances into logical forms to be executed on a knowledge base (KB) [35, 36, 34, 20]. Scaling semantic parsers to large knowledge bases has attracted substantial attention recently [2, 1, 19], since it drives applications such as question answering (QA) and information extraction (IE).
Semantic parsers need to somehow associate natural language phrases with logical predicates, e.g., they must learn that the constructions “What does X do for a living?”, “What is X’s profession?”, and “Who is X?”, should all map to the logical predicate Profession. To learn these mappings, traditional semantic parsers use data which pairs natural language with the KB. However, this leaves untapped a vast amount of text not related to the KB. For instance, the utterances “Where is ACL in 2014?” and “What is the location of ACL 2014?” cannot be used in traditional semantic parsing methods, since the KB does not contain an entity ACL2014, but this pair clearly contains valuable linguistic information. As another reference point, out of 500,000 relations extracted by the ReVerb Open IE system [9], only about 10,000 can be aligned to Freebase [1].
In this paper, we present a novel approach for semantic parsing based on paraphrasing that can exploit large amounts of text not covered by the KB (Figure 1). Our approach targets factoid questions with a modest amount of compositionality. Given an input utterance, we first use a simple deterministic procedure to construct a manageable set of candidate logical forms (ideally, we would generate canonical utterances for all possible logical forms, but this is intractable). Next, we heuristically generate canonical utterances for each logical form based on the text descriptions of predicates from the KB. Finally, we choose the canonical utterance that best paraphrases the input utterance, and thereby the logical form that generated it. We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them. The entire system is trained jointly from question-answer pairs only.
Our work relates to recent lines of research in semantic parsing and question answering. Kwiatkowski et al. (2013) first maps utterances to a domain-independent intermediate logical form, and then performs ontology matching to produce the final logical form. In some sense, we approach the problem from the opposite end, using an intermediate utterance, which allows us to employ paraphrasing methods (Figure 2). Fader et al. (2013) presented a QA system that maps questions onto simple queries against Open IE extractions, by learning paraphrases from a large monolingual parallel corpus, and performing a single paraphrasing step. We adopt the idea of using paraphrasing for QA, but suggest a more general paraphrase model and work against a formal KB (Freebase).
We apply our semantic parser on two datasets: WebQuestions [1], which contains 5,810 question-answer pairs with common questions asked by web users; and Free917 [2], which has 917 questions manually authored by annotators. On WebQuestions, we obtain a relative improvement of 12% in accuracy over the state-of-the-art, and on Free917 we match the current best performing system. The source code of our system ParaSempre is released at http://www-nlp.stanford.edu/software/sempre/.
Our task is as follows: Given (i) a knowledge base $\mathcal{K}$, and (ii) a training set of question-answer pairs $\{(x_i, y_i)\}_{i=1}^{n}$, output a semantic parser that maps new questions $x$ to answers $y$ via latent logical forms $z$. Let $\mathcal{E}$ denote a set of entities (e.g., BillGates), and let $\mathcal{P}$ denote a set of properties (e.g., PlaceOfBirth). A knowledge base $\mathcal{K}$ is a set of assertions $(e_1, p, e_2) \in \mathcal{E} \times \mathcal{P} \times \mathcal{E}$ (e.g., (BillGates, PlaceOfBirth, Seattle)). We use the Freebase KB [13], which has 41M entities, 19K properties, and 596M assertions.
To query the KB, we use a logical language called simple $\lambda$-DCS. In simple $\lambda$-DCS, an entity (e.g., Seattle) is a unary predicate (i.e., a subset of $\mathcal{E}$) denoting a singleton set containing that entity. A property (which is a binary predicate) can be joined with a unary predicate; e.g., Founded.Microsoft denotes the entities that are Microsoft founders. In PlaceOfBirth.Seattle $\sqcap$ Founded.Microsoft, an intersection operator allows us to denote the set of Seattle-born Microsoft founders. A reverse operator reverses the order of arguments: $\mathcal{R}$[PlaceOfBirth].BillGates denotes Bill Gates's birthplace (in contrast to PlaceOfBirth.Seattle). Lastly, count(Founded.Microsoft) denotes set cardinality, in this case, the number of Microsoft founders. The denotation of a logical form $z$ with respect to a KB $\mathcal{K}$ is given by $\llbracket z \rrbracket_{\mathcal{K}}$. For a formal description of simple $\lambda$-DCS, see Liang (2013) and Berant et al. (2013).
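To make these operators concrete, here is a minimal toy interpreter for simple $\lambda$-DCS joins, intersections, reverses, and counts (our own sketch, not part of the system; the assertions, entity names, and function names are illustrative).

```python
# Toy interpreter for simple lambda-DCS over a small set of assertions.
# The KB is a set of (entity1, property, entity2) triples; every logical
# form denotes a set of entities.

KB = {
    ("BillGates", "Founded", "Microsoft"),
    ("PaulAllen", "Founded", "Microsoft"),
    ("BillGates", "PlaceOfBirth", "Seattle"),
    ("PaulAllen", "PlaceOfBirth", "Seattle"),
}

def entity(e):
    """An entity denotes the singleton set containing it."""
    return {e}

def join(prop, unary):
    """prop.unary: entities x such that (x, prop, y) holds for some y in unary."""
    return {e1 for (e1, p, e2) in KB if p == prop and e2 in unary}

def reverse(prop, unary):
    """R[prop].unary: entities y such that (x, prop, y) holds for some x in unary."""
    return {e2 for (e1, p, e2) in KB if p == prop and e1 in unary}

def intersect(a, b):
    return a & b

# Founded.Microsoft -> Microsoft founders
founders = join("Founded", entity("Microsoft"))
# PlaceOfBirth.Seattle ⊓ Founded.Microsoft -> Seattle-born Microsoft founders
seattle_founders = intersect(join("PlaceOfBirth", entity("Seattle")), founders)
# R[PlaceOfBirth].BillGates -> Bill Gates's birthplace
birthplace = reverse("PlaceOfBirth", entity("BillGates"))
# count(Founded.Microsoft) -> number of Microsoft founders
print(founders, seattle_founders, birthplace, len(founders))
```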
We now present the general framework for semantic parsing via paraphrasing, including the model and the learning algorithm. In Sections 4 and 5, we provide the details of our implementation.
Given an utterance $x$ and the KB, we construct a set of candidate logical forms $\mathcal{Z}_x$, and then for each $z \in \mathcal{Z}_x$ generate a small set of canonical natural language utterances $\mathcal{C}_z$. Our goal at this point is only to generate a manageable set of logical forms containing the correct one, and then generate an appropriate canonical utterance from it. This strategy is feasible in factoid QA where compositionality is low, and so the size of $\mathcal{Z}_x$ is limited (Section 4).
We score the canonical utterances in with respect to the input utterance using a paraphrase model, which offers two advantages. First, the paraphrase model is decoupled from the KB, so we can train it from large text corpora. Second, natural language utterances often do not express predicates explicitly, e.g., the question “What is Italy’s money?” expresses the binary predicate CurrencyOf with a possessive construction. Paraphrasing methods are well-suited for handling such text-to-text gaps. Our framework accommodates any paraphrasing method, and in this paper we propose an association model that learns to associate natural language phrases that co-occur frequently in a monolingual parallel corpus, combined with a vector space model, which learns to score the similarity between vector representations of natural language utterances (Section 5).
We define a discriminative log-linear model that places a probability distribution over pairs of logical forms and canonical utterances $(z, c)$, given an utterance $x$:

$$p_\theta(c, z \mid x) = \frac{\exp\{\phi(x, c, z)^\top \theta\}}{\sum_{z' \in \mathcal{Z}_x,\, c' \in \mathcal{C}_{z'}} \exp\{\phi(x, c', z')^\top \theta\}},$$

where $\theta$ is the vector of parameters to be learned, and $\phi(x, c, z)$ is a feature vector extracted from the input utterance $x$, the canonical utterance $c$, and the logical form $z$. Note that the candidate sets of logical forms $\mathcal{Z}_x$ and canonical utterances $\mathcal{C}_z$ are constructed during the canonical utterance construction phase.
The model score decomposes into two terms:

$$\phi(x, c, z)^\top \theta = \phi_{\text{pr}}(x, c)^\top \theta_{\text{pr}} + \phi_{\text{lf}}(x, z)^\top \theta_{\text{lf}},$$

where the parameters $\theta_{\text{pr}}$ define the paraphrase model (Section 5), which is based on features extracted from text only (the input and canonical utterance). The parameters $\theta_{\text{lf}}$ correspond to semantic parsing features based on the logical form and input utterance, and are briefly described in this section.
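As an illustration, the following is a minimal sketch of such a log-linear scorer over candidate pairs (our own example; the feature names and weights are invented, and the real system uses the much richer feature sets described in Sections 4 and 5).

```python
import math

def score(features, theta):
    """Dot product of a sparse feature vector with the parameters.
    The features would include both paraphrase features phi_pr(x, c)
    and logical-form features phi_lf(x, z)."""
    return sum(theta.get(f, 0.0) * v for f, v in features.items())

def candidate_distribution(candidates, theta):
    """candidates: list of (canonical_utterance, logical_form, features).
    Returns the softmax distribution p(c, z | x) over the candidate set."""
    scores = [score(feats, theta) for (_, _, feats) in candidates]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates for "What is Italy's money?" with invented features.
theta = {"assoc:money~currency": 1.2, "lf:binary=CurrencyOf": 0.3}
candidates = [
    ("What is the currency of Italy?", "CurrencyOf.Italy",
     {"assoc:money~currency": 1.0, "lf:binary=CurrencyOf": 1.0}),
    ("What language is spoken in Italy?", "LanguageSpoken.Italy",
     {"lf:binary=LanguageSpoken": 1.0}),
]
print(candidate_distribution(candidates, theta))
```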
The parameters $\theta_{\text{lf}}$ correspond to the following features, adopted from Berant et al. (2013). For a logical form $z$, we extract the size of its denotation $\llbracket z \rrbracket_{\mathcal{K}}$. We also add all binary predicates in $z$ as features. Moreover, we extract a popularity feature for predicates based on the number of instances they have in $\mathcal{K}$. For Freebase entities, we extract a popularity feature based on the entity frequency in an entity-linked subset of ReVerb [22]. Lastly, Freebase formulas have types (see Section 4), and we conjoin the type of $z$ with the first word of $x$, to capture the correlation between a word (e.g., “where”) and the Freebase type (e.g., Location).
As our training data consists of question-answer pairs $(x_i, y_i)$, we maximize the log-likelihood of the correct answer. The probability of an answer $y$ is obtained by marginalizing over canonical utterances $c$ and logical forms $z$ whose denotation is $y$. Formally, our objective function is as follows:

$$\mathcal{O}(\theta) = \sum_{i=1}^{n} \log \sum_{z \in \mathcal{Z}_{x_i},\, c \in \mathcal{C}_z:\, \llbracket z \rrbracket_{\mathcal{K}} = y_i} p_\theta(c, z \mid x_i) \;-\; \lambda \lVert \theta \rVert_1.$$

The strength $\lambda$ of the regularizer is set based on cross-validation. We optimize the objective by initializing the parameters $\theta$ to zero and running AdaGrad [8]. We approximate the set of pairs of logical forms and canonical utterances with a beam of size 2,000.
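For concreteness, the following is a generic AdaGrad sketch under our own simplifications (dense parameters, no regularization or proximal step, and a toy quadratic objective standing in for the negative log-likelihood); it is not the paper's implementation.

```python
import numpy as np

def adagrad(grad_fn, dim, steps=1000, eta=0.1, eps=1e-8):
    """Minimal AdaGrad: per-coordinate step sizes shrink with the
    accumulated squared gradients, which suits sparse NLP features."""
    theta = np.zeros(dim)
    g_sq = np.zeros(dim)
    for _ in range(steps):
        g = grad_fn(theta)                 # gradient of the objective to minimize
        g_sq += g * g
        theta -= eta * g / (np.sqrt(g_sq) + eps)
    return theta

# Toy usage: a quadratic stands in for the negative log-likelihood.
target = np.array([1.0, -2.0, 0.5])
theta = adagrad(lambda t: t - target, dim=3)
print(theta)  # close to `target`
```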
We construct canonical utterances in two steps. Given an input utterance $x$, we first construct a set of logical forms $\mathcal{Z}_x$, and then generate canonical utterances $\mathcal{C}_z$ from each $z \in \mathcal{Z}_x$. Both steps are performed with a small and simple set of deterministic rules, which suffices for our datasets, as they consist of factoid questions with a modest amount of compositional structure. We describe these rules below for completeness. Due to its soporific effect, though, we advise the reader to skim this part quickly.
# | Template | Example | Question
---|---|---|---
1 | $p.e$ | Directed.TopGun | Who directed Top Gun?
2 | $p_1.p_2.e$ | Employment.EmployerOf.SteveBalmer | Where does Steve Balmer work?
3 | $p.(p_1.e_1 \sqcap p_2.e_2)$ | Character.(Actor.BradPitt $\sqcap$ Film.Troy) | Who did Brad Pitt play in Troy?
4 | Type.$t$ $\sqcap$ $z$ | Type.Composer $\sqcap$ SpeakerOf.French | What composers spoke French?
5 | count($z$) | count(BoatDesigner.NatHerreshoff) | How many ships were designed by Nat Herreshoff?
We consider logical forms defined by a set of templates, summarized in Table 1. The basic template is a join of a binary and an entity, where a binary can either be one property (#1 in the table) or two properties (#2). To handle events involving multiple arguments (e.g., “Who did Brad Pitt play in Troy?”), we introduce template #3, where the main event is modified by more than one entity. Logical forms can be further modified by a unary “filter”, e.g., the answer to “What composers spoke French?” is a set of composers, i.e., a subset of all people (#4). Lastly, we handle aggregation formulas for utterances such as “How many teams are in the NCAA?” (#5).
To construct candidate logical forms $\mathcal{Z}_x$ for a given utterance $x$, our strategy is to find an entity in $x$ and grow the logical form from that entity. As we show later, this procedure actually produces a set with better coverage than constructing logical forms recursively from spans of $x$, as is done in traditional semantic parsing. Specifically, for every span of $x$, we take at most 10 entities whose Freebase descriptions approximately match the span. Then, we join each matched entity with all type-compatible binaries, and add these logical forms to $\mathcal{Z}_x$ (#1 and #2). (Entities in Freebase are associated with a set of types, and properties have a type signature; we use these types to compute an expected type for any logical form.)
To construct logical forms with multiple entities (#3), we do the following: for any logical form $p_1.p_2.e \in \mathcal{Z}_x$, we look for other entities $e'$ that were matched in $x$. We then add the logical form $p_1.(p_2.e \sqcap p_3.e')$ if there exists a binary $p_3$ whose type signature is compatible with one of the types of $e'$. For example, for the logical form Character.Actor.BradPitt, if we match the entity Troy in $x$, we obtain Character.(Actor.BradPitt $\sqcap$ Film.Troy). We further modify logical forms by intersecting with a unary filter (#4): given a formula $z$ with some Freebase type (e.g., People), we look at all Freebase sub-types $t$ (e.g., Composer), and check whether one of their Freebase descriptions (e.g., “composer”) appears in $x$. If so, we add the formula Type.$t$ $\sqcap$ $z$ to $\mathcal{Z}_x$. Finally, we check whether $x$ should be mapped to an aggregation formula count($z$) by identifying whether it starts with phrases such as “how many” or “number of” (#5).
On WebQuestions, this results in 645 formulas per utterance on average. Clearly, we can increase the expressivity of this step by expanding the template set. For example, we could handle superlative utterances (“What NBA player is tallest?”) by adding a template with an argmax operator.
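The construction strategy above can be sketched roughly as follows (our own simplification: the entity lexicon, types, and binaries are toy stand-ins for Freebase, entity matching is exact rather than approximate, and template #3 is omitted).

```python
# Match entities in spans of the utterance, then join each matched entity
# with type-compatible binaries; aggregation is triggered by the question prefix.

ENTITY_LEXICON = {"top gun": ["TopGun"], "brad pitt": ["BradPitt"], "troy": ["Troy"]}
ENTITY_TYPES = {"TopGun": "Film", "BradPitt": "Person", "Troy": "Film"}
# Binaries that accept an argument of the given type (joins for templates #1 and #2).
BINARIES_BY_TYPE = {"Film": ["Directed"], "Person": ["Character.Actor"]}

def match_entities(utterance, max_per_span=10):
    """Return entities whose description matches some span of the utterance."""
    tokens = utterance.lower().strip("?").split()
    matches = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            span = " ".join(tokens[i:j])
            matches.extend(ENTITY_LEXICON.get(span, [])[:max_per_span])
    return matches

def candidate_logical_forms(utterance):
    forms = []
    for e in match_entities(utterance):
        for b in BINARIES_BY_TYPE.get(ENTITY_TYPES[e], []):
            forms.append(f"{b}.{e}")                  # joins (#1 and #2)
    if utterance.lower().startswith(("how many", "number of")):
        forms = [f"count({z})" for z in forms]        # aggregation (#5)
    return forms

print(candidate_logical_forms("Who directed Top Gun?"))
print(candidate_logical_forms("Who did Brad Pitt play in Troy?"))
```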
Categ. | Rule | Example
---|---|---
NP | WH $t$ has $e$ as NP ? | What election contest has George Bush as winner?
VP | WH $t$ (AUX) VP $e$ ? | What radio station serves area New-York?
PP | WH $t$ PP $e$ ? | What beer from region Argentina?
NP VP | WH $t$ VP the NP $e$ ? | What mass transportation system served the area Berlin?
NP | WH $t$ is the NP of $e$ ? | What location is the place of birth of Elvis Presley?
VP | WH $t$ AUX $e$ VP ? | What film is Brazil featured in?
PP | WH $t$ $e$ PP ? | What destination Spanish steps near travel destination?
NP VP | WH $t$ NP is VP by $e$ ? | What structure is designed by Herod?
While mapping general language utterances to logical forms is hard, we observe that it is much easier to generate a canonical natural language utterance of our choice given a logical form. Table 2 summarizes the rules used to generate canonical utterances from the template $p.e$. Questions begin with a question word, are followed by the Freebase description of the expected answer type ($t$), and are followed by Freebase descriptions of the entity ($e$) and the binary ($p$). To fill in auxiliary verbs, determiners, and prepositions, we parse the description of the binary into one of NP, VP, PP, or NP VP. This determines the generation rule to be used.
Each Freebase property (e.g., ContainedBy) has an explicit property equivalent to its reverse $\mathcal{R}[p]$. For each logical form $z = p.e$, we also generate canonical utterances from the equivalent logical forms in which $p$ is replaced with its reverse. Reversed formulas have different generation rules, since entities in these formulas are in the subject position rather than the object position.
We generate the description $t$ from the Freebase description of the type of $z$ (this handles #4). For the template $p_1.p_2.e$ (#2), we have a similar set of rules, which depends on the syntax of $p_1$ and $p_2$ and is omitted for brevity. The template $p.(p_1.e_1 \sqcap p_2.e_2)$ (#3) is generated by appending the prepositional phrase “in $e_2$”, e.g., “What character is the character of Brad Pitt in Troy?”. Lastly, we choose the question phrase “How many” for aggregation formulas (#5), and “What” for all other formulas.
We also generate canonical utterances using an alignment lexicon, released by Berant et al. (2013), which maps text phrases to Freebase binary predicates. For a binary predicate mapped from a text phrase $r$, we generate the question using $r$ in place of the Freebase description of the binary. On the WebQuestions dataset, we generate an average of 1,423 canonical utterances per input utterance $x$. In Section 6, we show that an even simpler method of generating canonical utterances by concatenating Freebase descriptions hurts accuracy by only a modest amount.
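A toy version of this rule-based generation might look as follows (our own sketch; the descriptions, syntactic categories, and expected types are hard-coded placeholders rather than real Freebase data).

```python
# Fill a question template with descriptions of the expected answer type,
# the binary, and the entity, picking a rule by the binary's syntactic category.

BINARY_DESC = {"Directed": ("directed", "VP"), "PlaceOfBirth": ("place of birth", "NP")}
ENTITY_DESC = {"TopGun": "Top Gun", "ElvisPresley": "Elvis Presley"}
EXPECTED_TYPE = {"Directed": "person", "PlaceOfBirth": "location"}

def generate(binary, entity, reverse=False, aggregation=False):
    wh = "how many" if aggregation else "what"
    t = EXPECTED_TYPE[binary]
    desc, categ = BINARY_DESC[binary]
    e = ENTITY_DESC[entity]
    if reverse and categ == "NP":
        return f"{wh} {t} is the {desc} of {e}?"   # reversed NP rule
    if categ == "VP":
        return f"{wh} {t} {desc} {e}?"             # VP rule
    return f"{wh} {t} has {e} as {desc}?"          # NP rule

print(generate("Directed", "TopGun"))                    # what person directed Top Gun?
print(generate("PlaceOfBirth", "ElvisPresley", reverse=True))
```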
Once the candidate set of logical forms paired with canonical utterances is constructed, our problem is reduced to scoring pairs based on a paraphrase model. The NLP paraphrase literature is vast and ranges from simple methods employing surface features [32], through vector space models [28], to latent variable models [6, 33, 29].
In this paper, we focus on two paraphrase models that emphasize simplicity and efficiency. This is important since for each question-answer pair, we consider thousands of canonical utterances as potential paraphrases. In contrast, traditional paraphrase detection [7] and Recognizing Textual Entailment (RTE) tasks [4] consider examples consisting of only a single pair of candidate paraphrases.
Our paraphrase model decomposes into an association model and a vector space model:

$$\phi_{\text{pr}}(x, c)^\top \theta_{\text{pr}} = \phi_{\text{as}}(x, c)^\top \theta_{\text{as}} + \phi_{\text{vs}}(x, c)^\top \theta_{\text{vs}}.$$
The goal of the association model is to determine whether $x$ and $c$ contain phrases that are likely to be paraphrases. Given an utterance $x$, we denote by $x_{i:j}$ the span from token $i$ to token $j$. For each pair of utterances $(x, c)$, we go through all spans of $x$ and $c$ and identify a set of pairs of potential paraphrases $(x_{i:j}, c_{i':j'})$, which we call associations. (We will describe how associations are identified shortly.) We then define features on each association; the weighted combination of these features yields a score. In this light, associations can be viewed as soft paraphrase rules. Figure 3 presents examples of associations extracted from a paraphrase pair and visualizes the learned scores. We can see that our model learns a positive score for associating “type” with “genres”, and a negative score for associating “is” with “play”.
We define associations in $x$ and $c$ primarily by looking up phrase pairs in a phrase table constructed using the Paralex corpus [10]. Paralex is a large monolingual parallel corpus containing 18 million pairs of question paraphrases from wikianswers.com, which were tagged as having the same meaning by users. Paralex is suitable for our needs since it focuses on question paraphrases. For example, the phrase “do for a living” occurs mostly in questions, and we can extract associations for this phrase from Paralex. Paraphrase pairs in Paralex are word-aligned using standard machine translation methods. We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic [24] to all 5-grams. This results in a phrase table with approximately 1.3 million phrase pairs, which serves as our set of mined candidate associations.
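The consistent phrase pair heuristic can be sketched as follows (a minimal version over a single aligned sentence pair; the alignment is hypothetical, and real extraction is run over all of Paralex with additional filtering).

```python
def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=5):
    """Extract phrase pairs (i, j, k, l) such that all alignment links from
    source span [i, j) land inside target span [k, l) and vice versa
    (the 'consistent phrase pair' heuristic)."""
    pairs = []
    for i in range(src_len):
        for j in range(i + 1, min(i + max_len, src_len) + 1):
            linked_tgt = [t for (s, t) in alignment if i <= s < j]
            if not linked_tgt:
                continue
            k, l = min(linked_tgt), max(linked_tgt) + 1
            # Consistency: no target word in [k, l) aligns outside [i, j).
            if all(i <= s < j for (s, t) in alignment if k <= t < l):
                pairs.append((i, j, k, l))
    return pairs

src = "what do you do for a living".split()
tgt = "what is your profession".split()
# Hypothetical word alignment as (source index, target index) links.
alignment = [(0, 0), (2, 2), (3, 3), (4, 3), (5, 3), (6, 3)]
for (i, j, k, l) in extract_phrase_pairs(len(src), len(tgt), alignment):
    print(" ".join(src[i:j]), "|||", " ".join(tgt[k:l]))
```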
Category | Description
---|---
Assoc. | Lemmas of the associated phrases (conjoined)
 | POS tags of the associated phrases (conjoined)
 | Do the associated phrases have the same lemma?
 | Do the associated phrases have the same POS tag?
 | Are the associated phrases synonyms?
 | Are the associated phrases derivations?
Deletions | Deleted lemma and POS tag
For a pair $(x, c)$, we also consider as candidate associations the set of token pairs (represented implicitly) such that the two tokens share the same lemma, the same POS tag, or are linked through a derivation link on WordNet [11]. This allows us to learn paraphrases for words that appear in our datasets but are not covered by the phrase table, and to handle nominalizations for phrase pairs such as “Who designed the game of life?” and “What game designer is the designer of the game of life?”.
Our model goes over all possible spans of $x$ and $c$ and constructs all possible associations from the phrase table and the implicitly defined set above. This results in many poor associations (e.g., “play” and “the”), but as illustrated in Figure 3, we learn weights that discriminate good from bad associations. Table 3 specifies the full set of features. Note that unlike standard paraphrase detection and RTE systems, we use lexicalized features, firing approximately 400,000 features on WebQuestions. By extracting POS features, we obtain soft syntactic rules; e.g., the feature “JJ N $\wedge$ N” indicates that omitting adjectives before nouns is possible. Once associations are constructed, we mark tokens in $x$ and $c$ that were not part of any association, and extract deletion features for their lemmas and POS tags. Thus, we learn that deleting pronouns is acceptable, while deleting nouns is not.
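A rough sketch of this association construction is shown below (our own simplification: exact string match stands in for lemmatization, the phrase table is a toy set, and features are binary indicators).

```python
# Enumerate span pairs between an input utterance x and a canonical utterance c,
# keep those found in the phrase table or matching exactly, fire lexicalized
# association features, and fire deletion features for uncovered tokens.

PHRASE_TABLE = {("type of music", "musical genres"), ("do for a living", "profession")}

def spans(tokens, max_len=4):
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            yield i, j, " ".join(tokens[i:j])

def association_features(x, c):
    xt = x.lower().replace("?", "").split()
    ct = c.lower().replace("?", "").split()
    feats, covered_x, covered_c = {}, set(), set()
    for i, j, sx in spans(xt):
        for k, l, sc in spans(ct):
            if (sx, sc) in PHRASE_TABLE or sx == sc:
                feats[f"assoc:{sx}~{sc}"] = 1.0       # lexicalized association feature
                covered_x.update(range(i, j))
                covered_c.update(range(k, l))
    for i, tok in enumerate(xt):                       # deletion features
        if i not in covered_x:
            feats[f"del_x:{tok}"] = 1.0
    for k, tok in enumerate(ct):
        if k not in covered_c:
            feats[f"del_c:{tok}"] = 1.0
    return feats

feats = association_features("What type of music did Richard Wagner play?",
                             "What is the musical genres of Richard Wagner?")
print(sorted(feats))
```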
To summarize, the association model links phrases of two utterances in multiple overlapping ways. During training, the model learns which associations are characteristic of paraphrases and which are not.
The association model relies on having a good set of candidate associations, but mining associations suffers from coverage issues. We now introduce a vector space (VS) model, which assigns a vector representation for each utterance, and learns a scoring function that ranks paraphrase candidates.
We start by constructing vector representations of words. We run the word2vec tool [23] on lower-cased Wikipedia text (1.59 billion tokens), using the CBOW model with a window of 5 and hierarchical softmax. We also experiment with publicly released word embeddings [17], which were trained using both local and global context. Both result in word vectors of the same fixed dimensionality. Next, we construct a vector $v_x$ for each utterance $x$ by simply averaging the vectors of all content words (nouns, verbs, and adjectives) in $x$.
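A minimal sketch of the utterance representation, assuming random stand-in word vectors and a stop-word list in place of POS-based content-word selection:

```python
import numpy as np

# An utterance vector is the average of its content-word vectors. The word
# vectors here are random placeholders; in practice they come from word2vec
# or similar embeddings.

rng = np.random.default_rng(0)
DIM = 50  # arbitrary dimensionality for the sketch
WORD_VECTORS = {w: rng.normal(size=DIM) for w in
                ["city", "kia", "motors", "headquarters", "made", "car"]}
STOP = {"what", "is", "a", "of", "where", "the"}

def utterance_vector(utterance):
    words = [w for w in utterance.lower().strip("?").split()
             if w not in STOP and w in WORD_VECTORS]
    return np.mean([WORD_VECTORS[w] for w in words], axis=0)

v_x = utterance_vector("Where is made Kia car?")
v_c = utterance_vector("What city is Kia motors a headquarters of?")
print(v_x.shape, v_c.shape)
```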
We can now estimate a paraphrase score for two utterances $x$ and $c$ via a weighted combination of the components of the vector representations:

$$\phi_{\text{vs}}(x, c)^\top \theta_{\text{vs}} = v_x^\top W v_c,$$

where $W$ is a parameter matrix. In terms of our earlier notation, we have $\theta_{\text{vs}} = \mathrm{vec}(W)$ and $\phi_{\text{vs}}(x, c) = \mathrm{vec}(v_x v_c^\top)$, where $\mathrm{vec}(\cdot)$ unrolls a matrix into a vector. In Section 6, we experiment with setting $W$ to the identity matrix, constraining $W$ to be diagonal, and learning a full matrix $W$.
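The equivalence between the bilinear score and its unrolled feature form can be checked directly (a sketch with random stand-in vectors and a random matrix).

```python
import numpy as np

# Bilinear paraphrase score v_x^T W v_c, shown both directly and in the
# unrolled form used by the log-linear model: the feature vector is
# vec(v_x v_c^T) and the parameters are vec(W).

rng = np.random.default_rng(1)
d = 50
v_x, v_c = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))        # full matrix; could also be diagonal or identity

score_direct = v_x @ W @ v_c
score_unrolled = np.outer(v_x, v_c).ravel() @ W.ravel()   # phi_vs . theta_vs
print(np.isclose(score_direct, score_unrolled))  # True
```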
The VS model can identify correct paraphrases in cases where it is hard to directly associate phrases from and . For example, the answer to “Where is made Kia car?” (from WebQuestions), is given by the canonical utterance “What city is Kia motors a headquarters of?”. The association model does not associate “made” and “headquarters”, but the VS model is able to determine that these utterances are semantically related. In other cases, the VS model cannot distinguish correct paraphrases from incorrect ones. For example, the association model identifies that the paraphrase for “What type of music did Richard Wagner Play?” is “What is the musical genres of Richard Wagner?”, by relating phrases such as “type of music” and “musical genres”. The VS model ranks the canonical utterance “What composition has Richard Wagner as lyricist?” higher, as this utterance is also in the music domain. Thus, we combine the two models to benefit from their complementary nature.
In summary, while the association model aligns particular phrases to one another, the vector space model provides a soft vector-based representation for utterances.
In this section, we evaluate our system on WebQuestions and Free917. After describing the setup (Section 6.1), we present our main empirical results and analyze the components of the system (Section 6.2).
We use the WebQuestions dataset [1], which contains 5,810 question-answer pairs. This dataset was created by crawling questions through the Google Suggest API, and then obtaining answers using Amazon Mechanical Turk. We use the original train-test split, and divide the training set into 3 random 80%–20% splits for development. This dataset is characterized by questions that are commonly asked on the web (and are not necessarily grammatical), such as “What character did Natalie Portman play in Star Wars?” and “What kind of money to take to Bahamas?”.
Dataset | # examples | # word types |
---|---|---|
Free917 | 917 | 2,036 |
WebQuestions | 5,810 | 4,525 |
The Free917 dataset contains 917 questions, authored by two annotators and annotated with logical forms. This dataset contains questions on rarer topics (for example, “What is the engine in a 2010 Ferrari California?” and “What was the cover price of the X-men Issue 1?”), but the phrasing of questions tends to be more rigid compared to WebQuestions. Table 4 provides some statistics on the two datasets. Following Cai and Yates (2013), we hold out 30% of the data for the final test, and perform 3 random 80%-20% splits of the training set for development. Since we train from question-answer pairs, we collect answers by executing the gold logical forms against Freebase.
We execute $\lambda$-DCS queries by converting them into SPARQL and executing them against a copy of Freebase using the Virtuoso database engine. We evaluate our system with accuracy, that is, the proportion of questions we answer correctly. We run all questions through the Stanford CoreNLP pipeline [30, 12, 18].
We tuned the regularization strength, developed features, and ran analysis experiments on the development set (averaging across random splits). On WebQuestions, without regularization, the number of non-zero features was 360K; regularization brings it down to 17K.
We compare our system to Cai and Yates (2013) (CY13), Berant et al. (2013) (BCFL13), and Kwiatkowski et al. (2013) (KCAZ13). For BCFL13, we obtained results using the Sempre package (http://www-nlp.stanford.edu/software/sempre/) and running Berant et al. (2013)’s system on the datasets.
Table 5 presents results on the test set. We achieve a substantial relative improvement of 12% in accuracy on WebQuestions, and match the best results on Free917. Interestingly, our system gets an oracle accuracy of 63% on WebQuestions compared to 48% obtained by BCFL13, where the oracle accuracy is the fraction of questions for which at least one logical form in the candidate set produced by the system is correct. This demonstrates that our method for constructing candidate logical forms is reasonable. To further examine this, we ran BCFL13 on the development set, allowing it to use only predicates from logical forms suggested by our logical form construction step. This improved oracle accuracy on the development set to 64.5%, but accuracy was 32.2%. This shows that the improvement in accuracy should not be attributed only to better logical form generation, but also to the paraphrase model.
System | Free917 | WebQuestions
---|---|---
CY13 | 59.0 | –
BCFL13 | 62.0 | 35.7
KCAZ13 | 68.0 | –
This work | 68.5 | 39.9
System | Free917 | WebQuestions
---|---|---
Our system | 73.9 | 41.2
–VSM | 71.0 | 40.5
–Association | 52.7 | 35.3
–Paraphrase | 31.8 | 21.3
SimpleGen | 73.4 | 40.4
Full matrix | 52.7 | 35.3
Diagonal | 50.4 | 30.6
Identity | 50.7 | 30.4
Jaccard | 69.7 | 31.3
Edit | 40.8 | 24.8
WDDC06 | 71.0 | 29.8
We now perform more extensive analysis of our system’s components and compare it to various baselines.
We ablate the association model, the VS model, and the entire paraphrase model (using only logical form features). Table 6 shows that our full system obtains the highest accuracy, and that removing the association model results in a much larger degradation than removing the VS model.
Our system generates relatively natural utterances from logical forms using simple rules based on Freebase descriptions (Section 4). We now consider simply concatenating Freebase descriptions. For example, the logical form $\mathcal{R}$[PlaceOfBirth].ElvisPresley would generate the utterance “What location Elvis Presley place of birth?”. Row SimpleGen in Table 6 demonstrates that we still get good results in this setup. This is expected given that our paraphrase models are not sensitive to the syntactic structure of the generated utterance.
By default, our system learns parameters for a full matrix $W$. We now examine results when learning parameters for a full matrix $W$, a diagonal matrix $W$, and when setting $W$ to be the identity matrix. Table 6 (third section) illustrates that learning a full matrix substantially improves accuracy. Figure 4 gives an example of a correct paraphrase pair, where the full matrix model boosts the overall model score. Note that the full matrix assigns a high score to the phrases “official language” and “speak” compared to the simpler models, but other pairs are less interpretable.
We also compared our system to the following implemented baselines (a toy sketch of the first two follows the list):
Jaccard: We compute the Jaccard score between the tokens of $x$ and $c$ and define $\phi_{\text{pr}}(x, c)$ to be this single feature.
Edit: We compute the token edit distance between $x$ and $c$ and define $\phi_{\text{pr}}(x, c)$ to be this single feature.
WDDC06: We re-implement 13 features from Wan et al. (2006), who obtained close to state-of-the-art performance on the Microsoft Research paraphrase corpus. (We implement all features that do not require dependency parsing.)
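A toy sketch of the first two baseline scores (our own code; in the actual baselines these scores serve as single features in the log-linear model):

```python
def jaccard(x, c):
    """Jaccard similarity between the token sets of two utterances."""
    a, b = set(x.lower().split()), set(c.lower().split())
    return len(a & b) / len(a | b)

def token_edit_distance(x, c):
    """Standard edit-distance dynamic program over tokens instead of characters."""
    a, b = x.lower().split(), c.lower().split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j
            else:
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return dp[len(a)][len(b)]

print(jaccard("what is italy's money", "what is the currency of italy"))
print(token_edit_distance("what is italy's money", "what is the currency of italy"))
```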
Table 6 demonstrates that we improve performance over all baselines. Interestingly, Jaccard and WDDC06 obtain reasonable performance on Free917 but perform much worse on WebQuestions. We surmise this is because questions in Free917 were generated by annotators prompted by Freebase facts, whereas questions in WebQuestions originated independently of Freebase. Thus, word choice in Free917 is often close to the generated Freebase descriptions, allowing simple baselines to perform well.
We sampled examples from the development set to examine the main causes of ParaSempre's errors. We notice that in many cases the paraphrase model can be further improved. For example, ParaSempre suggests that the best paraphrase for “What company did Henry Ford work for?” is “What written work novel by Henry Ford?” rather than “The employer of Henry Ford”, due to the exact match of the word “work”. Another example is the question “Where is the Nascar hall of fame?”, where ParaSempre suggests that “What hall of fame discipline has Nascar hall of fame as halls of fame?” is the best canonical utterance. This is because our simple model allows “hall of fame” in the input to be associated with the canonical utterance three times. Entity recognition also accounts for many errors, e.g., the entity chosen in “where was the gallipoli campaign waged?” is Galipoli rather than GalipoliCampaign. Lastly, ParaSempre does not handle temporal information, which causes errors in questions like “Where did Harriet Tubman live after the civil war?”.
In this work, we approach the problem of semantic parsing from a paraphrasing viewpoint. A fundamental motivation and long standing goal of the paraphrasing and RTE communities has been to cast various semantic applications as paraphrasing/textual entailment [4]. While it has been shown that paraphrasing methods are useful for question answering [15] and relation extraction [27], this is, to the best of our knowledge, the first paper to perform semantic parsing through paraphrasing. Our paraphrase model emphasizes simplicity and efficiency, but the framework is agnostic to the internals of the paraphrase method.
On the semantic parsing side, our work is most related to Kwiatkowski et al. (2013). The main challenge in semantic parsing is coping with the mismatch between language and the KB. In both Kwiatkowski et al. (2013) and this work, an intermediate representation is employed to handle the mismatch, but while they use a logical representation, we opt for a text-based one. Our choice allows us to benefit from the monolingual parallel corpus Paralex and from word vectors trained on Wikipedia. We believe that our approach is particularly suitable for scenarios such as factoid question answering, where the space of logical forms is somewhat constrained and a few generation rules suffice to reduce the problem to paraphrasing.
Our work is also related to Fader et al. (2013), who presented a paraphrase-driven question answering system. One can view this work as a generalization of Fader et al. along three dimensions. First, Fader et al. use a KB over natural language extractions rather than a formal KB and so querying the KB does not require a generation step – they paraphrase questions to KB entries directly. Second, they suggest a particular paraphrasing method that maps a test question to a question for which the answer is already known in a single step. We propose a general paraphrasing framework and instantiate it with two paraphrase models. Lastly, Fader et al. handle queries with only one property and entity whereas we generalize to more types of logical forms.
Since our generated questions are passed to a paraphrase model, we took a very simple approach, mostly ensuring that we preserved the semantics of the utterance without striving for the most fluent realization. Research on generation [5, 26, 31, 25] typically focuses on generating natural utterances for human consumption, where fluency is important.
In conclusion, the main contribution of this paper is a novel approach for semantic parsing based on a simple generation procedure and a paraphrase model. We achieve state-of-the-art results on two recently released datasets. We believe that our approach opens a window of opportunity for learning semantic parsers from raw text not necessarily related to the target KB. With more sophisticated generation and paraphrase, we hope to tackle compositionally richer utterances.
We thank Kai Sheng Tai for performing the error analysis. Stanford University gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040. Any opinions, findings, and conclusion or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the DARPA, AFRL, or the US government. The second author is supported by a Google Faculty Research Award.