Open-Domain Textual Question Answering
Sanda Harabagiu and Dan Moldovan
Department of Computer Science and Engineering, Southern Methodist University
Brief Description
Question Answering (QA) is a fast-growing area of research and commercial interest. The QA problem is to find answers to open-domain questions by searching a large collection of documents. Unlike Internet search engines, which return ranked lists of documents, QA systems provide short, relevant answers to questions. The recent explosion of information available on the World Wide Web makes question answering a compelling framework for finding information that closely matches user needs. The success of QA services such as AskJeeves is evidence of the popularity of this technique. Because both questions and answers are expressed in natural language, QA methodologies must deal with language ambiguities and incorporate NLP techniques. Several current NLP-based technologies provide a framework that approximates the complex problem of answering questions from large collections of texts. Ideal QA systems should have good dialog understanding, rich knowledge bases, and high-quality text mining methods. They will certainly incorporate common-sense reasoning methods and use good approximations of world knowledge. Until such advanced tools are available, QA can be approximated with NLP enhancements of information retrieval (IR) and information extraction (IE) techniques. The tutorial presents recent results in QA research and system implementations.
Detailed Outline
- Introduction
  - Problem definition
  - Examples of questions and answers
  - QA taxonomies
- QA system architectures
  - A survey of the most important system architecture features in TREC-8 QA (20 systems) and TREC-9 QA (28 systems)
  - A generic QA system architecture
- Basic QA
  - Question processing
  - Document retrieval
  - Answer extraction
  - Answer ranking
  - Accuracy performance
- Advanced QA
  - Keyword selection
  - Paragraph indexing
  - Logic prover for answer extraction
  - Answer correctness
  - An introduction to answer fusion from several documents
  - Interactive QA through dialog
  - Time performance
- Open issues in QA
  - A brief survey of current research issues in QA, such as multilinguality, context, and knowledge acquisition for ontology construction, that will be incorporated into future QA systems
- Concluding remarks
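As a rough illustration of the four Basic QA stages named in the outline (question processing, document retrieval, answer extraction, answer ranking), the following minimal Python sketch chains them together. It is not the architecture of any TREC system or of the tutorial itself. The stopword list, the answer-type table, the regular-expression patterns, the keyword-overlap scoring, and all function names are hypothetical simplifications chosen only to keep the example short.

```python
import re
from collections import Counter

# Illustrative assumptions: a tiny stopword list and a question-word to answer-type table.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "to",
             "who", "what", "when", "where", "did", "does"}
ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}

# Toy collection; the sentences restate facts from the Motivation section.
SAMPLE_DOCS = [
    "NIST introduced a QA track at TREC in 1999.",
    "The number of participants increased to 28 in 2000.",
    "Open-domain QA encompasses many aspects of NLP and AI.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_question(question):
    """Question processing: guess the expected answer type and select keywords."""
    tokens = tokenize(question)
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS]
    return answer_type, keywords


def overlap(keywords, doc):
    """Number of distinct keywords that occur in the document."""
    doc_tokens = set(tokenize(doc))
    return sum(1 for k in set(keywords) if k in doc_tokens)


def retrieve(keywords, documents, top_k=2):
    """Document retrieval: keep the top_k documents with non-zero keyword overlap."""
    scored = [(overlap(keywords, doc), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def extract_candidates(answer_type, documents):
    """Answer extraction: one crude pattern per answer type (an assumption)."""
    patterns = {
        "DATE": r"\b(1[0-9]{3}|20[0-9]{2})\b",       # four-digit years only
        "PERSON": r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b",  # two capitalized words
    }
    pattern = patterns.get(answer_type)
    if pattern is None:
        return []
    return [(m, doc) for doc in documents for m in re.findall(pattern, doc)]


def rank_answers(candidates, keywords):
    """Answer ranking: sum the keyword overlap of every document supporting a candidate."""
    scores = Counter()
    for candidate, doc in candidates:
        scores[candidate] += overlap(keywords, doc)
    return [candidate for candidate, _ in scores.most_common()]


def answer(question, documents):
    """Run the four stages in sequence and return candidates, best first."""
    answer_type, keywords = process_question(question)
    docs = retrieve(keywords, documents)
    return rank_answers(extract_candidates(answer_type, docs), keywords)


if __name__ == "__main__":
    print(answer("When did NIST introduce a QA track?", SAMPLE_DOCS))
```

Real systems replace each stage with far richer components, for example named-entity recognition in place of the regular expressions and paragraph-level indexing in place of whole-document overlap. The sketch only shows how the stages hand results to one another.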
Motivation
Research in open-domain Question Answering generates considerable interest from both the NLP community and the end users of this technology. In 1999, the National Institute of Standards and Technology (NIST) introduced, for the first time, a QA track as part of the already established TREC competition. That year there were 20 participants in the QA competition, and in 2000 the number increased to 28. The participants include university research groups, national research laboratories, and small and large companies. Interest in QA is worldwide, as evidenced by the international participation in the TREC QA track. Open-domain QA is a complex application that encompasses many aspects of NLP and AI. Current state-of-the-art QA systems can produce answers only to simple questions. However, the complexity of QA systems increases from year to year, and this increase is paralleled by sustained QA research activity.