1. Advanced Question Answering: Plenty of Challenges to Go Around
Dr. John D. Prange
AQUAINT Program Director
JPrange@nsa.gov
301-688-7092
http://www.ic-arda.org
25 March 2002
2. Outline
Introducing ARDA
Advanced Question Answering
There is Room for Multiple Approaches
The AQUAINT Program
Challenges from an AQUAINT Perspective
Some Final Thoughts . . .
Questions and Comments
3. Introducing ARDA
MISSION:
Incubate revolutionary R&D for the shared benefit of the Intelligence Community
Don’t feel bad if you have not previously heard of ARDA. We are only about one and a half years old.
The acronym ARDA stands for Advanced Research and Development Activity in Information Technology. We are a joint Department of Defense and Intelligence Community organization that was established in December 1998.
We have a simply stated mission -- incubate revolutionary R&D for the shared benefit of the Intelligence Community. Easily stated, but not as easily accomplished. But we are trying very hard.
ARDA has a modest yet significant budget. Not in the same league as DARPA or NSF, but few organizations are.
We are a very small operation -- a total of 6 government staff personnel plus several SETA contractors who assist us with the execution of our R&D Programs.
Our office is currently located in the R&D Building of the National Security Agency at Fort George G. Meade.
4. What ARDA Does
We originate and manage R&D programs
With fundamental impact on future operational needs and strategies
That demand substantial, long-term venture investment to spur risk-taking
That progress measurably toward mid-term and final goals
That take many forms and employ many delivery vehicles
5. How ARDA Interacts
Community organizations
Plans, forecasts, oversight
Customer champions
Thrust panels / managers
R&D problem statements
Internal peer review
Industry and academia
Principal funding recipients
External peer review and staff
ARDA may be small, but we utilize our Intelligence Community partners to the maximum extent to:
Identify R&D Challenges that are worthy of ARDA’s investment and commitment of time, effort and funding.
Serve as Contracting Agents for the vast majority of individual R&D projects.
Fully participate on R&D Thrust Panels, which are chaired by the ARDA R&D Thrust Manager.
ARDA is especially interested in soliciting your assistance and involvement with our R&D Thrusts. Virtually all of our budget (minus necessary overhead and administrative costs) is used to fund R&D projects in Industry, Academia, and our National Labs.
6. Where Is ARDA?
7. Current ARDA Programs
When we talk about ARDA’s Advanced R&D Program, we must start with the participation of our IC partners. They provide us with their most challenging, long-term R&D problems. Working through our R&D Thrust Panel, we collectively have established three current R&D Thrusts:
Digital Networking
High Performance Computing
and the Thrust that you are most interested in:
Information Exploitation
In addition to these R&D Programs, ARDA also has a Fellowship-like program in which we are attempting to encourage world class researchers and scholars from Industry, Academia and the National Labs to spend a year working in one of the Research organizations within the IC.
In the time that I have left I want to concentrate on our Information Exploitation Thrust. My last slide will tell you how you can contact ARDA or me for more information on any of our programs.
8. Outline
Introducing ARDA
Advanced Question Answering
There is Room for Multiple Approaches
The AQUAINT Program
Challenges from an AQUAINT Perspective
Some Final Thoughts . . .
Questions and Comments
9. Question Answering à la Gary Larson
10. Open Domain Factoid Question Answering
11. TREC QA Track Results
ARDA & DARPA are co-sponsoring the Question Answering Track in the NIST-organized Text Retrieval Conference (TREC) Program (starting with TREC-8 in Nov 1999).
TREC-10 Results (Nov 2001):
500 factual questions; about 50 questions had no answer in the TREC-10 data sources; used “Real” Questions
Data source: approx. 3 GByte database of ~980K news stories
36 US & international organizations participated; 92 separate runs evaluated
System output: top 5 regions (50 bytes) in a single story believed to contain the Answer to the given question
12. Pilot Evaluations: TREC 10 QA Track
The “List Task”
Sample Questions:
“Name 4 US cities that have a “Shubert” Theater”
“Name 30 individuals who served as a cabinet officer under Ronald Reagan”
Evaluation Metric: Number of distinct instances divided by the target number of instances, averaged over 25 questions
Top System among 18 runs: Achieved 76% Accuracy
The “Context Task”
Sample Series of Questions:
“How many species of spiders are there?”
“How many are poisonous to humans?”
“What percentage of spider bites in the US are fatal?”
Evaluation Metric: Same as Main Task (10 Series of Questions; 42 total Questions)
Top System: Found answer for 34 of the 42 total questions (81%)
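To make the List Task metric above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: "distinct instances" is read as distinct correct instances judged against an answer key, scores are capped at 1.0 per question, and the function name, dictionary keys, and placeholder data are all hypothetical rather than taken from the TREC evaluation software.

```python
def list_task_accuracy(questions):
    """Average, over questions, of (distinct correct instances / target count).

    `questions` is a list of dicts with hypothetical keys:
      target   - number of instances the question asks for
      returned - instances the system returned (may contain duplicates)
      key      - set of instances an assessor accepts as correct (assumed)
    """
    per_question = []
    for q in questions:
        # Normalise case and whitespace so duplicates count only once.
        returned = {r.strip().lower() for r in q["returned"]}
        key = {k.strip().lower() for k in q["key"]}
        distinct_correct = len(returned & key)
        # Assumption: cap at the target so over-answering cannot exceed 1.0.
        per_question.append(min(distinct_correct, q["target"]) / q["target"])
    return sum(per_question) / len(per_question)


# Placeholder data for illustration only, not real TREC output.
example = [
    {"target": 4, "returned": ["A", "B", "b", "X"], "key": {"a", "b", "c", "d"}},
    {"target": 2, "returned": ["P", "Q"], "key": {"p", "q", "r"}},
]
score = list_task_accuracy(example)
print(f"{score:.2f}")
```

In this toy run the first question earns 2 of 4 and the second 2 of 2, so the average is 0.75.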
13. “Ask Jeeves” Approach
14. Tailored Question Answering Approaches
FAQ (Frequently Asked Questions)
Help Desks / Customer Service Phone Centers
Accessing a Complex Set of Technical Maintenance Manuals
Integrating QA in Knowledge Management and Portals
Wide variety of Other E-Business Applications
15. Structured Knowledge-Base Approach
16. AQUAINT: Advanced QUestion & Answering for INTelligence
17. AQUAINT: Advanced QUestion & Answering for INTelligence
18. Outline
Introducing ARDA
Advanced Question Answering
There is Room for Multiple Approaches
The AQUAINT Program
Challenges from an AQUAINT Perspective
Some Final Thoughts . . .
Questions and Comments
19. AQUAINT: ARDA’s Plan of Attack
ARDA’s newest major Info-X R&D Program
Envisioned as a high risk, long term R&D Program:
Phase I: Fall 2001 - Fall 2003
Phase II: Fall 2003 - Fall 2005
Phase III: Fall/Winter 2005 - Fall/Winter 2007
Focus on Final Objective from start
Incrementally add media, data sources, & complexity of questions & answers during each phase
Each of AQUAINT’s 3 Phases will:
Use Zero-Based, Open BAA-styled Solicitations
Focus on Key Research Objectives
Be Closely Linked to Parallel System Integration/Testbed Efforts & Data Collection/Preparation and Evaluation Efforts
20. AQUAINT: R&D Focused on Three Functional Components
21. AQUAINT: Cross Cutting/Enabling Technologies R&D Areas
Specifically Solicited Research Areas include:
1) Advanced Reasoning for Question Answering
2) Sharable Knowledge Sources
3) Content Representation
4) Interactive Question Answering Sessions
5) Role of Context
6) Role of Knowledge
7) Deep, Human Language Processing and Understanding
22. AQUAINT: Separate, Coordinated Activities
23. AQUAINT: User Testbed / System Integration
Pull together best available system components emerging from AQUAINT Program research efforts
Couple AQUAINT components with existing GOTS and COTS software
Develop end-to-end AQUAINT prototype(s) aimed at specific Operational QA environments
Government-led effort:
Directly Linked into Sponsoring Agency’s Technology Insertion Organizations
Close, working relationship with working Analysts
Provide external system development support
Mitre/Bedford will lead External System Integration / Testbed efforts
Plan to also utilize additional external researchers as Consultants / Advisors
24. AQUAINT: Data & Evaluation Issues
Data
Start by Using Existing Data Collections
NIST’s TREC Text Corpora
Linguistic Data Consortium (LDC) Human Language Corpora (e.g. TDT, Switchboard, Call Home, Call Friend Corpora)
Existing Knowledge Bases and Other Structured Databases
Future Data Collection & Annotation and Question/Answer Key Development will be a major effort
Will likely use combined efforts of NIST and LDC
Evaluation
Build upon highly successful TREC Q&A Track Evaluations -- NIST has the lead and is currently developing a Phased Evaluation Plan tied to AQUAINT Program Plans
Cooperate to maximum extent possible with DARPA’s RKF (Rapid Knowledge Formation) Program Evaluation Efforts
25. ARDA’s AQUAINT Partners
26. AQUAINT Program Contractors
27. AQUAINT Phase I Projects (Fall 01 - Fall 03)
Total End-to-End Systems (6)
28. AQUAINT Phase I Projects (Fall 01 - Fall 03)
29. AQUAINT Phase I Projects (Fall 01 - Fall 03)
30. Northeast Regional Research Center
Conduct 6-8 week workshops on multiple AQUAINT-related challenge problems during FY 2002
Sep 2001: Planning Workshop held at MITRE.
Attended by Government Technical Leaders, MITRE, and invited set of industrial, FFRDC and Academic researchers in the field
Four Potential Challenge Problems identified; Formal Proposals developed for each Challenge Problem
Two Full Workshops Funded (Temporal Issues & Multiple Perspectives)
One Mini Workshop to further explore challenge problem planned (Re-Use of Accumulated Knowledge)
31. FY2002 NRRC Workshop Challenge Problems
Temporal Issues
Generate Sequence of events and activities along evolving timeline, resolving multiple levels of time references across series of documents/sources.
Leader: James Pustejovsky, Brandeis University
Multiple Perspectives
Develop approaches for handling situations where relevant information is obtained from multiple sources on the same topic but generated from different perspectives (e.g. cultural or political differences).
Leader: Jan Wiebe, University of Pittsburgh
32. NRRC Planning Workshops
Re-Use of Accumulated Knowledge
Investigate strategies for structuring and maintaining previously generated knowledge for possible future use. E.g. previous knowledge might include questions and answers (original and amplified) as well as relevant and background information retrieved and processed.
Leaders: Marc Light, MITRE and Abraham Ittycheriah, IBM
33. Supporting Roles
34. Outline
Introducing ARDA
Advanced Question Answering
There is Room for Multiple Approaches
The AQUAINT Program
Challenges from an AQUAINT Perspective
Some Final Thoughts . . .
Questions and Comments
35. Top 10 Challenges
36. Professional Information Analysts: Target Audience for AQUAINT -- Who are They?
For ARDA and AQUAINT they are:
Intelligence Community and Military Analysts
But there are other Potential Target Audiences of “Professional Information Analysts”:
Investigative / “CNN-type” Reporters
Financial Industry Analysts / Investors
Historians / Biographers
Lawyers / Law Clerks
Law Enforcement Detectives
And Others
37. Professional Information Analysts: What do They have in Common?
They are far more than just casual users of information
They work in an information rich environment where they have access to large quantities of heterogeneous data
They are almost always subject matter experts within their assigned task areas
They track and follow a given event, scenario, problem, or situation for an extended period of time
They frequently have extensive collaboration with other analysts
They are focused on their assigned task or mission and will do whatever it takes to accomplish it
The end product that results from their analysis is often judged against the standards of:
Timeliness, Accuracy, Usability, Completeness, Relevance
38. Top 10 Challenges
1) Satisfy QA requirements of the “Professional” Information Analyst
2) Pursue QA Scenarios and not just isolated, factually based QA
39. Implications of QA Scenarios
Requires handling a Full Range of Complexity & Continuity of Questions
Need to understand & track the analysts’ line of reasoning and flow of argument
QA System requires significantly greater insight into knowledge, desires, past experiences, likes and dislikes of “Questioner”
Place much higher value on recognizing and capturing “background” information
Questioner/System dialogue is now more than just a means for clarification
40. AQUAINT: Intermediate Goals
41. Top 10 Challenges
1) Satisfy QA requirements of the “Professional” Information Analyst
2) Pursue QA Scenarios and not just isolated, factually based QA
3) Support a collaborative, multiple analyst environment
42. Collaboration within QA
Standard Collaboration (From an Analyst Perspective)
Who else is working all or a portion of my task?
What do they know that I don’t and vice versa?
Can we share/work together?
Non-Standard Discovery (From a System Perspective)
Identify previous QA Scenarios that have “similarity” to current QA Scenario. Compare & Contrast
Use / Build-on / Update previous results
Uncover new data sources
Borrow a successful “line of reasoning” or “argument flow”
Alert the analyst to different interpretations or to overlooked / undervalued data
43. Top 10 Challenges
1) Satisfy QA requirements of the “Professional” Information Analyst
2) Pursue QA Scenarios and not just isolated, factually based QA
3) Support a collaborative, multiple analyst environment
4) Sometimes SMALL things really matter and other times BIG things don’t
44. “Small & Big” - Can we tell the difference?
Sometimes SMALL differences can produce significantly different results/interpretations:
Stop Words
“Books {by; for; about} kids”
Attachments
“The man saw the woman in the park with the telescope.”
Co-reference
“John {persuaded; promised} Bill to go. He just left.”
“Mary took the pill from the bottle. She swallowed it.”
Other times BIG differences can produce the same/similar results:
“Name the films in which Denzel Washington starred.”
“Denzel Washington played a leading role in which movies?”
“In what Hollywood productions did Denzel Washington receive top billing?”
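A minimal Python sketch of the “Small & Big” point, using the example questions from this slide. The stop-word list and the word-overlap measure are illustrative assumptions, not part of any AQUAINT system: naive stop-word removal erases the small differences that matter, while the three paraphrases that mean the same thing share very few words.

```python
import re
from itertools import combinations

# Illustrative stop-word list (an assumption, not taken from any real system).
STOP_WORDS = {"a", "about", "by", "for", "in", "of", "the"}


def content_terms(question):
    """Lower-case the question and keep only non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)


def jaccard(a, b):
    """Word-set overlap: size of intersection divided by size of union."""
    return len(a & b) / len(a | b)


small = ["Books by kids", "Books for kids", "Books about kids"]
big = [
    "Name the films in which Denzel Washington starred.",
    "Denzel Washington played a leading role in which movies?",
    "In what Hollywood productions did Denzel Washington receive top billing?",
]

# SMALL: three different questions collapse to one bag of words.
small_keys = {content_terms(q) for q in small}
# BIG: three equivalent questions barely overlap on the surface.
big_overlaps = [
    jaccard(content_terms(x), content_terms(y)) for x, y in combinations(big, 2)
]

print(len(small_keys))
print([round(v, 2) for v in big_overlaps])
```

The first print shows that the three “Books ... kids” questions become indistinguishable; the second shows that every pair of Denzel Washington paraphrases shares well under half of its content words.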
45. Top 10 Challenges
1) Satisfy QA requirements of the “Professional” Information Analyst
2) Pursue QA Scenarios and not just isolated, factually based QA
3) Support a collaborative, multiple analyst environment
4) Sometimes SMALL things really matter and other times BIG things don’t
5) Advanced QA must attack the “Data Chasm”
46. Attacking the Data Chasm
47. Some Challenges: Alternative Wording *
48. Some Challenges: Synthesizing Info *
49. Some Challenges: Evolving Info *
50. Attacking the Data Chasm
51. AQUAINT: Data Types
52. AQUAINT: Data Types
53. AQUAINT: Phase I Data Dimensions
54. AQUAINT: Phase I Data Dimensions
55. Top 10 Challenges
1) Satisfy QA requirements of the “Professional” Information Analyst
2) Pursue QA Scenarios and not just isolated, factually based QA
3) Support a collaborative, multiple analyst environment
4) Sometimes SMALL things really matter and other times BIG things don’t
5) Advanced QA must attack the “Data Chasm”
6) Time is of the Essence
56. Time: Our Achilles Heel?
The Obvious Timeliness Issue:
The timeliness of the system’s response to our question(s) -- we’ll need at least “near real time” responses
But Real Difficulties Still Exist in:
Extracting and correctly interpreting time references, then creating manageable timelines
Estimating & updating the changing reliability of information over time
Processing information in time sequence, e.g. tracking the details of an evolving event over time -- a whole different set of problems
57. Temporal Issues
Time References vary from precise to vague
Precise/Pinpointed: “0930-1030 hours 25 March 2002”
Vague: “Recently” or “a year or so ago” or “In my youth”
Nested Time References: (e.g. Within a Newspaper article)
Current time of Reader
Time Article was Published
Time of the Reported Event(s)
Time References into Past or Future
Temporally-based Questions are difficult because they refer to:
Temporal properties of the entities being questioned
Relative ordering of events in the world
Events that are mentioned in news articles, but which have not occurred or did not occur at all
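The nested-time-reference problem above can be sketched in a few lines of Python. This is only an illustration under assumptions: the rule set is tiny, the window widths chosen for vague expressions are arbitrary placeholders, and the function name `resolve` is hypothetical. The point it shows is that a relative expression must be anchored at the time the article was published, not at the reader's current time, and that a vague expression resolves to an interval rather than a point.

```python
from datetime import date, timedelta

# Assumed widths for vague expressions: (how far back the window starts,
# how far back it ends). These numbers are placeholders for illustration.
VAGUE_WINDOWS = {
    "recently": (timedelta(days=30), timedelta(days=0)),
    "a year or so ago": (timedelta(days=455), timedelta(days=275)),
}


def resolve(expression, published):
    """Map a time expression to an (earliest, latest) date interval.

    Relative expressions are anchored at the article's publication date.
    Returns None when no rule applies (e.g. "in my youth").
    """
    expr = expression.strip().lower()
    if expr == "today":
        return (published, published)
    if expr == "yesterday":
        day = published - timedelta(days=1)
        return (day, day)
    if expr in VAGUE_WINDOWS:
        back_far, back_near = VAGUE_WINDOWS[expr]
        return (published - back_far, published - back_near)
    return None


published = date(2002, 3, 25)  # publication date of a hypothetical article
for phrase in ["yesterday", "recently", "a year or so ago", "in my youth"]:
    print(phrase, "->", resolve(phrase, published))
```

“In my youth” deliberately falls through to None: resolving it would need knowledge about the speaker, which is exactly the kind of context this slide argues a QA system must track.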
58. Top 10 Challenges
7) Must extract, represent and preserve information uncovered when searching for answers
59. QA Scenarios: A Different Paradigm?
A Different Paradigm may be useful when handling QA Scenarios:
Current Analytic Paradigm:
60. Different Paradigm: “Casting a Net”
61. Top 10 Challenges
7) Must extract, represent and preserve information uncovered when searching for answers
8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach
62. Complex QA: The Need for Ever Increasing Knowledge -- Of All Types
63. Increasing Knowledge Requirements
Types of Knowledge Needed
Factual Knowledge & Linguistic Knowledge
Common Sense Knowledge & World Knowledge
Procedural Knowledge & Explanatory Knowledge
Domain Knowledge & Modal Knowledge
Tacit Knowledge
Etc.
Sources
Hand Crafted by experts; supplemented by end-users
Results from application of:
Learning algorithms
Bootstrapping / Hill-climbing Methods
Extracted from large data corpora
Obtained via “Re-Use”
64. WordNet Extensions *
WordNet:
WordNet is a lexical database of English nouns, verbs, adjectives, and adverbs
Entries are lexicalized concepts that consist of one or more synonyms, a definitional gloss, and links to semantically related entries
Extensions: Moving towards WordNet 2.0
Derivational Connections:
Adding links between morphologically related nouns and verbs (e.g. digest and digestion)
Disambiguated Definitions:
Demonstrate and Demonstration each have multiple definitions – Adding links between meanings that match
Topical Connections:
Adding topical access by creating lists of lexicalized concepts that frequently co-occur in discussions of a given topic
65. Knowledge Evolution Tools *
KB development requires knowledge evolution
Debugging, refining, structuring, modularizing, …
Power tools are needed to support KB evolution
KB diagnosis
Bugs, omissions, heuristic warnings, architectural advice
KB merging
To enable interoperation of KBs with overlapping content
KB partitioning
To enable effective reasoning
To produce reusable KB building blocks
66. Merging Knowledge Bases *
67. Using Knowledge within Advanced QA Systems *
Use Formalized knowledge for:
Semantic understanding of queries;
Discovery of Answers by Reasoning;
Justification of answers;
Use Formalized knowledge as:
Format for data normalization
‘Glue’ for data integration of:
information extracted from unstructured data
SQL queries against structured DBs
Cyc’s knowledge
68. Discovery of Answers by Reasoning *
69. Where Knowledge-Systems Help *
Heuristic of finding short passages with all the query terms/semantic classes is good but not sufficient. e.g. from TREC9:
70. Different Solution Approaches *
What is the largest city in England?
Text Match
Find text that says “London is the largest city in England” (or paraphrase). Confidence is confidence of NL parser * confidence of source. Find multiple instances and confidence of source -> 1.
“Superlative” Search
Find a table of English cities and their populations, and sort.
Find a list of the 10 largest cities in the world, and see which are in England.
Uses logic: if L > all objects in set R, then L > all objects in any set E ⊂ R.
Find the population of as many individual English cities as possible, and choose the largest.
Heuristics
London is the capital of England. (Not guaranteed to imply it is the largest city, but this is very frequently the case.)
Complex Inference
E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
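The “Complex Inference” route can be sketched as follows. This is a minimal Python illustration, not how any AQUAINT or Cyc system actually reasons: the four statements from the slide are hand-encoded as facts, “larger than” is assumed to be transitive, and the rule used is that a city located in a region and larger than that region's second-largest city must be its largest. All function and variable names are hypothetical.

```python
def transitive_closure(pairs):
    """Extend a 'larger than' relation with everything implied by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def largest_city(region, larger, located_in, second_largest):
    """Infer the largest city of `region`, or None if the facts are insufficient."""
    runner_up = second_largest.get(region)
    if runner_up is None:
        return None
    bigger = transitive_closure(larger)
    # Any city in the region that outranks the runner-up must be number one.
    candidates = {
        city
        for city, reg in located_in
        if reg == region and (city, runner_up) in bigger
    }
    return next(iter(candidates)) if len(candidates) == 1 else None


# The four statements from the slide, encoded by hand.
second_largest = {"England": "Birmingham"}
larger = {("Paris", "Birmingham"), ("London", "Paris")}
located_in = [("London", "England")]

answer = largest_city("England", larger, located_in, second_largest)
print(answer)
```

Note that no single source sentence states the answer; “London” only falls out after chaining London > Paris > Birmingham and combining it with the two facts about England.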
71. Top 10 Challenges
7) Must extract, represent and preserve information uncovered when searching for answers
8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach
9) Expanding requirements for more advanced learning and reasoning methods/approaches
72. Improved Reasoning & Learning
73. Improved Reasoning & Learning
74. Unsolved Problems
Developing / Implementing a Detailed, Complex Plan to Solve the QA Task at Hand
Decomposing Complex Questions into a series / sequence of Simpler Questions whose Answers can be found
Selecting the appropriate sources to search
Knowing when No Answer is Available; Being able to then give a partial, incomplete answer
Giving understandable explanations of the “Plan”, the “Reasoning Used” and the “Answers Found”
75. Increased Emphasis on Planning *
QA as Planning
Create a general QA planning system
How should a QA system represent its chain of reasoning?
QA and Auditability
How can we improve a QA system’s ability to justify its steps?
How can we make QA systems open to machine learning?
76. Utility Function Supports QA *
Utility-Based Information Fusion
Perceived utility is a function of many different factors
Create and tune utility metrics, e.g.:
77. Planner *
Specify planning representation
Identify decision points
Represent & manage uncertainty
Model states and operations
Model justification network
Find acceptable trade-offs:
Ratio of planning to execution
Answer utility vs. available resources
78. An Asymmetric Threat Scenario *
79. An Asymmetric Threat Scenario *
80. Computational Implicatures *
The Problem
A professional analyst cannot separate his/her intentions and beliefs from the formulation of a question.
Sometimes the analyst makes a proposal or assertion.
Implied information, important for the interpretation of a question.
Not recognizable at syntactic or semantic level.
Determines the quality of answers returned by the Q/A system.
81. Example of Computational Implicature *
“Will Prime Minister Mori survive the crisis?”
Implied belief: the position of Prime Minister is in jeopardy.
Problem: none of the question words directly indicates danger.
Question expected answer type: survival
Implicature: DANGER
82. Top 10 Challenges
7) Must extract, represent and preserve information uncovered when searching for answers
8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach
9) Expanding requirements for more advanced learning and reasoning methods/approaches
10) Discovering the correct answer will be hard enough; but crafting an appropriate, articulate, succinct, explainable response will be even harder
83. Difficulties in Generating Answers
Natural Language Generation continues to be a difficult, open research area.
Adding the requirement to generate multimedia answers makes this problem even harder.
Providing the ability to explain and/or justify answers also continues to be a difficult, open research area.
The more complex the line or chain of reasoning, the more complex the explanation and/or justification
QA Scenarios and differences across analysts add additional levels of complexity. The Same Question asked within different scenarios by different analysts could easily produce substantially:
Different Answer content
Different Answer format, structure, depth and/or breadth of coverage
Or both
84. Outline
Introducing ARDA
Advanced Question Answering
There is Room for Multiple Approaches
The AQUAINT Program
Challenges from an AQUAINT Perspective
Some Final Thoughts . . .
Questions and Comments
85.
86. Five Final Thoughts
1. Is ARDA and AQUAINT’s Vision for Advanced Question Answering Achievable? I strongly believe that it can be done. Maybe not exactly in the form envisioned and to the full extent hoped for. But having such a vision allows us to:
Identify key, strategic objectives
Attack the final goal simultaneously across a broad front and along multiple avenues
Take bigger R&D “steps” with greater confidence
87. Five Final Thoughts
2. Research is about discovering the unexpected. We must be willing to change direction and course to capitalize on our “discoveries”.
88. Five Final Thoughts
3. Failing is ok, even expected; it’s what we do with our failures that matters.
89. Five Final Thoughts
4. We must not forget that the ultimate goal is to transfer research results into operational use. So we need to constantly strive to have a measurable, practical impact.
90. Five Final Thoughts
5. The Technical Challenges are many, and the Road towards our Final Vision may be long and bumpy . . . But the final results will make these struggles well worth the effort!
91. Contact Information
Dr. John Prange, AQUAINT Program Director
ARDA Web Pages: http://www.ic-arda.org
Email arda@nsa.gov
JPrange@nsa.gov
Phones: 301-688-7092
800-276-3747
301-688-7410 (Fax)
Mailing: ARDA Room 12A69 NBP#1
STE 6644
9800 Savage Road Fort Meade, MD 20755-6644
92. Advanced Question Answering: Plenty of Challenges to Go Around
93. Advanced Question Answering: Plenty of Challenges to Go Around
Dr. John D. Prange
AQUAINT Program Director
JPrange@nsa.gov
301-688-7092
http://www.ic-arda.org
25 March 2002