Searching for the Quantifiable, Scalable, Verifiable, and Understandable
Quantitative Methods in Defense of National Security, 25 May 2010
Dewey Murdick, Ph.D., Program Manager
Intelligence Advanced Research Projects Activity (IARPA)
Overview
• This is about taking real risk.
• This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc.
• CAVEAT: high-risk/high-payoff is not a free pass for stupidity. Competent failure is acceptable; incompetence is not.
• “Best and brightest”: world-class PMs. IARPA will not start a program without a good idea and an exceptional person to lead its execution.
• Full and open competition to the greatest possible extent.
• Cross-community focus:
  • Address cross-community challenges
  • Leverage agency expertise (both operational and R&D)
  • Work transition strategies and plans
IARPA’s mission is to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries.
The “P” in IARPA is very important
• Technical and programmatic excellence are required.
• Each program will have a clearly defined and measurable end-goal, typically 3–5 years out. Intermediate milestones to measure progress are also required.
• Every program has a beginning and an end. A new program may build upon what was accomplished in a previous program, but it must compete against all other new programs.
• This approach, coupled with rotational PM positions, ensures that:
  • IARPA does not “institutionalize” programs
  • Fresh ideas and perspectives are always coming in
  • The status quo is always questioned
  • Only the best ideas are pursued, and only the best performers are funded
The “Heilmeier Questions”
1. What are you trying to do?
2. How does this get done at present? Who does it? What are the limitations of the present approaches? Are you aware of the state of the art, and have you thoroughly thought through all the options?
3. What is new about your approach? Why do you think you can be successful at this time? Given that you’ve provided clear answers to 1 & 2, have you created a compelling option? What does first-order analysis of your approach reveal?
4. If you succeed, what difference will it make? Why should we care?
5. How long will it take? How much will it cost? What are your mid-term and final exams? What is your program plan? How will you measure progress? What are your milestones/metrics? What is your transition strategy?
The Three Strategic Thrusts (Offices)
• Smart Collection: dramatically improve the value of collected data
  • Innovative modeling and analysis approaches to identify where to look and what to collect
  • Novel approaches to access
  • Innovative methods to ensure the veracity of data collected from a variety of sources
• Incisive Analysis: maximize insight from the information we collect, in a timely fashion
  • Advanced tools and techniques that will enable effective use of large volumes of multiple and disparate sources of information
  • Innovative approaches (e.g., using virtual worlds, shared workspaces) that dramatically enhance insight and productivity
  • Methods that incorporate socio-cultural and linguistic factors into the analytic process
  • Estimation and communication of uncertainty and risk
• Safe and Secure Operations: counter new capabilities of our adversaries that could threaten our ability to operate effectively in a networked world
  • Cybersecurity: focus on future vulnerabilities; approaches to advancing the “science” of cybersecurity, including the development of fundamental laws and metrics
  • Quantum information science & technology
Program Manager Interest Areas by Office (diagram, 20 April 2010): Smart Collection, Incisive Analysis, Safe and Secure Operations
Concluding Thoughts on IARPA
• Technical Excellence & Technical Truth
  • Scientific method
  • Peer/independent review
  • Full and open competition
• We are looking for outstanding PMs.
• How to find out more about IARPA: www.iarpa.gov
Conference on Technical Information Discovery, Extraction & Organization
• Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration
• Held October 28–29, 2008; consisted of talks, breakout sessions, and open discussion
• Attended by 30+ participants from research, business intelligence, and government
• Facilitated an open and active discussion on current methods, challenges, and opportunities in:
  • Information Retrieval
  • Text Processing
  • Knowledge Discovery
  • Information Extraction
  • Social Network Analysis
  • Scientometrics
  • Information Visualization
  • Closely related research domains
• Goal: drive technical innovation and explore novel applications in systematically mining the global technical literature for useful and non-obvious information and insights
This talk is a personal summary of the materials presented and discussed at the conference.
M2 Information Content
• Formal Presentations
  • Mile-wide, Mile-deep, Mark Heiligman, IARPA
  • Information Retrieval, Scientometrics/Text Mining, and Literature-related Discovery and Innovation, Ron Kostoff, MITRE
  • From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of Arizona
  • Machine Learning for Extraction, Integration and Mining of Research Literature, Andrew McCallum, University of Massachusetts Amherst
  • Information Retrieval: The Path Ahead, Jamie Callan, Carnegie Mellon University
  • Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University
  • The Accuracy of a Map of Science: Measurement & Implications, Richard Klavans, SciTech Strategies, Inc.
  • Document Classification Using Nonnegative Matrix Factorization, Michael W. Berry, University of Tennessee, Knoxville
• Breakout Sessions & Open Discussion – richest idea content, and biggest contribution to what follows
• MITRE Summary: A Two-step Analytic-workshop Process for Identifying Promising Research Opportunities, by Ronald Kostoff et al.
Problems
• Too Much Data / Diversity
  • Scale
  • Textual / Multimedia
  • Multilingual
  • Multiple Sources
• Too Complex
  • Motivation (Create / Disseminate)
  • Topics / Domains (# / Connectedness)
  • Shared Intentionally or Not
• Too Fast – Streaming
Example for technical topics: scientific literature, patents, conference proceedings, talks, technical blogs, S&T news, social media, experimental data, computational models / code, forecasts, corporate filings, government funding, policy, public opinion, etc.
Weak Signals in Context
• Find weak signals
• Use weak signals within context for:
  • Finding connections
  • Anomaly detection / rare events
  • Cultural meaning / implications
• Manage uncertainty
• Develop new standards for “ground truth”
Connecting Weak Signals
• Automated connection making / knowledge discovery
  • Iterative information retrieval (IR), extraction (IE), and linkage identification
  • Leveraging previous relevancy judgments and feedback
  • Probabilistic linking of subjective qualities within text
• Goal: find high-value, low-signature information in context
(Slide graphic: an intriguing rumor from an uncertain source, “material processing method X may be interesting for property Y!”, reaches the analyst through a quantitative system.)
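The iterative retrieve-and-feedback loop above can be sketched as a toy term-frequency retriever with Rocchio-style relevance feedback. The mini-corpus, query terms, and 0.5 feedback weight are invented for illustration; a real system would use TF-IDF or learned weights and probabilistic link scoring.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of abstract snippets (invented for illustration).
DOCS = {
    "d1": "novel material processing method improves thermal property",
    "d2": "survey of machine translation for patent literature",
    "d3": "material processing rumors suggest unusual optical property",
}

def tf(text):
    """Raw term-frequency vector."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec):
    """Rank documents by cosine similarity to the query vector."""
    return sorted(DOCS, key=lambda d: cosine(query_vec, tf(DOCS[d])), reverse=True)

# Pass 1: query built from an intriguing rumor.
query = tf("material processing property")

# Analyst marks d3 relevant; Rocchio-style feedback folds its terms
# into the query (0.5 is an arbitrary feedback weight).
for term, weight in tf(DOCS["d3"]).items():
    query[term] += 0.5 * weight

# Pass 2: the re-weighted query pulls the rumor-bearing document to the top.
ranking = search(query)
```

The feedback step is what lets previous relevancy judgments sharpen later retrieval passes, which is the core of the iterative IR/IE loop the slide describes.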
Enhancing Contextual Awareness
• Automatically:
  • Leverage element characteristics in the connection-building process
  • Focused information augmentation from secondary sources
  • Characterize and apply to analogous situations
  • Network behaviors and features
  • Assessments of subjectivity (e.g., theme, sentiment)
• Goal: rapidly inform non-experts with context about a given area/issue
(Slide graphic: an analyst asks “Where does this nugget of information fit?”; context is drawn from the S&T literature and the web.)
Identifying Outliers, Rare Events
• Automatically:
  • Measuring and analyzing low-frequency indicators in group trends
  • Systematically identifying anomalies from records of interest and early-stage emerging technologies
  • Identifying rare events based on non-technical phrase association patterns
  • Extracting technical phrases of interest by targeting non-technical phrases such as sentiment, analysis, stylistics, etc.
  • Intelligent clustering techniques
• Goal: identify significant rare events
(Slide graphic: an analyst asks “Is Jim doing something illegal?” based on bank statements.)
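One minimal way to flag low-frequency spikes like those above is a modified z-score over historical counts. The weekly mention counts and the 3.5 cutoff (a common rule of thumb for modified z-scores) are illustrative assumptions, not part of the original slides.

```python
import statistics

# Hypothetical weekly mention counts for a technical phrase; the final
# week's spike is the "rare event" we want to surface (invented data).
weekly_mentions = [3, 4, 2, 5, 3, 4, 2, 3, 41]

def flag_rare_events(counts, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Median/MAD statistics are used instead of mean/stdev so the very
    outliers being hunted do not inflate the baseline and mask themselves.
    """
    med = statistics.median(counts)
    mad = statistics.median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread: nothing can be called an outlier
    return [i for i, c in enumerate(counts)
            if 0.6745 * abs(c - med) / mad > threshold]
```

With this data, a plain mean/stdev z-score would score the spike below 3 because the spike itself inflates the standard deviation; the robust version flags it cleanly.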
Collaboration (Two Different Kinds)
• Common playground facilitating:
  • Large-scale data sharing
  • Data discovery and annotation
  • Error corrections
  • Multi-source integration
  • Recall of what has been done in the past
• Measure collaboration:
  • Recognize cultural differences
  • Discover key players
  • Process changes over time
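Discovering key players can start from simple degree centrality over a collaboration graph. The co-authorship edges below are invented for illustration; a production system would use richer centrality measures and temporal slicing to track process changes over time.

```python
# Hypothetical co-authorship edges mined from paper metadata (invented).
coauthor_edges = [
    ("chen", "li"), ("chen", "park"), ("chen", "ivanov"),
    ("li", "park"), ("ivanov", "sato"),
]

def key_players(edges, top_n=2):
    """Rank authors by degree centrality: number of distinct collaborators."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return sorted(neighbors, key=lambda n: len(neighbors[n]), reverse=True)[:top_n]
```

Degree centrality is the crudest useful signal; betweenness or eigenvector centrality would surface brokers who connect otherwise separate communities.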
Multilingual Methods
• Need algorithms that can process, filter, and analyze multilingual data
  • Leverage domain-specific machine translation
  • Compare and contrast translated and multilingual data for improvements in queries, trends, etc.
• Language translation is costly, and translation alone is not enough to understand meaning in non-English text
• Cultural information helps in understanding the social landscape, motivation, and scientific production in S&T
No Black Boxes
• No algorithmic black boxes
  • Shared environment for algorithm development
  • Success verifiable through indicator metrics
• Output must be humanly comprehensible
• Human comprehension metrics:
  • Number of potential associations
  • Number of dimensions simultaneously analyzed
  • Steps required to find information
  • Amount of time to digest information
  • Amount of information presented at a time
  • Efficiency of user-driven tuning of level-of-detail
• Algorithmic output exportable to interactive tools
User-Friendly Displays for Data Analysis
• Interactive and multifaceted views of the scientific landscape:
  • Geo-location
  • Entity networks
  • Topical networks
• Environments that provide both contextual awareness and visualizations
  • Contextual information (Wikipedia-style) provided when the user encounters an unfamiliar term or concept
• Interactive interfaces to pull out information
Metric Validation Processes
• User studies and human labeling to verify information extraction (IE) and NLP output are costly
• Use hybrid methods (e.g., boosting)
• Leverage automatically processed information from an external source to validate output
• Automate identification of trusted sources to help the validation process
• Validate results with historical studies, knowledge of the current state, and forecasts
Serious Need for Novel Thinking
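Validating extractor output against a trusted external source, as suggested above, reduces to an agreement measurement. The (author, affiliation) pairs below are invented, and precision/recall stand in for whatever indicator metrics a program would actually define.

```python
# Hypothetical (author, affiliation) pairs: extractor output vs. a
# trusted external registry (both invented for illustration).
extracted = {("a. smith", "univ x"), ("b. jones", "lab y"), ("c. wu", "univ z")}
registry = {("a. smith", "univ x"), ("c. wu", "univ z"), ("d. lee", "inst q")}

def agreement(system, reference):
    """Precision and recall of system pairs against the reference source."""
    true_positives = len(system & reference)
    precision = true_positives / len(system)
    recall = true_positives / len(reference)
    return precision, recall

precision, recall = agreement(extracted, registry)
```

Computed over many automatically identified trusted sources, this kind of agreement score substitutes for some of the costly human labeling, though it inherits whatever errors the external source contains.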
Things to Remember
• Track uncertainty
  • Indicator metrics
  • Weak signals
• No black boxes
  • Human-comprehensible output
• Provide a clear view of evaluation metrics
  • Gold standards
  • Ground truth
Take Action
• Respond to an open BAA
• Chat with a Program Manager (PM)
• Come up with new ideas for programs; become a PM
• Provide information to open RFIs