Searching for the Quantifiable, Scalable, Verifiable, and Understandable
Quantitative Methods in Defense of National Security
25 May 2010
Dewey Murdick, Ph.D., Program Manager
Overview
• This is about taking real risk.
• This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc.
• CAVEAT: high-risk/high-payoff is not a free pass for stupidity. Competent failure is acceptable; incompetence is not.
• “Best and brightest”: world-class PMs. IARPA will not start a program without a good idea and an exceptional person to lead its execution.
• Full and open competition to the greatest possible extent.
• Cross-community focus:
  • Address cross-community challenges
  • Leverage agency expertise (both operational and R&D)
  • Work transition strategies and plans
IARPA’s mission is to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries.
The “P” in IARPA is very important
• Technical and programmatic excellence are required.
• Each program will have a clearly defined and measurable end-goal, typically 3–5 years out. Intermediate milestones to measure progress are also required.
• Every program has a beginning and an end. A new program may be started that builds upon what a previous program accomplished, but that new program must compete against all other new programs.
• This approach, coupled with rotational PM positions, ensures that:
  • IARPA does not “institutionalize” programs
  • Fresh ideas and perspectives are always coming in
  • The status quo is always questioned
  • Only the best ideas are pursued, and only the best performers are funded
The “Heilmeier Questions”
• What are you trying to do?
• How does this get done at present? Who does it? What are the limitations of the present approaches? Are you aware of the state of the art, and have you thoroughly thought through all the options?
• What is new about your approach? Why do you think you can be successful at this time? Given clear answers to the first two questions, have you created a compelling option? What does first-order analysis of your approach reveal?
• If you succeed, what difference will it make? Why should we care?
• How long will it take? How much will it cost? What are your mid-term and final exams? What is your program plan? How will you measure progress? What are your milestones/metrics? What is your transition strategy?
The Three Strategic Thrusts (Offices)
• Smart Collection: dramatically improve the value of collected data
  • Innovative modeling and analysis approaches to identify where to look and what to collect
  • Novel approaches to access
  • Innovative methods to ensure the veracity of data collected from a variety of sources
• Incisive Analysis: maximize insight from the information we collect, in a timely fashion
  • Advanced tools and techniques that enable effective use of large volumes of multiple and disparate sources of information
  • Innovative approaches (e.g., virtual worlds, shared workspaces) that dramatically enhance insight and productivity
  • Methods that incorporate socio-cultural and linguistic factors into the analytic process
  • Estimation and communication of uncertainty and risk
• Safe and Secure Operations: counter new adversary capabilities that could threaten our ability to operate effectively in a networked world
  • Cybersecurity: focus on future vulnerabilities; approaches to advancing the "science" of cybersecurity, including the development of fundamental laws and metrics
  • Quantum information science & technology
Program Manager Interest Areas by Office
[Slide graphic, dated 20 April 2010: a chart of Program Manager interest areas grouped under the three offices — Safe and Secure Operations, Incisive Analysis, and Smart Collection.]
Concluding Thoughts on IARPA
• Technical excellence & technical truth: scientific method, peer/independent review, full and open competition
• We are looking for outstanding PMs.
• How to find out more about IARPA: www.iarpa.gov
Conference on Technical Information Discovery, Extraction & Organization
• Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration
• Held October 28–29, 2008; consisted of talks, breakout sessions, and open discussion
• Attended by 30+ participants from research, business intelligence, and government
• Facilitated an open and active discussion of current methods, challenges, and opportunities in: information retrieval, text processing, knowledge discovery, information extraction, social network analysis, scientometrics, information visualization, and closely related research domains
• Goal: drive technical innovation and explore novel applications in systematically mining the global technical literature for useful and non-obvious information and insights
This talk is a personal summary of the materials presented and discussed at the conference.
M2 Information Content
• Formal presentations:
  • Mile-wide, Mile-deep, Mark Heiligman, IARPA
  • Information Retrieval, Scientometrics/Text Mining, and Literature-related Discovery and Innovation, Ron Kostoff, MITRE
  • From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of Arizona
  • Machine Learning for Extraction, Integration and Mining of Research Literature, Andrew McCallum, University of Massachusetts Amherst
  • Information Retrieval: The Path Ahead, Jamie Callan, Carnegie Mellon University
  • Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University
  • The Accuracy of a Map of Science: Measurement & Implications, Richard Klavans, SciTech Strategies, Inc.
  • Document Classification Using Nonnegative Matrix Factorization, Michael W. Berry, University of Tennessee, Knoxville
• Breakout sessions & open discussion: the richest idea content, and the biggest contribution to what follows
• MITRE summary: A Two-step Analytic-workshop Process for Identifying Promising Research Opportunities, Ronald Kostoff et al.
Problems
• Too much data / diversity: scale; textual/multimedia; multilingual; multiple sources
• Too complex: motivation (create/disseminate); topics/domains (number and connectedness); shared intentionally or not
• Too fast: streaming
Example for technical topics: scientific literature, patents, conference proceedings, talks, technical blogs, S&T news, social media, experimental data, computational models/code, forecasts, corporate filings, government funding, policy, public opinion, etc.
Weak Signals in Context
• Find weak signals (a toy burst-detection sketch follows this slide)
• Use weak signals within context for:
  • Finding connections
  • Anomaly detection / rare events
  • Cultural meaning / implications
• Manage uncertainty
• Develop new standards for “ground truth”
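To make "find weak signals" concrete, here is a minimal sketch of one common interpretation: flag terms whose frequency in the most recent time window jumps relative to their own history. The toy corpus, window structure, and z-score threshold are invented for illustration and are not from the talk.

```python
# A minimal sketch (not from the talk): flag "bursty" terms whose count in the
# latest time window is unusually high relative to their own history.
from collections import Counter
from statistics import mean, stdev

# Toy corpus: one list of extracted terms per time window (assumed data).
windows = [
    ["laser cooling", "ion trap", "qubit", "ion trap"],
    ["qubit", "laser cooling", "ion trap"],
    ["qubit", "qubit", "topological qubit", "topological qubit", "ion trap"],
]

def burst_terms(windows, z_threshold=1.5):
    """Return terms in the latest window whose count exceeds their historical
    mean by more than z_threshold standard deviations (or any rise at all if
    the history is flat)."""
    history, latest = windows[:-1], Counter(windows[-1])
    bursts = []
    for term, count in latest.items():
        past = [Counter(w)[term] for w in history]
        mu = mean(past)
        sigma = stdev(past) if len(past) > 1 else 0.0
        if sigma == 0.0:
            if count > mu:
                bursts.append(term)
        elif (count - mu) / sigma > z_threshold:
            bursts.append(term)
    return bursts

print(burst_terms(windows))  # -> ['qubit', 'topological qubit']
```

In a real pipeline the term lists would come from information extraction over the streaming sources listed earlier, and the flat z-score test would likely be replaced by a proper burst-detection model.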
Connecting Weak Signals
• Automated connection making / knowledge discovery: iterative information retrieval (IR), extraction (IE), and linkage identification
• Leveraging previous relevancy judgments and feedback
• Probabilistic linking of subjective qualities within text
• Goal: find high-value, low-signature information in context (see the retrieval-feedback sketch below)
[Slide graphic: an analyst with a quantitative system links intriguing rumors from uncertain sources into the insight “material processing method X may be interesting for property Y.”]
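The first two bullets describe iterative retrieval that leverages prior relevancy judgments. Here is a minimal sketch of that loop using a Rocchio-style query update over TF-IDF vectors; the documents, query, and update weight are illustrative assumptions, not any specific IARPA system.

```python
# A minimal sketch of iterative retrieval with Rocchio-style relevance
# feedback; documents, query, and weights are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "novel sintering method improves ceramic hardness",
    "ceramic coatings for turbine blades",
    "rumor of a new material processing technique",
    "stock prices of turbine manufacturers",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

q = vec.transform(["material processing method"]).toarray()
for step in range(2):
    sims = cosine_similarity(q, X).ravel()
    top = sims.argsort()[::-1][:2]              # retrieve the 2 best matches
    print(f"iteration {step}: top docs {top.tolist()}")
    centroid = np.asarray(X[top].mean(axis=0))  # treat top hits as relevant
    q = q + 0.5 * centroid                      # Rocchio update (beta = 0.5)
```

In practice the "relevant" set would come from analyst feedback rather than blind pseudo-relevance, which is what lets prior relevancy judgments accumulate across iterations.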
Enhancing Contextual Awareness
• Automatically:
  • Leverage element characteristics in the connection-building process
  • Focused information augmentation from secondary sources
  • Characterize and apply to analogous situations
• Network behaviors and features
• Assessments of subjectivity (e.g., theme, sentiment)
• Goal: rapidly inform non-experts with context about a given area/issue (see the augmentation sketch below)
[Slide graphic: an analyst asks “Where does this nugget of information fit?” while context is drawn from S&T literature and the web.]
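As a toy illustration of "focused information augmentation from secondary sources" combined with a crude subjectivity assessment, the sketch below attaches glossary context and a sentiment-cue count to an extracted mention. The `GLOSSARY`, `POSITIVE_CUES`, and `augment` helper are hypothetical stand-ins for real secondary sources and sentiment models.

```python
# A minimal sketch of contextual augmentation: attach background from a
# secondary source and a crude subjectivity cue to an extracted mention.
# GLOSSARY and POSITIVE_CUES are hypothetical stand-ins, not real resources.
GLOSSARY = {
    "spark plasma sintering": "rapid powder-consolidation technique",
}
POSITIVE_CUES = {"promising", "breakthrough", "interesting"}

def augment(mention: str, sentence: str) -> dict:
    """Return the mention with glossary context and a sentiment-cue count."""
    words = set(sentence.lower().split())
    return {
        "mention": mention,
        "context": GLOSSARY.get(mention.lower(), "no background found"),
        "positive_cues": len(words & POSITIVE_CUES),
    }

print(augment("Spark Plasma Sintering",
              "Group X reports promising results with Spark Plasma Sintering"))
```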
Identifying Outliers, Rare Events
• Automatically:
  • Measuring and analyzing low-frequency indicators in group trends
  • Systematically identifying anomalies from records of interest and early-stage emerging technologies
  • Identifying rare events based on non-technical phrase association patterns
  • Extracting technical phrases of interest by targeting non-technical phrases (e.g., sentiment analysis, stylistics)
  • Intelligent clustering techniques
• Goal: identify significant rare events (see the outlier-detection sketch below)
[Slide graphic: an analyst asks “Is Jim doing something illegal?” while reviewing bank statements.]
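One simple way to realize "systematically identifying anomalies from records of interest" is to vectorize the records and apply an off-the-shelf outlier detector. The sketch below uses scikit-learn's IsolationForest over TF-IDF features; the records and contamination rate are invented, and a real system would use far richer features than raw text.

```python
# A minimal sketch of record-level anomaly detection: TF-IDF features plus an
# isolation forest. The records and contamination rate are invented examples.
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

records = [
    "monthly supplier payment processed",
    "monthly supplier payment processed late",
    "monthly supplier payment processed on time",
    "large wire transfer to unlisted offshore account",  # the intended outlier
]
X = TfidfVectorizer().fit_transform(records).toarray()
labels = IsolationForest(contamination=0.25, random_state=0).fit_predict(X)
for record, label in zip(records, labels):
    if label == -1:                              # -1 marks an outlier
        print("anomalous:", record)
```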
Collaboration (Two Different Kinds)
• A common playground facilitating:
  • Large-scale data sharing
  • Data discovery annotation
  • Error corrections
  • Multi-source integration
  • Recall of what has been done in the past
• Measure collaboration:
  • Recognize cultural differences
  • Discover key players (see the centrality sketch below)
  • Process changes over time
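"Discover key players" maps naturally onto network centrality over a co-authorship graph. Below is a minimal sketch with networkx; the author lists are invented examples, not real publication data.

```python
# A minimal sketch of "discover key players": build a weighted co-authorship
# graph and rank authors by betweenness centrality. Author lists are invented.
from itertools import combinations
import networkx as nx

papers = [
    ["chen", "li", "park"],
    ["chen", "park"],
    ["li", "novak"],
    ["novak", "singh", "chen"],
]
G = nx.Graph()
for authors in papers:
    for a, b in combinations(authors, 2):        # every co-author pair
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

centrality = nx.betweenness_centrality(G)
key_players = sorted(centrality, key=centrality.get, reverse=True)[:2]
print("key players:", key_players)
```

Betweenness is one reasonable choice here because it highlights brokers between otherwise separate groups; degree or eigenvector centrality would surface different notions of "key".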
Multilingual Methods
• Need algorithms that can process, filter, and analyze multilingual data (see the routing sketch below)
  • Leverage domain-specific machine translation
  • Compare and contrast translated and multilingual data for improvements in queries, trends, etc.
• Language translation is high cost
• Translation alone is not enough to understand meaning in non-English text: cultural information helps in understanding the social landscape, motivation, and production of scientists in S&T
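Here is a minimal sketch of multilingual routing: a naive stopword-overlap language guess, standing in for a real language detector or domain-tuned machine translation, that sends each document to a language-specific pipeline. The stopword lists and pipeline names are assumptions made for the example.

```python
# A minimal sketch of multilingual routing with a naive stopword-overlap
# language guess; the stopword lists and pipeline names are assumptions, and
# a real system would use a proper detector and domain-specific MT.
STOPWORDS = {
    "en": {"the", "and", "of", "with"},
    "de": {"der", "und", "mit", "das"},
    "fr": {"le", "et", "avec", "des"},
}

def guess_language(text: str) -> str:
    """Pick the language whose stopword list overlaps the text most."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

def route(doc: str) -> None:
    lang = guess_language(doc)
    # A real system would hand non-English text to domain-tuned MT here.
    print(f"[{lang}] -> pipeline_{lang}:", doc)

route("Der Bericht und das Experiment mit Ionenfallen")
route("The report and the experiment with ion traps")
```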
No Black Boxes
• No algorithmic black boxes: a shared environment for algorithm development; success verifiable through indicator metrics
• Output must be humanly comprehensible
• Human comprehension metrics:
  • Number of potential associations
  • Number of dimensions simultaneously analyzed
  • Steps to finding information
  • Amount of time to digest information
  • Amount of information at a time
  • Efficiency of user-driven tuning of level-of-detail
• Algorithmic output exportable to interactive tools
User-Friendly Displays for Data Analysis
• Interactive and multifaceted views of the scientific landscape: geo-location, entity networks, topical networks (see the sketch below)
• Environments that provide both contextual awareness and visualizations
• Contextual information (Wikipedia-style) provided when the user encounters an unfamiliar term or concept
• Interactive interfaces to pull out information
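As one low-fidelity illustration of a "topical network" view, the sketch below renders a small weighted topic graph with networkx and matplotlib. The topics and edge weights are invented, and a static plot only approximates the interactive displays the slide calls for.

```python
# A minimal sketch of a topical-network view: a small weighted topic graph
# drawn with networkx/matplotlib. Topics and weights are invented, and a
# static plot only approximates the interactive displays described above.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("ion traps", "qubits", 3),
    ("qubits", "error correction", 2),
    ("ion traps", "laser cooling", 4),
])
pos = nx.spring_layout(G, seed=42)                # deterministic layout
widths = [G[u][v]["weight"] for u, v in G.edges]  # width ~ co-occurrence
nx.draw_networkx(G, pos, width=widths, node_color="lightsteelblue", font_size=8)
plt.axis("off")
plt.show()
```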
Metric Validation Processes
• User studies and human labeling to verify information extraction (IE) and NLP output are costly
• Use hybrid methods (e.g., boosting)
• Leverage automatically processed information from an external source to validate output (see the sketch below)
• Automate the identification of trusted sources to help the validation process
• Validate results against historical studies, knowledge of the current state, and forecasts
Serious need for novel thinking
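The idea of validating extractor output against an automatically processed external source can be sketched as a simple set comparison with precision/recall scoring, routing disagreements to a human reviewer. Both item sets below are invented, and `trusted` is a hypothetical stand-in for a vetted external source.

```python
# A minimal sketch of semi-automatic validation: score extractor output
# against an assumed trusted external source and route the remainder to a
# human reviewer. Both sets are invented; `trusted` is a hypothetical source.
extracted = {"spark plasma sintering", "ion trap", "cold fusion"}
trusted = {"spark plasma sintering", "ion trap", "laser cooling"}

true_positives = extracted & trusted
precision = len(true_positives) / len(extracted)
recall = len(true_positives) / len(trusted)
print(f"precision={precision:.2f} recall={recall:.2f}")

for item in sorted(extracted - trusted):          # unverified extractions
    print("needs human review:", item)
```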
Things to Remember
• Track uncertainty: indicator metrics, weak signals
• No black boxes: human-comprehensible output
• Provide a clear view of evaluation metrics: gold standards, ground truth
Take Action
• Respond to an open BAA
• Chat with a Program Manager (PM)
• Come up with new ideas for programs; become a PM
• Provide information to open RFIs