Discovery Systems: Accelerating Scientific Discovery at NASA

Discovery Systems: Accelerating Scientific Discovery at NASA. Barney Pell, Ph.D. NASA Ames Research Center Barney.D.Pell @@ nasa.gov Presentation at IAAI-04 panel on The Broader Role of Artificial Intelligence in Large-Scale Scientific Research.


Presentation Transcript


  1. Discovery Systems: Accelerating Scientific Discovery at NASA Barney Pell, Ph.D. NASA Ames Research Center Barney.D.Pell @@ nasa.gov Presentation at IAAI-04 panel on The Broader Role of Artificial Intelligence in Large-Scale Scientific Research

  2. Outline of Talk • Trends and Challenges affecting Scientific Discovery at NASA • Distributed Data Search, Access, and Analysis • Machine-Assisted Model Discovery and Refinement • Exploratory Environments and Collaboration • Vision for the future and summary of AI technologies • Closing remarks

  3. Science Discovery Acceleration • NASA conducts missions to take measurements that produce large amounts of data to support ambitious science goals • In-situ observation of deep space for origin and evolution of life • Earth-orbiting satellites for global cause and effect relationships • Biological experiments to support life in space • Too much work and expertise required to perform each of many steps in a discovery cycle to understand this data • Detailed knowledge of the heritage of data and models • Hard to invert through a complex processing pipeline • Constant reprocessing and reanalyzing as new info available • The specialized expertise slows the process and also restricts the set of users and scientists using NASA products

  4. Discovery Steps and Architectures • Examples of discovery steps - finding and organizing distributed data - assessing, filtering, cleaning and post-processing the data - reconciling the differences across diverse data - exploring the data sets to discover regularities - using the regularities to formulate and evaluate hypotheses - testing the hypotheses and comparing alternate hypotheses against each other - integrating the data into models - linking separate models together - running simulations to generate predictive data to compare against observations • Current technology programs address the difficulties of individual steps, typically in isolation • E.g., machine-learning algorithms detect regularities in underlying phenomena but also artifacts of the data collection/processing system • ML algorithms are developed without consideration of the deeper processes by which the data is generated, distributed, and used • Data systems are put together without characterizing the data stream to enable new users to analyze the data in unanticipated ways

  5. Trends affecting NASA • Improvements in sensors, communications, and computing • orders of magnitude more data, in more varieties, and at higher rates than ever before • NASA’s science questions are becoming increasingly large-scale and interdisciplinary • forming and evaluating theories across a wide variety of data • integrating a complex set of models produced by diverse communities of scientists • virtual projects comprising distributed teams • Socioeconomic demands are requiring increased quality • E.g., many customers for weather and climate model predictions • Need characterization of confidence in data, models, results • Faster feedback loops in observing/simulation systems • make it possible to gather more precise data, often in real time, if only we could understand the existing data quickly enough • NASA is required to enable public access to and benefit from the data to the same extent as the mission science team

  6. Distributed Search, Access and Analysis • Objective • Develop and demonstrate technologies to enable investigating interdisciplinary science questions by finding, integrating, and composing models and data from distributed archives, pipelines; running simulations, and running instruments. • Support interactive and complex query-formulation with constraints and goals in the queries; and resource-efficient intelligent execution of these tasks in a resource-constrained environment. • Milestone: Enable novel what-if and predictive question answering • Across NASA’s complex and heterogeneous data and simulations • By non data-specialists • Use world-knowledge and meta-data • Support query formulation and resource discovery • Example query: “Within 20%, what will be the water runoff in the creeks of the Comanche National Grassland if we seed the clouds over southern Colorado in July and August next year?”

  7. Terrestrial Biogeoscience Involves Many Complex Processes and Data [Figure: coupled processes linking chemistry (CO2, CH4, N2O, ozone, aerosols) and climate (temperature, precipitation, radiation, humidity, wind) to the land surface through exchanges of heat, moisture, momentum, CO2, CH4, N2O, VOCs, and dust, at three time scales. Minutes to hours: biogeophysics, biogeochemistry, microclimate, canopy physiology. Days to weeks: hydrology, phenology. Years to centuries: ecosystems, watersheds, disturbance (fires, hurricanes, ice storms, windthrows), vegetation dynamics, hydrologic cycle.] (Courtesy Tim Killeen and Gordon Bonan, NCAR)

  8. Solution Construction via Composing Models [Diagram: an evaporation model, a runoff model, a snow melt model, a climate model, and a data preparation step are linked by binary data streams and snow melt metadata; inputs shown include snow coverage, topography, and rainfall; sources shown include the snow and ice DAAC (NASA), the Nat. Weather Service, and USGS; the surface water community is labeled alongside the models. Each box is marked as a modeled phenomenon or a parameterized phenomenon.] • Service interface: required inputs, provided outputs, data descriptions, events • Each model typically has a community of experts that deal with the complexity of the model and its environment
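The slide describes each model as a service with required inputs and provided outputs. The idea of chaining such services can be sketched as below; this is a minimal illustration under stated assumptions, not the system presented in the talk. The `ModelService` class, the `compose` function, and the specific input and output names (including "temperature") are hypothetical and chosen only to echo the labels on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelService:
    """Hypothetical service description: what a model needs and what it provides."""
    name: str
    required_inputs: frozenset
    provided_outputs: frozenset


def compose(services, available, goal):
    """Order services so each one runs only after its inputs exist.

    `available` is the set of data products already on hand and `goal` is
    the product we want. Returns the service names in execution order, or
    raises ValueError if the goal cannot be reached.
    """
    have = set(available)
    remaining = list(services)
    plan = []
    while goal not in have:
        # Services whose required inputs are all currently available.
        runnable = [s for s in remaining if s.required_inputs <= have]
        if not runnable:
            raise ValueError(f"cannot produce {goal!r} from {sorted(have)}")
        for s in runnable:
            plan.append(s.name)
            have |= s.provided_outputs
            remaining.remove(s)
    return plan


# Illustrative (assumed) interfaces for three of the models named on the slide.
services = [
    ModelService("runoff model",
                 frozenset({"snow melt", "rainfall", "topography", "evaporation"}),
                 frozenset({"runoff"})),
    ModelService("snow melt model",
                 frozenset({"snow coverage", "temperature"}),
                 frozenset({"snow melt"})),
    ModelService("evaporation model",
                 frozenset({"temperature", "topography"}),
                 frozenset({"evaporation"})),
]
available = {"snow coverage", "temperature", "rainfall", "topography"}
plan = compose(services, available, "runoff")
print(plan)
```

This forward-chaining sketch runs every service that becomes runnable, so it may schedule models the goal does not need; a real planner would also weigh data descriptions, events, and resource cost.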

  9. Virtual Data Grid Example [Diagram: an application requests data by logical file name; interactions and operations proceed left to right. Three data types are of interest, linked by derivation, with primary data at the root. Components shown: Metadata Catalogue, Materialized Data Catalogue, Virtual Data Catalogue (how to generate the derived data), Abstract Planner (for materializing data), Concrete Planner (generates workflow), Grid workflow engine, Grid compute resources, Grid storage resources, Data Grid replica services. Flow: resolve the LFN; if the data is not materialized, look up its PERS, estimate the cost of generating it, ask whether to proceed, generate the exact steps, materialize the data, and notify the application that it exists at an LFN. Store an archival copy, if so requested. Record existence of cached copies.] • LFN = logical file name • PFN = physical file name • PERS = prescription for generating unmaterialized data • As illustrated, easy to deadlock w/o QoS and SLAs.
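The request flow on this slide (resolve a logical file name, and generate the data from its prescription if no copy exists) can be sketched as a small recursive lookup. This is a toy illustration under stated assumptions, not the grid software the slide depicts: the `materialize` function, the two dictionaries standing in for the Materialized Data Catalogue and the Virtual Data Catalogue, the data names, and the `cache://` and `archive://` locations are all hypothetical.

```python
def materialize(lfn, materialized, prescriptions, steps):
    """Return a physical location (PFN) for `lfn`, generating data if needed.

    materialized:  dict LFN -> PFN, standing in for the Materialized Data Catalogue.
    prescriptions: dict LFN -> list of input LFNs, standing in for the
                   Virtual Data Catalogue (the PERS for each derived product).
    steps:         list that records each generation step, in execution order.
    """
    if lfn in materialized:
        # A copy already exists: just resolve the LFN to its PFN.
        return materialized[lfn]
    if lfn not in prescriptions:
        raise KeyError(f"no copy of and no prescription for {lfn!r}")
    # Make sure every input exists before generating this product.
    inputs = [materialize(dep, materialized, prescriptions, steps)
              for dep in prescriptions[lfn]]
    steps.append((lfn, inputs))       # placeholder for running the workflow
    pfn = f"cache://{lfn}"            # hypothetical location of the new copy
    materialized[lfn] = pfn           # record the existence of the cached copy
    return pfn


# Assumed three-level chain: primary data and two derived products.
materialized = {"primary": "archive://primary"}
prescriptions = {"derived_1": ["primary"], "derived_2": ["derived_1"]}
steps = []
pfn = materialize("derived_2", materialized, prescriptions, steps)
print(pfn, steps)
```

The sketch has no cost estimate, no "proceed?" confirmation, and no cycle detection, and it does not model the resource contention behind the slide's deadlock remark.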

  10. Machine-assisted model discovery and refinement • Develop and demonstrate methods to • assist discovery of and fit physically descriptive models with quantifiable uncertainty for estimation and prediction • improve the use of observational or experimental data for simulation and assimilation applied to distributed instrument systems (e.g. sensor web) • integrate instrument models with physical domain modeling and with other instruments (fusion) to quantify error, correct for noise, improve estimates and instrument performance • Example metrics • 50% reduction in scientist time forming models • 10% reduction in uncertainty in parameter estimates or a 10% reduction in effort to achieve current accuracies • 10% reduction in computational costs associated with a forward model • ability to process data on the order of 1000s of dimensions • ability to estimate parameters from tera-scale data
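As a small illustration of what "fit a model with quantifiable uncertainty" can mean, the sketch below fits a straight line by ordinary least squares and reports a standard error for each parameter. The technique is a textbook stand-in chosen for brevity, not a method named in the talk; the `fit_line` function and the sample data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, se_slope, se_intercept), where the standard
    errors come from the residual variance with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares and the unbiased estimate of the noise variance.
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = ssr / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


# Made-up observations, used only to exercise the function.
slope, intercept, se_slope, se_intercept = fit_line([0, 1, 2, 3], [0, 1, 1, 2])
print(f"slope = {slope:.3f} +/- {se_slope:.3f}")
print(f"intercept = {intercept:.3f} +/- {se_intercept:.3f}")
```

A metric such as "10% reduction in uncertainty in parameter estimates" would then correspond to shrinking these standard errors, for example by fusing additional instrument data.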

  11. Prediction of the 97/98 El Niño [Figure: predicted precipitation for JFM 1998; time axis 1997 to 1999] A reasonable 15-month prediction of the 97/98 El Niño is achieved when ocean height, temperature and surface wind data are combined to initialize the model.

  12. Observing System of the Future • Partners • NASA • DoD • Other Govt • Commercial • International • Advanced Sensors • Information Synthesis • Access to Knowledge • Sensor Web User Community Information

  13. Exploratory Environments and Collaboration • Objective • Develop exploratory environments in which interdisciplinary and/or distributed teams visualize and interact with intelligently combined and presented data from such sources as distributed archives, pipelines, simulations, and instruments in networked environments. • Demonstrate that these environments measurably improve scientists’ capability to answer questions, evaluate models, and formulate follow-on questions and predictions.

  14. Multi-parameter Explorations

  15. Vision for future science

  16. Discovery Systems: AI Technology Elements • Distributed data search, access and analysis: grid-based computing and services; information retrieval; databases; planning, execution, agent architecture, multi-agent systems; knowledge representation and ontologies • Machine-assisted model discovery and refinement: information and data fusion; data mining and machine learning; modeling and simulation languages • Exploratory environments and collaboration: visualization; human-computer interaction; computer-supported collaborative work; cognitive models of science

  17. Closing remarks • NASA science is challenging • Need to improve in existing capabilities and address emerging trends • AI technologies have a crucial role for future science: Distributed Data Search, Access, and Analysis; Machine-Assisted Model Discovery and Refinement; Exploratory Environments and Collaboration • Many of these themes are shared with science (or research) at large
