The LEAD Gateway
Dennis Gannon, Beth Plale, Suresh Marru, Marcus Christie
School of Informatics, Indiana University
Overview
• The LEAD ITR Project
• Science Objectives
• Adaptive CyberInfrastructure for Mesoscale Storm Prediction
• A tour of the LEAD project
• Components of our approach to Data and Data Driven Adaptive Workflow
• Experience so far
• The Gateway Lifecycle
Predicting Storms
• Hurricanes and tornadoes cause massive loss of life and damage to property
• The underlying physical systems involve highly non-linear dynamics, so they are computationally intense
• Data comes from multiple sources
  • “Real time”: derived from streams of data from sensors
  • Archived in databases of past storms
• Infrastructure challenges:
  • Data mine instrument radar data for storms
  • Allocate supercomputer resources automatically to run forecast simulations
  • Monitor results and retarget instruments
  • Log provenance and metadata about experiments for auditing
The LEAD Project
Traditional Methodology
• STATIC OBSERVATIONS: Radar Data, Mobile Mesonets, Surface Observations, Upper-Air Balloons, Commercial Aircraft, Geostationary and Polar Orbiting Satellite, Wind Profilers, GPS Satellites
• Analysis/Assimilation: Quality Control, Retrieval of Unobserved Quantities, Creation of Gridded Fields
• Prediction/Detection: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS, Private Companies, Students
The Process is Entirely Serial and Static (Pre-Scheduled): No Response to the Weather!
The LEAD Vision: Adaptive Cyberinfrastructure
• DYNAMIC OBSERVATIONS
• Analysis/Assimilation: Quality Control, Retrieval of Unobserved Quantities, Creation of Gridded Fields
• Prediction/Detection: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS, Private Companies, Students
• Models and Algorithms Driving Sensors
The CS challenge: build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response.
Change the Paradigm
• To make fundamental advances we need:
  • Adaptivity in the computational model
  • But also cyberinfrastructure to:
    • Execute complex scenarios in response to weather events
      • Stream processing, triggers
      • Close the loop with the instruments
    • Acquire computational resources on demand
      • Need supercomputer-scale resources
      • Invoked in response to weather events
    • Deal with the data deluge
      • Users can no longer manage their own experiment products
The LEAD Gateway Portal
• To support three classes of users:
  • Meteorology research scientists and grad students
  • Undergrads in meteorology classes
  • People who want easy access to weather data
Go to: http://www.leadproject.org
Gateway Components
• A Framework for Discovery
• Four basic components:
  • Data discovery
    • Catalogs and index services
  • The experiment
    • Computational workflow managing on-demand resources
  • Data analysis and visualization
  • Data product preservation
    • Automatic metadata generation and experimental data provenance
Data Search
• Select a region, a time range, and the desired attributes
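As a rough illustration of the search described on this slide, the Python sketch below filters a small in-memory catalog by bounding box, time window, and required attributes. The `Region` and `DataProduct` types, their field names, and the catalog entries are all hypothetical and are not the LEAD catalog schema or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Region:
    """Latitude/longitude bounding box in degrees (hypothetical type)."""
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass(frozen=True)
class DataProduct:
    """One catalog entry; field names are illustrative, not LEAD's schema."""
    name: str
    lat: float
    lon: float
    time: datetime
    attributes: frozenset


def search(catalog, region, start, end, wanted):
    """Return entries inside the region and time range that carry every wanted attribute."""
    wanted = frozenset(wanted)
    return [
        p for p in catalog
        if region.contains(p.lat, p.lon)
        and start <= p.time <= end
        and wanted <= p.attributes
    ]


# Made-up catalog entries, for illustration only.
catalog = [
    DataProduct("radar-a", 35.2, -97.4, datetime(2007, 5, 4, 18),
                frozenset({"reflectivity", "velocity"})),
    DataProduct("radar-b", 40.0, -86.5, datetime(2007, 5, 4, 18),
                frozenset({"reflectivity"})),
    DataProduct("surface-c", 35.5, -97.0, datetime(2007, 6, 1, 12),
                frozenset({"temperature"})),
]
southern_plains = Region(south=33.0, north=37.0, west=-100.0, east=-94.0)
hits = search(catalog, southern_plains,
              datetime(2007, 5, 1), datetime(2007, 5, 31), {"reflectivity"})
print([p.name for p in hits])  # prints ['radar-a']
```

A real catalog or index service would push these predicates down to the server; the sketch only shows how the three selection criteria combine.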
Portal: Experimental Data & Metadata Space
• CyberInfrastructure extends the user’s desktop to incorporate a vast data analysis space.
• As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
• The Portal provides ways to explore, search, and discover this data.
• Metadata about experiments is largely automatically generated and highly searchable.
  • It describes each data object (the file) in application-rich terms and provides a URI to a data service that can resolve an abstract unique identifier to a real, on-line data “file”.
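The last point, resolving an abstract identifier to an on-line file, can be sketched in Python as a simple lookup service. This is a minimal sketch under stated assumptions: `DataResolver`, the `urn:example:` identifier, and the `example.org` URLs are hypothetical and do not describe the actual LEAD data service.

```python
class DataResolver:
    """Maps abstract unique identifiers to current file locations (hypothetical)."""

    def __init__(self):
        self._locations = {}

    def register(self, uid: str, url: str) -> None:
        """Record, or update, where the data behind an identifier currently lives."""
        self._locations[uid] = url

    def resolve(self, uid: str) -> str:
        """Return the current location, or raise LookupError if none is registered."""
        if uid not in self._locations:
            raise LookupError(f"no on-line copy registered for {uid!r}")
        return self._locations[uid]


resolver = DataResolver()
resolver.register("urn:example:forecast-0001",
                  "https://data.example.org/store-a/forecast-0001.nc")

# The metadata record stores only the abstract identifier plus descriptive terms.
record = {"id": "urn:example:forecast-0001", "description": "gridded forecast output"}

# If the file moves to other storage, only the resolver entry changes.
resolver.register("urn:example:forecast-0001",
                  "https://data.example.org/store-b/forecast-0001.nc")
print(resolver.resolve(record["id"]))
# prints https://data.example.org/store-b/forecast-0001.nc
```

The indirection is the point of the design: metadata stays valid when back-end storage is reorganized, because it never records a physical path.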
Workflow: Composing Computational Tools to Build New Tools
• Workflow is a term that describes the process of moving data through a sequence of analysis and transformation steps to achieve a goal.
• Another paradigm shift for the users.
• Each activity a user initiates in LEAD is an Experiment, which consists of:
  • Data discovery and collection
  • Applied analysis and transformation
    • A graph of activities (workflow)
  • Curated data products and results
• Each workflow activity is logged using an event system and stored as metadata in the user’s workspace.
  • Provides a complete provenance of the work.
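To make the "graph of activities" and the event log concrete, here is a minimal Python sketch that runs steps in dependency order and appends a start and finish event for each one. The step names, the event dictionary layout, and `run_workflow` itself are assumptions for illustration; the LEAD workflow engine and its event system are not shown in the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, depends_on, log):
    """Run each step after its dependencies, recording events for provenance.

    steps:      name -> function taking a dict of upstream results
    depends_on: name -> list of step names it needs (every name must be in steps)
    log:        list that receives one event dict per start and finish
    """
    graph = {name: depends_on.get(name, ()) for name in steps}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        log.append({"step": name, "event": "started", "inputs": sorted(upstream)})
        results[name] = steps[name](upstream)
        log.append({"step": name, "event": "finished", "output": results[name]})
    return results


# Toy stand-ins for real analysis steps (hypothetical names and values).
steps = {
    "fetch_radar": lambda up: [3, 5, 8],
    "assimilate": lambda up: sum(up["fetch_radar"]),
    "forecast": lambda up: up["assimilate"] * 2,
}
depends_on = {"assimilate": ["fetch_radar"], "forecast": ["assimilate"]}

events = []
results = run_workflow(steps, depends_on, events)
print(results["forecast"])  # prints 32
print(len(events))          # prints 6
```

Because every step writes what it consumed and what it produced, the event list alone is enough to reconstruct how the final product was derived.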
The Experiment Builder
• A Portal “wizard” that leads the user through the set-up of a workflow
• Asks the user:
  • “Which workflow do you want to run?”
• Once this is known, it can prompt the user for the required input data sources
• Then it “launches” the workflow.
Parameter Selection
Selecting the forecast region
Gateway Support for Adaptive Queries
LEAD requires the ability to construct workflows that are:
• Data driven
  • Weather data streams define the nature of the computation
• Persistent and agile
  • Data mining of a data stream detects an “interesting” feature, and the event triggers a workflow scenario that has been waiting for months
• Adaptive
  • In response to weather: the weather changes
  • The nature of the workflow may have to change on the fly
  • Resources and requirements change
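The "persistent and agile" pattern above, a waiting scenario that fires when stream mining detects a feature, might be sketched in Python as follows. Everything here is an assumption for illustration: the reading format, the 50 dBZ threshold, and the `launch_forecast` stand-in are not taken from LEAD.

```python
def watch_stream(readings, is_interesting, launch):
    """Scan a stream of readings and launch a workflow for each interesting one."""
    launched = []
    for reading in readings:
        if is_interesting(reading):
            launched.append(launch(reading))
    return launched


THRESHOLD_DBZ = 50  # arbitrary illustrative threshold, not a LEAD setting


def strong_echo(reading):
    """Hypothetical detector: flag readings at or above the threshold."""
    return reading["reflectivity_dbz"] >= THRESHOLD_DBZ


def launch_forecast(reading):
    """Stand-in for submitting the waiting workflow scenario."""
    return f"forecast workflow for {reading['site']}"


stream = [
    {"site": "site-1", "reflectivity_dbz": 22},
    {"site": "site-2", "reflectivity_dbz": 57},
    {"site": "site-3", "reflectivity_dbz": 31},
]
print(watch_stream(iter(stream), strong_echo, launch_forecast))
# prints ['forecast workflow for site-2']
```

Passing an iterator rather than a list mirrors the streaming case: the watcher holds no history and can sit idle for as long as no reading matches.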
Experience with on-demand computing
• We use TeraGrid.
  • Actually “best effort” and not yet “on demand”
• Use Grid technology for remote job execution and security.
• Reliability is critical.
  • The workflow can automatically resubmit a failed task to another resource
• Urgent Computing is handled by the Spruce Gateway.
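Automatic resubmission of a failed task to another resource can be sketched as a simple failover loop. This is a minimal Python sketch, assuming a `submit(task, resource)` callable that raises `RuntimeError` on failure; the resource names and `fake_submit` are hypothetical, and real Grid job submission is not modeled.

```python
def run_with_failover(task, resources, submit):
    """Try each resource in order; return the first success and the failures seen."""
    failures = []
    for resource in resources:
        try:
            result = submit(task, resource)
        except RuntimeError as err:
            # Remember why this resource failed, then move on to the next one.
            failures.append((resource, str(err)))
        else:
            return resource, result, failures
    raise RuntimeError(f"{task!r} failed on every resource: {failures}")


def fake_submit(task, resource):
    """Stand-in for remote job submission; 'cluster-a' always fails."""
    if resource == "cluster-a":
        raise RuntimeError("node failure")
    return f"{task} done on {resource}"


used, result, failures = run_with_failover(
    "forecast-run", ["cluster-a", "cluster-b"], fake_submit
)
print(used)      # prints cluster-b
print(failures)  # prints [('cluster-a', 'node failure')]
```

Returning the failure list alongside the result keeps the retries visible, so they can be recorded with the rest of the experiment's provenance.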
Validating Scientific Discovery
• The Gateway is becoming part of the process of science by being an active repository of data provenance
• Disks are cheap, so why not record everything?
• The system records each computational experiment that a user initiates
  • A complete audit trail of the experiment or computation
• Published results can include a link to provenance information for repeatability and transparency.
Experience so far
• First release to support “WxChallenge: the new collegiate weather forecast challenge”
  • The goal: “forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities”
  • “to provide students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation.”
  • 79 “users” ran 1,232 forecast workflows, generating 2.6 TB of data.
  • Over 160 processors were reserved on Tungsten from 10am to 8pm EDT (EST), five days each week
• National Spring Forecast
  • First use of user-initiated 2 km forecasts as part of that program. Generated serious interest from National Severe Storm Center.
• Integration with the CASA project is scheduled for the final year of the LEAD ITR.
The LEAD Gateway Lifecycle
• Work began in 2003 with requirements analysis by the LEAD meteorology and CS teams.
• First 2 years of development supported by the LEAD ITR and the NMI Portals project.
• Years 3 and 4: support of 2 FTE from TeraGrid (TG).
• Public release March 2007.
• Current status
  • A new production release in July 2007.
  • Last year of LEAD ITR: hardened version of the Gateway to transition to community support
    • UCAR - UNIDATA may be the host.
    • Extensive planning underway.