Geospatial Information, Fundamental Grid Challenges, and the Role of Standards Organizations
European Geoinformatics Workshop, Edinburgh, March 9, 2007
Dr. Craig A. Lee, lee@aero.org
The Aerospace Corporation (a non-profit, federally funded R&D center)
What’s the Motivation for All This?
• Geospatial data has immense practical value
  • It is often claimed that a large percentage of all data is geospatial in nature
  • Applicable across many domains
• The service architecture concept is gaining wide momentum
  • A natural concept for designing, deploying, and using distributed systems
  • Manages access to data, machines, and resources of all kinds for geographically distributed users
• A service architecture for geospatial data and tools is a clear win
• It is imperative to engage key stakeholders
But What’s the Larger Context?
• Geospatial systems are part of larger systems-of-systems that:
  • Automatically detect, ingest, and disseminate input data events
  • Automatically analyze the events and known data
  • Automatically plan responses
  • Execute workflows in a distributed fashion to enact the response
  • Let workflows respond dynamically to further events of interest
  • Operate securely and autonomously in an environment with only partial control and observability
• Focus: event-driven workflows, or dynamic workflows
  • Events are delivered to the decision-making elements that need to know
  • Decision makers plan and modify responses according to policy
  • Workflows are executed with distributed control in a dynamic environment
• Dynamic, Data-Driven Application Systems (DDDAS)
Motivation: DDDAS (slide by Frederica Darema, NSF)
Diagram: the OLD paradigm (serialized and static) connects Theory (first principles), Simulations (math modeling, phenomenology), and Experiment / Measurements / Field Data / User. The NEW PARADIGM (Dynamic Data-Driven Simulation Systems) connects the same elements through a dynamic feedback & control loop, with simulations extended to observation modeling and design.
Challenges: application simulations development, algorithms, computing systems support.
Examples of Applications Benefiting from the New Paradigm (NSF)
• Engineering (design and control)
  • Aircraft design, oil exploration, semiconductor manufacturing, structural engineering
  • Computing systems hardware and software design (performance engineering)
• Crisis management
  • Transportation systems (planning, accident response)
  • Weather, hurricanes/tornadoes, floods, fire propagation
• Medical
  • Customized surgery, radiation treatment, etc.
• Biomechanics/bioengineering
• Manufacturing/business/finance
  • Supply chain (production planning and control)
  • Financial trading (stock market, portfolio analysis)
DDDAS has the potential to revolutionize science, engineering, and management systems.
Fire Model (NSF)
• Sensible and latent heat fluxes from the ground and canopy fire → heat fluxes in the atmospheric model
• The fire’s heat fluxes are absorbed by the air over a specified extinction depth
• 56% of the fuel mass → H2O vapor
• 3% of the sensible heat is used to dry the ground fuel
• The ground heat flux is used to dry and ignite the canopy
• Coupled models:
  • Sensible and latent heat
  • Fire propagation
  • Atmospheric dynamics
Photo: Kirk Complex Fire (U.S.F.S.). Slide courtesy of Cohen/NCAR.
Forest Fires in the Context of a Sensor Network
Diagram: an Atmospheric Model, a Fire Propagation Model, and a Combustion Model, linked to Policy, Planning, Response and to Fire Fighters. Photo: Kirk Complex Fire (U.S.F.S.).
Diagram (reservoir and well management): Economic Modeling and Well Management; Production Forecasting; Reservoir Performance; Simulation Models; Visualization; Data Analysis; Multiple Realizations; Field Measurements; Data Management and Manipulation; Reservoir Monitoring; Field Implementation; Data Collections from Simulations and Field Measurements.
The NGS Program Develops Technology for Integrated Feedback & Control (F. Darema, NSF)
Diagram: Runtime Compiling System (RCS) and Dynamic Application Composition. Compilation path: Application Program, Compiler Front-End, Application Intermediate Representation, Compiler Back-End, Launch Application(s), Dynamically Link & Execute. Other labels: Application Model, Distributed Programming Model, Dynamic Analysis Situation, Performance Measurements & Models, Application Components & Frameworks, Distributed Computing Resources, Distributed Platform, Adaptable Computing Systems Infrastructure. Example components and platforms: tac-com, fire control, algorithm accelerator, databases, SAR, MPP, NOW, SP.
A DDDAS Model (Dynamic, Data-Driven Application Systems)
Diagram: Models and Computations each discover, ingest, and interact with sensors & actuators across a spectrum of physical systems, ranging from cosmological (10^-20 Hz) through humans (3 Hz) to subatomic (10^20 Hz). Both rest on a Computational Infrastructure (grids, perhaps?); an annotation reads "loads a behavior into the infrastructure".
Top-Level Concept: Integration of Event Notification and Workflow
Diagram: Sensed Events enter a Communication Domain and reach Decision Makers operating under Policy; a decision maker produces an Abstract Plan, which becomes a Concrete Action (the Response) through discovery in a Resource Info and Mgmt Service with which resources register.
Top-Level Concept (build): the same diagram, with the Communication Domain shown as a Content-Based Routing Domain.
Top-Level Concept (build): the same diagram, with the Decision Makers shown as Persistent Decision-making Computations Determined by Policy.
Top-Level Concept (build): the same diagram, with the Resource Info and Mgmt Service shown as a Grid Information Service.
Top-Level Concept (build): the same diagram, with the path from Abstract Plan to Concrete Action shown as Dynamic Grid Workflow Management.
Required Capabilities
• Events delivered to the decision-making elements that need to know
  • Event notification service managed by publish/subscribe
    • Pre-defined topics
    • Publication advertisements
    • User-defined attributes
  • Content-based routing (topology-aware communication)
• Decision makers plan responses as determined by policy
  • Semantic analysis to determine the “meaning” of sets of events
  • Planning: “path construction” from the current state to a goal state
  • Classic topics in artificial intelligence
• Resource information & management systems
  • Distributed, scalable, timely
  • Metadata schemas, ontologies
• Responses executed as distributed workflows
  • The workflow engine independently manages:
    • Scheduling of data transfer
    • Scheduling of process execution
  • Centralized vs. distributed control
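The publish/subscribe requirement can be illustrated with a minimal, centralized sketch. It assumes an event is a flat dictionary of user-defined attributes and a subscription is a pre-defined topic plus an optional content filter; the class names, topics, and attributes are hypothetical and do not follow WS-Notification or any other specification. A real event notification service would distribute this matching across the network, which is what content-based routing adds.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # an event is a set of user-defined attributes


@dataclass
class Subscription:
    subscriber: str
    topic: str                          # pre-defined topic
    predicate: Callable[[Event], bool]  # content filter over event attributes


@dataclass
class Broker:
    """Centralized topic + content matching (hypothetical stand-in for a notification service)."""
    subs: List[Subscription] = field(default_factory=list)
    inbox: Dict[str, List[Event]] = field(default_factory=dict)

    def subscribe(self, subscriber: str, topic: str,
                  predicate: Callable[[Event], bool] = lambda e: True) -> None:
        self.subs.append(Subscription(subscriber, topic, predicate))
        self.inbox.setdefault(subscriber, [])

    def publish(self, topic: str, event: Event) -> List[str]:
        """Deliver the event to every subscriber whose topic and filter both match."""
        delivered = []
        for s in self.subs:
            if s.topic == topic and s.predicate(event):
                self.inbox[s.subscriber].append(event)
                delivered.append(s.subscriber)
        return delivered
```

For example, a hypothetical "fire-planner" subscribed to "sensor/fire" with the filter `lambda e: e["temp_c"] > 300` receives a 450 °C reading but not a 20 °C one, while an unfiltered "logger" on the same topic receives both.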
General Architecture for Topology-Aware Communication Services
Diagram: a peer-to-peer network. Subscription “signals” propagate through the P2P network; events published to the P2P network are then routed to the subscribers.
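One common way to realize "subscriptions propagate, events follow" is reverse-path forwarding, sketched below under simplifying assumptions: a static, hypothetical overlay, subscription signals flooded to every peer, and no subscription merging, failures, or security. Each peer remembers the neighbor a subscription signal arrived from, and events are forwarded only along those remembered hops.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


class Overlay:
    """Toy P2P overlay with reverse-path routing of subscriptions."""

    def __init__(self, edges: Iterable[Tuple[str, str]]):
        self.nbrs: Dict[str, Set[str]] = {}
        for a, b in edges:
            self.nbrs.setdefault(a, set()).add(b)
            self.nbrs.setdefault(b, set()).add(a)
        # routes[node][topic]: next hops toward subscribers (the node itself = local subscriber)
        self.routes: Dict[str, Dict[str, Set[str]]] = {n: {} for n in self.nbrs}

    def subscribe(self, node: str, topic: str) -> None:
        """Propagate a subscription signal outward; each peer records the neighbor
        it first heard it from, i.e. its next hop back toward the subscriber."""
        self.routes[node].setdefault(topic, set()).add(node)
        seen, queue = {node}, deque([node])
        while queue:
            cur = queue.popleft()
            for nb in self.nbrs[cur]:
                if nb not in seen:
                    seen.add(nb)
                    self.routes[nb].setdefault(topic, set()).add(cur)
                    queue.append(nb)

    def publish(self, node: str, topic: str) -> Set[str]:
        """Forward an event only along recorded routes; return the peers that deliver it locally."""
        delivered, seen, queue = set(), {node}, deque([node])
        while queue:
            cur = queue.popleft()
            for hop in self.routes[cur].get(topic, set()):
                if hop == cur:
                    delivered.add(cur)
                elif hop not in seen:
                    seen.add(hop)
                    queue.append(hop)
        return delivered
```

For an overlay A-B, B-C, C-D, B-E with a single subscriber at D, an event published at A travels A, B, C, D and never visits E. Production content-based routers typically also merge or prune overlapping subscriptions to keep routing state small.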
Many Types of Communication Services Improved or Enabled
• Augmented semantics
  • Caching (web caching), filtering, compression, encryption, quality of service, data transcoding, etc.
• Collective operations
  • Accomplished “in the network” rather than with point-to-point messages across the diameter of the grid
• Communication scope
  • Named topologies can denote a communication scope to limit problem size and improve performance
• Content- and policy-based networking
  • Publish/subscribe, interest management, event services, tuple spaces, quality of service
• Issues
  • Topology management/construction
  • Dynamic member join/departure
  • Reliability
  • Maintaining distributed state in the network
  • Security
    • Integrity, authentication, and authorization of signaling messages
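The advantage of doing collective operations "in the network" can be made concrete by counting link traversals. The sketch below assumes a static spanning tree over the overlay (a hypothetical five-node tree in the example) and ignores failures and timing: in-network summation sends one partial sum per tree link, whereas point-to-point delivery sends every value all the way to the root.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # parent -> children in a spanning tree of the overlay


def in_network_sum(children: Tree, values: Dict[str, int], root: str) -> Tuple[int, int]:
    """Each node adds its children's partial sums to its own value and sends a
    single message to its parent. Returns (total, link messages used)."""
    total, msgs = values[root], 0
    for child in children.get(root, []):
        sub_total, sub_msgs = in_network_sum(children, values, child)
        total += sub_total
        msgs += sub_msgs + 1  # one partial sum crosses the child -> parent link
    return total, msgs


def point_to_point_cost(children: Tree, root: str, depth: int = 0) -> int:
    """Link traversals if every node sends its own value directly to the root
    (a node at depth d costs d link crossings)."""
    return depth + sum(point_to_point_cost(children, c, depth + 1)
                       for c in children.get(root, []))
```

For the tree R → A → B → {C, D} with every value equal to 1, the in-network sum reaches 5 using 4 link messages, while point-to-point delivery needs 9 link traversals; the gap widens as the tree gets deeper.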
Grid Workflow Management
• Organization of distributed computing services
  • Rather than building applications with ad hoc, “hard-coded” task organization, workflow provides a general mechanism for organizing distributed tasks
• Independent scheduling of data transfer and process execution
  • A key capability for all workflow tools
  • The subsequent task may not exist when the previous task completes
  • Where the subsequent task will execute may not even be decided yet
  • Output data may have to be buffered until it is needed or can be used
Workflow Management Considerations
• Representation
  • Graphical (DAGs), syntactic (code, XML)
• Creation
  • Eager vs. lazy binding of services to physical resources
  • Eager vs. lazy binding of the workflow to services
  • Co-scheduling vs. incremental scheduling
• Data transfer
  • Streaming
  • Buffered channel
  • File transfer
• Data persistence and lifetime
  • How long does the data live where it is?
• Workflow engine: executes the workflow
  • Centralized? (“orchestration”)
  • Decentralized? (“choreography”)
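Several of these considerations (a DAG representation, lazy binding of tasks to resources, buffered outputs, a centralized engine) fit into one small sketch. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the task names, the `pick_resource` policy, and the in-process execution are hypothetical simplifications, since a real engine would dispatch tasks to remote resources and run ready tasks concurrently.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set, Tuple

Task = Callable[[List[object]], object]  # takes predecessor outputs, returns its own output


def run_workflow(deps: Dict[str, Set[str]], tasks: Dict[str, Task],
                 pick_resource: Callable[[str], str]) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Execute a DAG workflow under centralized control ("orchestration").

    deps maps each task to the set of tasks it depends on.
    Binding is lazy: a task is bound to a resource only once all its inputs exist.
    Outputs are held in a buffer so a consumer can read them whenever it is scheduled."""
    buffer: Dict[str, object] = {}   # task -> buffered output
    bindings: Dict[str, str] = {}    # task -> resource chosen at dispatch time
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        for task in sorter.get_ready():
            bindings[task] = pick_resource(task)                  # late (lazy) binding
            inputs = [buffer[d] for d in sorted(deps.get(task, set()))]
            buffer[task] = tasks[task](inputs)                    # here: run locally, in sequence
            sorter.done(task)
    return buffer, bindings
```

With a hypothetical workflow where "ingest" feeds "fire" and "atmos", which both feed "plan", the resource for "plan" is chosen only after both upstream outputs are in the buffer. Eager binding would instead fix every entry of `bindings` before the loop starts.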
Combining Events and Workflow: Dynamic Event-Driven Workflows
• Besides precipitating an initial response workflow, subsequent events may alter an existing workflow that is already underway
  • The amount of the workflow completed so far must be determined
  • Tasks currently on the “leading edge” of the workflow must be either terminated or allowed to complete
  • The status and disposition of data referenced by tasks must be determined
• “Classical” storage management issues recur
  • Dangling references to missing or stale data
  • Inaccessible data referenced by no one
• Such event-driven task management is similar to fault tolerance
  • Similar mechanisms could be used to detect and respond to faults (failed servers, networks, etc.)
• Directly supports the DDDAS concept
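The bookkeeping an event forces on a running workflow can be sketched over the same kind of dependency map. This assumes the engine knows which tasks have completed; the task names are hypothetical. `leading_edge` finds the unfinished tasks whose inputs are all available, and `cancel_part` cancels a set of tasks plus everything downstream of them, reporting completed outputs that are now referenced by no one (the "inaccessible data" case above).

```python
from typing import Dict, Set, Tuple


def successors(deps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert task -> predecessors into task -> consumers."""
    succ: Dict[str, Set[str]] = {t: set() for t in deps}
    for task, preds in deps.items():
        for p in preds:
            succ[p].add(task)
    return succ


def leading_edge(deps: Dict[str, Set[str]], done: Set[str]) -> Set[str]:
    """Unfinished tasks whose predecessors have all completed."""
    return {t for t, preds in deps.items() if t not in done and preds <= done}


def cancel_part(deps: Dict[str, Set[str]], done: Set[str],
                roots: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Cancel `roots` and every unfinished downstream task.
    Returns (cancelled, orphaned), where orphaned = completed tasks whose
    consumers have all been cancelled, so their buffered output is garbage."""
    succ = successors(deps)
    cancelled: Set[str] = set()
    stack = [r for r in roots if r not in done]
    while stack:
        t = stack.pop()
        if t in cancelled or t in done:
            continue
        cancelled.add(t)
        stack.extend(succ[t])
    orphaned = {t for t in done if succ[t] and succ[t] <= cancelled}
    return cancelled, orphaned
```

If "ingest" and "fire" have finished, the leading edge is {"atmos", "evac"}. Cancelling "atmos" and "evac" also cancels "plan", which leaves the completed "fire" output with no remaining consumer.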
Responding to Events under Centralized Workflow Control
Diagram: a client making the decisions (centralized control) holds an event subscription and receives event notifications.
What is the state of the workflow when the event is received?
• Possible actions after an event:
  • Do nothing
  • Cancel the entire workflow
  • Cancel part of the workflow
  • Conditional workflow
• How is the workflow executed?
  • The client statically decides workflow services and servers prior to start time
  • The client incrementally decides services and servers during run time
In General, Nested or Recursive Workflows Will Be Possible
Diagram: as before, a client making the decisions (centralized control), with event subscription and event notification. What is the state of the workflow when the event is received?
Even if control is centralized, the client may not know the entire workflow state.
Avoiding a Single Point of Failure: Decentralized Workflow Control
Diagram: event subscription and event notification, with the client making decisions under decentralized control.
• The workflow representation is passed among workflow services
• The initiating client does not explicitly manage each service
• Nested, recursive workflows are still possible
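Passing the workflow representation from service to service can be sketched as follows. Each service executes the first step addressed to it and hands the remaining steps directly to the next service, so the initiating client only starts the chain. The service names, operations, and the shared `registry` dictionary (a stand-in for service discovery) are hypothetical; for convenience the final result travels back along the call chain, whereas a real choreography would deliver it to a designated sink.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (service name, operation name)


class Service:
    """A workflow service that executes one step and forwards the rest of the
    workflow representation to the next service itself (choreography)."""

    def __init__(self, name: str, ops: Dict[str, Callable[[object], object]],
                 registry: Dict[str, "Service"]):
        self.name, self.ops, self.registry = name, ops, registry
        self.log: List[str] = []
        registry[name] = self  # stand-in for service discovery

    def receive(self, workflow: List[Step], data: object) -> object:
        (svc, op), rest = workflow[0], workflow[1:]
        if svc != self.name:
            raise ValueError(f"{self.name} received a step meant for {svc}")
        self.log.append(op)
        result = self.ops[op](data)
        if not rest:
            return result
        # Hand off directly to the next service; no central client is involved.
        return self.registry[rest[0][0]].receive(rest, result)
```

A hypothetical three-step workflow [("ingest", "parse"), ("model", "mean"), ("report", "fmt")] started at "ingest" with the input "1,2,3" flows through all three services without the client touching the intermediate results.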
Responding to Events under Decentralized Workflow Control
Diagram: event notifications delivered to the active workflow agents.
• Currently active workflow agents subscribe to the appropriate event topics
• Workflow agents may need to find and coordinate with their active collaborators
Programming Decentralized Workflows?
• “Process programming” in a distributed environment
  • Example: Little-JIL
    • An agent coordination language
    • A coordination tree with four non-leaf operations: sequential, parallel, try, choice
  • Other possibilities?
    • Stream-based languages?
    • Dataflow languages?
• Decentralized workflows are similar to Active Networks, Active Agents, and Active Messages
  • “Programming the message, not the node”
• Autonomic behavior
  • If a peer agent fails, the agent will have to infer a workflow repair to reach the goal state
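The four non-leaf operations can be illustrated with a tiny interpreter over a coordination tree. This is a greatly simplified sketch, not Little-JIL's actual runtime: it omits most of the language's features (for example exception handling and resource management), models a leaf as a function returning success or failure, and uses a hypothetical `choose` callback for the agent's decision at a choice step.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    kind: str = "leaf"                          # "leaf", "sequential", "parallel", "try", "choice"
    action: Optional[Callable[[], bool]] = None  # leaf work; returns True on success
    children: List["Step"] = field(default_factory=list)


def run(step: Step, choose: Callable[[List[Step]], Step] = lambda cs: cs[0]) -> bool:
    if step.kind == "leaf":
        return step.action()
    if step.kind == "sequential":   # all children, in order; stop at the first failure
        return all(run(c, choose) for c in step.children)
    if step.kind == "parallel":     # all children concurrently; all must succeed
        with ThreadPoolExecutor() as pool:
            return all(pool.map(lambda c: run(c, choose), step.children))
    if step.kind == "try":          # alternatives in order until one succeeds
        return any(run(c, choose) for c in step.children)
    if step.kind == "choice":       # the agent picks exactly one alternative
        return run(choose(step.children), choose)
    raise ValueError(f"unknown step kind: {step.kind}")
```

For instance, a sequential "respond" step whose "acquire-data" child is a try over a failing primary feed and a working backup feed still succeeds, because the try step falls through to the backup.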
Use of A Priori Information in Grids
A priori: knowable independently of experience
Diagram: a timeline running from task-define-time, through start time, to task-run-time.
• Task-define-time (before start time):
  • Expects the world to have certain properties or be in a known state
  • Semantic translation tools, i.e., compilers, can be used
  • Entire code units can be examined, analyzed, and optimized
  • Static information is compiled in
  • Everything that can be statically defined a priori takes complexity out of the application and improves performance
• Task-run-time (after start time):
  • Increasing use of a posteriori information learned from experience
  • Capturing more information about a running application and its environment
  • More and more dynamic late binding
  • A “smart” run-time
  • A “smart back-end” of a compiler
  • Limited control and imperfect knowledge of the environment
  • Must apply reasoning to what is semantically understandable
Future Generation Grids: We Are Being Pushed Into…
• Dynamic discovery, late binding
  • How little a priori knowledge can be “compiled in”?
• Resource virtualization
• Performance penalty for deciding everything dynamically
• The autonomic control cycle occurs everywhere
  • Monitor, understand, plan, respond
  • Fault tolerance/recovery
  • Real-time/physical system monitoring & interaction
  • Dynamic configuration (late discovery, binding)
  • Anytime a goal state must be reached
• Planning is a classic AI capability
  • Chaining of “moves” to get from the current state to a goal state
  • Inferencing on known and discoverable facts
  • Done in an environment with imperfect knowledge and limited control
  • If the plan fails, replan and try again
• Declarative programming techniques
  • Programming the “what”, not the “how”
• Geosemantics is an archetypal example of this fundamental challenge to grid computing
  • Advances made in this field should be understood, and hopefully generalized, for this wider context
  • Interdisciplinary approach
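Planning as "chaining of moves" with "if the plan fails, replan and try again" can be shown with a breadth-first planner and an execute/replan loop. Breadth-first search is a deliberately simple stand-in for real AI planners, and the state space, move names, and the way the agent learns from a failed move are hypothetical; the point is that `execute` reports what actually happened, so later plans reflect knowledge gained during execution.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Tuple

State = Hashable
Moves = Callable[[State], List[Tuple[str, State]]]  # state -> [(move name, next state)]


def plan(start: State, goal: State, moves: Moves) -> Optional[List[str]]:
    """Breadth-first "path construction" from the current state to the goal state."""
    parent: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path: List[str] = []
            while parent[s] is not None:
                prev, move = parent[s]
                path.append(move)
                s = prev
            return path[::-1]
        for move, nxt in moves(s):
            if nxt not in parent:
                parent[nxt] = (s, move)
                frontier.append(nxt)
    return None


def achieve(start: State, goal: State, moves: Moves,
            execute: Callable[[State, str], Optional[State]], max_replans: int = 3):
    """Plan, act, and replan when a move fails. `execute` returns the resulting
    state, or None if the move failed (it may update what `moves` knows)."""
    state = start
    for _ in range(max_replans + 1):
        steps = plan(state, goal, moves)
        if steps is None:
            return False, state
        for move in steps:
            nxt = execute(state, move)
            if nxt is None:
                break  # plan failed: replan from the current state
            state = nxt
        else:
            return True, state
    return state == goal, state
```

In a hypothetical road graph A→{B, C}, B→D, C→D where the A→B road is closed but the planner does not know it, the first plan is A→B, B→D; the failed move records the closure, and the second plan A→C, C→D reaches the goal.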
How Do We Make Progress on These Fundamental Challenges?
• Research
  • Organized research
    • Governmental funding agencies
  • Organized, interdisciplinary research
    • Getting the right fields of expertise to collaborate
• Organized adoption
  • Open Grid Forum (OGF)
  • World Wide Web Consortium (W3C)
  • Organization for the Advancement of Structured Information Standards (OASIS)
  • Distributed Management Task Force (DMTF)
  • Storage Networking Industry Association (SNIA)
  • TeleManagement Forum (TMF)
  • Internet Engineering Task Force (IETF)
  • International Telecommunication Union, Telecommunication Standardization Sector (ITU-T)
Key Technical Areas
• Security
  • How to manage grid identity and access
• Metadata and ontologies
  • How to define the relevant information architecture
• Data discovery and management
  • How to manage the location of, and access to, cached and replicated data
• Semantics
  • How to use the meaning of data to produce information
• Service architectures
  • How to integrate and manage all resources as a whole and provide dynamic, transparent access
Security
• Security capabilities
  • Authentication, authorization, privacy, integrity, non-repudiation
• Authentication
  • Evolving to a combination of GSI, Kerberos, and Shibboleth
• Authorization
  • Databases (VOMS and PERMIS)
  • Role-based (TeraGrid, OSG)
• WS-Security
  • Performance is an issue
• Delegation of trust, i.e., delegation of identity
  • Identity also depends on role in a Virtual Organization
  • Identity has a structure
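The point that identity has a structure, and that rights depend on the role held in a Virtual Organization, can be sketched with a role-based check. The identity fields, VO names, roles, and policy table are hypothetical; this is not the VOMS or PERMIS data model, only an illustration of why the same subject can hold different rights in different roles.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Identity:
    subject: str  # e.g. a certificate distinguished name
    vo: str       # Virtual Organization
    role: str     # role held within that VO


# (VO, role) -> allowed (action, resource) pairs
Policy = Dict[Tuple[str, str], Set[Tuple[str, str]]]


def authorize(who: Identity, action: str, resource: str, policy: Policy) -> bool:
    """Role-based check: rights come from the (VO, role) part of the identity,
    not from the subject name alone."""
    return (action, resource) in policy.get((who.vo, who.role), set())
```

With a hypothetical policy granting "analyst" read access and "admin" read and write access to a "dem-tiles" resource in "geo-vo", the same subject can write only while acting in the admin role, and gets nothing when presenting a role from a different VO.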
Metadata and Ontologies
• Metadata: data about data, e.g.,
  • Federal Geographic Data Committee, Content Standard for Digital Geospatial Metadata
  • GML 3.0 (Geography Markup Language)
  • ISO standards
    • ISO 19115:2003 Metadata
    • ISO 19115-2 Metadata, Part 2: Extensions for Imagery and Gridded Data (within two years)
    • ISO 19119:2005 Services
    • ISO 19130 Sensor Model and Data Model for Imagery (within two years)
• Ontologies
  • Needed to capture process behavior, spatial/temporal characteristics, and data and process relationships
  • Need to be more than just keyword lists for classification
  • OWL: Web Ontology Language
    • A semantic markup language for publishing and sharing ontologies on the web
    • An OWL ontology describes classes, properties, and their instances
  • OWL-S: a web service ontology
• Are GML, the ISO standards, and OWL sufficient for geospatial representation and reasoning?
  • (No!)
Data Discovery and Management
• Data (and services) must be published in a registry to be discoverable
  • Metadata and ontologies are essential
  • UDDI is generally considered to be inadequate
    • Not scalable; poor semantics for application data
• Combined catalogue and storage management
  • Storage Resource Broker (SRB, SDSC)
    • The SRB MCAT (Metadata Catalogue) is used to manage access across multiple remote sites
  • OGSA-DAI (Open Grid Forum)
    • Open Grid Services Architecture Data Access and Integration
    • Web service access to files and databases
  • Globus Data Replication Services
    • Built to support high-energy physics projects
    • Control the pushing of data closer to key consumers
    • Enable the user to choose the “closest” replica
• The storage networking community is driving toward storage virtualization
  • E.g., Amazon S3 (Simple Storage Service)
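Choosing the "closest" replica usually means choosing the one with the lowest measured cost. The sketch below assumes the caller supplies a `probe` function (for example, a measured round-trip time in milliseconds) and a set of known-unavailable sites; the site names and numbers in the example are hypothetical, and this is not the Globus replication API.

```python
from typing import Callable, Iterable, List


def choose_replica(replicas: List[str], probe: Callable[[str], float],
                   unavailable: Iterable[str] = ()) -> str:
    """Return the replica with the lowest probed cost (e.g. round-trip time),
    ignoring replicas known to be unavailable."""
    skip = set(unavailable)
    candidates = [r for r in replicas if r not in skip]
    if not candidates:
        raise LookupError("no available replica")
    return min(candidates, key=probe)
```

With hypothetical round-trip times of 120 ms, 15.5 ms, and 48 ms for "site-a", "site-b", and "site-c", the function picks "site-b", and falls back to "site-c" if "site-b" is marked unavailable.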
Semantics: Enabling Intelligence
• Automated systems are only possible with well-known semantics
  • Environmental decision systems
  • Emergency decision systems
• SWRL: Semantic Web Rule Language
  • An extension to OWL
  • Adds parts of RuleML to OWL
  • Extends OWL axioms to Horn-like clauses
  • Will this be sufficient?
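What a Horn-like rule adds can be shown with a naive forward chainer over subject-predicate-object facts: each rule says that if all body atoms match (with consistent variable bindings), the head atom holds. This is neither SWRL syntax nor an OWL reasoner; it ignores OWL semantics entirely, assumes every head variable also appears in the body, and uses hypothetical geospatial predicates.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, str, str]     # (predicate, subject, object); "?x" marks a variable
Rule = Tuple[List[Atom], Atom]  # body1 AND body2 AND ... -> head


def unify(atom: Atom, fact: Atom, env: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend the variable bindings in env so that atom matches fact, or return None."""
    if atom[0] != fact[0]:
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if env.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return env


def matches(body: List[Atom], facts: Set[Atom], env: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every binding that satisfies all body atoms."""
    if not body:
        yield env
        return
    for fact in facts:
        e = unify(body[0], fact, env)
        if e is not None:
            yield from matches(body[1:], facts, e)


def forward_chain(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Apply all rules until no new facts can be derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for env in matches(body, facts, {}):
                derived = (head[0],) + tuple(env.get(t, t) for t in head[1:])
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new
```

Given within(Edinburgh, Scotland), within(Scotland, UK), sensorIn(s1, Edinburgh), a transitivity rule for "within", and a rule propagating "sensorIn" up the containment hierarchy, the chainer derives within(Edinburgh, UK), sensorIn(s1, Scotland), and sensorIn(s1, UK).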
Service Architectures: Key Capability Areas Covered by Core WS-* Specs
Specification area: examples
1. Core Service Model: XML, WSDL, SOAP
2. Service Internet: WS-Addressing, WS-MessageDelivery; reliable messaging (WSRM)
3. Notification: WS-Notification, WS-Eventing (publish-subscribe)
4. Workflow and Transactions: BPEL, WS-Choreography, WS-Coordination
5. Security: WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6. Service Discovery: UDDI, WS-Discovery
7. System Metadata and State: WSRF, WS-MetadataExchange, WS-Context
8. Management: WSDM, WS-Management, WS-Transfer
9. Policy and Agreements: WS-Policy, WS-Agreement
10. Portals and User Interfaces: WSRP (Remote Portlets)
Status legend in the original table: Consensus, Merging, Developing.
B&W table courtesy of Fox, Ho, Pierce, U. Indiana.
Issues from an Organizational Perspective
• General consensus only on the basic WS building blocks
• Must avoid vendor-specific solutions; adopt a vendor-neutral approach
• Adoption roadmap and timetable?
• Much work remains to be done, and it is underway
• Topics for harmonization
  • Merging of competing WS standards is expected
  • Service Component Architecture (SCA) and Open Grid Services Architecture (OGSA)
  • Service Data Objects (SDO) and Web Services Resource Framework (WSRF)
  • Workflow management (aka web service chaining)
    • Triana, Taverna, Pegasus, BPEL
    • Semantically-aware workflow engine
  • SAGA: Simple API for Grid Applications, a basic grid programming model
    • A common “look-and-feel” for programming in a distributed environment
• Appropriate use and cost
  • Not everything needs to be a service in a service architecture
  • Adoption of any new technology, e.g., SOA, is more expensive up front
Driving Innovation (slide borrowed from Ulf Dahlsten, Director, ‘Emerging Technologies and Infrastructures’)
Diagram: Managing the Technology Maturation Process. Phases: Phase 0 Research; Phase 1 Solution proposal; Phase 2 Prototype; Phase 3 Pre-commercial product/service; Phase 4 Commercial product/service. Research push and market pull act at either end; between them lies the innovation “no man’s land”, labeled with OGF (and other SDOs).
OGF Technical Strategy/Stakeholder Alignment Process
Diagram: use cases, architectures, and requirements gathered at OGF events and requirements workshops; analysis, interpretation & prioritization of requirements by the Technical Strategy Committee; the OGF Technical Strategy & Roadmap with milestones; standards groups & workshops producing best practices and specifications in the OGF Document Series; application of best known practices and current standards.
A More Refined View
Diagram labels: Standards Groups; Requirements Solicitation; Best Practice Workshops; Best Practices; EGR-RG; Applications; SN-CG; Architecture; Requirements Rollup, Analysis & Prioritization (EGR-RG); TSC Gap Analysis; Financial, Compute, Telco, Data, Pharma, Infrastructure, EDA, Management, Security; Prioritized Requirements and Requirement Patterns; What WGs are doing, i.e., the WG roadmap; Vendors; Requirements; Specs.
TSC gap analysis outputs:
• Overall standards roadmap
• Gap analysis of the WG roadmap vs. prioritized requirements
• Recommended actions
OGF Grid Requirements Roll-up
• DATA MANAGEMENT
  • Data copy, data movement
  • Backup
  • Storage policy mgmt
  • Replica mgmt
  • Caching (local disk, indexes, memory)
  • Data grid APIs
• GRID MANAGEMENT
  • Mgmt console GUI
  • Asset management and topology
  • Policy management and quotas
  • Mitigate management overhead
  • Transition/evolution models
• WORKFLOW
  • Planning
  • Management (centralized & distributed)
• SCHEDULING
  • Meta-scheduler, data-aware
• MONITORING & EVENT NOTIFICATION
  • Monitoring, auditing, and alert mgmt
• FAULT TOLERANCE & ERROR MGMT
  • Deep error analysis
  • Error audit
  • Verification & audit
  • Root cause analysis
  • Job error management
  • Very high levels of uptime
• SECURITY
  • Grid identity mgmt
  • Strong security
  • Multiple domains
• ACCOUNTING & AUDITING
  • Billing and chargeback
  • Chargeback models
  • Business issues (chargeback)
  • Sarbanes-Oxley support
• SYSTEM DEVELOPMENT & DEPLOYMENT
  • Simplify application development
  • End-user tools and environments
• AUTONOMIC BEHAVIORS
  • Monitoring
  • Semantics
  • Planning
  • Action
• INFORMATION ARCHITECTURE
  • Metadata schemas, ontologies & semantics
  • Data profiling
  • Data tagging, including managing files
• DISCOVERY
  • Detailed asset discovery
  • API for product capability discovery
  • Extract information about a project or a product
  • Let users grab the right data (categorizing data)
  • Content/data discovery
  • Catalogue-based data access
• RESOURCE VIRTUALIZATION
  • Dynamic provisioning
  • Capacity on demand
  • Capacity grows as available
  • Content provisioning
  • Provisioning and capacity management
• JOB MANAGEMENT
  • Distributed execution
  • Job submission
  • Job control management
  • Job migration
Current OGF Standards Work
• Infrastructure
  • Grid and Virtualization Working Group (gridvirt-wg)
  • Network Mark-up Language Working Group (nml-wg)
  • Network Measurements Working Group (nm-wg)
• Management
  • Application Contents Service WG (acs-wg)
  • Configuration Description, Deployment, and Lifecycle Management WG (cddlm-wg)
  • GLUE Schema Working Group (glue-wg)
  • OGSA Resource Usage Service WG (rus-wg)
  • Usage Record WG (ur-wg)
• Security
  • OGSA Authorization WG (ogsa-authz-wg)
  • Trusted Computing Research Group (tc-rg)
• Applications
  • Distributed Resource Management Application API WG (drmaa-wg)
  • Grid Checkpoint Recovery WG (gridcpr-wg)
  • Grid Information Retrieval WG (gir-wg)
  • Grid Remote Procedure Call WG (gridrpc-wg)
  • Simple API for Grid Applications Core WG (saga-core-wg)
• Architecture
  • OGSA Naming Working Group (ogsa-naming-wg)
  • Open Grid Services Architecture WG (ogsa-wg)
• Compute
  • Grid Resource Allocation Agreement Protocol WG (graap-wg)
  • Job Submission Description Language WG (jsdl-wg)
  • OGSA Basic Execution Services WG (ogsa-bes-wg)
  • OGSA High Performance Computing Profile WG (ogsa-hpcp-wg)
  • OGSA Resource Selection Services WG (ogsa-rss-wg)
• Data
  • Data Format Description Language WG (dfdl-wg)
  • Database Access and Integration Services WG (dais-wg)
  • Grid File System Working Group (gfs-wg)
  • Grid Storage Management WG (gsm-wg)
  • GridFTP WG (gridftp-wg)
  • Information Dissemination WG (infod-wg)
  • OGSA ByteIO Working Group (byteio-wg)
  • OGSA Data Movement Interface WG (ogsa-dmi-wg)
  • OGSA Data Working Group (ogsa-d-wg)
(Formerly) Competing Camps!
• IBM & friends: Open Grid Services Architecture (OGSA)
  • Built on top of the Web Services Resource Framework (WSRF)
  • Designed in collaboration with the Globus Alliance and used in GT4
• MS & friends: .NET
  • Built on WS-Interoperability (WS-I)
  • Forms the basis of MS’s Web Services Enhancements (WSE)
IBM, MS, HP, Intel Publicly Announce Intent to Converge Web Service Standards
Strategic Organizational Liaison
• Potential OGC-OGF collaboration
  • Workshop at OGF-20
    • May 7, 2007, Manchester, UK
    • Organized by Chris Higgins (Edinburgh)
  • General agenda
    • Statements from key stakeholders & potential adopters
    • Panel on specific goals
  • Goal
    • Memorandum of Understanding outlining concrete steps of collaboration
• Potential technical directions
  • Integration of registry concepts with current standards
  • Integration of services (e.g., WMS, WFS) with emerging WS standards
  • Identification of a suitable security (user identity) model
  • Integration of resource mgmt, workflow, notification, tools, …
NSF Support for Semantic Web Research
Frank Olken, National Science Foundation, CISE/IIS, folken@nsf.gov
Presentation to SICOP Special Conference, Falls Church, VA, Feb. 6, 2007
Why does NSF care about semantic web technologies?
• Formalization of scientific knowledge
• Facilitate sharing of scientific data
• Facilitate access to scientific data and knowledge
• Natural language processing
  • Information extraction, digital libraries, ...
• Support for digital government
  • Semantic rules languages, disaster support, ...
• Support for machine learning
• Support for math/science education
Bullets courtesy Frank Olken, NSF/CISE/IIS, folken@nsf.gov
Debates about semantic web research
• Skepticism about adoption of semantic tagging by the masses (and the quality of the tagging)
  • NSF is concerned about scientific/government uses, not MySpace.
• Poor-quality ontologies
  • Ontology development and assessment remains a difficult, rare skill. Some progress (e.g., OntoClean), but a clear need for more research and more training of practitioners.
• Ontology merging is very, very hard
  • Currently a subject of research; see the OntoClean work, and also work by Joslyn et al. on the use of partial orders.
• Skepticism of semantics by most of the database research community
  • Still somewhat of an issue, because semantic proposals often go to panels dominated by DB researchers. Progress in adding more semantic web researchers to panels.
More debates about semantic web research
• Description logic vs. first-order logic
  • Heated debates in the KR research community about whether description logics are adequate or whether FOL or other logics should be used.
• Scalability and structuring of rule bases
  • Concerns about the software engineering of large rule bases (or collections of logic axioms). Efforts to partition such large rule bases/logic axiom collections (cf. Cyc’s microtheories, etc.). This remains an open research topic.
• Skepticism about the scalability of semantic search and inference engines
  • Open research issue ...
Which challenges and priorities does this group want to put on its research agenda?