270 likes | 421 Views
The GMU Geospatial Grid Technology Development and Application Project. Liping Di Laboratory for Advanced Information Technology and Standards (LAITS) George Mason University lpd@rattler.gsfc.nasa.gov. Overall Objectives.
E N D
The GMU Geospatial Grid Technology Development and Application Project Liping Di Laboratory for Advanced Information Technology and Standards (LAITS) George Mason University lpd@rattler.gsfc.nasa.gov
Overall Objectives • Develop the geospatial extensions of Grid technology to make it geospatial enable. • Develop virtual geospatial data and information services in the Grid environment. • Demonstrate the geospatial Grid technology in Earth Observation (EO) environment at NASA data pools. • Contribute technology, software, and the data pool application to the CEOS Grid testbed
The Grid Technology • The Grid technology is developed for securely sharing computational resources within an virtual organization. • Computer CPU cycles • Storage • Networks • Data, Information, algorithms, software, services. • It was originally motivated and supported from sciences and engineering requiring high-end computing, for sharing geographically distributed high-end computing resources. • The core of the technology is the the open source middleware called Globus Toolkit. • The latest version of Globus is version 3.0 which implements the Open Grid Service Architecture (OGSA)
Why Grid is useful to the EO community? • Earth observation community is one of the key communities for collecting, managing, processing, archiving and distribution geospatial data and information. • Because of the large volumes of EO data and geographically scattered receiving and processing facilities, the EO data and associated computational resources are naturally distributed. • The multi-discipline nature of global change research and remote sensing applications requires the integrated analysis of huge volume of multi-source data from multiple data centers. This requires sharing of both data and computing powers among data centers. • Therefore, Grid is an ideal technology for EO community.
Why Needs the geospatial extensions of Grid • Geospatial data and information are significantly different from those in other disciplines. • Very complex and diverse. • Formats, projection, resolutions. • Hyper-dimensions: spatial, temporal, spectral, thematic. • Raster vs. vectors • Large data volume • more than 80% of data human beings has collected is spatial data. • The geospatial community has developed a set of standards specifically for geospatial data and information that users have been familiar with. (e.g., OGC, ISO, FGDC). • Grid technology is developed for general sharing of computational resources and not aware of the specialty of geospatial data. • In order to make Grid technology applicable to geospatial data, we have to do the geospatial domain-specific extensions.
Areas of Extensions • Internally in the Grid, it have to be spatially aware. • Extend Globus toolkit to handle the spatial, spectral, temporal, thematic based spatial data and information management. • Develop enough Grid-enable tools for geospatial data handling/services. • Must provide data/information access and services interfaces that are standard in the geospatial community. • The Open GIS Consortium’s Web Data Access/Service interfaces (e.g., OGC WCS, WMS, WFS, and WRS).
The OGC Web Service Specifications • The Web Coverage Services (WCS) specification: defines the standard interfaces between web-based clients and servers for accessing coverage data. • All imagery type of remote sensing data is coverage data. • The Web Feature Services (WFS) specification: defines the standard interfaces between web-based clients and servers for accessing feature-based geospatial data. • vector and point data are feature data. • The Web Map Services (WMS) specification: define the standard interfaces for accessing and assembling maps from multiple servers. • visualization of geospatial data • The Web Registries Services (WRS) specification: defines the interfaces between web-based clients and servers for finding the required data or services from registries. • WCS, WFS, WRS, and WMS form the foundation for the interoperable geospatial data access and service environment
OGC Interfaces to the Geospatial Data Grid MCS Query MCS Database OGC WRS Query MCS Web Server WRS Server inter-faces Logical Filenames Metadata Catalog Service OGC Client Logical Filenames WRS Results Physical locations Replica index node Replica cat. OGC data protocols Replica Location Service WCS/WFS/WMS Server inter-faces Physical locations Transformed Data Physical Storage System/files Data
Virtual datasets • A virtual dataset is a dataset that: • not exist in a data and information system • The system knows how to create it on-demand. • A virtual dataset, once created, can be kept for fulfilling the same request from next users. • The client/data user will not know the difference between a real dataset and a virtual dataset. • Advantages of virtual datasets • A virtual dataset can be produced (materialized) by • running a program dedicated to the production of the virtual dataset (dedicated program approach). • running a series of service modules, each one takes care of a small step of the materialization of the virtual dataset (service approach).
The Service Approach to Virtual Datasets • A service is defined as self-contained, self-describing, modular applications that can be published, located, and dynamically invoked across a network. • It performs functions, which can be anything from simple requests to complicated business processes. • Once a service is deployed, other applications (and other services) can discover and invoke the deployed service. • A service can be implemented in the Web environment, called a web service, or in the Grid environment, called a Grid service. • Standards on service discovery, declaration, binding, and invocation allow dynamically chaining individual services across a network together to fulfill a complex task. • A virtual dataset, in the service environment, basically is a service chain that describes steps to be taken to produce the virtual dataset. • With enough elementary service models, it is possible to provide unlimited numbers of virtual datasets by just creating the service chains.
Geo-object, Geo-tree, Virtual Dataset, Geospatial Models modeling and virtual data services no service data service User Requested User Obtained archived geo-object user geo-object Geospatial web/Grid services Intermediate geo-object Automated data transformation service(WCS/WFS)
User Creation of Geospatial Models • A user-requested products maybe not exist both virtually and no virtually. • If the user knows the thought process to create the data products from lower-level inputs step-by-step (the logical geospatial modeling) • With help of a good user interface and the availability of service modules and models/submodels, the user can construct a geospatial model/virtual data product interactively. • The system then can produce the virtual data product for the user. • The user-created model can be incorporated into the system as a part of the virtual datasets the system can provide. • This allows the system to grow capabilities with time. • Advantages • allows users to obtain the ready-to-use scientific information instead of the raw data, significantly reducing the data traffic between the users and the geospatial Grid. • allows users to explore huge resources available at a data Grid and to conduct tasks that they never be able to conduct before.
Research Issues in Virtual Geospatial Data Services • Representation of Geo-Tree. • The Geotree/model description • Need a language to describe the geo-tree and subtree • Only logical and thematic description of the tree. • Not attach to individual physical files, use virtual data types • Service module cataloging • Need a catalog in MCS to catalog all service modules available (modules not necessary in the same system) • Describe the inputs, outputs, and how to invoke the service • Use for both manually or automatically constructing the geo-tree. • Use for instantiation of the virtual dataset (to create the workflow)
Research Issues in Virtual Geospatial Data Services • Geo-tree/model database • Contains all geo-trees available • Virtual dataset cataloging • Need to catalog all virtual datasets in a geo-tree (the root of the geo-tree and all intermediate datasets ). • Catalog the virtual datasets with the real data set, use the same description as the real dataset in the catalog except for two things: • no description of spatial and temporal coverage • include a point to the entry to the geotree database where the specific geotree is located.
Additional Required Functional Components • Logical Instantiation • This component will check if a virtual data can be materialized against a specific user search. • Generate logical filenames which are unique and different from the real logical filenames. • Generate the logical workflow per request of physical Instantiation (filenames in the workflow are logical names) • Physical Instantiation • This component will produce the executable workflow when user actually requests the virtual dataset. • What workflow language should be used? • Workflow execution manager • Manage the execution of the workflow for materializing the virtual dataset. • Return the materialized dataset to users.
Virtual Data Services In the Geospatial Data Grid Metadata Catalog Service MCS Web Server MCS Query MCS Database 1 2 GeoTree Lib LF logical instant. Module cata. OGC WRS Query WRS Server 2 LF OGC Client Replica index node Replica cat. PL WRS Results Replica Location Service 3 OGC data protocols 2 Workflow Execution Manager 2 Physical Inst. WCS/WFS/ WMS Server 4 5 Transformed Data PL Physical Storage System/files Data 1: Matched virtual datasets 2: logically instanced virtual filenames (LIVF) 3. logical workflow 4. Physical workflow 5. Data LF: Logical filename Yellow: New component PL: Physical Location Light Yellow: Modified MCS component
The Development Team • PI, Liping Di, LAITS/GMU. • Co-I, Williams Johnston NASA Ames and DOE LBNL. • Co-I, Deans Williams, DOE LLNL.
Implementation Plan • The first phase is the testbed and initial integration, including the setup of the development environment, preliminary design of the integration, and implementation of WCS access to Grid-managed data. • The second phase is the data naming and location transparency, which include the use of Data Grid and Replica Services (metadata catalogues, replication location management, reliable file transfer services, and network caches) to provide naming and location independence for data used by NWGISS and revising NWGISS to invoke such Grid services. • The approach to investigating the Data Grid and Replica Services will be to configure a Data Grid testbed. This will be followed by the integration of NWGISS data catalogs into a data Grid catalog and the investigation of naming approaches, followed by interfacing NWGISS with data generators and Data Grid Replica Location service • The third phase is the virtual dataset research and development.
The development environment • A prototype development environment has been setup at LAITS/GMU • Three machines—2 Linux and 1 SunFire Unix servers. • Machines are linked through 100 Mb LAN. • External link to Internet through dedicated T1 line. • The real development/demo environment is being set • GMU will purchase a server with 4-8 Tb of disk space. The machine will be hosted at NASA Goddard. • NASA AMES will provide a machine with 4-8 Tb disk space and 30 Tb near real-time storage device. • DOE LLNL will provide a machine with 2-4 Tb of disk space. • 1 Gb/sec Internet link • The machine will be populated with NASA EOS data • e.g., MODIS, ASTER, Landsat, MISR.
Current Status • Most of the phase-one developments are near complete. • MCS has been extended to handle spatial, temporal, and parameter-based search by adding a layer on top of MCS. • WRS interface has been implemented on top of MCS for data search and discovery. • WCS server has been modified to access data within the virtual organization. • WRS and WCS are connected for searching and then deliver the one-demand data to users. • A demonstration will show the on-demand access of Grid-managed EOS data through a OGC client. • You will not see the Grid because it supposes to work invisible to clients outside to the Grid.
Near-term Plan (within 6 months) • Build the real development/test environment. • Modify WCS server so that it can work as the client to another WCS server remotely located in other machine. • to fetch just the right amount of data to the requested machine within the Grid. • Test the service concepts • Register services in MCS • Couple services with data • enable to search available services associated with data • enable to search available data with a given service • Develop fundamental data services • Reformatting, subsetting/resampling • Georectification/reprojection • Supervised and unsupervised classification services
The NASA EOSDIS Data Pools • The NASA EOSDIS project is implementing data pools that contains huge amount of remote sensing data on-line for users to directly and rapidly access. • The data pools will be operated at each of nine NASA’s distributed active archive centers (DAACs). • Each data pool will provide discipline-specific EOSDIS data archived at the DAAC. • DAACs are connected through the high-speed network • Currently there are total four operational data pools at GSFC, Langley, EDC, and NSIDC. • Both data search through search criteria and data finding through browsing/drilling-down are provided. • ftp for data downloading. No data services is provided. • OGC WCS interface is being implemented. • provide better data access than FTP.
The CEOS Grid Data Pool Application • Deploy the geospatial Grid software developed by GMU-led team to data pools as one of CEOS Grid Applications • Initially at NASA Goddard DAAC. • Intend to expand to all data pools. • The application will provide • secured sharing of computing resources among the data pools. • a single point of entry to all resources in the pools--location transparent. • geospatial standard-based data discovery and access. • Automatic data transformation services • Virtual data services • Interactive geospatial modeling, execution, and model sharing.
Contribution to the CEOS Grid Activities • Share the technology, software, and experience with other CEOS Grid application projects • Contribute the Grid data pool application to the CEOS Grid testbed for technology demonstration. • The technology and the software created by this project can be used to create a CEOS-based Global EO Data and Information Grid • Support globally sharing the EO data, information, and/or computational resources. • Support International scientific and EO initiatives such as Integrated Global Observation System (IGOS). • Support the use of EO data/information in the developing countries for environmental monitoring and decision support.