Enabling User-Oriented Data Access in a Satellite Data Portal

Rajesh Kalyanam, Lan Zhao, Taezoon Park, Carol X. Song
RCAC, Purdue University, West Lafayette, IN 47907

Larry Biehl
PTO, Purdue University, West Lafayette, IN 47907
Outline
• Background
• Motivation
• System Design
• Data Production
• Data Subscription
• Data Delivery
• Future Work
Background
• Overview of Purdue Terrestrial Observatory (PTO)
  • Remote-sensing research facility
  • GOES-12 GVAR, AVHRR, and MVISR sensor systems; Aqua/Terra satellites
  • Component of the TeraGrid data provider framework
• Satellite data products
  • Land, ocean, and atmosphere data
  • Provide trends on local or continental scales
  • Used in climatology, hydrology, agriculture, and transportation
Example MODIS Products
• Level 1A (MOD01)
• Level 1B (MOD02) with/without bowtie correction
• Geolocation (MOD03)
• Aerosol (MOD04)
• Water Vapor (MOD05)
• Clouds (MOD06)
• Atmospheric Profiles (MOD07)
• Reflectance (MOD09)
• Snow (MOD10)
• Fire Detection (MOD14)
• Ocean Color (MOD18)
• Sea Surface Temperature (MOD28)
• Sea Ice (MOD29)
• Cloud Mask (MOD35)
• Also multiday composites of the above
Note that each data product may contain a few to many variables.
Variables in MOD04 Product
Aerosol_Type_Land, Angstrom_Exponent_1_Ocean, Angstrom_Exponent_2_Ocean, Angstrom_Exponent_Land,
Asymmetry_Factor_Average_Ocean, Asymmetry_Factor_Best_Ocean, Backscatter_Ratio_Average_Ocean, Backscatter_Ratio_Best_Ocean,
Cloud_Condensation_Nuclei_Ocean, Cloud_Fraction_Land, Cloud_Fraction_Ocean, Cloud_Mask_QA,
Continental_Optical_Depth_Land, Corrected_Optical_Depth_Land, Critical_Reflectance_Land, Effect_Optical_Depth_Ave_Ocean,
Effect_Optical_Depth_Best_Ocean, Effect_Radius_Ocean, Error_Critical_Reflectance_Land, Error_Path_Radiance_Land,
Estimated_Uncertainty_Land, Least_Squares_Error_Ocean, Mass_Concentration_Land, Mass_Concentration_Ocean,
Mean_Reflectance_Land, Mean_Reflectance_Land_All, Mean_Reflectance_Ocean, Number_Pixels_Percentile_Land,
Number_Pixels_Used_Ocean, OptDepth_Ratio_Small_Land, OptDepth_Ratio_Small_Land_Ocean, OptDepth_Ratio_Small_Ocean,
Optical_Depth_Land_And_Ocean, Optical_Depth_Large_Ave_Ocean, Optical_Depth_Large_Best_Ocean, Optical_Depth_Small_Ave_Ocean,
Optical_Depth_Small_Best_Ocean, Optical_Depth_by_models_ocean, Path_Radiance_Land, QualityWt_Critical_Reflect_Land,
QualityWt_Path_Radiance_Land, Quality_Assurance_Crit_Ref_Land, Quality_Assurance_Land, Quality_Assurance_Ocean,
Reflected_Flux_Average_Ocean, Reflected_Flux_Best_Ocean, Reflected_Flux_Land, Reflected_Flux_Land_And_Ocean,
STD_Reflectance_Land, STD_Reflectance_Ocean, Scan_Start_Time, Scattering_Angle,
Sensor_Azimuth, Sensor_Zenith, Solar_Azimuth, Solar_Zenith,
Solution_Index_Ocean_Large, Solution_Index_Ocean_Small, Std_Dev_Reflectance_Land_All, Transmitted_Flux_Average_Ocean,
Transmitted_Flux_Best_Ocean, Transmitted_Flux_Land, Latitude, Longitude
Motivation
• User requirements
  • Custom-tailored data configurations
  • Continuous data updates
  • Real-time or near-real-time access
• Current systems
  • Cannot generate the complete range of data products
  • Requests have to be routed through the support staff
  • Manual process that is time-consuming and error-prone
Motivation
“Web-based data configuration, subscription and delivery system”
System Design
• Processing and storage backbone
  • PTO infrastructure
  • PTO data processing cluster
  • SDSC SRB middleware
• Publish-subscribe manager
  • Interface between the client side and the data processing backend
  • Manages user subscriptions
  • Handles enabling/disabling data production
• Client-side applications
  • Subscription interface
  • Data access portal
System Design
• User-driven publish/subscribe model
• Dynamic data generation
• User specifies, controls, and receives custom-tailored data
• Continuous data updates in near-real-time
• Multiple ways to access the data
Data Production
• Data production software
  • SeaSpace TeraScan software
  • Configuration variables
  • Various projections and output formats
• On-demand data production
  • Production driven by user choices
  • “configproc” file mechanism
  • Automatic enabling and disabling
  • scp-based data transfer to the SRB archive and webserver
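One way to picture the automatic enabling and disabling of production is a per-product subscriber count: production is switched on with the first subscriber and off when the last one leaves. The sketch below illustrates that idea only; `ProductionSwitch` and its methods are hypothetical names and are not part of TeraScan or the PTO system.

```python
from collections import defaultdict


class ProductionSwitch:
    """Hypothetical helper: a product is produced only while someone subscribes."""

    def __init__(self):
        # product key -> set of user ids currently subscribed
        self._subscribers = defaultdict(set)

    def subscribe(self, product, user):
        """Add a subscriber; return True if production must be switched on."""
        first = not self._subscribers[product]
        self._subscribers[product].add(user)
        return first

    def unsubscribe(self, product, user):
        """Remove a subscriber; return True if production can be switched off."""
        users = self._subscribers[product]
        had_user = user in users
        users.discard(user)
        return had_user and not users

    def is_enabled(self, product):
        """True while at least one user is subscribed to the product."""
        return bool(self._subscribers.get(product))
```

In this sketch the caller would act on the returned flag, for example by adding or removing the product's configuration when `subscribe` or `unsubscribe` returns True.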
Data Production
• Example configproc file
    input_directory: products/tdf/Local/modis/ndvi
    input_files: %yyyy.%mmdd.%hhmm.%satel.MYD_NDVI
    image_variable: EVI
    image_format: jpeg
    scale_range: -0.25 1.00
    color_palette: modis_ndvi
    grid_delta: 0
    boundaries: dcw.coast dcw.states
    max_width: 256
    output_template: %yyyy.%mmdd.t_evi.jpg
    save_directory: products/images/modis
    save_files: 20??.????.t_evi.jpg
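The example above is a flat list of `key: value` entries, so a client-side tool could read it with a few lines of code. The following is a minimal sketch, not the TeraScan parser: `parse_configproc` and `expand_template` are hypothetical helpers, and the meanings assigned to `%yyyy`, `%mmdd`, `%hhmm`, and `%satel` (year, month and day, hour and minute, satellite name) are assumptions inferred from the placeholder names. The date in the usage lines is arbitrary.

```python
from datetime import datetime

EXAMPLE = """\
input_directory: products/tdf/Local/modis/ndvi
input_files: %yyyy.%mmdd.%hhmm.%satel.MYD_NDVI
image_variable: EVI
image_format: jpeg
scale_range: -0.25 1.00
output_template: %yyyy.%mmdd.t_evi.jpg
save_directory: products/images/modis
"""


def parse_configproc(text):
    """Parse 'key: value' lines into a dict, skipping blank lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def expand_template(template, when, satellite=""):
    """Fill the date, time, and satellite placeholders (assumed meanings)."""
    return (template
            .replace("%yyyy", when.strftime("%Y"))
            .replace("%mmdd", when.strftime("%m%d"))
            .replace("%hhmm", when.strftime("%H%M"))
            .replace("%satel", satellite))


config = parse_configproc(EXAMPLE)
output_name = expand_template(config["output_template"],
                              datetime(2008, 6, 9, 18, 30))
```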
Data Subscription
• Data subscription components
  • Publish-subscribe based subscription manager
  • Subscription interface
• Publish-subscribe subscription manager
  • Simulates operation of a PubScribe system
  • Implemented through an Apache Axis web service
• Subscription interface
  • Available on a web-based scientific gateway portal
  • Naïve and advanced user interfaces
Data Subscription
• Advanced user interface
  • Requires knowledge of the variables involved in a data product
  • Choice-list-based configuration
  • AJAX dynamic filtering of choice lists
  • Will allow advanced configuration variables with strict logical composition rules
• Naïve user interface
  • Plain English description: “bimonthly composite of vegetation data”
  • Scoring mechanism for selecting possible products
  • Learning mechanism for improving performance over time
  • Work in progress
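Dynamic filtering of a choice list usually comes down to a small server-side function that narrows the list as the user types. The sketch below assumes a case-insensitive substring match; `filter_choices` is a hypothetical name, and the sample list is a handful of the MOD04 variable names.

```python
MOD04_SAMPLE = [
    "Aerosol_Type_Land",
    "Cloud_Fraction_Land",
    "Cloud_Fraction_Ocean",
    "Solar_Zenith",
]


def filter_choices(choices, fragment):
    """Return the choices that contain `fragment`, ignoring case, in order."""
    needle = fragment.strip().lower()
    return [choice for choice in choices if needle in choice.lower()]


cloud_variables = filter_choices(MOD04_SAMPLE, "cloud")
```

An empty fragment matches everything, so the full list is shown until the user starts typing.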
Data Subscription
• Predicate matching
  • Keyword definitions for each data product: “BIMONTHLY COMPOSITE of VEGETATION data”
  • Score captures the degree of correlation between descriptions and products
  • Additional keywords are added to a list for further consideration; scores are updated based on repetition frequency
  • Successful product descriptions are tagged
  • Tags can be reused by other users to search for common products
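The scoring step can be sketched as keyword overlap between the user's description and each product's keyword set. This is an illustration under stated assumptions, not the scoring rule used in the system: the product keys and keyword sets are invented, and the score is simply the fraction of a product's keywords found in the description. Unmatched words are counted so that frequently repeated ones can be reviewed later.

```python
import re
from collections import Counter

# Hypothetical keyword definitions, one set per data product.
PRODUCT_KEYWORDS = {
    "vegetation_composite": {"bimonthly", "composite", "vegetation", "ndvi"},
    "sst_daily": {"daily", "sea", "surface", "temperature"},
}


def tokenize(description):
    """Lower-case the description and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def score_products(description, product_keywords):
    """Rank products by the fraction of their keywords found in the text."""
    words = tokenize(description)
    scores = {product: len(words & keywords) / len(keywords)
              for product, keywords in product_keywords.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def record_unmatched(description, product_keywords, pending):
    """Count words that match no product keyword, for later consideration."""
    known = set().union(*product_keywords.values())
    pending.update(tokenize(description) - known)


ranking = score_products("bimonthly composite of vegetation data",
                         PRODUCT_KEYWORDS)
```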
Data Subscription
• Subscription manager
  • Subscription data management
  • Receives updates from data generator
  • Distributes notifications to subscribed users
  • Enabling and disabling data generation
• Subscription data management
  • MySQL database
  • Product information: product key, generation frequency, configuration variables, filename pattern, webserver path
  • User subscription information: userid, product key, date range, email address
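The two groups of fields listed above map naturally onto two tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the MySQL database so that it runs without a server; the table names, column names, and column types are assumptions, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (
    product_key TEXT PRIMARY KEY,
    generation_frequency TEXT,
    config_variables TEXT,
    filename_pattern TEXT,
    webserver_path TEXT
);
CREATE TABLE subscription (
    userid TEXT,
    product_key TEXT REFERENCES product(product_key),
    start_date TEXT,
    end_date TEXT,
    email TEXT
);
"""


def open_store():
    """Create an in-memory database holding the two assumed tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def subscribers_for(conn, product_key, on_date):
    """Return emails of users whose date range covers `on_date` (YYYY-MM-DD)."""
    rows = conn.execute(
        "SELECT email FROM subscription "
        "WHERE product_key = ? AND start_date <= ? AND end_date >= ? "
        "ORDER BY email",
        (product_key, on_date, on_date),
    )
    return [email for (email,) in rows]
```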
Data Subscription
• Pull-based notifications
  • Simpler approach
  • Perl script tracks updates to data repository
  • Loops through all data products based on the highest generation frequency
  • Trade-off between performance and notification delays
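The pull-based approach can be sketched as a polling pass over each product's output directory. The code below is a Python stand-in for the Perl script mentioned above, not a port of it: all function names are hypothetical, and matching new files against a glob-style pattern such as the `save_files` entry of a configproc file is an assumption.

```python
import fnmatch
import os


def new_matches(filenames, pattern, seen):
    """Return names matching the glob-style pattern that were not seen before."""
    fresh = sorted(name for name in filenames
                   if fnmatch.fnmatch(name, pattern) and name not in seen)
    seen.update(fresh)
    return fresh


def poll_once(products, seen_by_product, notify):
    """Check every product directory once and report each new file.

    `products` maps a product key to a (directory, pattern) pair.
    """
    for key, (directory, pattern) in products.items():
        seen = seen_by_product.setdefault(key, set())
        for name in new_matches(os.listdir(directory), pattern, seen):
            notify(key, name)


def polling_interval(periods_seconds):
    """Poll as often as the most frequently generated product is produced."""
    return min(periods_seconds)
```

A driver loop would call `poll_once` and then sleep for `polling_interval(...)` seconds. A shorter interval reduces notification delay at the cost of more directory scans, which is the trade-off noted above.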
Data Subscription
• Push-based notifications
  • Requires tight integration with data generation process
  • Included as an entry in the configproc file
  • Product name argument is used to query list of users
  • Constraints on the execution node and environment
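In the push-based approach, the production step itself invokes a notification hook and passes it the product name. The sketch below shows what such a hook might look like as a small command-line program. The in-memory `SUBSCRIPTIONS` table stands in for the subscription database, the addresses are placeholders, and printing stands in for actually sending a message; all of these are assumptions.

```python
import argparse

# Hypothetical stand-in for the subscription database lookup.
SUBSCRIPTIONS = {
    "evi": ["alice@example.org", "bob@example.org"],
}


def notifications_for(product_name, subscriptions):
    """Return (email, message) pairs for every subscriber of the product."""
    message = f"New data is available for product '{product_name}'."
    return [(email, message)
            for email in subscriptions.get(product_name, [])]


def main(argv=None):
    """Entry point: the product name is the single positional argument."""
    parser = argparse.ArgumentParser(
        description="Push notification hook (sketch)")
    parser.add_argument("product_name")
    args = parser.parse_args(argv)
    for email, message in notifications_for(args.product_name, SUBSCRIPTIONS):
        print(f"to {email}: {message}")
    return 0
```

Calling `main(["evi"])` prints one line per subscriber. Because the hook runs on the production node, it can only rely on what is installed there, which is one form of the execution-environment constraint noted above.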
Data Delivery
• HTTP access
  • Users can download images from the webserver
  • Users cannot verify whether an image is of interest before downloading it
  • Images cannot be stored on the webserver for a long time
• RSS-feed-based access
  • Thumbnails are sent as RSS feeds when new images are available
  • Based on the thumbnail, users can download the actual image from the feed link
• Data portal access to archived data
  • Users can access archived data from the SRB server
  • Difficult to sift through the large number of images
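The RSS delivery path can be illustrated by building a feed in which each item links to the full image and carries its thumbnail in the description. This is a minimal sketch using the standard library; the function name, the dictionary keys, and the choice to embed the thumbnail as an `img` tag in the item description are assumptions.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, items):
    """Return an RSS 2.0 document with one <item> per new image.

    Each entry of `items` is a dict with the hypothetical keys
    'title', 'image_url', and 'thumbnail_url'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"New images for {title}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        # The item link points at the full-size image.
        ET.SubElement(node, "link").text = item["image_url"]
        # The thumbnail travels as escaped HTML inside the description.
        thumbnail = item["thumbnail_url"]
        ET.SubElement(node, "description").text = (
            f'<img src="{thumbnail}" alt="thumbnail"/>')
    return ET.tostring(rss, encoding="unicode")
```

A feed reader shows the thumbnail from the description, and following the item link retrieves the actual image.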
Future Work
• Future directions
  • Explore advantages of standard PubScribe models
  • Utilise the current state of the art in ontology-based methods for predicate mapping
  • Performance studies for scalability
  • Transfer data automatically to a user-specified location
Conclusion
“A user-oriented subscription framework that will encourage broader access from the grid user community”
Acknowledgements
This work was made possible by the National Science Foundation, TeraGrid Resource Partners grant OCI-0503992.