We have data, now what? Carol Song, Senior Research Scientist, Rosen Center for Advanced Computing, Purdue University, carolxsong@purdue.edu. WGISS-26, September 23, 2008
Understanding and Utilizing Data • An integrated system for real-time NEXRAD II radar data delivery and 3D visualization, with multi-layer user interfaces to reach a wide audience. • Collaboration among computer scientists and earth/atmospheric scientists • Team: V. Sundaram, L. Zhao, C.X. Song, B. Benes, P. Kristof, R. Veeramacheneni, M. Huber. • Demand-driven subscription system for real-time satellite data delivery • Purdue Terrestrial Observatory • Team: R. Kalyanam, L. Zhao, L. Biehl, C.X. Song • Providing data through services! Work supported by: National Science Foundation
Next Generation Radar (NEXRAD) Level II Data: Weather Surveillance Radar (WSR-88D) • The data provide three attributes at very fine temporal and spatial resolution: reflectivity, Doppler radial velocity, and spectrum width • These attributes are vital for understanding, monitoring, and predicting severe weather • There are 135 radar stations in the US • Data are streamed continuously in near real time • Figures: a Doppler radar tower in Connecticut and the pulsed Doppler radar inside it. Acknowledgment: figures downloaded from www.CCSU.edu and www.answers.com.
NEXRAD II Data Generation • 3D structure in radar data • Continuous rotation over 360° in azimuth • Elevation increased by 1° to 3° after each complete sweep • Continuous NEXRAD Level II radar data stream • Data files vary in size from a few MB to tens of MB each, depending on weather conditions • Data compressed with a modified bzip2 • Temporal resolution is 4-5 minutes in severe weather vs. 9-10 minutes in calm weather • Figure: structure of Doppler radar data (reflectivity)
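The slides do not say how each gate is placed in 3D space. A common approach, assumed here rather than taken from the talk, is the standard 4/3 effective-earth-radius beam model: from a gate's slant range, azimuth, and elevation angle it computes east/north offsets and height relative to the radar. The function name and constants are illustrative.

```python
import math

# Assumption: standard 4/3 effective-earth-radius model for beam refraction.
EARTH_RADIUS_M = 6_371_000.0
K_E = 4.0 / 3.0


def gate_to_cartesian(range_m, azimuth_deg, elevation_deg):
    """Convert one radar gate to (x east, y north, z height) in metres,
    relative to the radar, under the 4/3 effective-earth-radius assumption."""
    re = K_E * EARTH_RADIUS_M
    theta = math.radians(elevation_deg)
    # Beam height above the radar, accounting for earth curvature and refraction.
    z = math.sqrt(range_m ** 2 + re ** 2 + 2.0 * range_m * re * math.sin(theta)) - re
    # Arc distance along the (effective) earth surface.
    s = re * math.asin(range_m * math.cos(theta) / (re + z))
    # Azimuth is measured clockwise from north.
    phi = math.radians(azimuth_deg)
    return s * math.sin(phi), s * math.cos(phi), z
```

For example, `gate_to_cartesian(100_000.0, 90.0, 0.5)` places a gate 100 km due east on the 0.5° sweep roughly 1.46 km above the radar. Applying it to every gate of every sweep yields the 3D point set that is later gridded into a volume.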
NEXRAD II Data Distribution • The National Climatic Data Center (NCDC) houses the data and provides a central clearinghouse of archived Level II data as a resource for the research, teaching, and technology development communities. • Distributed through four top-tier distributors • Purdue makes it available on the NSF TeraGrid • Opportunity! • The near real-time availability of high-resolution radar data provides an exciting opportunity for meteorologists, if the data can be accessed and visualized in 3D in a timely manner. • Super-resolution data becoming available as we speak
Technical Challenges • Large volume and real-time streaming (50 MB/s) present major computational and data management challenges. • Super-resolution data: even larger data • Super-resolution data increase the azimuthal resolution from 1 degree to 0.5 degree. • Reflectivity range resolution improves from 1 km to 0.25 km, and the Doppler data range extends from 230 km to 300 km for split cuts (generally scans at 1.5 degrees elevation or lower). • The amount of data collected and transmitted during a volume scan will increase by a factor of approximately 2.3. • Lack of scale: analyzing data over a long period or a large geographic region requires heavy computation • Lack of interactive 3D visualization • Despite the 3D information available from the new generation of radars, the data are most commonly visualized as 2D images, simple 3D point clouds, or isosurfaces. • Access method: download via FTP/HTTP, with no programmatic access • Data format: compressed (modified bzip2) but not supported by popular libraries (e.g., RSL)
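The format challenge can be illustrated with a minimal decompressor. It assumes the Archive II layout as we understand it from the NOAA Level II interface control document: a 24-byte volume header followed by records, each prefixed by a 4-byte big-endian signed length and each holding an ordinary bzip2 stream, with a negative length marking the last record. Treat the layout, the constant, and the function name as assumptions to verify against the ICD; the talk does not describe the tool it used.

```python
import bz2
import struct

# Assumption: fixed-size volume header (starting "AR2V...") precedes the records.
VOLUME_HEADER_SIZE = 24


def decompress_level2(raw: bytes) -> bytes:
    """Return the volume header followed by the decompressed bytes of every record."""
    out = [raw[:VOLUME_HEADER_SIZE]]
    pos = VOLUME_HEADER_SIZE
    while pos + 4 <= len(raw):
        # Assumed 4-byte big-endian signed length prefix for each compressed record.
        (size,) = struct.unpack(">i", raw[pos:pos + 4])
        pos += 4
        last = size < 0  # assumed: a negative length flags the final record
        size = abs(size)
        # Each record is decompressed as an independent bzip2 stream.
        out.append(bz2.decompress(raw[pos:pos + size]))
        pos += size
        if last:
            break
    return b"".join(out)
```

The uncompressed result could then be handed to a reader such as RSL, which does not accept the record-wise compressed file directly.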
NEXRAD data products • Online data • Original streamed data from NWS (compressed), searchable from a map and downloadable; most recent months • Special event data (severe weather events) • Data services • Uncompressed data (through data services) • Variable values (e.g., reflectivity, radial velocity) • Pre-generated 3D volumes • Access methods • Data portal • THREDDS, OPeNDAP • Third-party viewers (e.g., IDV, Java NEXRAD viewer) • Programming interfaces: APIs (C++ library) • New: near real-time, interactive 3D visualization
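As an illustration of the OPeNDAP access method, the sketch below builds a DAP2 request for a subset (hyperslab) of one variable, using the `[start:stride:stop]` constraint syntax, where `stop` is inclusive. The server URL, dataset path, and variable name are hypothetical, not the actual Purdue service. Brackets are percent-encoded because some web servers reject raw `[` and `]` in query strings.

```python
from urllib.parse import quote


def opendap_subset_url(base_url, dataset_path, variable, slices, suffix="ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    slices: one (start, stride, stop) tuple per dimension; stop is inclusive.
    suffix: DAP2 response type, e.g. "ascii", "dods" (binary), "das", "dds".
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    # Percent-encode the constraint (brackets included) but keep ':' readable.
    constraint = quote(variable + hyperslab, safe=":")
    return f"{base_url.rstrip('/')}/{dataset_path}.{suffix}?{constraint}"
```

Changing `suffix` to `"dods"` requests the binary DAP2 response instead of ASCII.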
An End-to-End Integrated System • Three important components: • Data Management • Download required files from SRB and uncompress using modified bzip2 • Data Processing • Read the radar files using RSL • Process the data from multiple sites • Convert them into renderable 3D volumes • Visualization/Data Rendering • Import the volumetric data from disk • Create 3D textures and slices and apply texture-based volume-rendering techniques • Use transfer functions to render the data on the GPU
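The rendering step relies on transfer functions. The sketch below is a CPU illustration, not the system's GPU code: a hypothetical 1D transfer function maps reflectivity (dBZ) to color and opacity, and samples along one viewing ray are blended back to front with the standard "over" operator. This is the same per-pixel blend a texture-slicing renderer performs when it draws slices back to front with alpha blending. Thresholds and colors are placeholders.

```python
def transfer_function(dbz):
    """Hypothetical 1D transfer function: reflectivity (dBZ) -> (r, g, b, alpha)."""
    if dbz < 20.0:
        return (0.0, 0.0, 0.0, 0.0)   # weak echoes: fully transparent
    if dbz < 40.0:
        return (0.0, 1.0, 0.0, 0.2)   # moderate echoes: faint green
    return (1.0, 0.0, 0.0, 0.8)       # strong echoes: mostly opaque red


def composite_back_to_front(samples):
    """Blend dBZ samples ordered from farthest to nearest with the 'over' operator.

    The accumulated color is kept premultiplied by the accumulated alpha.
    """
    r = g = b = a = 0.0
    for dbz in samples:
        sr, sg, sb, sa = transfer_function(dbz)
        r = sr * sa + (1.0 - sa) * r
        g = sg * sa + (1.0 - sa) * g
        b = sb * sa + (1.0 - sa) * b
        a = sa + (1.0 - sa) * a
    return (r, g, b, a)
```

Editing the transfer function alone (for example, making weak echoes faintly visible) changes what the user sees without reprocessing the volume, which is why it is exposed as a user setting.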
Scaling using TeraGrid • Figure: TeraGrid processing sites 1 to N feeding a merge step in the main job • How to scale? Key observations: • Spatial parallelism: between stations • Temporal parallelism: volumes generated for different intervals are independent • Data access can be parallel as well • Two types of computation tasks • Processing per station per interval • Merging: combines the 3D volumes from all sites to create the full 3D volume for each interval • Granularity of parallelization • Depends on the processing power available • Either fine-grained (per site per interval) or coarse-grained (per site) • Using Condor DAGMan to orchestrate jobs
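The fine-grained decomposition (one processing task per site per interval, plus one merge per interval) maps directly onto a DAGMan input file. The generator below is a sketch: the submit files `process.sub` and `merge.sub` and the macro names `station` and `interval` are hypothetical, while `JOB`, `VARS`, and `PARENT ... CHILD` are standard DAGMan keywords.

```python
def build_dag(stations, intervals, process_submit="process.sub", merge_submit="merge.sub"):
    """Return DAGMan input text: one processing job per (station, interval)
    and one merge job per interval that waits for all of that interval's jobs.

    Submit file names and macro names are hypothetical placeholders.
    """
    lines = []
    for t in intervals:
        parents = []
        for s in stations:
            name = f"proc_{s}_{t}"
            parents.append(name)
            lines.append(f"JOB {name} {process_submit}")
            lines.append(f'VARS {name} station="{s}" interval="{t}"')
        merge = f"merge_{t}"
        lines.append(f"JOB {merge} {merge_submit}")
        lines.append(f'VARS {merge} interval="{t}"')
        # The merge for interval t runs only after every station for t is processed.
        lines.append(f"PARENT {' '.join(parents)} CHILD {merge}")
    return "\n".join(lines) + "\n"
```

Written to a file and submitted with `condor_submit_dag`, this lets DAGMan run all processing jobs concurrently (intervals are independent) and start each merge once its inputs finish. The submit files would reference the macros as `$(station)` and `$(interval)`.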
Example Images • Rendered at different timestamps from a dataset scanning a 24-hour supercell storm on March 12, 2006, in the Midwest region of the United States.
Hurricane Ike remnant • Data from 4 stations (3 in IL, 1 in IN) between 10 a.m. and noon on Sept. 14, 2008
Services through multiple interfaces • Expert use mode • Needs to see details (large data, heavy processing), high interactivity, and the ability to adjust color mapping and other settings • With accelerated graphics hardware • Learning/casual use mode • Simple interface, no learning curve • Does not require a high level of detail • Remote access mode • Through a web browser • No special hardware • Needs interactivity • Application developers • Need API or web service interfaces to integrate with their applications
Workload distribution & Scalability • Web 2.0 gadget for the masses • Data preprocessed, rendered, and composed into an animation on the server; the animation (or sequence of images) is sent over the web • Desktop client for maximum interactivity and performance • Data preprocessed offline and 3D data volumes cached on the server • 3D graphics rendering on the user's computer (GPU enabled) • Web browser access for interactivity, but slower display • Data preprocessed offline; 3D volumes cached and rendered into 3D graphics on the server • Images sent over the network • User accesses the interactive application through a VNC-based Java applet
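Both the desktop-client and web-browser modes depend on 3D volumes cached on the server. A minimal sketch of such a cache follows, assuming volumes are keyed by (station, interval) and evicted least-recently-used; the talk specifies neither the key nor the eviction policy. `build_volume` is a hypothetical callback standing in for the download, decompress, and gridding pipeline.

```python
from collections import OrderedDict


class VolumeCache:
    """Hypothetical LRU cache of preprocessed 3D volumes keyed by (station, interval)."""

    def __init__(self, capacity, build_volume):
        self.capacity = capacity
        self.build_volume = build_volume  # called on a miss to produce the volume
        self._items = OrderedDict()
        self.misses = 0

    def get(self, station, interval):
        key = (station, interval)
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        volume = self.build_volume(station, interval)
        self._items[key] = volume
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used volume
        return volume
```

Because many users typically view the same recent intervals, even a small cache lets repeated requests skip preprocessing entirely.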
Reach out to the masses • Figure: a LiveRadar3D Google gadget displaying 3D visualization of radar data, continuously updated with streaming data
Summary • Remote 3D visualization services delivered through multiple interfaces • Application interface to data services for third-party integration • An architecture that scales to different use scenarios • Parallel data preprocessing on TeraGrid Condor resources, plus partial volume caching, improves the system's response time and scalability • Continuing effort • User feedback • Scale: support multiple users simultaneously • Hierarchical 3D volume structure to support multi-scale investigation
Thank you! Publications, URLs available. Feel free to contact Carol
PRESTIGE: Purdue Real-Time Satellite Information Gateway • User requirements • Receive continuous data updates • Real-time or near-real-time access • Custom-tailored data configurations • Current systems • Impossible to generate the complete range of data products • Requests have to be routed through support staff • Manual process that is time-consuming and error-prone
Range of MODIS Data Products • Level 1A (MOD01) • Vegetation Index (MOD13) • Geolocation (MOD03) • Aerosol (MOD04) • Water Vapor (MOD05) • Clouds (MOD06) • Atmospheric Profiles (MOD07) • Reflectance (MOD09) • Snow (MOD10) • Fire Detection (MOD14) • Ocean Color (MOD18) • Sea Surface Temperature (MOD28) • Sea Ice (MOD29) • Cloud Mask (MOD35) • Also multiday composites of the above. Note that each data product may contain a few to many variables.
System Design • User-driven publish/subscribe model • Dynamic data generation • User specifies, controls, and receives custom-tailored data • Continuous data updates in near-real-time • Multiple ways to access the data
Data Subscription • Web-portal-based user interface • Option selection from choice lists • Options include satellite, coverage area, data product, projection type, and data format • Ability to select a date range for subscription validity • User-driven expansion of product choices • Individual user-based subscriptions • User-initiated data production • Data products generated only when at least one user is subscribed to the product • Data production automatically turned off when no active subscription exists
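The rule that a product is generated only while someone is subscribed can be expressed as a set computation over active subscriptions. The sketch assumes a subscription record holding the options listed above plus an inclusive validity window; the field names and the sample values used to exercise it are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription record mirroring the portal's options."""
    user: str
    satellite: str
    area: str
    product: str      # e.g. "MOD35"
    projection: str
    fmt: str
    start: date       # validity window, inclusive on both ends
    end: date

    def product_key(self):
        # One product configuration = one distinct combination of options.
        return (self.satellite, self.area, self.product, self.projection, self.fmt)


def products_to_generate(subscriptions, today):
    """Return the product configurations with at least one active subscription."""
    return {s.product_key() for s in subscriptions if s.start <= today <= s.end}
```

A scheduler re-evaluating this set periodically would start production for configurations that appear in it and stop production for those that drop out, which gives the automatic on/off behavior described above.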
Data Notification • Push-based notifications • Near real-time delivery of new-data notifications through email • Implemented by automatically invoking a web service from the processing cluster when new data is available • Subscription database used to query active subscriptions • Data delivery mechanism • Data copied (scp) from the processing cluster to web-server-accessible storage • Thumbnails generated for images to provide a quick-look feature • Link to the data location on the web server provided in the notification email
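The notification step can be sketched with Python's standard `email` package: given the active subscriptions returned by the database query and the product configuration that just arrived, build one message per matching user with links to the data and its thumbnail. The input shape, addresses, and URLs are hypothetical; the talk says only that a web service invoked by the processing cluster sends the emails.

```python
from email.message import EmailMessage


def build_notifications(active_subs, product_key, data_url, thumb_url, sender):
    """Create one notification email per user subscribed to the new product.

    active_subs: iterable of (user_email, product_key) pairs, a hypothetical
    shape for rows returned by the subscription database query.
    """
    messages = []
    for user, key in active_subs:
        if key != product_key:
            continue  # this user's subscription is for a different product
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = user
        msg["Subject"] = f"New data available: {' / '.join(product_key)}"
        # Link to the web-server copy of the data plus its quick-look thumbnail.
        msg.set_content(f"New data: {data_url}\nQuick look: {thumb_url}\n")
        messages.append(msg)
    return messages
```

Each message could then be sent with `smtplib.SMTP(host).send_message(msg)`.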