
We have data, now what?

We have data, now what? Carol Song, Senior Research Scientist, Rosen Center for Advanced Computing, Purdue University, carolxsong@purdue.edu. WGISS-26, September 23, 2008. Understanding and Utilizing Data.



Presentation Transcript


  1. We have data, now what? Carol Song Senior Research Scientist Rosen Center for Advanced Computing Purdue University carolxsong@purdue.edu WGISS-26, September 23, 2008

  2. Understanding and Utilizing Data • An integrated system for real-time NEXRAD II radar data delivery and 3D visualization, with multi-layer user interfaces to reach a wide audience. • Collaboration among computer scientists and earth/atmospheric scientists • Team: V. Sundaram, L. Zhao, C.X. Song, B. Benes, P. Kristof, R. Veeramacheneni, M. Huber. • Demand-driven subscription system for real-time satellite data delivery • Purdue Terrestrial Observatory • Team: R. Kalyanam, L. Zhao, L. Biehl, C.X. Song • Providing data through services! Work supported by: National Science Foundation

  3. Next Generation Radar (NEXRAD) Level II Data Weather Surveillance Radar (WSR-88D) • The data provide three attributes at very fine temporal and spatial resolution: reflectivity, Doppler radial velocity, and spectrum width • These attributes are vital to understanding, monitoring, and predicting severe weather conditions • There are 135 radar stations in the US • Data are received continuously as a near real-time stream. Figure: Doppler radar tower in Connecticut and the pulsed Doppler radar inside. Acknowledgment: Figures are downloaded from websites www.CCSU.edu and www.answers.com.

  4. NEXRAD II Data Generation • 3D structure in radar data • Continuous rotation over 360° in azimuth • Simultaneous increase in elevation by 1° to 3° per complete sweep • Continuous NEXRAD Level II radar data stream • Data files vary in size from a few MB to tens of MB each, depending on the weather conditions • Data compressed with a modified bzip2 • The temporal resolution is 4-5 minutes in severe weather vs. 9-10 minutes in calm weather. Figure: Structure of Doppler Radar Data (Reflectivity)
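
The 3D structure described on this slide comes from each radar gate being addressed by range, azimuth, and elevation. A minimal sketch of turning one gate into local Cartesian coordinates follows. It assumes a flat earth and a straight beam (real processing normally corrects for earth curvature and refraction), and the function name `gate_to_xyz` is hypothetical, not part of the presented system.

```python
import math


def gate_to_xyz(range_km, azimuth_deg, elevation_deg):
    """Convert one radar gate from polar to local Cartesian coordinates (km).

    Assumption: flat earth and straight beam. x points east, y north, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    ground = range_km * math.cos(el)  # horizontal distance from the radar
    x = ground * math.sin(az)         # azimuth is measured clockwise from north
    y = ground * math.cos(az)
    z = range_km * math.sin(el)       # height above the radar
    return x, y, z


# Example: a gate 100 km out, due east, at 1 degree elevation.
x, y, z = gate_to_xyz(100.0, 90.0, 1.0)
```

Applying this to every gate of every sweep yields the scattered 3D samples that a later step can resample onto a regular volume grid.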

  5. NEXRAD II Data Distribution • The National Climatic Data Center (NCDC) houses the data and provides a central clearinghouse of archived Level II data as a resource to the research, teaching, and technology development communities. • Distributed through four top tier distributors • Purdue makes it available on the NSF TeraGrid • Opportunity! • The near real-time availability of high-resolution radar data provides an exciting opportunity for meteorologists if the data can be accessed and visualized in 3D in a timely manner. • Super res data becoming available as we speak

  6. Technical Challenges • Large volume and real-time streaming (50 MB/s) present major computational and data management challenges. • Super Res data: even larger data • Super resolution data increase the azimuth resolution from 1 degree to 0.5 degree. • The reflectivity data range resolution from 1 km to 0.25 km...and Doppler data range from 230 km to 300 km for split cuts...generally scans at 1.5 degrees or lower elevation. • The amount of data collected and transmitted during a volume scan will increase by a factor of approximately 2.3. • Lack of scale: Analyzing data over a long period or large geographical region requires heavy computation • Lack of interactive 3D visualizations • Despite the availability of 3D information in the new generation, the data is most commonly visualized as 2D images, simple 3D point clouds, or iso-surfaces. • Access method: Download using FTP/HTTP, with no programmatic access • Data format: compressed (modified bzip2) but not supported by popular libraries (e.g., RSL)

  7. NEXRAD data products • Online data • Original streamed data from NWS (compressed), searchable from map and downloadable, most recent months. • Special event data (severe weather events) • Data services • Uncompressed data (through data services) • Variable values (e.g., reflectivity, radial velocity) • Pre-generated 3D volumes • Access methods • Data portal • THREDDS, OPeNDAP • Third party viewers (e.g., IDV, Java NEXRAD viewer) • Programming interfaces (C++ library API) • New: near real-time, interactive 3D visualization

  8. An End-to-End Integrated System • Three important components: • Data Management • Download required files from SRB and uncompress using modified bzip2 • Data Processing • Read the radar files using RSL • Process the data from multiple sites • Convert them into render-able 3D volumes • Visualization/Data Rendering • Import the volumetric data from the disk. • Create 3D textures and slices and apply the texture-based volume-rendering techniques. • Utilize transfer functions to render the data on GPU.
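
The last rendering step above relies on transfer functions, which map a data value to a color and an opacity. The sketch below shows one common form, a piecewise-linear transfer function over reflectivity. The control points and the name `transfer_function` are illustrative assumptions, not the values or code used in the presented system, and a real renderer would evaluate the resulting lookup table on the GPU.

```python
from bisect import bisect_right

# Hypothetical control points: (reflectivity in dBZ, (R, G, B, A)), channels in 0..1.
CONTROL_POINTS = [
    (0.0, (0.0, 0.0, 1.0, 0.0)),   # weak echoes: blue, fully transparent
    (20.0, (0.0, 1.0, 0.0, 0.1)),  # light precipitation: green, faint
    (40.0, (1.0, 1.0, 0.0, 0.5)),  # heavy precipitation: yellow
    (60.0, (1.0, 0.0, 0.0, 1.0)),  # severe: red, opaque
]


def transfer_function(dbz, points=CONTROL_POINTS):
    """Linearly interpolate RGBA between the two nearest control points."""
    keys = [p[0] for p in points]
    if dbz <= keys[0]:
        return points[0][1]
    if dbz >= keys[-1]:
        return points[-1][1]
    hi = bisect_right(keys, dbz)  # first control point strictly above dbz
    lo = hi - 1
    t = (dbz - keys[lo]) / (keys[hi] - keys[lo])
    return tuple(a + t * (b - a) for a, b in zip(points[lo][1], points[hi][1]))


# A 256-entry lookup table, the form typically uploaded as a 1D texture.
lut = [transfer_function(i * 60.0 / 255.0) for i in range(256)]
```

Editing the control points changes which reflectivity ranges stand out, which is the kind of color-mapping manipulation an expert user would want interactively.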

  9. Scaling using TeraGrid [Diagram labels: TeraGrid; Main Job; Processing Site 1, Processing Site 2, ..., Processing Site N; Merge] • How to scale? Key observations: • Spatial parallelism: between stations • Temporal parallelism: volumes generated for different intervals are independent • Data access can be parallel as well • Two types of computation tasks • Processing per station per interval • Merging: combines 3D volumes from all sites and creates the full 3D volume for each interval • Granularity of parallelization • Depends on the processing power available • Either fine grained (per site per interval) or coarse grained (per site) • Using Condor DAGMan to orchestrate jobs
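
The fine-grained decomposition above (one processing job per station per interval, then one merge per interval) maps naturally onto a DAGMan input file. The following is a sketch of generating such a file. The submit file names `process.sub` and `merge.sub`, the job naming scheme, and the station and interval values are assumptions for illustration; the slides do not show the actual DAG.

```python
def build_dag(stations, intervals):
    """Return DAGMan text: one job per (station, interval), one merge per interval."""
    lines = []
    for interval in intervals:
        merge = f"merge_{interval}"
        parents = []
        for station in stations:
            job = f"proc_{station}_{interval}"
            # process.sub is a hypothetical Condor submit file for the per-station step.
            lines.append(f"JOB {job} process.sub")
            lines.append(f'VARS {job} station="{station}" interval="{interval}"')
            parents.append(job)
        # merge.sub is a hypothetical submit file that combines the per-station volumes.
        lines.append(f"JOB {merge} merge.sub")
        lines.append(f'VARS {merge} interval="{interval}"')
        # The merge runs only after every station job for this interval has finished.
        lines.append(f"PARENT {' '.join(parents)} CHILD {merge}")
    return "\n".join(lines) + "\n"


# Example with two station IDs and two intervals (values are illustrative).
dag_text = build_dag(["KIND", "KLOT"], ["t0000", "t0005"])
```

Because intervals share no dependencies, DAGMan is free to run all processing jobs concurrently, limited only by the resources available. A coarse-grained variant would emit one job per station that loops over intervals internally.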

  10. Example Images rendered at different timestamps using a dataset from scanning a 24-hour supercell storm on March 12, 2006, in the Midwest region of the United States.

  11. Hurricane Ike remnant • Hurricane Ike, data from 4 stations (3 in IL and 1 in IN) between 10 a.m. and noon on Sept. 14, 2008

  12. A Service Architecture

  13. Services through multiple interfaces • Expert use mode • Need to see details (large data, lots of processing), highly interactive, ability to manipulate color mapping and other settings. • With accelerated graphics hardware • Learning/casual use mode • Simple interface, no learning curve • Does not require high degree of details • Remote access mode • Through web browser • No special hardware • Need interactivity • Application developers • Need API or web service interfaces to integrate with their applications

  14. Workload distribution & Scalability • Web 2.0 gadget for the masses • Data preprocessed, rendered, and composed into animation on server; animation (or sequence of images) sent over the web • Desktop client for maximum interactivity and performance • Data preprocessed offline and 3D data volumes cached on server • 3D graphics rendering on user’s computer (GPU enabled) • Web browser access for interactivity but slower display • Data preprocessed offline, 3D volumes cached and rendered into 3D graphics • Images sent over the network • User accesses the interactive application through a VNC-based Java applet

  15. Reach out to the masses A LiveRadar3D Google gadget displaying 3D visualization of radar data, continuously updated with streaming data

  16. The fully Interactive 3D visualization Client

  17. 3D Visualization of all stations

  18. Summary • Remote 3D visualization services delivered through multiple interfaces • Application interface of data services for third party integration • An architecture that scales to different use scenarios • Parallel data pre-processing using the TeraGrid Condor resources and partial volume caching which improve the response time and scalability of the system. Continuing effort • User feedback • Scale – support multiple users simultaneously • Hierarchical 3D volume structure to support multi-scale investigation

  19. Thank you! Publications, URLs available. Feel free to contact Carol

  20. PRESTIGE: Purdue Real-Time Satellite Information Gateway • User Requirement • Receive continuous data updates • Real-time or near-real-time access • Custom-tailored data configurations • Current Systems • Impossible to generate complete range of data products • Have to route through the support staff • Manual process which is time consuming and error-prone

  21. Range of MODIS Data Products • Level 1A (MOD01) • Vegetation Index (MOD09) • Geolocation (MOD03) • Aerosol (MOD04) • Water Vapor (MOD05) • Clouds (MOD06) • Atmospheric Profiles (MOD07) • Reflectance (MOD09) • Snow (MOD10) • Fire Detection (MOD14) • Ocean Color (MOD18) • Sea Surface Temperature (MOD28) • Sea Ice (MOD29) • Cloud Mask (MOD35) • Also multiday composites of the above. Note that each data product may contain a few to many variables.

  22. System Design • User-driven publish/subscribe model • Dynamic data generation • User specifies, controls, and receives custom-tailored data • Continuous data updates in near-real-time • Multiple ways to access the data

  23. Satellite Data Subscription

  24. Data Subscription • Web portal based user interface • Choice list based option selection • Options include – Satellite, Coverage area, Data product, Projection type and Data format • Ability to select date range for subscription validity • User-driven product choice expansion • Individual user-based subscriptions • User-initiated data production • Data products generated only when some user is subscribed to the product • Data production automatically turned off when no active subscription exists
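
The demand-driven rule on this slide (produce a data product only while at least one subscription to it is active) can be sketched as a small registry. The class and field names below are hypothetical and simply mirror the options listed above; the real system keeps subscriptions in a database behind the web portal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Product:
    """One custom-tailored data configuration (fields mirror the portal options)."""
    satellite: str
    coverage: str
    data_product: str
    projection: str
    data_format: str


@dataclass
class Subscription:
    user: str
    product: Product
    start: date
    end: date

    def is_active(self, today):
        return self.start <= today <= self.end


class SubscriptionRegistry:
    def __init__(self):
        self._subs = []

    def subscribe(self, sub):
        self._subs.append(sub)

    def active_products(self, today):
        """Distinct products that at least one user is subscribed to today."""
        return {s.product for s in self._subs if s.is_active(today)}

    def should_produce(self, product, today):
        """Production is on only while some active subscription exists."""
        return product in self.active_products(today)


# Illustrative values only.
fire = Product("Terra", "Midwest", "MOD14", "UTM", "GeoTIFF")
registry = SubscriptionRegistry()
registry.subscribe(Subscription("alice", fire, date(2008, 9, 1), date(2008, 9, 30)))
```

With this registry, `registry.should_produce(fire, date(2008, 9, 23))` is true, and it becomes false once the date range ends, which corresponds to production being turned off automatically.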

  25. Data Notification • Push-based notifications • Near real-time delivery of new data notification through email • Implemented by automatically invoking a web-service from the processing cluster when new data is available • Subscription database used to query active subscriptions • Data delivery mechanism • Data scp’ed from processing cluster to webserver-accessible storage space • Thumbnail generated for images to provide a quick look feature • Link to the webserver data location provided in the notification email
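
The notification step can be sketched as follows: given the subscribers returned by the subscription query, build one email per subscriber containing the data link and the quick-look thumbnail link. The function name, sender address, and URLs are placeholders, and actual delivery (for example through `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_notifications(product_name, data_url, thumbnail_url, subscribers,
                        sender="noreply@example.org"):
    """Build one notification email per active subscriber (sending is not shown)."""
    messages = []
    for address in subscribers:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = f"New data available: {product_name}"
        msg.set_content(
            f"New data for {product_name} is available.\n"
            f"Data: {data_url}\n"
            f"Quick look: {thumbnail_url}\n"
        )
        messages.append(msg)
    return messages


# Placeholder product name, URLs, and addresses for illustration.
messages = build_notifications(
    "MOD14",
    "https://example.org/data/mod14_latest",
    "https://example.org/data/mod14_latest_thumb.png",
    ["alice@example.org", "bob@example.org"],
)
```

In the architecture described above, a function like this would sit behind the web service that the processing cluster invokes when new data lands.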
