
IISWC 2007 Panel Analyzing Petabytes

IISWC 2007 Panel: Analyzing Petabytes. Suchi Raman, Netezza Corp. http://www.netezza.com/. Petabyte database workloads include macro-analytic queries (identify trends and patterns; very large data volumes; query times dominated by disk scan times) and micro-analytic queries (short-running queries).

Presentation Transcript


  1. IISWC 2007 Panel: Analyzing Petabytes. Suchi Raman, Netezza Corp. http://www.netezza.com/

  2. Petabyte Database Workloads
     • Macro-analytic queries
       • Identify trends and patterns
       • Very large data volumes
       • Query times dominated by disk scan times
     • Micro-analytic queries
       • Short-running queries
       • Query run once and stored
       • Pre-computed summaries
     • Data management
       • ETL load/unload
       • Backup/restore
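The claim that macro-analytic query times are dominated by disk scan times can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a table striped evenly across drives that are all read in parallel; the function name and every figure in the example (table size, disk count, per-disk throughput) are hypothetical and do not describe any Netezza configuration.

```python
def scan_seconds(table_bytes: float, disks: int, mb_per_s_per_disk: float) -> float:
    """Estimate the time to scan a table striped evenly across `disks` drives.

    Assumes every drive streams at the same sustained rate and that all
    drives are read in parallel, so aggregate bandwidth scales linearly.
    """
    aggregate_bytes_per_s = disks * mb_per_s_per_disk * 1e6
    return table_bytes / aggregate_bytes_per_s


# Hypothetical figures: a 1 PB table, 1,000 drives, 60 MB/s sustained per drive.
seconds = scan_seconds(1e15, 1000, 60.0)
print(f"{seconds / 3600:.1f} hours")  # prints "4.6 hours"
```

Under these assumed numbers a single full scan takes hours even with perfect parallelism, which is why layout, compression, and filtering close to the disk matter so much for this class of query.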

  3. Netezza NPS System

  4. Software challenges
     • Effective disk bandwidth
       • Optimal data layouts
       • Data compression
       • Increased effective disk bandwidth (and reliability!)
       • Upgrades and evolution of on-disk formats
       • Minimize disk reads (indexes, caches)
     • Query processing algorithms
       • Skew avoidance algorithms
       • Scheduling among queries, especially with mixed workloads combining large and small queries
     • System monitoring/profiling
       • System monitoring during busy periods
       • Accurate profiling techniques
     • Data management challenges
       • High-speed data path in/out of NPS system
       • Efficient/flexible data formats for load/unload
       • Infrastructure challenge: fast external devices for sourcing/sinking data
     • Custom functions (UDFs/UDAs) implemented within the system
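To illustrate why skew avoidance appears among the query processing challenges, here is a minimal sketch that measures how unevenly rows land on nodes when they are hash-distributed on a key. The hashing scheme, the function names, and the sample keys are all assumptions made for illustration; the slides do not describe how the NPS system distributes data.

```python
import hashlib
from collections import Counter


def node_for(key: str, nodes: int) -> int:
    """Map a distribution key to a node with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def skew_ratio(keys, nodes: int) -> float:
    """Return the busiest node's row count divided by the mean per-node count.

    1.0 means perfectly even placement; larger values mean one node does
    proportionally more work and the whole query waits for it.
    """
    counts = Counter(node_for(k, nodes) for k in keys)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no keys supplied")
    return max(counts.values()) / (total / nodes)


# Hypothetical data: a high-cardinality key versus a three-valued key.
unique_keys = [f"order-{i}" for i in range(10_000)]
low_cardinality = ["EU", "US", "APAC"] * 3_000

print(f"unique keys:     {skew_ratio(unique_keys, 8):.2f}")
print(f"low cardinality: {skew_ratio(low_cardinality, 8):.2f}")
```

With only three distinct key values at most three of the eight nodes receive any rows, so the ratio is at least 8/3, while a high-cardinality key stays close to 1.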

  5. Hardware challenges
     • Increased effective disk bandwidth (and reliability!)
     • Multi-core technology
     • Balancing CPU-to-disk ratio
     • Specialized engines (e.g., FPGA-based filtering)
     • Faster internal and external connectivity
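The relationship between effective disk bandwidth and the CPU-to-disk ratio can be sketched with simple arithmetic. The model below assumes that compressed data expands by a fixed ratio after it is read, and that each core processes decompressed data at a fixed rate; both the model and the example figures are hypothetical simplifications, not measurements of any real system.

```python
import math


def effective_mb_per_s(raw_mb_per_s: float, compression_ratio: float) -> float:
    """Logical data delivered per second when each byte read expands by `compression_ratio`."""
    return raw_mb_per_s * compression_ratio


def cores_to_keep_up(disks: int, raw_mb_per_s: float,
                     compression_ratio: float, core_mb_per_s: float) -> int:
    """Smallest number of cores that can consume data as fast as the disks supply it."""
    total = disks * effective_mb_per_s(raw_mb_per_s, compression_ratio)
    return math.ceil(total / core_mb_per_s)


# Hypothetical figures: 4 drives at 60 MB/s, 2x compression, 200 MB/s per core.
print(cores_to_keep_up(4, 60.0, 2.0, 200.0))  # prints 3
```

In this simplified model, doubling the compression ratio doubles the data the CPUs must handle, so a gain in effective disk bandwidth shifts the bottleneck toward the processors unless filtering is offloaded elsewhere.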

  6. How can University Researchers contribute?
     • Explore new applications and data types
       • E.g., network traffic analysis
       • Geospatial data
       • Biological data types
     • Skew avoidance/scheduling algorithms
     • Applications built on UDFs/UDAs
     • Verification methods for optimizer algorithms
     • Platform improvements
       • Disk performance and reliability
       • FPGA filtering algorithms
       • Faster interconnect networks
       • Power and cooling improvements
