IISWC 2007 Panel
Analyzing Petabytes
Suchi Raman
Netezza Corp.
http://www.netezza.com/
Petabyte Database Workloads
• Macro-analytic queries
  • Identify trends and patterns
  • Very large data volumes
  • Query times dominated by disk scan times
• Micro-analytic queries
  • Short running queries
  • Query run once and stored
  • Pre-computed summaries
• Data management
  • ETL load/unload
  • Backup/restore
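The point that macro-analytic query times are dominated by disk scan times can be illustrated with a back-of-envelope estimate. The sketch below is not from the slides: the function name, the table size, the disk count, and the per-disk throughput are all hypothetical values chosen only to show the arithmetic, and it assumes data is spread evenly across disks that are read in parallel.

```python
def scan_time_seconds(table_bytes: float, disks: int, mb_per_sec_per_disk: float) -> float:
    """Estimate full-scan time for a table striped evenly across `disks`.

    Assumes every disk streams sequentially at the same rate and that
    nothing else (CPU, network) is the bottleneck.
    """
    if disks <= 0 or mb_per_sec_per_disk <= 0:
        raise ValueError("disks and mb_per_sec_per_disk must be positive")
    aggregate_bytes_per_sec = disks * mb_per_sec_per_disk * 1e6
    return table_bytes / aggregate_bytes_per_sec


# Hypothetical figures: a 1 PB table, 1000 disks, 60 MB/s per disk.
ONE_PETABYTE = 1e15
t = scan_time_seconds(ONE_PETABYTE, disks=1000, mb_per_sec_per_disk=60.0)
print(f"full scan: {t / 3600:.1f} hours")
```

Under these assumed numbers a single full scan takes several hours, which is why layout, compression, and read-avoidance matter so much for this class of query.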
Software challenges
• Effective disk bandwidth
  • Optimal data layouts
  • Data compression
  • Increased effective disk bandwidth (and reliability!)
  • Upgrades and evolution of on-disk formats
  • Minimize disk reads (indexes, caches)
• Query processing algorithms
  • Skew avoidance algorithms
  • Scheduling among queries, especially with mixed workloads combining large and small queries
• System monitoring/profiling
  • System monitoring during busy periods
  • Accurate profiling techniques
• Data management challenges
  • High speed data path in/out of NPS system
  • Efficient/flexible data formats for load/unload
  • Infrastructure challenge – fast external devices for sourcing/sinking data
• Custom functions (UDFs/UDAs) implemented within the system
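The link between data compression and effective disk bandwidth can be made concrete with a one-line model: if compressed data is read at the raw disk rate, the logical (uncompressed) data rate scales with the compression ratio. This is a minimal sketch under a stated assumption that decompression keeps up with the disk; the function name and the numbers are hypothetical and do not describe any particular system.

```python
def effective_scan_bandwidth(raw_mb_per_sec: float, compression_ratio: float) -> float:
    """Logical MB/s delivered when compressed data is read at `raw_mb_per_sec`.

    `compression_ratio` is uncompressed size divided by compressed size.
    Assumes decompression is never the bottleneck.
    """
    if raw_mb_per_sec <= 0 or compression_ratio <= 0:
        raise ValueError("inputs must be positive")
    return raw_mb_per_sec * compression_ratio


# Hypothetical: a 60 MB/s disk holding data compressed 2.5x.
print(effective_scan_bandwidth(60.0, 2.5))  # 150.0
```

The same model shows why the benefit disappears when the ratio is close to 1, or when the decompression step cannot sustain the resulting logical rate.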
Hardware challenges
• Increased effective disk bandwidth (and reliability!)
• Multi-core technology
• Balancing CPU-to-disk ratio
• Specialized engines (e.g., FPGA-based filtering)
• Faster internal and external connectivity
How can University Researchers contribute?
• Explore new applications and data types
  • E.g., network traffic analysis
  • Geospatial data
  • Biological data types
• Skew avoidance/scheduling algorithms
• Applications built on UDFs/UDAs
• Verification methods for optimizer algorithms
• Platform improvements
  • Disk performance and reliability
  • FPGA filtering algorithms
  • Faster interconnect networks
  • Power and cooling improvements
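For readers unfamiliar with the skew problem mentioned above, the following sketch shows one simple way to measure it. When rows are hash-distributed across parallel data slices, a scan finishes only when the most heavily loaded slice finishes, so the ratio of the largest slice to the average slice is a rough indicator of lost parallelism. Everything here is an illustrative assumption: the function names, the use of CRC32 as the distribution hash, and the sample keys are hypothetical and are not taken from the slides or from any specific product.

```python
import zlib
from collections import Counter


def slice_counts(keys, num_slices: int) -> Counter:
    """Count how many rows land on each slice under hash distribution."""
    counts = Counter({s: 0 for s in range(num_slices)})
    for key in keys:
        # CRC32 is used only because it is deterministic across runs.
        counts[zlib.crc32(str(key).encode()) % num_slices] += 1
    return counts


def skew_ratio(counts: Counter) -> float:
    """Largest slice divided by the mean slice size (1.0 means perfectly even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Hypothetical data: distinct keys versus a column where half the rows share one value.
distinct_keys = list(range(10_000))
hot_keys = list(range(5_000)) + [42] * 5_000

print(skew_ratio(slice_counts(distinct_keys, 8)))
print(skew_ratio(slice_counts(hot_keys, 8)))
```

A skew avoidance algorithm would aim to keep this ratio near 1.0, for example by choosing a different distribution key or by redistributing hot values.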