590 likes | 605 Views
Status report on commodity server performance, why most VLDBs will be multimedia servers, and preview of Microsoft's SQL Server 7.
E N D
Commodity Database Servers Jim Gray Microsoft Research Gray@Microsoft.com http://Research.Microsoft.com/~Gray/talks
Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7
Status Report on Commodity Server Performance • Standards: • TPC, • SpecWeb, ... • Product benchmarks: e.g. • SAP, • PeopleSoft,… • Both indicate that • NT is 18 months behind Unix-SMP performance • but clusters can make up the difference
SMP HP 9000 16 cpu, Sybase 1152.1 ktpmC, 82$/tpmC NEC 8 cpuSQL Server14.9 ktpmC, 60$/tpmC Cluster IBM SP2 12x8 cpuOracle 8.257 ktpmC, 148$/tpmc Predict:large & inexpensive NT cluster number this year. TPC-C
Performance Champions:NCR/Teradata 1 TB:32x4 node clusters 300 GB: 24x4 node cluster 100 GB: 8x4 cluster All use Teradata software on NCR World-Mark Intel-based hardware TPC-D
Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7
California DMV ~ 20 million cars, drivers, doctors, barbers,.. Some drivers have moving violations DMV knows about 1.5 KB about each one 30 GB total. Microsoft: too big says DoJ 40B$ revenue (in company life time) ~1 billion unit sales: @ 100 B = 100 GB ~100 M customers: @1 KB = 100 GB Wall Mart (no one bigger!) Sells 10 B items per year 100 bytes/item => 1 TB ATT 300 M calls per day (peak day) 10 B calls per year 100 b/call = 1 TB VLDB Reality Test
Its HARD to find 1 TB of transaction data 100 M web hits/day 250 B/hit 1TB/year Its HARD to find 1TB of text data 100 M web pages 10 KB/page = 1 TB How do they do it? Lots of indices? No: that is only 3x Precomputed Aggregates? Yes: OLAP benchmark Start at 30 MB Use 2.7 GB or 6GB database But: this is dumb Email? Microsoft: 6 TB Hotmail: 3.5 TB AOL? VLDB Reality Test
Data Tidal Wave • Seagate 47GB drive @ 3k$ • 100 GB penny per MB drive coming in 2000 • 10 $/GB = 10 k$/ Terabyte! (in y2k) • Everyone can afford one • What’s a terror bite? • If you sell ten billion items a year (e.g Wal-Mart) • And you record 100 bytes on each one • Then you got a Terror Bite • Where will the terror bytes come from? • Multimedia (like the TerraServer) and...
Multi Media: Very Large DBs • Photo is 100 KB, not 100 B • So, photo DBs are 1,000x larger • Examples: • Scanned documents • Photo records of products/people/places • Surveillance • Scientific monitoring
Some TerrorByte Databases • EOS/DIS (picture of planet each week) • 15 PB by 2007 • Federal Reserve Clearing house: images of checks • 15 PB by 2006 (7 year history) • Sloan Digital Sky Survey: • 40 TB raw, 2 TB cooked • TerraServer:
Scaleup - Big Database • Build a 1 TB SQL Server database • Show off Windows NT and SQL Server scalability • Stress test the product • Data must be • 1 TB • Unencumbered • Interesting to everyone everywhere • And not offensive to anyone anywhere • Loaded • 1.1 M place names from Encarta World Atlas • 1 M Sq Km from USGS (1 meter resolution) • 2 M Sq Km from Russian Space agency (2 m) • Will be on web (world’s largest atlas) • Sell images with commerce server. • USGS CRDA: 3 TB more coming.
324 disks (2.9 terabytes) 8 x 440Mhz Alpha CPUs 10 GB DRAM NT EE & SQL 7.0 Photo of the planetUSGS and Russianimages TerraServerWorld’s Largest PC!
Earth is 500 Tera-meters square USA is 10 tm2 100 TM2 land in 70ºN to 70ºS We have pictures of 6% of it 3 tsm from USGS 2 tsm from Russian Space Agency Compress 5:1 (JPEG) to 1.5 TB. Slice into 10 KB chunks Store chunks in DB Navigate with Encarta™ Atlas globe gazetteer StreetsPlus™ in the USA Someday multi-spectral image of everywhere once a day / hour 1.8x1.2 km2 tile 10x15 km2 thumbnail 20x30 km2 browse image 40x60 km2 jump image Background
US Geologic Survey 3 TeraBytes Most data not yet published Based on a CRADA TerraServer makes data available. 1x1 meter 4 TB Continental US New DataComing USGS “DOQ” USGS Digital Ortho Quads (DOQ)
SPIN-2 Russian Space Agency(SovInfomSputnik)SPIN-2 (Aerial Images is Worldwide Distributor) • 1.5 Meter Geo Rectified imagery of (almost) anywhere • Almost equal-area projection • De-classified satellite photos (from 200 KM), • More data coming (1 m) • Want to sell imagery on Internet. • Putting 2 tm2 onto TerraServer.
Microsoft BackOffice SPIN-2 Demo http://www.TerraServer.com
Hardware SPIN-2 1TB Database Server AlphaServer 8400 4x400. 10 GB RAM 324 StorageWorks disks 10 drive tape library (STC Timber Wolf DLT7000 )
Software Terra-Server Web Site Web Client ImageServer Active Server Pages Internet InformationServer 4.0 HTML JavaViewer The Internet broswer MTS Terra-ServerStored Procedures Internet InfoServer 4.0 Internet InformationServer 4.0 Sphinx (SQL Server) MicrosoftSite Server EE Microsoft AutomapActiveX Server Automap Server Image DeliveryApplication SQL Server7 Terra-Server DB Image Provider Site(s)
System Management & Maintenance • Backup and Recovery • STC 9717 Tape robot • Legato NetWorker™ • Sphinx Backup/Restore Utility • Clocked at 80 MBps!! • SQL Server Enterprise Mgr • DBA Maintenance • SQL Performance Monitor
H: G: E: F: TerraServer File Group Layout • Convert 324 disks to 28 RAID5 setsplus 28 spare drives • Make 4 NT volumes (RAID 50)595 GB per volume • Build 30 20GB files on each volume • DB is File Group of 120 files
Gazetteer Design • Classic Snowflake Schema • Fast First hint to Optimizer
Image Data Design • Image pyramid stored in DBMS (250 M recs)
ESA LoadMgr AlphaServer4100 AlphaServer4100 60 4.3 GB Drives Image Delivery and Load DLTTape “tar” \Drop’N’ LoadMgrDB DoJob Wait 4 Load DLTTape NTBackup ... Cutting Machines LoadMgr 10: ImgCutter 20: Partition 30: ThumbImg40: BrowseImg 45: JumpImg 50: TileImg 55: Meta Data 60: Tile Meta 70: Img Meta 80: Update Place ImgCutter 100mbitEtherSwitch \Drop’N’ \Images TerraServer Enterprise Storage Array STCDLTTape Library AlphaServer8400 108 9.1 GB Drives 108 9.1 GB Drives 108 9.1 GB Drives
SQL 7 Testimonial • We started using it March 4 1997 • SQL 7 Pre-Alpha • SQL 7 Alpha • SLQ 7 Beta 1 • SQL 7 Beta • Loaded the DB twice • (we made application mistakes) • Now doing it “right” • Reliability: Great! SQL 7 never lost data • Ease of use: Great! • Functionality: Great!
Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7
SQL 7: Easy & Functional • Dynamic self management • Multi-site management • Alert/response management • Job scheduling and execution • Scriptable management • profiling/tuning tools • Fully Unicode • English Language Query • Integrated text search engine Easy Scalability Data Warehousing
Made It Easier!(fewer knobs) • Desktop & Workgroups • Auto Configure Engine / Dynamic Disk/memory • Reduce Learning Curve, Increase Productivity • Self-Managing SQLAgent, Wizards, “Task Pads” • Large Organizations • Deploy/manage hundreds of SQL Servers • Lower TOC for Large Environments • Multi-Server Operations/ “Lights-out” Environment
Multi-Site Management • Admin servers from one place • Automate simple stuff • Wizards for common stuff • Manage arrays of servers • operations, security,… • Replication • Import/export • Interface is scriptable • COM object model • Script with Java, VB, ... • Scheduling and Multi-step jobs
DBA and Developer Tools • Built-in GUI • data/schema design • data query & edit • intgrated with programming tools • SQL Server Profiler • Selected server events and trace criteria • “Capture” output to screen or replay • SQL Server Expert • Analyzes actual server usage history • Makes recommendations to improve performance • Recommends Index design • Recommends operations procedures
Wizards and GUIs • Wizards galore (over 50 at last count) • MS Access as a query interface • Built-in data access tools (integrated with tools) • Graphical show plan
Many New Wizards... • Web Assistant • Register Servers • Configure Replication • Create Publication • Create Pull Subscription • Create Push Subscription • Replication Partitioning • Create an Index • Create a Stored Procedure • Create a View • More to come... • Create a Database • Scheduled Backup • Create a Maintenance Plan • Create a Scheduled Job • Create an Alert • Security Wizard • Import Data to SQL Server • Export Data From SQL Server • Clustering (Wolfpack) • Index Tuning Wizard
Distributed Management Objects (SQL-DMO) • COM Interfaces for administering SQL Server • Embedded Administration (no UI) • All Administration Functions Supported • Server, Database Configurations, Settings • Object Creation, Security, Replication, Scripting,.. • 40+ Objects, 1000+ properties and methods • Integration Interface for ISV Administration • I.e., Baan using DMO for Scripted App Install • Scripting Via VBA and Jscript + DCOM
SQLAgent Jobs Tasks Alerts Operators FileGroups Files Table Columns View Indexes Stored Procs Keys (PK/FK) Rules Triggers Defaults DMO: Object Model (Overview) SQL Server Databases Users DB Options Transaction Logs Publications Logins Configurations Linked Servers Remote Login
DMO Scripting • Backup a Database Set MyServer = CreateObject("SQLDMO.SQLServer")‘Create Server Object Set MyBackup = CreateObject("SQLDMO.Backup") ‘Create Backup Object MyServer.Name = “MSSALES” ‘ Identify Server MyServer.LoginSecure = True ‘ Windows NT Auth MyServer.Connect ‘ Connect MyBackup.Database = ”SALESII” ‘ Database to backup MyBackup.Files = "\\MyServer\Backups\" _ ‘ Backup Location + MyBackup.Database +”.bak” ‘ Name Backup File MyBackup.SQLBackup MyServer ‘ Back it Up MyServer.Disconnect ‘ We’re Done!
Scalability • Win9x/NTW version • Dynamic row-level locking • Improved query optimizer • Intra-query parallelism • 64-bit support • Replication • Distributed query • High Availability Clusters Easy Scalability Data Warehousing
Scale Down to Windows 95-98 • Full function (same as NTW) • Self managing • Many tools • Integration with Next MS Access • Great for imbedded apps
Distributor ODBC OS 390 DB2 DB2 VSAM CICS Subscriber Subscriber Subscriber Subscriber Publisher Replication 2PC, RPC • Transactional and Merge • Remote update • ODBC and OLE DB subscribers • Wizards • Performance Updating Subscriber (immediate updates)
Global Agg. + Result 50 rows 4 x 50 rows + + + + Local Agg. 50,000 rows Disks Parallel QuerySMP & Disk Parallelism • Plus Distributed • Plus Hash Join (fanciest on the planet) • Plus Optimized Partitioned views • # of emp. per group • total inc. per group
DirectoryService Database (DB2, VSAM, Oracle, …) Spreadsheet Photos Mail Maps Documents and the Web Distributed Heterogeneous QueriesData Fusion / Integration SQL 7.0 Query Processor Join spread sheets, databases, directories, Text DBs etc. Any source that exposes OLE DB interfaces SQL Server as gateway, even on the desktop
Backup Fuzzy Parallel Incremental Restartable Recovery Fast File granularity Reorganize shrinks file reclusters file Auto-repair UtilitiesThe Key to LARGE Databases
Data Warehousing • Warehousing Framework • Visual data modeler • Microsoft repository • Data transformation services (DTS) • Plato & Dcube - Multi Dimensional Data Cubes • English query 2.0 • Built-in text-index engine Easy Scalability Data Warehousing
Key Microsoft Data Warehouse Programs • Data Warehouse Framework (DWF) • Process -- for building, using and managing • Pipeline -- for metadata flow • Protocols -- to integrate components • Data Warehouse Alliance (DWA) • Partners -- ISVs pledged to the framework and its parts • Products -- complete spectrum from Microsoft and third-parties
Microsoft Data Warehousing Framework Building Using Data Warehouse Design (logical/physical schema*/ data flow**) Data Mart Design** (Cubes/Star schema) End-User Tools (Excel**, Access, English Query) Operational Data (OLE-DB **) Data Transformations (DTS**) Data Marts (SQL Server** & OLAP Server**) OLE-DB** Managing Microsoft Repository** (Persistent Shared Meta-Data) DB Schema** Transformation** Scheduling OLAP Data Warehouse Management (Console*, Scheduling**, Events**,Topology*,) **available in SQL Server 7 (* partially) Data Flow Meta-Data Flow
BMC Data Mirror Execusoft Informatica Microsoft Platinum Technology Praxis Prism Sagent SAS Sterling V-Mark Technical and marketing relationship Supports SQL Server storage engine Third-party products tested with BackOffice Alliance for Data Warehousing DW Build DW Access Andyne Business Objects Cognos IQ Software Microsoft NCR Data Mining Pilot Platinum Technology Sagent SAS Seagate Wall Data
DW Alliance Milestones • 9/96 - Launched with 8 founding members • 3/97 - Design review • 1/97 - 6/97 - Expanded to 21 members • 7/97 - Repository design review • Team development of shared metadata • 9/97 - OLE DB for OLAP API specification • 1H’98 - Integration development with Sphinx DTS and Replication APIs
UML UMX DTM GEN DBM CDE SQL COM OCL UML Unified Modeling Language GEN Generic UMX Uml Extensions DBM Database Model SQL Microsoft SQL Server CDE Component Descriptions COM Component Object Model OCL Oracle DTM Data Type Model Microsoft Repository • Based on joint Sterling/Microsoft design (Shipped 97Q2) • Wide distribution:VB, Visual Studio and Third-Parties • Designed with over 60 vendors • Extended to support DB schema, transformations, OLAP • Key element of the DW Framework • UML is abstract model • Everything viewable in UML terms
Repository & Data Warehousing • Common infrastructure -- the meta-data pipeline • Supports interoperability between data warehousing tools and products • Process: • Initial spec developed with 12 vendors • Gathering feedback now • Final spec review in Redmond, 2/98
IUnknown IDTSDataPump Transforms Oracle > SQL Server Data Pump Data Pump Data Transformation • Workflow system manages Data Pump • Pre-defined transforms using the DTS GUI • Procedural VB Script, JavaScript, VBA, any COM • Multi-stream in, Multi-stream out Repository Metadata Transformation Objects ActiveX Scripts • Function Example() Transform() • If DTSSource(“CreditRating”) = “1” then • DTSDestination(” Risk ") = ”Good" • Else If DTSSource(”Credit") = ”2” • DTSDestination(” Risk ") = ”Average” • Else If DTSSource(”Credit") = ”3” • DTSDestination(” Risk ") = ”Bad” • Else • Example = DTS_SkipRow • End if • End Function SQLAgent Multiserver Operations
Transformations • Data quality and validation • Missing values, scrubbing, exception handling • Data integration • Heterogeneous query, join keys, elim. dups • Transforms • Combine/decompose multiple columns to one • Aggregation • Central metadata • Business rules, data lineage