650 likes | 774 Views
SQL Server 2008 Beyond Relational. Yoni Okun DBA Team Manager SRL Group Yonio@srl.co.il. Data Storage Explosion!. New Digital Information Created. 55% in personal PCs 16% in corporate data warehouses Internet only 21 TB Email 500x more than Internet/year (~400TB). 2003: 7.0.
E N D
SQL Server 2008 Beyond Relational Yoni Okun DBA Team Manager SRL Group Yonio@srl.co.il
Data Storage Explosion! New Digital Information Created • 55% in personal PCs • 16% in corporate data warehouses • Internet only 21 TB • Email 500x more than Internet/year (~400TB) 2003: 7.0 2002: 5.3 12Exabytes 2001: 3.8 2000: 2.9 1999: 2.2 All information before 1999 1999- 2003 (20 Ex)
Data Storage Explosion! Storage of New Information • 1 Megabyte • 2 – 250 page novels • 1 Gigabyte: • a pickup truck filled with books • 2 Terabyte: • 100,000 trees made into paper and printed. • 2 Petabytes: • All U.S. academic research libraries • 2 Exabytes: • Total volume of information generated in 1999 MagneticDigital64% Hard Drive Prices MagneticAnalog28% Paper1% Film7% 1980… 40,000$!
Relational and Non-Relational Data Relational Data Docs & Multimedia Applications XML Spatial
Agenda • Spatial Data Types • Filestream Data Storage • HierarchyID Data Type • Sparse Design – Columns & Stats • Filtered Indexes • Integrated Full Text Search • Administrative Features
Spatial Data Type • Answers to location-based queries • Which roads intersect campus? • Does my land claim overlap yours? • All Italian restaurants within 5 km • Does Your Database Include Address? • Where are all SQL Server Geeks? SELECT * FROM roads WHERE roads.geom.Intersects(@ms)=1
Spatial Data Type • The Open Geospatial Consortium (OGC) defines a hierarchy of spatial data types
Spatial Data Type Geometry Geography A geodetic model A planar model (Created by the Mercator projection)
Spatial Data Type Supports two dimension Only • While Points can hold members: Z (for elevation) and M (for measure) They are Not used in any spatial method • Coordinate order is Latitude /Longitude • Longitude/Latitude In RTM
Working With Spatial Data Type 1. Use with DDL CREATE TABLE Hotels (HotelNamenvarchar(200), HotelLocation ) geography 2. Insert Values INSERT INTO Hotels VALUES ('Dan', ), ('Herdos' ) geography::Point(29.548306566591655,34.96617794036865,4326) , geography::STGeomFromWKB(0x0101000000B699452D3C8C3D4000000082 877B4140,4326) 3. Use Methods & Properties to work with data DECLARE @x geography, @y geography select @x = HotelLocationfrom Hotels where HotelName= 'Dan' select @y = HotelLocationfrom Hotels where HotelName= 'Herdos' select @x.ToString(), @y.ToString() select @x.STDistance(@y)
Spatial Data Type • Input: • Binary - ST[Type]FromWKB • Text - ST[Type]FromText • GML – GeomFromGml • Output: • Binary - STAsBinary • Text - STAsText • GML - AsGml • Text with Z and M – AsTextZM • SRID: • Represent different assumption around what is the earth elipse. • SRID 4326 – GPS • sys.spatial_reference_systems
Spatial Data Type Demonstration • Using Spatial Viewer • Using Spatial Data Type 15
Useful Methods / Properties • Descriptive • STArea • STLength • STCentroid • Relation between two instances • STIntersects • STDistance • Collections • STGeometryN - geography element in a GeometryCollection • STPointN – Nth point of a geometry
Spatial Indexes • Spatial Is Indexed With An • Adaptive Multi- level Grid • Index Is Integrated Into The SQL • Server • Index Consists Of A Grid- based • Hierarchy • Each Level Subdivides The Grid • Sector That Is Defined In The • Level Above Spatial Index - Conceptual Model
Spatial Indexes 1. Decomposing space into Grid Hierarchy: • Grids parameter determines density: • Low = 4x4 • Medium = 8x8 (Default) • High = 16x16 • 4x4x4 = 65,536 cells !! 2. Tessellation • Fitting Objects Into Grid (Touched Cells).
Tessellation Process Level 3 & 4 Intersections 85 matching Cells Level 1 Intersections Level 2 Intersections Cells Per Object Stops Tessellation At Limit: (Per Object = 15, Cells = 13) Complete Match Cells Aren’t Broken To Lower Level (42 Cells)
Creating Spatial Indexes CREATE SPATIAL INDEX Sindx_col2 ON SpatialTable (geometry_col) USING GEOMETRY_GRID WITH ( BOUNDING_BOX = ( xmin=0, ymin=0, xmax=500, ymax=200 ), GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 64 ) • BOUNDING_BOX - GEOMETRY index only • GRIDS - 4 Grid Levels, Grid Densities For Level - Low, Medium, High • CELLS_PER_OBJECT - number of cells recorded for matching
Spatial Data Type Demonstration • Integration With Maps • App Code by Dan Morgenstern • SRL Group 22
Agenda • Spatial Data Types • FILESTREAM Data Storage • HierarchyID Data Type • Sparse Design – Columns & Stats • Filtered Indexes • Integrated Full Text Search • Administrative Features
FILESTREAM Storage • To Blob Or Not To Blob? • Mgmt Complexities Vs. Streaming & Performance • Cons • LOBS take Memory buffers • Updating LOBS cause fragmentation • $ Per GB • However, File system "update" is delete and insert • Pros • Transactional consistency • Point-in-time backup & restore • Single storage and query vehicle ?
FILESTREAM Storage Store BLOBs in Database Dedicated BLOB Store Use File Servers Application Application Application BLOBs BLOBs BLOBs DB DB DB • Low cost per GB • Streaming Performance • Lower cost per GB at scale • Scalability & Expandability • Integrated management • Data-level consistency Advantages • Complex application development & deployment • Separate data management • Enterprise-scales only • Complex application development & deployment • Integration with structured data Challenges • Poor data streaming support • File size limitations • Highest cost per GB • Windows File Servers • NetAppNetFiler • EMC Centera • Fujitsu Nearline • SQL Server VARBINARY(MAX) Example
FILESTREAM Storage • FILESTREAM Combines The Best Of 2 Worlds • Integrates DB Engine With NTFS • Storing BLOB Data As Files • FEATURES: • Uses NT Cache For Caching File Data. • SQL Bpool Not Used And Is Available To Query Processing • Win 32 File System Interface Provide Streaming Access To Data • Compressed volumes are supported
FILESTREAM Storage • It’s not only about Storing But About Working With BLOBS: • Image Analysis • Voice Interpretation & Scripting • Mixing Satellite Feeds & Spatial Data Type For • Weather Reports • Etc..
FILESTREAM Implementation • At Database Level • Declare A Filegroup & Map To Directory • At Table Level • Define On VARBINARY(MAX) • Must Have UNIQUEIDENTIFIER Column • Integrated Security & Management: • Permissions On FILESTREAM Implied On Files. • Tools And Functions Work For Filestream Data. (Backup)
FILESTREAM Programming Dual Programming Model • TSQL (Same as SQL BLOB) • Win32 Streaming File IO APIs • Begin a SQL Server Tran • Obtain a symbolic PATH NAME & TRANSACTION CONTEXT • Open a handle using sqlncli10.dll - OpenSqlFilestream • Use Handle Within System.IO Classes • Commit Tran
FILESTREAM Programming // 1. Start up a database transaction - SqlTransactiontxn = cxn.BeginTransaction(); // 2. Insert a row to create a handle for streaming. newSqlCommand("INSERT <Table> VALUES ( @mediaId, @fileName, @contentType);",cxn, txn); // 3. Get a filestreamPathName & transaction context. newSqlCommand("SELECT PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() FROM <Table>", cxn, txn); // 4. Get a Win32 file handle using SQL Native Client call. SafeFileHandle handle = SqlNativeClient.OpenSqlFilestream(...); // 5. Open up a new stream to write the file to the blob. FileStreamdestBlob= newFileStream(handle, FileAccess.Write); // 6. Loop through source file and write to FileStream handle while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {destBlob.Write(buffer, 0, bytesRead);} // 7. Commit transaction, cleanup connection. -txn.Commit();
Initial FILESTREAM Limitations • Remote FILESTREAM storage - Not supported • DB Snapshot and Mirroring – Not supported • Features not integrated • SQL Encryption • Table Value Parameters
FILESTREAM Storage Demonstration • Administrating FILESTREAM • Programming FILESTREAM 33
Agenda • Spatial Data Types • FILESTREAM Data Storage • HierarchyID Data Type • Sparse Design – Columns & Stats • Filtered Indexes • Integrated Full Text Search • Administrative Features
HierarchyID Scenarios: • List Forum Threads • Business Organization Charts • Product Categories • Files/Folders Management Features: • Compact - 100,000 Nodes, 6 Level ~ 5 Bytes / Node • Available To Clr Clients As The Sqlhierarchyid Data Type
2k5 Alternatives - Adjacency Model • Pros: • Understandable • 2k5 – Recursive CTE • Cons: • De-Normalized (Personnel+Chart) • Not Set-Based
2k5 Alternatives – Path Enumeration • Holds Path As A String Concatenation • Pros: • Logical Representation • Cons: • Searches Done With String Functions • And Predicates On Those Path Strings
2k5 Alternatives – Nested Sets • "Left" And "Right" Columns Represent Edges • Pros: • Predictable , Set Based Results • Cons: • Must Be Maintained Separately
Demo Structure 1. Insert Root 2. Insert 1st Subordinate 3. Enter Rest of Tree 4. Query Hierarchical Data 5. Reparent Employee 6. Add Subordiante 7. Reparent Node
HierarchyID Demonstration • Trees and Hierarchies 41
HierarchyID Methods GetRoot() - root of hierarchy tree ToString() - Logical string representation of Value GetDescendant()- a child node x of this GetAncestor() - hierarchyid of the nth ancestor of this IsDescendantOf() - true if child is a descendant of this GetLevel() – integer representing depth of the node this GetReparentedValue() -path to newRoot, followed by oldRoot to this
HierarchyID Indexes Depth-first Index Breadth-first Index Employees that Report Directly To The Same Manager Employees That Report Through A Manager
Agenda • Spatial Data Types • FILESTREAM Data Storage • HierarchyID Data Type • Sparse Design – Columns & Stats • Filtered Indexes • Integrated Full Text Search • Administrative Features
Sparse Properties How Can We Model property bags? • Products Catalog • Lab tests with different readings per test • Because they don't appear on each row they are difficult to model
Modeling Sparse Properties Entity-Attribute-Value • Non Relational • Value Column Issues • Need PIVOT to Make Sparse • Xml • Non Relational • Complex Updates • Sparse Columns • Hit 1,024 Limit • Storage
Sparse Columns • Efficient Way Of Managing Empty Data • Null Data No Physical Space • 30,000 Column Limit (RTM) • 1024 For "Non-sparse" Columns • Column Set - Xml Of All Sparse Values • For Web Site That Needs To Show The Properties
Sparse Columns • Sparse require more storage for nonnull values then regular Columns • NULL Percentage • percent of the data that must be NULL for savings 40% space
Sparse Columns Demonstration • Build Sparse Design • Sparse Storage 50
Agenda • Spatial Data Types • FILESTREAM Data Storage • HierarchyID Data Type • Sparse Design – Columns & Stats • Filtered Indexes • Integrated Full Text Search • Administrative Features
Filtered Indexes • Dramatic Effect on Performance & Storage: • Smaller. Stats is more accurate • Reduced Management Costs (on Changes) • Different indexes on frequently \ infrequently changed columns. • Storage-wise is the same. • Do your queries relate to subsets of Data??
Filtered Indexes • Candidate Columns: • Heterogeneous Categories Of Values • Columns With Distinct Ranges Of Values • Partitioned Tables • sparse columns • Can keep track of only non-null value distribution
Filtered Indexes Create Index Ind1 on t(c1) where C1=‘A’ or C1 = ‘D’ Create index Ind2 on t(sc7) where sc7 is not null