320 likes | 699 Views
SQL server 2008 Beyond Relational. Михеев Юрий. 29 мая 2008г. 18:30. Module Overview. Spatial Data Filestream Data Storage XML Data and Query Hierarchies and HierarchyID Sparse Data Designs Sparse Columns Filtered Indexes Filtered Statistics. Relational and Non-Relational Data.
E N D
SQL server 2008Beyond Relational Михеев Юрий 29 мая 2008г. 18:30
Module Overview • Spatial Data • Filestream Data Storage • XML Data and Query • Hierarchies and HierarchyID • Sparse Data Designs • Sparse Columns • Filtered Indexes • Filtered Statistics
Relational and Non-Relational Data • Relational data uses simple data types • Each type has a single value • Generic operations work well with the types • Relational storage/query may not be optimal for • Hierarchical data • Sparse, variable, property bags • Some types • benefit by using a custom library • use extended type system (complex types, inheritance) • use custom storage and non-SQL APIs • use non-relational queries and indexing
Spatial Data • Spatial data provides answers to location-based queries • Which roads intersect the Microsoft campus? • Does my land claim overlap yours? • List all of the Italian restaurants within 5 kilometers • Spatial data is part of almost every database • If your database includes an address
Spatial Data Types • The Open Geospatial Consortium defines a hierarchy of spatial data types • Point • Linestring • Polygon • MultiPoint • MultiLinestring • MultiPolygon • GeomCollection • Non-instanciable classes based on these
SQL Server 2008 and Spatial Data • SQL Server supports two spatial data types • GEOMETRY - flat earth model • GEOGRAPHY - round earth model • Both types support all of the instanciable OGC types • InstanceOf method can distinguish between them • Supports two dimension data • X and Y or Lat and Long members • Z member - elevation (user-defined semantics) • M member - measure (user-defined semantics)
Properties and Methods • The spatial data types are exposed as SQLCLR UDTs • Use '.' syntax for properties • Use '.' syntax for instance methods • Use '::' syntax for static methods • Methods and Properties are case-sensitive • Each type uses a set of properties and methods that correspond to OGC functionality • With Extensions • Geography implements all OGC properties and methods • Geography implements most OGC properties and methods • 2-D vector only implemented
Input • Spatial data is stored in a proprietary binary format • Instance of the type can be NULL • Can be input as • Well Known binary - ST[Type]FromWKB • Well Known text - ST[Type]FromText • Geography Markup Language (GML) - GeomFromGml • Can also use SQLCLR functions • Parse • Point - extension function • Input from SQLCLR Type - SqlGeometry, SqlGeography
Output • Spatial Data Can Be Output As • Well Known binary - STAsBinary • Well Known text - STAsText • GML - AsGml • Text with Z and M values - AsTextZM • SQLCLR standard method • ToString - returns Well Known text • As SQLCLR object - SqlGeometry, SqlGeography • Other useful formats are GeoRSS, KML • Not Directly Supported
SRID • Each instance of a spatial type must have an SRID • Spatial Reference Identifier • SRID specifies the specification used to compute it • SRID 4326 - GPS • SRID 4269 - usually used by ESRI • SRID 0 - no special reference, default for GEOMETRY • Methods that use multiple spatial types (e.g., STDistance) must have types with matching SRID • Else method returns NULL • Geography instance must reference one of these SRID stored in sys.spatial_reference_systems
Useful Methods/Properties • Descriptive • STArea • STLength • STCentroid • Relation between two instances • STIntersects • STDistance • Collections • STGeometryN • STPointN
Sample Query • Which roads intersect Microsoft’s main campus? • SELECT * FROM roads WHERE roads.geom.Intersects(@ms)=1
Filestream storage • Storing large binary objects in databases is suboptimal • Large objects take buffers in database memory • Updating large objects cause database fragmentation • In file system however, "update" is delete and insert • "Before image" in an update is not deleted immediately • Storing all related data in a database adds • Transactional consistency • Integrated, point-in-time backup and restore • Single storage and query vehicle
SQL Server 2008 Filestream Implementation • A filegroup for filestream storage is declared using DDL • Filestream storage is tied to a database • The filegroup is mapped to a directory • Must be NTFS file system • Caution: Files deleteable from file system if you have appropriate permissions • VARBINARY(MAX) columns can be defined with FILESTREAM attribute • Table must also have UNIQUEIDENTIFIER column • Filestream storage not available for other large types • Data is stored in the file system
Programming with Filestreams • Filestream columns are available with SQL methods • If SQL is used, indistinguishable from varbinary(max) • Filestream can be stored and accessed using file IO • PathName function retrieves a symbolic path name • Acquire context with • GET_FILESTREAM_TRANSACTION_CONTEXT • Use OpenSqlFilestream to get a file handle based on • File Name • Required Access • Access Options • FilestreamTransaction context
SQL Server 2000 And XML Data • SQL Server 2000 • OPENXML - XML decomposition to relational • SELECT...FOR XML - XML composition from relational • SQLXML - Series of Web Releases • Access to relational data as XML using HTTP/IIS • With optional stylesheet-based transformation • SQLXML bulk loader - XML -> relational -> BCP • Middle-tier based XML composition (SQLXML view) • SQLXML client libraries • Stored procedures exposed as Web Service through IIS
SQL Server 2005 and XML data • Native XML data type • XML Schema support in database • Mapping of SQL Server types to XSD types • XQuery support • Exposed as methods on XML data type • XQuery strings are input to these methods • XML Indexes • Many improvements to SELECT...FOR XML • XML input from file system through BULK data provider • .NET XML APIs available through SQLCLR • SQLXML functionality ships with database • SSIS supports XML input through adapter
XQuery Support • The XML data type has five methods • Four are accessor methods that use XQuery • exists - returns bit - used to check based on expression • value - returns single value from XML, cast to SQL type • query - returns XML data type output • nodes - decomposes XML to rowset - similar to OPENXML • One is a mutator method that uses XML DML • modify - performs a modification of XML instance • modify(insert....) - inserts single node • modify(replace value of...) - updates single element/attribute value • modify(delete...) - deletes all nodes that match expression
XML Query and Indexes • XQuery methods use relational query processor • Relational and XQuery processed into a single plan • XML decomposed into "node table" during processing • XML Indexes speed XQuery processing • Primary XML Index - materializes node table, primary key • Value XML Index - for searching on node values • Path XML Index - useful when path itself is selective • Property XML Index - name/value property bag matching • XML content can be used with Fulltext Search
SQL Server 2008 XML Enhancements • Mapping of new date/time data types to XSD types • Support for more XML Schema constructs • Union and list types • Many industry-standard XSD schemas use this • Lax validation of xsd:any wildcards • Office documents and other XSD schemas use this • XQuery support for 'let' clause in FLWOR expressions • XML DML supports insert of XML data type
Hierarchical Data • Hierarchical data consists of nodes and edges • In employee-boss relationship, employee and boss are each nodes, the relationship between them is an edge • Hierarchical data can be modeled in relational as • Adjacency model - separate column for edge • Most common, column can either be in same or separate table • Path Enumeration model - column w/hierarchical path • Nested Set model - adds "left" and "right" columns to represent edges, which must be maintained separately
SQL Server 2005 and Hierarchical Data • SQL Server 2005 adds Native Hierarchical Queries • Recursive common table expression • ANSI-standard • Hierarchical Data can be modeled as XML • XML data type uses ORDPATH format to store elements, attributes and hierarchical constructs
SQL Server 2008 and Hierarchical Data • New Built-In Data Type - HierarchyID • SQLCLR based system UDT • Useable on .NET clients directly as SqlHierarchyId • An implementation of path enumeration model • Uses ORDPATH internally for speed
HierarchyID • Depth-first indexing • "Level" property - allows breadth-first indexing • Methods for common hierarchical operations • GetRoot • GetLevel • IsDescendant • GetDescendant, GetAncestor • Reparent • Does not enforce tree structure • Can enforce tree using constraints
Sparse Properties • Many designs require sparse properties • Hardware store has different attributes for each product • Lab tests have different readings for each test • Directory systems have different attributes for each item • These are name-value pairs (property bags) • Because they don't appear on each tuple (row) they are difficult to model
Modeling Sparse Properties • Sparse Properties often modeled as separate table • Base table has one row per item - common properties • Property table has N rows per item - one per property • Known as Entity-Attribute-Value • Can be modeled as sparse tables • 256 table limit in SQL Server JOIN • Can be modeled as sparse columns • 1024 column limit in SQL Server tables • Can be modeled as XML • Common properties are elements, sparse are attributes
SQL Server 2008 and Sparse Columns • Sparse Column extends column limit • Still 1024 column limit for "non-sparse" columns • Up to over 4000000 sparse columns • Column marked as SPARSE in table definition • Additional column represents all sparse column name value pairs as attributes in a single XML element
Filtered Indexes and Statistics • In a sparse column design, 95% of values for a column can be NULL • SQL Server 2008 implements efficient storage • Filtered indexes can be used with sparse columns • Filtered statistics can be kept on sparse columns • Can keep track of only non-null value distribution
Review • SQL Server stores relational and non-relational data • Spatial data has • Library of spatial functions • Three standard input and output formats • Filestream storage stores BLOBs on file system • Buffer, fragmentation savings • XML data support improves with each release • Direct type for support of hierarchical data • EAV efficiently modeled with sparse columns
Resources • SQL Server Spatial Data Technology Center http://www.microsoft.com/sql/2008/technologies/spatial.mspx • Whitepaper: Delivering Location Intelligence with Spatial Data http://www.microsoft.com/sql/techinfo/whitepapers/spatialdata.mspx • MSDN Webcast: Building Spatial Applications with SQL Server 2008, Event ID: 1032353123 • Whitepaper: What's New for XML in SQL Server 2008 http://www.microsoft.com/sql/techinfo/whitepapers/sql_2008_xml.mspx • Whitepaper: Managing Unstructured Data with SQL Server 2008 http://www.microsoft.com/sql/techinfo/whitepapers/sql_2008_unstructured.mspx • BOL: http://msdn.microsoft.com/ru-ru/library/bb543165(sql.100).aspx • http://www.microsoft.com/Rus/sql/2008/default.mspx • WebCasts: http://www.microsoft.com/sqlserver/2008/en/us/events-webcasts.aspx • JumpStart: http://sqlserver2008jumpstart.microsofttraining.com/content/info.asp?CcpSubsiteID=69&infoid=27
Спасибо за внимание yuriymikheev@gmail.com