1 / 37

Multi-Language Extensibility in MS SQL Server

Multi-Language Extensibility in MS SQL Server. James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.06.25. Agenda. Common Language Runtime (CLR) Integration Overview: Basic infrastructure Design philosophy 4 S’s: Safety, Security, Scalability, & Speed SQL Features Enabled:

ova
Download Presentation

Multi-Language Extensibility in MS SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multi-Language Extensibility in MS SQL Server James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.06.25

  2. Agenda • Common Language Runtime (CLR) Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • SQL Types Package • Access methods • Systems & Integration issues

  3. Basic Infrastructure • CLR Hosted inside SQL Server: • 4S’s: safety, security, scalability, & speed • Run verified, type-safe code in process • .NET Framework languages (VB, C#, Java, …) • Competitive offerings: • Oracle & DB2: in-proc Java & JDBC • Data access in process: • Based on ADO.NET • Same programming model as middle-tier • SQLTypes support: • SQL type semantics in managed code on client & server tiers

  4. Philosophy: Safety & Security • Safety: • User code does not compromise integrity of server process • Verifiable code • Leverage CLR’s code access security • User code cannot call UI, create threads, synchronization, or call unmanaged code • Security: • Access to SQL data from user code via SQL authorization model • Access to system resources from user code via .NET Framework code permissions • Administrators control permissions given to assemblies

  5. Philosophy: Security model • Resources protected by permissions • threads, files, unmanaged code access etc. • API that exposes a resource • introduces a demand for that permission • Stack-walk based permission check: • every assembly in call-stack has permission • Permissions granted to assemblies determined by machine and user-level security policy • Allows host to add another policy layer to further restrict permissions

  6. Philosophy: 3 Permission Sets • SAFESQL • Internal computation plus data access • No access to resources outside SQL Server • No unmanaged calls • Must be verifiable • EXTERNAL ACCESS • SAFESQL + access to external resources • Requires EXTERNAL ACCESS permission to create • SQL Server will impersonate the caller • Must be verifiable • UNRESTRICTED • No controls: can call unmanaged, & un-verifiable • Only Sysadmin can create

  7. Philosophy: Scalability & Speed • Scalability: • As many concurrent users as TSQL • Integrated SQL and runtime threads • Collaboration between SQL and GC • Speed: • Efficient data access in process • Compiled user code, not interpreted as TSQL • Fast transitions in/out of runtime

  8. Philosophy: Functions speed • .NET functions approximating speed of TSQL inline expressions • .NET Framework functions much faster than TSQL functions for complex expressions

  9. Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues

  10. VB, C#, … Build Assembly: “TaxLib.dll” SQL Data Definition: create assembly … create function … create procedure … create trigger … create type … SQL Queries: select sum(tax(sal,state) ) from Emp where county = ‘King’ SQL Server Development Steps VS .NET Project Runtime hosted inside SQL

  11. Assembly Registration CREATE ASSEMBLY lib_geom FROM ‘\\m1\types\geometry.dll’ WITH PERMISSION SET SAFE WITH AUTOREGISTER DROP ASSEMBLY lib_geom • Assemblies stored in DB • Backup, restore, etc. with data • Code permissions assigned per assembly • SafeSQL, external access, & unrestricted • Autoregister functions • Using .NET custom attributes • Assembly benefits • Self-describing metadata: types & file dependencies • Unit of deployment: permissions & versioning

  12. ALTER Assembly • Schema bound: • Cannot invalidate persistent data or indexes • No tables with columns of UDT from this assembly • No indexes on functions of that assembly • Packaging considerations • Place routines and types in different assemblies

  13. Register A Function CREATE FUNCTION distance ( @x1 int, @y1 int, @x2 int, @y2 int ) RETURNS float EXTERNAL NAME ‘lib_geom:CPoint.Distance’ DETERMINISTIC RETURNS NULL ON NULL INPUT DROP FUNCTION distance • Functions called from queries: • Static class functions • Deterministic functions: • No SQL updates or access to global state • Can be scalar or table-valued • Using a function in a query: SELECT s.name FROM Supplier s WHERE dbo.distance( s.x, s.y, @x, @y ) < 3

  14. Project double tax( double sal ) { if (sal < 50000.0) return sal * 0.15; if (sal >= 50000.0 && sal <= 90000.0) return sal * 0.23; else return sal * 0.35; } Filter Scan Employee Call Function SELECT name FROM Employee WHERE dbo.tax(salary) >= 50000 SQLSERVR.EXE Common Language Runtime Managed code Unmanaged code

  15. Register A Procedure CREATE PROCEDURE check_inventory EXTERNAL NAME ‘lib_events:CInventory.check_level’ DROP PROCEDURE check_inventory • Procedures called directly • Can contain SQL queries, updates, or DDL • Can return results directly to client • Not directly callable (as functions are) from queries

  16. Register A Trigger CREATE TRIGGER supplier_event ON supplier AFTER INSERT, UPDATE EXTERNAL NAME ‘lib_events:CNotif.Supp_Event’ DROP TRIGGER supplier_event • Similar to procedures • with access to inserted & deleted tables

  17. UDT: Introduction • Extends SQL Server type system • Complex structure & associated behavior • Managed classes in any CLR language • Functionally subsume SQL-99 distinct types • equivalent to structured types • Other products: Oracle Cartridges, Informix DataBlades, & IBM DB Extenders

  18. UDT: Ordering values • Two kinds of ordering supported • Binary ordering indicated via CLR property: public const bool IsByteOrdered; • Operator-based ordering • Overloaded comparison operators: public static SQLBool operator == (<type>, <type>);

  19. UDT: Creating • Registered as a type from an already registered assembly CREATE ASSEMBLY MapLib FROM ‘\\mysrv\share\MapLib.dll’ CREATE TYPE Point EXTERNAL NAME ‘MapLib:Basetypes’

  20. UDT: Instantiating • Can be declared as column type: Create table Cities( Name varchar(20), State varchar(20), Location Point DEFAULT new Point(0,0)) • Variables & params of T-SQL & .NET routines can UDTs

  21. UDT: Read operations • Registered methods, properties, & public data members can be use in SQL queries • Methods assumed to be non-mutators unless marked using custom attribute • Only non-mutators allowed in SELECTs Declare @p Set @p = new point(32, 23) Select Location::Distance(@p) From Cities

  22. UDT: Ordering Operations • Indexing, UNIQUE, & PRIMARY KEY constraints: • Requires binary ordering support • Inconsistent operator based ordering can cause corrupt indices, incorrect query results • ORDER BY, GROUP BY, & comparison operators: • If UDT supports binary ordering, always use binary ordering • Else use overloaded operators; rely on consistency and correctness of UDT implementation

  23. UDT: UDT Write Operations • Constructor can be called using new operator UPDATE Cities SET Location = new Point(12.3, 46.2) • Properties and public data members can be modified through assignment UPDATE Cities SET Location::X = 23.5, Location::Y = 23.5 WHERE Name = ‘Anchorage’

  24. UDT: Write Operations • Methods marked mutators callable in UPDATEs UPDATE Cities SET Location::SetXY(23.5, 23.5) WHERE Name = ‘Anchorage’ • INSERTs get values from string, binary form or from constructor invocation INSERT Cities(‘Anchorage’, ‘Alaska’, ’19.3:2.3’)

  25. UDAggs Required Methods • Implement QP interface to aggregate values over a group • Interface needed: • Initialize a group • Pass values in the group to accumulate the aggregation • Merge multiple groups (for parallel plans) • Finalize computation and retrieve result

  26. Creating UDAggs • Aggregates implemented as set of methods packaged in a .NET class • UDAgg is bound to a class in existing assembly CREATE AGGREGATE Concat(nvarchar) RETURNS nvarchar EXTERNAL ‘MyLib:Concat’

  27. UDAgg class definition e.g. Public class Concat { //Private store for accumulating results. private SQLString agg; //Optimizer properties public static bool IsNullOnNullsSet() {return true;} public static bool IsNullOnEmptySet() {return false;} //Aggregation interface public void Init(); public void Accum(SQLString str); public void Merge(Concat otheragg); public SQLString Term(); }

  28. Final Stage: Access Methods • Complete extensibility solution requires user defined access methods: • Informix supports adding access methods: • Expose locking, concurrency control, recovery, etc. • Hard to make work in general … few engineers able to write internal storage engine code well • Many will try and they will have bugs … hard on product quality image • Solution: Tailor existing access methods • E.g. map spatial queries to queries over 2 B-trees or single Z-transform • Query rewrite language such that system can rewrite a function as an appropriate set of operations over 1 or more tables/indexes

  29. CLR SQL Types Package • Defines C#, VB, & Java types corresponding SQL Server types • Reduce impedance mismatch between programming language & data • Consistent expression evaluation in mid- & server-tier programming • SQL Types library • Managed classes: system.Data.SQLTypes • Provide SQL semantics • Nullability, three-valued logic • Precision & scale in operations

  30. SQL Types Example • Tax function implemented with SQL types: using System; using System.Data.SQLTypes; public class myFinances { public static SQLDouble tax( SQLDouble sal ) { if ( sal < 50000.0 ) return sal * 0.15; if ( sal >= 50000.0 && sal <= 90000.0 ) return sal * 0.23 else return sal * 0.35; } }

  31. Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues

  32. Integration: In-Proc Data Access • Goals: • Symmetric model • Client, server, & mid-tier • Fully supported by dev tools • VS dev environment including debug • In same address space as server: • Don’t marshal to TDS (tabular data stream) • Don’t loop-back through network interface • Avoid unnecessary copies • Avoid unnecessary transitions across managed/unmanaged (into VM) interface

  33. Integration: Process Model • Thread integration: • SQL Server: non-preemptive user-level thread scheduling • Fibers multiplexed on O/S thread with thread migration • Implication: TLS storage used by VM won’t work in SQL execution environment • Lazy preemptive: • When SQL calls potentially blocking code it must “go preemptive” (allocate a thread) • So thread required for each VM call – expensive • Lazy pre-emptive: assume preemption not required and set timer to catch rare cases when thread allocation is needed • Garbage collection blocks VM threads: • Hard on multi-user DB throughput • GC doesn’t directly suspend – calls DB to suspend and DB schedules other non-VM hosted work

  34. Integration: Memory Management • Memory allocation from DB system (rather than via O/S): • Allows memory resources to be used for different purposes over time • Garbage collection very general but DB systems often have more info: • e.g. free memory pool at end of statement • Goal: • DB memory costing able to request memory from VM when needed • Per user memory consumption tracking & quotas

  35. Integration: Remaining Issues • Assemblies (code) kept in database: • Dispatched to VM through buffers • Stream interface would be better yet • Support attention propagation (user interrupt) into VM • Track all VM resources • O/S portability layer tailored to DB execution environment • Error containment/minimize multi-user impact of failures: • Out-of-proc execution • Multiple VMs

  36. Integration: Query optimization • Gather function cost statistics • Value histograms, execution cost • Reorder of predicate evaluation • Based on cost of evaluation • Function indexes • Speed up expensive functions • Extends computed column indexes and indexed (materialized) views

  37. CLR Integration Summary • DB extensibility without product stability risk • Ability to add types and behavior w/o access to engine source code • Scalable, safe, secure & fast • Rich server programming model • Any CLR language • Symmetric mid- & server-tier model • Better development environment • Integrates SQL tools with Visual Studio

More Related