370 likes | 398 Views
Multi-Language Extensibility in MS SQL Server. James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.06.25. Agenda. Common Language Runtime (CLR) Integration Overview: Basic infrastructure Design philosophy 4 S’s: Safety, Security, Scalability, & Speed SQL Features Enabled:
E N D
Multi-Language Extensibility in MS SQL Server James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.06.25
Agenda • Common Language Runtime (CLR) Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • SQL Types Package • Access methods • Systems & Integration issues
Basic Infrastructure • CLR Hosted inside SQL Server: • 4S’s: safety, security, scalability, & speed • Run verified, type-safe code in process • .NET Framework languages (VB, C#, Java, …) • Competitive offerings: • Oracle & DB2: in-proc Java & JDBC • Data access in process: • Based on ADO.NET • Same programming model as middle-tier • SQLTypes support: • SQL type semantics in managed code on client & server tiers
Philosophy: Safety & Security • Safety: • User code does not compromise integrity of server process • Verifiable code • Leverage CLR’s code access security • User code cannot call UI, create threads, synchronization, or call unmanaged code • Security: • Access to SQL data from user code via SQL authorization model • Access to system resources from user code via .NET Framework code permissions • Administrators control permissions given to assemblies
Philosophy: Security model • Resources protected by permissions • threads, files, unmanaged code access etc. • API that exposes a resource • introduces a demand for that permission • Stack-walk based permission check: • every assembly in call-stack has permission • Permissions granted to assemblies determined by machine and user-level security policy • Allows host to add another policy layer to further restrict permissions
Philosophy: 3 Permission Sets • SAFESQL • Internal computation plus data access • No access to resources outside SQL Server • No unmanaged calls • Must be verifiable • EXTERNAL ACCESS • SAFESQL + access to external resources • Requires EXTERNAL ACCESS permission to create • SQL Server will impersonate the caller • Must be verifiable • UNRESTRICTED • No controls: can call unmanaged, & un-verifiable • Only Sysadmin can create
Philosophy: Scalability & Speed • Scalability: • As many concurrent users as TSQL • Integrated SQL and runtime threads • Collaboration between SQL and GC • Speed: • Efficient data access in process • Compiled user code, not interpreted as TSQL • Fast transitions in/out of runtime
Philosophy: Functions speed • .NET functions approximating speed of TSQL inline expressions • .NET Framework functions much faster than TSQL functions for complex expressions
Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues
VB, C#, … Build Assembly: “TaxLib.dll” SQL Data Definition: create assembly … create function … create procedure … create trigger … create type … SQL Queries: select sum(tax(sal,state) ) from Emp where county = ‘King’ SQL Server Development Steps VS .NET Project Runtime hosted inside SQL
Assembly Registration CREATE ASSEMBLY lib_geom FROM ‘\\m1\types\geometry.dll’ WITH PERMISSION SET SAFE WITH AUTOREGISTER DROP ASSEMBLY lib_geom • Assemblies stored in DB • Backup, restore, etc. with data • Code permissions assigned per assembly • SafeSQL, external access, & unrestricted • Autoregister functions • Using .NET custom attributes • Assembly benefits • Self-describing metadata: types & file dependencies • Unit of deployment: permissions & versioning
ALTER Assembly • Schema bound: • Cannot invalidate persistent data or indexes • No tables with columns of UDT from this assembly • No indexes on functions of that assembly • Packaging considerations • Place routines and types in different assemblies
Register A Function CREATE FUNCTION distance ( @x1 int, @y1 int, @x2 int, @y2 int ) RETURNS float EXTERNAL NAME ‘lib_geom:CPoint.Distance’ DETERMINISTIC RETURNS NULL ON NULL INPUT DROP FUNCTION distance • Functions called from queries: • Static class functions • Deterministic functions: • No SQL updates or access to global state • Can be scalar or table-valued • Using a function in a query: SELECT s.name FROM Supplier s WHERE dbo.distance( s.x, s.y, @x, @y ) < 3
Project double tax( double sal ) { if (sal < 50000.0) return sal * 0.15; if (sal >= 50000.0 && sal <= 90000.0) return sal * 0.23; else return sal * 0.35; } Filter Scan Employee Call Function SELECT name FROM Employee WHERE dbo.tax(salary) >= 50000 SQLSERVR.EXE Common Language Runtime Managed code Unmanaged code
Register A Procedure CREATE PROCEDURE check_inventory EXTERNAL NAME ‘lib_events:CInventory.check_level’ DROP PROCEDURE check_inventory • Procedures called directly • Can contain SQL queries, updates, or DDL • Can return results directly to client • Not directly callable (as functions are) from queries
Register A Trigger CREATE TRIGGER supplier_event ON supplier AFTER INSERT, UPDATE EXTERNAL NAME ‘lib_events:CNotif.Supp_Event’ DROP TRIGGER supplier_event • Similar to procedures • with access to inserted & deleted tables
UDT: Introduction • Extends SQL Server type system • Complex structure & associated behavior • Managed classes in any CLR language • Functionally subsume SQL-99 distinct types • equivalent to structured types • Other products: Oracle Cartridges, Informix DataBlades, & IBM DB Extenders
UDT: Ordering values • Two kinds of ordering supported • Binary ordering indicated via CLR property: public const bool IsByteOrdered; • Operator-based ordering • Overloaded comparison operators: public static SQLBool operator == (<type>, <type>);
UDT: Creating • Registered as a type from an already registered assembly CREATE ASSEMBLY MapLib FROM ‘\\mysrv\share\MapLib.dll’ CREATE TYPE Point EXTERNAL NAME ‘MapLib:Basetypes’
UDT: Instantiating • Can be declared as column type: Create table Cities( Name varchar(20), State varchar(20), Location Point DEFAULT new Point(0,0)) • Variables & params of T-SQL & .NET routines can UDTs
UDT: Read operations • Registered methods, properties, & public data members can be use in SQL queries • Methods assumed to be non-mutators unless marked using custom attribute • Only non-mutators allowed in SELECTs Declare @p Set @p = new point(32, 23) Select Location::Distance(@p) From Cities
UDT: Ordering Operations • Indexing, UNIQUE, & PRIMARY KEY constraints: • Requires binary ordering support • Inconsistent operator based ordering can cause corrupt indices, incorrect query results • ORDER BY, GROUP BY, & comparison operators: • If UDT supports binary ordering, always use binary ordering • Else use overloaded operators; rely on consistency and correctness of UDT implementation
UDT: UDT Write Operations • Constructor can be called using new operator UPDATE Cities SET Location = new Point(12.3, 46.2) • Properties and public data members can be modified through assignment UPDATE Cities SET Location::X = 23.5, Location::Y = 23.5 WHERE Name = ‘Anchorage’
UDT: Write Operations • Methods marked mutators callable in UPDATEs UPDATE Cities SET Location::SetXY(23.5, 23.5) WHERE Name = ‘Anchorage’ • INSERTs get values from string, binary form or from constructor invocation INSERT Cities(‘Anchorage’, ‘Alaska’, ’19.3:2.3’)
UDAggs Required Methods • Implement QP interface to aggregate values over a group • Interface needed: • Initialize a group • Pass values in the group to accumulate the aggregation • Merge multiple groups (for parallel plans) • Finalize computation and retrieve result
Creating UDAggs • Aggregates implemented as set of methods packaged in a .NET class • UDAgg is bound to a class in existing assembly CREATE AGGREGATE Concat(nvarchar) RETURNS nvarchar EXTERNAL ‘MyLib:Concat’
UDAgg class definition e.g. Public class Concat { //Private store for accumulating results. private SQLString agg; //Optimizer properties public static bool IsNullOnNullsSet() {return true;} public static bool IsNullOnEmptySet() {return false;} //Aggregation interface public void Init(); public void Accum(SQLString str); public void Merge(Concat otheragg); public SQLString Term(); }
Final Stage: Access Methods • Complete extensibility solution requires user defined access methods: • Informix supports adding access methods: • Expose locking, concurrency control, recovery, etc. • Hard to make work in general … few engineers able to write internal storage engine code well • Many will try and they will have bugs … hard on product quality image • Solution: Tailor existing access methods • E.g. map spatial queries to queries over 2 B-trees or single Z-transform • Query rewrite language such that system can rewrite a function as an appropriate set of operations over 1 or more tables/indexes
CLR SQL Types Package • Defines C#, VB, & Java types corresponding SQL Server types • Reduce impedance mismatch between programming language & data • Consistent expression evaluation in mid- & server-tier programming • SQL Types library • Managed classes: system.Data.SQLTypes • Provide SQL semantics • Nullability, three-valued logic • Precision & scale in operations
SQL Types Example • Tax function implemented with SQL types: using System; using System.Data.SQLTypes; public class myFinances { public static SQLDouble tax( SQLDouble sal ) { if ( sal < 50000.0 ) return sal * 0.15; if ( sal >= 50000.0 && sal <= 90000.0 ) return sal * 0.23 else return sal * 0.35; } }
Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues
Integration: In-Proc Data Access • Goals: • Symmetric model • Client, server, & mid-tier • Fully supported by dev tools • VS dev environment including debug • In same address space as server: • Don’t marshal to TDS (tabular data stream) • Don’t loop-back through network interface • Avoid unnecessary copies • Avoid unnecessary transitions across managed/unmanaged (into VM) interface
Integration: Process Model • Thread integration: • SQL Server: non-preemptive user-level thread scheduling • Fibers multiplexed on O/S thread with thread migration • Implication: TLS storage used by VM won’t work in SQL execution environment • Lazy preemptive: • When SQL calls potentially blocking code it must “go preemptive” (allocate a thread) • So thread required for each VM call – expensive • Lazy pre-emptive: assume preemption not required and set timer to catch rare cases when thread allocation is needed • Garbage collection blocks VM threads: • Hard on multi-user DB throughput • GC doesn’t directly suspend – calls DB to suspend and DB schedules other non-VM hosted work
Integration: Memory Management • Memory allocation from DB system (rather than via O/S): • Allows memory resources to be used for different purposes over time • Garbage collection very general but DB systems often have more info: • e.g. free memory pool at end of statement • Goal: • DB memory costing able to request memory from VM when needed • Per user memory consumption tracking & quotas
Integration: Remaining Issues • Assemblies (code) kept in database: • Dispatched to VM through buffers • Stream interface would be better yet • Support attention propagation (user interrupt) into VM • Track all VM resources • O/S portability layer tailored to DB execution environment • Error containment/minimize multi-user impact of failures: • Out-of-proc execution • Multiple VMs
Integration: Query optimization • Gather function cost statistics • Value histograms, execution cost • Reorder of predicate evaluation • Based on cost of evaluation • Function indexes • Speed up expensive functions • Extends computed column indexes and indexed (materialized) views
CLR Integration Summary • DB extensibility without product stability risk • Ability to add types and behavior w/o access to engine source code • Scalable, safe, secure & fast • Rich server programming model • Any CLR language • Symmetric mid- & server-tier model • Better development environment • Integrates SQL tools with Visual Studio