370 likes | 374 Views
Learn about the basic infrastructure, design philosophy, and features enabled by CLR integration in MS SQL Server. Explore the concepts of safety, security, scalability, and speed, and understand how to use CLR assemblies, scalar functions, relational functions, stored procedures, triggers, and types in SQL Server. Discover the benefits of integrating SQL Server with .NET Framework languages, such as VB, C#, and Java. Gain insights into the philosophy of safety, security, and scalability, and learn about the permission sets and speed optimizations available in SQL Server CLR integration.
E N D
Multi-Language Extensibility in MS SQL Server James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2001.04.12
Agenda • Common Language Runtime (CLR) Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • SQL Types Package • Access methods • Systems & Integration issues
Basic Infrastructure • CLR Hosted inside SQL Server: • 4S’s: safety, security, scalability, & speed • Run verified, type-safe code in process • .NET Framework languages (VB, C#, Java, …) • Competitive offerings: • Oracle & DB2: in-proc Java & JDBC • Data access in process: • Based on ADO.NET • Same programming model as middle-tier • SQLTypes support: • SQL type semantics in managed code on client & server tiers
Philosophy: Safety & Security • Safety: • User code does not compromise integrity of server process • Verifiable code • Leverage CLR’s code access security • User code cannot call UI, create threads, synchronization, or call unmanaged code • Security: • Access to SQL data from user code via SQL authorization model • Access to system resources from user code via .NET Framework code permissions • Administrators control permissions given to assemblies
Philosophy: Security model • Resources protected by permissions • threads, files, unmanaged code access etc. • API that exposes a resource • introduces a demand for that permission • Stack-walk based permission check: • every assembly in call-stack has permission • Permissions granted to assemblies determined by machine and user-level security policy • Allows host to add another policy layer to further restrict permissions
Philosophy: 3 Permission Sets • SAFESQL • Internal computation plus data access • No access to resources outside SQL Server • No unmanaged calls • Must be verifiable • EXTERNAL ACCESS • SAFESQL + access to external resources • Requires EXTERNAL ACCESS permission to create • SQL Server will impersonate the caller • Must be verifiable • UNRESTRICTED • No controls: can call unmanaged, & un-verifiable • Only Sysadmin can create
Philosophy: Scalability & Speed • Scalability: • As many concurrent users as TSQL • Integrated SQL and runtime threads • Collaboration between SQL and GC • Speed: • Efficient data access in process • Compiled user code, not interpreted as TSQL • Fast transitions in/out of runtime
Philosophy: Functions speed • .NET functions approximating speed of TSQL inline expressions • .NET Framework functions much faster than TSQL functions for complex expressions
Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues
VB, C#, … Build Assembly: “TaxLib.dll” SQL Data Definition: create assembly … create function … create procedure … create trigger … create type … SQL Queries: select sum(tax(sal,state) ) from Emp where county = ‘King’ SQL Server Development Steps VS .NET Project Runtime hosted inside SQL
Assembly Registration CREATE ASSEMBLY lib_geom FROM ‘\\m1\types\geometry.dll’ WITH PERMISSION SET SAFE WITH AUTOREGISTER DROP ASSEMBLY lib_geom • Assemblies stored in DB • Backup, restore, etc. with data • Code permissions assigned per assembly • SafeSQL, external access, & unrestricted • Autoregister functions • Using .NET custom attributes • Assembly benefits • Self-describing metadata: types & file dependencies • Unit of deployment: permissions & versioning
ALTER Assembly • Schema bound: • Cannot invalidate persistent data or indexes • No tables with columns of UDT from this assembly • No indexes on functions of that assembly • Packaging considerations • Place routines and types in different assemblies
Register A Function CREATE FUNCTION distance ( @x1 int, @y1 int, @x2 int, @y2 int ) RETURNS float EXTERNAL NAME ‘lib_geom:CPoint.Distance’ DETERMINISTIC RETURNS NULL ON NULL INPUT DROP FUNCTION distance • Functions called from queries: • Static class functions • Deterministic functions: • No SQL updates or access to global state • Can be scalar or table-valued • Using a function in a query: SELECT s.name FROM Supplier s WHERE dbo.distance( s.x, s.y, @x, @y ) < 3
Project double tax( double sal ) { if (sal < 50000.0) return sal * 0.15; if (sal >= 50000.0 && sal <= 90000.0) return sal * 0.23; else return sal * 0.35; } Filter Scan Employee Call Function SELECT name FROM Employee WHERE dbo.tax(salary) >= 50000 SQLSERVR.EXE Common Language Runtime Managed code Unmanaged code
Register A Procedure CREATE PROCEDURE check_inventory EXTERNAL NAME ‘lib_events:CInventory.check_level’ DROP PROCEDURE check_inventory • Procedures called directly • Can contain SQL queries, updates, or DDL • Can return results directly to client • Not directly callable (as functions are) from queries
Register A Trigger CREATE TRIGGER supplier_event ON supplier AFTER INSERT, UPDATE EXTERNAL NAME ‘lib_events:CNotif.Supp_Event’ DROP TRIGGER supplier_event • Similar to procedures • with access to inserted & deleted tables
UDT: Introduction • Extends SQL Server type system • Complex structure & associated behavior • Managed classes in any CLR language • Functionally subsume SQL-99 distinct types • equivalent to structured types • Other products: Oracle Cartridges, Informix DataBlades, & IBM DB Extenders
UDT: Ordering values • Two kinds of ordering supported • Binary ordering indicated via CLR property: public const bool IsByteOrdered; • Operator-based ordering • Overloaded comparison operators: public static SQLBool operator == (<type>, <type>);
UDT: Creating • Registered as a type from an already registered assembly CREATE ASSEMBLY MapLib FROM ‘\\mysrv\share\MapLib.dll’ CREATE TYPE Point EXTERNAL NAME ‘MapLib:Basetypes’
UDT: Instantiating • Can be declared as column type: Create table Cities( Name varchar(20), State varchar(20), Location Point DEFAULT new Point(0,0)) • Variables & params of T-SQL & .NET routines can UDTs
UDT: Read operations • Registered methods, properties, & public data members can be use in SQL queries • Methods assumed to be non-mutators unless marked using custom attribute • Only non-mutators allowed in SELECTs Declare @p Set @p = new point(32, 23) Select Location::Distance(@p) From Cities
UDT: Ordering Operations • Indexing, UNIQUE, & PRIMARY KEY constraints: • Requires binary ordering support • Inconsistent operator based ordering can cause corrupt indices, incorrect query results • ORDER BY, GROUP BY, & comparison operators: • If UDT supports binary ordering, always use binary ordering • Else use overloaded operators; rely on consistency and correctness of UDT implementation
UDT: UDT Write Operations • Constructor can be called using new operator UPDATE Cities SET Location = new Point(12.3, 46.2) • Properties and public data members can be modified through assignment UPDATE Cities SET Location::X = 23.5, Location::Y = 23.5 WHERE Name = ‘Anchorage’
UDT: Write Operations • Methods marked mutators callable in UPDATEs UPDATE Cities SET Location::SetXY(23.5, 23.5) WHERE Name = ‘Anchorage’ • INSERTs get values from string, binary form or from constructor invocation INSERT Cities(‘Anchorage’, ‘Alaska’, ’19.3:2.3’)
UDAggs Required Methods • Implement QP interface to aggregate values over a group • Interface needed: • Initialize a group • Pass values in the group to accumulate the aggregation • Merge multiple groups (for parallel plans) • Finalize computation and retrieve result
Creating UDAggs • Aggregates implemented as set of methods packaged in a .NET class • UDAgg is bound to a class in existing assembly CREATE AGGREGATE Concat(nvarchar) RETURNS nvarchar EXTERNAL ‘MyLib:Concat’
UDAgg class definition e.g. Public class Concat { //Private store for accumulating results. private SQLString agg; //Optimizer properties public static bool IsNullOnNullsSet() {return true;} public static bool IsNullOnEmptySet() {return false;} //Aggregation interface public void Init(); public void Accum(SQLString str); public void Merge(Concat otheragg); public SQLString Term(); }
Final Stage: Access Methods • Complete extensibility solution requires user defined access methods: • Informix supports adding access methods: • Expose locking, concurrency control, recovery, etc. • Hard to make work in general … few engineers able to write internal storage engine code well • Many will try and they will have bugs … hard on product quality image • Solution: Tailor existing access methods • E.g. map spatial queries to queries over 2 B-trees or single Z-transform • Query rewrite language such that system can rewrite a function as an appropriate set of operations over 1 or more tables/indexes
CLR SQL Types Package • Defines C#, VB, & Java types corresponding SQL Server types • Reduce impedance mismatch between programming language & data • Consistent expression evaluation in mid- & server-tier programming • SQL Types library • Managed classes: system.Data.SQLTypes • Provide SQL semantics • Nullability, three-valued logic • Precision & scale in operations
SQL Types Example • Tax function implemented with SQL types: using System; using System.Data.SQLTypes; public class myFinances { public static SQLDouble tax( SQLDouble sal ) { if ( sal < 50000.0 ) return sal * 0.15; if ( sal >= 50000.0 && sal <= 90000.0 ) return sal * 0.23 else return sal * 0.35; } }
Agenda • CLR Integration Overview: • Basic infrastructure • Design philosophy • 4 S’s: Safety, Security, Scalability, & Speed • SQL Features Enabled: • CLR assemblies • Scalar functions • Relational functions • Aggregate functions • Stored Procedures • Triggers • Types and Methods • Access methods • SQL Types Package • Systems & Integration issues
Integration: In-Proc Data Access • Goals: • Symmetric model • Client, server, & mid-tier • Fully supported by dev tools • VS dev environment including debug • In same address space as server: • Don’t marshal to TDS (tabular data stream) • Don’t loop-back through network interface • Avoid unnecessary copies • Avoid unnecessary transitions across managed/unmanaged (into VM) interface
Integration: Process Model • Thread integration: • SQL Server: non-preemptive user-level thread scheduling • Fibers multiplexed on O/S thread with thread migration • Implication: TLS storage used by VM won’t work in SQL execution environment • Lazy preemptive: • When SQL calls potentially blocking code it must “go preemptive” (allocate a thread) • So thread required for each VM call – expensive • Lazy pre-emptive: assume preemption not required and set timer to catch rare cases when thread allocation is needed • Garbage collection blocks VM threads: • Hard on multi-user DB throughput • GC doesn’t directly suspend – calls DB to suspend and DB schedules other non-VM hosted work
Integration: Memory Management • Memory allocation from DB system (rather than via O/S): • Allows memory resources to be used for different purposes over time • Garbage collection very general but DB systems often have more info: • e.g. free memory pool at end of statement • Goal: • DB memory costing able to request memory from VM when needed • Per user memory consumption tracking & quotas
Integration: Remaining Issues • Assemblies (code) kept in database: • Dispatched to VM through buffers • Stream interface would be better yet • Support attention propagation (user interrupt) into VM • Track all VM resources • O/S portability layer tailored to DB execution environment • Error containment/minimize multi-user impact of failures: • Out-of-proc execution • Multiple VMs
Integration: Query optimization • Gather function cost statistics • Value histograms, execution cost • Reorder of predicate evaluation • Based on cost of evaluation • Function indexes • Speed up expensive functions • Extends computed column indexes and indexed (materialized) views
CLR Integration Summary • DB extensibility without product stability risk • Ability to add types and behavior w/o access to engine source code • Scalable, safe, secure & fast • Rich server programming model • Any CLR language • Symmetric mid- & server-tier model • Better development environment • Integrates SQL tools with Visual Studio