550 likes | 752 Views
WG3 Database Languages. Stephen Cannan Convenor 2002-05-06. Agenda. Working Group changes ISO/IEC 9075 ISO/IEC 20606 Recap of recent changes Sketch of planned changes. Working Group changes. 3 ex-WG5 subprojects absorbed 20606 -1 Authorization and Audit 20606-2 Distribution Schema
E N D
WG3 Database Languages Stephen Cannan Convenor 2002-05-06
Agenda • Working Group changes • ISO/IEC 9075 • ISO/IEC 20606 • Recap of recent changes • Sketch of planned changes
Working Group changes • 3 ex-WG5 subprojects absorbed • 20606-1 Authorization and Audit • 20606-2 Distribution Schema • 20606-3 Encompassing Transaction • Effect on WG3 internal structure
ISO/IEC 9075 • Multi-part standard • Part 1: SQL/Framework • Part 2: SQL/Foundation • Part 3: SQL/CLI • Part 4: SQL/PSM • Part 9: SQL/MED • Part 10: SQL/OLB • Part 11: SQL/Schemata • Part 13: SQL/JRT • Part 14: SQL/XML
Part 1: SQL/Framework • Common definitions & concepts • Structure of multi-part standard • Basic conformance structure & statement • About 85 pages
Part 2: SQL/Foundation • The “core” of the standard • Includes: • Traditional SQL • Object-oriented SQL • Module Language • Host language bindings (except Java) • Dynamic SQL • Direct SQL • Excludes: • Information Schema & Definition Schema (in Part 11) • About 1300 pages
Part 3: SQL/CLI • Call-Level Interface • Best-known implementation: ODBC • About 400 pages
Part 4: SQL/PSM • Persistent Stored Modules • PSM-96 specified: • functions & procedures • SQL-server modules • computational completeness • In PSM:1999, functions & procedures moved to Foundation (same in 2003) • Analogous to PL/SQL, Transact-SQL, etc. • About 170 pages
Part 9: SQL/MED • Management of External Data • Foreign Servers, Foreign-Data Wrappers, Foreign Tables • SQL-aware vs non-SQL-aware • Datalinks • Merge with 20606-2 Distribution Schema • About 500 pages
Part 10: SQL/OLB • Object Language Bindings • Embedded SQL in Java • SQLJ Part 0 • About 360 pages
Part 11: SQL/Schemata • Information Schema • Definition Schema • About 300 pages
Part 13: SQL/JRT • Java Routines and Types • Java routines stored in an SQL database, and invoked from SQL statements • Java classes used as data types of SQL columns • About 200 pages
Part 14: SQL/XML • Under construction • Using SQL and XML together • About 150 pages, but still growing
ISO/IEC 20206 • 20606-1 Authorization and Audit • 20606-3 Encompassing Transaction
Project 20206 –1Authorization and Audit • Addresses issues of: • Authentication • Access Control • Transfer Integrity • Transfer Confidentiality • Non-repudiation • No Working Draft available • CD expected 2002-06-01 • John Hadjioannou listed as Editor but not present
Project 20206 –3Encompassing Transaction • Interacts with external Transaction Managers • Support for xa_ interface • 9075-6 XA Specialization (SQL/Transaction) • Propagates Global Context information • No Working Draft available • CD expected 2002-06-01 • Bob Sundaylisted as Editor but has withdrawn
Recap of recent changes: ISO/IEC 9075 SQL:2003 • Since 1999 we have already added: • Part 9 SQL/MED • Part 10 SQL/OLB • Part 13 SQL/JRT • Addendum 1 SQL/OLAP • SQL:2003 will be the 4th generation • 1987, 1992, 1999 • Consists of: • Many error corrections(TC stands at 376 pages and growing) • A few new features • Part 14 SQL/XML
New Features • Data types • Predicates • Semantics • Security • Active Database
New Data Types • BIGINT • MULTISET
New Data Types — BIGINT • Exact numeric, scale 0 • Precision of BIGINT precision of INTEGER • Must have same radix as SMALLINT and INTEGER
New Data Types — MULTISET • Varying-length, unordered collections of element having specified type • COL1 INTEGER MULTISET • COL2 ROW( F1 BIGINT, F2 VARHCAR(4000) ) MULTISET • No (specified) maximum cardinality • SQL:2003 collection types include: • MULTISET • ARRAY
New Data Types — MULTISET • INTEGER MULTISET() • Empty multiset with integer element type (not null!) • INTEGER MULTISET(2, 3, 5, 7) • Integer multiset with first few primes • INTEGER MULTISET(SELECT COL1 FROM TBL1 WHERE COL2 > 10) • Integer multiset populated from values in column of table
New Data Types — MULTISET • CARDINALITY (value1) • Type of value1must be multiset • Returns number of elements in value • SET (value1) • Type of value1 must be multiset • Returns value1 with duplicate elements removed • ELEMENT (value1) • Type of value1 must be multiset • Cardinality of value1 must be 1 • Returns the single element in value1
New Data Types — MULTISET • UNNEST(value1) AS corr-name • Type of value1 must be multiset • “Un-nests” value1and turns the elements into rows of a virtual table • UNNEST MULTISET (2, 3, 5, 7) AS P
New Data Types — MULTISET • value1 MULTISET setop quantifier value2 • setop — UNION or EXCEPT or INTERSECT • quantifier — ALL or DISTINCT • SELECT col1 MULTISET INTERSECT DISTINCT col2 FROM tbl1 WHERE CARDINALITY(col2) > 50 • Close analogs to ordinary set operators UNION, EXCEPT, and INTERSECT • However, ALL is the default quantifier
New Data Types — MULTISET • New aggregates: • COLLECT — Transform the values in a group into a multiset • FUSION — Form a union of the multisets in a group — number of duplicates of a given value in the result is the sum of the number of duplicates in the multisets in the rows of the group • INTERSECTION — Form an intersection of the multisets in a group — number of duplicates of a given value in the result is the minimum of the number of duplicates in the multisets in the rows of the group
New Predicates • NORMALIZED • If character string argument is in Unicode Normalization Form C (NFC), returns true • MEMBER • If first argument is a member of multiset in second argument, returns true • SUBMULTISET • If first argument is a submultiset of second argument, returns true • SET • If argument is a set (no duplicate values), returns true
NORMALIZED Predicate • string_value IS [ NOT ] NORMALIZED • Character repertoire of string_value must be UCS • Encourages, but does not force, normalization to NFC
MEMBER Predicate • value [ NOT ] MEMBER [ OF ] multiset • multiset must be a multiset • Type of valuecomparable to element type of multiset • If multiset is empty, returns false • If value is equal to some element of multiset, returns true • Else, if some element of multiset is null, returns unknown
SUBMULTISET Predicate • multiset1 [ NOT ] SUBMULTISET [ OF ] multiset2 • Both multiset1 and multiset2 must be multisets… • …and their element types must be comparable • If the cardinalities of multiset1 and multiset2 are equal and if every value in multiset1 has a corresponding value in multiset2, then returns true
SET Predicate • multiset IS [ NOT ] A SET • multiset must be a multiset • If there are no duplicate values in multiset, returns true • Maximum of 1 null value in a set
New Semantics • MERGE statement • OLAP: TABLESAMPLE • Generated columns • Identity columns and sequence generators
New Semantics — MERGE • If some row in the target table matches some row in the source table according to the specified predicate, then the row in the subject table is updated • If no row in the target table matches a given row in the source table according to the specified predicate, then the row from the source table is inserted into the subject table
New Semantics — MERGE • MERGE INTO table-name [ AS correlation ]USING table-referenceON search-conditionWHEN MATCHED THEN SET col = value • MERGE INTO table-name [ AS correlation ]USING table-referenceON search-conditionWHEN NOT MATCHED THEN INSERT [ ( col-list ) ] VALUES ( val-list ) • Both MATCHED and NOT MATCHED clauses permitted once each, in either order
TABLESAMPLE • New feature in the OLAP capability • Permits evaluation of aggregates on samples derived from database data • Permits faster debugging when database is huge • Two forms of sampling: BERNOULLI and SYSTEM
TABLESAMPLE • TABLESAMPLE method ( percentage ) [ repeatable ] • method is either BERNOULLI or SYSTEM • Bernoulli: Sample table contains approximately percentage % of the rows of the original table; the probability of a given row of the original table appearing in the sample table is percentage %, independently of every other row • System: Sample table contains approximately percentage % of the rows of the original table; the probability of a given row of the original table appearing in the sample table is percentage % • repeatable acts as a random number seed for sampling
Generated Columns • Ordinary columns of base tables: base columns • Generated column: value computed from values of zero or more base columns in same row • CREATE TABLE EMPLOYEES ( EMP_ID INTEGER, SALARY DECIMAL(7,2), BONUS DECIMAL(7,2), TOTAL_COMP GENERATED ALWAYS AS ( SALARY + BONUS ), HR_CLERK GENERATED ALWAYS AS ( CURRENT_USER ))
Identity columns &Sequence generators • Identity column: Invented as mechanism to allow automatic population of table keys • Sequence generator: Invented to allow generation of “next” value in a sequence • Together, they allow generation of keys for unique columns (or combinations of columns) when required • Application chooses: select value for identity column or let sequence generator make choice
Sequence generators • Sequence generators • Data type (exact numeric with scale zero) • Starting value • Increment (positive: ascending, negative: descending) • Minimum and maximum values • Cycle option • External (explicit schema object) or internal (part of another schema object, such a column)
External sequence generators • CREATE SEQUENCE seqname AS type START WITH value INCREMENT BY value MAXVALUE value CYCLE • Variations: • NO CYCLE • NO MAXVALUE, MINVALUE, NO MINVALUE • Order of clauses can vary } Common sequence generator options
Sequence generators • Every sequence generator has a “current base value” — initially set to the start value • Generate next value of a sequence generator: NEXT VALUE FOR seqname • Returns current base value + N * increment for some N 0 • If computed value > MAXVALUE (or < MINVALUE) and NO CYCLE, then raise an exception • Otherwise, reset to MINVALUE (or MAXVALUE) and compute new value for some N.
Sequence generators • INSERT INTO TBL ( COL1, COL2 ) VALUES ( 10, NEXT VALUE FOR seqgen ) • CALL myproc ( NEXT VALUE FOR seqgen ) • SET J = J + NEXT VALUE FOR seqgen
Sequence generators • ALTER SEQUENCE seqname • RESTART WITH newbase — resets start of computation • Common sequence generator options — change start value, maximum or minimum, increment, cycle option • DROP SEQUENCE seqname dropoption
Identity columns &Sequence generators • Base tables may optionally have a single identity column • Data type: exact numeric, scale 0 — e.g., INTEGER • Associated with internal sequence generator • Start value • Minimum and/or maximum value • Increment value • Cycle option
Identity columns &Sequence generators • GENERATED ALWAYS or GENERATED BY DEFAULT • ALWAYS — Not allowed to UPDATE column; INSERT requires OVERRIDING SYSTEM VALUE • BY DEFAULT — Allowed to INSERT or UPDATE column; if column not specified on INSERT, then value automatically generated
Identity columns &Sequence generators • CREATE TABLE employees ( EMP_ID INTEGER GENERATED ALWAYS AS IDENTITY START WITH 100 INCREMENT 1 MINVALUE 10 NO MAXVALUE NO CYCLE, SALARY DECIMAL(7,2), ...,)
New part – SQL/XML • Out for FCD ballot • Known not to be complete • Current contents comprise infrastructure — mappings of… • SQL identifiers to & from XML QNames • SQL types & values onto XML schema types & values • SQL tables onto XML schema types • Publishing functions (SQL data expressed as XML) • New built-in type for SQL: XML
To be added:Inserting XML into an SQL database • Transform character string containing XML text into an XML value • INSERT INTO employees ( empxml )VALUES XMLPARSE ( '<emp id="15339"><name>Gurney Halleck</name><hiredate>3998-04-25</hiredate><dept>Security</dept></emp>'STRIP WHITESPACE ) • No concrete proposal yet (still working out issues)
To be added:Inserting XML into an SQL database • XMLPARSE does not validate • But it does: • Syntax check all in-line DTDs • Process entity references defined in such DTDs • Apply default values defined in such DTDs • Whitespace handling not finalized: • STRIP and • PRESERVE are possible choices
To be added: Validating XML in an SQL context • Validate an XML value against an XML schema (possibly a DTD) • INSERT INTO employees ( empxml )VALUES XMLVALIDATE ( XMLPARSE ( '<emp id="15339"><name>Gurney Halleck</name><hiredate>3998-04-25</hiredate><dept>Security</dept></emp>' STRIP WHITESPACE ) ) • No concrete proposal yet (still working out issues)