1.1k likes | 1.43k Views
Compiler Concepts for Database Systems. Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155. steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818. Overview.
E N D
Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818
Overview • Motivation and Background • Database System Architecture • Exploring its Capabilities • Focusing on Compiler-Related Concepts • Compile Time Issues in Database Systems • The SQL Query Language • Optimization Issues in Database Systems • Typing • Runtime Issues in Database Systems • Transaction Processing • Execution for Complex Joins
Database System Architecture • What are the Various Components? • How do they Relate to Compilers?
Database Concepts - Summary • Schema vs. Data • Database-Structured Collection of Data Describing • Objects of Universe of Discourse being Modeling. • A Database Consists of Schema and Data • Schema: Describes the Intension (Type) of Objects • Data: Describes the Extension (Instances) of Objects • What is Schema w.r.t. Compilers? What is Data?
What is a DBMS? • A Database Management System (DBMS) is the Generalized Tool that Facilitates the Management of and Access to the Database • Main Functions: • Defining a Database: Specifying Data Types, Structures, and Constraints • Constructing a Database: the Process of Storing the Data Itself on Some Storage Medium • Manipulating a Database: Function for Querying Specific Data in the Database and Updating the Database • What are the Analogies of Each of the Main Functions w.r.t. Programming Languages and Compilers?
What is a DBMS? • Additional Functions: • Interaction with File Manager • So that Details Related to Data Storage and Access are Removed From Application Programs • Integrity Enforcement • Guarantee Correctness, Validity, Consistency • Security Enforcement • Prevent Data From Illegal Uses • Concurrency Control • Control the Interference Between Concurrent Programs • Recovery from Failure • Query Processing and Optimization • Again – What are Relevant Compiler Concepts?
DBMS Architecture • DBMS Languages • Data Definition Language (DDL) • Data Manipulation Language (DML) • From Embedded Queries or DB Commands Within a Program • “Stand-alone” Query Language • Host Language: • DML Specification (e.g., SQL) is Embedded in a “Host” Programming Language (e.g., Java, C++) • DBMS Interfaces • Menu-Based Interface • Graphical Interface • Forms-Based Interface • Interface for DBA (DB Administrator)
DBMS Architecture • Main DBMS Modules • DDL Compiler • DML Compiler • Ad-hoc (Interactive) Query Compiler • Run-time Database Processor • Stored Data Manager • Concurrency/Back-Up/Recovery Subsystem • DBMS Utility Modules • Loading Routines • Backup Utility • System Catalog/data Dictionary
ANSI/SPARC - Three Schema Architecture • External Data Schema (Users’ view) • Conceptual Data Schema (Logical Schema) • Internal Data Schema (Physical Schema) • What are the Programming Language Analogies?
Conceptual Schema • Describes the Meaning of Data in the Universe of Discourse • Emphasizes on General, Conceptually Relevant, and Often Time Invariant Structural Aspects of the Universe of Discourse • Excludes the Physical Organization and Access Aspects of the Data • This could be a UML Design that Realizes a Set of Classes (no data) or Java Class Declarations (APIs)
Conceptual Schema • Another Example – A Programming Language Level Definition
External Schema • Describes Parts of the Information in the Conceptual Schema in a form Convenient to a Particular User Group’s View • Derived from the Conceptual Schema • What is the View of the Outside World in OO? • Akin to Public Interface
External Schema • Another Example
Internal Schema • Describes How the Information Described in the Conceptual Schema is Physically Represented in a Database to Provide the Overall Best Performance
Internal Schema • Another Example • This Corresponds to Data Typing and Layout in Compilers from Runtime Environment!
Database Access Process • What Does This Access Process Resemble? • Akin to Runtime Execution Environment! • A More Complex Activation Process!
Metadata vs. Data • Recall Introspection and Reflection in Java where you Can “Look” into the Class Definitions Themselves!
Data Independence • Ability that Allows Application Programs Not Being Affected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods • Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users • Recall Software Engineering Concepts: • Abstraction the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design • Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users • Realized in Today’s Modern PLs!
What are System Components? • How are these Similar to Complier/PL Concepts?
Relational Model • Relational Model of Data Based on the Concept of a Relation • Relation - a Mathematical Concept Based on Sets • Strength of the Relational Approach to Data Management Comes From the Formal Foundation Provided by the Theory of Relations • RELATION: A Table of Values • A Relation May Be Thought of as a Set of Rows • A Relation May Alternately be Though of as a Set of Columns • Each Row of the Relation May Be Given an Identifier • Each Column Typically is Called by its Column Name or Column Header or Attribute Name
Relational Database Definition CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), Gpa(FLOAT(2)) CREATE TABLE Faculty: Name(CHAR(30)), SSN(CHAR(9)), Ophone(CHAR(7)) CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6)) CREATE TABLE Formats: Section#(INTEGER(3)), Quarter(CHAR(10)), Campus(CHAR(15)) CREATE TABLE TakeorTeach: SSN(CHAR(9)), Course#(CHAR(6)), Section#(INTEGER(3)) CREATE TABLE COfferings: Course#(CHAR(6)), Section#(INTEGER(3)) Student(Name*, SSN, Gpa) Faculty(Name*, SSN, Ophone) Courses(Course#*, Title, Descrip, PCourse#*) Formats(Section#*, Quarter, Campus) TakeorTeach(SSN, Course#, Section#) COfferings(Course#, Section#)
Relational Views • Two Views Derived From Prior Tables • Student Transcript View • Course Prerequisite View
SQL is a Partial Example of a Tuple Relational Language Simple Queries are all Declarative More Complex Queries are both Declarative and Procedural (e.g., joins, nested queries) Find the names of employees working on the CAD/CAM project SELECT EMP.ENAME FROM EMP, WORKS, PROJ WHERE (EMP.ENO= WORKS.ENO) AND (WORKS.PNO = PROJ.PNO) AND (PROJ.PNAME = “CAD/CAM”) SQL Defines a Programming Language and Associated Semantics for Usage and Processing SQL: Tuple Relational Calculus-Based
SQL Components • Data Definition Language (DDL) • For External and Conceptual Schemas • Views - DDL for External Schemas • Data Manipulation Language (DML) • Interactive DML Against External and Conceptual Schemas • Embedded DML in Host PLs (EQL, JDBC, etc.) • Note: Separation of Definition (DDL) from Usage (DML) – Is there Something Similar in PLs? • Others • Integrity (Allowable Values/Referential) • Transaction Control (Long-Duration and Batch) • Authorization (Who can Do What When)
SQL DDL and DML • Data Definition Language (DDL) - Declarations • Defining the Relational Schema - Relations, Attributes, Domains - The Meta-Data CREATE TABLE Student: Name(CHAR(30)),SSN(CHAR(9)),GPA(FLOAT(2)) CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6)) • Data Manipulation Language (DML) - Code • Defining the Queries Against the Schema SELECT Name, SSN From Student Where GPA > 3.00
Data Definition Language - DDL • A Pre-Defined set of Primitive Types • Numeric • Character-string • Bit-string • Additional Types • Defining Domains • Defining Schema • Defining Tables • Defining Views • Note: Each DBMS May have their Own DBMS Specific Data Types - Is this Good or Bad? • What is this Similar to re. Different C++ Compilers? • These are Akin to PL Data Types!
DDL - Primitive Types • Numeric • INTEGER (or INT), SMALLINT • REAL, DOUBLE PRECISION • FLOAT(N) Floating Point with at Least N Digits • DECIMAL(P,D) (DEC(P,D) or NUMERIC(P,D)) have P Total Digits with D to Right of Decimal • Note that INTs and REALs are Machine Dependent (Based on Hardware/OS Platform) • Again – this is Similar to PLs/Compilers and Code Generation – Data Layout
DDL - Primitive Types • Character-String • CHAR(N) or CHARACTER(N) - Fixed • VARCHAR(N), CHAR VARYING(N), or CHARACTER VARYING(N) Variable with at Most N Characters • Bit-Strings • BIT(N) Fixed • VARBIT(N) or BIT VARYING(N) Variable with at Most N Bits
DDL - Primitive Types • These Specialized Primitive Types are Used to: • Simplify Modeling Process • Include “Popular” Types • Reduce Composite Attributes/Programming • DATE : YYYY-MM-DD • TIME: HH-MM-SS • TIME(I): HH-MM-SS-F....F - I Fraction Seconds • TIME WITH TIME ZONE: HH-MM-SS-HH-MM • TIME-STAMP: YYYY-MM-DD-HH-MM-SS-F...F{-HH-MM} • PLs also have Specialized Types! • Problem: Different Database Systems Sometime Implement these Types very Differently • This Impacts Portability!
What is a SQL Schema? • A Schema in SQL is the Major Meta-Data Construct • Supports the Definition of: • Relation - Table with Name • Attributes - Columns and their Types • Identification - Primary Key • Constraints - Referential Integrity (FK) • Two Part Definition • CREATE Schema - Named Database or Conceptually Related Tables • CREATE Table - Individual Tables of the Schema
DDL-Create/Drop a Schema • Creating a Schema:CREATE SCHEMA MY_COMPANY AUTHORIZATION Demurjian; • Schema MY_COMPANY bas Been Created and is Owner by the User “Demurjian” • Tables can now be Created and Added to Schema • Dropping a Schema:DROP SCHEMA MY_COMPANY RESTRICT;DROP SCHEMA MY_COMPANY CASCADE; • Restrict: • Drop Operation Fails If Schema is Not Empty • Cascade: • Drop Operation Removes Everything in the Schema
DDL - Create Tables CREATE TABLE EMPLOYEE ( FNAME VARCHAR(15) NOT NULL , MINIT CHAR , LNAME VARCHAR(15) NOT NULL , SSN CHAR(9) NOT NULL , BDATE DATE ADDRESS VARCHAR(30) , SEX CHAR , SALARY DECIMAL(10,2) , SUPERSSN CHAR(9) , DNO INT NOT NULL , PRIMARY KEY (SSN) , FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN) , FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) ) ;
DDL - Create Tables (continued) CREATE TABLE DEPARTMENT ( DNAME VARCHAR(15) NOT NULL , DNUMBER INT NOT NULL ,MGRSSN CHAR(9) NOT NULL , MGRSTARTDATE DATE , PRIMARY KEY (DNUMBER) , UNIQUE (DNAME) , FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN) ) ; CREATE TABLE DEPT_LOCATIONS (DNUMBER INT NOT NULL , DLOCATION VARCHAR(15) NOT NULL , PRIMARY KEY (DNUMBER, DLOCATION) , FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT(DNUMBER) ) ;
DDL - Create Tables (continued) CREATE TABLE PROJECT (PNAME VARCHAR(15) NOT NULL , PNUMBER INT NOT NULL ,PLOCATION VARCHAR(15) , DNUM INT NOT NULL , PRIMARY KEY (PNUMBER) , UNIQUE (PNAME) , FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER) ) ; CREATE TABLE WORKS_ON (ESSN CHAR(9) NOT NULL , PNO INT NOT NULL , HOURS DECIMAL(3,1) NOT NULL , PRIMARY KEY (ESSN, PNO) , FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN) ,FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER) ) ;
DDL - Create Tables with Constraints CREATE TABLE EMPLOYEE ( . . . , DNO INT NOT NULL DEFAULT 1, CONSTRAINT EMPPK PRIMARY KEY (SSN) , CONSTRAINT EMPSUPERFK FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN) ON DELETE SET NULL ON UPDATE CASCADE , CONSTRAINT EMPDEPTFK FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) ON DELETE SET DEFAULT ON UPDATE CASCADE );
DDL - Create Tables with Constraints • Is there an Equivalent to Keys and Constraints in PLs? • What Does Java Have Internally? • Constraints Facilitate Type Checking at Data Level! CREATE TABLE DEPARTMENT ( . . . , MGRSSN CHAR(9) NOT NULL DEFAULT '888665555' , . . . , CONSTRAINT DEPTPK PRIMARY KEY (DNUMBER) , CONSTRAINT DEPTSK UNIQUE (DNAME), CONSTRAINT DEPTMGRFK FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN) ON DELETE SET DEFAULT ON UPDATE CASCADE );
Data Manipulation Language - DML • SQL has the SELECT Statement for Retrieving Info. from a Database (Not Relational Algebra Select) • SQL vs. Formal Relational Model • SQL Allows a Table (Relation) to have Two or More Identical Tuples in All Their Attribute Values • Hence, an SQL Table is a Multi-set (Sometimes Called a Bag) of Tuples; it is Not a Set of Tuples • SQL Relations Can Be Constrained to Sets by • PRIMARY KEY or UNIQUE Attributes • Using the DISTINCT Option in a Query • Implied Processing and Procedural Semantics • SQL Queries have Specific Semantics • These Semantics Dictate Processing • Includes Code Generation, Optimization, etc.
Interactive DML - Main Components • Select-from-where Statement Contains: • Select Clause - Chosen Attributes/Columns • From Clause - Involved Tables • Where Clause - Constrain Tuple Values • Tuple Variables - Distinguish Among Same Names in Different Tables • String Matching - Detailed Matching Including • Exact • Starts With • Near • Ordering of Rows - Sorting Tuple Results
…and Corresponding DB Tables Which Represent Tuples/Instances of Each Relation A S C null W B null null 1 4 5 5
Simple SQL Queries B S C null W B null null • Query 0: Retrieve the Birthdate and Address of the Employee whose Name is 'John B. Smith'.SELECT BDATE, ADDRESSFROM EMPLOYEEWHERE FNAME='John' AND MINIT='B’ AND LNAME='Smith’ • Which Row(s) are Selected? • Note: While All of these Next Queries are from Chapter 8, Some are From “Earlier” Edition
Simple SQL Queries • Query 1: Retrieve Name and Address of all Employees who work for the 'Research' DepartmentSELECT FNAME, MINIT, LNAME, ADDRESS, DNAMEFROM EMPLOYEE, DEPARTMENTWHERE DNAME='Research' ANDDNUMBER=DNO • What Action is Being Performed? Join! Cartesian Product!
Simple SQL Queries - Result Theta Join on DNO=DNUMBER
Simple SQL Queries • Query 2: For Every Project in 'Stafford', list the Project Number, the Controlling Dept. Number, and the Dept. Manager's Last Name, Address, and BirthdateSELECT PNUMBER, DNUM, LNAME, BDATE,ADDRESSFROM PROJECT, DEPARTMENT, EMPLOYEEWHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford' • In Q2, there are Two Join Conditions: • The Join Condition DNUM=DNUMBER Relates a Project to its Controlling Department • The Join Condition MGRSSN=SSN Relates the Controlling Department to the Employee who Manages that Department
Query Results A S C null W B null null SELECT PNUMBER, DNUM, LNAME, BDATE,ADDRESSFROM PROJECT, DEPARTMENT, EMPLOYEEWHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'