240 likes | 386 Views
No-Schema SQL Querying Relational Databases Independent of Schema. Adam Boot Michael Maur Keith Kelly Jing Yu Maryam Karimzadehgan. Agenda. Motivation Approach Method Challenges Demonstration Conclusion Future Work. Motivation (Schema Dependence).
E N D
No-Schema SQLQuerying Relational Databases Independent of Schema Adam Boot Michael Maur Keith Kelly Jing Yu Maryam Karimzadehgan CS511
Agenda • Motivation • Approach • Method • Challenges • Demonstration • Conclusion • Future Work CS511
Motivation(Schema Dependence) • Relational Model achieved certain forms of independence: • Independent of ordering • Independent of indexing • Independent of access path • Not independent of schema • Schema dependence is undesirable • Memorizing schema is a burden on the user • When schema changes, queries must change CS511
Motivation(The Problem with SQL) • SQL is difficult for the causal user • Relational schema is double-edged • Schema should exist to help in finding relevant data • User should not need to understand structure of underlying schema (transparency) • Implementation of a full natural-language interface is very hard • NaLIX [1] • CQL/NL [2] CS511
Approach • What No-Schema SQL (NS-SQL) is • Relaxed form of SQL’s syntactical structure • Frees users from having to know schema • Allows schema-specific elements to be omitted • Source-to-source translator • Infers schema from fkpk mappings • Qualifies column references • Specifies table references and joins (FROM clause) • What NS-SQL is not • Natural-language interface • User must still know basic SQL syntax • Mind reader (ambiguities become possible) • Column name ambiguity • Join path ambiguity CS511
id name salary Employee id title description id city state Job Facility Approach(A Simple Example) Find the name, city, and state of all employees whose salary is greater than $70,000 and job title is “Database Designer” • SELECT Employee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000) AND (Job.title = "Database Designer"); SQL NS-SQL • SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer"); CS511
DBMS(via JDBC) DB-URL sql url schema console, GUI Hub NS-SQL ddl ambiguity error Schemafier schema dml ambig disambig command-line, script file, GUI SQL nssql Statement Parser parse error Method(System Architecture) Get Schema Schemafy Parse NS-SQL CS511
Employee. name Facility. city Facility. state Job. title Employee. salary key mapping for join Exployee.job_id = Job.id Employee.facility_id = Facility.id Legend SELECT Column reference Structure name city state Image node WHERE Table reference Expression salary title (Salary > 70000) and ( title = “Database Designer” ) Method(Statement Processing) SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer"); Schemafier CS511
Employee. name Facility. city Facility. state Job. title Employee. salary Exployee.job_id = Job.id Employee.facility_id = Facility.id Method(Statement Processing) SELECT key mapping for join name state city WHERE Expression salary title (Salary > 70000) and ( title = “Database Designer” ) CS511
Employee. name Facility. city Facility. state Job. title Employee. salary Exployee.job_id = Job.id Employee.facility_id = Facility.id Employee. name Facility. state Facility. city FROM Joined Table Employee. facility_id Facility. id Facility Joined Table Job. id Job Employee. job_id Employee Employee. salary Job. title Method(Statement Processing) SELECT key mapping for join name state city WHERE Expression salary title ( > 70000) and ( = “Database Designer” ) CS511
Method(Statement Processing) SELECT Empoyee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000) AND (Job.title = "Database Designer"); SELECT Employee. name Facility. state Facility. city FROM Joined Table Employee. facility_id Facility. id Facility Joined Table Job. id Job Employee. job_id WHERE Employee Expression Employee. salary Job. title (Salary > 70000) and ( title = “Database Designer” ) CS511
Challenges: Parser • Input Requirement • Support standard SQL (SQL92) inputs and relaxed SQL (NS-SQL) inputs • Extensible (tolerate dialects) • AST Requirement • Easy to extract column/table references • Able to modify column reference and FROM clause given feedback from Schemafier • Solution • Build our own NS-SQL parser with JavaCC • Design a sufficient internal representation CS511
Challenges: Schemafier • Determine how referenced columns are relatable • From among referenced column names, determine a unique set of related physical columns • Determine a unique join path • Types of possible ambiguity • Column name ambiguity • Join path ambiguity CS511
Challenges: Schemafier(Column Name Ambiguity) • Column names might be similar between tables • eg, column name such as “id” or “name” • Assume all referenced columns are related via fkpk (no Cartesian products) • No other assumptions about which physical columns are intended • If multiple interpretations of column name physical column, notify user of ambiguity • Graph tagging algorithm (“reachability”) CS511
Enroll Student Course Survey Challenges: Schemafier(Name Ambiguity: Graph Tagging) tag 1 • Interpret a schema as a graph • Tables are nodes • Key mappings are edges • Mark all roots (nodes without parents) and their children with a common tag • A node may have multiple tags • Nodes that share a common tag are reachable amongst each other (valid solution) tag 2 two tags: • 1 = {Enroll, Student, Course} • 2 = {Survey, Student, Course} CS511
Enroll Student Course Survey Challenges: Schemafier(Name Ambiguity: Example) tag 1 SELECT name, title, grade; Unambiguous: tag 1 SELECT grade, rating; Unrelatable: no common tag SELECT name, designator; Unambiguous: tag 1 and tag 2 SELECT name, credithours; Ambiguous: two physical interpretations of reference “credithours” and each is a valid solution SELECT rating, credithours; Unambiguous: two versions of “credithours”, but only one valid solution (tag 2) grade credithours id designator title credithours id name email ta rating tag 2 CS511
Challenges: Schemafier(Join Path Ambiguity) • Set of columns returned from Column Disambiguation algorithm as input (physical columns) • Finds join paths that exist between columns in tables according to mapping between Foreign/Primary key • If there is more than one path unambiguous solution otherwise, ambiguous solution CS511
Enrollment Student Course Professor R R College FK PK R = referenced in query Challenges: Schemafier(Path Ambiguity: Finding Possible Paths) • Get set of common predecessors for all referenced tables • Get set of possible paths from root to each referenced table • Generate all possible path combinations reaching all referenced tables • Remove useless path combinations • Extract joins from path combinations common predecessors: Enrollment Course Professor paths from root ‘Course‘: to College: - Course, Student, College (C1) - Course, Professor, College (C2) to Professor: - Course, Professor (P1) possible combinations: - C1, P1 - C2, P1 possible combinations: - C1, P1 - C2, P1 (deleted) CS511
Enrollment StudentID, CourseID, Grade, CreditHours Course ID, Designator, Title, TA, Professor, CreditHours Student ID, Name, CollegeID, Email Professor ID, Name, College, Email College ID, Name, PhoneNr Challenges: Schemafier(Path Ambiguity: Example) SELECT Title, Grade; Unambiguous SELECT College, PhoneNr; Ambiguous: 3 joins exist SELECT Grade, CollegeID; Ambiguous: 2 joins exist SELECT Designator, PhoneNr; Ambiguous: 3 joins exist SELECT Title, CollegeID; Ambiguous: 2 joins exist SELECT Course.ID, Professor.Name; Unambiguous CS511
Conclusion • Current NS-SQL transformer • Independent of the schema • Able to qualify referenced columns and infer join paths for non-nested SELECT statements • Return detailed error message if ambiguity exists • Highly extensible CS511
Future Work(Parser) • Support dialects without affecting standard queries • Virtually every DBMS extends SQL • Example extensions: • Arguments: • SELECT TOP 10 … (SQL Server) • SELECT … LIMIT 0, 10 (MySQL) • SELECT … SAMPLE 10 (Oracle) • Functions: • CUBE and ROLLUP (SQL Server and Oracle) • Statements: • RENAME (MySQL) • PARTITION (Oracle) • Parser and internal representation are flexible CS511
Future Work(Schemafier) • Support more SQL statements • Extends Schemafier to support UPDATE, DELETE, INSERT • Support nested queries • Support aliasing • Usability • Given an ambiguity error, provide a more intuitive interface to resolve the ambiguity • Synonym support CS511
References [1] Yunyao Li, Huahai Yang, H. V. Jagadish; “NaLIX: an Interactive Natural Language Interface for Querying XML”; SIGMOD 2005, June 14-16, 2005. [2] Vesper Owei, Hyeun-Suk (Sue) Rhee, Shamkant Navathe; “Natural Language Query Filtration in the Conceptual Query Language”; IEEE 1997, 1060-3425197. CS511
Questions? CS511