1 / 24

No-Schema SQL Querying Relational Databases Independent of Schema

No-Schema SQL Querying Relational Databases Independent of Schema. Adam Boot Michael Maur Keith Kelly Jing Yu Maryam Karimzadehgan. Agenda. Motivation Approach Method Challenges Demonstration Conclusion Future Work. Motivation (Schema Dependence).

kyrie
Download Presentation

No-Schema SQL Querying Relational Databases Independent of Schema

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. No-Schema SQLQuerying Relational Databases Independent of Schema Adam Boot Michael Maur Keith Kelly Jing Yu Maryam Karimzadehgan CS511

  2. Agenda • Motivation • Approach • Method • Challenges • Demonstration • Conclusion • Future Work CS511

  3. Motivation(Schema Dependence) • Relational Model achieved certain forms of independence: • Independent of ordering • Independent of indexing • Independent of access path • Not independent of schema • Schema dependence is undesirable • Memorizing schema is a burden on the user • When schema changes, queries must change CS511

  4. Motivation(The Problem with SQL) • SQL is difficult for the causal user • Relational schema is double-edged • Schema should exist to help in finding relevant data • User should not need to understand structure of underlying schema (transparency) • Implementation of a full natural-language interface is very hard • NaLIX [1] • CQL/NL [2] CS511

  5. Approach • What No-Schema SQL (NS-SQL) is • Relaxed form of SQL’s syntactical structure • Frees users from having to know schema • Allows schema-specific elements to be omitted • Source-to-source translator • Infers schema from fkpk mappings • Qualifies column references • Specifies table references and joins (FROM clause) • What NS-SQL is not • Natural-language interface • User must still know basic SQL syntax • Mind reader (ambiguities become possible) • Column name ambiguity • Join path ambiguity CS511

  6. id name salary Employee id title description id city state Job Facility Approach(A Simple Example) Find the name, city, and state of all employees whose salary is greater than $70,000 and job title is “Database Designer” • SELECT Employee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000) AND (Job.title = "Database Designer"); SQL NS-SQL • SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer"); CS511

  7. DBMS(via JDBC) DB-URL sql url schema console, GUI Hub NS-SQL ddl ambiguity error Schemafier schema dml ambig disambig command-line, script file, GUI SQL nssql Statement Parser parse error Method(System Architecture) Get Schema Schemafy Parse NS-SQL CS511

  8. Employee. name Facility. city Facility. state Job. title Employee. salary key mapping for join Exployee.job_id = Job.id Employee.facility_id = Facility.id Legend SELECT Column reference Structure name city state Image node WHERE Table reference Expression salary title (Salary > 70000) and ( title = “Database Designer” ) Method(Statement Processing) SELECT name, city, state WHERE (salary > 70000) AND (title = "Database Designer"); Schemafier CS511

  9. Employee. name Facility. city Facility. state Job. title Employee. salary Exployee.job_id = Job.id Employee.facility_id = Facility.id Method(Statement Processing) SELECT key mapping for join name state city WHERE Expression salary title (Salary > 70000) and ( title = “Database Designer” ) CS511

  10. Employee. name Facility. city Facility. state Job. title Employee. salary Exployee.job_id = Job.id Employee.facility_id = Facility.id Employee. name Facility. state Facility. city FROM Joined Table Employee. facility_id Facility. id Facility Joined Table Job. id Job Employee. job_id Employee Employee. salary Job. title Method(Statement Processing) SELECT key mapping for join name state city WHERE Expression salary title ( > 70000) and ( = “Database Designer” ) CS511

  11. Method(Statement Processing) SELECT Empoyee.name, Facility.city, Facility.state FROM Employee JOIN Job ON (Employee.job_id = Job.id) JOIN Facility ON (Employee.facility_id = Facility.id) WHERE (Employee.salary > 70000) AND (Job.title = "Database Designer"); SELECT Employee. name Facility. state Facility. city FROM Joined Table Employee. facility_id Facility. id Facility Joined Table Job. id Job Employee. job_id WHERE Employee Expression Employee. salary Job. title (Salary > 70000) and ( title = “Database Designer” ) CS511

  12. Challenges: Parser • Input Requirement • Support standard SQL (SQL92) inputs and relaxed SQL (NS-SQL) inputs • Extensible (tolerate dialects) • AST Requirement • Easy to extract column/table references • Able to modify column reference and FROM clause given feedback from Schemafier • Solution • Build our own NS-SQL parser with JavaCC • Design a sufficient internal representation CS511

  13. Challenges: Schemafier • Determine how referenced columns are relatable • From among referenced column names, determine a unique set of related physical columns • Determine a unique join path • Types of possible ambiguity • Column name ambiguity • Join path ambiguity CS511

  14. Challenges: Schemafier(Column Name Ambiguity) • Column names might be similar between tables • eg, column name such as “id” or “name” • Assume all referenced columns are related via fkpk (no Cartesian products) • No other assumptions about which physical columns are intended • If multiple interpretations of column name  physical column, notify user of ambiguity • Graph tagging algorithm (“reachability”) CS511

  15. Enroll Student Course Survey Challenges: Schemafier(Name Ambiguity: Graph Tagging) tag 1 • Interpret a schema as a graph • Tables are nodes • Key mappings are edges • Mark all roots (nodes without parents) and their children with a common tag • A node may have multiple tags • Nodes that share a common tag are reachable amongst each other (valid solution) tag 2 two tags: • 1 = {Enroll, Student, Course} • 2 = {Survey, Student, Course} CS511

  16. Enroll Student Course Survey Challenges: Schemafier(Name Ambiguity: Example) tag 1 SELECT name, title, grade; Unambiguous: tag 1 SELECT grade, rating; Unrelatable: no common tag SELECT name, designator; Unambiguous: tag 1 and tag 2 SELECT name, credithours; Ambiguous: two physical interpretations of reference “credithours” and each is a valid solution SELECT rating, credithours; Unambiguous: two versions of “credithours”, but only one valid solution (tag 2) grade credithours id designator title credithours id name email ta rating tag 2 CS511

  17. Challenges: Schemafier(Join Path Ambiguity) • Set of columns returned from Column Disambiguation algorithm as input (physical columns) • Finds join paths that exist between columns in tables according to mapping between Foreign/Primary key • If there is more than one path  unambiguous solution otherwise, ambiguous solution CS511

  18. Enrollment Student Course Professor R R College FK PK R = referenced in query Challenges: Schemafier(Path Ambiguity: Finding Possible Paths) • Get set of common predecessors for all referenced tables • Get set of possible paths from root to each referenced table • Generate all possible path combinations reaching all referenced tables • Remove useless path combinations • Extract joins from path combinations common predecessors: Enrollment Course Professor paths from root ‘Course‘: to College: - Course, Student, College (C1) - Course, Professor, College (C2) to Professor: - Course, Professor (P1) possible combinations: - C1, P1 - C2, P1 possible combinations: - C1, P1 - C2, P1 (deleted) CS511

  19. Enrollment StudentID, CourseID, Grade, CreditHours Course ID, Designator, Title, TA, Professor, CreditHours Student ID, Name, CollegeID, Email Professor ID, Name, College, Email College ID, Name, PhoneNr Challenges: Schemafier(Path Ambiguity: Example) SELECT Title, Grade; Unambiguous SELECT College, PhoneNr; Ambiguous: 3 joins exist SELECT Grade, CollegeID; Ambiguous: 2 joins exist SELECT Designator, PhoneNr; Ambiguous: 3 joins exist SELECT Title, CollegeID; Ambiguous: 2 joins exist SELECT Course.ID, Professor.Name; Unambiguous CS511

  20. Conclusion • Current NS-SQL transformer • Independent of the schema • Able to qualify referenced columns and infer join paths for non-nested SELECT statements • Return detailed error message if ambiguity exists • Highly extensible CS511

  21. Future Work(Parser) • Support dialects without affecting standard queries • Virtually every DBMS extends SQL • Example extensions: • Arguments: • SELECT TOP 10 … (SQL Server) • SELECT … LIMIT 0, 10 (MySQL) • SELECT … SAMPLE 10 (Oracle) • Functions: • CUBE and ROLLUP (SQL Server and Oracle) • Statements: • RENAME (MySQL) • PARTITION (Oracle) • Parser and internal representation are flexible CS511

  22. Future Work(Schemafier) • Support more SQL statements • Extends Schemafier to support UPDATE, DELETE, INSERT • Support nested queries • Support aliasing • Usability • Given an ambiguity error, provide a more intuitive interface to resolve the ambiguity • Synonym support CS511

  23. References [1] Yunyao Li, Huahai Yang, H. V. Jagadish; “NaLIX: an Interactive Natural Language Interface for Querying XML”; SIGMOD 2005, June 14-16, 2005. [2] Vesper Owei, Hyeun-Suk (Sue) Rhee, Shamkant Navathe; “Natural Language Query Filtration in the Conceptual Query Language”; IEEE 1997, 1060-3425197. CS511

  24. Questions? CS511

More Related