200 likes | 367 Views
Schema Evolution. Schema Evolution. Ref: Alex Borgida & K. E. Williamson. “Accommodating Exceptions in Databases, and Refining the Schema by Hearing From Them”. VLDB, 1985. SCHEMA User Describes the “domain of discoveries” of the database Check the correctness of the data entry DBMS
E N D
Schema Evolution Ref: Alex Borgida & K. E. Williamson. “Accommodating Exceptions in Databases, and Refining the Schema by Hearing From Them”. VLDB, 1985. SCHEMA • User • Describes the “domain of discoveries” of the database • Check the correctness of the data entry • DBMS • Storage efficiency (store extra attributes and non-unique keys, attribute domains constrain violations) • Faster retrieval • Semantic query optimization • Integrity constraints - used to enforce additional conditions not captured by the restrictive language of class definition • All employees have a supervisor • Supervisor wage > Employee wage
Exceptions In Database • Some employees make more than their supervisor. • Some employees may earn a foreign degree, which is not even close to the American ones. How To Live With Exceptions • Storage and access efficiency. • Semantic computation (employee is a consultant). • Sharing (US dollars are now Swiss francs). • Validating of future updates. • Use object concept to represent exceptional information so that such information can be maintained • Alert the user when exceptional information is being manipulated as well as when constraints are violated
Learning From Exceptions • Modifying integrity constraints to apply in more restricted circumstances. • Adding new attributes to a class definition. • Defining a new class and placing it in the current sub-hierarchy. Using Empirical Generalization For Schema Refinement • DBA keeps track of violations and exceptions to constraints in the schema. • When sufficient evidence is accumulated, it suggests one or more alternative changes to the schema.
The Generalization Algorithm • Introduce/expand range • Introduce/generalize class Supervisor: Accountant -> Supervisor: Employee • Drop attribute specification Research (age 30, Ph.D.) and Research (age 35, MS) -> Research (age 30-35, MS-Ph.D.)
Refinement to the Generalization Process • If certain attribute specification is dropped, then the resulting description becomes overly generalized. • If no longer detects some of the errors it detects earlier. • Need heuristic for determining when an attribute constraint is essential. • Notion of RELEVANCE • Keep the attribute in the generalization if more relevant • Discard the attributes in the generalization if less relevant
Integrity constraints must be refined to disregard the contradictions raised by exception value. • Machine learning to find commonalities among the elements of a set of objects, such as exception to a rule encountered so far and characterize these in the form of a class description. • Use such characterization to adjust schema. • Changing restriction on property value range • Modifying integrity constraints • Define new classes Shortcomings • Heuristic nature of the RELEVANCE function. • Inability to handle negation and disjunction.
A Proposal For Automatic Schema Evolution • Predefine Schema templates • Monitoring the exception frequency of constraints • Switch over to new schema when exception frequency reaches a certain pre-specified level • New schema is selected based on the exception characteristics
Ref: Q. Li and D. McLeod. “Object Flavor Evolution Through Learning in an Object-Oriented Database System”. Proceedings of the 2nd International Conference on Expert Database Systems, 1989, pp. 469-495. PKM: Personal Knowledge Model • A simple object-oriented model • Based on a small number of simple concepts • Stepwise develop more complex and abstract database models Level 1 Atomic Object - Non-decomposable units of information at a certain stage (e.g., John Smith). Level 2 Open Atomic Set - A collection of atomic objects, members are not fixed (e.g., P names).
Closed Atomic Set - A collection of atomic objects, members are fixed [e.g., sex, age (>= 0, integer)] Mapping - Mapping from one object to another - Domain & range mapping (e.g., has sender mapping from letter to person) Social - Concept describes relationship (e.g., Jon has GPA 3.75) Procedure - Operation/methods (e.g., method for creating or destroying objects)
Level 3 Open Social Set - classified a collection of objects (e.g., applicants) Closed Social Set – (e.g., courses) Composed Mapping - virtual objects derived by composing other objects Composed Procedure - virtual object derived by grouping a collection of other procedure objects
Evolution System Architecture ICE Intelligent Concept Evolver • How, what & when to evolve PKM Personal Knowledge Manager LFI Learning From Instruction (passive learning) • Initiated by user • Acquire knowledge from user • Dialogue-based • Involved some inference LFE Learning From Exceptions (passive learning) Learning is triggered by: • Violations of constraints • Definition of the object in the DB • Involved more inference than LFI
Evolution System Architecture (cont’d) LFO Learning From Observation (active learning) • Check if any evolution should be conducted • Discovery of target evolution • Creation of classification criteria for set object and similar tasks • LFO is automatic, more inference than LFI and LFE
1) Find GPA of all students in the third world country. Since third world country is not defined in the schema, ICE conducts LFI (learning from instruction) Acquire the concept of third world country given the member of the country and ask the user to identify if they are a third world country. ICE learning all the third world countries in the DB introduces a new object “countries-class” with member “third world”, “second world”, and “first world”. Countries evolved to countries class. LFO (learning from observation) is triggered to find mapping cardinality constraint of “has class”, also determines if it is a many-to-one mapping. 2) CS 10 -> CS 10a, 10b, 10c CS 10 violation! ICE calls LFE and LFI LFI learns there are 10a, 10b, 10c (same course name, unit number, instructor, and students) Students of each section should be proper subset of original student set. Examples
3) Honor student has GPA >= 3.75 • Honor students can also be selected based on recommendation letters from Professors but GPA < 3.75 • ICE’s LFO (learning from observation) learning that every instance satisfies a generalization condition GPA >= 3.70 • ICE changes the honor student’s condition from GPA>=3.75 to 3.70
Planning & Scheduling Initiative (PI) Sponsored by the Defense Advanced Research Project Agency (DARPA) Rome Laboratories (RL) • Goal: • Promote automated planning technology • Smooth transition of the planning technology to operational planning and scheduling problems • Approach: • Tier 1: Independent research effects develops automated reasoning capabilities that are relevant to PI • Tier 2: Integrated Feasibility Demos Merging efforts in Tier 1 into a single system and single operational problem • Sizes: Involved more than 20 contractors
Operational Problems Focus Transportation Planning: • Provide transportation planner with an ability to rapidly generate/evaluate/modify transportation plan • Initial version used in DESERT SHIELD Features: • Interconnected with a large number of Heterogeneous DBs (thousands) • Distributed, Multi-Agent Architecture (delegation, negotiation) • Knowledge-based (representative, acquisition) • Dynamic • Common Prototype Environment • Visionary Demo
Database Requirements • Distributed • Heterogeneous • Intelligent (KB/DB) • Temporal & Evolutionary • Multimedia • Security