Normal Forms and XML

Normal Forms and XML Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 10, 2007 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Announcements • Homework 3 will be due Monday10/22 • Midterm: Wednesday 10/24 • Please start forming into project groups of 4, with a list of names due 10/24

Lossless Join Decomposition R1, … Rk is a lossless join decomposition of R w.r.t. an FD set Fif for every instance r of Rthat satisfies F, ÕR1(r)⋈ ... ⋈ ÕRk(r) = r Consider: What if we decompose on(sid, name) and (serno, subj, cid, exp-grade)?

Testing for Lossless Join R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+ (R1 R2)  R1 – R2 (R1 R2)  R2 – R1 So for the FD set: sidname sernocid, exp-grade cidsubj Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?

Dependency Preservation Ensures we can “easily” check whether a FD XYis violated during an update to a database: • The projectionof an FD set F onto a set of attributes Z, FZis {X Y | X Y  F +, X YÍZ} i.e., it is those FDs local to Z’s attributes • A decomposition R1, …, Rk is dependency preserving ifF + = (FR1 ... FRk)+ The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join

Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip, item, price) And FD set namestreet, city street, cityst street, cityzip name, itemprice Consider the decomposition R1(name, street, city, st, zip) and R2(name, item, price) • Is it lossless? • Is it dependency preserving? What if we replaced the first FD by name, streetcity?

Another Example Given scheme: R(sid, fid, subj) and FD set: fidsubj sid, subjfid Consider the decomposition R1(sid, fid) and R2(fid, subj) • Is it lossless? • Is it dependency preserving?

FD’s and Keys • Ideally, we want a design s.t. for each nontrivial dependency XY, X is a superkey for some relation schema in R • We just saw that this isn’t always possible • Hence we have two kinds of normal forms

Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every XA that holds over R, either AX (it is trivial) ,or or X is a superkey for R Third Normal Form (3NF). For every relation scheme R and for every XA that holds over R, either AX (it is trivial), or X is a superkey for R,or Ais a member of some key for R

Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of dependency preservation • It’s strictly stronger than 3NF Let’s see algorithms to obtain: • A BCNF lossless join decomposition (nondeterministic) • A 3NF lossless join, dependency preserving decomposition

BCNF Decomposition Algorithm(from Korth et al.; our book gives a recursive version) result := {R} compute F+ while there is a relation schema Ri in result that isn’t in BCNF { let AB be a nontrivial FD on Ri s.t. ARi is not in F+ and A and B are disjoint result:= (result – Ri)  {(Ri - B), (A,B)} } i.e., A doesn’t form a key

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  name serno  classroom, cid, fid fid  prof • Find the Boyce-Codd Normal Form for this schema • What if instead: sid  name classroom, cid  sernofid  prof serno  cid

3NF Decomposition Algorithm Let F be a minimal cover i:=0 for each FD AB in F { if none of the schemas Rj, 1 j  i, contains AB { increment i Ri:= (A, B) } } if no schema Rj, 1  j  i contains a candidate key for R { increment i Ri:= any candidate key for R } return (R1, …, Ri) Build dep.-preserving decomp. Ensurelosslessdecomp.

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  name serno  classroom, cid, fid fid  prof • Find the Third Normal Form for this schema • What if instead: sid  name classroom, cid  sernofid  prof serno  cid

Summary of Normalization • We can always decompose into 3NF and get: • Lossless join • Dependency preservation • But with BCNF: • We are only guaranteed lossless joins • The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R • BCNF is stronger than 3NF: every BCNF schema is also in 3NF

Normalization Is Good… Or Is It? • In some cases, we might not mind redundancy, if the data isn’t directly updated: • Reports (people like to see breakdowns by semester, department, course, etc.) • Warehouses (archived copies of data for doing complex analysis) • Data sharing (sometimes we may export data into object-oriented or hierarchical formats)

XML: A Semi-Structured Data Model

Why XML? XML is the confluence of several factors: • The Web needed a more declarative format for data • Documents needed a mechanism for extended tags • Database people needed a more flexible interchange format • “Lingua franca” of data • It’s parsable even if we don’t know what it means! Original expectation: • The whole web would go to XML instead of HTML Today’s reality: • Not so… But XML is used all over “under the covers”

Why DB People Like XML Can get data from all sorts of sources • Allows us to touch data we don’t own! • This was actually a huge change in the DB community Interesting relationships with DB techniques • Useful to do relational-style operations • Leverages ideas from object-oriented, semistructured data Blends schema and data into one format • Unlike relational model, where we need schema first • … But too little schema can be a drawback, too!

XML Anatomy Processing Instr. <?xml version="1.0" encoding="ISO-8859-1" ?> <dblp> <mastersthesis mdate="2002-01-03" key="ms/Brown92"> <author>Kurt P. Brown</author> <title>PRPL: A Database Workload Specification Language</title> <year>1992</year> <school>Univ. of Wisconsin-Madison</school> </mastersthesis> <article mdate="2002-01-03" key="tr/dec/SRC1997-018"> <editor>Paul R. McJones</editor> <title>The 1995 SQL Reunion</title> <journal>Digital System Research Center Report</journal> <volume>SRC1997-018</volume> <year>1997</year> <ee>db/labs/dec/SRC1997-018.html</ee> <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee> </article> Open-tag Element Attribute Close-tag

Well-Formed XML A legal XML document – fully parsable by an XML parser • All open-tags have matching close-tags (unlike so many HTML documents!), or a special: <tag/> shortcut for empty tags (equivalent to <tag></tag> • Attributes (which are unordered, in contrast to elements) only appear once in an element • There’s a single root element • XML is case-sensitive

XML as a Data Model XML “information set” includes 7 types of nodes: • Document (root) • Element • Attribute • Processing instruction • Text (content) • Namespace • Comment XML data model includes this, plus typing info, plus order info and a few other things

XML Data Model Visualized(and simplified!) attribute root p-i element Root text dblp ?xml mastersthesis article mdate mdate key key author title year school 2002… editor title journal volume year ee ee 2002… 1992 1997 The… ms/Brown92 tr/dec/… PRPL… Digital… db/labs/dec Univ…. Paul R. Kurt P…. SRC… http://www.

What Does XML Do? • Serves as a document format (super-HTML) • Allows custom tags (e.g., used by MS Word, openoffice) • Supplement it with stylesheets (XSL) to define formatting • Data exchange format (must agree on terminology) • Marshalling and unmarshalling data in SOAP and Web Services

XML as a Super-HTML(MS Word) <h1class="Section1"><aname="_top“ />CIS 550: Database and Information Systems</h1> <h2class="Section1">Fall 2004</h2> <pclass="MsoNormal"> <place>311 Towne</place>, Tuesday/Thursday <timeHour="13" Minute="30">1:30PM – 3:00PM</time> </p>

XML Easily Encodes Relations Student-course-grade <student-course-grade> <tuple><sid>1</sid><serno>570103</serno><exp-grade>B</exp-grade></tuple> <tuple><sid>23</sid><serno>550103</serno><exp-grade>A</exp-grade></tuple> </student-course-grade>

But XML is More Flexible…“Non-First-Normal-Form” (NF2) <parents> <parentname=“Jean” > <son>John</son> <daughter>Joan</daughter> <daughter>Jill</daughter> </parent> <parentname=“Feng”> <daughter>Felicity</daughter> </parent> … Coincides with “semi-structured data”, invented by DB people at Penn and Stanford

XML and Code • Web Services (.NET, recent Java web service toolkits) are using XML to pass parameters and make function calls • Why? • Easy to be forwards-compatible • Easy to read over and validate (?) • Generally firewall-compatible • Drawbacks? XML is a verbose and inefficient encoding! • XML is used to represent: • SOAP: the “envelope” that data is marshalled into • XML Schema: gives some typing info about structures being passed • WSDL: the IDL (interface def language) • UDDI: provides an interface for querying about web services

Integrating XML: What If We Have Multiple Sources with the Same Tags? • Namespaces allow us to specify a context for different tags • Two parts: • Binding of namespace to URI • Qualified names <rootxmlns=“http://www.first.com/aspace” xmlns:otherns=“…”> <tagxmlns:myns=“http://www.fictitious.com/mypath”> <thistag>is in the default namespace (aspace)</thistag> <myns:thistag>is in myns</myns:thistag><otherns:thistag>is a different tag in otherns</otherns:thistag> </tag> </root>

Normal Forms and XML

Normal Forms and XML

Presentation Transcript

Schema Refinement and Normal Forms

Higher Normal Forms

Schema Refinement and Normal Forms

XML Court Forms

Schema Refinement and Normal Forms

Normal Forms and Infinity

Normal Forms

Schema Refinement and Normal Forms

Normal Forms

Normal Forms

Normal forms

Normal Forms

Normal Forms

4.4 Normal Forms

Normal Forms

Schema Refinement and Normal Forms

Database Design and Normal Forms

Schema Refinement and Normal Forms

XML Meets Forms

Schema Refinement and Normal Forms