300 likes | 435 Views
Complex Data and XML. Week 10 Review. Trees and the places where tree structures are used in computing and knowledge representation Nomenclature in trees Tree-based – root, node, branch, leaf Family-based – parent, child, sibling The Document Object Model as a tree DOM access in Javascript
E N D
Week 10 Review • Trees and the places where tree structures are used in computing and knowledge representation • Nomenclature in trees • Tree-based – root, node, branch, leaf • Family-based – parent, child, sibling • The Document Object Model as a tree • DOM access in Javascript • An example of DOM manipulation • DHTML – accessing and updating the DOM with Javascript • AJAX - • server communication from JavaScript
Tree and questions • Look at the tree in the diagram • Answer the following questions: • Name the root of the tree • Name the leaves of the tree • Using path notation in Windows or Unix • What is the location of J relative to the root? • What is the location of the root if you are at J? • What is the location of K if you are at J?
A R C B G L J D K M N
The rationale for XML • Complex data • StudentsOnline • Modules and Courses • Export and import – ‘self-documenting’ data • Flickr • TransXChange • XML-based tools • XForms
Year 1 Oracle example • Students on year 1 Database module modelled a much bigger data model concerning modules, students, staff and courses. • Oracle SQL DDL was reverse-engineered to QSEE, then the code regenerated to MySQL, installed and executed. • Query interface using PHP
Weak entity Foreign key identifies
Lessons • Problems with data types and data formats • ER model as Conceptual Model • Foreign keys in QSEE • Many of the entities are ‘weak’
Data types • Data types differ between the two databases – QSEE uses a more general set of datatypes and then outputs DDL dependant on the target. • Data typing is a big problem in databases • Data types too small for the data stored, so data might be lost • Date formats differ 2006-12-08 in MySQL, 08-dec-2006 in Oracle
Conceptual ER model • No foreign keys in the ER model. • foreign keys are the realisation (implementation) of the relationships defined in the ER Model in a RDBMS • Case tool can then • removes redundancy - there is no point in defining both the relationship AND the foreign keys [one thing in one place] • it emphasises the conceptual nature of the model, which may be implemented in other ways • the data type of the primary key and all foreign keys must be the same - but you may need to change it so better to change it in one place and regenerate the schema • all the necessary integrity constraints are also generated
Foreign Keys in QSEE • The bar across a relationship indicates that the foreign key is to be part of the primary key. • As well as creating a foreign key, the primary key constraint will be generated appropriately. • A dotted boundary indicates that the whole primary key is made up of foreign keys • Any entity in which a foreign key is part of the primary key is 'weak' - it depends for its existence on another entity. • Foreign key naming • default - fk1_ , fk2_ prefix • My policy is to • remove the prefix globally • set the prefix on specific relationships where required (manager, leader)
Foreign keys in RDBMS • In MySQL, action depends on which file model is being used • ISAM (Index Sequential access Method) does not enforce foreign key integrity • INNODB does • Loading data is a problem • Need to load data in the right order to prevent foreign key errors – roughly parent before child • Not possible if any cycles, so RDBMS allows it to be turned off for the loading • Special programs for bulk loading from files such as CSV– easier than using INSERT statements
Questions about the model • What foreign keys would be added to the Course table in an RDBMS? To the Teaches table? • Explain the optionality on the mgr relationship between staff and department? Is it on the right end? • Which two tables would be impossible to put in order to ensure foreign key constraints are not violated?
Non-first normal form • First normal form specifies that all fields must be atomic – single valued • Look at the qualification table. The bar indicates that it is dependant on staff. If repeated fields were allowed in tables then we could simply add: qualifications : set Varchar • to the staff table • We would need to be able to search for individual qualifications • Tags in the photo album site are a similar problem. • Which other table could be removed if non-first normal form tables were allowed? • MySQL has sets and operators to add, remove and search (but these all add complexity)
Harder cases • What additional structure would be required to get rid of the nextOfKin table? • How could we get rid of the ‘teaches’ and ‘undertakes’ tables? How would this data be handled in an Object-oriented model? • If all the weak entities were removed, how much would this simplify the model?
NextOfKin • Requires a sub-record of name, phone and relationship which can be repeated • Teaches • Decide which side this data belongs to and add a set of sub-records • Which side would you place the data from teaches and undertakes?
Complex data • Real data is much more complex than this • ISIS the student record system has over 200 tables, many them like the 5 weak tables in the simpler model – so the same ratio might be assumed to apply • Consider an actual module specification • Note the number of repeated sub-records? • How many tables would be required to model this data?
Communication • Applications which stand alone and cannot interchange data with other applications are called ‘silos’ • An RBMS must export data as files of INSERT statements or as a comma separated variable file • In both cases, you can’t make any sense of the data without knowing the Database schema INSERT INTO teaches VALUES ( 11, 'UFCEMQ-20-1', 5); INSERT INTO teaches VALUES ( 11, 'UFEEHV-20-1', 5); INSERT INTO teaches VALUES ( 13, 'UFQEGC-20-1', 6); INSERT INTO teaches VALUES ( 8, 'UFCEKN-20-1', 10); 1, 'David', 'Johnson', '109 Mount View Road, Kingswood, Bristol BS15 8UB', '0117 9676505', '3A31', 'Male', 32000, 'Professor', 'dpjohnso', 'Applied Sciences' 2, 'Carrie', 'Ford', '15 Heron Way, Chipping Sodbury, Bristol BS37 6NT', '01454 854089', '1C29', 'Female', 29000, 'Senior Lecturer', 'caford', 'Bristol Business School'
'BUWE B80 GF78', 'PgDIT', '3 Years', 13, 'CIS' 'BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law' 'BUWE B80 B230', 'Pharmaceutical Sciences', '4 Years', 1, 'Applied Sciences' 'BUWE B80 G500', 'Business Decision Analysis', '4 Years', 9, 'Bristol Business School' 'BUWE B80 X300', 'Education Studies', '3 Years', 5, 'Education' 'BUWE B80 G451', 'Multimedia Computing', '4 Years', 10, 'Computing, Engineering and Mathematical Sciences' 'BUWE B80 L300', 'Sociology', '3 Years', 6, 'Humanities, Languages and Social Sciences' CREATE TABLE course( cCode VARCHAR(30) NOT NULL, title VARCHAR(50), duration VARCHAR(20), deptName VARCHAR(50) NOT NULL, leader_staffNo NUMERIC(4,0) NOT NULL, -- Specify the PRIMARY KEY constraint for table "course". -- This indicates which attribute(s) uniquely identify each row of data. CONSTRAINT pk_course PRIMARY KEY (cCode) ) TYPE=INNODB;
XML • XML is tagged data following some simple rules on the formation and relationship between tags – ‘well-formed’ • XML ‘documents’ are trees and thus support non-first normal form data • The tags in the document provide –some- information about the meaning of the data • The tags and the common tree structure allow general tools to do much of the processing without a separate schema.
Optional processing instruction Staff as XML <?xml version="1.0" encoding="UTF-8"?> <StaffList> <Staff staffNo="1"> <fName>David</fName> <lName>Johnson</lName> <address>109 Mount View Road, Kingswood, Bristol BS15 8UB</address> <phone>0117 9676505</phone> <officeNo>3A31</officeNo> <sex>Male</sex> <salary>32000</salary> <post>Professor</post> <computerid>dpjohnso</computerid> <department>Applied Sciences</department> <teaching> <teaches module="UGDFNP-20-1" hours="6"/> <teaches module="UGDF58-20-1" hours="7"/> </teaching> <qualification>PhD</qualification> </Staff> </StaffList> Root node element Element with children value attribute
Processes required for XML • Store XML (native XML database) • Query XML (XQuery) • Transform XML into other XML, RDF or text (XQuery, XSLT) • Transform structured text to XML (?) • Define deeper semantics for an XML file (XSchema, RelaxNG) • User Interface (XForms)
Applications of XML • XHTML – HTML as well-formed XML which conforms to standard • SVG – Structured Vector Graphics • kml – GoogleEarth overlay files • XForms – UI definition • Flickr API responses • RSS news syndication • ….
Well-formed XML • ‘Well-formed’ – absolute property of an XML document – must obey rules for well-formedness • Check by • Opening in Word 2003 • Submitting to an online validator • Check using an XML tool • XML Spy • Stylus Studio • Oxygen • Check out the rules in the Wikipedia entry
Well-formed XML Every XML document must be well-formed and must therefore adhere to the following rules (among others): • Every start-tag must have a matching end tag. • Elements may nest but must not overlap. <name>Anna<em>Coffey</em></name> - √ <name><em>Anna</name>Coffey</em> - × • There must be exactly one root element. • Attribute values must be quoted. • An element must not be quoted. • Comments and processing instructions may not appear inside tags. • No unescaped < or & signs may occur in the character data of an element.
Well-formed XML documents Element names are case sensitive - <NAME>, <name>, <Name> & <NaMe> are four different element types. No white spaces in element name - <First Name> not allowed; <First_Name> OK. Element names cannot start with the letters “XML” or “xml” – reserved terms. Element names must start with a letter or a underscore. Element names cannot start with a number but numbers may be embedded within an element name - <2you> not allowed; <me2you> is OK. Attribute names are constrained by the above rules for element names. Entity references are used to substitute specific characters. There are five predefined entities built into XML: Entity Char Notes & & Do not use inside processing instructions < < Use inside attribute values quoted with “. > > Use after ]] in normal text and inside processing instruction. " “ Use inside attribute values quoted with “. ' ‘ Use inside attribute values quoted with ‘.
Spot the mistakes <?xml version="1.0" encoding="UTF-8"> <StaffList> <Staff staffNo="1“ temporary> <fName>Davd</fName> <lName>Johnson</lname> <address>109 Mount View Road, Kingswood, Bristol BS15 8UB</addess> <phone>0117 9676505</phone> <officeNo>3A31</officeNo> <sex=Male/> <salary>32000</SALARY> <POST>Profesor</POST> <computerid>dpjohnso</computerid> <department>Applied Sciences</department> <teaching> <teaches module=UGDFNP-20-1 hours="6"/> <teaches module="UGDF58-20-1" hours="7"> </teaching> <qualification>PhD</qualification> </StaffList>
‘Valid’ • Validity is relative - valid with respect to a specified schema which defines: • The sequence and nesting of elements • The allowable attributes of an element • Whether elements are optional or repeated • The type of data in an element or attribute • Different languages for expressing these rules • DTD – Document Type Definition • XML Schema (xsd) – itself an XML vocabulary • Relax NG • Schematron – permissive • Document can be linked to a specific schema • Too inflexible
Tutorial • Review the exercises from this lecture • Take the module specification and work out how many tables would be required to hold this data • Take one of the TransXChange files and try to develop a data model. • Use a browser to view the file, don’t try to print it out. • Use ‘find’ to follow references. • Later • Read the wikipedia article on XML • Work through the w3.schools tutorial • XML dialects have developed in many fields of knowledge. Investigate the XML dialects which have been created in one particular area of interest e.g. Multimedia, Family History
Next Week • Normalisation Exercise • Processing Language • Prize for the best Photo Album as judged by you and us.