1 / 28

Info systems Spring 2013

Info systems Spring 2013. Focus of class. Cutting edge topics in information systems Conceptual material, as opposed to coding If you don’t have a good knowledge of relational databases and SQL, now is a good time to study it… A good MySQL book: murach mysql. Class blog & reaching me.

mora
Download Presentation

Info systems Spring 2013

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Info systems Spring 2013

  2. Focus of class • Cutting edge topics in information systems • Conceptual material, as opposed to coding • If you don’t have a good knowledge of relational databases and SQL, now is a good time to study it… • A good MySQL book: murachmysql

  3. Class blog & reaching me • http://rogerking.me • All assignments and announcements will be on the blog. • I will not send out class email. • Only read blog entries starting on December 30. • To contact me • buzzking@ymail.com • Do not use my CU mail, please. It is inundated with Spam. • 303 437 7419. • To talk to me, see me after class, or send email for an appointment. • rogerbuzzking on Skype, but I do not stay logged into Skype; you must call or email first.

  4. Required books • books: • NoSQL Distilled and  • Seven Databases in Seven Weeks

  5. Assignments and exams • Two exams – material will be from • 2 required books • Slides posted on blog, along with corresponding lectures • Handouts posted on blog • Exam and projects • Midterm – 35 % of final grade • On March 11 • Second exam – 35 % of final grade • This will be on the last schedule day of class, NOT at the final slot • i.e., it will be on May 1 • Small projects/assignments – 30% of final grade total

  6. The First Assignment • Build a simple application using Mongo • Groups of 1 or 2 • Due date to be posted on blog • The application will hold very large video and audio objects • More later…

  7. Suggestions… • Please come to class. • Feel free to suggest references and/or topics. • Speak up at any time – it would be nice to have an interactive class.

  8. Likely and possible topics • Post relational database systems • Full text processing • “Big” data • Databases and XML • Media and databases (images, video, audio, animation) • Personal information systems (not just databases) • Integrating diverse data sets • ???

  9. Review of relational databases • Schema based technology • Small schema • Lots of data • SQL, set-based data manipulation • Multi-language application development • Strict 2 phase ACID transactions • Rigidly structured, identically structured, simply structured data • User defined types extensions for “objects in the small”

  10. Relational basics • Tables (relations) • Columns (attributes) • Domains • Rows (tuples) • Primary and candidate keys • Foreign keys • Null values • Triggers • SQL as DDL and DML

  11. Relational applications • A lot of data, but in amounts no longer considered huge • Transactions manipulating sets of rows • Complete accuracy is required • Not real time or interactive • If distributed, only a modest number of centrally controlled servers • If distributed, there is a single, unified schema • Complex objects built by connecting tables • Logic of data is mostly in schema and SQL, not app • Mix of reads and writes, without writes being minimal • SQL programs (run as transactions) compiled once, and optimized, run many times

  12. Important facts about SQL DBs • Elegant, set theoretic underpinnings • Originally defined with an algebra: • Projection, join, union/difference/intersection, selection • Closed under this algebra • SQL is a tuple calculus based on the algebra • Declarative • Far more compact than algebra • Equally elegant algebra for optimization • Query results are legal view tables • Serious impedance mismatch between o-o languages and SQL

  13. Syntax of SQL SELECT select_list FROM table_sourceWHERE search_condition

  14. Example SELECT invoice_total, payment_total, credit_total, invoice_total - payment_total - credit_total AS balance_due FROM invoices WHERE NOT (invoice_total >= 5000 OR NOT invoice_date <= '2011-08-01')

  15. Example SELECT 'Paid' AS source, invoice_number, invoice_date, invoice_total FROM paid_invoices WHERE invoice_date >= '2011-06-01' ORDER BY invoice_total DESC

  16. Manipulating multiple tables • You can use the Where clause, T.color = R.color • You can use an explicit join • Equijoin, outerjoin SELECT vendor_name, invoice_number, invoice_total FROM vendors LEFT JOIN invoices ON vendors.vendor_id = invoices.vendor_id ORDER BY vendor_name

  17. Aggregates AVG([ALL|DISTINCT] expression) SUM([ALL|DISTINCT] expression) MIN([ALL|DISTINCT] expression) MAX([ALL|DISTINCT] expression) COUNT([ALL|DISTINCT] expression) COUNT(*) – counts nulls

  18. Example SELECT COUNT(*) AS number_of_invoices, SUM(invoice_total – payment_total – credit_total) AS total_due FROM invoices WHERE invoice_total – payment_total – credit_total > 0

  19. Strict 2 phase transactions • Each SQL program is within a begin and end transaction pair • Each transaction has its own workspace for DB items it is going to update • Any transactions that overlap in execution time will appear to have run in some serial order • This is done by transactions requesting read and write locks (also known as shared and exclusive locks) • Read locks can be shared with other readers • Write locks cannot be shared with readers or writers • All locks held until the end of the transaction • They are released and then the changes that a transaction has made are moved to the DB

  20. Serializability & transactions • The net effect is that the transactions that overlap in execution time appear to have run in some serial order • Transactions can be undone by throwing away the local store (conceptually, at least) • The write period at the end of the transaction must be atomic • The two phases: • Request read, write, upgrade locks (and wait on locks) and process • Release locks and move updates to the DB • There is a notion of “serializability” which means that the actual schedule of executed steps corresponds to some serial order of running the transactions

  21. Major concept: DB “state” • Changes at the end of each transaction (potentially) • Relies on 2 Phase transactions • We tradeoff throughput in favor of having safe state changes

  22. Normalization • Start with an entity model • Map to tables • Create PKs and FKs • Create other constraints • Normalize tables • Goals • Minimize redundant data • Minimize “update anomalies” • Minimize misinterpreted db by documenting more constraints • Problems • Skinny tables and more joins • Increases impedance mismatch

  23. Dependencies • FD  • We say that ai FD-> aj • Or “ai functionally determines aj” • MVD-> • We say that ai MVD-> aj • Or “ai multivalued determines aj” • Note: the right side of an FD or an MVD can be a set of attributes

  24. Third normal form • First (1NF) The value stored at the intersection of each row and column must be a scalar value, and a table must not contain any repeating columns. • Second (2NF) Every non-key column must depend on the entire primary key. • Third (3NF) Every non-key column must depend only on the primary key.

  25. Fixed 3NF and 4NF • Boyce-Codd (BCNF) A non-key column can’t be dependent on another non-key column. • Fourth (4NF) A table must not have more than one multivalued dependency, where the primary key has a one-to-many relationship to non-key columns.

  26. Example Customer ID Address ZIP 18 112 First 80304 17 123 Ash 80303 16 123 Ash 80303 Break into two tables: Customer ID Address Address Zip

  27. Example Mothers_PhoneFathers_PhoneChild_Name Break into: Mothers_PhoneChild_Name 3030000000 Sue 3031111111 Sue AndFathers_PhoneChild_Name 3032222222 Sue 3033333333 Sue Note: both fields needed for PK

  28. Stored programs • Stored procedures (can be called by an application) • Stored functions (can be called by an SQL program) • Triggers (tied to an operation like INSERT) • Events (tied to a clock)

More Related