250 likes | 345 Views
CS610 / CS710 Database Systems I Daisy Wong. Objectives. Modeling and design of databases. Programming: queries and DB operations. Database. A large collection of data: stored in mass storage exists over a long period of time
E N D
CS610 / CS710 Database Systems I Daisy Wong
Objectives • Modeling and design of databases. • Programming: queries and DB operations
Database • A large collection of data: • stored in mass storage • exists over a long period of time • Can take on a variety of appearances depending on the requirements at the time • Can serve as the data source for a variety of applications
Database Examples • Examples: • Wal-Mart : records every item purchased in every store • L. L. Bean: records detail information of each customer and their purchases • Hospitals: record patient demographics, conditions and progress, test results, etc • . . . • This course, refer to a collection of data managed by a Database Management System (DBMS)
DBMS • A combination of software, data, and structure of the data that support: • Users to create databases and specify their schema • Users to query and modify the data • Storage of very large amount of data, secure the data from accident or unauthorized use, and allow efficient access • Control concurrent access from many users, presenting correct data to each user, and prevent accidental corruption of the data from simultaneous accesses
DBMS lingo • Schema • logical structure of the data. Use Data Definition Language (DDL) • Query • A question about the data. Use Data Manipulation Language (DML) • Transaction (or atomic transaction) • A logical unit of work that must be completed as a whole or not at all.
Three levels of database schemas External View1 External View2 External View3 External level Logical to external mappings Logical Schema Logical level Internal to Logical mappings Disk Internal schema Internal level
DBMS User Roles • End users • access the database for information to do their jobs. Casual (use simple user interfaces) vs. Sophisticated ( use DML) • Database designers • specify information content (use DDL) to create database systems • Application developers • design and develop applications that extend the functionality of the dbms. E.g. user interface, data analysis and data mining, various business services • Database administrators (DBAs) • administer databases: control access, maintain data accuracy and integrity, monitor and improve database performance
A little history • First attempt – file systems • Hierarchical model (tree based) • Network model (graph base) • Relational model • Proposed by E. F. Codd (1970) • Data should be presented to user as tables (relations) • Queries expressed in a very high-level language • SQL (Structured Query Language) – most important language based on relational model
Relational Model • A conceptual model that represents data as relations. • Relations – tables of data • Query using SQL: SELECT balance FROM Accounts WHERE accountNo = 34567; • Relational DBMS finds an efficient way to answer the query Accounts
Major components of a DBMS (Simplified. Figure 1.1) Users / applications DBA External level Transaction Commands Queries/updates DDL commands Query Manager DDL compiler Transaction Manager Logical level Buffer/File Storage Manager Data Metadata Internal level
Figure 1.1: Database management system components Ref. FCDS 2ed. by Ullman
Storage Manager • Obtains the requested information from data storage • Modifies the information if requested and re-store • Indexes are used (data structures that help us find data items quickly given a part of their value). Advanced data structures such as B-tree are used for efficient access • Indexes are part of the data, their description is part of the metadata • Consists of 2 components: file manger and buffer manager • File manager keeps track of the location of files on disk and obtains the blocks containing the requested data • Buffer manager handles main memory. It manages the memory blocks, obtaining disk blocks from the file manager, trying to optimize the access to data
Query Manager • Parse and optimize the query using a query compiler • Execute the resulting query plan (sequence of actions for the DBMS to perform) • Issues a sequence of requests to storage manager for small pieces of data • Return the result to the requester
Transaction Manager • Assure all transactions are executed properly • ACID properties of “proper” execution: • Atomicity : All of the updates of a transaction are successful, or no update take place • Consistency: Each transaction should leave the database in a consistent state • Isolation: Each transaction, when executed concurrently with other transactions, should have the same effect as if it had been executed by itself • Durability: Once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction.
Techniques to enforce ACID • Locking – granularity of locks is important. • Logging – write a log to nonvolatile storage. Assure durability. • Transaction Commitment – for durability and atomicity, transactions are computed “tentatively”, recorded, but no changes are made to the db until the transaction gets committed. Changes copied to the log, then copied to db.
Trends • Object Oriented DB • Richer data types • Classes and class hierarchy enable share or reuse of sw and schemas • Protect misuse through abstract data types • Object Relational DB • Constraints and triggers handling • Multimedia data • Data Integration to support advance data analysis such as data mining. E.g. data warehouses, data marts • Multi-tier Client-Server architecture, move more processing to the client • Parallel processing • Support Web sites
Knowledge Discovery in Databases (KDD) Interpretation / Evaluation Data Mining Transformation Preprocessing Knowledge Selection Patterns Transformed Data Preprocessed Data Data Target Data Reference: Fayyad; Smyth: "From Data Mining to Knowledge Discovery: An Overview" 1996
Ch. 2 Entity-Relationship Data Model • Data models • Entity-Relationship diagrams • Design Principles • Modeling of constraints • Weak entity sets
Data Modeling • Used for conceptual database design Object-oriented DBMS Classes / Objects ODL Ideas Relational DBMS Relational Schema E/R
Relational Model • Based on mathematically defined relations of entities • Consists of: • Attributes (fields) • Domain legitimate values of attributes (data range) • Views of data presented in table format • Create new views by projections of the database • Records are n-tuples, where n = # of attributes
Entity-Relationship (E/R) Diagrams • Represents the schematic of a database • Useful for designing the conceptual model • Entity set : classes of objects • Entity : member of an entity set (data) • Attributes : properties of the entities in an entity set • Relationship – describe how entities relate to each other
Entity-Relationship Diagrams Symbols attributes relationships entity sets Cardinality: one - one m many - one m n many - many
Entity-Relationship Diagrams (continue) • Entity • Has a set of attributes • The key is underlined • Key • Attributes which uniquely identify an entity in an entity set • An inherent property of the data (e.g. movie title and year) • Serve as a constraint SS#