580 likes | 595 Views
Learn the history, fundamentals, and applications of Database Management Systems (DBMS). Understand the importance of studying databases, their evolution over the years, and the benefits of using a DBMS compared to traditional file systems.
E N D
DatabaseManagement System Introduction
Warning • This class is a lot of work. • But it is worth it. • Of all courses you take at CS, this may be the one that gets you a job.
Syllabus • The background and history of database management systems. • The fundamentals of using a database management systems. • Relational model. • Queries and Updates. • Relational Algebra. • Normalization • Transactions and Security. • Object-oriented, object-relational, semi-structured and XML database systems.
What Is a Database System? • Database: a very large, integrated collection of data. • Models a real-world enterprise • Entities (e.g., teams, games) • Relationships (e.g., Abo Teraka is playing in Al Ahly) • More recently, also includes active components , often called “business logic”. (e.g., the BCS ranking system) • A Database Management System (DBMS)is a software system designed to store, manage, and facilitate access to databases.
? Why Study Databases?? • Shift from computation to information • always true for corporate computing • Web made this point for personal computing • more and more true for scientific computing • Need for DBMS has exploded in the last years • Corporate: retail swipe/clickstreams, “customer relationship”, “supply chain”, “data warehouses”, etc. • Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network • DBMS encompasses much of CS in a practical discipline • OS, languages, theory, AI, multimedia, logic • Yet traditional focus on real-world apps
Database Applications • These examples are what we called traditional database applications (First part of book focuses on traditional applications) • More Recent Applications: • Youtube • iTunes • Geographic Information Systems (GIS) • Data Warehouses • Many other applications
History of Database Systems • 1950’s and early 1960’s: • Data processing using magnetic tapes for storage • Tapes provide only sequential access • Punched cards for input • Late 1960’s and 1970’s: • Hard disks allow direct access to data • Network and hierarchical data models in widespread use • Ted Codd defines the relational data model • Would win the ACM Turing Award for this work • IBM Research begins System R prototype • UC Berkeley begins Ingres prototype • High-performance (for the era) transaction processing
History (cont.) • 1980s: • Research relational prototypes evolve into commercial systems • SQL becomes industry standard • Parallel and distributed database systems • Object-oriented database systems • 1990s: • Large decision support and data-mining applications • Large multi-terabyte data warehouses • Emergence of Web commerce • 2000s: • XML and XQuery standards • Automated database administration • Increasing use of highly parallel database systems • Web-scale distributed data storage systems
Q: How do you write programs over a subsystem when it promises you only “???” ? A: Very, very carefully!! = Is a File System a DBMS? • Thought Experiment 1: • You and your project partner are editing the same file. • You both save it at the same time. • Whose changes survive? A) Yours B) Partner’s C) Both D) Neither E) ??? • Thought Experiment 2: • You’re updating a file. • The power goes out. • Which of your changes survive? A) All B) None C) All Since Last Save D) ???
Can we do it without a DBMS ? Sure we can! Start by storing the data in files: students.txt courses.txt professors.txt Now write C or Java programs to implement specific tasks
Doing it without a DBMS... Write a C program to do the following: • Enroll “Mary Johnson” in “CSE444”: • Read ‘students.txt’ • Read ‘courses.txt’ • Find&update the record “Ahmed Hassan” • Find&update the record “CS444” • Write “students.txt” • Write “courses.txt”
Enters a DMBS “Two tier database system” Database server(someone else’sC program) Applications Data files
Problems without a DBMS... Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” CRASH ! • System crashes: • What is the problem ? • Large data sets (say 50GB) • What is the problem ? • Simultaneous access by many users • Need locks: we know them from OS, but now data on disk; and is there any fun to re-implement them ?
Why Use a DBMS? Access by a collection of ad hoc programs in C++, Java, PHP, etc. • Without a DBMS, we'd have: users of the data There is no control or coordination of what these programs dowith the data data stored as bits on disks organized as files
Why Use a DBMS? applications • With a DBMS, we have: DBMS users of the data DBMS provides controland coordination to protect the data. data stored as bits on disks organized as files
Database definition • Database is “data” or facts supplied by a base or software • Files contain data with the same structure • Database is an integration of different kinds of data
Database Systems • The big commercial database vendors: • Oracle • IBM (with DB2) bought Informix recently • Microsoft (SQL Server) • Sybase • Some free database systems (Unix) : • Postgres • Mysql • Predator
DBMS Functions • Define the database • Construct the database • Manipulating database • Data security and integrity • Concurrency • Recovery
Disadvantages of database • Expensive • Incompatible with any other DBMS
Concurrency • A DBMS supports access by concurrent users • concurrent = happening at the same time • concurrent access, particularly writes (data changes), can result in inconsistent states (even when the individual operations are correct) • the DBMS can check the actual operations of concurrent users, to prevent activity that will lead to inconsistent states
Access Control • A DBMS can restrict access to authorized users • security policies often require control that is more fine-grained than that provided by a file system • since the DBMS understands the data structure, it can enforce fairly sophisticated and detailed security policies • on subsets of the data • on subsets of the available operations
Redundancy Control • A DBMS can assist in controlling redundancy • redundancy = multiple copies of the same data • with file storage, it's often convenient to store multiple copies of the same data, so that it's "local" to other data and applications • this can cause many problems: • wasted disk space • inconsistencies • need to enter the data multiple times
Backup and Recovery • A DBMS can provide backup and recovery • backup = snapshots of the data particular times • recovery = restoring the data to a consistent state after a system crash • the higher level semantics (relationships and constraints) can make it difficult to restore a consistent state • transaction analysis can allow a DBMS to reconstruct a consistent state from a number of backups
Views and Interfaces • A DBMS can support multiple user interfaces and user views • since the DBMS provides a well-defined data model and a persistent data dictionary, many different interfaces can be developed to access the same data • data independence ensures that these UIs will not be made invalid by most changes to the data • new user views can be supported as new schemas defined against the conceptual schema
DBMS =============== Design tools Table Creation Form Creation Query Creation Report Creation Procedural language compiler (4GL) ============= Run time Form processor Query processor Report Writer Language Run time Database Application Programs Database contains: User’s Data Metadata Indexes Application Metadata User Interface Applications Database Components
Actors on DBMS • Database Administrator • System analysis • Database designer • Application programmer • End user
Actors on the Scene • Database Administrators • acquiring a DBMS • managing the system • acquiring HW and SW to support the DBMS • authorizing access (security policies) • managing staff, including DB designers
Actors on the Scene • Database Designers • identifying the information of interested in the Universe of Discourse (UoD) • designing the database conceptual schema • designing views for particular users • designing the physical data layout and logical schema • adjusting data parameters for performance
Actors on the Scene • Systems Analysts and Application Programmers (generic database developers) • provide specialized knowledge to optimize database usage • provide generic (canned) application programs
Actors on the Scene • End Users • casual users: ad-hoc queries • naïve or parametric users: canned queries such as menus for a phone company customer service agent • sophisticated users: people who understand the system and the data and use it in many novel ways • standalone users: people who use personal easy-to-use databases for personal data
Three-Schema Architecture user-specific views External View External View External View generic view Conceptual Schema Internal Schema physical view
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Users Levels of Abstraction • Views describe how users see the data. • Conceptual schema defines logical structure • Physical schema describes the files and indexes used. • (sometimes called the ANSI/SPARC model)
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Example: University Database • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa:real) • Courses(cid: string, cname:string, credits:integer) • Enrolled(sid:string, cid:string, grade:string) • External Schema (View): • Course_info(cid:string,enrollment:integer) • Physical schema: • Relations stored as unordered files. • Index on first column of Students.
Levels of Abstraction • Physical level: describes how a record (e.g., customer) is stored. • Logical level: describes data stored in database, and the relationships among the data. typecustomer = record customer_id : string; customer_name : string;customer_street : string;customer_city : string; end; • View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.
Conceptual Data Models • A data model describes the possible schemas (essentially the meta-schema) • A DBMS is designed around a particular data model • this is what allows all system components (and humans) to understand the schema and data • possible data models • relational, object-oriented, object-relational, entity-relationship, semantic, network, hierarchical, etc.
Physical Data Models • A physical data model describes the way in which data is stored in the computer • typically only of interest to database designers, implementers and maintainers …not end users • must provide a well-defined structure that can be mapped to the conceptual schema • allows optimization strategies to be defined generically
Instances and Schemas • Similar to types and variables in programming languages • Schema – the logical structure of the database • Example: The database consists of information about a set of customers and accounts and the relationship between them) • Physical schema: database design at the physical level • Logical schema: database design at the logical level • Instance – the actual content of the database at a particular point in time
Classification • DBMS has 3 criteria as • Data models (relational & object &….) • Number of users (single user & Multi-user) • Number of sites (Centralized & Distributed)
Data model Is a technique for organization data and concepts to describe the structure of data, relationship and integrity constrains.
Database models • Relational data model Oracle, Access • Hierarchical data mode (as a tree) IMS DBMS • Network data model (as a graph) IDMS DBMS • Object oriented model VERSANT DBMS • Object relational data model UNISQL DBMS
Books (id, title) Authors (first, last) Publisher Subjects Data Models • Hierarchical Model (1960’s and 1970’s) • Similar to data structures in programming languages.
Authors Subjects Books Publishers Data Models • Network Model (1970’s) • Provides for single entries of data and navigational “links” through chains of data.
Books (id, title) Authors (first, last) Publisher Subjects Data Models • Object Oriented Data Model (1990’s) • Encapsulates data and operations as “Objects”
Relational Model Attributes • Example of tabular data in the relational model
Relational data model • Based on the relations between data • Each relation or table (entity) is a data structure or a collection of attributes describing data • Attribute or a field is a column in the table • A tuple or record is a raw in the table
Relational data model • Null value is assigned to attribute which means that the attribute is not yet known • Primary key is a unique identifier for the table. One attribute or combination of attributes
Relational data model • Foreign key is an attribute (combination of attributes) is one relation whose values are required to match those of the primary of some relation • Candidate key is any key (primary or foreign keys)