290 likes | 299 Views
This lecture provides an overview of database management systems and covers the entity-relationship data model. It explains what a database system is, the benefits of using a DBMS over file systems, and the key concepts of data modeling and schema. The lecture also introduces the relational data model and its main concepts, such as relations and schema.
E N D
CS3431 –Database Systems I Introduction Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu CS3431-B11
Today’s Lecture • Overview on Database Management Systems • Course Logistics • Entity-Relationship Data Model
What is a Database System? • Software platform for managing large amounts of data • Managing means: • Storing, querying, indexing, and structuring the data • Different names refer to the same thing: • Database systems • Database management systems • DBMS
What is a Database System? (Cont’d) • DBMS contains information about a particular enterprise (application) • Collection of interrelated data • Set of programs to access the data • An environment that is both convenientand efficientto use • Usually data is too large to fit in computer memory at once • Usually many users want to access this data and do so fast • Databases touch all aspects of our lives. We use it without knowing !!!
? Database Applications Have you ever used a database application? • E-commerce: books, equipment etc. at Amazon • Banks -- your valuable $$ and ATM transactions • Airlines – manage flights to get you places • Universities – manage student enrollment • GIS (Maps) – find restaurants closest to WPI • WWW (World Wide Web) – blobs, wikis, etc. • Bio-informatics (genome data) Data is everywhere. To efficiently manage it, we need DBMS
Why use DBMS, and not files? Several drawbacks of using file systems to store data • Data redundancy and inconsistency • Multiple file formats, duplication of information in different files • Multiple records formats within the same file • No order enforced between fields • Difficulty in accessing data • Need to write a new program to carry out each new task • No indexes, always scan the entire file • Integrity problems • Modify one file (or field in a file), and not changing the dependent fields or files • Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
Why use DBMS, and not files? (Cont’d) • Concurrent access by multiple users • Many users need to access/update the data at the same time (concurrent access) • Uncontrolled concurrent access can lead to inconsistencies • Example: Two people are updating the same bank account at the same time • Security problems • Hard to provide user access to some, but not all, data • Recovery from crashes • While updating the data the system crashes • Maintenance problems • Hard to search for or update a field • Hard to add new fields
DBMS Provides Solutions • Data consistency even with multiple users • Efficient access to the data • Data integrity embedded in the DBMS • Recovery from crashes, security
Basic Terminology • Data Model • Tools used for describing the data • Data Schema • Describes structures for a particular application, using the given model • Database • Collection of actual data that conforms to given schema • Database Management System (DBMS) • Software that allows us to create, stores, use, and maintain a database (conforming to given model) • Data Manipulation Language (DML) • Language to manipulate, e.g., update or query, the data
Data Model • A collection of tools for describing • Data • Data relationships • Data semantics • Data constraints • Several data models: • Relational model • Entity-Relationship (ERD) data model (mainly for database design) • Object-based data models (Object-oriented and Object-relational • Semi-structured data model (XML) • Other older models: • Network model • Hierarchical model We will learn these two models
Data Schema • Captures the relationships between objects (“entities”) in the application • Schemas can be represented graphically or textual
Data Manipulation Language (DML) • Language for accessing and manipulating the data organized by the appropriate data model • DML also known as Query Language • Two classes of languages • Procedural – user specifies what data is required and how to get those data • Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data • DBMSs use declarative language • SQL (Structured Query Language ) is the most widely used declarative query language in DBMSs We will learn SQL
A Big Picture of What You will Learn • Data Model • Relational Model • Entity-Relationship (ERD) • Data Schema • How to put pieces together to build a schema describing the application • Database • Build an actual database and manipulate data • Database Management System (DBMS) • We will use Oracle • Data Manipulation Language (DML) • SQL Language
Relational Data Model: Overview • The most widely used model today • It is a tabular representation of the data • Main concepts: • Relations (Tables),basically a table with rows and columns. • Every relation has a schema, which describes the columns, or fields. Field or attribute
Example Database : Relational Tabular View of Data in Airline System Flight Passenger Travel • Tabular view of data is called Relational Model
Entity-Relationship Model: Overview • Models the application as a collection of entities and relationships • Represented using Entity-Relationship Diagram (ERD) • Main Concepts: • Entities: each entity describe an “object” or “thing” that has some characteristics • Each entity has attributesassociated with it • There are relationshipsbetween the entities Attributes Relationship Entities
SQL: Overview • SQL: Widely used non-procedural language to access the data inside a database • External programs, e.g., in C or Java, typically access the database using: • Language extensions to allow embedded SQL • Interfaces, e.g., ODBC or JDBC, that allow connection to a database and issuing SQL commands
Logical vs. Physical How this information is stored???
Levels of Abstraction • View Level --describes how users see the data • Logical Level – describes the logical structures used • Relational Model • ERD model • Physical Level -- describes files and indexes
Levels of Abstraction: Example • Logical (Conceptual) Level • Flight, Passenger, Travel tables • Physical Level • Flight table stored as a sorted file • Index on flightNo attribute for Flight relation • View Level (External Schema) • NoOfPassengers (flightNo, date, numPassengers) • Hide employees salary
Physical Data Independence • DBMS has the three levels of abstractions • Ability to modify the physical schema without modifying the logical or view schema • Logical data independence: • Logical schema can change, but views need not change • Protection from changes in logical structure of data • Physical data independence: • Physical schema such as indexes can change, but logical schema need not change • Protection from changes in physical structure of data
Other Aspects of DBMSs • Efficient access • Query optimization • Concurrency control • Recovery control >> We will not have time to study these subjects during the course >> It is important to know their existence and what is meant by each component
Efficient Access • Indexing : • Indexes gives direct access to “necessary” portion of data, as opposed to sequential access in files Directly find this customer without scanning all customers
Query Optimization • Costing: • Estimate expected execution times • Query optimization : • Generates many alternatives to answer a query • Estimates the cost of each alternative • Automatically determine and prepare optimal (or near optimal) access plans for getting the data Optimizer = “The Bread and Butter of a DBMS !”
Concurrency Control • DBMS ensures data is consistent under concurrent access • E.g.: multiple airline staff trying to reserve a seat for different customers • Concepts: • Transactions – grouping multiple instructions (reads/writes) into one atomic unit • Locks – locking of resources (tables)
Recovery Control • If system crashes in middle of transaction, recovery must be provided : • Cannot afford to loose data or leave it inconsistent • Concepts: • Logging of transactions’ actions • Ability to redo or undo transactions
Who uses databases? • End users • DB application programmers • Database Administrators • Database design • Security, Authorization • Data availability, crash recovery • Database tuning (for performance)
Summary : Why study DBMS? • Need to process large amounts of data efficiently • Video, WWW, computer games, geographic information systems (GIS), genome data, digital libraries, etc. • Make use of all functionalities provided by DBMSs • DB administrators and programmers hold rewarding jobs • DBMS research is one of the most exciting areas in Computer Science !!