550 likes | 563 Views
CSC443 Database Management. Course Introduction Professor Pepper adapted from presentations given by Professor Juliana Freire & Karl Aberer & Yan Chen & Silberschatz, Korth and Sudarshan. Today’s Goals. Course Overview Why study databases? Why use databases? Intro to Databases.
E N D
CSC443 Database Management Course Introduction Professor Pepper adapted from presentations given by Professor Juliana Freire & Karl Aberer & Yan Chen & Silberschatz, Korth and Sudarshan
Today’s Goals Course Overview • Why study databases? • Why use databases? • Intro to Databases
Major Course Objectives • Design and diagram relational databases • Create Access and Oracle databases • Use SQL commands • Be able to design a good relational database • Know how to get information out of a database to answer any question
Diagramming • Use Case • Class Diagram • Entity Relationship Diagram • Algebraic Relation Model
Tools • Panther • Unix • Oracle 9.2.0.1.0 • FTP Explorer – register for trial • MS Access
Books • Database System Concepts 5th Ed • Theory • Cross Reference for fourth ed • Oracle 9i Programming - A Primer • Practical examples • See course syllabus • Available in Library
Learning Resources • Blackboard:my.adelphi.edu • Web site Database System Concepts: • www.db-book.com/ • My office hours: • Tuesday & Thursday 12:15-1:30; Wed 12-12:30 • Alumni 114 or Science Lab • My email: pepper@adelphi.edu • My phone: 516-747-2362 • My Web:www.adelphi.edu/~pepperk
Adelphi Account Setup • Panther • Oracle • Blackboard • E-mail • Signin Sheet
Projects / Grading • Projects: 40% • Access – 15 • Oracle - 25 • Homework assignments: 20% • Midterm: 20% • Final: 20%.
Assignments • 2% dropped for anything 1 day late. • 10% dropped for anything 2 weeks late.
Delivering assignments • Email • ftp • drop box • discussion board • mailbox in math department • E-mail me if making a change in delivery place. • forward your email from Adelphi
What is a Database Management System? Database Management System = DBMS • A collection of files that store the data • A big program written by someone else that accesses and updates those files for you Relational DBMS = RDBMS • Data files are structured as relations (tables)
What is behind this Web Site? • http://www.ticketmaster.com/ • Search on a large database • Specify search conditions • Many users • Updates • Access through a web interface Central to Modern Computer Science
Database Systems: Today Field is developing quickly From Friendster.com on-line tour
Other databases you may use Databases are EVERYWHERE
Current Commercial Outlook • A major part of the software industry: • Oracle, IBM, Microsoft, Sybase • also Informix (now IBM), Teradata • smaller players: java-based dbms, devices, OO, … • Well-known benchmarks (esp. TPC) • Lots of related industries • data warehouse, document management, storage, backup, reporting, business intelligence, app integration • Relational products dominant and evolving • adapting for extensibility (user-defined types), adding native XML support. • Open Source coming on strong • MySQL, PostgreSQL, BerkeleyDB
? Why Study Databases?? • Need exploded • Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc. • Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network
Why study databases? • Data is valuable: • bank account records, tax records, student records… • Protect It! - no matter what • Hurricane • Flood • Human error
Why study databases?Data often structured: • Example: Bank account records all follow the same structure • We can exploit this regular structure • To retrieve data in useful ways (that is, we can use a query language) • To store data efficiently
Why Study Databases Summary • Central to modern computer science • Databases are everywhere • Commercially successful • Fast moving technology • Plethora of structured data that business and people need
What is a database? • Whiteboard Exercise
Database Definition • Database • – a very large, integrated collection of data. (the stuff) • Models a real-world enterprise • Entities (e.g., teams, games) • Relationships (e.g., The Forty-Niners are playing in The Superbowl) • Database Management System • – software that stores and manages databases (the tools)
Database is better than simple file system because: • Data redundancy, inconsistency and isolation • Difficult to access • Integrity problems • Atomicity of updates (change one file and die before the other completes) • Multiple user issues
So a Database Has: • representing information • data modeling • languages and systems for querying data • complex queries with real semantics* • over massive data sets • concurrency control for data manipulation • controlling concurrent access • ensuring transactional semantics • reliable data storage • maintain data semantics even if you pull the plug • * semantics: the meaning or relationship of meanings of a sign or set of signs
Why Use a Database • Why use a database presentation
Describing Data: Data Models • A data modelis a collection of concepts for describing data. • Aschemais a description of a particular collection of data, using a given data model. • A relation is the data stored in a certain schema • The relational model of datais the most widely used model today. • Entities and relations among them • Integrity constraints and business rules • Perspective dependent (warehouse & sales view item differently)
Database Design The process of designing the general structure of the database: • Logical Design – Deciding on the database schema. • Business decision – What attributes • Computer Science decision – What relation schemas • Physical Design – Deciding on the physical layout of the database
Data Models • A collection of tools for describing • Data • Data relationships • Data semantics • Data constraints • Relational model • Entity-Relationship data model (mainly for database design) • Object-based data models (Object-oriented and Object-relational) • Semistructured data model (XML) • Other older models: • Network model • Hierarchical model
The Entity-Relationship Model • Models an enterprise as a collection of entities and relationships • Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects • Described by a set of attributes • Relationship: an association among several entities • Represented diagrammatically by an entity-relationship diagram:
Relational Model • ER for concept map to Algebraic Relational Model • Relations (tables of possible data) • Instance (actual data at a given time) • Schema (description of those tables, their relations)
Relational Model Look • Notation: p(r) • p is called the selection predicate • Defined as:p(r) = {t | t rand p(t)} Where p is a formula in propositional calculus consisting of termsconnected by : (and), (or), (not)Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, , >, . <. • Example of selection:branch_name=“Perryridge”(account)
Object-Relational Data Models • Extend the relational data model by including object orientation and constructs to deal with added data types. • Allow attributes of tuples to have complex types, including non-atomic values such as nested relations. • Preserve relational foundations, in particular the declarative access to data, while extending modeling power. • Provide upward compatibility with existing relational languages.
Design Goals • Design Goals: • Avoid redundant data • Ensure that relationships among attributes represented • Ensure constraints are properly modeled: updates • check for violation of database integrity constraints.
Queries • What the programmer sees
Some Basic SQL Commands • Select – Get rows of data • * - everything • From – the name of the table (relation) will follow • Where – Only get the stuff that matches • Example: Select * from movies where theater = Loews • Exercise – • Write down the query to select all of your friends that live in NY State
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Example: University Database • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa:real) • Courses(cid: string, cname:string, credits:integer) • Enrolled(sid:string, cid:string, grade:string) • External Schema (View): • Course_info(cid:string,enrollment: integer) • Physical schema: • Relations stored as unordered files. • Index on first column of Students. • Key to good performance
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Data Independence (levels of abstraction) • Applications insulated from how data is structured and stored. • Logical data independence: Protection from changes in logical structure of data – stablize views. • Physical data independence: Protection from changes in physical structure of data. • Q: Why are these particularly important for DBMS?
Queries • Change and get data from a database • Run over data model • Easy & efficient • Not good for complex calculations • DML and DDL
Data Manipulation Language (DML) • Language for accessing and manipulating the data organized by the appropriate data model • DML also known as query language • Two classes of languages • Procedural – user specifies what data is required and how to get those data • Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data • SQL is the most widely used query language
Data Definition Language (DDL) • Specification notation for defining the database schema Example: create tableaccount (account-numberchar(10),balanceinteger) • DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata (i.e., data about data) • Database schema • Data storage and definition language • Specifies the storage structure and access methods used • Integrity constraints • Domain constraints • Referential integrity (references constraint in SQL) • Assertions • Authorization
Count distinct Join Proj Join Emp Asgn Having Group(agg) Select Emp Emp Queries - What does it look like? SELECT COUNT DISTINCT (E.eid) FROM Emp E, Proj P, Asgn A WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc SELECT eid, ename, title FROM Emp E WHERE E.sal > $50K SELECT E.loc, AVG(E.sal) FROM Emp E GROUP BY E.loc HAVING Count(*) > 5 • System handles query plan generation & optimization; ensures correct execution. Employees Projects Assignments • Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …
SQL • SQL: widely used non-procedural language • Example: Find the name of the customer with customer-id 192-83-7465select customer.customer_namefrom customerwherecustomer.customer_id = ‘192-83-7465’ • Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465selectaccount.balancefromdepositor, accountwheredepositor.customer_id = ‘192-83-7465’ anddepositor.account_number = account.account_number • Application programs generally access databases through one of • Language extensions to allow embedded SQL • Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database • For us: Oracle and Access SQL languages
Concurrency Control • Concurrent execution of user programs: key to good DBMS performance. • Disk accesses frequent, pretty slow • Keep the CPU working on several programs concurrently. • Interleaving actions of different programs: trouble! • e.g., account-transfer & print statement at same time • DBMS ensures such problems don’t arise. • Users/programmers can pretend they are using a single-user system. (called “Isolation”) • Thank goodness! Don’t have to program “very, very carefully”.