460 likes | 654 Views
CS 4 40 : Database Management Systems. 1: Introduction. Welcome to CS440 !. Arash Termehchy Assistant professor in the school of EECS Just moved here from Illinois Usable data exploration systems. Your turn: Name, field, DB background. Data management.
E N D
CS 440: Database Management Systems 1: Introduction
Welcome to CS440! • Arash Termehchy • Assistant professor in the school of EECS • Just moved here from Illinois • Usable data exploration systems. • Your turn: • Name, field, DB background
Data management • Modeling a large number of entities and relationships. • Called structured data • Formal (logical) model • Maintaining them on computational devices • Servers in the cloud, sensor networks, … • Keep them organizedaccording to model • Cope with failures • …
Data management • Exploring entities and relationships efficiently, easily, and effectively • Where are the more affordable apartments in Portland? • Who is the most similar person to Alan? • How a virus will likely to spread in a population? • Make an informed and effective decision
Why study data management? • Data is everywhere: • Business: financial analytics, … • Social: social network, data sharing, … • Personal: map apps, … • Science: spread of diseases, …
Data management is valuable • According to McKinsey & Company’s: • $300 billion potential annual value to US health care • €250 billion potential annual value to Europe’s public sector • 60% potential increase in retailer’s operating margins • Data scienceis transforming the way we make decisions, make scientific discovery, … • Analyzing genetic data to find cures for diseases.
Data management is challenging • According to McKinsey & Company’: • 30 billion data items shared on Facebook every month • 235 TB collected by the Library of Congress • 40% growth in the global data each year • 90% of world’s data was generated in the last two year! • Big data: huge, heterogeneous, evolving
We study these challenges • How to get what we like from the data easily, effectively, and efficiently?
Why should we learn these subjects? • Isn’t sufficient to know SQL? • Let companies that make database management systems to worry about these issues. • No! You will end up with: • A query that takes hundreds of hours to finish! • A database that contains negative salaries!
Why should we learn these subjects? • Managing conventional data requires more: • Tuning databases, developing efficient data exploration programs, … • You may face unconventional data management scenarios • The data may be a big graph that is constantly evolving. • You may use data management ideas in your own work.
Prerequisites • Good programming skills • CS 261 and CS 275 or equivalent • Contact instructor if you are not sure.
Readings • Required: • Database Systems: The Complete Book, Hector Garcia Molina, Jeffry Ullman, and Jennifer Widom • Notes on the course website for subjects not covered by the textbook.
Readings • Recommended: • Database Management Systems, Ragu Ramakrishnan and Johannes Gehrke • Foundations of Databases, Serge Abiteboul, Richard Hull, and Victor Vianu • Other useful readings on the course website.
Grading Scheme • Assignments 40% • Project 60%
Assignments • Written assignments • To understand the main concepts and methods. • Should be done individually. • Start soon!
Project • A database centric application • Data Engineering effort. • Advanced feature • Different from CS 275 • Easier search (keyword search) • Nice visualization • …
Project • Group 2 – 4 • Practice how to work in groups • Project definition is due in the third week of the class! • The data, application, and scope • 5% of total grade
Basic Concepts • Database management system (DBMS): • A piece of software that simplifies and facilitates data management and exploration. • Database content • Data • Schema: information about data, meaning of the data Schema Data Salary: 10 Age: 10
Physical Data Independence • Independence from physical details • File system, operating system, hardware, .. • Data models • The way that we see real-world data. • Relational data model: everything is a relation. • Declarative query language: SQL • Say what, not how
Relational Database Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Modeling data and asking questions: Relational Model & Languages Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Organizing the data: Database design Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Keeping the data clean and meaningful: Integrity constraints Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Doing more than asking queries: Stored procedures, ORM Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Storing data in files: Storage Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Finding data in a big file really fast: Data access methods Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Translating complex queries to read & write: Query execution & optimization Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Coping with failure: Transaction Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Tuning Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Relational Database Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Conceptual Design • High level data model • Describe information in the database without worrying about implementation issues • ER model is the most popular tool for conceptual design • Invented by Peter Chen in 1976 • Provides an easy-to-use language: pictures • We review the basic stuff
ER Model/ Diagram title category name price sells Publisher Book address buys employs Person name ssn address
ER Model • Entity Set • An entity is distinctive real world object: cs540 textbook • An entity set is a collection of entities • Attribute • Belongs to an entity • Does not contain any other attribute: atomic • Atomic data types: string, integer, real, … Book Publisher title category price Book
Relationship • Describe relationships between entity sets • Do not exists without entities • May have attributes employs Person Publisher startdate employs Person Publisher
Relationship Multiplicity • One to one: • publisher - manager • Many to one • book – publisher • Many to many • publisher – person
Book Purchase Store Person Multi-way Relationships • Relationships between more than two entity sets • Each entity set has a different role in the relationship seller buyer
ER Model: Keys • Attribute(s) that uniquely identify entities • No standard way to annotate: usually underlined. • Each entity set must have a key • Why? • Relationships may also have keys Person name ssn address
Topics Modeling data and asking questions: Relational Model & Languages Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Relational Model • Relational model defines data organization and data retrieval/manipulation operations • It is easier to implement than ER model • It captures more details about the data
An Example Relation name Attribute names Book: Title Price Category Year MySQL $102.1 computer 2001 Cell biology $201.69 biology 1954 French cinema $53.99 art 2002 NBA History $63.65 sport 2010 tuples
Relational Model • Attributes • Atomic values • atomic types: string, integer, real, date, … • Each relation must have keys • Attributes without duplicate values • A relation does not contain duplicate tuples. • Reordering tuples does not change the relation. • Reordering attributes does not change the relation.
Database Schema vs. Database Instance • Schema of a Relation • Names of the relation and their attributes. • E.g.: Person (Name, Address, SSN) • Types of the attributes • Constraints on the values of the attributes • Schema of the database • Set of relation schemata • E.g.: Person (Name, Address, SSN) Employment(Company, SSN)
Database Schema vs. Database Instance • Schema: Book(Title, Price, Category, Year) • Instance: Title Price Category Year MySQL $102.1 computer 2001 Cell biology $201.69 biology 1954 French cinema $53.99 art 2002 NBA History $63.65 sport 2010
Example Schema Beers(name, manf) Bars(name, addr, license) Drinkers( name,addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar)