380 likes | 559 Views
CS 5 40 : Database Management Systems. 1: Introduction. Welcome to CS540 !. Arash Termehchy Assistant professor in the school of EECS Just moved here from Illinois Usable data exploration systems . Your turn: Name, field, DB background. Data management.
E N D
CS 540: Database Management Systems 1: Introduction
Welcome to CS540! • Arash Termehchy • Assistant professor in the school of EECS • Just moved here from Illinois • Usable data exploration systems. • Your turn: • Name, field, DB background
Data management • Modeling a large number of entities and relationships. • Called structured data • Formal (logical) model • Maintaining them on computational devices • Servers in the cloud, sensor networks, … • Keep them organizedaccording to model • Cope with failures • …
Data management • Exploring entities and relationships efficiently, easily, and effectively • Where are the more affordable apartments in Portland? • Who is the most similar person to Alan? • How a virus will likely to spread in a population? • Make an informed and effective decision
Why study data management? • Data is everywhere: • Business: financial analytics, … • Social: social network, data sharing, … • Personal: map apps, … • Science: spread of diseases, …
Data management is valuable • According to McKinsey & Company’s: • $300 billion potential annual value to US health care • €250 billion potential annual value to Europe’s public sector • 60% potential increase in retailer’s operating margins • Data scienceis transforming the way we make decisions, make scientific discovery, … • Analyzing genetic data to find cures for diseases.
Data management is challenging • According to McKinsey & Company’: • 30 billion data items shared on Facebook every month • 235 TB collected by the Library of Congress • 40% growth in the global data each year • 90% of world’s data was generated in the last two year! • Big data: huge, heterogeneous, evolving
We study these challenges • How to get what we like from the data easily, effectively, and efficiently?
Why should we learn these subjects? • Isn’t sufficient to know SQL? • Let companies that make database management systems to worry about these issues. • No! You will end up with: • A query that takes hundreds of hours to finish! • A database that contains negative salaries!
Why should we learn these subjects? • Managing conventional data requires more: • Tuning databases, developing efficient data exploration programs, … • You may face unconventional data management scenarios • The data may be a big graph that is constantly evolving. • You may use data management ideas in your own research.
Prerequisites • Good programming skills • CS 261 and CS 275 or equivalent • Contact instructor if you are not sure.
Readings • Required: • Database Systems: The Complete Book, Hector Garcia Molina, Jeffry Ullman, and Jennifer Widom • Foundations of Databases, Serge Abiteboul, Richard Hull, and Victor Vianu • Notes on the course website for subjects not covered by the textbook.
Readings • Recommended: • Database Management Systems, Ragu Ramakrishnan and Johannes Gehrke • Other useful readings on the course website.
Grading Scheme • Assignments 40% • Project 60%
Assignments • Written assignments • To understand the main concepts and methods. • Should be done individually. • Start soon!
Project • A data centric application • Some research elements or heavy engineering effort. • Different from CS 275 or CS 440 • Group 1 – 4 • A list of possible projects are in the web site. • Project definition is due in the third week of the class!
Project • A list of possible projects are in the web site. • Project definition is due in the third week of the class! • 5% of total grade
Basic Concepts • Database management system (DBMS): • A piece of software that simplifies and facilitates data management and exploration. • Database content • Data • Schema: information about data, meaning of the data Schema Data Salary: 10 Age: 10
Physical Data Independence • Independence from physical details • File system, operating system, hardware, .. • Data models • The way that we see real-world data. • Relational data model: everything is a relation. • Declarative query language: SQL • Say what, not how
Relational Database Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Modeling data and asking questions: Relational Model & Languages Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Exploring more than one database: Data Integration Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Asking vague queries: Approximate queries Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Storing data in files: Storage Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Finding data in a big file really fast: Data access methods Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Translating complex queries to read & write: Query execution & optimization Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Coping with failure: Transaction Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Topics Tuning Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Relational Database Management Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Conceptual Design • High level data model • Describe information in the database without worrying about implementation issues • ER model is the most popular tool for conceptual design • Invented by Peter Chen in 1976 • Provides an easy-to-use language: pictures • We review the basic stuff
ER Model/ Diagram title category name price sells Publisher Book address buys employs Person name ssn address
ER Model • Entity Set • An entity is distinctive real world object: cs540 textbook • An entity set is a collection of entities • Attribute • Belongs to an entity • Does not contain any other attribute: atomic • Atomic data types: string, integer, real, … Book Publisher title category price Book
Relationship • Describe relationships between entity sets • Do not exists without entities • May have attributes employs Person Publisher startdate employs Person Publisher
Relationship Multiplicity • One to one: • publisher - manager • Many to one • book – publisher • Many to many • publisher – person
Book Purchase Store Person Multi-way Relationships • Relationships between more than two entity sets • Each entity set has a different role in the relationship seller buyer
ER Model: Keys • Attribute(s) that uniquely identify entities • No standard way to annotate: usually underlined. • Each entity set must have a key • Why? • Relationships may also have keys Person name ssn address