Database Systems CS 311. Lecture #1 (with some slides integrated from those of Jiawei Han, Kevin Chang, Alon Halevy, and Dan Suciu.). Self-Introduction. AnHai Doan database and information system group (DAIS) Research interests databases, data mining, web mining, artificial intelligence
Database Systems CS 311 Lecture #1 (with some slides integrated from those of Jiawei Han, Kevin Chang, Alon Halevy, and Dan Suciu.)
Self-Introduction • AnHai Doan • database and information system group (DAIS) • Research interests • databases, data mining, web mining, artificial intelligence • Hobbies • mountain climbing, downhill skiing, sailing • Education history • Vietnam => Hungary => Wisconsin => Seattle => UIUC
Course Goals & Content • First course on database systems and data management at UIUC • cover mostly relational databases • how to design and create such databases • how to use them (via SQL query language) • how to implement them (only briefly) • will touch on some advanced issues • XML data models, semi-structured data • data integration • you may also try a simple research component • more on this later
Prerequisite • Must have data structure and algorithm background • CS 225 or 300 equivalent • Good at C++ or Java • project will require a lot of programming • need C++ or Java to do a good job at talking with databases • you or your project group picks the language • Knowing only C will require more work • more difficult to talk in C to databases
Textbook • Required: Database Systems: The Complete Book, by Garcia-Molina, Ullman and Widom, 2002 • Comments on the textbook. • Do you have problems getting your textbook? • Books on reserve here at the Grainger Library • "Database Management Systems" by Ramakrishnan and Gehrke • "Database System Concepts" by Silberschatz, Korth, and Sudarsan
Course Format • For all students • two 75-min lectures / week • 4-6 homeworks • project • a midterm and a final exam • Graduate students do an extra project • survey papers on a research topic, write a 10-15 page report • I will talk with you in detail later in the course
Lectures • Lecture slides in ppt format will be posted shortly before or after the lecture • are to complement the lectures • Many issues discussed in the lectures will be covered in the exams and homeworks • hence try to attend lectures regularly
Homeworks • Some paper-based, some may involve light programming • Will be collected at the beginning of class on the due date, or at my secretary's office • to be decided later • No late homework will be accepted
Project • Select an application that needs a database • Build a database application from start to finish • Significant amount of programming • Will be done in stages • you will submit some work at the end of each stage • Will show a demo at semester end
Project Groups • Project will be done in groups of 3-4 students • a lot of work, difficult to design so that one person can do all • learn how to work in a group: valuable skills • groups are like broccoli, they are good for you • Try to form groups as soon as possible • can start by posting requests on the class newsgroup • There will be a deadline later for forming groups • If you have not formed groups by then • we will help assign you to groups
More on Grouping • All group members receive the same grade • If someone drops out, the rest pick up the work
Exams • Midterm & final • will be announced shortly • check final date and make sure no conflict! • There will be some brief review before each exam • If you have conflicts • do let us know in advance, see course homepage for more information
Tentative Grading Breakdown • Homework: 25% • Project: 30% • Midterm: 20% • Final: 25% • Will attempt to grade on an absolute scale as much as possible • not on a curve
Staff & Office Hours • Instructor: AnHai Doan • Room 2118 Siebel, anhai@cs.uiuc.edu • Office hours: Tue & Thu 10:45-11:45 (after lecture) • TAs: • Michael Makstman, 1271 DCL, cs311ta1@cs.uiuc.edu, 217-244-8522, office hours: TBD • Rishi Sinha, 1271 DCL, cs311ta2@cs.uiuc.edu, 217-244-8522, office hours: TBD • They are not here yet
Communications • www-courses.cs.uiuc.edu/~cs311 • newsgroup: class.cs311 • vitally important! • make sure to check it daily for new announcements • If you have a question/problem • talk to people in your group first • post your question on newsgroup • email TA • go to office hours to talk to TA or instructor • Office hours are held on ALL WEEKDAYS • so don't be shy
Newsgroup • class.cs311 • designed for you and your peers • to communicate and help one another • please do not post solutions to the newsgroup • TAs will monitor and try their best to help with your questions • There can be many questions • it is usually difficult to answer all of them or answer in a timely manner • hence you should come to office hours or email the TA
A Motivating Example • Suppose we are building a system to store the information about: • students • courses • professors • who takes what, who teaches what
Application Requirements • store the data for a long period of time • large amounts (100s of GB) • protect against crashes • protect against unauthorized use • allow users to query/update: • who teaches “CS 173” • enroll “Mary” in “CS 311”
allow several (100s, 1000s) users to access the data simultaneously • allow administrators to change the schema • add information about TAs
Trying Without a DBMS • Why Direct Implementation Won’t Work: • Storing data: file system is limited • size less than 4GB (on 32-bit machines) • when the system crashes we may lose data • password-based authorization insufficient • Query/update: • need to write a new C++/Java program for every new query • need to worry about performance
Concurrency: limited protection • need to worry about interfering with other users • need to offer different views to different users (e.g. registrar, students, professors) • Schema change: • entails changing file formats • need to rewrite virtually all applications • Better let a database system handle it
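To make the "new program for every new query" point concrete, here is a minimal sketch of the direct, file-based approach. The record layout (`professor|course`), the sample names, and the function `who_teaches` are all hypothetical, invented only for illustration.

```python
# Hypothetical flat-file "database": one line per teaching assignment.
# The layout (professor|course) and the sample data are assumptions.
TEACHES_FILE = """\
Smith|CS 173
Jones|CS 311
Lee|CS 225
"""


def who_teaches(course, file_contents):
    """Answer one specific question by scanning every record."""
    result = []
    for line in file_contents.splitlines():
        # Unpacking assumes exactly two fields per line, so adding a
        # third field to the file format would break this function.
        professor, taught = line.split("|")
        if taught == course:
            result.append(professor)
    return result


print(who_teaches("CS 173", TEACHES_FILE))  # prints ['Smith']
```

A different question (for example, which courses a given professor teaches) would need another hand-written function, and a change to the file format would require revisiting every such function.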
What Can a DBMS Do for Us? • Data Definition Language - DDL • Data Manipulation Language - DML • query language • Storage management • Transaction Management • concurrency control • recovery • Think of buying a plane ticket! Can you do it without a DBMS?
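The services listed above can be tried out in a few lines. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS (an assumption; the course does not prescribe a system), and the table contents are made up. It shows one DDL statement, DML statements, and a transaction that is rolled back as a unit when one of its statements fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the structure of the data.
conn.execute(
    "CREATE TABLE Takes (ssn TEXT, cid TEXT, PRIMARY KEY (ssn, cid))"
)

# DML: change the contents.
conn.execute("INSERT INTO Takes VALUES ('123', 'CS311')")
conn.commit()

# Transaction management: both inserts succeed together or not at all.
try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO Takes VALUES ('456', 'CS311')")
        # Violates the primary key, so the whole transaction is undone.
        conn.execute("INSERT INTO Takes VALUES ('123', 'CS311')")
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT ssn, cid FROM Takes ORDER BY ssn").fetchall()
print(rows)  # prints [('123', 'CS311')]
```

The row for `'456'` does not survive because it belonged to the failed transaction, which is the all-or-nothing behavior one wants when buying a plane ticket.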
Building an Application with a DBMS • Requirements modeling (conceptual, pictures) • Decide what entities should be part of the application and how they should be linked. • Schema design and implementation • Decide on a set of tables, attributes. • Define the tables in the database system. • Populate database (insert tuples). • Write application programs using the DBMS • way easier now that the data management is taken care of.
Conceptual Modeling • [Entity-relationship diagram: entities Student, Course, Professor; relationships Takes, Advises, Teaches; attribute labels shown: name, category, name, cid, ssn, quarter, name, field, address]
Schema Design and Implementation • Tables: Students, Takes, Courses [table contents not recoverable] • Separates the logical view from the physical view of the data.
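As a rough sketch of what defining and populating these tables could look like, the example below again uses `sqlite3`. The column lists are assumptions inferred from the attributes used in the course's sample query (`ssn`, `name`, `cid`); the slide's actual table contents are not available, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked

# Define the tables (columns are assumed, see the note above).
conn.executescript("""
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Takes (
    ssn TEXT REFERENCES Students(ssn),
    cid TEXT REFERENCES Courses(cid),
    PRIMARY KEY (ssn, cid)
);
""")

# Populate the database (insert tuples); the data is made up.
with conn:
    conn.execute("INSERT INTO Students VALUES ('111', 'Mary')")
    conn.execute("INSERT INTO Courses VALUES ('CS311', 'Database Systems')")
    conn.execute("INSERT INTO Takes VALUES ('111', 'CS311')")

table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)  # prints ['Courses', 'Students', 'Takes']
```

The application only names tables and columns; how the rows are laid out on disk is left to the DBMS, which is the logical/physical separation the slide refers to.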
Querying a Database • Find all courses that “Mary” takes • S(tructured) Q(uery) L(anguage) • Query processor figures out how to answer the query efficiently. • select C.name from Students S, Takes T, Courses C where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid
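The query above can be run as-is against a small test database. The sketch below uses `sqlite3` with invented sample rows; the only changes to the query are a parameter placeholder for the student name and an `ORDER BY` clause, added here so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema and made-up sample data.
conn.executescript("""
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Takes    (ssn TEXT, cid TEXT);
INSERT INTO Students VALUES ('111', 'Mary'), ('222', 'Joe');
INSERT INTO Courses VALUES ('CS311', 'Database Systems'),
                           ('CS173', 'Discrete Structures');
INSERT INTO Takes VALUES ('111', 'CS311'), ('222', 'CS173'),
                         ('111', 'CS173');
""")

query = """
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = ? AND S.ssn = T.ssn AND T.cid = C.cid
ORDER BY C.name
"""
marys_courses = [row[0] for row in conn.execute(query, ("Mary",))]
print(marys_courses)  # prints ['Database Systems', 'Discrete Structures']
```

Nothing in the query says which table to read first or how to match rows; that choice is left to the query processor.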
Query Optimization • Goal: Declarative SQL query => Imperative query execution plan • select C.name from Students S, Takes T, Courses C where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid • Plan: tree of Relational Algebra operators, choice of algorithms at each operator • [Plan diagram: operators labeled sname, sid=sid, cid=cid, and name='Mary' over the tables Courses, Takes, Students]
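To illustrate what "imperative query execution plan" means, here is one possible plan for the Mary query written out by hand in plain Python. It is only a sketch: the sample data is invented, the join algorithm (hash lookups) is one choice among several, and a real optimizer might pick a different operator order or algorithms.

```python
# Made-up sample relations, each a list of rows.
students = [{"ssn": "111", "name": "Mary"}, {"ssn": "222", "name": "Joe"}]
takes = [
    {"ssn": "111", "cid": "CS311"},
    {"ssn": "222", "cid": "CS173"},
    {"ssn": "111", "cid": "CS173"},
]
courses = [
    {"cid": "CS311", "name": "Database Systems"},
    {"cid": "CS173", "name": "Discrete Structures"},
]

# Step 1: selection name = 'Mary', applied first to shrink the input.
marys = [s for s in students if s["name"] == "Mary"]

# Step 2: match Takes rows on ssn via a hash set of the selected students.
# Only Takes columns are kept, since later steps need just the cid.
mary_ssns = {s["ssn"] for s in marys}
marys_takes = [t for t in takes if t["ssn"] in mary_ssns]

# Step 3: join with Courses on cid using a hash index on Courses.
course_by_cid = {c["cid"]: c for c in courses}
joined = [
    course_by_cid[t["cid"]] for t in marys_takes if t["cid"] in course_by_cid
]

# Step 4: projection on the course name.
result = [c["name"] for c in joined]
print(result)  # prints ['Database Systems', 'Discrete Structures']
```

Every step here is a decision the SQL query left open: which filter to apply first, which table to index, and in what order to join. The optimizer's job is to make those decisions automatically.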
Traditional and Novel Data Management • Traditional Data Management: • relational data for enterprise applications • storage • query processing/optimization • transaction processing • Novel Data Management: • Integration of data from multiple databases, warehousing. • Data management for decision support, data mining. • Exchange of data on the web: XML.
Database Industry • Relational databases are a great success of theoretical ideas. • Big DBMS companies are among the largest software companies in the world. • Oracle • IBM (with DB2) • Microsoft (SQL Server, Microsoft Access) • Others • $20B industry.
The Study of DBMS • Several aspects: • Modeling and design of databases • Database programming: querying and update operations • Database implementation • DBMS study cuts across many fields of Computer Science: OS, languages, AI, Logic, multimedia, theory...
For the next lecture: • read some parts of the textbook • the reading requirements will be posted under “lectures/schedule” tomorrow