
Database Systems: A Vertical Slice of Computer Science … or … It’s All About the Data!



Presentation Transcript


  1. Mike Carey, Information Systems Group, Computer Science Department, UC Irvine • Database Systems: A Vertical Slice of Computer Science … or … It’s All About the Data!

  2. Wait … Who Is This Guy? • Carnegie-Mellon University, 1975-80 • B.S. and M.S. Student, EE/ECE • UC Berkeley, 1980-83 • Ph.D. Student, CS • University of Wisconsin, 1983-95 • Assistant/Associate/Full Professor, CS • IBM, 1995-2000 • Industrial Researcher & Software R&D Manager • Propel Software, 2000-01 • Startup Company Fellow/CTO/VP of Software • BEA Systems, Inc., 2001-08 (acquired by Oracle) • Industrial Software Architect & Sr. Engineering Director • And now I’m here… Trivia tidbit: Here’s a photo of my first (ever) CS TA 

  3. Plan For Today’s Talk • Okay, so just what is a database system? • Based on lecture notes from the UW-Madison database curriculum, as immortalized in Database Management Systems (Ramakrishnan & Gehrke, a.k.a.“the Cow book”) • The database field is a vertical slice of all of CS! • You’ll see what I mean (and why)… • What’s exciting in “database systems” today? • UCI Information Systems Group (ISG) and beyond!

  4. What is a Database System? • So what’s a database? • A very large, integrated collection of data • Usually a model of a real-world enterprise or a history of real-world events • Entities (e.g., students, courses, Facebook users, …) • Relationships (e.g., Susan is taking CS 234, Susan is a friend of Lynn, Mike filed a grade change for Lynn, …) • What’s a database management system (DBMS)? • A software system designed to store, manage, and provide access to one or more such databases

  5. Evolution of DBMS (Figure labels: Files; CODASYL/IMS; Relational; New Data)

  6. Why Use a DBMS? • Reduced application development time • Efficient (and automatic!) data access • Data independence • Data integrity and security • Uniform data administration • Concurrent access and recovery from crashes
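Two of the benefits listed on slide 6, data integrity and atomic recovery from a failed update, can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite, the trimmed-down Students table, and the sample rows are assumptions chosen for this sketch, not something the talk specifies.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " sid TEXT PRIMARY KEY,"
    " gpa REAL CHECK (gpa BETWEEN 0.0 AND 4.0))"  # integrity rule kept by the DBMS
)

# A transaction that succeeds is committed when the 'with' block exits.
with conn:
    conn.execute("INSERT INTO Students VALUES ('S1', 3.5)")

# A transaction that fails part-way is rolled back as a whole.
rejected = False
try:
    with conn:
        conn.execute("INSERT INTO Students VALUES ('S2', 3.9)")
        conn.execute("INSERT INTO Students VALUES ('S3', 7.0)")  # violates CHECK
except sqlite3.IntegrityError:
    rejected = True

remaining = [row[0] for row in conn.execute("SELECT sid FROM Students ORDER BY sid")]
print(rejected, remaining)
```

The application never writes its own validation or undo logic here; the constraint and the all-or-nothing behavior come from the DBMS.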

  7. Why Study Databases? • Shift from computation to information • At the “low end”: explosion of the web (a mess!) • At the “high end”: scientific applications • Datasets increasing in diversity and volume • Digital libraries, interactive video, social media, genomic data, big science data, … • ... need for DBMS exploding! • DBMS field encompasses most of CS • OS, languages, theory, AI, multimedia, logic, … ?!

  8. Data Models • A data model is a collection of concepts for describing data (to one another or to a DBMS) • A schema is a description of a particular collection of data, using a given data model • The relational model is the most widely used data model today • Relation – basically a table with rows and (named) columns • Schema – describes the tables and their columns
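The distinction on slide 8 between a schema (the description) and a relation (the rows) can be made concrete with a small sketch. It assumes SQLite via Python's sqlite3 module, and the table name, columns, and rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: a description of the data, stated in the relational model.
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)

# The relation's contents: rows that conform to that schema (made-up data).
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "Susan", "susan", 20, 3.7),
        ("S2", "Lynn", "lynn", 21, 3.4),
    ],
)

# Ask the DBMS for the schema: (column name, declared type) pairs.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]

# Ask the DBMS for the data itself.
rows = conn.execute("SELECT sid, name FROM Students ORDER BY sid").fetchall()

print(columns)
print(rows)
```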

  9. Levels of Abstraction Lies! • Many views of one conceptual (logical) schema and an underlying physical schema • Views describe how different users or groups see the data • Conceptual schema defines the logical structure of the database • Physical schema describes the files and indexes used “under the covers” (Figure labels: View 1; View 2; View 3; Conceptual Schema; Logical Model; Physical Schema; On-Disk Data Structures; Bits)

  10. Example: University DB • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa: real) • Courses(cid: string, cname: string, credits: integer) • Enrolled(sid: string, cid: string, grade: string) • Physical schema: • Relations each stored as unordered files • Have indexes on first and third columns of Students • External schema (a.k.a. view): • CourseInfo(cid: string, cname: string, enrollment: integer)
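The three schema levels of slide 10 might be written down as follows, again using SQLite through Python's sqlite3 module. Several things here are assumptions made for the sketch: the index names, the sample rows, and in particular the body of the CourseInfo view, since the slide gives only the view's signature. SQLite also stores tables in its own way rather than as literally unordered files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations from the slide.
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: indexes on the first and third columns of Students.
CREATE INDEX idx_students_sid   ON Students (sid);
CREATE INDEX idx_students_login ON Students (login);

-- External schema: one plausible definition of the CourseInfo view (assumed).
CREATE VIEW CourseInfo AS
  SELECT e.cid, c.cname, count(*) AS enrollment
  FROM Enrolled e, Courses c
  WHERE e.cid = c.cid
  GROUP BY e.cid, c.cname;

-- Hypothetical sample data.
INSERT INTO Courses  VALUES ('C1', 'Computer Game Design', 4), ('C2', 'Databases', 4);
INSERT INTO Enrolled VALUES ('S1', 'C1', 'A'), ('S2', 'C1', 'B'), ('S3', 'C2', 'A');
""")

index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(Students)"))
course_info = conn.execute("SELECT * FROM CourseInfo ORDER BY cid").fetchall()

print(index_names)
print(course_info)
```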

  11. Data Independence • Applications are insulated from how data is actually structured and stored! • Logical data independence:Protection from changes in the logical structure of data • Physical data independence:Protection from changes in the physical structure of data • One of the most important benefits of using a DBMS! • Allows changes to be made w/o application rewrites
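Physical data independence can be observed directly: adding an index changes how the DBMS runs a query, but not the query text or its answer. The sketch below assumes SQLite via Python's sqlite3 module; the index name idx_courses_cname, the helper run_and_explain, and the sample rows are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid TEXT, cname TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [("C1", "Computer Game Design", 4), ("C2", "Databases", 4)],
)

# The application's query: it never changes below.
QUERY = "SELECT cid FROM Courses WHERE cname = ?"
ARGS = ("Computer Game Design",)


def run_and_explain(connection):
    """Return the query's rows and the plan text SQLite reports for it."""
    rows = connection.execute(QUERY, ARGS).fetchall()
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ARGS)
    plan = " ".join(row[3] for row in plan_rows)  # column 3 holds the detail text
    return rows, plan


rows_before, plan_before = run_and_explain(conn)

# A purely physical change: add an index on cname.
conn.execute("CREATE INDEX idx_courses_cname ON Courses (cname)")

rows_after, plan_after = run_and_explain(conn)

print(rows_before == rows_after)
print(plan_before)
print(plan_after)
```

The same unchanged query string is served by a table scan before the index exists and by an index lookup afterwards, which is the "no application rewrites" point of the slide.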

  12. Example: University DB (cont.) • User query (in SQL, against the external schema): • SELECT c.cid, c.enrollment FROM CourseInfo c WHERE c.cname = 'Computer Game Design' • Equivalent query (against the conceptual schema): • SELECT e.cid, count(*) FROM Enrolled e, Courses c WHERE e.cid = c.cid AND c.cname = 'Computer Game Design' GROUP BY e.cid • Under the hood (against the physical schema) • Access Courses – use index on cname to find associated cid • Access Enrolled – use index on cid to count the enrollments
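The equivalence claimed on slide 12 can be checked on a toy instance. This sketch assumes SQLite via Python's sqlite3 module, hypothetical sample rows, and an assumed definition of the CourseInfo view (the slides give only its columns). The table-level query is written with count(*) and GROUP BY e.cid so that it is valid in standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Assumed view definition; the slides only name its columns.
CREATE VIEW CourseInfo AS
  SELECT e.cid, c.cname, count(*) AS enrollment
  FROM Enrolled e, Courses c
  WHERE e.cid = c.cid
  GROUP BY e.cid, c.cname;

-- Hypothetical sample data.
INSERT INTO Courses  VALUES ('C1', 'Computer Game Design', 4), ('C2', 'Databases', 4);
INSERT INTO Enrolled VALUES ('S1', 'C1', 'A'), ('S2', 'C1', 'B'), ('S3', 'C2', 'A');
""")

# User query, written against the external schema (the view).
via_view = conn.execute(
    "SELECT c.cid, c.enrollment FROM CourseInfo c "
    "WHERE c.cname = 'Computer Game Design'"
).fetchall()

# Equivalent query, written against the conceptual schema (the base tables).
via_tables = conn.execute(
    "SELECT e.cid, count(*) FROM Enrolled e, Courses c "
    "WHERE e.cid = c.cid AND c.cname = 'Computer Game Design' "
    "GROUP BY e.cid"
).fetchall()

print(via_view, via_tables)
```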

  13. Architecture of a DBMS • A typical DBMS has a layered architecture • The figure doesn’t show the concurrency control and recovery components • This is one of several possible architectures; each actual system has its own variations (Figure, layers from top to bottom: Queries; Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB) Note: These layers must consider concurrency control and recovery

  14. DB Field is a Vertical Slice of CS • “I like programming languages and compilers” • Consider high-level, declarative languages like SQL • “I like low-level operating systems issues” • DBMSs manage records, memory, locks, logs, … • “I really want to work on distributed systems” • Distributed and parallel database systems are rife with distributed algorithms and systems issues (!) • “Data structure and algorithm design is really cool” • Database indexes are data structures on disk (or flash) (And so on!)

  15. What’s Exciting in DB Land Today? • The Web is full of database challenges (“Big Data”!) • A box for keywords only goes so far… • How can I query the web, e.g., “Find me 5-string Fender bass guitars for sale in the $1000-1500 price range”? • Click streams and social networks generate lots of data • How can I query and analyze all that data (e.g., to act on it)? • Ubiquitous computing is data-rich, too (IoT) • Build, deploy, and use location-based data services • Query and aggregate streams of sensor or video data • There’s data everywhere, and of all shapes and sizes • How do we integrate it, e.g., for rapid crisis response? • And when we do, how do we ensure privacy/security?

  16. Ex: DB Challenges at Facebook • Data store for low-latency, high-traffic Web sites • Only have a few hundred milliseconds to generate an entire page • Data heavily cached outside the DBMS today, which is “far from ideal” • Data systems for offline/batch-oriented processing • I mentioned this before: clickstream analysis, graph analysis, etc. • Potentially interested in faster, approximate answers • Would like to do this in real time as well, as data arrives • Hardware trends (always) present new opportunities • Flash storage, for example • Multicore CPUs (nobody uses them super well yet) • Some open source work from Facebook related to DBs • Hive: Open source SQL on top of Hadoop • Cassandra: Large-scale distributed storage for semistructured data

  17. AsterixDB System (UCI / UCR) (https://asterixdb.apache.org/) • ASTERIX Goal: To ingest, digest, persist, index, manage, query, analyze, and publish massive quantities of semi-structured information… • (ADM = ASTERIX Data Model, AQL = ASTERIX Query Language) (Figure labels: Data loads & feeds from external sources (XML, JSON, …); AQL queries & scripting requests and programs; Data publishing to external sources and apps; Hi-Speed Interconnect; three nodes, each labeled CPU(s), Main Memory, Disk, and ADM Data)

  18. Summary • A DBMS is for storing and querying big datasets • Benefits of using one are many: rapid development of new applications (“what, not how”), recovery after crashes, support for (safe) concurrent access, help in ensuring data integrity and security, … • Levels of schema abstraction ⇒ data independence • DB research is a vertical slice of all of CS (“for data”) • Big Data experts are in high industrial demand! • Data is what it’s all about today! So, consider taking our three classes: CS 122A/B/C (and occasionally offered special topics classes)

  19. Questions?
