Principles of Data Management: Syllabus & Intro
Welcome! • Course website: https://cis.temple.edu/~edragut/CIS5516-Spr17/teaching.htm • Textbook(s) • Workload • Intended Schedule • Projects • Grading • Reading List
What Is a DBMS? • A database is a very large, integrated collection of data. • Models a real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., Madonna is taking CS564) • A Database Management System (DBMS) is a software package designed to store and manage data.
Files vs. DBMS • Application must stage large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, 32-bit addressing, etc.) • Special code for different queries • Must protect data from inconsistency due to multiple concurrent users • Crash recovery • Security and access control
Why Use a DBMS? • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.
Why Study Databases? • Shift from computation to information • At the “low end”: scramble to webspace (a mess!) • At the “high end”: scientific applications • Datasets increasing in diversity and volume • Digital libraries, interactive video, Human Genome Project, EOS project • ... need for DBMS exploding • DBMS encompasses most of CS • OS, languages, theory, AI, multimedia, logic
A Brief DB History • Early 1970s • Many database systems • Incompatible, exposing many implementation details • Then Ted Codd came along • Relational model • Structured Query Language (SQL) • Implementation differences became irrelevant • A few major DB systems dominated the market
Then Web 2.0 & 3.0 and Big Data Happened • What do you think happened? • Semi-structured data happened. • A lot of it and in many forms…
Some Facts about Web x.0 and Big Data • Twitter: 255 million monthly active users; 500 million Tweets sent per day • Facebook: over 1 billion monthly users; 3 million messages sent every 20 minutes • Instagram: 200 million monthly active users; 1.6 billion likes and 60 million photos shared every day
Somebody, Please, Bring Some Order to This Madness (Cont’d) • NoSQL Databases
Somebody, Please, Bring Some Order to This Madness • Different interfaces • Different hardware support • Different application support • Lack of uniformity • Source: http://www.infoq.com/articles/State-of-NoSQL
Additional Resources • Tutorial by C. Mohan, An In-Depth Look at Modern Database Systems • https://docs.google.com/file/d/0B7lNUaak0bK1encwYnBVUWZSWjA/edit
Relational Data • Tables or relations
Relational Database: Query Language • SQL: Structured Query Language • A declarative language designed for managing data held in a relational database management system • Say what you want and where it comes from • Do not say how to get the data
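The declarative idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the course material: the students and enrolled tables, their columns, and the extra course CS537 are hypothetical. The query names the data wanted and the tables it comes from; the choice of join strategy and access path is left to the DBMS.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE enrolled (sid INTEGER, course TEXT)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [(1, "Madonna"), (2, "Prince")],
    )
    conn.executemany(
        "INSERT INTO enrolled VALUES (?, ?)",
        [(1, "CS564"), (1, "CS537"), (2, "CS564")],
    )
    return conn


def courses_taken_by(conn, student_name):
    """Return the courses a student takes, sorted by course name.

    The SQL states WHAT is wanted (course names) and FROM WHERE
    (students joined with enrolled). It does not say HOW to scan or join.
    """
    rows = conn.execute(
        "SELECT e.course FROM enrolled AS e "
        "JOIN students AS s ON s.sid = e.sid "
        "WHERE s.name = ? ORDER BY e.course",
        (student_name,),
    ).fetchall()
    return [course for (course,) in rows]


conn = build_demo_db()
print(courses_taken_by(conn, "Madonna"))  # prints ['CS537', 'CS564']
```

Nothing in courses_taken_by mentions loops, indexes, or file layout, which is the point of a declarative language.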
Key-Value Store • Implemented as an associative array, map, symbol table, or dictionary abstract data type composed of a collection of (key, value) pairs such that each possible key appears at most once in the collection. • A simple put/get interface • Great properties: scalability, availability, reliability
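A minimal single-process sketch of the put/get interface, assuming a Python dict as the underlying associative array. The class name KeyValueStore and the key "user:42" are hypothetical. The scalability, availability, and reliability mentioned above come from partitioning and replicating such a map across machines, which this sketch does not attempt.

```python
class KeyValueStore:
    """Toy in-memory key-value store with a put/get interface."""

    def __init__(self):
        # Each key appears at most once; the dict enforces that.
        self._data = {}

    def put(self, key, value):
        """Store value under key, replacing any earlier value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for key, or default if the key is absent."""
        return self._data.get(key, default)

    def __len__(self):
        return len(self._data)


store = KeyValueStore()
store.put("user:42", {"name": "Madonna"})
# A second put on the same key overwrites instead of adding a duplicate.
store.put("user:42", {"name": "Madonna", "course": "CS564"})
print(len(store))  # prints 1
```

The store never looks inside the value, so there is nothing like a query over value fields; lookups are by key only.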
Key-Value Store Usage Scenarios • Increasingly popular within data centers and in P2P • Data center: Dynamo (amazon.com), Voldemort (LinkedIn), Cassandra (Facebook) • P2P: Vuze DHT (Vuze), uTorrent DHT (uTorrent)
Row Store and Column Store • In a row store, data are stored on disk tuple by tuple. • In a column store, data are stored on disk column by column. • Column stores are more I/O-efficient for read-only queries because they read only the attributes that a query accesses. Source: Column-Oriented Database Systems, VLDB 2009 Tutorial; S. Harizopoulos, D. Abadi, P. Boncz
Row Store and Column Store • So column stores are suitable for read-mostly, read-intensive, large data repositories.
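The I/O argument can be made concrete with a small in-memory sketch. It assumes a hypothetical four-attribute table (sid, name, course, credits) and counts attribute values touched as a stand-in for disk I/O; real systems read pages, not single values, so the numbers are only illustrative.

```python
COLUMNS = ("sid", "name", "course", "credits")

# Row store: one tuple per record (hypothetical data).
rows = [
    (1, "Madonna", "CS564", 12),
    (2, "Prince", "CS564", 15),
    (3, "Bowie", "CS537", 9),
]


def to_column_store(rows, columns):
    """Re-lay the same data out column by column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def avg_credits_row_store(rows):
    """Average one attribute; every whole tuple must be read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # all four attributes come along
        total += row[3]
    return total / len(rows), values_read


def avg_credits_column_store(cols):
    """Average one attribute; only that column is read."""
    credits = cols["credits"]
    return sum(credits) / len(credits), len(credits)


cols = to_column_store(rows, COLUMNS)
row_avg, row_read = avg_credits_row_store(rows)
col_avg, col_read = avg_credits_column_store(cols)
print(row_avg, row_read)  # prints 12.0 12
print(col_avg, col_read)  # prints 12.0 3
```

Both layouts give the same answer, but the row layout touches every attribute of every tuple while the column layout touches one column. The trade-off runs the other way for writes: inserting one record updates one tuple in the row layout but every column list in the column layout.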
Graph Databases • Ecological network • Social network • Biological network • Chemical network • Web graph • Program flow
Graph Databases: Query • Find all the restaurants my friends (in Facebook) like
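As a rough sketch, the query above is a two-hop traversal: follow "friend" edges from me, then follow "likes" edges from each friend. The adjacency dictionaries, people, and restaurant names below are all hypothetical, and a real graph database would express this in its own query language rather than in hand-written loops.

```python
# "friend" edges: person -> set of friends (hypothetical data).
friends = {
    "me": {"ann", "bob"},
    "ann": {"me", "cat"},
    "bob": {"me"},
    "cat": {"ann"},
}

# "likes" edges: person -> set of restaurants (hypothetical data).
likes = {
    "ann": {"Pho Place", "Taco Stand"},
    "bob": {"Taco Stand"},
    "cat": {"Sushi Bar"},
}


def restaurants_friends_like(person, friends, likes):
    """Collect every restaurant liked by at least one direct friend."""
    result = set()
    for friend in friends.get(person, ()):
        result |= likes.get(friend, set())
    return result


found = restaurants_friends_like("me", friends, likes)
print(sorted(found))  # prints ['Pho Place', 'Taco Stand']
```

"Sushi Bar" is excluded because cat is a friend of a friend, not a direct friend of me.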
So, Why Study Relational DBs? • Jack Clark, The Register, 30 August 2013: “The tech world is turning back toward SQL, bringing to a close a possibly misspent half-decade in which startups courted developers with promises of infinite scalability and the finest imitation-Google tools available, and companies found themselves exposed to unstable data and poor guarantees.” • Google Spanner paper, October 2012: “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.” • Sean Doherty in Wired, September 2013: “But don’t become unnecessarily distracted by the shiny, new-fangled, NoSQL red buttons just yet. Relational databases may not be hot or sexy but for your important data there is no substitute.”
And the Key Reason of All • Gartner estimates the RDBMS market at $26B with about 9% annual growth, whereas Market Research Media Ltd expects the NoSQL market to reach $3.5B by 2018. • Source: C. Mohan’s tutorial
Databases make these folks happy ... • End users and DBMS vendors • DB application programmers • E.g., smart webmasters • Database administrator (DBA) • Designs logical/physical schemas • Handles security and authorization • Data availability, crash recovery • Database tuning as needs evolve • Must understand how a DBMS works!
Summary • A DBMS is used to maintain and query large datasets. • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. • Levels of abstraction give data independence. • A DBMS typically has a layered architecture. • DBAs hold responsible jobs and are well-paid! • DBMS R&D is one of the broadest, most exciting areas in CS.