1 / 31

14-848 Cloud Infrastructure

Learn about relational databases and the SQL language used to query and manipulate data. Understand how relational databases are organized on disk and their role in cloud infrastructure.

Download Presentation

14-848 Cloud Infrastructure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 14-848Cloud Infrastructure Lecture 10 * Fall 2018 * Kesden

  2. Socrative • https://api.socrative.com/rc/Nfu6Lp

  3. Today’s Goals • A brief introduction to relational databases • A brief introduction to the SQL language used by relational databases • A brief look at how relational databases are organized on disk • Reflect on how they can be used within clouds • Reflect on why there are limits to their ability to scale

  4. Not Among Today’s GoalS • A deep study of relational databases • Or, even minimal functional competency with SQL databases • If interested, take 15-615 or 15-645

  5. Database “A database is an organized collection of data, stored and accessed electronically.” -- https://en.wikipedia.org/wiki/Database

  6. Relational Database • “A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. “ • “The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic…where all data is represented in terms of tuples, grouped into relations. • “Virtually all relational database systems use SQL (Structured Query Language) for querying and maintaining the database.” -- https://en.wikipedia.org/wiki/Relational_database -- https://en.wikipedia.org/wiki/Relational_model

  7. Relational DatabasesRole In Clouds • Most widely used type of database since the 1980s • Unlikely to lose that title, or see reduced growth in use, any time soon • Good for up to millions of records • Beyond that, for “Big Data” applications, we’ll talk about various NoSQL databases • Everything from managing system configuration data to customer accounts to inventories to order and shipping records, to HR data, etc.

  8. Relational Database, Informally • A database wherein the data is structured in tables known as relations. • Rows represent entities and are unordered • A relational database, although drawn as a table, functions more like an unordered set of rows. • Columns represent attributes • Constraints can be enforced w.r.t. types and values • Schema for each table describes attributes and associated types. • Operations can be performed across tables.

  9. Example Relation:Student Table

  10. Keys:What Are They? • Used to name rows within tables • Used to associate rows across tables • Can be tuple of attributes

  11. KeYS:Example Role • Unique within relation • Used to associate relations

  12. SQL (“Sequel”)Structured Query Language • Used by nearly all relational databases • Both a Data Definition Language (DDL) and a Data Manipulation Language (DML) • DDL: Create, define, modify, and administer (access restrictions, etc) relations • Structure, types, constraints, etc – not values therein • DML: Query and modify values: Tuples and attributes thereof • Hides organization of data, e.g. whether and how indexed, how stored, location, etc. • Big difference from older database languages that described tree paths, etc. • Name derived from earlier IBM predecessor, “Standard English QUeryLanguage”

  13. Keys:Types • Candidate – Unique to row • Primary – Used as name for row • Foreign – Key used in another table • Think of a foreign key as a pointer

  14. SQL:CREATE CREATE TABLE StudentBios ( StudentID CHAR(9) NOT NULL, FName VARCHAR(200)NOT NULL, LName VARCHAR(200)NOT NULL, YearEnrolled YEAR )

  15. SQL:CREATE, REVISITED CREATE TABLE StudentCourses ( StudentID CHAR(9) NOT NULL, Course CHAR(6) NOT NULL, Semester CHAR(3) NOT NULL, Grade CHAR(2) PRIMARY KEY (StudentID, Course, Semester), FOREIGN KEY (StudentID) REFERENCES StudentBios ) Why is the primary key (StudentID, Course, Semester)? Not just StudentID?

  16. SQL:INSERT INSERT INTO StudentBios (CHAR(9), VARCHAR(200),VARCHAR(200),YEAR) VALUES (123456789, ‘lkennedy’, ‘Laurie’, ‘Kennedy’, 2014)

  17. SQL:SELECT SELECTFname, Lname from StudentBios WHERE YearEnrolled >= 2015

  18. SQL:SELECT, Continued SELECTStudentID, Fname, Lname from StudentBios WHERE YearEnrolled <= 2015 ORDER BY FName,LName

  19. SQL:DELETE DELETE FROM StudentBios WHERE StudentBios.Fname = ‘Laurie’

  20. SQL:JOIN • Combines columns from one or more relations into a new relation • Rows are aligned based upon a matching key • Columns are selected from logically combined rows • Yes, a relation can be joined with itself, hence one or more  • Complexity • Some keys may be present in some relations but not others • Depending upon which table is being iterated, some rows may be missed or some columns may be empty • This leads to different flavors of JOIN

  21. SQL:JOIN SELECT AndrewID, Course, Semester, Grade FROM StudentBiosINNER JOIN StudentCourses WHEREStudentBIOS.StudentID = StudentCourses.StudentID • This effectively forms a new relation which is the union of the StudentBios and StudentCourses tables • The rows of the two are associated up based upon the StudentID of the the rows matching. • INNER JOIN requires matches in both tables • If there isn’t a matching key in both tables it isn’t included • The result has (AndrewID, Course, Semester, Grade) columns

  22. SQL:Views CREATEVIEWStudentClassList AS SELECT AndrewID, Course, Semester FROM StudentBios INNER JOIN StudentCourses WHERE StudentBIOS.StudentID = StudentCourses.StudentID • This forms what is essentially a virtual table • Abstracts away complexity of underlying tables • Allows view to be created from tables that may be from different domains or systems • Allows protections to be applied to table, e.g. can access course list for student, but not course grades • May enable better system performance by aiding caching, etc.

  23. Relational Databases:Memory Hierarchy: A Quick Review • Really approximate access times (latency): • Registers: Speed of processor • SRAM (Cache): 2-3nS = ~2.5x10-6 mS = ~2.5/1,000,000 mS • DRAM (Main Memory): 20-30nS = ~2.5x10-5 mS = 2.5/100,000 mS • FlashRAM (SSD): 0.035 mS • Disk Drive: 1 mS, sequential • Disk Drive: 10 mS, random • The name of the game is… • To build a system that performs nearly as well as if all of its memory is of the fastest type • At a cost nearly as low as if all of its memory is of the slowest type • Caching is the magic sauce

  24. Relational Databases:Additional Considerations • Memory is limited • High throughput is key • ACID properties • Atomicity – A transaction is all done, or note done at all, but not half done (Succeed or fail) • Consistency – Find things consistent, leave things consistent. Don’t break things with side-effects. • Isolation – Even if there is concurrency, effects should be the same as if sequential • Durability – Once it is done, it is done (.) • Minimize overhead, while maximizing performance

  25. Relational Databases:Implications OF ENVIRONMENT, GOALS • Caching is critical • Durability will require consideration of hardware • SSD or other FlashRAM layer? • Battery powered backup? • Etc • Indexes and data stored separately • Pack as many indexes into RAM as possible. Think like Haystack, but 40 years earlier. • Multiple indexes, separate for each use • Indexes and data different caching domains

  26. Relational Databases:Typical Organization • Block/Page based storage of data pages • Often LRU cached • All same size for interchangeable allocation and caching, mapping to disk blocks, etc • Indexes constructed to speed queries • B-Tree based – sorted, range-based, etc • Caching may be page-based LRU. • Hash based – fast lookup without rich features • Trades disk space for lookup speed • Exact lookup only, not ranges, etc. • May be dynamic • Lookup results can be cached. • Indexes point into data pages • Commonly created by administrator, not based upon any intrinsic understanding by the database, itself

  27. Relational Databases:Obtaining Robustness • Backups are hard because they may require a freeze • Blurry backups may not be useful because of ACID • Some algorithms can make blurry backup and fix-up • Replicated databases are a common option • A few replicas provide robustness without blowing scale • Need policies for working through failure and obtaining recovery • Need atomic commit protocol or master-slave, etc, to maintain consistency

  28. Relational Databases:Scaling Up • Scaling to a point is easy enough • Buy bigger disk • Solution can be replicated • Scaling even bigger is easy enough • Buy an array • Solution can be replicated • Scaling even bigger gets complicated • Distributing whole tables makes joins more expensive • Sharding tables by row makes summarizing across attributes slow • Sharding tables by column makes lookup by entity slow • At a certain point, it may be better to focus on precise needs and structure a solution to meet them • NoSQL Databases: Row-Oriented, Column-Oriented, Key-Value Stores, etc.

  29. Relational Databases:Scaling Up, Simple Ideas • Distill data as much as possible. • Make representation of data as dense as possible. • Keep data redundantly, if (absolutely needed) to allow distribution with performance

  30. SQL:The STORY CONTINUES • There is a ton of theory behind relational databases • There is a ton more to the SQL language • There is a ton more to how SQL databases are organized, indexed, etc. • Take 15-615 or 15-645 if interested.

  31. Moving Forward:Data (Not File) Storage at Scale • Key-Value Stores • Column, Row, and Document-Oriented Databases • RAM layer storage

More Related