1 / 32

Course Design in Database Lecture 1

For 2004 ACM Honored Class Feng Qian April 3, 2007. Course Design in Database Lecture 1. Outline. Introduction & Implementation in C++ ( 钱风 ) Implementation in Java ( 周牧星 ) Schedule and grading policy ( 马融 ). Introduction. Objective: Build an Database Management System (DBMS) in C++/Java

currier
Download Presentation

Course Design in Database Lecture 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. For 2004 ACM Honored ClassFeng Qian April 3, 2007 Course Design in DatabaseLecture 1

  2. Outline • Introduction & Implementation in C++ (钱风) • Implementation in Java (周牧星) • Schedule and grading policy (马融)

  3. Introduction • Objective: Build an Database Management System (DBMS) in C++/Java • A team project, hard but fun engineering work • No official base code is given, you are encouraged to work from zero • No strict technical restrictions, much more flexible than previous projects (Compiler / OS) • TAs: 马融、周牧星、曲文涛、钱风

  4. What you will learn fromDBMS project • A Profound understanding of modern DBMS • A Proficient programming ability for large engineering projects • A Perfect training for debugging skill for thousands lines of codes • A Practical simulation of teamwork environment in industrial zone  No pains, No gains 

  5. Components of DBMS • Storage • Buffer • Heap File • Index • SQL Parser • SQL Engine • Concurrency Control • Database Manager • Interface Bottom Layer Top Layer

  6. Storage (Easy, 200 lines) • Arrange data on disk, in the form of Page • Similar to Virtual Memory in OS • High level layers only knows a Page ID as an entry point for access of the data

  7. Storage • My Implementation • Virtual address space: 4GB • Self-grow page files: 4MB each • An advanced bitmap: segment tree • Main interfaces(All deal with file operations on disk) • CreateDB, OpenDB, CloseDB, DestroyDB • AllocatePage, DeallocatePage • WritePage, ReadPage

  8. Buffer (Easy, 300 lines) • A traditional cache strategy, store frequently accessed pages in memory (instead of in the disk storage system)

  9. Buffer • My implementation • Store 2048 frequently used pages • 2nd-chance replacement algorithm • Main interfaces: • AllocatePage, DeallocatePage • GetPage (Handle page missing) • ReleasePage • PinPage, UnpinPage (Reference count)

  10. Heap File (medium, 1k lines) • provides the ability of sequential access of the records, as well as insertion, deletion, updating and etc basic features. • Manage table structures • Manipulate tables in unit of tuple • Generate iterators for upper layers • The heap file defines the format of table structures and tuples in pages.

  11. Heap File • Use single page to store a table structure CREATE TABLE student (studentid INTEGER, name VARCHAR(10), age INTEGER, schoolid INTEGER); Table Info: Table ID: 2B, Num of columns: 2B, Num of tuples: 4B, First page ID: 4B Num of pages: 4B, Table name: 10B Each Field: Column Name: 2B, Column type: 2B, Column size: 2B, Not null: 1B Key: 1B Offset: 2B, Index info: 4B In the table structure page:

  12. Heap File • Store the table content • Assume a tuple (123,“Zhang”,20,45) is stored in Page 5 • The unique identification (RID) of this tuple is defined by its page (5) and its offset in the page (0x010h) • Use linked multiple pages to store all tuples in a table Structure Page Global Bitmap Content Page 1 Content Page 2 Content Page 3 Content Page n …

  13. Heap File • Main Interface of my implementation • CreateTable • DropTable • InsertTuple • DeleteTuple • UpdateTuple • SearchTuple

  14. Index (difficult, 1000 lines) • To accelerate the access of tuples • The most simple type of index: Always keep tuples in order Screenshot of Microsoft Access 2007

  15. Index • Typical index technique: B+ Tree BT Header Page some important information about the B+ tree is recorded in the BT Header Page, such as the root page id of the tree and the key type BT Index Page BTIndexPage is the internal node and only contains index to the child B+ tree page. BT Leaf Page BTLeafPage is the leaf node and contain index entries with RIDs

  16. Index • Common B+ Tree operations • Search, Insert, Delete • BoundaryQuery (return an array of RIDs) • Maintain B+ Tree at the same time of manipulating the heap file. • Keep the balance of the tree • To be simple, you can build a B+ Tree for only the column with the “primary key” flag. • Please refer to algorithm books and papers for more information of B+ Tree

  17. SQL Parser (100 to 3k lines) • Parse SQL statement into the grammar tree • Two ways • Write a naïve parser yourself • Use tools (YACC for C++ users, JCup for Java users)

  18. SQL Parser • Your DBMS should support st the following SQL statements • SELECT fields FROM tables [WHERE Wclause] [GROUP BY fields] [HAVING fexpression] [ORDER BY fields] • CREATE TABLE tablename (ctclauses) • DROP TABLE tablename • INSERT INTO tablename (fields) VALUE (values) • DELETE FROM tablename [WHERE Wclause] • UPDATE tablename SET uclauses [WHERE Wclause] • Group functions: AVG | COUNT | MIN | MAX | SUM • Relation Op.: AND | OR • Data Types: CHAR | VARCHAR | DATE | TIME | DOUBLE | INTEGER | LONG

  19. SQL Parser • Examples • CREATE TABLE student (studentid INTEGER, name VARCHAR(30) DEFAULT '(NONAME)', age INTEGER DEFAULT -1, dat DATE DEFAULT DATE '2000-01-01' NOT NULL); • SELECT star1.name, star2.name FROM MovieStar star1, MovieStar star2 WHERE Star1.address=star2.address AND Star1.name<Star2.name; • SELECT name FROM MovieExec WHERE (cert#) IN (SELECT producerC# FROM Movie WHERE ( title , year ) IN (SELECT movieTitle , movieYear FROM StarsIn WHERE starName = 1)); • SELECT COUNT(birthdate) FROM MovieStarGROUP BY gender HAVING gender='F';

  20. SQL Engine (Medium, 500~3k lines) • Execute the parsed SQL statement • A possible implementation

  21. SQL Engine • Relational Algebra Query Tree

  22. SQL Engine • Optimized Execution Tree Refer http://www.ittc.ku.edu/~sgauch/647/s99/notes/11b/sld001.htm for details

  23. Concurrency Control (easy, 300 lines) • Different users can open the same database in the same time • They also can read from the same table concurrently • They cannot access the same table when someone do writing operations on it • Thus, a concurrency control is needed

  24. Concurrency Control • A possible implementation • Set up a table lock • Two kinds of locks: Read lock and Write lock When a table has a write lock, no operations are allowed on it. When a table has a read lock, only read requirements are allowed. • If a user want to do some operations on it, he must wait until the previous lock is released. • Avoid dead locking!

  25. Database Manager (medium, 500~1k lines) • The bridge between access interface and the core system • Since we allow open more than one database simultaneously, the DB manager should maintain all opened databases in current system. • Maintain system tables for each database • package the result from the low-level system, and send it to the right session • Logger

  26. Database Manager • Main functions in my implementation • Hold the connected sessions (multi-threaded) • Manage every opened database in current system • Package the result and do response • Logger

  27. User Interface (Easy, 500~1k lines) • Two types of user interfaces • Graphics User Interface • XDBC (JDBC, ODBC or you defined) programming interface • Remote user interfaces should be implemented • C++ users: recommend raw socket • Java users: recommend RMI

  28. User Interface - GUI

  29. User Interface - API • I “created” a simple API. Sample program: #include “FatwormAPI.h” CFatWormAPI::Connect(“192.168.1.12”,1202); CFatWormAPI::ExecuteCommand("LOGIN AAA AAA;"); CFatWormAPI::ExecuteCommand("OPEN DATABASE STU;"); CFatWormAPI::ExecuteCommand(“SELECT * FROM student2 WHERE studentid<1000;"); for (int i=0;i<CFatWormAPI::GetSelectCount();i++) { printf("%d %s %d %d\n", CFatWormAPI::GetInteger(i,0), CFatWormAPI::GetChar(i,1), CFatWormAPI::GetInteger(i,2), CFatWormAPI::GetInteger(i,3) ); } CFatWormAPI::DisConnect();

  30. Components of DBMS - Review Local Access Client X D B C Socket/RMI Remote Server Database Manager SQL parsing & SQL Engine Concurrency Control Heap File B+ TreeIndexing Storage & Buffer

  31. Implementation DBMS in C++ • Advantage • Very Fast (Most real DBMS are written in C++) • Direct memory access inline static int ReadInt(const BYTE * pData,int ofs) { return *((int *)(pData+ofs)); } • Powerful Marco #define MAKE_RID(PID,OFS) ((((DWORD)(PID))<<20) | (DWORD)(OFS)) • Disadvantage • “Access violation” program– Be careful when programming!

  32. Thank you • Contact me via e-mail • shinyflute@gmail.com • shinyflute@sjtu.org

More Related