290 likes | 311 Views
Advanced Database Techniques. Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl Jennie Zhang @ cwi.nl Romulo Goncalves @cwi.nl. Administrative details. The website evolves as during the course Exam material is marked explicitly Lab work deadlines are strict
E N D
Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl Jennie Zhang @ cwi.nl Romulo Goncalves @cwi.nl
Administrative details • The website evolves as during the course • Exam material is marked explicitly • Lab work deadlines are strict • Email is the preferred way to communicate • Tomorrow the assistants will be available in person between 11:00-12:00, room REC-P.123
Relational systems • A database system should simplify the organization, validation, sharing, and bookkeeping of information • Prerequisite knowledge • Relational data model and algebra • Data structures (B-tree, hash) • Operating system concepts • Using a SQL database system • What is your practical experience? [Ruby on Rails expertise needed]
Applications • Bread-and-butter applications? • Web-shop • Banking systems • Inventory systems • Production systems • Shopping systems • Government systems • Health systems • Multimedia systems • Science systems …
Advanced Applications • Bread-and-butter applications ??? • Banking systems • What happens if you install a stock trading system which should handle >100K transactions/minute • How to derive trading advice using compute intensive applications • How to warn thousands of users about their trading opportunity • …. Need for parallel, distributed main-memory database technology…
Advanced application requirements • Bread-and-butter applications • Inventory applications • How to install a battlefield inventory systems • How to deliver goods just in time? • How to keep track of moving objects/persons ? • … need for sensor-based database support and RFID tags … need for a new DBMS ?…
Advanced Applications • Production systems • How to interact with component suppliers • How to manage the production workflow • How to avoid bad production steps • How to maintain a database with 12000 tables (SAP) • … need for interoperability between autonomous systems… datamining and knowledge discovery…
Advanced Applications • Health information systems • How to monitor your health over 30 years • How to enable quick response to a heart attack • …need for interoperable database systems …
HELP The Ambient Home
HELP 911 called The Ambient Home
911 called MonetDB DataCell A Shared Tuple Space using an SQL DBMS nucleus
Advanced applications • Government systems • Security • Biometric data management issues, finger/image matching • Public safety • Forensics, manipulate complex objects using proprietary algorithms • …need for extensible database technology…need to support unstructured data…
Advanced Applications • Science systems • The new accelerator in CERN • how to handle >1PTByte files • The Sloan Digital Skyserver schema is 200 pages and the catalogued data 2.5Tb • How to query this efficiently • ..need for P2P and … a novel way to organize data…
LOFAR central processor specs • Streaming Data • Input: 320 Gbit/s • Internally within correlator: 20 Tbit/s • Into storage: 25 Gbit/s = 250 TByte/day • Final products: 1-3 TByte/day • High Performance Computing • Correlation: 15 Tflops • Pre processing and filtering: 5 Tflops • Off-line processing (calibration, analysis): 5-10 Tflops • Visualisation, control, scheduling etc: 2 Tflops • Storage • On-line temporal storage: 500 TByte • Archive: PByte range of data stored in Grid
Technological challenges • Data is often not structured as tables • XML and XQuery • Data does not always fit on one system • Distributed and parallel databases • Querying is more like world-wide searching • Continuous and streaming queries • A database tells more than facts • Datamining and knowledge discovery
Code bases • Database management systems are BIG software systems • Oracle, SQL-server, DB2 >1 M lines • PostgreSQL 300K lines • MySQL 500 K lines • MonetDB 200-800 K lines • SQLite 40K lines • Programmer teams for DBMS kernels range from a few to a few hundred
Performance components • Hardware platform • Data structures • Algebraic optimizer • SQL parser • Application code • What is the total cost of execution ? • How many tasks can be performed/minute ? • How good is the optimizer? • What is the overhead of the datastructures ?
Gaining insight • Study the code base (inspection + profiling) • Often not accessible outside development lab • Study individual techniques (data structures + simulation) • Focus of most PhD research in DBMS • Detailed knowledge becomes available, but ignores the total cost of execution. • Study as a functional black box • Analyse a small application framework
The Jack The Ripper Project • Study the snippet of the database technology and design an XQuery and SQL application • What is the schema? • What are the queries? • What are unorthodox solutions?
Learning points • My poor knowledge on relational database? Read the chapters on SQL and relational algebra. Knowledge on data structures comes in handy. • Database systems are much more than administrative bookkeeping systems
Learning points • Advanced application challenge the technology provided by a DBMS • Many techniques do not easily scale in size, complexity, functionality • Effectiveness of a DBMS is determined by many tightly interlocked components