250 likes | 284 Views
Learn essential database concepts including RDBMS fundamentals, creating tables, querying data, & exploring modern database categories like NoSQL, Hadoop & Spark. Understand importance of polyglot databases for modern applications.
E N D
Database Essentials Foundation about Databases
Agenda • Setting up Oracle environment • Different categories of Databases • RDBMS Concepts • Creating Tables • Loading data into tables • Query data from tables • Challenges with RDBMS • Hadoop and Spark • NoSQL Databases – MongoDB
Current Scenario – Architecture(Typical Enterprise Application) Client (Browser) App Server Client (Browser) Database App Server Client (Browser)
Setting up Oracle environment • Go to https://apex.oracle.com and sign up for an account • Load data by running script from my github account - https://github.com/dgadiraju/code/tree/master/rdbms/oracle
Categories of Databases • Relational Databases (Oracle, Informix, Sybase, MySQL etc) • OLAP/Data Warehouse appliances/MPP (Teradata, Greenplum, Vertica) • NoSQL Databases (Cassandra, HBase, MongoDB etc) • In memory Databases (Gemfire, Coherence etc) • Search based Databases (Elastic Search, Solretc) • Batch processing frameworks (Map Reduce, Spark etc) * Modern applications need to be polyglot (different modules need different category of databases)
Categories of Databases • Relational Databases – Mission Critical transactional based applications • OLAP/Data Warehouse appliances/MPP (Enterprise level reports) • NoSQL Databases – Non transactional based applications which require scalability • In memory Databases – In memory caches for performance reasons • Search based Databases – free flowing search based applications • Batch processing frameworks – Process huge volumes of data * Modern applications need to be polyglot (different modules need different category of databases)
Categories of Databases – LinkedIn • Relational Databases – Payment gateways • OLAP/Data Warehouse appliances/MPP – Reports for executives • NoSQL Databases – Endorsement engine • In memory Databases – Complements almost every thing • Search based Databases – Search with in LinkedIn • Batch processing frameworks – Offline processing for job recommendations, friend recommendations, mass campaigns based on certain criteria and more * Modern applications need to be polyglot (different modules need different category of databases)
RDBMS Concepts • Codd’s rules • Normalization • 1st normal form • 2nd normal form • 3rd normal form • BCNF • Database Objects • Tables • Indexes • And more
Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 0: The foundation rule: relational capabilities • Rule 1: The information rule: information -> values in tables • Rule 2: The guaranteed access rule: Each and every row should be accessible by using names – such as tables, columns etc • Rule 3: Systematic treatment of null values: • Rule 4: Dynamic onlinecatalog based on the relational model:
Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 5: The comprehensive data sublanguage rule: • A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:Data definition. • View definition. • Data manipulation (interactive and by program). • Integrity constraints. • Authorization. • Transaction boundaries (begin, commit and rollback).
Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 6: The view updating rule: • Rule 7: High-level insert, update, and delete: • Rule 8:Physical data independence: No impact on changing physical layer of data • Rule 9:Logical data independence: No impact on changing table structures • Rule 10:Integrity independence: Constraints • Rule 11:Distribution independence: • Rule 12: The nonsubversion rule:
Normalization • 1st normal form • 2nd normal form • 3rd normal form • BCNF
Creating Tables • It is called as Data Definition Language • Typical table should have primary key constraint (unique and not null) • Primary keys are supported by index • Relations are defined using foreign key constraint • Please review the ”CREATE TABLE” commands from load_schema.sql
Loading data into tables • It is called as DML • Typically followed by commit or rollback • Commit will preserve the changes to data permanently • CRUD operations • Insert • Update • Delete
Querying Data • Simple SQL – SELECT and FROM clause • Filtering data • Projection of data (selecting desired columns) • Aggregating data • Sorting data • Joining data from multiple tables
Functions • Pre-defined • String manipulation • Date manipulation • Aggregate functions • Analytical or windowing functions • User-defined (out of scope)
Challenges with RDBMS • While RDBMS is good for mission critical transaction based systems, it performs poorly in batch processing of data. • When ever CRUD operations happens • Performs constraint checks • Performs data type checks • Performs length and precision checks • Overhead for maintaining transactions (extensive logging – redo logs and undo in case of Oracle) • Overhead for maintaining consistency view of data between transactions • Expensive join operations (full table joins) • And more
Solutions for RDBMS challenges • Datawarehousing approach can solve the challenges for RDBMS for heavy weight data processing • Facts • Dimensions • Measures • Data Mart • Denormalized data model (fewer tables for join) • Star Schema • Snow Flake Schema • Methodologies • Kimball Methodology • Inmon Methodology • Aggregated Tables – Facts at different granularity • In many cases constraints are informational, they are not strictly enforced
NoSQL Databases • NoSQL – Not only SQL • Hbase, Cassandra, MongoDB • Tables are typically indexed and partitioned • Works very well for scalability of simple tables • Need not suit very well for transactional based systems • Example applications • Endorsement engine • Recommendation engine