1 / 13

Data Warehouse Notes

Data Warehouse Notes. OLTP vs OLAP. OLTP: Online Transaction Processing Most common type of database for data input ER design approach Database design looks like “real world” Throughput and response time are big concerns. Exercise: Examples of transactional databases.

Download Presentation

Data Warehouse Notes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse Notes

  2. OLTP vs OLAP • OLTP: Online Transaction Processing • Most common type of database for data input • ER design approach • Database design looks like “real world” • Throughput and response time are big concerns Exercise: Examples of transactional databases

  3. Physical Data Storage and Usage in OLTP System: • Data stored on disk in data pages (4k, 8k, 16k, 32k in size) • Data IO is also in data pages • Memory image of database table is also data pages • Need one row of one table, read an entire data page. • Rows per Page ~ Page Size/Bytes per Row Physical storage of rows of a single table A B C D . . . . . . . . . . . . . . . . . . Exercise: Cost of Retrieval (A,B) vs (C,D)

  4. Data Input Strategies • All transactions get added to the end of the last data page in the chain of data pages that make up this table. • single point of contention slows everyone down • most reporting is looking at older data on different data pages so not blocked by data input on newly created data pages report accesses these pages . . . . . . . . . . . . . . . . . . new rows added here

  5. Data Input Strategies • Transactions with similar key values are placed on the same data pages (eg, orders from the same state) • multiple insertion points so less chance of insertions running into each other • reporting now runs into problems reporting from the same pages where data is being entered MA RI data insertion and reportinguse the same pages CT . . . . . . . . . . . . . . . . . . . . . insertion

  6. Solution 1 • Have OLTP and OLAP run on different copies of the same database – live and day-old. • Each night copy all new transactions to day-old database • Or mirror data insertion in a duplicate database • Problems: • Insertion and Reporting are “orthogonal” so the structure that makes one task easy makes the other difficult. • Insertion typically restricted to one external view at a time • Reporting is often across many external views • Insertion easy to design and designed only once • Reports hard to design and new reports all the time

  7. Solution 2: • OLAP database has a different structure than the OLTP database. • Most common structure is called the Star Schema. Dimension Tables Fact Table Address = {address, country, state, region}

  8. Alternative to Star Schema: • OLAP database has a different structure than the OLTP database. • Another common structure is called the Snowflake Schema. Dimension Tables Fact Table Address = {address, county_id} Country = {id, state_id} State = {id, region_id} Region = {id, ...}

  9. Fact Table: D1 ID D1_ID D2_ID D3_ID ... m1 m2 m3 ... ID ... Fact table schema consists of numeric facts called measures categorized by dimensions Dimensions are qualitative properties of the data and measures are quantitative properties. D2 ID ... D3 ID ...

  10. Example Fact Table HMO_CLAIM_HISTORY claim_type patient_state_code claim_user_id patient_claim_id prmy_procedure_id prmy_diagnosis_id claim_from_date .............................. claim_pymt_amt ddctbl_amt coinsurance_amt (cp_amt, dd_amt,coin_amt) state proc_id claim_type data cube

  11. Improvement? • All transactions fit into a single table • Every report is a slight modification of every other report • select new dimensions • select new measures • Problems: • Fact table can get very large (terabytes) • New business decisions lead to new dimensions so table has to be restructured • Smaller cubes are better for reporting since more efficient.

  12. Exercise: • Suppose that the Library database is used in several branch libraries of a large system. • Suppose that when a book is returned, a permanent record of the loan is put into a central fact table in a central data warehouse. • What kind of data might be put in the new fact table? What are the dimensions? Any measures?

  13. Library OLAP Schema: Cardholder LibraryBranch Library_ID Borrower_ID Accession_No isbn loan_date return_date return_code days_on_loan Book Copy

More Related