150 likes | 294 Views
Chapter 8: The Physical Data Warehouse. Data Warehousing Lab. 윤 혜 정. 목차. The VLDB Supporting a Star Schema Parallelism and Performance of Physical Data Warehouse Tablespace Segregation Other Physical Considerations / Oracle Internet File System Physical Implementation Approach. The VLDB.
E N D
Chapter 8:The Physical Data Warehouse Data Warehousing Lab. 윤 혜 정
목차 • The VLDB • Supporting a Star Schema • Parallelism and Performance of Physical Data Warehouse • Tablespace Segregation • Other Physical Considerations / Oracle Internet File System • Physical Implementation Approach
The VLDB • What is VLDB? • -managing a 30GB database • -(10 terabyte database and the 30GB is not a true VLDB) • Window of Opportunity • -The amount of time in a 24-hour period that the database is quite and within nightly management task can be applied. • -VLDB define : • cannot be managed during the nightly window of opportunity • Size requires to be into broken • Implementing a VLDB
The VLDB • Tables and Partitions • Partitioning breaks up a single table into smaller pieces. • 3 method to partition table 1. Range Partitioning 2. Hash Partitioning 3. Composite Partitioning / Subpartitioning • Nested Table • A table within table • provide a way to simplify the table structure. • Transportable Tablespaces * move complete tables from one database to another • Source and target database are • Same block size, NLS character, hardware platform • Source tablespace is unique • Transportable tablespace must be self-contained.
Supporting a Star Schema • Systematic Denormalization • violate the rule that relational database architect. • enhance the performance of the warehouse by reducing the need to understand every possible join path. • process of compressing lots of tables into fewer tables • Star schema Implementation • Materialized View -Actually store the data not just the SQL definition of the data -Together with Dimensions Oracle8i has powerful aggregate navigation functions.
Supporting a Star Schema Define and populate materialized View Create materialized view revenue_summary build immediate refresh compete as select r.product_id, r.location_id, sum(r.revenue_dollars) sum_revenue_dollars from revenue r group by r.product_id, r.location_id; REVENUE table is commited, the materialized view is also update Materialized View log Create materialized view log on revenue with rowid (product_id, location_id, revenue_dollars) including new values Materialized view log should be create before the Materialized view itself. DBMS_OLAP execute dbms_olap.recommend_mv(null, 4096000, null, 0)
Supporting a Star Schema • Dimension -used to define a hierarchy of data -database what columns are subsets of other columns Create dimension time_dimension level day is time.day level month is time.month level quarter is time.quarter level year is time.year hierarchy calendar_year ( day child of month child of quarter child of year); Day level Create materialized view revenue_daily_summary build immediate refresh complete as select r.product_id, r.day, sum(r.revenue_dollars) from revenue r group by r.product_id, r.day Month level Create materialized view revenue_monthly_summary build immediate refresh complete as select r.product_id, r.month, sum(r.revenue_dollars) from revenue r group by r.product_id, r.month
Supporting a Star Schema Cost-based Optimizer (비용 기준 옵티마이저) 처리 방법들에 대한 비용을 산정해 보고 그 중에서 가장 적은 비용이 들어가는 처리방법을 선택한다. Gathering Statistics for Cost-based Optimizer • Measurement of the expected resource • Data distribution and storage characteristics influence the choice the optimizer makes. The SQL ANALYZE command • Tables, partitions, subpartitions, indexes, and clusters • Estimate or compute • Estimate table and compute index • Estimate • Random sample of rows • Significantly less time-consuming • Less restrictive lock
Supporting a Star Schema • How to Analyze Object • Using SQL*Plus • Using a procedure Execute dbms_utility.analyze_schema(‘CVAN’,’ESTIMATE’, NULL, 20) • When to Analyze Object • Static objects and Dynamic objects • Where to view Statistics • Stale Statistics • Consistency with Statistic Collection Select table_name, num_rows From user_table Where num_rows is not null Analyze result Table_name num_rows Sale 19009 ITEM 2199 Sql>select count(*) from sale COUNT(*) 894981 Sql>select count(*) from item COUNT(*) 53219
Parallelism and Performance of Physical Data Warehouse • MMP(Massively parallel processor) & SMP(Symmetric parallel processor) • Parallel Query - be controlled by a dispatcher • Parallelism and the Warehouse Oracle8i can parallelize more than 20 operations • Table Scan • Not in • Group by • Select distinct • Aggregation • Order by • Create table as select • Index maintenance • Inserting rows from other tables • Star optimization
Parallelism and Performance of Physical Data Warehouse • Degree of Parallelism - number of query process associated with a single operation • At the Statement level using Hint (Hint used to influence the way the optimizer processes queries) • At the Object level the best place to define degree of parallelism. (example : create table statement includes a parallel (degree n)
Parallelism and Performance of Physical Data Warehouse • Choosing a Degree of Parallelism -defined for the objects and the number of query server processes the instance has available. • Turning on Parallel Query at the Instance Level • PARALLEL_MIN_SERVERS • PARALLEL_MAX_SERVERS • PARALLEL_SERVER_IDLE_TIME If number of CPUs =1 then process query serially Else if number of CPUs >> disk drives upon which object(s) stored then first degree = number of disk drives as degree Else first degree = number of CPUs End if
Tablespace Segregation • System Support Tablespaces • -SYSTEM, ROLLBACK, TEMPORARY, TOOLS, USERS • Application Tablespaces • Estimate the space required • Create Oracle accounts • Separate the data and index containers • Pointing Users at Tablespaces
Other Physical Considerations/ Oracle Internet File System • National Language Support • NLS parameters : reads an initialization parameter file(init.ora) • NLS_DATE_FORMAT • NLS_TERRITORY • Oracle Internet File System To manage unstructed data within the Oracle8i kernel Rolf GrỐenveld’s Rolf Groenveld’s
Physical Implementation Approach Step 1 : Plan the Database Instance language, space planning and allocation, Security, Performance structures, Object allocation(table, index, partition) Step 2 : Perform the Installation Step 3 : Test the Installation • The cost-based optimizer • Parallel processing • Indexing schemes • The Dimensions and materialized views