140 likes | 371 Views
Advanced Topics: Business Intelligence Features. Data Warehousing, Materialized View, Partitioning, SQL Analytics. Data Warehousing. Purpose: Query and Analysis of Data Usually contains historical data derived from transaction data (OnLine Trans. Processing or OLTP)
E N D
Advanced Topics:Business Intelligence Features Data Warehousing, Materialized View, Partitioning, SQL Analytics
Data Warehousing • Purpose: Query and Analysis of Data • Usually contains historical data derived from transaction data (OnLine Trans. Processing or OLTP) • OLAP (OnLine Analytical Processing) • Enable Fast execution of ad-hoc queries
Data Warehouse Architecture Data Mart: small, specialized DW for a dept or so DW Purpose: Enable analysis using Unified Data
OLAP example • Assume one (real-valued) measure value eg Sales Amount and several finite dimensional attributes e.g. Item, Place, Month. • Example fact: “Iced Tea was sold in Auckland in January”. Measure: $20k • Maybe no fact about “Iced Tea in Auckland in August”. • Mapping Item×Place×Time Sales Amount called a cube. (Think “array”.) • Answer queries based on dimensions time, place group by month, country or year, state, etc.) • All possible combinations need to be fast.
Logical Design: Star Schema Central “Fact” table storing “measure” attributes such as “sales” Dimension tables store information about the dimension attributes Queries: Find “sales” by product, region, customers, time (year),..
Dimensions • Hierarchy of attributes • Analysis based on any level in hierarchy • sales on region, subregion, country_name,…
Logical Design: Snowflake Schema • Snowflake: complex improvisation of star. • Products: divided into categories and suppliers (more normalized) • Queries: will be slower
Physical Design Tables: Can be regular tables or Partitioned Tables Indexes: Specialized indexes like bitmap Additional Objects: Materialized views, Dimensions
Partitioned Tables • Behave like regular tables • Query on entire table or on a specific partition • Parallelism on multiple partitions • Can load/modify multiple partitions concurrently
Query & Analysis Functions • All SQL analysis functions • SELECT SUM(amout_sold), p.pname FROM sales s, products p, times t WHERE s.product_id = p.product_id GROUP BY p.pname;
NUMERICABSACOSASINATANATAN2BITANDCEILCOSCOSHEXPFLOORLNLOGMODNANVLPOWERREMAINDERROUND (number)SIGNSINSINHSQRTTANTANHTRUNC (number)WIDTH_BUCKET CHARACATERCHRCONCATINITCAPLOWERLPADLTRIMNLS_INITCAPNLS_LOWERNLSSORTNLS_UPPERREGEXP_REPLACEREGEXP_SUBSTRREPLACERPADRTRIMSOUNDEXSUBSTRTRANSLATETREATTRIMUPPER More SQL Analytic Functions Conversion Functions ASCIISTRCASTCHARTOROWIDCOMPOSECONVERTDECOMPOSEHEXTORAWNUMTODSINTERVALNUMTOYMINTERVALRAWTOHEXRAWTONHEXROWIDTOCHARROWIDTONCHARSCN_TO_TIMESTAMPTIMESTAMP_TO_SCNTO_BINARY_DOUBLETO_BINARY_FLOATTO_CHAR (character)TO_CHAR (datetime)TO_CHAR (number)TO_CLOBTO_DATETO_DSINTERVALTO_LOBTO_MULTI_BYTETO_NCHAR (character)TO_NCHAR (datetime)TO_NCHAR (number)TO_NCLOBTO_NUMBER,… • DATEADD_MONTHSCURRENT_DATECURRENT_TIMESTAMPDBTIMEZONEEXTRACT (datetime)FROM_TZLAST_DAYLOCALTIMESTAMPMONTHS_BETWEENNEW_TIMENEXT_DAYNUMTODSINTERVALNUMTOYMINTERVALROUND (date)SESSIONTIMEZONESYS_EXTRACT_UTCSYSDATESYSTIMESTAMPTO_CHAR (datetime)TO_TIMESTAMPTO_TIMESTAMP_TZTO_DSINTERVALTO_YMINTERVALTRUNC (date)TZ_OFFSET
Analytic Functions • AVG, CORR, COVAR_SAMP, COVAR_POP, COUNT, CUME_DIST, DENSE_RANK, FIRST, LAST, LAG, LEAD, MAX, MIN, PERCENTILE_RANK, RANK, ROW_NUMBER, STDDEV, SUM, VARIANCE, …
Materialized Views • Queries joining fact and dimension tables • Precompute the join and store as Materialized Views • Denormalized and so not in 3NF • Queries are automatically ‘rewritten’ using Materialized view • Materialized views • Facilitate fast execution of queries
Summary • Fast Query and Analysis • Denormalization into Materialized Views • Queries implicitly ‘rewritten’ under the covers • Research issues: • What queries can be rewritten using the MV • Pushing the updates to the MVs • Refreshing the MVs