Informix Formation Chetana Mehta (chetana@pspl.co.in) PSPL, Pune
Outline • Overview of Formation • PSPL’s role • Future work
Data Warehouse Architecture (diagram) • Sources: Relational Databases, ERP Systems, Purchased Data, Legacy Data • Extraction / Cleansing • Optimized Loader • Data Warehouse Engine • Analyze / Query • Metadata Repository
What is ETL? • Extract data from existing operational and legacy sources, transform it, and load it into the warehouse. • Issues: • Sources of data for the warehouse • Data quality at the sources • Merging different data sources • Data transformation • How to propagate updates (on the sources) to the warehouse • Terabytes of data to be loaded
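The extract, transform, load flow described above can be sketched as a chain of Python generators. This is only an illustration, not Formation's implementation: the function names, the `customer_id` and `name` fields, and the in-memory list standing in for a legacy source are all assumptions made for the example.

```python
from typing import Iterable, Iterator


def extract(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield raw records from a source (an in-memory list stands in for a legacy system)."""
    for row in rows:
        yield dict(row)  # copy so the source is never mutated


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Cleanse records: drop rows without a customer id and normalize the name."""
    for row in rows:
        if row.get("customer_id") is None:
            continue  # data-quality rule (hypothetical): reject rows with no key
        row["name"] = row["name"].strip().title()
        yield row


def load(rows: Iterable[dict], warehouse: list) -> int:
    """Append records to the warehouse table and return how many were loaded."""
    loaded_count = 0
    for row in rows:
        warehouse.append(row)
        loaded_count += 1
    return loaded_count


legacy_source = [
    {"customer_id": 1, "name": "  alice SMITH "},
    {"customer_id": None, "name": "unknown"},
    {"customer_id": 2, "name": "bob jones"},
]
warehouse: list = []
loaded = load(transform(extract(legacy_source)), warehouse)
```

Because each stage is a generator, records stream through one at a time, which matters when the input is far larger than memory.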
Overview of Formation • ETL Tool • User-friendly • Scalable
Operators • Join - Hash, Non-equi, Nested loop, Sort-merge • Aggregate/GroupBy • Sort • Deduplicate • Surrogate Key
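To make three of the operators listed above concrete, here is a minimal sketch of a hash join, a deduplicate step, and surrogate key assignment. The slides do not show how Formation implements them, so the function names, the `sk` column, and the sample records are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import count


def hash_join(build_rows, probe_rows, key):
    """Equi-join: build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for build_row in build_rows:
        table[build_row[key]].append(build_row)
    for probe_row in probe_rows:
        for build_row in table.get(probe_row[key], []):
            yield {**build_row, **probe_row}


def deduplicate(rows, key):
    """Keep only the first record seen for each key value."""
    seen = set()
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            yield row


def assign_surrogate_keys(rows, natural_key, start=1):
    """Attach an integer surrogate key ('sk', a hypothetical column name) per natural key."""
    next_id = count(start)
    mapping = {}
    for row in rows:
        natural_value = row[natural_key]
        if natural_value not in mapping:
            mapping[natural_value] = next(next_id)
        yield {**row, "sk": mapping[natural_value]}


customers = [{"cust": "C1", "city": "Pune"}, {"cust": "C2", "city": "Mumbai"}]
orders = [
    {"order": "O1", "cust": "C1"},
    {"order": "O2", "cust": "C2"},
    {"order": "O1", "cust": "C1"},  # duplicate of the first order
    {"order": "O3", "cust": "C9"},  # no matching customer
]

deduped = list(deduplicate(orders, "order"))
joined = list(hash_join(customers, deduped, "cust"))
keyed = list(assign_surrogate_keys(joined, "cust"))
```

The hash join builds its table on the smaller input (`customers` here) and streams the larger one past it. This version assumes the build side fits in memory.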
Performance Subsystem • Periodic statistics • Summary statistics • Operator summary • Group summary • Performance hints
Periodic Statistics • No. of records pushed/pulled • Memory used • Disk reads/writes • Temporary space used
Summary Statistics • No. of records pulled/pushed • Record size • Time when first/last record sent/received • No. of unique keys/groups • Ratio of output size to input size • Selectivity
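The summary statistics above could be gathered by wrapping an operator so that it counts what it pulls and pushes. The sketch below does this for a simple filter. The definitions are assumptions, not Formation's: selectivity is taken as records pushed divided by records pulled, the output-to-input ratio is computed over an approximate record size (`len(repr(row))`), and all class and field names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OperatorSummary:
    records_pulled: int = 0
    records_pushed: int = 0
    bytes_pulled: int = 0
    bytes_pushed: int = 0
    first_pulled_at: Optional[float] = None
    last_pushed_at: Optional[float] = None
    unique_keys: set = field(default_factory=set)

    @property
    def selectivity(self) -> float:
        """Fraction of input records that reach the output (assumed definition)."""
        return self.records_pushed / self.records_pulled if self.records_pulled else 0.0

    @property
    def size_ratio(self) -> float:
        """Ratio of output size to input size, using the approximate record size."""
        return self.bytes_pushed / self.bytes_pulled if self.bytes_pulled else 0.0


def run_filter(rows, predicate, key, summary, size_of=lambda row: len(repr(row))):
    """Filter rows while updating the summary statistics for this operator."""
    for row in rows:
        if summary.first_pulled_at is None:
            summary.first_pulled_at = time.monotonic()
        summary.records_pulled += 1
        summary.bytes_pulled += size_of(row)
        summary.unique_keys.add(row[key])
        if predicate(row):
            summary.records_pushed += 1
            summary.bytes_pushed += size_of(row)
            summary.last_pushed_at = time.monotonic()
            yield row


sales = [
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 70},
    {"region": "north", "amount": 90},
]
summary = OperatorSummary()
kept = list(run_filter(sales, lambda r: r["amount"] >= 50, "region", summary))
```

After the run, `summary.selectivity` is 0.75 (three of four records pass) and `len(summary.unique_keys)` is 3.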
Performance Hints • Ideal memory size • Suggested memory size • Parallelizing
Future work • Memory cognizant optimization • Parametric query optimization • Operator ordering • XML extensions