320 likes | 330 Views
Understand the intricacies of enterprise data modelling for data warehousing, from 2-layered to 3-layer information architecture. Learn how to create an effective enterprise data model and tackle challenges in the process.
E N D
Lecture 4 Themes in this session • How OLAP really works • Enterprise data models for data warehousing • Metadata
From 2-layered to 3-layer information architecture • Before (2 layered architecture): • Layer 1 - real-time data - run the business • Layer 2 - derived data - manage the business • Suggested (3 layered architecture): • Layer 1 - real time data • Layer 2 - reconciled data • Layer 3 - derived data
The failings of traditional data modelling in enterprises Used only in business applications with well defined boundaries and roles. This means that: • entities are generalised only to the extent needed within the boundaries • provides no support for integrating applications • provides no support for combining data from different sources
Enterprise data modelling Modelling at the enterprise level and not the operational level aids the understanding needed for the reconciliation of operational data demands by: • showing how different data sets interrelate • showing the role of the different data sets in the business This is achieved by: • treating and modelling data entities at their most general level • making all commonalties in business data visible and usable
Aims for enterprise data modelling • Providing a single systems development base and promoting the integration of existing applications • Supporting the sharing of data between different areas of the business • Enabling effective management of data resources by providing a single set of consistent data divisions • Supporting the establishment and maintenance of a company-wide comprehensive management information system • Providing a structured methodology which allows business users to be involved in the implementation of business information strategies
The structure of the enterprise data model The enterprise data model has 5 distinct layers: • scope and architecture layer • business data classifications • generic entity relationship model • logical applications view • physical data design
The challenges of enterprise data modelling • Very wide scope, is always very complex • Demands input from all areas of the business • Is very time consuming and constitutes a moving target • Requires good management • Must be planned so as to deliver value and gain momentum • Requires access to skilled and knowledgeable business users, this is always in direst competition to operational priorities • Difficult to apply in application development situations
A strategy for enterprise data modelling • Tackle problems by breaking them down and dealing with them piece by piece • Use a layered structure with vertical subdivisions into business subject areas • Employ a stages definition approach • Employ a staged implementation approach
General steps in the creation of the enterprise data model • Obtain a unified view of the data needed to run the company • should be sufficiently generic for all the sections of the business to accept • should be sufficiently detailed to allow reasonably independent subsets to be identified as the basis for further work • All key entities that are commonly used across the enterprise must be identified with certainty • Local key attributes must be given initial definitions • Important relationships between key entities must be identified Note: for many industries pre-constructed industry models can be bought and customised
Modelling the Business Data Warehouse (reconciliation layer) • Determine the vertical segment (depth and breadth) of the subject area which the BDW data model is to cover • this should include a set of strongly interrelated entities • Choose which section of the Generic ERM to model on the basis of the business units’ needs • Develop a logical application view in which the entities in the GERM are customised in order fit the purposes of the application • Generate an optimal physical data design for the logical application view. • Note: the physical data design will nearly always differ form the actual design of the BDW because the legacy systems compromise the physical design of the BDW
Modelling the Business Information Warehouse (derivation layer) • Identify end-user groups and ascertain their intentions and requirements for the use of information • Select the relevant subset of the GERM from within the bounds of the BDW segment • Identify any isolated data needs that fall outside the bounds of the BDW • Create the Logical Application view for the BIW • Create the Physical Data Design for the BIW • Map the transformations between the reconciled and derived data models stipulated how the data will be moved from the physical sources in the BDW to the BIW (Metadata) Note: Sometimes it will be necessary to bypass the BDW and collect data directly from the real-time systems. This must be seen as a short term solution and eliminated ASAP. • Note: bypass
Retrofitting the model • Employ an approach of modifying operational applications at the same time as the BDW is evolved. • Try and steer both the operational applications and the BDW towards the optimal form dictated by the model. • The model serves as a goal towards which both the operational applications and BDW are fitted
The staged implementation of a data warehouse • Stage 1 - define the high level enterprise model • (1-3 months) • Stage 2 - model the subset intended for the BDW • (6-9 months) • Stage 3 - model the first BIW • (1-2 months) • Stage 4 release initial versions and continue with BDW/BIW evolution Note: the whole process of modelling, in parallel with an implementation program may take a year to get off the ground!
Using pilot applications for Data Warehousing In order to achieve quicker ROI a scaled down version of the BDW can be released. The size of the BDW can coincide exactly with that of the first BIW. • In order to prevent a fragmented evolution of the BDW several guidelines must be followed: • Use a pilot only once in any given business area • The structure of the BDW/BIW should not be too highly optimised for performance • A plan should for the migration to a full-blown three layer architecture should be be delivered and approved before the delivery of the pilot
Approaches to building the data warehouses and data marts with enterprise models • The top-down approach • first develop an enterprise data warehouse from the enterprise data model. Follow this up with data marts until a multi-tier architecture is obtained • The bottom-up approach • random growth of data marts, hopefully the evolution of a enterprise data warehouse on top of these after a while • The hybrid approach • develop the enterprise model first. When the model is in place begin building the enterprise warehouse and data marts in parallel
The nine key data concepts on the FSDM - A level • Involved Party ex. customer, supplier • Location ex. address • Arrangement ex. contract • Business Direction Item ex. goal, method • Classification ex. account, • Condition ex. price, interest • Product ex. article, service • Event ex. payment • Resource Item ex. price list, document
Classification hierarchies on the B-level of the FSDM Value SCHEME Value SCHEME Value GEOGRAPHIC AREA TYPE Geographic Area Country Postcode Area Time Zone LOCATION TYPE : : Location ADDRESS TYPE Address Legal Address Internal Address Postal Address : :
Generic ERD on the C-Level Geographic Area Type Geographic Area Name Geographic Area Time Zone Difference Geographic Area Geographic Area ID Type Time Zone Geographic Area Name Type Location calssifies is_classified_by includes is_classified_by is_subtype_of is_supertype_of classifies is_included_in includes is_included_in Geographic Area ID is_included_in includes classifies is_classified_by is_subtype_of is_supertype_of
What is metadata? Data about data Main functions are to give... • data definitions • the origin of data • the structure of data • rules for the selection and transfer of data • qualitative and quantitative data about data
Why is metadata needed? • Increasing functionality of data warehouses • Increasing size and complexity of data warehouses • Increasing number of varied user groups • Evolution of data warehouses and historical data analysis requirements Users and developers need a better, more standardised, way to document and communicate their knowledge of the warehouse, its rules and data sources
The metadata repository A specialised database designed to maintain metadata together with tools and interfaces which allow the company to collect and distribute the data • Is a combination of shared and local data about data • Is the vital component in a distributed metadata architecture • supports the distribution of sharable components • supports the autonomy and control of unshared local components
The life cycle of metadata • Collection • identify and capture metadata in a central repository • Maintenance • establish processes to synchronise metadata automatically with the changing data structure • Deployment • provide metadata to users in the right form and with the right tools
Focus areas for the collection of metadata • Warehouse data sources • physical data structures • business definitions of all data elements • platforms,data formats, update frequencies • Data models • the logical and physical enterprise data model • Warehouse data models • the logical and physical schemas for the data warehouse • Warehouse mappings • between warehouse and operational data structures • Warehouse usage information • who’s using the warehouse and how they’re using it • try and relate business problems and specific queries
Target groups for metadata deployment • Warehouse developers • physical structure models for data sources • target physical data structures as they evolve • evolving mapping schemas • Warehouse maintenance staff • monitor changes in the provision and utilisation environment and manage the effects of these changes on the DW • responsible for updating the metadata when the Dw architecture is affected • ensure the capability of tracing changes • End-users • aid exploration and understandability of information • validate information on the basis of source and quality • standard queries for specific business problems
Integration with data access tools 4 levels of possible integration are suggested • Side-by-side access • Use a query tool to provide context sensitive help texts • Query tools specifically suited for accessing metadata itself • Full interconnectivity between metadata tool and query tool • access to business query tools through metadata • transparent move from business query tools to metadata
Versioning of metadata and metadata maintenance • DW always contains a long history of data in order to support analysis • The time-specific context of the information has to be saved in order to explain the content • Changes in the DW demand a new version of the metadata • Parallel version management of the DW and metadata