1 / 28

Data Administration Data Warehouse Implementation 5/26/04

Data Administration Data Warehouse Implementation 5/26/04. DW Current Resources. Servers – IBM RS6000 S7A with 4 engines, 4GB memory, 200GB disk space, running AIX 4.3.0

duell
Download Presentation

Data Administration Data Warehouse Implementation 5/26/04

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data AdministrationData Warehouse Implementation5/26/04

  2. DW Current Resources • Servers – • IBM RS6000 S7A with 4 engines, 4GB memory, 200GB disk space, running AIX 4.3.0 • Dell PowerEdge 2550 with two 1.4 GHZ processors, 4GB of RAM, four 36GB disk drives and running Windows 2000 server operating system. • Data –100GB disk space used to house FRS, FES, HRS and SIS Oracle extracts. This is essentially a “data dump” of the IDMS data, and additional “views” still need to be created. • Staff – • DBA - Henry 80%, Song 20%, Chen 10% • DA - Kisil 70%, Cheesman 50%, Claunch 50%, Carter 100%

  3. DW Tasks • DBA – Extract data, design Oracle DB, load data, and production support (i.e. monitor system and DB performance, enforce security, schedule backups, etc.) • Data Administration – User interface, develop requirements document for all DW projects, evaluate data quality, create DB views, develop specialized reports, test, train users, and coordinate projects. • Both – Infrastructure design (with Systems staff), and tool evaluation (ETL, OLAP and desktop reporting) with help from the C/S group.

  4. DW Terms • Source Data: Operationaldata from internal systems, such as IDMS (FES, FRS, HRS, SIS), Oracle, etc. •  External Data: Data from systems external to the University, such as economic and census data collected by the government. • Data Staging Area: Storage and processing area fordata extracted from the internal and external systems prior to loading into the Warehouse, Data Marts or Ad Hoc Query Repository. Some of the data will remain un-cleansed and an exact replica of the data in the online systems, for subsequent loading into the Ad Hoc Query Repository. Other data will be cleansed and transformed before being moved to the Data Warehouse and Data Marts for analysis. Some data will be located in multiple places and in multiple forms and aggregations. • Metadata: A term used for data that describes or specifies other data. It is used to define all of the characteristics of data required to build databases and applications, and to support knowledge workers and information producers. This includes information currently in the Data Inventory (the element name, meaning, format, domain values), and additional information such as business integrity rules, relationships, owner, etc.

  5. DW Terms • Ad Hoc Query Repository:A collection of enterprise data from multiple sources, used to do ad hoc and operational reporting where the need to use the most current and un-standardized source data is a requirement. The Repository will typically contain only one or two years of the most recent data, unless regulatory or statutory requirements dictate otherwise. (Also known as an Operational Data Store or ODS.) • Data Warehouse: An enterprise-wide, cross-functional, cross-organizational database typically comprised of data extracted, cleansed and/or summarized from multiple online transaction processing systems, and other stores of data (Purdue University; Stanford University). It is designed for query and analysis, typically contains historical data, and is used to present information to support decision-making, tactical and strategic business processes.A data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. In general, a data warehouse tends to be a strategic, but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)

  6. DW Terms • Data Mart: A subset of enterprise data from the Data Warehouse that is summarized and stored in an optimal fashion for analysis and presentation of information to support trend analysis and tactical decisions and processes. Data Marts are typicallydesigned based on an analysis of user needs to answer specific questions in the pursuit of specific goals. The scope can be that of a complete data subject such as Student, or of a particular business area or line of business, such as Enrollment. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.) • On-Line Analytical Processing (OLAP): A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server. (http://www.moulton.com/olap/olap.glossary.html) Functionality includes multi-dimensional analysis, slicing, drill-down and rotation.

  7. DW Terms (Continued) • Data Mining: A class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. (http://www.webopedia.com/TERM/d/data_mining.html)

  8. DW Terms (Continued) • Executive Information System (EIS): An application developed to provide senior management direct access to information relevant to an organization’s goals and performance. These applications are developed to gather, analyze and integrate internal and external data to provide management with insight into key performance indicators, potential problems, and changes in the environment. Typical features include extensive use of graphics, simple navigational controls, automatic replacement of report contents, drill-down analysis, trend analysis capabilities, exception reporting or alerts, graphical charts with links to underlying reports, provision of data from multiple sources, and the highlighting of information an executive feels is critical. (The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al.)

  9. How do we get there? • Educate users. • Develop detailed requirements documents, including “Information Value Chains” for all goals which the DW/DM is expected to address. • Data Mart approach. • Phased implementation. • Additional resources.

  10. Educate Users • Basics – “What is a Data Warehouse?” Create a “single-source-of-truth.” “What it’s not!” (It is not all the data, with daily updates and online storage.) • Change in culture – “Let’s make better decisions based on objective analysis of data.” • Set realistic expectations - No silver bullet. It can help you make better decisions, but you still have to be responsible for implementing those decisions. • Focus on institutional goals – “What is it we need to achieve? What metrics do we need to evaluate our progress in attaining goals?” • Importance of business sponsors – Make timely business decisions and support IT requests for additional funds.

  11. Course Management (I.V.C.)

  12. Enrollment Management (I.V.C.)

  13. Course Management DM – Subject Areas • Accounts (dollars for faculty and supplies) • Assets (facilities and equipment) • Benchmark Data (what and when courses offered, degree requirements) • Courses (meeting pattern, teaching mode, requirements) • Department/College (who owns the course, degree offered and requirements) • Economic Data (salary and demand by profession) • Faculty (distribution of effort, availability to teach) • Faculty Applicants (who and how many) • Population Data (who, where and what volume) • Staff (instructional support) • Students (classification, course demand)

  14. Enrollment Management DM – Subject Areas • Accounts (dollars for services, financial aid) • Assets (facilities and equipment, housing) • Benchmark Data (enrollment patterns) • Courses (meeting pattern, teaching mode, requirements, professor) • Department/College (who owns the course, degree requirements, degrees offered) • Economic Data (general condition of economy, hot professions) • Faculty (quality, student evaluations) • Population Data (who, where and what volume) • Staff (instructional support, student service area) • Students (demographics, course demand, GPA) • Student Prospects (demographics, contacts) • Student Applicants (demographics, contacts)

  15. Project SchedulePhase I, 2001-02 Infrastructure & PlanningPhase II, 2002-03 Creation of Data Mart Phase III, 2003-04 Creation of Additional Data Marts

  16. Phase I – Infrastructure and Planning (2001-2002) • IDMS Data Dump to Oracle – 100% complete. • WebFOCUS Implementation – Completion 12/02. • Data Mining Tools for IR staff – Purchased in 2002.Implementation in progress. Target installation completion 3/03. • Create Views for “Data Dump” (Ad Hoc Reporting Repository) -Target completion 4Q 02-03. • Establish Enterprise Standards for Key Data– Analysis and recommendations are ongoing. • Identify and Prioritize Data Mart Development – Course Management Data Mart top priority for Data Stewards.

  17. Phase I – Infrastructure and Planning (2001-2002) (Continued) • GASB – Phase I completed 8/02. • CPE – Six years of data loaded in Oracle. • Review Desktop Reporting Tools– Ongoing review and testing of: • Brio • Crystal Reports • SAS • WebFOCUS

  18. Phase II – Creation of Data Mart (2002-2003) 1) Select and Purchase ETL Tools– Choices: • Ascential • IBM • Informatica • Oracle • SAS • Course ManagementDM – Requirements phase. Target completion 4/03. • Phase II of GASB. Target completion 7/03.

  19. Phase III – Creation of Additional Data Marts (2003-2004) • Create Metadata – Dependent on ETL implementation. • Data Marts – Complete Course Management DM by 7/1/04, and complete requirements for Resource Management DM. (Others to consider: Enrollment Mgt., Department Mgt., Diversity & Equity Initiatives, Grant Management, and Research Management.). • Evaluate External Data Needs - Identify external data needed for the Data Marts, locate source(s), estimate cost and request funds for 04-05.

  20. Phase ?? • Create Remaining Data Marts. • Develop OLAP applications. • Develop EIS. • Purchase and Load External Data.

  21. New DWE Resources • The existing server will need to be upgraded, and additional servers will need to be purchased for the Ad Hoc Reporting Repository, the Data Staging Area, the Data Mining and/or OLAP tools, and for one or more of the Data Marts. Size, type and cost will vary depending on function. • Storage – Additional disk space for DW server, and for other servers using Enterprise Storage System. Exact amounts for online storage and for archiving will be identified during the project requirements phase.

  22. New Resources (Cont.) • Training – • Infrastructure – “How do you design a DW environment? • Staff training on DW • Oracle (or other) DB training • Tools – Analysis and reporting tools. • Onsite visits to other universities. • Consulting on Infrastructure Design – Covansys, Gavroshe, Oracle, IBM, etc. • Staff – Additional staff required for database administration, institutional reporting, EIS development, ETL use, data modeling, etc.

  23. DWE Critical Issues • Personnel Resources –Need to keep people focused on DW in light of competing projects. (Given the current budget situation, this will be difficult.) • Training –IT and users. • Consulting – Assistance for infrastructure planning. • Culture Change – Focus on goals rather than data elements. Tell us what, not how! Value in making decision based on data. Proof of concept. • Requirements Gathering –Needs to be thorough, and heavily relies on timely user decisions. Defining who the “users” are on any given project is critical. • Budget/Funding – For additional hardware, software and people. • Business Sponsor – The Data Warehouse is not another IT project. It is an enterprise initiative!

  24. Data AdministrationQUESTIONS?

More Related