490 likes | 575 Views
Big Data Issues & Introduction to Business Intelligence Dr. Chang Liu. My Story …. Chair and Professor of MIS at the OM&IS Department, College of Business, Northern Illinois University.
E N D
Big Data Issues & Introduction to Business Intelligence Dr. Chang Liu
My Story … • Chair and Professor of MIS at the OM&IS Department, College of Business, Northern Illinois University. • Taught Database Management, Web Computing, Business Information Technologies, and Business Intelligence Applications courses • Received MIS Undergraduate Teaching Award in 2005. • Received MIS Graduate Teaching Awards in 2001, 2002, and 2006. • Recognized as top 20 researchers by Information & Management in 2007 • http://www.niutoday.info/2013/02/15/omis-chair-earns-2012-citation-of-excellence/ • Born and raised in Beijing, China • PH.D. from Mississippi State University in 1997: Doctor of Business Administration in Management Information Systems (MIS) • Taught at Beijing Institute of Business from 1988 to 1992 • Served as a Project Manager for Motorola Inc. Beijing office from 1992 to 1994 • Interest: • Ping Pong, Badminton
Department of Operations Management & Information Systems OM & IS http://cob.niu.edu/omis + Information Technology Business Processes
OM&IS Programs • B.S. in Operations and Information Management • M.S. in Management Information Systems • Certificate Programs • Certificate of Graduate Study in Business Analytics Using SAP Software • Certificate of Undergraduate Study in Business Analytics Using SAP Software • Certificate of Graduate Study in Management Information Systems • Certificate of Undergraduate Study in Information Systems • Certificate of Undergraduate Study in Service Management • SAP Student Recognition Award
OM&IS Department 11 tenured & tenure-track faculty 7 full-time/part-time instructors 1 program advisor & internship coordinator 1 department secretary 200 Undergraduate OM&IS majors 95 MIS master students
Top Ranked Programs In 2013, BusinessweekRanked NIU College of Business’s Information Systems Program #34 and Operations Management Program #60 nationwide.
Introductions Course Syllabus Name Background Hobbies
Big Data Issue Big Data Big Opportunities Big data is a term that describes the exponential growth and availability of data. It is estimated that by the year of 2020 the digital universe will have grown 44 times its size in 2009. Businesses must employ a data management, analysis, and security plan in order to stay compliant and competitive.
Business & Technology Priorities in 2013 Source: Gartner EXP (January 2013)
Worldwide BI, Analytics and Performance Management Revenue Estimates for 2011 (Millions of U.S Dollars) Source: Gartner(March2012), http://www.gartner.com/it/page.jsp?id=1971516
How Come It Takes Me So Long to Get Answers to Simple Questions About My Business? Technologies for Business Intelligence
Why BI / What’s the problem? • Businesses (people, really) can’t get answers efficiently. 32%!!!
An example – Let’s start small ?? Two spreadsheets. One has student name, znumber (student id) and major, the other has student name, znumber and quiz score.
What’s “the answer”? • Database approach • Centralized, any-time, any-place data in a data warehouse.
And the biggest? The “HI-TECH Act” The purpose of the HITECH Act is to promote the use of health information technology with a goal of utilization of an electronic health record for each person in the United States by 2014.
What can YOU do?(technology-wise) Step 1: Get familiar with Microsoft Access (or any Relational Database Management System, RDBMS). Step 2: Make Access centrally accessible to your employees. Step 3: Building a company-wide data warehouse.
What can YOU do?(process-wise) Step 1: Identify where your corporate information comes from. Step 2: For things spreadsheet-based, consider moving that data out of spreadsheets.
Some Definitions -1 “Data” is characters, fields, and files that are stored somewhere. “Information” is data with meaning and context. It is an organizational asset. A database is a collection of related data. A relational database is data stored in a table format. The most common use of a database is an “ad hoc” query. For example, “How many cases of bottled water did we sell to college students in September vs. August?”
Application #1 Application #2 Application #3 DBMS Database containing centralized shared data Relationships of Users, Database Applications, DBMS, and Database
Hierarchy of Database Elements Fields Records Tables(files) Database + Metadata
Class Exercise – Database Structure • You have been hired as the Personnel Director for a medium-sized firm (500 employees) and are expected to implement a database system to track employee compensation. You want to be able to calculate the age of every employee as well as length of service. You want to know each employee’s most recent performance evaluation. You want to be able to calculate the amount of the most recent salary increase, both in dollars and as a percentage of the previous salary. You also want to know how long the employee had to wait for that increase --- that is, how much time elapsed between the present and previous salary. Design a database table capable of providing this information.
Q. How can you make it work?A. Centralized database, allowing for BI and mining. Server - responds to client requests DBMS - the program. Manages interaction with databases. request response Client - makes requests of the DBMS server database - the collection of data. Created and defined to meet the needs of the organization. • Database Management System (DBMS) • a program for creating & managing databases; ex. Oracle, MS-Access, SQL Server, Sybase. • Basically synonymous with “database” at this point.
What’s the File Processing The “old” way of doing things; still often used in practice. Separate information stored on separate files.
Problems with File-Processing Systems • Data are separated and isolated • Data are often duplicated • Application program dependent • Incompatible data files • Difficult to understand • Create problems with data integrity because data is: Duplicated & Inconsistent
Duplicate Data File System: An Example
Benefits of Database Systems • Data is integrated • Data duplication is reduced • Data is consistence • Data is program independent • Data is easy to understand • Data is:Shared & Integrated
Class Exercise – Database Relationship How to create two tables in a database, one for Customer, the other for Order information? How to build a relationship between the two tables in a database?
Some Definitions - 2 A relational database has numerous tables (like spreadsheets) which are tied together by common fields. Primary key: A field that uniquely identifies each record in a table. Common Field (Foreign Key): A field that appears as a non primary key (field) in one table and as a primary key in another table.
Database Management? Ugh, just give me a spreadsheet • There are a number of ‘versions’ of the spreadsheet system, all with common features … • Lack of data definition/documentation • Lack of ownership • Lack of support • Lack of change control
But Excel can be a good front end tool: Pivot Tables & PowerPivot Pivot Tables can be used for data analysis and presentation as a front end tool if all of your data in one spreadsheet. Demo Exercise Assignments
Business Intelligence (BI) Definition • BI is a set of technologies for tuning raw data into actionable information • Leads to better decisions that are in line with business goals and objectives • Helps organizations operate more efficiently • Can lead to the discovery of new opportunities • The BI market is growing rapidly. • BI gives decision makers and operational staff access to ORGANIZATIONAL data • Allows them to interact with the data • Analyzes it • Uses the data to perform forecasts, etc.
Data Warehouse • A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes • Subject-oriented: e.g. customers, patients, students, products • Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources • Time-variant: can study trends and changes • Non-updatable: read-only, periodically refreshed
Need for Data Warehousing • Integrated, company-wide view of high-quality information (from disparate databases) • Separation of operational and informational systems and data (for improved performance) • Operational system – a system that is used to run a business in real time, based on current data; also called a system of record • Informational system – a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications
Data Warehouse Architectures • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and Near Real-Time Data Warehouse • Three-Layer architecture All involve some form of extract, transform and load(ETL) Data Mart: A data warehouse that is limited in scope to support a single business function or process.
Data marts: Mini-warehouses, limited in scope L T E Separate ETL for each independent data mart Data access complexity due to multiple data marts Figure 1: Independent data mart data warehousing architecture
Figure 2: Dependent data mart with operational data store:a three-level architecture ODS provides option for obtaining current data L T E Simpler data access Dependent data marts loaded from EDW Single ETL for enterprise data warehouse (EDW)
Derived Data • Objectives • Ease of use for decision support applications • Fast response to predefined user queries • Customized data for particular target audiences • Ad-hoc query support • Data mining capabilities • Star Schema – A simple database design in which dimensional data are separated from fact data Most common data model = star schema (also called “dimensional model”)
How the dimensional model can solve the problem of analyzing data? • A retailer, John Doe, sells products to customers over a period of time • Here are some questions John needs to ask to analyze his business: • How many units of products did I sell altogether? • Which products sold the greatest number of units? • How did sales in Week 1 compare with sales in Week 2? • How did sales perform by street? • Who are my top two customers? • How do sales split by customer gender and by street? • To provide a solution for John, most people think SQL is the answer – write different SQL for each question • Slow, more joins, more loads on server
Dimensional Model (Cube) to the Rescue • John needs to have BI tool (such as SQL Server Analysis Services or SAP Business Information Warehouse) to create a cube for him • The cube stores the numeric answers for all combinations of product, store, customer, and time • The numbers can easily be analyzed by customer, by store, and by product. Customer, Time, Store, and Product are the cube’s dimensions • John does not need to run database queries as the answers are pre-calculated • All he needs is a nice client interface; e.g., Microsoft Excel
Figure 4: Components of a star schema Fact tables contain factual or quantitative data 1:N relationship between dimension tables and fact tables Dimension tables are denormalized to maximize performance Dimension tables contain descriptions about the subjects of the business
Figure 5: Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions
Figure 6: Star schema with sample data Cubes are intuitive, which makes them easy for non-technical users
Different Names, Same Ideas • Data Mining - uncovers important patterns in existing data to support decision making. • Online Analytical Processing (OLAP) –User-driven discovery with multidimensional views of their data • Data Warehousing
Example: Sales Data in DB Question 1: How a star schema can be designed to represent data in a cube? Question 2: Which salespeople do better with certain customers?