
ISQS 6339, Data Management & Business Intelligence Introduction



Presentation Transcript


  1. ISQS 6339, Data Management & Business Intelligence: Introduction. Zhangxi Lin, Texas Tech University. ISQS 6339, Data Mgmt & BI

  2. \\TechShare\coba\d\isqs3358

  3. Outline • Big Data • Definitions of BI • Categorizations of BI • BI Trend • BI tools

  4. What is Business Intelligence • A simple definition: the applications and technologies that transform business data into action • Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations. • Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business and make better business decisions. • YouTube: • What is BI? – B, 2’ • Microsoft Business Intelligence Surface Demo, 6’34”

  5. Data, information, and knowledge • Data – a collection of raw value elements or facts used for calculating, reasoning, or measuring. • Information – the result of collecting and organizing data in a way that establishes relationships between data items, thereby providing context and meaning. • Knowledge – the understanding of information based on recognized patterns, in a way that provides insight into the information.

  6. Online Video • What is business intelligence? 10’36” • Retail and Big Data Revolution, 2’12” • Explain Big data, 8’33” • Big data terms, 31’19”

  7. Driving force - Big Data • A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. • Difficulties include capture, storage, search, sharing, analysis, and visualization. • The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data. • Big data, 8’33” Copyright 2012

  8. Zettabyte (ZB) • A quantity of information or information storage capacity equal to 10^21 bytes, or 1,000 exabytes. • As of April 2012, no storage system has achieved one zettabyte of information. • The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006. • Seagate reported selling 330 exabytes worth of hard drives during the 2011 fiscal year. • As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. That is half a zettabyte. • 1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
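
The unit arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the constant and function names are illustrative, and the quantities are the estimates quoted on the slide.

```python
# Decimal (SI) storage units, as used on the slide.
EXABYTE = 10**18
ZETTABYTE = 10**21

# 1 ZB = 1000^7 bytes = 1,000 EB
assert ZETTABYTE == 1000**7
assert ZETTABYTE == 1000 * EXABYTE


def exabytes_to_zettabytes(eb):
    """Convert a quantity in exabytes to zettabytes."""
    return eb * EXABYTE / ZETTABYTE


web_2009_zb = exabytes_to_zettabytes(500)     # estimated size of the Web, 2009
drives_2006_zb = exabytes_to_zettabytes(160)  # all hard drives worldwide, 2006
print(web_2009_zb, drives_2006_zb)
```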

  9. Data Scale

  10. Market • "Big data" has increased the demand for information management specialists: major companies have spent more than $15 billion on this. • This industry is worth more than $100 billion and is growing at almost 10% a year. • There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet. • The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007. • It is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013. Copyright 2012
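
To put the telecommunication capacity figures in perspective, the growth rate they imply can be worked out directly. The sketch below assumes steady compound growth between the 1986 and 2007 data points, which is a simplification; the function name is illustrative.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two measurements."""
    return (end_value / start_value) ** (1 / years) - 1


PETABYTE = 10**15
EXABYTE = 10**18

capacity_1986 = 281 * PETABYTE  # figure quoted on the slide
capacity_2007 = 65 * EXABYTE    # figure quoted on the slide

rate = annual_growth_rate(capacity_1986, capacity_2007, 2007 - 1986)
print(f"Implied growth: {rate:.1%} per year")
```

Under this assumption the capacity figures imply growth of roughly 30% a year, a separate measure from the industry's revenue growth of almost 10% quoted on the same slide.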

  11. Approach - Cloud Computing • Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation. • Buzzword: SaaS/IaaS/PaaS

  12. Distributed business intelligence • Deal with big data – the open & distributed approach • LAMP • Hadoop • MapReduce • HDFS • NoSQL • ZooKeeper • Storm

  13. Apache Hadoop • An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. • The Apache Hadoop framework is composed of the following modules: • Hadoop Common - contains libraries and utilities needed by other Hadoop modules. • Hadoop Distributed File System (HDFS). • Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications. • Hadoop MapReduce - a programming model for large-scale data processing. • Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
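
The MapReduce programming model named above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use any Hadoop API; the function names are illustrative. In a real cluster the map and reduce calls would run in parallel on different nodes, and the shuffle step would move data between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data about data"])
print(counts)
```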

  14. A Multi-node Hadoop Cluster

  15. ISQS 6339, Data Mgmt & BI

  16. ISQS 6339, Data Mgmt & BI

  17. Hadoop 2: Big data's big leap forward • The new Hadoop is the Apache Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed. • The biggest constraint on scale has been Hadoop’s job handling. All jobs in Hadoop are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck. • Hadoop 2 uses an entirely new job-processing framework built using two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what's happening on that node.
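
The division of labor between the two daemons can be sketched as a toy model. The classes below only borrow the daemon names for readability and are not Hadoop code or the YARN API; the slot counts, node names, and the "most free slots" placement rule are assumptions made for illustration. In real YARN the NodeManagers send heartbeats to the ResourceManager, whereas this sketch polls them for simplicity.

```python
class NodeManager:
    """Toy stand-in for the per-node daemon: reports free capacity for its node."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def heartbeat(self):
        return {"node": self.name, "free_slots": self.free_slots}


class ResourceManager:
    """Toy stand-in for the cluster-wide daemon: tracks nodes and places work."""

    def __init__(self):
        self.nodes = {}
        self.status = {}

    def register(self, node_manager):
        self.nodes[node_manager.name] = node_manager

    def collect_heartbeats(self):
        for nm in self.nodes.values():
            report = nm.heartbeat()
            self.status[report["node"]] = report["free_slots"]

    def allocate(self):
        """Place one task on the node reporting the most free slots, or return None."""
        self.collect_heartbeats()
        name = max(self.status, key=self.status.get, default=None)
        if name is None or self.status[name] == 0:
            return None
        self.nodes[name].free_slots -= 1
        return name


rm = ResourceManager()
rm.register(NodeManager("node-a", 1))
rm.register(NodeManager("node-b", 2))
placements = [rm.allocate() for _ in range(4)]
print(placements)
```

The point of the design is that scheduling decisions rest on per-node status reports rather than on a single daemon that also runs every job, which is the bottleneck the slide attributes to JobTracker.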

  18. MapReduce 2.0 – YARN (Yet Another Resource Negotiator)

  19. Introducing other projects in sensing and intelligent traffic. Big Data Matters

  20. TrafficSense – Energy Efficient Traffic with Crowdsensing • Aalto University • Meeting arranged in August 2013 to discuss China cooperation; participation TBC (with Matti Hämäläinen/ICT Alliance, and Zhangxi Lin as Chinese data analysis expert) • http://cse.aalto.fi/en/research/groups/distributed_systems/projects/trafficsense/

  21. Everyday Sensing - “Analyzing the City” of Oulu • Data fusion: sensor network data + open data sources (e.g. weather & digiroad) + social network services analysis (sentiment etc.)

  22. TrafficSense – Energy Efficient Traffic with Crowdsensing • Traffic accounts for about ¼ of people's total energy consumption. • TrafficSense studies how to save energy by advising users on better travel options based on the real-time traffic information collected from mobile devices and traffic-related information sources. • In addition, the information gathered can, in the longer term, also enable more energy-efficient transportation services and business models.

  23. Fujian Provincial Traffic Information System • Covers the whole province • Major roads, 17,873 bridges, 357 tunnels, 10,900 villages/cities • 100k+ vehicles • To be integrated with China's Beidou satellite navigation system, which follows the Global Positioning System (GPS) and the Russian GLONASS. • System scale • 160 CPUs, with 200 TB of storage • More than 100,000 vehicles in the province are in the system • Vehicle data collection is 10 GB per day, and video data is one screen every 30 seconds. • The province's budget for the system is 28.58 million yuan • 5 million yuan were spent on FJUT’s test platform • 3 million is for software, and 2 million is for hardware. • Application projects: • Vehicle route services – Beautiful tour system • Abnormal driving behavior detection and warning • Abnormal social event detection and warning • Unusual road condition detection and announcement • Traffic jam detection and evacuation guidance

  24. Example Information Sources: Taxis in Fuzhou City • This map is updated every 15 seconds.

  25. Google Route vs. FJUT’s (Fujian) System Route • The route suggested by Google is seemingly shorter, but it takes fewer factors into account. • The route suggested by FJUT is seemingly longer, but faster in practice because it takes into account local conditions and the experience of drivers.

  26. Beijing 1039 Traffic Radio • Beijing cross-Wide Media Limited, established in March 2006, is a state-owned holding enterprise under the Beijing People's Radio Broadcasting Company. Its Traffic Radio (FM 103.9 MHz) holds strong resources for media industry development and diversified operations, including: • digital broadcast media operations; • intelligent transportation technology R&D and operations; • development and application of new-media mobile terminals (new media machine); • CIF customer resource management and value-added membership card services with financial payment features (ETS card). • Annual revenue from advertising is 6 billion RMB. • It is to launch new Internet services, for example: • used car dealers and other automotive service projects; • public relations activities and cultural events centered on cars and driving; • after-sales services for car owners, etc. • Website: www.fm1039.com, with the development and operation of the "1039" series of branded products.

  27. Data Center - The Headquarters of Big Data • Case of the BaoCloud Center in Shanghai

  28. The land for the data center in Shanghai

  29. Customizable Data Center • BaoCloud data center

  30. Beijing 1039 traffic radio • Project at Fuzhou • Fujian University of Technology • CAABI • Civil Engineering • CrowdSensing projects • Aalto University, University of Oulu, Tampere University

  31. The process of BI • Data -> information -> knowledge -> actionable plans • Data -> information: the process of determining what data is to be collected and managed and in what context • Information -> knowledge: The process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining • Knowledge -> actionable plans: The most important aspect in a BI process
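
The chain from data to actionable plans can be made concrete with a toy example. Everything in the sketch below is hypothetical: the sales records, the region names, and the "steady decline" rule are invented for illustration and stand in for the far richer analytical components listed on the slide.

```python
from collections import defaultdict

# Data: raw facts (hypothetical sales records: region, month, amount).
sales = [
    ("North", 1, 120), ("North", 2, 90), ("North", 3, 60),
    ("South", 1, 80), ("South", 2, 95), ("South", 3, 110),
]

# Data -> information: organize the records so related items sit together.
by_region = defaultdict(list)
for region, month, amount in sorted(sales, key=lambda r: (r[0], r[1])):
    by_region[region].append(amount)


# Information -> knowledge: recognize a pattern (a month-over-month decline).
def is_declining(series):
    return all(later < earlier for earlier, later in zip(series, series[1:]))


declining = [region for region, series in by_region.items() if is_declining(series)]

# Knowledge -> actionable plan: turn the pattern into something a person can act on.
actions = [f"Review pricing and promotions in the {r} region" for r in declining]
print(actions)
```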

  32. Actionable Knowledge • An information asset retains its value only if the converted knowledge is actionable. • Methods are needed for extracting value from knowledge. • This is not a technical issue but an organizational one: empowered individuals in the organization are needed to take the action. • There is an issue of Return on Investment (ROI).

  33. BI Problems • Structured • Detecting Credit card fraud • Setting Loan parameters • Market segmentation/Mass customization • Deciding Marketing mix • Customer Churn • Reducing employee turnover • Improving Quality/Efficiency • … • Unstructured • Data exploration • Utilization of resources (stored knowledge) to maximum effectiveness • …

  34. BI Applications • Customer Analytics • Customer profiling • Targeted marketing • Personalization • Collaborative filtering • Customer satisfaction • Customer lifetime value • Customer loyalty • Sales Channel Analytics • Marketing • Sales performance and pipeline

  35. BI Applications (2) • Supply Chain Analytics • Supplier and vendor management • Shipping • Inventory control • Distribution analysis • Behavior Analysis • Purchasing trends • Web activity • Fraud and abuse detection • Customer attrition • Social network analysis

  36. The Evolution of Business Intelligence • 1st Generation – Traditional analytics (query and reporting) • 2nd Generation – Traditional analytics (OLAP, data warehousing) • 2.5 Generation – "New traditional" analytics • 3rd Generation – Advanced analytics • Rules, predictive analytics and real-time data mining • Stream analytics

  37. Business Intelligence Classifications • Stream Analytics*: real-time, continuous, sequential analysis (ranging from basic to advanced analytics) • 3rd-Generation BI, Advanced Analytics/Optimization: rules, predictive analytics, real-time and traditional data mining • “New Traditional” Analytics: “2.5-Gen” analytics (in-memory OLAP, search-based) • Traditional Analytics: 1st-generation analytics (query & reporting), 2nd-generation analytics (OLAP, data warehousing) • Diagram label: Legacy BI • * In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role. • Source: Bill O’Connell, IBM, Aug 2007

  38. Business Intelligence Use Cases • Real-time threshold example. Target solutions: fraud detection/risk, CRM analytic, supply chain optimization, RFID/spatial data, other high-volume • Focus on what is happening RIGHT NOW • Stream Analytics*: real-time, continuous, sequential analysis (ranging from basic to advanced analytics) • Focus on what will happen: analytic applications that apply statistical relationships in the form of RULES • Advanced Analytics/Optimization: rules, predictive analytics, real-time and traditional data mining • “New Traditional” Analytics: “2.5-Gen” analytics (in-memory OLAP, search-based) • Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed. • Focus on what did happen: turning data into information is limited by the relationships which the end-user already knows to look for. • Traditional Analytics: 1st-generation analytics (query & reporting), 2nd-generation analytics (OLAP, data warehousing) • * In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role. • Source: Bill O’Connell, IBM, Aug 2007
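
The "real-time threshold" idea behind stream analytics can be sketched as a check that runs on each value as it arrives, rather than on stored data afterwards. The example below is a minimal illustration, not the method of any particular product: the window size, the limit, and the transaction amounts are all hypothetical.

```python
from collections import deque


def threshold_alerts(stream, window=3, limit=100.0):
    """Yield (index, mean) whenever the mean of the last `window` values exceeds `limit`."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield i, mean


# Hypothetical transaction amounts arriving one at a time.
amounts = [20, 35, 40, 150, 180, 200, 30, 25]
alerts = list(threshold_alerts(amounts))
print([index for index, _ in alerts])
```

Because the function is a generator that keeps only the last few values, it can process an unbounded stream with constant memory, which is the property that separates this style of analysis from querying a warehouse after the fact.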
