Big Data Driven: Official Statistics

Big Data Driven: Official Statistics. Amish Patel, Big Data Leader for Government, Europe amishpat@uk.ibm.com. Agenda. Drivers for leveraging Big Data Implications of Big Data on Official Statistics Challenges & Opportunities Industrialisation and Collaborative model


Presentation Transcript


  1. Big Data Driven: Official Statistics Amish Patel, Big Data Leader for Government, Europe amishpat@uk.ibm.com

  2. Agenda
     • Drivers for leveraging Big Data
     • Implications of Big Data on Official Statistics
     • Challenges & Opportunities
     • Industrialisation and Collaborative model
     • New products and indicators

  3. Drivers for leveraging big data

  4. The Big Data Conundrum
     • The economies of deletion have changed…
     • Leading us into new opportunities and challenges
     • The percentage of available data an enterprise can analyze is decreasing proportionately to the data available to that enterprise
     • Quite simply, this means as enterprises, we are getting “more naive” about our business over time
     • Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
     [Figure: “Signals and Noise”, contrasting the data AVAILABLE to an organization with the data an organization can PROCESS]

  5. Implications Of Big Data On Official Statistics

  6. Challenges & Opportunity
     • Impact on Policy and Development issues
     • Methodological: bridging the gaps by combining multiple data sources
     • Technology (processing and storage)
     • Security/Privacy
     • Governance
     • Financial

  7. 1. Impact On Policy And Development Issues. Example: Leveraging Big Data for Currency of National Statistics

  8. 2. Methodological. Example: Bridging the gaps by combining multiple data sources

  9. 3. Technology – Processing and Storage. Example: Storage is key to your Infrastructure
     Smarter Storage: Cloud Agile, Efficient by Design, Self-Optimizing
     • Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
     • Designed for data: deliver insights in seconds through systems built to process a variety of data at scale
     • Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements

  10. Data Footprint Reduction
      • Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
        • Short patterns for frequent data
        • Longer patterns for infrequent data
        • Can achieve 40 to 80 percent reduction in storage capacity
      • Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
        • Store only one unique instance of the data
        • Redundant data replaced with pointer
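The two ideas on this slide can be illustrated with a small sketch. It is not the storage product's actual algorithm: Huffman coding is used here only as a textbook instance of "short patterns for frequent data, longer patterns for infrequent data", and the `DedupStore` class, its fixed `chunk_size`, and the use of SHA-256 digests as pointers are assumptions made for illustration.

```python
import hashlib
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return the Huffman code length (in bits) for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: depth}).
    heap = [(count, sym, {sym: 0}) for sym, count in freq.items()]
    heapq.heapify(heap)
    tie = 256  # larger than any byte value, so tie-breakers stay unique
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


class DedupStore:
    """Hypothetical store: keeps one copy per unique chunk, pointers for the rest."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes (stored once)

    def put(self, data: bytes) -> list:
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only the first instance
            pointers.append(digest)                # duplicates become pointers
        return pointers

    def get(self, pointers: list) -> bytes:
        return b"".join(self.chunks[p] for p in pointers)
```

For the input `b"aaaaaaaabbbc"`, the frequent byte `a` receives a 1-bit code while `b` and `c` receive 2-bit codes, so the payload needs 16 bits instead of 96. Storing `b"ABCDABCDABCDEFGH"` with a chunk size of 4 keeps two unique chunks and four pointers.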

  11. Storage Tiers – A trade-off between performance and cost
      Technologies allow us to place and move data to the appropriate storage tier to balance between performance and cost
      [Figure: storage tiers ordered from Faster Performance to Lower Cost]
      • Server
      • Cache, Flash and Solid-State Drives
      • Hard Disk Drives
      • Tape
      • Cloud
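A placement rule of this kind might look like the sketch below. The tier names follow the slide, but the function `choose_tier` and its age thresholds (`hot_days`, `warm_days`) are hypothetical values chosen for illustration, not figures from the presentation.

```python
def choose_tier(days_since_last_access: int,
                hot_days: int = 7,
                warm_days: int = 90) -> str:
    """Pick a storage tier from how recently the data was accessed.

    Thresholds are illustrative assumptions: recently used data goes to the
    fastest tier, rarely used data to the cheapest one.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access <= hot_days:
        return "flash_ssd"   # fastest, most expensive
    if days_since_last_access <= warm_days:
        return "hard_disk"   # middle of the trade-off
    return "tape"            # slowest, cheapest


def plan_moves(datasets: dict, current_tiers: dict) -> dict:
    """Return {dataset: target tier} for datasets whose tier should change."""
    moves = {}
    for name, age_days in datasets.items():
        target = choose_tier(age_days)
        if current_tiers.get(name) != target:
            moves[name] = target
    return moves
```

Calling `plan_moves({"census_2011": 400, "daily_prices": 1}, {"census_2011": "hard_disk", "daily_prices": "flash_ssd"})` would propose moving only `census_2011`, to tape.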

  12. 4. Security/Privacy. Need real-time data activity monitoring for security & compliance
      Components: Data Repositories (databases, warehouses, file shares, Big Data), Host-based Probes (S-TAPs), Collector Appliance
      • Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
      • Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
      • Data protection compliance automation
      Key Characteristics
      • 100% visibility including local DBA access
      • Minimal performance impact
      • Does not rely on resident logs that can easily be erased by attackers, rogue insiders
      • No environment changes
      • Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
      • Growing integration with broader security and compliance management vision
      • Single Integrated Appliance
      • Non-invasive/disruptive, cross-platform architecture
      • Dynamically scalable
      • SOD enforcement for DBA access
      • Auto discover sensitive resources and data
      • Detect or block unauthorized & suspicious activity
      • Granular, real-time policies
        • Who, what, when, how
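To make the "who, what, when, how" idea concrete, here is a minimal sketch of a policy check on a single data access event. It does not reflect any vendor's API: the `AccessEvent` and `Policy` types, the three outcomes (`allow`, `alert`, `block`), and the business-hours rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    who: str        # database user
    what: str       # object accessed, e.g. a table name
    when: datetime  # time of the access
    how: str        # client program used


@dataclass(frozen=True)
class Policy:
    sensitive_objects: frozenset
    allowed_users: frozenset
    allowed_clients: frozenset
    start_hour: int = 8   # hypothetical business hours
    end_hour: int = 18


def evaluate(event: AccessEvent, policy: Policy) -> str:
    """Return 'allow', 'alert' or 'block' for one access event."""
    if event.what not in policy.sensitive_objects:
        return "allow"                      # not a monitored object
    if event.who not in policy.allowed_users:
        return "block"                      # unauthorized user
    in_hours = policy.start_hour <= event.when.hour < policy.end_hour
    if event.how not in policy.allowed_clients or not in_hours:
        return "alert"                      # authorized but suspicious
    return "allow"
```

Under this sketch, an authorized analyst querying a sensitive table at 02:00 would raise an alert, while an unknown user touching the same table would be blocked outright.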

  13. 5. Governance. Vision for information integration & governance
      [Figure: Systems of Record and Systems of Engagement, joined by Information Integration, Governance & Context Accumulation]
      • Traditional Approach (Systems of Record): structured, analytical, logical
        • Structured, repeatable, linear
        • Platform: Data Warehouse
        • Traditional sources: Transaction Data, Internal App Data, Mainframe Data, OLTP System Data, ERP data
      • New Approach (Systems of Engagement): creative, holistic thought, intuition
        • Unstructured, exploratory, iterative
        • Platforms: Hadoop, Streams
        • New sources: Web Logs, Social Data, Text & Images, Sensor Data, RFID

  14. Governance concerns for big data customers
      Agile. Simple. Trusted Information.
      • How do I cleanse and validate the results of my big data analysis?
      • How do I integrate and link my big data environment with my current one?
      • How do I create a trusted view of my customers and products for big data?
      • How do I protect data in a big data environment?
      • Is a governed and auditable archive possible with big data?

  15. Governance in an exploratory Big Data environment
      1. Ensure trust & compliance
         • Lineage of data as it enters and leaves the big data system
         • Secure the big data systems from breaches
         • Create masked dev and test analytics clusters
      2. Accelerate time to value
         • High performance data provisioning
         • Integrated data integration and stream analytics platform
      3. Lower total cost of ownership
         • Simplified tooling to improve productivity of developers and testers
         • Automated system security
         • Complete visibility into the data movement and lifecycle
      Callouts:
      • Create privatized data in real time or on the cluster to ensure data protection
      • Secured BigInsights to prevent any data breaches
      • High performance and high quality data loads
      • Integration for improved segmentation of analytical data sources
      • Low cost historical archive loaded to Hadoop for exploratory analytics
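One common way to create masked data for dev and test clusters is keyed pseudonymization, sketched below. This is an assumption about how masking could be done, not the method used by the products on the slide; the helper names `mask_value` and `mask_record`, the field names, and the 16-character token length are hypothetical.

```python
import hashlib
import hmac


def mask_value(value: str, secret_key: bytes) -> str:
    """Replace a sensitive value with a keyed, repeatable token.

    The same input always yields the same token, so masked records can
    still be joined, but the original value is not exposed.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability (assumption)


def mask_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Return a copy of record with the sensitive fields masked."""
    return {
        field: mask_value(str(value), secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }
```

A record such as `{"name": "Jane Doe", "postcode": "AB1 2CD", "age_band": "30-39"}` masked on `name` and `postcode` keeps `age_band` usable for analysis while the identifying fields become opaque tokens.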

  16. 6. Financial
      Business Model
      • Citizens-Pay
        • To private Company for value-added services to citizens
      • NS-Pay
        • Pay to private Company for inexpensive services
        • Typically cloud-based
      • Businesses-Pay
        • Services free or discounted
        • Funded by other parts of the business
        • Can be non-profit organisations
      Engagement Model
      • Invest and define Information (catalogue and datasets)
      • Link Data, aggregate data
      • Increase value of open-data
      • NS: Motivate and educate
      • Incubate and evaluate
      • Link Data
      • NS co-invests
      • Accelerate evolution of ecosystem
      • Services built & maintained by community on top of open-data

  17. Industrialisation and Collaborative Model. Leverage City Forward model for National Statistics

  18. Impact on Everyday Life
      • How safe is my neighborhood?
      • Which career is right for me?
      • What type of education do I need?
      Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org

  19. New Products and Indicators. Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens
      Examples:
      • Predictive models for healthcare cost reduction and outcome optimisation
      • Epidemic outbreak surveillance – hotspots, progression waves
      • Aligning public services (federal, regional and city level) to existing and predictive demographic data

  20. Example: Traffic Management for Sustainability and Efficiency
      Multimodal Data Streams
      • GPS
      • Cell-phones (location tracking)
      • Public Transport (bus, docking)
      • Pollution measurements
      • Weather Conditions (including road conditions)
      • Optical traffic flow detectors
      • Travel time data based on plate recognition
      • Induction loop detector data
      • Accidents in network as they are being recorded
      • Road closures (road work, etc.)
      • Still pictures from road cameras
      Real-time processing
      • Real Time Transformation Logic
      • Real Time Geo Mapping
      • Real Time Speed & Heading Estimation
      • Real Time Aggregates & Statistics
      Outputs
      • Real Time Traffic Monitoring & Information
      • (Multimodal) Travel Planner
      [Figure: components shown are GPS Data Streams, Storage adapters, Interactive visualization, Data Warehouse, Web Server, Offline statistical analysis, Google Earth]
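The "Real Time Speed & Heading Estimation" step can be sketched from two consecutive GPS fixes. The presentation does not say how the estimate is computed, so the haversine distance and initial-bearing formulas below are a standard choice assumed for illustration. The fix layout `(timestamp_s, lat, lon)` and the function names are likewise hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def speed_and_heading(fix_a: tuple, fix_b: tuple) -> tuple:
    """Estimate (speed in km/h, heading in degrees) from two (t_seconds, lat, lon) fixes."""
    t1, lat1, lon1 = fix_a
    t2, lat2, lon2 = fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    speed_kmh = haversine_m(lat1, lon1, lat2, lon2) / dt * 3.6
    return speed_kmh, bearing_deg(lat1, lon1, lat2, lon2)
```

In a streaming setting this function would be applied to each vehicle's latest pair of fixes, and the resulting speeds fed into per-road-segment aggregates.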

  21. Thank You

  22. www.sendsteps.com. Prepare to react; keep your phone ready!
      Internet
      1. Go to sendc.com
      2. Log in with Session
      3. Type WS2 <space> your answer
      TXT
      1. Text to +316 4250 0030
      2. Type Session <space> WS2 <space> your answer
      Posting messages is anonymous. No additional charge per message.

  23. What kind of Use-case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?
