Learn about the evolution of Big Data technology, the types of data it encompasses, and the crucial importance of extracting business value through textual disambiguation. Explore the challenges, strategies, and potential of leveraging Big Data for significant business gains.
ACHIEVING BUSINESS VALUE WITH BIG DATA. A presentation by W. H. Inmon
Big Data: a brief history (using a military analogy). In military strategy, you want to take the high ground. In technology, having the DBMS technology that manages the largest amount of data IS the high ground. 1960: IBM with IMS. 1970: IBM with IMS/DC (transaction processing). 1980: Teradata with MPP technology. 2010: IBM + Hadoop. Each successive iteration of technology took new high ground for the vendor.
Hadoop: Google, Yahoo, Amazon.com. Then IBM, Cloudera, Hortonworks, Teradata et al. came along and discovered Hadoop.
Big Data can be divided into two major types of data. Repetitive data: metering data, click stream data, call level detail, log tapes, analog. Non repetitive data: email, call center, corporate contracts, healthcare, warranty claims, insurance claims.
Repetitive data: same record size; same structure; oftentimes the same data; very regular. Non repetitive data: record size is different; similar structures are accidental; almost never the same data; very irregular.
Useful data. Repetitive data: only a small percentage of the data is useful. Non repetitive data: the vast majority of the data is useful.
Big Data can be divided into two major types of data. Repetitive data (metering data, click stream data, call level detail, log tapes, analog): limited business value. Non repetitive data (email, call center, corporate contracts, healthcare, warranty claims, insurance claims): massive business value.
Repetitive data: 90% of the fishermen, 10% of the fish. Non repetitive data: 10% of the fishermen, 90% of the fish.
There is trouble in paradise. Wall Street Journal, winter 2013: the return on investment for every dollar spent on Big Data is $0.55. Large consulting firm: "In 18 months we have done 150 proofs of concept for Big Data; 5 have been successful." Large New York bank: "For three years we have been trying to make Big Data work. We have done everything our vendor has told us to do. We just are not getting any business value from Big Data."
What the analyst sees today: budget, sources of data, analytical processing, compatibility, Cirro, Mongo, Pig, Hive, Map Reduce.
Business value: no one talks about business value.
Here is what lies ahead in addressing the topic of achieving business value out of Big Data. [Diagram labels: Big Data; non repetitive business relevant unstructured data; business value]
In order to get to business value you MUST solve the issues of unstructured data. [Diagram labels: Big Data; non repetitive business relevant unstructured data; business value]
The vendor's notion of a solution: Map Reduce, data scientist. [Diagram labels: Big Data; business value]
Big Data: here's what everyone is talking about. Non repetitive business relevant unstructured data: here is the even bigger hurdle that no one is talking about.
[Slide background: the word "text" repeated many times] We need lots of things, but most of all we need CONTEXT.
And what's so challenging about raw text? It is dangerous and potentially very misleading to try to use raw text as a basis for decisions. Consider the following confusion: the answer is "7". Seven what? Seven days? Seven dollars? Seven wonders of the world? Seven seas? Seven dwarfs?
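The "seven what?" question can be illustrated with a small sketch. This is not the presenter's method; it assumes one simple strategy, looking at the words near the number, and the `UNITS` vocabulary and the function name `number_with_context` are hypothetical.

```python
import re

# Hypothetical vocabulary of words that can give a bare number its meaning.
UNITS = {"days", "dollars", "wonders", "seas", "dwarfs"}

def number_with_context(text, window=3):
    """Pair each '7' or 'seven' with the first unit word found within
    `window` tokens after it, or None when no such word is nearby."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    results = []
    for i, tok in enumerate(tokens):
        if tok in {"7", "seven"}:
            nearby = tokens[i + 1 : i + 1 + window]
            unit = next((t for t in nearby if t in UNITS), None)
            results.append((tok, unit))
    return results

print(number_with_context("The answer is seven"))
print(number_with_context("Payment is due in 7 days"))
```

In the first call the number comes back with no unit at all, which is the situation the slide warns about: the raw value alone cannot support a decision.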
Or consider this confusion: "She's hot." What is being said here? She is attractive and I want to date her? It is Houston, Texas and it is 98 degrees, and she is sweating? I just took her temperature and it was 104 degrees? Looking at the words "She's hot" tells you nothing. In order to make sense of the text you MUST supply context, and that is true for ALL text.
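A minimal sketch of how surrounding text might settle which meaning of "hot" is intended. The cue-word lists in `SENSES` and the function `resolve_hot` are hypothetical and deliberately tiny; a real system would need far richer context than word overlap.

```python
import re

# Hypothetical cue words for each possible sense of "hot".
SENSES = {
    "weather": {"houston", "sweating", "sun", "humid", "weather"},
    "fever": {"temperature", "thermometer", "fever"},
    "attraction": {"attractive", "date", "cute"},
}

def resolve_hot(surrounding_text):
    """Return the sense whose cue words overlap most with the surrounding
    text, or 'unknown' when no cue word appears at all."""
    words = set(re.findall(r"[a-z']+", surrounding_text.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(resolve_hot(""))
print(resolve_hot("It is Houston, Texas and it is 98 degrees. She is sweating."))
print(resolve_hot("I just took her temperature and it was 104 degrees."))
```

With no surrounding text the function returns "unknown", which mirrors the point of the slide: the words alone carry no usable meaning.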
Textual disambiguation: in order to achieve business value, the raw non repetitive business relevant text found in Big Data must pass through a process known as textual disambiguation.
So how do you do textual disambiguation? The first step is to "contextualize" the raw data: qualified vocabularies, document metadata, homographic resolution, taxonomies, ontologies, document sensitive inference, textual proximity, documents, acronym resolutions.
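Two of the techniques named on this slide, acronym resolution and taxonomies, can be sketched as simple lookups. This is an illustration only: the `ACRONYMS` and `TAXONOMY` tables and the function `contextualize` are hypothetical, and real vocabularies would be far larger and specific to the business.

```python
import re

# Hypothetical lookup tables, invented for illustration.
ACRONYMS = {"ceo": "chief executive officer", "mri": "magnetic resonance imaging"}
TAXONOMY = {"car": ["honda", "ford", "porsche"], "tree": ["oak", "elm", "pine"]}

def contextualize(text):
    """Return (word, kind_of_context, value) for every word in the text
    that an acronym table or a taxonomy can say something about."""
    word_to_category = {w: cat for cat, words in TAXONOMY.items() for w in words}
    found = []
    for tok in re.findall(r"[A-Za-z]+", text):
        key = tok.lower()
        if key in ACRONYMS:
            found.append((tok, "acronym", ACRONYMS[key]))
        elif key in word_to_category:
            found.append((tok, "taxonomy", word_to_category[key]))
    return found

for item in contextualize("The CEO drove a Porsche past an oak"):
    print(item)
```

Each technique contributes a different kind of context, which is one reason a single pass with a single rule is unlikely to be enough.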
The problem with "contextualization" is that there are many ways to "contextualize" the text, all depending on the text. There is no one single algorithm.
Repetitive data: context is easy to find. Non repetitive data: context is there but it is difficult to find.
Contract type; date; contract party; term.
Doctor; gender/race; cancer type; location; description.
[Diagram labels: raw non repetitive data; textual disambiguation; disambiguated data; standard DBMS; analytical processing]
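One way to read this slide is that disambiguated text can be stored in a standard DBMS and then analyzed with ordinary queries. The sketch below assumes that reading and uses SQLite from the Python standard library; the table name, column names, and sample rows are hypothetical.

```python
import sqlite3

def load_and_count(rows):
    """Load (doc_id, word, context) rows into an in-memory table and
    return how many times each context occurs, sorted by context."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE disambiguated (doc_id TEXT, word TEXT, context TEXT)")
    conn.executemany("INSERT INTO disambiguated VALUES (?, ?, ?)", rows)
    counts = conn.execute(
        "SELECT context, COUNT(*) FROM disambiguated "
        "GROUP BY context ORDER BY context"
    ).fetchall()
    conn.close()
    return counts

# Invented sample rows: the same word, carrying the context assigned to it.
sample = [("d1", "hot", "weather"), ("d2", "hot", "fever"), ("d3", "hot", "weather")]
print(load_and_count(sample))
```

Once the context travels with the word as a column, the analysis itself becomes a plain GROUP BY.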
So where are organizations being pushed? [Diagram labels: limited business value; business value]
Thank goodness someone understands Big Data! For more information see our white papers and articles at www.forestrimtech.com. Everything on the site is FREE!!!