“Big Data” The wrong name for a major issue? Clive Longbottom, Service Director, Quocirca Ltd
“Big Data” • It’s not about databases per se • It is about: • Volume – but not just databases • Velocity – results need to be produced in near real-time • Variety – the aspect that is missed by many • Veracity – how good are the inputs • Value – is the data worth it?
Which of the following statements most closely matches your understanding of the term “big data”?
How well do you believe that you understand what tools are needed for “big data”?
From your point of view, big data can be dealt with through:
How important do you believe big data will be to your organisation over the next 2 years?
A basic “rule of thumb” • 20 years ago: • Only 20% of an organisation’s information was in electronic form • 80% of this was in a formal database • Today: • Well over 80% of an organisation’s information is in electronic form • Less than 20% is in a formal database
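The arithmetic behind this rule of thumb can be sketched as below. It assumes the boundary figures (exactly 20% and 80%), whereas the slide says "well over" and "less than" for today, so the results are indicative only. The function name `split_information` is hypothetical.

```python
def split_information(electronic_share, database_share_of_electronic):
    """Split all of an organisation's information into three fractions:
    (in a formal database, electronic but outside a database, not electronic)."""
    in_database = electronic_share * database_share_of_electronic
    outside_database = electronic_share - in_database
    non_electronic = 1.0 - electronic_share
    return in_database, outside_database, non_electronic


# Assumed boundary figures taken from the slide.
then = split_information(0.20, 0.80)  # 20 years ago
now = split_information(0.80, 0.20)   # today

for label, parts in (("20 years ago", then), ("today", now)):
    in_db, outside_db, paper = parts
    print(f"{label}: database {in_db:.0%}, "
          f"electronic outside database {outside_db:.0%}, "
          f"non-electronic {paper:.0%}")
```

Under these assumed figures, the share of all information held in a formal database stays at roughly 16%, while electronic information outside a database grows from roughly 4% to roughly 64% of the total.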
The enterprise application dilemma [Diagram: CRM, ERP and SCM applications, each with its own information silo]
The growth of unstructured • Not just text – but images, video media assets, VoIP, Videoconferencing • Replicated/archived data a large part of growth • But – is it completely unstructured? Source: Ram Subramanyam Gopalan
File formatting • XML (or quasi-XML) • CSV/tab delimited • Text blocks • Meta data • TCP/IP packet header information • Pattern recognition • Colour, shape, texture (CST) • Inferred data
The open “value chain” [Diagram: information flows between Supplier’s supplier, Supplier, Your Organisation, Customer and Customer’s customer, with “open” information from e.g. search engines and social networks]
Organisation information sources • Organisation data: • Enterprise application data • Office documents • Reports, analytics • GRC information • Information on competitors • Financial performance data • Images, voice, video… • …
Supplier information sources • Supplier data • Logistics data • Inventory data • Transactional data • Competitive information • Credit and background checks • Invoices, catalogues, contracts, images… • Voice, video… • …
Customer information sources • Customer data: • Orders, payment details, returns information • Past purchases • Credit and background checks • Searches, web analytics • Social media comments • …
Information issues • You no longer have control • The open value chain removes direct control • Security of information assets is critical • Identifying and aggregating information assets • Capturing information when and where possible – and legal • Bringing structured and unstructured together • Sifting through the dross to get to the “golden nuggets”
Shrink and filter… • Information under your control: • Deduplicate • Taxonomise • Index • Tag • Information not under your control: • Filter (intelligently) • Tag and index when it crosses your boundaries
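As a rough illustration of the deduplicate, taxonomise, index and tag steps for information under your control, here is a minimal single-process sketch. The taxonomy, the sample documents and all function names are hypothetical; real deployments would more likely rely on dedicated deduplication and search tooling.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical taxonomy: tag -> keywords that suggest the tag applies.
TAXONOMY = {
    "finance": {"invoice", "payment", "credit"},
    "logistics": {"inventory", "shipment", "delivery"},
}


def tokens(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def deduplicate(documents):
    """Drop documents whose normalised token sequence was already seen."""
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(" ".join(tokens(doc)).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def tag(doc):
    """Return the taxonomy tags whose keywords appear in the document."""
    words = set(tokens(doc))
    return sorted(name for name, keywords in TAXONOMY.items() if words & keywords)


def build_index(documents):
    """Build an inverted index: token -> set of document positions."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for word in tokens(doc):
            index[word].add(doc_id)
    return index


docs = [
    "Invoice 17: payment overdue",
    "invoice 17 - PAYMENT overdue",  # same content after normalisation
    "Inventory check before delivery",
]
unique_docs = deduplicate(docs)
tags = [tag(d) for d in unique_docs]
index = build_index(unique_docs)
```

Hashing the normalised tokens rather than the raw text is a design choice made here so that trivial differences in case and punctuation do not defeat deduplication; it will not catch near-duplicates with reworded content.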
Federate and aggregate • Link databases • Use master data management • Bring in unstructured data • Use Hadoop along with NoSQL datastores (e.g. Cassandra, MongoDB) • Use cross-function search and reporting tools • E.g. HP Autonomy, CommVault Simpana • Use analytics to present results in meaningful ways
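A minimal sketch of the federation idea: structured records in a SQL database are joined to unstructured items through a shared master key. It uses Python's built-in `sqlite3` for the database and a plain list of dictionaries as a stand-in for a NoSQL document store; the table, field names and sample data are all hypothetical, and this is not how any of the products named above work internally.

```python
import sqlite3

# Structured side: a formal database holding master customer records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("C1", "Acme Ltd"), ("C2", "Bolt plc")],
)

# Unstructured side: stand-in for documents held in a NoSQL store.
comments = [
    {"customer_id": "C1", "text": "Delivery was late again"},
    {"customer_id": "C1", "text": "Great support call"},
    {"customer_id": "C3", "text": "Unknown sender"},  # no master record
]


def federated_view(conn, comments):
    """Attach unstructured comments to each master customer record."""
    by_customer = {}
    for comment in comments:
        by_customer.setdefault(comment["customer_id"], []).append(comment["text"])
    rows = conn.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    )
    return [
        {"customer_id": cid, "name": name, "comments": by_customer.get(cid, [])}
        for cid, name in rows
    ]


view = federated_view(conn, comments)
```

The master customer table acts as the single reference list of keys, which is the role master data management plays in the list above; comments with no matching master record (here `C3`) are left out of the view rather than guessed at.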
Basic schematic approach [Diagram: Filter; Apply metadata; MapReduce; App; SQL; NoSQL; Search, analyse and report]
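The filter, apply-metadata and MapReduce stages named in the schematic can be illustrated with a toy, single-process sketch. It does not use the Hadoop API; the record layout and function names are hypothetical, and word counting is chosen only because it is the conventional MapReduce example.

```python
from collections import defaultdict
from itertools import chain


def keep(record):
    """Filter stage: discard records with no usable text."""
    return bool(record["text"].strip())


def apply_metadata(record):
    """Metadata stage: return a copy of the record with a word count added."""
    return {**record, "length": len(record["text"].split())}


def map_words(record):
    """Map stage: emit a (word, 1) pair for every word in the record."""
    return [(word.lower(), 1) for word in record["text"].split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce stage: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


records = [
    {"source": "email", "text": "late delivery"},
    {"source": "social", "text": "   "},  # filtered out
    {"source": "social", "text": "Delivery arrived"},
]
kept = [apply_metadata(r) for r in records if keep(r)]
counts = reduce_counts(shuffle(chain.from_iterable(map_words(r) for r in kept)))
```

In a real cluster the map and reduce functions would run in parallel across many nodes, with the shuffle handled by the framework; the structure of the three functions is the part that carries over.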
A future glimpse? • It’s déjà vu all over again • Remember in-memory databases? • Big data cannot remain as a jigsaw solution • Full-service solutions will come forward • Who will be the winners? • Oracle, IBM, Microsoft? • SAP? • EMC, Symantec? • The Open Source environment (e.g. 10Gen, Apache/Cassandra, CouchDB)?
Conclusions • Big Data has many vectors • Volume, velocity, variety and veracity: each is as important as the others – value will accrue through getting them right • More information is outside the realm of your direct control • Capturing what can be captured in a useful manner is key • The evolution of the market is rapid • NoSQL and Hadoop provide the underpinnings for a new, information-centric approach • The formal database is not dead • But it is only one aspect of the problem – and the solution
Thank you Contact details: Clive.Longbottom@Quocirca.com Further reading: http://quocirca.com/reports/150 http://quocirca.com/articles/617 http://quocirca.com/articles/637