Big Data: Handling of huge data from mobile devices. axxessio. Axxessio: custom software with architectural expertise. What is Big Data; Use Cases; Technologies; The Hadoop ecosystem; Map/Reduce concept; Demo: Provisioning of a Hadoop cluster in HDInsight
Big Data: Handling of huge data from mobile devices. axxessio. Axxessio: custom software with architectural expertise
What is Big Data • Use Cases • Technologies • The Hadoop ecosystem • Map/Reduce concept • Demo: Provisioning of a Hadoop cluster in HDInsight • Demo: Development of simple Map/Reduce code
Gold is only available in small quantities: 300 kg ore + 20 tons toxicants (e.g. cyanide) + … 2,700 kg of resources needed to produce a gold ring (approx. 3 g)
Big Data: What is Big Data. Sources: mobile data, Internet sites, web forums, blogs, social networks, emails, sensors. From an enormous amount of data, only a small fraction is extracted as valuable data. Separating useless from useful information is expensive. • Volume (big amount of data) • Variety (many data formats) • Velocity (high performance) • Value (valuable information) • Veracity (quality of data) • Volatility (storage of data)
Big Data from Cheap Phones * One of the "10 breakthrough technologies 2013" (MIT)
Big Data from Cheap Phones: Malaria. Activity can be tracked by cell-phone towers, providing a rough way to trace a person's movements. Dark blue: most important sources of malaria infections. Light blue: major destinations of people exposed to malaria.
Big Data from Cheap Phones: Cholera. In January 2010 an earthquake in Haiti killed more than 200,000 people. Researchers at Sweden's Karolinska Institute obtained data from Digicel, Haiti's largest mobile carrier, and mined the daily movement data from two million phones. 630,000 people who had been in Port-au-Prince on the day of the earthquake had left the city within three weeks. It was possible to calculate how many people had fled an area affected by a cholera outbreak, and where they went.
Use cases in different industries • Financial services: detection of fraudulent transactions in real time, risk assessments, accelerated case management, individualized services • Insurance: accelerated case management, better risk assessment, behavioral pricing • Telecommunication: quality assurance, individualized approach and services, fraud detection, new products • Production: preventive maintenance and monitoring, connected devices, individualized services, market surveillance • Energy: short-term demand forecasts, networked and customized equipment, predictive control • Retail: forecasts for demand planning, dynamic pricing, market surveillance and individualized approach • Public safety: fast situation detection and early detection of hazardous events • Health: connected devices, preventive control, efficient case management, data-driven development • Mobility: connected cars, navigation, traffic jam help
Big Data use cases: Big Data is how discoveries will happen in the future. Google researchers found out by chance that certain search terms are good indicators of flu activity. Today Google Flu Trends uses aggregated Google search data to estimate flu activity in different countries. In the future, correlating huge amounts of medical data could lead to the discovery of new therapies. [Diagram: patient records, clinical studies and scientific publications → compute correlations with Big Data analytics → new therapies]
Big Data technologies: For real-time processing of huge amounts of data, in-memory databases are used. Use cases: real-time processing of sensor data or financial transactions. For batch processing of big amounts of unstructured data, the Hadoop ecosystem is used; Big Data analysis is done with the Map/Reduce framework. Use cases: analysis of social data, email data, …
Diversity of data: Relational DB (rows, columns). SQL query: SELECT SUM(GROSS) WHERE CUSTOMERID = 123456789
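The aggregation on this slide can be tried out with a small sketch. The slide's query names no table, so the table `orders`, its two columns and the sample rows below are assumptions made purely for illustration; Python's built-in `sqlite3` module stands in for the relational database.

```python
import sqlite3

# Hypothetical schema: the slide only names GROSS and CUSTOMERID,
# the table name "orders" and the sample rows are invented here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customerid INTEGER, gross REAL)")
conn.executemany(
    "INSERT INTO orders (customerid, gross) VALUES (?, ?)",
    [(123456789, 100.0), (123456789, 50.5), (987654321, 20.0)],
)


def total_gross(connection, customer_id):
    """Sum the gross column over all rows that belong to one customer."""
    row = connection.execute(
        "SELECT SUM(gross) FROM orders WHERE customerid = ?", (customer_id,)
    ).fetchone()
    return row[0]  # None if the customer has no rows


print(total_gross(conn, 123456789))
```

The point of the relational model here is that every row has the same columns, so a single declarative query can aggregate over them.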
Diversity of data: Key-Value Databases { Persons: [ { Firstname: Leonardo, Lastname: Da Vinci, Age: 30, Nationality: Italian }, { Firstname: Pablo, Lastname: Picasso, Age: 40, Nationality: Spanish } ] }
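A minimal sketch of the key-value idea, assuming a plain Python dict as the store and JSON strings as values. The key names (`person:1`, `person:2`) and the helper functions `put` and `get` are hypothetical and do not correspond to any particular database product.

```python
import json

# A plain dict stands in for the key-value store (assumption for illustration).
store = {}


def put(key, value):
    """Serialize the value to JSON and store it under the given key."""
    store[key] = json.dumps(value)


def get(key):
    """Return the deserialized value, or None if the key is unknown."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put("person:1", {"Firstname": "Leonardo", "Lastname": "Da Vinci",
                 "Age": 30, "Nationality": "Italian"})
put("person:2", {"Firstname": "Pablo", "Lastname": "Picasso",
                 "Age": 40, "Nationality": "Spanish"})

print(get("person:2")["Lastname"])
```

Unlike the relational case, the store knows nothing about the fields inside a value: records are looked up by key only, and each value may have its own structure.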
Hadoop Ecosystem • Mahout: library of algorithms for machine learning • Pig: scripting of MapReduce jobs • Hive: HQL for MapReduce • MapReduce: Java framework • HBase: key-value DB • HDFS: Hadoop distributed file system * There are other components of the Hadoop ecosystem which are not shown (Oozie, Ambari, ZooKeeper, HCatalog, Sqoop, Flume, etc.)
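HDFS (Hadoop distributed file system) [Diagram: a client node writes File.txt (1 TB); the file is split into blocks A, B and C; the Name Node records the block locations (A on Node 1, B on Node 2, C on Node 3); copies of the blocks are spread across Data Node 1, Data Node 2, Data Node 3, … Data Node N]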
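The block map kept by the Name Node can be illustrated with a toy sketch. Everything here is a simplification under stated assumptions: the function `place_blocks`, the round-robin placement and the small sizes are invented for illustration, whereas real HDFS uses a rack-aware placement policy and much larger blocks. Only the replication factor of 3 mirrors the usual HDFS default.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [node, ...]} as a toy Name Node block map."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    block_map = {}
    for block in range(num_blocks):
        # Round-robin: replica r of block b goes to node (b + r) mod N,
        # so the replicas of one block always land on distinct nodes.
        block_map[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return block_map


nodes = ["DataNode1", "DataNode2", "DataNode3"]
for block, replicas in place_blocks(300, 128, nodes).items():
    print(block, replicas)
```

Because each block lives on several data nodes, the loss of one node does not lose data, and a computation can be scheduled on any node that already holds the block.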
Traditional Architecture [Diagram: structured data from System 1, System 2 and System 3 → Integration → DWH → Analytics]
Big Data Architecture [Diagram: structured and unstructured data from System 1, System 2 and System 3 → Distributed File System → Map/Reduce → DWH → Analytics]
MapReduce [Diagram: MAP PHASE: workers process the distributed data locally and produce intermediate results. REDUCE PHASE: workers read the intermediate results remotely and aggregate them into the result]
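The two phases can be sketched in a few lines of single-process Python, using word counting as the example task. This is only an illustration of the concept, not Hadoop code: the function names and the sample input are assumptions, and in a real cluster each chunk would be mapped by a separate worker on the node that stores it.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values of one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    intermediate = []
    for chunk in chunks:  # on a cluster, one worker per chunk
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data from phones", "big data big value"]
print(map_reduce(chunks))
```

The map step needs only its own chunk, which is why it can run where the data lives; only the much smaller intermediate results have to travel over the network to the reducers.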
DEMO • Remote Desktop • C# code • Azure Blob Storage • Upload of Twitter JSON data • HDInsight