
Data / Information / Knowledge

Data / Information / Knowledge. Presentation by Pauline Lake; modifications by Rick Mercer.





Presentation Transcript


  1. Data / Information / Knowledge. Presentation by Pauline Lake; modifications by Rick Mercer. Acknowledgment and Disclaimer: This presentation is supported in part by the National Science Foundation under Grant 1240841. Any opinions, findings, and conclusions or recommendations expressed in these materials are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  2. Outline • Processing Large Data Sets • Using Data • Big Data and Mobile Computing

  3. Processing Large Data Sets Sort a petabyte (10^15 bytes) of data • 10^15 = 10^3 × 10^3 × 10^3 × 10^3 × 10^3 bytes • Quicksort assumes the data are in RAM, but a petabyte is far larger than the main memory of a single computer • 1 petabyte would occupy • 1,000 1-TB disk drives, or • 10,000 100-GB drives
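The drive counts come from dividing 10^15 bytes by each drive's capacity. A quick check of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) rather than binary TiB/GiB; the helper name `drives_needed` is illustrative:

```python
# Storage arithmetic from the slide (assumes decimal SI units, not TiB/GiB).
PETABYTE = 10**15  # bytes
TERABYTE = 10**12
GIGABYTE = 10**9

# 10^15 is five factors of 10^3 multiplied together.
assert PETABYTE == (10**3) ** 5


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Smallest number of drives of size drive_bytes that can hold total_bytes."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


print(drives_needed(PETABYTE, 1 * TERABYTE))    # 1000
print(drives_needed(PETABYTE, 100 * GIGABYTE))  # 10000
```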

  4. The MapReduce Model MapReduce is a programming model for processing large data sets • Distributed file system -- data sets are stored across many computers • Parallel algorithm -- many identical processes run simultaneously, each on its own part of the data • MapReduce was developed at Google • Hadoop is the open-source Apache implementation

  5. MapReduce Experiment: Sort a Petabyte (10^15 bytes) • References: • Petabyte Sort Blog (Quantcast Sort Blog) • Sorting Petabytes with Map Reduce (Google Research)
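A common way to sort with MapReduce is range partitioning: sample the keys to choose split points, send each key to the reducer that owns its key range, let every reducer sort only its own partition, and concatenate the partitions in order. The sketch below simulates this on one machine; the function names, sample size, and number of reducers are illustrative assumptions, not details taken from the referenced experiments.

```python
import bisect
import random


def choose_split_points(keys, num_reducers, sample_size=100, seed=0):
    """Pick num_reducers - 1 boundary keys from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def mapreduce_sort(keys, num_reducers=4):
    if not keys:
        return []
    splits = choose_split_points(keys, num_reducers)
    # Map + partition: reducer i receives keys in the range [splits[i-1], splits[i]).
    partitions = [[] for _ in range(num_reducers)]
    for k in keys:
        partitions[bisect.bisect_right(splits, k)].append(k)
    # Reduce: each reducer sorts only its own partition (in parallel on a real cluster).
    sorted_parts = [sorted(p) for p in partitions]
    # Partitions hold disjoint, increasing key ranges, so concatenation is globally sorted.
    return [k for part in sorted_parts for k in part]


rng = random.Random(42)
data = [rng.randint(0, 10**6) for _ in range(1000)]
assert mapreduce_sort(data, num_reducers=4) == sorted(data)
```

No single reducer ever needs to hold the whole data set, which is what makes the approach scale past one machine's RAM.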

  6. MapReduce Example Problem: Count the occurrences of every word in a large set of documents, D1, D2, …, DN. • D1: “a man, a plan, a canal, panama” • D2: “in for a penny in for a pound” • … • DN: ...

  7. MapReduce Example Algorithm: • Map step: for each word w in D1, ..., DN, output the partial count (w, 1) • Reduce step: for each distinct word w, set sum = 0; for each partial count (w, pc) produced by the Map step, sum = sum + pc; output (w, sum)
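The two steps translate almost line for line into Python. This sequential sketch assumes a word is a run of letters compared case-insensitively, and does the grouping of partial counts by word itself, a job the MapReduce framework normally performs between the two steps. The names `map_step`, `reduce_step`, and `word_count` are illustrative.

```python
import re
from collections import defaultdict


def map_step(document):
    """Map: emit the partial count (w, 1) for every word w in the document."""
    # Assumption: words are runs of letters, compared case-insensitively.
    for word in re.findall(r"[a-z]+", document.lower()):
        yield (word, 1)


def reduce_step(word, partial_counts):
    """Reduce: sum the partial counts produced for one word."""
    total = 0
    for pc in partial_counts:
        total = total + pc
    return (word, total)


def word_count(documents):
    # Group partial counts by word (normally done by the framework).
    grouped = defaultdict(list)
    for doc in documents:
        for word, pc in map_step(doc):
            grouped[word].append(pc)
    return dict(reduce_step(w, pcs) for w, pcs in grouped.items())


docs = ["a man, a plan, a canal, panama", "in for a penny in for a pound"]
print(word_count(docs))
# {'a': 5, 'man': 1, 'plan': 1, 'canal': 1, 'panama': 1, 'in': 2, 'for': 2, 'penny': 1, 'pound': 1}
```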

  8. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN.

  9. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master

  10. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master Mapper1 MapperM

  11. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master Mapper1 MapperM (for,1),(for,1) (in,1),(in,1) (a,1),(a,1),(a,1),(a,1),(a,1)

  12. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master partial counts Mapper1 MapperM (for,1),(for,1) (in,1),(in,1) (a,1),(a,1),(a,1),(a,1),(a,1)

  13. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master Mapper1 MapperM (for,1),(for,1) (in,1),(in,1) (a,1),(a,1),(a,1),(a,1),(a,1) Reducer1 Reducer2 Reducer3 ReducerR
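The slides do not say how the Master decides which reducer receives which word. A common choice, and the default in Hadoop, is to hash the key modulo the number of reducers R, so every partial count for one word meets at the same reducer. A minimal sketch, assuming hash partitioning with CRC-32 as the hash; the name `partition` and the sample pairs are illustrative.

```python
import zlib


def partition(word, num_reducers):
    """Pick the reducer index for a word; equal words always map to the same reducer."""
    # crc32 is used instead of hash() because Python randomizes str hashes per process.
    return zlib.crc32(word.encode("utf-8")) % num_reducers


pairs = [("a", 1), ("man", 1), ("a", 1), ("in", 1), ("for", 1), ("in", 1), ("for", 1)]
R = 3
buckets = [[] for _ in range(R)]
for word, pc in pairs:
    buckets[partition(word, R)].append((word, pc))

for r, bucket in enumerate(buckets, start=1):
    print(f"Reducer{r}: {bucket}")
```

Because the assignment depends only on the word, each reducer can total its words without talking to any other reducer.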

  14. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master Mapper1 MapperM (for,1),(for,1) (in,1),(in,1) (a,1),(a,1),(a,1),(a,1),(a,1) Reducer1 Reducer2 Reducer3 ReducerR (a,5) (in,2) (man,1),... (for,2)

  15. MapReduce Example • Problem: Count the occurrences of every word in a set of documents Map/Reduce System Count the occurrences of every word in D1, D2, … DN. “a man, a plan, a canal, panama” “in for a penny in for a pound” Master Mapper1 MapperM (for,1),(for,1) (in,1),(in,1) (a,1),(a,1),(a,1),(a,1),(a,1) sum of partial counts Reducer1 Reducer2 Reducer3 ReducerR (a,5) (in,2) (man,1),... (for,2)

  16. MapReduce Example • Problem: Count the occurrences of every word in a set of documents. Map/Reduce System Count the occurrences of every word in D1, D2, … DN. (a,5), (for,2), (in,2), (man,1), (plan,1), (canal,1), (panama,1), (penny,1), (pound,1)
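The whole job on these slides can be simulated in a single Python process: the Master hands one document to each mapper, the partial counts are routed to R reducers, and each reducer sums the counts for the words it owns. This is a sketch under stated assumptions (threads standing in for separate machines, one map task per document, CRC-32 hash partitioning), not how Google's MapReduce or Hadoop is implemented internally; `mapper`, `reducer`, and `run_job` are illustrative names.

```python
import re
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(document):
    """Emit (word, 1) for each word; assumes words are runs of letters, case-folded."""
    return [(w, 1) for w in re.findall(r"[a-z]+", document.lower())]


def reducer(grouped):
    """Sum the partial counts for each word this reducer owns."""
    return {word: sum(pcs) for word, pcs in grouped.items()}


def run_job(documents, num_reducers=3):
    # Threads stand in for the separate machines a real cluster would use.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(mapper, documents))  # one map task per document
    # Shuffle: route every partial count to the reducer that owns its word.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for word, pc in pairs:
            r = zlib.crc32(word.encode("utf-8")) % num_reducers
            shuffled[r][word].append(pc)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(reducer, shuffled))
    # Each word appears in exactly one reducer's output, so merging never double counts.
    result = {}
    for part in partials:
        result.update(part)
    return result


docs = ["a man, a plan, a canal, panama", "in for a penny in for a pound"]
print(sorted(run_job(docs).items()))
# [('a', 5), ('canal', 1), ('for', 2), ('in', 2), ('man', 1), ('panama', 1), ('penny', 1), ('plan', 1), ('pound', 1)]
```

The result matches the final slide regardless of how many reducers are used, since the number of reducers only changes where each word is totaled.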

  17. Using Data

  18. Big Data: Government Data • 2012, Data.gov, 84 programs, six departments • Benefit: helping government address problems • Tradeoff: Does the government have too much data about us?

  19. Big Data: Web Analytics • Analytics: discovery and use of meaningful patterns in data • Benefit: Provide customers with targeted ads • Tradeoff: Loss of privacy and anonymity of web search

  20. Big Data: Data Mining • Data Mining -- discovering patterns in large data sets. • Benefit: Discovering risk factors in medical data. • Tradeoff: Can we keep patient medical data secure? [Figure: normal patients vs. diabetic patients]

  21. Data Visualization • IBM chromogram of Wikipedia edits reveals known and new editing patterns

  22. Data Mining: Neonatal monitoring • Data mining real-time data (heart rate, respiratory rate, O2 saturation) provides a non-invasive way of predicting neonatal health • Traditional approach: Apgar score: measure tone, cry, color, breathing, …, scored on a scale of 0 through 10 at 1 and 5 minutes after birth (and at 10 minutes if needed)
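The Apgar score is a coarse snapshot: five criteria (appearance or color, pulse, grimace or reflex response, activity or muscle tone, and respiration) are each rated 0, 1, or 2 and summed, so the total ranges from 0 to 10. A minimal sketch of that arithmetic; the function and parameter names are illustrative.

```python
def apgar_score(appearance, pulse, grimace, activity, respiration):
    """Sum the five Apgar criteria, each rated 0, 1, or 2, for a total from 0 to 10."""
    criteria = (appearance, pulse, grimace, activity, respiration)
    for value in criteria:
        if value not in (0, 1, 2):
            raise ValueError("each Apgar criterion is scored 0, 1, or 2")
    return sum(criteria)


print(apgar_score(appearance=1, pulse=2, grimace=2, activity=2, respiration=2))  # 9
```

A handful of integers recorded at a few moments contrasts with the continuous heart-rate, respiratory-rate, and O2 saturation streams that the data-mining approach works from.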

  23. Big Data and Mobile Computing

  24. Big Data and Mobile Google: Translate “Ciao mondo!”

  25. Big Data and Mobile Google: Translate “Ciao mondo!” Map/Reduce (speech recognition)

  26. Big Data and Mobile Google: Translate “Ciao mondo!” Map/Reduce (speech recognition) “Hello world!”

  27. Big Data and Mobile Google: Translate “Ciao mondo!” Map/Reduce (speech recognition) “Hello world!” Benefit: Improves the ability to learn a foreign language • Tradeoff: Google knows what we’re thinking about

  28. Big Data and Mobile Google: Augment reality

  29. Big Data and Mobile Google: Augment reality Map/Reduce

  30. Big Data and Mobile Google: Augment reality Map/Reduce

  31. Big Data and Mobile Google: Augment reality Map/Reduce Benefit: Better awareness of what’s around us. Tradeoff: Google knows where we are and what we’re thinking.

  32. Summary • The digital era involves large data sets • Presents challenges and opportunities • Requires new processing and visualization techniques • Comes with the promise of benefits • Comes with tradeoffs in terms of privacy and security
