1 / 51

De Wereld van Big Data Van Causaliteit naar Correlatie

De Wereld van Big Data Van Causaliteit naar Correlatie. Aske Plaat Leiden University Ministerie van Veiligheid en Justitie Leiden Centre of Data Science . Acknowledgements.

rmoreland
Download Presentation

De Wereld van Big Data Van Causaliteit naar Correlatie

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. De Wereld van Big DataVan Causaliteit naar Correlatie Aske Plaat Leiden University Ministerie van Veiligheid en Justitie Leiden Centre of Data Science

  2. Acknowledgements Research is team work. I would like to acknowledge for their help and inspiration: Jaap van den Herik, Stan Bentvelsen, Jos Vermaseren, Ben Ruijl, Joost Kok, Peter de Kock, Ron Boelsma, Rob van Eijk, Liesbeth Boer, Joke Hellemons and Eric Postma

  3. Contents • What is Big Data? • Where do we find Big Data? • Role of Big Data • Applications • Future of Big Data • Narrative Science • From correlations to causation? • Conclusion 3

  4. BIG DATA • Definition van Tom White (2012) :  “Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand databases management tools or traditional data processing applications.” • The challenges capture: • curation, • storage, • search, • sharing, • transfer, • analysis, • visualization, • interpretation, • real-time (van Eijk, 2013) 4

  5. Three perspectives onthe development of Big Data • An overflow of Data Results (e.g., in chess; in particlephysics) (2) A lack of coordinationamong the information items (as in the 9/11 events; a positiveexample is Watson) (3) The power of Big Data (byusingconcepts as visualizationand narrativescience) 5

  6. Chess • Much research has been performed in computer chess • DEEP BLUE (IBM) defeated the world champion Kasparov in 1997 • FRITZ defeated Kramnik (December 2006) 6

  7. Higgs boson found at LHC • 4 July 2012 7

  8. 100 petabytes collected so far 8

  9. Higgs particle The breakthrough of the century

  10. Jeopardy! the game to play From Big Data to Data

  11. Five Challenges and Improvements WATSON is bad in • Clues with a complex syntax • Clues with Art involved • Removing wrong answers that are submitted previously from its considerations • Not understanding the answers given by its opponents (embodiment) • Employing delaying tactics when answering (stalling)

  12. Applications IBM offered WATSON to - Columbia University Medical Center - University of Maryland School of Medicine In the Netherlands, IBM is negotiating on WATSON’s use by a large medical centre In 2014, IBM is setting up a large research institute in New York (2000 researchers)

  13. Research question of the future BIG DATA contains (almost) all available knowledge How to identify and extract relevant knowledge?

  14. The role of BIG DATA - Social-economic Ph.D theses from 1970 to 2000 are frequently “outdated” by BIG DATA developments. - Deep Knowledge vs. Partial Knowledge - Real-time bidding (RTB) happens in 30 milliseconds (0,030 sec.) 16

  15. Successful Commerce requires Speed Even superficial profiling leads to surprisingly good results. For instance, when • Looking for holiday destination • Buying book online • Searching for houses online → Key is Commerce Requirements to ads: clear and fast Real-time bidding happens in 0.030 seconds This is necessary otherwise the Web visitor has left. 17

  16. From Social Change to Social Innovation Two difficult social changes: • To fill up without paying • Cracking Thud (Ramkraak) • Liability at the Oil Companies (not at the Police or Public Prosecutor) • Liability at the Banks (not at the Police or Public Prosecutor)

  17. Eleven Applications • Safety (politics, military) • Public Safety (Live View) Example 1 • Commerce (ads) • Banking (money streams) • Health care Example 2 • Judiciary (CODR) • Waterway transport Example 3 • Communication (twitter, phablet) • Education (MOOC) • Public governance • Warfare (Multi Agent Systems, Socio Cognitive Models) MOOC = Massive Open Online Courses

  18. Public Safety

  19. Flu prediction from correlation with Google search results

  20. Transport behavior by one vessel

  21. The places of all vessels at one moment

  22. Four Recent Applications • Football Analytics 2. Legal Analytics 3. Airplane Emergency Analytics 4. Anticipating Criminal Behaviour

  23. Football Analytics

  24. Pointers to Football Analytics • http://world-cup-2014.squawka.com/netherlands-vs-costa-rica/05-07-2014/world-cup/matches • http://www.whoscored.com/Regions/155/Tournaments/13/Netherlands-Eredivisie • http://www.statsbomb.com/category/player-analytics/ • http://www.fifa.com/worldcup/statistics/index.html

  25. Legal Analytics Lawyer as a profession is in discussion by the current development. • Search (e-discovery) • Knowledge handling • Prediction (Avondvoor de Wetenschap, Jaap van den Herik met Jan-Jaap Oerlemans, eLaw)

  26. Zoekennaar (1) relevanteinformatie (2) precedenten (die relevant zijnvooreengeheelnieuwe casus) Knowledge handling voor (3) invullen van formulieren (4) schrijven van vonnissen Predicties (door advocaten) voor (5) voorspellenafloop van een casus In de VS is eenduidelijketerugloopwaarneembaar van studentenaaneen Law School

  27. Anticipating Criminal Behaviour Promovendus Peter de Kock Berenschot Promotie 10 september 2014 in Tilburg Probleemstelling: “To what extent can a scenario model support lawenforcement agencies in the anticipation of criminal behaviour?” Onderzoek naar Lone Wolves Tristan van der Vlis en Anders Breivik

  28. The twelve elementary scenario components

  29. Resultaten (1) Categorisering (2) Vervolg Boston (april 2013) (3) Ondersteuning Narrative Science (4) Mogelijke toepassingen Jihad strijders

  30. Computational Turn: From causality to correlation • Sampling is no longer at stake. Nowadays data from big populations (Twitter feeds, clicking behavior, Facebook data) are important. • Insight into causal relations has lost its importance at many places. • Correlation (what works well and what not) has taken over priority. This development is called Computational Turn. • Computational Turn asks for reflection from economics, law, social sciences, behavioral sciences, and philosophical perspectives.

  31. Narrative Science BIG DATA: - collection - awareness - usage How did it happen that way? - generation of data (collection) - visualization of data (Napoleon) - narrative science (which story is in BIG DATA? e.g., Wiki Leaks?)

  32. Napoleon

  33. Narrative science • Finding the causes behind the correlations: make a story • Examples: • Boston April 2013 • Google Fluchart Future for AI: Reason about correlations to predict causations

  34. Leiden Centre of Data Science • The new centre will focus on multidisciplinary research • Emphasis on data science: big data and small data • We start with: • Bioscience • Physics • Mathematics • Computer science • Aviation • Law

  35. Leiden Centre of Data Science

  36. Leiden Centre of Data Science

  37. Leiden Centre of Data Science Theoretical physics • Higgs boson found at LHC • 4 July 2012

  38. Leiden Centre of Data Science

  39. Leiden Centre of Data Science

  40. Leiden Centre of Data Science

  41. Leiden Centre of Data Science

  42. Leiden Centre of Data Science Data Mining

  43. Leiden Centre of Data Science

  44. Leiden Centre of Data Science

  45. Leiden Centre of Data Science Activities Business Optimization

More Related