Explore the impact and future of Big Data, challenges, applications in fields like commerce, healthcare, and warfare, and its role in social change. Includes narratives on chess and the Higgs boson discovery.
The World of Big Data: From Causality to Correlation Aske Plaat Leiden University Ministerie van Veiligheid en Justitie Leiden Centre of Data Science
Acknowledgements Research is teamwork. I would like to thank the following people for their help and inspiration: Jaap van den Herik, Stan Bentvelsen, Jos Vermaseren, Ben Ruijl, Joost Kok, Peter de Kock, Ron Boelsma, Rob van Eijk, Liesbeth Boer, Joke Hellemons and Eric Postma
Contents • What is Big Data? • Where do we find Big Data? • Role of Big Data • Applications • Future of Big Data • Narrative Science • From correlations to causation? • Conclusion
BIG DATA • Definition by Tom White (2012): “Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” • The challenges: • capture, • curation, • storage, • search, • sharing, • transfer, • analysis, • visualization, • interpretation, • real-time (van Eijk, 2013)
Three perspectives on the development of Big Data (1) An overflow of data results (e.g., in chess; in particle physics) (2) A lack of coordination among the information items (as in the 9/11 events; a positive example is Watson) (3) The power of Big Data (by using concepts such as visualization and narrative science)
Chess • Much research has been performed in computer chess • DEEP BLUE (IBM) defeated the world champion Kasparov in 1997 • FRITZ defeated Kramnik (December 2006)
Higgs boson found at LHC • 4 July 2012
Higgs particle The breakthrough of the century
Jeopardy! the game to play From Big Data to Data
Five Challenges and Improvements WATSON is weak at • Clues with a complex syntax • Clues involving art • Removing previously submitted wrong answers from its considerations • Understanding the answers given by its opponents (embodiment) • Employing delaying tactics when answering (stalling)
Applications IBM offered WATSON to - Columbia University Medical Center - University of Maryland School of Medicine In the Netherlands, IBM is negotiating the use of WATSON by a large medical centre. In 2014, IBM is setting up a large research institute in New York (2000 researchers)
Research question of the future BIG DATA contains (almost) all available knowledge How to identify and extract relevant knowledge?
The role of BIG DATA - Socio-economic Ph.D. theses from 1970 to 2000 are frequently rendered “outdated” by BIG DATA developments. - Deep Knowledge vs. Partial Knowledge - Real-time bidding (RTB) happens in 30 milliseconds (0.030 s)
Successful Commerce requires Speed Even superficial profiling leads to surprisingly good results. For instance, when • Looking for a holiday destination • Buying a book online • Searching for houses online → Key is Commerce Requirements for ads: clear and fast. Real-time bidding happens in 0.030 seconds. This speed is necessary; otherwise the Web visitor has already left.
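The 0.030-second budget for real-time bidding can be illustrated with a minimal sketch. It assumes that only bids arriving within the deadline count and that the highest on-time bid wins; that selection rule, the bidder names, prices, and latencies are hypothetical and not taken from the slides, and real ad exchanges may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional

DEADLINE_MS = 30.0  # the 0.030 s budget quoted on the slide


@dataclass
class Bid:
    bidder: str        # hypothetical bidder name
    price_eur: float   # offered price for the ad slot
    latency_ms: float  # time the bid needed to arrive


def pick_winner(bids: List[Bid], deadline_ms: float = DEADLINE_MS) -> Optional[Bid]:
    """Return the highest bid that arrived within the deadline, or None."""
    on_time = [b for b in bids if b.latency_ms <= deadline_ms]
    if not on_time:
        return None  # no ad can be shown before the visitor leaves
    # Assumption: the highest on-time price wins.
    return max(on_time, key=lambda b: b.price_eur)


# Hypothetical bids: the best price arrives too late and is ignored.
bids = [
    Bid("ad_network_a", 0.42, 12.0),
    Bid("ad_network_b", 0.95, 41.0),
    Bid("ad_network_c", 0.57, 28.5),
]
winner = pick_winner(bids)
print(winner.bidder if winner else "no ad shown")
```

In this sketch the highest offer loses because it misses the deadline, which is the point of the slide: in online commerce a fast answer beats a better but late one.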
From Social Change to Social Innovation Two difficult social changes: • Filling up with fuel without paying • Ram raids (ramkraak) • Liability lies with the oil companies (not with the Police or Public Prosecutor) • Liability lies with the banks (not with the Police or Public Prosecutor)
Eleven Applications • Safety (politics, military) • Public Safety (Live View) Example 1 • Commerce (ads) • Banking (money streams) • Health care Example 2 • Judiciary (CODR) • Waterway transport Example 3 • Communication (Twitter, phablet) • Education (MOOC) • Public governance • Warfare (Multi-Agent Systems, Socio-Cognitive Models) MOOC = Massive Open Online Course
Four Recent Applications 1. Football Analytics 2. Legal Analytics 3. Airplane Emergency Analytics 4. Anticipating Criminal Behaviour
Pointers to Football Analytics • http://world-cup-2014.squawka.com/netherlands-vs-costa-rica/05-07-2014/world-cup/matches • http://www.whoscored.com/Regions/155/Tournaments/13/Netherlands-Eredivisie • http://www.statsbomb.com/category/player-analytics/ • http://www.fifa.com/worldcup/statistics/index.html
Legal Analytics The legal profession is being called into question by current developments. • Search (e-discovery) • Knowledge handling • Prediction (Avond voor de Wetenschap, Jaap van den Herik with Jan-Jaap Oerlemans, eLaw)
Search for (1) relevant information (2) precedents (relevant to an entirely new case) Knowledge handling for (3) filling in forms (4) writing judgments Predictions (by lawyers) for (5) predicting the outcome of a case In the US, a clear decline in the number of law school students can be observed
Anticipating Criminal Behaviour Ph.D. candidate: Peter de Kock, Berenschot Ph.D. defence: 10 September 2014 in Tilburg Problem statement: “To what extent can a scenario model support law enforcement agencies in the anticipation of criminal behaviour?” Research into lone wolves: Tristan van der Vlis and Anders Breivik
Results (1) Categorization (2) Follow-up on Boston (April 2013) (3) Support for Narrative Science (4) Possible applications: jihadist fighters
Computational Turn: From causality to correlation • Sampling is no longer the central concern. Nowadays data from big populations (Twitter feeds, clicking behavior, Facebook data) are important. • Insight into causal relations has lost its importance in many places. • Correlation (what works well and what does not) has taken priority. This development is called the Computational Turn. • The Computational Turn calls for reflection from economic, legal, social-science, behavioral-science, and philosophical perspectives.
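To make the notion of correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common correlation measure (the slides do not name a specific one). The data are hypothetical: a hidden common cause (temperature) drives two observed series, so they correlate strongly although neither causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: temperature is the hidden common cause.
temperature = [12, 15, 19, 23, 27, 30]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunburn_cases = [3 * t - 10 for t in temperature]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"r = {r:.3f}")
```

A coefficient near 1 tells us that the two series move together, which is often enough to act on in practice. It does not tell us why they move together, and that gap is where causal reasoning still has a role.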
Narrative Science BIG DATA: - collection - awareness - usage How did it happen that way? - generation of data (collection) - visualization of data (Napoleon) - narrative science (which story is in BIG DATA? e.g., WikiLeaks?)
Narrative science • Finding the causes behind the correlations: make a story • Examples: • Boston April 2013 • Google Flu chart Future for AI: Reason about correlations to predict causations
Leiden Centre of Data Science • The new centre will focus on multidisciplinary research • Emphasis on data science: big data and small data • We start with: • Bioscience • Physics • Mathematics • Computer science • Aviation • Law
Leiden Centre of Data Science Theoretical physics • Higgs boson found at LHC • 4 July 2012
Leiden Centre of Data Science Data Mining
Leiden Centre of Data Science Activities Business Optimization