Explore the evolution of Data Science as an outreach-driven approach to extract insights from large data with computational methods, visualization, and domain knowledge for interpretable results.
Data Science
• Data Science aims to extract insights from large data
• Less emphasis on algorithms
• More emphasis on ‘outreach’
• The term Data Science is about 10 years old and very popular nowadays
• Many people reinvent themselves as Data Scientists: data miners, statisticians, BI people, analysts, database developers
Data Mining & Data Science
Data Science
• Computational methods
• Dealing with large data
• Visualisation
• Involving domain knowledge
• Interpretable and interpreted results
[Diagram labels: Data Mining, Statistics]
Big Data
• Because you can…
• Cheap storage
• Administrative/financial reasons
• Internet and social computing
• Internet of Things, ubiquitous computing
[Figure: cost per gigabyte in dollars (axis from $0.01 to $1,000,000) against year (1980 to 2010)]
Cheap Storage
• 1956, IBM 350, 5 MB
• 90 TB
Big Data
Many facets; people often focus on only one
• Very, very large data: CERN, Google, Facebook, Twitter, …
• Analytics
• Internet-generated
• Social data
• Heterogeneous, unstructured data
• Large-scale technologies: MapReduce, Hadoop
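The slide names MapReduce without showing what the model looks like. The sketch below imitates its three steps (map, shuffle, reduce) in plain, single-process Python, using word counting as the example. It is only an illustration of the programming model, not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big storage"]))
```

In a real cluster, the map and reduce calls would run on different machines and the framework would handle the shuffle; here everything runs in one process so the data flow is easy to follow.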
Size-complexity trade-off
• Technological restrictions produce a trade-off
• Many Big Data projects are algorithmically not so complex
• Embarrassingly parallel
[Figure: axes labelled size and complexity, with CERN marked]
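"Embarrassingly parallel" means the work splits into chunks that need no communication with each other, so the only coordination is combining the partial results at the end. A minimal sketch, assuming a toy workload (sum of squares) chosen purely for illustration; `chunked`, `process_chunk` and `parallel_sum_of_squares` are hypothetical names. A thread pool keeps the example simple; for CPU-bound work in Python a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Toy per-chunk work: each chunk is handled without looking at the others."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    """Process all chunks independently, then combine the partial results."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), chunk_size=3))
```

Because no chunk depends on another, adding workers or machines scales the job without changing the algorithm, which is why large projects can stay algorithmically simple.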