
Data Science and Scientific Discovery: New Approaches to Nature’s Complexity

Dr. John Rumble, President, R&R Data Services, Gaithersburg, MD. www.randrdata.com, John.rumble@randrdata.com


Presentation Transcript


  1. Data Science and Scientific Discovery: New Approaches to Nature’s Complexity. Dr. John Rumble, President, R&R Data Services, Gaithersburg, MD. www.randrdata.com, John.rumble@randrdata.com

  2. To understand scientific and technical data today, we must first understand how the information revolution has changed both Science and Data and their relationship. (DC Data Science, May 2012)

  3. My Talk • Science today • The Data Revolution in science • Scientific data and scientific databases • Data and scientific discovery • The challenges of using data science on scientific data

  4. Why Do We Do Science? Two primary motivations for advancing science: • First is our insatiable thirst to understand the world, probably dating from when we started thinking • Second is a direct result of the Industrial Revolution: How does the technology we are inventing actually work?

  5. 21st Century Science • From the fundamental to the complex • From determining the laws of nature for a few particles to understanding real systems: cells, the atmosphere, the Earth, ecology • From reductionism to constructionism • Using our basic knowledge to make models and predict the behavior of real systems, that is, all systems we find in nature or that we can construct

  6. Science, vol. 336, p. 707 (2012)

  7. J. Schmitt et al., Science, vol. 336, p. 708 (2012)

  8. Today’s Science and E-Science The Data Revolution has enabled E-Science through • Advanced telecommunications and networks • Computational power and storage • New algorithms for data management, visualization, analysis, and mathematics • Today, E-Science can be done faster and more powerfully, and scientific communication can occur almost instantly The real revolution, however, is in the relationship between science and data

  9. When it hits the New York Times, you know it is for real!

  10. Science and Data • To understand scientific and technical data today, we must first understand how the information revolution has changed Science and Data and their relationship • Science today is not about reduction to a few basic laws • Science is about how we understand and control all aspects of nature • How is this done? • By careful measurement, accurate tests, keen observations, and powerful models and simulations that lead to scientific knowledge • The results are expressed as scientific data!

  11. Scientific Knowledge • What does this really mean?

  12. Scientific Knowledge Scientific knowledge means understanding the independent variables governing a phenomenon and how they influence it • What does this really mean? • Analyze its components • Recognize a new phenomenon • Identify the variables that govern it • Change the phenomenon • Isolate the important variables • Demonstrate understanding by control

  13. Science Today A major theme of science today is that we are able to make accurate measurements on a complex world that • Advance our understanding of nature, • Improve our ability to harness technology, And, in spite of many challenges, • Increase the importance of science to society in the future Scientific data are at the core of modern science

  14. My Talk • Science today • The Data Revolution in science • Scientific data and scientific databases • Data and scientific discovery • The challenges of using data science on scientific data

  15. The Data Revolution in Science Today, E-Science is real • Computer at every desk • Connectivity: The Internet/WWW explosion • Computerized experiments and observations • Database tools on every computer • Electronic publications • Model and simulation-based R&D • Comprehensive databases • Virtual libraries

  16. Four Ways to Generate Scientific Data • Observations • Experiments • Standardized testing • Modeling and simulation

  17. Observational Science Today Today we have exciting new capability to observe nature in situ better than ever before • Hubble Space Telescope • High sensitivity seismographs • Bio-macromolecule sequencing instruments • LTER (Long-term ecological research) platforms • Earth-observing satellites • High power computers to analyze data Generates huge amounts of quality data

  18. Experimental Science Today Today we have exciting new capability to observe nature in controlled circumstances better than ever before • Atomic force microscopes • Micro-electronics and lasers • High energy accelerators • Femto-second chemical reactors • High power computers to analyze data Generates large amounts of high quality data

  19. Testing Today Today we have new capability to test and analyze materials using standard methods • Electronic test equipment • Analytical databases fully integrated into equipment • Analyzing unknown substances • Carbon and other techniques for dating objects • Genomic sequencing • National and international standard test procedures • Data analysis tools to generate properties • Self-calibrating instruments Generates medium amounts of high quality data

  20. Computation Today We now also have the ability to create a Virtual World • Models and simulations of complex systems • Techniques to do advanced mathematics • Computers to execute immense calculations • Visualization tools to examine our virtual world Uses and generates large amounts of data

  21. Characteristics of Approaches for Generating Scientific Data

  22. The Data Revolution in Science is Real • Observation, experimentation, testing, and calculation all produce, and in some cases use, large amounts of data • E-Science has provided an incredible array of tools, technologies, and methods to collect, store, manage, analyze, exploit, preserve, and disseminate these data Science today is more fully based on data and data collections than ever before!

  23. My Talk • Science today • The Data Revolution in science • Scientific data and scientific databases • Data and scientific discovery • The challenges of using data science on scientific data

  24. Scientific Data and Scientific Databases • Data communicate measurement (experimental and observational) and computational results • “When you can measure what you are speaking about, and express it in numbers, you know something about it” (Lord Kelvin)

  25. Types of Scientific Data • Numbers (1, 2, 3…) • Simple text (ABCs) • Complex text (Greek, scripts, symbols) • Equations (E=mc²) • Graphs • Diagrams • Pictures • Software • Rules

  26. All Data Are Not the Same • Measurement or property: There is a difference! • Measurements are a one-time look at nature • Properties are the inherent characteristics of nature • They are Nature Itself

  27. Measurements are for Today • Measurements are what you see now • Capture one point of view • Usually limited number of variables changed [Image caption: One of 1300 measurements of Diego Giacometti]

  28. Properties are Forever • Properties are the real thing • Need many repeated measurements • Far too many substances and systems to determine properties • Will never know the properties of everything [Image caption: The real Diego Giacometti]

  29. The Classical Paradigm for Science and Data [Diagram labels: Hypotheses, Questions, Measurement, Data, Theories, Models, Scientific Knowledge]

  30. The true data paradigm has always been this [Diagram labels: Hypotheses, Questions, Measurement, Data, Data Collections, Theories, Models, Scientific Knowledge]

  31. Scientific Databases in History • Preserved data collections (large and small) • At first, simply data preservation • Data were stored, but not really exploited • Accuracy • Comprehensiveness • Systematizing

  32. Accuracy Newgrange, Ireland • About 5,200 years old • Aligned to the rising sun at the winter solstice • Depended on careful observational data on the rising sun • One data point!

  33. Volume and Accuracy Improving Stonehenge • 5000 years old • Over 100 stones • Complicated stone alignments • Marks position of the moon and major stars as well as the sun • Storage of several observations

  34. Comprehensive Data Sets Galen • Greek physician • Experimental physiologist • Arabic copy from 800 AD • Pictorial, descriptive, function describing • Representative of botanical and animal catalogs

  35. Systematizing a Comprehensive Collection Pliny the Elder • Roman scholar • Natural History (77 AD) • One of the earliest known encyclopedias of the natural world • Systematization of data

  36. My Talk • Science today • The Data Revolution in science • Scientific data and scientific databases • Data and scientific discovery • The challenges of using data science on scientific data

  37. Data and Scientific Discovery • The advent of the Baconian Revolution: anchoring scientific understanding to physical observation • Led to databases becoming the foundation of scientific discovery • True Beginnings of Data Science!

  38. Scientific Databases in History • Preserved data collections (large and small) form the foundation of scientific discovery • Trends in data preservation and discovery • Accuracy • Comprehensiveness • Systematizing • Extraction of essence • Explanation of the complex • Prediction of new phenomena! • Physical theory from data!

  39. Extraction of Essence Tycho Brahe • Late 16th Century • Danish Astronomer • Made precise measurements that led to Kepler’s theories • Led to discovery of simple relationships
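
One such simple relationship is Kepler's third law: the square of a planet's orbital period is proportional to the cube of its mean distance from the Sun. The sketch below is only an illustration of that idea. The planetary values are rounded modern figures assumed for the example, not Tycho Brahe's observations, and the names `PLANETS` and `kepler_ratio` are hypothetical.

```python
# Rounded modern values, assumed for illustration (not Tycho's data):
# semi-major axis in astronomical units, orbital period in Earth years.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def kepler_ratio(semi_major_axis_au: float, period_years: float) -> float:
    """Return T^2 / a^3, which Kepler's third law says is the same for every planet."""
    return period_years**2 / semi_major_axis_au**3


ratios = {name: kepler_ratio(a, t) for name, (a, t) in PLANETS.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} T^2/a^3 = {ratio:.3f}")
```

In these units the ratio comes out close to 1 for every planet, which is the kind of compact regularity that a large table of positions can hide until someone extracts it.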

  40. Explanation of the Complex Charles Darwin • Combined his own observations with those of others in geology, zoology and botany • A wide variety of facts and phenomena recorded • Theory of Evolution had to explain many diverse observations and measurements from different disciplines

  41. Prediction of New Phenomena Mendeleev and the Chemical Periodic Table: predicting properties of unknown elements from properties (data) of known elements

  42. Physical Theory from Data • Notes on the Spectral Lines of Hydrogen: Johann Jacob Balmer, Annalen der Physik und Chemie 25, 80-5 (1885) • “I gradually arrived at a formula which, at least for these four lines, expresses a law by which their wavelengths can be represented with striking precision…From the formula, we obtained for a fifth hydrogen line 3969.65 x 10^-7 mm.” • The development of quantum mechanics: Bohr, Schrödinger
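
The formula Balmer refers to can be sketched in a few lines. This is a minimal illustration, assuming the usual statement of his result: wavelength = h · m² / (m² − n²) with n = 2 and an empirical constant h = 3645.6 in units of 10^-7 mm. The function name and constant name are hypothetical and are not taken from the talk.

```python
# Balmer's empirical formula for the visible hydrogen lines:
#   wavelength = h * m^2 / (m^2 - n^2), with n = 2 and m = 3, 4, 5, ...
# The constant is assumed here as 3645.6, in units of 10^-7 mm.
BALMER_CONSTANT = 3645.6


def balmer_wavelength(m: int, n: int = 2) -> float:
    """Return the predicted wavelength (in 10^-7 mm) for integers m > n."""
    if m <= n:
        raise ValueError("m must be greater than n")
    return BALMER_CONSTANT * m**2 / (m**2 - n**2)


# m = 3..6 correspond to the four lines the formula was fitted to;
# m = 7 is the predicted fifth line.
for m in range(3, 8):
    print(f"m = {m}: {balmer_wavelength(m):.2f} x 10^-7 mm")
```

Under these assumptions the m = 7 case evaluates to about 3969.65 in the same units. The point of the example is that a formula fitted to four data points produced a testable prediction for a fifth.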

  43. Brief History of Modern S&T Databases • 1950s: Crystal structures (software generated data) • 1960s: Neutron data (modeling weapons) • 1970s: Analytical chemistry (identify chemicals); Thermochemistry (properties linked); Environmental and toxicology; Large physics experiments; Space science • 1980s: Astronomy; Materials; Earth sciences; Biology; Genomics

  44. Scientific Databases Today “Preserving” Data is Easy • Database management tools are inexpensive and powerful • Many models for good interfaces exist • Collecting data (data deposition) can be routine • Expertise is easily available from many sources Building databases today is remarkably easy

  45. Comprehensive Data Collections for 21st Century Science • International Virtual Observatory: All observation for every point in the sky • Structural Genomics: For all living things! • Proteomics: 30,000 or 300,000? • Climate change: Water, earth, atmosphere and all they contain • Historic geologic: Many millennia, the entire planet • Chemistry on demand: 60 elements, 5 at a time, many ratios, 10^9 – 10^10 compounds • Biodiversity: 5M species? or 10M? or 50M? • Brain scans: Every person, every thought forever Very large databases will be found in every scientific discipline
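
The "chemistry on demand" estimate can be checked with a short calculation. This is a rough sketch under stated assumptions: "60 elements, 5 at a time" is read as choosing 5 distinct elements from 60 with order ignored, and "many ratios" as the number of distinct compositions per element set. The variable names are hypothetical.

```python
import math

ELEMENTS = 60               # usable elements, as stated on the slide
ELEMENTS_PER_COMPOUND = 5   # "5 at a time"

# Number of distinct 5-element sets that can be drawn from 60 elements.
element_sets = math.comb(ELEMENTS, ELEMENTS_PER_COMPOUND)
print(f"Distinct element sets: {element_sets:,}")

# Compositions (ratios) per set needed to reach the slide's range.
for target in (10**9, 10**10):
    ratios_needed = target / element_sets
    print(f"{target:.0e} compounds needs about {ratios_needed:,.0f} ratios per set")
```

On this reading there are roughly 5.5 million element sets, so a few hundred to a few thousand compositions per set is enough to reach the 10^9 to 10^10 range quoted on the slide.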

  46. The Face of 21st Century Science • Complex • Multi-disciplinary • Real systems • Virtual as well as physical Access to quality data becomes critical Attention to the problems and challenges of long term preservation of and access to data becomes more important than ever!

  47. Scientific Discovery and Data Collections: The Paradigm has Changed Yesterday • Collections managed by a small number of people • Collections readable by one scientist • Collections interpretable by one person • Discoveries made by thinking, with analysis by one person Today • Collections managed by groups • Collections not readable by any individual • Collections interpretable only with aid of software The Future • Discoveries aided or made by computers, with verification by people?

  48. The Proposition Scientific databases in the future will be an even more important source for scientific discovery • Data collections are critical for • New insights • New scientific principles • New knowledge • Understanding complex systems Let’s look at 3 problems and the challenges they present

  49. My Talk • Science today • The Data Revolution in science • Scientific data and scientific databases • Data and scientific discovery • The challenges of using data science on scientific data

  50. Three Problems in the Data Era • Too much data • Complex systems • Complex science [Diagram: Science and Data, how the information revolution has changed their relationship. Labels: Hypotheses, Questions, Measurement, Data, Data Collections, Theory, Models, Scientific Knowledge]
