1 / 18

Fourth Paradigm

Fourth Paradigm. Science-based on Data-intensive Computing. How to read the text?. Foreword: A very interesting foreword by Gordon Bell discusses innovation and discoveries through the centuries…

gisela
Download Presentation

Fourth Paradigm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fourth Paradigm Science-based on Data-intensive Computing

  2. How to read the text? • Foreword: • A very interesting foreword by Gordon Bell • discusses innovation and discoveries through the centuries… • Data-intensive science consists of three basic activities: data capture, curation and analysis (we will add one: visualization). • Various sources of data are identified

  3. Some sources of big data • Australian square kilometer array of telescopes • CERN Hadron Collider • Gene sequencing machines (HTS: High Throughput Sequencing) • Human genome project

  4. Activities around big data • Capture and aggregation • Curation involves • Right data structures • Schema • Metadata • Integration across instruments, labs, and experiments • Modeling • Analysis (including visualization) • Archiving

  5. How can we contribute? • “..there are many opportunities and challenges for data-intensive science.” • “scientific data mash-ups” • … Read the foreword in the Fourth Paradigm book. It is quite inspiring.

  6. Jim Grey on eScience • Research cycle: data capture, data curation, data analysis and data visualization • Mega-scale, milli-scale • Publication of research results • We need tools … • Capture, curate, analyze, model, visualize, make decision and take actions (This will be our objectives for the two projects for the course.)

  7. Science paradigms (chronologically) • Thousands of years ago: • Science was empirical • Describing natural phenomena • Last few hundred years • Theoretical branch • Models and generalizations • Last few decades • Computational branch • Simulating complex phenomena • Today • Data exploration/data science • Unify theory, experiment and simulation • Data captured by simulator or instrument • Processed by software • Info/knowledge/ intelligence • Analysis and visualization

  8. Fourth Paradigm: Area 1: Environmental Sciences

  9. Three Phases • Phase 1: largely discipline-oriented: geology, atmospheric chemistry, ecosystems • Phase 2: study of interacting element earth sciences, human behavior and systems: Complex systems and models for explaining these systems emerged; knowledge developed for scientific understanding • Phase 3: knowledge developed for practical decisions and actions • This new knowledge endeavor is termed “Science of environmental applications”

  10. Knowledge and Queries • “With the basic understanding now well established, the demand for climate applications knowledge is emerging. • How do we quantify and monitor total forest biomass so that carbon markets can characterize supply? • What are the implications of regional shifts in water resources for demographic trends, agricultural output, and energy production? • To what extent will seawalls and other adaptations to rising sea level impact coasts?” • These questions and additional issues need applications to built around this basic knowledge • Integration of knowledge from many disciplines: physics, biogeochemical, engineering, and human processes (demographics, practices). • Snow-melt problem affects 1 billion people in the world: it is discussed in detail. • Models have to consider interactions among various systems: traditional approaches may not suffice. My opinion: We may need a compendium of methodsand the knowledge derived from these methods have to be unified.

  11. Consideration for designing models • Need driven vs. curiosity driven • Externally constrained • Consequential and recursive (knowledge generates more knowledge) • Useful even when incomplete • Scalable • Robust • Data-intensive

  12. Development of New Knowledge Types and Tools • For acquiring knowledge multiple sources: satellite imagery, energy-balancing reconstruction, etc. • Practical answers, intellectually captivating models, knowledge for policy making, etc. • Equally important is using this knowledge in everyday lives: esp. with the availability of mobile devices and the Internet. Ex: Hurricane Irene models, knowledge and consequences

  13. Simple Example • Figure 1: • Sierra Nevada and Central Valley of CA geography image • NASA satellite images at various spectral bands • Use these two collection to arrive at the snow cover model • Something of a scientific “mashup”

  14. Figure 1

  15. Ecological Sciences Systems (p.24..) • This is especially relevant in the context of recent anomalies such as super storms Sandy. • Navigating the ecological data flood • Step1: in ecological scientific analysis is data discovery and harmonization…sources, conversions, scrapping, web services, RSS, namespace mediation, search portals, wikipedia, … • Step 2: Moving ecological synthesis into the cloud: CSV MATlab ready data files from hundreds of sites; analysis on the cloud; SQL Server Analysis Data Cube on the cloud;

  16. Sample Ecology Mashup • See figure F1 and the link that goes with it at the bottom of the page.

  17. Ecological Sciences Systems (contd.) • Step 2 (contd.): analysis will download 3 terabyte of imagery, 4000 CPU hours, and generate < 100MB results. • Challenges: complex visualization, diversity of the data set, data semantics, data publisher, meta data, collaboration tools. • So you think these are NOT CSE problems, take a look at this site: http://climatechange.cs.umn.edu/

  18. Summary • Climate research is one of the critical application area of research for data-intensive computing. • We looked at some of the sample applications. • We also observed “mashup” of data from various sources is a common approach used in these applications.

More Related