
eScience

eScience. Jim Gray, Microsoft Research. Presented at the 21st Century Computing Conference, October 2006. eScience: What is it? Synthesis of information technology and science. Science methods are changing.


Presentation Transcript


  1. eScience • Jim Gray, Microsoft Research • Presented at the 21st Century Computing Conference, October 2006

  2. eScience: What is it? • Synthesis of information technology and science. • Science methods are changing. • Science is being codified/objectified. How to represent scientific knowledge in computers? • Science faces a data deluge. How to manage and analyze information? • Scientific communication is changing: integrate online literature and data.

  3. Science Paradigms • A thousand years ago: science was empirical, describing natural phenomena • Last few hundred years: a theoretical branch using models, generalizations • Last few decades: a computational branch simulating complex phenomena • Today: data exploration (eScience), unifying theory, experiment, and simulation • Data captured by instruments or generated by simulator • Processed by software • Information/Knowledge stored in computer • Scientist analyzes data using data management and statistics

  4. What X-info Needs from us (cs) (not drawn to scale) • [Diagram; labels: Science Data & Questions • Scientists • Data Mining Algorithms • Miners • Database to store data and execute queries • Systems • Question & Answer, Visualization Tools]

  5. How to Engage With an Area • eScience is inter-disciplinary • We bring informatics expertise • Process: • Long-term and deep collaborations • Find someone who is desperate. • Start with requirements: 20 questions • Help build systems to: • Answer those questions much faster • Answer new questions.

  6. Astronomy • Help build world-wide telescope • All astronomy data and literature online and cross indexed • Tools to analyze the data • Built SkyServer.SDSS.org • Built Analysis system • MyDB • CasJobs (batch job) • OpenSkyQuery: federation of ~20 observatories. • Results: • It works and is used every day • Spatial extensions in SQL 2005 • A good example of Data Grid • Good examples of Web Services.

  7. Ecosystem Sensor Net: LifeUnderYourFeet.Org • Small sensor net monitoring soil • Sensors feed to a database • Helping build system to collect & organize data. • Working on data analysis tools • Prototype for other LIMS (Laboratory Information Management Systems)
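The "sensors feed to a database" step on this slide can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the `reading` table, its column names, the sample values, and the use of Python's built-in `sqlite3` as a stand-in for whatever database the project actually used.

```python
import sqlite3

def build_store():
    """Create an in-memory database with a hypothetical soil-reading table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE reading (
               sensor_id     TEXT NOT NULL,
               taken_at      TEXT NOT NULL,   -- ISO 8601 timestamp
               soil_temp_c   REAL,
               soil_moisture REAL,
               PRIMARY KEY (sensor_id, taken_at)
           )"""
    )
    return conn

def ingest(conn, rows):
    """Load readings; a re-sent reading (same sensor and time) is ignored."""
    conn.executemany("INSERT OR IGNORE INTO reading VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def mean_temp_by_sensor(conn):
    """Return {sensor_id: average soil temperature} across all readings."""
    cur = conn.execute(
        "SELECT sensor_id, AVG(soil_temp_c) FROM reading "
        "GROUP BY sensor_id ORDER BY sensor_id"
    )
    return dict(cur.fetchall())

# Made-up sample data, including one duplicate delivery of the first row.
sample_rows = [
    ("s1", "2006-10-01T00:00", 14.0, 0.30),
    ("s1", "2006-10-01T01:00", 16.0, 0.31),
    ("s2", "2006-10-01T00:00", 12.5, 0.28),
    ("s1", "2006-10-01T00:00", 14.0, 0.30),
]

conn = build_store()
ingest(conn, sample_rows)
print(mean_temp_by_sensor(conn))
```

Keying each row on sensor and timestamp is one plausible way to make repeated uploads harmless; the real system may handle this differently.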

  8. RNA Structural Genomics • Goal: Predict secondary and tertiary structure from sequence. Deduce tree of life. • Technique: Analyze sequence variations sharing a common structure across tree of life • Representing structurally aligned sequences is a key challenge • Creating a database-driven alignment workbench accessing public and private sequence data

  9. VHA Health Informatics • VHA: largest standardized electronic medical records system in US. • Design, populate and tune a ~20 TB Data Warehouse and Analytics environment • Evaluate population health and treatment outcomes • Support epidemiological studies • 7 million enrollees • 5 million patients • Example Milestones: • 1 Billionth Vital Sign loaded in April '06 • 30 minutes to population-wide obesity analysis (next slide) • Discovered seasonality in blood pressure -- NEJM fall '06

  10. HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population • Source: VHA Corporate Data Warehouse • Total Patients per category (category labels not preserved in this transcript): • 23,876 (0.7%) • 701,089 (21.6%) • 1,177,093 (36.2%) • 1,347,098 (41.5%) • Total: 3,249,156 (100%)
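As a quick check of the figures on this slide, the sketch below recomputes each share from the counts and confirms that the four buckets sum to the reported total. The buckets are numbered rather than named because the category labels are missing from the transcript. The `bmi` helper uses the standard definition (weight in kilograms divided by height in meters squared); how the VHA warehouse actually computed it is not stated in the source.

```python
# Patient counts in the order they appear on the slide.
bucket_counts = [23_876, 701_089, 1_177_093, 1_347_098]
reported_total = 3_249_156

def bmi(weight_kg: float, height_m: float) -> float:
    """Standard body mass index: weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2

def percent_shares(counts):
    """Return each count as a percentage of the sum, rounded to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

total = sum(bucket_counts)
shares = percent_shares(bucket_counts)

for i, (count, share) in enumerate(zip(bucket_counts, shares), start=1):
    print(f"bucket {i}: {count:>9,} patients ({share}%)")
print(f"total:    {total:>9,} patients (reported: {reported_total:,})")
```

The recomputed shares agree with the percentages printed on the slide, and the bucket counts add up exactly to the stated total.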

  11. Other Projects • Carbon Cycle Portal • Hydrology Portal • Oceanography Workbench

  12. Common Themes • Each science is codifying & objectifying their data and knowledge • What is a galaxy? • What is a molecule? • So that they can • Ask questions of the data • Exchange data with one another • Result will be a Data Grid • Datasets published as “objects” • Service Oriented Architecture

  13. All Scientific Data Online • Many disciplines overlap and use data from other sciences. • Internet can unify all literature and data • Go from literature to computation to data back to literature. • Information at your fingertips: for everyone, everywhere • Increase Scientific Information Velocity • Huge increase in Science Productivity

  14. Unlocking Peer-Reviewed Literature • Agencies and Foundations mandating research be public domain. • NIH (30 B$/y, 40k PIs, …) (see http://www.taxpayeraccess.org/) • Wellcome Trust • Japan, China, Italy, South Africa, … • Public Library of Science… • Other agencies will follow NIH

  15. How Does the New Library Work? • Who pays for storage access (unfunded mandate)? • It's cheap: 1 milli-dollar per access • But… curation is not cheap: • Author/Title/Subject/Citation/… • Dublin Core is great but… • NLM has a 6,000-line XSD for documents (http://dtd.nlm.nih.gov/publishing) • Need to capture document structure from author • Sections, figures, equations, citations, … • Automate curation • NCBI-PubMedCentral is doing this • Preparing for 1M articles/year • Automate it!
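To make the "1 milli-dollar per access" figure concrete, here is a small arithmetic sketch. Only the per-access cost comes from the slide; the access volumes in the loop are hypothetical round numbers chosen for illustration.

```python
MILLI_DOLLARS_PER_ACCESS = 1  # from the slide: 1 milli-dollar per access

def access_cost_dollars(accesses: int) -> float:
    """Total access cost in dollars at the slide's per-access rate."""
    return accesses * MILLI_DOLLARS_PER_ACCESS / 1000

# Hypothetical access volumes, not figures from the talk.
for accesses in (1_000, 1_000_000, 1_000_000_000):
    print(f"{accesses:>13,} accesses -> ${access_cost_dollars(accesses):,.2f}")
```

At this rate a million accesses cost about a thousand dollars, which is the sense in which the slide calls storage access cheap compared with curation.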

  16. Portable PubMedCentral • "Information at your fingertips" • Helping build Portable PubMedCentral • Deployed US, China, England, Italy, South Africa, (Japan soon). • Each site can accept documents • Archives replicated • Federate thru web services • Working to integrate Word/Excel/… with PubMedCentral, e.g. WordML, XSD • To be clear: NCBI is doing 99% of the work.

  17. Overlay Journals • Articles and Data in public archives • Journal title page in public archive. • All covered by Creative Commons License • permits: copy/distribute • requires: attribution (http://creativecommons.org/) • [Diagram; labels: articles • Data Sets • Data Archives]

  18. Overlay Journals • Articles and Data in public archives • Journal title page in public archive. • All covered by Creative Commons License • permits: copy/distribute • requires: attribution (http://creativecommons.org/) • [Diagram; labels: title page • articles • Data Sets • Journal Management System • Data Archives]

  19. Overlay Journals • Articles and Data in public archives • Journal title page in public archive. • All covered by Creative Commons License • permits: copy/distribute • requires: attribution (http://creativecommons.org/) • [Diagram; labels: title page • comments • articles • Data Sets • Journal Collaboration System • Journal Management System • Data Archives]

  20. Better Authoring Tools • Extend Office tools to • capture document metadata (NLM DTD) • represent documents in standard format • WordML (ECMA standard) • capture references • Make active documents (words and data). • Easier for authors • Easier for archives

  21. Conference Management Tool • Currently a conference peer-review system (~300 conferences) • Form committee • Accept Manuscripts • Declare interest/recuse • Review • Decide • Form program • Notify • Revise

  22. eJournal Management Tool • Connect to Archives • Manage archive document versions • Capture Workshop • presentations • proceedings • Capture classroom ConferenceXP • Moderated discussions of published articles • Connect literature and data archives • Add publishing steps • Form committee • Accept Manuscripts • Declare interest/recuse • Review • Decide • Form program • Notify • Revise • Publish • Discuss & Critique
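The publishing steps listed on slides 21 and 22 can be sketched as an ordered workflow. The step names are taken from the slides; the strictly linear ordering, the `Workflow` class, and its method names are assumptions made for illustration, and the actual tool may well allow loops (for example, several rounds of review and revision).

```python
# Steps named on the conference-management slide.
CONFERENCE_STEPS = [
    "Form committee",
    "Accept Manuscripts",
    "Declare interest/recuse",
    "Review",
    "Decide",
    "Form program",
    "Notify",
    "Revise",
]

# The eJournal slide adds two publishing steps at the end.
JOURNAL_STEPS = CONFERENCE_STEPS + ["Publish", "Discuss & Critique"]

class Workflow:
    """Hypothetical linear workflow that moves through steps one at a time."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0

    @property
    def current(self) -> str:
        return self.steps[self.position]

    def advance(self) -> str:
        """Move to the next step; raise ValueError if already at the last one."""
        if self.position + 1 >= len(self.steps):
            raise ValueError(f"'{self.current}' is the final step")
        self.position += 1
        return self.current

journal = Workflow(JOURNAL_STEPS)
print(journal.current)    # first step of the journal workflow
print(journal.advance())  # step that follows it
```

Modeling the journal workflow as the conference workflow plus extra steps mirrors how slide 22 presents the eJournal tool as an extension of the conference tool.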

  23. Why Not a Wiki? • Peer-Review is different • It is very structured • It is moderated • There is a degree of confidentiality • Wiki is egalitarian • It's a conversation • It's completely transparent • Don't get me wrong: • Wikis are great • SharePoints are great • But… Peer-Review is different. • And, incidentally: review of proposals, projects, … is more like peer-review.

  24. eScience: What is it? • Synthesis of information technology and science. • Science methods are changing. • Science is being codified/objectified. How to represent scientific information and knowledge in computers? • Science faces a data deluge. How to manage and analyze information? • Scientific communication is changing: integrate online literature and data.
