
Data Management for Frontiers at the Interface Between Computing and Biology

Explore the complexities of managing vast amounts of data at the intersection of computing and biology: the key questions, the barriers, and the obstacles to collaboration. Discover the future of data storage, organization, and access, along with the need for standard formats and terminology.


Presentation Transcript


  1. Data Management for Frontiers at the Interface Between Computing and Biology
     Jim Gray, Microsoft Research

  2. Cosmic Questions
     • Where are we today?
     • Where in 5 years?
     • What are the key questions?
     • What am I doing next?
     • What are the barriers?
     • What hinders collaboration?
     • What changes needed in education?

  3. How much information is there?
     • Soon everything can be recorded and indexed
     • Most bytes will never be seen by humans.
     • Human attention is the precious resource.
     • Automatic: capture, store, organize, analyze, summarize
     • Manual: visualize/iterate
     [Figure: scale of data sizes labeled Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers (top to bottom) Everything Recorded, All Books MultiMedia, All LoC books (words), Movie, A Photo, A Book. Small-prefix legend: 3 milli, 6 micro, 9 nano, 12 pico, 15 femto, 18 atto, 21 zepto, 24 yocto.]

  4. Plumbing
     • Everything can be online
     • Storage is nearing $1K per terabyte
     • Networking is $1 per delivered GB
     • Software is cheap or free
     • Systems are becoming self-managing
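
The two prices quoted on slide 4 can be put side by side with a little arithmetic. The sketch below is only an illustration: it takes the slide's round numbers at face value, assumes decimal units (1 TB = 1,000 GB), and uses hypothetical function names. The comparison itself is not made on the slide.

```python
# Round figures taken from slide 4; treat them as order-of-magnitude values.
STORAGE_USD_PER_TB = 1_000.0   # "Storage is nearing 1 K$/TeraByte"
NETWORK_USD_PER_GB = 1.0       # "Networking is 1$ / delivered GB"
GB_PER_TB = 1_000              # assumption: decimal (SI) units


def storage_cost_usd(terabytes: float) -> float:
    """Cost of keeping `terabytes` of data on disk at the slide's price."""
    return terabytes * STORAGE_USD_PER_TB


def delivery_cost_usd(terabytes: float) -> float:
    """Cost of delivering `terabytes` of data once over the network."""
    return terabytes * GB_PER_TB * NETWORK_USD_PER_GB


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb} TB: store ${storage_cost_usd(tb):,.0f}, "
              f"deliver once ${delivery_cost_usd(tb):,.0f}")
```

At these prices, delivering a terabyte once costs about as much as the disk that holds it, so the two line items are of the same order.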

  5. Data Management Systems
     • Can ingest/store/search/analyze terabytes
     • Numbers
     • Text
     • Some progress on “objects”
     • But semantics have to come from the domain
     • Good science and engineering, but… flopped in marketplace.

  6. Basic Problems
     • Data Acquisition:
       • I do not have much to say here
     • Data Ingest:
       • This is a huge problem
     • Data Organization & Access:
       • This is what databases are good at for text & numbers
       • For “semantic” data it requires domain-specific tools.
     • Data Publication/Discovery/Interchange:
       • Requires good standards
       • We have syntactic standards; semantic standards are needed.

  7. My #1 Problem: Data Interchange (includes publication and discovery)
     • What does the data mean?
     • The answer is: 42.
     • Units?
     • Precision? Accuracy?
     • How was the number derived?
     • How can you tell me what it means (without us talking on the phone or you visiting my laboratory)?
     • Need standard terminology and standard formats.
     • Hard to do for “new” stuff.
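
The questions on slide 7 (units, precision, derivation) can be made concrete with a small sketch of a self-describing value. Everything here is hypothetical: the `Measurement` class, its field names, and the sample unit and derivation text are illustrative assumptions, not a format proposed in the talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Measurement:
    """A number that carries the context slide 7 asks for (hypothetical schema)."""
    value: float
    unit: str           # answers "Units?"
    uncertainty: float  # answers "Precision? Accuracy?" (same unit as value)
    derivation: str     # answers "How was the number derived?"

    def to_json(self) -> str:
        # Serialize with sorted keys so the output is stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


# The bare answer from the slide: a number with no recoverable meaning.
bare_answer = 42

# The same number with its context attached (sample values are made up).
described = Measurement(
    value=42.0,
    unit="mg/L",
    uncertainty=0.5,
    derivation="mean of 3 replicate assays (hypothetical)",
)

if __name__ == "__main__":
    print(bare_answer)
    print(described.to_json())
```

A record like this still depends on both parties agreeing on what "mg/L" or the derivation text means, which is the point of the slide's call for standard terminology.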

  8. Great Hope & Promise
     • XML is the answer
     • Reality: XML is one layer up from Unicode.
     • Can describe structured information
     • But not process, not meaning, not…
     • Answer #2: Objects
     • SOAP, Web Services,…
     • Probably a better answer
     • But… still needs tools to make it workable.
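
Slide 8's point that XML describes structure but not meaning can be seen with Python's standard `xml.etree.ElementTree` parser. This is a minimal sketch: the element name `answer`, the `unit` attribute, and the helper `read_answer` are hypothetical, chosen only to echo the "42" example from the previous slide.

```python
import xml.etree.ElementTree as ET

# Two well-formed documents; the tag and attribute names are made up.
PLAIN_DOC = "<result><answer>42</answer></result>"
ANNOTATED_DOC = '<result><answer unit="mg/L">42</answer></result>'


def read_answer(xml_text: str) -> tuple:
    """Return (text, attributes) of the <answer> element.

    The parser hands back strings only. It does not know what "42"
    measures, and it does not interpret the unit attribute either.
    """
    root = ET.fromstring(xml_text)
    answer = root.find("answer")
    if answer is None:
        raise ValueError("no <answer> element found")
    return answer.text, dict(answer.attrib)


if __name__ == "__main__":
    print(read_answer(PLAIN_DOC))
    print(read_answer(ANNOTATED_DOC))
```

Both documents parse cleanly. The second carries more structure, but interpreting `unit="mg/L"` still requires a vocabulary agreed on outside the XML itself.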

  9. Discussion

  10. Gifford’s List
     • Data Interchange
     • Scale: what’s big
     • Quality: how do you keep it up
     • DBs need more semantics.
