Data Management for Frontiers at the Interface Between Computing and Biology. Jim Gray Microsoft Research. Cosmic Questions. Where are we today? Where in 5 years? What are the key questions? What am I doing next? What are the barriers? What hinders collaboration?
Data Management for Frontiers at the Interface Between Computing and Biology Jim Gray Microsoft Research
Cosmic Questions • Where are we today? • Where in 5 years? • What are the key questions? • What am I doing next? • What are the barriers? • What hinders collaboration? • What changes are needed in education?
How much information is there? • Soon everything can be recorded and indexed • Most bytes will never be seen by humans. • Human attention is the precious resource. • Automatic: capture, store, organize, analyze, summarize • Manual: visualize/iterate
[Figure: scale of byte prefixes from Kilo, Mega, Giga, Tera, Peta, Exa, Zetta up to Yotta, with markers labeled A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded!]
Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
Plumbing • Everything can be online • Storage is nearing $1K per terabyte • Networking is $1 per delivered GB • Software is cheap or free • Systems are becoming self-managing
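A back-of-the-envelope sketch of what the two prices on this slide imply when taken together. The prices are the slide's round figures; treating 1 TB as 1,000 GB and the function names are assumptions made here for illustration.

```python
# Round prices as quoted on the slide.
STORAGE_USD_PER_TB = 1000.0  # "nearing 1 K$/TeraByte"
NETWORK_USD_PER_GB = 1.0     # "1$ / delivered GB"

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1000


def cost_to_store(terabytes: float) -> float:
    """Dollar cost of the storage needed to hold the data."""
    return terabytes * STORAGE_USD_PER_TB


def cost_to_deliver(terabytes: float) -> float:
    """Dollar cost of delivering the same data over the network."""
    return terabytes * GB_PER_TB * NETWORK_USD_PER_GB


one_tb_store = cost_to_store(1.0)
one_tb_deliver = cost_to_deliver(1.0)
```

Under these figures, delivering a terabyte once costs roughly as much as the storage that holds it.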
Data Management Systems • Can ingest/store/search/analyze terabytes • Numbers • Text • Some progress on “objects” • But semantics have to come from the domain • Good science and engineering, but… flopped in the marketplace.
Basic Problems • Data Acquisition: • I do not have much to say here • Data Ingest: • This is a huge problem • Data Organization & Access: • This is what databases are good at for text & numbers • For “semantic” data it requires domain-specific tools. • Data Publication/Discovery/Interchange: • Requires good standards • We have syntactic standards; semantic standards are needed.
My #1 Problem: Data Interchange (includes publication and discovery) • What does the data mean? • The answer is: 42. • Units? • Precision? Accuracy? • How was the number derived? • How can you tell me what it means (without us talking on the phone or you visiting my laboratory)? • Need standard terminology and standard formats. • Hard to do for “new” stuff.
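One way to picture the questions on this slide is a record that carries its unit, uncertainty, and derivation along with the number. This is a minimal sketch, not a proposed standard: the `Measurement` class, its field names, and the example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Measurement:
    """Hypothetical self-describing value; field names are illustrative."""
    value: float
    unit: str           # answers "Units?"
    uncertainty: float  # answers "Precision? Accuracy?" (same unit as value)
    quantity: str       # term from some agreed vocabulary (assumed to exist)
    derivation: str     # answers "How was the number derived?"


def to_interchange(m: Measurement) -> str:
    """Serialize the record to JSON text for exchange."""
    return json.dumps(asdict(m), sort_keys=True)


def from_interchange(text: str) -> Measurement:
    """Parse JSON text, rejecting bare numbers and incomplete records."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("bare value with no metadata: " + repr(record))
    expected = {f.name for f in fields(Measurement)}
    if set(record) != expected:
        raise ValueError("fields do not match: " + repr(sorted(record)))
    return Measurement(**record)


# Made-up example values, for illustration only.
EXAMPLE = Measurement(
    value=42.0,
    unit="mm",
    uncertainty=0.5,
    quantity="example:length",
    derivation="mean of three repeated readings",
)
```

With this shape, `from_interchange("42")` fails loudly, while `from_interchange(to_interchange(EXAMPLE))` returns an equal record. The hard part the slide points at, agreeing on what `unit` and `quantity` may contain, is outside anything the code can supply.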
Great Hope & Promise • XML is the answer • Reality: XML is one layer up from Unicode. • Can describe structured information • But not process, not meaning, not… • Answer #2: Objects • SOAP, Web Services,… • Probably a better answer • But… still needs tools to make it workable.
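A small sketch of the "one layer up from Unicode" point, using Python's standard `xml.etree.ElementTree`. Both documents below are well-formed and parse equally well; the meaning comes only from a vocabulary that the parties agree on outside XML. The element names, attribute names, and the `UNIT_TO_METRES` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two well-formed documents; XML itself treats them the same way.
BARE = "<result><answer>42</answer></result>"
ANNOTATED = '<result><answer unit="mm" precision="0.5">42</answer></result>'

# Hypothetical shared vocabulary. Nothing in XML defines what "mm" means;
# the consumer has to know it from an agreement made elsewhere.
UNIT_TO_METRES = {"mm": 1e-3, "m": 1.0}


def read_answer(xml_text):
    """Return the text and attributes of <answer>: structure only."""
    node = ET.fromstring(xml_text).find("answer")
    if node is None:
        raise ValueError("no <answer> element")
    return node.text, dict(node.attrib)


def interpret_as_metres(xml_text):
    """Attach meaning to the value, which needs the shared vocabulary."""
    text, attrs = read_answer(xml_text)
    unit = attrs.get("unit")
    if unit not in UNIT_TO_METRES:
        raise ValueError(
            "cannot interpret %r: unit %r is not in the shared vocabulary"
            % (text, unit)
        )
    return float(text) * UNIT_TO_METRES[unit]
```

`read_answer` succeeds on both documents, but `interpret_as_metres(BARE)` raises because the parser can recover structure and nothing more.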
Gifford’s List • Data Interchange • Scale: what’s big? • Quality: how do you keep it up? • DBs need more semantics.