Explore the complexities of managing vast amounts of data at the intersection of computing and biology, addressing key questions, barriers, and collaboration hindrances. Discover the future of data storage, organization, and access, along with the need for standard formats and terminology.
Data Management for Frontiers at the Interface Between Computing and Biology Jim Gray Microsoft Research
Cosmic Questions • Where are we today? • Where in 5 years? • What are the key questions? • What am I doing next? • What are the barriers? • What hinders collaboration? • What changes are needed in education?
How much information is there? • Soon everything can be recorded and indexed • Most bytes will never be seen by humans. • Human attention is the precious resource. • Automatic: capture, store, organize, analyze, summarize • Manual: visualize/iterate [Figure: scale of information running from Kilo through Mega, Giga, Tera, Peta, Exa, Zetta to Yotta, with markers for A Photo, A Book, Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded! Small prefixes, by power of ten: 3 milli, 6 micro, 9 nano, 12 pico, 15 femto, 18 atto, 21 zepto, 24 yocto]
Plumbing • Everything can be online • Storage is nearing 1 K$/terabyte • Networking is $1 per delivered GB • Software is cheap or free • Systems are becoming self-managing
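The two prices on the Plumbing slide can be compared directly. The sketch below is only back-of-the-envelope arithmetic on the slide's round numbers; the function names and the decimal conversion (1 TB = 1000 GB) are assumptions made for illustration, not part of the talk.

```python
# Round figures quoted on the slide.
STORAGE_USD_PER_TB = 1000.0   # "1 K$/TeraByte"
NETWORK_USD_PER_GB = 1.0      # "1$ / delivered GB"
GB_PER_TB = 1000              # assumption: decimal units

def storage_cost_usd(terabytes: float) -> float:
    """Cost to hold the data on disk at the quoted storage price."""
    return terabytes * STORAGE_USD_PER_TB

def delivery_cost_usd(terabytes: float) -> float:
    """Cost to ship the same data once over the network at the quoted price."""
    return terabytes * GB_PER_TB * NETWORK_USD_PER_GB

if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb:>4} TB  store: ${storage_cost_usd(tb):>10,.0f}"
              f"  deliver once: ${delivery_cost_usd(tb):>10,.0f}")
```

Under these assumptions, delivering a terabyte once costs about as much as storing it, which is one way to read the slide's pairing of the two numbers.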
Data Management Systems • Can ingest/store/search/analyze terabytes • Numbers • Text • Some progress on “objects” • But semantics have to come from the domain • Good science and engineering, but… flopped in the marketplace.
Basic Problems • Data Acquisition: • I do not have much to say here • Data Ingest: • This is a huge problem • Data Organization & Access • This is what databases are good at for text & numbers • For “semantic” data it requires domain-specific tools. • Data Publication/Discovery/Interchange • Requires good standards • We have syntactic standards; semantic standards are needed.
My #1 Problem: Data Interchange (includes publication and discovery) • What does the data mean? • The answer is: 42. • Units? • Precision? Accuracy? • How was the number derived? • How can you tell me what it means (without us talking on the phone or you visiting my laboratory)? • Need standard terminology and standard formats. • Hard to do for “new” stuff.
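The questions on this slide (units, precision, derivation) can be read as a checklist of fields that must travel with a number. The following is a minimal sketch of such a self-describing record, not a proposed standard: the `Measurement` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class Measurement:
    """Hypothetical record: a value plus the context needed to interpret it."""
    value: float
    unit: str          # answers "Units?"
    uncertainty: float # answers "Precision? Accuracy?" (same unit as value)
    quantity: str      # a term from an agreed vocabulary
    derivation: str    # answers "How was the number derived?"

def to_interchange(m: Measurement) -> str:
    """Serialize to JSON so the context is shipped together with the number."""
    return json.dumps(asdict(m), sort_keys=True)

def from_interchange(text: str) -> Measurement:
    """Rebuild the record on the receiving side."""
    return Measurement(**json.loads(text))

bare_answer = 42  # the slide's example: a number with no meaning attached

described = Measurement(
    value=42.0,
    unit="mg/L",                          # made-up example values
    uncertainty=0.5,
    quantity="glucose_concentration",
    derivation="mean of three assay replicates",
)

if __name__ == "__main__":
    wire = to_interchange(described)
    print(wire)
    print(from_interchange(wire) == described)
```

The format only helps if both sides agree on what `unit` and `quantity` strings mean, which is the slide's point about needing standard terminology as well as standard formats.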
Great Hope & Promise • XML is the answer • Reality: XML is one layer up from Unicode. • Can describe structured information • But not process, not meaning, not… • Answer #2: Objects • SOAP, Web Services,… • Probably a better answer • But… still needs tools to make it workable.
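The claim that XML describes structure but not meaning can be illustrated with Python's standard `xml.etree.ElementTree` parser. The element and attribute names below (`sample`, `reading`, `unit`) are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Well-formed XML: the parser recovers the structure without complaint.
plain_doc = "<sample id='s1'><reading>42</reading></sample>"
plain_reading = ET.fromstring(plain_doc).find("reading")

plain_value = plain_reading.text        # the string "42"
plain_unit = plain_reading.get("unit")  # None: nothing says what 42 measures

# Adding an attribute carries more context, but only by convention:
# XML itself does not define what "unit" or "mg/L" mean.
annotated_doc = "<sample id='s1'><reading unit='mg/L'>42</reading></sample>"
annotated_reading = ET.fromstring(annotated_doc).find("reading")
annotated_unit = annotated_reading.get("unit")

if __name__ == "__main__":
    print(plain_value, plain_unit)
    print(annotated_reading.text, annotated_unit)
```

Both documents parse equally well; whether the receiver can interpret the second one depends on an agreement made outside the XML layer.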
Gifford’s List • Data Interchange • Scale: what’s big? • Quality: how do you keep it up? • DBs need more semantics.