Telegraph: A Universal System for Information

Telegraph History & Plans
• Initial Vision
• Carey, Hellerstein, Stonebraker
• “Regres”, “B-1”
• Sweat, ideas and further vision
• 4 of my grads committed
• Brewer + 2 grads committed
• Franklin will play
• obvious tie-ins with other projects & synergies!

Telegraph Architecture
[Figure: layered architecture; related projects in parentheses]
• Query/Browse/Mine (Control, DigLib)
• Global Agoric Federation (Mariposa, Millenium, Control)
• Continuously Reoptimizing Query Processor; Adaptive Data Placement (River, Ninja, Aetherstore, Control, STIX)
• Storage Manager (FS, DB, Web) (Ninja, GiST, IStore)

Storage Manager
• Historic chance to start over!
• new hardware realities
• variable-length segments, not blocks
• big main memories
• extra CPUs at the devices (IStore)
• revisit and clean up infrastructure for transactions
• clean API supporting both log-based & version-based schemes; version-based runs today!
• big SW Eng. challenge
• unify DB/FS/Web server!
• Clients: Ninja’s persistent hash table, query processing, web server, Linux (NT?) filesystem.
• (Mohan Lakhamraju, Rob von Behren, Steve Gribble)

Query Engine
• Shared-nothing (cluster)
• all data flow (no blocking ops)
• auto load-balance to micro/macro changes in environment
• adaptivity more important than raw performance!!
• CONTROL! || ripple join, online reordering
• (Shankar Raman)
• continuously reoptimizing query plans
• tie-ins with STIX (Christos/Sinclair/Russell/Hellerstein)
• (Ron Avnur)
• first steps in handling streaming sources
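
The ripple join named above is a non-blocking join: it emits matches while both inputs are still being read, so a user can see partial results early. What follows is a minimal single-machine sketch of the "square" variant, not Telegraph's implementation. The function name `ripple_join`, the `predicate` parameter, and the in-memory lists are assumptions made for illustration.

```python
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input


def ripple_join(r, s, predicate):
    """Yield matching (r_tuple, s_tuple) pairs incrementally.

    Each step draws one new tuple from each input and joins it against
    everything seen so far from the other input, so every pair is
    examined exactly once and results appear before either input ends.
    """
    seen_r, seen_s = [], []
    for new_r, new_s in zip_longest(r, s, fillvalue=_DONE):
        if new_r is not _DONE:
            # New R tuple against all previously seen S tuples.
            for old_s in seen_s:
                if predicate(new_r, old_s):
                    yield (new_r, old_s)
            seen_r.append(new_r)
        if new_s is not _DONE:
            # New S tuple against all seen R tuples, including new_r.
            for old_r in seen_r:
                if predicate(old_r, new_s):
                    yield (old_r, new_s)
            seen_s.append(new_s)


if __name__ == "__main__":
    for pair in ripple_join([1, 2, 3], [2, 3, 4], lambda a, b: a == b):
        print(pair)
```

Because the function is a generator, a consumer can stop early or update a running aggregate as each pair arrives, which is the interaction style the CONTROL bullet asks for.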

Cluster Data Layout
• issues: fragmentation, placement, replication on 10^6 disks. For DB/FS/Web.
• goals: availability, efficiency, consistency, manageability.
• Adaptivity: cooperative vs. competitive ($$) techniques?
• (Mehul Shah)

Global Federation
• Global distribution
• federated DBMS layer a la Mariposa/Cohera
• address all the hard stuff they dropped!
• Global data placement
• as in cluster, but must be competitive. (Mehul Shah)
• Global query processing (Amol Deshpande)
• Agoric query optimization
• distributed query processing
• Global metadata
• yellow pages both for services & datasets
• Millenium/Ninja tie-ins?
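
One way to picture agoric (market-based) query optimization is as a bidding round: sites offer a price and a delay for each piece of a query, and a broker picks winners. The sketch below is a toy illustration under that assumption, not the Mariposa or Telegraph protocol. The `Bid` fields, the `pick_bids` function, the site names, and the cheapest-bid-within-a-delay-bound rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    site: str        # hypothetical site identifier
    subquery: str    # hypothetical label for a piece of the query plan
    price: float     # what the site charges to run the subquery
    delay: float     # promised completion time, in seconds


def pick_bids(bids, max_delay):
    """For each subquery, keep the cheapest bid that meets the delay bound."""
    winners = {}
    for bid in bids:
        if bid.delay > max_delay:
            continue  # too slow, regardless of price
        best = winners.get(bid.subquery)
        if best is None or bid.price < best.price:
            winners[bid.subquery] = bid
    return winners


if __name__ == "__main__":
    offers = [
        Bid("siteA", "scan_orders", price=5.0, delay=2.0),
        Bid("siteB", "scan_orders", price=3.0, delay=4.0),
        Bid("siteC", "scan_orders", price=1.0, delay=9.0),
        Bid("siteA", "join_customers", price=8.0, delay=3.0),
    ]
    for subquery, bid in pick_bids(offers, max_delay=5.0).items():
        print(subquery, "->", bid.site, bid.price)
```

A real broker would also have to handle budgets, bids covering several subqueries at once, and sites that decline to bid; the sketch leaves all of that out.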

Applications
• Really finding stuff in all the world’s data?
• UI meets AI meets Logic (browse/mine/query)
• CONTROL is key: seamless, non-blocking interaction
• multi-res output and feedback during browse/query
• hints, wizards, training (AI mining, user in the loop)
• build on existing “scalable spreadsheet”/xform tools (Shankar Raman)
