The Top 10 Reasons Why Federated Can’t Succeed. And Why It Will Anyway.
But First… • What is our purpose as a community? • Produce (wonderful) new ideas • Structure the field • Educate the workforce
A Brief History of Federation • Multibase @1980 • Many attempts since • Functional • Relational • Object-oriented • Logic-based • XML • Still not solved (think of last night) • And never will be?
Number 10: Robustness • Systems fail • Sources slow or unavailable • In a distributed system, more pieces • => more failures • Users don’t like failures
Number 9: Security • Different systems have different security mechanisms • Hard to create a single coherent view of permissions • Distributed systems are more vulnerable • More points of failure • Hard to make security guarantees • Data is often the corporate jewels • It must be protected
Number 8: Updates • Recording change isn’t always an UPDATE • Application semantics must be accounted for • Application APIs must be reckoned with • ACIDity isn’t always achievable • Not all data sources display ACID properties • Varying degrees of support • Strong transaction semantics not always possible or appropriate • And always painful • Changes to multiple sources must be coordinated • Requirements for consistency vary
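As an illustration of why coordinating changes across multiple sources is painful, here is a minimal sketch of an all-or-nothing update in the style of a two-phase commit. Everything in it is hypothetical: the `Source` class, its `prepare`/`commit`/`rollback` methods, and the `can_commit` flag are stand-ins invented for this example, not the API of any real federation engine, and the sketch assumes every source can stage a change and vote on it, which the slide points out is not always true.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical data source that can stage, commit, or roll back a change."""
    name: str
    can_commit: bool = True            # False simulates a source that votes "no"
    committed: list = field(default_factory=list)
    staged: object = None

    def prepare(self, change) -> bool:
        """Phase 1: stage the change and vote on whether it can be committed."""
        if not self.can_commit:
            return False
        self.staged = change
        return True

    def commit(self) -> None:
        """Phase 2: make the staged change permanent."""
        self.committed.append(self.staged)
        self.staged = None

    def rollback(self) -> None:
        """Discard a staged change."""
        self.staged = None


def coordinated_update(sources, change) -> bool:
    """Apply `change` to every source, or to none of them."""
    prepared = []
    for source in sources:
        if source.prepare(change):
            prepared.append(source)
        else:
            # One "no" vote aborts the whole update: undo what was staged so far.
            for p in prepared:
                p.rollback()
            return False
    for source in sources:
        source.commit()
    return True


if __name__ == "__main__":
    orders = Source("orders")
    legacy = Source("legacy", can_commit=False)
    print(coordinated_update([orders, legacy], {"id": 1}))
    print(orders.committed)
```

The sketch ignores failures between the two phases (a source that votes "yes" and then crashes), which is where much of the real difficulty lies.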
Number 7: Configurability • Many architectures possible • Even with pre-existing sources, many choices • Little or no guidance on tradeoffs • Lots of code to install • Federation engine, data source clients • Often choices here • Lots of connections to define • Need tooling to support
Number 6: Administration • Monitoring is hard • Not all sources have facilities to track events • Variety of mechanisms for different events, and different sources • Not always APIs • Tuning is difficult • Need to understand what must change • Need to take appropriate actions • Repairing is painful • Distributed debugging • Different vendors to deal with for fixes
Number 5: Semantic heterogeneity • Hard to identify commonalities • Same terms, different meanings • Different terms, same meaning • Different structures representing different interpretations • Can’t integrate data effectively without them • Can’t make sensible queries
Number 4: Insufficient Metadata • Need metadata to integrate, configure, administer and query • Every data source has different metadata • No uniform standard • Not always collected • Tools to examine and exploit missing
Number 3: Performance (Data Movement) • Distributed queries involve moving data • Geographic distribution is common • WAN is slow • Large data volumes common • Large numbers of objects • Large objects • Caching isn’t a complete answer • Changes can be frequent and hard to track • Storage is not unlimited
Number 2: Performance (Complexity) • Decision-support applications do complex queries • Many choices for how to execute • Big differences in performance among choices • Need data from diverse sources • May not have enough power in source • Performance at sources may vary • Need expensive functions of data • Function may not be implemented everywhere • Flowing the data to the function is expensive
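To make the "big differences in performance among choices" point concrete, the following is a rough sketch that ranks execution plans purely by how much data each one moves over the network. The cost model (latency plus bytes divided by bandwidth), the plan names, and all of the sizes are assumptions made up for illustration. A real optimizer would also weigh processing power at each source and the availability of functions there.

```python
def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float,
                     latency_s: float = 0.0) -> float:
    """Estimated time to move `num_bytes` over a link (toy model)."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


def cheapest_plan(plans: dict, bandwidth_bytes_per_s: float,
                  latency_s: float = 0.0):
    """Return (best plan name, cost of every plan in seconds).

    `plans` maps a plan name to the total bytes that plan ships.
    """
    costs = {
        name: transfer_seconds(num_bytes, bandwidth_bytes_per_s, latency_s)
        for name, num_bytes in plans.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # Hypothetical join of a 10 GB table with a 1 GB table at another site.
    plans = {
        "ship both tables to the federation engine": 10e9 + 1e9,
        "ship the small table to the large source, return a 50 MB result": 1e9 + 50e6,
    }
    wan = 12.5e6  # assumed 100 Mbit/s link, expressed in bytes per second
    best, costs = cheapest_plan(plans, wan)
    for name, seconds in costs.items():
        print(f"{name}: about {seconds:.0f} s")
    print("cheapest:", best)
```

Even in this toy model the two plans differ by roughly an order of magnitude, which is the kind of gap the slide refers to.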
Number 1: Performance (Pathlength) • Simple queries (OLTP-like) incur huge overheads • Processing and networking costs • Simple queries are common • Easier to write • Automatically produced • Workflows
So Why Will Federated Succeed? • It has to • Integration one of the top IT issues • And it’s not going away • Alternatives are expensive and/or painful • Write it by hand • EAI/Workflow • Consolidation (warehouse, data marts…)
So Why Will Federated Succeed? (2) • Simple scenarios exist • Don’t need OLTP, high security, great robustness, … for all applications • Customers know their data, or must learn anyway • Needs are so great, compromise is possible
So Why Will Federated Succeed? (3) • Progress on technology being made • 20 years of distributed query processing • Plumbing in place • Commit protocols • Reliable messaging • Connectivity infrastructure • XML (basic community agreement) • XML data format • XML schema • Web services • We’re getting closer
What would we do if it ever did work? • Retire • Integrate the web? • Data grids • Data Google • P2P database?
For Discussion • Is research in this area warranted? • What are the most important research topics? • Did we miss any?