Middleware Diarrhea and Other Ailments Michael Stonebraker Adjunct Professor Massachusetts Institute of Technology (stonebraker@lcs.mit.edu)
Outline • Too much middleware • XML ailments • Web services ills • Our professional sickness
Client-Server Got Replaced by N-Tier Computing • The Web • Gizmos • Scalability and management problems with client server
Humility Lesson • We all sold client-server hard • during the 80’s • and even into the 90’s • Less than 10 years later • it is the worst idea on the planet • We should feel really dumb!
N-Tier Computing Produced Lots of Middleware • App servers • EAI/messaging • ETL • Federators • Workflow • CMS • Portals • DBMS
Middleware Diarrhea • Average enterprise has • one (or more) app servers • one (or more) EAI packages • one (or more) ETL packages • one (or more) portal products • one (or more) application packages • and maybe someday a federated DBMS
All of these systems • Contain transformation engines • And often do function activation (app service) • And often have adapters to legacy systems • Huge overlap in functionality!!
Fewer Moving Parts • Fewer systems • More uniformity • Less duplication
Fewer Systems • Fewer system administrators • Less training • Fewer manuals • Fewer bugs • Fewer cross-system issues
More Uniformity • Every island has • memory management • security model • threading model • Less is better
Less Duplication • Most of the islands support transformations • reasonable chance you will do each one 6 or more times • maintenance headache
So How To Consolidate…… • Converge app server into OR DBMS • dumbest OR query is “execute function” • Remember that everything looks like a nail to the guy with the hammer!
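The "execute function" point can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for an OR DBMS, and the `credit_check` function, its rule, and the `orders` table are hypothetical names invented for the example.

```python
import sqlite3

def credit_check(customer_id: int, amount: float) -> str:
    """Hypothetical piece of application logic that would normally
    live in an app server. The rule itself is made up."""
    limit = 5000.0 if customer_id % 2 == 0 else 1000.0
    return "approved" if amount <= limit else "rejected"

conn = sqlite3.connect(":memory:")
# Register the application function inside the database engine.
conn.create_function("credit_check", 2, credit_check)

# The "dumbest" query: it does nothing except execute the function.
result = conn.execute("SELECT credit_check(?, ?)", (42, 2500.0)).fetchone()[0]

# The same function can then be mixed freely with ordinary data access.
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(42, 2500.0), (7, 2500.0)])
rows = conn.execute(
    "SELECT customer_id, credit_check(customer_id, amount) "
    "FROM orders ORDER BY customer_id"
).fetchall()
print(result, rows)
```

Once function activation is just a query, the client needs only one protocol to reach both data and application logic.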
Pictorially • [Figure: diagram with a client, two DBMS boxes, and three components]
This Requires…. • DBMS to send queries to other DBMSs • i.e. be a data federator • Load balancing also requires a federator
Best of Breed Federators • Support schema heterogeneity • by executing OR functions • Support materialized views • to cache static data
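A toy version of both federator features, offered as a sketch rather than as how any product works. Assumptions: two attached in-memory SQLite databases play the role of separate DBMSs, the table and column names and the fixed exchange rate are hypothetical, and the "materialized view" is emulated with `CREATE TABLE ... AS`, because SQLite has no native materialized views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two independent databases with different schemas (schema heterogeneity).
conn.execute("ATTACH DATABASE ':memory:' AS us")
conn.execute("ATTACH DATABASE ':memory:' AS eu")
conn.execute("CREATE TABLE us.sales (sku TEXT, price_usd REAL)")
conn.execute("CREATE TABLE eu.verkauf (artikel TEXT, preis_cents_eur INTEGER)")
conn.execute("INSERT INTO us.sales VALUES ('A1', 10.0)")
conn.execute("INSERT INTO eu.verkauf VALUES ('A1', 800)")

EUR_TO_USD = 1.25  # assumed fixed rate, for illustration only

def eur_cents_to_usd(cents: int) -> float:
    """Transformation function that reconciles the two representations."""
    return cents / 100.0 * EUR_TO_USD

conn.create_function("eur_cents_to_usd", 1, eur_cents_to_usd)

# One global query over both sources; the function hides the heterogeneity.
FEDERATED = """
    SELECT sku, price_usd FROM us.sales
    UNION ALL
    SELECT artikel, eur_cents_to_usd(preis_cents_eur) FROM eu.verkauf
"""

# "Pull": evaluate against the live sources at query time.
live = conn.execute(
    f"SELECT sku, SUM(price_usd) FROM ({FEDERATED}) GROUP BY sku"
).fetchall()

# Emulated materialized view: cache the (static) result locally.
conn.execute(f"CREATE TABLE all_sales AS {FEDERATED}")
cached_count = conn.execute("SELECT COUNT(*) FROM all_sales").fetchone()[0]
print(live, cached_count)
```

The cached table must be refreshed when the sources change, which is why the slide limits it to static data.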
Fewer Moving Parts…. • Federators dominate ETL • ETL only supports “push” • federators do both “push” and “pull”
Workflow • A collection of rules • who’s allowed to buy what • and who must approve it • Best considered as a boxes-and-arrows diagram • And compiled into components to run on an app server
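Workflow Framework -- PO’s • [Figure: boxes-and-arrows diagram for a PO, with decision boxes “IT?” and “Big?” (yes/no branches) and boxes labeled “manager” and “Laven”]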
Data Intensive Workflow Should Move Inside an OR DBMS • GUI for “boxes and arrows” • Compiler for the diagram • processing steps become components • business rules become triggers • all data flow inside the DBMS • Worked great in Media/360
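"Business rules become triggers" might look like the following sketch. It is not the Media/360 design; SQLite triggers stand in for OR DBMS triggers, and the tables, the 10000 threshold, and the `manager` approver are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_orders (
    id     INTEGER PRIMARY KEY,
    dept   TEXT,
    amount REAL,
    status TEXT DEFAULT 'new'
);
CREATE TABLE approval_queue (po_id INTEGER, approver TEXT);

-- Rule (the "Big?" box): large POs must be approved by a manager.
CREATE TRIGGER route_big_po AFTER INSERT ON purchase_orders
WHEN NEW.amount >= 10000
BEGIN
    INSERT INTO approval_queue VALUES (NEW.id, 'manager');
    UPDATE purchase_orders SET status = 'pending' WHERE id = NEW.id;
END;

-- Rule: small POs are approved without human involvement.
CREATE TRIGGER auto_approve_small_po AFTER INSERT ON purchase_orders
WHEN NEW.amount < 10000
BEGIN
    UPDATE purchase_orders SET status = 'approved' WHERE id = NEW.id;
END;
""")

# The application only inserts rows; routing happens inside the DBMS,
# so nothing has to poll the database for new work.
conn.execute("INSERT INTO purchase_orders (dept, amount) VALUES ('IT', 25000)")
conn.execute("INSERT INTO purchase_orders (dept, amount) VALUES ('IT', 300)")

statuses = conn.execute(
    "SELECT id, status FROM purchase_orders ORDER BY id"
).fetchall()
queue = conn.execute("SELECT po_id, approver FROM approval_queue").fetchall()
print(statuses, queue)
```

Changing a rule means replacing one trigger definition, which is the sense in which such a workflow is easy to change.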
Why? • Big Big Big performance advantage • no polling of the DBMS • no data movement • easy to change! • Watch for Informix product in this area!
Nirvana • One integrated system that does • federation • EAI • app service • With a single transformation system • Based on DBMS technology (or something else….)
XML • Good for content storage and movement • Good as “on the wire” format for data movement • as long as you don’t need to send a lot of stuff fast • Bad for data storage!
History Lesson • 1960’s • IMS and IDMS get traction • customers start complaining about rewriting everything when schema changes
History Lesson • 1970 • Codd writes pioneering paper • starts a decade long argument between IMS/CODASYL advocates and Codd supporters
Net-Net of Argument • Putting semantics into data order is bad • restricts storage options • Hidden meaning bad • no self-defining fields
Net-Net of Argument • Data independence is good • schemas change often • don’t want to rewrite anything when this happens
Net-Net of Argument • Complexity is bad • high level query languages are good • KISS arguments • Call these three premises “Codd’s laws”
History Lesson • 1981 • Codd wins Turing award • acknowledgement for being right
XML in This Historical Light • Has most of the bad features of IMS/CODASYL • allows semantics in data order • data independence will be a challenge • try updates on inverted hierarchies • look at IMS LDBs • more complex than CODASYL
Our Field • We look a little silly saying • an idea renounced in the 1970’s • is back • Leading our colleagues to ask “What’s different?” • if somebody disproved Codd’s laws, they didn’t tell me…..
How to Win the Turing Award Circa 2020 • 2000’s • XML data storage gets traction • 2010 • dust off Codd’s paper • Wait 10 years to be proven right
In Any Case • Inline tags turn 1 Tbyte of EMP data into 10 Tbytes of EMP data • Won’t store anything big in native XML • will use something else…. • like what?
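The tag overhead is easy to measure on a toy record. The sketch below does not reproduce the slide's 10x figure; the actual ratio depends on tag names, field widths, and encoding. The EMP fields, the sample rows, and the 20-byte binary layout are all assumptions made for illustration.

```python
import struct
from xml.sax.saxutils import escape

# Hypothetical EMP rows: (name, age, salary, dept)
employees = [
    ("Smith", 31, 52000, 12),
    ("Jones", 45, 61000, 7),
    ("Garcia", 28, 48000, 12),
]

def as_xml(name: str, age: int, salary: int, dept: int) -> bytes:
    """Self-describing record: every field carries its own tags."""
    return (
        f"<employee><name>{escape(name)}</name><age>{age}</age>"
        f"<salary>{salary}</salary><dept>{dept}</dept></employee>"
    ).encode("utf-8")

# Fixed layout described once by a schema instead of once per record:
# 12-byte name, 2-byte age, 4-byte salary, 2-byte dept, no padding.
RECORD = struct.Struct("<12sHIH")

def as_binary(name: str, age: int, salary: int, dept: int) -> bytes:
    return RECORD.pack(name.encode("utf-8"), age, salary, dept)

xml_total = sum(len(as_xml(*e)) for e in employees)
binary_total = sum(len(as_binary(*e)) for e in employees)
print(f"XML: {xml_total} bytes, binary: {binary_total} bytes, "
      f"ratio: {xml_total / binary_total:.1f}x")
```

Compression narrows the gap on the wire, but a store that keeps the tags inline still pays for them on every record.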
OR DBMS • XML is merely this year’s data type • Next year it will be WML or … • and there will be a next year….
XML Schema • Contains the kitchen sink • Complexity run amok • diarrhea from the SGML types • Includes lots of known hard stuff • e.g. union types
XQuery • Mostly syntactic sugar on OR SQL • // is a user-defined function in Informix OR engine • Try to keep the semantics close to OR SQL
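To make the "// is a user-defined function" idea concrete, here is a rough sketch. It is not how the Informix engine implements it; SQLite stands in for the OR engine, and the function name `descendant_text`, the `docs` table, and the sample documents are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

def descendant_text(doc: str, tag: str) -> Optional[str]:
    """Rough analogue of //tag: text of the first element named `tag`
    found anywhere in the document (the root itself included)."""
    root = ET.fromstring(doc)
    for elem in root.iter(tag):
        return elem.text
    return None

conn = sqlite3.connect(":memory:")
conn.create_function("descendant_text", 2, descendant_text)

conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        (1, "<emp><name>Ann</name><addr><city>Boston</city></addr></emp>"),
        (2, "<emp><name>Bob</name><addr><city>Austin</city></addr></emp>"),
    ],
)

# Path navigation becomes an ordinary function call inside plain SQL.
matches = conn.execute(
    "SELECT id FROM docs WHERE descendant_text(body, 'city') = 'Boston'"
).fetchall()
print(matches)
```

Here XML is just another column type with its own functions, and the query language stays SQL.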
Another History Lesson • Typical enterprise badly wanted data integration for business analysis • needed data in a variety of systems • in a variety of formats • often with no unique ids • often with incompatible semantics • e.g. “2 day delivery” means lots of things • often dirty
ETL Warehouse Projects of the 90’s • Well into 8 digits • Usually a factor of three behind schedule • Delivering a factor of 3 less stuff • Everybody dented their pick on semantic heterogeneity • which is hard, hard, hard • and not solved by the blizzard of 3 letter acronyms from Redmond
Web Services • Will be a long time coming outside of simple domains (where there is no data integration to deal with) • E.g. catalog management • Grainger perspiration….
The Depressing State of Affairs • ~50-75% of IT projects fail • if we built bridges, our profession would be fired • and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)
What To Do? • We typically don’t teach this stuff (and do a serious disservice to our students) • probably because we don’t (can’t) spend any time in industry to figure it out • Action item: at the very least read a couple of Robert L. Glass’s books
The Depressing State of Affairs • Hardware “half-life” is 18 months • Software half-life is 18 years (or more)! • In 25 years we moved from • C to Java • SQL to XQuery
What To Do? • Much higher level design environments • vis • workflow • special purpose languages (report writers,…) • And stop turning down papers on this stuff
Grand Challenge • Improve application productivity (probability of success * programmer productivity) by a factor of 2 this decade