XPipe - An XML Processing Methodology. XML SIG, NY USA, Feb 12, 2002. Sean McGrath, CTO, Propylon. What is XPipe? It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.
XPipe - An XML Processing Methodology • XML SIG, NY USA • Feb 12, 2002 • Sean McGrath, CTO, Propylon
What is XPipe? • It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems. • Based on proven mechanical manufacturing techniques. Specifically: • The Assembly Line Principle • Component assembly and component re-use
What is XPipe? • An open source project hosted on Sourceforge • http://xpipe.sourceforge.net • A contribution to the blossoming meme of using pipeline based processing to tame the burgeoning complexity of XML transformations • (If you do not find XML transformation complicated, you are not sufficiently well informed.) • (And no, XSLT does not solve all your problems)
What is XPipe? • A way of thinking about systems that focuses on structured dataflows rather than Object APIs • It is also: • A Scandinavian sewage treatment technology • An exhaust pipe system for high performance engines • A VT100 based strategy game for DEC's VAX/VMS Operating System
Contents of this talk • The XPipe philosophy • Major functional elements • Some examples • The XGrid and Commoditized XML Processing • Some anticipated objections (and answers) • Relationship to other technologies
Contents of this talk • Current status • Current problems • Future plans • Some (contentious) musings • Something cold to drink
XPipe Philosophy • XML is all about (potentially) complex, hierarchical data structures
XPipe Philosophy Cars are complex, hierarchical structures Henry Ford’s Model T Ford Assembly Line – 1914
XPipe Philosophy Lunch is a complex, hierarchical structure Lunch Assembly Line. NY, 2002
XPipe Philosophy We are complex, hierarchical structures
XPipe philosophy • What have these scenes got in common? • Complex construction of cars, tuna melts and tendons made possible and efficient through • assembly line manufacturing • re-usable component processes and component materials • Why not apply this approach to XML “manufacturing”?
XPipe philosophy • Why does the assembly line approach work? • Transformation task decomposition • Re-usable transformation components • Transformation decomposition is the key to complexity management. Just ask: • Henry Ford • Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”) • George Miller (7+/-2) • Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776) • Any electrical or chemical engineer.
XPipe philosophy • Component re-use is the key to productivity • Ask any form of engineer (electrical, chemical etc.) apart from software engineers… • Component re-use remains a holy grail in software engineering • XPipe is yet another attempt…
XPipe philosophy • A lot of data processing for the foreseeable future will consist of XML to XML transformation • A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations • Mantra • Get data into XML as quickly as possible • Keep it in XML until the last possible minute • Bring all your XML tools to bear on solving the data processing problem
XPipe philosophy • [Diagram: Non-XML Input → Top Transformation → Input XML → Output XML → Tail Transformation → Non-XML Output]
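The top and tail idea can be sketched in a few lines of Python. This is an illustration only, not XPipe code: the function names, the `rows`/`row` element names and the CSV input format are all assumptions made for the example, and the CSV column headers are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top_csv_to_xml(text):
    """Top transformation: get non-XML data into XML as early as possible."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for key, value in record.items():
            ET.SubElement(row, key).text = value
    return root

def upper_names(root):
    """Middle stage: an ordinary XML to XML transformation."""
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return root

def tail_xml_to_csv(root):
    """Tail transformation: leave XML only at the last possible minute."""
    rows = root.findall("row")
    fields = [child.tag for child in rows[0]] if rows else []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.findtext(name, default="") for name in fields])
    return out.getvalue()
```

Everything between `top_csv_to_xml` and `tail_xml_to_csv` sees only XML, so any XML tool can be used for the middle stages.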
XPipe philosophy • The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
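A minimal sketch of chaining, assuming each small transformation is a Python function from an element tree to an element tree. The helper `pipe` and the two toy stages are hypothetical names chosen for this example, not part of XPipe.

```python
from functools import reduce
import xml.etree.ElementTree as ET

def pipe(*stages):
    """Chain stages left to right into one larger transformation."""
    def run(tree):
        return reduce(lambda doc, stage: stage(doc), stages, tree)
    return run

def strip_text(root):
    """Small stage 1: trim whitespace around character data."""
    for el in root.iter():
        if el.text:
            el.text = el.text.strip()
    return root

def count_items(root):
    """Small stage 2: record the number of children on the root element."""
    root.set("count", str(len(root)))
    return root

tidy = pipe(strip_text, count_items)
```

Each stage can be built and tested alone; `tidy` is itself just another stage and can be chained again.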
XPipe philosophy • Only so many ways to re-arrange an XML tree structure • A finite number of fundamental transformations, from which all higher order transformations can be derived
XPipe philosophy • Transformation Decomposition leads to • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”. • Can build, test, use and then re-use these transformation components • Very team development friendly • High cohesion, loose coupling – just like the professor advised
XPipe philosophy • Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem • Lexical • SAX • DOM • XSLT • XDuce, Pyxie, Haskell, AF-NG…
Sample XPipe • [Diagram: a pipeline starting from a DB/CMS with stages labelled Character Set Mods, Add Doctype + validate + strip doctype, Re-arrange Elements, Validation, Stats + FTP, SQL Replace and XHTML Generate; implementation labels on the stages: Lexical, DOM, Schematron/RelaxNG/Rhino, Jython, Java, XSLT]
XPipe philosophy • Assertion : developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves • “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”
XPipe philosophy • “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth • XPipe aims to look after the plumbing letting developers concentrate on the interesting stuff
Philosophy Summary • Preambles • Make things as complex as necessary but not more complex than necessary • Solve all the world’s problems – but only one at a time • Don’t even think about performance until it is too late – then it will look after itself • Only increase complexity linearly w.r.t. functionality and only in “elevator pitch sized” functionality quanta
Philosophy Summary – 1/2 • Data processing == data transformation w.r.t. time. • XML is the current runaway winner in the self-descriptive data stakes and a very good QDDL (Quiescent Data Description Language)
Philosophy Summary – 2/2 • Inside every complex XML transformation is a sequence of simpler XML transformations trying to get out – a Pipe • Decomposed transformation = new transformations + already componentized transformations -> Component Reuse • Inside every graph transformation (read “workflow” or “business process model”) is a combination of simple Pipes trying to get out
XPipe Philosophy • Leveled architecture – levels build on one another but any level is usable independently of higher levels • Level 2 - XRigs • Level 1 - XPipes • Level 0 - XComponents • [Diagram: each level drawn as a box with its own In and Out]
Major Functional Elements – XComponents • Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.) • All XComponents are standalone programs of the form • [Name] [InputXML] [OutputXML] [ErrorXML] [Optional Args]
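The calling convention above can be sketched as a small standalone program. This is a guess at the shape only: the slides do not specify the error document format, so the `errors`/`error` element names, the exit codes and the toy `transform` payload are assumptions made for illustration.

```python
import sys
import xml.etree.ElementTree as ET

def transform(root, args):
    """Hypothetical payload: rename the root element if a new name is given."""
    if args:
        root.tag = args[0]
    return root

def main(argv):
    """argv mirrors [Name] [InputXML] [OutputXML] [ErrorXML] [Optional Args]."""
    if len(argv) < 4:
        print("usage: NAME INPUT_XML OUTPUT_XML ERROR_XML [ARGS...]", file=sys.stderr)
        return 2
    _name, in_path, out_path, err_path, *args = argv
    errors = ET.Element("errors")  # assumed error document shape
    try:
        root = ET.parse(in_path).getroot()
        ET.ElementTree(transform(root, args)).write(out_path, encoding="utf-8")
        status = 0
    except (OSError, ET.ParseError) as exc:
        ET.SubElement(errors, "error").text = str(exc)
        status = 1
    ET.ElementTree(errors).write(err_path, encoding="utf-8")
    return status

# As a script entry point one would call: sys.exit(main(sys.argv))
```

Because every component has the same four-slot signature, an executive can run any of them without knowing what language or paradigm is inside.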
Major Functional Elements - XComponents • XComponents described in XML form. An XComponent consists of: • Metadata (keywords etc.) • Documentation • Pre and Post Conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • Code (Java / Jython / XSLT / Exec)
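The slides describe XComponents in XML form; the actual schema is not shown here, so the sketch below models the same five parts as a Python dataclass instead. All field and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple
import xml.etree.ElementTree as ET

Check = Callable[[ET.Element], bool]
Transform = Callable[[ET.Element], ET.Element]

def always(_doc):
    """Default condition: accept any document."""
    return True

@dataclass
class XComponent:
    name: str
    code: Transform                                    # the transformation itself
    documentation: str = ""
    keywords: List[str] = field(default_factory=list)  # metadata
    pre: Check = always                                # precondition on the input
    post: Check = always                               # postcondition on the output
    tests: List[Tuple[str, str]] = field(default_factory=list)  # (input, output) pairs

    def __call__(self, doc):
        if not self.pre(doc):
            raise ValueError(self.name + ": precondition failed")
        out = self.code(doc)
        if not self.post(out):
            raise ValueError(self.name + ": postcondition failed")
        return out

    def run_tests(self):
        """True when every unit-test input produces its expected output."""
        return all(
            ET.tostring(self(ET.fromstring(src)), encoding="unicode") == expected
            for src, expected in self.tests
        )
```

Keeping the tests and conditions next to the code is what lets a component be re-used with some confidence by someone who did not write it.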
Major Functional Elements – XPipes • A linear assembly of XComponents that together achieve some useful transformation function • Described in XML • Documentation • Metadata (keywords etc.) • Pre/Post conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • References to XComponents (URIs) which are resolved when the XPipe is installed/executed
Major Functional Elements – XRigs • An assembly of XPipes that together achieve some useful transformation function • Described in XML • Documentation • Metadata (keywords etc.) • Pre/Post conditions • Unit Tests (input,output XML stream pairs + Pre/Post Conditions) • References to XPipes (URIs) which are resolved when the XRig is installed/executed
Major Functional Elements • Unit Testers • XComponent, XPipe and XRig level Test Harnesses • Executives • XComponent, XPipe and XRig level Execution Environments (on-the-fly, disk install, compiled, web service…) • (Executing an XComponent is identical to executing an XPipe of arity 1, is identical to executing an XRig of arity 1…)
Major Functional Elements • Executives • Uniprocessor Execution • Executed on 1 CPU, possibly with separate threads for each instantiated X* • Multiprocessor Execution (Vapor) • XML based protocol to implement “Job Shop” work distribution over a P2P network (XJCL)
Major Functional Elements – Miscellany (Vapor) • Whizzy GUI Component and Pipe Editors • XComponent Creators • “Wrap” Java, XSLT etc. into XComponent compliant XML, Ant build target • XComponent Proxies – “pretend” to be a simple XComponent but invoke some external functionality – from Windows DLL to SOAP end-point • XPipe masquerading as XComponent – this could be a very powerful paradigm
Major Functional Elements – Miscellany (Vapor) • Compilers / Packers • Pack XPipes/XRigs into standalone XPipes/XRigs for distribution (with or without an executive) • Compile pure XSLT XPipe into a self contained translet (self contained or as an XComponent) • “Compile away”/optimize intermediate files via a variety of tricks (Jackson Inversion, Java IO hook, shadow marshalling etc.)
Simple XComponent examples • Fundamental Operation – Rename Element • Rename • Input: <foo>baz</foo> • Output: <bar>baz</bar>
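A minimal sketch of Rename in Python, assuming the operation renames every element with the given name; the function name and signature are illustrative, not XPipe's.

```python
import xml.etree.ElementTree as ET

def rename(root, old, new):
    """Rename every element called `old` (root included) to `new`."""
    for el in root.iter(old):
        el.tag = new
    return root
```

Applied to the slide's input with `old="foo"` and `new="bar"`, this yields the slide's output.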
Simple XComponent examples • Fundamental Operation - Peel • Input: <foo><bar>baz</bar></foo> • Output: <foo>baz</foo>
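One way to read Peel is "remove an element but keep what it contains". The sketch below implements that reading by rebuilding the tree; the helper names are hypothetical, and the handling of mixed content is an assumption, since the slide shows only the simple case.

```python
import xml.etree.ElementTree as ET

def _add_text(parent, text):
    """Append character data after the last node currently inside parent."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def _copy_content(src, dst, tag):
    """Copy src's content into dst, splicing in the content of peeled children."""
    _add_text(dst, src.text)
    for child in src:
        if child.tag == tag:
            _copy_content(child, dst, tag)  # drop the wrapper, keep what it holds
        else:
            clone = ET.SubElement(dst, child.tag, dict(child.attrib))
            _copy_content(child, clone, tag)
        _add_text(dst, child.tail)

def peel(root, tag):
    """Return a copy of root with every descendant named `tag` peeled away."""
    out = ET.Element(root.tag, dict(root.attrib))
    _copy_content(root, out, tag)
    return out
```

The root element itself is never peeled in this sketch, since that could leave a document without a single root.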
Simple XComponent examples • Compound Operation - Matryoshka • Input: • <foo><bar>baz</bar></foo> • Output: • <foo></foo><bar></bar>baz
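The slide gives a single input/output pair for Matryoshka, so the general rule has to be guessed. The sketch below assumes it un-nests the document: every element is emitted as an empty shell in document order, followed by all the character data. It returns a string rather than a tree because the result has no single root element. The slide calls this a compound operation; this sketch does not attempt to build it from the fundamental ones.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def matryoshka(root):
    """Emit one empty shell per element (document order), then all the text."""
    shells = "".join("<%s></%s>" % (el.tag, el.tag) for el in root.iter())
    text = "".join(root.itertext())
    return shells + escape(text)
```

Attributes are ignored here; whether the real operation keeps them is not stated.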
Simple XComponent examples • KlingonCloak • Input: • <foo><bar>baz</bar></foo> • Output: • <tag name="foo"><tag name="bar">baz</tag></tag>
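KlingonCloak can be sketched as a recursive rewrite that hides each element name inside an attribute of a generic `tag` element. The attribute name is a parameter here, defaulting to `name` as in the Output line; dropping the original attributes is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def klingon_cloak(el, attr="name"):
    """Replace el and its descendants with <tag> elements carrying the old name."""
    cloaked = ET.Element("tag", {attr: el.tag})
    cloaked.text = el.text
    cloaked.tail = el.tail
    for child in el:
        cloaked.append(klingon_cloak(child, attr))
    return cloaked
```

After cloaking, every downstream stage sees one element type, which is handy for generic tools that should not care about the vocabulary.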
Sample XComponents • Once you start thinking in terms of Pipes – components appear everywhere: • Regular fragmentations • Doctype changer • Namespace normalizer • Character set transcoder • Hash generator • Architectural Forms • RelaxNG/Schematron etc • A validator can be thought of as a component in an XPipe that mirrors its input on its output
Sample XComponents • Reading a file is an XML to XML transformation • <file>lewisscarrol.xml</file> • <poem><line>Twas brillig, and the slithy toves, did gyre and gimble in the wabe</line>…</poem>
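Seen this way, a file reader is a stage whose input document names a file and whose output document is that file's content. A minimal sketch, assuming the path is the text of a `file` element as on the slide and that the named file is itself well-formed XML:

```python
import xml.etree.ElementTree as ET

def read_file(file_el):
    """Map <file>path</file> to the root element of the document at that path."""
    return ET.parse(file_el.text.strip()).getroot()
```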
Sample XComponents • Arithmetic is an XML to XML transformation • <expr>1 + 2</expr> • <res>3</res>
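A sketch of the arithmetic component, assuming the expression is the text of an `expr` element and only the four basic operators on numbers are allowed. It walks Python's own parse tree rather than calling `eval`, so arbitrary code in the input is rejected. Function names are illustrative.

```python
import ast
import operator
import xml.etree.ElementTree as ET

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")

def evaluate(expr_el):
    """Map <expr>…</expr> to <res>…</res>."""
    tree = ast.parse(expr_el.text.strip(), mode="eval")
    res = ET.Element("res")
    res.text = str(_eval(tree.body))
    return res
```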
Sample XComponents • Unix pipe utilities e.g. tr • hello world • HELLO WORLD
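A `tr`-like component can be sketched as a stage that translates characters in the character data and leaves the markup alone. The slide shows plain text; wrapping it in an element (for example `<msg>hello world</msg>`) is an assumption of this sketch, as are the function names.

```python
import string
import xml.etree.ElementTree as ET

def tr(root, src, dst):
    """Translate characters in all character data, like Unix tr SRC DST."""
    table = str.maketrans(src, dst)
    for el in root.iter():
        if el.text:
            el.text = el.text.translate(table)
        if el.tail:
            el.tail = el.tail.translate(table)
    return root

def upper(root):
    """Equivalent of tr a-z A-Z for ASCII letters."""
    return tr(root, string.ascii_lowercase, string.ascii_uppercase)
```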
Sample XComponents • Conditionals are XML to XML transformation “tee junctions” triggered by XPaths • [Diagram: In → if XPath → TRUE branch / FALSE branch]
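A tee junction can be sketched as a stage that picks one of two downstream stages depending on whether an XPath matches. The sketch uses the limited XPath subset supported by Python's `xml.etree.ElementTree`; `tee` and `label` are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

def tee(xpath, true_branch, false_branch):
    """Route the document to true_branch if xpath matches, else to false_branch."""
    def stage(root):
        branch = true_branch if root.find(xpath) is not None else false_branch
        return branch(root)
    return stage

def label(value):
    """Toy branch: mark the root element with a status attribute."""
    def stage(root):
        root.set("status", value)
        return root
    return stage

route = tee(".//error", label("rejected"), label("accepted"))
```

Since `route` is itself a stage, conditionals nest and chain like any other component.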
Validation as an XComponent • [Diagram: XML A enters the XComponent on Input and XML A’ leaves on Output; the XComponent (RelaxNG, Schematron, Jython/Java/JACL) also produces a Validation Log and Error]
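A sketch of a validator as a pass-through stage. The real schema engines named on the slide are replaced by a hypothetical `check` callable that returns a list of messages; the `validation-log` element name and the choice not to forward invalid documents are assumptions of this example.

```python
import xml.etree.ElementTree as ET

def validator(check):
    """Wrap `check` (returns a list of problem messages) as a pipeline stage."""
    def stage(root):
        log = ET.Element("validation-log")
        for message in check(root):
            ET.SubElement(log, "error").text = message
        # Assumption: a valid document is mirrored unchanged, an invalid one is held back.
        output = root if len(log) == 0 else None
        return output, log
    return stage

def require_title(root):
    """Toy rule standing in for a real schema check."""
    return [] if root.find("title") is not None else ["missing <title>"]
```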
Some related open technologies • | - Unix Pipes • SAX Filters • TRAX • XBeans • Cocoon • AxKit • Ant • JXTA • Translets • TupleSpaces
The XGrid • Grid Technologies – computational power “on tap” (http://www.gridforum.org) • The XGrid – computational power “on tap” to execute XPipes/XRigs
The XGrid • [Diagram: In and Out flows passing through a DMZ]
Some objections (with some answers) • It will be slow • No it won’t - Premature optimization is the root of all evil! • Speed is a three headed monster. I’m old enough to have left the X axis and currently heading for Y through Z • [Diagram: The 3 Axes to Speed]
Some objections (with some answers) • It will be slow (cont.) • Massive Parallelism will kill all von Neumann throughput arguments • Documents per second, not seconds per document – throughput is the true measure of XML processing speed • Document fulcra – Locality of reference (Denning) applies to XML processing (more on this later) • A myriad of “compile time” optimizations on XPipes possible • Keep the architecture simple – and speed will sort itself out