XPipe - An XML Processing Methodology • XML 2001, Florida, USA • Sean McGrath, CTO, Propylon
What is XPipe? • It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems • Based on proven mechanical manufacturing techniques. Specifically: • The Assembly Line Principle • Component assembly and component re-use
What is XPipe? • An open source project hosted on SourceForge • http://xpipe.sourceforge.net • A contribution to the blossoming meme of using pipeline-based processing to tame the burgeoning complexity of XML transformations • (If you do not find XML transformation complicated, you are not sufficiently well informed.) • (And no, XSLT does not solve all your problems) • A way of thinking about systems that focuses on information flows rather than APIs
Contents of this talk • The XPipe philosophy • Major functional elements • Some examples • Relationship to other technologies • The XGrid • Some anticipated objections (and answers) • Current status • Current problems • Future plans
XPipe Philosophy • Cars are complex, hierarchical structures • Henry Ford’s Model T Ford Assembly Line – 1914
XPipe Philosophy • Lunch is a complex, hierarchical structure • Lunch Assembly Line – 2001
XPipe Philosophy • We are complex, hierarchical structures
XPipe philosophy • What have these scenes got in common? • Complex construction of cars, tuna melts and tendons made possible and efficient through • assembly line manufacturing • re-usable component processes and component materials • Why not apply this approach to XML “manufacturing”?
XPipe philosophy • Why does the assembly line approach work? • Transformation task decomposition • Re-usable transformation components • Transformation decomposition is the key to complexity management. Just ask: • Henry Ford • Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”) • George Miller (7±2) • Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776) • Any electrical or chemical engineer.
XPipe philosophy • Component re-use is the key to productivity • Ask any form of engineer (electrical, chemical etc.) apart from software engineers… • Component re-use remains a holy grail in software engineering • XPipe is yet another attempt…
XPipe philosophy • A lot of data processing will consist of XML to XML transformation • A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations • Mantra • Get data into XML as quickly as possible • Keep it in XML until the last possible minute • Bring all your XML tools to bear on solving the data processing problem
XPipe philosophy • Diagram: Non-XML Input → Top Transformation → Input XML → (XML to XML transformations) → Output XML → Tail Transformation → Non-XML Output
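A minimal sketch of the top-and-tail idea, assuming CSV as the non-XML format. The function names (`top`, `uppercase_names`, `tail`) and the `rows`/`row` element names are hypothetical and are not part of XPipe. The sketch also assumes that every CSV column name is a legal XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top(csv_text):
    """Top transformation: non-XML (CSV) in, XML out."""
    rows = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(rows, "row")
        for name, value in record.items():
            # Assumes each column name is a valid XML element name.
            ET.SubElement(row, name).text = value
    return rows

def uppercase_names(rows):
    """Middle stage: an ordinary XML-to-XML transformation."""
    for name in rows.iter("name"):
        name.text = name.text.upper()
    return rows

def tail(rows):
    """Tail transformation: XML in, non-XML (CSV) out."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([field.text for field in row])
    return out.getvalue()
```

Only `top` and `tail` know about CSV; everything between them sees XML and could be swapped for any other XML tool.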
XPipe philosophy • The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
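Chaining can be sketched as plain function composition, assuming each stage takes an element tree and returns one. The helper `pipeline` and the two stages are hypothetical illustrations, not XPipe code.

```python
from functools import reduce
import xml.etree.ElementTree as ET

def pipeline(*stages):
    """Chain stages left to right; each takes and returns an Element."""
    return lambda tree: reduce(lambda doc, stage: stage(doc), stages, tree)

def drop_empty(tree):
    """Stage 1: remove direct children that have no text and no children."""
    for child in list(tree):
        if len(child) == 0 and not (child.text or "").strip():
            tree.remove(child)
    return tree

def sort_children(tree):
    """Stage 2: order the direct children by tag name."""
    tree[:] = sorted(tree, key=lambda child: child.tag)
    return tree

tidy = pipeline(drop_empty, sort_children)
```

Each stage is small enough to specify and test on its own; the composed `tidy` is just another stage and can itself be reused inside a longer pipeline.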
XPipe philosophy • Only so many ways to re-arrange an XML tree structure • A finite number of fundamental transformations, from which all higher order transformations can be derived
XPipe philosophy • Transformation Decomposition leads to • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”. • Can build, test, use and then re-use these transformation components • Very team development friendly • High cohesion, loose coupling – just like the professor advised
XPipe philosophy • Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem • Lexical • SAX • DOM • XSLT • XDuce, Pyxie, Haskell…
Sample XPipe • Diagram: a pipeline fed from a DB / CMS • Stage labels: Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Validation; Stats + FTP; SQL Replace; XHTML Generate • Technology labels: Lexical; Lexical; DOM; Schematron / RelaxNG / Rhino; Jython; Java; XSLT
XPipe philosophy • Assertion : developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves • “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”
XPipe philosophy • “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth • XPipe aims to look after the plumbing, letting developers concentrate on the interesting stuff
Major Functional Elements – XComponents • Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.) • All XComponents are standalone programs of the form • [Name] [InputXML] [OutputXML] [ErrorXML]
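A rough sketch of the `[Name] [InputXML] [OutputXML] [ErrorXML]` shape, written in Python rather than a JVM language for brevity. The slides do not define the content of the error document, so the `<error>` element used here is an assumption, as are the names `transform` and `main`.

```python
import sys
import xml.etree.ElementTree as ET

def transform(root):
    """Placeholder transformation: pass the document through unchanged."""
    return root

def main(argv):
    """argv = [name, input_xml, output_xml, error_xml]; returns an exit code."""
    if len(argv) != 4:
        print("usage: NAME INPUT_XML OUTPUT_XML ERROR_XML", file=sys.stderr)
        return 2
    _, input_path, output_path, error_path = argv
    try:
        root = ET.parse(input_path).getroot()
        ET.ElementTree(transform(root)).write(output_path, encoding="utf-8")
        return 0
    except (OSError, ET.ParseError) as exc:
        # Hypothetical error format: a single <error> element with the message.
        error = ET.Element("error")
        error.text = str(exc)
        ET.ElementTree(error).write(error_path, encoding="utf-8")
        return 1

# A real component script would end with: sys.exit(main(sys.argv))
```

Because every component has the same three-file signature, an executive can run any of them without knowing what language or paradigm is used inside.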
Major Functional Elements - XComponents • XComponents described in XML form. An XComponent consists of: • Documentation • Unit Tests (input, output XML stream pairs) • Metadata for retrieval • Input and Output predicates – declarative (DTD/RelaxNG/Schema) or procedural (code)
Major Functional Elements – XComponent Unit Tester • Standalone program analogous to JUnit or PyUnit but for XML transformation component testing • Very outsource-friendly and “inbetweenable” approach (specify everything but the code == spec+doc+test harness all in one)
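One way such a tester could work, sketched under assumptions: a component is modelled as a function from element tree to element tree, a unit test is an (input, expected output) pair of XML strings, and equality is judged on canonical XML with surrounding whitespace stripped. `run_unit_tests` is a hypothetical name; `xml.etree.ElementTree.canonicalize` needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def canonical(xml_text):
    """Canonical form, so formatting differences do not fail a test."""
    return ET.canonicalize(xml_text, strip_text=True)

def run_unit_tests(component, cases):
    """Return the indexes of the (input, expected) pairs the component fails."""
    failures = []
    for index, (given, expected) in enumerate(cases):
        result = component(ET.fromstring(given))
        actual = ET.tostring(result, encoding="unicode")
        if canonical(actual) != canonical(expected):
            failures.append(index)
    return failures
```

The test pairs can be written before any component code exists, which is what makes the spec, documentation and test harness a single deliverable.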
Major Functional Elements – XPipes • Described in XML • Consist of • Documentation • Input/Output Predicates (Schemas/Code) • Test Suite • References to XComponents which are resolved when the XPipe is installed
Major Functional Elements – XPipe Executive • Uniprocessor • XPipe executed on 1 machine, possibly with separate threads for each XComponent task • Multiprocessor • XML based protocol to implement “Job Shop” work distribution over a P2P network
Some related open technologies • | - Unix Pipes • SAX Filters • TrAX • XBeans • Cocoon • AxKit • JXTA • Translets • TupleSpaces
Simple XComponent examples • Fundamental Operation – Rename Element • Rename • Input: <foo>baz</foo> • Output: <bar>baz</bar> • Diagram: the tree foo > baz becomes bar > baz
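A minimal sketch of Rename as an in-memory tree operation. The function name and signature are illustrative assumptions, not the XPipe component itself.

```python
import xml.etree.ElementTree as ET

def rename(root, old, new):
    """Rename every element called `old` (including the root) to `new`."""
    for element in root.iter(old):
        element.tag = new
    return root

renamed = rename(ET.fromstring("<foo>baz</foo>"), "foo", "bar")
print(ET.tostring(renamed, encoding="unicode"))  # <bar>baz</bar>
```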
Simple XComponent examples • Fundamental Operation - Peel • Input: <foo><bar>baz</bar></foo> • Output: <foo>baz</foo> • Diagram: the tree foo > bar > baz becomes foo > baz
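Peel can be sketched as replacing an element with its own content. The slide only shows the simple case; how text around the peeled element is merged in mixed content is an assumption of this sketch, and the root element itself is never peeled. The names `peel` and `_append_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

def _append_text(parent, index, text):
    """Attach text immediately before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text

def peel(root, tag):
    """Replace every <tag> element below root with that element's content."""
    index = 0
    while index < len(root):
        child = root[index]
        if child.tag == tag:
            grandchildren = list(child)
            del root[index]
            _append_text(root, index, child.text)
            root[index:index] = grandchildren
            _append_text(root, index + len(grandchildren), child.tail)
            # Do not advance: promoted grandchildren are examined next.
        else:
            peel(child, tag)
            index += 1
    return root

peeled = peel(ET.fromstring("<foo><bar>baz</bar></foo>"), "bar")
print(ET.tostring(peeled, encoding="unicode"))  # <foo>baz</foo>
```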
Simple XComponent examples • Compound Operation - Matryoshka • Input: <foo><bar>baz</bar></foo> • Output: <foo></foo><bar></bar>baz • Diagram: the nested tree foo > bar > baz becomes the siblings foo, bar, baz
Simple XComponent examples • KlingonCloak • Input: <foo><bar>baz</bar></foo> • Output: <tag name="foo"><tag name="bar">baz</tag></tag> • Diagram: the tree foo > bar > baz becomes tag (name="foo") > tag (name="bar") > baz
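A sketch of KlingonCloak that follows the Output line above: every element becomes a generic `tag` element carrying its old name in a `name` attribute. The function name `cloak` is hypothetical, and the sketch simply overwrites any existing `name` attribute, which a real component would have to handle.

```python
import xml.etree.ElementTree as ET

def cloak(root):
    """Hide element names: <x> becomes <tag name="x">, at every depth."""
    for element in root.iter():
        element.set("name", element.tag)  # overwrites an existing "name"
        element.tag = "tag"
    return root

cloaked = cloak(ET.fromstring("<foo><bar>baz</bar></foo>"))
print(ET.tostring(cloaked, encoding="unicode"))
# <tag name="foo"><tag name="bar">baz</tag></tag>
```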
Sample XComponents • Once you start thinking in terms of Pipes – components appear everywhere: • Regular fragmentations • Doctype changer • Namespace normalizer • Character set transcoder • Hash generator • RelaxNG/Schematron etc. • A validator can be thought of as a component in an XPipe that mirrors its input on its output
Validation as an XComponent • Diagram: XML A (Input) → validating XComponent (RelaxNG, Schematron, Jython/Java/JACL) → XML A’ (Output), with a Validation Log on the Error channel
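The "validator mirrors its input" idea can be sketched with a procedural predicate. Here a predicate is assumed to be a function that yields error messages; `validator` and `every_item_has_id` are hypothetical names, and a real component would delegate to RelaxNG or Schematron instead.

```python
import xml.etree.ElementTree as ET

def validator(predicate):
    """Wrap a predicate as a component: tree in, (same tree, error log) out."""
    def component(root):
        errors = list(predicate(root))
        return root, errors  # the document passes through untouched
    return component

def every_item_has_id(root):
    """Example predicate: report each <item> that lacks an id attribute."""
    for position, item in enumerate(root.iter("item"), start=1):
        if "id" not in item.attrib:
            yield f"item {position} has no id attribute"

check_ids = validator(every_item_has_id)
```

Since the output equals the input, a validator like this can be dropped between any two stages of a pipe without changing what flows downstream.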
The XGrid • Grid Technologies – computational power “on tap” (http://www.gridforum.org) • The XGrid – computational power “on tap” to execute XPipes
Some objections (with some answers) • It will be slow • No it won’t. Premature optimization is the root of all evil! • Speed is a three-headed monster. I’m old enough to have left the X axis and am currently heading for Y through Z • Figure: The 3 Axes to Speed
Some objections (with some answers) • It will be slow (cont.) • Massive Parallelism will kill all von Neumann throughput arguments • Documents per second, not seconds per document • A myriad of “compile time” optimizations on XPipes possible • Keep the architecture simple – and speed will sort itself out
Some objections (with some answers) • Pipes are not rich enough, real data flows require graphs • Inside every graph is a collection of straight segments • Do the smallest thing that can possibly work • XComponents can conditionally flow data in different directions – graph
Some objections (with some answers) • Component based software? Harumph! We have heard that one before… • XPipe is data flow based not API based (COM, VBX, CORBA). The payload is what is important – not the plumbing • Information integration (needed on the server side) – not application integration (needed on the client side)
Current Status • Schemas for XPipes and XComponents on xpipe.sourceforge.net – feedback required • Sample components (Java/XSLT/Jython) and some documentation • Simple, illustrative XPipe uniprocessor executives • Draft of XJCL – XGrid Job Control Language
Current Status • Uniprocessor XPipe used to develop • 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec. • 20-C pipe for semantic validation of legislation documents • XPipe and XComponent validators
Current Problems • Everybody agrees that an XML document is a tree but: • The content and structure of the tree depends on the parser • The content and structure of re-generated XML (The round-tripping problem)
Current Problems • Naming things • Taxonomy of XTLs (XML Transformation Languages) • Taxonomy of re-usable XComponents and XPipes
Current Problems • Flexible transformation scheduling is hard • Optimal transformation scheduling is very hard • Packaging
Future Plans • Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of: • A transclusion component (entity expansion) • A macro pre-processor (conditional marked sections) • An attribute decorator (implied/fixed attributes) • A grammar checker • …
Well Formed XML to Valid XML • Diagram: Well Formed XML → Parameter Entity Expansion → Conditional Sections → General Entity Expansion → Attribute Decoration → Grammar Validation → Valid XML
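As an illustration of one stage in that pipe, here is a sketch of an attribute decorator. It assumes the default values have already been extracted from the DTD into a plain dictionary; that `defaults` table and the function name `decorate` are hypothetical, and no DTD parsing is attempted.

```python
import xml.etree.ElementTree as ET

def decorate(root, defaults):
    """Add default attributes; `defaults` maps element name -> {attr: value}."""
    for element in root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            # Only fill in attributes the document did not specify.
            element.attrib.setdefault(name, value)
    return root

doc = ET.fromstring('<doc><p/><p align="right"/></doc>')
decorate(doc, {"p": {"align": "left"}})
print([p.get("align") for p in doc.iter("p")])  # ['left', 'right']
```

Seen this way, attribute defaulting is an ordinary XML-to-XML component that happens to be built into validating parsers.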
Future Plans • XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.) • Getting the P2P and Grid Technology communities’ input into XGrid • Getting help to develop the XPipe reference implementation on SourceForge
Future Plans • Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing) • Use of SCADA tools to develop XPipe process control and monitoring systems
Future Plans • Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering) • Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations
In conclusion • XPipe is simple • Simplicity works! • Plenty of evidence outside of XML engineering that this approach will work • Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach
Thank you • http://xpipe.sourceforge.net