XPipe - An XML Processing Methodology

XPipe - An XML Processing Methodology. XML 2001, Florida, USA. Sean McGrath, CTO, Propylon. What is XPipe? It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems, based on proven mechanical manufacturing techniques.

Presentation Transcript


  1. XPipe - An XML Processing Methodology XML 2001 Florida, USA Sean McGrath CTO Propylon

  2. What is XPipe? • It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems • Based on proven mechanical manufacturing techniques. Specifically: • The Assembly Line Principle • Component assembly and component re-use

  3. What is XPipe • An open source project hosted on Sourceforge • http://xpipe.sourceforge.net • A contribution to the blossoming meme of using pipeline based processing to tame the burgeoning complexity of XML transformations • (If you do not find XML transformation complicated, you are not sufficiently well informed.) • (And no, XSLT does not solve all your problems) • A way of thinking about systems that focuses on information flows rather than APIs

  4. Contents of this talk • The XPipe philosophy • Major functional elements • Some examples • Relationship to other technologies • The XGrid • Some anticipated objections (and answers) • Current status • Current problems • Future plans

  5. XPipe Philosophy Cars Are complex, hierarchical structures Henry Ford’s Model T Ford Assembly Line – 1914

  6. XPipe Philosophy Lunch is a complex, hierarchical structure Lunch Assembly Line – 2001

  7. XPipe Philosophy We are complex, hierarchical structures

  8. XPipe philosophy • What have these scenes got in common? • Complex construction of cars, tuna melts and tendons made possible and efficient through • assembly line manufacturing • re-usable component processes and component materials • Why not apply this approach to XML “manufacturing”?

  9. XPipe philosophy • Why does the assembly line approach work? • Transformation task decomposition • Re-usable transformation components • Transformation decomposition is the key to complexity management. Just ask: • Henry Ford • Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”) • George Miller (7±2) • Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776) • Any electrical or chemical engineer.

  10. XPipe philosophy • Component re-use is the key to productivity • Ask any form of engineer (electrical, chemical etc.) apart from software engineers… • Component re-use remains a holy grail in software engineering • XPipe is yet another attempt…

  11. XPipe philosophy • A lot of data processing will consist of XML to XML transformation • A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations • Mantra • Get data into XML as quickly as possible • Keep it in XML until the last possible minute • Bring all your XML tools to bear on solving the data processing problem

  12. XPipe philosophy (diagram) Non-XML Input → Top Transformation → Input XML → … → Output XML → Tail Transformation → Non-XML Output
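
The top and tail idea on slides 11 and 12 can be sketched in a few lines of Python. This is only an illustration, not part of XPipe: the CSV format, the `rows`/`row` element names and the `uppercase_names` stage are assumptions made up for the example, and the field names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top(csv_text):
    """Top transformation: get non-XML (CSV) data into XML as early as possible."""
    rows = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(rows, "row")
        for field, value in record.items():
            ET.SubElement(row, field).text = value
    return rows

def uppercase_names(rows):
    """Hypothetical XML-to-XML stage in the middle of the pipe."""
    for name in rows.iter("name"):
        name.text = (name.text or "").upper()
    return rows

def tail(rows):
    """Tail transformation: leave XML only at the last possible moment."""
    fields = [cell.tag for cell in rows[0]] if len(rows) else []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.findtext(field, default="") for field in fields])
    return out.getvalue()

result = tail(uppercase_names(top("name,qty\nwidget,3\n")))
print(result)
```

Everything between `top` and `tail` sees only XML, so any XML tool can be used for the middle stages.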

  13. XPipe philosophy • The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
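
As a minimal sketch of chaining small transformations, assuming each stage is simply a Python function from an element tree to an element tree (the `chain`, `rename` and `set_attr` helpers are hypothetical names, not XPipe APIs):

```python
import xml.etree.ElementTree as ET
from functools import reduce

def rename(old, new):
    """Return a stage that renames every <old> element to <new>."""
    def stage(root):
        for el in root.iter(old):
            el.tag = new
        return root
    return stage

def set_attr(tag, name, value):
    """Return a stage that sets one attribute on every <tag> element."""
    def stage(root):
        for el in root.iter(tag):
            el.set(name, value)
        return root
    return stage

def chain(*stages):
    """Compose stages left to right into one larger transformation."""
    def pipe(root):
        return reduce(lambda tree, stage: stage(tree), stages, root)
    return pipe

def run(pipe, xml_text):
    """Parse, run the pipe, and serialize back to a string."""
    return ET.tostring(pipe(ET.fromstring(xml_text)), encoding="unicode")

pipe = chain(rename("foo", "bar"), set_attr("bar", "checked", "yes"))
print(run(pipe, "<doc><foo>baz</foo></doc>"))
# prints <doc><bar checked="yes">baz</bar></doc>
```

Because a chain is itself a stage, chains can be nested inside larger chains.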

  14. XPipe philosophy • Only so many ways to re-arrange an XML tree structure • A finite number of fundamental transformations, from which all higher order transformations can be derived

  15. XPipe philosophy • Transformation Decomposition leads to • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”. • Can build, test, use and then re-use these transformation components • Very team development friendly • High cohesion, loose coupling – just like the professor advised

  16. XPipe philosophy • Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem • Lexical • SAX • DOM • XSLT • XDuce, Pyxie, Haskell…

  17. Sample XPipe (diagram) • Stages shown: DB/CMS; Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Validation; Stats + FTP; SQL Replace; XHTML Generate • Implementation labels shown: Lexical; Lexical; DOM; Schematron/RelaxNG/Rhino; Jython; Java; XSLT

  18. XPipe philosophy • Assertion : developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves • “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”

  19. XPipe philosophy • “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth • XPipe aims to look after the plumbing letting developers concentrate on the interesting stuff

  20. Major Functional Elements – XComponents • Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.) • All XComponents are standalone programs of the form • [Name] [InputXML] [OutputXML] [ErrorXML]
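
A rough sketch of a standalone program in the `[Name] [InputXML] [OutputXML] [ErrorXML]` form follows. The slide does not say how the three arguments are interpreted, so treating them as file paths, the `<error>` element layout and the upper-casing `transform` are all assumptions for illustration. Plain Python is used here; a Jython version would look much the same.

```python
import sys
import xml.etree.ElementTree as ET

def transform(root):
    """Hypothetical transformation: upper-case the text of every element."""
    for el in root.iter():
        if el.text:
            el.text = el.text.upper()
    return root

def main(argv):
    # argv mirrors the slide: [Name] [InputXML] [OutputXML] [ErrorXML]
    name, input_path, output_path, error_path = argv
    try:
        root = transform(ET.parse(input_path).getroot())
        ET.ElementTree(root).write(output_path, encoding="utf-8")
        return 0
    except Exception as exc:
        # Report the failure as XML so the next tool can still read it.
        err = ET.Element("error", {"component": name})
        err.text = str(exc)
        ET.ElementTree(err).write(error_path, encoding="utf-8")
        return 1

# Only run as a command when exactly three arguments were supplied.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Keeping the error channel in XML means the orchestration layer, rather than each component, can decide what to do with failures.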

  21. Major Functional Elements - XComponents • XComponents described in XML form. An XComponent consists of: • Documentation • Unit Tests (input, output XML stream pairs) • Metadata for retrieval • Input and Output predicates – declarative (DTD/RelaxNG/Schema) or procedural (code)

  22. Major Functional Elements – XComponent Unit Tester • Standalone program analogous to JUnit or PyUnit but for XML transformation component testing • Very outsource-friendly and “inbetweenable” approach (specify everything but the code == spec+doc+test harness all in one)
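
The unit-test idea (input, output XML pairs per component) might look like the sketch below. It is not the XPipe unit tester: `run_unit_tests` and the sample component are hypothetical, and comparing canonical forms via `ET.canonicalize` assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def canonical(xml_text):
    """Canonical form, so <a/> and <a></a> compare as equal."""
    return ET.canonicalize(xml_text, strip_text=True)

def run_unit_tests(component, cases):
    """Run each (input, expected output) pair; return the failing ones."""
    failures = []
    for index, (given, expected) in enumerate(cases):
        actual = component(given)
        if canonical(actual) != canonical(expected):
            failures.append((index, actual))
    return failures

def rename_foo_to_bar(xml_text):
    """Sample component under test: rename <foo> to <bar>."""
    root = ET.fromstring(xml_text)
    for el in root.iter("foo"):
        el.tag = "bar"
    return ET.tostring(root, encoding="unicode")

cases = [
    ("<foo>baz</foo>", "<bar>baz</bar>"),
    ("<a><foo/></a>", "<a><bar></bar></a>"),
]
print(run_unit_tests(rename_foo_to_bar, cases))
# prints []
```

The `cases` list can be written before the component exists, which is what makes the approach specification-first.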

  23. Major Functional Elements – XPipes • Described in XML • Consist of • Documentation • Input/Output Predicates (Schemas/Code) • Test Suite • References to XComponents which are resolved when the XPipe is installed

  24. Major Functional Elements – XPipe Executive • Uniprocessor • XPipe executed on 1 machine, possibly with separate threads for each XComponent task • Multiprocessor • XML based protocol to implement “Job Shop” work distribution over a P2P network
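
For the uniprocessor case, one thread per XComponent connected by queues is one way to picture the executive. This is a sketch under assumptions, not the XPipe Executive: components are taken to be plain string-to-string functions, and `execute` and `run_stage` are invented names.

```python
import queue
import threading
import xml.etree.ElementTree as ET

_DONE = object()  # sentinel marking the end of the document stream

def run_stage(component, inbox, outbox):
    """Worker for one component: read documents, transform, pass them on."""
    try:
        for document in iter(inbox.get, _DONE):
            outbox.put(component(document))
    finally:
        # Always forward the sentinel so downstream stages can finish,
        # even if this component raised.
        outbox.put(_DONE)

def execute(components, documents):
    """Run documents through the components, one thread per component."""
    queues = [queue.Queue() for _ in range(len(components) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(c, queues[i], queues[i + 1]))
        for i, c in enumerate(components)
    ]
    for thread in threads:
        thread.start()
    for document in documents:
        queues[0].put(document)
    queues[0].put(_DONE)
    results = list(iter(queues[-1].get, _DONE))
    for thread in threads:
        thread.join()
    return results

def rename_foo(text):
    root = ET.fromstring(text)
    for el in root.iter("foo"):
        el.tag = "bar"
    return ET.tostring(root, encoding="unicode")

def stamp(text):
    root = ET.fromstring(text)
    root.set("seen", "yes")
    return ET.tostring(root, encoding="unicode")

print(execute([rename_foo, stamp], ["<foo>1</foo>", "<foo>2</foo>"]))
```

While one document is in the second stage, the next can already be in the first, which is the assembly-line effect the talk describes.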

  25. Major Functional Elements – XPipe Monitor

  26. Some related open technologies • | - Unix Pipes • SAX Filters • TrAX • XBeans • Cocoon • AxKit • JXTA • Translets • TupleSpaces

  27. Simple XComponent examples • Fundamental Operation – Rename Element • Rename • Input : <foo>baz</foo> • Output: <bar>baz</bar>

  28. Simple XComponent examples • Fundamental Operation - Peel • Input : <foo><bar>baz</bar></foo> • Output: <foo>baz</foo>
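
A possible reading of Peel, sketched in Python: every descendant element with the given name is replaced by its own content. The function names are hypothetical, and the handling of mixed content (text before and after the peeled element) goes beyond what the slide specifies.

```python
import xml.etree.ElementTree as ET

def _append_text(out, text):
    """Attach text after the last child of out, or to out itself if empty."""
    if not text:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text

def _copy_children(src, out, tag):
    for child in src:
        if child.tag == tag:
            # Peel: keep the content of the element, drop the element itself.
            _append_text(out, child.text)
            _copy_children(child, out, tag)
        else:
            out.append(peel(child, tag))
        _append_text(out, child.tail)

def peel(element, tag):
    """Return a copy of element with every descendant <tag> peeled away."""
    out = ET.Element(element.tag, element.attrib)
    out.text = element.text
    _copy_children(element, out, tag)
    return out

peeled = peel(ET.fromstring("<foo><bar>baz</bar></foo>"), "bar")
print(ET.tostring(peeled, encoding="unicode"))
# prints <foo>baz</foo>
```

The root element is never peeled, since the result must still have a single root.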

  29. Simple XComponent examples • Compound Operation - Matryoshka • Input: • <foo><bar>baz</bar></foo> • Output: • <foo></foo><bar></bar>baz

  30. Simple XComponent examples • KlingonCloak • Input: • <foo><bar>baz</bar></foo> • Output: • <tag name="foo"><tag name="bar">baz</tag></tag>
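
KlingonCloak, following the Output line of the slide, could be sketched as below. The `uncloak` inverse is an addition for illustration and is not described in the talk; the sketch also assumes no element already carries its own `name` attribute.

```python
import xml.etree.ElementTree as ET

def cloak(root):
    """Hide every element name inside a generic <tag name="..."> element."""
    for el in root.iter():
        el.set("name", el.tag)
        el.tag = "tag"
    return root

def uncloak(root):
    """Hypothetical inverse: restore element names from the name attribute."""
    for el in root.iter("tag"):
        el.tag = el.attrib.pop("name")
    return root

cloaked = cloak(ET.fromstring("<foo><bar>baz</bar></foo>"))
print(ET.tostring(cloaked, encoding="unicode"))
# prints <tag name="foo"><tag name="bar">baz</tag></tag>
```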

  31. Sample XComponents • Once you start thinking in terms of Pipes – components appear everywhere: • Regular fragmentations • Doctype changer • Namespace normalizer • Character set transcoder • Hash generator • RelaxNG/Schematron etc. • A validator can be thought of as a component in an XPipe that mirrors its input on its output

  32. Validation as an XComponent (diagram) • Input: XML A • Validation XComponent: RelaxNG, Schematron, Jython/Java/JACL • Output: XML A’ • Error: Validation Log
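
A validator that mirrors its input on its output and writes findings to a separate log can be sketched as follows. Real RelaxNG or Schematron validation is replaced here by hand-written predicate functions, which is purely an assumption to keep the example self-contained; `make_validator` and the sample rules are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_validator(rules):
    """Build a component from (predicate, message) pairs.

    The component returns its input unchanged plus a validation log.
    """
    def component(xml_text):
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError as exc:
            return xml_text, [f"not well-formed: {exc}"]
        log = [message for check, message in rules if not check(root)]
        return xml_text, log
    return component

# Hypothetical rules standing in for a schema.
rules = [
    (lambda root: root.tag == "order", "root element must be <order>"),
    (lambda root: root.find("item") is not None, "at least one <item> is required"),
]

validate = make_validator(rules)
output, log = validate("<order><note/></order>")
print(output)
print(log)
```

Because the document passes through untouched, a validator can be dropped between any two stages of a pipe without changing what the later stages see.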

  33. The XGrid • Grid Technologies – computational power “on tap” (http://www.gridforum.org) • The XGrid – computational power “on tap” to execute XPipes

  34. The XGrid

  35. Some objections (with some answers) • It will be slow • No it won’t - Premature optimization is the root of all evil! • Speed is a three headed monster. I’m old enough to have left the X axis and currently heading for Y through Z The 3 Axes to Speed

  36. Some objections (with some answers) • It will be slow (cont.) • Massive Parallelism will kill all von Neumann throughput arguments • Documents per second, not seconds per document • A myriad of “compile time” optimizations on XPipes possible • Keep the architecture simple – and speed will sort itself out

  37. Some objections (with some answers) • Pipes are not rich enough, real data flows require graphs • Inside every graph is a collection of straight segments • Do the smallest thing that can possibly work • XComponents can conditionally flow data in different directions – graph
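
Conditional flow can be pictured as a component that picks one of two downstream pipes per document. The `route` helper, the `priority` attribute and the two marker stages below are invented for the example and are not taken from XPipe.

```python
import xml.etree.ElementTree as ET

def route(predicate, if_true, if_false):
    """Return a component that forwards each document down one of two pipes."""
    def component(root):
        return if_true(root) if predicate(root) else if_false(root)
    return component

def mark(value):
    """Hypothetical downstream stage: record which branch was taken."""
    def component(root):
        root.set("route", value)
        return root
    return component

def is_urgent(root):
    return root.get("priority") == "high"

router = route(is_urgent, mark("express"), mark("standard"))
urgent = router(ET.fromstring('<order priority="high"/>'))
normal = router(ET.fromstring("<order/>"))
print(urgent.get("route"), normal.get("route"))
# prints express standard
```

Each branch is itself a straight pipe, so the graph is built from the same parts as a linear XPipe.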

  38. Some objections (with some answers) • Component based software? Harumph! We have heard that one before… • XPipe is data flow based not API based (COM, VBX, CORBA). The payload is what is important – not the plumbing • Information integration (needed on the server side) – not application integration (needed on the client side)

  39. Current Status • Schemas for XPipes and XComponents on xpipe.sourceforge.net – feedback required • Sample components (Java/XSLT/Jython) and some documentation • Simple, illustrative XPipe uniprocessor executives • Draft of XJCL – XGrid Job Control Language

  40. Current Status • Uniprocessor XPipe used to develop • 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec. • 20-C pipe for semantic validation of legislation documents • XPipe and XComponent validators

  41. Current Problems • Everybody agrees that an XML document is a tree but: • The content and structure of the tree depends on the parser • The content and structure of re-generated XML (The round-tripping problem)

  42. Current Problems • Naming things • Taxonomy of XTLs (XML Transformation Languages) • Taxonomy of re-usable XComponents and XPipes

  43. Current Problems • Flexible transformation scheduling is hard • Optimal transformation scheduling is very hard • Packaging

  44. Future Plans • Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of: • A transclusion component (entity expansion) • A macro pre-processor (conditional marked sections) • An attribute decorator (implied/fixed attributes) • A grammar checker • …

  45. Well Formed XML to Valid XML (diagram) Well Formed XML → Parameter Entity Expansion → Conditional Sections → General Entity Expansion → Attribute Decoration → Grammar Validation → Valid XML
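
One stage of this pipe, the attribute decorator, is easy to sketch on its own: it adds defaulted attributes that a DTD-aware parser would otherwise supply. The `defaults` table below stands in for what would be read from a DTD and is an assumption of the example, as is the `decorate` name.

```python
import xml.etree.ElementTree as ET

def decorate(root, defaults):
    """Add default attribute values wherever the document omits them.

    defaults maps element name -> {attribute name: default value}.
    """
    for el in root.iter():
        for name, value in defaults.get(el.tag, {}).items():
            if name not in el.attrib:
                el.set(name, value)
    return root

# Hypothetical defaults, as a DTD might declare them for <item>.
defaults = {"item": {"status": "draft"}}

doc = decorate(ET.fromstring('<doc><item/><item status="final"/></doc>'), defaults)
print(ET.tostring(doc, encoding="unicode"))
# prints <doc><item status="draft" /><item status="final" /></doc>
```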

  46. Future Plans • XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.) • Getting the P2P and Grid Technology communities input into XGrid. • Getting help to develop the XPipe reference implementation on Sourceforge

  47. Future Plans • Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing) • Use of SCADA tools to develop XPipe process control and monitoring systems

  48. Future Plans • Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering) • Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations

  49. In conclusion • XPipe is simple • Simplicity works! • Plenty of evidence outside of XML engineering that this approach will work • Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach

  50. Thank you • http://xpipe.sourceforge.net