Enterprise (Data) Architecture Christian Nentwich Model Two Zero
Caveat Emptor
This is an introductory overview; a detailed treatment would take at least a whole term!
It’s fiendishly complex – nobody has a satisfactory answer
About Model Two Zero
• Software company based near “Silicon Roundabout” (Old Street, London)
• Mission: help large businesses establish the next generation of enterprise data architectures
• Innovators in:
  • Executable specifications
  • Code generation from (restricted) natural language
  • High-volume document matching
About Me
• BSc and PhD in Computer Science from University College London
• Never held down a “proper job”...
• Founded two startups – Model Two Zero is number 2!
• Long involvement in financial services IT
• Strategic data architecture consultancy for many big banks, fund managers, clearing houses
• Pushing forward industry-wide data standards
Motivation
• Large companies build or buy new systems all the time
• Nobody likes monolithic systems, but components need to be assembled
[Diagram: Customer Records, Invoicing System and Order Management System, linked by connections labelled “Batch load or real-time query”, “Batch load” and “Real-time message”]
Motivation
• Companies also buy other companies
• Consolidation requirements
[Diagram: Company A Accounting System and Company B Accounting System consolidated into a Joint Accounting System]
Format Diversity
• Relational databases
• APIs
  • Java
  • .NET
  • Mainframe
• File formats
  • Comma-separated
  • Fixed width
  • XML
  • COBOL mainframe data files
  • Tag/Value files
  • Proprietary binary
• Message formats
  • XML
  • JSON
  • Proprietary queueing systems
• Taking a consistent approach is a challenge
Complexities
• Domain complexity
• Architecture diversity
• Volume of data
• Complex, often out-of-date business processes
• Key person dependencies
• Lost business knowledge
• Time pressure
ESB – Key Idea • From this:
ESB – Key Idea • To this: Common Bus
Zoomed-in View
[Diagram: Common Bus ↔ Standardised Data Format ↔ Adapter ↔ Proprietary Data Format ↔ Proprietary System]
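As a rough illustration of the adapter in this picture, the sketch below converts between a proprietary fixed-width record and a standardised field map. The record layout (10 characters first name, 10 characters last name, 8 characters postcode), the class name `CustomerAdapter` and the use of a plain `Map` as the "standard format" are all assumptions made for the example, not something taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical adapter between a proprietary fixed-width record and a standard field map. */
class CustomerAdapter {

    // Assumed proprietary layout: 10 chars first name, 10 chars last name, 8 chars postcode.
    private static final String LAYOUT = "%-10s%-10s%-8s";

    /** Proprietary record -> standardised fields (the direction used when publishing to the bus). */
    static Map<String, String> toStandard(String record) {
        Map<String, String> standard = new LinkedHashMap<>();
        standard.put("firstName", record.substring(0, 10).trim());
        standard.put("lastName", record.substring(10, 20).trim());
        standard.put("postcode", record.substring(20, 28).trim());
        return standard;
    }

    /** Standardised fields -> proprietary record (the direction used when consuming from the bus). */
    static String toProprietary(Map<String, String> standard) {
        return String.format(LAYOUT,
                standard.get("firstName"), standard.get("lastName"), standard.get("postcode"));
    }

    public static void main(String[] args) {
        String record = String.format(LAYOUT, "Christian", "Nentwich", "ABC CDE");
        Map<String, String> standard = toStandard(record);
        System.out.println(standard);
        // The round trip should reproduce the original record.
        System.out.println(toProprietary(standard).equals(record));
    }
}
```

Each proprietary system needs only this one pair of conversions, regardless of how many other systems sit on the bus.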
Calculation
• Bilateral connections require O(n²) integration projects for n systems
• ESB with standard formats and adapters requires 2n = O(n) integration projects
• This calculation does not always work out in practice!
• Why?
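The counts on this slide can be made concrete. As a sketch, assume that every pair of systems needs one integration project per direction, and that every system on the bus needs one adapter into the standard format and one out of it (both are simplifying assumptions, not figures from the talk):

```latex
\begin{align*}
\text{bilateral connections:}\quad & n(n-1) = O(n^2) \\
\text{bus with adapters:}\quad & 2n = O(n) \\
\text{for } n = 10:\quad & 10 \cdot 9 = 90 \quad\text{versus}\quad 2 \cdot 10 = 20
\end{align*}
```

Under these assumptions the two counts are equal at n = 3, and the bus needs fewer projects for every n > 3.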
Key Disciplines
• Data Modelling
  • The data is more important than any execution technology!
  • Need to define a “standard language” for interchanging data on the bus
  • Very frequently XML and XML Schema
  • Sometimes JSON, but schema definition is problematic
• Governance
  • Curation of the standard
  • Versioning (one of the hardest problems in data architecture)
  • Regression testing and rollout management
Quick Comparison
• UML
  • Fits lots of information on one page
  • Understood by relatively technically unsophisticated users
  • A huge standard, contains more than is necessary for the task
  • Has no “wire format”
• XML Schema
  • Technically complex
  • Comes with a wire format – XML
  • Liked by developers
  • Fairly big standard, contains more than is necessary for the task
Governance
• Central team
• Distributed teams (open source style)
[Diagram: “Change requests” and “Changes”, each feeding a Standard Model]
Versioning
• Versioning of standard data formats needs to deal with two important scenarios:
  • Backward compatibility: a new version of a connecting system is able to read/write an old version of the standard
  • Forward compatibility: an old version of a connecting system is able to read/write a newer version of the standard
• Be careful not to confuse them – most people do
Versioning Example
System A produces the message below, V1.xsd validates it, and System B consumes it:
  <customerAccount>
    <firstName>Christian</firstName>
    <lastName>Nentwich</lastName>
    <address1>Somewhere</address1>
    <address2>in</address2>
    <address3>London</address3>
    <postcode>ABC CDE</postcode>
  </customerAccount>
Versioning Example
System A produces the message below, V2.xsd validates it, and System B consumes it:
  <customerAccount>
    <firstName>Christian</firstName>
    <lastName>Nentwich</lastName>
    <dateOfBirth>1977-01-01</dateOfBirth>
    <address1>Somewhere</address1>
    <address2>in</address2>
    <address3>London</address3>
    <postcode>ABC CDE</postcode>
  </customerAccount>
Changes
• Adding optional elements
  • Leaves senders unaffected
  • Breaks receiver forward compatibility if receivers cannot deal with unknown elements
• Adding mandatory elements
  • Breaks senders
  • Breaks receivers
• Removing mandatory elements
  • Breaks senders
  • Breaks receivers
• Removing optional elements
  • Breaks senders
  • Leaves receivers unaffected
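The first case, a receiver that can "deal with unknown elements", can be sketched as follows. This is only one possible approach, assuming the receiver picks out the elements it knows from version 1 and silently skips everything else; the class name `TolerantReader` and the `KNOWN_V1` list are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical V1 receiver that ignores elements added in later versions of the standard. */
class TolerantReader {

    // Child elements of customerAccount that the V1 receiver understands.
    static final List<String> KNOWN_V1 = List.of(
            "firstName", "lastName", "address1", "address2", "address3", "postcode");

    static Map<String, String> read(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Keep known elements; skip whitespace text nodes and unknown elements.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && KNOWN_V1.contains(child.getNodeName())) {
                    fields.put(child.getNodeName(), child.getTextContent());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable customerAccount message", e);
        }
    }

    public static void main(String[] args) {
        String v2Message = "<customerAccount>"
                + "<firstName>Christian</firstName>"
                + "<lastName>Nentwich</lastName>"
                + "<dateOfBirth>1977-01-01</dateOfBirth>"
                + "<postcode>ABC CDE</postcode>"
                + "</customerAccount>";
        // dateOfBirth is unknown to the V1 receiver and is dropped rather than rejected.
        System.out.println(read(v2Message));
    }
}
```

A receiver that instead validates strictly against V1.xsd would reject the same message, which is the forward-compatibility break described above.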
Service-Oriented Architecture
• Establish a landscape of coarse-grained business services
• Establish service wrappers around applications
• Build “value add” applications by composing / orchestrating services
[Diagram: a flight booking orchestration composing the Reservation Service and the Customer Management Service]
Service-Oriented Architecture
[Diagram: a flight booking orchestration (BPEL) calling the Reservation Service and the Customer Management Service through a WSDL wrapper and a REST wrapper]
Event-Driven Architecture
• Build an architecture on business events rather than data
• Components of the architecture subscribe for updates and listen for events they are interested in (sources and sinks model)
From this:
  <trade>
    <id>abc</id>
    <amount>500000.00</amount>
    <ccy>GBP</ccy>
    <ticker>IBM</ticker>
  </trade>
To this:
  <tradeEntered>
    <id>abc</id>
    <amount>500000.00</amount>
    <ccy>GBP</ccy>
    <ticker>IBM</ticker>
  </tradeEntered>
Event Modelling
  <cancellation>
    <id>abc</id>
  </cancellation>
  <tradeEntered>
    <id>abc</id>
    <amount>500000.00</amount>
    <ccy>GBP</ccy>
    <ticker>IBM</ticker>
  </tradeEntered>
  <amendment>
    <id>abc</id>
    <amount>400000.00</amount>
  </amendment>
Event-Driven Architecture
• Discipline: send only data required for an event
• Model things that actually happen in the business
• Let participants determine how to react to events
• No central planning
• Scalability benefits
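The sources and sinks model can be sketched with a small in-memory publish/subscribe bus. This is a minimal illustration under stated assumptions: the `EventBus` class, its method names and the use of a `Map` as event payload are hypothetical, and a real deployment would use a messaging system rather than in-process callbacks. The event names follow the trade examples in the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory bus: sources publish events, sinks subscribe by event type. */
class EventBus {

    private final Map<String, List<Consumer<Map<String, String>>>> sinks = new HashMap<>();

    void subscribe(String eventType, Consumer<Map<String, String>> sink) {
        sinks.computeIfAbsent(eventType, k -> new ArrayList<>()).add(sink);
    }

    void publish(String eventType, Map<String, String> payload) {
        // The source does not know who is listening; each sink decides how to react.
        for (Consumer<Map<String, String>> sink : sinks.getOrDefault(eventType, List.of())) {
            sink.accept(payload);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        Map<String, String> openTrades = new HashMap<>(); // trade id -> current amount

        // One sink that keeps track of open trades.
        bus.subscribe("tradeEntered", e -> openTrades.put(e.get("id"), e.get("amount")));
        bus.subscribe("amendment", e -> openTrades.put(e.get("id"), e.get("amount")));
        bus.subscribe("cancellation", e -> openTrades.remove(e.get("id")));

        bus.publish("tradeEntered",
                Map.of("id", "abc", "amount", "500000.00", "ccy", "GBP", "ticker", "IBM"));
        bus.publish("amendment", Map.of("id", "abc", "amount", "400000.00"));
        System.out.println(openTrades);
        bus.publish("cancellation", Map.of("id", "abc"));
        System.out.println(openTrades);
    }
}
```

Note that the amendment and cancellation events carry only the data needed for the event (an id, plus the new amount for an amendment), in line with the discipline listed above.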
Semantic Integration
• Instead of “syntactic” manipulation of data, specify the full meaning of data in each system
• Relate data items to an overarching “ontology”
• Infer integration automatically
• Immature discipline – many issues to solve
[Diagram: the ontology concept “Trade Date” related to the system fields Field9234B, TrdDt and StartDate]
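A heavily simplified sketch of "infer integration automatically": if each system annotates its fields with the ontology concept they represent, a field-to-field mapping between two systems can be derived by joining on the shared concept. The class `SemanticMapping`, the `Quantity` concept and the `Qty` field are hypothetical; real semantic integration (for example with RDF or OWL) involves far more than a lookup table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: derive a field mapping from field -> concept annotations. */
class SemanticMapping {

    /** Joins two field -> concept annotations on the concept to get source field -> target field. */
    static Map<String, String> infer(Map<String, String> source, Map<String, String> target) {
        Map<String, String> conceptToTargetField = new LinkedHashMap<>();
        target.forEach((field, concept) -> conceptToTargetField.put(concept, field));

        Map<String, String> mapping = new LinkedHashMap<>();
        source.forEach((field, concept) -> {
            String targetField = conceptToTargetField.get(concept);
            if (targetField != null) { // fields without a shared concept stay unmapped
                mapping.put(field, targetField);
            }
        });
        return mapping;
    }

    public static void main(String[] args) {
        // Field names from the slide; "Qty"/"Quantity" is an invented extra field.
        Map<String, String> systemA = Map.of("Field9234B", "TradeDate", "Qty", "Quantity");
        Map<String, String> systemB = Map.of("TrdDt", "TradeDate");
        Map<String, String> systemC = Map.of("StartDate", "TradeDate");

        System.out.println(infer(systemA, systemB));
        System.out.println(infer(systemB, systemC));
    }
}
```

No system is mapped to any other system directly; adding a new system only requires annotating its own fields against the ontology.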
Epilogue: What we do
Instead of writing informal specifications and then code, create executable specifications.
What we do – Executable Specifications
Java
  public void validate(Transaction transaction) {
    if (transaction.getDebit() != null && transaction.getCredit() != null) {
      if (transaction.getDebit().getValue() == transaction.getCredit().getValue()
          && getChangeInAmount() != null) {
        exceptions.add(new ValidationError(
            "Must not specify changeInAmount if a debit is equal to a credit"));
      }
    }
  }
NRL
  Context: Transaction
  Validation Rule "Our Sample Rule"
  If a debit is present and a credit is present then no changeInAmount is present
  Report 'Must not specify changeInAmount if a debit is equal to a credit'
OCL
  context: Transaction
  inv: (self->debit = self->credit) implies (self->changeInAmount->empty())
Schematron
  <rule context="ns:transaction">
    <assert test="ns:debit != ns:credit or not(ns:changeInAmount)">
      Must not specify changeInAmount if a debit is equal to a credit
    </assert>
  </rule>
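To illustrate what "executable" means here, the sketch below takes the XPath test from the Schematron variant and runs it directly against a message using the standard `javax.xml.xpath` API, so the rule is data rather than hand-written validation code. This is an illustration only, not how Model Two Zero's tooling works: the class name `ExecutableRule` is hypothetical, the `ns:` prefix is dropped so that no namespace setup is needed, and XPath 1.0 compares the element text as strings (so `100` and `100.00` count as different).

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical sketch: a validation rule held as an XPath expression and executed generically. */
class ExecutableRule {

    // The Schematron assertion from the slide, without the namespace prefix.
    static final String RULE = "boolean(/transaction[debit != credit or not(changeInAmount)])";
    static final String MESSAGE = "Must not specify changeInAmount if a debit is equal to a credit";

    static boolean isValid(String transactionXml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    RULE, new InputSource(new StringReader(transactionXml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate rule", e);
        }
    }

    public static void main(String[] args) {
        String bad = "<transaction><debit>100</debit><credit>100</credit>"
                + "<changeInAmount>5</changeInAmount></transaction>";
        String good = "<transaction><debit>100</debit><credit>50</credit>"
                + "<changeInAmount>5</changeInAmount></transaction>";
        for (String xml : new String[] {bad, good}) {
            System.out.println(isValid(xml) ? "valid" : MESSAGE);
        }
    }
}
```

Changing the rule means changing the `RULE` string, not the surrounding Java.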
Natural Rule Language
• An open language for expressing:
  • Validation rules (constraints)
  • Action rules (e.g. enrichment rules)
  • Transformation / mapping
• Aimed at the core problem areas in integration
• Goals
  • Read like an English sentence wherever possible
  • Require no customisation to get going
  • Offer symbolic and textual alternatives
• Specification: http://nrl.sourceforge.net
NRL Parser
• The NRL parser is free and open source, also on SourceForge
• Designed for processing / code generation
  1. Text in NRL concrete syntax
  2. Parser-generated AST
  3. Code generators (from AST)
[Diagram: example AST with the nodes ConstraintRuleDeclaration, IfThenStatement, ExistsStatement, ExistsStatement]
Possible Projects
• Propose solutions to the model versioning / XML Schema versioning problem
• Classify permissible changes in detail
• Specify a formal logic for proving the impact of changes, and establishing policies
• Review W3C approaches to the problem and contrast
Technologies to Review
• Data standards / formats
  • XML / XML Schema
  • JSON
• Semantic standards
  • RDF
  • OWL
  • SBVR
  • NRL
• Open source frameworks
  • Spring Integration
  • Mule ESB
• Messaging
  • AMQP
  • SOAP / XML-RPC
  • JMS
• Useful
  • REST
  • XPath
  • XQuery
  • Schematron
  • JAXB
  • JiBX
  • SAX / DOM
  • Python format modules