Data on the Outside versus Data on the Inside. Pat Helland, Partner Architect, Microsoft Corporation. Session Objectives And Takeaways: Understand the use and interpretation of data. It is different inside services than outside services. Many new issues arise in the world between services.
Data on the Outside versus Data on the Inside. Pat Helland, Partner Architect, Microsoft Corporation
Session Objectives And Takeaways • Understand the use and interpretation of data • It is different inside services than outside services • Many new issues arise in the world between services • Versioning, meta-data (schema), immutability, idempotence, and much more • Think carefully about the interactions across services • XML, SQL, and Objects all have their place • Understand their strengths and weaknesses and use them together to create the best services Slide 2
Outline • Introduction: The Shift Towards Services • Behavior: Encapsulation and Trust • Data: Then and Now • Outside Data: Reference Data • Outside Data: Sending Messages • Outside Data: XML and Schema • Outside Data: Commonality of Schema and Understanding • Inside Data • Inside/Outside: Representations of Data • Inside/Outside: Tying It All Together • Conclusion Slide 3
Service-Oriented Architecture (diagram: a service with its policy, schema, and contract) • Service-orientation • Independent services • Chunks of code and data • Interconnected via messaging • Four basic tenets: • Boundaries are explicit • Services are autonomous • Services share schema and contract • Not implementation • Service compatibility is based on policy Slide 5
Services Communicate With Messages (diagram: Service-A and Service-B exchanging messages) • Services communicate with messages • Nothing else • No other knowledge about partner • May be heterogeneous Slide 6
Data Inside and Outside Services (diagram: outside the service, data travels in messages; inside the service, data lives in SQL) • Data is different inside from outside • Outside the service • Passed in messages • Understood by sender and receiver • Independent schema definition important • Extensibility important • Inside the service • Private to service • Encapsulated by service code Slide 7
Bounding Trust via Encapsulation (diagram: a service and the things it will do for outsiders: Deposit, Withdrawal, Transfer, Account Balance Check) • Services only do limited things for their partners • This is how they bound their trust • Encapsulation is about bounding trust • Business logic ensures only the desired operations happen • No changes to the data occur except through locally controlled business logic! Slide 9
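A minimal Python sketch of bounding trust via encapsulation, assuming a toy in-memory bank service. The class name `AccountService`, its method names, and the no-overdraft rule are illustrative assumptions, not part of the talk. Partners reach the private balances only through `handle()`, which accepts the four operations on the slide and rejects everything else.

```python
class OperationRejected(Exception):
    """Raised when a partner asks for something the service will not do."""


class AccountService:
    """Hypothetical bank service: partners get four operations and nothing else."""

    # The only things this service will do for outsiders.
    OPERATIONS = {"deposit", "withdrawal", "transfer", "balance"}

    def __init__(self):
        # Private internal data; it changes only through the methods below.
        self._balances = {}

    def handle(self, operation, **operands):
        """Single entry point for partner requests."""
        if operation not in self.OPERATIONS:
            raise OperationRejected(f"unsupported operation: {operation}")
        return getattr(self, "_" + operation)(**operands)

    def _deposit(self, account, amount):
        if amount <= 0:
            raise OperationRejected("amount must be positive")
        self._balances[account] = self._balances.get(account, 0) + amount
        return self._balances[account]

    def _withdrawal(self, account, amount):
        # Assumed business rule: positive amounts only, no overdrafts.
        if amount <= 0 or self._balances.get(account, 0) < amount:
            raise OperationRejected("bad amount or insufficient funds")
        self._balances[account] -= amount
        return self._balances[account]

    def _transfer(self, source, target, amount):
        # _withdrawal validates first, so _deposit cannot fail afterwards.
        self._withdrawal(source, amount)
        self._deposit(target, amount)
        return self._balances[source], self._balances[target]

    def _balance(self, account):
        return self._balances.get(account, 0)
```

A request outside the list, such as a hypothetical `close_account`, raises `OperationRejected` instead of touching the data.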
Encapsulating Both Change and Reads (diagram: business requests change the service’s private internal data; only sanitized data is exported) • Encapsulating change • Ensures integrity of the service’s work • Ensures integrity of the service’s data • Encapsulating exported data for read • Ensures privacy by controlling what’s exported • Allows planning for loose coupling and expirations • E.g., Wednesday’s price-list Slide 10
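The export side of encapsulation can be sketched the same way. This is an illustration under assumptions: the row layout (`sku`, `price`, and a private `unit_cost`) and the `PriceListExport` type are hypothetical. The service projects its private rows onto the fields it chooses to publish and stamps the result with a version and an expiration, in the spirit of “Wednesday’s price-list”.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceListExport:
    """Hypothetical sanitized, read-only snapshot handed to partners."""
    version: int
    valid_until: date   # exported data plans for its own expiration
    prices: tuple       # (sku, price) pairs only


def export_price_list(internal_rows, version, valid_until):
    """Project private rows onto the fields the service chooses to export."""
    # internal_rows is private data; fields such as unit_cost never leave the service.
    prices = tuple(sorted((row["sku"], row["price"]) for row in internal_rows))
    return PriceListExport(version=version, valid_until=valid_until, prices=prices)
```

Because the export is frozen, a partner holding it cannot alter the snapshot it was given.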
Trust and Transactions • Some propose atomic transactions across services • E.g., WS-Transactions • Requires holding locks • Lots of trust in timely unlock • Doesn’t sound autonomous and independent to me… • The debate is over the definition of the word “service” • Does it require autonomy and independence? • Does it allow intimacy across service boundaries? • There will be code connected by 2-phase commit • Same service or in different services? • For this talk, I presume no cross-service transactions • That is simply the definition of the word “service” Slide 11
Interconnecting with Independent Services (diagram: a service contract: Tentative Place-Order; then one of Accept-Order or Reject-Order; then one of Confirm Place-Order or Cancel Place-Order) • Services are connected by messaging • The only interaction between two services is by the messages that they exchange • Schema: the formats of the individual messages • Contracts: the allowable sequences of messages Slide 12
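Because a contract is defined as the allowable sequences of messages, one way to picture it is as a small state machine. The sketch below assumes one reading of the diagram (Confirm or Cancel follows only an Accept-Order); the `CONTRACT` table and `conforms()` are hypothetical names, and the check covers sequence only, not the schema of each message.

```python
# Assumed reading of the diagram: allowed next messages after each message.
# None marks the start of the conversation.
CONTRACT = {
    None: {"Tentative Place-Order"},
    "Tentative Place-Order": {"Accept-Order", "Reject-Order"},
    "Accept-Order": {"Confirm Place-Order", "Cancel Place-Order"},
    "Reject-Order": set(),          # conversation ends
    "Confirm Place-Order": set(),   # conversation ends
    "Cancel Place-Order": set(),    # conversation ends
}


def conforms(messages, contract=CONTRACT):
    """True if the sequence (possibly an unfinished prefix) follows the contract."""
    previous = None
    for message in messages:
        if message not in contract.get(previous, set()):
            return False
        previous = message
    return True
```

A confirmation after a rejection, for example, is outside the contract even if each message is individually well-formed.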
Operators and Operands (diagram: a Deposit message sent to a service; Deposit is the operator, and the message also carries its operands) • Messages contain operators • Requests a business operation • Operators provide business semantics • Part of the contract between the two services • Operator messages contain operands • Details needed to do the business operation • The sending service must put them into the message Slide 13
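An operator message can be pictured as an immutable record holding the operator name and the operands the sender filled in. This is a sketch under stated assumptions: the `OperatorMessage` type, the `message_id` field, and the operand names `account-id` and `amount` are hypothetical, chosen only to illustrate the slide's Deposit example.

```python
import uuid
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OperatorMessage:
    """Hypothetical operator message: what to do, plus the details needed to do it."""
    message_id: str   # unique per message
    operator: str     # business operation named in the contract, e.g. "Deposit"
    operands: tuple   # (name, value) pairs supplied by the sender


def make_deposit(account_id, amount):
    # The sending service is responsible for putting every operand into the message.
    return OperatorMessage(
        message_id=str(uuid.uuid4()),
        operator="Deposit",
        operands=(("account-id", account_id), ("amount", Decimal(amount))),
    )
```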
Where Do Operands Come From? • Operands come from reference data • New kind of data in SOA • Except it’s not new; we’ve done variations of SOA for decades… • We’re just getting better at it! • Reference data is versioned and each version is immutable • Immutable images are shared across many services • We will talk about the creation, publication, and management of reference data Slide 14
Transactions and Inside Data • Transactions make you feel alone • No one else manipulates the data when you are • Transactional serializability • The behavior is as if a serial order exists Slide 16
Life in the “Now” • Transactions live in the “now” inside services • Time marches forward • Transactions commit • Advancing time • Transactions see the committed transactions • A service’s biz-logic lives in the “now” Slide 17
Sending Unlocked Data Isn’t “Now” • Messages contain unlocked data • Assume no shared transactions • Unlocked data may change • Unlocking it allows change • Messages are not from the “now” • They are from the past • There is no simultaneity at a distance! • Similar to the speed of light • Knowledge travels at the speed of light • By the time you see a distant object, it may have changed! • By the time you see a message, the data may have changed! • Services, transactions, and locks bound simultaneity! • Inside a transaction, things appear simultaneous (to others) • Simultaneity only inside a transaction! • Simultaneity only inside a service! Slide 18
Outside Data: a Blast from the Past • All data from distant stars is from the past • 10 light-years away; 10-year-old knowledge • The sun may have blown up 5 minutes ago • We won’t know for 3 minutes more… • All data seen from a distant service is from the “past” • By the time you see it, it has been unlocked and may change • Each service has its own perspective • Inside data is “now”; outside data is “past” • My inside is not your inside; my outside is not your outside • Going to SOA is like going from Newtonian to Einsteinian physics • Newton’s time marched forward uniformly • Instant knowledge • Before SOA, distributed computing made many systems look like one • RPC, 2-phase commit, remote method calls… • In Einstein’s world, everything is “relative” to one’s perspective • SOA has “now” inside and the “past” arriving in messages Slide 19
Versioned Images of a Single Source • A sequence of versions describing changes to data • Updates from one service • Owner controlled • Owner changes the data • Sends changes as messages • Data is seen as advancing versions Slide 20
Operators: Hope for the Future • Messages may contain operators • Requests for business functionality part of the contract • Service-B sends an operator to Service-A • If Service-A accepts the operator, it is part of its future • It changes the state of Service-A • Service-B is hopeful • It wants Service-A to do the work • When it receives a reply, its future is changed! Slide 21
Operands: Past and Future • Operands may live in the past • Values published as reference data • Come from Service-A’s past • Operands may live in the future • They may contain a proposed value submitted to Service-A Slide 22
Between Services: Life in the “Then” • Everything between services lives in the past or future • Operators live in the future • Operands live in the past or the future • It’s not meaningful to speak of “now” between services • No shared transactions, so no simultaneity • Life in the “then” • Past or future • Not now • Each service has a separate “now” • Different temporal environments! Slide 23
Services: Dealing with “Now” and “Then” • Services Make the “Now” Meet the “Then” • Each Service Lives in Its Own “Now” • Messages Come and Go Dealing with the “Then” • The Business-Logic of the Service Must Reconcile This!! • Example: accepting an order • A biz publishes daily prices • Probably want to accept yesterday’s prices for a while • Tolerance for time differences must be programmed • Example: “Usually ships in 24 hours” • Order processing has old info • Available inventory not accurate • Deliberately “fuzzy” • Allows both sides to cope with difference in time domains! • The world is no longer flat! • SOA is recognizing that there is more than one computer • Multiple machines mean multiple time domains • Multiple time domains mandate we cope with ambiguity to allow coexistence, cooperation, and joint work Slide 24
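Reconciling “now” with “then” ultimately means writing the tolerance into business logic. The sketch below assumes a hypothetical seller that publishes one price-list per day and deliberately honors a quote from a price-list up to one day old; the one-day grace period, the prices, and the function name are illustrative choices, not from the talk.

```python
from datetime import date, timedelta

# Hypothetical daily price-lists, keyed by publication date.
PRICE_LISTS = {
    date(2007, 7, 23): {"widget": 10},
    date(2007, 7, 24): {"widget": 11},
}
# Assumed business rule: quotes from yesterday's price-list are still honored.
GRACE = timedelta(days=1)


def accept_order(sku, quoted_price, price_list_date, today):
    """Accept an order whose price came from a recent-enough price-list."""
    if today - price_list_date > GRACE:
        return False, "price-list too old"
    prices = PRICE_LISTS.get(price_list_date)
    if prices is None or prices.get(sku) != quoted_price:
        return False, "quoted price does not match that price-list"
    return True, "accepted"
```

The order names the price-list it was priced from, so the seller checks the quote against that list rather than against today's prices.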
What Is Reference Data? (diagram: purposes for reference data: operands, historic artifacts, and shared collections of data) • Reference data is published across service boundaries • For each collection of reference data: • One service creates and publishes the data • Other services receive periodic versions of the data Slide 26
Reference Data: Operands for the Operators (diagram: a Deposit operator message, with its operands, sent to a service) • As discussed above, messages across services invoke business operations… • Each service-to-service message is an operator • Each operator message is filled with operands • Parameters, options, customer-id, parts-being-ordered, etc. • The data for these operands is published as reference data Slide 27
Reference Data: Historic Artifacts (diagram: a bank service issuing the Jul-2007 bank statement) • Historic artifacts report on what happened in the past • Sometimes these snapshots need to be sent to other services • Examples: • Sales quarterly results • Monthly bank statements • Any and all monthly bills • Well… • Both requests for payment (operations) and the historic artifact of how much power you used… • Inventory status at end of quarter Slide 28
Reference Data: “Shared Collections of Data” • Many services may need access to the same data • The data is changing… • Someone owns updating and distributing the data… • Examples: • Customer database • Employee database • Parts database and price-list (diagram: the HR Service owns the authoritative employee data, updates employees to move it from Vers#23 to Vers#24, and publishes Vers#24; the Sales Service, which owns the authoritative customer data, replaces its reference copy of Vers#23 of the employee data with Vers#24) Slide 29
Publishing Versioned Reference Data (diagram: Service-A publishes A’s-Data to Service-B as Vers-X, then Vers-Y, then Vers-Z; Service-B’s later request states that it uses Vers-Z) • The owner of data periodically publishes • Using whatever messaging technique it wants • Publications are always versioned • The version numbers increase Slide 30
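Publishing versioned reference data can be sketched as an owner that keeps a private working copy and hands out read-only snapshots with increasing version numbers. The class, method, and field names are hypothetical, and an in-memory list stands in for “whatever messaging technique it wants”.

```python
from types import MappingProxyType


class ReferenceDataOwner:
    """Hypothetical owner service for one collection of reference data."""

    def __init__(self, initial):
        self._working = dict(initial)   # private and mutable, inside the owner
        self._published = []            # read-only snapshots; version N is index N - 1

    def apply_change(self, key, value):
        # Only the owner's own logic edits the working copy; the edit
        # becomes visible to others in the next published version.
        self._working[key] = value

    def publish(self):
        # Copy, then wrap read-only: earlier versions never change.
        snapshot = MappingProxyType(dict(self._working))
        self._published.append(snapshot)
        return len(self._published), snapshot

    def version(self, number):
        if not 1 <= number <= len(self._published):
            raise KeyError(number)
        return self._published[number - 1]


def make_request(operator, uses_version):
    """A hypothetical request that records which version its operands came from."""
    return {"operator": operator, "uses-version": uses_version}
```

A requester names the version it worked from (the diagram's “Request Uses: Vers-Z”), so the owner can tell how stale the operands are.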
Business Operations May Request Changes (diagram: Service-B, working from Vers-X of A’s-Data, sends the owning Service-A a “Please make data change” request; the change appears in Vers-Y) • If a non-owner wants a change, it must do a biz-operation • This is a request sent to the owning service • The owning service may agree to the operation, causing change to the data in question • If it changes, this affects the next version Slide 31
Optimistic Concurrency Control: Anti-Encapsulation • What is optimistic concurrency control? • Data is read • Changes are made and submitted to the data’s owner • If the original data hasn’t changed, the new changes are applied • This assumes the remote system should be able to write directly on the data • This is a trusting relationship… not autonomous! • Autonomy and updates to data • Autonomy means independent control • My local biz-logic decides how my data changes! • If you want a change, ask me to do a business op • It’s my data…I’ll decide how it changes! Slide 32
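To make the critique concrete, here is a minimal sketch of optimistic concurrency control as the slide describes it, using a hypothetical in-memory record. The owner only compares version numbers: whatever value the remote writer submits is applied, which is the trusting, non-autonomous relationship the slide objects to.

```python
class VersionConflict(Exception):
    pass


class OptimisticRecord:
    """Hypothetical record guarded only by optimistic concurrency control."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, expected_version):
        # The remote writer supplies the new value; the owner merely checks
        # that nobody wrote in between. No business logic looks at the change.
        if self.version != expected_version:
            raise VersionConflict("data changed since it was read")
        self.value = new_value
        self.version += 1
```

Nothing stops a writer that read the record from changing any field it likes, such as a credit rating, as long as its version is current.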
Example: Updating the Customer’s Address • What about a salesperson updating a customer’s address? • Shouldn’t that just be optimistic concurrency control? • No! It should invoke business logic with a request! • Not all fields of the customer record should be updated by salespeople • Requests across service boundaries invoke business logic when the customer address is changed Slide 33
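By contrast, a business-operation approach lets the owning service decide. This sketch assumes a hypothetical `CustomerService` with a made-up policy table saying which caller roles may ask to change which fields; the empty-address check and the audit list stand in for whatever business logic a real service would run.

```python
class RequestRejected(Exception):
    pass


class CustomerService:
    """Hypothetical owner of customer records; outsiders can only send requests."""

    # Assumed policy: which fields each caller role may ask to change.
    CHANGEABLE_BY = {"sales": {"address"}, "credit": {"credit_rating"}}

    def __init__(self):
        self._customers = {}   # private data
        self.audit = []        # stand-in for real business logic

    def add(self, customer_id, **fields):
        self._customers[customer_id] = dict(fields)

    def request_update(self, role, customer_id, field, value):
        if field not in self.CHANGEABLE_BY.get(role, set()):
            raise RequestRejected(f"{role} may not change {field}")
        if field == "address" and not value.strip():
            raise RequestRejected("address must not be empty")
        # Business logic runs on every accepted change (here: an audit entry).
        self.audit.append((customer_id, field, value))
        self._customers[customer_id][field] = value
        return dict(self._customers[customer_id])
```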
Immutable And/Or Versioned Data • Data may be immutable • Once written, it is unchangeable • Immutable data needs an ID • From the ID comes the same data • No matter when, no matter where • Versions are immutable • Each new version is identified • Given the identifier, the same data comes • Version-independent identifiers • Let you ask for a recent version • Examples: • “Windows Vista, SP1”: the same set of bits every time • “New York Times; 7/24/07”: a specific version of the paper; contents don’t change • “Recent NY Times”: maybe today’s, maybe yesterday’s • “Latest SP of Vista”: definitely Vista, results vary over time Slide 35
Immutability of Messages (diagram: Service-A sends a message: once it’s outside, it’s immutable!) • Retries are a fact of life • Zero or more delivery semantics • Messages must be immutable • Retries must not see differences… • Once it’s sent, you can’t un-send! Slide 36
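Immutable messages are what make retries safe to absorb. A common companion technique, idempotent processing keyed by message ID, is sketched below; it is a swapped-in illustration rather than something the slide spells out, and the `IdempotentReceiver` name is hypothetical. The seen-message table is kept in memory for brevity; a real service would record it in the same local transaction as the work it guards.

```python
import json


class IdempotentReceiver:
    """Hypothetical receiver: does the work once per message ID; retries get the same reply."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = {}   # message_id -> (body, reply); in memory only for brevity

    def receive(self, message_id, body):
        if message_id in self._seen:
            first_body, reply = self._seen[message_id]
            # Messages are immutable: a retry must carry exactly the same bits.
            if first_body != body:
                raise ValueError(f"message {message_id} was changed after it was sent")
            return reply
        reply = self._handler(json.loads(body))
        self._seen[message_id] = (body, reply)
        return reply
```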
To Cache Or Not To Cache • OK to cache immutable data • It’s never wrong • Never have to invalidate! • Caching should only be used for immutable data • Caching data that changes may lead to anomalies • Consider caching data labeled with a version dependent ID • Because versions are immutable it will work • Store the mapping from version independent to version dependent in an accurate location Slide 37
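The caching advice can be sketched with two hypothetical classes: an authoritative store that holds immutable versions plus the version-independent “latest” mapping, and a cache that only ever stores entries by version-dependent ID. The `name@version` ID format is an assumption made for illustration.

```python
class VersionedStore:
    """Hypothetical authoritative source: immutable versions plus a 'latest' pointer."""

    def __init__(self):
        self._versions = {}   # version-dependent ID -> bytes (never overwritten)
        self._latest = {}     # version-independent name -> most recently put ID

    def put(self, name, version, data):
        vid = f"{name}@{version}"   # assumed ID format
        if vid in self._versions:
            raise ValueError(f"{vid} already exists and is immutable")
        self._versions[vid] = bytes(data)
        self._latest[name] = vid
        return vid

    def resolve(self, name):
        """Map a version-independent name to the current version-dependent ID."""
        return self._latest[name]

    def get(self, vid):
        return self._versions[vid]


class VersionCache:
    """Caches only by version-dependent ID, so entries never need invalidation."""

    def __init__(self, store):
        self._store = store
        self._entries = {}
        self.misses = 0

    def get(self, vid):
        if vid not in self._entries:
            self.misses += 1
            self._entries[vid] = self._store.get(vid)
        return self._entries[vid]
```

Because a version-dependent entry can never become wrong, the cache has no invalidation logic; only the lookup of “latest” goes back to the authoritative store.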
Normalization And Immutable Data (figure: a de-normalized employee table with columns Emp #, Emp Name, Emp Phone, Mgr #, Mgr Name, Mgr Phone; manager Sam’s phone number, 6-9876, is repeated on several rows; the classic problem with de-normalization: you can’t update Sam’s phone # since there are many copies) • Databases design for normalized data • Can be changed without “funny behavior” • Each data item lives in one place • Sometimes data should be de-normalized • If data is immutable, it’s OK • De-normalization is OK if you aren’t going to update! Slide 38
Stability Of Data • Immutability isn’t enough! • We need a common understanding • President Bush 1990 vs. President Bush 2007 • Stable data has a clearly understood meaning • The schema must be clearly understood • The interpretation of values must be unambiguous • Suggestion • Timestamping or versioning makes stable data • Observation • A monthly bank statement is stable data • Advice • Don’t recycle customer-ids • Observation • Anything called “current” is not stable Slide 39
A Few Thoughts on Stable Data • Outside data must be stable • Consistent interpretation across valid spaces and times • Inside data may be stable • Notably, when it is the same data as outside data… • Sometimes data inside is not stable • Classic normalization for vibrant update • Needs to be cast into a stable shape to send outside Slide 40
Validity Of Data In Bounded Space And Time • Bounding the valid times • It may have an expiration • E.g., “Price-List Valid Until Dec 31st”, “Offer Good Until Next Tuesday” • Bounding the valid locations • Restrictions on where the data is valid • E.g., “Data Valid For Service-X Only”, “Offer Good to Washington State Residents Only” • When valid, the data should be: • Immutable (the ID yields the same bits) • Stable (the meaning is clear) Slide 41
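Bounded validity can travel alongside the data itself. In this sketch the `BoundedData` type and its fields are hypothetical; an absent bound means the data is always valid along that dimension.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class BoundedData:
    """Hypothetical immutable data plus the time and space in which it is valid."""
    payload: tuple
    valid_until: Optional[date] = None           # None: never expires
    valid_for: Optional[FrozenSet[str]] = None   # None: valid everywhere

    def is_valid(self, on, where):
        if self.valid_until is not None and on > self.valid_until:
            return False
        if self.valid_for is not None and where not in self.valid_for:
            return False
        return True
```

A price-list “valid until Dec 31st, for Service-X only” would set both bounds; outside them, the receiver should treat the data as unusable rather than guess its meaning.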
Rules For Sending Data In Messages • Identify the Message: Put Unique ID in All Messages; Part of the Unique ID May Be a Version… • Immutable Data: Don’t Change the Data Associated with the Unique ID; Never Return Different Bits • OK to Cache: The Same Bits Will Always Be Returned • Define Valid Ranges: Valid for a Certain Time Period and Over Some Space; OK to Always Be Valid • Must Be Stable: Must Ensure There Is Never Any Confusion About the Meaning (Within Valid Range) Slide 42
SQL, DDL, and Serializability • SQL’s DDL (data definition language) is transactional • Changes are made using transactions • The structure of the data may be changed • The interpretation after the DDL change is different • DDL lives within the time scope of the database • The database’s shape evolves over time • DDL is the change agent for this evolution • SQL lives in the “now” • Each transaction’s execution is meaningful only within the schema definition at the moment of its execution • Serializability makes this crisp and well-defined Slide 44
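SQLite, which ships with Python, is one database whose DDL participates in transactions, so it can illustrate the slide's point; not every database behaves this way. The sketch drives transactions explicitly (`isolation_level=None`) and inspects the table's columns with `PRAGMA table_info`; the table and column names are hypothetical.

```python
import sqlite3

# isolation_level=None: no implicit transactions; BEGIN/COMMIT/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")


def columns(connection, table):
    """Column names of a table, as currently defined."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# A DDL change that is rolled back leaves the old shape in place.
conn.execute("BEGIN")
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
inside = columns(conn, "customer")          # new shape visible inside the transaction
conn.execute("ROLLBACK")
after_rollback = columns(conn, "customer")  # back to the old shape

# The same change, committed, becomes the schema for later transactions.
conn.execute("BEGIN")
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
conn.execute("COMMIT")
after_commit = columns(conn, "customer")
```

Each transaction sees exactly one schema, the one in force at its point in the serial order, which is what lets SQL live in the “now”.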
Message Schema and Immutable Messages (diagram: Service-A sends an immutable message along with the immutable schema for the message) • When a message is sent, it must be immutable • It is crossing temporal boundaries • Retries mustn’t give different results • The message’s schema must be immutable • It makes a mess if the interpretation of the message changes Slide 45
Immutable Schema and Its Identifiers • Immutable schema needs an identifier • It must be possible to unambiguously identify the schema • This must occur across the namespaces of sender and receiver • The schema definition must never change • Given the identifier, the same schema is returned • URIs (Uniform Resource Identifiers) work well • Guaranteed to be unique • If you follow the rules • URLs (Uniform Resource Locators) are cool • URLs are URIs • Also tell you a location to get the stuff (e.g., the schema) Slide 46
Composition of Schema as a DAG (diagram: a Purchase Order schema with Customer, Delivery Addr, and SKUs; a Customer schema with Customer Address and Credit Rating; an Address schema with Number/Street Name, City/State, Postal Code, and Country; a Part schema with SKU, Color, and Size) • Schema may contain sub-schema • Inside the message are chunks of data • A purchase-order may contain customer information • They have their own definitions • The sub-schema are referenced by identifier • This leads to a tree of references to immutable schema • It’s really a DAG (Directed Acyclic Graph) • Sometimes, different sub-schema reference the same stuff Slide 47
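The schema DAG on this slide can be checked mechanically. The sketch below uses hypothetical `urn:example:` identifiers and Python's `graphlib` (3.9+): a topological order lists every sub-schema before the schemas that reference it, `graphlib` raises `CycleError` if the references are not acyclic, and the shared Address schema is what makes the graph a DAG rather than a tree.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical schema URIs; each maps to the sub-schema it references.
PURCHASE_ORDER = "urn:example:schema:purchase-order:1"
CUSTOMER = "urn:example:schema:customer:1"
ADDRESS = "urn:example:schema:address:1"
PART = "urn:example:schema:part:1"

SCHEMAS = {
    PURCHASE_ORDER: [CUSTOMER, ADDRESS, PART],   # customer, delivery addr, SKUs
    CUSTOMER: [ADDRESS],                          # customer address
    ADDRESS: [],
    PART: [],
}


def load_order(schemas):
    """Order schema IDs so each sub-schema precedes the schemas that reference it.
    graphlib raises CycleError if the references do not form a DAG."""
    return list(TopologicalSorter(schemas).static_order())


def shared_subschemas(schemas):
    """Sub-schema referenced from more than one place: why this is a DAG, not a tree."""
    counts = Counter(ref for refs in schemas.values() for ref in refs)
    return {ref for ref, n in counts.items() if n > 1}
```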
Versioning and Schema • Frequently, schema is versioned • A new format of the schema is created • It is given a new identifier • Version independent schema identifiers • Specify a set of versions for a type of schema • The set may evolve over time • Version dependent schema identifiers • Specify a specific version of a specific schema • The version-dependent schema is immutable • Messages should always specify a version-dependent schema • This ensures no ambiguity Slide 48
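A schema registry that enforces these rules might look like the following sketch. Everything here is hypothetical: the `SchemaRegistry` class, the `/vN` suffix convention for version-dependent URIs, and the `http://example.com/...` family URI. Re-registering an identical definition is allowed, redefining an existing version is refused, and `stamp()` only accepts identifiers that resolve to one immutable definition.

```python
class SchemaRegistry:
    """Hypothetical registry: immutable definitions keyed by version-dependent URI."""

    def __init__(self):
        self._definitions = {}   # version-dependent URI -> definition
        self._families = {}      # version-independent URI -> list of version-dependent URIs

    def register(self, family_uri, version, definition):
        uri = f"{family_uri}/v{version}"   # assumed naming convention
        existing = self._definitions.get(uri)
        if existing is not None and existing != definition:
            raise ValueError(f"{uri} is immutable and already defined differently")
        self._definitions[uri] = definition
        members = self._families.setdefault(family_uri, [])
        if uri not in members:
            members.append(uri)
        return uri

    def versions(self, family_uri):
        # A version-independent ID names a set of versions that grows over time.
        return list(self._families.get(family_uri, []))

    def definition(self, uri):
        # Only version-dependent IDs resolve to a single definition.
        return self._definitions[uri]


def stamp(message_body, schema_uri, registry):
    """Attach a version-dependent schema URI to an outgoing message."""
    registry.definition(schema_uri)   # KeyError for unknown or version-independent IDs
    return {"schema": schema_uri, "body": message_body}
```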
Extensibility and Schema (diagram: Service-A sends a Purchase Order with Customer, Delivery Addr, and SKUs; a second copy of the form carries the scribbled note “Don’t Deliver in Morning”) • Extensibility is the addition of non-schema-specified information into the message • The schema does not specify the additional stuff • The sender wanted to add it anyway • Adding extensions is like scribbling in the margins • Sometimes adding notes to a form helps! • Sometimes it does no good at all! Slide 49
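On the receiving side, one reasonable policy (an assumption here, not a rule from the talk) is to act only on the schema-specified fields and keep extensions alongside them, like a note in the margin that a person may or may not read. The field names mirror the diagram but are hypothetical.

```python
# Hypothetical purchase-order schema: the fields it specifies.
SCHEMA_FIELDS = {"customer", "delivery-addr", "skus"}


def read_purchase_order(message):
    """Split a received message into schema-specified fields and extensions."""
    missing = SCHEMA_FIELDS - set(message)
    if missing:
        raise ValueError(f"missing schema fields: {sorted(missing)}")
    known = {k: v for k, v in message.items() if k in SCHEMA_FIELDS}
    # Extensions are notes in the margin: kept, but business logic doesn't depend on them.
    extensions = {k: v for k, v in message.items() if k not in SCHEMA_FIELDS}
    return known, extensions
```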
Infosets, XML-Schema, And PSVI • XML-Infoset • Semantics of XML, not syntax • Tree: parents, children, elements, & attributes • Allows (encourages) schema • Any representation OK • XML-Schema • Datatype library and schema definition • Composed schema uniquely identified (URI) • PSVI – Post Schema Validated Infoset • Infoset after validation against schema • Can leverage schema knowledge Slide 50