Microsoft PAKISTAN DEVELOPER CONFERENCE 2005 June 13-15, 2005
Architecting for High Performance, Decentralized, Agent-Oriented, Connected Applications
Arvindra Sehmi, Head of Enterprise & Architecture, Developer & Platform Group, Microsoft EMEA HQ
asehmi@microsoft.com | www.thearchitectexchange.com/asehmi
Topics • Performance-Challenged Applications • Messaging Network Architecture • "Agile Machine" Agent Architecture • Conclusion
Performance-Challenged Applications • App scalability is hard to plan for • App performance is hard to design for • Processing latency depends on the above • Processing often needs to proceed with incomplete information • Flight reservations • Financial trading
CAPCO A global financial technology solutions & business strategy company
CAPCO Its customers require: • A consistent system & application architecture • Systems integration capability • Predictable & controllable high performance
SGX Opportunity • Capco was commissioned by the Singapore Exchange Limited (SGX) to provide business assessment & technical architecture advice • Required to implement a centralised processing utility providing matching services for Post Trade, Pre-Settlement interactions for equities & fixed-income trading in the Singapore market • Initial prototype built in C++/Unix
Benchmark – High Performance Scenario: Financial Trade Lifecycle (adapted from Sungard Inc. – Case of the Bungled Trade). Parties: Investment Manager, Broker/Dealer, Custodian, Exchange (e.g. NYSE), CSD (e.g. DTCC)
1. The Investment Manager sends the Block Trade to the Broker/Dealer.
2. The Broker/Dealer sends the Block Trade to the Specialist on the Exchange.
3. The Specialist executes all the individual blocks and sends the result to the Dealer.
4. The Broker/Dealer sends the Investment Manager an average price.
5. The Investment Manager tells the Broker/Dealer how to allocate the Trade to individual clients.
6. The Broker/Dealer allocates the Trades according to the Investment Manager's instructions and books them into individual client accounts.
7. The Broker/Dealer sends the Trade information to the CSD for confirmation and settlement.
8. The CSD sends confirms to the Investment Manager and the Custody Bank.
9. The Custodian or the Investment Manager affirms that the Trade is good.
10. On settlement date, deliveries of shares and money are moved between the Client's Custodian and the Broker/Dealer accounts at the CSD.
The Simple Queuing Model [Diagram: Message Queue → Message Processor] • In-bound MQ & message routing processors • Message handling MQs & business logic processors (per-message-type state management & message matching) • Confirms & out-bound MQs & processors
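As a concrete illustration, here is a minimal C# sketch of this model using System.Messaging (MSMQ), which the system also used; the queue paths, string-bodied messages, and confirmation logic are illustrative assumptions, not the actual benchmark code.

```csharp
using System;
using System.Messaging;

// Minimal sketch of the simple queuing model: a processor reads messages off
// an in-bound MSMQ queue, applies business logic, and forwards results to an
// out-bound queue. Queue paths and message shape are assumptions.
public class TradeMessageProcessor
{
    private readonly MessageQueue inbound =
        new MessageQueue(@".\private$\trade_in");
    private readonly MessageQueue outbound =
        new MessageQueue(@".\private$\confirm_out");

    public void Run()
    {
        inbound.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
        while (true)
        {
            Message msg = inbound.Receive();            // blocks until a message arrives
            string result = Process((string)msg.Body);  // per-message-type business logic
            outbound.Send(result);                      // hand off to the out-bound queue
        }
    }

    private string Process(string trade)
    {
        // State management and message matching would live here.
        return "CONFIRMED:" + trade;
    }
}
```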
Processing Per Compute Server [Diagram: each compute server runs queue/processor pairs — Trade Q → PAM, Noe Q → NOE, Alloc Q → ALC, Nett Q → NET, Settle Q → SET, CFM Q → CFM, CLS Q → CLS — with multiple instances (x1–x4) of each processor]
Scalable Transaction Engine – Core Processing Pipeline [Diagram: Message Driver → Compute Server → Database Server → Database → Message Sink] • The pipeline is replicated as required on new machines
Redmond ISV Labs Benchmark (1/2) • Focused early efforts on optimizing performance of the system on a single dual-processor IBM x330 server accessing a SQL 2000 SP2 database on an 8-processor IBM x370 server • App CPU utilization ~100% • SQL CPU utilization less than 50% • Network cards saturated • The three key performance inhibitors were: • SQL log bound (disk I/O) • Use of dynamic SQL statements rather than sprocs • Original process configuration was not correctly "balanced"
Redmond ISV Labs Benchmark (2/2) • All servers used Windows Server 2003 • SQL performance team recommended: • Move audit log to local disk store • Hash data between N compute servers & N database instances spreading disk load over multiple striped Raid 1+0 arrays • Change to clustered index on match table • Use multiple disk controllers • MSMQ performance team recommended MSMQ v3.0, overcoming 4MB memory buffer limits & context switching issues • Use ICECAP & built-in WS2K3 performance counters
Scale-out performance results • Scale-out over 8 application servers & 1 SQL Server achieved 7,734 msg/sec
Scale-out configuration: Message Driver [x8] → Compute Server [x8] → Database Server [x1] → Raid Array [x4]
7734 msg/sec = 668,217,600 msg/day = ~65 Million Trades per Day
Message Drivers • 8 x 1-way Win2K PCs • Each running 1 x Message Driver • Each Message Driver populates a remote input trade queue on the target compute server • Seven million messages are delivered in total per test • Each "Trade" transaction consists of 1 Noe, 2 Allocs, 2 Net Proceeds, & 2 Settlement Instructions
Compute Servers • 8 x 2-way WinServer-2K3 PCs • Each running 2 Pam, 2 Noe, 4 Alloc, 4 Net Proceeds, & 4 Settlement Instructions processing components • All processing components read off queues & write to queues; all queues except the trade input queue are local, express message queues • Database updates have affinity to one of the available databases through the use of a multi-key hashing algorithm computed over the trade match fields (see the sketch below)
Database Server • 1 x 8-way WinServer-2K3 PC • Running 1 x SQL Server 2000 with 4 instances of the STE Database • 4 x Raid 1+0 disk arrays • Using a clustered index on the match table with date-time ordering was a natural fit to the real data's arrival-time heuristics • Data grew gracefully at the end of the clustered index table • Disk queues on the Raid arrays maxed out despite only 50% SQL Server CPU utilization, implying performance could be increased further by adding spindles to the Raid arrays • The 100 GB network cards also saturated
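The database-affinity hashing mentioned above might look like the following C# sketch; the choice of match fields (instrument, counterparty, trade date) and the hash combination are assumptions for illustration.

```csharp
using System;

// Sketch of multi-key hashing for database affinity: trades whose match
// fields agree always map to the same STE database instance, so matching
// never has to cross databases. The chosen fields are assumptions.
public static class DatabaseAffinity
{
    public static int SelectDatabase(string instrument, string counterparty,
                                     DateTime tradeDate, int databaseCount)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + instrument.GetHashCode();
            hash = hash * 31 + counterparty.GetHashCode();
            hash = hash * 31 + tradeDate.Date.GetHashCode();
            return (hash & 0x7FFFFFFF) % databaseCount;  // index of the target instance
        }
    }
}
```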
London Stock Exchange [2002/2003] • The largest and most international equity exchange in Europe • Has $6.8 trillion of UK and international companies' market capitalisation on its markets • Has transacted $9.2 trillion of business over the year • One third of the world's liquidity!
Scalability is a KEY Criterion
The Scalability Challenge • 5x increase in peak order volumes in 2 years (2000-02) • 2x increase in peak order volumes in 1 year (2001-02) • The first Tuesday of February 2003 saw order message volumes exceed the 1,500,000 level due to stock market volatility at the time • These messages form the key input to LSE's market data systems • Scalability is therefore KEY to success
Performance & Latency are KEY Criteria
The Performance Challenge • Half of the LSE's revenues come from information sales • Timely and accurate information is critical for the market to operate correctly • Information "degrades" rapidly over time [Chart: Annual Revenue vs. Age of Information — markers at 1 second, 15 minutes, 1 month, 6 months; "45% of Revenue" highlighted]
LSE System Functional Overview – A System Built by Accenture (UK) • Real-Time Processing: generation of new messages, based on RTP of existing LSE message types; real-time messages must be processed in order of receipt • Data Warehousing: archiving of all LSE-sourced messages • Batch: EOD activity reports, warehouse query execution, etc. [Diagram: Inputs (LMIL Trading In) → LSE System (Real-Time Processing, Operational Data Store, Batch Processing, Data Warehouse) → Outputs (LMIL)]
Real-Time Processing Examples
Trade High-Low: • Trade prices come from 'Trade Report' messages • The system must disseminate a new message for each new highest price for each stock • The system must disseminate a new message for each new lowest price for each stock • Trade Reports can be cancelled, reported late, delayed, or never publicised • !! Trade Reports (applies to every message) must be processed in order (per stock) !! • !! All processing on a trade report must be done transactionally !!
Opening Price: • Opening prices come from 'Trade Report' and/or 'Best Price' messages • If the price is the 'first of the day', the system must disseminate a new message for each stock • The rules for 'first of the day' are complex • !! All processing on a trade report / best price must be done transactionally !!
Trades VWAP (Volume Weighted Average Price): • VWAP is Σ(Price * Volume) / Σ(Volume) • A new VWAP is disseminated every 15 seconds if volume has changed by a threshold amount • Trade cancellations and contras must be taken into account (see the sketch below) • !! All processing on a trade report must be done transactionally !!
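A C# sketch of the VWAP bookkeeping follows, taken directly from the formula above; the cancellation handling mirrors the stated requirement, while the class shape is an assumption and the 15-second dissemination scheduling is omitted.

```csharp
// Per-stock VWAP bookkeeping, straight from the formula
// VWAP = Σ(price * volume) / Σ(volume). Cancellations and contras subtract
// the original trade's contribution, as the requirement above demands.
public class VwapCalculator
{
    private decimal sumPriceVolume;  // Σ(price * volume)
    private long sumVolume;          // Σ(volume)

    public void ApplyTradeReport(decimal price, long volume)
    {
        sumPriceVolume += price * volume;
        sumVolume += volume;
    }

    public void ApplyCancellation(decimal price, long volume)
    {
        sumPriceVolume -= price * volume;  // remove the cancelled trade's effect
        sumVolume -= volume;
    }

    public decimal? CurrentVwap
    {
        get { return sumVolume == 0 ? (decimal?)null : sumPriceVolume / sumVolume; }
    }
}
```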
LSE "Agents" [Diagram: Listener → Hash Balancer → Internal Queues → Processing Components → Agent Router, with Recovery Message Store, In-memory Store, Processing DB, and Audit DB; components checkpoint for recovery] • Listener components listen to external interfaces, or to other Agents • The hash balancer distributes messages across internal queues, maintaining order of processing (see the sketch below) • Internal queues exploit the multiprocessing capabilities of a machine (1 thread per queue) and allow for fast input • Processing components execute processing in one transaction, using 'fat' SPs • The Agent router routes and sends messages to other Agents via .NET Remoting (TCP/IP) • All components checkpoint for recovery
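A minimal C# sketch of the hash-balancing idea: hashing on the stock symbol guarantees per-stock ordering while spreading load across queues. The queue count, locking, and message shape are illustrative, not the LSE implementation.

```csharp
using System.Collections.Generic;

// Sketch of the hash balancer: messages for the same stock always hash to
// the same internal queue, preserving per-stock processing order while a
// thread per queue exploits multiple processors.
public class HashBalancer
{
    private readonly Queue<StockMessage>[] queues;

    public HashBalancer(int queueCount)
    {
        queues = new Queue<StockMessage>[queueCount];
        for (int i = 0; i < queueCount; i++)
            queues[i] = new Queue<StockMessage>();
    }

    public void Dispatch(StockMessage msg)
    {
        // Same symbol -> same queue -> strict per-stock ordering.
        int index = (msg.Symbol.GetHashCode() & 0x7FFFFFFF) % queues.Length;
        lock (queues[index]) queues[index].Enqueue(msg);
    }
}

public class StockMessage
{
    public string Symbol;   // the instrument the message refers to
    public byte[] Payload;
}
```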
Scale Out LSE "Agents" [Diagram: the Agent structure above replicated across multiple Agents, each with its own listener, hash balancer, internal queues, processing components, router, and recovery/audit stores]
Architecture Overview - Key Points • .NET Infrastructure • C#, .NET Remoting, ADO.NET, ASP.NET (UI + Web Services) • Application Logic • C# code wraps a 'thin' data access layer • Functionality in stored procedures (64-bit SQL 2000) • Packaging • System Agents contain all message processors • A database configures each Agent's roles (processors) • Agents are spread amongst all application servers (scalability) • Agents have several internal queues (parallelism) • Agents communicate via Remoting (TCP/IP)
Architecture Overview - Key Points • Resilience / Reliability • System is hosted on Windows Server 2003 • Agents run under MS Cluster Services • All communication is checkpointed and fully recoverable • Scalability / Load Balancing • Hash Balancing (order of processing) • Internal (Agent queues) and External (amongst Agents) • NOTE: In-memory queues, NOT MSMQ queues
Redmond Enterprise Engineering Center Benchmark – Technical Scope • Real-Time Processing only • No data warehousing, no batch processing • 1000 messages per second (200 per second on day 1!) • Latency < 1 second • Focus on new technologies • Clustering on Windows Server 2003 • .NET CLR performance (esp. Remoting & v1.1) • 64-bit SQL Server • Test, Measure, Enhance • Availability • Reliability • Performance
Topics • Performance-Challenged Applications • Messaging Network Architecture • "Agile Machine" Agent Architecture • Conclusion
Architecture Requirements • Be able to scale up quickly • Minimum turnaround time – a matter of minutes • Easy deployment and reappropriation of resources • Break complex tasks into smaller steps • Distribute work across machines • Use specialized resources for certain work • Crypto hardware, fast disk, large memory • Maximize total system throughput • Allow parallelization of work • Must proceed on expectations of what will happen • Ambiguity is the rule, not the exception
Queuing Networks – Re-Invented! • Sets of Primitive handlers, or technical operations • Generic: Route, Map, Transform, Log, Encrypt • Specific: Inference Rules • Composition of primitives into Processing Units • Map and Route based on Inference Rule • Store and Forward, Synchronize, Split • Correlate, Validate, Locate State, Match State • Composition of Processing Units into Networks • Combine tasks into workflows • Enlist external activities
Benefits of this architecture? Primitives • Reusable • Perform common tasks • Can be easily adjusted by configuration • Act only on well-known message headers
Benefits of this architecture? Processing Units • Reusable • Perform steps in dynamic business processes • Have no co-location assumptions • Good for scale-out and fail-over • Autonomous services in their own right • Good for autonomous computing • Good for agents
Queuing Networks: Processing Units [Diagram: /net/node1 — Head Primitive → Primitive → Tail Primitive] • A processing unit composes primitives • Primitives are sequentially aligned in a "pipeline" • Messages pass through the pipeline • Primitives modify, split, create or consume messages (see the sketch below)
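A C# sketch of a processing unit composing primitives into a pipeline; the IPrimitive interface and Message type are assumptions for illustration, not the FABRIQ API.

```csharp
using System.Collections.Generic;

// Each primitive receives one message and returns zero or more: it can
// modify (return the same message changed), split (return several), create,
// or consume (return none).
public class Message
{
    public IDictionary<string, string> Headers = new Dictionary<string, string>();
    public string Body;
}

public interface IPrimitive
{
    IEnumerable<Message> Handle(Message msg);
}

public class ProcessingUnit
{
    private readonly List<IPrimitive> pipeline = new List<IPrimitive>();

    public ProcessingUnit Add(IPrimitive primitive)
    {
        pipeline.Add(primitive);   // head primitive first, tail primitive last
        return this;
    }

    public IEnumerable<Message> Process(Message input)
    {
        List<Message> current = new List<Message> { input };
        foreach (IPrimitive primitive in pipeline)
        {
            List<Message> next = new List<Message>();
            foreach (Message m in current)
                next.AddRange(primitive.Handle(m));  // 0..n outputs per message
            current = next;
        }
        return current;   // whatever survives the tail primitive
    }
}
```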
Queuing Networks: Networks [Diagram: Gateway → Transform (/net/node1) → Preproc. Balance (/net/node2) → Match (/net/node3) → Augment (/net/node4) → Match (/net/node3)] • A "network" is a set of connected pipelines • Each message travels along a path in the network • The path is determined dynamically by routing rules (see the sketch below) • A path is a pipeline made from processing units • Processing units in a network are called "nodes"
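Dynamic routing by message headers might be sketched as follows, reusing the Message type from the previous sketch; the header name and node addresses are hypothetical.

```csharp
// Sketch of dynamic routing: after a node's pipeline runs, routing rules
// inspect well-known message headers to choose the next node.
public class Router
{
    public string SelectNextNode(Message msg)
    {
        string messageType;
        if (msg.Headers.TryGetValue("msg-type", out messageType)
            && messageType == "trade")
        {
            return "/net/node3";     // e.g. route trades to the Match node
        }
        return "/net/node4";         // everything else goes to Augment
    }
}
```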
Inside a Single Node [Diagram: /net/node1 — Queue Listener → Gatekeeper (checks security) → Pipeline (Head Handler … Tail Handler) → Router (selects route) → Sender Port (sends via transport)]
Hosting Nodes to Create a Network [Diagram: an Enterprise Services runtime (dllhost.exe) process hosts listeners (MSMQ Listener, TCP Listener, Config Manager) over a request queue and thread pool; each node (/net/node1, /net/node2) lives in its own AppDomain with its gatekeeper, pipeline (head handler … tail handler), router, and sender port; a process initializer / process controller manages the Enterprise Services port]
Queuing Network Architecture (FABRIQ): Configuration (a hypothetical sketch follows)
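FABRIQ's actual configuration schema is not reproduced here; as a purely hypothetical sketch, a network definition could be expressed as data like this, with each node declaring its address, the primitives composed into its pipeline, and its candidate routes.

```csharp
// Hypothetical shape of a network configuration; type and field names are
// assumptions for illustration only.
public class NodeConfig
{
    public string Address;        // e.g. "/net/node1"
    public string[] Primitives;   // primitive type names to compose, in order
    public string[] Routes;       // candidate next nodes for the router
}

public static class SampleNetwork
{
    public static NodeConfig[] Nodes()
    {
        return new[]
        {
            new NodeConfig { Address = "/net/node1",
                             Primitives = new[] { "Transform" },
                             Routes = new[] { "/net/node2" } },
            new NodeConfig { Address = "/net/node2",
                             Primitives = new[] { "Balance" },
                             Routes = new[] { "/net/node3", "/net/node4" } }
        };
    }
}
```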
Topics • Performance-Challenged Applications • Messaging Network Architecture • "Agile Machine" Agent Architecture • Conclusion
Agents • A powerful architectural abstraction used to manage the inherent complexity of software • An agent is a self-contained, problem-solving system capable of autonomous, reactive, pro-active, internally motivated, social behaviour • Exist in changing, uncertain worlds in which they perceive and act • Manage mental states for beliefs, desires, intentions and commitments
Agent-Oriented Architecture Pattern – The BDI processing loop [Diagram: messages from the input queue are accepted (illocution) → Update Beliefs (belief update) → React → Update Intentions (intention / commitment update) → Pursue Desires / Execute Plans (desire / goal pursuit) → Execute Commitments → messages for the output queue and for back-end processes; control data comprises Beliefs, Commitments, and other data]
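A minimal C# sketch of this loop, assuming simple in-memory stores; the empty method bodies are placeholders for the belief, intention, and commitment logic the diagram names.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the BDI processing loop: accept a message, update beliefs,
// react, update intentions/commitments, pursue desires (execute plans),
// and execute commitments.
public class BdiAgent
{
    private readonly List<string> beliefs = new List<string>();
    private readonly List<string> commitments = new List<string>();

    public void ProcessingLoop(IEnumerable<string> inputQueue,
                               Action<string> outputQueue)
    {
        foreach (string msg in inputQueue)    // accept messages (illocution)
        {
            UpdateBeliefs(msg);               // belief update
            React(msg);                       // immediate, reflex-style responses
            UpdateIntentions();               // intention / commitment update
            PursueDesires();                  // desire / goal pursuit (execute plans)
            ExecuteCommitments(outputQueue);  // emit messages for the output queue
        }
    }

    private void UpdateBeliefs(string msg) { beliefs.Add(msg); }
    private void React(string msg) { /* stateless reactions to the message */ }
    private void UpdateIntentions() { /* derive intentions from current beliefs */ }
    private void PursueDesires() { /* advance plans toward standing goals */ }

    private void ExecuteCommitments(Action<string> output)
    {
        foreach (string commitment in commitments) output(commitment);
        commitments.Clear();
    }
}
```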
A-O Architecture Pattern Realization – The different kinds of processing [Diagram: the BDI loop mapped onto concrete processing — Communication (Accept Messages, P-N-V, A&A), Data Management (Correlate & Update State over permanent & message state), Process Management (Match State & Execute Workflow over process state; Update Action Schedule), Action Management (Perform Immediate Actions; Perform Current Actions); messages flow from the input queue to the output queue and back-end processes]
Queuing Network Agent – 'Agile Machine': Network of processing-unit nodes [Diagram: Gateway → Validate (/net/node1) → Enrich & Route (/net/node2) → Correlate X (/net/node3) and Correlate Y (/net/node4) → Check Integrity (/net/node5) → Match State (/net/node6) → Apply L/C Rules (/net/node7) → Set Commitments (/net/node8) → Do Commitments (/net/node9) → Send Notifications (/net/node10), backed by a Database; the nodes group into Reactive Processing, Data Management, Process Management, Action Management, and Communication]
Agent Implementation TMX – Trade Matching Exchange
DEMO – TMX Beliefs and State Matching [Diagram: messages arrive at the agent (/net/agent) over time — X: Sell at t1, Y: Buy at t2, Y: Alloc at t3; the match (correlation) engine matches on content into buckets (e.g. Buckets 4711, 4712, 4713); knowledge forms beliefs, beliefs support intentions (Intention [X, Sell] → Intention [X, Sell][Y, Buy]), and a completed set becomes a commitment (Commitment [X, Sell][Y, Buy][Y, Alloc]); agent logic updates BELIEFS, INTENTIONS, and COMMITMENTS in turn — see the sketch below]
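A C# sketch of the bucket-based matching idea: messages sharing a correlation key accumulate in a bucket until the Sell/Buy/Alloc set is complete. The key choice and the completeness rule are assumptions for illustration, not the TMX implementation.

```csharp
using System.Collections.Generic;

// Content-based matching: a bucket accumulates knowledge; a partial bucket
// supports an intention, and a complete Sell/Buy/Alloc set can be promoted
// to a commitment.
public class MatchEngine
{
    private readonly Dictionary<string, List<TradeMessage>> buckets =
        new Dictionary<string, List<TradeMessage>>();

    // Returns true when the bucket is complete and a commitment can be made.
    public bool Add(TradeMessage msg)
    {
        List<TradeMessage> bucket;
        if (!buckets.TryGetValue(msg.CorrelationKey, out bucket))
            buckets[msg.CorrelationKey] = bucket = new List<TradeMessage>();
        bucket.Add(msg);

        return bucket.Exists(m => m.Kind == "Sell")
            && bucket.Exists(m => m.Kind == "Buy")
            && bucket.Exists(m => m.Kind == "Alloc");
    }
}

public class TradeMessage
{
    public string CorrelationKey;  // e.g. a bucket id such as "4711"
    public string Kind;            // "Sell", "Buy", or "Alloc"
}
```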