
Customizing Middleware to Improve Performance and Footprint






Presentation Transcript


  1. Customizing Middleware to Improve Performance and Footprint
     Arvind S. Krishna (arvindk@dre.vanderbilt.edu)
     Institute for Software Integrated Systems, Vanderbilt University, Nashville, Tennessee

  2. Motivation (1/2)
     [Figure: layered stack of Applications, Middleware Services, Middleware, Operating Systems & Protocols, Hardware & Networks]
     Where are we right now?
     • Maturation of Distributed Object Computing (DOC) middleware
     • ACE+TAO middleware
       • Open-source implementation of CORBA and Real-time CORBA
       • Highly optimized implementation supporting almost all CORBA features
     • From stovepiped to reusable architectures: functionality is factored into the middleware
     Product Line Architectures
     • A set of systems that share common “core features”
     • Families of systems are then built from these core features
     • Reduce time-to-market pressure and cost, and improve productivity
     • Example: Boeing Bold Stroke architecture
     Product line architectures minimize the cost of building variants.

  3. Motivation (2/2)
     Model Driven Development (MDD) paradigm
     • Reduces the cost of building new families of systems
     • Compose different systems at the modeling level
     • Model-check for correctness
     • Code generators synthesize artifacts: XML deployment information, configuration information, benchmarking code, ...
     Models capture system properties: structure and behavior.
     Middleware for product lines
     • Still general-purpose and layered
     • Enables different variants to be hosted by different configurations
     • However, not optimized for each variant
     [Figure label: Information propagation]
     What do we need? Optimizations that customize middleware based on system invariants.

  4. Customizing Middleware via Partial Evaluation
     • Partial evaluation
       • A technique for automatically specializing programs based on parameters known ahead of time
       • Two-level mechanism:
         • First level: annotating the known information
         • Second level: synthesizing the specialized code
       • Templates and template meta-programming
     • Research will examine
       • How techniques used in programming languages can be applied to middleware
       • Moving from a general-purpose to a more specialized architecture
     [Figure: General Purpose Layered Architecture vs. Optimized Implementation Stack]
     Optimize the “known knowns”, leave the “known unknowns” to the middleware, and use exceptions for the “unknown unknowns”.
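To make the two-level idea concrete, here is the classic textbook illustration of partial evaluation through C++ template meta-programming. It is not taken from the talk or from TAO. A general `power` function receives its exponent at run time, whereas the specialized `Power<Exp>` treats the exponent as a value known ahead of time, so the compiler can expand it into a fixed sequence of multiplications. The names `power` and `Power` are hypothetical.

```cpp
#include <cassert>

// General-purpose version: the exponent is a run-time value, so the loop
// condition is evaluated on every iteration of every call.
inline long power(long base, unsigned exp) {
  long result = 1;
  for (unsigned i = 0; i < exp; ++i) {
    result *= base;
  }
  return result;
}

// Specialized version: the exponent is a "known known" supplied ahead of
// time as a template argument. Each instantiation expands to a fixed chain
// of multiplications that the compiler can inline, so no loop control
// remains at run time.
template <unsigned Exp>
struct Power {
  static long apply(long base) { return base * Power<Exp - 1>::apply(base); }
};

// Base case terminates the compile-time recursion.
template <>
struct Power<0> {
  static long apply(long) { return 1; }
};
```

For example, `Power<4>::apply(3)` and `power(3, 4)` both yield 81. This is exactly the property the talk requires of any specialization: the output is unchanged and only the cost of computing it differs.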

  5. Existing Middleware Optimizations
     • Footprint reduction optimizations
       • Micro ORB architecture → Virtual Component pattern
       • Micro POA architecture → pluggable components
     • Request demultiplexing/dispatch optimizations
       • Connection management → Acceptor-Connector pattern, Reactor
       • Buffer management strategies
       • Request demultiplexing → active demultiplexing & perfect hashing
     • Aren’t these optimizations enough?
       • They have worked very well for applications in many domains
       • But general-purpose middleware is still layered
       • Techniques that fold layers (code and run-time checks) can further improve performance
       • These techniques complement the existing general-purpose optimizations
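Active demultiplexing works by having the server embed lookup information directly in the object key it hands to clients, so an incoming request can be mapped to its servant by indexing rather than searching. Below is a minimal sketch under that assumption. It is not TAO's implementation: `Servant`, `ObjectKey`, and `ActiveObjectMap` are hypothetical names, and perfect hashing of operation names is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical servant type, used only for this sketch.
struct Servant {
  std::string name;
};

// Object key handed to clients. It carries the servant's slot index plus a
// generation count, so the server can locate the servant without searching.
struct ObjectKey {
  std::size_t index;
  unsigned generation;
};

class ActiveObjectMap {
 public:
  ObjectKey activate(Servant* servant) {
    slots_.push_back(Slot{servant, 0});  // simplified: slots are never reused
    return ObjectKey{slots_.size() - 1, 0};
  }

  void deactivate(const ObjectKey& key) {
    Slot& slot = slots_.at(key.index);
    slot.servant = nullptr;
    ++slot.generation;  // invalidates keys issued for the old activation
  }

  // O(1) demultiplexing: index directly into the table, then validate.
  Servant* find(const ObjectKey& key) const {
    if (key.index >= slots_.size()) return nullptr;
    const Slot& slot = slots_[key.index];
    if (slot.generation != key.generation) return nullptr;
    return slot.servant;
  }

 private:
  struct Slot {
    Servant* servant;
    unsigned generation;
  };
  std::vector<Slot> slots_;
};
```

The generation counter is one common way to reject stale keys after deactivation. Lookup cost stays constant no matter how many servants are active.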

  6. Capturing System Invariants in Models (1/2)
     Hypothesis → Solution approach: use early-binding parameters to tailor the middleware. Techniques applied could range from:
     • Conditional compilation
     • Optimized stub/skeleton generation
     • Strategy pattern to handle alternatives
     Example system
     • Basic Simple (BasicSP): a three-component Distributed Real-time Embedded (DRE) application scenario
       • Timer component: triggers periodic refresh rates
       • GPS component: generates periodic position updates
       • Airframe component: processes input from the GPS component and feeds it to the navigation display
       • Navigation display: displays GPS position updates
     Program specialization invariants (must hold for all specializations):
     • $\mathrm{output}(p_{orig}) = \mathrm{output}(p_{spl})$
     • $\mathrm{speed}(p_{spl}) > \mathrm{speed}(p_{orig})$
     Boeing product-line scenario: a representative rate-based DRE application
     CoSMIC/examples/BasicSP, ACE_wrappers/TAO/CIAO/DaNCE/examples/BasicSP

  7. Capturing System Invariants in Models (2/2)
     [Figure labels: Component Deployment, Component Interactions, Same Endianness, Periodic Timer, Single-method interfaces, Collocated Components]
     Mapping ahead-of-time (AOT) system properties to specializations:
     • Periodicity → pre-create the marshaled request
     • Single-interface operations → pre-fetch the POA, servant, and skeleton servicing the request
     • Same endianness → avoid demarshaling work (byte-order swapping)
     • Collocated components → specialize for the target location (remove remoting)
     • Same operation invoked → cache the CORBA request header and update only the arguments

  8. Specializations Implemented in TAO
     • Client-side specializations
       • Request header caching
       • Pre-creating requests
       • Marshaling checks
       • Target location
     • Server-side specializations
       • Specialized request processing
       • Avoiding demarshaling checks
     • Cumulative effect
       • Combining specializations may yield a more-than-additive improvement
       • For example: client-side request caching plus server-side specialized request processing
       • 1 + 1 = 3?

  9. Specialize for Target Location (1/2)
     Intent: specialize a path based on the knowledge that objects are collocated
     • Model invariants
       • All communication among the GPS, Airframe, and Display components is collocated
       • All invocations are local
       • Remoting code (connection code) is not required
     • Transformations to TAO (footprint)
       • Eliminate connection-handling code: connection strategies, flushing strategies
       • Eliminate invocation classes: remote invocation classes, one-way and two-way invocation classes
     • Transformations to TAO (performance)
       • Eliminate remoting checks
         • Object proxy checks for remoting
         • Invocation adapter checks for remoting on each invocation
         • Checks for one-way vs. two-way invocation

  10. Specialize for Target Location (2/2)
      TAO implementation & automation
      • All implementations are in the branch “TAO_PE_Collocation”
      • The specialization uses conditional compilation: the TAO_HAS_COLLOCATION flag removes remoting
      • Profiled the optimistic case of absolutely no remoting (i.e., no code to handle requests and replies)
      Configuration
      • Red Hat kernel 2.4.21-27.0.1.ELsmp #1 SMP
      • Dual 2 GHz Athlon processors
      • 1 GB RAM and 256 KB cache per processor
      • Test run: TAO’s performance-tests/Latency/Collocation
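The conditional-compilation approach can be sketched as follows. The assumption is that a single compile-time flag decides whether the proxy carries any remoting logic at all. `SKETCH_COLLOCATED_ONLY` is a hypothetical macro standing in for the branch's TAO_HAS_COLLOCATION flag, and `ObjectProxy` and `Servant` are hypothetical types rather than TAO classes. When the flag is set, both the per-invocation location check and the remote path are compiled out.

```cpp
#include <cassert>
#include <stdexcept>

// SKETCH_COLLOCATED_ONLY is a hypothetical macro playing the role of the
// branch's TAO_HAS_COLLOCATION flag: set it to 1 only when the model
// guarantees that every invocation is local (override with -D).
#ifndef SKETCH_COLLOCATED_ONLY
#define SKETCH_COLLOCATED_ONLY 1
#endif

// Hypothetical servant with a single operation.
struct Servant {
  int get_data() const { return 42; }
};

class ObjectProxy {
 public:
  explicit ObjectProxy(Servant* local) : local_(local) {}

  int get_data() const {
#if SKETCH_COLLOCATED_ONLY
    // Specialized path: no remoting check and no remote invocation code.
    assert(local_ != nullptr);
    return local_->get_data();
#else
    // General path: every invocation checks whether the target is local.
    if (local_ != nullptr) return local_->get_data();
    return remote_get_data();
#endif
  }

 private:
#if !SKETCH_COLLOCATED_ONLY
  // Stand-in for connection handling and marshaling; compiled out above.
  int remote_get_data() const {
    throw std::runtime_error("remote path not shown in this sketch");
  }
#endif
  Servant* local_;
};
```

The footprint saving comes from the `#if !SKETCH_COLLOCATED_ONLY` block disappearing from the binary. The latency saving comes from removing the branch on every call.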

  11. Specialize CORBA Request Header (1/4)
      Intent: avoid the considerable overhead of creating new CORBA requests and replies for each call in a series of requests
      • Model invariants
        • The Timer component periodically sends the same event
        • Operations to retrieve data from the models are also the same
      • Update rather than create
        • Do not create a new request each time
        • Reuse the old request and its request header
      • Various levels of reuse are possible
        • Reuse only the request header
        • Reuse both the request header and the message-specific header
        • Reuse the entire request
      This approach is similar to TCP header prediction.

  12. Specialize Request Header (2/4)
      Request header caching
      • First-level specialization: cache only the request header part
        • Everything else in the request is variable
        • Avoids marshaling/demarshaling costs for the header part alone
        • Implemented on the client side
      TAO implementation
      • The first request creates the entire request (same code flow as the normal path)
      • Cache the (marshaled) header information
      • On subsequent messages, update only the total size and the request ID after request creation
      • Implemented via conditional compilation

  13. Specialize CORBA Request Header (3/4)
      TAO implementation
      • Move the buffer pointer to the start of the data segment
      • Write out the arguments for the call
      • Update the total size (SIZE) and REQUEST_ID fields in the request
      • Message-specific header caching
        • Cache both the request header and the message-specific header
        • The object key is the same
        • The service context information is the same
        • The operation name is the same, e.g., get_data
      Server side → only when thread-per-connection is used
      GIOP formats → only GIOP 1.2, because in GIOP 1.0 and 1.1 the service contexts are written first
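The update-rather-than-create steps above can be sketched as a cached byte prefix plus two field patches. The sketch assumes the GIOP 1.2 layout: a 12-byte GIOP message header whose message_size field sits at offset 8, followed by the GIOP 1.2 request header, which starts with request_id and so places it at offset 12. It also assumes the cached prefix already ends where the arguments begin, alignment padding included. `CachedRequestHeader` and `patch_ulong` are hypothetical names, not TAO API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Offsets assumed from the GIOP 1.2 layout: the 12-byte GIOP message header
// ends with message_size at offset 8, and the GIOP 1.2 request header starts
// with request_id, which therefore sits at offset 12.
constexpr std::size_t kGiopHeaderLength = 12;
constexpr std::size_t kMessageSizeOffset = 8;
constexpr std::size_t kRequestIdOffset = 12;

// Writes a 32-bit value in host byte order. This assumes the cached header
// was marshaled with the host's byte-order flag, as the first request would be.
inline void patch_ulong(std::vector<std::uint8_t>& buf, std::size_t offset,
                        std::uint32_t value) {
  assert(offset + sizeof value <= buf.size());
  std::memcpy(buf.data() + offset, &value, sizeof value);
}

class CachedRequestHeader {
 public:
  // `prefix` holds the marshaled GIOP header and request header (object key,
  // operation name, service contexts) captured from the first invocation.
  explicit CachedRequestHeader(std::vector<std::uint8_t> prefix)
      : prefix_(std::move(prefix)) {
    assert(prefix_.size() >= kRequestIdOffset + sizeof(std::uint32_t));
  }

  // Subsequent invocations: start from the cached bytes, write the
  // arguments, then update only SIZE and REQUEST_ID.
  std::vector<std::uint8_t> build(std::uint32_t request_id,
                                  const std::vector<std::uint8_t>& args) const {
    std::vector<std::uint8_t> msg(prefix_);
    msg.insert(msg.end(), args.begin(), args.end());
    patch_ulong(msg, kMessageSizeOffset,
                static_cast<std::uint32_t>(msg.size() - kGiopHeaderLength));
    patch_ulong(msg, kRequestIdOffset, request_id);
    return msg;
  }

 private:
  std::vector<std::uint8_t> prefix_;
};
```

The fixed offset is also why the slide restricts this to GIOP 1.2. In GIOP 1.0 and 1.1 the variable-length service contexts precede request_id, so the ID does not sit at a fixed position.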

  14. Specialize CORBA Request Header (4/4)
      Intent
      • Instead of caching only the headers (request + message-specific), pre-create the entire CORBA request
      • Model invariants
        • The Timer component sends “triggers” (heartbeats) to the recipient component; timeouts are a similar situation
        • The request and data contents are the same
      • Proposed TAO implementation
        • A special IDL flag that pre-creates (marshals) the request
        • The same request is sent each time
        • Update only the request ID of the request
        • Saves the cost of request construction and marshaling
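Because the proposal is that only the request ID changes between sends, the fully pre-created variant reduces to a single 4-byte write per heartbeat. The sketch below assumes, as for GIOP 1.2, that request_id sits at offset 12. It also assumes that the arguments never change, which is why the message size is never touched. `PrecreatedRequest` is a hypothetical class, not the generated code the proposed IDL flag would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical sketch: the whole "trigger" request (headers + arguments) is
// marshaled once; each send only rewrites the 4-byte request id, assumed to
// sit at offset 12 as in GIOP 1.2. Size and arguments are never re-marshaled.
class PrecreatedRequest {
 public:
  explicit PrecreatedRequest(std::vector<std::uint8_t> marshaled)
      : bytes_(std::move(marshaled)) {
    assert(bytes_.size() >= kRequestIdOffset + sizeof(std::uint32_t));
  }

  // Returns the ready-to-send buffer; no marshaling happens here.
  const std::vector<std::uint8_t>& next(std::uint32_t request_id) {
    std::memcpy(bytes_.data() + kRequestIdOffset, &request_id,
                sizeof request_id);
    return bytes_;
  }

 private:
  static constexpr std::size_t kRequestIdOffset = 12;
  std::vector<std::uint8_t> bytes_;
};
```

Compared with header caching, even the argument writing and the size update disappear. This is only valid while the invariant "request and data contents are the same" holds.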

  15. Specialized Request Processing (1/2)
      Intent: resolve the mapping of incoming requests to the POA, servant, skeleton, and operation to which they are dispatched only once, then use these precomputed results to optimize the dispatch of subsequent requests
      Model invariants
      • The get_data operation invokes an operation on the same component, located in the same POA and serviced by the same servant and operation
      Once-per-connection resolution of dispatch
      • TAO already provides active demultiplexing + perfect hashing, giving an O(1) lookup time bound
      • Caching just the POA may therefore not give much performance improvement

  16. Specialized Request Processing (2/2)
      TAO implementation
      • Because the operation names are the same, the skeleton is cached directly and the current buffer pointer is advanced to the beginning of the arguments
      • The length is calculated only for the first request and then reused, so its cost is amortized over the number of operations
      • Implemented via the TAO_CACHE_SERVANT_REF conditional compilation macro
      • $TAO_ROOT/performance-tests/Latency/Single-Threaded
      This is similar to the “Direct Collocation” optimization for a collocated request.
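Once-per-connection resolution can be sketched as a dispatcher that performs the full operation lookup on the first request only. It then caches both the skeleton and the offset at which the arguments start. The request encoding (`"<operation>:<arguments>"`), the `Skeleton` signature, and the `ConnectionDispatcher` class are all hypothetical simplifications. The dispatcher does not re-verify the operation name on later requests, because it relies on the model invariant that it never changes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical skeleton signature: demarshal the arguments and upcall the
// servant. The toy skeleton below just returns the argument length.
using Skeleton = int (*)(const std::string& args);

inline int get_data_skel(const std::string& args) {
  return static_cast<int>(args.size());
}

class ConnectionDispatcher {
 public:
  explicit ConnectionDispatcher(std::map<std::string, Skeleton> table)
      : table_(std::move(table)) {}

  // Toy request encoding, assumed for this sketch: "<operation>:<arguments>".
  int dispatch(const std::string& request) {
    if (cached_ == nullptr) {
      // First request on this connection: full resolution of the operation.
      std::size_t colon = request.find(':');
      assert(colon != std::string::npos);
      cached_ = table_.at(request.substr(0, colon));
      args_offset_ = colon + 1;  // computed once, reused afterwards
      ++full_lookups_;
    }
    // Later requests skip the lookup and jump straight to the arguments.
    // This relies on the model invariant that the operation never changes.
    return cached_(request.substr(args_offset_));
  }

  int full_lookups() const { return full_lookups_; }

 private:
  std::map<std::string, Skeleton> table_;
  Skeleton cached_ = nullptr;
  std::size_t args_offset_ = 0;
  int full_lookups_ = 0;
};
```

After the first call, each dispatch costs one indirect call plus the argument handling. The `full_lookups()` counter is there only to make the amortization visible.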

  17. Specialize Marshaling/Demarshaling
      Intent
      • To mask endianness differences, the GIOP request header contains a flag that indicates the byte order of the request
      • If the endianness differs, the receiver performs byte swapping
      Model invariants
      • The two machines hosting the components have the same endianness (byte order), so no byte-order checks are required
      ACE implementation
      • ACE_CDR streams provide the ACE_SWAP_ON_WRITE and ACE_DISABLE_SWAP_ON_READ macros, which can be used to eliminate checks for byte ordering
      • These macros are not set by default; model interpreters could generate a configuration setting that enables them
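The cost being removed is a branch on the sender's byte-order flag in every primitive read. The sketch below contrasts a general read with a specialized one that assumes the "same endianness" invariant. It shows the idea only and is not ACE code. `swap32`, `read_ulong`, and `read_ulong_same_order` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverses the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// General CDR-style read: consult the sender's byte-order flag on every read.
inline std::uint32_t read_ulong(const unsigned char* p, bool sender_differs) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return sender_differs ? swap32(v) : v;
}

// Specialized read for the "same endianness" invariant: the flag and the
// branch disappear entirely. This mirrors the intent of building ACE_CDR
// with swapping disabled, but it is not ACE code.
inline std::uint32_t read_ulong_same_order(const unsigned char* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

If the invariant is violated, the specialized reader silently returns byte-swapped values. That is why the slide ties enabling such a setting to a model interpreter that knows the deployment.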

  18. Concluding Remarks & Future Work
      • Specialization can be used as a technique for “folding layers” based on system invariants
      • The current “first cut” implementation uses conditional compilation; future work will examine more appropriate strategies for implementing these specializations:
        • Request header caching → strategies controlled by svc.conf
        • Specialized request processing → POA request processing policy
        • Marshaling/demarshaling → ACE level
        • Pre-created request → IDL-generated code
        • Collocation specialization → macros + strategies (invocation classes)
      • Examine specialization at the component middleware and infrastructure middleware levels
