Real-Time ORB Middleware: Standards, Applications, and Variations. Christopher Gill cdgill@cse.wustl.edu Center for Distributed Object Computing Department of Computer Science and Engineering Washington University, St. Louis, MO.
Real-Time ORB Middleware: Standards, Applications, and Variations Christopher Gill cdgill@cse.wustl.edu Center for Distributed Object Computing Department of Computer Science and Engineering Washington University, St. Louis, MO Research conducted in collaboration with colleagues at Washington University, Vanderbilt University, University of Kansas, University of Rhode Island, Ohio University, OOMWorks, Boeing, BBN, Honeywell, and Tech-X Research supported in part by DARPA contracts F33615-01-C-1898 (NEST); and F33615-00-C-3048 and F33615-03-C-4111 (PCES)
Main Themes • Standards enforce commonality • Specify interfaces, etc., on which applications can rely • Applications are heterogeneous • Which standards are relevant may vary from app to app • Apps may rely on different subsets of standard features • What if commonality & heterogeneity don’t match? • E.g., app needs a feature the standard doesn’t address • E.g., a needed feature may conflict with specified ones • Developing and using standards-based middleware effectively demands attention to these issues (especially if time, space, reliability are involved)
Motivating Example: Avionics Mission Computing Collaborative research with Boeing, BBN, Honeywell Technology Center, supported by Boeing/AFRL contract F33615-97-D-1155/0005 (WSOA) • In-flight collaboration between aircraft personnel • Exchange imagery and annotations over a wireless network • Trade-offs between image quality and transfer latency • Managed adaptively during download, to ensure timeliness • Why use CORBA, and for what parts of the system? • For DOC between Ada/ORBExpress server and C++/TAO client • For prioritization of OFP and image handling operations on client • For adaptive rate-based scheduling on client
Outline: Three Illustrative Technology Studies • Real-Time CORBA 1.0 • Location/language transparency, low latency, priorities • Trade-offs in time, footprint, and features • Lightweight CCM • Component assembly, deployment, (re-)configuration • Trade-offs in the timeliness of configuration itself • Real-Time CORBA 1.2 • Pluggable dynamic scheduling, distributable threads • Trade-offs in flexibility, overhead, and mechanisms
Technology Study I: Real-Time CORBA 1.0 • Location/language transparency • Low latency • Static Priorities • Trade-offs • time, footprint, and features
CORBA Location/Language Transparency (Figure labels: object reference, Client, Servant, Stub, Skeleton, IIOP message, ORB) • IDL provides type safety between client and server • A client obtains an interoperable object reference (IOR) • Encodes IP address, port, object ID, etc. • A wire format is defined for invocation messages • Client stubs marshal, server skeletons unmarshal messages • Other details are left as ORB implementation features • How to combine threads, sockets, event demultiplexers, etc. • ORB developers can (and should) exploit this design freedom
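The stub/skeleton division of labor on this slide can be illustrated with a minimal sketch. It is not GIOP or CDR and uses no ORB API: the names `Request`, `marshal`, and `unmarshal` are hypothetical, and the wire layout (a big-endian 32-bit length, the operation name, then one 32-bit argument) is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical request: an operation name plus a single 32-bit argument.
struct Request {
    std::string operation;
    std::uint32_t argument;
};

// Append a 32-bit value in big-endian order, independent of host byte order.
inline void put_u32(std::vector<std::uint8_t>& buf, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Read a big-endian 32-bit value starting at `pos`, advancing `pos` by 4.
inline std::uint32_t get_u32(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    if (buf.size() < 4 || pos > buf.size() - 4)
        throw std::runtime_error("truncated message");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v = (v << 8) | buf[pos++];
    return v;
}

// Client-side "stub" role: turn a typed request into bytes on the wire.
inline std::vector<std::uint8_t> marshal(const Request& r) {
    std::vector<std::uint8_t> buf;
    put_u32(buf, static_cast<std::uint32_t>(r.operation.size()));
    buf.insert(buf.end(), r.operation.begin(), r.operation.end());
    put_u32(buf, r.argument);
    return buf;
}

// Server-side "skeleton" role: recover the typed request from the bytes.
inline Request unmarshal(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    const std::uint32_t len = get_u32(buf, pos);
    if (len > buf.size() - pos) throw std::runtime_error("truncated message");
    Request r;
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    r.operation.assign(first, first + static_cast<std::ptrdiff_t>(len));
    pos += len;
    r.argument = get_u32(buf, pos);
    return r;
}
```

Because both sides agree on the byte layout rather than on each other's memory representation, client and server can be written in different languages and run on different hosts, which is the transparency the slide describes.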
Exploiting Design Freedom for Low Latency E.g., ACE Framework • Re-use portable, type-safe, efficient mechanisms • Concurrency, communication, event demultiplexing, etc. • Available for many POSIX-like OS platforms • Also RTOS: VxWorks, LynxOS, KURT-Linux/LibeRTOS • Compose to avoid blocking, queueing, locking, etc.
Real-Time CORBA 1.0: Static Priorities

    // Define two lanes
    RTCORBA::ThreadpoolLane high_priority =
      {10 /*Prio*/, 3 /*Static Threads*/, 0 /*Dyn Threads*/};
    RTCORBA::ThreadpoolLane low_priority =
      {5 /*Prio*/, 2 /*Static Threads*/, 2 /*Dyn Threads*/};
    RTCORBA::ThreadpoolLanes lanes(2);
    lanes.length (2);
    lanes[0] = high_priority;
    lanes[1] = low_priority;
    RTCORBA::ThreadpoolId pool_id =
      rt_orb->create_threadpool_with_lanes (1024 * 10,    // Stacksize
                                            lanes,        // Thread pool lanes
                                            false,        // No thread borrowing
                                            false, 0, 0); // No request buffering

Thread Pool with Lanes • Lanes enforce priority separation between threads • Set minimum (static) and additional (dynamic) number of threads • Set stack size, use of thread borrowing, request buffering (Figure: thread pool with two lanes, at priorities 5 and 10)
When Trade-Offs Impinge on a Standard (Figure: structure with embedded or bonded piezoelectric transducers; acoustic waves in the kHz range) • Active damage detection on structures (e.g., aircraft tail) • Ping nodes create vibrations that are measured by sensors • Computational nodes do analysis, schedule other nodes • DOC middleware can help ease programming complexity • Crucial trade-offs in time vs. footprint vs. features • Can (and should) ORB developers stay within the standard?
Design Challenges • General purpose middleware aims at supporting a wide variety of applications • Tends to support a breadth of alternative features • Extra features may impact some applications • E.g., Foot-print in memory-constrained networked embedded systems demanding real-time assurances • Need to study and select middleware features based on application requirements • Fundamental tension between • Generality/standardization • Application specific customization
Critical Path Analysis and Trade-offs in nORB (Figure: critical path of a remote call, Foo() and Bar(), from the caller to the call to the implementation. Each side shows stub or skeleton code (using ACE_CDR) that marshals or unmarshals parameters, plus a Reactor, Connection Cache, and Acceptor inside the ORB; the receiving side adds operation lookup and dispatch and a Simple Object Adapter.) • 1) Parameter marshalling/unmarshalling: could be avoided for homogeneous nodes • 2) ORB: only a subset of GIOP messages • 3) Simple Object Adapter: simple life cycle management • 4) Operation lookup and dispatch: hash table vs. linear search
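The "hash-table vs linear search" trade-off in operation lookup can be sketched as two interchangeable dispatch tables. This is an illustration only, not nORB's code: `LinearDispatcher`, `HashDispatcher`, and the `int(int)` operation signature are hypothetical names and simplifications.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical servant operation: takes and returns an int.
using Operation = std::function<int(int)>;

// Linear search: small code and data footprint, O(n) lookup per request.
class LinearDispatcher {
public:
    void add(std::string name, Operation op) {
        table_.emplace_back(std::move(name), std::move(op));
    }
    const Operation* find(const std::string& name) const {
        for (const auto& entry : table_)
            if (entry.first == name) return &entry.second;
        return nullptr;  // unknown operation
    }
private:
    std::vector<std::pair<std::string, Operation>> table_;
};

// Hash table: expected O(1) lookup, at the cost of extra memory for buckets.
class HashDispatcher {
public:
    void add(std::string name, Operation op) {
        table_[std::move(name)] = std::move(op);
    }
    const Operation* find(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, Operation> table_;
};
```

For an interface with only a handful of operations, the linear table may be both smaller and fast enough; the hash table tends to pay off as the number of operations per object grows.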
Footprint Comparison: ACE, nORB, TAO ACE costs 212KB; nORB+ACE costs 345KB; TAO+ACE costs ~1.7MB Node application code alone costs 164KB
Technology Study I: Summary • The CORBA standard promotes DOC programming • Portable, interoperable, language/location transparent • Gives ORB developers freedom to optimize/strategize • The RT-CORBA 1.0 standard adds real-time QoS • E.g., thread pools, prioritized lanes, etc. • Here too, design freedom is crucial, e.g., for low latency • However, some application contexts raise issues • E.g., with stringent memory and RT constraints, how crucial is strict standards compliance to developers? • Minimum CORBA, other specifications acknowledge this • Further attention to “degrees of compliance” may help
Technology Study II: Lightweight CCM • Component assembly, deployment, re-configuration • Some applications require optimization and trade-offs in the timeliness of configuration itself • Rethink deployment/configuration lifecycle • Must fit within stringent system initialization bounds
A Review of RT-DOC Middleware Evolution • Distributed Object Computing (DOC) Middleware • E.g., CORBA, Java RMI • Simplifies client-side programming via location (language) independence • Real-Time DOC Middleware • E.g., Real-Time CORBA 1.0, 1.2 • Enforces real-time properties between client and server • Component Middleware • E.g., CORBA Component Model (CCM), EJB/J2EE • Simplifies server programming through declarative configuration • Real-Time Component Middleware • E.g., the Component-Integrated ACE ORB (CIAO), QoS EJB • Enforces configured real-time properties within server itself • Are the configuration activities themselves real-time?
Motivating Example Application • Simple component application from avionics domain (Boeing) • Represents many other distributed real-time applications • Application composed flexibly via component middleware • Real-time (and other) aspects can be configured this way • E.g., RT-CORBA policies, thread pools, replicas for fault tolerance, etc. • Real-time bounds on configuration itself may matter as well • E.g., minimum initialization time when system is (re-)started • Constrains timing of component assembly and deployment stages
Static vs. Dynamic Configuration • Dynamic Configuration • Component assembly & deployment uses DLLs, XML parsing • Problems • Parsing/loading time • No support for .so/.dll libs on some platforms (e.g., VxWorks) • Static Configuration • Move as much off-line as possible • Focus on preserving only run-time flexibility that is needed • Use static linking to “load” implementations • Use run-time drivers to configure implementations at initialization
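The static approach, using static linking to "load" implementations and a run-time driver to instantiate them, can be sketched as a factory table fixed at link time. This is a simplified illustration, not CIAO's mechanism: the `Component` interface, the `GpsComponent` and `DisplayComponent` types, and `create` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical component interface.
struct Component {
    virtual ~Component() = default;
    virtual std::string name() const = 0;
};

struct GpsComponent : Component {
    std::string name() const override { return "GPS"; }
};
struct DisplayComponent : Component {
    std::string name() const override { return "Display"; }
};

using Factory = std::unique_ptr<Component> (*)();

inline std::unique_ptr<Component> make_gps() {
    return std::unique_ptr<Component>(new GpsComponent);
}
inline std::unique_ptr<Component> make_display() {
    return std::unique_ptr<Component>(new DisplayComponent);
}

// Table resolved at link time: no DLL loading and no XML parsing at start-up.
inline const std::map<std::string, Factory>& static_factories() {
    static const std::map<std::string, Factory> table = {
        {"GPS", &make_gps},
        {"Display", &make_display},
    };
    return table;
}

// Run-time driver step: instantiate a statically linked implementation by name.
// Returns a null pointer when the type was not linked into the executable.
inline std::unique_ptr<Component> create(const std::string& type) {
    const auto it = static_factories().find(type);
    if (it == static_factories().end()) return nullptr;
    return it->second();
}
```

The flexibility that remains is the choice of which linked-in types to instantiate and how to configure them at initialization; adding a new implementation requires relinking, which is the cost of avoiding dynamic loading.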
Static vs. Dynamic Configuration Experiments • Compared performance of static and dynamic configuration • Used example avionics domain application • Goal: identify sources of performance difference • Tests were run on a single machine • Pentium IV 2.5GHz CPU, 500MB RAM, 512KB Cache • OS was Linux 2.4.18 with KURT-Linux patches applied • Supports DLLs for dynamic configuration approach • Offers good real-time predictability for experiments • Experiments used CIAO 0.4.1 / TAO 1.4.1 / ACE 5.4.1
Time for Application Assembly • Without RT features • msec (static) vs. 100s of msec (dynamic) • 2 orders of magnitude difference • With RT features • Constant additional overhead • Greater relative cost at low orders of magnitude • Differences attributed to • Loading DLLs, spawning processes (most expensive) • On-line XML parsing (secondary)
Component Server Creation Time • Server configuration is 2nd largest contributor to performance differences • 100s vs. 10s of msec • Static gives a baseline • Most of the time was spent hooking RT-CORBA features into the server • 2 orders of magnitude less for non-RT version (Figure annotation: configuring RT-CORBA features)
Home Creation Time • Homes manage component instances • Configuring homes is less expensive than • application assembly • component server creation • Loaded vs. linked homes account for the difference • Real-time features • Didn’t increase the total time significantly
CIAO vs. PRISM Configuration • CIAO’s static configuration is similar to Boeing’s PRISM domain-specific component middleware • But configuration steps and flexibility/cost differ significantly • CCM (Extension Interface pattern) vs. C++ (Façade pattern) model
CIAO vs. PRISM Configuration Experiments • Platform details • Motorola 5110-2263 VME board • MPC7410 500MHz processor w/ 512 MB RAM • VxWorks 5.4.2 • Post x.4 (pre-release) version of CIAO w/ static configuration • High resolution time measurement used two tick counters • 5msec resolution: VxWorks tickGet() • 40ns resolution: VxWorks sysTimestamp()
PRISM/CIAO Home Creation Time (Figure annotation: home activation, etc.) • PRISM homes • C++ objects • Memory allocation • Object initialization • CIAO homes • C++ object costs … • … plus CORBA initialization costs
PRISM/CIAO Component Creation Time (Figure annotation: component activation, etc.) • Again see C++ vs. CORBA differences • Most expensive step in static CIAO configuration • Still well bounded (by 4 msec) for all but the finest time scales
PRISM/CIAO Connection Establishment Time (Figure annotation: CORBA connection setup cost) • Least expensive configuration step • Again reflects C++ vs. CORBA differences • Trade-off is between performance and flexibility
Technology Study II: Summary • Static approach gives real-time configuration • Avoids costs/features that hamper real-time behavior • Main costs are DLL loading, spawning processes • Concentrated in application assembly, server creation • Intermediate design point: limited on-line XML parsing? • PRISM & CIAO differ somewhat in flexibility, cost • C++ based components vs. CORBA components • Intermediate design point: mixture of object types? • Static configuration capabilities described here are available as open-source within DAnCE • Implement Deployment & Configuration specification • http://deuce.doc.wustl.edu/Download.html
Technology Study III: Real-Time CORBA 1.2 • Distributable threads • Pluggable dynamic scheduling • Trade-offs in flexibility, overhead, mechanisms
Motivation • More evolution of middleware programming model • Distributable threads are natural for certain applications • I.e., those with long-running distributed sequential activities • May also help with distributed scheduling, load balancing, etc. • Integrated with pluggable/dynamic scheduling semantics • Design and implementation goals • Flexible on-the-fly adaptation of real-time properties • Preserve info on paths a distributable thread traverses • Provide efficient, rigorous enforcement mechanisms
RT-CORBA 1.2 Implementation in TAO • Implementation of Distributable Threads • Thread identity and cancellation design considerations • Give the application better control of concurrency overall • OS vs. distributable thread identity issues and approach • Cancellation interface and its implementation • Dynamic scheduling service framework • Flexible interface between scheduler and application • OS and middleware based prio scheduler implementations • Benchmarks • Quantify cost of managing distributable, OS thread ids • Compare OS, middleware scheduling techniques
RT-CORBA 1.2 Concepts • Distributable thread – distributed concurrency abstraction • Scheduling segment – governed by a single scheduling policy • Locus of execution – where distributable thread is currently running • Dynamic schedulers – enforce distributable thread eligibility
Intro to RTC2 Distributable Threads • With only 2-way CORBA invocations, distributable threads behave much like traditional OS threads • But can move (with their context) from one endsystem to another • Cross through different resource scheduling domains • Distributable threads contend with OS threads, each other • With locking, effect can span endsystems, though scheduling is local
Creating Distributable Threads • Distributable threads can be created in three different ways • An application thread calling begin_scheduling_segment (BSS) outside a distributable thread • A distributable thread calling the spawn() method • A distributable thread making an asynchronous (one-way) invocation • New distributable thread inherits scheduling parameters
Distributable Thread Path Example • Scheduler upcalls at several points on path • At creation of a new distributable thread • At BSS, USS, ESS (begin, update, end scheduling segment) calls • When GIOP request is sent • On receipt of GIOP request • When GIOP reply is sent • On receipt of GIOP reply • In each upcall, scheduling information is updated • Additional interception points can (and sometimes should) be supported by the ORB and the scheduler/policy
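The sequence of scheduler upcalls along a simple two-way invocation can be traced with a small sketch. It is a simplified model rather than the RT-CORBA 1.2 IDL: `UpcallPoint`, `SchedulingInfo`, `TracingScheduler`, and the `importance` parameter are hypothetical, and the only "update" performed at each upcall is recording the point that was reached.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Interception points named on the slide (simplified, hypothetical enum).
enum class UpcallPoint {
    BeginSegment, UpdateSegment, EndSegment,
    SendRequest, ReceiveRequest, SendReply, ReceiveReply
};

// Information that travels with the distributable thread.
struct SchedulingInfo {
    std::string guid;               // distributable thread identity
    int importance;                 // hypothetical scheduling parameter
    std::vector<UpcallPoint> path;  // points visited so far
};

// A scheduler that only records where it was called. A real scheduler would
// also re-evaluate the thread's eligibility at each point.
class TracingScheduler {
public:
    void upcall(UpcallPoint point, SchedulingInfo& info) {
        info.path.push_back(point);
    }
};

// One scheduling segment containing a single two-way invocation.
inline SchedulingInfo two_way_invocation(TracingScheduler& s,
                                         const std::string& guid,
                                         int importance) {
    SchedulingInfo info{guid, importance, {}};
    s.upcall(UpcallPoint::BeginSegment, info);    // client: BSS
    s.upcall(UpcallPoint::SendRequest, info);     // client: request leaves
    s.upcall(UpcallPoint::ReceiveRequest, info);  // server: request arrives
    s.upcall(UpcallPoint::SendReply, info);       // server: reply leaves
    s.upcall(UpcallPoint::ReceiveReply, info);    // client: reply arrives
    s.upcall(UpcallPoint::EndSegment, info);      // client: ESS
    return info;
}
```

Preserving the visited points with the thread is one way to meet the stated goal of keeping information on the path a distributable thread traverses.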
Middleware Based Scheduling • Benefit: scales in # of distributable threads per OS thread • Drawback: queue management costs for some policies • Alternatives: 1:1 OS:distributable thread, lanes, groups
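The queue management mentioned above can be sketched as a ready queue that multiplexes many distributable threads onto one dispatching OS thread. The earliest-deadline-first ordering is an assumption made for the example, not a policy stated on the slide, and `ReadyEntry` and `MiddlewareScheduler` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// A distributable thread waiting to run (hypothetical representation).
struct ReadyEntry {
    std::string guid;        // distributable thread identity
    std::uint64_t deadline;  // absolute deadline, in arbitrary ticks
};

// Comparator that puts the earliest deadline at the top of the queue.
struct LaterDeadline {
    bool operator()(const ReadyEntry& a, const ReadyEntry& b) const {
        return a.deadline > b.deadline;
    }
};

class MiddlewareScheduler {
public:
    // O(log n) insertion: this is the per-policy queue management cost.
    void make_ready(ReadyEntry e) { ready_.push(std::move(e)); }

    bool empty() const { return ready_.empty(); }

    // Precondition: !empty(). Removes and returns the most eligible thread.
    ReadyEntry dispatch_next() {
        ReadyEntry e = ready_.top();
        ready_.pop();
        return e;
    }
private:
    std::priority_queue<ReadyEntry, std::vector<ReadyEntry>, LaterDeadline> ready_;
};
```

Because eligibility is decided in the middleware queue, the number of distributable threads is not tied to the number of OS threads or OS priority levels, which is the scaling benefit the slide notes.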
Middleware/OS Scheduling Benchmark (Figure panels: OS-level scheduling; middleware-level scheduling, with Δ latency marked) • Simple comparison of OS and middleware scheduling • Both approaches show reasonable control at a resolution of seconds • Notice some latency in the last transition in the middleware approach • This OS/middleware difference is characteristic of other dynamic scheduling approaches (e.g., Group Scheduling)
Thread Identity and Cancellation Issues • Other mechanisms affect real-time performance as well • Managing identities of distributable and OS threads • Configuring and using mechanisms sensitive to thread identity • Supporting safe and efficient cancellation of thread execution (Figure: a distributable thread (DT) spanning Host 1 and Host 2, each with an RTCORBA 2.0 Scheduler and a <GUID, TID> pair. Annotations: DT carries scheduling parameters with it; binding of a single DT to two different OS threads; can cancel from either endsystem.)
Thread Specific Storage (TSS) Example • A distributable thread can use thread-specific storage • Avoids locking of global data • OS provided TSS is efficient, uses OS thread id • However, distributable thread may span OS threads • Solution: TSS emulation based on <GUID,tid> pair • What is TSS emulation cost compared to OS TSS?
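One way to picture TSS emulation based on a <GUID, tid> pair is a binding table from OS thread ids to GUIDs, with the stored values keyed by GUID. The sketch below makes several assumptions: `EmulatedTss` and its methods are hypothetical, OS thread ids are passed in explicitly as integers, and synchronization is omitted, so it is single-threaded illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

using OsThreadId = unsigned long;  // stand-in for a real OS thread id

class EmulatedTss {
public:
    // Record that OS thread `tid` currently hosts distributable thread `guid`.
    void bind(OsThreadId tid, const std::string& guid) { binding_[tid] = guid; }

    // Store `value` under `key` for the distributable thread bound to `tid`.
    // Returns false if `tid` is not bound to any distributable thread.
    bool write(OsThreadId tid, int key, int value) {
        const auto b = binding_.find(tid);
        if (b == binding_.end()) return false;
        slots_[b->second][key] = value;
        return true;
    }

    // Read the value stored under `key` for the distributable thread bound to
    // `tid`. Returns false if `tid` is unbound or the key was never written.
    bool read(OsThreadId tid, int key, int& value) const {
        const auto b = binding_.find(tid);
        if (b == binding_.end()) return false;
        const auto s = slots_.find(b->second);
        if (s == slots_.end()) return false;
        const auto v = s->second.find(key);
        if (v == s->second.end()) return false;
        value = v->second;
        return true;
    }
private:
    std::map<OsThreadId, std::string> binding_;           // tid -> GUID
    std::map<std::string, std::map<int, int>> slots_;     // GUID -> key -> value
};
```

The extra lookup from tid to GUID on every access suggests why emulated TSS costs more than OS-provided TSS, which can index storage directly by the OS thread id.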
TSS Emulation Benchmarks • Pentium tick timestamps • nsec resolution on 2.8 GHz P4, 512KB cache, 512MB memory • RedHat 7.3, real-time class • Called create repeatedly • Then, called write/read repeatedly on one key • Upper graph shows scalability of key creation • Scales linearly with number of keys in OS, ACE TSS • Emulation costs ~2 usec more per key creation • Lower graph shows that an emulated write costs ~1.5 usec more and an emulated read ~0.5 usec more
Distributable Thread Cancellation • Context: distributable thread can be cancelled to save cost • Problem: only safe to cancel • on an endsystem that is in the thread’s run-time “call stack” • when thread is at a safe preemption point • Solution: cancellation is • invoked via cancel method on distributable thread instance • handled at next scheduling point (scheduler upcall)
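The deferred cancellation described above, requested through a cancel method but acted on only at the next scheduling point, can be sketched with a flag that the scheduler upcall checks. This is an illustration under assumptions, not TAO's implementation: `DistributableThread`, `scheduling_point`, and `ThreadCancelled` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Raised at a scheduling point once cancellation has been requested.
struct ThreadCancelled : std::runtime_error {
    ThreadCancelled() : std::runtime_error("distributable thread cancelled") {}
};

class DistributableThread {
public:
    // May be called from another thread: only records the request.
    void cancel() { cancel_requested_.store(true); }

    bool cancel_requested() const { return cancel_requested_.load(); }

    // Called from a scheduler upcall, i.e., at a safe preemption point.
    // Unwinds the thread's work here instead of interrupting it at an
    // arbitrary instruction.
    void scheduling_point() {
        if (cancel_requested_.load()) throw ThreadCancelled();
    }
private:
    std::atomic<bool> cancel_requested_{false};
};
```

Between the call to `cancel()` and the next scheduling point the thread keeps running, so cancellation latency in this sketch is bounded by the longest interval between scheduler upcalls.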
Technology Study III: Summary • RT-CORBA 1.2 can give predictable real-time performance • Allows dynamic scheduling of distributable threads • A range of thread management mechanisms matter • must also be designed for real-time performance • RT-CORBA 1.2 implementation in TAO • open-source software, freely available on the web • http://deuce.doc.wustl.edu/Download.html
Concluding Remarks • CORBA Developers balance ongoing trade-offs • Between what standards specify … • And what their applications need • Often many application needs are addressed well • Inter-operability, location/language transparency • Component configuration support • Prioritization, other QoS aspects as well • However, the standards don’t cover everything • Developers must exercise judgment WRT standards • When to adhere, when to augment, when to diverge from them • Sometimes, divergences are the basis for upgrading standards • The key point is that it’s an evolutionary process • Applications try to converge toward standards • Standards try to converge toward applications
For More Information • Avionics application case study • www.cse.wustl.edu/~cdgill/PDF/RTSJ_WSOA.pdf • Small footprint real-time middleware • www.cse.wustl.edu/~cdgill/PDF/rtas04_nORB.pdf • RT-CORBA 1.2 • www.cse.wustl.edu/~cdgill/PDF/JBCS_RTC1.2.pdf • www.cse.wustl.edu/~cdgill/PDF/rtas05_DTEC.pdf • Dynamic scheduling • www.cse.wustl.edu/~cdgill/PDF/dynamic.pdf • www.cse.wustl.edu/~cdgill/PDF/embedded_sched.pdf • www.cse.wustl.edu/~cdgill/PDF/rtas05_groupsched.pdf • www.cse.wustl.edu/~cdgill/PDF/rtas05_DSRM.pdf • Real-Time component configuration • www.cse.wustl.edu/~cdgill/PDF/doa04_ciao.pdf • www.cse.wustl.edu/~cdgill/PDF/rtss04_ciao.pdf