CHAIMS: Mega-Programming Research
Compiling High-level Access Interfaces for Multi-site Software
Stanford University
Objective: Investigate revolutionary approaches to large-scale software composition.
Approach: Develop & validate a composition-only language.
Contributions and plans:
• Hardware and software platform independence.
• Asynchrony by splitting up the CALL statement.
• Performance optimization by invocation scheduling.
• Potential for multi-site dataflow optimization.
www-db.stanford.edu/CHAIMS

Presentation
• Motivation and Objectives
  • changes in software production
  • basis for new visions and education
• Concepts of CHAIMS
• CHAIMS language
• CHAIMS architecture and composition process
• Scheduling
• Dataflow optimization
• Status, Plans, Conclusions

Shift in Programming Tasks
[Figure: Integration versus Coding over time; time axis 1970, 1990, 2010]

Hypotheses
• After the Y2K effort, no large software applications will be written from the ground up. They will always be composed using existing legacy code.
• Composition requires functionalities not available in current mainstream programming languages.
• Large-scale systems enable and require different optimizations.
• Composition programmers will use different tools from base programmers (type A versus type B [Belady]).

Languages & Interfaces
• Large languages intended to support coding and composition have not been successful
  • Algol 68
  • PL/1
  • Ada
  • CLOS
• Databases are being successfully composed, using client-server and mediator architectures
  • distribution: exploit network capabilities
  • heterogeneity: autonomy creates heterogeneity
  • simple schemas: some human interpretation
  • service model: public and commercial sources
In use: C, C++, Fortran, Java

Typical Scenario: Logistics
A general has to ship troops and/or equipment from San Diego NOSC to Washington DC:
• at different times, ship different kinds of materiel
• criteria for suitable means of transport differ
• not every airport is equally suited
• congestion, prices
• actual weather
• certain due or ready dates
Today: call different companies, look up information on the web, make reservations one by one
Tomorrow: a system proposes shipping methods that take many conditions into account
• hand-coded systems
• composition of processes

CHAIMS
• Megaprogram for composition, written by a domain programmer
• CHAIMS system automates generation of the client for the distributed system
• Megamodules, provided by various megamodule providers

Megamodules - Definition
Megamodules are large, autonomous, distributed, heterogeneous services or processes.
• large: computation intensive, data intensive, ongoing processes (monitoring of the real world, simulation services)
• distributed: remote, available to more than one client
• heterogeneous: a variety of languages and systems, accessible by various distribution protocols
• autonomous: maintenance and control over resources remain with the provider; differing ontologies (==> SKC)
Examples:
• logistics: "find best transportation from A to B", reservation systems
• genomics: compose various analysis tools (now manual control)

Architecture for today: Fat Clients
[Figure labels: Domain expert; Client computer (I/O, Control & Computation); Services a, b, c, d, e; Data Resources; Wrappers to resolve differences]

Service Architecture: Thin Clients
[Figure labels: Domain expert; Client workstation; IO modules; Computation Services a, b, c, d, e; MEGA modules; Sites; Data Resources; further single-letter node labels C, R, S, T, U]

Issues in Heavy-weight Services
Services are not free for a client:
• execution time of a service
• transfer time for data
• fees for services?
What the client applications need:
==> monitoring the progress of a service
==> a choice among equivalent services based on estimated waiting time and fees
==> high performance through parallelism among distributed remote services
==> preliminary overview results, information to select level of accuracy / result size
==> effective optimization techniques

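The "choice among equivalent services" requirement can be read as a small cost model. The following is a minimal sketch in Python, not CHAIMS code: the names `ServiceEstimate` and `pick_service`, the numbers, and the linear weighting of fee against waiting time are all assumptions made for illustration. The slide only says that both waiting time and fees matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEstimate:
    name: str            # hypothetical service name
    wait_seconds: float  # estimated waiting time
    fee: float           # estimated fee, in arbitrary currency units


def pick_service(estimates, fee_weight=1.0):
    """Return the estimate with the lowest combined score.

    The score wait_seconds + fee_weight * fee is an assumed cost model.
    """
    if not estimates:
        raise ValueError("no candidate services")
    return min(estimates, key=lambda e: e.wait_seconds + fee_weight * e.fee)


candidates = [
    ServiceEstimate("BestRoute", wait_seconds=40.0, fee=5.0),
    ServiceEstimate("Optimum", wait_seconds=25.0, fee=30.0),
]
fast = pick_service(candidates, fee_weight=0.0)   # ignore fees entirely
cheap = pick_service(candidates, fee_weight=1.0)  # one fee unit counts as one second
```

With fees ignored the faster service wins; once fees are weighted in, the slower but cheaper one does. A real client would get both figures from the services themselves rather than from constants.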
Challenge in the new world: Empower Non-technical Domain Experts
Company providing services:
• domain experts in the domain of the service (e.g. weather)
• technical experts for programming distribution protocols and setting up servers in a middleware system
• marketing experts
"Megaprogrammer":
• is a domain expert in the domain that uses these services
• is not a technical expert in the middleware system, nor an experienced programmer
• wants to focus on the problem at hand (= results of using the megaprogram)
• e.g. scientist, logistics officer

A purely compositional language?
Which languages did succeed?
• Algol, Ada: integrated composition and computation
• C, C++: focus on computation
Why a new language?
• complexity: not all facilities of a common language (compare to the approach of Java)
• inhibiting traditional computational programming (compare C++ and Smalltalk concerning object-oriented programming)
• focus on the issue of composition, parallelism by natural asynchrony, and novel optimizations

CHAIMS "Logical" Architecture
• Customer
• Megaprogram clients (in CHAIMS)
• Network/Transport (DCE, CORBA, ...)
• Megamodules (wrapped or native)

CHAIMS Physical Architecture
• Megaprogram clients in CHAIMS
• Network: CORBA, Java RMI, DCE, DCOM, ...
• Megamodules (wrapped, native), each supporting setup, estimate, invoke, examine, extract, and terminate

CALL statements - growth & split
CALL gained functionality (progress in scale of computing):
• Copying
• Code sharing
• Parameterized computation
• Objects with overloaded method names
• Remote procedure calls to distributed modules
• Constrained (black box) access to encapsulated data
CHAIMS decomposes CALL functions:
• Setup
• Estimate
• Invoke
• Examine
• Extract

CHAIMS Primitives
Pre-invocation:
• SETUP: set up the connection to a megamodule
• SETATTRIBUTES, GETATTRIBUTES: set or get global parameters in a megamodule
• ESTIMATE: get an estimate of execution time for optimization
Invocation and result gathering:
• INVOKE: start a specific method
• EXAMINE: test the status of an invoked method
• EXTRACT: extract results from an invoked method
Termination:
• TERMINATE: terminate a method invocation or a connection to a megamodule
Control:
• WHILE, IF
Utility:
• GETPARAM: get default parameters

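The invocation primitives have a rough counterpart in ordinary futures. The sketch below uses Python's standard `concurrent.futures` as a local stand-in: `submit` plays the role of INVOKE, `done()` of EXAMINE, and `result()` of EXTRACT. It is an analogy under stated assumptions, not the CHAIMS runtime; `all_routes` and its route data are hypothetical, and nothing here is remote.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def all_routes(origin, destination):
    # Stand-in for a remote megamodule method; the route data is made up.
    time.sleep(0.05)
    return [(origin, "Denver", destination), (origin, "Chicago", destination)]


with ThreadPoolExecutor(max_workers=2) as pool:  # roughly SETUP
    # INVOKE: start the method and get an invocation handle back at once.
    route_ih = pool.submit(all_routes, "San Diego", "Washington DC")
    # EXAMINE: poll the handle until the method has finished.
    while not route_ih.done():
        time.sleep(0.01)
    # EXTRACT: fetch the stored result.
    routes = route_ih.result()
# Leaving the with-block shuts the pool down, roughly TERMINATE.
```

The point of the split is visible in the polling loop: between INVOKE and EXTRACT the caller is free to start other invocations instead of spinning.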
Megaprogram Example: Overview
General I/O megamodule:
• Input function takes as parameter a default data structure containing names, types and default values for the expected input
Travel information:
• Computing all possible routes between two cities
• Computing the air and ground cost for each leg, given a list of city pairs and data about the goods to be transported
Two megamodules that offer equivalent functions for calculating optimal routes:
• Optimum and BestRoute both calculate the optimum route, given routes and costs
• Global variables: optimization can be done for cost or for time
Megamodules and their methods:
• InputOutput: Input, Output
• RouteInfo: AllRoutes, CityPairList, ...
• AirGround: CostForGround, CostForAir, ...
• Routing: BestRoute, ...
• RouteOptimizer: Optimum, ...

Megaprogram Example: Code
// Setup connections to megamodules.
io_mmh = SETUP ("InputOutput")
route_mmh = SETUP ("RouteInfo")
...
// Set global variables valid for all invocations of this client.
best2_mmh.SETATTRIBUTES (criterion = "cost")
// Get information from the megaprogram user about the goods to be transported and about the two desired cities.
cities_default = route_mmh.GETPARAM(Pair_of_Cities)
input_cities_ih = io_mmh.INVOKE ("input", cities_default)
WHILE (input_cities_ih.EXAMINE() != DONE) {}
cities = input_cities_ih.EXTRACT()
...
// Get all routes between the two cities.
route_ih = route_mmh.INVOKE ("AllRoutes", Pair_of_Cities = cities)
WHILE (route_ih.EXAMINE() != DONE) {}
routes = route_ih.EXTRACT()
// Get all city pairs in these routes.
// Calculate the costs of all the routes.
…
// Figure out the optimal megamodule for picking the best route.
IF (best1_mmh.ESTIMATE("Best_Route") < best2_mmh.ESTIMATE("Optimum"))
THEN {best_ih = best1_mmh.INVOKE ("Best_Route", Goods = info_goods, Pair_of_Cities = cities, List_of_Routes = routes, Cost_Ground = cost_list_ground, Cost_Air = cost_list_air)}
ELSE {best_ih = best2_mmh.INVOKE ("Optimum", Goods = info_goods, …
// Pick the best route and display the result.
...
// Terminate all invocations.
best2_mmh.TERMINATE()

Operation of one Megamodule
• SETUP
• SETATTRIBUTES provides context
• ESTIMATE serves scheduling
• INVOKE initiates remote computation
• EXAMINE checks for completion
• EXTRACT obtains results
• TERMINATE
[Figure labels: "M handle" and "I handle" annotations next to the primitives; "I / ALL"]

CHAIMS Megaprogramming Language
Purely compositional:
• only a variety of CALLs and control flow
• no primitives for input/output ==> instead use general and problem-specific I/O megamodules
• no primitives for arithmetic ==> use math megamodules
Splitting up the CALL statement:
• parallelism by asynchrony in a sequential program
• novel possibilities for optimizations
• reduction of complexity of integrated invoke statements
• higher-level language (assembler => HLLs, HLLs => composition/megamodule paradigm)

Architecture: Creation Process
• Megamodule provider writes native programs or wraps non-CHAIMS-compliant megamodules
• Megamodule provider adds information to the CHAIMS Repository
[Figure labels: Wrapper Templates; CHAIMS Repository; megamodules a, b, c, d, e]

Architecture: Composition Process
• Megaprogrammer writes the megaprogram (in the CHAIMS language)
• CHAIMS Repository provides information
• CHAIMS Compiler generates the CSRT (compiled megaprogram)

Runtime Architecture
[Figure labels: CSRT (compiled megaprogram); IO module(s); Distribution System (CORBA, RMI…); megamodules a, b, c, d, e]

Architecture: All
Active at different times
[Figure labels: Megamodule Provider wraps non-CHAIMS-compliant megamodules and adds information to the CHAIMS Repository; Wrapper Templates; Megaprogrammer writes the megaprogram (in the CHAIMS language); CHAIMS Repository provides information; CHAIMS Compiler generates the CSRT (compiled megaprogram); Distribution System (CORBA, RMI…); megamodules a, b, c, d, e]

Multiple Transport Protocols
• The CHAIMS API defines the interface between megaprogrammer and megaprogram; the megaprogram is written in the CHAIMS language.
• The CHAIMS protocols define the calls the megamodules have to understand. These protocols differ slightly for the different distribution protocols, and are defined by an IDL for CORBA, another IDL for DCE, and a Java class for RMI.
[Figure labels: Megaprogrammer; CHAIMS language; Megaprogram; CHAIMS protocols (CORBA-idl, DCE-idl, Java-class); Megamodules]

Data objects: Blobs
Minimal typing within CHAIMS: integer and boolean, only for control.
All else is placed into Binary Large OBjects (Blobs), transparent to the compiler.
Alternatives:
• ASN.1, with conversion routines
• XML
Example: Person_Information
• Name of Person: complex
• Personal Data: complex
• Address
• First Name: string, Joe
• Last Name: string, Smith
• Date of Birth: date, 6/21/54
• Soc.Sec.No: string, 345-34-345

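To make the blob idea concrete, here is a minimal sketch that packs the Person_Information example into an opaque byte string using XML, one of the alternatives the slide lists. It is written in Python with the standard `xml.etree.ElementTree` module. The function names, the tag names, and the nesting (names under Name_of_Person, the rest under Personal_Data) are assumptions; Address is left out because the slide gives no value for it.

```python
import xml.etree.ElementTree as ET


def encode_person(first, last, birth, ssn):
    """Pack the example record into an opaque byte string (the blob)."""
    root = ET.Element("Person_Information")
    name = ET.SubElement(root, "Name_of_Person")
    ET.SubElement(name, "First_Name").text = first
    ET.SubElement(name, "Last_Name").text = last
    data = ET.SubElement(root, "Personal_Data")
    ET.SubElement(data, "Date_of_Birth").text = birth
    ET.SubElement(data, "Soc_Sec_No").text = ssn
    return ET.tostring(root, encoding="utf-8")


def decode_person(blob):
    """Unpack a blob into a flat dict of its leaf fields."""
    root = ET.fromstring(blob)
    return {node.tag: node.text for node in root.iter() if len(node) == 0}


# The composing program only passes `blob` along; it never looks inside.
blob = encode_person("Joe", "Smith", "6/21/54", "345-34-345")
fields = decode_person(blob)
```

Only the two ends, here `encode_person` and `decode_person`, need to agree on the layout. Everything in between can treat the value as bytes, which is what keeps the composition language's type system minimal.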
Wrapper: CHAIMS Compliance
CHAIMS protocol: support all CHAIMS primitives
• if not native, achieved by wrapping legacy codes
State management and asynchrony:
• clientId (megamodule handle in the CHAIMS language)
• callId (invocation handle in the CHAIMS language)
• results must be stored for possible extraction(s) until termination of the invocation
Data transformation:
• BLOBs must be converted into the megamodule-specific data types (coding/decoding routines)

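The state-management duties of a wrapper can be sketched in a few lines. The Python class below is a toy under stated assumptions: the class and method names are hypothetical, the legacy code is a plain function, and the computation runs synchronously inside `invoke`, where a real wrapper would start it asynchronously. It shows only the bookkeeping: one id per client, one id per call, and results kept until the invocation is terminated.

```python
import itertools


class Wrapper:
    """Toy CHAIMS-style wrapper around one legacy function."""

    def __init__(self, legacy_function):
        self._legacy = legacy_function
        self._ids = itertools.count(1)
        self._clients = {}  # clientId -> attributes set by that client
        self._results = {}  # callId -> (clientId, stored result)

    def setup(self):
        client_id = next(self._ids)
        self._clients[client_id] = {}
        return client_id

    def invoke(self, client_id, **params):
        if client_id not in self._clients:
            raise KeyError(f"unknown clientId {client_id}")
        call_id = next(self._ids)
        # Simplification: run at once and store the result for later.
        self._results[call_id] = (client_id, self._legacy(**params))
        return call_id

    def extract(self, call_id):
        # Repeatable: the result stays stored until terminate_call.
        return self._results[call_id][1]

    def terminate_call(self, call_id):
        del self._results[call_id]


wrapper = Wrapper(lambda a, b: a + b)
client = wrapper.setup()
call = wrapper.invoke(client, a=2, b=3)
first = wrapper.extract(call)
second = wrapper.extract(call)  # a second extraction still works
wrapper.terminate_call(call)    # now the stored result is released
```

After `terminate_call`, a further `extract` on the same callId raises `KeyError`, which mirrors the rule that results only need to be held until termination.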
Architecture: Three Views
Objective: clear separation between composition of services, computation of data, and transport
Composition View (megaprogram)
 - composition of megamodules
 - directing of opaque data blobs
Data View
 - exchange of data
 - interpretation of data
 - in/between megamodules
Transport View
 - moving around data blobs and CHAIMS messages
Layers: CHAIMS Layer, Distribution Layer

Scheduler: Decomposed Execution
[Figure: timelines for synchronous, asynchronous, and decomposed execution of a remote method (decomposed: no benefit for one module); spans marked "available for other methods". Legend: s = setup / set attributes, i = invoke a method, e = extract results]

Optimized Execution of Modules
[Figure: two timelines for modules M1 to M5 with invokes i1 to i5 and extracts e1 to e5: non-optimized, and optimized by the scheduler according to estimates. Annotations: M3 (> M1+M2), M4 (< M1+M2); data dependencies; execution of a module. Legend: i = invoke a method, e = extract results]

Decomposed Parallel Execution
Long setup times occur, for instance, when a subset of a large database has to be loaded for a simple search, say, transatlantic flights for an optimal arrival.
[Figure: timeline for modules M1 to M5, optimized by the scheduler according to estimates. Annotations: M3 (> M1+M2), M4 (< M1+M2). Legend: set up / set attributes, invoke a method, extract results]

Decomposed Optimized Execution
[Figure: two timelines for modules M1 to M5, labeled "prior time" and "time", optimized by the scheduler according to estimates. Annotations: M3 (> M1+M2), M4 (< M1+M2). Legend: set up / set attributes, invoke a method, extract results]

Scheduling: Simple Example
First number: order in the unscheduled megaprogram. Second number: order in the automatically prescheduled megaprogram.
1 / 1   cost_ground_ih = cost_mmh.INVOKE ("Cost_for_Ground", List_of_City_Pairs = city_pairs, Goods = info_goods)
2 / 3   WHILE (cost_ground_ih.EXAMINE() != DONE) {}
        cost_list_ground = cost_ground_ih.EXTRACT()
3 / 2   cost_air_ih = cost_mmh.INVOKE ("Cost_for_Air", List_of_City_Pairs = city_pairs, Goods = info_goods)
4 / 4   WHILE (cost_air_ih.EXAMINE() != DONE) {}
        cost_list_air = cost_air_ih.EXTRACT()

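The benefit of moving the second INVOKE ahead of the first wait can be checked with a small timeline simulation. This is a sketch in Python under assumed conditions: the durations are made up, INVOKE is taken to return immediately, waiting and extracting are merged into one EXTRACT step, and the megamodule is assumed able to run both methods at the same time.

```python
def elapsed(program, durations):
    """Simulated finish time of a straight-line megaprogram.

    program: list of ("INVOKE", name) and ("EXTRACT", name) steps.
    durations: assumed remote execution time per method.
    INVOKE returns immediately; EXTRACT blocks until the method is done.
    """
    now = 0.0
    finish = {}
    for op, name in program:
        if op == "INVOKE":
            finish[name] = now + durations[name]
        elif op == "EXTRACT":
            now = max(now, finish[name])
        else:
            raise ValueError(f"unknown step {op!r}")
    return now


durations = {"Cost_for_Ground": 6.0, "Cost_for_Air": 4.0}  # made-up seconds
unscheduled = [
    ("INVOKE", "Cost_for_Ground"), ("EXTRACT", "Cost_for_Ground"),
    ("INVOKE", "Cost_for_Air"), ("EXTRACT", "Cost_for_Air"),
]
prescheduled = [
    ("INVOKE", "Cost_for_Ground"), ("INVOKE", "Cost_for_Air"),
    ("EXTRACT", "Cost_for_Ground"), ("EXTRACT", "Cost_for_Air"),
]
time_unscheduled = elapsed(unscheduled, durations)
time_prescheduled = elapsed(prescheduled, durations)
```

Under these assumptions the unscheduled order takes the sum of the two durations, while the prescheduled order takes only the longer of the two.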
Iterated Invocations
Avoid repeated setups
[Figure: invocations M6.1 to M6.5 on two timelines, labeled "prior time" and "time". Legend: set up / set attributes, invoke a method, extract results]

& Repeated Extractions
Avoid large extracts until satisfied
[Figure: invocations M6.1 to M6.5 on three timelines: prior time, distinct invocations; time, shared setup; time, shared setup & partial extract. Legend: set up / set attributes, invoke a method, extract results (partial for iterating, full for presentation)]

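The three timelines can be compared with simple arithmetic. The Python sketch below uses a made-up sequential cost model: every iteration runs once and extracts once, a shared setup is paid a single time, and with partial extraction only the last iteration pays for a full extract. The function name and all numbers are assumptions chosen for illustration.

```python
def iterated_cost(n, setup, run, full_extract,
                  partial_extract=None, shared_setup=True):
    """Sequential time for n iterated invocations (assumed cost model)."""
    if n < 1:
        raise ValueError("need at least one iteration")
    setups = 1 if shared_setup else n
    if partial_extract is None:
        extracts = n * full_extract
    else:
        # Partial results while iterating, one full result at the end.
        extracts = (n - 1) * partial_extract + full_extract
    return setups * setup + n * run + extracts


# Five iterations, as in M6.1 to M6.5; the time units are arbitrary.
distinct = iterated_cost(5, setup=3, run=2, full_extract=4, shared_setup=False)
shared = iterated_cost(5, setup=3, run=2, full_extract=4)
shared_partial = iterated_cost(5, setup=3, run=2, full_extract=4,
                               partial_extract=1)
```

The saving from a shared setup grows with the number of iterations and the setup cost; the saving from partial extracts grows with the gap between partial and full result size.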
Scheduling: Heuristics
INVOKE: call INVOKEs as soon as possible
• may depend on other data
• moving an INVOKE outside of an if-block depends on a cost function (ESTIMATE of this and following functions concerning execution time, dataflow and fees (resources))
EXTRACT: move EXTRACTs to where the result is actually needed
• no sense in checking or waiting for results before they are needed
• instead of waiting, poll all invocations and issue the next possible invocation as soon as its data can be extracted
TERMINATE: terminate invocations that are no longer needed (saves resources)
• not every method invocation has an extract (e.g. print-like functions)

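The first two heuristics can be sketched as a reordering pass over a straight-line program. The Python function below is one possible greedy reading, not the CHAIMS scheduler: the step encoding and the name `preschedule` are hypothetical, and the pass ignores if-blocks, cost functions, and TERMINATE. It issues every INVOKE whose inputs are available, and falls back to an EXTRACT only when no INVOKE can start.

```python
def preschedule(steps, inputs=()):
    """Reorder steps so INVOKEs run as early as their inputs allow.

    Each step is a tuple (kind, handle, needs, gives):
      ("INVOKE", handle, set_of_required_variables, None)
      ("EXTRACT", handle, set(), produced_variable)
    """
    pending = list(steps)
    available = set(inputs)
    invoked = set()
    ordered = []
    while pending:
        # Prefer the earliest INVOKE whose inputs are all available.
        step = next((s for s in pending
                     if s[0] == "INVOKE" and s[2] <= available), None)
        if step is not None:
            invoked.add(step[1])
        else:
            # Otherwise extract from the earliest already-issued invocation.
            step = next((s for s in pending
                         if s[0] == "EXTRACT" and s[1] in invoked), None)
            if step is None:
                raise ValueError("steps cannot be ordered")
            available.add(step[3])
        pending.remove(step)
        ordered.append(step)
    return ordered


steps = [
    ("INVOKE", "cost_ground_ih", {"city_pairs"}, None),
    ("EXTRACT", "cost_ground_ih", set(), "cost_list_ground"),
    ("INVOKE", "cost_air_ih", {"city_pairs"}, None),
    ("EXTRACT", "cost_air_ih", set(), "cost_list_air"),
    ("INVOKE", "best_ih", {"cost_list_ground", "cost_list_air"}, None),
    ("EXTRACT", "best_ih", set(), "best_route"),
]
order = [(kind, handle)
         for kind, handle, _, _ in preschedule(steps, {"city_pairs"})]
```

On this input both cost invocations are issued before either result is extracted, and `best_ih` is invoked only after both cost lists are available.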
Compiling into a Network
[Figure: two diagrams of a megaprogram and Modules A to F, showing control flow and data flow: the current CHAIMS system, and the system with distribution dataflow optimization]

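A rough way to see what dataflow optimization could save is to count bytes on the network. The Python sketch below assumes that in the current system every intermediate result returns to the megaprogram and is then sent on (two hops), while the optimized system sends it directly from producer to consumer (one hop). The edge list and sizes are invented, since the connections in the figure are not recoverable from the text.

```python
def bytes_moved(edges, via_client):
    """Total bytes on the network for a set of dataflow edges.

    edges: list of (producer, consumer, size_in_bytes).
    via_client=True models results passing through the megaprogram;
    False models direct module-to-module transfer.
    """
    hops = 2 if via_client else 1
    return sum(hops * size for _, _, size in edges)


# Hypothetical dataflow between some of the modules named on the slide.
edges = [("A", "C", 1000), ("B", "D", 500), ("C", "E", 2000)]
current = bytes_moved(edges, via_client=True)
optimized = bytes_moved(edges, via_client=False)
```

In this model the traffic halves; the practical gain would depend on where the client sits relative to the modules and on how large the intermediate results are.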
CHAIMS Implementation
• Specify minimal language
  • minimal functions: CALLs, While, If *
  • minimal typing {boolean, integer, string, handles, object}
  • objects encapsulated using ASN.1 standard
  • type conversion in wrappers, service modules *
• Compiler for multiple protocols (one-at-a-time, mixed *)
• Wrapper generation for multiple protocols
• Native modules for I/O, simple mathematics *, other
• Implement API for CORBA, Java RMI, DCE usage
• Wrap / construct several programs for simple demos
• Schedule optimization *
• Demonstrate use in heterogeneous setting
• Define full-scale demonstration
* in process

Conclusion: Research Questions
• Is a megaprogramming language focusing only on composition feasible?
• Can it exploit ongoing progress in client-server models and be protocol independent?
• Can natural parallelism for distributed services be effectively scheduled?
• Can high-level dataflow among distributed modules be optimized?
• Can CHAIMS clearly express a high-level distributed SW architecture?
• Can the approach affect SW process concepts and practice?

Conclusion: Questions not addressed
• Will one client/server protocol subsume all others?
  • distributed optimization remains an issue
• Synchronization / concurrency control
  • autonomy of sources negates current concepts
  • if modules share databases, then database locks may span setup through terminate-all for a megaprogram handle
• Will software vendors consider moving to a service paradigm?
  • need CHAIMS demonstration for evaluation

Integration Science
• Artificial Intelligence: knowledge mgmt, models, uncertainty
• Systems Engineering: analysis, documentation, costing
• Databases: access, storage, algebras