A Comprehensive Model for Arbitrary Result Extraction • Neal Sample, Gio Wiederhold (Stanford University) • Dorothea Beringer (Hewlett-Packard)
Shift in Programming Tasks (figure: Coding vs. Integration/Composition over time; axis marks 1970, 1990, 2010)
Sample Composition Tasks • Logistics • Reservation and distribution systems, “find the best transportation route from A to B” • Genomics • Framework for composing various processing tools and repositories • Modeling • Weather prediction, complex chemical systems, basin modeling • Composition of processes (vs. components, data)
CLAM Composition Language • Purely compositional • no primitives for arithmetic • no primitives for I/O, etc. • Splitting up the CALL statement • parallelism by asynchrony in a sequential program • novel possibilities for optimization • reduced complexity of invocation statements • Higher-level language • assembly → HLLs • HLLs → compositional paradigm • Intent: enable domain experts
CLAM Primitives • Pre-invocation • SETUP: set up the connection to a service • SETPARAM, GETPARAM: set and get parameters in a service • ESTIMATE: for optimization • Invocation and result gathering • INVOKE: begin execution • EXAMINE: test progress of an invoked method • EXTRACT: extract results from an invoked method • Termination • TERMINATE: terminate a method invocation/connection to a service
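The effect of splitting one CALL into INVOKE, EXAMINE, and EXTRACT can be sketched by analogy with Python futures. This is not CLAM and not its runtime: `slow_service` is a hypothetical stand-in for a remote method, and mapping `submit` to INVOKE, `done()` to a status-only EXAMINE, and `result()` to EXTRACT is an assumption made only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_service(name, seconds):
    """Hypothetical stand-in for a remote service method."""
    time.sleep(seconds)
    return f"{name}-result"


with ThreadPoolExecutor(max_workers=2) as pool:
    # INVOKE: start both methods without waiting for either one.
    call_a = pool.submit(slow_service, "service1", 0.1)
    call_b = pool.submit(slow_service, "service2", 0.1)

    # EXAMINE (status only): a Future reports done / not done, no progress value.
    status_after_invoke = (call_a.done(), call_b.done())

    # EXTRACT: block only at the point where the data is actually needed.
    results = (call_a.result(), call_b.result())

print(status_after_invoke, results)
```

The program text stays sequential, yet the two invocations overlap in time because the wait has moved from the call to the extraction.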
Data Dependencies & Scheduling (figure: dependency graph with nodes START, service1, service2, service3, service4, service5, END)
// begin program
A = service1();
B = service2();
C = service3(A,B);
D = service4(C);
E = service5(C);
// end of program
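How a schedule falls out of the data dependencies alone can be sketched as follows. The `deps` table is read directly off the five-statement program above; the helper name `parallel_stages` and the use of Python's standard `graphlib` module are illustrative choices, not part of CLAM.

```python
from graphlib import TopologicalSorter

# Data dependencies from the program above: result -> results it consumes.
deps = {
    "A": set(),        # A = service1()
    "B": set(),        # B = service2()
    "C": {"A", "B"},   # C = service3(A, B)
    "D": {"C"},        # D = service4(C)
    "E": {"C"},        # E = service5(C)
}


def parallel_stages(dependencies):
    """Group results into stages; all members of a stage can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are available
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(deps)
print(stages)  # [['A', 'B'], ['C'], ['D', 'E']]
```

No synchronization statement appears anywhere: service1 and service2 may run in parallel, service3 waits for both, and service4 and service5 may again run in parallel.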
Runtime: data extraction is hard • Data extraction with native modules worked • No language-level specifications in CLAM • E.g., polling, threading, exception handling… • Multiple middleware for transport → difficult mapping • CORBA-RMI, RMI-COM, COM-CPAM, etc. • Crisis of legacy services • To generalize or restrict? • Refine the strategy…
Strategy: hide it & depend on it • Have to respect service capabilities • Or suffer the LCD… (more in a bit) • Simple and flexible programming • Data extraction is a runtime issue; it is not central to the composition task • Simplified integration • Legacy ambivalence • Simple bridging for middleware • Increase audience for services • Better scheduling • Declarative language, data dependencies
Where are we? • Declarative language for composition • Data is used for synchronization • No primitives to support synchronization • Apparent “mismatch” in data extraction methods & capabilities among various actors • What does the data look like? • How can data be extracted?
Data View: Services (figure: a service whose RESULTS consist of Result A, Result B, and Result C)
Extraction Techniques • Asynchrony • Explicitly controlled: spin-locks, polling, interrupt handling, etc. • Can use with any DAG schedule • Partial extraction • web browsing - HTML text as a schema • SQL cursors (thanks to the reviewer) • Progressive extraction (exceptional) • Adaptive mesh refinements, JPEG interleaving
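Progressive extraction can be sketched with a result that is refined over time, where the consumer stops as soon as the value is good enough. Everything here is an assumption for illustration: the running mean as the refined result, the helper names `progressive_mean` and `extract_until_stable`, and the stopping rule based on a tolerance.

```python
def progressive_mean(samples, chunk_size):
    """Yield (percent_done, current_estimate) after each processed chunk."""
    total, seen = 0.0, 0
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        yield 100 * seen // len(samples), total / seen


def extract_until_stable(stream, tolerance):
    """Take successive estimates; stop once two in a row differ by <= tolerance."""
    percent, estimate, previous = 0, None, None
    for percent, estimate in stream:
        if previous is not None and abs(estimate - previous) <= tolerance:
            break  # good enough: the remaining work is never requested
        previous = estimate
    return percent, estimate


samples = [10, 12, 11, 11, 11, 11, 11, 11]
result = extract_until_stable(progressive_mean(samples, chunk_size=2), tolerance=0.1)
print(result)  # (50, 11.0)
```

The consumer here accepts the estimate at 50% progress, which is the time saving that progressive extraction is meant to offer.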
Current Focus • Pre-invocation • SETUP: set up the connection to a service • SETPARAM, GETPARAM: set and get parameters in a service • ESTIMATE: for optimization • Invocation and result gathering • INVOKE: begin execution • EXAMINE: test progress of an invoked method • EXTRACT: extract results from an invoked method • Termination • TERMINATE: terminate a method invocation/connection to a service
EXAMINE Primitive in CLAM • Returns “status” and “progress” • Status: 2 bits of state • status = {DONE, NOT_DONE, PARTIAL, ERROR} • Progress: open descriptor • Indicates progress in an application-specific way • Could be variance, mean, amplitude, etc. • Default assumption: integer 0-100 = % done • Resolution of EXAMINE • Can apply per service (black box) • Can apply per result (white box) • Not complete for many legacy systems: only “status”, no “progress”
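The shape of an EXAMINE return value can be sketched as a small data type. The four status names and the default 0-100 progress come from the slide; the numeric encoding, the names `Examination`, `examine_from_percent`, and `legacy_examine`, and the rule that maps a percentage to a status are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Four values, so the status fits in 2 bits; this encoding is an assumption.
    DONE = 0
    NOT_DONE = 1
    PARTIAL = 2
    ERROR = 3


@dataclass(frozen=True)
class Examination:
    status: Status
    progress: Optional[int] = None  # None models a legacy service with no progress


def examine_from_percent(percent):
    """Derive a status from the default 0-100 '% done' progress value."""
    if not 0 <= percent <= 100:
        raise ValueError("default progress is an integer from 0 to 100")
    if percent == 0:
        return Examination(Status.NOT_DONE, 0)
    if percent == 100:
        return Examination(Status.DONE, 100)
    return Examination(Status.PARTIAL, percent)


def legacy_examine(finished):
    """Status-only EXAMINE, as offered by many legacy systems."""
    return Examination(Status.DONE if finished else Status.NOT_DONE)


print(examine_from_percent(40))
print(legacy_examine(False))
```

Keeping `progress` optional lets a caller treat status-only services and status+progress services uniformly and fall back to plain waiting when no progress is available.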
EXAMINE (three example states of a Service with results A, B, C)
Partially complete: Service.EXAMINE() returns {PARTIAL, 40} • Service.EXAMINE(A) returns {DONE, 100} • Service.EXAMINE(B) returns {NOT_DONE, 0} • Service.EXAMINE(C) returns {PARTIAL, 20}
Complete: Service.EXAMINE() returns {DONE, 100} • Service.EXAMINE(A) returns {DONE, 100} • Service.EXAMINE(B) returns {DONE, 100} • Service.EXAMINE(C) returns {DONE, 100}
Not started: Service.EXAMINE() returns {NOT_DONE, 0} • Service.EXAMINE(A) returns {NOT_DONE, 0} • Service.EXAMINE(B) returns {NOT_DONE, 0} • Service.EXAMINE(C) returns {NOT_DONE, 0}
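One way a per-service (black box) answer could be derived from per-result (white box) answers is sketched below. The slides do not say how the service-level value is computed; taking the mean of the per-result percentages is an assumption that happens to reproduce the numbers in the examples above (100, 0, and 20 average to 40). The function name `examine_service` is hypothetical.

```python
def examine_service(per_result_progress):
    """Combine per-result percentages into one (status, progress) pair."""
    values = list(per_result_progress.values())
    if not values:
        raise ValueError("a service needs at least one result")
    progress = round(sum(values) / len(values))  # assumed rule: mean of results
    if all(v == 100 for v in values):
        status = "DONE"
    elif all(v == 0 for v in values):
        status = "NOT_DONE"
    else:
        status = "PARTIAL"
    return status, progress


print(examine_service({"A": 100, "B": 0, "C": 20}))     # ('PARTIAL', 40)
print(examine_service({"A": 100, "B": 100, "C": 100}))  # ('DONE', 100)
print(examine_service({"A": 0, "B": 0, "C": 0}))        # ('NOT_DONE', 0)
```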
EXTRACT Primitive • Extracts data from a service • Per service (black box) • (var) = Service.EXTRACT(); • Per result (white box) • (varA = A, varC = C) = Service.EXTRACT(); • Allows partial data extraction • saves volume: abandon uninteresting elements • saves time: termination of useless invocation • Allows progressive data extraction with 2-value EXAMINE (status+progress) • Steering, time saving
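Partial extraction followed by early termination can be sketched with a toy service object. `SketchService` and its methods are hypothetical and stand in for an invoked service; they are not the CLAM runtime, and the choice to raise `LookupError` for results that are not ready is an assumption.

```python
class SketchService:
    """Hypothetical stand-in for an invoked service with named results."""

    def __init__(self, results, done):
        self._results = dict(results)
        self._done = set(done)       # names of results that are already complete
        self.terminated = False

    def extract(self, *names):
        """Per-result EXTRACT; with no names it behaves like per-service EXTRACT."""
        if self.terminated:
            raise RuntimeError("service already terminated")
        wanted = names or tuple(self._results)
        missing = [n for n in wanted if n not in self._done]
        if missing:
            raise LookupError(f"not ready: {missing}")
        return {n: self._results[n] for n in wanted}

    def terminate(self):
        """TERMINATE: abandon whatever work is still outstanding."""
        self.terminated = True


service = SketchService({"A": 1, "B": 2, "C": 3}, done={"A", "C"})
picked = service.extract("A", "C")  # saves volume: B is never transferred
service.terminate()                 # saves time: B is never finished
print(picked)  # {'A': 1, 'C': 3}
```

A per-service extraction on the same object would have had to wait for B, which is the cost that per-result extraction avoids.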
Examine-Extract Relationship
EXAMINE capability | EXTRACT per service | EXTRACT per result
per service, status only | asynchronous procedure call, Java RMI | limited partial extraction, (binary) thumbnails
per service, status+progress | partitioned progressive extract (full result set) | ?
per result, status only | semantic partial extraction (full result set) | partial extraction: browsing, SQL cursor (no progressive)
per result, status+progress | progressive extraction (full result set) | progressive and partial extraction: CLAM
Conclusions • Data extraction hiding is bueno! • User is not responsible for data management • Synchronizing extractions is not in the language → simplicity • Enables effective service scheduling • Simplified integration • Blueprint for a proactive design pattern for future services