Parallel Computation of Skyline Queries Verification
COSC6490A, Fall 2007
Slawomir Kmiec
Presentation Outline
• Skyline Concepts
• The Parallel Algorithm
• JPF Experience
• JPF Issues
• Abstraction
• Results
• Future Work
• Summary
• Questions
Skyline Concepts
Given a set S of points (or records), identify the points that are not worse than any of the others with respect to a given set of their attributes.
Point p_a is said to dominate point p_b if x_i(p_a) ≤ x_i(p_b) for all i such that 1 ≤ i ≤ d, and at least one of those inequalities is strict. Here x_i(p) denotes the i-th attribute of p and d the number of attributes.
A point p is a skyline point if it is not dominated by any other point in S. The skyline of S is denoted sky(S).
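The dominance definition above can be sketched in Java as follows. This is only a minimal illustration, not code from the presented application: the class name `SkylineSketch`, the `double[]` point representation, and the naive quadratic scan are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and the double[] representation are assumptions.
class SkylineSketch {

    // True if a dominates b: a is no worse (<=) in every attribute
    // and strictly better (<) in at least one. Smaller values are better.
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strictlyBetter = true;
            }
        }
        return strictlyBetter;
    }

    // Naive O(n^2) skyline: keep every point that no other point dominates.
    static List<double[]> skyline(List<double[]> points) {
        List<double[]> result = new ArrayList<>();
        for (double[] candidate : points) {
            boolean dominated = false;
            for (double[] other : points) {
                if (dominates(other, candidate)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> s = new ArrayList<>();
        s.add(new double[] {1, 4});
        s.add(new double[] {2, 2});
        s.add(new double[] {3, 3}); // dominated by (2, 2)
        s.add(new double[] {4, 1});
        System.out.println(skyline(s).size()); // prints 3
    }
}
```

A point never dominates an identical copy of itself, because no inequality is strict in that case.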
The Parallel Algorithm (A)
• Principles:
→ the data is divided equally and distributed among the processors
→ a local skyline is computed at each peer
→ the size of each local skyline is shared with the peers
→ if the combined results fit on every processor then
   → local skylines are exchanged with the peers
   → processor p_i picks the i-th chunk of the combined skyline and eliminates the points in it that the combined skyline dominates
   → local results are sent to the central process
→ end // of processing
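The chunk step of this branch can be sketched as below, simulated sequentially rather than on real peers. The slides do not say how chunks are cut, so the contiguous, equal-sized chunks are an assumption, as are the class name `ChunkMergeSketch` and the `double[]` point representation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; chunking scheme and names are assumptions.
class ChunkMergeSketch {

    // a dominates b: no worse in every attribute, strictly better in at least one.
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strictlyBetter = true;
            }
        }
        return strictlyBetter;
    }

    // Work of processor `rank` out of `numProcs`: take the rank-th chunk of the
    // combined local skylines and drop every point that the combined list dominates.
    static List<double[]> filterChunk(List<double[]> combined, int rank, int numProcs) {
        int chunkSize = (combined.size() + numProcs - 1) / numProcs; // ceiling division
        int from = Math.min(rank * chunkSize, combined.size());
        int to = Math.min(from + chunkSize, combined.size());
        List<double[]> survivors = new ArrayList<>();
        for (double[] candidate : combined.subList(from, to)) {
            boolean dominated = false;
            for (double[] other : combined) {
                if (dominates(other, candidate)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                survivors.add(candidate);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Combined list = local skyline of peer 0 followed by that of peer 1.
        List<double[]> combined = new ArrayList<>();
        combined.add(new double[] {1, 4});
        combined.add(new double[] {3, 3}); // dominated by (2, 2) from the other peer
        combined.add(new double[] {2, 2});
        combined.add(new double[] {4, 1});
        int total = filterChunk(combined, 0, 2).size() + filterChunk(combined, 1, 2).size();
        System.out.println(total); // prints 3
    }
}
```

The union of the per-processor survivors is what the central process would receive in this sketch.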
The Parallel Algorithm (B)
• Principles (continued):
→ else // combined results do not fit on some p_i
   → loop until the required number of results is available or all p_i have finished
      → each processor p_i picks a random set of points (in proportion to the size of its local skyline)
      → this set is submitted to all peers, which mark the points that they dominate and return the marked points to the sender
      → each processor p_i collects the points it submitted to the peers, removes the marked ones from the original set and sends the remaining ones to the central processor
   → end loop
→ end // of processing
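The loop above can be sketched as a sequential simulation. This is an illustration under assumptions, not the presented implementation: the class name `SamplingRoundsSketch`, the `fraction` parameter that sets the batch size, the fixed random seed, and the choice to let peers test against their local skylines are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sequential simulation; all names and parameters are assumptions.
class SamplingRoundsSketch {

    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strictlyBetter = true;
            }
        }
        return strictlyBetter;
    }

    // A peer "marks" p if any point of its local skyline dominates p.
    static boolean markedByPeers(double[] p, int sender, List<List<double[]>> localSkylines) {
        for (int j = 0; j < localSkylines.size(); j++) {
            if (j == sender) {
                continue;
            }
            for (double[] q : localSkylines.get(j)) {
                if (dominates(q, p)) {
                    return true;
                }
            }
        }
        return false;
    }

    // localSkylines.get(i) is the local skyline of processor i. Returns the points
    // that reach the central processor before the loop stops.
    static List<double[]> run(List<List<double[]>> localSkylines, double fraction,
                              int required, long seed) {
        Random random = new Random(seed);
        List<List<double[]>> pending = new ArrayList<>();
        for (List<double[]> local : localSkylines) {
            List<double[]> shuffled = new ArrayList<>(local);
            Collections.shuffle(shuffled, random); // random pick order
            pending.add(shuffled);
        }
        List<double[]> central = new ArrayList<>();
        boolean workLeft = true;
        while (central.size() < required && workLeft) {
            workLeft = false;
            for (int i = 0; i < pending.size(); i++) {
                List<double[]> mine = pending.get(i);
                // Batch size in proportion to the size of the local skyline.
                int share = Math.max(1, (int) Math.ceil(fraction * localSkylines.get(i).size()));
                int n = Math.min(share, mine.size());
                List<double[]> batch = new ArrayList<>(mine.subList(0, n));
                mine.subList(0, n).clear();
                for (double[] p : batch) {
                    if (!markedByPeers(p, i, localSkylines)) {
                        central.add(p); // unmarked points go to the central processor
                    }
                }
                if (!mine.isEmpty()) {
                    workLeft = true;
                }
            }
        }
        return central;
    }

    public static void main(String[] args) {
        List<List<double[]>> locals = new ArrayList<>();
        locals.add(List.of(new double[] {1, 4}, new double[] {3, 3}));
        locals.add(List.of(new double[] {2, 2}, new double[] {4, 1}));
        System.out.println(run(locals, 0.5, Integer.MAX_VALUE, 42L).size()); // prints 3
    }
}
```

Passing a small `required` value stops the loop early, which mirrors the "required number of results" exit condition; a round in progress is still completed, so slightly more points than requested may arrive.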
JPF Experience
• getting JPF
• getting JPF to run
• the Eclipse way
• the Linux way
• incremental examples
• configuration options
• JPF value-added services
JPF Issues
• independent processors
- restricted to threads
• eliminate native code classes
- no Swing, Sockets, NIO, Regex (Eclipse)
- out of 15 just java.util.ArrayList left
- eliminate Socket-oriented developed classes
• search-state-space reduction
- input: 10 points
- 2 worker threads
- operation abstraction
- output discarded
Abstraction
• 2 types of developed classes left
- SkylineMain and SkylineWorker: workflow classes
- “Handler” classes: request handling classes
• developed classes and their associated Java classes:
- SkylineMain: Thread
- SkylineMainListener: ServerSocket
- SkylineMainHandler: Socket
- SkylineWorker: Thread
- SkylineWorkerListener: ServerSocket
- SkylineWorkerHandler: Socket
Abstraction (cont.)
• high volume of work:
- due to a lot of original code
• removed all GUI:
- remove Swing and AWT elements
• asynchronous Socket messaging done as:
- keep references to workers instead of addresses
- eliminate the “Listener” classes
- each message done as an instance of the handler
- create a handler for the destination worker
- execute the synchronous (blocking) part of data sending
- start the handler to execute the asynchronous processing
- each type of message split into a synchronous and an asynchronous part
• file IO done as:
- store parameters as static constants
- store input data as an array
- replace input scanning with referencing the array
- display or discard output
• String.split() method (Regex) done as:
- re-done as a String manipulation method
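The socket-to-handler replacement described above might look roughly like the sketch below. None of these names come from the original application: `HandlerSketch`, `Worker`, `MessageHandler`, and the message text are hypothetical, and the split between constructor (synchronous part) and `run()` (asynchronous part) is one possible reading of the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of messaging without sockets; all names are assumptions.
class HandlerSketch {

    // Stand-in for a worker that used to be reached through a socket address.
    static class Worker {
        private final List<String> inbox = new ArrayList<>();

        synchronized void deliver(String message) {
            inbox.add(message);
        }

        synchronized int inboxSize() {
            return inbox.size();
        }
    }

    // One handler instance per message, replacing a socket send plus a listener.
    static class MessageHandler extends Thread {
        private final Worker destination;
        private final String payload;

        // Synchronous part: runs in the sender's thread, like a blocking socket write.
        MessageHandler(Worker destination, String payload) {
            this.destination = destination; // direct reference instead of an address
            this.payload = payload;
        }

        // Asynchronous part: runs in its own thread, like request handling on the receiver.
        @Override
        public void run() {
            destination.deliver(payload);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker();
        MessageHandler handler = new MessageHandler(worker, "LOCAL_SKYLINE_SIZE 3");
        handler.start(); // sender continues while the handler processes the message
        handler.join();  // only so the demo can print a stable result
        System.out.println(worker.inboxSize()); // prints 1
    }
}
```

Because everything stays inside one JVM and uses only threads, a model checker that cannot follow native socket code can still explore the interleavings of sender and handler.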
Results
• issues reported
- different issues at different settings
- large volume of output to be analyzed
• uncaught-exception conditions
- issues regarding un-synchronized access
- reported as IllegalMonitorStateException
• dead-lock conditions
- issues regarding termination conditions
• PreciseRaceDetector
- “Unprotected Variable Access” severe warnings
• possibly more
- it ran for a long time with no other errors
- it did not finish in the time given
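As background for the IllegalMonitorStateException finding: in Java this exception is thrown when a thread calls wait(), notify() or notifyAll() on an object whose monitor it does not own. The sketch below reproduces that general condition; it is not the code in which JPF reported the issue, and the class and method names are hypothetical.

```java
// Hypothetical sketch; shows the general cause, not the original application's bug.
class MonitorSketch {
    private final Object lock = new Object();

    // Buggy pattern: notifyAll() without holding the monitor of `lock`.
    boolean signalWithoutLock() {
        try {
            lock.notifyAll();
            return true;
        } catch (IllegalMonitorStateException e) {
            return false; // the calling thread does not own the monitor
        }
    }

    // Corrected pattern: the call is made inside a synchronized block on `lock`.
    boolean signalWithLock() {
        synchronized (lock) {
            lock.notifyAll();
            return true;
        }
    }

    public static void main(String[] args) {
        MonitorSketch sketch = new MonitorSketch();
        System.out.println(sketch.signalWithoutLock()); // prints false
        System.out.println(sketch.signalWithLock());    // prints true
    }
}
```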
Future Work
• atomize code
- wrap code fragments into atomic operations
• protect shared variable access
- use locks or synchronized blocks
- re-run PreciseRaceDetector
• run it for an extended period of time
- to search the complete state space
• analyze the applicability of the issues found
- with respect to the original application
- not as a result of the abstraction or transformation
• reduce shared data interaction
- handlers to create private data structures that can be quickly accepted by the corresponding main process
- this will allow greater robustness and redundancy
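Protecting shared variable access with a synchronized block could look like the following minimal sketch. The counter, the class name `SharedCounterSketch`, and the two-thread driver are illustrative assumptions and not part of the presented application.

```java
// Hypothetical sketch of guarding a shared variable; names are assumptions.
class SharedCounterSketch {
    private final Object lock = new Object();
    private int count = 0;

    // Every read and write of `count` happens while holding `lock`.
    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Two worker threads update the same counter concurrently.
    static int runTwoWorkers(int incrementsPerWorker) throws InterruptedException {
        SharedCounterSketch counter = new SharedCounterSketch();
        Runnable task = () -> {
            for (int i = 0; i < incrementsPerWorker; i++) {
                counter.increment();
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoWorkers(10000)); // prints 20000
    }
}
```

Without the synchronized blocks the two `count++` operations could interleave and lose updates, which is the kind of access a race detector flags as unprotected.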
Summary
• JPF is a flexible and complex tool
• JPF is memory- and time-intensive
• JPF is a valuable verification tool
• the application had to be changed extensively to work with JPF
• potential issues were found by JPF
• verification = value-added service
- extra testing
- code refinement (robustness)
Questions ???