"A Python Library for Provenance Recording and Querying“"Requirements for a Provenance Visualization Panel“ Presentation on IPAW‘08: Henning Bergmeyer
Overview • Brief Overview: Provenance System • A Python Library for Provenance Recording and Querying • Usage Examples • Initializing, Recording, Querying, Extending • Architecture • Requirements for a Provenance Visualization Panel • User Groups and Intentions • Graphical Representation and Exploration • Requirements
Main Model Concepts of the Provenance System "Grid Provenance", PReServ 1.0 (University of Southampton) • Interactions between actors • Relationships (1 subject, 1..n objects, 1 relation type) • are dependencies between interactions (e.g. cause and effect) • describe internal, otherwise hidden functionality of actors • Actor States • are assertions about the internal states of actors • Interaction Records • complete documentation of an interaction through assertions about all influencing incidents and dependencies • Tracers • unique markers that identify individual workflow executions • distributed along message paths
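To make the model concepts concrete, here is a minimal sketch of them as Python data classes. All class and field names are hypothetical and do not reproduce PReServ's schema; only the cardinality rule for relationships (one subject, one or more objects, one relation type) is taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of the model concepts; NOT the PReServ schema.


@dataclass(frozen=True)
class Relationship:
    """Dependency between interactions: 1 subject, 1..n objects, 1 relation type."""
    subject: str
    objects: Tuple[str, ...]
    relation_type: str

    def __post_init__(self) -> None:
        # Enforce the 1..n cardinality of objects.
        if not self.objects:
            raise ValueError("a relationship needs at least one object")


@dataclass(frozen=True)
class ActorState:
    """Assertion about an actor's internal state."""
    actor: str
    content: str


@dataclass
class InteractionRecord:
    """All assertions documenting one interaction, tagged with its tracer."""
    interaction_key: str
    tracer: str  # unique marker of the workflow execution this interaction belongs to
    actor_states: List[ActorState] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)
```

Making `Relationship` immutable keeps an asserted dependency from being altered after it has been recorded, which matches the idea that p-assertions document what happened.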
Main System Concepts of the P-System • Distribution • Several connected P-Stores • differentiation of asserter views
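Since each party to an interaction can assert its own view, and the recording example passes the view kind "isSender", the documentation of one interaction may be spread across several P-stores. The sketch below groups such assertions by interaction and view; the tuple layout, the "isReceiver" counterpart and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical assertion tuple: (store_url, interaction_key, asserter, view_kind)
Assertion = Tuple[str, str, str, str]


def views_by_interaction(assertions: Iterable[Assertion]) -> Dict[str, Dict[str, Set[str]]]:
    """Group assertions from several P-stores by interaction key and view kind.

    Returns {interaction_key: {view_kind: {asserter, ...}}}.
    """
    views: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for _store, key, asserter, view_kind in assertions:
        views[key][view_kind].add(asserter)
    return views


def fully_documented(views: Dict[str, Dict[str, Set[str]]], key: str) -> bool:
    """True if both the sender's and the receiver's view of an interaction exist."""
    kinds = views.get(key, {})  # .get does not create entries in the defaultdict
    return "isSender" in kinds and "isReceiver" in kinds
```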
"A Python Library for Provenance Recording and Querying“(Roland Gude, Carsten Bochner)
A Python Library for Provenance Recording and Querying • Open Source: http://sourceforge.net/projects/provenance-csl/ • Purpose • easy Provenance recording and querying for Python applications or applications with interface to Python • independent of Java on the client side • Examples for • Initialization • Recording • Querying • Extending own types
Code Example: Initialization

    from provenance.api import *

• looks like bad coding style at first
• but automatic lazy loading of required modules prevents severe performance losses

    cl = client.Client("http://localhost:8080", asserter="me")

• That's it!
• A trace file can be specified to log communication with the P-Store
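The slide attributes the low cost of `from provenance.api import *` to lazy loading, which the library implements with PEAK. As a swapped-in illustration of the same idea, not the library's code, the standard recipe around `importlib.util.LazyLoader` (Python 3.5+) creates the module object immediately but runs the module's code only on first attribute access; `lazy_import` is a hypothetical helper name.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring execution of its code until first attribute access."""
    if name in sys.modules:  # already loaded: nothing left to defer
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"no module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # installs the lazy hook; the module body runs later
    return module


colorsys = lazy_import("colorsys")  # cheap: nothing from colorsys has executed yet
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the real load
```

Returning early for modules already in `sys.modules` avoids rewriting the spec of a module that is already in use.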
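Code Example: Recording

    # a_content_0, m_content_0, xml_content_0, doc_style, pAssID, rel_type,
    # sink and source are defined elsewhere in the calling code
    subj = utils.createSubjectId(1, "dataAccessor", "parametername")
    objlist = [utils.createObjectId(
        utils.createInteractionKey("http://sink", "http://source"),
        pAssID, 'anything', 'dataAccessor', 'parameter', 'isSender')]
    # one interaction record: an actor state, a relationship and interaction content
    keys, response = self.cl.record([
        [utils.createActorState(a_content_0, doc_style),
         utils.createRelationship(subj, rel_type, objlist),
         utils.createInteraction(m_content_0, doc_style),
         utils.createInteraction(xml_content_0)]
    ], "isSender", sink, source)
    # adapt the raw SOAP response to the record-acknowledgement interface
    res = interfaces.IRecordAck(response)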
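`record` appears to take a list of interaction records, each itself a list of p-assertions, plus the view kind and the sink and source endpoints. The sketch below shows that nesting serialized with `xml.etree.ElementTree`; the element and attribute names are invented for illustration and are not the PReServ wire format, and `build_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical (kind, content) pairs standing in for the p-assertions built with
# utils.createActorState / createRelationship / createInteraction.
PAssertion = Tuple[str, str]


def build_record(records: List[List[PAssertion]], view_kind: str,
                 sink: str, source: str) -> ET.Element:
    """Assemble interaction records into one XML element (illustrative schema only)."""
    root = ET.Element("record", viewKind=view_kind, sink=sink, source=source)
    for assertions in records:
        rec = ET.SubElement(root, "interactionRecord")
        for kind, content in assertions:
            ET.SubElement(rec, kind).text = content
    return root


msg = build_record(
    [[("actorState", "s0"), ("relationship", "r0"), ("interaction", "m0")]],
    "isSender", "http://sink", "http://source",
)
print(ET.tostring(msg, encoding="unicode"))
```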
Code Example: Querying

    queryString = "for $n in $ps:pstruct return $n"
    response = self.cl.query(queryString)
    result = interfaces.IQueryAck(response)

• Afterwards, "result" contains an XML structure holding all "pstructs" available in that store.
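Once the query result is available as XML, the individual pstructs can be walked with the standard library. In this sketch the namespace URI bound to the `ps` prefix and the sample document are made up; the real P-store namespace and result layout may differ.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Hypothetical namespace URI bound to the "ps" prefix; the real one may differ.
PS_NS = "urn:example:pstruct"


def pstructs(result_xml: str) -> Iterator[ET.Element]:
    """Yield every ps:pstruct element found anywhere in a query result document."""
    root = ET.fromstring(result_xml)
    yield from root.iter(f"{{{PS_NS}}}pstruct")


# Made-up result document, for illustration only.
sample = (
    '<result xmlns:ps="urn:example:pstruct">'
    '<ps:pstruct id="1"/><ps:pstruct id="2"/>'
    "</result>"
)
ids = [p.get("id") for p in pstructs(sample)]
```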
Architecture • SOAP interface translated from WSDL by ZSI • pyProtocols • Python lacks the OO concept of "interfaces" • pyProtocols allows protocol definitions and automatic adaptation • used to make the SOAP interface transparent to the user • Lazy loading • PEAK framework
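PyProtocols is a Python 2-era library. As a modern stand-in for the same idea (an explicit interface plus automatic adaptation), the sketch below uses the standard `abc` module and a small hand-written adapter registry. `register_adapter`, `adapt` and `StringAddress` are hypothetical names, not PyProtocols API.

```python
from abc import ABC, abstractmethod


class IAddress(ABC):
    """Interface: anything that can present itself as an address string."""

    @abstractmethod
    def getAsString(self) -> str: ...


_ADAPTERS = {}  # (source type, protocol) -> adapter factory


def register_adapter(source_type, protocol, factory):
    _ADAPTERS[(source_type, protocol)] = factory


def adapt(obj, protocol):
    """Return obj if it already provides protocol, else wrap it with a registered adapter."""
    if isinstance(obj, protocol):
        return obj
    for cls in type(obj).__mro__:  # most specific registered type wins
        factory = _ADAPTERS.get((cls, protocol))
        if factory is not None:
            return factory(obj)
    raise TypeError(f"cannot adapt {type(obj).__name__} to {protocol.__name__}")


class StringAddress(IAddress):
    def __init__(self, value):
        self._value = str(value)

    def getAsString(self):
        return self._value


register_adapter(str, IAddress, StringAddress)
```

Walking the MRO means an adapter registered for a base type also serves its subclasses, roughly what PyProtocols' automatic adaptation offers.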
Code Example: Extending Types

    class IAddress(IZSITypeCode):
        """Interface for string typecodes."""
        def getAsString(self):
            """Returns a string with the value of the string-like object."""

    # protocol covering all string types (Python 2: basestring)
    IString = protocols.protocolForType(basestring, [])

    class AddressAdapter(object):
        # declare: instances provide IAddress; the class adapts anything providing IString
        protocols.advise(instancesProvide=[IAddress],
                         asAdapterForProtocols=[IString])

        def __init__(self, string):
            self._delegate = serverAPI.Address(str(string))

        def getAsString(self):
            return str(self._delegate)

        def toTypeCode(self):
            return self._delegate
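The adapter above registers `AddressAdapter` for every string type, so callers can pass plain strings wherever an `IAddress` is expected. A comparable effect can be sketched with the standard library's `functools.singledispatch`, which selects a conversion by argument type. `Address` here is a hypothetical stand-in for `serverAPI.Address`, and `to_address` is an illustrative name.

```python
from functools import singledispatch


class Address:
    """Hypothetical stand-in for a server-side typecode such as serverAPI.Address."""

    def __init__(self, value: str):
        self.value = value

    def __str__(self):
        return self.value


@singledispatch
def to_address(obj):
    raise TypeError(f"no address conversion for {type(obj).__name__}")


@to_address.register
def _(obj: Address):
    return obj  # already the target type


@to_address.register
def _(obj: str):
    return Address(obj)  # plain strings are wrapped transparently
```

An API function can then call `to_address(sink)` on its parameters, so users never construct typecode objects themselves.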
Requirements for a Provenance Visualization Panel (Markus Kunde, Henning Bergmeyer)
Motivation • Determine requirements for a Provenance visualization panel • Requirement to document Provenance in our projects (e.g. AeroGrid) • No specification yet for the concrete use of the documented provenance • => A tool, at least for general browsing of low-level documentation, is needed • Raw provenance data in XML is hard to browse • Verification of records • Experimental browsing to determine better query and interpretation methods • The panel provided by the project "Grid Provenance" is not suitable
Approach • Identify user groups • User interests (What do they want to explore?) • User intentions (Why do they want to explore it?) • Analyse the Provenance data structure • Elements • Properties • Connections • Scale • Determine visualization and analysis methods • What information to show • Where to show it • When, for how long, static or animated • Clear and consistent semantics for visual elements • Determine exploration strategy
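The data-structure analysis step (elements, connections, scale) can be sketched as a small profiling function over relationship triples. The `(subject, relation_type, object)` layout and the chosen statistics are assumptions, meant only to show what such a first inspection might compute.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical triple layout: (subject, relation_type, object)
Triple = Tuple[str, str, str]


def profile(triples: Iterable[Triple]) -> dict:
    """Summarize elements, connection types and scale of a provenance graph."""
    triples = list(triples)
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    out_degree = Counter(s for s, _, _ in triples)
    return {
        "elements": len(nodes),
        "connections": len(triples),
        "relation_types": Counter(r for _, r, _ in triples),
        "max_out_degree": max(out_degree.values(), default=0),
    }
```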
Identifying User Groups • Interest / Scope • What documentation is asked for? • What documentation is a user allowed to see? • Defines the upper (high-level) abstraction boundary: range of access • Intention • Why is that documentation asked for? • Defines the lower (low-level) abstraction boundary: type and level of detail of the required documentation
Identified User Groups • General User • Scientist, Engineer, Portal User • Interest: own work, own results, origin of used data • Intentions: reliability and authenticity of results, reproducibility • Designer • Software Engineer, Workflow Developer • Interest: project-related, all origins, monitored system, partner-made components • Intentions: workflow behavior, service interaction, product evolution • Manager • Workflow Provider, Provenance Analyst, User Support • Interest: all assigned user and system Provenance • Intentions: correctness of services, interpretation support, quality of the P-system • Administrator • Developer / Admin of the Provenance System • Interest: all P-data available in connected P-stores • Intentions: building the P-system and maintaining its function
User Analysis: Intentions (purpose; what is examined) • Process: evaluation of a workflow's approach; actors, interactions, sequence of process steps • Results: quality of intermediate and end results of processes; dependencies between inputs and outcome • Relationship: analysis of the evolution of data; relationships of interactions or actors • Time Line: finding performance bottlenecks, improving workflows; evolution of results, actor behavior • Participation: trust in results; participating actors • Comparison: validating the correctness of processes and results by comparing documented executions with reference structures; processes, views on interactions, results • Interpretation: custom visualization requirements, deriving knowledge from Provenance data; custom, probably all aspects • => Exploration required
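For the Time Line intention, a first cut at finding a performance bottleneck is to sum the documented duration of each actor's steps. The `(actor, start, end)` input format and the actor names are hypothetical; real provenance records would first have to be mapped onto it.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (actor, start_time, end_time) for each documented process step.
Step = Tuple[str, float, float]


def bottleneck(steps: List[Step]) -> Tuple[str, float]:
    """Return the actor with the largest total time spent, and that time.

    Raises ValueError if `steps` is empty.
    """
    totals: Dict[str, float] = {}
    for actor, start, end in steps:
        totals[actor] = totals.get(actor, 0.0) + (end - start)
    actor = max(totals, key=totals.get)
    return actor, totals[actor]
```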
Exploration • Difficulty in a large-scale graphical exploration system: • Where to start? • Begin with an overview • Select processes, interaction channels or actors • Fade out the rest and choose specific detail visualizations • Read application-specific content
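"Select, then fade out the rest" corresponds to a focus-plus-context view. A minimal sketch: keep every node within a given number of hops of the selected actor (breadth-first search over undirected edges) and fade out everything else. The edge-list format and `focus_set` are illustrative assumptions.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def focus_set(edges: Iterable[Tuple[str, str]], selected: str, hops: int = 1) -> Set[str]:
    """Nodes within `hops` undirected steps of `selected`; all others would be faded out."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {selected}
    queue = deque([(selected, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == hops:  # do not expand beyond the focus radius
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen
```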
Focus on Interaction • Process Map (inspired by tube maps) • Processes • Participating actors • Bottlenecks • Interaction Stretch • Individual interactions • Relationships and order
Combined Flow-Chart • Typical data flow graph • Shows directions of message flows • No notion of time => requires prior selection of a recorded process • System / Process Context
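A combined flow chart drops the time dimension and merges repeated message flows into single directed edges. The sketch below emits such a graph in Graphviz DOT syntax; the `(timestamp, sender, receiver)` tuples are an assumed input format, and actor names are assumed not to contain double quotes.

```python
from typing import Iterable, Tuple

# Hypothetical interaction tuple: (timestamp, sender, receiver)
Interaction = Tuple[float, str, str]


def flow_chart_dot(interactions: Iterable[Interaction]) -> str:
    """Collapse interactions into a directed data-flow graph in Graphviz DOT syntax.

    Timestamps are dropped and repeated message flows become one edge,
    so the chart shows direction but no notion of time.
    """
    edges = sorted({(sender, receiver) for _, sender, receiver in interactions})
    lines = ["digraph flow {"]
    lines += [f'  "{s}" -> "{r}";' for s, r in edges]
    lines.append("}")
    return "\n".join(lines)
```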
Process Aerial • Find individual executions of selected processes • Find anomalies • Show only interesting actor states and relationships • Scroll up and down along the time axis
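Because tracers mark individual workflow executions, a Process Aerial view could group interactions by tracer and flag runs that deviate from the usual pattern. This sketch uses "differs from the most common actor sequence" as its anomaly criterion, which is an assumption rather than a requirement from the slide; the `(tracer, timestamp, actor)` input is also hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event tuple: (tracer, timestamp, actor)
Event = Tuple[str, float, str]


def executions(events: Iterable[Event]) -> Dict[str, Tuple[str, ...]]:
    """Group events by tracer into per-execution actor sequences, ordered by time."""
    by_tracer: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for tracer, ts, actor in events:
        by_tracer[tracer].append((ts, actor))
    return {t: tuple(a for _, a in sorted(evts)) for t, evts in by_tracer.items()}


def anomalies(runs: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Tracers whose actor sequence differs from the most common one."""
    if not runs:
        return []
    typical, _ = Counter(runs.values()).most_common(1)[0]
    return sorted(t for t, seq in runs.items() if seq != typical)
```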
Graphical and Exploration Requirements • distinct, consistent representations of documentation elements to allow intuitive interpretation • extensible support of different layout methods • adjustable alignment to aid interpretation • switching of scope and detail • proxy displays for large data sets • e.g. navigation maps • mixing and migrating of layouts (animated)
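One simple form of proxy display is a navigation strip that shows only how many interactions fall into each time slice. The bucketing below is a sketch of that idea; `navigation_map` and its parameters are illustrative.

```python
from typing import Iterable, List


def navigation_map(timestamps: Iterable[float], start: float, end: float, buckets: int) -> List[int]:
    """Downsample interaction timestamps into `buckets` counts for a compact overview strip."""
    if buckets <= 0 or end <= start:
        raise ValueError("need buckets > 0 and end > start")
    width = (end - start) / buckets
    counts = [0] * buckets
    for ts in timestamps:
        if start <= ts <= end:  # timestamps outside the window are ignored
            i = min(int((ts - start) / width), buckets - 1)  # ts == end goes in the last bucket
            counts[i] += 1
    return counts
```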
Architectural Requirements • support of VO management • store access • actor/asserter views • caching and merging of query results • extensible architecture • layout methods • element representations • exploration methods • "content" support • GUI abstraction • Web Portals • Desktop Applications
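Caching and merging of query results across several P-stores can be sketched with `functools.lru_cache` and a merge keyed on record identity. The `fetch` callable, the `(record_key, payload)` items and the "first store wins" merge rule are assumptions; a real panel would need a policy for conflicting assertions.

```python
from functools import lru_cache
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical result item: (record_key, payload)
Item = Tuple[str, str]


def make_cached_query(fetch: Callable[[str, str], Iterable[Item]]):
    """Wrap a store query function so identical (store, query) pairs hit the cache."""
    @lru_cache(maxsize=256)
    def cached(store: str, query: str) -> Tuple[Item, ...]:
        return tuple(fetch(store, query))  # tuples are immutable, safe to share from the cache
    return cached


def merged(cached_query, stores: Iterable[str], query: str) -> Dict[str, str]:
    """Query several P-stores and merge results, keeping the first payload per record key."""
    result: Dict[str, str] = {}
    for store in stores:
        for key, payload in cached_query(store, query):
            result.setdefault(key, payload)
    return result
```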