New Developments in Kepler
January 23, 2006
Ilkay Altintas
Kepler System Architecture
[Architecture diagram; component labels: Authentication; GUI; Kepler GUI Extensions; Vergil; Documentation; Smart Re-run / Failure Recovery; Provenance Framework; Kepler Object Manager; SMS; Type System Ext; Actor & Data SEARCH; Kepler Core Extensions; Ptolemy]
Joint Authentication Framework
• Requirements:
  • Coordinating between the different security architectures
    • GEON uses GAMA, which requires a single certificate authority.
    • SEEK uses LDAP, which has a centralized certificate authority with distributed subordinate CAs.
    • To connect LDAP with GAMA
  • Coordinating between 2 different GAMA servers
  • Single sign-on/authentication at the initialize step of the run for multiple actors that are using authentication
    • This raises issues related to a single GAMA repository vs. multiple repositories, and requires users to have accounts on all servers.
  • Kepler needs to be able to handle expired certificates for long-running workflows and/or for users who use it for a long time.
  • A trust relation between the different GAMA servers must be established in order to allow for single authentication.
Functional Prototype Completed
• APIs and test cases in place
• More work required on certificate renewal and multiple server access
Vergil is the GUI for Kepler
• Actor ontology and semantic search for actors
• Search -> Drag and drop -> Link via ports
• Metadata-based search for datasets
[Screenshot callouts: Actor Search; Data Search]
Actor Search
• Challenges:
  • Building/searching a repository …
  • Making changes to MoML (see KAR)
  • GUI changes
  • Ontology management
• Kepler Actor Ontology
  • Used in searching actors and creating conceptual views (= folders)
  • Currently 160 Kepler actors added!
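The idea of searching actors through an ontology and grouping them into conceptual views can be sketched as follows. This is a minimal illustration, not Kepler's implementation: the class hierarchy and most actor names below are made up for the example (only `ArithmeticMathOperationActor` and "Multiply or Divide" appear in these slides), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mini-ontology: child class -> parent class.
# Only ArithmeticMathOperationActor comes from the slides; the rest is invented.
SUBCLASS_OF = {
    "ArithmeticMathOperationActor": "MathOperationActor",
    "StatisticalOperationActor": "MathOperationActor",
    "MathOperationActor": "Actor",
    "DataSourceActor": "Actor",
}

# Hypothetical semantic annotations: actor name -> ontology class.
ANNOTATIONS = {
    "Multiply or Divide": "ArithmeticMathOperationActor",
    "Add or Subtract": "ArithmeticMathOperationActor",
    "Average": "StatisticalOperationActor",
    "File Reader": "DataSourceActor",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search_actors(concept):
    """Return sorted names of actors annotated with concept or any subclass."""
    return sorted(name for name, cls in ANNOTATIONS.items() if is_a(cls, concept))


def conceptual_view():
    """Group actors into folders keyed by their ontology class."""
    folders = defaultdict(list)
    for name, cls in sorted(ANNOTATIONS.items()):
        folders[cls].append(name)
    return dict(folders)
```

Searching for a general concept such as `MathOperationActor` returns actors annotated with any of its subclasses, which is what distinguishes a semantic search from a plain name match.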
Data Search and Usage of Results
• Kepler DataGrid
  • Discovery of data resources through local and remote services
    • SRB
    • Grid and Web Services
    • DB connections
  • Registry of datasets on the fly using workflows
Vergil Updates
• To make it more useful to the user
  • Updated actor icons
  • Menu redesign
  • Improve readability
  • Develop cohesive visual language
  • Follow standard HF principles
  • Improve organization
[Icon labels shown: Composite; DB Query; Computation or Operation; Transformation; Filter; File Operation; Web Service]
Kepler Archives
• Purpose: Encapsulate WF data and actors in an archive file
  • … inlined or by reference
  • … version control
• More robust workflow exchange
• Easy management of semantic annotations
• Plug-in architecture (Drop in and use)
• Easy documentation updates
• A jar-like archive file (.kar) including a manifest
• All entities have unique ids (LSID)
• Custom object manager and class loader
• UI and API to create, define, search and load .kar files
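To make "a jar-like archive file including a manifest" concrete, here is a rough sketch that packs entries into a zip container and records each entry's LSID in a manifest. The manifest layout (a `Name:`/`lsid:` pair per entry) and the function names are assumptions for illustration; the actual .kar format may differ.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"  # jar convention; assumed to apply here


def build_kar(entries):
    """Pack entries into an in-memory zip archive.

    entries maps an archive path to a (lsid, content_bytes) tuple.
    The per-entry manifest fields are a hypothetical layout.
    """
    lines = ["Manifest-Version: 1.0", ""]
    for path, (lsid, _) in entries.items():
        lines += [f"Name: {path}", f"lsid: {lsid}", ""]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_PATH, "\n".join(lines))
        for path, (_, content) in entries.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def list_lsids(kar_bytes):
    """Read the manifest back and return a {path: lsid} mapping."""
    with zipfile.ZipFile(io.BytesIO(kar_bytes)) as archive:
        manifest = archive.read(MANIFEST_PATH).decode("utf-8")
    result, current = {}, None
    for line in manifest.splitlines():
        if line.startswith("Name: "):
            current = line[len("Name: "):]
        elif line.startswith("lsid: ") and current is not None:
            result[current] = line[len("lsid: "):]
    return result
```

Keeping the LSIDs in the manifest means a tool can index an archive's contents without unpacking every entry.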
KAR File Example

<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="documentation" class="org.kepler.moml.DocumentationAttribute"></property>
  <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute">
    <property name="id" value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="divide" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>
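As an illustration of what a tool might read out of such an entity description, the sketch below parses an abridged copy of the XML with Python's standard `xml.etree.ElementTree` and collects the LSID, the implementing class, the ports, and the semantic types. The function name and the shape of the returned dictionary are hypothetical; this is not Kepler's own loader.

```python
import xml.etree.ElementTree as ET

PORT_CLASS = "org.kepler.moml.PortAttribute"
SEMANTIC_CLASS = "org.kepler.sms.SemanticType"

# Abridged copy of the slide's example (the divide port and dataType are omitted).
SAMPLE = """<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute"/>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>"""


def summarize_entity(xml_text):
    """Extract id, implementing class, ports and semantic types from an entity."""
    root = ET.fromstring(xml_text)
    top_level = root.findall("property")  # direct children only
    by_name = {prop.get("name"): prop for prop in top_level}
    ports = {
        prop.get("name"): {
            child.get("name"): child.get("value") for child in prop.findall("property")
        }
        for prop in top_level
        if prop.get("class") == PORT_CLASS
    }
    semantic_types = [
        prop.get("value") for prop in top_level if prop.get("class") == SEMANTIC_CLASS
    ]
    return {
        "name": root.get("name"),
        "lsid": by_name["entityId"].get("value"),
        "java_class": by_name["class"].get("value"),
        "ports": ports,
        "semantic_types": semantic_types,
    }
```

Calling `summarize_entity(SAMPLE)` yields the actor's LSID and a per-port dictionary of direction and multiport flags, which is the kind of information a search index or object manager could key on.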
Kepler Object Manager
• Designed to access local and distributed objects
• Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc., archived in .kar files
• Advantages:
  • Reduce the size of the Kepler distribution
    • Only ship the core set of generic actors and domains
  • Easy exchange of full or partial workflows for collaborations
  • Publish full workflows with their bound data
  • Becomes a provenance system for derived data objects
=> Separate the workflow repository and distributions easily
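One way to picture "access local and distributed objects" is a lookup that checks a local cache first and falls back to remote repositories, keeping what it fetches. The sketch below assumes objects are keyed by LSID and that each repository is a plain callable returning `None` on a miss; the class and method names are hypothetical and do not reflect the Kepler API.

```python
class ObjectManager:
    """Resolve objects by LSID: local cache first, then remote repositories."""

    def __init__(self, repositories):
        # Each repository is a callable taking an LSID and returning the
        # object, or None when it does not hold that LSID (assumed protocol).
        self._repositories = list(repositories)
        self._cache = {}

    def put(self, lsid, obj):
        """Register a locally available object."""
        self._cache[lsid] = obj

    def get(self, lsid):
        """Return the object for lsid, fetching and caching it if needed."""
        if lsid in self._cache:
            return self._cache[lsid]
        for fetch in self._repositories:
            obj = fetch(lsid)
            if obj is not None:
                self._cache[lsid] = obj  # later lookups stay local
                return obj
        raise KeyError(f"no repository holds {lsid}")
```

With this shape, a distribution only needs to ship a small core; anything else is pulled on first use and served from the cache afterwards.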
Initial Work on Provenance Framework
• Provenance
  • Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata…)
• Need for Provenance
  • Association of process and results
  • Reproduce results
  • “Explain & debug” results (via lineage tracing, parameter settings, …)
  • Optimize: “Smart Re-Runs”
• Types of Provenance Information:
  • Data provenance
    • Intermediate and end results, including files and DB references
  • Process (= workflow instance) provenance
    • Keep the WF definition with data and parameters used in the run
    • Error and execution logs
  • Workflow design provenance (quite different)
    • WF design is a (little supported) process (art, magic, …)
    • For free via CVS: edit history
    • Need more “structure” (e.g. templates) for individual & collaborative workflow design
Kepler Provenance Recording Utility
• Parametric and customizable
  • Different report formats
  • Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  • Multiple cache destinations
• Saves information on
  • User name, Date, Run, etc…
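A recorder with variable levels of detail and multiple destinations could look roughly like the sketch below. The slides name the four levels but do not define them, so the mapping from level to recorded event kinds is an assumption, as are the event kinds themselves and every name in the code.

```python
from datetime import datetime, timezone

# Assumed meaning of each level: which kinds of events get recorded.
LEVELS = {
    "on-error": {"error"},
    "medium": {"error", "run"},
    "verbose-some": {"error", "run", "actor"},
    "verbose-all": {"error", "run", "actor", "token"},
}


class ProvenanceRecorder:
    """Filter events by detail level and fan them out to every destination."""

    def __init__(self, user, run_id, level="medium", destinations=()):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self._user = user
        self._run_id = run_id
        self._kinds = LEVELS[level]
        # Each destination is a callable that accepts one entry dict.
        self._destinations = list(destinations)

    def record(self, kind, message):
        """Write the event if the level admits it; return whether it was written."""
        if kind not in self._kinds:
            return False
        entry = {
            "user": self._user,
            "run": self._run_id,
            "date": datetime.now(timezone.utc).isoformat(),
            "kind": kind,
            "message": message,
        }
        for write in self._destinations:
            write(entry)
        return True
```

Passing destinations as callables keeps the recorder independent of the report format: a list's `append`, a file writer, or a database insert can all be plugged in.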
Provenance: Possible Next Steps
• Provenance Meeting: Last week at SDSC
• Deciding on terms and definitions
• .kar file generation, registration and search for provenance information
• Possible data/metadata formats
• Automatic report generation from accumulated data
• A GUI to keep track of the changes
• Adding provenance repositories
• A relational schema for the provenance info in addition to the existing XML
What other system functions does provenance relate to?
• Failure recovery
• Smart re-runs
• Semantic extensions
• Kepler Data Grid
• Reporting and Documentation
• Authentication
• Data registration
[Callouts: Re-run only the updated/failed parts; Guided documentation generation and updates]
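"Re-run only the updated/failed parts" can be read as a reachability question on the workflow graph: everything downstream of a changed or failed actor must run again, and everything else can reuse recorded results. The sketch below assumes the workflow is given as a mapping from each actor to its downstream actors; the function name and the example actor names are hypothetical.

```python
from collections import deque


def actors_to_rerun(edges, dirty):
    """Return the set of actors that must execute again.

    edges maps each actor to the actors that consume its output.
    dirty lists the actors that were updated or that failed.
    """
    pending = deque(dirty)
    rerun = set()
    while pending:
        actor = pending.popleft()
        if actor in rerun:
            continue  # already scheduled via another path
        rerun.add(actor)
        pending.extend(edges.get(actor, ()))
    return rerun


# Hypothetical workflow: read -> filter -> {plot, stats} -> report
WORKFLOW = {
    "read": ["filter"],
    "filter": ["plot", "stats"],
    "plot": ["report"],
    "stats": ["report"],
}
```

For this workflow, a change to `stats` schedules only `stats` and `report`; the results of `read`, `filter`, and `plot` would come from recorded provenance instead of being recomputed.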
Hot Topics in Kepler http://kepler-project.org/Wiki.jsp?page=HotTopics