Pages
A File System based eScience Workbench
Roger Menday
Jülich Supercomputing Centre, Germany

Pages is …
… a Problem Solving Environment
… inspired by UNICORE experience
… acknowledging that it is difficult to beat the flexibility of doing things from the command line
… reflecting common usage patterns
… following ‘convention over configuration’
… a unified description, execution and results
… doing some challenging workflow tasks quite naturally and doing some easy things ‘quirkily’
… not perfect for all cases

‘Tinkering’
• An important part of the eScience process
• Not always possible to specify everything up front
• Iterative process
• Want to react to changes
• Start before knowing where it will end up …
• Purely user driven workflow is useful
• So is partially assisted workflow

Define → Submit → Execute → Retrieve [diagram label: parse]
This parsing stage is the one which leads to static-ness. => Must eliminate it!

Define → Submit → Execute → Retrieve [diagram label: parse]
Eliminated! (But what does this unified workflow look like?)

Define → Submit → Execute → Retrieve [diagram label: Enhance / Redefine]
Now able to directly interact with the workflow during all stages of the workflow, a bit like a spreadsheet.

Strategy
• The working environment of the eScientist is the file-system
• Through following conventions and using simple command-line tools, a workflow environment is provided by <pages> reading/writing the file-system in harmony with the user reading/writing the file-system

API
u.follow('.containsjobs.containsjob') { |i| i.delete }
Deletes all jobs across my Grid.
u.follow(".containsstorage[name='home']") { |i|
  i.grind(3, '*.mp3') { |f| f.backup( …dest… ) }
}
Finds all mp3 files in all home storages (up to a depth of 3 in the hierarchy), then backs them up.

Side Note
Such an API works best with a unified model of the Grid (a graph of interlinked resources) and a uniform way of addressing these resources and interacting with them.
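
To make the idea of a graph of interlinked resources concrete, here is a minimal in-memory sketch of how a `follow` call could walk such a graph. The `Resource` class, its `link` method and the path parser are hypothetical and written only for illustration; they are not the actual Pages or UNICORE API, and only the path syntax is taken from the slide.

```ruby
# Hypothetical in-memory model (an assumption, not the real API):
# each resource has a name and named links to other resources.
class Resource
  attr_reader :name, :links

  def initialize(name)
    @name = name
    @links = Hash.new { |hash, key| hash[key] = [] }
  end

  # Add a link of the given relation (e.g. :containsjob) to a target.
  def link(relation, target)
    @links[relation.to_sym] << target
    self
  end

  # Walk a dotted path such as ".containsjobs.containsjob" or
  # ".containsstorage[name='home']". Yields every resource reached
  # (if a block is given) and returns them as an array.
  def follow(path, &block)
    steps = path.split('.').reject(&:empty?)
    found = steps.reduce([self]) do |current, step|
      relation, wanted = parse_step(step)
      reached = current.flat_map { |r| r.links.fetch(relation, []) }
      wanted ? reached.select { |r| r.name == wanted } : reached
    end
    found.each(&block) if block
    found
  end

  private

  # Split "relation[name='x']" into [:relation, "x"]; the filter is optional.
  def parse_step(step)
    match = step.match(/\A(\w+)(?:\[name=['"]([^'"]+)['"]\])?\z/)
    raise ArgumentError, "bad path step: #{step}" unless match

    [match[1].to_sym, match[2]]
  end
end

# Usage: build a tiny graph and address resources uniformly by path.
grid_user = Resource.new('user')
job_list = Resource.new('jobs')
grid_user.link(:containsjobs, job_list)
job_list.link(:containsjob, Resource.new('job-1'))
grid_user.link(:containsstorage, Resource.new('home'))
grid_user.follow(".containsstorage[name='home']") { |s| puts s.name }
```

Because every resource is reached through the same path mechanism, operations such as delete or backup only need to be defined on the resources themselves, not on the traversal.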

Command line … mirroring the API
gak follow .containsjobs.containsjob | gak delete
gak follow ".containsstorage[name='home']" | gak grind -d 3 '*.mp3' | gak backup -f storage

Bench, Book, Chapter, Page
• books on a bench … a book for each research interest (?)
• a book has a number of chapters
• chapters consist of pages
In addition, it would be nice to share one or more of my books with my colleagues …

Start from the command line (or using API)
• Initialisation of a ‘long running task’ recorded
• … in the file-system, at your PWD
• Each task is a page
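
As a rough illustration of recording a task as a page in the current directory, the sketch below writes one small file per task. The slides do not say what a page looks like on disk, so the `page-NNN.json` file name, the JSON fields and the `record_page` helper are all assumptions made for this example.

```ruby
require 'json'
require 'time'

# Record a task as a new page file in `chapter_dir` (defaults to the PWD).
# File name pattern and JSON fields are illustrative assumptions only.
def record_page(command, chapter_dir: Dir.pwd, now: Time.now)
  # Number the page after the pages already present in this directory.
  existing = Dir.glob(File.join(chapter_dir, 'page-*.json')).size
  path = File.join(chapter_dir, format('page-%03d.json', existing + 1))
  page = {
    'command' => command,
    'state'   => 'NEW',
    'created' => now.utc.iso8601
  }
  File.write(path, JSON.pretty_generate(page))
  path
end
```

Keeping the record as a plain file means the user can inspect or edit it with ordinary command-line tools, in line with the file-system strategy described earlier.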

A collection of pages in a chapter
• Directory is a chapter
• Express dependencies between chapters
• Directed Acyclic Graph
• Dependencies recorded in a file for each chapter
• A triggered chapter will trigger its contained pages (the book analogy breaks a little here)
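
The chapter dependencies above form a directed acyclic graph, so a valid processing order can be found with a topological sort. The sketch below uses Ruby's standard `TSort` module. The per-chapter file name `depends` and its one-chapter-per-line format are assumptions, since the slides only say that dependencies are recorded in a file for each chapter.

```ruby
require 'tsort'

# Hypothetical convention: a chapter directory may contain a text file
# named "depends" listing, one per line, the chapters it depends on.
DEPENDS_FILE = 'depends'

# Build {chapter => [prerequisite chapters]} from the directories of a book.
def read_dependencies(book_dir)
  chapters = Dir.children(book_dir)
                .select { |entry| File.directory?(File.join(book_dir, entry)) }
                .sort
  chapters.to_h do |chapter|
    file = File.join(book_dir, chapter, DEPENDS_FILE)
    deps = if File.exist?(file)
             File.readlines(file, chomp: true).map(&:strip).reject(&:empty?)
           else
             []
           end
    [chapter, deps]
  end
end

# Return chapters ordered so that every prerequisite comes first.
# Raises TSort::Cyclic if the dependencies are not acyclic.
def chapter_order(deps)
  each_node = ->(&blk) { deps.each_key(&blk) }
  each_child = ->(chapter, &blk) { deps.fetch(chapter, []).each(&blk) }
  TSort.tsort(each_node, each_child)
end
```

With a map such as `{ 'ch6' => ['ch2', 'ch3'], 'ch2' => [], 'ch3' => [] }`, `chapter_order` places `ch2` and `ch3` before `ch6`, and a circular dependency is rejected rather than silently accepted.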

The workbench will be monitored
• For each page, scripts are invoked as lifecycle events are reached (BEFORE, AFTER)
• Similarly for each chapter, scripts are invoked (STARTED, READY, DONE)
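
One possible way to wire up such lifecycle scripts is sketched below. The event names come from the slide; the `hooks/` subdirectory, the lower-case script names, the use of `sh` to run them and the `fire_event` helper are assumptions chosen for illustration.

```ruby
# Event names from the slides; everything else here is a hypothetical layout.
PAGE_EVENTS = %w[BEFORE AFTER].freeze
CHAPTER_EVENTS = %w[STARTED READY DONE].freeze

# Run the shell script hooks/<event> inside `dir`, if it exists.
# Returns true or false for the script's exit status, or nil when no
# hook is defined (or when `sh` itself could not be started).
def fire_event(dir, event)
  known = PAGE_EVENTS + CHAPTER_EVENTS
  raise ArgumentError, "unknown event: #{event}" unless known.include?(event)

  script = File.expand_path(File.join(dir, 'hooks', event.downcase))
  return nil unless File.file?(script)

  # Run the hook with the page or chapter directory as working directory.
  system('sh', script, chdir: dir)
end
```

A monitoring loop could call `fire_event(page_dir, 'BEFORE')` when a page starts and `fire_event(page_dir, 'AFTER')` when it finishes, which lets users insert new behaviour simply by dropping a script into the directory.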

Dependencies
Explicit
• … expression of dependencies
• i.e. between chapters there is an explicit workflow: ‘please read chapters 2 & 3 before reading chapter 6’
Implicit
• Implied processing order for the individual pages within a single chapter

Assistance at the command line
pages graph • draws graph of current workbook
pages contents • lists all chapters
pages link ‘n’ • links chapters
pages clone • clones current book
pages book • lists all books / creates new one
pages www • publishes workbench as website
pages facebook • publishes workbench as facebook website

Finally
• Package workflow as zip/tar files
• Behaviour via file-system/command line/API
• Insert new behaviour through lifecycle scripts
• Hold, pause, request input
• Share results (selectively) to web
• As much (or as little) automation as you need