The Collage Authoring Environment: a Platform for Executable Publications

Piotr Nowakowski, Eryk Ciepiela, Tomasz Bartyński, Grzegorz Dyk, Daniel Harężlak, Marek Kasztelnik, Joanna Kocot, Maciej Malawski and Jan Meizner. ACC CYFRONET AGH, Kraków, Poland.

Presentation Transcript


  1. The Collage Authoring Environment: a Platform for Executable Publications
     Piotr Nowakowski, Eryk Ciepiela, Tomasz Bartyński, Grzegorz Dyk, Daniel Harężlak, Marek Kasztelnik, Joanna Kocot, Maciej Malawski and Jan Meizner
     ACC CYFRONET AGH, Kraków, Poland

  2. Presentation outline
     • Problem description
     • Outline of our solution
     • Collage from the end user’s perspective
     • Conducting computational experiments
     • Declaring executable content
     • Embedding executable content in a research paper
     • Publishing and accessing the paper
     • Some technical information
     • Discussion

  3. The gist of the problem
     • Modern computational science revolves around massive volumes of data and complex algorithms to process said data (case in point: a single proteomics study on which our team currently collaborates with the Jagiellonian University Medical College is expected to generate and reprocess 15 TB of data).
     • The traditional means of publishing scientific results, i.e. the research paper, is woefully incompatible with this type of research. It does not lend itself to publishing and sharing large volumes of data. Ultimately, the publication cannot stand on its own merits: there is no way to verify the published research based on the publication alone.
     Modern computational scientist: “Here’s what I found out: Protein folding conforms to Gauss’ ‘fuzzy oil drop’ model. Here’s how I figured it out: I have discovered a truly marvelous algorithm proving this, which this paper is too short to contain! So instead I’ll just say that I downloaded some data from PDB, wrote a bunch of Python scripts, set up a custom database and crunched the numbers. Here’s the Gnuplot diagram showing my results. By the way, I can’t give you my actual data (because there’s too much of it) or the application (because you won’t be able to install it), so I guess you’ll just have to trust me on this one…”
     Traditional researcher: “Here’s what I found out: $e^{i\pi} = -1$. Here’s how I figured it out: According to Euler [1], $e^{ix} = \cos x + i \sin x$. Since $\cos \pi = -1$ and $\sin \pi = 0$, it follows that $e^{i\pi} + 1 = 0$ and hence $e^{i\pi} = -1$.”

  4. Some observations…
     • Computational science often involves the generation of one-off applications and temporary data which is subsequently used to obtain publishable results.
     • Validating such software is a crucial part of ensuring that the reported results remain trustworthy.
     • However, computational scientists are not IT professionals. Producing publishable software involves great effort, which is not usually budgeted for in the course of scientific research (or indeed considered part of it).
     • Thus, the best-case scenario is that the IT tools used to generate scientific results remain unverifiable. The worst-case scenario is that they’re flawed and produce bogus results (which are, again, unverifiable in any meaningful way).
     Modern computational scientist: “Well, we have this Ruby application my grad students developed, but you don’t really expect me to write a user interface for it…?” “Here’s the list of libraries our software requires to work…” “Hmm, I didn’t expect the user could enter a negative value in this field…” “What’s a DDoS attack…?”

  5. So, what are we trying to accomplish?
     • The goal of Collage is to enable authors of scientific papers to embed executable content in their publications;
     • The environment is aimed at scientific disciplines which make heavy use of computational technologies (including molecular biology, genomics, virology etc.);
     • …however, the Collage platform is generic and may be adopted in any area of science where there is a need to conduct computations or browse large result spaces.

  6. Our concept in a nutshell
     • Collage works by allowing authors to embed pieces of interactive content (called assets) in online research publications;
     • Interactive content may directly exploit the code which was used to obtain the published results;
     • Publications can be viewed online, with interactive content available to authorized users (Collage manages user authorization and data encryption during transfer);
     • Execution of interactive code is performed by a dedicated computing backend, which can further delegate computations to HPC resources and data repositories;
     • Output can be updated automatically whenever the experiment is reenacted.
     Figure annotations: Collage supports graphical visualization of experiment results (diagrams, images etc.); access experiment code snippets and execute them on the fly; provide arbitrary input data using interactive forms; review results of computations (including images), automatically updated during execution.

  7. Collage from the end user’s perspective
     • Collage follows the standard research-publish-review model, well known to computational scientists;
     • A dedicated Experimentation UI (Web-based IDE) is presented to the researcher, enabling iterative development of experiments and providing access to computational resources;
     • Once completed, the experiment can be directly used to provide interactive content to the reader, via the separate Authoring UI;
     • Both UIs can be secured against unauthorized access, according to policies defined by the publisher. All data is transmitted securely, with the use of encrypted protocols.
     Diagram (roles: computational scientist, i.e. the publication author; reader, incl. reviewers):
     • Experimentation UI: iteratively develop experiments and perform computations; interface HPC resources; tag assets for publication
     • Authoring UI: prepare publications; embed interactive assets; authorize readers; display publications and mediate interactivity
     • Workflow steps: 1. Conduct research; 2. Publish results; 3. Review publication

  8. Collage servers and interfaces
     • Collage Server
       • Also called the experiment workbench server;
       • Acts as a gateway between the end user and the underlying computational resources (called experiment hosts);
       • Serves all dynamic content;
       • Controls execution of experiments;
       • Experiment developers are mapped to user accounts on the Collage Server;
     • Publisher Server
       • Serves the executable paper, which includes the framework of the publication and all of its static content;
       • Can be based on any Web authoring software, the only requirement being the ability to embed arbitrary HTML code in the document;
       • Follows a separate authorization policy.
     Figure labels: Experimentation UI; Authoring UI.

  9. The Experimentation UI
     • The Experimentation UI, based on the GridSpace Experiment Workbench, is a full-fledged IDE where experiments can be developed and executed with the use of a Web interface;
     • Each experiment consists of snippets, which can be expressed in any programming language supported by the experiment host;
     • The Workbench can be used to access and manage files stored in the developer’s home directory on the experiment host;
     • The UI provides facilities for sharing and embedding experiments, storing and accessing confidential data and declaring assets which can be embedded in the publication.
     Figure annotations: snippet management panel; interpreter selector; snippet code window; developer console; file management utilities; user account management.

  10. Writing experiments
     • Writing experiments is as simple as typing (or pasting) executable code in the Experiment Workbench editor, which is part of the Experimentation UI;
     • The Experiment Workbench server (Collage Server) can communicate with multiple experiment hosts. Depending on the configuration of the experiment host, a variety of interpreters are available, including general-purpose programming languages (Ruby, Python, Perl), shell scripting (including interactive shell sessions) and custom tools (such as Mathematica, Matlab etc.);
     • Any tool which offers a command-line interface can be used as a Collage interpreter. Additional interpreters are easy to set up, once they have been installed on the experiment host;
     • Snippets can be executed sequentially or individually, to support exploratory programming.
     Figure labels: Snippets #1 and #2; Snippet #3 (code); Snippets #4 and #5.
     Snippet management panel: select interpreter; manage assets and secrets; execute snippet; add/remove snippets; merge snippets.
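The idea that any command-line tool can act as an interpreter, and that snippets may run one at a time or in sequence, can be illustrated with a minimal Python sketch. It is not Collage's implementation: the `INTERPRETERS` table, the `run_snippet` and `run_experiment` helpers and the `snippet.src` file name are all hypothetical, chosen only to show the mechanism.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical interpreter registry: name -> command-line prefix.
# Illustrative only; this is not Collage's actual configuration format.
INTERPRETERS = {
    "python": [sys.executable],
    "bash": ["bash"],
}


def run_snippet(interpreter, code, workdir):
    """Write the snippet to a file in workdir and run it with the chosen tool."""
    script = Path(workdir) / "snippet.src"
    script.write_text(code, encoding="utf-8")
    return subprocess.run(
        INTERPRETERS[interpreter] + [str(script)],
        cwd=workdir,            # snippets share files through the working directory
        capture_output=True,
        text=True,
        check=False,
    )


def run_experiment(snippets, workdir):
    """Run (interpreter, code) pairs in order and stop at the first failure."""
    results = []
    for interpreter, code in snippets:
        result = run_snippet(interpreter, code, workdir)
        results.append(result)
        if result.returncode != 0:
            break
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        experiment = [
            ("python", "open('data.txt', 'w').write('41')"),
            ("python", "print(int(open('data.txt').read()) + 1)"),
        ]
        for res in run_experiment(experiment, tmp):
            print(res.returncode, res.stdout.strip())
```

Calling `run_snippet` directly corresponds to executing a single snippet, while `run_experiment` corresponds to sequential execution; adding an interpreter in this sketch amounts to adding one entry to the table.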

  11. Declaring assets
     • Assets are the primary mechanism by which a Collage publication can be enriched with interactive elements. Assets are meant to be embedded in HTML documents;
     • Each snippet may declare one or more assets, including input assets (required by the snippet to perform its calculations) and output assets (visualizations of output data). Each asset is mapped to a file on the Collage experiment host;
     • Assets can be reused – for instance, multiple snippets may rely on the same input asset, while an output asset of one snippet can serve as input for another snippet;
     • Declaring and managing assets has no impact on experiment code: Collage does not alter the syntax of the programming languages used to develop snippets.
     Figure annotations: assets already declared for this snippet; declaring a new asset (includes all assets already declared within the experiment).
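The relationship between snippets and assets described on this slide can be pictured as a small data model. The sketch below is an assumption made for illustration: the `Asset` and `Snippet` classes, the `consumers` and `producer` helpers and the file paths are hypothetical and do not come from Collage itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    name: str
    path: str  # file on the experiment host that backs the asset


@dataclass
class Snippet:
    name: str
    inputs: list = field(default_factory=list)   # assets the snippet reads
    outputs: list = field(default_factory=list)  # assets the snippet writes


def consumers(asset, snippets):
    """Names of the snippets that declare the asset as an input."""
    return [s.name for s in snippets if asset in s.inputs]


def producer(asset, snippets):
    """Name of the snippet that writes the asset, or None if a user supplies it."""
    for s in snippets:
        if asset in s.outputs:
            return s.name
    return None


if __name__ == "__main__":
    sequence = Asset("sequence", "input/sequence.fasta")
    alignment = Asset("alignment", "work/alignment.txt")
    plot = Asset("plot", "output/plot.png")

    experiment = [
        Snippet("align", inputs=[sequence], outputs=[alignment]),
        Snippet("stats", inputs=[sequence]),
        Snippet("draw", inputs=[alignment], outputs=[plot]),
    ]

    # One input asset shared by two snippets.
    print(consumers(sequence, experiment))
    # An output asset of one snippet serving as input for another.
    print(producer(alignment, experiment), "->", consumers(alignment, experiment))
```

Because the declarations live outside the snippet code, the snippets themselves stay ordinary scripts that read and write files, which matches the point that declaring assets does not alter the snippet language.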

  12. Types of Collage assets (1/2)
     • Master asset (1 per experiment)
       • Must be embedded in the Executable Paper in order to allow access to other assets;
       • Handles user login and authorizes access to interactive content.
     • Snippet assets (1 per snippet)
       • Contain snippet code and enable viewers to modify/execute this code on the Experiment Host;
       • Executing a snippet automatically updates all output assets which depend on that snippet;
       • Embedding snippet assets in Executable Papers is not mandatory (users may also invoke operations by manipulating input assets).

  13. Types of Collage assets (2/2)
     • Input assets (snippet-specific)
       • Provide input data for snippets, required to perform computations;
       • Embedding this type of asset in the Executable Paper enables the reader to feed custom data into the experiment;
       • In addition to being able to upload files to the experiment host, Collage also provides a convenient Web form mechanism through which input assets may request data in a user-friendly manner.
     • Output assets (snippet-specific)
       • Represent the results of computations performed by snippets;
       • Embedding this type of asset in the Executable Paper enables the reader to view and download experiment output;
       • Output assets are refreshed whenever the snippets on which they depend are executed by the reader.

  14. Publishing assets
     • The Experimentation UI provides a convenient mechanism by which assets can be embedded in an external publication (such as the Executable Paper);
     • For each asset, the UI generates suitable HTML embed code. Inserting this code into your publication enables it to visualize the selected asset;
     • The embed code may be customized (for instance, the author may change the default width and height of the asset);
     • While Collage comes with a preinstalled Authoring UI based on the WordPress CMS, any authoring software may be used to prepare executable papers – as long as it enables users to embed custom HTML code in their publications.
     Figure annotations: assets declared by this experiment (click asset to view its embed code); generate sample document with all assets; embed code for selected asset.

  15. Embedding assets – a detailed view
     • The asset embed code instructs the Publisher Server to inject an IFrame element into the document being generated;
     • The payload (content) of this element is served by the Collage Server – thus the publication becomes a Web mashup. In this way asset windows can access files and experiments stored on the Experiment Host;
     • Different management options are exposed by the IFrame, depending on the type of asset being visualized;
     • As IFrames may communicate with one another, it is possible to refresh output assets when the snippet upon which they are based finishes executing. This is handled automatically by the Collage Server.
     Figure annotations: IFrame widget; asset payload (served by the Collage Server via SSL); controls: Download, Upload, Open.
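To make the embedding mechanism more concrete, the following Python sketch generates an IFrame-based embed code with customizable width and height. The server address, the `/assets/<experiment>/<asset>` URL layout and the `embed_code` function are assumptions made for this example; the embed code actually produced by the Experimentation UI may look different.

```python
from html import escape
from urllib.parse import quote


def embed_code(server, experiment, asset, width=600, height=400):
    """Build an <iframe> tag whose payload would be served by the Collage Server.

    The URL layout below is a hypothetical placeholder, not Collage's real scheme.
    """
    src = "{}/assets/{}/{}".format(
        server.rstrip("/"),
        quote(experiment, safe=""),  # percent-encode path segments
        quote(asset, safe=""),
    )
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(src, quote=True), int(width), int(height)
    )


if __name__ == "__main__":
    # Default size, then an author-customized size for the same asset.
    print(embed_code("https://collage.example.org", "fuzzy-oil-drop", "plot.png"))
    print(embed_code("https://collage.example.org", "fuzzy-oil-drop", "plot.png",
                     width=800, height=300))
```

Any authoring tool that lets the author paste such a tag into the document would do, which is why the only stated requirement on the Publisher Server is support for arbitrary HTML.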

  16. Interacting with an Executable Paper – a detailed view (1/2)
     • The static content of the Executable Paper can be served by the Publisher Server without Collage Server involvement;
     • Dynamic content is served by the Collage Server directly (bypassing the Publisher Server);
     • Publisher and HPC provider roles are decoupled and follow mutually independent access policies (including authentication, authorization, accounting etc.). Access to static content is controlled by the Publisher Server while access to interactive elements requires a Collage Server account.
     Diagram (Reader, Publisher Server, Collage Server):
     1a. Reader navigates to the URL which houses the publication
     1b. Publisher Server displays the static content of the publication, with placeholder graphics for each asset
     2. Reader uses the pre-embedded Master Asset to authenticate with the Collage Server
     3. Collage Server responds by refreshing experiment assets and populating them with initial values specified by the experiment developer

  17. Interacting with an Executable Paper – a detailed view (2/2)
     • The user may interact with each asset by using the controls provided by the asset’s IFrame (which is specific to the type of asset being visualized);
     • Interaction is handled on the backend by the Collage Server, which may delegate requests to HPC resources (where available);
     • Assets are automatically refreshed without reloading the entire Executable Paper.
     Diagram (Reader, Collage Server, HPC Resources):
     4. Reader clicks “Execute” in the snippet asset window, or submits a Web form with input data
     5. Execution request is handled by the Collage Server
     6. Execution request may optionally be forwarded to attached HPC resources. Collage provides a mechanism to securely store user credentials required for access
     7. Once execution completes, the Collage Server automatically populates the relevant output assets
     8. Output data may also be downloaded by the user
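The interaction steps from the two "detailed view" slides (authenticate through the master asset, request execution, refresh dependent output assets) can be traced with a toy in-memory model. Everything in this sketch is hypothetical: `CollageServerSketch`, its methods and the plain-text password table are stand-ins for illustration and say nothing about how the real Collage Server is built.

```python
class CollageServerSketch:
    """Toy stand-in for the interaction steps; not the real Collage Server."""

    def __init__(self, accounts, snippets):
        self.accounts = accounts  # login -> password (plain text only in this sketch)
        self.snippets = snippets  # snippet name -> (callable, [output asset names])
        self.sessions = set()     # logins that authenticated via the master asset
        self.assets = {}          # output asset name -> value currently displayed

    def authenticate(self, login, password):
        """Step 2: the reader logs in through the master asset."""
        if self.accounts.get(login) == password:
            self.sessions.add(login)
            return True
        return False

    def execute(self, login, snippet_name, form_data):
        """Steps 4-7: run a snippet and repopulate its output assets."""
        if login not in self.sessions:
            raise PermissionError("log in through the master asset first")
        run, outputs = self.snippets[snippet_name]
        result = run(form_data)   # a real backend might forward this to HPC resources
        for name in outputs:
            self.assets[name] = result[name]
        return list(outputs)      # assets whose IFrames should be refreshed


if __name__ == "__main__":
    def square(form):
        return {"result.txt": str(int(form["x"]) ** 2)}

    server = CollageServerSketch(
        accounts={"reader": "secret"},
        snippets={"square": (square, ["result.txt"])},
    )
    server.authenticate("reader", "secret")
    refreshed = server.execute("reader", "square", {"x": "7"})
    print(refreshed, server.assets)
```

The list returned by `execute` plays the role of the refresh notification: only the output assets tied to the executed snippet change, so the rest of the paper does not need to be reloaded.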

  18. SciVerse Integration

  19. For further information…
     • For information regarding the pilot deployment of Collage, visit http://collage.elsevier.com
     • A more detailed introduction to Collage (including user manuals and sample papers) can be found at http://collage.cyfronet.pl
