
Exploring Jupyter: Notebook & Ecosystem

Dive into Jupyter Notebook and its ecosystem at the ICALEPCS 2019 workshop. Learn about its tools and use cases with insights from various facilities. Discover how to share and export notebooks for research collaboration and reproducibility. Explore Jupyter as a versatile tool for accelerator physics, data analysis, and more.


Presentation Transcript


  1. 09:00 - 09:30 Jupyter Notebook and Ecosystem 30', Hans Fangohr (European XFEL GmbH) • 09:30 - 10:00 Status updates from facilities • Jupyter at Synchrotron Soleil (Alain Buteau & Gwenaelle Abeille) 5' • Jupyter at CERN (Manuel Gonzalez & Jakub Wozniak) 5' • Jupyter at European Southern Observatory (Gianluca Chiozzi) 5' • Jupyter at J-PARC MLF (Kentaro Moriyama) 5’ • Jupyter at MAX IV Synchrotron (Vincent Hardion) 5’ • 10:00 - 10:30 Coffee break • 10:30 - 12:30 JupyterLab Tutorial 2h0', Saul Shanabrook (Quansight) • 12:30 - 14:00 Lunch and networking • 14:00 - 14:30 Jupyter for processing neutron event data at ESS 30', Dr. Jonathan Taylor (European Spallation Source) • 14:30 - 15:00 Jupyter at Brookhaven National Laboratory 30', Dr. Daniel B. Allan (Brookhaven National Lab) • 15:00 - 15:20 Jupyter for Accelerator Physics 20', Dr. Jonathan Edelen (RadiaSoft LLC) • 15:30 - 16:00 Coffee break • 16:00 - 17:00 Questions and answers, discussion, Hans Fangohr (European XFEL GmbH)

  2. Introduction Jupyter notebook and ecosystem Hans Fangohr Data Analysis European X-ray Free Electron Laser (EuXFEL), Germany Professor of Computational Modelling University of Southampton, United Kingdom hans.fangohr@xfel.eu @ProfCompMod Jupyter Workshop, ICALEPCS 2019, New York, US, Saturday 5 October 2019 https://indico.desy.de/indico/event/23354

  3. Outline • Quick introduction to Jupyter Notebook • Introduce a number of use cases (most from European XFEL) • Introduce a number of tools from Project Jupyter Format (for the day) • Informal • Ask questions to make it informal • To guide the presenter what is useful • Slides will be made available on Workshop site (speakers please send to Hans Fangohr)

  4. Jupyter Notebook • Document hosted in web browser (demo) • Combines • text (markdown with LaTeX support) • Computer code (Python) • Output from code • Saved in one document • *.ipynb [IPYthonNoteBook] • Combines input and output cells • JSON format
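
Because a notebook is a JSON document, its cells can be inspected with ordinary tools. The following is a minimal sketch using only the Python standard library; the notebook content and the helper name `summarize_cells` are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A minimal notebook in the version-4 JSON layout: one markdown cell and
# one code cell whose stored output is kept next to its input.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo\n", "Some text with $E = mc^2$."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}


def summarize_cells(nb):
    """Return (cell_type, source_text, number_of_outputs) for every cell."""
    summary = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])
        n_outputs = len(cell.get("outputs", []))
        summary.append((cell["cell_type"], source, n_outputs))
    return summary


# Round-trip through text to show that an .ipynb file is plain JSON.
text = json.dumps(notebook, indent=1)
restored = json.loads(text)
for cell_type, source, n_outputs in summarize_cells(restored):
    print(cell_type, repr(source), n_outputs)
```

Writing `text` to a file with the extension `.ipynb` should give a file that the notebook server can open, although this sketch does not validate it against the official schema.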

  5. [Annotated screenshot of the notebook interface, labelling: toolbar, notebook name, kernel name, Markdown cell, currently selected cell, code cell, code output]

  6. History of Jupyter • IPython: IPYthonNoteBook → *.ipynb • IPython notebooks were language agnostic, as they run over open network protocols • The community began adding other languages to the notebooks, starting with Julia and R • As the project expanded away from just Python, the notebooks had to be renamed • JUlia, PYThon, and R → Jupyter • “Jupyter” is also a homage to Galileo’s notebooks which recorded the discovery of Jupiter’s moons

  7. How does it work? • Starting jupyter-notebook starts the notebook server • Opening a notebook starts up the kernel • The kernel communicates to the notebook server via ZMQ • The notebook server shows content via the browser • JavaScript used in Browser

  8. Supported Kernels • 50+ languages supported. • More complete list at • https://github.com/jupyter/jupyter/wiki/Jupyter-kernels

  9. Use case 1: data analysis in notebook • Explorative data analysis • Convenient combination of processing, results and interpretation • Complete capture of all computational steps • good record for reproducibility and re-use • FAIR data • Through export to HTML, easy to share with collaborators & supervisors • Scientists are confident drivers of this • Example on the right from SCS instrument

  10. Sharing and exporting notebooks • Can share *.ipynb files • Can be displayed using jupyter-notebook • Code can be re-executed on collaborator’s machine • Only if software is available, and data is available, and collaborators know how to start notebook • “Static sharing” through html and email (for example) • Often sufficient – does not require installation or additional skills • Effective to communicate with supervisors, line managers etc • Convert using menu “File -> Download as -> html” • Or using nbconvert: $ jupyter-nbconvert --to html 2-widgets.ipynb [NbConvertApp] Converting notebook 2-widgets.ipynb to html [NbConvertApp] Writing 382609 bytes to 2-widgets.html

  11. Use case 2: notebooks as recipes • Pre-populate notebook with cells to carry out a particular type of data analysis • Provide a directory full of such recipes to users • Users execute cells during beamtime and later • Convenient compromise between • Static recipe (=script) • Interactive exploration • Experience • Keep code in notebook cells short and • move functionality into library (here “ToolBox”) • Archive directory of modified recipes with data

  12. Use case 4: notebooks as a script • Use Jupyter Notebook as a script • Can execute using nbconvert to take commands in notebook, execute them, save resulting notebook. • Can create data files and plots in process. • Use case: detector calibration pipeline • Use of nbparameterise (or papermill) to insert run parameters into notebook before execution • Automatic execution and creation of pdf • Error messages embedded in output
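
The idea of inserting run parameters before execution can be sketched with the standard library alone. This is a simplified imitation of the approach, not the implementation used by papermill or any other tool: the function `inject_parameters`, the example notebook, and the parameter names are hypothetical, and the cell tags `parameters` and `injected-parameters` follow the convention papermill uses.

```python
import copy


def inject_parameters(nb, parameters):
    """Return a copy of notebook dict `nb` with a new cell assigning `parameters`.

    The new cell goes directly after the first cell tagged "parameters",
    or at the top of the notebook if no cell carries that tag.
    Values must have a repr() that is valid Python (numbers, strings, lists).
    """
    result = copy.deepcopy(nb)
    lines = [f"{name} = {value!r}\n" for name, value in parameters.items()]
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": lines,
    }
    position = 0
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    result["cells"].insert(position, new_cell)
    return result


# Hypothetical template: default values in a tagged cell, then the analysis.
template = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"tags": ["parameters"]},
            "outputs": [],
            "source": ["run_number = 0\n", "label = 'default'\n"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(run_number, label)"],
        },
    ],
}

filled = inject_parameters(template, {"run_number": 42, "label": "demo"})
print("".join(filled["cells"][1]["source"]))
```

Since the injected cell runs after the defaults, its assignments override them when the notebook is executed, while the template itself stays unchanged.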

  13. Executing a notebook from the command line (nbconvert) • Convert from ipynb to html: $ jupyter-nbconvert --to html 2-widgets.ipynb • Can optionally set output name: $ jupyter-nbconvert --to html --output myout.html 2-widgets.ipynb • Can execute notebook before conversion: $ jupyter-nbconvert --execute --to html --output myout.html 2-widgets.ipynb • Execute notebook and save results as notebook: $ jupyter-nbconvert --execute --to ipynb --output myout.ipynb 2-widgets.ipynb [NbConvertApp] Converting notebook 2-widgets.ipynb to ipynb [NbConvertApp] Executing notebook with kernel: python3 [NbConvertApp] Writing 103398 bytes to myout.ipynb
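
In a pipeline, the same commands can be assembled and launched from Python. The sketch below uses only the flags shown on the slide (`--to`, `--output`, `--execute`); the helper names `build_nbconvert_command` and `run_nbconvert` are hypothetical, and the conversion is only attempted if `jupyter-nbconvert` is found on the path.

```python
import shutil
import subprocess


def build_nbconvert_command(notebook, to="html", output=None, execute=False):
    """Assemble the argument list for a jupyter-nbconvert call."""
    command = ["jupyter-nbconvert"]
    if execute:
        command.append("--execute")
    command += ["--to", to]
    if output is not None:
        command += ["--output", output]
    command.append(notebook)
    return command


def run_nbconvert(notebook, **options):
    """Run the conversion if jupyter-nbconvert is installed; True on success."""
    command = build_nbconvert_command(notebook, **options)
    if shutil.which(command[0]) is None:
        print("jupyter-nbconvert not found; would run:", " ".join(command))
        return False
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Reproduces the last command on the slide as an argument list.
print(" ".join(build_nbconvert_command(
    "2-widgets.ipynb", to="ipynb", output="myout.ipynb", execute=True)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with notebook names that contain spaces.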

  14. nbconvert • nbconvert is a tool which can convert notebooks to various other formats such as: • HTML, LaTeX, PDF, Markdown, ReStructuredText, Python, … • Static HTML pages can be created as documentation or tutorial • nbsphinx: a Jupyter notebook provides a section in Sphinx documentation • Homepage: https://github.com/jupyter/nbconvert • Notebook-as-a-script approach • Can change parameters inside notebook before execution: • nbparameterise or papermill

  15. Use case 5: (remote) data analysis environment (JupyterHub) • JupyterHub allows • users to connect through browser and https • use of existing authentication systems • serving notebooks on facility hardware • connecting to user’s file storage • Example: JupyterHub at EuXFEL & DESY • Uses Maxwell HPC cluster • Popular with users: • no software installation & browser of choice • works locally and remotely the same

  16. Using remote resources: JupyterHub • In principle user can ssh to HPC resource, use port forwarding to connect local machine with HPC computer running notebook server • JupyterHub helps orchestrate and manage individual Jupyter instances for multiple users • It provides an interface allowing users to easily spawn and connect their own Jupyter Server • Different options for resource allocation • Integrated with HPC scheduler • . . .

  17. JupyterHub – manage individual Jupyter instances for multiple users


  19. Use case 6: blending GUI and script • JupyterWidgets provide graphical control elements in notebook • Buttons, sliders etc trigger code execution and update of plot • Useful for • Data analysis of fixed type • Data exploration of data sets • Discussion • Less powerful than, for example, Qt GUI • Popular with users due to • Being embedded in notebook • No software installation (via JupyterHub)

  20. Binder project • Given a (github) repository with • Jupyter notebooks • Software requirements (Dockerfile, requirements.txt, environment.yml) • Binder service • builds a container with the required software • starts Jupyter notebook server in that container offering the notebooks • Binder project provides free pilot at • https://mybinder.org • Institutional Binder instances are being deployed • Example: https://github.com/fangohr/jupyter-demo
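
A Binder session for a GitHub repository is started by visiting a launch URL that names the repository and a branch, tag, or commit. A minimal sketch of building such a URL follows; the helper name `binder_launch_url` is hypothetical, and the branch name `master` is an assumption about the example repository.

```python
from urllib.parse import quote


def binder_launch_url(user, repo, ref="master", host="https://mybinder.org"):
    """Build a launch URL for a GitHub repository on a Binder service.

    `ref` is a branch, tag, or commit; "master" is only an assumed default.
    `host` can point at an institutional Binder instance instead.
    """
    return f"{host}/v2/gh/{quote(user)}/{quote(repo)}/{quote(ref, safe='')}"


print(binder_launch_url("fangohr", "jupyter-demo"))
# prints https://mybinder.org/v2/gh/fangohr/jupyter-demo/master
```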

  21. Example: https://github.com/fangohr/jupyter-demo


  24. BinderHub – mybinder.org

  25. Use case 7: documenting software library • Use notebook as chapter in documentation • Supported by sphinx html, pdf as usual • Documentation easy to create: • enter commands in notebook • output is produced automatically • updating docs means re-running notebook • Can run regression test on documentation notebooks using NoteBookVALidate (nbval)

  26. Use case 7: documenting software library • Use notebook as chapter in documentation • Supported by sphinx html, pdf as usual • Documentation easy to create: • enter commands in notebook • output is produced automatically • updating docs means re-running notebook • Can run regression test on documentation notebooks using NoteBookVALidate (nbval) • With Binder, can make documentation executable (for example DiscretisedField Tutorials)

  27. nbval • py.test is a popular Python testing framework • nbval is a py.test plugin which lets py.test recognise and collect Jupyter notebooks • In each notebook, each cell is a test: • The test passes if execution of the input creates the stored output • The test fails otherwise • There is a variety of configuration parameters • Home page: https://github.com/computationalmodelling/nbval
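
The pass/fail rule for cells can be illustrated without any plugin. The sketch below is a simplified imitation of the idea and not nbval's implementation: it runs cells with `exec` in a shared namespace instead of a kernel, and compares only text printed to stdout. The function name `check_notebook` and the demo notebook are hypothetical.

```python
import contextlib
import io


def check_notebook(nb):
    """Run each code cell and compare printed text with the stored stdout.

    Returns a list of (cell_index, passed) pairs. All cells share one
    namespace, much as they would share one kernel.
    """
    namespace = {}
    results = []
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        # Collect the stdout text that was saved with the cell.
        stored = "".join(
            "".join(out["text"])
            for out in cell.get("outputs", [])
            if out.get("output_type") == "stream" and out.get("name") == "stdout"
        )
        # Re-execute the input and capture what it prints now.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec("".join(cell["source"]), namespace)
        results.append((index, buffer.getvalue() == stored))
    return results


def code_cell(source, stdout=None):
    """Build a code cell dict with an optional stored stdout output."""
    outputs = []
    if stdout is not None:
        outputs.append({"output_type": "stream", "name": "stdout", "text": [stdout]})
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "source": [source], "outputs": outputs}


demo = {"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": [
    code_cell("x = 2"),
    code_cell("print(x * 21)", "42\n"),
    code_cell("print(x + 1)", "4\n"),  # stale stored output: the cell prints 3
]}

for index, passed in check_notebook(demo):
    print(f"Cell {index}", "PASSED" if passed else "FAILED")
```

The third cell fails because its stored output no longer matches what the code produces, which is the kind of regression such a check is meant to catch in documentation notebooks.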

  28. nbval example
      $ py.test --verbose --nbval 2-widgets.ipynb
      ============================= test session starts ==============================
      platform darwin -- Python 3.6.8, pytest-3.10.0, py-1.7.0, pluggy-0.8.0 -- /Users/fangohr/anaconda3/bin/python
      rootdir: /Users/fangohr/Desktop/jupyter-demo, inifile:
      plugins: remotedata-0.3.1, openfiles-0.3.0, doctestplus-0.1.3, arraydiff-0.2, nbval-0.9.1
      collected 9 items
      2-widgets::ipynb::Cell 0 PASSED [ 11%]
      2-widgets::ipynb::Cell 1 PASSED [ 22%]
      2-widgets::ipynb::Cell 2 PASSED [ 33%]
      2-widgets::ipynb::Cell 3 PASSED [ 44%]
      2-widgets::ipynb::Cell 4 PASSED [ 55%]
      2-widgets::ipynb::Cell 5 PASSED [ 66%]
      2-widgets::ipynb::Cell 6 PASSED [ 77%]
      2-widgets::ipynb::Cell 7 PASSED [ 88%]
      2-widgets::ipynb::Cell 8 PASSED [100%]
      ===================== 9 passed, 1 warning in 2.11 seconds ======================
      (base) [20:49:09] fangohr:jupyter-demo git:(master*) $

  29. Use case 8: reproducible publication • Create github repository to complement publication • Create one notebook per figure / main result • Define software environment in github repository using Binder syntax • Close to reproducible publication: • fully specified software environment • fully specified data analysis • Data access: see Andy Götz talk, PaNOSC • Zenodo for long term preservation • Create Zenodo deposit for repository • Cite Zenodo DOI in publication • Example: https://github.com/maxalbert/paper-supplement-nanoparticle-sensing

  30. Jupyter Lab • Next generation Jupyter notebook interface • Window manager embedded in browser • “classic notebook” in one window • Additional features in other windows • File browser • Extensions • CSV viewer

  31. [Annotated screenshot of JupyterLab, labelling: multiple tabs, multiple panels, explore files (e.g. CSV), Table of Contents extension, dark theme]

  32. Jupyter Notebook vs. JupyterLab • JupyterLab is the newer interface to the notebooks • It provides a more flexible interface closer to a modern IDE • More on this later in the day • Flexible layouts and panes • More flexible console/text editor • Drag/Drop/Expand/Collapse cells • Themes • Extensions

  33. Nbdime – NoteBookDIff and MErge • Jupyter notebooks are rich media documents, stored as plain text JSON files • Basic diff and merge tools (such as that used by git) do not handle this format well • Small changes to text/plots result in unreadable diffs • nbdime provides multiple tools to help with “content-aware” diffing and merging for Jupyter notebook files • nbdiff: compare notebooks in a terminal-friendly way • nbmerge: three-way merge of notebooks with automatic conflict resolution • nbdiff-web: shows you a rich rendered diff of notebooks • nbmerge-web: gives you a web-based three-way merge tool for notebooks • nbshow: present a single notebook in a terminal-friendly way • Homepage: https://github.com/jupyter/nbdime
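
What “content-aware” means can be shown with a small standard-library sketch. It is a simplified illustration and not how nbdime works internally: it compares only the cell sources of two notebook dicts and ignores outputs, execution counts, and metadata, so that re-running a notebook does not show up as a change. The function names and the demo notebooks are hypothetical.

```python
import copy
import difflib


def source_lines(nb):
    """Flatten a notebook into labelled source lines, ignoring outputs."""
    lines = []
    for index, cell in enumerate(nb["cells"]):
        lines.append(f"[{cell['cell_type']} cell {index}]")
        lines.extend("".join(cell["source"]).splitlines())
    return lines


def diff_notebooks(old, new):
    """Return a unified diff over the cell sources of two notebook dicts."""
    return list(difflib.unified_diff(
        source_lines(old), source_lines(new),
        "old.ipynb", "new.ipynb", lineterm=""))


old = {"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": [
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "source": ['print("hello")'],
     "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}]},
]}

# Same code, re-run later: only the execution count differs.
rerun = copy.deepcopy(old)
rerun["cells"][0]["execution_count"] = 7

# The code itself was edited.
edited = copy.deepcopy(old)
edited["cells"][0]["source"] = ['print("hello, world")']

print(diff_notebooks(old, rerun))  # no source change, so the diff is empty
print("\n".join(diff_notebooks(old, edited)))
```

A plain line-based diff of the two JSON files would report the changed execution count, and for notebooks with plots also long runs of encoded image data, which is the problem the slide describes.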

  34. Summary – tools • Jupyter Notebook • Jupyter Lab – next generation Jupyter user interface • Jupyter Hub – serving notebook from compute facility for multiple users • NBDIME – DIffing and MErging tools • NBVAL – VALidation tool; use each cell as a test • NBCONVERT – conversion of notebooks to other formats & execution • nbparameterise and Papermill – inject parameters into notebook files • IPyWidgets – GUI like elements in notebook • Binder – Cloud hosted execution of notebooks from github repositories • There is much more.

  35. Summary – use cases • Use cases • Data analysis • Provision of recipes • Notebook-as-a-script • Remote data analysis (JupyterHub) • Mixing GUI and script-driven analysis • Documentation of Software • Reproducibility • . . .

  36. Summary • Jupyter notebook and ecosystem provides many options • Acknowledgements • Authors and co-authors of ICALEPCS contribution TUCPR02 (Tuesday 14:30) • Robert Rosca and Thomas Kluyver • OpenDreamKit Horizon 2020, European Research Infrastructures project (#676541), http://opendreamkit.org • PaNOSC: Photon and Neutron Open Science Cloud, European Union’s Horizon 2020 research and innovation programme under grant agreement No 654220 • The Gordon and Betty Moore Foundation through Grant GBMF #4856, by the Alfred P. Sloan Foundation and by the Helmsley Trust. • EPSRC's Centre for Doctoral Training in Next Generation Computational Modelling, http://ngcm.soton.ac.uk (#EP/L015382/1),

