Highlights from EPC 2006

Highlights from EPC 2006. Vincenzo Innocente, on behalf of the Local Organizing Committee. EuroPython at CERN: EuroPython conference organized by SFT this year! Three days. Parallel sessions in Bld 40. Keynotes and “Lightning” in main auditorium. Dinner in the Globe. 280 participants.

Presentation Transcript


  1. Highlights from EPC 2006 • Vincenzo Innocente, on behalf of the Local Organizing Committee

  2. EuroPython at CERN • EuroPython conference organized by SFT this year! • Three days • Parallel sessions in Bld 40 • Keynotes and “Lightning” in main auditorium • Dinner in the Globe • 280 participants • 100 presentations (w/o lightning) • 5 by “CERN” VI @ EPC06

  3. Schedule • 4 parallel sessions (in Bld 40) • All synchronized • 5-minute pause between talks • Easy for people to move from one session to another • Plenary: Lightning talks & keynotes (in Main Amphi)

  4. Scientific Program • 7 tracks • Python in Science • Python Language & Libraries • Agile Development • Web Frameworks • Business and Applications • Teaching • Games and Entertainment

  5. Community • Who • Wide age spectrum • Many in post-doc age-range • All 5 continents • Very few women (1-2%, all managers?) • Where • Mostly Companies developing Software Solutions • Revenue from Selling custom products or services • Find business advantages • In using open source software (contribute to its development) • Develop components reusable beyond a specific project • Some Research Labs • Domain specific applications • Reuse in the community (adapting to pre-existing “habits”)

  6. Community • What: • Core language development • Web framework, web applications • Software development tools (web based) • Scientific data processing, visualization • No sys-admin, net-admin, embedded-software, office automation • Why: • Hear news about Language, Libraries, key products (Zope,…) • Discuss, propose, complain • Present their products • In many cases just a spin-off component • Work (in Sprint sessions)

  7. Messages • Python: • A language for rapid-prototyping, extreme-programming, just-in-time deployment • THE integration framework • THE Business Domain Language • THE embedded scripting language • Python is faster than Assembler

  8. Outline • What I will not cover • Latest greatest features of Python • Python 3000 • SciPy, PyTables, PyPy, Zope, Plone, Django,… • Python in HEP • Google… • I will focus on • Python: a framework for scientific application • Building and sharing components • Python: from fast-prototyping to engineered code • Dispersed development

  9. Scientific Frameworks

  10. MGL Tools: Independent and re-usable component for structural bioinformatics

  11. MGL Tools: Independent and re-usable component for structural bioinformatics

  13. AutoDock tools

  16. Python Molecular Viewer

  18. Pyphant

  19. Pyphant application

  20. Pyphant architecture

  21. Worker Code

  22. Building & Sharing Components

  24. Builds upon SciPy (data representation) and HDF5 (I/O layer)

  26. The Company

  27. The Customer

  28. The “new” Components • For this customer they had two additional requirements to fulfill: • Avoid bloating the CMS with binary files • Count the number of accesses • They developed two lightweight products • Plug into the deployed solution • Reuse the existing infrastructure • Reusable outside this project and company • Extendable to other architectures/frameworks • Contribution to open source software

  29. Tramline • Tramline plugs in between Apache and Plone/Zope • On upload: • Extract data to disk • Assign id • Store id in Zope • On download: • Replace id with file content
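The upload/download flow on this slide can be sketched roughly as follows. This is only an illustration of the pattern, not Tramline's actual code or API: the `FileStore` class, its method names, and the `cms_record` dictionary standing in for the Zope object are all hypothetical.

```python
import os
import tempfile
import uuid


class FileStore:
    """Hypothetical helper: keeps uploaded payloads on disk, keyed by a generated id."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def on_upload(self, payload: bytes) -> str:
        """Write the payload to disk and return the id that the CMS stores instead."""
        file_id = uuid.uuid4().hex
        with open(os.path.join(self.root, file_id), "wb") as fh:
            fh.write(payload)
        return file_id

    def on_download(self, file_id: str) -> bytes:
        """Replace the id coming back from the CMS with the stored file content."""
        with open(os.path.join(self.root, file_id), "rb") as fh:
            return fh.read()


store = FileStore(tempfile.mkdtemp())
# The CMS (a plain dict here, standing in for Zope) only ever sees the short id.
cms_record = {"attachment": store.on_upload(b"%PDF-1.4 example bytes")}
body = store.on_download(cms_record["attachment"])
```

The point of the design is that the content database holds a 32-character id rather than the binary file, which is what keeps the CMS from growing with every upload.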

  30. LinkTally • Scan logs • Count requests • Store in the DB as metadata • Rank content in CMS
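A minimal sketch of the scan-and-count idea, assuming the logs are in Apache's common log format. The regular expression, the `tally` function, and the sample log lines are made up for illustration and are not taken from LinkTally itself; storing the counts as CMS metadata is left out.

```python
import re
from collections import Counter

# Matches the quoted request part of a log line, e.g. "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*"')


def tally(log_lines):
    """Count how many requests each path received."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Made-up sample data in common log format.
sample_log = [
    '127.0.0.1 - - [03/Jul/2006:10:00:00 +0200] "GET /docs/a.pdf HTTP/1.1" 200 1024',
    '127.0.0.1 - - [03/Jul/2006:10:00:05 +0200] "GET /docs/b.pdf HTTP/1.1" 200 2048',
    '127.0.0.1 - - [03/Jul/2006:10:01:00 +0200] "GET /docs/a.pdf HTTP/1.1" 200 1024',
]
counts = tally(sample_log)
# Most requested content first, which is the ranking the CMS would display.
ranking = [path for path, _ in counts.most_common()]
```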

  31. LinkTally status & prospects • Now: • Solution for one customer • Limited spin-off • Evolution: • Contribution from community • Spin-in: use it in other projects!

  32. From a prototype to a product

  33. The Indico Technology • Main programming language: Python • Runs on Apache using the Python module mod_python • Persistence based on ZODB (Zope Object Database) • Transparency: no need for explicit reads/writes of the objects • Fits very well with Indico's complex object model • Proven performance and scalability • Timetable generation: libxml, libxslt + Python bindings • Portable technologies: runs on Windows, Linux • Export gateways: • iCalendar; XML; PDF outputs • OAI (Open Archives Initiative) for ensuring integration with other services • Standard protocol for information exchange between digital libraries • Allows exposing conference data • Allows other systems to fetch conference data and build services over it • Simple mechanism: XML over HTTP
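The "XML over HTTP" mechanism can be illustrated with a small sketch, assuming the protocol referred to is OAI-PMH, where a harvester sends a `verb` query parameter and parses the XML reply. The endpoint URL, the `build_request` helper, and the sample response below are hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"  # hypothetical endpoint, not a real Indico URL


def build_request(verb, **params):
    """Build the GET URL a harvester would fetch."""
    return BASE_URL + "?" + urlencode({"verb": verb, **params})


url = build_request("ListRecords", metadataPrefix="oai_dc")

# A trimmed, made-up response; a real one also carries dates and metadata.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:conf/1</identifier></header></record>
    <record><header><identifier>oai:example.org:conf/2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(SAMPLE_RESPONSE)
identifiers = [el.text for el in root.iterfind(".//oai:identifier", NS)]
```

Because both sides only need an HTTP client and an XML parser, another system can build services over the exported conference data without sharing any code with the exporter.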

  34. The Invenio Technology • Main programming language: Python • Runs on Apache using the Python module mod_python • Uses MySQL RDBMS • Takes advantage of a fully featured query language • Invenio home-made indexes • Internal representation with XML-MARC • Export gateways: • Multiple output formats: HTML, XML, MARC, OAI, DC, etc. • Some modules: • Still in PHP (slowly being moved to Python) • Some in Common Lisp (BibCheck)

  35. Index Space Design (II) • Two important speed factors to consider: • speed of set intersections (Web App Server) • speed of set marshalling (Web App <-> DB Server) • Data structures tested: • sorted (lists, Patricia trees) • unsorted (hashed sets, binary vectors) • fast prototyping (Python): • throw-away coding, organic-growth software development model • typical search time gain: 4.0 sec → 0.2 sec • typical indexing time loss: 7 hours → 4 days • binary vectors found to be the best compromise (for all types of sets)
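To make the comparison between hashed sets and binary vectors concrete, here is a small sketch of both representations of a search-hit set. It uses a plain Python integer as the bit vector, which is an assumption made for brevity and not the representation Invenio used; the function names and record ids are invented.

```python
def to_bitvector(record_ids):
    """Encode non-negative record ids as one integer: bit i is set when id i is a hit."""
    vector = 0
    for rid in record_ids:
        vector |= 1 << rid
    return vector


def from_bitvector(vector):
    """Decode the integer back into a sorted list of record ids."""
    ids = []
    position = 0
    while vector:
        if vector & 1:
            ids.append(position)
        vector >>= 1
        position += 1
    return ids


# Made-up hit sets for two search terms.
hits_author = {1, 4, 7, 9}
hits_title = {4, 5, 9, 12}

# Hashed sets: intersection walks the elements of one set.
as_sets = sorted(hits_author & hits_title)

# Binary vectors: intersection is a single bitwise AND over the whole index space.
as_bits = from_bitvector(to_bitvector(hits_author) & to_bitvector(hits_title))
```

Both give the same answer; the trade-off the slide describes is that the vector costs space proportional to the index size, whatever the number of hits, but intersects and serializes as one contiguous block.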

  36. Performance Benchmarks (2002) • Testing marshalling/intersection/union/unmarshalling • Bytecode interpreted language study: (Python, Java) • Python faster than Java (mainly due to marshalling) • Machine code compiled language study: (ML, Lisp) • OCaml, CMU CL: 3+ times faster than Python C libs • CMU CL scales best: intersecting 6M records in 0.01 sec, 30M records in 0.04 sec • Data structure study: • OCaml, 3,000,000 records: bit vectors 0.43 sec, hashed sets 1.71 sec, lists 3.76 sec, Patricia trees do not scale well for dense sets • Python fast enough for production (1M records) • fast C modules: Numeric (byte/bit), Marshal, Psyco
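The four operations benchmarked on this slide can be exercised with the standard library alone. The sketch below is not the 2002 benchmark: the data sizes, the `round_trip` helper, and the repeat count are arbitrary choices, and the timings it prints will differ from machine to machine.

```python
import marshal
import timeit

# Two made-up hit sets over an index of 200,000 records.
records_a = set(range(0, 200_000, 2))
records_b = set(range(0, 200_000, 3))


def round_trip(hit_set):
    """Marshal a hit set to bytes (as if sent to the DB server) and read it back."""
    payload = marshal.dumps(sorted(hit_set))
    return set(marshal.loads(payload))


restored_a = round_trip(records_a)
restored_b = round_trip(records_b)
common = restored_a & restored_b   # intersection
union = restored_a | restored_b    # union

marshal_time = timeit.timeit(lambda: round_trip(records_a), number=5)
intersect_time = timeit.timeit(lambda: restored_a & restored_b, number=5)
print(f"marshal round trip: {marshal_time:.4f}s, intersection: {intersect_time:.4f}s")
```

Timing the marshalling step separately from the set operation mirrors the slide's observation that marshalling, rather than the intersection itself, can decide which language comes out ahead.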

  37. The + of Python • Clean, aesthetic language • Easy to learn, important for many internship students and temporary members working on the project • Very good for rapid prototyping & organic-growth development • Plenty of ready-to-use modules • Bytecode-compiled only, speed okay for our needs

  38. Use Python?

  40. Dispersed Teams

  41. Dispersed teams

  47. At Last

  48. What I Learned • Python is not just a language for scripting and glue code • Fully fledged, highly engineered frameworks can be written in Python • Frameworks and component architectures are established practices • Frameworks tend to be domain specific • All very similar to each other and share many design patterns • Many concepts common to modern HEP-framework architectures • Business Domain Languages are essential: • Python has the expressive power to implement them

  49. What I Learned • What can be reused? • Experience, patterns • Provided one has a common “culture” • Low level components • Plugin components • Provided that the interface is NOT business-domain specific • LHC is no longer at the frontier of distributed collaboration • There are Individuals/Labs/Companies which value • Sharing information • Building reusable software components • Cooperating in developing the basic building blocks • Becoming a community around such a common ground

  50. More? • Visit • http://vanrees.org/weblog/topics/europython • http://indico.cern.ch/conferenceDisplay.py?confId=44 • http://www.europython.org/ • http://www.google.com/search?q=europython
