The ALMA Computing Project Update and Management Approach

The ALMA Computing Project Update and Management Approach. Brian Glendenning (1), bglenden@nrao.edu; Gianni Raffi (2), graffi@eso.org. (1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA; (2) European Southern Observatory (ESO), Munich, Germany.

  1. The ALMA Computing Project Update and Management Approach. Brian Glendenning (1), bglenden@nrao.edu; Gianni Raffi (2), graffi@eso.org. (1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA; (2) European Southern Observatory (ESO), Munich, Germany. ICALEPCS’2005 - Geneva

  2. ALMA partner organizations The Alma Computing Project - B.Glendenning, G.Raffi

  3. ALMA Project in Summary
     • 64 x 12 m antennas, 30-950 GHz => Reality check: 50 antennas proposed for the time being
     • Array configurations: 150 m to 14 km
     • Near San Pedro de Atacama, Chile, at 5000 m
     • EU and North America as equal partners
     • Japan will add the Compact Array: 12 x 7 m + 4 x 12 m antennas, plus an extra correlator and receivers
     • 2 prototype antennas (in Socorro, NM)
     • Construction phase 2003-2011
     • Early Science foreseen for 2009

  4. ALMA Antenna Configurations

  5. ALMA Computing requirements
     • Control of antennas and receivers
     • Correlator control / data acquisition (input: 96 Gb/s per antenna; output to archive up to 64 MB/s)
     • On-line pipeline (quicklook, flagging, images), off-line data reduction, telescope calibration
     • Archiving (data rate >10 MB/s, 300 TB/year)
     • Observing preparation, scheduling
     • Support for novice users in turning science intent into Scheduling Blocks
     • Dynamic scheduling to take advantage of weather
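The archiving figures on this slide can be cross-checked with simple arithmetic. The sketch below is not from the presentation; it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a 365-day year of continuous data taking, and the function name is made up for illustration.

```python
# Rough consistency check of the archive figures on the slide.
# Assumptions (not from the slides): decimal units, continuous
# operation over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s into a yearly volume in TB."""
    bytes_per_year = rate_mb_per_s * 1e6 * SECONDS_PER_YEAR
    return bytes_per_year / 1e12


if __name__ == "__main__":
    # A sustained 10 MB/s gives roughly 315 TB/year, the same order
    # as the ~300 TB/year quoted on the slide.
    print(f"{yearly_volume_tb(10):.1f} TB/year")
```

Under these assumptions a sustained 10 MB/s is about 315 TB per year, which agrees with the slide's round figure of 300 TB/year.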

  6. Software Scope
     • From the cradle…
       • Proposal Preparation
       • Proposal Review
       • Program Preparation
       • Dynamic Scheduling of Programs
       • Observation
       • Calibration & Imaging
       • Data Delivery & Archiving
     • Afterlife:
       • Archival Research & VO Compliance

  7. Trilateral Computing IPT Organisation
     • Total bilateral staff now: 40 FTEs
     • Total trilateral staff now: 65 FTEs

  8. ALMA Computing
     • Large but extremely distributed team
     • 40 Full Time Equivalents for the whole E2E software
     • Total development effort to 2011: ~280 FTE-years
     • The fundamental output of the CIPT will be a ~2M SLOC “end to end” software system running on over 200 computers on 4 continents. (The 2M figure does not include comments, tests, documentation, or adopted/modified products like AIPS++, NGAS, ATM, etc.)
     • Staff in 14 institutions in Europe/North America/Japan
     • Japanese Computing fully integrated. It includes:
       • Staff in Japan working on ACA: ~30 FTE-years
       • Staff and cash for developments in Europe, US: ~60 FTE-years

  9. Software Architecture

  10. AOS Network
     [Network diagram; labels: 1 Gb fibers from antenna pads; Patch Panel Room; Correlator Room; 10 Gb; CDP Master; X 250; 16 CDP Beowulf nodes; X 64; CCC Computer; Patch Panel; Computer Room; Office Area; Terminal PCs (diskless + RFI quiet); ARTM, GPS .. (diskless computers); fiber; Patch Panel; IP-Telephony; copper; SRST-Router; structured copper cabling; 10 Gb fibers to OSF]

  11. ALMA software development process
     • Software to be developed in two main phases: Array software by 2008, Observatory software by 2011
     • Incremental synchronized development via six-monthly releases at FIXED dates
       • allows adjusting priorities to status
       • We consider a fixed-date development pacing to be crucial in our distributed environment
     • Monthly integration tags (end of month) and inter-subsystem interface freezes (middle of month)
     • Releases every 6 months (alternating major/minor)
     • We believe development of an integrated system requires integrations from the beginning to avoid the well-known “integration hell” problem
     • Non-regression tests + user tests (test cases) (goal: 20% of effort)
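The fixed-date cadence described on this slide can be sketched as a small date calculation. This is an illustration only, not project tooling. It assumes that "middle of month" means the 15th, that releases fall on the last day of their month, and that the first release in the sequence is a major one; the function names and the start month are hypothetical.

```python
import calendar
from datetime import date


def month_milestones(year: int, month: int) -> dict:
    """Interface freeze mid-month (assumed: the 15th), integration tag at month end."""
    last_day = calendar.monthrange(year, month)[1]
    return {
        "interface_freeze": date(year, month, 15),
        "integration_tag": date(year, month, last_day),
    }


def release_plan(start_year: int, start_month: int, count: int) -> list:
    """Releases every 6 months at fixed dates, alternating major/minor.

    Assumptions: releases land on the last day of the month and the
    first release in the sequence is a major one.
    """
    plan = []
    for i in range(count):
        months = (start_month - 1) + 6 * i
        year, month = start_year + months // 12, months % 12 + 1
        last_day = calendar.monthrange(year, month)[1]
        kind = "major" if i % 2 == 0 else "minor"
        plan.append((date(year, month, last_day), kind))
    return plan


if __name__ == "__main__":
    print(month_milestones(2005, 10))
    for when, kind in release_plan(2005, 10, 4):
        print(when.isoformat(), kind)
```

Because the dates are computed from the calendar rather than from development status, scope is the only variable left to adjust, which is the point the slide makes about fixed-date pacing.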

  12. ALMA software approach
     We have had requirements from the beginning:
     • Science + Operation Requirements => Architecture => We are tracking them (vs. features, tests, delivery time) (using Telelogic’s DOORS)
     Prototypes were done (using ACS, see below):
     • Software for prototype antennas, first correlator
     Common infrastructure (software rather than rules):
     • ALMA Common Software (ACS), started very early and now getting more and more stable.
     • Software engineering procedures, integration, tests

  13. ACS Concepts
     Component-Container
     • Supports Separation of Concerns between technology and specific applications.
     • Same idea as .NET, EJB, CCM
     ACS Entity objects
     • Structured data, e.g. Scheduling Blocks, to be passed between components, defined & serialized with XML
     [Diagram labels: Client, Component 1, Component 2, Component 3, ..., Container]
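The component-container idea on this slide can be illustrated with a toy example. The sketch below is not the ACS API: every class, method and XML element name is hypothetical. It only shows the separation the slide describes, with the container owning lifecycle and lookup, the components holding application logic, and an entity object (a made-up scheduling block) passed between components as XML.

```python
import xml.etree.ElementTree as ET


class Component:
    """Application logic only; lifecycle calls come from the container."""

    def __init__(self, name: str):
        self.name = name
        self.active = False

    def initialize(self) -> None:
        self.active = True

    def cleanup(self) -> None:
        self.active = False


class Scheduler(Component):
    def next_block_xml(self) -> str:
        # Entity object: structured data serialized as XML (hypothetical schema).
        block = ET.Element("SchedBlock", {"id": "sb-001"})
        ET.SubElement(block, "target").text = "demo-source"
        return ET.tostring(block, encoding="unicode")


class Executor(Component):
    def run(self, block_xml: str) -> str:
        block = ET.fromstring(block_xml)
        return f"{self.name} observing {block.findtext('target')} ({block.get('id')})"


class Container:
    """Technology concerns: activation, lookup and shutdown of components."""

    def __init__(self):
        self._components = {}

    def activate(self, component: Component) -> Component:
        component.initialize()
        self._components[component.name] = component
        return component

    def get_component(self, name: str) -> Component:
        return self._components[name]

    def shutdown(self) -> None:
        for component in self._components.values():
            component.cleanup()
        self._components.clear()


if __name__ == "__main__":
    container = Container()
    container.activate(Scheduler("SCHEDULER"))
    container.activate(Executor("EXECUTOR"))
    xml_block = container.get_component("SCHEDULER").next_block_xml()
    print(container.get_component("EXECUTOR").run(xml_block))
    container.shutdown()
```

Neither component knows how it was started or how the other is located; that knowledge sits in the container, which is the separation of concerns the slide refers to.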

  14. ALMA Computing Project Management & Oversight
     • Oversight
       • Yearly reviews
       • Assignment of “subsystem scientists”
       • Subsystem contact meetings
     • Planning, Control
       • Plan the coming year in some detail (high-level requirements decomposed into granular features); place the remaining features in a backlog, to be drawn from in priority order
       • Verify (trace) feature completion via user end tests
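The planning step on this slide (plan the coming year in detail, keep the rest in a prioritized backlog) can be sketched as follows. This is an illustration under stated assumptions, not the project's planning tool. The feature names, effort numbers and the greedy "take features in priority order while they fit the yearly capacity" rule are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Feature:
    priority: int  # lower number = higher priority (assumed convention)
    name: str = field(compare=False)
    effort_fte_months: float = field(compare=False)


def plan_year(features, capacity_fte_months):
    """Split features into this year's plan and a priority-ordered backlog.

    Assumed rule: walk the features in priority order and plan each one
    that still fits the remaining capacity; everything else goes to the
    backlog, a heap from which the highest-priority feature is drawn first.
    """
    planned, backlog = [], []
    remaining = capacity_fte_months
    for feature in sorted(features):
        if feature.effort_fte_months <= remaining:
            planned.append(feature)
            remaining -= feature.effort_fte_months
        else:
            heapq.heappush(backlog, feature)
    return planned, backlog


if __name__ == "__main__":
    features = [
        Feature(2, "feature-B", 6),
        Feature(1, "feature-A", 8),
        Feature(3, "feature-C", 5),
    ]
    planned, backlog = plan_year(features, capacity_fte_months=12)
    print("planned:", [f.name for f in planned])
    print("next from backlog:", heapq.heappop(backlog).name)
```

A heap keeps "draw in priority order" cheap as the backlog is revised at each planning cycle; a plain sorted list would serve equally well at this scale.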

  15. Planning: R3 Master Test Plan

  16. Computing Group Communications and Reporting
     • Yearly Incremental Design Reviews; review plans revised every 6 months
     • TWiki is used and useful for orderly discussions
     • Contact meetings with subsystems and among subsystem leads
     • Yearly subsystem leads meetings (design and interface discussions)
     • People meet by working together at each other’s sites
     • Videoconferences are more troublesome than telecons

  17. Tests will grade full/partial requirements. SSR sign off on a requirement as ‘Adequate’ by grading requirements as shown in example below.
     [Example table; column labels: Overall Grade, Test Grades]

  18. Status
     • Passed external PDR (2003) and CDR2 (’04) and internal CDR1 (’04), CDR3 (’05)
     • Delivered R0-R3 releases (+ Rx.1 releases)
     • Prototype control/correlator used with prototype antennas
     • Every subsystem has a dedicated astronomer, who checks developed features twice per year (release validation).

  19. Status (cont.)
     • Most subsystems have substantial development with infrastructure in place, external interfaces defined and implemented, and some functionality.
     • Most subsystems have had external user tests
     • Integrated tests with simulated/elementary data have taken place
     • Internal testing of the system at the VLA site in early 2006
     • Antenna evaluation required significant software, but was done essentially via scripting of control components
     • ACA (Japanese compact array) and Observatory Support software still in early design

  20. [Chart; labels: ~850 kSLOC (Oct. 2005); in-kind contributions (NGAS, AIPS++, ATM) not included; Test Interferometer Control Software prototype]

  21. Lessons learned
     Geographical distribution with this size & pace is difficult (*):
     • Computing subsystems mixed across continents (sometimes it was inevitable)
     • Acceptance of common software (optimized for the system, not for everybody’s taste, & mandatory; in general OK) => Requires team spirit.
     • Stability of interfaces among subsystems => No last-minute changes
     • Difficulty of integration. Subsystems tend to give priority to their own development vs. stability of the system (but we are still in the early phases). => Takes two months for an integrated system. Continuous integration remains a goal (dream?)
     • When problems arise, finger-pointing at “the others” occurs too quickly.
     • Some inefficiency has to be accepted (balanced by more discussion, better design)
     We gave some thought to Agile development, but are at the wrong end of the spectrum (vs. a local small team). At least: light documentation + some form of emergency “pair programming” at integration time.
     (*) Not a statement against collaborations (typically among labs with different projects). We believe this to be a very good example of a collaborative project (hopefully we will also have successful software to show at the end).

  22. Prototype Antennas at the VLA Site (New Mexico)
     Evaluated using prototype control software (with ACS)
     [Photo labels: Vertex/RSI, Alcatel/EIE]

  23. First Operator GUI

  24. ALMA Sites in Chile
     [Diagram labels: Antenna Operations Site (AOS); 60 MB/s (peak), 6 MB/s (average); Operation Support Facility (OSF); Santiago Central Office (SCO)]

  25. Earthwork for the OSF Technical Facilities

  26. ALMA Operation Site Facility today

  27. ALMA Operation Site Facility (2900 m – Atacama desert)
     ALMA operated from here up to 2009

  28. Antenna Operation Site Technical Building Concept

  29. ALMA Santiago Office
     • Support operation from Santiago with:
       • Final master archive
       • Pipeline monitoring
     • ALMA Regional Centers in Europe, US, Japan
       • Wide area network connectivity
       • Copies of archive data
       • Support of users in proposal preparation & final data reduction

  30. ALMA Related Papers and Posters at ICALEPCS’2005
     • Sat.-Sun.: ALMA Common Software (ACS) Workshop, http://almasw.hq.eso.org/almasw/bin/view/ACS/ACSWorkshop2005
     • WE1.4-4: Advanced Hardware Technology in ALMA Back End and Correlator, F. Biancat Marchet et al.
     • WE4A.2-5: A generic software interface simulator for ALMA common software, D. Fugate et al.
     • WE2.4-6: The ALMA Common Software ACS Status and Developments, G. Chiozzi et al.
     • WE3A.3-6: The ALMA Telescope Control System, A. Farris et al.
     • PO1.012-1: Development of the control system for the 40m radiotelescope of the OAN using the Alma Common Software, P. de Vicente et al.
     • PO1.032-6: Transmitting huge amounts of data: design, implementation and performance of the bulk data transfer mechanism in ALMA ACS, P. Di Marcantonio et al.
     • PO2.067-4: ALMA Correlator Real-Time Data Processor, J. Pisano et al.
     • PO1.100-8: Migration from ACS 1.1 to ACS 4 at ANKA, I. Križnar et al.

  31. ALMA Sites: Chajnantor + www.alma.info
