The ALMA Computing Project: Update and Management Approach
Brian Glendenning (1), bglenden@nrao.edu
Gianni Raffi (2), graffi@eso.org
(1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA
(2) European Southern Observatory (ESO), Munich, Germany
ICALEPCS’2005 - Geneva
ALMA partner organizations
ALMA Project in Summary
• 64 x 12 m antennas, 30-950 GHz (reality check: 50 antennas proposed for the time being)
• Array configurations: 150 m - 14 km
• Near San Pedro de Atacama, Chile, at 5000 m
• Europe and North America as equal partners
• Japan will add the ACA (Atacama Compact Array): 12 x 7 m + 4 x 12 m antennas, plus an extra correlator and receivers
• 2 prototype antennas (in Socorro, NM)
• Construction phase 2003-2011
• Early Science foreseen for 2009
ALMA Antenna Configurations
ALMA Computing Requirements
• Control of antennas and receivers
• Correlator control and data acquisition (input: 96 Gb/s per antenna; output to the archive up to 64 MB/s)
• On-line pipeline (quicklook, flagging, images), off-line data reduction, telescope calibration
• Archiving (data rate >10 MB/s, 300 TB/year; see the quick check below)
• Observing preparation, scheduling
• Support for novice users to express science intent and obtain Scheduling Blocks
• Dynamic scheduling to take advantage of weather
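A quick consistency check of the archive figures quoted above; the rates come from the requirements, while the yearly-volume arithmetic is ours:

# Back-of-the-envelope check of the archiving figures above.
# Rates are taken from the requirements; the yearly-volume arithmetic is ours.
SECONDS_PER_YEAR = 365.25 * 24 * 3600          # ~3.16e7 s

sustained_rate_MBps = 10                       # ">10 MB/s" sustained archive rate
peak_output_MBps = 64                          # correlator output peak to the archive

yearly_volume_TB = sustained_rate_MBps * 1e6 * SECONDS_PER_YEAR / 1e12
print(f"~{yearly_volume_TB:.0f} TB/year at {sustained_rate_MBps} MB/s sustained")
# -> ~316 TB/year, consistent with the ~300 TB/year archive estimate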
Software Scope
From the cradle…
• Proposal Preparation
• Proposal Review
• Program Preparation
• Dynamic Scheduling of Programs
• Observation
• Calibration & Imaging
• Data Delivery & Archiving
…to the afterlife:
• Archival Research & VO Compliance
Trilateral Computing IPT Organisation
• Total bilateral staff now: 40 FTEs
• Total trilateral staff now: 65 FTEs
ALMA Computing
• Large but extremely distributed team
• 40 Full-Time Equivalents (FTEs) for the whole end-to-end software
• Total development effort to 2011: ~280 FTE-years
• The fundamental output of the Computing IPT will be a ~2M SLOC "end-to-end" software system running on over 200 computers on 4 continents
• (The 2M figure does not include comments, tests, documentation, or adopted/modified products such as AIPS++, NGAS, ATM, etc.)
• Staff in 14 institutions in Europe, North America and Japan
• Japanese computing is fully integrated. It includes:
• Staff in Japan working on the ACA: ~30 FTE-years
• Staff and cash for developments in Europe and the US: ~60 FTE-years
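As a rough cross-check of the figures above (the productivity number is derived here, not quoted in the slides), the code size and total effort imply roughly 7,000 source lines per FTE-year:

# Rough cross-check of the code-size and effort figures above.
# The resulting SLOC per FTE-year is derived here, not a project number.
total_sloc = 2_000_000          # ~2M SLOC (excluding comments, tests, docs)
total_effort_fte_years = 280    # total development effort to 2011
print(f"~{total_sloc / total_effort_fte_years:,.0f} SLOC per FTE-year")
# -> ~7,143 SLOC per FTE-year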
Software Architecture
AOS Network (diagram summary)
• 1 Gb fibers from the antenna pads, via a patch-panel room, to the correlator room
• Correlator room: CCC computer, CDP master and 16 CDP Beowulf nodes on a 10 Gb network
• Computer room and office area: diskless, RFI-quiet terminal PCs; diskless ARTM and GPS computers
• IP telephony (SRST router) over structured copper cabling
• 10 Gb fibers to the OSF
ALMA Software Development Process
• Software to be developed in two main phases: array software by 2008, observatory software by 2011
• Incremental, synchronized development via 6-monthly releases at FIXED dates
• This allows adjusting priorities to the current status
• We consider fixed-date development pacing to be crucial in our distributed environment
• Monthly integration tags (end of month) and inter-subsystem interface freezes (middle of month); the cadence is illustrated below
• Releases every 6 months (alternating major/minor)
• We believe development of an integrated system requires integrations from the beginning, to avoid the well-known "integration hell" problem
• Non-regression tests plus user tests (test cases); goal: 20% of effort
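A minimal sketch of that cadence; the start month and the major/minor assignment are illustrative assumptions, not the project calendar:

# Illustrative sketch of the cadence above: interface freeze mid-month,
# integration tag at end of month, and a release every sixth month.
# Start month and major/minor assignment are assumptions for illustration.
import calendar
from datetime import date

def cadence(year, start_month=1, months=12):
    events = []
    for i in range(months):
        y_off, m_idx = divmod(start_month - 1 + i, 12)
        y, m = year + y_off, m_idx + 1
        last_day = calendar.monthrange(y, m)[1]
        events.append((date(y, m, 15), "inter-subsystem interface freeze"))
        events.append((date(y, m, last_day), "monthly integration tag"))
        if (i + 1) % 6 == 0:                         # fixed-date release slot
            kind = "major release" if (i + 1) % 12 == 0 else "minor release"
            events.append((date(y, m, last_day), kind))
    return events

for when, kind in cadence(2005):
    print(when, kind)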
ALMA Software Approach
Requirements have been in place from the beginning:
• Science + operations requirements => architecture
• We are tracking them (against features, tests and delivery time), using Telelogic's DOORS
Prototypes were done (using ACS; see below):
• Software for the prototype antennas and the first correlator
Common infrastructure (software rather than rules):
• ALMA Common Software (ACS), started very early and now getting more and more stable
• Software engineering procedures, integration, tests
ACS Concepts
• Component-Container model (sketched below): a client accesses components (Components 1-3 in the diagram) hosted in a Container
• Supports separation of concerns between technology and the specific applications
• Same idea as .NET, EJB, CCM
• ACS entity objects: structured data, e.g. Scheduling Blocks, to be passed between components, defined and serialized with XML
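A minimal sketch of the component-container idea in Python; class and method names here are illustrative only, not the actual ACS API:

# Minimal illustration of the component-container pattern described above.
# Names are illustrative; this is NOT the actual ACS API.

class Container:
    """Hosts components and hides lifecycle/deployment concerns from them."""
    def __init__(self):
        self._components = {}

    def activate(self, name, component_cls):
        component = component_cls()
        component.initialize()            # the container drives the lifecycle
        self._components[name] = component

    def get_component(self, name):
        return self._components[name]     # clients look components up by name

class SchedulingComponent:
    """Application code implements only the functional behaviour."""
    def initialize(self):
        self.blocks = []

    def add_scheduling_block(self, xml_block):
        # Entity objects (e.g. Scheduling Blocks) are exchanged as serialized XML.
        self.blocks.append(xml_block)

# Client side: obtain a component from the container and use it.
container = Container()
container.activate("SCHEDULING", SchedulingComponent)
scheduler = container.get_component("SCHEDULING")
scheduler.add_scheduling_block("<SchedBlock id='42'/>")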
ALMA Computing Project Management & Oversight
Oversight:
• Yearly reviews
• Assignment of "subsystem scientists"
• Subsystem contact meetings
Planning, control:
• Plan the coming year in some detail (high-level requirements decomposed into granular features); place the remaining features in a backlog, to be drawn in priority order (see the sketch below)
• Verify (trace) feature completion via end-user tests
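A hedged sketch of that planning loop: features held in a prioritized backlog and drawn in priority order, each traced to the user-level test that verifies it. Feature names, priorities and test identifiers below are invented for illustration, not the project's tracking schema:

# Sketch of a prioritized feature backlog with feature-to-test tracing.
# Feature names, priorities and test IDs below are purely illustrative.
import heapq

backlog = []  # min-heap of (priority, feature, verifying_test)

def add_feature(priority, feature, verifying_test):
    heapq.heappush(backlog, (priority, feature, verifying_test))

def next_feature():
    return heapq.heappop(backlog)  # highest priority (lowest number) first

add_feature(1, "antenna pointing control", "TC-CONTROL-07")
add_feature(3, "quicklook pipeline display", "TC-PIPE-02")
add_feature(2, "scheduling block submission", "TC-OBSPREP-11")

while backlog:
    priority, feature, test = next_feature()
    print(f"P{priority}: {feature} (verified by {test})")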
Planning: R3 Master Test Plan
Computing Group Communications and Reporting
• Yearly incremental design reviews; review plans revised every 6 months
• TWiki is used, and useful, for orderly discussions
• Contact meetings with subsystems and among subsystem leads
• Yearly subsystem-leads meetings (design and interface discussions)
• People meet by working together at each other's sites
• Videoconferencing has proven more troublesome than teleconferences
Tests grade requirements as fully or partially met. The SSR signs off on a requirement as 'Adequate' by grading it as shown in the example, with an overall grade derived from the individual test grades; a sketch of such an aggregation follows below.
(Example table of test grades and the resulting overall grade not reproduced here.)
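Since the example table is not reproduced in this text, the aggregation rule below is an assumption, sketched only to illustrate the idea of deriving an overall grade from per-test grades:

# Hedged sketch: derive an overall requirement grade from per-test grades.
# The aggregation rule here is assumed for illustration; the actual ALMA
# grading scheme is given in the (not reproduced) example table.

def overall_grade(test_grades):
    """test_grades: list of 'full' / 'partial' / 'fail', one per relevant test."""
    if all(g == "full" for g in test_grades):
        return "fully met"
    if any(g == "fail" for g in test_grades):
        return "not met"
    return "partially met"

print(overall_grade(["full", "full"]))      # -> fully met
print(overall_grade(["full", "partial"]))   # -> partially met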
Status
• Passed the external PDR (2003) and CDR2 (2004), and the internal CDR1 (2004) and CDR3 (2005)
• Delivered releases R0-R3 (plus Rx.1 releases)
• Prototype control/correlator software used with the prototype antennas
• Every subsystem has a dedicated astronomer, who checks the developed features twice per year (release validation)
Status (cont.)
• Most subsystems have substantial development, with infrastructure in place, external interfaces defined and implemented, and some functionality
• Most subsystems have had external user tests
• Integrated tests with simulated/elementary data have taken place
• Internal testing of the system at the VLA site is planned for early 2006
• Antenna evaluation required significant software, but was done essentially via scripting of control components
• ACA (Japanese compact array) and observatory support software are still in early design
Code size: ~850 kSLOC as of Oct. 2005; in-kind contributions (NGAS, AIPS++, ATM) not included. (The chart also labels the Test Interferometer Control Software prototype.)
Lessons learned
Geographical distribution at this size and pace is difficult (*):
• Computing subsystems are mixed across continents (sometimes this was inevitable)
• Acceptance of common software (optimized for the system, not for everybody's taste, and mandatory; in general OK) requires team spirit
• Stability of interfaces among subsystems: no last-minute changes
• Integration is difficult. Subsystems tend to give priority to their own development over the stability of the system (but we are still in the early phases). It takes two months to produce an integrated system; continuous integration remains a goal (dream?)
• When problems arise, finger-pointing at "the others" occurs too quickly
• Some inefficiency has to be accepted (balanced by more discussion and better design)
We gave some thought to agile development, but we are at the wrong end of the spectrum (vs. a small local team). At least: light documentation, plus some form of emergency "pair programming" at integration time.
(*) This is not a statement against collaborations (typically among labs with different projects). We believe we are a very good example of a collaborative project (and hopefully we will also have successful software to show at the end).
Prototype Antennas at the VLA Site (New Mexico)
Evaluated using prototype control software (with ACS)
(Photos: Vertex/RSI and Alcatel/EIE prototype antennas)
First Operator GUI
ALMA Sites in Chile
• Antenna Operations Site (AOS)
• Operations Support Facility (OSF)
• Santiago Central Office (SCO)
Data link: 60 MB/s (peak), 6 MB/s (average)
Earthwork for the OSF Technical Facilities
ALMA Operations Support Facility (OSF) today
ALMA Operations Support Facility (2900 m, Atacama desert)
ALMA will be operated from here up to 2009
Antenna Operations Site (AOS) Technical Building Concept
ALMA Santiago Office
Support operations from Santiago with:
• Final master archive
• Pipeline monitoring
ALMA Regional Centers in Europe, the US and Japan:
• Wide-area network connectivity
• Copies of archive data
• Support of users in proposal preparation and final data reduction
ALMA-Related Papers and Posters at ICALEPCS’2005
Sat.-Sun.: ALMA Common Software (ACS) Workshop, http://almasw.hq.eso.org/almasw/bin/view/ACS/ACSWorkshop2005
• WE1.4-4: Advanced Hardware Technology in ALMA Back End and Correlator, F. Biancat Marchet et al.
• WE4A.2-5: A Generic Software Interface Simulator for ALMA Common Software, D. Fugate et al.
• WE2.4-6: The ALMA Common Software ACS: Status and Developments, G. Chiozzi et al.
• WE3A.3-6: The ALMA Telescope Control System, A. Farris et al.
• PO1.012-1: Development of the Control System for the 40m Radiotelescope of the OAN Using the ALMA Common Software, P. de Vicente et al.
• PO1.032-6: Transmitting Huge Amounts of Data: Design, Implementation and Performance of the Bulk Data Transfer Mechanism in ALMA ACS, P. Di Marcantonio et al.
• PO2.067-4: ALMA Correlator Real-Time Data Processor, J. Pisano et al.
• PO1.100-8: Migration from ACS 1.1 to ACS 4 at ANKA, I. Križnar et al.
ALMA Sites: Chajnantor (www.alma.info)