Data Management Breakouts – Jeff Kantor
Data Management Sessions • DM Overview for Newcomers / Intro to Summer 2014 Session – Jeff Kantor, Mario Juric • We introduced the new members of DM • Jeff Kantor presented an overview of Data Management • Arfon Smith gave a presentation on GitHub-based development, followed by a question-and-answer session • The DM Leadership Team will decide how to leverage what we heard
Data Management Sessions • DM's Calibration Plans: Refine Calibration Data, Products, Processing – Robert Lupton • Presented the latest calibration plan • The discussion found no fatal flaws • Next steps • Add details to the document and circulate it • Identify inputs from the camera system and check them against LSE-130
Data Management Sessions • Summer 2014 Development Retrospective – Jeff Kantor, Mario Juric • We reviewed the Summer 2014 plan in PMCS (as imported from JIRA Agile), noting what was complete and what remained to be done • We explained the import process and discussed how we can standardize the JIRA Agile information to support Earned Value (EV) • We reviewed the major deliverables from each team and what had been accomplished to date • Performance analysis • The good: new tools (JIRA) are in use and appear to be working; the majority of the goals set by the individual teams were accomplished; the initial rebuild of the Continuous Integration (CI) system is done • The bad: not everything is complete yet, but it can be by September 30th • The ugly: JK/MJ and many others were kept busy by hiring activities, NSF, and long-term planning, which left no time to oversee the development work. Some of that work diverged from the long-term plan. We knew this was a risk, but made the conscious calculation that (quality) hiring and starting construction were more important. Going forward, it’s critical to ramp up ASAP (with good people!).
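Standardizing JIRA Agile data "to support EV" ultimately means being able to compute the standard earned-value quantities per issue and roll them up. The sketch below shows that arithmetic (PV, EV, AC, schedule/cost variances, SPI, CPI). The `Issue` fields, the issue keys, and all numbers are hypothetical illustrations, not the actual JIRA or PMCS schema.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    # Hypothetical fields for illustration; not the actual JIRA or PMCS schema.
    key: str
    budget: float            # budgeted cost of the issue at completion
    planned_fraction: float  # fraction scheduled to be complete by the status date
    done_fraction: float     # fraction actually complete by the status date
    actual_cost: float       # cost incurred so far


def earned_value(issues):
    """Standard earned-value quantities summed over a list of issues."""
    pv = sum(i.budget * i.planned_fraction for i in issues)  # planned value
    ev = sum(i.budget * i.done_fraction for i in issues)     # earned value
    ac = sum(i.actual_cost for i in issues)                  # actual cost
    return {
        "PV": pv, "EV": ev, "AC": ac,
        "SV": ev - pv,  # schedule variance (negative = behind schedule)
        "CV": ev - ac,  # cost variance (negative = over budget)
        "SPI": ev / pv if pv else float("nan"),  # schedule performance index
        "CPI": ev / ac if ac else float("nan"),  # cost performance index
    }


# Illustrative numbers only.
report = earned_value([
    Issue("DM-0001", budget=100.0, planned_fraction=1.0, done_fraction=1.0, actual_cost=90.0),
    Issue("DM-0002", budget=200.0, planned_fraction=1.0, done_fraction=0.5, actual_cost=120.0),
])
print(report["SPI"], report["CPI"])
```

An SPI or CPI below 1 indicates work behind schedule or over budget, respectively; which JIRA fields would feed `planned_fraction` and `done_fraction` is exactly the standardization question raised in the session.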
Data Management Sessions • LSST Transient Management: Building on Current Experiences 1 & 2 – Andy Becker • Presentations were given by: • Robert Gruendl – DES Data Management • Luiz da Costa – Real-time streaming QA for DES • Rick Kessler – Image subtraction for DES • Simon Krughoff – obs_decam • Alex Kim – Random forest classification of transients • Francisco Forster and Guillermo Cabrera – Looking for core-collapse supernova shock breakouts with DECam • Przemek Wozniak – LANL version of Real/Bogus software v5.0 for iPTF • Jonathan Myers – Linktracklets • Several areas of interest for future collaboration were identified
Data Management Sessions • Tool Chains, Developer Visualization & Debugging Tools – Kian-Tat Lim • DM-internal training and discussion • Described, and gave links to, documents on DM development tools and their usage • Decisions: • Adopt C++11 after testing SWIG 3 (Russell Owen) • Adopt Python 3 when our dependencies favor it • Allow rebasing/squashing and force-pushing of ticket branches • Improve shared stack performance at NCSA
Data Management Sessions • LSST Software Stack Users Tutorial – Dick Shaw • We described the LSST Stack: • how to install it, how to use it, and how to get help. • Of the ~35 participants, roughly half were scientists and half were engineers and software developers; about 1/3 were comfortable programming in Python • The examples demonstrated how to: • Download data (SDSS) or create it (with PhoSim), and configure it • Bulk-process the data with command-line tasks, all the way through a data-release production • We plan to address these user requests: • More worked examples of using the Stack • Simpler examples of how to customize the Stack for their use • The ability to perform photometry on a single FITS image from any camera
Data Management Sessions • Winter 2015 Planning 1 & 2 – Mario Juric, Jeff Kantor • We reviewed the major features and results of the Summer 2014 release in LDM-240, Data Management Development Roadmap • We reviewed the major features planned for the Winter 2015 release in LDM-240, moving those that needed to shift to or from another release • Winter 2015 Priorities: • Establish the Continuous Integration system • Track performance metrics • Release often (intermediate releases) • Adopt a DevOps mindset • Bring new people on board and up to speed • Implement MultiFit in the DM framework, and start on the 2015 roadmap goals • We tasked the team with developing the Winter 2015 JIRA Epics, prioritizing them, and estimating both the resources each requires and the resources available within the team • With this input, we will set up the Epics in JIRA and import this information into the PMCS for EV
Data Management Sessions • DM Stack Boot Camp – Paul Price, Simon Krughoff • Introduction to afw, Task, CameraGeom • Feedback so far has been entirely positive • Participants gained a greater understanding of, and appreciation for, the DM stack • We hope this translates into more extensive and more confident use of the stack • Now we need to: • Continue the push on documentation • Support our growing user base • Establish a policy for supporting existing APIs
Data Management Sessions • How to Use and Reuse Tasks, and Integrating CameraGeom in Other Work – Paul Price, Simon Krughoff • Paul did a comprehensive survey of the available tasks and covered many of the basic Task concepts: configuration, inheritance, and sub-tasks. • Simon gave an overview of how CameraGeom is used and how to construct a camera, using obs_decam as an example. • No decisions were made, but there was good interaction with the community and clear interest in using Tasks and CameraGeom. Some priorities are: • Documentation of tasks and task flow. • Tools to build CameraGeom from multi-extension FITS files.
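The three Task concepts covered in the survey (configuration, inheritance, sub-tasks) can be illustrated with a small, self-contained pattern. This is a hypothetical sketch using plain `dataclasses`; it is NOT the `lsst.pipe.base` / `lsst.pex.config` API, and every class and field name here is invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of configuration, inheritance and sub-tasks.
# This is NOT the lsst.pipe.base / lsst.pex.config API.


@dataclass
class DetectConfig:
    threshold: float = 5.0  # detection threshold, in units of the noise sigma


@dataclass
class BrightDetectConfig(DetectConfig):
    threshold: float = 10.0  # the subclass overrides only the default value


@dataclass
class ProcessConfig:
    do_detect: bool = True
    # The sub-task's configuration nests inside the parent's configuration.
    detect: DetectConfig = field(default_factory=DetectConfig)


class Task:
    ConfigClass = None

    def __init__(self, config=None):
        # Fall back to the class's default configuration.
        self.config = config if config is not None else self.ConfigClass()


class DetectTask(Task):
    ConfigClass = DetectConfig

    def run(self, pixels):
        """Return indices of pixels above threshold (pixels in sigma units)."""
        return [i for i, v in enumerate(pixels) if v > self.config.threshold]


class BrightDetectTask(DetectTask):
    # Inheritance: reuse run() unchanged, but with a different default config.
    ConfigClass = BrightDetectConfig


class ProcessTask(Task):
    ConfigClass = ProcessConfig

    def __init__(self, config=None):
        super().__init__(config)
        # Sub-task constructed from the nested configuration.
        self.detect = DetectTask(self.config.detect)

    def run(self, pixels):
        return self.detect.run(pixels) if self.config.do_detect else []


config = ProcessConfig()
config.detect.threshold = 3.0  # override a sub-task parameter from the top level
print(ProcessTask(config).run([1.0, 4.0, 6.0]))
```

The design point is that a parent task owns its sub-tasks' configuration, so a user can retune any step from one top-level config object without touching code.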
Data Management Sessions • Unit and Regression Tests – Kian-Tat Lim • DM-internal training and discussion • Worked example of using coverage tools to improve unit tests • Discussed end-to-end integration needs • Improve the test dataset • (Re-)write end-to-end test scripts with multiple configurations • Write Tasks to compute performance metrics • Build monitoring/trending for metrics
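The last two action items (compute performance metrics, then trend them) can be sketched as two small functions: one computing a scalar metric from residuals, and one flagging a new value that falls well outside its history. The choice of an RMS metric and a mean + n·sigma threshold is an illustrative assumption, not a decision recorded in the session.

```python
import statistics


def rms(values):
    """Root-mean-square of a list of residuals, e.g. an astrometric-scatter metric."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


def is_regression(history, latest, n_sigma=3.0):
    """Flag `latest` if it exceeds the historical mean by more than n_sigma sigma.

    Assumes larger values are worse (runtime, scatter) and needs >= 2 history points.
    """
    if len(history) < 2:
        return False  # too little history to estimate the scatter
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample standard deviation
    return latest > mean + n_sigma * sigma


# Hypothetical nightly values of one metric from successive CI runs.
history = [10.0, 10.2, 9.8, 10.1, 9.9]
print(is_regression(history, 10.3), is_regression(history, 11.0))
```

In a CI setting each run would append its metric values to a stored history, and a flagged value would fail or warn on the build.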
Data Management Sessions • Using the DM Stack to Characterize Detectors – Robert Lupton • Discussed work at BNL/SNAL • Identified ways that DM can help: • Introduce use of DM’s cameraGeom • Write code to generate cameraGeom from LCA-10140 headers • Use DM’s ISR/assembleCCD tasks • It is not clear how useful DM’s measurement framework would be
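Generating camera geometry from FITS headers, and assembling amplifiers into a CCD, typically starts with parsing section keywords. A minimal sketch follows, assuming the headers carry IRAF/NOAO-style section strings such as `DATASEC = '[1:512,1:2002]'` (whether LCA-10140 uses exactly these keywords is an assumption here). FITS sections are 1-indexed, inclusive, and list x (columns) first; in the NOAO mosaic convention a reversed range in keywords like `DETSEC` marks a flipped axis. The function names are hypothetical.

```python
import re

# IRAF/NOAO-style section strings such as "[1:512,1:2002]" (whitespace tolerated).
_SECTION = re.compile(r"^\[\s*(\d+)\s*:\s*(\d+)\s*,\s*(\d+)\s*:\s*(\d+)\s*\]$")


def _axis_slice(start, stop):
    """Convert a 1-indexed inclusive FITS range to a 0-indexed Python slice."""
    if start <= stop:
        return slice(start - 1, stop)
    # Reversed range (flipped axis): step backwards down to `stop`, inclusive.
    return slice(start - 1, stop - 2 if stop > 1 else None, -1)


def parse_section(value):
    """Return (row_slice, column_slice) for indexing a 2-D array as array[rows, cols].

    FITS sections list x (columns) first; numpy-style arrays index rows (y) first.
    """
    match = _SECTION.match(value.strip())
    if match is None:
        raise ValueError(f"not a FITS section string: {value!r}")
    x1, x2, y1, y2 = (int(g) for g in match.groups())
    return _axis_slice(y1, y2), _axis_slice(x1, x2)


print(parse_section("[1:512,1:2002]"))
```

The returned slices can index a numpy image directly, e.g. `image[parse_section(header["DATASEC"])]`, which keeps the 1-indexed/0-indexed and x/y conversions in one place.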
Data Management Sessions • How to Fit a Galaxy Model – Jim Bosch • Tutorial and discussion of algorithmic ideas, with mostly non-DM participants. • Some topics discussed: • what MultiFit means to us and what we plan to do • kinds of models to fit, and how to evaluate them • Bayesian priors and sampling • how to define galaxy colors • star/galaxy classification • whether per-pixel variances should be used when doing photometry (very lively discussion)
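The per-pixel-variance question can be made concrete with the simplest case, a PSF-model flux. For the model data_i = F · psf_i + noise_i, the maximum-likelihood amplitude with per-pixel variances is F = Σ(psf_i · data_i / var_i) / Σ(psf_i² / var_i); with a constant variance the weights cancel and F = Σ(psf_i · data_i) / Σ(psf_i²). The sketch below computes both; the profile and numbers are illustrative, and it does not take a side in the debate.

```python
def psf_flux(data, psf, variance=None):
    """Maximum-likelihood amplitude F for the model data_i = F * psf_i + noise_i.

    With per-pixel variances each pixel is weighted by 1/variance_i:
        F = sum(psf_i * data_i / var_i) / sum(psf_i**2 / var_i)
    With variance=None all weights are equal and this reduces to
        F = sum(psf_i * data_i) / sum(psf_i**2).
    Returns (flux, flux_variance); flux_variance is None without input variances.
    """
    if variance is None:
        weights = [1.0] * len(data)
    else:
        weights = [1.0 / v for v in variance]
    num = sum(w * p * d for w, p, d in zip(weights, psf, data))
    den = sum(w * p * p for w, p in zip(weights, psf))
    flux_variance = 1.0 / den if variance is not None else None
    return num / den, flux_variance


# Illustrative 1-D profile; the last pixel reads high and is very noisy (variance 100).
psf = [0.1, 0.2, 0.4, 0.2, 0.1]
data = [1.0, 2.5, 4.0, 1.5, 3.0]
unweighted, _ = psf_flux(data, psf)
weighted, weighted_var = psf_flux(data, psf, variance=[1.0, 1.0, 1.0, 1.0, 100.0])
print(unweighted, weighted)
```

Down-weighting the noisy pixel reduces its pull on the estimate. One well-known caveat on the other side: if the variances are estimated from the data and include the source's own photon noise, the weights correlate with the fluctuations and can bias the weighted flux for bright objects.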
Data Management Sessions • Summit, Summit–Base Network Infrastructure – Ron Lambert • We showed the current and proposed plans for the networking paths from Chile to NCSA and from the Summit to the Base. Some links now have better path diversification than was previously considered. • Plans for the summit and base computer networks were presented. • Continued work is required to refine the data paths for the summit and base computer networks • We are continuing to work with the international link provider AmLight, the Chilean telecoms, and the Chilean NREN to legally ratify the various network paths • We expect the LSST network paths from the Summit to NCSA, with the required bandwidths, to be in operation well before the end of CY2016. • We took actions to further review and update the diagrams and to develop an inventory of the network equipment at both sites • After the August 25 meetings with the telcos, we will submit an LCR to update LSE-78.
Data Management Sessions • DM Developer Hackathon 1 & 2 – Mario Juric • Qserv now builds within the continuous integration system • We now have the science pipelines, the database, MAF, and CatSim all building and being tested automatically! • CI Team: All your base are belonging to us! • Fixes for CFHT processing • Speedups of builds (work in progress) • General note: we should do this more often.
Data Management Sessions • Base Facility Infrastructure, Data Center Design – Kian-Tat Lim, Don Petravick • Went through LSE-77 and its supporting document • Exposed issues with redundant cooling and the need for equipment inventories with a notional layout • Decided: • Will support two reliability tiers • No separate access controls are needed within the computing area • Need to engage an engineer knowledgeable about cooling • Need to look at typical tape library footprints in addition to the best case • A concrete floor with overhead wiring is now the consensus • Power Usage Effectiveness (PUE) monitoring is required • Loading dock requirements are defined
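Data-center power-efficiency monitoring is conventionally reported as Power Usage Effectiveness: PUE = total facility energy / IT equipment energy, so 1.0 is the ideal and larger values mean more overhead (cooling, power distribution, lighting). A minimal sketch computing it over an interval from equally spaced power readings; the function name and the readings are hypothetical.

```python
def pue(facility_kw, it_kw):
    """Power Usage Effectiveness over an interval from equally spaced power readings.

    PUE = total facility energy / IT equipment energy. With equally spaced samples
    the sampling interval cancels, so the ratio of summed readings is used.
    """
    if len(facility_kw) != len(it_kw) or not it_kw:
        raise ValueError("need equal-length, non-empty sample lists")
    it_energy = sum(it_kw)
    if it_energy <= 0:
        raise ValueError("IT load must be positive")
    return sum(facility_kw) / it_energy


# Hypothetical readings (kW): total facility draw vs. IT equipment draw.
print(pue([1500.0, 1600.0], [1000.0, 1000.0]))
```

Taking the ratio of energies over the interval, rather than averaging instantaneous ratios, keeps brief low-load periods from skewing the reported figure.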
Data Management/SE session • Visualization Tools – Gregory Dubois-Felsmann (Thursday 11:00) • Assembled people from OCS, TCS, Camera, DM, EPO, Simulation to discuss visualization requirements and devise a plan • We put together a list of highest-level use cases • The plan is to flesh these out into finer-grained use cases and requirements, and use that information to… • Identify areas where common tools can be adopted/built and areas where subsystems may have to go it alone • We will consider the overlap between tools required internally by the project (during construction and operation) and tools required for the external science user interface • We plan to have two major teleconferences: October 3rd and early December, and then an in-person meeting in January at which scope decisions can be made • We have a Confluence page and a mailing list for communication