NAASC data processing capabilities (including reprocessing scope)

Presentation Transcript


  1. NAASC data processing capabilities (including reprocessing scope). Mark Lacy, Data Services Lead, NAASC, NRAO. ANASAC, 13-14 Sept 2010

  2. NAASC Data Services
     • Data services group formed within the NAASC (other groups are User Support Services [Brogan] and JAO support [Hibbard]).
     • Goal: to provide processed ALMA data and the tools to analyze it to NA users.
     • Responsibilities:
       • NA ALMA archive and user portal, including VO and interaction with VAO LLC
       • Splatalogue
       • Simdata
       • Pipeline (implementation)
       • “Advanced tools” (e.g. data cube visualization and marginalization).

  3. Overview of ES processing
     • JAO plans to process all Early Science (ES) data in order to perform Quality Assurance (QA2).
     • Processing at SCO will be performed using desktop machines.
     • Tests indicate that these will be able to deal with ES data rates (expected to be ~20 TB/year, ~1/10th of Full Science, but with a ramp-up near the end).
     • NAASC will provide reprocessing capabilities for NA users.
     • Already getting experience with CSV data processing through the NAASC SV “Tiger team”.
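
As a rough check on the scale quoted above, the ~20 TB/year figure can be converted to an average sustained rate. This is a minimal sketch, not something from the slides: it assumes decimal terabytes and a uniform spread over the year (the slide itself notes a ramp-up, so the real rate will vary), and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def average_rate_mb_per_s(tb_per_year: float) -> float:
    """Average sustained rate in MB/s for a yearly volume in TB.

    Assumes decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    bytes_per_year = tb_per_year * 1e12
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


es_rate = average_rate_mb_per_s(20.0)     # Early Science, ~20 TB/year
full_rate = average_rate_mb_per_s(200.0)  # assumes Full Science is 10x ES
print(f"Early Science: {es_rate:.2f} MB/s, Full Science: {full_rate:.2f} MB/s")
```

Under these assumptions the averages come out below 1 MB/s for ES and a few MB/s for Full Science, which is consistent with desktop machines keeping up with the ES rate.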

  4. Details of NAASC processing plans in ES
     • We have recently written a computing plan for the NAASC covering ES through operations.
     • A small cluster (~4-12 nodes), forerunner of the NAASC pipeline machine, will be built up slowly, based on EVLA experience.
     • In addition, we will purchase desktop machines for visitor use and evaluation, with the aim of producing recommendations to users for offline processing.
     • We will thus have the ability to perform reprocessing of all NA data.

  5. NAASC cluster - ES

  6. NAASC cluster - operations

  7. How users will reprocess
     • Option 1: Come to the NAASC and use the cluster through a login on an NRAO desktop machine (or the desktop directly for small datasets).
     • Option 2: Use VNC from their home institution to log in to the cluster.
     • Option 3: Submit a pipeline job remotely to the NAASC via a webpage.
     Which option we adopt will depend on the level of support and interaction with the data that is required. Likely to begin with option 1 and move to option 3 as algorithms for e.g. automatic flagging improve, with option 2 as a backup. (Also likely to have ASDM to MS conversion implemented for users getting their data from the archive.)

  8. Getting the data to NA
     • Baseline plan is disk shipment for bulk data, but we will attempt to take advantage of improved links to Chile required by NOAO for DES and LSST.
     • Have an AUI/AURA agreement to share the fast data link from Chile to Florida International University (10 Gb/s).
     • Thereafter data travels via Internet2 to Charlottesville/UVa.
     • Should be adequate to move both bulk data and metadata without requiring shipping of disks.
     • Archive replication tests to begin next year.
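
To illustrate why a 10 Gb/s link "should be adequate", the transfer time for a year of Early Science data (~20 TB, from the earlier slide) can be estimated. This is a back-of-the-envelope sketch: the `usable_fraction` parameter is an assumption introduced here to stand in for the fact that the link is shared, and protocol overhead and the onward Internet2 leg are ignored.

```python
def transfer_hours(volume_tb: float, link_gbps: float,
                   usable_fraction: float = 1.0) -> float:
    """Hours needed to move volume_tb (decimal TB) over a link_gbps link.

    usable_fraction is the (assumed) share of the nominal link rate
    actually available to this transfer, between 0 (exclusive) and 1.
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    bits = volume_tb * 1e12 * 8                      # TB -> bits
    seconds = bits / (link_gbps * 1e9 * usable_fraction)
    return seconds / 3600.0


full_link = transfer_hours(20.0, 10.0)                         # whole link
tenth_link = transfer_hours(20.0, 10.0, usable_fraction=0.1)   # 10% share
print(f"20 TB at 10 Gb/s: {full_link:.1f} h; at a 10% share: {tenth_link:.1f} h")
```

Even with only a tenth of the nominal bandwidth, a full year of ES data would move in roughly two days under these assumptions, which supports the slide's expectation that disk shipment may not be needed.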

  9. NAASC and related software systems
     • Splatalogue
       • Currently concentrating on documentation and database enhancement.
       • Future plans include improvements to usability (new front end).
       • Plan to make Splatalogue an “official” ALMA software project; working on a Splatalogue memo to ALMA describing the database and the plan for management and maintenance.
     • Simdata (task in CASA)
       • Simdata now largely complete, including single dish capability (in collaboration with NAOJ).
       • CASA code freeze Sept 17th prior to October release.
       • Working on new ES examples. Simdata will allow us to demonstrate the limitations of the ES array in terms of both sensitivity and dynamic range/uv-coverage.

  10. Example: ALMA Band 6 deep pointing. 9 x 8 hr, 234 GHz ALMA track in continuum. Simulated using Oxford S-cubed simulations (Obreschkow & Rawlings 2009) for the model and simdata2 in CASA for the “observation”. [Figure panels: Model; Early Science (16 ants); Full Science (50 ants)]

  11. CASA/pipeline performance
     • CASA currently has similar speed to other packages for ~10 GB datasets, except for a few outliers (flagging, plotting) that are being aggressively pursued.
     • CASA’s architecture has been written with parallelization in mind.
     • Channelization of radio data makes the problem “embarrassingly parallelizable”.
     • However, particularly for imaging, the problem is I/O limited rather than CPU limited (~60:40 I/O:CPU), which makes it trickier.
     • Pursuing mitigation through hardware solutions (fast file systems, e.g. Lustre with Infiniband interconnect) and software solutions (improving I/O efficiency in code).
     • Nevertheless, parallelization efforts are the highest current risk and priority.
     • Release of multi-core CASA functionality will be staged so that functionality becomes available for pipeline testing and the community as soon as possible.
     • Simple imaging (single field or simple mosaic cube) well progressed, expected for the October 2010 release.
     • Multi-core flagging and more imaging cases (multi-frequency synthesis continuum) expected June 2011.
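
The interplay between per-channel parallelism and the ~60:40 I/O:CPU split can be sketched with a simple Amdahl-style model. This is an illustrative toy, not CASA code: the function names are hypothetical, and the model assumes the CPU share scales perfectly with core count while the I/O share only improves by a separate factor (`io_scaling`, standing in for a faster file system).

```python
def split_channels(n_chan: int, n_workers: int) -> list[tuple[int, int]]:
    """Divide channels 0..n_chan-1 into contiguous (start, stop) ranges,
    one per worker, with sizes differing by at most one channel."""
    base, extra = divmod(n_chan, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def speedup(n_cores: int, io_fraction: float = 0.6,
            io_scaling: float = 1.0) -> float:
    """Amdahl-style speedup: the CPU share is divided across n_cores,
    the I/O share is only reduced by io_scaling (assumed model)."""
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction / io_scaling + cpu_fraction / n_cores)


print(split_channels(10, 3))                 # three contiguous channel ranges
print(f"{speedup(8):.2f}")                   # 8 cores, I/O unchanged
print(f"{speedup(8, io_scaling=4.0):.2f}")   # 8 cores, I/O four times faster
```

In this model, adding cores alone can never give more than a factor of 1/0.6 (about 1.7), however many channels are processed in parallel. That is one way to read the slide's emphasis on file-system and I/O-efficiency work alongside multi-core support.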

  12. CASA development priorities
     • Support of ALMA and EVLA commissioning needs
     • Parallelization and cluster fine-tuning for imaging and flagging
     • Working on combining Torque resource manager with Python scripting in CASA
     • Improvements needed for polarization calibration of linear feeds
     • Improvements to calibration table plotting (incorporate into plotms)
     • Planet models for use as resolved calibrators
     • Splatalogue search capabilities (including offline database) and overplotting
     • Viewer improvements (especially for spectral line plotting and analysis)
     • Improvements to image analysis tasks
     • Improvements to “TV” based flagging in the Viewer (on-the-fly spectral and time averaging)
     • A CARMA miriad filler (through partnership with Peter Teuben at U. Maryland)
     • Expanded and more modularized simulation capabilities.

  13. NAASC advanced tools
     • The NAASC staff will push some of the ALMA-related software development items as Splatalogue and Simdata reach completion. We will also be hiring an additional developer.
     • For example, image cube visualization and analysis are areas which will likely require work.
     • We can’t do this all ourselves, so we will aim to be responsive to community suggestions and contributions, incorporating some into CASA and posting others as “contributed software”.

  14. Summary
     • Within 1 year we expect significant data from ALMA, at a data rate comparable to that from e.g. HST or Spitzer.
     • Within 3 years, the data rate will exceed that from any other PI-driven telescope apart from the EVLA by more than an order of magnitude.
     • Must continue to be focused on the challenges and opportunities this presents.

  15. Backup slides

  16. CASA tutorial examples: M99 moment maps (CARMA); 3C391 polarization (EVLA)
