
Data Analysis System of HSC HSC-ANA

Data Analysis System of HSC HSC-ANA. Hisanori Furusawa Subaru Telescope, NAOJ for HSC development team. Contents. 1. Concept and Goals for HSC-ANA 2. Our approaches for development of HSC-ANA 3. Plan for Development. HSC-ANA Concept and Goals. HSC Data Rate.


Presentation Transcript


  1. Data Analysis System of HSC: HSC-ANA. Hisanori Furusawa (Subaru Telescope, NAOJ), for the HSC development team. SAC Workshop - HSC

  2. Contents • 1. Concept and Goals for HSC-ANA • 2. Our approaches for development of HSC-ANA • 3. Plan for Development

  3. HSC-ANA Concept and Goals

  4. HSC Data Rate. Suprime-Cam data rate = 160 MB/shot → HSC (2 deg, 176 CCDs) = 2.8 GB/shot. Data format is TBD.
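The per-shot volumes on this slide can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate: it takes the Suprime-Cam CCD format (10 CCDs of 2,048 x 4,096 pixels) from slide 38, and it assumes 16-bit raw pixels and the same CCD format for HSC, neither of which is stated in the slides. The names `shot_size_mib`, `BYTES_PER_PIXEL`, and so on are illustrative.

```python
# Back-of-the-envelope check of the per-shot data volumes quoted on the slide.
# Assumptions (not stated in the slides): 16-bit raw pixels, and HSC CCDs with
# the same 2,048 x 4,096 format as Suprime-Cam. Headers and overscan are ignored.

BYTES_PER_PIXEL = 2                    # assumed 16-bit raw pixels
CCD_WIDTH, CCD_HEIGHT = 2048, 4096     # Suprime-Cam CCD format (slide 38)
MIB = 1024 * 1024


def shot_size_mib(n_ccds: int) -> float:
    """Raw data volume of one exposure in MiB for a camera with n_ccds chips."""
    return n_ccds * CCD_WIDTH * CCD_HEIGHT * BYTES_PER_PIXEL / MIB


suprime_cam = shot_size_mib(10)    # Suprime-Cam: 10 CCDs
hsc = shot_size_mib(176)           # HSC: 176 CCDs (same format assumed)

print(f"Suprime-Cam: {suprime_cam:.0f} MiB/shot")
print(f"HSC:         {hsc:.0f} MiB/shot")
```

Under these assumptions each CCD contributes 16 MiB, which is consistent with the quoted 160 MB/shot for Suprime-Cam and roughly 2.8 GB/shot for 176 CCDs.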

  5. Difficulty in Quality Control. Even for current observations, quality control is difficult: going from data analysis to scientific results involves much trial and error through human (possibly subjective) interaction. Problems: it is difficult for observers to make their observing plan, to trace their analysis, to evaluate their results, and to correct or update their results. For the more massive HSC data: an objective, automated data analysis system?

  6. HSC-ANA Goals • Maximizing science outputs • Provide guaranteed calibrated data • Immediate release of best-effort catalogs upon users' requests at any time • Quality control of key survey data • Achieve uniform data quality in long-term survey programs • On-site quality assurance tool (seeing, transparency, etc.) feeding the database and FITS headers • Traceable analysis through an appropriate database • Useful archive data • Frame selection by quality information attached to data frames. (Diagram labels: Framework, Middleware)

  7. HSC-ANA Team in Japan: H. Aihara, T. Uchida (U-Tokyo); M. Tanaka, Y. Yasu, S. Yamagata, R. Itoh, S., N. Katayama (KEK, High Energy Accelerator Research Organization); H. Furusawa, S. Miyazaki, Y. Komiyama (NAOJ)

  8. 2. Our Approaches

  9. HSC-ANA Framework (diagram). Components: master, DB, analysis nodes (scalable; Gigabit Ethernet?), housekeeping, watchdog, data retriever, user interface. Command flow: control by operators or users. Data flow: images, status, catalogs. Data source: HSC archive (or OBCP) or STARS. Input: data frames, analysis configurations. Output: results (images, catalogs, plots, etc.), status.

  10. System Components • Analysis Part • Analysis programs • Error handling → robust system • Quality Assurance Tool (QA) • Extracting and registering quality information • Framework • Interfacing analysis tasks • Database, process distribution • Other Components • Data retrieval mechanism from archive • U/Is • Watchdog

  11. Component 1. Analysis Part: main pipelines. Developing analysis tasks by dividing the entire procedure • 1st-stage analysis • 1st a: Data reduction, removing instrumental characteristics (bias subtraction → sky subtraction) • 1st b: Mosaicking • 2nd-stage analysis • Photometric/astrometric calibrations • Object extraction and catalog creation

  12. Distributed processing: to exploit the capacity of the computing system and to reduce the bottleneck of the pipelines, data and processes are distributed. • Pipeline 1st stage, in the simulation server: preprocessing, distributing CCD data, database registration • Pipeline 1st a (CCD-by-CCD): overscan, flat making, flat fielding, distortion correction, PSF match, catalog for each chip • Pipeline 1st b (pointing-by-pointing): mosaicking, sky subtraction, stacking, deep catalog
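The split between a CCD-by-CCD stage and a pointing-by-pointing stage can be sketched as a scatter/gather pattern. This is a minimal illustration, not the HSC-ANA implementation: the step functions are hypothetical stand-ins operating on tiny pixel lists, and a local thread pool stands in for distribution across analysis servers.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


# Hypothetical stand-ins for the real per-CCD steps.
def subtract_overscan(pixels, overscan_level):
    return [p - overscan_level for p in pixels]


def flat_field(pixels, flat):
    return [p / f for p, f in zip(pixels, flat)]


def pipeline_1a(ccd):
    """CCD-by-CCD stage: chips are independent, so they can run in parallel."""
    pixels = subtract_overscan(ccd["pixels"], ccd["overscan"])
    pixels = flat_field(pixels, ccd["flat"])
    return {"ccd_id": ccd["ccd_id"], "pixels": pixels}


def pipeline_1b(reduced_ccds):
    """Pointing-by-pointing stage: needs every chip of the pointing at once."""
    ordered = sorted(reduced_ccds, key=lambda c: c["ccd_id"])
    mosaic = [p for c in ordered for p in c["pixels"]]   # toy "mosaic"
    return {"mosaic": mosaic, "mean_level": mean(mosaic)}


def run_pointing(ccds, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(pipeline_1a, ccds))   # scatter: stage 1a
    return pipeline_1b(reduced)                       # gather: stage 1b


ccds = [
    {"ccd_id": i, "pixels": [110.0, 120.0], "overscan": 100.0, "flat": [1.0, 2.0]}
    for i in range(3)
]
result = run_pointing(ccds)
print(result["mean_level"])
```

The point of the structure is that stage 1a scales with the number of workers, while stage 1b is a synchronization point that has to wait for all chips of a pointing.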

  13. Component 2. Quality Assurance (QA) Tool • Obtains quality information for each data frame on a semi-real-time basis → DB and FITS header • Quality information: seeing, transparency, achieved S/N, quick coadded images (examples shown: Seeing @ UKIRT, Skyprobe @ CFHT) • Uses: quick-look, observing plan, uniform survey data, service/queue observations, time variation analysis

  14. Component 3. Framework: organizes the whole system • Provides interfaces for analysis commands: constructing, executing, and monitoring pipelines • Communicates with databases • Downloads requested data frames • Distributes processes • Provides user interfaces to each system component

  15. 3. Our Development Plan

  16. Prototyping: Zero-version HSC-ANA System • Implements a minimal pipeline for S-Cam data • 1a, 1b, and 2nd-stage analysis + on-site QA tools → provided to observers • Figures out bottlenecks and potential problems • Large inputs of archive Suprime-Cam data • Science = critical evaluation • Evaluates framework middleware • R&D Chain Management (RCM), BASF+ROOT (@KEK) → Evaluation through autumn 2008 → Development of the full HSC-ANA system

  17. Framework of Analysis Pipelines (diagram): RCM pipelines management. Labels: https, XML-U/I; control server (workflow, pipeline, DB request); load distribution with optimal assignments (job levels: priority, normal, medium; server states: busy, disk full); analysis servers, file servers; XML-DB (record histories and search: annotation, command log, search targets); database, backup/restore.

  18. Login to RCM System: data accessibility can be controlled by accounts and groups.

  19. Workflow of HSC-ANA Overscan Subtraction Coadding

  20. Pipeline Execution • Keyword list recorded in DB • Configuration of parameters • Frame search by keywords

  21. Viewing Results • Retry or proceed • XML-based summary viewer with a hierarchical structure • Thumbnails of processed images • DS9 is invoked if needed • Some evaluation results and statistical information for processed data will be added

  22. Prototyping Quality Assurance Tools for Suprime-Cam • Application to the Suprime-Cam observations • As a part of the HSC-ANA (2nd-stage analysis and QA) • Important project for the observatory → a high-priority project (as a SS)

  23. Functions in SC QA • Quick-look & Quality-check Assistance • Observation Planning • Quality Assessment for Long-term Data • Quality Control for Survey Project Data

  24. A. Quick-Look & Quality-Check Assistance: Overview of QL&QC Assist • 1. Quick Reduction: bias subtraction, (flatfielding), coarse astrometry • 2. Statistics: seeing, focusing, read noise level, background level • 3. Photometry & Transmission: standard/reference stars, relative transmission • 4. Depth (limiting mag): image co-adding, noise statistics • 5. Observing Logs. (Diagram labels: Suprime-Cam, OBCP (DAQ), A-LAN machine, QA servers, DB, ANA)

  25. Roadmap

  26. Summary • For massive HSC data, an objective, automated data analysis system is needed • Conceptual design of the HSC-ANA is underway • Prototyping of the HSC-ANA based on the RCM middleware is ongoing and will be evaluated this year • The QA system is being developed and tested this year • Inputs from observers and the science community • Consultation/collaboration with experienced domestic and international groups

  27. Thank you.

  28. RCM Workflow (= pipelines): user login → entry in the database → search / monitor / check → Pipeline 1a → (move on) → Pipeline 1b → (move on) → Pipeline 2. Pipeline 1a: tasks run chip-by-chip, shell-by-shell, etc.; parallel processing can be done in the RCM framework. Pipeline 1b: algorithm TBD; the parallel processing architecture will be discussed and developed. Pipeline 2: calibrations and catalog making, under development.

  29. A. Quick-Look & Quality-Check Assist • Quick reduction of data frames (FITS validation, bias subtraction, flatfielding, medium-precision astrometry for photometric and stacking analysis) • Statistical values ingested into the database (seeing FWHM, focusing, noise level in overscan, sky background level) • The statistical values are used for quality check and assessment, as parameters in the next stages, and for automated focusing
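Two of the per-frame statistics listed here, the noise level in the overscan and the sky background level, can be illustrated with a few lines of code. This is a rough sketch with assumed definitions: the bias level is taken as the median of the overscan pixels, the noise as their sample standard deviation (in ADU, with no gain conversion), and the sky level as the median of the science pixels minus the bias level. The function name and dictionary keys are hypothetical, not HSC-ANA keywords.

```python
from statistics import median, stdev


def frame_statistics(science_pixels, overscan_pixels):
    """Simple per-frame quality statistics in ADU (assumed definitions)."""
    bias_level = median(overscan_pixels)
    return {
        "bias_level": bias_level,
        # Sample standard deviation of the overscan as a read-noise proxy.
        "noise_overscan": stdev(overscan_pixels),
        # Median is robust against bright objects such as the 9000 below.
        "sky_level": median(science_pixels) - bias_level,
    }


stats = frame_statistics(
    science_pixels=[5010.0, 4990.0, 5000.0, 9000.0, 5005.0],
    overscan_pixels=[100.0, 102.0, 98.0, 100.0],
)
print(stats)
```

Values like these would be what gets written to the database and FITS headers for later frame selection.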

  30. A. Quick Look & Quality Check Assist: 3. Photometry & Relative Sky Transmission • Photometric analysis of standard stars, registered in the database (zeropoints if available) • Relative photometry among frames during the night: the same objects are traced from shot to shot (Shot 1 → Shot 2 → Shot 3) to follow the time-to-time variation of the attenuation (mag), which is stored in the QA database
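Tracing the same objects across shots to follow attenuation can be sketched as a median magnitude offset relative to a reference frame. This is one plausible way to do it, not necessarily the method used in the QA tool; the object names and magnitudes are made up, and cross-matching is reduced to matching dictionary keys.

```python
from statistics import median


def relative_attenuation(ref_mags, frame_mags):
    """Median magnitude offset of the objects common to both frames.

    Positive values mean the objects appear fainter than in the reference
    frame, i.e. the sky transmission has dropped.
    """
    common = ref_mags.keys() & frame_mags.keys()
    if not common:
        raise ValueError("no common objects between the frames")
    return median(frame_mags[obj] - ref_mags[obj] for obj in common)


# Instrumental magnitudes per shot, keyed by a (hypothetical) object ID.
shot1 = {"star_a": 18.00, "star_b": 19.50, "star_c": 20.25}
shot2 = {"star_a": 18.25, "star_b": 19.75, "star_c": 20.50, "star_d": 21.00}
shot3 = {"star_a": 18.50, "star_b": 20.00}

attenuation = [relative_attenuation(shot1, s) for s in (shot1, shot2, shot3)]
print(attenuation)
```

Using the median rather than the mean keeps a single variable star or bad measurement from biasing the estimate.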

  31. A. Quick Look & Quality Check Assist: 4. Estimating Limiting Magnitude • A particular area of the images that meets the user's query is co-added to estimate the depth attained up to that time • Suggests the necessary exposure time and an observing plan to achieve the target depth • Flow: user input → DB query (1. filters, 2. seeing, 3. transmittance, 4. background level, 5. coordinates and fields, 6. magnitudes) → analysis server (mosaic, stacking, sky noise statistics, limiting magnitude) → output to on-site users • Example output: Target: 26.2 mag, S/N: 3.0; Now: 25.5 mag, S/N: 3.0; Exptime to be done: 2500 sec; Recommended plan: 630 sec x 4 shots
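The step from "depth achieved so far" to "exposure time still to be done" can be sketched under a standard simplifying assumption: in the sky-limited regime S/N grows as the square root of exposure time, so the limiting magnitude at fixed S/N deepens by 1.25 log10(t_total / t_done). The slides do not say which model the QA tool uses, and the rounding rule in `split_into_shots` is likewise an assumption; both function names are hypothetical.

```python
import math


def extra_exptime(depth_now, depth_target, exptime_done):
    """Additional exposure time (s) needed to reach depth_target.

    Assumes sky-limited noise: the limiting magnitude at fixed S/N deepens
    by 1.25 * log10(t_total / t_done).
    """
    if depth_target <= depth_now:
        return 0.0
    ratio = 10 ** ((depth_target - depth_now) / 1.25)
    return exptime_done * (ratio - 1.0)


def split_into_shots(total_exptime, n_shots, round_to=10):
    """Per-shot exposure time, rounded up to a multiple of round_to seconds."""
    return math.ceil(total_exptime / n_shots / round_to) * round_to


# Going 1.25 mag deeper needs 10x the total time, i.e. 9x more than already done.
print(extra_exptime(depth_now=25.0, depth_target=26.25, exptime_done=100.0))

# Splitting the slide's 2500 s into 4 shots with 10 s rounding.
print(split_into_shots(2500, 4))
```

With this rounding, 2500 s over 4 shots comes out at 630 s per shot, the same per-shot time as in the slide's example, although the slide does not state how its plan was derived.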

  32. A. Quick-Look & Quality-Check Assist: 5. Observing Log • Obtains information on the data frames from the FITS header and other environmental status • Adds users' comments and stores the logs in the database for each frame or shot • The leading columns (HST through EXPTIME, …) are already being provided; QA will add SKY, SEEING, TRANSP, RONOISE, and WEATHER • The logs feed the analysis pipelines, observation planning, quality assessment, and survey data quality control (diagram labels: User, Submit, QA servers, DB)

     | HST | NAME | EXP-ID | OBJECT | FILTER01 | EXPTIME | … | SKY | SEEING | TRANSP | RONOISE | WEATHER |
     |---|---|---|---|---|---|---|---|---|---|---|---|
     | 05:57:58 | object000 | SUPE00555290 | DOMEFLAT | W-J-B | 10.0 | … | 12142 | N/A | N/A | 10.5 | Clear |
     | 18:43:52 | bias000 | SUPE00555300 | BIAS | W-J-B | 0.0 | … | 0 | N/A | N/A | 11.0 | Clear |
     | 19:52:14 | object001 | SUPE00555310 | SA107 | W-J-B | 5.0 | … | 8205 | N/A | 1.00 | 10.2 | Clear |
     | 20:00:05 | object002 | SUPE00999980 | SXDS_1 | W-J-B | 900.0 | … | 4873 | 0.75 | 0.97 | 10.8 | Clear |
     | 20:17:10 | object003 | SUPE00999990 | SXDS_1 | W-J-B | 900.0 | … | 4911 | 0.73 | 0.95 | 10.5 | Clear |
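Storing the log in a database is what makes frame selection by quality possible. The sketch below uses an in-memory SQLite table as a stand-in; the table name, column names, and the `select_frames` helper are hypothetical and not the HSC-ANA schema. The rows are taken from the log example on this slide, with N/A stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE qa_log (
        exp_id  TEXT PRIMARY KEY,
        object  TEXT,
        filter  TEXT,
        exptime REAL,
        sky     REAL,
        seeing  REAL,   -- NULL where the log shows N/A
        transp  REAL
    )
    """
)
rows = [
    ("SUPE00555310", "SA107", "W-J-B", 5.0, 8205, None, 1.00),
    ("SUPE00999980", "SXDS_1", "W-J-B", 900.0, 4873, 0.75, 0.97),
    ("SUPE00999990", "SXDS_1", "W-J-B", 900.0, 4911, 0.73, 0.95),
]
conn.executemany("INSERT INTO qa_log VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def select_frames(conn, obj, max_seeing, min_transp):
    """Return exposure IDs of frames of `obj` that pass the quality cuts."""
    cur = conn.execute(
        "SELECT exp_id FROM qa_log "
        "WHERE object = ? AND seeing <= ? AND transp >= ? "
        "ORDER BY exp_id",
        (obj, max_seeing, min_transp),
    )
    return [row[0] for row in cur]


good = select_frames(conn, "SXDS_1", max_seeing=0.74, min_transp=0.9)
print(good)
```

Frames with no seeing measurement drop out of the selection automatically, because a comparison against NULL is never true in SQL.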

  33. B. Observation Planning: Procedure of Observation Planning. Quality and depth check → target depth, filter, field → generate observing plan → editing by observers → generate observing procedure script (OPE)

  34. B. Observation Planning • Assists planning observations based on the sky transmission, limiting magnitude achieved, target visibility, etc. • 1. Check achieved depth etc.: Target: 26.2 mag, S/N: 3.0; Now: 25.5 mag, S/N: 3.0; Exptime to be done: 2500 sec; Recommended plan: 630 sec x 4 shots • 2. Generate observing plan:

     | HST | OBJECT, FILTER | (AZ, EL) | Duration |
     |---|---|---|---|
     | 23:54 | STD:PG1633 in W-S-Z+, 3 (sec) x 1 (shot) | (-70, 31) | 2 min |
     | 23:56 | ==> W-J-B | (-70, 46) | 5 min |
     | 24:01 | SXDS_1 in W-J-B, 630 (sec) x 4 (shot) | (+15, 55) | 46 min |
     | 24:47 | Slew | | 2 min |
     | 24:49 | SXDS_2 in W-J-B, 600 (sec) x 5 (shot) | (+82, 48) | 55 min |

  35. B. Observation Planning: generate observing procedure scripts based on the observing plan and users' inputs • 1. Automatic generation of the observing plan: 23:54 STD:PG1633 in W-S-Z+, 3 (sec) x 1 (shot), 2 min; 23:56 ==> W-J-B, 5 min; 24:01 SXDS_1 in W-J-B, 630 (sec) x 4 (shot), 46 min; 24:47 Slew, 2 min; 24:49 SXDS_2 in W-J-B, 600 (sec) x 5 (shot), 55 min • 2. Inputs and editing by observers → submit • 3. OCS procedure script → submit

  36. Co-working with the Science Community: analysis procedure linked to science objectives • Exchanged between the science community and HSC-ANA development: output data format, catalog format; acceptable uncertainties in photometry, astrometry, and object parameters • Flow: science objectives → survey design, target data products → analysis procedure, algorithms → satisfactory result

  37. Development Hardware Environment (diagram). Sites: KEK, KEK/Hilo. Setups: 1st setup, 2nd setup, each with a CNT server, WEB server, DB server, and simulation server. Specifications shown: CPU: AMD dual-core Opteron 2.8 GHz x 2, MEM: 16 GB, HD: 250 GB; CPU: Intel Xeon dual-core 1.8 GHz x 2, MEM: 2 GB, HD: 500 GB; CPU: Intel Xeon quad-core 2.6 GHz, MEM: 2 GB, HD: 500 GB

  38. Wide-field Imager: Suprime-Cam on the 8.2 m Subaru Telescope • 10 CCDs: MIT/LL 2,048 x 4,096 • Rate = 160 MB/shot → good for test data • FoV = 34' x 27' • Strong capability of wide-field and deep imaging: AΩ = 13.17, e.g., Megacam (9.59), SDSS (22.99), HSC (162)

  39. Standard Reduction Procedure • For each chip: 1. Subtraction of bias (based on overscan region) 2. Making flat frames (objects, domeflat, twilight) 3. Flatfielding 4. Masking or removing cosmic rays and bad pixels 5. Distortion correction based on a formula 6. Equalization of PSF among frames 7. Sky background subtraction • Mosaicking: pattern matching; determination of offset (dX, dY, dtheta) and flux scaling; co-adding, stacking • Notes: 1. Works well for most extragalactic objects 2. Not optimized for a large data input or very wide field surveys 3. Critical parts are handled by users (frame selection, result check, calibration)

  40. Error Handling (diagram). The control server invokes the processes of an analysis pipeline: task1 → check1 → task2 → check2 → task3 → check3. Checks are either synchronous or asynchronous ("un-synchronous"). On failure: retry, with alert notification through the U/I. The database maintains the analysis histories.
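The task/check/retry pattern in this diagram can be sketched as a small driver loop. This is only an illustration of the synchronous case: the `run_pipeline` function, its `max_retries` parameter, and the in-memory `history` list (standing in for the analysis histories kept in the database) are all hypothetical, and asynchronous checks are not modeled.

```python
def run_pipeline(steps, max_retries=2, notify=print):
    """Run (name, task, check) steps in order.

    Each check runs right after its task (the synchronous case). A failed
    check triggers a retry; once the retries are used up, an alert is sent
    and the pipeline stops.
    """
    history = []   # stand-in for the analysis history kept in the database
    data = None
    for name, task, check in steps:
        for attempt in range(1, max_retries + 2):
            result = task(data)
            ok = check(result)
            history.append((name, attempt, "ok" if ok else "failed"))
            if ok:
                data = result
                break
        else:
            # No attempt passed its check.
            notify(f"ALERT: {name} failed after {max_retries + 1} attempts")
            return history, None
    return history, data


attempts = {"n": 0}


def flaky_task(_):
    """Toy task whose first result fails the check and whose second passes."""
    attempts["n"] += 1
    return attempts["n"]


steps = [
    ("task1", lambda _: 10, lambda r: r == 10),
    ("task2", flaky_task, lambda r: r >= 2),
]
history, final = run_pipeline(steps)
print(history, final)
```

Recording every attempt, not just the final outcome, is what makes the analysis traceable afterwards.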

  41. Components and Status

  42. Distributed Processing: to exploit the capacity of the computing system and to reduce the bottleneck of the pipelines

  43. Software Layers • Pursuing the possibility of sharing technologies between upcoming big projects in NAOJ and KEK • Layers shown: RCM, LSF/NQS, dBASF, BASF

  44. Goals with Quality Assurance (QA): controls the quality of data and makes observations more efficient • Assists observers in quick-looking at data: reasonable quality evaluation of each data frame on a semi-real-time and automatic basis • Makes results available to observers for observation planning (number of shots, exposure time, bands) • Searches for and retrieves necessary archived images and metadata by connecting to the database • Performs quality assessment (zeropoint, flatfield) • Provides quality-controlled data products in long-lasting surveys (S/N per pointing, filters)

  45. A. Quick Look & Quality Check Assist • Assist quality checking by interactive operations with FITS viewers (Zview, ds9) • Photometry/Transmit. • Stacking • Achieved depth

  46. C. Quality Assessment for Long-term Dataset • Inspects time-to-time variation of characteristics or particular parameters of existing data • Monitors the system health and secures uniform quality of data products • Flow: ANA sends a query and requests an assessment → QA servers → results; the DB stores quality metadata for old data frames

  47. C. Quality Assessment • Target analyses (TBD) • Search and retrieval of particular archived data • Time-to-time variation of: flat patterns, readout noise, system throughputs and response functions • Obtain stacked images and achieved depths for a particular range of frames

  48. D. Survey Quality Control • Working together with QC and observation planning, keeps track of achievements in a survey project for multiple pointings, filters, etc. • Provides efficient operation of gigantic programs and service/queue observations • Survey targets: field, filter, depth, area • Flow: DB → QA servers → summary output of achievement and estimated exposures to be done → input to observation planning

  49. D. Survey Quality Control • An example of the outputs from the QA system: an achievement summary of a certain virtual survey • Target example: SXDS, W-J-B, 28.3 mag AB (3 sigma), 5 FOVs. (Diagram labels: DB, QA server)
