Data Analysis System of HSC HSC-ANA. Hisanori Furusawa Subaru Telescope, NAOJ for HSC development team. Contents. 1. Concept and Goals for HSC-ANA 2. Our approaches for development of HSC-ANA 3. Plan for Development. HSC-ANA Concept and Goals. HSC Data Rate.
Data Analysis System of HSC (HSC-ANA) • Hisanori Furusawa, Subaru Telescope, NAOJ, for the HSC development team • SAC Workshop - HSC
Contents • 1. Concept and Goals for HSC-ANA • 2. Our approaches for development of HSC-ANA • 3. Plan for Development
HSC Data Rate • Suprime-Cam data rate = 160 MB/shot • HSC (2 deg) = 176 CCDs = 2.8 GB/shot • Data format is TBD
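The per-shot figures above can be cross-checked with a back-of-envelope calculation. This is only a sketch: it assumes 2-byte raw pixels and no overscan or header overhead, neither of which is stated in the slides. The 2,048 x 4,096 CCD format and the count of 10 Suprime-Cam CCDs come from the Suprime-Cam slide of this talk, and the HSC CCDs are assumed to have the same format.

```python
# Back-of-envelope data volume per shot.
# Assumptions (not from the slides): 2 bytes per pixel, no overscan,
# no FITS header overhead, same CCD format for HSC as for Suprime-Cam.
CCD_WIDTH = 2048
CCD_HEIGHT = 4096
BYTES_PER_PIXEL = 2

MIB = 1024 ** 2
GIB = 1024 ** 3


def bytes_per_shot(n_ccds: int) -> int:
    """Raw image bytes produced by one exposure of n_ccds detectors."""
    return n_ccds * CCD_WIDTH * CCD_HEIGHT * BYTES_PER_PIXEL


suprime_cam_mib = bytes_per_shot(10) / MIB   # 10 CCDs
hsc_gib = bytes_per_shot(176) / GIB          # 176 CCDs

print(f"Suprime-Cam: {suprime_cam_mib:.0f} MiB/shot")  # 160 MiB/shot
print(f"HSC:         {hsc_gib:.2f} GiB/shot")          # 2.75 GiB/shot
```

Under these assumptions the estimate lands on 160 MB/shot for Suprime-Cam and roughly 2.8 GB/shot for HSC, in line with the slide.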
Difficulty in Quality Control • Even for current observations, quality control is difficult • Going from data analysis to scientific results involves much trial and error through human (possibly subjective) interaction • Problems: it is difficult for observers to make their observing plans, to trace their analysis, to evaluate their results, and to correct or update their results • For the more massive HSC data: an objective, automated data analysis system?
HSC-ANA Goals • Maximizing Science Outputs • Provide calibrated data of guaranteed quality • Immediate release of best-effort catalogs upon users’ requests at any time • Quality Control of Key Survey Data • Achieve uniform data quality in long-term survey programs • On-site quality assurance tool that records quality information (seeing, transparency, etc.) into the Database and FITS headers • Traceable analysis through an appropriate Database • Useful Archive Data • Frame selection by quality information attached to data frames • (Diagram labels: Framework, Middleware)
HSC-ANA Team in Japan • H. Aihara, T. Uchida (U-Tokyo) • M. Tanaka, Y. Yasu, S. Yamagata, R. Itoh, S., N. Katayama (KEK, High Energy Accelerator Research Organization) • H. Furusawa, S. Miyazaki, Y. Komiyama (NAOJ)
HSC-ANA Framework [diagram] • Components: Master, DB, Analysis servers (scalable; Gigabit Ethernet?), Housekeeping, Watchdog, Data Retriever, User Interface • Flows: command flow; data flow (image, status, catalog) • Control: operators or users • Data source: HSC archive (or OBCP) or STARS • Input: data frames, analysis configurations • Output: results (images, catalogs, plots, etc.), status
System Components • Analysis Part • Analysis programs • Error handling for a robust system • Quality Assurance Tool (QA) • Extracting and registering quality information • Framework • Interfacing analysis tasks • Database, Process distribution • Other Components • Data retrieval mechanism from archive • U/Is • Watchdog
Component 1. Analysis Part: main pipelines • Analysis tasks are developed by dividing the entire procedure into stages • 1st-stage analysis • 1st a: Data reduction, removing instrumental characteristics (bias subtraction, sky subtraction) • 1st b: Mosaicking • 2nd-stage analysis • Photometric/Astrometric Calibrations • Object extraction and Catalog Creation
Pipeline 1st stage • Data and processes are distributed to exploit the capacity of the computing system and to reduce pipeline bottlenecks • In the simulation server: preprocessing, distributing CCD data, database registration • Pipeline 1st a (CCD-by-CCD): overscan, flat making, flatfielding, distortion correction, PSF matching, catalog for each chip • Pipeline 1st b (pointing-by-pointing): mosaicking, sky subtraction, stacking, deep catalog
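The split between a CCD-by-CCD stage and a pointing-by-pointing stage can be sketched as a fan-out followed by a gather. The code below is a minimal illustration, not the HSC-ANA or RCM implementation: `reduce_ccd`, `run_pipeline_1a`, and `run_pipeline_1b` are hypothetical names, the per-CCD work is a placeholder, and a local thread pool stands in for real analysis servers.

```python
from concurrent.futures import ThreadPoolExecutor

N_CCDS = 10  # Suprime-Cam; HSC would have 176


def reduce_ccd(ccd_id: int) -> dict:
    """Placeholder for pipeline 1a on a single CCD (hypothetical)."""
    steps = ["overscan", "flatfield", "distortion", "psf_match", "catalog"]
    return {"ccd": ccd_id, "steps_done": steps}


def run_pipeline_1a(n_ccds: int, max_workers: int = 4) -> list:
    """Fan out: each CCD is independent, so CCDs can run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, whatever the finish order.
        return list(pool.map(reduce_ccd, range(n_ccds)))


def run_pipeline_1b(ccd_results: list) -> dict:
    """Gather: mosaicking needs every CCD of the pointing (placeholder)."""
    return {"n_ccds_mosaicked": len(ccd_results)}


results = run_pipeline_1a(N_CCDS)
mosaic = run_pipeline_1b(results)
print(mosaic)
```

Threads keep the sketch short. CPU-bound reduction would more plausibly use separate processes or hosts, which is the part a framework would take over.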
Component 2. Quality Assurance (QA) Tool • Obtains quality information for each data frame on a semi-real-time basis and records it in the DB and FITS headers • Quality items: seeing (example: UKIRT), transparency (example: SkyProbe @ CFHT), achieved S/N (quick coadded images) • Uses: quick-look, observing plan, uniform survey data, service/queue observations, time variation analysis
Component 3. Framework Organizes the whole system • Provides interfacing for analysis commands, constructing, executing, and monitoring pipelines • Communicates with Databases • Downloads requested data frames • Distributes processes • Provides User Interfaces to each system component
Prototyping: Zero-version HSC-ANA System • Implements a minimal pipeline for S-Cam data • 1a, 1b, and 2nd-stage analysis + on-site QA tools • Provided to observers • Figures out bottlenecks & potential problems • Large inputs of archive Suprime-Cam data • Science = critical evaluation • Evaluates framework middleware • R&D Chain Management (RCM), BASF+ROOT (@KEK) • Schedule: evaluation through autumn 2008, then development of the full HSC-ANA system
Framework of Analysis Pipelines: RCM Pipelines Management [diagram] • User access: https, XML-U/I • Control server: workflow, pipeline, load distribution, record histories and search, annotation, command log, search targets • Load distribution: optimal assignments among analysis servers and file servers, using job priority (priority, normal, medium) and server status (busy, DB request, disk full) • XML-DB: database, backup/restore
Login to RCM System • Data accessibility can be controlled by accounts and groups
Workflow of HSC-ANA Overscan Subtraction Coadding
Pipeline Execution Keyword list recorded in DB Configuration of parameters Frame Search by keywords
Viewing Results • XML-based Summary Viewer with hierarchical structure • Thumbnails of processed images; DS9 is invoked if needed • Retry or proceed • Some evaluation results and statistical information for processed data will be added
Prototyping Quality Assurance Tools for Suprime-Cam • Application to the Suprime-Cam observations • As a part of the HSC-ANA • (2nd-stage analysis and QA) • Important project for the observatory • A high-priority project (as a SS)
Functions in SC QA • Quick-look & Quality-check Assistance • Observation Planning • Quality Assessment for Long-term Data • Quality Control for Survey Project Data
A. Quick-Look & Quality-Check Assistance: Overview of QL&QC Assist • 1. Quick Reduction: bias subtraction, (flatfielding), coarse astrometry • 2. Statistics: seeing, focusing, read noise level, background level • 3. Photometry & Transmission: standard / reference stars, relative transmission • 4. Depth (limiting mag): image co-adding, noise statistics • 5. Observing Logs • (Diagram labels: Suprime-Cam, OBCP (DAQ), A-LAN machine, QA Servers, DB, ANA)
Summary • For massive HSC data, an objective, automated data analysis system is needed • Conceptual design of the HSC-ANA is underway • Prototyping of the HSC-ANA based on the RCM middleware is ongoing, to be evaluated this year • QA system to be developed and tested this year • Inputs from observers and the science community • Consultation/collaboration with experienced domestic and international groups
RCM Workflow (= pipelines) • User login • Entry in the database • Search / Monitor / Check • Move on to Pipeline 1a: tasks (chip-by-chip, shell-by-shell, etc.); parallel processing can be done in the RCM framework • Move on to Pipeline 1b: algorithm TBD; the parallel processing architecture will be discussed and developed • Move on to Pipeline 2: calibrations and catalog making (under development)
A. Quick-Look & Quality-Check Assist • Quick reduction of data frames (FITS validation, bias subtraction, flatfielding, medium-precision astrometry for photometric and stacking analysis) • Statistical values ingested into the Database: seeing (FWHM), focusing, read noise level (noise level in overscan), sky background level • Uses of the statistical values: quality check and assessment, parameters in the next stages, automated focusing
A. Quick Look & Quality Check Assist 3. Photometry & Relative Sky Transmission • Photometric analysis of standard stars, registered in the Database • Zeropoints, if available • Relative photometry among frames during the night: the same objects are traced from shot to shot (Shot 1, Shot 2, Shot 3) to follow the time-to-time variation of attenuation (mag) • Results go to the QA Database
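Tracing the same objects across shots can be sketched as a median magnitude offset against a reference shot. This is a minimal illustration, not the QA tool's actual algorithm: the function names, the object IDs, and all magnitudes are made up, and cross-matching of objects between frames is assumed to have been done already.

```python
from statistics import median


def attenuation_mag(reference: dict, frame: dict) -> float:
    """Median magnitude offset (frame minus reference) over shared objects.

    Positive values mean the objects look fainter than in the reference
    shot, i.e. more attenuation.
    """
    common = reference.keys() & frame.keys()
    if not common:
        raise ValueError("no objects in common")
    return median(frame[obj] - reference[obj] for obj in common)


def relative_transmission(delta_mag: float) -> float:
    """Convert a magnitude offset into a flux ratio."""
    return 10 ** (-0.4 * delta_mag)


# Hypothetical instrumental magnitudes keyed by object ID.
shot1 = {"a": 18.00, "b": 19.50, "c": 20.25}
shot2 = {"a": 18.10, "b": 19.60, "c": 20.35, "d": 21.00}
shot3 = {"a": 18.30, "b": 19.75, "c": 20.55}

for name, shot in [("shot2", shot2), ("shot3", shot3)]:
    dm = attenuation_mag(shot1, shot)
    print(name, round(dm, 2), round(relative_transmission(dm), 3))
```

The median is used here so that a few variable or mismatched objects do not dominate the estimate; that choice is an assumption of the sketch.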
A. Quick Look & Quality Check Assist 4. Estimating Limiting Magnitude • Images of a particular area that match the user's query are co-added to estimate the depth attained so far • Suggests the necessary exposure time and an observing plan to achieve the target depth • DB query items: filters, seeing, transmittance, background level, coords and fields, magnitudes • On the analysis server: DB query, mosaic, stacking, sky noise statistics, limiting magnitude • Example output to on-site users: User input: Target 26.2 mag, S/N 3.0 | Now: 25.5 mag, S/N 3.0 | Exptime to be done: 2500 sec | Recommended plan: 630 sec x 4 shots
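One way such a suggestion could be computed is sketched below. It assumes background-limited imaging, where S/N grows as the square root of exposure time, so the limiting magnitude at fixed S/N deepens by 1.25 log10(t_total / t_now). That scaling, the function names, and the rounding of shots to 10 s are assumptions of this sketch and are not stated in the slides. The exposure time already accumulated in the slide's example is not given, so the 2500 sec figure is not reproduced here.

```python
import math


def total_exptime_for_depth(exptime_so_far: float,
                            depth_now: float,
                            depth_target: float) -> float:
    """Total exposure (s) needed to reach depth_target at the same S/N.

    Assumes background-limited data: delta_mag = 1.25 * log10(t_total / t_now),
    hence t_total = t_now * 10 ** (0.8 * delta_mag).
    """
    return exptime_so_far * 10 ** (0.8 * (depth_target - depth_now))


def split_into_shots(remaining_sec: float, n_shots: int,
                     round_to: int = 10) -> int:
    """Per-shot exposure, rounded up to a multiple of round_to seconds."""
    per_shot = remaining_sec / n_shots
    return math.ceil(per_shot / round_to) * round_to


# Hypothetical case: 1000 s already taken, going from 25.5 to 26.2 mag.
total = total_exptime_for_depth(1000.0, 25.5, 26.2)
remaining = total - 1000.0
print(f"total {total:.0f} s, remaining {remaining:.0f} s")

# Splitting the slide's 2500 s into 4 shots with 10 s rounding gives 630 s,
# which is consistent with the recommended plan shown in the example output.
print(split_into_shots(2500, 4))
```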
A. Quick-Look & Quality-Check Assist 5. Observing Log • Obtains information on the data frames from FITS headers and other environmental status • Adds users’ comments and stores the logs in the database for each frame or shot • Part of the log is already being provided; QA will add the quality columns
HST       NAME       EXP-ID        OBJECT    FILTER01  EXPTIME  …...  SKY    SEEING  TRANSP  RONOISE  WEATHER
05:57:58  object000  SUPE00555290  DOMEFLAT  W-J-B     10.0            12142  N/A     N/A     10.5     Clear
18:43:52  bias000    SUPE00555300  BIAS      W-J-B     0.0             0      N/A     N/A     11.0     Clear
19:52:14  object001  SUPE00555310  SA107     W-J-B     5.0             8205   N/A     1.00    10.2     Clear
20:00:05  object002  SUPE00999980  SXDS_1    W-J-B     900.0           4873   0.75    0.97    10.8     Clear
20:17:10  object003  SUPE00999990  SXDS_1    W-J-B     900.0           4911   0.73    0.95    10.5     Clear
• The log in the DB feeds analysis pipelines, observation planning, quality assessment, and survey data quality control • Users submit their comments to the QA servers
B. Observation Planning • Procedure of Observation Planning: 1. Quality and depth check 2. Target depth, filter, field 3. Generate observing plan 4. Editing by observers 5. Generate observing procedure script (OPE)
B. Observation Planning • Assists in planning observations based on the sky transmission, limiting magnitude achieved, target visibility, etc. • 1. Check achieved depth etc.: Target 26.2 mag, S/N 3.0 | Now: 25.5 mag, S/N 3.0 | Exptime to be done: 2500 sec | Recommended plan: 630 sec x 4 shots • 2. Generate observing plan:
HST    OBJECT / FILTER                          (AZ,EL)     DURATION
23:54  STD:PG1633 in W-S-Z+ 3(sec) x 1(shot)    (-70, 31)   2 min
23:56  ==>W-J-B                                 (-70, 46)   5 min
24:01  SXDS_1 in W-J-B 630(sec) x 4(shot)       (+15, 55)   46 min
24:47  Slew                                                 2 min
24:49  SXDS_2 in W-J-B 600(sec) x 5(shot)       (+82, 48)   55 min
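The timeline in the plan above follows from the block durations. The sketch below rebuilds the start times from a start of 23:54 HST. It assumes a fixed 60 s overhead per shot, rounded up to whole minutes; that value is not stated in the slides and was chosen only because it reproduces the durations in the sample plan. The filter change and slew durations are copied from the slide, and all function names are hypothetical.

```python
import math


def fmt_hst(minutes_since_midnight: int) -> str:
    """Format as HH:MM, letting hours run past 24 as in the sample plan."""
    return f"{minutes_since_midnight // 60:02d}:{minutes_since_midnight % 60:02d}"


def block_minutes(exptime_sec: int, n_shots: int,
                  overhead_sec_per_shot: int = 60) -> int:
    """Block duration in whole minutes, exposures plus assumed overhead."""
    return math.ceil(n_shots * (exptime_sec + overhead_sec_per_shot) / 60)


def build_plan(start_min: int, blocks: list) -> list:
    """Turn (label, duration_min) blocks into (start, label, duration) rows."""
    plan, t = [], start_min
    for label, duration in blocks:
        plan.append((fmt_hst(t), label, duration))
        t += duration
    return plan


blocks = [
    ("STD:PG1633 in W-S-Z+ 3(sec) x 1(shot)", block_minutes(3, 1)),
    ("==>W-J-B", 5),   # filter change, duration taken from the slide
    ("SXDS_1 in W-J-B 630(sec) x 4(shot)", block_minutes(630, 4)),
    ("Slew", 2),       # slew, duration taken from the slide
    ("SXDS_2 in W-J-B 600(sec) x 5(shot)", block_minutes(600, 5)),
]

plan = build_plan(23 * 60 + 54, blocks)
for start, label, duration in plan:
    print(f"{start}  {label}  - {duration} min")
```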
B. Observation Planning • Generates observing procedure scripts based on the observing plan and users’ inputs • 1. Automatic generation of the observing plan:
23:54  STD:PG1633 in W-S-Z+ 3(sec) x 1(shot)    2 min
23:56  ==>W-J-B                                 5 min
24:01  SXDS_1 in W-J-B 630(sec) x 4(shot)       46 min
24:47  Slew                                     2 min
24:49  SXDS_2 in W-J-B 600(sec) x 5(shot)       55 min
• 2. Inputs and editing by observers, then submit • 3. OCS procedure script, then submit
Co-working with science community • Analysis procedure linked to science objectives • Between the science community and HSC-ANA development: output data format, catalog format; acceptable uncertainties in photometry, astrometry, & object parameters • Flow: Science Objectives ↓ Survey Design, Target Data Products ↓ Analysis Procedure, Algorithms • Satisfactory result
Development Hardware Environment • Sites: KEK, KEK/Hilo (1st Setup, 2nd Setup) • Servers: CNT server, WEB server, DB server, Simulation server • Listed specifications: AMD dual-core Opteron 2.8 GHz x 2, MEM 16 GB, HD 250 GB; Intel Xeon dual-core 1.8 GHz x 2, MEM 2 GB, HD 500 GB; Intel Xeon quad-core 2.6 GHz, MEM 2 GB, HD 500 GB
Wide-field Imager: Suprime-Cam • On the 8.2 m Subaru Telescope • Strong capability of wide-field & deep imaging • 10 CCDs: MIT/LL 2,048 x 4,096 • FoV = 34' x 27' • Rate = 160 MB/shot • Good for test data • AΩ = 13.17; for comparison, Megacam (9.59), SDSS (22.99), HSC (162)
Standard Reduction Procedure • For each chip: 1. Subtraction of bias (based on overscan region) 2. Making flat frames (objects, domeflat, twilight) 3. Flatfielding 4. Masking or removing cosmic rays, bad pixels 5. Distortion correction based on a formula 6. Equalization of PSF among frames 7. Sky background subtraction • Mosaicking: pattern matching; determination of offset (dX, dY, dtheta) and flux scaling; co-adding, stacking • Remarks: 1. Works well for most extragalactic objects 2. Not optimized for large data inputs or very wide-field surveys 3. Critical parts are handled by users (frame selection, result check, calibration)
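Steps 1 and 3 of the per-chip procedure can be sketched on a toy image. This is an illustration only: it assumes the overscan occupies the last columns of each row and that the bias is estimated row by row with a median, neither of which is specified in the slides. Plain lists are used so the example has no dependencies; the function names and pixel values are made up.

```python
from statistics import median


def subtract_overscan(rows: list, n_overscan: int) -> list:
    """Subtract each row's median overscan level and trim the overscan.

    Assumes the last n_overscan pixels of every row are overscan.
    """
    out = []
    for row in rows:
        bias = median(row[-n_overscan:])
        out.append([pixel - bias for pixel in row[:-n_overscan]])
    return out


def flatfield(image: list, flat: list) -> list:
    """Divide the image by the flat, normalized to the flat's median."""
    norm = median(value for row in flat for value in row)
    return [[pixel / (f / norm) for pixel, f in zip(image_row, flat_row)]
            for image_row, flat_row in zip(image, flat)]


# Toy raw frame: 3 science columns followed by 2 overscan columns.
raw = [
    [110, 120, 130, 100, 100],
    [212, 222, 232, 202, 202],
]
flat = [
    [0.5, 1.0, 2.0],
    [0.5, 1.0, 2.0],
]

debiased = subtract_overscan(raw, n_overscan=2)
science = flatfield(debiased, flat)
print(debiased)
print(science)
```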
Error Handling [diagram] • The control server invokes the processes of the analysis pipeline (task1, task2, task3), each followed by a check (check1, check2, check3) • Checks are either synchronous or asynchronous • On failure: retry, alert notification through the U/I • Analysis histories are maintained in the Database
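The task, check, and retry pattern can be sketched as follows. This is not the RCM mechanism: it covers only synchronous checks, the history is a plain list rather than a database, and `run_with_checks`, the retry limit, and the example tasks are hypothetical.

```python
def run_with_checks(stages: list, max_retries: int = 2):
    """Run (name, task, check) stages in order, retrying failed ones.

    task() returns a result and check(result) returns True or False.
    Returns (success, history); history stands in for the database.
    """
    history = []
    for name, task, check in stages:
        for attempt in range(1, max_retries + 2):
            try:
                ok = check(task())
            except Exception as exc:  # a crashing task counts as a failure
                ok = False
                history.append((name, attempt, f"error: {exc}"))
            else:
                history.append((name, attempt, "ok" if ok else "check failed"))
            if ok:
                break
        else:
            # All attempts failed: record an alert and stop the pipeline.
            history.append((name, attempt, "alert: giving up"))
            return False, history
    return True, history


# Example: task2 fails its check on the first attempt and passes on retry.
calls = {"n": 0}


def flaky_task():
    calls["n"] += 1
    return calls["n"]


stages = [
    ("task1", lambda: 1, lambda r: r == 1),
    ("task2", flaky_task, lambda r: r >= 2),
    ("task3", lambda: "done", lambda r: r == "done"),
]

success, history = run_with_checks(stages)
print(success)
for entry in history:
    print(entry)
```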
Distributed Processing • Exploits the capacity of the computing system and reduces pipeline bottlenecks
Software Layers • Pursuing the possibility of sharing technologies between upcoming big projects at NAOJ and KEK • Layers: RCM, LSF/NQS, dBASF, BASF
Goals with Quality Assurance (QA) • Controls the quality of data and makes observations more efficient • Assists observers in quick-looking at data: reasonable quality evaluation of each data frame on a semi-real-time and automatic basis • Makes results available to observers for observation planning (# of shots, exptime, bands) • Searches for and retrieves necessary archived images and meta-data by connecting to the Database, and performs quality assessment (zeropoint, flatfield) • Provides quality-controlled data products in long-lasting surveys (S/N per pointing, filters)
A. Quick Look & Quality Check Assist • Assist quality checking by interactive operations with FITS viewers (Zview, ds9) • Photometry/Transmit. • Stacking • Achieved depth
C. Quality Assessment for Long-term Dataset • Inspects time-to-time variation of characteristics or particular parameters of existing data • Monitors the system health and secures uniform quality of data products • Stores quality meta-data for old data frames • (Diagram: a query and an assessment request are sent; DB, QA Servers, and ANA return results)
C. Quality Assessment • Target Analyses (TBD) • Search and retrieval of particular archived data • Time-to-time variation of: flat patterns, readout noise, system throughputs and response functions • Obtain stacked images and achieved depths for a particular range of frames
D. Survey Quality Control • Working together with QC and Observation Planning, keeps track of achievements in a survey project for multiple pointings, filters, etc. • Provides efficient operations of gigantic programs and service/queue observations • Survey targets: field, filter, depth, area • Outputs from the DB and QA servers: summary of achievement; estimated exposures still to be done (input to Observation Planning)
D. Survey Quality Control • An example of the outputs from the QA system: an achievement summary of a certain virtual survey • Target example: SXDS, W-J-B, 28.3 mag AB (3 sigma), 5 FOVs • (Diagram labels: DB, QA servers)