
PS1 PSPS Object Data Manager Design


Presentation Transcript


  1. PS1 PSPS Object Data Manager Design PSPS Critical Design Review November 5-6, 2007 IfA

  2. Outline • ODM Overview • Critical Requirements Driving Design • Work Completed • Detailed Design • Spatial Querying [AS] • ODM Prototype [MN] • Hardware/Scalability [JV] • How Design Meets Requirements • WBS and Schedule • Issues/Risks [AS] = Alex, [MN] = Maria, [JV] = Jan

  3. ODM Overview The Object Data Manager will: • Provide a scalable data archive for the Pan-STARRS data products • Provide query access to the data for Pan-STARRS users • Provide detailed usage tracking and logging

  4. ODM Driving Requirements • Total size 100 TB • 1.5 x 10^11 P2 detections • 8.3 x 10^10 P2 cumulative-sky (stack) detections • 5.5 x 10^9 celestial objects • Nominal daily rate (divide by 3.5 x 365) • P2 detections: 120 Million/day • Stack detections: 65 Million/day • Objects: 4.3 Million/day • Cross-Match requirement: 120 Million / 12 hrs ~ 2800 / s • DB size requirement: • 25 TB / yr • ~100 TB by end of PS1 (3.5 yrs)
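
The daily and per-second rates quoted above follow directly from the totals; as a quick check (3.5 years of survey at 365 days/year, 12-hour ingest window):

```latex
\frac{1.5\times10^{11}}{3.5\times365} \approx 1.2\times10^{8}\ \text{P2 det/day},\quad
\frac{8.3\times10^{10}}{3.5\times365} \approx 6.5\times10^{7}\ \text{stack det/day},\quad
\frac{5.5\times10^{9}}{3.5\times365} \approx 4.3\times10^{6}\ \text{objects/day},\quad
\frac{1.2\times10^{8}}{12\times3600\ \text{s}} \approx 2.8\times10^{3}\ \text{det/s}.
```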

  5. Work completed so far • Built a prototype • Scoped and built prototype hardware • Generated simulated data • 300M SDSS DR5 objects, 1.5B Galactic plane objects • Initial Load done – Created 15 TB DB of simulated data • Largest astronomical DB in existence today • Partitioned the data correctly using Zones algorithm • Able to run simple queries on distributed DB • Demonstrated critical steps of incremental loading • It is fast enough • Cross-match > 60k detections/sec • Required rate is ~3k/sec

  6. Detailed Design • Reuse SDSS software as much as possible • Data Transformation Layer (DX) – Interface to IPP • Data Loading Pipeline (DLP) • Data Storage (DS) • Schema and Test Queries • Database Management System • Scalable Data Architecture • Hardware • Query Manager (QM: CasJobs for prototype)

  7. High-Level Organization

  8. Detailed Design • Reuse SDSS software as much as possible • Data Transformation Layer (DX) – Interface to IPP • Data Loading Pipeline (DLP) • Data Storage (DS) • Schema and Test Queries • Database Management System • Scalable Data Architecture • Hardware • Query Manager (QM: CasJobs for prototype)

  9. Data Transformation Layer (DX) • Based on SDSS sqlFits2CSV package • LINUX/C++ application • FITS reader driven off header files • Convert IPP FITS files to • ASCII CSV format for ingest (initially) • SQL Server native binary later (3x faster) • Follow the batch and ingest verification procedure described in ICD • 4-step batch verification • Notification and handling of broken publication cycle • Deposit CSV or Binary input files in directory structure • Create “ready” file in each batch directory • Stage input data on LINUX side as it comes in from IPP

  10. DX Subtasks • DX Initialization: Job FITS schema, FITS reader, CSV Converter, CSV Writer • Batch Ingest: Interface with IPP, Naming convention, Uncompress batch, Read batch, Verify batch • Batch Conversion: CSV Converter, Binary Converter, "batch_ready", Interface with DLP • Batch Verification: Verify Manifest, Verify FITS Integrity, Verify FITS Content, Verify FITS Data • Handle Broken Cycle

  11. DX-DLP Interface • Directory structure on staging FS (LINUX): • Separate directory for each JobID_BatchID • Contains a “batch_ready” manifest file • Name, #rows and destination table of each file • Contains one file per destination table in ODM • Objects, Detections, other tables • Creation of “batch_ready” file is signal to loader to ingest the batch • Batch size and frequency of ingest cycle TBD
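
For illustration only (the actual naming convention, batch size and manifest format are defined by the ICD and are still TBD above), the staging area on the LINUX side might look like:

```
staging/
  Job0042_Batch0007/        # hypothetical JobID_BatchID directory
    batch_ready             # manifest: name, #rows, destination table of each file
    Objects.csv             # one file per destination table in the ODM
    Detections.csv
    ...
```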

  12. Detailed Design • Reuse SDSS software as much as possible • Data Transformation Layer (DX) – Interface to IPP • Data Loading Pipeline (DLP) • Data Storage (DS) • Schema and Test Queries • Database Management System • Scalable Data Architecture • Hardware • Query Manager (QM: CasJobs for prototype)

  13. Data Loading Pipeline (DLP) • sqlLoader – SDSS data loading pipeline • Pseudo-automated workflow system • Loads, validates and publishes data • From CSV to SQL tables • Maintains a log of every step of loading • Managed from Load Monitor Web interface • Has been used to load every SDSS data release • EDR, DR1-6, ~ 15 TB of data altogether • Most of it (since DR2) loaded incrementally • Kept many data errors from getting into database • Duplicate ObjIDs (symptom of other problems) • Data corruption (CSV format invaluable in catching this)

  14. sqlLoader Design • Existing functionality • Shown for SDSS version • Workflow, distributed loading, Load Monitor • New functionality • Schema changes • Workflow changes • Incremental loading • Cross-match and partitioning

  15. sqlLoader Workflow • Distributed design achieved with linked servers and SQL Server Agent • LOAD stage can be done in parallel by loading into temporary task databases • PUBLISH stage writes from task DBs to final DB • FINISH stage creates indices and auxiliary (derived) tables • Loading pipeline is a system of VB and SQL scripts, stored procedures and functions
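
The distributed design (and the DPVs described later) relies on SQL Server linked servers. As a minimal sketch, registering a slave load server on the master looks like the following; the server and database names are hypothetical, not the actual cluster configuration:

```sql
-- Minimal sketch: register a slave load server on the master node.
-- The server and database names below are hypothetical.
EXEC sp_addlinkedserver
     @server     = N'LOADSUPPORT1',   -- logical name used in four-part queries
     @srvproduct = N'',
     @provider   = N'SQLNCLI',        -- SQL Server 2005 Native Client provider
     @datasrc    = N'PS13';           -- network name of the slave machine

-- Remote tables are then addressable with four-part names, e.g.:
-- SELECT COUNT(*) FROM LOADSUPPORT1.TaskDB.dbo.Detections_In;
```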

  16. Load Monitor Tasks Page

  17. Load Monitor Active Tasks

  18. Load Monitor Statistics Page

  19. Load Monitor – New Task(s)

  20. Data Validation • Tests for data integrity and consistency • Scrubs data and finds problems in upstream pipelines • Most of the validation can be performed within the individual task DB (in parallel) • Test Uniqueness of Primary Keys: test the unique key in each table • Test Foreign Keys: test for consistency of keys that link tables • Test Cardinalities: test consistency of numbers of various quantities • Test HTM IDs: test the Hierarchical Triangular Mesh IDs used for spatial indexing • Test Link Table Consistency: ensure that links are consistent
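
Most of these validation tests reduce to simple aggregate queries run inside each task DB. A minimal sketch of two of them, assuming SDSS-style key columns (objID), which are placeholders rather than the final PS1 column names:

```sql
-- Sketch of two validation tests run inside a task DB (column names are placeholders).

-- Test uniqueness of primary keys: any row returned is a duplicate objID.
SELECT objID, COUNT(*) AS nDuplicates
FROM   Objects
GROUP  BY objID
HAVING COUNT(*) > 1;

-- Test foreign keys: count detections whose objID does not resolve to an object.
SELECT COUNT(*) AS nOrphanedDetections
FROM   Detections AS d
LEFT   JOIN Objects AS o ON o.objID = d.objID
WHERE  o.objID IS NULL;
```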

  21. Distributed Loading [Diagram: the Master server (LoadAdmin) holds the master schema and the Load Monitor; Slave servers (LoadSupport) each have a view of the master schema and their own Task DB; Samba-mounted CSV/Binary files are loaded into the Task DBs, and the Publish/Finish steps move the task data into the Publish schema and data]

  22. Schema Changes • Schema in task and publish DBs is driven off a list of schema DDL files to execute (xschema.txt) • Requires replacing DDL files in schema/sql directory and updating xschema.txt with their names • PS1 schema DDL files have already been built • Index definitions have also been created • Metadata tables will be automatically generated using metadata scripts already in the loader
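
For illustration, the kind of DDL fragment such a schema file contains might look like the following; the columns shown are placeholders, not the approved PS1 schema, and the file itself would simply be listed by name in xschema.txt:

```sql
-- Illustrative fragment of a PS1 schema DDL file referenced from xschema.txt.
-- Column names are placeholders, not the approved PS1 schema.
CREATE TABLE Objects (
    objID   bigint  NOT NULL,   -- unique object identifier
    ra      float   NOT NULL,   -- right ascension (J2000, degrees)
    [dec]   float   NOT NULL,   -- declination (J2000, degrees)
    zoneID  int     NOT NULL,   -- declination zone used by the Zones algorithm
    CONSTRAINT pk_Objects PRIMARY KEY (objID)
);
```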

  23. Workflow Changes • Cross-Match and Partition steps will be added to the workflow • Cross-match will match detections to objects • Partition will horizontally partition data, move it to slice servers, and build DPVs on the main server [Workflow diagram: LOAD (Export, Check CSVs, Create Task DBs, Build SQL Schema, Validate, XMatch), PUBLISH, Partition]

  24. Matching Detections with Objects • Algorithm described fully in prototype section • Stored procedures to cross-match detections will be part of the LOAD stage in loader pipeline • Vertical partition of Objects table kept on load server for matching with detections • Zones cross-match algorithm used to do 1” and 2” matches • Detections with no matches saved in Orphans table
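
As a rough sketch of how a zones cross-match looks in SQL (the loader's actual stored procedures differ; the zone height, column names and the 1-arcsecond radius below are illustrative assumptions):

```sql
-- Sketch of a zone-based 1-arcsecond cross-match between incoming detections and
-- the vertical Objects partition (ObjZoneIndx). Column names are illustrative.
DECLARE @radius float;
SET @radius = 1.0/3600.0;   -- 1 arcsecond, in degrees; zones assumed 1 arcsecond tall

SELECT d.detectionID, o.objID
FROM   Detections_In AS d
JOIN   ObjZoneIndx   AS o
       ON  o.zoneID BETWEEN d.zoneID - 1 AND d.zoneID + 1              -- neighboring zones
       AND o.ra  BETWEEN d.ra - @radius / COS(RADIANS(d.[dec]))        -- RA bounding box
                     AND d.ra + @radius / COS(RADIANS(d.[dec]))
       AND o.[dec] BETWEEN d.[dec] - @radius AND d.[dec] + @radius     -- Dec bounding box
WHERE  2.0 * ASIN(SQRT(POWER(SIN(RADIANS(o.[dec] - d.[dec]) / 2.0), 2)
         + COS(RADIANS(o.[dec])) * COS(RADIANS(d.[dec]))
         * POWER(SIN(RADIANS(o.ra - d.ra) / 2.0), 2)))
       <= RADIANS(@radius);                                            -- exact great-circle test
```

Detections that find no match in this join would be routed to the Orphans table, as described above.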

  25. XMatch and Partition Data Flow [Data-flow diagram on the loadsupport/load servers: incoming Detections_In are cross-matched against the ObjZoneIndx table; matches go to LinkToObj_In and unmatched detections to Orphans; per-slice tables (Detections_m, LinkToObj_m, Objects_m) and chunk tables (Detections_chunk, LinkToObj_chunk) are merged, objects are updated, and partitions are pulled and switched into the PS1 Objects, Detections and LinkToObj tables]

  26. Detailed Design • Reuse SDSS software as much as possible • Data Transformation Layer (DX) – Interface to IPP • Data Loading Pipeline (DLP) • Data Storage (DS) • Schema and Test Queries • Database Management System • Scalable Data Architecture • Hardware • Query Manager (QM: CasJobs for prototype)

  27. Data Storage – Schema

  28. PS1 Table Sizes Spreadsheet

  29. PS1 Table Sizes – All Servers (sizes in TB)

  30. Data Storage – Test Queries • Drawn from several sources • Initial set of SDSS 20 queries • SDSS SkyServer Sample Queries • Queries from PS scientists (Monet, Howell, Kaiser, Heasley) • Two objectives • Find potential holes/issues in schema • Serve as test queries • Test DBMS integrity • Test DBMS performance • Loaded into CasJobs (Query Manager) as sample queries for prototype

  31. Data Storage – DBMS • Microsoft SQL Server 2005 • Relational DBMS with excellent query optimizer • Plus • Spherical/HTM (C# library + SQL glue) • Spatial index (Hierarchical Triangular Mesh) • Zones (SQL library) • Alternate spatial decomposition with dec zones • Many stored procedures and functions • From coordinate conversions to neighbor search functions • Self-extracting documentation (metadata) and diagnostics

  32. Documentation and Diagnostics

  33. Data Storage – Scalable Architecture • Monolithic database design (a la SDSS) will not do it • SQL Server does not have a cluster implementation • Do it by hand • Partitions vs Slices • Partitions are file-groups on the same server • Parallelize disk accesses on the same machine • Slices are data partitions on separate servers • We use both! • Additional slices can be added for scale-out • For PS1, use SQL Server Distributed Partitioned Views (DPVs)

  34. Distributed Partitioned Views • Difference between DPVs and file-group partitioning • FG on same database • DPVs on separate DBs • FGs are for scale-up • DPVs are for scale-out • Main server has a view of a partitioned table that includes remote partitions (we call them slices to distinguish them from FG partitions) • Accomplished with SQL Server’s linked server technology • NOT truly parallel, though

  35. Scalable Data Architecture • Shared-nothing architecture • Detections split across the cluster (Detections_S1, Detections_S2, Detections_S3 on slices S1–S3) • Objects replicated on Head and Slice DBs • DPVs of the Detections tables on the Head node DB • Queries on Objects stay on the head node • Queries on detections use only local data on slices
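
As a minimal sketch of the head-node DPV (the linked-server and database names are illustrative), the view is a plain UNION ALL over the slice tables; CHECK constraints on the partitioning column let the optimizer route a query only to the relevant slice(s):

```sql
-- Minimal sketch of the Detections DPV on the head node.
-- SLICE1..SLICE3 are illustrative linked-server names for the slice databases.
CREATE VIEW Detections
AS
SELECT * FROM SLICE1.PS1.dbo.Detections_S1
UNION ALL
SELECT * FROM SLICE2.PS1.dbo.Detections_S2
UNION ALL
SELECT * FROM SLICE3.PS1.dbo.Detections_S3;
```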

  36. Hardware – Prototype [Configuration table] • Server naming convention: LX = Linux, L = Load server, S/Head = DB server, M = MyDB server, W = Web server; PS0x = 4-core, PS1x = 8-core • Servers: LX (PS01), W (PS02), S2 (PS03), S3 (PS04), L2/M (PS05), Head (PS11), S1 (PS12), L1 (PS13) • Storage shorthand: 10A = 10 x [13 x 750 GB], 3B = 3 x [12 x 500 GB] • Total space by function: DB 39 TB, Loading 10 TB, MyDB 9 TB, Web/Staging 0 TB • RAID configuration: RAID10 for DB, MyDB and Web volumes, RAID5 for loading • Disk/rack configurations: 12D/4W and 14D/3.5W

  37. Hardware – PS1 Ingest [Diagram: each database exists as Live (Copy 1), Offline (Copy 2) and Spare (Copy 3); queries run against the live copy while ingest and replication update the offline copy, after which the copies are switched] • Ping-pong configuration to maintain high availability and query performance • 2 copies of each slice and of main (head) node database on fast hardware (hot spares) • 3rd spare copy on slow hardware (can be just disk) • Updates/ingest on offline copy then switch copies when ingest and replication finished • Synchronize second copy while first copy is online • Both copies live when no ingest • 3x basic config. for PS1

  38. Detailed Design • Reuse SDSS software as much as possible • Data Transformation Layer (DX) – Interface to IPP • Data Loading Pipeline (DLP) • Data Storage (DS) • Schema and Test Queries • Database Management System • Scalable Data Architecture • Hardware • Query Manager (QM: CasJobs for prototype)

  39. Query Manager • Based on SDSS CasJobs • Configure to work with distributed database, DPVs • Direct links (contexts) to slices can be added later if necessary • Segregates quick queries from long ones • Saves query results server-side in MyDB • Gives users a powerful query workbench • Can be scaled out to meet any query load • PS1 Sample Queries available to users • PS1 Prototype QM demo
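
To give a flavor of the PS1 sample queries, a hedged example against the prototype schema (the LinkToObj mapping table appears in the loading design above; the remaining column names are placeholders):

```sql
-- Illustrative sample query: count detections of objects in a small RA/Dec box.
-- Objects and the Detections DPV live on the head node; LinkToObj maps objects
-- to their detections. Column names beyond objID are placeholders.
SELECT   o.objID, COUNT(*) AS nDetections
FROM     Objects    AS o
JOIN     LinkToObj  AS l ON l.objID = o.objID
JOIN     Detections AS d ON d.detectionID = l.detectionID
WHERE    o.ra    BETWEEN 180.0 AND 180.5
AND      o.[dec] BETWEEN 10.0  AND 10.5
GROUP BY o.objID
ORDER BY nDetections DESC;
```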

  40. ODM Prototype Components • Data Loading Pipeline • Data Storage • CasJobs • Query Manager (QM) • Web Based Interface (WBI) • Testing

  41. Spatial Queries (Alex)

  42. Spatial Searches in the ODM

  43. Common Spatial Questions Points in region queries • Find all objects in this region • Find all “good” objects (not in masked areas) • Is this point in any of the regions Region in region • Find regions near this region and their area • Find all objects with error boxes intersecting region • What is the common part of these regions Various statistical operations • Find the object counts over a given region list • Cross-match these two catalogs in the region

  44. Sky Coordinates of Points • Many different coordinate systems • Equatorial, Galactic, Ecliptic, Supergalactic • Longitude-latitude constraints • Searches often in a mix of different coordinate systems • gb > 40 and dec between 10 and 20 • Problem: coordinate singularities, transformations • How can one describe constraints in an easy, uniform fashion? • How can one perform fast database queries in an easy fashion? • Fast: indexes • Easy: simple query expressions

  45. Describing Regions Spacetime metadata for the VO (Arnold Rots) • Includes definitions of • Constraint: single small or great circle • Convex: intersection of constraints • Region: union of convexes • Support both angles and Cartesian descriptions • Constructors for • CIRCLE, RECTANGLE, POLYGON, CONVEX HULL • Boolean algebra (INTERSECTION, UNION, DIFF) • Proper language to describe the abstract regions • Similar to GIS, but much better suited for astronomy

  46. Things Can Get Complex

  47. We Do Spatial 3 Ways • Hierarchical Triangular Mesh (extension to SQL) • Uses table valued functions • Acts as a new “spatial access method” • Zones: fits SQL well • Surprisingly simple & good • 3D Constraints: a novel idea • Algebra on regions, can be implemented in pure SQL
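
As a sketch of how the HTM table-valued functions act as a spatial access method (the function names below follow the SDSS-style library and are stand-ins; the PS1 library's actual names may differ):

```sql
-- Sketch of an HTM cone search. fHtmCoverCircleEq and fDistanceArcMinEq are
-- stand-ins for the SDSS-style HTM/spherical library functions; the PS1
-- library's actual function names may differ.
DECLARE @ra float, @dec float, @radiusArcmin float;
SELECT @ra = 180.0, @dec = 10.0, @radiusArcmin = 1.0;

SELECT o.objID, o.ra, o.[dec]
FROM   dbo.fHtmCoverCircleEq(@ra, @dec, @radiusArcmin) AS c      -- htmID ranges covering the circle
JOIN   Objects AS o
       ON o.htmID BETWEEN c.HtmIDStart AND c.HtmIDEnd            -- coarse, index-driven lookup
WHERE  dbo.fDistanceArcMinEq(@ra, @dec, o.ra, o.[dec]) <= @radiusArcmin;  -- exact distance filter
```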

  48. PS1 Footprint • Using the projection cell definitions as centers for tessellation (T. Budavari)

  49. CrossMatch: Zone Approach • Divide space into declination zones • Objects ordered by zoneid, ra (on the sphere, need a wrap-around margin) • Point search looks in neighboring zones within a ~(ra ± Δ) bounding box • All inside the relational engine • Avoids "impedance mismatch" • Can "batch" comparisons • Automatically parallel • Details in Maria's thesis
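
The zone of an object is just its declination bucket; a short sketch of the zoneID assignment and the (zoneID, ra) ordering the approach relies on (the zone height is a tunable parameter):

```sql
-- zoneID assignment used by the zone approach: zoneID = floor((dec + 90) / zoneHeight).
-- Objects are then stored/indexed in (zoneID, ra) order so each zone scan is sequential.
DECLARE @zoneHeight float;
SET @zoneHeight = 1.0/3600.0;    -- e.g. 1-arcsecond zones (tunable)

SELECT objID,
       CAST(FLOOR(([dec] + 90.0) / @zoneHeight) AS int) AS zoneID,
       ra, [dec]
FROM   Objects
ORDER  BY zoneID, ra;
```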

  50. Indexing Using Quadtrees [Diagram: recursive subdivision of trixel 2 into children 2,0 … 2,3 and grandchildren 2,3,0 … 2,3,3] • Cover the sky with hierarchical pixels • COBE – start with a cube • Hierarchical Triangular Mesh (HTM) uses trixels • Samet, Fekete • Start with an octahedron, and split each triangle into 4 children, down to 20 levels deep • Smallest triangles are 0.3" • Each trixel has a unique htmID
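
A quick sanity check of the quoted trixel size: an octahedron subdivided 20 times gives

```latex
N = 8 \cdot 4^{20} \approx 8.8\times10^{12}\ \text{trixels},\qquad
A \approx \frac{4\pi\,(206265'')^2}{N} \approx 0.06\ \text{arcsec}^2,\qquad
\ell \sim \sqrt{2A} \approx 0.35'',
```

consistent with the ~0.3 arcsec figure above.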
