PS1 PSPS Object Data Manager Design PSPS Critical Design Review November 5-6, 2007 IfA
Detail Design • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
Zones (spatial partitioning and indexing algorithm) [diagram: the sky split into declination zones; axes are Right Ascension (RA) and Declination (Dec)] • Partition and bin the data into declination zones • ZoneID = floor((dec + 90.0) / zoneHeight) • A few tricks are required to handle spherical geometry • Place the data close together on disk • Clustered index on ZoneID and RA • Fully implemented in SQL • Efficient • Nearby searches • Cross-match (especially) • Fundamental role in addressing the critical requirements • Data volume management • Association speed • Spatial capabilities
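A minimal sketch of such a zoned table, assuming a simplified object schema (the table and column names beyond zoneID, ra, dec and the unit-vector columns cx, cy, cz are illustrative, not the actual ODM schema):

CREATE TABLE ObjectsZ (
    objID   bigint NOT NULL,   -- zone-encoded object identifier (see Design Decisions: ObjID)
    zoneID  int    NOT NULL,   -- floor((dec + 90.0) / zoneHeight)
    ra      float  NOT NULL,   -- degrees
    dec     float  NOT NULL,   -- degrees
    cx      float  NOT NULL,   -- Cartesian unit vector, used for exact
    cy      float  NOT NULL,   --   angular-distance tests
    cz      float  NOT NULL
)
-- Clustering on (zoneID, ra) places spatially nearby objects close together on disk
CREATE CLUSTERED INDEX cix_ObjectsZ_zone_ra ON ObjectsZ (zoneID, ra)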
Zoned Table ZoneID = floor ((dec + 90.0) / zoneHeight) * ZoneHeight = 8 arcsec in this example
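To show how the zone index supports nearby searches, here is a hedged sketch of a small-radius neighbor query against the table above; it is a simplification (a production version also wraps RA at 0/360 and widens the RA window by 1/cos(dec)):

DECLARE @ra float, @dec float, @r float, @zoneHeight float
SELECT @ra = 101.287155, @dec = -16.716116,       -- search center (degrees)
       @r = 0.0022,                               -- search radius (degrees)
       @zoneHeight = 8.0 / 3600.0                 -- 8 arcsec zones, as in the example above
SELECT o.objID
FROM ObjectsZ o
WHERE o.zoneID BETWEEN floor((@dec - @r + 90.0) / @zoneHeight)
                   AND floor((@dec + @r + 90.0) / @zoneHeight)      -- zone band
  AND o.ra  BETWEEN @ra  - @r AND @ra  + @r                         -- coarse cut on the clustered index
  AND o.dec BETWEEN @dec - @r AND @dec + @r
  AND (o.cx * cos(radians(@dec)) * cos(radians(@ra))
     + o.cy * cos(radians(@dec)) * sin(radians(@ra))
     + o.cz * sin(radians(@dec))) > cos(radians(@r))                -- exact angular-distance test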
SQL CrossNeighbors
SELECT *
FROM prObj1 z1
JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
JOIN prObj2 z2 ON ZZ.zoneID2 = z2.zoneID
WHERE z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
  AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
  AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > cos(radians(@r))
Partitions • SQL Server 2005 introduces technology to handle tables that are partitioned across different disk volumes and managed by a single server • Partitioning makes management and access of large tables and indexes more efficient • Enables parallel I/O • Reduces the amount of data that needs to be accessed • Related tables can be aligned and collocated in the same place, speeding up JOINs
Partitions • Two key elements • Partition function • Specifies how the table or index is partitioned • Partition scheme • Using a partition function, the scheme specifies the placement of the partitions on file groups • Data can be managed very efficiently using partition switching • Add a table as a partition to an existing table • Switch a partition from one partitioned table to another • Reassign a partition to form a single table • Main requirement • The table must be constrained on the partitioning column
Partitions • For the PS1 design, • Partitions mean File Group Partitions • Tables are partitioned into ranges of ObjectID, which correspond to declination ranges. • ObjectID boundaries are selected so that each partition has a similar number of objects.
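A hedged sketch of this machinery using ObjectID ranges as the partitioning column; the boundary values and file group names are placeholders, not the actual PS1 values:

-- Partition function: ObjectID ranges corresponding to declination ranges
CREATE PARTITION FUNCTION pfObjID (bigint)
AS RANGE RIGHT FOR VALUES (100000000000000000, 150000000000000000, 200000000000000000)

-- Partition scheme: maps each ObjectID range to a file group
CREATE PARTITION SCHEME psObjID
AS PARTITION pfObjID TO (fg1, fg2, fg3, fg4)

-- A table partitioned (and hence alignable/collocatable) on objID
CREATE TABLE Objects_p1 (
    objID bigint NOT NULL,
    ra    float  NOT NULL,
    dec   float  NOT NULL,
    CONSTRAINT pk_Objects_p1 PRIMARY KEY (objID)
) ON psObjID (objID)

-- Partition switching (metadata-only, provided constraints and file groups line up)
ALTER TABLE Objects_staging SWITCH TO Objects_p1 PARTITION 2   -- add a table as a partition
ALTER TABLE Objects_p1 SWITCH PARTITION 2 TO Objects_retired   -- reassign a partition to a single table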
Distributed Partitioned Views • Tables participating in a Distributed Partitioned View (DPV) reside in different databases, which in turn live on different instances or different (linked) servers
Concept: Slices • In the PS1 design, the bigger tables will be partitioned across servers • To avoid confusion with the file group partitioning, we call them “Slices” • Data is glued together using Distributed Partitioned Views • The ODM will manage slices; using slices improves system scalability • In the PS1 design, tables are sliced into ranges of ObjectID, which correspond to broad declination ranges; each slice is subdivided into partitions that correspond to narrower declination ranges • ObjectID boundaries are selected so that each slice has a similar number of objects
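As a concrete illustration of the “glue”, here is a hedged sketch of a distributed partitioned view over the per-slice detection tables; the linked-server and database names are placeholders:

CREATE VIEW Detections
AS
SELECT * FROM SliceServer1.PS1_p1.dbo.Detections_p1
UNION ALL
SELECT * FROM SliceServer2.PS1_p2.dbo.Detections_p2
-- ... one branch per slice, up to Detections_pm on slice server Pm
-- Each member table carries a CHECK constraint on its ObjectID range, so queries
-- constrained on objID are routed only to the relevant slice(s).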
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
PS1 Distributed DB system [architecture diagram: per-loader Load Support databases (detections, objZoneIndx, Orphans_l, Detections_l, LnkToObj_l) plus a LoadAdmin database holding the PartitionsMap; linked slice servers P1..Pm with partitioned [Objects_p], [LnkToObj_p], [Detections_p] tables and Meta; the main PS1 database with PartitionsMap, Objects, LnkToObj, Meta and the Detections partitioned view; the Query Manager (QM) and Web Based Interface (WBI) sit on top. Legend: Database, Full table, [partitioned table], Output table, Partitioned View]
Design Decisions: ObjID • Objects have their positional information encoded in their objID • fGetPanObjID (ra, dec, zoneH) • ZoneID is the most significant part of the ID • It gives scalability, performance, and spatial functionality • Object tables are range partitioned according to their object ID
ObjectID Clusters Data Spatially • Dec = -16.71611583, ZH = 0.008333 → ZID = (Dec + 90) / ZH = 08794.0661 • RA = 101.287155 • ObjectID = 087941012871550661 • ObjectID is unique when objects are separated by > 0.0043 arcsec
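The slide does not spell out the encoding inside fGetPanObjID; the sketch below reproduces the worked example above under the assumption that the 18-digit identifier is zoneID (5 digits), then RA in microdegrees (9 digits), then the fractional position within the zone (4 digits). This layout is inferred, and rounding/precision handling is simplified:

CREATE FUNCTION dbo.fGetPanObjID (@ra float, @dec float, @zoneH float)
RETURNS bigint
AS
BEGIN
    DECLARE @zid float, @objID bigint
    SET @zid = (@dec + 90.0) / @zoneH                                  -- e.g. 8794.0661 in the example above
    SET @objID = CAST(floor(@zid) AS bigint) * 10000000000000          -- zone number: leading 5 digits
               + CAST(round(@ra * 1000000.0, 0) AS bigint) * 10000     -- RA in microdegrees: next 9 digits
               + CAST(floor((@zid - floor(@zid)) * 10000.0) AS bigint) -- offset within zone: last 4 digits
    RETURN @objID   -- 087941012871550661 for the example above
END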
Design Decisions: DetectID • Detections have their positional information encoded in the detection identifier • fGetDetectID (dec, observationID, runningID, zoneH) • Primary key (objID, detectionID) to align detections with objects within partitions • Provides efficient access to all detections associated with one object • Provides efficient access to all detections of nearby objects
DetectionID Clusters Data in Zones Dec = –16.71611583 ZH = 0.008333 ZID = (Dec+90) / ZH = 08794.0661 DetectID = 0879410500001234567 ObservationID = 1050000 Running ID = 1234567
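Again the exact encoding is not given on the slide; this sketch reproduces the worked example above assuming the layout zoneID (5 digits), observationID (7 digits), runningID (7 digits), and should be read as illustrative only:

CREATE FUNCTION dbo.fGetDetectID
    (@dec float, @observationID bigint, @runningID bigint, @zoneH float)
RETURNS bigint
AS
BEGIN
    DECLARE @zoneID bigint, @detectID bigint
    SET @zoneID = floor((@dec + 90.0) / @zoneH)        -- 08794 in the example above
    SET @detectID = @zoneID        * 100000000000000   -- zone number: leading 5 digits
                  + @observationID * 10000000          -- observation ID: next 7 digits
                  + @runningID                         -- running ID: last 7 digits
    RETURN @detectID   -- 0879410500001234567 for observationID 1050000, runningID 1234567
END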
ODM Capacity 5.3.1.3 The PS1 ODM shall be able to ingest into the ODM a total of • 1.5 × 10^11 P2 detections • 8.3 × 10^10 cumulative sky (stack) detections • 5.5 × 10^9 celestial objects together with their linkages.
PS1 Table Sizes - Monolithic Sizes are in TB
What goes into the main server [diagram: the main PS1 database holds the full PartitionsMap, Objects, LnkToObj and Meta tables; the slice databases P1..Pm are reached through linked servers. Legend: Database, Full table, [partitioned table], Output table, Distributed Partitioned View]
What goes into slices [diagram: each slice database P1..Pm holds its own partitioned [Objects_p], [LnkToObj_p] and [Detections_p] tables plus copies of PartitionsMap and Meta, while the main PS1 database keeps the full PartitionsMap, Objects, LnkToObj and Meta tables. Legend: Database, Full table, [partitioned table], Output table, Distributed Partitioned View]
Duplication of Objects & LnkToObj • Objects are distributed across slices • Objects, P2ToObj, and StackToObj are duplicated in the slices to parallelize “inserts” & “updates” • Detections belong in their object’s slice • Orphans belong to the slice where their position would place them • Orphans near slice boundaries will need special treatment • Objects keep their original object identifier • Even though positional refinement might change their zoneID, and therefore the most significant part of their identifier
Glue = Distributed Views [diagram: the Detections view in the main PS1 database is a distributed partitioned view over [Detections_p1]..[Detections_pm] on the linked slice servers P1..Pm. Legend: Database, Full table, [partitioned table], Output table, Distributed Partitioned View]
Partitioning in Main Server • The main server is partitioned (Objects) and collocated (LnkToObj) by objID • Slices are partitioned (Objects) and collocated (LnkToObj) by objID [diagram: linked slice servers P1..Pm beneath the main PS1 database, with the Query Manager (QM) and Web Based Interface (WBI) on top]
PS1 Table Sizes - Main Server Sizes are in TB
PS1 Table Sizes - Each Slice Sizes are in TB
PS1 Table Sizes - All Servers Sizes are in TB
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
PS1 Distributed DB system [the distributed database architecture diagram shown earlier, repeated here as context for the ingest workflow]
“Insert” & “Update” • SQL Insert and Update are expensive operations due to logging and re-indexing • In the PS1 design, Insert and Update have been refactored into sequences of: Merge + Constrain + Switch Partition • Frequency • f1: daily • f2: at least monthly • f3: TBD (likely to be every 6 months)
Ingest Workflow [workflow diagram with stages: CSV, DZone, ObjectsZ, cross-match X(1”)/DXO_1a and X(2”)/DXO_2a, NoMatch, Resolve, Detect, producing P2PsfFits, P2ToObj and Orphans]
Ingest @ frequency = f1 [diagram: numbered data flow from the LOADER (ObjectsZ, P2PsfFits, P2ToObj, Orphans, Metadata+) into SLICE_1 (Objects_1, P2ToPsfFits_1, P2ToObj_1, Stack*_1, Orphans_1) and the MAIN database (Objects, P2ToObj, StackToObj)]
Updates @ frequency = f2 [two diagrams: numbered data flow between the LOADER, SLICE_1 (Objects_1, P2ToObj_1, P2ToPsfFits_1, Stack*_1, Orphans_1) and the MAIN database (Objects, P2ToObj, StackToObj, Metadata+), showing how the slice tables and the main Objects table are refreshed]
Snapshots @ frequency = f3 [diagram: a snapshot of Objects, P2ToObj, StackToObj and Metadata+ is taken on the MAIN database]
Batch Update of a Partition [diagram: partitions A1, A2, A3, ... are merged with the new data (SELECT INTO ... merged); the merged data is split back into per-partition tables B1, B2, B3, ... with SELECT INTO ... WHERE; a primary-key index is built on each B table; each B table is then switched in to replace the corresponding partition]
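A hedged sketch of this Merge + Constrain + Switch sequence for a single partition, reusing the illustrative pfObjID boundaries from the earlier partitioning example; table and constraint names are placeholders:

-- 1. Merge: build the updated contents of one ObjectID range in a staging table
--    (Objects_merged stands for the current partition merged with the loader output;
--     in practice the staging table must live on the same file group as the partition)
SELECT *
INTO B2
FROM Objects_merged
WHERE objID >= 100000000000000000 AND objID < 150000000000000000

-- 2. Constrain: add the primary key and the range CHECK constraint the switch requires
ALTER TABLE B2 ADD CONSTRAINT pk_B2 PRIMARY KEY (objID)
ALTER TABLE B2 ADD CONSTRAINT ck_B2_range
    CHECK (objID >= 100000000000000000 AND objID < 150000000000000000)

-- 3. Switch: swap the staging table in for the old partition (metadata-only operations;
--    Objects_old_p2 is an empty table with a matching schema)
ALTER TABLE Objects_p1 SWITCH PARTITION 2 TO Objects_old_p2
ALTER TABLE B2 SWITCH TO Objects_p1 PARTITION 2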
Scaling-out • Apply a Ping-Pong strategy to satisfy query performance during ingest: 2 × (1 main + m slices) [diagram: two copies of the main PS1 database and of the slice servers P1..Pm behind the Query Manager (QM); one copy serves queries while the other is loaded. Legend: Database, Duplicate, Full table, [partitioned table], Partitioned View, Duplicate P view]
Scaling-out • More robustness, fault tolerance, and reliability call for 3 × (1 main + m slices) [diagram: as above, with a third copy of the main database and slices]
Adding New Slices SQL Server range partitioning capabilities make it easy: • Recalculate partitioning limits • Transfer data to the new slices • Remove data from the old slices • Define and apply the new partitioning scheme • Add new partitions to the main server • Apply the new partitioning scheme to the main server
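A sketch of the repartitioning step on the main server, assuming the illustrative pfObjID / psObjID objects from the earlier partitioning example; the boundary values and file group name are placeholders:

-- Name the file group that will receive the next new partition
ALTER PARTITION SCHEME psObjID NEXT USED fg5

-- Split an existing ObjectID range to create a partition for the new slice
ALTER PARTITION FUNCTION pfObjID() SPLIT RANGE (175000000000000000)

-- After data has been migrated off a range, the old boundary can be merged away
ALTER PARTITION FUNCTION pfObjID() MERGE RANGE (150000000000000000)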
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
ODM Ingest Performance 5.3.1.6 The PS1 ODM shall be able to ingest the data from the IPP at two times the nominal daily arrival rate* * The nominal daily data rate from the IPP is defined as the total data volume to be ingested annually by the ODM divided by 365. • Nominal daily data rate: • 1.5 × 10^11 / 3.5 / 365 = 1.2 × 10^8 P2 detections / day • 8.3 × 10^10 / 3.5 / 365 = 6.5 × 10^7 stack detections / day
Number of Objects [object-count table not reproduced] * “SDSS” includes a mirror of 11.3 < Dec < 30 objects to Dec < 0 • Total CSV loaded data: 300 GB • CSV bulk insert load: 8 MB/s • Binary bulk insert: 18-20 MB/s • Creation started: October 15th 2007; finished: October 29th 2007 (??) • Includes • 10 epochs of P2PsfFits detections • 1 epoch of Stack detections
Size of Prototype Database Table sizes are in billions of rows
Size of Prototype Database Table sizes are in GB 9.6 TB of data in a distributed database
Ingest and Association Times [timing table not reproduced; columns were labelled “Educated Guess” and “Wild Guess”]