DRMS Core System Karen Tian ktian@Stanford.EDU
Requirements and usage status
• Requirements
  • 1TB per year
  • Thousands of transactions per day
  • …
• Stable system
  • Usage started in late 2005
  • Current users
    • SID project
    • MDI data: export
    • HMI ground test data: ingest
    • …
  • DB size
    • SUMS+DRMS: 11GB
    • DRMS: > 7.7M records (hmi ground: 7.2M records)
DRMS software structure
[Diagram. Labels: client 1, client 2, … client n; sockets; drms_server containing a thread pool, a SUMS thread (in/out) and a signal thread; SUMS; DB]
A typical module
• Select input: drms_open_records()
• Processing/analysis: drms_getkey_*(), drms_segment_read()
• Write output: drms_create_records() or drms_clone_records(), drms_setkey_*(), drms_segment_write(), drms_close_records()
DB performance evaluation
• Query speed
  • Table size (both width and length)
  • Number of records in a series: 150M records
  • A fraction of a second for a query on an indexed keyword, > 30 minutes on a non-indexed keyword
  • Performance depends on hardware (disk speed, etc.)
  • Avoid such big tables: split into smaller tables
• Insert speed: bulk is better than individual
• Concurrency
  • No problem yet with table locking
  • Performance depends on mix
• Index type
  • Order matters in a composite index
  • Better way to implement prime key? Currently a composite index
  • Additional indices for selected keywords
• Dataset names
  • GROUP BY, ORDER BY
  • GROUP BY to select the most recent version
  • ORDER BY to sort according to prime key
  • Upgrade PostgreSQL to get better sorting performance
  • Better query formulation
  • Better table design? Should be completely transparent to user
• Effect of index
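The point that order matters in a composite index can be illustrated with a small in-memory analogy. The sketch below is not PostgreSQL or DRMS code; the `Key` struct and both function names are hypothetical. It models an index on (a, b) as an array sorted by a, then b: a lookup on the leading column can binary-search, while a lookup on the second column alone has to visit every entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index entry: composite key (a, b), kept sorted by a, then b. */
typedef struct { int a; int b; } Key;

/* Lookup on the leading column: binary search for the first entry with
   key a. Returns its position, or n if absent. *probes counts comparisons. */
size_t find_first_a(const Key *idx, size_t n, int a, size_t *probes)
{
    size_t lo = 0, hi = n;
    assert(idx != NULL || n == 0);
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        (*probes)++;
        if (idx[mid].a < a)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && idx[lo].a == a) ? lo : n;
}

/* Lookup on the second column only: the sort order does not help,
   so every entry is visited. Returns the number of matches. */
size_t count_b(const Key *idx, size_t n, int b, size_t *probes)
{
    size_t hits = 0;
    assert(idx != NULL || n == 0);
    *probes = 0;
    for (size_t i = 0; i < n; i++) {
        (*probes)++;
        if (idx[i].b == b)
            hits++;
    }
    return hits;
}
```

On an array of n entries the first function needs on the order of log2(n) probes and the second always needs n, which is the same qualitative gap as the indexed versus non-indexed query times quoted above.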
Records in memory
• The current implementation uses one query to get all keywords
  • Efficient for DB query
  • Inefficient for memory usage, especially if interested in
    • A subset of keywords of a record
    • A column of keywords from a set of records
Processing/Analysis
• Warning against long-running (> 1 day) transactions
  • Drain on system resources
  • Vacuum cannot delete dead rows
  • Replication tool Slony-I cannot start while transactions are open
• No checkpoint available yet
  • Difficulty in committing in the middle of a module because SUs are not committed until the end of a session
  • Application needs to break up jobs into manageable pieces
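One way an application might break a job into manageable pieces is to split the record range into fixed-size chunks and run each chunk as its own session. The helper below is a minimal sketch of that arithmetic only; the `Chunk` type and function names are hypothetical and it makes no DRMS calls.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, end) of record positions handled by one session. */
typedef struct { size_t start; size_t end; } Chunk;

/* Number of chunks needed to cover `total` records, `per_chunk` at a time. */
size_t chunk_count(size_t total, size_t per_chunk)
{
    assert(per_chunk > 0);
    return (total + per_chunk - 1) / per_chunk;
}

/* The i-th chunk (0-based); the last chunk may be shorter than per_chunk. */
Chunk chunk_at(size_t total, size_t per_chunk, size_t i)
{
    Chunk c;
    assert(per_chunk > 0 && i < chunk_count(total, per_chunk));
    c.start = i * per_chunk;
    c.end = c.start + per_chunk;
    if (c.end > total)
        c.end = total;
    return c;
}
```

A driver would loop over `chunk_count()` chunks and start a fresh session for each, so that no single transaction stays open for the whole job.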
Write output
• Transient records: intermediate results, removed at the end of a session
  • The current implementation leaves dead rows in the series table
  • Alternative: CREATE TEMPORARY TABLE
    • No dead rows to vacuum
    • Added complexity for drms_open_records()
• Modify series definition, e.g., add keywords
• Updatable records?
  • Currently a DRMS record can be written only once
  • To update such a record, one first makes a clone of it, then updates the clone
  • The clone and the original share the same prime keys
  • Query APIs automatically pick up the latest version unless told otherwise
  • Would like to allow records to be updated within the same session
  • Drawback: upsets our vacuum plan. With insert-only writes, DRMS tables require minimal vacuuming. Allowing records to be updated, even within the same session, leaves dead rows behind, which makes DRMS tables candidates for vacuum.
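The write-once, clone-to-update scheme can be pictured with a toy in-memory series. This is a sketch under stated assumptions, not the DRMS implementation: the `Rec` and `Series` types and all function names are hypothetical, each record has a single integer prime key and a single value, and versions are ordered by a record number (called `recnum` here) that grows with every insert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record: one prime key, one value, and a
   record number that increases with every insert. */
typedef struct { long recnum; int prime_key; double value; } Rec;

typedef struct { Rec rows[64]; size_t n; long next_recnum; } Series;

/* Insert-only write: existing rows are never modified. */
long series_insert(Series *s, int prime_key, double value)
{
    Rec r;
    assert(s->n < 64);
    r.recnum = s->next_recnum++;
    r.prime_key = prime_key;
    r.value = value;
    s->rows[s->n++] = r;
    return r.recnum;
}

/* Latest version for a prime key: the row with the highest recnum,
   or NULL if the key has never been written. */
const Rec *series_latest(const Series *s, int prime_key)
{
    const Rec *best = NULL;
    for (size_t i = 0; i < s->n; i++) {
        const Rec *r = &s->rows[i];
        if (r->prime_key == prime_key &&
            (best == NULL || r->recnum > best->recnum))
            best = r;
    }
    return best;
}

/* "Update": add a new version under the same prime key. With a single
   value per record, cloning reduces to a fresh insert. */
long series_update(Series *s, int prime_key, double new_value)
{
    assert(series_latest(s, prime_key) != NULL);
    return series_insert(s, prime_key, new_value);
}
```

Because nothing is ever overwritten, the older version stays in the table as a live row rather than a dead one, which is why an insert-only design needs so little vacuuming.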
Remote DRMS and SUMS
• Remote DRMS
  • Remote DB replicates a subset of HMI/AIA data series
  • Subscription based
  • Slony-I log shipping
  • Minimum customization of DRMS code
  • Stanford may subscribe to series originating from remote sites
• Remote SUMS
  • Use the higher bits in the SUID to identify SUMS sites

    if local
        SUM_get(SUID)
    elsif cached
        global SUID -> local SUID; SUM_get(local SUID)
    else
        fetch from remote SUMS asynchronously; ingest into local SUMS
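The remote SUMS lookup can be sketched as a routing decision on the SUID. The bit layout below (top 16 bits for the site, low 48 bits for the site-local ID) is an assumption made for illustration, since the slide only says "higher bits"; the `CacheEntry` table and all names are likewise hypothetical, and the function only decides the route without calling SUM_get() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (not from the source): top 16 bits = site ID,
   low 48 bits = site-local storage unit ID. */
#define SITE_BITS  16
#define LOCAL_BITS (64 - SITE_BITS)
#define LOCAL_MASK ((UINT64_C(1) << LOCAL_BITS) - 1)

uint64_t suid_make(uint16_t site, uint64_t local_id)
{
    assert(local_id <= LOCAL_MASK);
    return ((uint64_t)site << LOCAL_BITS) | local_id;
}

uint16_t suid_site(uint64_t suid)  { return (uint16_t)(suid >> LOCAL_BITS); }
uint64_t suid_local(uint64_t suid) { return suid & LOCAL_MASK; }

typedef enum { SU_LOCAL, SU_CACHED, SU_REMOTE_FETCH } SuRoute;

/* Hypothetical cache: a global SUID and the local SUID it was ingested as. */
typedef struct { uint64_t global_suid; uint64_t local_suid; } CacheEntry;

/* Mirrors the if / elsif / else above. On SU_LOCAL or SU_CACHED,
   *resolved is the SUID to hand to the local SUMS; on SU_REMOTE_FETCH
   it is left untouched and the caller fetches and ingests the SU. */
SuRoute suid_route(uint64_t suid, uint16_t my_site,
                   const CacheEntry *cache, size_t ncache,
                   uint64_t *resolved)
{
    if (suid_site(suid) == my_site) {
        *resolved = suid;
        return SU_LOCAL;
    }
    for (size_t i = 0; i < ncache; i++) {
        if (cache[i].global_suid == suid) {
            *resolved = cache[i].local_suid;
            return SU_CACHED;
        }
    }
    return SU_REMOTE_FETCH;
}
```

Keeping the site in the SUID itself means the local check needs no lookup at all; only SUIDs from other sites touch the cache.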
Export tools
• DRMS interacts with SUMS
  • drms_open_records() does not stage SUs
  • Head-of-line blocking queue design
  • Batch multiple SUM_get() requests into one
  • Need better polling mechanism
• Export: stage data only
  • SU staging takes time, easily a few hours, resulting in long-running transactions if run as modules
  • Need to provide a non-blocking alternative: either through a direct SUMS connection or by allowing a staging option in drms_open_records()