250 likes | 333 Views
LCG – Databases - Meeting. 25 March 2008. Presences: Miguel Anjo, John Shade, Paolo Tedesco, Phool Chand, David Collados, Judit Novak, James Casey, Steve Traylen. Outline. Main issue during the power cut .
E N D
LCG – Databases - Meeting 25March 2008
Presences: Miguel Anjo, John Shade, Paolo Tedesco, Phool Chand, David Collados, Judit Novak, James Casey, Steve Traylen Presentation title - 2
Outline Presentation title - 3
Main issue during the power cut • Ethernet network switches in RAC6 were not connected to the critical power (wrong connection of the power bar) • The public and cluster interconnect networks went down Presentation title - 4
Service Changes • Announce and schedule interventions • Have a main contact that keeps plan and progress, contact all parts and announces restart of all services • Move from dbms_job to dbms_Scheduler EGEE_PPS_SAM rmTDLOneWeekOld; LCG_FTS_PROD begin fts_history.movedata; end; LCG_FTS_PROD begin fts_servicestate.runjob; end; LCG_FTS_PROD_T2 begin fts_history.movedata; end; LCG_SAM_PPS p_testdef_autodel; LCG_SAM_PPS rmTDLOneWeekOld; http://oracle-documentation.web.cern.ch/oracle-documentation/10gr2doc/server.102/b14231/jobtosched.htm • End synonyms • Use Schema.TABLE_NAME • Select from LCG_GRIDVIEW.SITES (from lcg_gridview_r or lcg_same_wor …) • Only need to grant the privileges • Check usage outside CERN (Miguel Anjo) LCG meeting - 6
Developer tasks Presentation title - 7
Points to improve • ServiceMapaccount • Need reader/writer • LCG_Gridmap service • User using LCG_SAM service • Cleanup/partitioning of SAM (sam meeting?) • GridviewMerge/partitioning (gridview meeting?) • Weekly report checkup LCG meeting - 8
AOB • Lemon alarm for DB availability • Create lemon metric for DB services • Next meeting • 29th July? LCG meeting - 9
Why and what was done • Space pressure on LCGR storage arrays • Of ~2750GB, only 175GB are available • Not possible to shrink datafiles • 650GB space not used in datafiles • Solution: move segments Presentation title - 10
Why and what was done • Overview • Backup system will appreciate <200GB datafiles • Datafile is smallest unit to backup, not possible to parallelize neither resume Presentation title - 11
Why and what was done • Partitioned tables • LCG_SAME.TESTDATA (April 2007) • Monthly partitions up to “2008”, indexed, clob • Data since July 2007 • Work to do to (data move during CPUJan08 not finished - see later) • LCG_SAME.TESTDATA_HISTORY (March 2008) • half-yearly partitions/tablespaces, no indexes • Data between July 2006 and July 2007 • Created during CPUJan2008 • LCG_GRIDVIEW.JOBSTATUSRAW (March 2008) • Monthly partition up to Dec/2010, indexed • Created during CPUJan2008 Presentation title - 12
Why and what was done • LCG_GRIDVIEW_DATA01 space waste • Not possible to shrink datafiles. • Solution: move data to different datafile • ALTER TABLE MOVE + ALTER INDEX REBUILD • Copy table, constraints online • Copy indexes online • Made some cursors invalid (need to restart app) • Done for tables <1GB (Thursday 6.March) • DBMS_REDEFINITION does online • Copy table, indexes, constraints, keep synchronized • Rename tables, copy privileges • Done successfully for 7 tables (Monday 10.March) • Failed for table VO (but reported successful) Presentation title - 13
Why and what was done • Why it failed (table VO)? • Service request open to Oracle • Table VO is heavily used (several users, synonyms, views, procedures) • Oracle failed to get a lock but did not report error • “ORA-4020 Deadlock when trying to lock xxx” reported for other tables when moving • Similar problem for table SITES and NODES • Currently difficult to create/drop tables referencing those tables • (tables in bad state? Service Request) Presentation title - 14
Missing operations • LCG_GRIDVIEW • Recreate tables SITES, NODES, VO • Move 10 tables off DATA01 (120GB) • Possible “exp/imp” or “table move + index rebuild” • 8 hours?? • LCG_SAME • Move partitions 2H2007 to correct tablespace • Split 2008 partitions • Create partitions up to Dec2010 • >1 day, “transparent” Presentation title - 15
Outline Presentation title - 16
Current situation Presentation title - 17
What can be done • Partitioning • Some maintenance work • No space gain • Aggregates • After aggregation, delete row data • Space gain and performance boost • History table (no indexes, compressed) • Little space gain • Heavy maintenance work Presentation title - 18
Expected growth • Start monitoring of space growth per table • What are expectations? • How much aggregate data will be kept? • What about aggregation of aggregates? • LCG_SAM? • LCG_GRIDVIEW? • LCG_FTS? Presentation title - 19
Outline Presentation title - 20
Transparent interventions • Huge database (for SAM, GridView) • Impossible to perform full scale tests • Some operations ‘with risk’ for long periods • How to schedule? • Possible to do with downtime? (less risk) • Notification flow? Presentation title - 21
Applications resilience to interventions • Resilient: adj. • Marked by the ability to recover readily, as from misfortune; • Capable of returning to an original shape or position, as after having been compressed. Presentation title - 22
Outline Presentation title - 23
Next meeting • Main developers of LCG • Weekly report, interventions planned, SQL optimization, share solutions • Schedule: Monday after the 15th at 14:00 • Next meeting 21st April – 14:00 Presentation title - 24