CDF Taking Stock
By Anil Kumar, CD/CSS/DSG
June 22, 2005
CDF Taking Stock 2005-2006

Current Infrastructure

CDF Capacity All Applications
• CDF Offline DB growth*: 43 GB (online) + 5 GB (offline) per year
  * Slow Control is not in Offline
• CDF Online DB growth: 50 GB/year

CDF Online Applications

CDF Offline Applications

Monitoring And Data Modeling Tools
Monitoring tools:
• dbatool/toolman: monitors space usage, users, SQL, temp space, sniping of inactive sessions, auto start of Listener and IA, and estimation of table/index stats
• OEM (Oracle Enterprise Manager): DB monitoring tool; monthly charts are posted on the web
  DB performance charts: http://www-cdserver.fnal.gov/cd_public/css/dsg/db_stats/data/db_stats.html
  Ganglia charts (monitoring tools): http://fcdfmon2.fnal.gov/
Data modeling tool:
• Oracle Designer is used for data modeling and initial space estimates for applications.

Uptimes
• cdfonprd: 100%
• cdfofprd: 99.4356% (1776 minutes of unscheduled downtime since 11/11/2004)
• CDF replica: 100%

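The cdfofprd figure can be sanity-checked with a few lines of Python. This is a minimal sketch: the function name `uptime_percent` is made up for illustration, and the end of the reporting window is an assumption (the talk date, 06/22/2005), since the slide states only the start date.

```python
from datetime import date


def uptime_percent(downtime_minutes, start, end):
    """Return uptime as a percentage of all minutes between start and end."""
    total_minutes = (end - start).days * 24 * 60
    if total_minutes <= 0:
        raise ValueError("end must be later than start")
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


# Start date and downtime come from the slide; the end date is an assumption.
window_start = date(2004, 11, 11)
window_end = date(2005, 6, 22)
print(f"cdfofprd uptime: {uptime_percent(1776, window_start, window_end):.4f}%")
```

With this assumed window the result is about 99.45%, close to but not identical to the 99.4356% on the slide, which suggests the slide's figure was computed over a slightly shorter window.
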
Accomplishments
• Upgraded CDF databases to 9.2.0.6
• Quarterly database security is up to date
• Tuned and regression-tested Streams replication as per current API usage
• Deployed bzora1 for cdfonprd. Very smooth transition. No interruption to data taking!
• Decommissioned b0dau35
• Oracle backups for cdfonprd to DCache/Enstore
  http://www-css.fnal.gov/dsg/external/cdfdbmtgs/all_other_documentation/bzora1.pdf
• Deployed the long and eagerly awaited Streams replication across CDF databases. The hard work of css-dsg, spanning more than 2 years, is finally in production. All issues encountered were addressed in a timely manner.
• Smooth transition to fcdfora6 with Streams replication
• Decommissioned fcdflnx1
• Implemented capture of long transactions in the DB

Replication Tool
• Streams replication tool "strmrep"
• Production deployment of Streams replication encountered two issues:
  a) Replication of two packages, RUNDB and HDWDB, caused Streams to halt. We worked very hard to address the issue and deploy the workaround. Oracle released a permanent bug fix on Thu 06/16/05. This bug was not encountered in integration testing.
  b) SAM can't be replicated using Streams, since the SAM application has variable-length CLOBs and a functional index. There was not enough time to do a regression test, and there was no use case.
• One more error after production deployment was causing one of the Streams processes to halt. The workaround has been deployed. Oracle found the root cause, and a bug fix will be available in 1-2 weeks.
• CDF Streams status online (courtesy Randy Herber): http://dbb.fnal.gov:8520/cdfr2/databases?type=ora-strms&fsrc=cdfofpr2&nsrc=cdfofpr2&gsrc=cdfofpr2&dcbk=FILECATALOG
• Documentation: http://www-css.fnal.gov/dsg/internal/ora_repl/

Freeware DB Support
• MySQL/Postgres prototype
  - Proof of product with CDF data
  - The population mechanism is on demand; it does not support updates
  - Successfully tested with CDF code (Karlsruhe)
• DSG has begun to provide consulting for freeware databases
  - Actively maintaining new versions of MySQL and Postgres in KITS and working towards a more robust environment
  - Actively maintaining documentation for MySQL and Postgres in our freeware area
  - Reference URL: http://www-css.fnal.gov/dsg/external/freeware
  - Actively assisting users with questions, upgrades, testing, etc. for freeware products

Back-up
• CDF ONLINE DATABASES
  cdfonprd
  - Daily, 7 days of archives, two backup copies always on disk
  - Allocated 535 GB, used 461 GB (2 copies); backup time 1 hr 23 min vs. 2 hrs 30 min on old hardware
  - CDF online backup to DCache/Enstore: daily
  cdfondev
  - Daily, 14 days of archives, one copy always on disk
  - Backup time: 2 hrs 30 min
  cdfonint
  - Daily, 30 days of archives, one copy always on disk
  - Allocated 356 GB, used 219 GB; backup time 2 hrs 15 min
• CDF OFFLINE DATABASES
  cdfofprd
  - DFC+SAM, daily, 8 days of archives and export, one copy always on disk
  - Allocated 270 GB, used 25 GB; backup time 1 hr 24 min
  cdfstrm1
  - Replica of online and DFC. No backup -> RMAN/Tape.
  cdfofdev
  - Daily, 7 days of archives
  - Backup time: 3 hrs 20 min
  cdfofint
  - 2 times/week, 7 days of archives
  - Allocated 67 GB, used 36 GB (2 copies); backup time 2:30

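To see how close each backup area is to its allocation, the allocated and used figures above can be turned into utilization percentages. This is an illustrative sketch only: the function names and the 80% warning threshold are assumptions, not part of the DSG tooling.

```python
# (allocated GB, used GB) per database, copied from the slide.
backup_space_gb = {
    "cdfonprd": (535, 461),
    "cdfonint": (356, 219),
    "cdfofprd": (270, 25),
    "cdfofint": (67, 36),
}


def utilization_percent(allocated_gb, used_gb):
    """Return used space as a percentage of allocated space."""
    return 100.0 * used_gb / allocated_gb


def over_threshold(space, threshold_percent=80.0):
    """List databases whose backup area exceeds the (hypothetical) threshold."""
    return [
        name
        for name, (allocated, used) in space.items()
        if utilization_percent(allocated, used) > threshold_percent
    ]


for name, (allocated, used) in backup_space_gb.items():
    print(f"{name}: {utilization_percent(allocated, used):.1f}% of {allocated} GB used")
print("Above threshold:", over_threshold(backup_space_gb))
```

With the slide's numbers, only the cdfonprd backup area (two copies on disk) sits above the assumed 80% mark.
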
Oracle Backup for cdfonprd to DCache/Enstore
• RMAN to DCache/Enstore is working fine, but needs fine tuning to fit our (DSG) standard, firewall-independent backup mode.
• Working reliably. Fully automated for dailies.
• Data integrity tested twice during recovery.
• Data integrity tested 4 times via md5sum.
• Not currently using the weekly or monthly PNFS directory structure.
• Intend to send weeklies on Sunday and monthlies on the 1st.
• No archives being sent yet.
• PNFS metadata maintenance is being done manually.

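The md5sum check mentioned above amounts to comparing the checksum of a backup piece on disk with the checksum of the copy read back from DCache/Enstore. A minimal Python equivalent is sketched below; the function names are hypothetical and any file paths passed in are placeholders, not the actual DSG backup layout.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, retrieved_path):
    """Return True when both files have the same MD5 digest."""
    return md5_of_file(source_path) == md5_of_file(retrieved_path)
```

Reading in chunks keeps memory use flat even for multi-gigabyte backup pieces. MD5 is adequate here for detecting accidental corruption in transfer, which is the purpose described on the slide.
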
RMAN Backup on SAN
• Inexpensive, large disk array can accommodate growing RMAN backups
• Fast and reliable backup and recovery
• 24 x 7 and 8 x 5 support tiers available
• Can serve various O/S platforms
• The June 16 briefing on database backup/recovery standardization discussed the SAN testing in more detail.
  http://www-css.fnal.gov/dsg/internal/briefings_and_projects/briefings/standardizing_database_backups.ppt
• Multiplexing of archives to local disk and SAN

RMAN to SAN Experience
• d0ofdev1 RMANs to SAN since Nov. '04
• Two 1 TB SAN mount points available
• Keep 2 alternating days of RMANs on SAN, once/week to local backup disk
• RMAN validation to determine backup file integrity
• One validation failure since Nov. '04
• Recoveries from SAN were all successful

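One way to read the "2 alternating days on SAN, once/week to local disk" scheme is as a simple rotation over the two mount points. The sketch below illustrates that reading only; the mount paths, the local directory, and the choice of Sunday for the weekly local copy are all assumptions, not the actual d0ofdev1 configuration.

```python
from datetime import date

# Hypothetical locations; the slide does not name the real mount points.
SAN_MOUNTS = ("/san/rman_a", "/san/rman_b")
LOCAL_BACKUP_DIR = "/backup/rman_local"
LOCAL_COPY_WEEKDAY = 6  # Sunday in date.weekday() numbering (assumption)


def backup_targets(day):
    """Return the directories that should receive the RMAN backup on this day."""
    # Consecutive days alternate between the two SAN mount points, so each
    # mount always holds one of the two most recent backups.
    targets = [SAN_MOUNTS[day.toordinal() % 2]]
    if day.weekday() == LOCAL_COPY_WEEKDAY:
        targets.append(LOCAL_BACKUP_DIR)
    return targets


print(backup_targets(date(2005, 6, 19)))  # a Sunday: SAN mount plus local disk
print(backup_targets(date(2005, 6, 20)))  # a Monday: the other SAN mount only
```

Alternating by day ordinal means a failed or corrupt backup on one mount never overwrites the previous day's good copy on the other.
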
SAN Issues
• Current SAN does not have 24 x 7 support
• IDE disks are not as reliable as other, more expensive disks
• Purchasing a 24 x 7 SAN requires licensing and changes to the O/S to be able to use it
• Firewall issues (CDF & D0 online)
• We will be extremely careful in implementing SAN for bzora1. On bzora1:
  a) A PCI card has been installed.
  b) Fiber between CDF and FCC has been identified for use; we are waiting for additional SAN hardware for bzora1.

SAM Schema
• Production deployments:
  - Autodestination subsystem of SAM schema
  - Indexes on Param Values deployed in production
  - Data types correction cut
  - Indexes for Volumes
• Work in progress:
  - Request subsystem of SAM schema. Cut in Mini-SAM.
• Upgrade to Mini-SAM as the SAM schema evolved.
  -> This lets individual developers have a copy of SAM metadata and seed data available for a server software rewrite, if needed.
• Mini-SAM in Postgres. Initiative to move towards freeware databases for SAM. Proof of product is not complete; it requires testing with a dbserver from the SAM development team.

What's Next?
• Deploy SAN/Enstore backup recovery plan. (Testing of SAN on D0 offline is work in progress.) Backup to DCache/Enstore is already in place for CDF online.
• Re-allocate the Winchester disk array from fcdflnx1 to fcdfora1 so that there is enough space to reconfigure the Streams integration setup.
• Reconfigure Streams test environment: cdfonint -> cdfofint -> cdfrep23
• SAM Request subsystem schema deployment
• Patch CDF database for replication of RUNDB and HDWDB packages. (Patch was released by Oracle on Thu 06/16/05.)
• Converting cdfonline to 64 bit. Testing will be a challenge.
• O/S upgrade (reinstall) to 2.9 on b0dau36. Decommissioned Veritas.
• Performance tuning on fcdfora4 to SGA > 2 GB to allocate more memory to Streams
• Migrating Slow Control to Linux
• Rewrite of dbatools/toolman for enhanced monitoring features and 10g support
• Upgrade OEM to 10g. Work in progress.
• Possible upgrade to 10g for incremental database backups and the enhanced features of Streams replication
• Testing of Postgres Mini-SAM for proof of product

Concerns
• Replication of SAM depends upon the stress test results on fcdfora4.
• Simulation of applications, as we have for CALIB. A robust test suite is needed.
• Single point of failure for SAM and DFC
• Migration of DFC to SAM. Plan and schedule?
• Close-out for Data Guard/Standby is still pending.
• Move Slow Control off of bzora1. Requires 3 instances. OS Linux? If Linux, then not a 24 x 7 machine.
• The data model for some CDF applications is not in Designer.
• What is CDF's direction, if any, with respect to freeware?
• Any more Streams replicas?
• Deputy CDF database liaison?
• TNSNAMES deployment for CDF was a nightmare. The experience should be documented.
• Special clean-up jobs should be coordinated with css-dsg.
• In case of hardware failure on offline, we have to reinstantiate replication rather than recover, since we have only partial backups of the offline production DB.
• Move off VxWorks from b0dau36.