Migrating Enterprise Manager 12c to a new data center with near zero downtime
Darrell Hiraoka – AMGEN Middleware Team
Lynn Lu – AMGEN DBA Team
September 26, 2013

AGENDA
• About AMGEN
• Background
• Why OEM 12c
• OEM 12c HA Architecture
• DR Failover Testing
• Datacenter Migration Plan
• Implementation Challenges
• Benefits

About AMGEN
• World's leading independent biotechnology company, with a mission to serve patients
• Amgen medicines have reached more than 25 million patients
• Presence in more than 50 countries
• More than 30 years of pioneering science and vital medicines
• Focus solely on discovering, developing, and making human therapeutics
• Specializing in innovative medicines for serious illness
• Pioneer and world leader in protein therapeutic manufacturing
• Broad and deep pipeline of novel product candidates

Background
• Database versions from 8i to 11gR2 (~1,100 instances in total)
• WebLogic versions from 8 to 12c (~600 instances in total)
• Sites shown on the map: Germany, Ireland, Netherlands, Massachusetts, Washington, Colorado, Rhode Island, San Francisco, Las Vegas, Thousand Oaks, Puerto Rico

Why OEM 12c
• New features in OEM 12c
  • NFS storage monitoring
  • Improved reporting features
  • Improved performance diagnostics
  • Provisioning and patching
  • Metric extensions
  • Real-Time ADDM
  • Improved monitoring thresholds
• Need an enterprise-standard monitoring tool for the Database and Middleware teams
• Replace custom scripts

Why OEM 12c (continued)
• Proactive rather than reactive monitoring
• Standard centralized tool for monitoring and reporting
• Need more capacity to support growth
• Legacy OEM 11g has limited resources
  • Both OMS and DB running on shared Solaris zones
• Need to re-architect the OEM environment to support HA
• Remediate single points of failure
OEM 12c simplified monitoring of databases and middleware by providing a global, centralized, and highly available monitoring tool.

OEM 12c HA Architecture
• Oracle-recommended Level 4 Maximum Availability Architecture (MAA), achieving the highest availability in the most cost-effective, simple design
• OMS on the primary site in Active/Active configuration; repository running on a two-node VERITAS cluster
• Duplicate hardware deployed at the standby site
• Standby OMS and additional standby OMS installed at the standby (DR) site

OEM 12c HA Architecture (continued)
• Database DR uses Oracle Data Guard in Maximum Performance mode
• Local load balancer implemented at both the primary and the standby (DR) sites
• Software Library uses NetApp SnapMirror replication between the primary and standby sites

OEM 12c Architecture Diagram
[Diagram: an Active Site and a Standby Site behind one EMCLI/DNS alias. Each site has OMS1 and OMS2, a shared software library, a VIP, and a two-node VCS cluster (DB node 1, DB node 2) hosting the repository database. NetApp SnapMirror links the two software libraries and Data Guard links the two repository databases.]

DR Failover Testing
A formal DR exercise was performed with the high-level steps below:
• Switch the standby database to be primary
• Switch the DNS alias to point to the DR F5 load balancer
• Break the software library SnapMirror
• Reconfigure the standby OMS to point to the new primary DB, then start the standby OMS
• Re-sync the repository database with the management agents
• Verify all monitored targets and the environment are operational

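The ordering of the steps above matters, since each one depends on the one before it. A minimal sketch of an ordered runbook that stops at the first failed step is shown here. The `Step` type, `run_failover`, and the `placeholder` action are hypothetical names for illustration only; the deck does not show the commands actually used for each step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


def run_failover(steps: List[Step]) -> List[str]:
    """Run steps in order; stop at the first failure and return completed step names."""
    completed = []
    for step in steps:
        if not step.action():
            print(f"FAILED: {step.name}; stopping so an operator can intervene")
            break
        print(f"done: {step.name}")
        completed.append(step.name)
    return completed


def placeholder() -> bool:
    # Hypothetical stand-in: a real step would call the site's own tooling.
    return True


# Step names follow the slide; the actions are placeholders (assumption).
FAILOVER_STEPS = [
    Step("Switch standby database to primary", placeholder),
    Step("Point DNS alias at the DR F5 load balancer", placeholder),
    Step("Break software library SnapMirror", placeholder),
    Step("Reconfigure standby OMS for the new primary DB and start it", placeholder),
    Step("Re-sync repository database with the management agents", placeholder),
    Step("Verify monitored targets and environment", placeholder),
]

if __name__ == "__main__":
    run_failover(FAILOVER_STEPS)
```

Stopping at the first failure keeps a half-completed failover from continuing, for example starting the standby OMS against a database that never became primary.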
DR Failover Testing (continued)
High-level tasks performed to verify that DR was operational:
• Install a new agent and configure targets
• Test monitoring alerts and escalation
• Run existing jobs
• Run reports
The DR exercise completed successfully in the production pilot:
• 40 database servers (~300 targets)
• 10 middleware servers (~100 targets)

DR Failover Testing Diagram
[Diagram: the same two-site layout during the DR test, with one site labeled Active Site and the other labeled Down Site behind the EMCLI/DNS alias. Each site still shows OMS1, OMS2, a shared software library, a VIP, two VCS DB nodes, and the repository database; the SnapMirror and Data Guard links are not shown.]

Datacenter Migration Plan
• Enterprise datacenter consolidation drove the relocation of the OEM 12c infrastructure
• OEM 12c relocation is the only option available that would meet the minimum downtime requirement
• The design of the relocation is based on our current HA implementation and DR test results
• The plan requires two additional standby environments, which will be provisioned in the new data centers

Datacenter Migration Plan (continued)
Phases: POC, Planning, Deployment, Final Cutover
The planned migration to the two new data centers should complete within ~30 min. The migration time could be reduced to ~10 min with further automation and a smaller DNS TTL setting.

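To see why the DNS TTL matters for the cutover window, here is a rough estimate, assuming the steps run back to back and clients may then wait out one full TTL before following the alias. That is a pessimistic simplification. The per-step minutes and TTL values below are illustrative numbers chosen only to add up to the ~30 and ~10 minute totals on the slide; they are not measurements from the source.

```python
from typing import Dict


def estimate_cutover_minutes(step_minutes: Dict[str, float], dns_ttl_seconds: int) -> float:
    """Worst case: every step runs sequentially, then cached DNS entries expire."""
    return sum(step_minutes.values()) + dns_ttl_seconds / 60


# Illustrative values only (assumption, not from the deck).
manual_steps = {
    "database switchover": 5,
    "DNS alias change": 2,
    "OMS reconfigure and start": 10,
    "agent re-sync and checks": 3,
}
automated_steps = {
    "database switchover": 3,
    "DNS alias change": 1,
    "OMS reconfigure and start": 3,
    "agent re-sync and checks": 2,
}

if __name__ == "__main__":
    print(estimate_cutover_minutes(manual_steps, dns_ttl_seconds=600))
    print(estimate_cutover_minutes(automated_steps, dns_ttl_seconds=60))
```

Under these assumed numbers the TTL alone accounts for a third of the longer window, which is why lowering it ahead of the cutover is a cheap way to shorten the outage.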
Datacenter Migration Plan - Deployment
[Diagram: four sites behind one EMCLI/DNS alias, one Active Site and three Standby Sites. Each site has OMS1 and OMS2, a shared software library, a VIP, and a two-node VCS cluster (DB node 1, DB node 2) hosting the repository database. NetApp SnapMirror replicates the software library and Data Guard replicates the repository database.]

Datacenter Migration Plan – Final Cutover
[Diagram: after cutover, the OMS instances at the two original sites are marked as deleted. The New Primary DC is the Active Site and the New DR DC is the Standby Site, both behind the EMCLI/DNS alias. Each new site has OMS1 and OMS2, a shared software library, a VIP, two DB nodes, and the repository database. NetApp SnapMirror and Data Guard run between the two new sites.]

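The role changes in the final cutover can be sketched as a small state transition: one of the new standby sites is promoted, and the original sites are retired. The site names, the `Role` enum, and `cut_over` below are hypothetical and only model the roles shown in the diagrams, not any Enterprise Manager API.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    RETIRED = "retired"  # OMS deleted after cutover


def cut_over(sites: Dict[str, Role], new_active: str, retire: List[str]) -> Dict[str, Role]:
    """Return a new role map with one standby promoted and the listed sites retired."""
    if sites[new_active] is not Role.STANDBY:
        raise ValueError(f"{new_active} must be a standby before promotion")
    result = dict(sites)
    for name in retire:
        result[name] = Role.RETIRED
    result[new_active] = Role.ACTIVE
    active_sites = [name for name, role in result.items() if role is Role.ACTIVE]
    if len(active_sites) != 1:
        raise ValueError(f"expected exactly one active site, got {active_sites}")
    return result


# Deployment phase: one active site and three standbys (names are assumptions).
SITES = {
    "old_primary": Role.ACTIVE,
    "old_dr": Role.STANDBY,
    "new_primary_dc": Role.STANDBY,
    "new_dr_dc": Role.STANDBY,
}

if __name__ == "__main__":
    after = cut_over(SITES, "new_primary_dc", retire=["old_primary", "old_dr"])
    for name, role in after.items():
        print(name, role.value)
```

The check for exactly one active site guards against promoting a new site while forgetting to retire the old primary.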
Implementation Challenges
• The Software Library should be shared across all sites (NFS mount) during installation; after installation, each site works with its local mirrored copy.
• The 12c agent was highly unstable on Solaris 9. The fix was to upgrade the agent's JDK to 1.6 update 43 (1.6-u43).

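A quick way to confirm that a host has the JDK fix is to compare the version string against the 1.6 update 43 minimum. The sketch below assumes the legacy `java version "1.6.0_43"` format that JDK 6 prints for `java -version`; the function names are hypothetical, and how to locate the JDK that a given agent actually runs on is not covered by the deck.

```python
import re
from typing import Optional, Tuple

# Matches the legacy scheme, e.g.: java version "1.6.0_43"
_VERSION_RE = re.compile(r'version "(\d+)\.(\d+)\.(\d+)(?:_(\d+))?')


def parse_jdk_version(java_version_output: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (major, minor, micro, update), or None if no version string is found."""
    match = _VERSION_RE.search(java_version_output)
    if match is None:
        return None
    major, minor, micro, update = match.groups()
    return int(major), int(minor), int(micro), int(update or 0)


def meets_minimum(java_version_output: str,
                  minimum: Tuple[int, int, int, int] = (1, 6, 0, 43)) -> bool:
    """True if the reported JDK is at least the minimum (default: 1.6 update 43)."""
    version = parse_jdk_version(java_version_output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    sample = 'java version "1.6.0_24"'  # sample text, not output from a real host
    print(parse_jdk_version(sample), meets_minimum(sample))
```

The text to check would typically be captured from `java -version`, which writes to standard error rather than standard output.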
Implementation Challenges (continued)
• OEM sent false database/middleware down alerts for targets on Solaris 9. The same JDK upgrade fixed this.
• Integrating the OEM 12c agent with the VCS cluster.
• Incorrect metrics as a result of migrating templates from an earlier version of OEM.
• WebLogic admin server discovery issue: the target was discovered but attached to the wrong agent.

Implementation Challenges (continued)
• The 12.1.0.2 agent stops monitoring with a TOO_MANY_OPEN_FILES error. A patch fixes the bug.
• The NetApp plug-in did not work as expected; metric extensions are currently used to monitor storage.
• BI Publisher is not integrated in the standby OMS.
• Import/export of templates: monitoring templates can be exported via EMCLI, but not the associated notification rules.

EM12c Benefits
• Real-time storage data collection from monitored host targets.
• All monitoring alerts go through the AMGEN standard escalation tool.
• Improved blackouts: blackouts were not working as expected in 11g.
• Enhanced availability reporting: targets were frequently in unknown/pending status in OEM 11g.

EM12c Benefits (continued)
• Enhanced Procedure Library
• Infrastructure available for the team to test self-service reporting, provisioning, automation, patching, profiling, etc.
OEM 12c provides Amgen a centralized, scalable, and highly available monitoring tool for the Oracle database and WebLogic middleware platforms.

Thank You
• Oracle – Perren Walker & Mark McGill
• AMGEN – DBA, Middleware, Network, Storage, UNIX