Evolving the Enterprise’s Database Infrastructure: “Move to the Grid”
Agenda • Problems • Introduction to Oracle 10g features • Demonstrate impact on the Enterprise • Propose Phase I Project • Consolidation • Scalable Grid Architecture
Top 7 Problems for DBAs • Growth in the number and size of databases does not match staffing levels • Root causes of performance bottlenecks are not obvious or easily diagnosed • After a session ends, its statistics and troubleshooting information are not always available • Databases are shoehorned onto servers without consideration of correct layout, leading to I/O bottlenecks
Top 7 Problems for DBAs • Impossible to manually monitor and tune all databases • Managing storage correctly is very time-consuming • Database tuning is part experience, part science, part art and part intuition
Top Problems for Sysadmins • Many different servers, different architectures • High number of databases per node – scheduling maintenance windows is complex • Grey area between DBA and sysadmin responsibilities
New in 10g • The vision for the grid • 10g is not a regular database upgrade • RAC enhancements • Backup strategy • ASM (Automatic Storage Management) • ADDM & Advisors • DataGuard
Problems Solved in 10g for DBAs • Some tedious and time-consuming DBA tasks are now managed by Oracle • Oracle will identify root causes of performance issues and rank the effectiveness of fixing them • Oracle stores statistics about every session in its repository • ASM will rebalance hot spots, making it easier to have many databases on a server
Problems Solved in 10g for DBAs • 10g metrics and out-of-the-box alerts will allow the DBAs to be more proactive • ASM lets Oracle manage storage, reducing this very time-consuming task • Oracle 10g provides advisors for tuning
The vision for the Grid • The “g” in 10g • Grid is not RAC, RAC is not Grid • Treat all computing resources like a utility in all layers of the product stack • Clustered application servers (iAS cluster) • Clustered database (RAC) • Automatic Storage Management (ASM) for provisioning storage
The vision for the Grid • Scalability – Easily add more resources • Management, monitoring and provisioning with “Grid Control” • Virtualization of resources – Applications are not tied to specific hardware but rather see one large pool of resources
10g NOT a regular database upgrade • Big learning curve • Changes at all levels of the hardware stack • Good opportunity to define job responsibilities in relation to the hardware stack
The grid hardware stack • Application servers (ISR/NCS depending on application) • Databases (DBA Team / ISR) • Load balancers/Interconnects/Network Infrastructure (NCS) • Servers (NCS Sysadmins) • Storage Architect (NCS) • Cluster (Sysadmins/Storage Architects) • Firewall appliances (NCS) • Backups (DBA / NBU Admins)
RAC Enhancements • FAN – Fast Application Notification • Smarter load balancing across nodes • Can now mix different classes of servers in a cluster, which makes it possible to leverage existing hardware • Before grid, some servers were almost always idle and some were never idle; grid makes better use of resources • Assign % of CPU usage to a Service • Better management of workload
Backup Philosophy in 10g • Backups go to disk, not tape • Flashback logs • Supports flashback database and recovery through resetlogs • Flash recovery area • On disk • Holds one full backup • Holds all incrementals • Archive & flashback logs • Backed up and managed by RMAN • Flash recovery area backed up to tape • Best practice: Use ASM for this area • Shared by all instances on server
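The flash recovery area contents listed above (one full backup, all incrementals, archive and flashback logs) suggest a simple sizing calculation. The sketch below is only an illustration under stated assumptions: the field names, the retention window and the 20% headroom are hypothetical and do not come from Oracle or from this deck.

```python
from dataclasses import dataclass


@dataclass
class FraInputs:
    """Hypothetical sizing inputs, all in gigabytes (illustrative names)."""
    full_backup_gb: float        # the one full backup kept on disk
    daily_incremental_gb: float  # average size of one daily incremental
    daily_archivelog_gb: float   # archived redo generated per day
    daily_flashback_gb: float    # flashback logs generated per day
    days_retained: int           # days of incrementals and logs kept on disk


def estimate_fra_gb(inp: FraInputs, headroom: float = 0.2) -> float:
    """Rough flash recovery area size: expected contents plus a safety margin."""
    per_day = (inp.daily_incremental_gb
               + inp.daily_archivelog_gb
               + inp.daily_flashback_gb)
    contents = inp.full_backup_gb + per_day * inp.days_retained
    # The headroom is an assumed margin to reduce the risk of the area filling up.
    return contents * (1 + headroom)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = FraInputs(full_backup_gb=200, daily_incremental_gb=10,
                        daily_archivelog_gb=15, daily_flashback_gb=5,
                        days_retained=7)
    print(f"Estimated FRA size: {estimate_fra_gb(example):.0f} GB")
```

Real sizing would need measured backup and redo volumes per database, since the area is shared by all instances on the server.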
Backup Philosophy in 10g • Benefits • Most backup failures today are due to NBU, at a rate of 5 or 6 per day; each requires operations to resubmit the backup and DBA time to follow up • Backups currently take 4-6 hours (for MCGP) • Lots of time spent waiting on tape • Recovery from tape is slow; new features help minimize downtime • All files needed for recovery are in the same location • Having this on ASM minimizes the work to maintain archivelog free space (avoids database hangs)
Automatic Storage Management • Oracle’s “Smart” Filesystem • DBAs only have to deal with a few diskgroups rather than trying to fit datafiles on fixed-size mountpoints • Raw partitions have always been recommended for performance but were very difficult to manage before ASM • ASM can stripe and mirror your storage (optional) • ASM can rebalance to avoid hot spots • Managing storage is very time-consuming to do right; ASM does the tedious tasks for you
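To make the striping and rebalancing idea concrete, here is a toy model. It is not Oracle's actual ASM algorithm: the round-robin placement, the greedy move loop, and the disk names are all assumptions chosen for illustration.

```python
from collections import Counter


def stripe(num_extents: int, disks: list[str]) -> dict[int, str]:
    """Place extents round-robin so every disk holds a near-equal share."""
    return {extent: disks[extent % len(disks)] for extent in range(num_extents)}


def rebalance(layout: dict[int, str], disks: list[str]) -> dict[int, str]:
    """Even out the load across `disks` by moving one extent at a time.

    Assumes `disks` contains every disk already used in `layout`
    (for example, the old disks plus one newly added disk).
    """
    new_layout = dict(layout)
    on_disk: dict[str, list[int]] = {d: [] for d in disks}
    for extent, disk in sorted(new_layout.items()):
        on_disk[disk].append(extent)
    while True:
        busiest = max(disks, key=lambda d: len(on_disk[d]))
        idlest = min(disks, key=lambda d: len(on_disk[d]))
        # Stop once no disk holds more than one extent above any other.
        if len(on_disk[busiest]) - len(on_disk[idlest]) <= 1:
            return new_layout
        extent = on_disk[busiest].pop()
        on_disk[idlest].append(extent)
        new_layout[extent] = idlest


if __name__ == "__main__":
    before = stripe(12, ["disk1", "disk2", "disk3"])
    after = rebalance(before, ["disk1", "disk2", "disk3", "disk4"])
    print(Counter(before.values()))
    print(Counter(after.values()))
```

The point of the sketch is that adding a disk triggers a redistribution of existing extents without the DBA relocating datafiles by hand.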
ADDM & Advisors • Oracle has internalized metric collection in 10g • ADDM runs and looks for problems • ADDM will recommend the use of advisors to further investigate the problem • Will help the DBA (and developer) by providing tuning advice.
DataGuard • What is redo? • RAC = Instance availability • DataGuard = Database availability • Logical and Physical standby • Protect database vs. Provide service • All enterprise systems should have DataGuard • Imagine losing an hour of committed transactions in Banner or Vista • Time to rebuild an enterprise system? • Uses for DWH
Phase I Project scope • Bring in required infrastructure • Consolidate • Tempest/Squall replaced with scalable grid technology • Migrate DORACs/ORACs into this architecture
Phase I Project scope • Current Grid Control implementation is not highly available • Migrate Grid Control repository database to RAC • Cluster application server, Norad2 • Leverage virtualization
Required Infrastructure (Grid Control) • Have been using Grid Control for the past two years, since it was in beta • Not optional in 10g* • Has helped us to develop standards and be proactive • Upgrade to release 2 in progress • Release 2 improves on provisioning and RAC management • Will be used by developers as well as DBAs when we go to 10g • In release 2, Oracle has partnered with third parties to deploy agents on non-Oracle software and appliances, including SQL Server, WebLogic and F5 load balancers
Losing Grid Control • No monitoring and alerts for databases • No GUI to manage 10g databases • Loss of tools for programmers and DBAs • Scheduled DBA jobs would not run
Required Infrastructure (OID): Oracle Internet Directory • ONAMES is deprecated in 10g. ONAMES is a central naming service used to translate a name to a connect string and is needed for connectivity • Bridge from Oracle products to Active Directory for single sign-on and authentication • Could have many other uses to manage and simplify security in Oracle products (needs more research) • Must be highly available; otherwise users risk being unable to connect to databases
Required Infrastructure (OID) Establish a two-node OID. Objectives: • Replace ONAMES and shared TNSNAMES files as a standard naming method • Clean up all names and investigate the use of global_names • Replace infra1.portal.mcgill.ca for managing authentication (migrate the asdb instance on infra1 to RAC, solely for Portal metadata)
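The role of a central naming service, translating a short name into a connect string, can be sketched as a simple lookup. The aliases, hosts and service names below are hypothetical; only the general shape of the Oracle Net connect descriptor is real, and an actual OID lookup goes through LDAP rather than an in-memory table.

```python
# Hypothetical alias table standing in for the central directory.
# Hosts and service names are made up for illustration.
NAMING_DIRECTORY = {
    "app_prod": {"host": "db1.example.edu", "port": 1521, "service": "app"},
    "app_test": {"host": "db2.example.edu", "port": 1521, "service": "apptest"},
}

DESCRIPTOR_TEMPLATE = (
    "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
    "(CONNECT_DATA=(SERVICE_NAME={service})))"
)


def resolve(alias: str) -> str:
    """Translate a database alias into a connect descriptor string."""
    entry = NAMING_DIRECTORY.get(alias.lower())
    if entry is None:
        # If the naming service cannot answer, clients cannot connect,
        # which is why the directory should be highly available.
        raise LookupError(f"cannot resolve service name {alias!r}")
    return DESCRIPTOR_TEMPLATE.format(**entry)


if __name__ == "__main__":
    print(resolve("app_prod"))
```

Keeping the table in one place is the design point: clients carry only the alias, so a database can move hosts without every shared TNSNAMES file being edited.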
Infrastructure (worth investigating): WebCache • Part of Oracle application server install • Used by Portal (but not currently installed in HA config) • Should be made highly available • Should have a better understanding of how it works • Can it benefit more than just the portal? (Improve Registration?) • Investigate “TimesTen” data cache
Consolidation (Tempest/Squall) • Tempest and Squall are servers funded by NCS as per a Tony Masi initiative to consolidate disparate databases from across campus • Tempest is a test server containing 12 databases • Squall is a production server containing 20 databases • Databases serve mostly E-business group’s clients, ICS (HEAT) and ARR (Scheduling) • Ongoing demand for new databases • Difficult to estimate capacity and resource needs • Not scalable and not highly available • Best candidate for new architecture
Consolidation (Tempest/Squall) • Set up a 10g test grid to replace Tempest • Set up a 10g production grid to replace Squall • Migrate to the 10g grid any applications on Tempest/Squall for which 10g is supported, as well as all McGill-developed applications currently residing on Tempest/Squall • Migrate NCS databases • Production Grid will provide a location for any 10g database that needs to be highly available (Grid Control repository, Portal repository) • Project should include a consultant from Oracle to review the plan, discuss best practices and guide the initial setup of the test environment • Good learning experience before restructuring large Enterprise systems (Vista, Banner)
Risks of non-action • Not a Tony Masi “Top 5” project, but if we do not complete Phase I and gain the needed knowledge, we will not meet next year’s objectives (i.e. Vista upgrade, Banner upgrade) • Staff resources continue to be stressed • Advantages of new best practices for RMAN and backups of flash recovery area • Development of methodology for migrating to the cost-based optimizer • Learning best practices for ASM on Hitachi SAN • Benefiting from new features in OEM (monitoring, tuning and provisioning) • New failover and load balancing features on RAC (FAN – Fast Application Notification) • Setup and configuration of 10g RAC
Key Skills to Develop • Best practices for migrating 9i RAC to 10g RAC • Correct use of WebCache • Understand implications of global_names=true • Get developers up to speed on writing good code and performance tuning, and trained on the new 10g tools • Oracle Internet Directory
Summary • Big learning curve • Need to move forward or future projects will be in jeopardy of failure • All levels of the hardware stack are affected