180 likes | 261 Views
Database Administration & Optimization. By Shanna Epstein IS 257 September 16, 2008. Cnet.com. Provides information, tools, and advice to help customers decide what to buy and how to get the most out of technology: News Reviews Downloads. Cnet.com Other Sites. Cnet.com Bnet.com
E N D
Database Administration & Optimization By Shanna Epstein IS 257 September 16, 2008
Cnet.com • Provides information, tools, and advice to help customers decide what to buy and how to get the most out of technology: • News • Reviews • Downloads
Cnet.com Other Sites • Cnet.com • Bnet.com • Zdnet.com • Gamespot.com • TV.com • Chow
Job Responsibilities • Software Engineer for Trax • Lead team of 2 developers • Responsible for development and maintenance of product • Support Windows servers • Support php websites and SQL Server 2005 databases
What is Trax? • Industry’s most advanced tool for • Tracking • Analyzing Entertainment Data • Consumer Awareness • Interest and Purchase Intent • Competitive Mindshare • Ad campaign effectiveness • Audience profiles • Editorial coverage
Data Sources • Gamespot (Gamespot Trax) • Gamespot.com • GameRankings.com • Download.com • TV (TV Trax) • tv.com • MovieTome (Trax coming soon) • movietome.com • MP3 – Juke (Trax coming soon) • juke.com (coming out soon)
Metrics • About 20 total • Most important: • Users/Avg. Daily Users (by Sessions) • Page Views (hits of pages) • Searches (execute search for a game) • New Trackers/Total Trackers (track games in Favorites) • Downloads • Videos
Data Processing • Stage servers • process heavy jobs at night and early morning • data comes from various sources • direct database queries • text file feeds • xml feeds • replicate to production early in the morning • replicate to development in off hours
Architecture • Development • Code: developer PCs (php) • Database server: SQL Server2005 • Staging (data processing, replication, testing) • Windows server, SQL Server 2005, Apache • 2 servers: Gamespot Trax, TV Trax • Production • Windows server, SQL Server 2005, Apache • 2 servers: Gamespot Trax, TV Trax
Database Maintenance Optimization • Jobs scheduled in SQL Server Job Agent consists of several steps involving: • Job status can be retrieved through queries against the Master db • DBA can be notified of success/failure through user interface or emails • 1st step checks whether job already completed successfully today • 2nd step checks whether dependent jobs completed • Later steps consist of php scripts (read and process data, send e-mails, etc) and SQL queries
Controlling Database Processes • Task Manager on Windows Server • monitors CPU usage • SQL Server Activity Monitor - ad hoc monitoring tool • Allows to determine volume and general types of activities • lists user connections , locks, open, and blocked transactions • SQL Query Profiler - query optimizer • allows to trace queries for analysis and catch query bottlenecks • Logging • Automatic system error logging • Database consists of data and log file
Optimizing Database Performance • Caching • Saves frequently used information into an easy to get to area (usually memory) • Reduces disk access, computation (CPU), and speeds up query results • Indexes • Data structure that improves speed of operations on db table • Sorted by key values • Query Syntax • tips and tricks • Good database design • normalization • Data cleaning • Old and unused records
Optimizing query performanceTypes of Database Caching • Ad hoc - not cached • Easy to debug, but same query executed every time • Parameterized queries - cached queries • Replace constant literal values by variables, and compiles query plans. If subsequent query differs in only values of constants, it will match against auto-parameterized query. • Prepare statements – cached handles • batch text is sent once at "prepare" time. SQL Server responds by returning handle that can be used to invoke the batch at execute time. At execute time, a handle and parameter values are sent to server. • Stored procedures - cached parameter sniffing • When stored procedure is compiled for first time, values of parameters supplied with execution call are used to optimize statements within that stored procedure. This process is known as "parameter sniffing." If these values are typical, then most calls to that stored procedure will benefit from stored procedures.
Optimizing Query PerformanceIndexes • Clustered • Defines physical storing of rows (reorders how records are physically stored) • Each table can have one clustered index • If clustered index not defined, usually Primary Key becomes clustered index automatically • Non-Clustered • logical order of the index does not match the physical stored order of the rows on disk • Contains references to sorting • Slows down modification and insertion process, so keep to a minimum • Good candidates for non-clustered indexes are those columns frequently used in WHERE, GROUP BY, and HAVING • SQL Server’s Tuning Wizard • recommends indexes after running SQL Server Profiler (query optimization tool that traces queries)
Optimizing Query Performance Syntax tricks • Avoid joining tables on non-indexed columns • Most expensive operation SQL Server can do • Specify columns in select statements instead of SELECT * • Try to avoid DISTINCT when possible • Restrict with WHERE clauses • Select TOP number of rows • Experiment with switching ORDER BY column positions
Optimizing Query Performance(Syntax tricks - cont.) • Union ALL faster than Union • UNION ALL does not look for duplicate rows, and UNION statement does look for duplicate rows, whether or not they exist • SET NOCOUNT ON – stored procedures • Stops message indicating number of rows and reduces network traffic because message with number of rows retrieved is not displayed to client
Backup and Restore Procedures • IT backups • Stored on tape • Ship to offsite • Backup requested when needed • SQL Server runs nightly backups • Backups stored on server and replaced by new versions • Types of files backed up: • Master file stores system information, user logons, user permissions, etc. • Msdb file stores job content and scheduling information • Other database files
Conclusion • Questions?