Call us Today! 201-728-8809 | | Contact | About |
   


Database Troubleshooting and Tuning Series

The Friday Afternoon Crisis

Are your developers hindering database performance? Part II

February 2012. Volume 1 - Issue 2
By Tom Romeo

The goal of this article is to help developers design applications with consideration for the importance of proper database interaction for maximum performance.

Discovery

It's Friday afternoon. You receive a call from some middle manager about a problem on one of corporate's servers or at a customer's site. He/she proceeds to verbally hemorrhage about the observed symptoms, managements' diagnosis, and instructions for you to resolve your new number one priority. Senior leadership is keenly aware of this issue, the top floor is watching, and is expecting periodic updates as to its resolution.

What is a developer to do? Step number one is to remove your developer hat and put on your trouble shooting hat. Empirical data is what you need to begin your quest. The starting place to discovery is to find existing empirical data and to start your own collection of data points that will give you clues as to where to look next. The existing data, that is the data that exists before you start collecting, could reside in logs from the application, windows event logs, dump files, etc.

Start with PERFMON on the server that is exhibiting the problem and on the database server that hosts the database. While watching PERFMON in real time is usually beneficial, it is important to collect data in a file as well. Many times something that seems intermittent actually has a pattern or coincidence. These patterns sometimes emerge from collected data while they are more elusive in real time.

Using the collected data you may find areas of concern. The starting metrics are listed down below. It may be necessary to collect other metrics depending on your initial discovery. For example, if CPU sustains an undesirably high percentage and SQL Server, (or Oracle Server) is not the culprit, you may need to look at the currently running processes and start tracking metrics on the guilty process.

While allowing the collection to run for a period of time, perhaps several minutes or hours, we can watch the real time performance to see if anything jumps out at us. Looking first at the easy targets like CPU, Memory, and Physical Disk, we can easily gauge if the three targets are operating within healthy ranges.

I'd like to note here that as developers, we really want to see the server as underutilized as possible, but IT departments that are highly virtualized like to see their VMs running fairly hot as there is a negative monetary association with idle hardware. It is therefore not uncommon to see servers utilizing CPU at 60-70%, memory at 70-80% and disk at 40-50%. The understanding here is that the phrase "healthy ranges" is highly subjective and perhaps even political.

It is also important to note that sustained high values are the initial target, whereas short infrequent spikes are secondary to our quest. That is to say if CPU or Disk utilization spike to 100% for a second and then return to an acceptable target then we won't worry about that just yet. If however Disk spikes to 100% for a second or three every ten minutes, then we should probably find the coinciding process and what it does.

Rules of Thumb:

Is the database server exhibiting performance issues due to volume? Determine if the problem is volume related or database implementation related. In this regard we refer to volume as the number of SQL batches running per second. Depending on the hardware platform, sustained SQL batches per second exceeding 7,000 could be considered high (sometimes not). Sometimes the only way to remedy volume issues is with hardware, but not always. Determine if the volume is necessary or if it is a chatty application that repeats the same query over and over.

Unless your network folks can show you that you are using too much band width, it is likely latency is just a symptom of something else. Sometimes chatty applications appear to be latency bound.

If SQL Errors or deadlocks per second seem high it may be necessary to run a SQL trace.

If SQL Logins per second or user connections seem high, a SQL trace might help you here as well.

CPU:

If CPU sustains unhealthy high values on the database server, another area of concern could be SQL compilations. If excessive SQL compilations are causing CPU problems, consider using more stored procedures. If that is not an option then for SQL Server ensure that "Forced Parameterization" is set on or on Oracle ensure that cursor sharing is set properly. Look at the SQL Statistics for compilations and forced parameterizations.

Also it is important to keep in mind that dynamic SQL within a stored procedure does not take advantage of the procedure cache.

For example the stored procedure:

Create Procedure [dbo].[sp_getMyFoo]
@myWhere varchar(500)
as
Begin
declare @SQL varchar(5000)
set @SQL = 'select * from foo ' + @myWhere
exec (@sql)
End

This stored procedure will not benefit from being precompiled because the actual query is not known until run time. It offers no SQL caching benefit, and if the front end is not coded properly could be subject to SQL injection. However, setting SQL Server forced parameterization = on or Oracle Cursor Sharing = Forced, should help with regard to the compilations.

Disk:

If disk utilization and queue sustains unhealthy high values on the database server, and the database is already spread intelligently among several disks, and the issue is not volume related, then it is possible that certain queries are mal formed, or indexes are incorrect or missing, or your application could be imposing unnecessary locks. Sometimes your application could be perfectly fine as you find something more insidious like one or more disks are completely full. Failure to perform routine database maintenance and backups can lead to problems over time.

Sometimes poorly written applications as mentioned in our previous article, can cause abnormally long locks rendering the SQL transaction log and tempdb, or Oracle Rollback segments to place unnecessary strain on disk. This usually won't manifest until concurrency is exercised.

Information about which queries were run on SQL server can be retrieved via this query:

Shows all queries and statistics since the cache was last cleared.

Select ob.SQL, pl.execution_count, pl.min_elapsed_time, pl.max_elapsed_time, pl.last_elapsed_time, pl.total_elapsed_time, pl.total_worker_time, pl.last_worker_time, pl.min_worker_time, pl.max_worker_time, pl.total_physical_reads, pl.last_physical_reads, pl.min_physical_reads, pl.max_physical_reads, pl.total_logical_writes, pl.last_logical_writes, pl.min_logical_writes, pl.max_logical_writes, pl.total_logical_reads, pl.last_logical_reads, pl.min_logical_reads, pl.max_logical_reads, pl.creation_time, pl.last_execution_time, pl.sql_handle, pl.plan_handle
from sys.dm_exec_query_stats pl
join sys.dm_exec_cached_plans ph on pl.plan_handle = ph.plan_handle
join sys.syscacheobjects ob on ph.bucketid = ob.bucketid
order by pl.max_elapsed_time desc

And similar information from Oracle can be retrieved via one of these queries:

Shows current queries running or just ran

SELECT to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') NOW, to_char(b.LAST_LOAD_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_LOAD, to_char(b.LAST_ACTIVE_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_ACTIVE, a.USERNAME, a.machine, b.sql_text, a.STATUS
FROM V$SESSION a
INNER JOIN V$SQLAREA b ON a.SQL_ID = b.SQL_ID;

Shows Last Five Minutes of SQL

SELECT to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') NOW, to_char(b.LAST_LOAD_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_LOAD, to_char(b.LAST_ACTIVE_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_ACTIVE, b.PROGRAM_ID, b.cpu_time, b.elapsed_Time, b.DISK_READS, b.DIRECT_WRITES, b.PARSING_USER_ID, b.PARSING_SCHEMA_ID, b.ROWS_PROCESSED, b.USER_IO_WAIT_TIME, b.Loads, b.sql_text
From V$SQLAREA b
Where b.LAST_ACTIVE_TIME >= (sysdate - (5/1440))
order by b.LAST_ACTIVE_TIME desc;

Shows Last Five Hours of SQL

SELECT to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') NOW, to_char(b.LAST_LOAD_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_LOAD, to_char(b.LAST_ACTIVE_TIME,'yyyy-mm-dd hh24:mi:ss') LAST_ACTIVE, b.PROGRAM_ID, b.cpu_time, b.elapsed_Time, b.DISK_READS, b.DIRECT_WRITES, b.PARSING_USER_ID, b.PARSING_SCHEMA_ID, b.ROWS_PROCESSED, b.USER_IO_WAIT_TIME, b.Loads, b.sql_text From V$SQLAREA b
Where b.LAST_ACTIVE_TIME >= (sysdate - (5/24))
order by b.LAST_ACTIVE_TIME desc;

Copy the offending queries from the results and run them individually after setting auto trace on for Oracle set autotrace on;

Or on SQL server selecting the "Display actual execution plan" from SQL Management Studio query window options.

The principles that follow are the same for Oracle and SQL server. Target full table or index scans and try to augment your query and or adjust indexing to replace as many scans with seeks. In a future article we'll go into more detail about this.

Start by collecting this data on the database server (assuming SQL Server). PERFMON lists MSSQL for SQL Express or SQLServer for SQL Server.

Alphabetically for ease of listing:

  • Memory
    • Available Mbytes
    • Page Faults/sec
    • Pages/sec
  • Network Interface
    • Bytes Total/sec
  • Paging File
    • % Usage
    • % Usage Peak
  • Physical Disk (select each disk individually)
    • % Disk Time
    • Avg. Disk Queue Length
    • Avg. Disk sec/Read
    • Avg. Disk sec/Write
    • Current Disk Queue Length
  • Processor (select total)
    • % Interrupt Time
    • % Privileged Time
    • % Processor Time
    • % User Time
    • Interrupts/sec
  • SQLServer:Databases (Could have an instance name - select your database or _Total)
    • Active Transactions
    • Log Flush Wait Time
    • Log Flush Waits/sec
    • Log Flushs/sec
    • Transactions/sec
  • SQLServer:General Statistics
    • Active Temp Tables
    • Connection Reset/sec
    • Logical Connections
    • Logins/sec
    • Logouts/sec
    • Temp Tables Creation Rate
    • Temp Tables for Destruction
    • Transactions
    • User Connections
  • SQLServer:Locks
    • Average Wait Time )ms)
    • Lock Requests/sec
    • Lock timeouts (timeout >0)/sec
    • Lock Wait time (ms)
    • Deadlocks/sec
  • SQL Server:Plan Cache
    • Hit Ratio
  • SQLServer:Errors
    • Errors/sec
      • DB Offline Errors
      • Info Errors
      • Kill Connection Errors
      • User Errors
  • SQLServer:Statistics
    • Batch Requests/sec
    • Failed Auto-Params/sec
    • Forced Parameterizations/sec
    • SQL Compilations/sec
    • SQL Re-compilations/sec
  • System
    • Processes
    • Processor Queue Length
    • System Calls/sec
    • Threads


If it is an Oracle server then similar metrics may be available in one of the Oracle enterprise tools.

After you have performed your collection of empirical data and have compared it to data that has already been collected, you will be in a better position to form your own diagnosis or decide to perform more tests.

Empirical data is our friend as it will help us discover the root cause of the problem. Everything up to this is observation which can be deceiving. We wish to discover the root cause of the problem and focus our energy towards its resolution.

One of my favorite anecdotes is the story of the frog and the scientist. The scientist took careful notes as he proceeded to make the frog jump by verbally commanding him to jump. He noted that the frog jumped shorter distances as he removed his legs one at a time. When the frog finally had no more legs, he did not jump at all. The scientist concluded that as the frog lost limbs he gradually went deaf.

As you discover root cause you may sometimes find that management's initial diagnosis resembles the scientist's hypothesis. This is not a criticism of management's ability. Even your family doctor will proclaim the wrong diagnosis if the necessary tests are not completed.

Click here to see other articles in the series.

Corporate Headquarters

Practical Compliance Solutions
One Orient Way - Suite F156 Rutherford, NJ 07070
Phone: 201-728-8809


40.827731 N 74.101029 W
Connect With Us


Share




© 2002 - 2017 - Copyright Practical Compliance Solutions - All rights reserved