Wednesday, August 5, 2015

SQL Error Log Shows Random User and SPID Stopping SQL Trace!

Testing Environment: SQL 2014

Recently, my colleague noticed several messages in SQL error log of a SQL 2008 server about non-DBAs stopping SQL traces. The login that started traces was SQL service account, which is also sysadmin on the server. However, random logins were showing up in the messages of trace being stopped. Messages were like:
Trace ID 3 was started by login “ServiceAccount”.

SQL Trace stopped. Trace ID = '3'. Login Name = 'DomainName\LoginA'.
Trace ID 4 was started by login “ServiceAccount”.
SQL Trace stopped. Trace ID = '4'. Login Name = 'DomainName\LoginB'.
Trace ID 3 was started by login “ServiceAccount”.
SQL Trace stopped. Trace ID = '3'. Login Name = 'DomainName\UserA'.
Users did not have any more access than read and write on tables in user databases. They should not have been able to stop SQL trace.
To investigate this issue, I looked at roles and permission on the server to make sure that the logins and the groups they are members of in fact did not have permission to stop trace. The time interval between start and stop of trace was always 5 minutes and that gave me the clue that it was scheduled. I found a job another DBA had created that ran trace every 5 minutes with a stop time of 5 minutes. Session that starts trace just triggers trace after which SQL takes over running of trace. However, when it is time to stop trace, it appears that SQL uses one of the running sessions to stop trace and registers that as the SPID and login that stopped trace.

I did a test with SQL 2014 server. Here is the trace script used in test. It traces SQL batch completed and RPC completed for login testlogin.

-- Create a Queue
declare @rc int
declare @TraceID INT
declare @DateTime datetime
declare @maxfilesize BIGINT
declare @FileName nvarchar(128)

set @maxfilesize = 8

set @DateTime = dateadd(SECOND,5,getdate()) –-5 seconds stop time

set @FileName = N'D:\Trace_' + replace(convert(nvarchar(10),getdate(),120),'-','') + '_' + replace(convert(nvarchar(5),getdate(),8),':','') + '_Lock_Trace'

-- Create trace
exec @rc = sp_trace_create @TraceID output, 2, @FileName , @maxfilesize, @DateTime

if (@rc != 0) goto error

-- Set the events
declare @on bit

set @on = 1

exec sp_trace_setevent @TraceID, 10, 9, @on
exec sp_trace_setevent @TraceID, 10, 2, @on
exec sp_trace_setevent @TraceID, 10, 10, @on
exec sp_trace_setevent @TraceID, 10, 6, @on
exec sp_trace_setevent @TraceID, 10, 11, @on
exec sp_trace_setevent @TraceID, 10, 12, @on
exec sp_trace_setevent @TraceID, 10, 13, @on
exec sp_trace_setevent @TraceID, 10, 14, @on
exec sp_trace_setevent @TraceID, 10, 15, @on
exec sp_trace_setevent @TraceID, 10, 16, @on
exec sp_trace_setevent @TraceID, 10, 17, @on
exec sp_trace_setevent @TraceID, 10, 18, @on
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 9, @on
exec sp_trace_setevent @TraceID, 12, 11, @on
exec sp_trace_setevent @TraceID, 12, 6, @on
exec sp_trace_setevent @TraceID, 12, 10, @on
exec sp_trace_setevent @TraceID, 12, 12, @on
exec sp_trace_setevent @TraceID, 12, 13, @on
exec sp_trace_setevent @TraceID, 12, 14, @on
exec sp_trace_setevent @TraceID, 12, 15, @on
exec sp_trace_setevent @TraceID, 12, 16, @on
exec sp_trace_setevent @TraceID, 12, 17, @on
exec sp_trace_setevent @TraceID, 12, 18, @on

-- Set the Filters
declare @intfilter int
declare @bigintfilter bigint

exec sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler - efee2919-4d2e-468a-846a-b3e2452fd898'
exec sp_trace_setfilter @TraceID, 11, 0, 6, N'%testlogin%'
exec sp_trace_setfilter @TraceID, 11, 0, 1, NULL

-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1

-- display trace id for future references
select TraceID=@TraceID

goto finish
select ErrorCode=@rc

Query 1
Since this trace runs for only 5 seconds, I ran a session by testlogin in a separate windows right after executing the trace. The session by testlogin ran following SQL statements as shown in "Query 2". It runs continuous loop of adding rows into a table and then dropping that table every second. I stopped it after 6-7 seconds enough time for trace to stop.

   SELECT TOP 100 * INTO #names FROM sys.databases WITH(NOLOCK)
   DROP TABLE #names
   WAITFOR DELAY '00:00:01'
Query 2
Refer screen shot in Figure 1. I ran trace of "Query 1" 7 times each with 5 seconds stop time. Each set of start and stop messages have been numbered to make it easier to follow discussion. In 1 and 2 instances of trace, I was running "while loop" query from "Query 2" using testlogin account when trace was running. SQL used session by testlogin to stop trace. I did not run query using testlogin in 3 to 7. In 3, SQL did not find any active session by a user and it picked up a system session (SPID 13) by sa. In 4, 5, 6, and 7, SQL used other active sessions to stop trace. In fact, if there is nothing active, SQL will not stop trace after 5 seconds. In test, I have seen instances of trace that is setup to stop after 5 seconds running for more than 13 seconds before SQL could find an active session. In 3, the trace ran for 7 seconds. in 5, the trace ran for 8 seconds.
Figure 1
If you run another trace to try to trace stopping of the 5-second trace in "Query 1", it will not capture stopping of the trace.

None of the logins and SPIDs specified in “SQL Trace stopped” messages actually ran query to stop trace. So, these messages are misleading and could cause of confusion/misunderstanding between DBAs and users. I am not sure why SQL has to use a user session. I would expect SQL to use a system session to stop trace instead of picking random SPIDs.

Conclusion: SQL error log message about which user and SPID stopped a timed trace is misleading. It appears that SQL uses a random active session to stop timed trace. It may be something that MS SQL team overlooked.