Sampling Problem with Filtered Statistics

October 12, 2009 · Posted in SQL Server · Comments 

I was writing a blog post to show how to solve the problem of bad estimations due to uneven data distribution with filtered statistics. After an hour or so of banging my head against the wall, I discovered that I ran into a bug feature. It appears that if the predicate on the filtered stats is very selective then a non-null histogram will not be generated unless they are created\updated with full scan. This is true for automatically generated stats for filtered indexes as well but they are created with full scan when the index is created.

Let’s look at demo data created with this code:


CREATE TABLE filtered_stats
(
c1 INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
c2 DATETIME DEFAULT GETDATE(),
c3 CHAR(500),
c4 INT
)

--lets at 1 million rows with c4=1
--lets add 1 row with c4 = 2
--Lets add a million more with c4 =3
--Finally, lets add 50 more with c4 =3 so the estimates are skewed.
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 1)
  
SELECT @ctr = @ctr-1
END
GO
INSERT INTO filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 3)
  
SELECT @ctr = @ctr-1
END
GO
DECLARE @ctr INT = 50
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
  
SELECT @ctr = @ctr-1
END

So now we have 1 million rows with c4 = 1 and 3. We have 51 rows with c4=2.

If we create this filtered stats, we have a null histogram: CREATE statistics fstats2 on filtered_stats(c4) where c4 =2

image

Updating or creating with fullscan give use the following data from dbcc show_statistics.

image

Bug or Feature? This is what BOL says:

“When only a few values are returned, however, sampling might not provide accurate statistics. For highly selective predicates, which return relatively few values, using the WITH FULLSCAN option will improve accuracy because it will scan all of the values instead of sampling.”

So they “recommend” that you use full scan on highly selective stats. However, I think that is a bug since we do not even get unfiltered rows or rows sampled back. Not to mention that inaccurate stats means to me that numbers are returned but they are most likely less than what is actually there. I will submit it as a bug unless one of you have thoughts otherwise.

So in a nut shell, ALWAYS update filtered stats with full scan or they may not actually be doing you any good. There are also some other considerations in regards to filtered stats that Kimberly Tripp has recently written about. There is some good conversion about it as well. Conor also posted about it here.

I will have more on filtered stats coming up in the next post.

Upgrade but Avoid v1.0 Features

August 5, 2009 · Posted in SQL Server · Comments 

Holding off on the newest features sounds like a no brainer especially when the next SQL Server release seem a million years away and is BI focused. In the reality, I was partitioning tables the first night we upgraded to SQL Server 2005. I could not replace TEXT with VARCHAR(MAX) fast enough. Some DTS packages got hit with the shamwow and emerged in SSIS. All of these changes broke stuff. None of these were necessary. A straight upgrade of the engine would have been fine and brought plenty of bugs on its own. :)

So, new features have more bugs when we eyeball it based on SP and CU fix lists based on these three examples.

Mirroring

SQL Server 2005 at SP2 – 6 fixed published bugs

SQL Server 2008 at SP1 CU3 – 1 fixed published bug

Partitioning

SQL Server 2005 at SP2 – 5 fixed published bugs

SQL Server 2008 at SP1 CU3 – 1 fixed published bug

SSIS

SQL Server 2005 at SP2 – 10 fixed published bugs

SQL Server 2008 at SP1 CU3 – 3 fixed published bugs

This, of course, it not scientific. Not all bugs are published and not all bugs are fixed. For example, I found a bug that has been present since SQL Server 2005 RTM and they just stated “issue should be addressed in the next major release of SQL Server that comes after 2008 R2.”. That is like 2012 or 2013. Granted, it is not a major problem but still.

To MSFT’s credit, they have revamped the SQL Sever dev process after SQL Server 2005 so hopefully that will result in less bugs but sometimes I get the impression that they could throw more dev cycles at bug fixes.

So, yes, upgrade but hold off on the version 1.0 features unless there is a good business case for using them. How would you like to be one of the handful of customers relying on SQL Server Notification Services?

A Nasty SQL Bug – USERSTORE_TOKENPERM

October 15, 2008 · Posted in SQL Server · Comments 

Executive: Um, the Internet is slow.

DBA: You mean the website?

Executive: I think it is the dataplace.

DBA: Database?

Executive: Yah, I rebooted my modem but it is still slow.

So your manager catches wind and comes running over. You pull open perfmon and CPU, memoryIO and network are idle. Blocking? Nada. You tell your boss that he ought to check with the App and Network guys.

However, it is a database issue in this specific instance.

This is not new news. Why am I blogging about this?

  • It is hard to diagnose.
  • I have run into it several times including today.
  • The users say it is slow but SQL looks happy.
  • This is hard to google unless you already know the problem.

The bug and hotfix is at this KB: http://support.microsoft.com/kb/933564    

PSS expands on it here: http://blogs.msdn.com/psssql/archive/2008/06/16/query-performance-issues-associated-with-a-large-sized-security-cache.aspx