Filtered Stats to Counter Data Skew Issues
There are a handful of things that can make SQL Server choose a “bad plan”. They usually revolve around variable sized result sets. Some problematic scenarios are parameter sniffing, data skew, local variables, table variables, UDF’s or complex calculations in the where clause. All of these and more are detailed in my favorite and recently updated white paper.
We are going to be talking about data skew in this post. Picture a table where you have 1 billion rows where IsShipped = 1 but 50k rows where IsShipped=NULL. Now, let’s say you have a query that returns orders that are not shipped(NULL). During compilation, the optimizer asks the stats how many rows are IsShipped=NULL. If the stats sampling touched, few, if any of the pages, where IsShipped = NULL, then SQL Server will guess that only 1 row will be returned. This may surpass the key lookup threshold(Part 1, Part 2) and a scan would not be optimal but better than a key lookup. Get it? We will look at an example soon if not.
In the past, your options have been limited. Update stats with fullscan was usually the best option unless the table was just too large and volatile. If it was, you probably had to use index hints or OPTIMIZE FOR the lesser of the two evils. You have covering indexes and is probably a good solution for the example above unless the result set has to be very wide.
Filtered statistics are a significant improvement and can be used nicely when dealing with this problem. They are smaller so cheaper to maintain. Their scope can be very narrow so they are more accurate.
There are a couple of caveats. I wrote about the need to use full scan on very selective stats here and Kimberly Tripp wrote about the fact that they will not auto update until the entire table crosses the threshold.
Let’s look at an example.
CREATE TABLE filtered_stats
(
c1 INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
c2 datetime DEFAULT GETDATE(),
c3 CHAR(500),
c4 INT
)
--lets add 1 million rows with c4=1
--lets add 1 row with c4 = 2
--Lets add a million more with c4 =3
--Finally, lets add 50 more with c4 =2 so the estimates are skewed.
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
INSERT INTO filtered_stats(c3, c4)
VALUES
('blah', 1)
SELECT @ctr = @ctr-1
END
GO
INSERT INTO filtered_stats(c3, c4)
VALUES
('blah', 2)
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
INSERT INTO filtered_stats(c3, c4)
VALUES
('blah', 3)
SELECT @ctr = @ctr-1
END
GO
DECLARE @ctr INT = 50
WHILE @ctr > 0
BEGIN
INSERT INTO filtered_stats(c3, c4)
VALUES
('blah', 2)
SELECT @ctr = @ctr-1
END
--Let's add a clustered index
CREATE CLUSTERED INDEX ix1 ON filtered_stats(c2)
--This index will be used for the key lookups
CREATE INDEX ix2 ON filtered_stats(c4)
--The estimated rows returned are 1 but the actual rows are 51.
--This is a bad estimate due to uneven data distribution.
SELECT c1, c2
FROM filtered_stats
WHERE c4=2
--Lets create narrow and more accurate filtered stats
--You still have to do full scan but you are touching just a fraction of the data – NOTE: Full scan is needed based on my observations here.
CREATE STATISTICS fstats ON filtered_stats(c4) WHERE c4 = 2 WITH fullscan
--This should estimate 51 no matter how many times you update stats since they are more accurate
SELECT c1, c2
FROM filtered_stats
WHERE c4=2
This is not bullet proof but I say it is an improvement. The downside include the fact that filtered stats are manually created. Manual stats updates will need to be done with a full scan. You may also say that a covering index(or filtered covering index) would kill this. That is true but there may be an actual need to return all columns of a table.
A quick wrap up: Put it in the tool box. It will be like that $50 wrench that you use once a year but it saves you 4 hours every time you touch it.
Key Lookup Threshold Part 2
Grant Fritchey(@gfritchey) posed this question to me on twitter after my posted on the key lookup threshold post.
@statisticsio Interesting post. Do you think that threshold is dependent on the data involved? Or maybe on the size of the key?8:28 AM Feb 5th from TwitterGadget in reply to statisticsio
I kind of assumed so which is why I did char(1000). However, I have just tested with char(1). The numbers are interesting. Please refer to the original post for repro code.
| IO | IO | |
| Rows | CHAR(1000) | CHAR(1) |
| Scan | 422 | 9 |
| 1 row | 4 | 4 |
| 5 rows | 12 | 12 |
| 10 rows | 22 | 22 |
| 25 rows | 52 | 52 |
| 50 rows | 114 | 102 |
| 100 rows | 217 | 202 |
| 250 Rows | 526 | 502 |
| 500 Rows | 1043 | 1003 |
| 1000 Rows | 4141 | 4007 |
The threshold is crossed much earlier because the scan is smaller. This is a small table especially with CHAR(1) so take this test with a grain of salt. Run your own tests when you are working with large production data.
I would also venture to guess that as the complexity of the query increases the threshold gets lower especially on more complex queries like when grouping especially when grouping and outer joining.
A covering index is probably the best solution in most cases unless you need to touch all rows even then, it might be better,.
Key Lookup Threshold
Gail talks about bookmark lookups…. err.. key lookups in this post. So are they good or bad? Well, like many things in SQL, it depends. The main factor is the number of rows returned. A few rows are fine but the cost rises sharply with larger result sets. There comes a point where the threshold is crossed and a scan is more efficient. This is because a scan leverages sequential IO while a lookup does random IO.
Here are the results of the code at the end of the post.
| Rows | IO | |
| Scan | 422 | |
| 1 row | 4 | |
| 5 rows | 12 | |
| 10 rows | 22 | |
| 25 rows | 52 | |
| 50 rows | 114 | |
| 100 rows | 217 | |
| 250 Rows | 526 | <–Threshold |
| 500 Rows | 1043 | |
| 1000 Rows | 4141 |
As you can see, at 250 rows, we have crossed the threshold and it is cheaper do a scan. If you are passing in a literal, the optimizer can detect this and switch to a scan. If it is a stored proc or parameterized SQL, a plan is cached the first time it is run. Problems happen when the result size greatly varies depending on the parameter. There are ways around this all with their pro’s and con’s. Here are some:
- A covering index.
- A statement level recompile hint
- Plan guides and hints
Here is the sample code that can repro these numbers on SQL Server 2008.
CREATE TABLE #temp
(
id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
c1 CHAR(1000) DEFAULT( 'blah'),
c2 INT
)
INSERT INTO #temp(c2)
VALUES
(1)
GO
INSERT INTO #temp(c2)
VALUES
(5)
GO 5
INSERT INTO #temp(c2)
VALUES
(10)
GO 10
INSERT INTO #temp(c2)
VALUES
(25)
GO 25
INSERT INTO #temp(c2)
VALUES
(50)
GO 50
INSERT INTO #temp(c2)
VALUES
(100)
GO 100
INSERT INTO #temp(c2)
VALUES
(250)
GO 250
INSERT INTO #temp(c2)
VALUES
(500)
GO 500
INSERT INTO #temp(c2)
VALUES
(1000)
GO 1000
INSERT INTO #temp(c2)
VALUES
(1000)
GO 1000
CREATE INDEX ix ON #temp(c2
)
--The baseline
SET STATISTICS io ON
SELECT *
FROM #temp WITH (INDEX=1)
WHERE c2 = 1
--1 row returned
SELECT *
FROM #temp
WHERE c2 = 1
--5 rows returned
SELECT *
FROM #temp
WHERE c2 = 5
--10 rows returned
SELECT *
FROM #temp
WHERE c2 = 10
--25 rows returned
SELECT *
FROM #temp
WHERE c2 = 25
--50 rows returned
SELECT *
FROM #temp
WHERE c2 = 50
--100 rows returned
SELECT *
FROM #temp
WHERE c2 = 100
--250 rows returned
--Must begin using hints because the optimizer can tell that a scan is better
SELECT *
FROM #temp WITH (INDEX=ix, forceseek)
WHERE c2 = 250
--1000 rows returned
--Must begin using hints because the optimizer can tell that a scan is better
SELECT *
FROM #temp WITH (INDEX=ix, forceseek)
WHERE c2 = 1000
Updated Unused Index Query
I have cleaned up and updated my unused index query. It returns a list of indexes ordered by ascending reads. If the instance has been up for a long time, it is probably safe to drop indexes with zero or close to zero reads. If the reads are low and the writes are high, this may help improve your transactions per second count. Otherwise, you are just cleaning up unused space. Use caution though. A missing index is worse than an unused index.
SELECT objectname=OBJECT_NAME(s.OBJECT_ID)
, indexname=i.name
, i.index_id
, reads=user_seeks + user_scans + user_lookups
, writes = user_updates
, p.rows
FROM sys.dm_db_index_usage_stats s JOIN sys.indexes i
ON i.index_id = s.index_id AND s.OBJECT_ID = i.OBJECT_ID
JOIN sys.partitions p ON p.index_id = s.index_id AND s.OBJECT_ID = p.OBJECT_ID
WHERE OBJECTPROPERTY(s.OBJECT_ID,'IsUserTable') = 1
AND s.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
AND p.rows > 10000
ORDER BY reads, rows DESC
I will also be putting this query up at the SSP WIKI.
CXPACKET, MAXDOP and your OLTP system
So you are experiencing CXPACKET wait types? If you run a google search, you will quickly find out you are experiencing the dreaded SQL Server parallelism problem and you must reduce MAXDOP to 1. While that is possible, I say most likely not. I say that, most of the time, SQL is doing the best it can with what it has to work with. That is the query, the data and the schema.
Microsoft has come a long way since version <= 7.0 when it comes to parallelism. If your OLTP queries follow best practices and are well indexed, they probably will never generate a parallel plan. This is because they are fast and access a small amount of rows. If they are missing indexes or SQL overestimates cardinality, SQL might decide to do scans, sorts, hashes, spools etc. These iterators, among others, can go parallel to reduce execution time at the cost of system resources. These iterators are not bad and they do have their place. It just isn’t on OLTP type of queries most of the time.
So dropping the MAXDOP on an OLTP system to 1 probably won’t hurt much because most of the time there is an IO bottleneck. If the CXPACKET wait types are a symptoms of poor indexing and row count estimation, it won’t help either. Whole books have been written on indexing, query tuning and there is a nice whitepaper on stats best practices to avoid estimation problems so I am not going to go into that. I will give you some pointers on quickly identifying whether the CXPACKET is a symptom or the cause.
So your boss and boss’s boss run over to your desk yelling about the customer complaints of slowness with SQL. You run this query or something similar:
select r.cpu_time
, r.logical_reads
, r.session_id
into #temp
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id --and s.last_request_start_time=r.start_time
where is_user_process = 1
waitfor delay '00:00:01'
select substring(h.text, (r.statement_start_offset/2)+1 , ((case r.statement_end_offset when -1
then datalength(h.text) else r.statement_end_offset end - r.statement_start_offset)/2) + 1) as text
, r.cpu_time-t.cpu_time as CPUDiff
, r.logical_reads-t.logical_reads as ReadDiff
, p.query_plan
, r.wait_type
, r.wait_time
, r.last_wait_type
, r.wait_resource
, r.command
, r.database_id
, r.blocking_session_id
, r.granted_query_memory
, r.session_id
, r.reads
, r.writes
, r.row_count
, s.[host_name]
, s.program_name
, s.login_name
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id and s.last_request_start_time=r.start_time
full outer join #temp as t on t.session_id=s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) h
cross apply sys.dm_exec_query_plan(r.plan_handle) p
order by 3 desc
drop table #temp
You notice that you have several rows with CXPACKET wait types. With this query, you click the xml show plan link and:
1. search the xml for missing indexes.
2. Save as a .sqlplan and reopen in SSMS
3. Compare estimated and actual rows in the iterators on the right side of the plan. Poor estimates may bubble to the left as well.
4. If there are no missing indexes, estimates and actual are fairly close, reducing MAXDOP may help if it is not a huge report or query.
5. If there are missing index or bad estimations, fix it!
Of course, these are not rules set in stone. Just a style in the art of database administration. Just don’t blame it on a “parallelism bug” because it is a poor musician that blames his instrument.
One thing to note, if you do turn down MAXDOP server-wide, turn in on at the query level(enterprise edition) on your index operation because they are optimized for it.
Alter index all on tblBlah rebuild with (maxdop=32);
Query Hashes in SQL 2008 RC0
I believe this is a new feature RC0. If I missed it before, shame on me because this is awesome. My first 2 blog posts(Part 1 and Part 2) were on doing a similar things with query patterns. It is basically an additional column on sys.dm_exec_requests and sys.dm_exec_query_stats that hold a binary hash. The hash is same for queries that are the same or similar. For example, these queries should all have the same hash(disregard parameterization and trivial plans for simplicity):
select c1 from t1 where c2 = 1
select c1 from t1 where c2 = 1
select c1 from t1 where c2 = 2
But this query will have a different hash:
select c1 from t1 where c2 > 1
This allows you to group and aggregate similar queries by the hash. It is really useful in OLTP environments where the same query might only take milliseconds but execute 1000’s of times a second. If you don’t aggregate, those those types of queries may not show up on the radar but might be responsible 90% of the CPU usage. Shaving a couple of cycles of CPU off of one of those and server-wide utilization drops.
Here is a nice sample from BOL.
1: SELECT TOP 5 query_stats.query_hash AS "Query Hash",
2: SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
3: MIN(query_stats.statement_text) AS "Statement Text"
4: FROM
5: (SELECT QS.*,
6: SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
7: ((CASE statement_end_offset
8: WHEN -1 THEN DATALENGTH(st.text)
9: ELSE QS.statement_end_offset END
10: - QS.statement_start_offset)/2) + 1) AS statement_text
11: FROM sys.dm_exec_query_stats AS QS
12: CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST) as query_stats
13: GROUP BY query_stats.query_hash
14: ORDER BY 2 DESC;
15: GO
RE: Filtered indexes not compatible plan reuse
Before the original post, I had submitted a connect item for something similar.
It turns out that filtered indexes may not be used when auto-parameterization(or for stored proc) occurs without a recompile hint. This is sort of understandable but I can think of workarounds and I suspect additional logic will be added in the future as mentioned below. Here is the feedback from a connect item I submitted.
“
Thanks for your feedback. The query in question, select lastname from Person.Contact where ContactID <=(100-20) is auto-parameterized by SQL Server into the following form, using the standard auto-parameterization rules: select lastname from Person.Contact where ContactID <=(@1-@2) These rules are designed to minimize compile time cost for simple queries like this, possibly at the expense of additional optimizations such as using a filtered index. Admittedly, it is a difficult tradeoff. In a future release, we’ll consider extending the design to make a better decision in a cost-based way. For now, this behavior is by design.
“
alt head: You can’t have your cake and eat it too.
Filtered indexes not compatible plan reuse??
So I was messing around with filtered indexes tonight. Yes, beta software is my idea of a wild Saturday night. Hey, I played some Smash Bros first.
I was trying to figure out what happens if a plan is created using a covering but filtered index and try to reuse it with a parameter that is covered vertically but not horizontally. Let’s look at this example:
Find worst performing queries without a trace
Here is a report that I wrote for a customer that requested the TOP X queries on their system daily. It is basically like analyzing a trace without having to run the trace, load it and query it with some caveats. It strips out all of the literals so you can group by query pattern. This allows us to see if a small and fast query running a 100 times a second is causing more cumulative overhead that the long running reports.
The function is CLR using regular expressions from Itzak Ben-Gan. It is in the book, Inside SQL Server 2005: TSQL Querying. It is worth the price of the book just for this function. It is has been my secret weapon. He has the source code for it on his website.
select avg(total_worker_time/execution_count) as AvgCPU
, avg(total_elapsed_time/execution_count) as AvgDuration
, avg((total_logical_reads+total_physical_reads)/execution_count) as AvgReads
, sum(execution_count ) as ExecCount
, min(creation_time) as MinCreationTime
, max(last_execution_time) as MaxLast_Execution_Time
, dbo.fn_SQLSigCLR(left(substring(st.text, (qs.statement_start_offset/2)+1 , ((case qs.statement_end_offset when -1 then datalength(st.text) else qs.statement_end_offset end - qs.statement_start_offset)/2) + 1), 3900),4000) as txt
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
group by dbo.fn_SQLSigCLR(left(substring(st.text, (qs.statement_start_offset/2)+1 , ((case qs.statement_end_offset when -1 then datalength(st.text) else qs.statement_end_offset end - qs.statement_start_offset)/2) + 1), 3900),4000)
order by 3 desc –Order as needed.
The caveats are the fact that we are working off of the procedure cache so entries could be flushed for many reasons like stats updates or memory pressure.
So there are limitations but if you want to quickly see the queries that are causing the statements that are causing the most overhead on your system, this will get you started.
SQL Server 2008 filtered indexes in 5 minutes
My jaw literally dropped when I saw it. If this works as advertised, it has the potential of changing everything. The days of over indexing will be over. Dynamic indexing off of the missing and unused index DMV’s could be possible especially with added support. I also think this will better accomplish what people tried to do with partitioning for performance reasons in SQL Server 2005.
It will be really interesting to see how it gets applied in production and where this leads in the next versions. Maybe an autoindex checkbox will replace the DBA
This is a small and quick example. We will have to see how it scales to larger environments.
use adventureworks
update Production.WorkOrder
set EndDate = null
where WorkOrderID in (select top 400 WorkOrderID from Production.WorkOrder where enddate is not null)
–Let’s give a random 200 a recent date to mimic prod data
update top (200) Production.WorkOrder
set duedate = getdate()-10
where enddate is null
–This is query type that should be simple yet common
–Let’s get open WO’s that will be due soon
set statistics io on
select p.Name, wo.OrderQty, wo.DueDate
from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where wo.EndDate is null and wo.DueDate >= ‘2007-11-24′
–Table ‘WorkOrder’. Scan count 1, logical reads 530, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
set statistics io on
select p.Name, wo.OrderQty, wo.DueDate
from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where wo.EndDate is null and wo.DueDate >= ‘2007-11-24′
–Does NCI seek
–Table ‘WorkOrder’. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
–Clean up
drop index [Production].[WorkOrder].[ix01]
–Create Filter index
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty) where EndDate is null and DueDate >= ‘2007-11-24′
set statistics io on
select p.Name, wo.OrderQty, wo.DueDate
from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where wo.EndDate is null and wo.DueDate >= ‘2007-11-24′
–It uses it. We have a real small sample but it performs better
–Table ‘WorkOrder’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
–Now let’s look at size
–Create unfiltered index for comparison
create index ix02 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
–note the new syntax
declare @dbid int = db_id()
declare @objid int = object_id(‘[Production].[WorkOrder]‘)
select * from sys.dm_db_index_physical_stats(@dbid, @objid, null, null, ‘detailed’) ps join sys.indexes i
on ps.object_id=i.object_id and ps.index_id=i.index_id
and i.name in (‘ix01′, ‘ix02′) and i.type_desc=‘NONCLUSTERED’
That is a huge difference in size. The major point is not the fact that we retrieve the same while taking up so much less space on disk but so much less space in memory as well. We are going to get into this much more!
Technorati Tags: SQL Server 2008,Filtered indexes,CTP6
