What is Professional Development?

October 13, 2009 · Posted in SQL Server · Comments 

I am writing this post because the SQL PASS Professional Development(PD) Virtual Chapter is going to be launching soon. I wanted to define what this means to me, not PASS, and put out a desperate cry call for help. Arnie Rowland and Mark Caldwell are my partners in crime with supporting roles from Blythe Morrow and Thomas LaRock. The virtual chapter is launching at the Summit. More to come on this. This is what we are in need of.

  • Short article\post authors on Professional Development topics as they relate to the SQL Server Professional
  • SQL Bloggers who would like their existing professional development content syndicated to the PD feed like general topic on SQLServerpedia.
  • Volunteers for monthly live meetings
  • Help us spread the word through your networks and blogs
  • Your ideas on how the Professional Dev PASS virtual chapter can serve the community.

This chapter can only be as good as its members and by helping the chapter grow, you are really helping yourself grow.

So what do I think consists of professional development? Well, you could start of with this tag but that might give you the wrong impression. ;) Let’s see what the Internet says:

From Wikipedia:

Professional development refers to skills and knowledge attained for both personal development and career advancement. Professional development encompasses all types of facilitated learning opportunities, ranging from college degrees to formal coursework, conferences and informal learning opportunities situated in practice. It has been described as intensive and collaborative, ideally incorporating an evaluative stage [1] There are a variety of approaches to professional development, including consultation, coaching, communities of practice, lesson study, mentoring, reflective supervision and technical assistance.[2]

http://en.wikipedia.org/wiki/Professional_development

That is pretty close to what I was thinking especially skills\knowledge obtained for personal development and career advancement. Here are some PD topics IMO:

  • Social networking\Blogging
  • Handing tough situations\office politics\obnoxious coworkers in the work place
  • Negotiating offers and raises
  • Career path decision trees
  • Job seeking skills
  • Presentation delivery techniques
  • Management(Time, People, Project etc.)
  • Development of organizational skills
  • Volunteering in the SQL Server community

And the list could go on for a long time. Hit Arnie, Mark or myself up if this is something you would like to help out with.

What do you think should be covered in a Professional Development virtual chapter?

Filtered Stats to Counter Data Skew Issues

October 13, 2009 · Posted in SQL Server · Comments 

There are a handful of things that can make SQL Server choose a “bad plan”. They usually revolve around variable sized result sets. Some problematic scenarios are parameter sniffing, data skew, local variables, table variables, UDF’s or complex calculations in the where clause. All of these and more are detailed in my favorite and recently updated white paper.

We are going to be talking about data skew in this post. Picture a table where you have 1 billion  rows where IsShipped = 1 but 50k rows where IsShipped=NULL. Now, let’s say you have a query that returns orders that are not shipped(NULL). During compilation, the optimizer asks the stats how many rows are IsShipped=NULL. If the stats sampling touched, few, if any of the pages, where IsShipped = NULL, then SQL Server will guess that only 1 row will be returned. This may surpass the key lookup threshold(Part 1, Part 2) and a scan would not be optimal but better than a key lookup. Get it? We will look at an example soon if not.

In the past, your options have been limited. Update stats with fullscan was usually the best option unless the table was just too large and volatile. If it was, you probably had to use index hints or OPTIMIZE FOR the lesser of the two evils. You have covering indexes and is probably a good solution for the example above unless the result set has to be very wide.

Filtered statistics are a significant improvement and can be used nicely when dealing with this problem. They are smaller so cheaper to maintain. Their scope can be very narrow so they are more accurate.

There are a couple of caveats. I wrote about the need to use full scan on very selective stats here and Kimberly Tripp wrote about the fact that they will not auto update until the entire table crosses the threshold.

Let’s look at an example.

CREATE TABLE filtered_stats
(
c1 INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
c2 datetime DEFAULT GETDATE(),
c3 CHAR(500),
c4 INT
)

--lets add 1 million rows with c4=1
--lets add 1 row with c4 = 2
--Lets add a million more with c4 =3
--Finally, lets add 50 more with c4 =2 so the estimates are skewed.
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 1)
  
SELECT @ctr = @ctr-1
END
GO
INSERT INTO filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 3)
  
SELECT @ctr = @ctr-1
END
GO
DECLARE @ctr INT = 50
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
  
SELECT @ctr = @ctr-1
END

--Let's add a clustered index
CREATE CLUSTERED INDEX ix1 ON filtered_stats(c2)

--This index will be used for the key lookups
CREATE INDEX ix2 ON filtered_stats(c4

--The estimated rows returned are 1 but the actual rows are 51.

--This is a bad estimate due to uneven data distribution.
SELECT c1, c2
FROM filtered_stats
WHERE c4=2

 

underestimate

--Lets create narrow and more accurate filtered stats
--You still have to do full scan but you are touching just a fraction of the data – NOTE: Full scan is needed based on my observations here.
CREATE STATISTICS fstats ON filtered_stats(c4) WHERE c4 = 2 WITH fullscan

--This should estimate 51 no matter how many times you update stats since they are more accurate
SELECT c1, c2
FROM filtered_stats
WHERE c4=2

 

image

This is not bullet proof but I say it is an improvement. The downside include the fact that filtered stats are manually created. Manual stats updates will need to be done with a full scan. You may also say that a covering index(or filtered covering index) would kill this. That is true but there may be an actual need to return all columns of a table.

A quick wrap up: Put it in the tool box. It will be like that $50 wrench that you use once a year but it saves you 4 hours every time you touch it.

Sampling Problem with Filtered Statistics

October 12, 2009 · Posted in SQL Server · Comments 

I was writing a blog post to show how to solve the problem of bad estimations due to uneven data distribution with filtered statistics. After an hour or so of banging my head against the wall, I discovered that I ran into a bug feature. It appears that if the predicate on the filtered stats is very selective then a non-null histogram will not be generated unless they are created\updated with full scan. This is true for automatically generated stats for filtered indexes as well but they are created with full scan when the index is created.

Let’s look at demo data created with this code:


CREATE TABLE filtered_stats
(
c1 INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
c2 DATETIME DEFAULT GETDATE(),
c3 CHAR(500),
c4 INT
)

--lets at 1 million rows with c4=1
--lets add 1 row with c4 = 2
--Lets add a million more with c4 =3
--Finally, lets add 50 more with c4 =3 so the estimates are skewed.
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 1)
  
SELECT @ctr = @ctr-1
END
GO
INSERT INTO filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
DECLARE @ctr INT = 1000000
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 3)
  
SELECT @ctr = @ctr-1
END
GO
DECLARE @ctr INT = 50
WHILE @ctr > 0
BEGIN
   INSERT INTO
filtered_stats(c3, c4)
  
VALUES
  
('blah', 2)
  
SELECT @ctr = @ctr-1
END

So now we have 1 million rows with c4 = 1 and 3. We have 51 rows with c4=2.

If we create this filtered stats, we have a null histogram: CREATE statistics fstats2 on filtered_stats(c4) where c4 =2

image

Updating or creating with fullscan give use the following data from dbcc show_statistics.

image

Bug or Feature? This is what BOL says:

“When only a few values are returned, however, sampling might not provide accurate statistics. For highly selective predicates, which return relatively few values, using the WITH FULLSCAN option will improve accuracy because it will scan all of the values instead of sampling.”

So they “recommend” that you use full scan on highly selective stats. However, I think that is a bug since we do not even get unfiltered rows or rows sampled back. Not to mention that inaccurate stats means to me that numbers are returned but they are most likely less than what is actually there. I will submit it as a bug unless one of you have thoughts otherwise.

So in a nut shell, ALWAYS update filtered stats with full scan or they may not actually be doing you any good. There are also some other considerations in regards to filtered stats that Kimberly Tripp has recently written about. There is some good conversion about it as well. Conor also posted about it here.

I will have more on filtered stats coming up in the next post.

Life

October 6, 2009 · Posted in Other · Comments 

Life happens and the next thing you know a month is gone. I know you have been wondering where I have been. Srsly, my ego allows me to think that the Internet takes notice when I am not in the usual hangouts. Anyway since I am that cool(in my head), I figure I would share the mundane details of my life. Besides, I figure if I leave anything to imagination I am bound to attract a couple dozen stalkers. I am pretty sure I have some web stalkers now. They are just very discreet.

Well to start things off, my kids spilled soymilk on my laptop. I have accidental damage coverage through best buy so I have a get out of jail free card but I am going to wait on the public Win7 launch for the new models. I have been eyeing the quad core i7 Dells.

The other time leech has been Halo OSDT. It was a great game. I spent most of my time working my way through the solo campaign. The story is pretty good but the ending had me saying that is it? Fire fight is fun. Multiplayer is ok except people cheat I suck at it.

I had a major project I accidentally fell into. I took a customer to Geek Mecca when he was in town from Denmark. I decided that I was going to come back and get a home theater with blu-ray. I did and the wife made me take it back after a day or two. It was too late at that point. The electronaholic was loose and prowling electronics stores at lunch. So I researched and hunted for a deal and got an Onkyo HT-S6200. I spent a week wiring, rewiring, reading manuals, and tweaking settings in a hedonistic geekfest of primal proportions. I think I excreted nerd musk at one point.

There was just one problem. I HAD TO HAVE Dolby tru-hd or DTS-HD master audio to get the true experience out of a 7.1 sound system. Another trip to Frys and the problem was solved with a blu-ray player.

At least, I thought the problem was solved. It turns out that the blu-ray could not go through the Onkyo to my TV because it was connected HMDI to DVI and it was not HDCP compliant. About this time, I also discovered that my HD cable channels were not working either. You know where this is going…

I was planning on a new TV for the family for Christmas anyway. What was I supposed to do? Go to component and optical cables which would quadruple the amount of cables behind my TV? I had no choice. Right?

Here is the setup and I really did shop around. I got some good deals. I even checked Craig’s list for the blu-ray player and the same model was going for more used than a new one because the seller bought it 6-9 months ago when it was twice as much.

Here is the setup:

It is not video\audiophile worthy but it is an upgrade for me. Not to mention that with a wife and three kids, I can’t have an orgy of one in a pile of new electronics and bubble wrap very often.

The only thing that I am worried about is the monkey on my back whispering about the 73” of goodness in the WD73737. It is just a whisper but it is getting louder. Must. Resist. Impure. Urges.

Here are before and after pics.

Before:

IMG00081-20091004-1209

After:

IMG00090-20091006-2037

IMG00091-20091006-2038