Archive for April, 2012

The notes are primarily taken from SQL Server 2008 Internals – Chapter 6 Indexes

Indexes – in general

The leaf level of an index contains something for every row of the table in indexed order.
Each non-leaf level stores something for every page of the level below.
When INCLUDE is used (non-clustered indexes only), leaf-level pages can contain rows wider than the 900-byte, 16-column key maximum.
An index with a smaller key size scales better. A smaller key means more rows per page, so fewer pages per level and fewer non-leaf levels, which in turn means fewer IOs to traverse the index from root to leaf.
The goal of an index is not to be extremely narrow. Extremely narrow indexes will be of less use than wider ones. But you need to be aware of the consequences of a wider index. INCLUDE and FILTERS can affect the index in both size and usefulness.
Index pages have almost the same structure as data pages except they store index records instead of data records.
8kb or 8,192 bytes
96 byte header
Off-set arrays of 2 bytes per row
Data rows have an Ncol field and null bitmap whether or not any columns allow NULL. Index rows only have the Ncol field and null bitmap if NULLs are allowed in any of the columns.
Every index maintains a histogram representing the distribution of values for the leading column of the index. (To see the stats for a table use sp_autostats 'TableName'.)
A constraint is a logical construct; an index is a physical one.
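As a quick way to see how key width translates into levels, you can query the index depth and page count directly (the table name here is just a placeholder – substitute your own):

```sql
-- index_depth is the number of levels a seek must read;
-- narrower keys generally mean fewer non-leaf levels.
SELECT i.name AS index_name,
       s.index_depth,
       s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED') AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id  = s.index_id
WHERE s.index_level = 0;   -- one row per index (leaf-level stats)
```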

Clustered Indexes

When a clustered index is created, the data becomes the leaf level of the index.
If there is no clustered index (a heap) then rows are identified by their physical row identifier (RID). This process of looking up corresponding data rows in the base table is known as a Bookmark Lookup.
The primary reason why a clustered index must be unique is so that non-clustered index entries can point to exactly one specified row.
If you build a clustered index without specifying the UNIQUE keyword, SQL Server guarantees uniqueness by adding a hidden uniquifier column to the rows.
The uniquifier is a 4-byte integer. Once added it becomes a part of the clustering key: meaning that it is duplicated in every non-clustered index. No duplicates on a non-unique index = no uniquifier. It is only added when necessary.
If a non-unique clustered index is created and duplicates are found the duplicate values will have the uniquifier value added to them:
Smith, Smith1, Smith2, Jones, Jones1, Jones2 – where 1 and 2 are the uniquifier.
There is no difference between the structure of a heap and the row structure of a clustered index unless the clustered index is not unique in which case a uniquifier is added to duplicate rows.
Be sure to choose a clustering key which is not volatile.
A table’s clustering key should also be chosen to minimise fragmentation for inserts.
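A minimal sketch of when the uniquifier comes into play (table and index names are made up):

```sql
-- A non-unique clustered index: SQL Server silently adds a 4-byte
-- uniquifier to any duplicate LastName values so that every
-- non-clustered index entry still points to exactly one row.
CREATE TABLE dbo.Person
(
    PersonID INT IDENTITY NOT NULL,
    LastName VARCHAR(50)  NOT NULL
);

CREATE CLUSTERED INDEX CIX_Person_LastName
    ON dbo.Person (LastName);   -- no UNIQUE keyword

-- Widening the key to make it unique avoids the uniquifier entirely:
-- CREATE UNIQUE CLUSTERED INDEX CIX_Person_LastName_PersonID
--     ON dbo.Person (LastName, PersonID);
```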

Non-Clustered Indexes

For a non-clustered index the leaf level is a separate and additional structure which contains a copy of some of the table data.
The leaf level of a non-clustered index consists of the index key, any included columns, and either the clustered index key or (for a heap) the row’s physical RID.
A non-clustered index is there to either help point to the data or to answer a query directly (covering index).
The bookmark lookup of a row occurs when a non-clustered index does not have all the data required by the query but the query is driven by a predicate that the index can help find.
The base table’s structure – either a heap or with a clustered index – affects the structure of your non-clustered index.
A non-clustered index can have non-key columns included in the leaf level using INCLUDE.
Create a filtered index by adding a WHERE clause to the CREATE INDEX command.
IGNORE_DUP_KEY doesn’t allow the uniqueness of the index to be violated; instead, it makes a violation in a multi-row modification non-fatal to the non-violating rows. The violating rows are discarded.
If a non-clustered index is not unique, a copy of the clustered index key or the RID will appear on the non-leaf levels as well as the leaf level of the index.
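A sketch of IGNORE_DUP_KEY behaviour (table and index names are illustrative):

```sql
CREATE TABLE dbo.Codes (Code INT NOT NULL);

CREATE UNIQUE NONCLUSTERED INDEX UIX_Codes_Code
    ON dbo.Codes (Code)
    WITH (IGNORE_DUP_KEY = ON);

-- A multi-row insert containing a duplicate does not fail: the
-- duplicate row is discarded with a "Duplicate key was ignored."
-- warning, and the non-violating rows are inserted as normal.
INSERT INTO dbo.Codes (Code) VALUES (1), (2), (2), (3);
```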

Filtered Indexes

Filtered indexes can be used to enforce uniqueness – e.g. you can have any number of Ancillary Add On Codes but only one where Active = True, so you can create a unique index on Ancillary Add On Codes with the filter WHERE Active = True. You may also have a column which accepts either NULLs or unique values – if not NULL it must be unique (a plain unique index only allows one NULL entry).
Unique if not Null is useful for SPARSE columns.
Useful for busy queries over a narrow range.
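Both patterns above, sketched out (table and column names are hypothetical):

```sql
-- "Only one active row per code": duplicates are allowed,
-- but at most one of them can be Active.
CREATE UNIQUE NONCLUSTERED INDEX UIX_AddOnCodes_ActiveCode
    ON dbo.AncillaryAddOnCodes (AddOnCode)
    WHERE Active = 1;

-- "Unique if not NULL": a plain unique index permits only one NULL,
-- but a filtered unique index ignores the NULL rows altogether.
CREATE UNIQUE NONCLUSTERED INDEX UIX_Customers_TaxRef
    ON dbo.Customers (TaxReference)
    WHERE TaxReference IS NOT NULL;
```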

Duplicate Indexes

SQL Server will allow you to create multiple identical indexes – which will need to be maintained separately and consume excess space on your disks.
To spot a duplicate index you need to consider these points:
Include columns only appear at the leaf level of the non-clustered index.
The clustered index key or the RID (for a heap) will appear at the leaf level of a non-clustered index.
A non-unique non-clustered index will also have the clustered key or RID at the non-leaf levels.
Columns must appear in the same order to be duplicated.
Table A
Col A – Clustered key
Col B
Col C
Col D
Col E
Index A – Non Unique – on ColB
Index B – Non Unique – on ColB, ColA
Index C – unique – on ColB, ColA
Index D – unique on ColA, ColB
Index E – Non Unique – on ColB – Include ColA
Index F – Non Unique – on ColB
Index A != Index B because A will have ColA on the non-leaf levels
Index C != Index D because the column order is different
Index E = Index F = Index C all have the same non-leaf columns (ColB and ColA) and they have the same leaf level – ColB and ColA.
You don’t need to add or Include a clustered index column in a non-clustered index as it is there by default.
Index  Key columns           Unique?  Non-leaf level  Leaf level
A      ColB                  No       ColB, ColA      ColB, ColA
B      ColB, ColA            No       ColB, ColA      ColB, ColA
C      ColB, ColA            Yes      ColB, ColA      ColB, ColA
D      ColA, ColB            Yes      ColA, ColB      ColA, ColB
E      ColB (INCLUDE ColA)   No       ColB, ColA      ColB, ColA
F      ColB                  No       ColB, ColA      ColB, ColA
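For reference, the six example indexes above would be created like this (dbo.TableA with clustered key ColA is assumed):

```sql
CREATE INDEX IndexA ON dbo.TableA (ColB);
CREATE INDEX IndexB ON dbo.TableA (ColB, ColA);
CREATE UNIQUE INDEX IndexC ON dbo.TableA (ColB, ColA);
CREATE UNIQUE INDEX IndexD ON dbo.TableA (ColA, ColB);
CREATE INDEX IndexE ON dbo.TableA (ColB) INCLUDE (ColA);
CREATE INDEX IndexF ON dbo.TableA (ColB);
-- C, E and F end up physically identical: keep one, drop the rest.
```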

A good blog post on duplicate indexes
Kimberly Tripp – Duplicate Indexes
SQL University: Index Usage (The Scary DBA)

MCM Training videos on indexing
Indexing Strategies
Index Internals
The Clustered Index Debate

And that book again – I can’t overstate how much I like this book…
SQL Server 2008 Internals – Chapter 6 Indexes

When SSMS starts up it copies the files from here:
%ProgramFiles%\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\sqlworkbenchprojectitems\sql
in my case on a 64bit machine
C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\SqlWorkbenchProjectItems\sql

To your user folder (this is an example for me, my login is lynchs)
C:\Users\lynchs\AppData\Roaming\Microsoft\Microsoft SQL Server\100\Tools\Shell\Templates\Sql

The first thing I did was go into %ProgramFiles%\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\sqlworkbenchprojectitems and copy all the templates out somewhere. I’ve never used them but I guess I might.

You should leave the file SQLFile.sql – this is your default New Query file.
I also left Sql.vsdir, although I’m not sure what that one is for.

Then I went into C:\Users\lynchs\AppData\Roaming\Microsoft\Microsoft SQL Server\100\Tools\Shell\Templates\Sql and deleted all of it.

Now I have no templates. I do have my own TSQL Library and so I set up a robocopy job to populate my templates directory from my library. It looks like this:
robocopy "H:\IT\Delivery\DBA\SQLTools\TSQL" "C:\Users\lynchs\AppData\Roaming\Microsoft\Microsoft SQL Server\100\Tools\Shell\Templates\Sql" *.sql /MIR /R:2 /W:2

I chose to do it this way because the templates directory is more volatile. I’ve already mentioned how it tries to copy files from %ProgramFiles%\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\sqlworkbenchprojectitems. I am also more likely to accidentally delete a file from the templates in SSMS than I am from my own library (which could be set to read-only).

Remember, if you want to update or reorganise the templates you do the work in your TSQL Library and robocopy will then modify the templates folder. I have robocopy set to run on logon.

When importing Excel documents in SSIS on a 64-bit machine you may encounter this error:

[SSIS.Pipeline] Error: component “Excel Source” (35) failed validation and returned error code 0xC020801C.

It may also tell you that it failed to acquire a connection.

This is because there is no 64-bit driver for Access or Excel – with csv files you can use the Flat File connector instead.

To carry on working you need to set your project up to run in 32-bit mode. Right-click the project, select Properties, and go to Debugging. Change the value of Run64BitRuntime to False.

64bit properties

I recently added a filtered index to a table without any problems. Later that night jobs began to fail with the error:

Description: Executing the query “NameOfStoredProcedure” failed with the following error: “UNKNOWN TOKEN failed because the following SET options have incorrect settings: ‘QUOTED_IDENTIFIER’. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and…

The problem was with the Quoted Identifier settings for the stored procedure. For the query to work the Quoted Identifier needs to be set to ON. The problem is that each stored procedure inherits the settings used when it was created.

You can check these settings by right-clicking on the stored proc and selecting properties. In the general tab there is a list of options which includes the QUOTED_IDENTIFIER option. These values will be set to either True or False.

The default Value for QUOTED_IDENTIFIER is ON.
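Rather than opening each procedure’s properties dialog, you can list the offenders in one query via the catalog views (a sketch; the ALTER at the end is left as a comment because the procedure body is yours to supply):

```sql
-- Find stored procedures created with QUOTED_IDENTIFIER OFF --
-- these are the ones that will fail against a filtered index.
SELECT o.name,
       m.uses_quoted_identifier,
       m.uses_ansi_nulls
FROM sys.sql_modules AS m
JOIN sys.objects     AS o ON o.object_id = m.object_id
WHERE o.type = 'P'
  AND m.uses_quoted_identifier = 0;

-- The fix is to re-create or alter the procedure with the
-- correct session setting in force, e.g.:
-- SET QUOTED_IDENTIFIER ON;
-- GO
-- ALTER PROCEDURE dbo.NameOfStoredProcedure ...
```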

From BOL


When a stored procedure is created, the SET QUOTED_IDENTIFIER and SET ANSI_NULLS settings are captured and used for subsequent invocations of that stored procedure.

Create Procedure

The Database Engine saves the settings of both SET QUOTED_IDENTIFIER and SET ANSI_NULLS when a Transact-SQL procedure is created or modified. These original settings are used when the procedure is executed. Therefore, any client session settings for SET QUOTED_IDENTIFIER and SET ANSI_NULLS are ignored when the procedure is running.

Other SET options, such as SET ARITHABORT, SET ANSI_WARNINGS, or SET ANSI_PADDING are not saved when a procedure is created or modified. If the logic of the procedure depends on a particular setting, include a SET statement at the start of the procedure to guarantee the appropriate setting. When a SET statement is executed from a procedure, the setting remains in effect only until the procedure has finished running. The setting is then restored to the value it had when the procedure was called. This enables individual clients to set the options they want without affecting the logic of the procedure.
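The scoping rule above, sketched as a (hypothetical) procedure:

```sql
-- A SET issued inside a procedure lasts only until the procedure
-- returns; the caller's setting is then restored automatically.
CREATE PROCEDURE dbo.DemoSetScope
AS
BEGIN
    SET ANSI_WARNINGS OFF;   -- in effect for the rest of this proc only
    -- ... body that relies on ANSI_WARNINGS being OFF ...
END;
GO
-- After EXEC dbo.DemoSetScope, the session's ANSI_WARNINGS setting
-- is back to whatever it was before the call.
```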

Any SET statement can be specified inside a procedure, except SET SHOWPLAN_TEXT and SET SHOWPLAN_ALL. These must be the only statements in the batch. The SET option chosen remains in effect during the execution of the procedure and then reverts to its former setting.