Archive for the ‘Certification’ Category

The notes are primarily taken from SQL Server 2008 Internals – Chapter 6 Indexes

Indexes – in general

The leaf level of an index contains something for every row of the table in indexed order.
Each non-leaf level stores something for every page of the level below.
When INCLUDE is used (non-clustered indexes only), leaf level pages can contain wider rows – wider than the 900 byte 16 col key max.
An index with a smaller key size scales faster -> Scale = non-leaf levels. This is because smaller key = less pages. This will also require fewer IOs to traverse.
The goal of an index is not to be extremely narrow. Extremely narrow indexes will be of less use than wider ones. But you need to be aware of the consequences of a wider index. INCLUDE and FILTERS can affect the index in both size and usefulness.
Index pages have almost the same structure as data pages except they store index records instead of data records.
8kb or 8,192 bytes
96 byte header
Off-set arrays of 2 bytes per row
Data rows have an Ncol field and null bitmap whether or not any columns allow null. Index rows only have a null bitmap and Ncol field if nulls are allowed in any of them columns.
Every index maintains a histogram representing the distribution of values for the leading column of the index. (To see the stats for a table use SP_AutoStats(TableName).
A constraint is a logical construct an index is a physical one.

Clustered Indexes

When a clustered index is created, the data becomes the leaf level of the index.
If there is no clustered index (a heap) then rows are identified by their physical row identifier (RID). This process of looking up corresponding data rows in the base table is known as a Bookmark Lookup.
The primary reason why a clustered index must be unique is so that non-clustered index entries can point to exactly one specified row.
If you build a clustered index without specifying the UNIQUE keyword, SQL Server guarantees uniqueness by adding a hidden uniquifier column to the rows.
The uniquifier is a 4-byte integer. Once added it becomes a part of the clustering key: meaning that it is duplicated in every non-clustered index. No duplicates on a non-unique index = no uniquifier. It is only added when necessary.
If a non-unique clustered index is created and duplicates are found the duplicate values will have the uniquifier value added to them:
Smith, Smith1, Smith2, Jones, Jones1, Jones2 – where 1 and 2 are the uniquifier.
There is no difference between the structure of a heap and the row structure of a clustered index unless the clustered index is not unique in which case a uniquifier is added to duplicate rows.
Be sure to choose a clustering key which is not volatile.
A table’s clustering key should also be chosen to minimise fragmentation for inserts.

Non-Clustered Indexes

For a non-clustered index the leaf level is a separate and additional structure which contains a copy of some of the table data.
The leaf level of a non-clustered index consists of the index key, any included columns, and either the clustered index or the row’s physical RID.
A non-clustered index is there to either help point to the data or to answer a query directly (covering index).
The bookmark lookup of a row occurs when a non-clustered index does not have all the data required by the query but the query is driven by a predicate that the index can help find.
The base table’s structure – either a heap or with a clustered index – affects the structure of your non-clustered index.
A non-clustered index can have non-key columns included in the leaf level using INCLUDE.
Create a filtered index by using INCLUDE in the create command.
IGNORE_DUP_KEY doesn’t allow the uniqueness of the index to be violated; instead, it makes a violation in a multi-row modification non-fatal to the non-violating rows. The violating rows are discarded.
If a non-clustered index is not-unique a copy of the clustered index or the RID will appear on the non-leaf levels as well as the leaf level of the index.

Filtered Indexes

Filtered indexes can be used to enforce uniqueness – ie you can have any number of Ancillary Add On Codes but only one where Active = True. Therefore you can create a unique index on Ancillary Add On Codes if you use the filter WHERE Active = True. You may also have a column which accepts nulls or unique values – if not null it must be unique (a unique index will only allow for one null entry).
Unique if not Null is useful for SPARSE columns.
Useful for busy queries over a narrow range.

Duplicate Indexes

SQL Server will allow you to create multiple identical indexes – which will need to be maintained separately and consume excess space on your disks.
To spot a duplicate index you need to consider these points:
Include columns only appear at the leaf level of the non-clustered index.
The clustered index key or the RID (for a heap) will appear at the leaf level of a non-clustered index.
A non-unique non-clustered index will also have the clustered key or RID at the non-leaf levels.
Columns must appear in the same order to be duplicated.
Table A
Col A – Clustered key
Col B
Col C
Col D
Col E
Index A – Non Unique – on ColB
Index B – Non Unique – on ColB, ColA
Index C – unique – on ColB, ColA
Index D – unique on ColA, ColB
Index E – Non Unique – on ColB – Include ColA
Index F – Non Unique – on ColB
Index A != Index B because A will have ColA on the non-leaf levels
Index C != Index D because the column order is different
Index E = Index F = Index C all have the same non-leaf columns (ColB and ColA) and they have the same leaf level – ColB and ColA.
You don’t need to add or Include a clustered index column in a non-clustered index as it is there by default.
Index Index columns Non-leaf level Leaf Level
A ColB Non Unique ColB, ColA ColB, ColA
B ColB, ColA Non Unique ColB, ColA ColB, ColA
C ColB, ColA unique ColB, ColA ColB, ColA
D ColA, ColB unique ColA, ColB ColA, ColB
E ColB ColB, ColA ColB, ColA
F ColB ColB, ColA ColB, ColA

A good blog post on duplicate indexes
Kimberly Tripp – Duplicate Indexes
SQL University: Index Usage (The Scary DBA)

MCM Training videos on indexing
Indexing Stragergies
Index Internals
The Clustered Index Debate

And that book again – I can’t over state how much I like this book…
SQL Server 2008 Internals – Chapter 6 Indexes


Posted: August 2, 2010 in 40-441, 70-442, Certification

I passed the 70-442 exam: PRO: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 today, with 910/1000. This now makes me a MCITP: Database Developer – so I must update my CV 🙂

I liked this exam although I felt there was a little too much XML and cursors and not enough TSQL. The 70-441 exam felt a bit false and strained – I am going to word this question so that you have to pick one of the new technologies rather than solving a problem.

Well that’s that for now, I’ll do the 2008 upgrade next year. I will probably go for the 2008 DBA upgrade this year. This year I am planning to take the 70-445 exam: TS: Microsoft SQL Server 2005 Business Intelligence – Development and Maintenance. Then I need the two windows admin exams to finally complete the SLQ 2000 DBA exams. I have been putting those off as I do so little with Windows outside of SQL server and a bit with Active Directory.

I passed the 70-444 exam today which is the final exam in the MCITP: Database Administrator trilogy.
This exam was 52 multiple choice questions – old style mcp type exam.
The 70-443 was a set of 6 case studies with 12 questions on each.

I find that after revising for a while I begin to get what they are testing and can normally answer the question based on the answer options – although reading the question helps!

For the 70-443 exam I skimmed through the case studies and then went back to look at the specifics as they related to the individual questions.

I have still got to finish off the 2000 MCDBA. I have the 2 SQL exams just need to do the windows admin exams.