SQL Server Clustered Indexes internals with examples

In this article, we will learn the SQL Server clustered index concept and some internal details. Indexes are the
database objects that accelerate the performance of data accessing when are designed properly. A clustered index is
one of the main index types in SQL Server and the working principle is a bit complicated but in the next sections of
this article, we are going to simply learn the clustered index working principle and uncover the secrets.

What is a clustered index?

SQL Server clustered index creates a physical sorted data structure of the table rows according to the defined index
key.

    Secret:

  • The physical word is mostly used in clustered index descriptions, but the clustered index does not guarantee the physical order of the rows

This sorted data structure is called a B-tree (balanced tree). B-tree structure enables us to
find the queried rows faster to using the sorted key value(s). Table data can be sorted physically in only one
direction for this reason we can define only one clustered index per table. The following image illustrates a
logical structure of the clustered index.

Structure of the clustered index

The root and intermediate levels contain the index key values and page pointers. The page pointers point to the
previous and subsequent index pages of their own. These two levels don’t store any row data. At the same time, index
pages hold information about the ahead and behind index page numbers.B-tree structure based-on three different
levels:

  • Root level: The top level of the B-tree is called as root level. The root level is the starting
    point of the data searching
  • Intermediate level: This level provides a connection between root and leaf levels. SQL Server
    does not create an intermediate level when the amount of data rows are too small
  • Leaf Level: This level is the lowest level of the clustered index and all records are stored
    at this level

For example, when we want to query a row that’s record number 10, the storage engine will travel across the
following red dotted path. The searching mechanism begins its travel at the root level and reaches the data row at
the leaf level.

SQL Server clustered index record searching mechanism

As we can see, the searched row was found with the minimum read operation by the storage engine otherwise it has to
read whole table rows.

SQL Server Clustered Index and Singleton Seek

After briefly discussing the clustered index structure, let’s go into the details of how a record in a table is
searched by the clustered index. Firstly, we will create a Cars table and then we will populate
some sample data into it.

We will define a unique clustered index to the Cars table.

SQL Server performs a clustered index seek process when it accesses data to using the B-tree structure. This operation type is represented by the following icon in the execution plans.

Clustered index seek operator icon

When we execute the following query, it will perform a clustered index seek operation. The clustered index seek
operation uses the structure of the B-tree structure very efficiently and easily finds the qualified row(s).

Seek predicates details in an execution plan

In this execution plan, the seek predicates indicate that the storage engine uses the B-tree structure and finds the
leaf level that stores the data rows. For this query, the uniqueness of the clustered index is very important
because this constraint guarantees that only one row will return from the query. This data searching concept is
called singleton seek.

    Secret:

  • Defining a clustered index as unique can gain performance improvements when the indexed column(s) is used after the WHERE clause

SQL Server Clustered Index and Range Scan

When we explain the structure of the clustered index, we mentioned an interconnection between the index pages and
its backward and forward pages. This connection is very useful for queries that have upper or lower boundaries or
have both of them. For example, the following query will perform singleton seek firstly and reaches the leaf level
that contains the data row (Id=12). According to retrieve the index keys, the range scans operation
has been performed either in forward or backward directions.

Range scan and seek predicates

In the execution plan, the ScanDirection attributes show the direction of the range seek process.

Scan direction of a query

The following image illustrates how the range scan process works.

Range scan working principle and SQL Server clustered index

Most of the time, we can hear that the SQL Server clustered index seek operator is super faster than the other data searching operations. However, this myth may not be exactly true for some queries that perform a range scan. For example, we create a table and insert some synthetic data.

Now we execute the following query and examine the execution plan.

It has read 49 pages and has performed a table scan.

Table scan operator and logical read

In this step, we will add a unique non-clustered index and re-execute the previous query.

SQL Server clustered index and performance

For this query, the clustered index seek operator does not change the performance of the query and both of them have
read the same amount of the data. The problem of this query is it reads whole leaf pages of the index so it doesn’t
make any difference in performance between the table scan and index seek search. On the other hand, if we repeat the
same test for the following query, we will not obtain the same result.

Without the clustered index:

Table scan and logical read

With the clustered index:

Clustered index seek and logical read

Besides that, the singleton seeks principle can also work multiple times for some queries as we can see in the below query.

Multiple singleton seeks

The following image illustrates how the multiple singleton seek process works:

Multiple singleton seeks and SQL Server clustered index

The following query is another example of the multiple singleton seeks:

Multiple singleton seeks example

SQL Server Clustered Index and Primary Key

The primary key ensures that the values of a column in the table are unique so that all rows are identified with a
unique key. By default when we create a primary key SQL Server creates a unique clustered index. However, we can
create a primary key without a clustered index because the only mandatory requirement is uniqueness for the primary
key. So the main differences between the primary key and clustered index are:

  • A primary key is a logical structure and it provides the uniqueness for a table
  • A clustered index is a physical structure and it provides the physical order of the records on the storage

Perhaps, this question is the right one to ask:

Why we use the primary key and clustered indexes on the same key column?

SQL Server adds a 4-byte uniquefier value for every duplicate index key on the
non-unique clustered indexes.

At first, we will create a very basic table, it will include only two-column and then we create a non-unique
clustered index for this table.

Now, we will insert a row to this table and querying the size of max and minimum record size of the index pages with
help of the sys.dm_db_index_physical_stats view.

Record sizes of the clustered indexes

We will add a duplicate row and re-examine the max record size column.

dm_db_index_physical_stats DMV usage

Now let’s look at the metadata of the index page. At first, we will determine the page id of the clustered index
page.

Finding the SQL Server clustered index  page number

DBCC PAGE command shows the contents of the data and index page. We will use this command to find out detailed information about the index page.

Metadata of a clustered index

As we can see the first index page uniquifier field value is 0 which means there isn’t a
uniquifier defined for this index page. However, the second index page includes an
uniquifier field value is 1 which means SQL Server adds an additional KeyHashValue.

Conclusion

In this article, we have uncovered some secrets of SQL Server clustered indexes. Actually clustered indexes are
widely used by the SQL developers or DBAs but some details can go unnoticed. Singleton seek and range scan have to
be considered for the performance of the clustered index.

Esat Erkec
Latest posts by Esat Erkec (see all)

Author: admin

1 thought on “SQL Server Clustered Indexes internals with examples

Leave a Reply

Your email address will not be published.