The title for this post might sound ironical since rebuild of the index is used to remove the fragmentation of the index so how does rebuild of index affect the fragmentation of index?  Well, this is what I would like to explain in this blog post.

Generally, for rebuilding of indexes (alter index … rebuild) optimizer tries to generate a serial plan with no parallelism, however for large tables, the index rebuild compels the optimizer to generate parallel plan to distribute the keys across multiple threads to perform parallel sorts so that indexes rebuild can be performed faster at the expense of higher CPU.

The Parallel Index rebuild Query Plan uses range partitioning where in each of the thread is given a range of keys which it sorts and build a smaller index subunits.Finally there is Gather stream operator which links all the index subunits together (logically) to form a logically contiguous index page links but physically non-contiguous.

Thus when the optimizer decides to go for parallel plan to rebuild the index, the parallel threads does extent allocation in noncontiguous allocation and thereby doesn’t eliminate the fragmentation completely and still leads to some fragmentation due to extent allocation in non-contiguous location.

Let me illustrate this with an example

Performing large number of random inserts in the table which will cause fragmentation in the table

Let us know check the fragmentation in the clustered index of the table caused due to random inserts

database_id object_id index_id  index_level    avg_fragmentation_in_percent    fragment_count        avg_fragment_size_in_pages      page_count
12     197575742         1              0               66.7894587701899                     2358                        1.49660729431722                    3529
12     197575742         1              1               90.9090909090909                      11                            1                                              11
12     197575742         1              2               0                                                 1                             1                                               1

As we see in the above output, the leaf level of the clustered index (level 0) is around 67% fragmentated with 2358 fragments with around 1.5 pages per fragments. Hence the total clustered index takes around 3541 pages (3529+11+1).

Now let me rebuild the index by using the following command

Even if I do not specific MAXDOP=2 , SQL instance on my laptop  will use parallelism of 2 by default since it is a dual-core machine.

After rebuild the index, let us check the fragmentation again.

database_id object_id      index_id  index_level avg_fragmentation_in_percent   fragment_count   avg_fragment_size_in_pages  page_count
12               197575742   1             0              0.0868809730668983                  8                      143.875                               1151
12               197575742   1             1              0                                                2                      1                                          2
12               197575742   1             2              0                                                1                      1                                          1

As expected we see in the above output, the fragmentation in the leaf level of the index is reduced to 0.09% with 8 fragments at leaf level and 2 fragments at the intermediate level. The leaf level has on an average 144 pages per fragments which intermediatel level has 1 page per fragments

Now let us truncate the table,populate the table again with the same script above and rebuild the index but this time WITH MAXDOP=1 setting and check the fragmentation of the index

database_id object_id             index_id   index_level   avg_fragmentation_in_percent   fragment_count   avg_fragment_size_in_pages   page_count
12              197575742             1            0                 0                                             3                        383.333333333333                1150
12              197575742             1            1                 0                                             1                        2                                            2
12              197575742             1            2                 0                                             1                        1                                            1

As we see in the output now, the leaf level of the clustered index has now only 3 fragments with each fragments having around 384 pages on an average per fragment. Further the entire leaf level now requires only 1150 pages as compared 1151 pages when the index is rebuild WITH MAXDOP=2. if we compare the intermediate level we now have only 1 fragments which consists of 2 pages as compared to 2 fragments observed previously
A fragment is a chunk or a segment in the datafile with continuous allocation of pages. So more number of fragments is an indication of more number of dis-continuous allocation of segments hence more fragmentation.

My laptop didn’t had enough resources nor did I have patience for data load, but if you want to see difference more prominent, you can load more rows in the above script and if you have more cpus and hence more parallel threads used for rebuild, the amount of fragments will be huge

Using the above learning we conclude the following

Conclusion

For large databases/datawarehouse generally of the order of TBs which are hosted on the  servers with large number of CPUs i.e SMP architecture, while rebuilding indexes it is recommended to rebuild with (MAXDOP=1). This can lead to index rebuild being slower but at the same time we ensure the indexes are completely defragmentated and further serial plan uses less resources (CPU and memory) as compared to parallel plan which causes high utilization of CPU and Memory.

Further, due to same reason in SQL 2008 R2 Fastrack Datawarehouse, Microsoft recommends to rebuild the index WITH (MAXDOP=1).

So, I would not recommend to use serial plan for rebuild indexes always. The answer is our favourite “Its depends”, if the parallel index rebuild doesn’t use much resources and reduces the fragmentation to a tolerable limit we can go for the parallel plan which is default. However if either the resource utilization caused by the index rebuild is high or if the fragmentation of the index is not reduced within the desired limit, we should go for MAXDOP=1

Further Reading
——————

http://msdn.microsoft.com/en-us/library/ms191292(v=sql.90).aspx

http://blogs.msdn.com/b/sqlqueryprocessing/archive/2006/12/13/index-build-strategy-in-sql-server-part-2-offline-parallel-no-partitioning-non-stats-plan-no-histogram.aspx

http://blogs.msdn.com/b/sqlqueryprocessing/archive/2006/12/11/index-build-strategy-in-sql-server-part-2-offline-parallel-no-partitioning.aspx

Hope this helps!!!

Regards

Parikshit Savjani

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on Twitter | Follow me on FaceBook

SQL Server What is a Sparse Column?
SQL Server Failover Cluster - sysprep your cloned images