Sep 8, 2009

Improving database performance with Table Partitioning

Table Partitioning is new concept introduced in MS SQL 2005. Partitioning is concept of taking performance benefit from SQL server by splitting the tables in multiple parts, on the basis of values in pre-defined column. However in old days of sql (6.5-2000), good developers knew how to take benefit from sql by splitting tables. They used to create multiple tables instead of one table and writing check constraint to group data in tables. Let me try to explain you this concept using following example:

create table Sales_1
SKU varchar(20),
SaleDate datetime

alter table Sales_1
ADD CONSTRAINT sales_date_1 check(SaleDate < '2009-01-01')

create table sales_2 (
SKU varchar(20)
SaleDate datetime

alter table Sales_2 ADD CONSTRAINT sales_date_1 check (saledate >= '2009-01-01')

We have two tables, sales_1 and sales_2, on each table we have setup the check constraint. On one of the tables we added constraint to keep the records prior to 2009 and in other table we are keeping records where sales belong to 2009.

create view sales
select * from sales_1
union all
select * from sales_2

We created a view sales, in which we used union to get combine the rows from both of the tables. If you query data for date prior to 2009, it will hit only sales_1 table, and otherwise it will hit sales_2. This approach is when we think of archiving data, since in this case, we almost archived our prior to 2009 data and that table will not get hit, unless we write query to do that.

The only limitation to this approach was to inserting rows using views.

In sql 2005, it is quite easy and systematic approach to achieve the same thing, but using Table Partitioning. By default sql creates one partition for each table in database, however more than one partitions can be added to tables. Logically speaking partitioning is a horizontal split of rows. Each set of rows will be saved into certain partition that matches the value in specified column on the basis of certain rule.

Let's say, In your database you have data for ten years with million of rows, and you can split the data (rows) into ten different partitions and each partition belongs to each year. Thus it logically makes it well organized and also boosts the performance. This is importance to know that to get performance benefit, partitioning should be applied to large tables, where performance is generally poor, while selecting and storing data.

Well, once partitioning is in place, you don't need to worry about putting data into selective partitions, or while selecting data you don't need to worry about writing more than one queries. It would be SQL's duty to take care of partitioning rules.

Prior to creating partition on tables, two things needs to be followed, First creating function and second creating a scheme on basis of function. Once partition function and scheme are in place, last step will be to create table on the basis of function and scheme.

Partition Function

In simple terms, a partition function defines the logical boundary for the data. Let's take the same example as I took for partitioned view. I want to distribute the sales information of year 2008-2009 in first partition, 2009-2010 in second partition. To do this, first I need to decide which data type need to be considered to partition. Of-course in this case, it will be datetime datatype. One thing needs to be understand here that Partition Functions are independent of any tables. Partition functions can be referenced by any table.

Now I know that I have to make the groups or boundaries, i.e. 2008-1-1 to 2008-12-31, then 2009-1-1 to 2009-12-31. There are two type of partition functions, Left Range Functions and Right Range Functions.

Left Range

Data equivalent to the boundary will move into the Left Partition. If left boundary is defined as the first boundary in the range then all data equivalent to the boundary or less than the boundary will move into the first partition. If we create a Left Range Function with boundaries of 2008-12-31, and 2009-12-31, then it creates three partitions. Partition 1 - values equal to 2008-12-31 or less than this. Partition 2 - values equal to 2009-12-31 and greater than 2008-12-31. Any value greater than 2009-12-31 will go into partition 3.


Right Range

The only difference between Left Range and Right Range is that using Right Range data equal to the boundary will move into the next partition. To accomplish the same result as Left Range, the Right Range can be defined in following way.

('2008-1-1', '2009-1-1')

Partition Schemes

This is not all. Once partition function is created, next step is to decide in to which file group partitioned data should go. In the function we created three different boundaries and to gain highest performance out of the partition function, we need to create three different file groups. Let's assume that I have created three different file groups, fg1, fg2 and fg3 on three different storage arrays. Partition scheme decides which file group the data should go for each partition.

AS PARTITION part_sales

Partition Table

Once the partition function and partition schemes are created, next step is to create the partition table. A partition table must be created with non clustered primary key. The clustered index while adding to the table, will partition the table on the basis of partition scheme.

productID int NOT NULL,
Sale_agent VARCHAR(50) NOT NULL,
Sale_Final_Date datetime NOT NULL

ON sales(sale_final_date)
ON ps_Sales(sale_final_date);

When you query the partition table, no special query is required. Query Optimizer it self choses the right partition to display the results.

I will write more blogs on partitioning that will show how to edit the partition, removing partition etc.

No comments:

Post a Comment

Optimizing Indexes with Execution Plans

Contact Me


Email *

Message *