Good evening/day/morning, dear habrapeople! We continue to develop and expand the blog about my beloved open-source RDBMS, PostgreSQL. Miraculously, it turns out that today's topic has never been raised here before. I must say that partitioning in PostgreSQL is very well described in the documentation, but will that really stop me? :)

Introduction

In general, partitioning is understood not as a specific technology, but rather as an approach to database design that appeared long before DBMSs began to support partitioned tables natively. The idea is very simple: divide a table into several smaller parts. There are two subtypes: horizontal and vertical partitioning.
Horizontal partitioning
Parts of the table contain different rows. Say we have a log table for some abstract application, LOGS. We can split it into parts: one for the January 2009 logs, another for February 2009, and so on.
Vertical partitioning
Parts of the table contain different columns. Finding a use case where vertical partitioning is actually justified is somewhat harder than for horizontal partitioning. As a spherical-horse-in-a-vacuum example, consider this: the NEWS table has columns ID, SHORTTEXT, LONGTEXT, and the LONGTEXT field is used much less frequently than the first two. In this case it makes sense to split the NEWS table by columns: create two tables, one for SHORTTEXT and one for LONGTEXT, joined by the primary key, plus a NEWS view combining both columns. Then, when we only need the short description of a news item, the DBMS does not have to read its full text from disk.
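A minimal sketch of this vertical split, using the table and column names from the example above (the exact types are assumptions for illustration):

```sql
-- Hypothetical vertical split of the NEWS table described above.
CREATE TABLE news_short (
    id        BIGINT PRIMARY KEY,
    shorttext VARCHAR(1024) NOT NULL
);

CREATE TABLE news_long (
    id       BIGINT PRIMARY KEY REFERENCES news_short(id),
    longtext TEXT NOT NULL
);

-- A view that reassembles the original NEWS table for convenience.
CREATE VIEW news AS
SELECT s.id, s.shorttext, l.longtext
FROM news_short s
LEFT JOIN news_long l ON l.id = s.id;
```

Queries that only touch id and shorttext now read just the narrow news_short table.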
Support for partitioning in modern DBMSs
Most modern DBMSs support table partitioning in one form or another.
  • Oracle: supports partitioning since version 8. Working with partitions is, on the one hand, very simple (you don't have to think about them at all and can work as with a regular table*), and on the other hand very flexible: partitions can be split into subpartitions, dropped, split, and moved. Various indexing options for a partitioned table are supported (global and partitioned indexes). Link to lengthy description.
  • Microsoft SQL Server: support for partitioning appeared relatively recently, in SQL Server 2005. The first impression is "Well, finally!! :)", the second: "It works, everything seems to be OK." Documentation is on MSDN.
  • MySQL: supported since version 5.1. There is a very good description on Habré.
  • And so on…
* I'm lying, of course: there is the standard set of chores (creating a new partition on time, dropping the old one, and so on), but somehow it is all simple and clear.

Partitioning in Postgresql

Partitioning tables in PostgreSQL differs slightly in implementation from other databases. It is based on table inheritance (a distinctive PostgreSQL feature): we have a master table, and its partitions are child tables that inherit from it. We will look at partitioning using a task that is close to real life.
Formulation of the problem
The database is used to collect and analyze data about visitors to a site or sites. The data volumes are large enough to make partitioning worth considering. In most cases, analysis uses data from the last day.
1. Create the main table:
CREATE TABLE analytics.events (
    user_id UUID NOT NULL,
    event_type_id SMALLINT NOT NULL,
    event_time TIMESTAMP DEFAULT now() NOT NULL,
    url VARCHAR(1024) NOT NULL,
    referrer VARCHAR(1024),
    ip INET NOT NULL
);

2. We will partition by day on the event_time field, creating a new partition for each day. Partitions will be named according to the pattern analytics.events_DDMMYYYY. Here, for example, is the partition for January 1, 2010:
CREATE TABLE analytics.events_01012010 (
    event_id BIGINT DEFAULT nextval('analytics.seq_events') PRIMARY KEY,
    CHECK (event_time >= TIMESTAMP '2010-01-01 00:00:00' AND event_time < TIMESTAMP '2010-01-02 00:00:00')
) INHERITS (analytics.events);



When creating a partition, we explicitly define the event_id field (PRIMARY KEY is not inherited) and add a CHECK constraint on the event_time field so that nothing irrelevant gets in. The sequence analytics.seq_events is assumed to have been created beforehand.

3. Create an index on the event_time field. Since we partition the table expecting that most queries against events will filter on event_time, an index on this field will help us a lot.

CREATE INDEX events_01012010_event_time_idx ON analytics.events_01012010 USING btree(event_time);



4. We want data inserted into the master table to end up in the partition intended for it. To do this we use the following trick: a trigger on the master table that routes the rows.
CREATE OR REPLACE FUNCTION analytics.events_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF (NEW.event_time >= TIMESTAMP '2010-01-01 00:00:00' AND
        NEW.event_time < TIMESTAMP '2010-01-02 00:00:00') THEN
        INSERT INTO analytics.events_01012010 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date % is out of range. Fix analytics.events_insert_trigger', NEW.event_time;
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;



CREATE TRIGGER events_before_insert
BEFORE INSERT ON analytics.events
FOR EACH ROW EXECUTE PROCEDURE analytics.events_insert_trigger();


5. Everything is ready: we now have a partitioned table named analytics.events and can start furiously analyzing its data. By the way, we created the CHECK constraints not only to protect partitions from incorrect data. PostgreSQL can also use them when building a query plan (though with a live index on event_time the gain will be minimal); just make sure the constraint_exclusion parameter is enabled:

SET constraint_exclusion = on;
SELECT * FROM analytics.events WHERE event_time > CURRENT_DATE;

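One caveat worth knowing: plan-time partition exclusion only works when the planner can evaluate the WHERE condition during planning, so a literal timestamp is a safer bet than an expression such as CURRENT_DATE. A quick way to check what the planner actually does (a sketch; the exact plan output depends on your data and PostgreSQL version):

```sql
SET constraint_exclusion = on;

-- With literal bounds, the planner can compare the condition against each
-- partition's CHECK constraint and skip partitions that cannot match.
EXPLAIN SELECT count(*)
FROM analytics.events
WHERE event_time >= TIMESTAMP '2010-01-01 00:00:00'
  AND event_time <  TIMESTAMP '2010-01-02 00:00:00';
-- Expect the plan to touch only the (empty) master table and
-- analytics.events_01012010, not the other daily partitions.
```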

End of the first part
So what do we have? Let's go point by point:
1. The events table is now split into partitions, and analyzing the last day's data becomes simpler and faster.
2. The creeping horror of realizing that all of this has to be maintained somehow: new partitions must be created on time, and the trigger must be updated accordingly.
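As a taste of how that maintenance burden can be automated, here is one possible sketch (not the author's part-two solution) of a helper that creates a daily partition on demand. It follows the naming conventions above and assumes a reasonably modern PostgreSQL (format() appeared in 9.1, IF NOT EXISTS for indexes in 9.5):

```sql
-- Hypothetical helper: creates the partition and index for a given day,
-- following the analytics.events_DDMMYYYY naming rule from this article.
CREATE OR REPLACE FUNCTION analytics.create_events_partition(day DATE)
RETURNS void AS $$
DECLARE
    part_name TEXT := 'events_' || to_char(day, 'DDMMYYYY');
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS analytics.%I (
             event_id BIGINT DEFAULT nextval(''analytics.seq_events'') PRIMARY KEY,
             CHECK (event_time >= %L AND event_time < %L)
         ) INHERITS (analytics.events)',
        part_name, day::timestamp, (day + 1)::timestamp);
    EXECUTE format(
        'CREATE INDEX IF NOT EXISTS %I ON analytics.%I USING btree (event_time)',
        part_name || '_event_time_idx', part_name);
END;
$$ LANGUAGE plpgsql;
```

The routing trigger would still have to be regenerated to know about the new partition; that is exactly the bookkeeping the second part addresses.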

I'll tell you how to work with partitioned tables easily and without worry in the second part.

UPD1: Replaced the term "segmentation" with "partitioning" throughout.
UPD2:
Based on a comment from one of the readers who, unfortunately, does not have an account on Habré:
There are several inheritance-related issues that should be taken into account during design. Partitions do not inherit the primary key or foreign keys on their columns, so when creating a partition you need to create PRIMARY KEY and FOREIGN KEY constraints on its columns explicitly. I would add on my own behalf that creating a FOREIGN KEY on the columns of a partitioned table is not the best approach; in most cases the partitioned table is a "fact table" that itself references "dimension" tables.

In this article, I'm going to demonstrate the specifics of query execution plans when accessing partitioned tables. Note that there is a big difference between partitioned tables (which only became available in SQL Server 2005) and partitioned views (which were available in SQL Server 2000 and are still available in SQL Server 2005 and later versions). I will demonstrate the specifics of query plans for partitioned views in another article.

Table scan

Let's create a simple partitioned table:

create partition function pf(int) as range for values (0, 10, 100)

create partition scheme ps as partition pf all to ()

create table t (a int, b int) on ps(a)

This script creates a table with four partitions. SQL Server assigns the four partition IDs as shown in the table:

PtnId  Values
1      t.a <= 0
2      0 < t.a <= 10
3      10 < t.a <= 100
4      100 < t.a

Now let's look at a query plan that would force the optimizer to use a Table Scan:


|–Nested Loops(Inner Join, OUTER REFERENCES:() PARTITION ID:())
……|–Constant Scan(VALUES:(((1)),((2)),((3)),((4))))
……|–Table Scan(OBJECT:([t]))

In the plan above, SQL Server explicitly lists all the partition IDs in the Constant Scan operator, which feeds the Nested Loops join; the Table Scan runs on its inner side. Remember that a Nested Loops join executes its inner input (here, a scan of one partition of the table) once for each row from its outer input (here, the Constant Scan). So the table is scanned four times, once for each partition ID.

Note also that the Nested Loops join shows explicitly that its outer input supplies the partition IDs. Although it is not always visible in the text view of the execution plan (unfortunately, we sometimes lose this information there), the Table Scan uses the partition-ID column from the outer input to determine which partition to scan. This information is always available in the graphical execution plan (look at the properties of the Table Scan operator), as well as in the XML representation of the plan.

Static partition elimination

Consider the following query:

select * from t where a < 100

|–Nested Loops(Inner Join, OUTER REFERENCES:() PARTITION ID:())
……|–Constant Scan(VALUES:(((1)),((2)),((3))))
……|–Table Scan(OBJECT:([t]), WHERE:([t].[a]<(100)) PARTITION ID:())

The predicate a < 100 explicitly excludes all rows of the partition with ID 4. In this case there is no point in scanning that partition, since none of its rows can satisfy the predicate. The optimizer takes this fact into account and removes the partition from the query plan: only three partitions are listed in the Constant Scan. We call this static partition elimination, because we know at compile time that the list of partitions to scan is static.

If static elimination removes all but one partition, the Constant Scan and Nested Loops join operators are not needed at all:

select * from t where a < 0

|–Table Scan(OBJECT:([t]), WHERE:([t].[a]<(0)) PARTITION ID:((1)))

Note that the "PARTITION ID:((1))" statement, which specifies the ID of the partition to be scanned, is now part of the Table Scan statement.

Dynamic partition elimination

In some cases, SQL Server cannot determine at compile time exactly which partitions will need to be scanned, but it can see that some partitions may be eliminated.

select * from t where a < @i

|–Nested Loops(Inner Join, OUTER REFERENCES:() PARTITION ID:())
…….|–Filter(WHERE:(<=RangePartitionNew([@i],(0),(0),(10),(100))))
…….| |–Constant Scan(VALUES:(((1)),((2)),((3)),((4))))
…….|–Table Scan(OBJECT:([t]), WHERE:([t].[a]<[@i]) PARTITION ID:())

This is a parameterized query. Since the parameter value is not known until execution (the fact that I assign it from a constant in the same batch does not change the situation), the partition IDs for the Constant Scan operator cannot be determined at compile time. Perhaps only partition 1 needs to be scanned, or partitions 1 and 2, and so on. Therefore all four partition IDs are listed in this operator, and the partition IDs are filtered at run time. We call this dynamic partition elimination.

The Filter compares each partition ID with the result of the special function RangePartitionNew, which computes the result of applying the partitioning function to the parameter value. The arguments of this function, from left to right, are:

  • the value (in this case the parameter @i) that we want to map to a partition ID;
  • a boolean flag indicating whether the partitioning function assigns boundary values to the left (0) or right (1) partition;
  • the partition boundary values (in this case 0, 10, and 100).

In this example, since @i is 0, RangePartitionNew returns 1, so only the partition with ID 1 is scanned. Note that, unlike the static elimination example, although we scan only one partition we still have the Constant Scan and Nested Loops operators. The reason we need them is that the set of partitions to scan is not known until execution.

In some cases, the optimizer can determine at compile time that only one partition will be scanned, even if it cannot determine which one. For example, if a query uses an equality predicate on the partition key, we know that only one partition can satisfy the condition. So although partition elimination is dynamic, the Constant Scan and Nested Loops operators are no longer needed. Example:

select * from t where a = @i

|–Table Scan(OBJECT:([t]), WHERE:([t].[a]=[@i]) PARTITION ID:(RangePartitionNew([@i],(0),(0),(10),(100))))

Combining static and dynamic partition elimination

SQL Server can combine static and dynamic partition elimination in a single query plan:

select * from t where a > 0 and a < @i

|–Nested Loops(Inner Join, OUTER REFERENCES:() PARTITION ID:())
……|–Filter(WHERE:(<=RangePartitionNew([@i],(0),(0),(10),(100))))
……| |–Constant Scan(VALUES:(((2)),((3)),((4))))
……|–Table Scan(OBJECT:([t]), WHERE:([t].[a]<[@i] AND [t].[a]>(0)) PARTITION ID:())

Note that in this last plan partition 1 is eliminated statically (it is absent from the Constant Scan), while the remaining partitions are filtered dynamically according to the predicates.

$partition

You can explicitly call the RangePartitionNew function using $partition:

select *, $partition.pf(a) from t

|–Compute Scalar(DEFINE:(=RangePartitionNew([t].[a],(0),(0),(10),(100))))
……|–Nested Loops(Inner Join, OUTER REFERENCES:() PARTITION ID:())
………..|–Constant Scan(VALUES:(((1)),((2)),((3)),((4))))
………..|–Table Scan(OBJECT:([t]))

A distinctive feature of this query execution plan is the appearance of the Compute Scalar operator.

Additional Information


Range Partitioning - Sales Details

Usage patterns for sales data vary over time. Generally, the current month's data is live data, while data from previous months is mostly used for analysis. Analysis is most often performed monthly, quarterly, or annually. Because different analysts may need significant amounts of different analytical data at the same time, partitioning is well suited to isolating their activities. In the scenario below, data is collected from 283 nodes and delivered as two standard ASCII files. All files are sent to the central file server no later than 3:00 AM on the first day of each month. File sizes fluctuate, but each node averages approximately 86,000 orders per month; at an average of 2.63 line items per order, its OrderDetails file averages 226,180 rows. Across all 283 nodes, approximately 25 million new orders and 64 million order detail rows are added each month. The history analysis server keeps data for the last 2 years: just under 600 million orders and over 1.5 billion rows in the OrderDetails table. Since data is often analyzed by comparing months of the same quarter, or the same months of previous years, range partitioning is chosen, with month as the range size.

Based on Diagram 11 (Steps to Create a Partitioned Table), we decided to partition the table using range partitioning on the OrderDate column. Our analysts mainly combine and analyze data from the last 6 months, or the last 3 months of the current and last year (for example, January-March 2003 plus January-March 2004). To maximize disk striping, and at the same time isolate most data groupings, several file groups will be located on one physical disk, but they will be offset by six months in order to reduce the number of conflicts when sharing resources. The current month is October 2004, and all 283 stand-alone offices manage their current sales locally. The server stores data from October 2002 to September 2004 inclusive. To take advantage of the new 16-processor system and SAN (Storage Area Network), each month's data will be in its own filegroup file, and located on a set of striped mirrors (RAID 1+0). Figure 12 illustrates the placement of data on logical drives.


Figure 12: Orders partitioned table

Each of the 12 logical disks uses a RAID 1+0 configuration, so the total number of disks required for the Orders and OrderDetails tables is 48. However, the SAN supports up to 78 disks, so the remaining 30 disks are used for transaction log, TempDB, system databases and other small tables such as Customers (9 million records) and Products (386,750 records), etc. The Orders and OrderDetails tables will use the same boundary conditions and the same disk placement; in fact, they will use the same partitioning scheme. As a result (look at the two logical drives E:\ and F:\ in Figure 13), data from the Orders and OrderDetails tables for the same months will be located on the same drives:


Figure 13: Allocation of range partition extents on disk arrays

Although it looks complicated, it is all quite simple to implement. The hardest part about creating our partitioned table is delivering data from a large number of sources - 283 stores must have a standard delivery mechanism. However, there is only one Orders table and one OrderDetails table on the central server. To turn both tables into partitioned tables, we must first create a function and a partitioning scheme. The partition scheme determines the physical layout of the partitions on the disks, so filegroups must also exist. Since our tables require filegroups, the next step is to create them. The syntax for creating each filegroup is identical to the following, however, all twenty-four filegroups must be created this way. You can change the drive names/locations to a single drive to test and learn the syntax. Make sure you correct the file sizes to MB instead of GB, and select a smaller initial file size based on your available disk space. Twenty-four files and filegroups will be created in the SalesDB database. All will have similar syntax, except for the location, filename, and filegroup name:

ALTER DATABASE SalesDB
ADD FILE
    (NAME = N'SalesDBFG1File1',
    FILENAME = N'E:\SalesDB\SalesDBFG1File1.ndf',
    SIZE = 20GB,
    MAXSIZE = 35GB,
    FILEGROWTH = 5GB)
TO FILEGROUP
GO

Once all twenty-four files and filegroups have been created, you can define the partitioning function and scheme. You can verify that your files and filegroups are created by using the sp_helpfile and sp_helpfilegroup system stored procedures.
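For example (these are standard SQL Server system procedures; the specific filegroup name passed to sp_helpfilegroup is an assumption based on the file-naming pattern above):

```sql
USE SalesDB;
GO
-- List all files in the current database.
EXEC sp_helpfile;
-- List all filegroups, or the details of a single one.
EXEC sp_helpfilegroup;
EXEC sp_helpfilegroup 'SalesDBFG1';  -- filegroup name assumed from the pattern above
GO
```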

The partition function will be defined on the OrderDate column, whose data type is datetime. For both tables to be partitioned on OrderDate, this column must be present in both tables. In effect, the partition key values of the two tables (when both are partitioned on the same key) duplicate each other; however, this is necessary to take advantage of alignment, and in most cases the key column is relatively small (a datetime field is only 8 bytes). As already described in the "CREATE PARTITION FUNCTION for range partitions" chapter, our function will be a range partition function whose boundary values fall into the left (LEFT) partition.

CREATE PARTITION FUNCTION TwoYearDateRangePFN(datetime)
AS
RANGE LEFT FOR VALUES ('20021031 23:59:59.997', -- Oct 2002
    '20021130 23:59:59.997', -- Nov 2002
    '20021231 23:59:59.997', -- Dec 2002
    '20030131 23:59:59.997', -- Jan 2003
    '20030228 23:59:59.997', -- Feb 2003
    '20030331 23:59:59.997', -- Mar 2003
    '20030430 23:59:59.997', -- Apr 2003
    '20030531 23:59:59.997', -- May 2003
    '20030630 23:59:59.997', -- Jun 2003
    '20030731 23:59:59.997', -- Jul 2003
    '20030831 23:59:59.997', -- Aug 2003
    '20030930 23:59:59.997', -- Sep 2003
    '20031031 23:59:59.997', -- Oct 2003
    '20031130 23:59:59.997', -- Nov 2003
    '20031231 23:59:59.997', -- Dec 2003
    '20040131 23:59:59.997', -- Jan 2004
    '20040229 23:59:59.997', -- Feb 2004
    '20040331 23:59:59.997', -- Mar 2004
    '20040430 23:59:59.997', -- Apr 2004
    '20040531 23:59:59.997', -- May 2004
    '20040630 23:59:59.997', -- Jun 2004
    '20040731 23:59:59.997', -- Jul 2004
    '20040831 23:59:59.997', -- Aug 2004
    '20040930 23:59:59.997') -- Sep 2004
GO

Since both the leftmost and rightmost edge cases are covered, this partition function effectively creates 25 partitions. The table will support a 25th partition, which will remain empty. This empty partition does not require a dedicated filegroup, since no data should ever land in it; to guarantee that, a CHECK constraint on the table will limit the data range. To direct data to the appropriate disks, a partition scheme is used that maps partitions to filegroups. The partition scheme uses an explicit filegroup for each of the 24 filegroups containing data, and PRIMARY for the 25th, empty partition.

CREATE PARTITION SCHEME TwoYearDateRangePScheme
AS
PARTITION TwoYearDateRangePFN TO
    -- NB: the 24 filegroup names here are assumed (SalesDBFG1 .. SalesDBFG24,
    -- inferred from the file-naming pattern above); substitute your own.
    (SalesDBFG1, SalesDBFG2, SalesDBFG3, SalesDBFG4,
     SalesDBFG5, SalesDBFG6, SalesDBFG7, SalesDBFG8,
     SalesDBFG9, SalesDBFG10, SalesDBFG11, SalesDBFG12,
     SalesDBFG13, SalesDBFG14, SalesDBFG15, SalesDBFG16,
     SalesDBFG17, SalesDBFG18, SalesDBFG19, SalesDBFG20,
     SalesDBFG21, SalesDBFG22, SalesDBFG23, SalesDBFG24,
     [PRIMARY])
GO

The table can be created with the same syntax that previous releases of SQL Server supported, using a default or user-defined filegroup (which creates a NON-partitioned table), or using a partition scheme (which creates a partitioned table). Which option is preferable (even if the table will become partitioned later) depends on how the table will be populated and how many partitions you intend to manipulate. Populating a heap and then creating a clustered index on it will likely perform better than loading into a table that already has a clustered index. In addition, on multiprocessor systems you can load data into a table in parallel and then build the indexes in parallel as well. As an example, let's create the Orders table and load it with INSERT ... SELECT statements. To create the Orders table as partitioned, specify the partition scheme in the ON clause of the CREATE TABLE statement.

CREATE TABLE SalesDB..Orders (
    -- NB: most of the column list was lost from this copy and is elided here;
    -- the table is later populated from AdventureWorks.Purchasing.PurchaseOrderHeader.
    ...,
    OrderDate DATETIME NOT NULL
        CONSTRAINT OrdersRangeYear
            CHECK (OrderDate >= '20021001'
               AND OrderDate < '20041001'),
    ...
) ON TwoYearDateRangePScheme(OrderDate)
GO

Because the OrderDetails table is going to use the same partition scheme, it must also include an OrderDate column.

CREATE TABLE SalesDB..OrderDetails (
    -- NB: most of the column list was lost from this copy and is elided here;
    -- the table is later populated from AdventureWorks.Purchasing.PurchaseOrderDetail.
    ...,
    OrderDate DATETIME NOT NULL
        CONSTRAINT OrderDetailsRangeYearCK
            CHECK (OrderDate >= '20021001'
               AND OrderDate < '20041001'),
    ...
) ON TwoYearDateRangePScheme(OrderDate)
GO

The next step is to load the tables with data from the new AdventureWorks training database. Make sure you have installed the AdventureWorks database.

INSERT dbo.Orders
SELECT o.*  -- NB: the original statement listed all thirteen columns explicitly;
            -- the column names were lost from this copy
FROM AdventureWorks.Purchasing.PurchaseOrderHeader AS o
WHERE (o.OrderDate >= '20021001'
   AND o.OrderDate < '20041001')
GO

INSERT dbo.OrderDetails
SELECT od.PurchaseOrderID
    , od.LineNumber
    , od.ProductID
    , od.UnitPrice
    , od.OrderQty
    , od.ReceivedQty
    , od.RejectedQty
    , o.OrderDate
    , od.DueDate
    , od.ModifiedDate
FROM AdventureWorks.Purchasing.PurchaseOrderDetail AS od
JOIN AdventureWorks.Purchasing.PurchaseOrderHeader AS o
    ON o.PurchaseOrderID = od.PurchaseOrderID
WHERE (o.OrderDate >= '20021001'
   AND o.OrderDate < '20041001')
GO

Now that you have loaded the data into a partitioned table, you can use a new built-in system function to determine the partition a row resides in. The following queries return, for each partition that contains data, the minimum and maximum values of the OrderDate field (the second query also counts the rows per partition). A partition that contains no rows does not appear in the result.

-- NB: the column aliases were lost from this copy; the names below are placeholders.
SELECT $partition.TwoYearDateRangePFN(o.OrderDate) AS PartitionID
    , min(o.OrderDate) AS MinOrderDate
    , max(o.OrderDate) AS MaxOrderDate
FROM dbo.Orders AS o
GROUP BY $partition.TwoYearDateRangePFN(o.OrderDate)
ORDER BY PartitionID
GO

SELECT $partition.TwoYearDateRangePFN(od.OrderDate) AS PartitionID
    , min(od.OrderDate) AS MinOrderDate
    , max(od.OrderDate) AS MaxOrderDate
    , count(*) AS NumRows
FROM dbo.OrderDetails AS od
GROUP BY $partition.TwoYearDateRangePFN(od.OrderDate)
ORDER BY PartitionID
GO

Finally, now that the data is loaded, you can create the clustered indexes and a FOREIGN KEY between the OrderDetails and Orders tables. In this case the clustered index is built on the primary key, since both tables are identified by their partition key plus the order ID (for OrderDetails, the LineNumber column is added to the key for uniqueness). By default, indexes built on a partitioned table are aligned with it using the same partition scheme, so the scheme does not have to be specified explicitly.

ALTER TABLE Orders
ADD CONSTRAINT OrdersPK
    PRIMARY KEY CLUSTERED (OrderDate, OrderID)
GO

ALTER TABLE dbo.OrderDetails
ADD CONSTRAINT OrderDetailsPK
    PRIMARY KEY CLUSTERED (OrderDate, OrderID, LineNumber)
GO

The complete syntax defining the partitioning scheme would look like this:

ALTER TABLE ORDERS
ADD CONSTRAINT OrdersPK
PRIMARY KEY CLUSTERED (OrderDate, OrderID)
ON TwoYearDateRangePScheme(OrderDate)
GO

ALTER TABLE dbo.OrderDetails
ADD CONSTRAINT OrderDetailsPK
PRIMARY KEY CLUSTERED (OrderDate, OrderID, LineNumber)
ON TwoYearDateRangePScheme(OrderDate)
GO

The Oracle DBMS also supports partitioned views. The basic idea is simple: the physical table is split into several tables (optionally using the table-partitioning methods described earlier) according to a partitioning criterion, which makes query processing more efficient. We will call this criterion the partitioning predicate. You can then create views that make it easier for users to access the data in these tables. A view partition is defined by a range of partition-key values; queries that restrict the partition key to a range of values will access only the partitions whose ranges match.

View partitions can be defined by partitioning predicates specified either with a CHECK constraint or with a WHERE clause. Let's show both techniques using a slightly modified version of the Sales table we looked at in the previous section. Suppose that sales data for a calendar year is placed in four separate tables, one per quarter: Q1_Sales, Q2_Sales, Q3_Sales, and Q4_Sales.

Example 20.14.

Using the CHECK constraint. With the ALTER TABLE command, we add a constraint on the sale-date column (s_date) of each table so that its rows correspond to one quarter of the year. The sales view we then create lets us access these tables, either individually or all together.

ALTER TABLE Q1_Sales ADD CONSTRAINT C0 CHECK (s_date BETWEEN 'jan-1-2002' AND 'mar-31-2002');
ALTER TABLE Q2_Sales ADD CONSTRAINT C1 CHECK (s_date BETWEEN 'apr-1-2002' AND 'jun-30-2002');
ALTER TABLE Q3_Sales ADD CONSTRAINT C2 CHECK (s_date BETWEEN 'jul-1-2002' AND 'sep-30-2002');
ALTER TABLE Q4_Sales ADD CONSTRAINT C3 CHECK (s_date BETWEEN 'oct-1-2002' AND 'dec-31-2002');

CREATE VIEW sales_v AS
SELECT * FROM Q1_Sales
UNION ALL SELECT * FROM Q2_Sales
UNION ALL SELECT * FROM Q3_Sales
UNION ALL SELECT * FROM Q4_Sales;

The advantage of this kind of view partitioning is that the CHECK predicate is not evaluated for every row at query time. The constraints also prevent rows that violate the predicate from being inserted into the wrong table, and rows matching the partitioning predicate are retrieved from the database faster.
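The payoff of the CHECK-constraint approach is partition pruning: the optimizer can skip any branch of the view whose constraint contradicts the query predicate. A minimal sketch, assuming the sales_v view defined above:

```sql
-- Only Q2_Sales needs to be scanned: the C0, C2, and C3 constraints
-- contradict the predicate, so the optimizer can prune those branches.
SELECT *
FROM sales_v
WHERE s_date BETWEEN 'apr-15-2002' AND 'may-15-2002';
```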

Example 20.15.

Partitioning views using the WHERE clause. Let's create a view over the same tables as in the example above.

CREATE VIEW sales_v AS
SELECT * FROM Q1_Sales WHERE s_date BETWEEN 'jan-1-2002' AND 'mar-31-2002'
UNION ALL SELECT * FROM Q2_Sales WHERE s_date BETWEEN 'apr-1-2002' AND 'jun-30-2002'
UNION ALL SELECT * FROM Q3_Sales WHERE s_date BETWEEN 'jul-1-2002' AND 'sep-30-2002'
UNION ALL SELECT * FROM Q4_Sales WHERE s_date BETWEEN 'oct-1-2002' AND 'dec-31-2002';

Partitioning views with a WHERE clause has some disadvantages. First, the partitioning criterion is evaluated at runtime for all rows in all partitions covered by the query. Second, users may mistakenly insert a row into the wrong partition (for example, insert a first-quarter row into the third-quarter table), which leads to incorrect results for both quarters.
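To illustrate the second disadvantage, here is a hypothetical misplaced insert (the amount column is assumed for illustration). Without a CHECK constraint, the wrong-quarter row is silently accepted, and the Q3 branch's WHERE clause then hides it from the view:

```sql
-- A first-quarter row inserted into the third-quarter table is accepted:
INSERT INTO Q3_Sales (s_date, amount)
VALUES ('feb-15-2002', 100);
-- But the Q3_Sales branch of sales_v
-- (WHERE s_date BETWEEN 'jul-1-2002' AND 'sep-30-2002')
-- filters this date out, so the row never appears in the view at all.
```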

This technique also has an advantage over using the CHECK constraint: a partition that matches the WHERE predicate can be placed in a remote database.

When deciding whether to create partitioned views, keep the following factors in mind.

  • Operations such as data loading and data deletion work at the partition level rather than on the entire base table.
  • Access to one partition does not affect the data in the other partitions.
  • The Oracle DBMS has the necessary built-in capabilities for recognizing partitioned views.
  • Partitioned views are very useful when working with tables containing a large amount of historical data.

Partitioning tables in a DBMS of the MS SQL Server family

Creating Partitioned Tables

DBMSs of the MS SQL Server family also support partitioning of tables, indexes, and views. However, unlike the Oracle family, partitioning in MS SQL Server is performed according to a unified scheme.

In MS SQL Server, all tables and indexes in the database are considered partitioned, even if they consist of only one section. In fact, partitions represent the basic organizational unit in the physical architecture of tables and indexes. This means that the logical and physical architecture of tables and indexes that include multiple partitions closely mirrors the architecture of tables and indexes that consist of a single partition.

Partitioning of tables and indexes is strictly horizontal, at the row level (column-wise partitioning is not supported), and access goes through a single entry point (the table name or index name), so application code does not need to know the number of partitions. Partitioning can be applied to a base table as well as to the indexes associated with it.

Each value range in a partition has boundaries, which are defined in the FOR VALUES clause. If the date of sale is June 23, 2006, the row will be stored in partition 2 (P2).
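The partition function itself, created in an example preceding this excerpt, is not shown here; consistent with the filegroup boundaries described below, it could be declared along these lines (a RANGE RIGHT sketch, with names taken from the surrounding text):

```sql
-- Three boundary values produce four partitions; with RANGE RIGHT each
-- boundary value belongs to the partition on its right:
--   P1: < 2005-01-01          P2: [2005-01-01, 2007-01-01)
--   P3: [2007-01-01, 2009-01-01)   P4: >= 2009-01-01
CREATE PARTITION FUNCTION MyPartitionFunction (date)
AS RANGE RIGHT FOR VALUES ('2005-01-01', '2007-01-01', '2009-01-01')
```

Under these boundaries, a sale dated June 23, 2006 indeed falls into partition 2.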

Now let's create the partition scheme. A partition scheme maps partitions to filegroups (here named MyFilegroup1, MyFilegroup2, MyFilegroup3, MyFilegroup4), as shown in the following command:

CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionFunction TO (MyFilegroup1, MyFilegroup2, MyFilegroup3, MyFilegroup4)

MyPartitionScheme is the name of the partition scheme, and MyPartitionFunction is the partition function it uses. This command maps each partition to one or more filegroups. Rows whose Date_of_Event column value is before 1/1/05 are assigned to MyFilegroup1. Rows with values greater than or equal to 1/1/05 and before 1/1/07 go to MyFilegroup2. Rows with values greater than or equal to 1/1/07 and before 1/1/09 go to MyFilegroup3. All remaining rows, with values greater than or equal to 1/1/09, go to MyFilegroup4.
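Note that the filegroups named in the scheme must already exist in the database. A minimal sketch for one of them (the SalesDB database name, logical file name, and file path are assumptions for illustration; MyFilegroup2 through MyFilegroup4 would be created the same way):

```sql
-- Create a filegroup and attach a data file to it.
ALTER DATABASE SalesDB ADD FILEGROUP MyFilegroup1;
ALTER DATABASE SalesDB
ADD FILE (NAME = SalesData1, FILENAME = 'C:\Data\SalesData1.ndf')
TO FILEGROUP MyFilegroup1;
```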

For each set of boundary values (specified in the FOR VALUES clause of the partition function), the number of partitions equals the number of boundary values + 1. The preceding CREATE PARTITION SCHEME statement involves three boundary values and four partitions. Regardless of whether partitions are created with RANGE RIGHT or RANGE LEFT, the number of partitions is always the number of boundary values + 1, up to a maximum of 1,000 partitions per table.
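The difference between RANGE LEFT and RANGE RIGHT lies only in which side of a boundary the boundary value itself belongs to; the partition count is the same. A sketch with hypothetical integer boundaries:

```sql
-- RANGE LEFT: each boundary value goes to the partition on its left.
--   P1: <= 10   P2: 11..20   P3: 21..30   P4: > 30
CREATE PARTITION FUNCTION pfLeft (int) AS RANGE LEFT FOR VALUES (10, 20, 30)

-- RANGE RIGHT: each boundary value goes to the partition on its right.
--   P1: < 10    P2: 10..19   P3: 20..29   P4: >= 30
CREATE PARTITION FUNCTION pfRight (int) AS RANGE RIGHT FOR VALUES (10, 20, 30)
```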

Now we can create the partitioned fact table "Sales" (SALES). Creating a partitioned table is not much different from creating a regular table; you only need to reference the partition scheme name in the ON clause, as shown in the command below.

CREATE TABLE SALES (
    Sales_ID bigint identity(1, 1) PRIMARY KEY NONCLUSTERED NOT NULL,
    Cust_ID bigint NULL,
    Prod_ID bigint NULL,
    Store_ID bigint NULL,
    REG_ID char(10) NULL,
    Time_of_Event time NULL,
    Quantity integer NOT NULL,
    Amount dec(8, 2) NOT NULL,
    Date_of_Event date NOT NULL
) ON MyPartitionScheme (Date_of_Event)

By referencing the name of a partition scheme, the designer specifies that the table is partitioned. This makes it possible to design the index structure around the partitioned data rather than the table as a whole. Creating partitioned indexes means building a separate balanced tree for each partition. The resulting smaller indexes are easier for the database or data warehouse administrator to maintain as data is changed, added, and deleted.

When creating partitioned indexes, you can create aligned or unaligned indexes. An aligned index corresponds directly to the partitioned data in the table; an unaligned index uses a different partition scheme.

Of the two methods, the aligned index is preferred and is chosen by default when indexes are created on a partitioned table without specifying a different partition scheme. Aligned indexes provide the flexibility needed to create additional partitions in the table and to switch a particular partition out to another table. For most partitioning tasks, it is enough to apply the table's own partition scheme to its indexes.
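For example, an index created on SALES without an ON clause is aligned automatically (the index name IX_Sales_Cust is hypothetical):

```sql
-- No partition scheme is specified, so the index inherits
-- MyPartitionScheme(Date_of_Event) from the SALES table and is aligned.
CREATE NONCLUSTERED INDEX IX_Sales_Cust ON SALES (Cust_ID)
```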

Example 20.19.

Let's create a partitioned nonclustered index on the partitioned "Sales" (SALES) table from the previous example 20.18.

CREATE PARTITION SCHEME Index_primary_Left_Scheme AS PARTITION Index_Left_Partition ALL TO ([PRIMARY])

Now let's execute the create index command as shown below.

CREATE NONCLUSTERED INDEX cl_multiple_partition ON SALES (Cust_ID) ON Index_primary_Left_Scheme (Cust_ID)

In this nonclustered index, the "Customer ID" (Cust_ID) column is used as the index key; it is not the partition key of the "Sales" (SALES) table, so the index is unaligned.

Decisions about index partitioning are made by the data warehouse designer at the design stage, or by the data warehouse administrator during operation. The purpose of index partitioning is either to improve query performance or to simplify index maintenance procedures.
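As an example of simplified maintenance, an aligned partitioned index can be rebuilt one partition at a time rather than as a whole (the index name IX_Sales_Cust is hypothetical):

```sql
-- Rebuild only partition 2 of the index, leaving the
-- other partitions untouched.
ALTER INDEX IX_Sales_Cust ON SALES REBUILD PARTITION = 2
```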