Data partitioning and clustering for performance

Partitioning

Types of Partitioning

This section describes the partitioning features that significantly enhance data access and improve overall application performance. This is especially true for applications that access tables and indexes with millions of rows and many gigabytes of data.

Partitioned tables and indexes facilitate administrative operations by enabling these operations to work on subsets of data. For example, you can add a new partition, reorganize an existing partition, or drop a partition with minimal to zero interruption to a read-only application.

Using the partitioning methods described in this section can help you tune SQL statements to avoid unnecessary index and table scans (using partition pruning). You can also improve the performance of massive join operations when large amounts of data (for example, several million rows) are joined together by using partition-wise joins. Finally, partitioning data greatly improves manageability of very large databases and dramatically reduces the time required for administrative tasks such as backup and restore.

Granularity in a partitioning scheme can be easily changed by splitting or merging partitions. Thus, if a table’s data is skewed to fill some partitions more than others, the partitions that contain more data can be split to achieve a more even distribution. Partitioning also allows you to swap partitions with a table. Because you can easily add, remove, or swap a large amount of data quickly, swapping can be used to keep a large amount of data that is being loaded inaccessible until loading is completed, or as a way to stage data between different phases of use. Some examples are the current day’s transactions or online archives.
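
The following is a minimal sketch of these maintenance operations; the table, partition, and staging-table names are hypothetical:

-- split a partition that has become too large
ALTER TABLE sales SPLIT PARTITION sal99q4
  AT (TO_DATE('01-DEC-1999', 'DD-MON-YYYY'))
  INTO (PARTITION sal99q4a, PARTITION sal99q4b);

-- swap a fully loaded staging table into a partition
ALTER TABLE sales EXCHANGE PARTITION sal00q1 WITH TABLE sales_stage;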

Partitioning Methods

Oracle offers four partitioning methods:

  • Range partitioning
  • Hash partitioning
  • List partitioning
  • Composite partitioning

Each partitioning method has different advantages and design considerations. Thus, each method is more appropriate for a particular situation.

Range Partitioning

Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. For example, you might want to partition sales data into monthly partitions.

Range partitioning maps rows to partitions based on ranges of column values. Range partitioning is defined by the partitioning specification for a table or index in PARTITION BY RANGE(column_list) and by the partitioning specifications for each individual partition in VALUES LESS THAN(value_list), where column_list is an ordered list of columns that determines the partition to which a row or an index entry belongs. These columns are called the partitioning columns. The values in the partitioning columns of a particular row constitute that row’s partitioning key.

An ordered list of values for the columns in the column list is called a value_list. Each value must be either a literal or a TO_DATE or RPAD function with constant arguments. Only the VALUES LESS THAN clause is allowed. This clause specifies a non-inclusive upper bound for the partitions. All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN literal of the previous partition. Any values of the partition key equal to or higher than this literal are added to the next higher partition. The highest partition is the one whose bound is the keyword MAXVALUE, which represents a virtual infinite value that sorts higher than any other value for the data type, including the null value.
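
For example, a catch-all highest partition could later be added to the sales_range table created below; the partition name sales_max is hypothetical:

ALTER TABLE sales_range ADD PARTITION sales_max VALUES LESS THAN (MAXVALUE);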

The following statement creates a table sales_range that is range partitioned on the sales_date field:

CREATE TABLE sales_range 
(salesman_id  NUMBER(5), 
salesman_name VARCHAR2(30), 
sales_amount  NUMBER(10), 
sales_date    DATE)
COMPRESS
PARTITION BY RANGE(sales_date)
(PARTITION sales_jan2000 VALUES LESS THAN(TO_DATE('02/01/2000','MM/DD/YYYY')),
 PARTITION sales_feb2000 VALUES LESS THAN(TO_DATE('03/01/2000','MM/DD/YYYY')),
 PARTITION sales_mar2000 VALUES LESS THAN(TO_DATE('04/01/2000','MM/DD/YYYY')),
 PARTITION sales_apr2000 VALUES LESS THAN(TO_DATE('05/01/2000','MM/DD/YYYY')));

Note:

This table was created with the COMPRESS keyword, thus all partitions inherit this attribute.

Hash Partitioning

Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to a partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size. Hash partitioning is the ideal method for distributing data evenly across devices. Hash partitioning is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical.

Oracle Database uses a linear hashing algorithm. To prevent data from clustering within specific partitions, you should define the number of partitions as a power of two (for example, 2, 4, 8).

The following statement creates a table sales_hash, which is hash partitioned on the salesman_id field:

CREATE TABLE sales_hash
(salesman_id  NUMBER(5), 
salesman_name VARCHAR2(30), 
sales_amount  NUMBER(10), 
week_no       NUMBER(2)) 
PARTITION BY HASH(salesman_id) 
PARTITIONS 4;

You cannot define alternate hashing algorithms for partitions.

List Partitioning

List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning column in the description of each partition. This differs from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where you have no control over the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way. The following example creates a list partitioned table grouping states according to their sales regions:

CREATE TABLE sales_list
(salesman_id  NUMBER(5), 
salesman_name VARCHAR2(30),
sales_state   VARCHAR2(20),
sales_amount  NUMBER(10), 
sales_date    DATE)
PARTITION BY LIST(sales_state)
(PARTITION sales_west VALUES('California', 'Hawaii') COMPRESS,
 PARTITION sales_east VALUES('New York', 'Virginia', 'Florida'),
 PARTITION sales_central VALUES('Texas', 'Illinois'));

The partition sales_west is furthermore created as a single compressed partition within sales_list. For details, see “Partitioning and Table Compression” later in this section.

An additional capability with list partitioning is that you can use a default partition, so that all rows that do not map to any other partition do not generate an error. For example, modifying the previous example, you can create a default partition as follows:

CREATE TABLE sales_list
(salesman_id  NUMBER(5), 
salesman_name VARCHAR2(30),
sales_state   VARCHAR2(20),
sales_amount  NUMBER(10), 
sales_date    DATE)
PARTITION BY LIST(sales_state)
(PARTITION sales_west VALUES('California', 'Hawaii'),
 PARTITION sales_east VALUES ('New York', 'Virginia', 'Florida'),
 PARTITION sales_central VALUES('Texas', 'Illinois'),
 PARTITION sales_other VALUES(DEFAULT));

Composite Partitioning

Composite partitioning combines range and hash or list partitioning. Oracle Database first distributes data into partitions according to boundaries established by the partition ranges. Then, for range-hash partitioning, Oracle uses a hashing algorithm to further divide the data into subpartitions within each range partition. For range-list partitioning, Oracle divides the data into subpartitions within each range partition based on the explicit list you chose.

Index Partitioning

You can choose whether or not to inherit the partitioning strategy of the underlying tables. You can create both local and global indexes on a table partitioned by range, hash, or composite methods. Local indexes inherit the partitioning attributes of their related tables. For example, if you create a local index on a composite table, Oracle automatically partitions the local index using the composite method.

Performance Issues for Range, List, Hash, and Composite Partitioning

This section describes performance issues for:

  • Range partitioning
  • Hash partitioning
  • List partitioning
  • Composite range-hash partitioning
  • Composite range-list partitioning

When to Use Range Partitioning

Range partitioning is a convenient method for partitioning historical data. The boundaries of range partitions define the ordering of the partitions in the tables or indexes.

Range partitioning is usually used to organize data by time intervals on a column of type DATE. Thus, most SQL statements accessing range partitions focus on timeframes. An example of this is a SQL statement similar to “select data from a particular period in time.” In such a scenario, if each partition represents data for one month, the query “find data for December 1998” needs to access only the December 1998 partition. This reduces the amount of data scanned to a fraction of the total data available, an optimization method called partition pruning.

Range partitioning is also ideal when you periodically load new data and purge old data. It is easy to add or drop partitions.

It is common to keep a rolling window of data, for example keeping the past 36 months’ worth of data online. Range partitioning simplifies this process. To add data from a new month, you load it into a separate table, clean it, index it, and then add it to the range-partitioned table using the EXCHANGE PARTITION statement, all while the original table remains online. Once you add the new partition, you can drop the trailing month with the DROP PARTITION statement. The alternative to using the DROP PARTITION statement can be to archive the partition and make it read only, but this works only when your partitions are in separate tablespaces.
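
The following is a minimal sketch of one monthly cycle, assuming a range-partitioned table sales and a prepared staging table sales_jan2001_stage; all names are hypothetical:

-- add an empty partition for the new month
ALTER TABLE sales ADD PARTITION sales_jan2001
  VALUES LESS THAN (TO_DATE('01-FEB-2001', 'DD-MON-YYYY'));

-- swap in the loaded, cleaned, and indexed staging table
ALTER TABLE sales EXCHANGE PARTITION sales_jan2001
  WITH TABLE sales_jan2001_stage INCLUDING INDEXES;

-- age out the trailing month
ALTER TABLE sales DROP PARTITION sales_jan1998;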

In conclusion, consider using range partitioning when:

  • Very large tables are frequently scanned by a range predicate on a good partitioning column, such as ORDER_DATE or PURCHASE_DATE. Partitioning the table on that column enables partition pruning.
  • You want to maintain a rolling window of data.
  • You cannot complete administrative operations, such as backup and restore, on large tables in an allotted time frame, but you can divide them into smaller logical pieces based on the partition range column.

The following example creates the table salestable for a period of two years, 1999 and 2000, and partitions it by range according to the column s_saledate to separate the data into eight quarters, each corresponding to a partition.

CREATE TABLE salestable
  (s_productid  NUMBER,
   s_saledate   DATE,
   s_custid     NUMBER,
   s_totalprice NUMBER)
PARTITION BY RANGE(s_saledate)
 (PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
  PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
  PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
  PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')),
  PARTITION sal00q1 VALUES LESS THAN (TO_DATE('01-APR-2000', 'DD-MON-YYYY')),
  PARTITION sal00q2 VALUES LESS THAN (TO_DATE('01-JUL-2000', 'DD-MON-YYYY')),
  PARTITION sal00q3 VALUES LESS THAN (TO_DATE('01-OCT-2000', 'DD-MON-YYYY')),
  PARTITION sal00q4 VALUES LESS THAN (TO_DATE('01-JAN-2001', 'DD-MON-YYYY')));

When to Use Hash Partitioning

The way Oracle Database distributes data in hash partitions does not correspond to a business or a logical view of the data, as it does in range partitioning. Consequently, hash partitioning is not an effective way to manage historical data. However, hash partitions share some performance characteristics with range partitions. For example, partition pruning is limited to equality predicates. You can also use partition-wise joins, parallel index access, and parallel DML.

As a general rule, use hash partitioning for the following purposes:

  • To improve the availability and manageability of large tables.
  • To avoid data skew among partitions. Hash partitioning is an effective means of distributing data because Oracle hashes the data into a number of partitions, each of which can reside on a separate device. Thus, data is evenly spread over a sufficient number of devices to maximize I/O throughput. Similarly, you can use hash partitioning to evenly distribute data among the nodes of an MPP platform that uses Oracle Real Application Clusters.
  • If it is important to use partition pruning and partition-wise joins according to a partitioning key that is mostly constrained by a distinct value or value list.

    Note:

    In hash partitioning, partition pruning uses only equality or IN-list predicates.

If you add or coalesce a hashed partition, Oracle automatically rearranges the rows to reflect the change in the number of partitions and subpartitions. The hash function that Oracle uses is especially designed to limit the cost of this reorganization. Instead of reshuffling all the rows in the table, Oracle uses an “add partition” logic that splits one and only one of the existing hashed partitions. Conversely, Oracle coalesces a partition by merging two existing hashed partitions.
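
A minimal sketch of both operations on the sales_hash table created earlier:

-- splits exactly one existing hashed partition into two
ALTER TABLE sales_hash ADD PARTITION;

-- merges two existing hashed partitions back into one
ALTER TABLE sales_hash COALESCE PARTITION;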

Although the hash function’s use of “add partition” logic dramatically improves the manageability of hash partitioned tables, it means that the hash function can cause a skew if the number of partitions of a hash partitioned table, or the number of subpartitions in each partition of a composite table, is not a power of two. In the worst case, the largest partition can be twice the size of the smallest. So for optimal performance, create a number of partitions and subpartitions for each partition that is a power of two. For example, 2, 4, 8, 16, 32, 64, 128, and so on.

The following example creates four hashed partitions for the table sales_hash using the column s_productid as the partition key:

CREATE TABLE sales_hash
  (s_productid  NUMBER,
   s_saledate   DATE,
   s_custid     NUMBER,
   s_totalprice NUMBER)
PARTITION BY HASH(s_productid)
PARTITIONS 4;

Specify partition names if you want to choose the names of the partitions. Otherwise, Oracle automatically generates internal names for the partitions. Also, you can use the STORE IN clause to assign hash partitions to tablespaces in a round-robin manner.
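
For example, a sketch of the same table spread over four tablespaces with STORE IN; the tablespace names are hypothetical:

CREATE TABLE sales_hash
  (s_productid  NUMBER,
   s_saledate   DATE,
   s_custid     NUMBER,
   s_totalprice NUMBER)
PARTITION BY HASH(s_productid)
PARTITIONS 4
STORE IN (tbs1, tbs2, tbs3, tbs4);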

When to Use List Partitioning

You should use list partitioning when you want to specifically map rows to partitions based on discrete values.

Unlike range and hash partitioning, multi-column partition keys are not supported for list partitioning. If a table is partitioned by list, the partitioning key can only consist of a single column of the table.

When to Use Composite Range-Hash Partitioning

Composite range-hash partitioning offers the benefits of both range and hash partitioning. With composite range-hash partitioning, Oracle first partitions by range. Then, within each range, Oracle creates subpartitions and distributes data within them using the same hashing algorithm it uses for hash partitioned tables.

Data placed in composite partitions is logically ordered only by the boundaries that define the range level partitions. The partitioning of data within each partition has no logical organization beyond the identity of the partition to which the subpartitions belong.

Consequently, tables and local indexes partitioned using the composite range-hash method:

  • Support historical data at the partition level.
  • Support the use of subpartitions as units of parallelism for parallel operations such as PDML or space management and backup and recovery.
  • Are eligible for partition pruning and partition-wise joins on the range and hash partitions.

Using Composite Range-Hash Partitioning

Use the composite range-hash partitioning method for tables and local indexes if:

  • Partitions must have a logical meaning to efficiently support historical data
  • The contents of a partition can be spread across multiple tablespaces, devices, or nodes (of an MPP system)
  • You require both partition pruning and partition-wise joins even when the pruning and join predicates use different columns of the partitioned table
  • You require a degree of parallelism that is greater than the number of partitions for backup, recovery, and parallel operations

Most large tables in a data warehouse should use range partitioning. Composite partitioning should be used for very large tables or for data warehouses with a well-defined need for these conditions. When using the composite method, Oracle stores each subpartition on a different segment. Thus, the subpartitions may have properties that differ from the properties of the table or from the partition to which the subpartitions belong.

The following example partitions the table sales_range_hash by range on the column s_saledate to create four partitions that order data by time. Then, within each range partition, the data is further subdivided into eight subpartitions by hash on the column s_productid:

CREATE TABLE sales_range_hash(
  s_productid  NUMBER,
  s_saledate   DATE,
  s_custid     NUMBER,
  s_totalprice NUMBER)
PARTITION BY RANGE (s_saledate)
SUBPARTITION BY HASH (s_productid) SUBPARTITIONS 8
 (PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
  PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
  PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
  PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')));

Each hashed subpartition contains sales data for a single quarter, distributed by hash of the product identifier. The total number of subpartitions is 4 × 8, or 32.

In addition to this syntax, you can create subpartitions by using a subpartition template. This simplifies naming and gives control over the tablespace placement of subpartitions. The following statement illustrates this:

CREATE TABLE sales_range_hash(
  s_productid  NUMBER,
  s_saledate   DATE,
  s_custid     NUMBER,
  s_totalprice NUMBER)
PARTITION BY RANGE (s_saledate)
SUBPARTITION BY HASH (s_productid)
SUBPARTITION TEMPLATE(
SUBPARTITION sp1 TABLESPACE tbs1,
SUBPARTITION sp2 TABLESPACE tbs2,
SUBPARTITION sp3 TABLESPACE tbs3,
SUBPARTITION sp4 TABLESPACE tbs4,
SUBPARTITION sp5 TABLESPACE tbs5,
SUBPARTITION sp6 TABLESPACE tbs6,
SUBPARTITION sp7 TABLESPACE tbs7,
SUBPARTITION sp8 TABLESPACE tbs8)
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
   PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
   PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
   PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')));

In this example, every partition has the same number of subpartitions. A sample mapping for sal99q1 is illustrated in Table 5-1. Similar mappings exist for sal99q2 through sal99q4.

Table 5-1 Subpartition Mapping

Subpartition    Tablespace
sal99q1_sp1     tbs1
sal99q1_sp2     tbs2
sal99q1_sp3     tbs3
sal99q1_sp4     tbs4
sal99q1_sp5     tbs5
sal99q1_sp6     tbs6
sal99q1_sp7     tbs7
sal99q1_sp8     tbs8

When to Use Composite Range-List Partitioning

Composite range-list partitioning offers the benefits of both range and list partitioning. With composite range-list partitioning, Oracle first partitions by range. Then, within each range, Oracle creates subpartitions and distributes data within them to organize sets of data in a natural way as assigned by the list.

Data placed in composite partitions is logically ordered only by the boundaries that define the range level partitions.

Using Composite Range-List Partitioning

Use the composite range-list partitioning method for tables and local indexes if:

  • Subpartitions have a logical grouping defined by the user.
  • The contents of a partition can be spread across multiple tablespaces, devices, or nodes (of an MPP system).
  • You require both partition pruning and partition-wise joins even when the pruning and join predicates use different columns of the partitioned table.
  • You require a degree of parallelism that is greater than the number of partitions for backup, recovery, and parallel operations.

Most large tables in a data warehouse should use range partitioning. Composite partitioning should be used for very large tables or for data warehouses with a well-defined need for these conditions. When using the composite method, Oracle stores each subpartition on a different segment. Thus, the subpartitions may have properties that differ from the properties of the table or from the partition to which the subpartitions belong.

This statement creates a table quarterly_regional_sales that is range partitioned on the txn_date field and list subpartitioned on state.

CREATE TABLE quarterly_regional_sales
(deptno NUMBER, item_no VARCHAR2(20),
 txn_date DATE, txn_amount NUMBER, state VARCHAR2(2))
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
(
PARTITION q1_1999 VALUES LESS THAN(TO_DATE('1-APR-1999','DD-MON-YYYY'))
(SUBPARTITION q1_1999_northwest VALUES ('OR', 'WA'), 
SUBPARTITION q1_1999_southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION q1_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q1_1999_southeast VALUES  ('FL', 'GA'),
SUBPARTITION q1_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q1_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q2_1999 VALUES LESS THAN(TO_DATE('1-JUL-1999','DD-MON-YYYY'))
(SUBPARTITION q2_1999_northwest VALUES ('OR', 'WA'),
SUBPARTITION q2_1999_southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION q2_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q2_1999_southeast VALUES ('FL', 'GA'),
SUBPARTITION q2_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q2_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q3_1999 VALUES LESS THAN (TO_DATE('1-OCT-1999','DD-MON-YYYY'))
(SUBPARTITION q3_1999_northwest VALUES ('OR', 'WA'),
SUBPARTITION q3_1999_southwest VALUES  ('AZ', 'UT', 'NM'),
SUBPARTITION q3_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q3_1999_southeast VALUES ('FL', 'GA'),
SUBPARTITION q3_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q3_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q4_1999 VALUES LESS THAN (TO_DATE('1-JAN-2000','DD-MON-YYYY'))
(SUBPARTITION q4_1999_northwest VALUES('OR', 'WA'),
SUBPARTITION q4_1999_southwest VALUES('AZ', 'UT', 'NM'),
SUBPARTITION q4_1999_northeast VALUES('NY', 'VT', 'NJ'),
SUBPARTITION q4_1999_southeast VALUES('FL', 'GA'),
SUBPARTITION q4_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q4_1999_southcentral VALUES ('OK', 'TX')));

You can create subpartitions in a composite partitioned table using a subpartition template. A subpartition template simplifies the specification of subpartitions by not requiring that a subpartition descriptor be specified for every partition in the table. Instead, you describe subpartitions only once in a template, then apply that subpartition template to every partition in the table. The following statement illustrates an example where you can choose the subpartition name and tablespace locations:

CREATE TABLE quarterly_regional_sales
(deptno NUMBER,  item_no VARCHAR2(20),
 txn_date DATE,  txn_amount NUMBER,  state VARCHAR2(2))
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE(
SUBPARTITION northwest VALUES ('OR', 'WA') TABLESPACE ts1,
SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM') TABLESPACE ts2,
SUBPARTITION northeast VALUES ('NY', 'VT', 'NJ') TABLESPACE ts3,
SUBPARTITION southeast VALUES ('FL', 'GA') TABLESPACE ts4,
SUBPARTITION northcentral VALUES ('SD', 'WI') TABLESPACE ts5,
SUBPARTITION southcentral VALUES ('OK', 'TX') TABLESPACE ts6)
(
PARTITION q1_1999 VALUES LESS THAN(TO_DATE('1-APR-1999','DD-MON-YYYY')),
PARTITION q2_1999 VALUES LESS THAN(TO_DATE('1-JUL-1999','DD-MON-YYYY')),
PARTITION q3_1999 VALUES LESS THAN(TO_DATE('1-OCT-1999','DD-MON-YYYY')),
PARTITION q4_1999 VALUES LESS THAN(TO_DATE('1-JAN-2000','DD-MON-YYYY')));

Partitioning and Table Compression

You can compress several partitions or a complete partitioned heap-organized table. You do this either by defining a complete partitioned table as being compressed, or by defining compression at the partition level. Partitions without a specific declaration inherit the attribute from the table definition or, if nothing is specified at the table level, from the tablespace definition.
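
A minimal sketch of per-partition control, using hypothetical names; the table-level COMPRESS is inherited by q1 and overridden for q2:

CREATE TABLE sales_compressed
  (s_productid NUMBER,
   s_saledate  DATE)
COMPRESS
PARTITION BY RANGE (s_saledate)
 (PARTITION q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
  PARTITION q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')) NOCOMPRESS);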

The decision about whether a partition should be compressed or stay uncompressed adheres to the same rules as for a nonpartitioned table. However, because range and composite partitioning can separate data logically into distinct partitions, such a partitioned table is an ideal candidate for compressing the parts of the data (partitions) that are mainly read-only. It is, for example, beneficial in all rolling window operations as a kind of intermediate stage before aging out old data. With data segment compression, you can keep more old data online, minimizing the burden of additional storage consumption.

You can also change any existing uncompressed table partition later on, add new compressed and uncompressed partitions, or change the compression attribute as part of any partition maintenance operation that requires data movement, such as MERGE PARTITION, SPLIT PARTITION, or MOVE PARTITION. The partitions can contain data or can be empty.

The access and maintenance of a partially or fully compressed partitioned table are the same as for a fully uncompressed partitioned table. Everything that applies to fully uncompressed partitioned tables is also valid for partially or fully compressed partitioned tables.

Table Compression and Bitmap Indexes

If you want to use table compression on partitioned tables with bitmap indexes, you need to do the following before you introduce the compression attribute for the first time:

  1. Mark bitmap indexes unusable.
  2. Set the compression attribute.
  3. Rebuild the indexes.
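
A minimal sketch of these three steps, assuming a hypothetical local bitmap index sales_prod_bix on a partitioned table sales; setting the default attributes is one way to introduce the compression attribute:

-- 1. mark the bitmap index unusable
ALTER INDEX sales_prod_bix UNUSABLE;

-- 2. introduce the compression attribute
ALTER TABLE sales MODIFY DEFAULT ATTRIBUTES COMPRESS;

-- 3. rebuild the index, partition by partition (repeat for each partition)
ALTER INDEX sales_prod_bix REBUILD PARTITION sales_q1_1998;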

The first time you make a compressed partition part of an already existing, fully uncompressed partitioned table, you must either drop all existing bitmap indexes or mark them UNUSABLE prior to adding a compressed partition. This must be done irrespective of whether any partition contains any data. It is also independent of the operation that causes one or more compressed partitions to become part of the table. This does not apply to a partitioned table having B-tree indexes only.

This rebuilding of the bitmap index structures is necessary to accommodate the potentially higher number of rows stored for each data block with table compression enabled and must be done only for the first time. All subsequent operations, whether they affect compressed or uncompressed partitions, or change the compression attribute, behave identically for uncompressed, partially compressed, or fully compressed partitioned tables.

To avoid the re-creation of any bitmap index structure, Oracle recommends creating every partitioned table with at least one compressed partition whenever you plan to partially or fully compress the partitioned table in the future. This compressed partition can stay empty or can even be dropped after the partitioned table is created.

Having a partitioned table with compressed partitions can lead to slightly larger bitmap index structures for the uncompressed partitions. The bitmap index structures for the compressed partitions, however, are in most cases smaller than the corresponding bitmap index structures before table compression. This depends highly on the achieved compression ratios.

Note:

Oracle Database will raise an error if compression is introduced to an object for the first time and there are usable bitmap index segments.

Example of Table Compression and Partitioning

The following statement moves and compresses an already existing partition sales_q1_1998 of table sales:

ALTER TABLE sales
MOVE PARTITION sales_q1_1998 TABLESPACE ts_arch_q1_1998 COMPRESS;

If you use the MOVE statement, the local indexes for partition sales_q1_1998 become unusable. You have to rebuild them afterward, as follows:

ALTER TABLE sales
MODIFY PARTITION sales_q1_1998 REBUILD UNUSABLE LOCAL INDEXES;

The following statement merges two existing partitions into a new, compressed partition residing in a separate tablespace. The UPDATE INDEXES clause maintains the local indexes during the merge:

ALTER TABLE sales MERGE PARTITIONS sales_q1_1998, sales_q2_1998 
INTO PARTITION sales_1_1998 TABLESPACE ts_arch_1_1998 
COMPRESS UPDATE INDEXES;

Partition Pruning

Partition pruning is an essential performance feature for data warehouses. In partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. This enables Oracle Database to perform operations only on those partitions that are relevant to the SQL statement. Oracle Database prunes partitions when you use range, LIKE, equality, and IN-list predicates on the range or list partitioning columns, and when you use equality and IN-list predicates on the hash partitioning columns.

Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and resource utilization. If you partition the index and table on different columns (with a global partitioned index), partition pruning also eliminates index partitions even when the partitions of the underlying table cannot be eliminated.

Depending upon the actual SQL statement, Oracle Database may use static or dynamic pruning. Static pruning occurs at compile time, with the information about the partitions accessed known beforehand, while dynamic pruning occurs at run time, meaning that the exact partitions to be accessed by a statement are not known beforehand. A sample scenario for static pruning is a SQL statement containing a WHERE condition with a constant literal on the partition key column. An example of dynamic pruning is the use of operators or functions in the WHERE condition.

Partition pruning affects the statistics of the objects where pruning will occur and will therefore also affect the execution plan of a statement.

Information that can be Used for Partition Pruning

Oracle Database prunes partitions when you use range, LIKE, equality, and IN-list predicates on the range or list partitioning columns, and when you use equality and IN-list predicates on the hash partitioning columns.

On composite partitioned objects, Oracle can prune at both the range partition level and at the hash or list subpartition level using the relevant predicates. Examine the table sales_range_hash created earlier, which is partitioned by range on the column s_saledate and subpartitioned by hash on the column s_productid, and consider the following example:

SELECT * FROM sales_range_hash
WHERE s_saledate BETWEEN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY'))
  AND (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')) AND s_productid = 1200;

Oracle uses the predicate on the partitioning columns to perform partition pruning as follows:

  • When using range partitioning, Oracle accesses only partitions sal99q3 and sal99q4, representing the partitions for the third and fourth quarters of 1999.
  • When using hash subpartitioning, Oracle accesses only the one subpartition in each partition that stores the rows with s_productid = 1200. The mapping between the subpartition and the predicate is calculated based on Oracle’s internal hash distribution function.

How to Identify Whether Partition Pruning has been Used

Whether Oracle uses partition pruning or not is reflected in the execution plan of a statement, either in the plan table for the EXPLAIN PLAN statement or in the shared SQL area.

The partition pruning information is reflected in the plan columns PARTITION_START and PARTITION_STOP. In the case of serial statements, the pruning information is also reflected in the OPERATION and OPTIONS columns.
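
For example, a quick way to inspect these columns after an EXPLAIN PLAN (a sketch, using the sales table from the examples below):

EXPLAIN PLAN FOR
  SELECT * FROM sales
  WHERE time_id = TO_DATE('01-JAN-2001', 'DD-MON-YYYY');

SELECT operation, options, partition_start, partition_stop
FROM plan_table;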

Static Partition Pruning

For a number of cases, Oracle determines the partitions to be accessed at compile time. This happens when the predicates on the partitioning columns use equality or range predicates. In addition, the predicates must use only constants so that Oracle can determine the start and stop partition numbers at compile time. When this happens, the actual partition numbers show up in the partition start and partition stop columns of the explain plan. The following is an example:

SQL> explain plan for select * from sales where time_id = to_date('01-jan-2001', 'dd-mon-yyyy');
Explained.

SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 3971874201
----------------------------------------------------------------------------------------------
| Id | Operation              | Name  | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |       | 673  | 19517 | 27      (8)| 00:00:01 |       |       |
|  1 |  PARTITION RANGE SINGLE|       | 673  | 19517 | 27      (8)| 00:00:01 | 17    | 17    |
|* 2 |   TABLE ACCESS FULL    | SALES | 673  | 19517 | 27      (8)| 00:00:01 | 17    | 17    |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 
   2 - filter("TIME_ID"=TO_DATE('2001-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

This plan shows that Oracle accesses partition number 17, as shown in the PSTART and PSTOP columns.

Dynamic Partition Pruning

Oracle Database converts the pruning predicates into a partition number at compile time whenever it can. However, there are a number of cases when this is not possible. For example, if a predicate on a partitioning column involves a bind variable, Oracle cannot determine the partition numbers at compile time. Even if bind variables are not used, for certain classes of predicates such as IN-list predicates, Oracle accesses a list of partitions rather than a contiguous range. In all these cases, the explain plan shows KEY in the partition start and stop columns. There is also some information about the type of predicate used. For example, if an IN-list predicate was used, the plan output shows KEY(I) in the partition start and stop columns. For example, consider the following:

SQL> explain plan for select * from sales s where time_id in ( :a, :b, :c, :d);
Explained.

SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 513834092
---------------------------------------------------------------------------------------------------
| Id | Operation                         |    Name |Rows|Bytes|Cost (%CPU)|  Time  | Pstart| Pstop|
---------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                  |         |2517|72993|    292 (0)|00:00:04|       |      |
|  1 |  INLIST ITERATOR                  |         |    |     |           |        |       |      |
|  2 |   PARTITION RANGE ITERATOR        |         |2517|72993|    292 (0)|00:00:04|KEY(I) |KEY(I)|
|  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| SALES |2517|72993|    292 (0)|00:00:04|KEY(I) |KEY(I)|
|  4 |     BITMAP CONVERSION TO ROWIDS   |         |    |     |           |        |       |      |
|* 5 |      BITMAP INDEX SINGLE VALUE    |SALES_TIME_BIX| |   |           |        |KEY(I) |KEY(I)|
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TIME_ID"=:A OR "TIME_ID"=:B OR "TIME_ID"=:C OR "TIME_ID"=:D)

17 rows selected.

For parallel plans only, the partition start and stop columns contain the partition pruning information; the operation column will contain information for the parallel operation, as shown in the following example:

SQL> explain plan for select * from sales where time_id in ( :a, :b, :c, :d);
Explained.

SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 4058105390
-------------------------------------------------------------------------------------------------
| Id| Operation          | Name  |Rows|Bytes|Cost(%CP|  Time  |Pstart| Pstop|  TQ |INOUT| PQ Dis|
-------------------------------------------------------------------------------------------------
|  0| SELECT STATEMENT   |       |2517|72993|  75(36)|00:00:01|      |      |     |     |       |
|  1|  PX COORDINATOR    |       |    |     |        |        |      |      |     |     |       |
|  2|  PX SEND QC(RANDOM)|:TQ10000|2517|72993| 75(36)|00:00:01|      |      |Q1,00| P->S|QC(RAND|
|  3|   PX BLOCK ITERATOR|       |2517|72993|  75(36)|00:00:01|KEY(I)|KEY(I)|Q1,00| PCWC|       |
|* 4|   TABLE ACCESS FULL| SALES |2517|72993|  75(36)|00:00:01|KEY(I)|KEY(I)|Q1,00| PCWP|       |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 
  4 - filter("TIME_ID"=:A OR "TIME_ID"=:B OR "TIME_ID"=:C OR "TIME_ID"=:D)

16 rows selected.

Basic Partition Pruning Techniques

The optimizer uses a wide variety of predicates for pruning. The three predicate types equality, range, and IN-list are the most commonly used cases of partition pruning. As an example, consider the following query:

SELECT SUM(amount_sold) day_sales
FROM sales
WHERE time_id = TO_DATE('02-JAN-1998', 'DD-MON-YYYY');

Because there is an equality predicate on the partitioning column of sales, the query prunes down to a single partition, and this is reflected in the explain plan, as shown in the following:

-----------------------------------------------------------------------------------------------
|  Id | Operation                | Name  | Rows| Bytes | Cost (%CPU)| Time     |Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     |       | 21 (100)   |          |      |       |
|   1 |  SORT AGGREGATE          |       | 1   | 13    |            |          |      |       |
|   2 |   PARTITION RANGE SINGLE |       | 485 | 6305  | 21 (10)    | 00:00:01 | 5    | 5     |
| * 3 |    TABLE ACCESS FULL     | SALES | 485 | 6305  | 21 (10)    | 00:00:01 | 5    | 5     |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 
  3 - filter("TIME_ID"=TO_DATE('1998-01-02 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Similarly, a range or an IN-list predicate on the time_id column would allow the optimizer to prune to a set of partitions.

The partitioning type plays a role in which predicates can be used for pruning. Range predicates cannot be used for pruning on hash partitioned tables, while they can be used for all other partitioning strategies. On list partitioned tables, however, range predicates do not map to a contiguous range of partitions. Equality and IN-list predicates can be used to prune with all the partitioning methods.

Advanced Partition Pruning Techniques

Oracle also prunes in the presence of more complex predicates or SQL statements involving partitioned tables. A common situation is when a partitioned table is joined to a subset of another table, limited by a WHERE condition. For example, consider the following query:

SELECT t.day_number_in_month, SUM(s.amount_sold)
FROM sales s, times t
WHERE s.time_id = t.time_id
  AND t.calendar_month_desc='2000-12'
GROUP BY t.day_number_in_month;

If a nested loop join were performed with times as the driving table, then for each qualifying row of times only the corresponding partition of sales would be accessed. But if a hash or sort merge join were performed, this would not be possible. If the table with the WHERE predicate is relatively small compared to the partitioned table, and the expected reduction of records or partitions for the partitioned table is significant, Oracle performs dynamic partition pruning using a recursive subquery. The decision whether or not to invoke subquery pruning is an internal cost-based decision of the optimizer.

A sample plan using a hash join operation would look like the following:

--------------------------------------------------------------------------------------------------
| Id| Operation                    |  Name |  Rows | Bytes| Cost (%CPU)|  Time  | Pstart | Pstop |
--------------------------------------------------------------------------------------------------
|  0| SELECT STATEMENT             |       |       |      | 761 (100)  |        |        |       |
|  1|  HASH GROUP BY               |       |    20 | 640  | 761 (41)   |00:00:10|        |       |
|* 2|   HASH JOIN                  |       | 19153 | 598K | 749 (40)   |00:00:09|        |       |
|* 3|    TABLE ACCESS FULL         | TIMES |    30 |  570 |  17 (6)    |00:00:01|        |       |
|  4|    PARTITION RANGE SUBQUERY  |       |  918K | 11M  |   655 (33) |00:00:08| KEY(SQ)|KEY(SQ)|
|  5|     TABLE ACCESS FULL        | SALES |  918K | 11M  |   655 (33) |00:00:08| KEY(SQ)|KEY(SQ)|
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
  2 - access("S"."TIME_ID"="T"."TIME_ID") 
  3 - filter("T"."CALENDAR_MONTH_DESC"='2000-12')

This plan shows that dynamic partition pruning occurred on the sales table.

Another example of advanced pruning is the following, which uses an OR predicate:

SELECT p.promo_name promo_name, (s.profit - p.promo_cost) profit
FROM
   promotions p,
   (SELECT
      promo_id,
      SUM(sales.QUANTITY_SOLD * (costs.UNIT_PRICE - costs.UNIT_COST)) profit
   FROM
      sales, costs
   WHERE
      ((sales.time_id BETWEEN TO_DATE('01-JAN-1998','DD-MON-YYYY',
                  'NLS_DATE_LANGUAGE = American') AND
      TO_DATE('01-JAN-1999','DD-MON-YYYY', 'NLS_DATE_LANGUAGE = American'))
OR
      (sales.time_id BETWEEN TO_DATE('01-JAN-2001','DD-MON-YYYY',
                  'NLS_DATE_LANGUAGE = American') AND
      TO_DATE('01-JAN-2002','DD-MON-YYYY', 'NLS_DATE_LANGUAGE = American')))
      AND sales.time_id = costs.time_id
      AND sales.prod_id = costs.prod_id
   GROUP BY
      promo_id) s
WHERE s.promo_id = p.promo_id
ORDER BY profit DESC;

This query joins the sales and costs tables in the sh sample schema. The sales table is partitioned by range on the column time_id. One of the conditions in the query is a pair of predicates on time_id, combined with an OR operator. This OR predicate is used to prune the partitions in the sales table, and a single join between the sales and costs tables is performed. The plan is as follows:

--------------------------------------------------------------------------------------------------
| Id| Operation               |  Name    |Rows |Bytes |TmpSp|Cost(%CPU)| Time    | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|  0| SELECT STATEMENT        |          | 4   |  200 |     | 3556 (14)| 00:00:43|       |       |
|  1|  SORT ORDER BY          |          | 4   |  200 |     | 3556 (14)| 00:00:43|       |       |
|* 2|   HASH JOIN             |          | 4   |  200 |     | 3555 (14)| 00:00:43|       |       |
|  3|    TABLE ACCESS FULL    |PROMOTIONS| 503 | 16599|     |    16 (0)| 00:00:01|       |       |
|  4|    VIEW                 |          |   4 |   68 |     | 3538 (14)| 00:00:43|       |       |
|  5|     HASH GROUP BY       |          |   4 |  164 |     | 3538 (14)| 00:00:43|       |       |
|  6|      PARTITION RANGE OR |          | 314K|   12M|     |  3321 (9)| 00:00:40|KEY(OR)|KEY(OR)|
|* 7|       HASH JOIN         |          | 314K|   12M| 440K|  3321 (9)| 00:00:40|       |       |
|* 8|        TABLE ACCESS FULL| SALES    | 402K| 7467K|     |  400 (39)| 00:00:05|KEY(OR)|KEY(OR)|
|  9|        TABLE ACCESS FULL| COSTS    |82112| 1764K|     |   77 (24)| 00:00:01|KEY(OR)|KEY(OR)|
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 
  2 - access("S"."PROMO_ID"="P"."PROMO_ID") 
  7 - access("SALES"."TIME_ID"="COSTS"."TIME_ID" AND "SALES"."PROD_ID"="COSTS"."PROD_ID") 
  8 - filter("SALES"."TIME_ID"<=TO_DATE('1999-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND 
      "SALES"."TIME_ID">=TO_DATE('1998-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') OR 
      "SALES"."TIME_ID">=TO_DATE('2001-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND 
      "SALES"."TIME_ID"<=TO_DATE('2002-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Oracle also performs additional pruning when a table is range partitioned on multiple columns. As long as Oracle can guarantee that a particular predicate cannot be satisfied in a particular partition, the partition is skipped. This allows Oracle to optimize cases where there are range predicates on more than one column, or where there are no predicates on a prefix of the partitioning columns.

Partition Pruning Tips

When using partition pruning, you should consider the following: how DATE columns are handled, and which SQL constructs prevent pruning.

Partition Pruning Using DATE Columns

A common mistake occurs when using the Oracle DATE datatype. An Oracle DATE datatype is not a character string; it is only represented as one when querying the database, and the format of the representation is defined by the NLS settings of the instance or the session. Consequently, the same reverse conversion has to happen when inserting data into a DATE field or when specifying a predicate on such a field.

A conversion can happen either implicitly or explicitly by specifying a TO_DATE conversion. Only a properly applied TO_DATE function guarantees that Oracle can uniquely determine the date value and potentially use it for static pruning, which is especially beneficial for single partition access.

Rewriting the example from “Basic Partition Pruning Techniques” slightly, using the two-digit date format RR, causes the predicate to become nondeterministic, thus causing dynamic pruning:

SELECT SUM(amount_sold) day_sales
FROM sales
WHERE time_id = '02-JAN-98';

The plan now should be similar to the following:

----------------------------------------------------------------------------------------------
| Id | Operation             | Name  | Rows| Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT      |       |     |       | 31 (100)    |          |        |       |
|  1 |  SORT AGGREGATE       |       | 1   |    13 |             |          |        |       |
|  2 |   PARTITION RANGE SINGLE|     | 629 |  8177 | 31 (46)     | 00:00:01 |    KEY |   KEY |
|* 3 |    TABLE ACCESS FULL  | SALES | 629 |  8177 | 31 (46)     | 00:00:01 |    KEY |   KEY |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 
  3 - filter("TIME_ID"='02-JAN-98')

The keyword KEY for both columns means that partition pruning occurs at run time.
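
By contrast, the fully specified TO_DATE version of this query from “Basic Partition Pruning Techniques” lets Oracle determine the partition at compile time and prune statically:

SELECT SUM(amount_sold) day_sales
FROM sales
WHERE time_id = TO_DATE('02-JAN-1998', 'DD-MON-YYYY');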

SQL Constructs to Avoid

There are several cases when the optimizer cannot perform any pruning. One of the most common reasons is when an operator is on top of a partitioning column. This could be an explicit operator (for example, a function) or even an implicit operator introduced by Oracle as part of the necessary data type conversion for executing the statement. For example, consider the following query:

EXPLAIN PLAN FOR
SELECT SUM(quantity_sold)
FROM sales
WHERE time_id = TO_TIMESTAMP('1-jan-2000', 'dd-mon-yyyy');

Because time_id is of type DATE and Oracle needs to promote it to the TIMESTAMP type to get the same datatype, this predicate is internally rewritten as:

TO_TIMESTAMP(time_id) = TO_TIMESTAMP('1-jan-2000', 'dd-mon-yyyy')

The explain plan for this statement is as follows:

--------------------------------------------------------------------------------------------
|Id | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT     |       |     1 |    11 |     6  (17)| 00:00:01 |       |       |
| 1 |  SORT AGGREGATE      |       |     1 |    11 |            |          |       |       |
| 2 |   PARTITION RANGE ALL|       |    10 |   110 |     6  (17)| 00:00:01 |     1 |    16 |
|*3 |    TABLE ACCESS FULL | SALES |    10 |   110 |     6  (17)| 00:00:01 |     1 |    16 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(INTERNAL_FUNCTION("TIME_ID")=TO_TIMESTAMP('1-jan-2000',:B1))

15 rows selected.
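
To get pruning back, compare the column with a value of its own datatype so that no implicit conversion is applied to time_id; a sketch:

EXPLAIN PLAN FOR
SELECT SUM(quantity_sold)
FROM sales
-- the predicate now stays on the DATE column, enabling single-partition pruning
WHERE time_id = TO_DATE('1-jan-2000', 'dd-mon-yyyy');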

Partition-Wise Joins

Partition-wise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time and improves the use of both CPU and memory resources. In Oracle Real Application Clusters (RAC) environments, partition-wise joins also avoid or at least limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations.

Partition-wise joins can be full or partial. Oracle decides which type of join to use.

Full Partition-Wise Joins

A full partition-wise join divides a large join into smaller joins between a pair of partitions from the two joined tables. To use this feature, you must equipartition both tables on their join keys. For example, consider a large join between a sales table and a customer table on the column customerid. The query “find the records of all customers who bought more than 100 articles in Quarter 3 of 1999” is a typical example of a SQL statement performing such a join. The following is an example of this:

SELECT c.cust_last_name, COUNT(*)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id AND 
s.time_id BETWEEN TO_DATE('01-JUL-1999', 'DD-MON-YYYY') AND 
     (TO_DATE('01-OCT-1999', 'DD-MON-YYYY'))
GROUP BY c.cust_last_name HAVING COUNT(*) > 100;

This large join is typical in data warehousing environments. The entire customer table is joined with one quarter of the sales data. In large data warehouse applications, this might mean joining millions of rows. The join method to use in that case is obviously a hash join. You can reduce the processing time for this hash join even more if both tables are equipartitioned on the customerid column. This enables a full partition-wise join.

When you execute a full partition-wise join in parallel, the granule of parallelism is a partition. As a result, the degree of parallelism is limited to the number of partitions. For example, you require at least 16 partitions to set the degree of parallelism of the query to 16.

You can use various partitioning methods to equipartition both tables on the column customerid with 16 partitions. These methods are described in these subsections.

Full Partition-Wise Joins: Hash-Hash

This is the simplest method: the customers and sales tables are both partitioned by hash into 16 partitions, on the c_customerid and s_customerid columns respectively. This partitioning method enables a full partition-wise join when the tables are joined on c_customerid and s_customerid, both representing the same customer identification number. Because you are using the same hash function to distribute the same information (customer ID) into the same number of hash partitions, you can join the equivalent partitions. They store the same values.
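
A minimal sketch of such equipartitioning; the table and column definitions are hypothetical and abbreviated:

CREATE TABLE customers
  (c_customerid NUMBER,
   c_name       VARCHAR2(30))
PARTITION BY HASH(c_customerid)
PARTITIONS 16;

CREATE TABLE sales
  (s_customerid NUMBER,
   s_saledate   DATE,
   s_amount     NUMBER)
PARTITION BY HASH(s_customerid)
PARTITIONS 16;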

In serial, this join is performed between pairs of matching hash partitions, one at a time. When one partition pair has been joined, the join of another partition pair begins. The join completes when the 16 partition pairs have been processed.

Note:

A pair of matching hash partitions is defined as one partition with the same partition number from each table. For example, with full partition-wise joins we join partition 0 of sales with partition 0 of customers, partition 1 of sales with partition 1 of customers, and so on.

Parallel execution of a full partition-wise join is a straightforward parallelization of the serial execution. Instead of joining one partition pair at a time, 16 partition pairs are joined in parallel by the 16 query servers. Figure 5-1 illustrates the parallel execution of a full partition-wise join.

Figure 5-1 Parallel Execution of a Full Partition-wise Join


In Figure 5-1, assume that the degree of parallelism and the number of partitions are the same, in other words, 16 for both. Defining more partitions than the degree of parallelism may improve load balancing and limit possible skew in the execution. If you have more partitions than query servers, when one query server completes the join of one pair of partitions, it requests that the query coordinator give it another pair to join. This process repeats until all pairs have been processed. This method enables the load to be balanced dynamically when the number of partition pairs is greater than the degree of parallelism, for example, 64 partitions with a degree of parallelism of 16.

Note:

To guarantee an equal work distribution, the number of partitions should always be a multiple of the degree of parallelism.

In Oracle Real Application Clusters environments running on shared-nothing or MPP platforms, placing partitions on nodes is critical to achieving good scalability. To avoid remote I/O, both matching partitions should have affinity to the same node. Partition pairs should be spread over all nodes to avoid bottlenecks and to use all CPU resources available on the system.

Nodes can host multiple pairs when there are more pairs than nodes. For example, with an 8-node system and 16 partition pairs, each node receives two pairs.

Full Partition-Wise Joins: (Composite-Hash)-Hash

This method is a variation of the hash-hash method. The sales table is a typical example of a table storing historical data. For all the reasons mentioned under “When to Use Range Partitioning”, range is the logical initial partitioning method.

For example, assume you want to partition the sales table into eight partitions by range on the column s_salesdate. Also assume you have two years and that each partition represents a quarter. Instead of using range partitioning alone, you can use composite partitioning to enable a full partition-wise join while preserving the partitioning on s_salesdate. Partition the sales table by range on s_salesdate and then subpartition each partition by hash on s_customerid using 16 subpartitions for each partition, for a total of 128 subpartitions. The customers table can still use hash partitioning with 16 partitions.
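
A sketch of this layout; column definitions are abbreviated and only the first and last quarterly partitions are written out:

CREATE TABLE sales
  (s_customerid NUMBER,
   s_salesdate  DATE,
   s_amount     NUMBER)
PARTITION BY RANGE (s_salesdate)
SUBPARTITION BY HASH (s_customerid) SUBPARTITIONS 16
 (PARTITION q1_1998 VALUES LESS THAN (TO_DATE('01-APR-1998', 'DD-MON-YYYY')),
  -- ... six more quarterly partitions ...
  PARTITION q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')));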

When you use the method just described, a full partition-wise join works similarly to the one created by the hash-hash method. The join is still divided into 16 smaller joins between hash partition pairs from both tables. The difference is that now each hash partition in the sales table is composed of a set of 8 subpartitions, one from each range partition.

Figure 5-2 illustrates how the hash partitions are formed in the salestable. Each cell represents a subpartition. Each row corresponds to one range partition, for a total of 8 range partitions. Each range partition has 16 subpartitions. Each column corresponds to one hash partition for a total of 16 hash partitions; each hash partition has 8 subpartitions. Note that hash partitions can be defined only if all partitions have the same number of subpartitions, in this case, 16.

Hash partitions are implicit in a composite table. However, Oracle does not record them in the data dictionary, and you cannot manipulate them with DDL commands as you can range partitions.

Figure 5-2 Range and Hash Partitions of a Composite Table


(Composite-Hash)-Hash partitioning is effective because it lets you combine pruning (on s_salesdate) with a full partition-wise join (on customerid). In the previous example query, pruning is achieved by scanning only the subpartitions corresponding to Q3 of 1999, in other words, row number 3 in Figure 5-2. Oracle then joins these subpartitions with the customers table, using a full partition-wise join.

All characteristics of the hash-hash partition-wise join apply to the composite-hash partition-wise join. In particular, for this example, these two points are common to both methods:

  • The degree of parallelism for this full partition-wise join cannot exceed 16. Even though the sales table has 128 subpartitions, it has only 16 hash partitions.
  • The rules for data placement on MPP systems apply here. The only difference is that a hash partition is now a collection of subpartitions. You must ensure that all these subpartitions are placed on the same node as the matching hash partition from the other table. For example, in Figure 5-2, store hash partition 9 of the sales table, shown by the eight circled subpartitions, on the same node as hash partition 9 of the customers table.

Full Partition-Wise Joins: (Composite-List)-List

The (Composite-List)-List method resembles that for (Composite-Hash)-Hash partition-wise joins.

Full Partition-Wise Joins: Composite-Composite (Hash/List Dimension)

If needed, you can also partition the customers table by the composite method. For example, you can partition it by range on a postal code column to enable pruning based on postal code. You then subpartition it by hash on customerid using the same number of partitions (16) to enable a partition-wise join on the hash dimension.

Full Partition-Wise Joins: Range-Range and List-List

You can also join range partitioned tables with range partitioned tables and list partitioned tables with list partitioned tables in a partition-wise manner, but this is relatively uncommon. This is more complex to implement because you must know the distribution of the data before performing the join. Furthermore, if you do not correctly identify the partition bounds so that you have partitions of equal size, data skew during the execution may result.

The basic principle for using range-range and list-list is the same as for using hash-hash: you must equipartition both tables. This means that the number of partitions must be the same and the partition bounds must be identical. For example, assume that you know in advance that you have 10 million customers, and that the values for customerid vary from 1 to 10,000,000. In other words, you have 10 million possible different values. To create 16 partitions, you can range partition both tables, sales on s_customerid and customers on c_customerid. You should define partition bounds for both tables in order to generate partitions of the same size. In this example, partition bounds should be defined as 625001, 1250001, 1875001, ... 10000001, so that each partition contains 625,000 rows. A scaled-down DDL sketch follows.
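The following sketch shows the equipartitioning idea with 4 partitions of 2,500,000 customer IDs each rather than the 16 partitions of the example; the column lists are illustrative assumptions, and a real definition would simply continue the same bound spacing out to 16 partitions:

CREATE TABLE customers
( c_customerid NUMBER,
  c_name       VARCHAR2(40) )
PARTITION BY RANGE (c_customerid)
(PARTITION c_p1 VALUES LESS THAN (2500001),
 PARTITION c_p2 VALUES LESS THAN (5000001),
 PARTITION c_p3 VALUES LESS THAN (7500001),
 PARTITION c_p4 VALUES LESS THAN (10000001));

CREATE TABLE sales
( s_customerid NUMBER,
  s_amount     NUMBER )
PARTITION BY RANGE (s_customerid)
(PARTITION s_p1 VALUES LESS THAN (2500001),
 PARTITION s_p2 VALUES LESS THAN (5000001),
 PARTITION s_p3 VALUES LESS THAN (7500001),
 PARTITION s_p4 VALUES LESS THAN (10000001));

Because both tables share the same number of partitions and identical bounds, a join on the customer ID columns can proceed partition pair by partition pair.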

Full Partition-Wise Joins: Range-Composite, Composite-Composite (Range Dimension)

Finally, you can also subpartition one or both tables on another column. Therefore, the range-composite and composite-composite methods on the range dimension are also valid for enabling a full partition-wise join on the range dimension.

Partial Partition-Wise Joins

Oracle can perform partial partition-wise joins only in parallel. Unlike full partition-wise joins, partial partition-wise joins require you to partition only one table on the join key, not both tables. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partition-wise joins are more common than full partition-wise joins.

To execute a partial partition-wise join, Oracle dynamically repartitions the other table based on the partitioning of the reference table. Once the other table is repartitioned, the execution is similar to a full partition-wise join.

The performance advantage that partial partition-wise joins have over joins in non-partitioned tables is that the reference table is not moved during the join operation. Parallel joins between non-partitioned tables require both input tables to be redistributed on the join key. This redistribution operation involves exchanging rows between parallel execution servers. This is a CPU-intensive operation that can lead to excessive interconnect traffic in Oracle Real Application Clusters environments. Partitioning large tables on a join key, either a foreign or primary key, prevents this redistribution every time the table is joined on that key. Of course, if you choose a foreign key to partition the table, which is the most common scenario, select a foreign key that is involved in many queries.

To illustrate partial partition-wise joins, consider the previous sales/customers example. Assume that sales is not partitioned, or is partitioned on a column other than s_customerid. Because sales is often joined with customers on customerid, and because this join dominates our application workload, partition sales on s_customerid to enable a partial partition-wise join every time customers and sales are joined. As in full partition-wise joins, you have several alternatives:

Partial Partition-Wise Joins: Hash-List

The simplest method to enable a partial partition-wise join is to partition sales by hash on s_customerid. The number of partitions determines the maximum degree of parallelism, because the partition is the smallest granule of parallelism for partial partition-wise join operations.
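A minimal sketch of this hash partitioning, assuming an illustrative column list:

CREATE TABLE sales
( s_customerid NUMBER,
  s_salesdate  DATE,
  s_amount     NUMBER )
PARTITION BY HASH (s_customerid) PARTITIONS 16;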

The parallel execution of a partial partition-wise join is illustrated in Figure 5-3, which assumes that both the degree of parallelism and the number of partitions of sales are 16. The execution involves two sets of query servers: one set, labeled set 1 in Figure 5-3, scans the customers table in parallel. The granule of parallelism for the scan operation is a range of blocks.

Rows from customers that are selected by the first set, in this case all rows, are redistributed to the second set of query servers by hashing customerid. For example, all rows in customers that could have matching rows in partition P1 of sales are sent to query server 1 in the second set. Rows received by the second set of query servers are joined with the rows from the corresponding partitions in sales. Query server number 1 in the second set joins all customers rows that it receives with partition P1 of sales.

Figure 5-3 Partial Partition-Wise Join

Description of Figure 5-3 follows
Description of “Figure 5-3 Partial Partition-Wise Join”

Considerations for full partition-wise joins also apply to partial partition-wise joins:

  • The degree of parallelism does not need to equal the number of partitions. In Figure 5-3, the query executes with two sets of 16 query servers. In this case, Oracle assigns 1 partition to each query server of the second set. Again, the number of partitions should always be a multiple of the degree of parallelism.
  • In Oracle Real Application Clusters environments on shared-nothing platforms (MPPs), each hash partition of sales should preferably have affinity to only one node in order to avoid remote I/Os. Also, spread partitions over all nodes to avoid bottlenecks and to use all CPU resources available on the system. A node can host multiple partitions when there are more partitions than nodes.

Partial Partition-Wise Joins: Composite

As with full partition-wise joins, the prime partitioning method for the sales table is to use the range method on column s_salesdate. This is because sales is a typical example of a table that stores historical data. To enable a partial partition-wise join while preserving this range partitioning, subpartition sales by hash on column s_customerid using 16 subpartitions for each partition. Pruning and partial partition-wise joins can be used together if a query joins customers and sales and if the query has a selection predicate on s_salesdate.

When sales is composite, the granule of parallelism for a partial partition-wise join is a hash partition and not a subpartition. Refer to Figure 5-2 for an illustration of a hash partition in a composite table. Again, the number of hash partitions should be a multiple of the degree of parallelism. Also, on an MPP system, ensure that each hash partition has affinity to a single node. In the previous example, the eight subpartitions composing a hash partition should have affinity to the same node.

Note:

This section is based on range-hash, but it also applies for range-list partial partition-wise joins.

Partial Partition-Wise Joins: Range

Finally, you can use range partitioning on s_customeridto enable a partial partition-wise join. This works similarly to the hash method, but a side effect of range partitioning is that the resulting data distribution could be skewed if the size of the partitions differs. Moreover, this method is more complex to implement because it requires prior knowledge of the values of the partitioning column that is also a join key.

Benefits of Partition-Wise Joins

Partition-wise joins offer benefits described in this section:

Reduction of Communications Overhead

When executed in parallel, partition-wise joins reduce communications overhead. This is because, in the default case, parallel execution of a join operation by a set of parallel execution servers requires the redistribution of each table on the join column into disjoint subsets of rows. These disjoint subsets of rows are then joined pair-wise by a single parallel execution server.

Oracle can avoid redistributing the partitions because the two tables are already partitioned on the join column. This enables each parallel execution server to join a pair of matching partitions.

This improved performance from using parallel execution is even more noticeable in Oracle Real Application Clusters configurations with internode parallel execution. Partition-wise joins dramatically reduce interconnect traffic. The use of this feature is key for large DSS configurations that use Oracle Real Application Clusters.

Currently, most Oracle Real Application Clusters platforms, such as MPP and SMP clusters, provide limited interconnect bandwidths compared with their processing powers. Ideally, interconnect bandwidth should be comparable to disk bandwidth, but this is seldom the case. As a result, most join operations in Oracle Real Application Clusters experience high interconnect latencies without parallel execution of partition-wise joins.

Reduction of Memory Requirements

Partition-wise joins require less memory than the equivalent join operation of the complete data set of the tables being joined.

In the case of serial joins, the join is performed at the same time on a pair of matching partitions. If data is evenly distributed across partitions, the memory requirement is divided by the number of partitions. There is no skew.

In the parallel case, memory requirements depend on the number of partition pairs that are joined in parallel. For example, if the degree of parallelism is 20 and the number of partitions is 100, 5 times less memory is required because only 20 joins of two partitions are performed at the same time. The fact that partition-wise joins require less memory has a direct effect on performance. For example, the join probably does not need to write blocks to disk during the build phase of a hash join.

Performance Considerations for Parallel Partition-Wise Joins

The optimizer weighs the advantages and disadvantages when deciding whether or not to use partition-wise joins.

  • In range partitioning where partition sizes differ, data skew increases response time; some parallel execution servers take longer than others to finish their joins. Oracle recommends the use of hash (sub)partitioning to enable partition-wise joins because hash partitioning, if the number of partitions is a power of two, limits the risk of skew.
  • The number of partitions used for partition-wise joins should, if possible, be a multiple of the number of query servers. With a degree of parallelism of 16, for example, you can have 16, 32, or even 64 partitions. If the number of partition pairs is not a multiple of the degree of parallelism, some parallel execution servers are used less than others. For example, if there are 17 evenly distributed partition pairs, only one pair will work on the last join, while the other servers will have to wait. This is because, in the beginning of the execution, each parallel execution server works on a different partition pair. At the end of this first phase, only one pair is left. Thus, a single parallel execution server joins this remaining pair while all other parallel execution servers are idle.
  • Sometimes, parallel joins can cause remote I/Os. For example, on Oracle Real Application Clusters environments running on MPP configurations, if a pair of matching partitions is not collocated on the same node, a partition-wise join requires extra internode communication due to remote I/O. This is because Oracle must transfer at least one partition to the node where the join is performed. In this case, it is better to explicitly redistribute the data than to use a partition-wise join.

Partition Maintenance

Maintaining partitions is one of the most important and time-consuming tasks when working with partitions.

Partitioning and Subpartitioning Columns and Keys

The partitioning columns (or subpartitioning columns) of a table or index consist of an ordered list of columns whose values determine how the data is partitioned or subpartitioned. This list can include up to 16 columns, and cannot include any of the following types of columns:

  • A LEVEL or ROWID pseudocolumn
  • A column of the ROWID datatype
  • A nested table, VARRAY, object type, or REF column
  • A LOB column (BLOB, CLOB, NCLOB, or BFILE datatype)

A row’s partitioning key is an ordered list of its values for the partitioning columns. Similarly, in composite partitioning a row’s subpartitioning key is an ordered list of its values for the subpartitioning columns. Oracle applies either the range, list, or hash method to each row’s partitioning key or subpartitioning key to determine which partition or subpartition the row belongs in.

Partition Bounds for Range Partitioning

In a range-partitioned table or index, the partitioning key of each row is compared with a set of upper and lower bounds to determine which partition the row belongs in:

  • Every partition of a range-partitioned table or index has a noninclusive upper bound, which is specified by the VALUES LESS THAN clause.
  • Every partition except the first partition also has an inclusive lower bound, which is specified by the VALUES LESS THAN clause on the next-lower partition.

The partition bounds collectively define an ordering of the partitions in a table or index. The first partition is the partition with the lowest VALUES LESS THAN clause, and the last or highest partition is the partition with the highest VALUES LESS THAN clause.

Comparing Partitioning Keys with Partition Bounds

If you attempt to insert a row into a table and the row’s partitioning key is greater than or equal to the partition bound for the highest partition in the table, the insert will fail.

When comparing character values in partitioning keys and partition bounds, characters are compared according to their binary values. However, if a character consists of more than one byte, Oracle compares the binary value of each byte, not of the character. The comparison also uses the comparison rules associated with the column data type. For example, blank-padded comparison is done for the ANSI CHAR data type. The NLS parameters, specifically the initialization parameters NLS_SORT and NLS_LANGUAGE and the environment variable NLS_LANG, have no effect on the comparison.

The binary value of character data varies depending on which character set is being used (for example, ASCII or EBCDIC). For example, ASCII defines the characters A through Z as less than the characters a through z, whereas EBCDIC defines A through Z as being greater than a through z. Thus, partitions designed for one sequence will not work with the other sequence. You must repartition the table after importing from a table using a different character set.

MAXVALUE

You can specify the keyword MAXVALUE for any value in the partition bound value_list. This keyword represents a virtual infinite value that sorts higher than any other value for the data type, including the NULL value.

For example, you might partition the OFFICE table on STATE (a CHAR(10) column) into three partitions with the following partition bounds, as sketched in the DDL after this list:

  • VALUES LESS THAN ('I'): States whose names start with A through H
  • VALUES LESS THAN ('S'): States whose names start with I through R
  • VALUES LESS THAN (MAXVALUE): States whose names start with S through Z, plus special codes for non-U.S. regions
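A minimal sketch of this table; the non-partitioning column and the partition names are illustrative assumptions:

CREATE TABLE office
( office_id NUMBER(5),
  state     CHAR(10) )
PARTITION BY RANGE (state)
(PARTITION region_east    VALUES LESS THAN ('I'),
 PARTITION region_central VALUES LESS THAN ('S'),
 PARTITION region_west    VALUES LESS THAN (MAXVALUE));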

Nulls

NULL cannot be specified as a value in a partition bound value_list. An empty string also cannot be specified as a value in a partition bound value_list, because it is treated as NULL within the database server.

For the purpose of assigning rows to partitions, Oracle Database sorts nulls greater than all other values except MAXVALUE. Nulls sort less than MAXVALUE.

This means that if a table is partitioned on a nullable column, and the column is to contain nulls, then the highest partition should have a partition bound of MAXVALUE for that column. Otherwise the rows that contain nulls will map above the highest partition in the table and the insert will fail.

DATE Datatypes

If the partition key includes a column that has the DATE datatype and the NLS date format does not specify the century with the year, you must specify partition bounds using the TO_DATE function with a 4-character format mask for the year. Otherwise, you will not be able to create the table or index. For example, with the sales_range table using a DATE column:

CREATE TABLE sales_range
(salesman_id  NUMBER(5), 
salesman_name VARCHAR2(30), 
sales_amount  NUMBER(10), 
sales_date    DATE)
COMPRESS
PARTITION BY RANGE(sales_date)
(PARTITION sales_jan2000 VALUES LESS THAN(TO_DATE('02/01/2000','MM/DD/YYYY')),
 PARTITION sales_feb2000 VALUES LESS THAN(TO_DATE('03/01/2000','MM/DD/YYYY')),
 PARTITION sales_mar2000 VALUES LESS THAN(TO_DATE('04/01/2000','MM/DD/YYYY')),
 PARTITION sales_apr2000 VALUES LESS THAN(TO_DATE('05/01/2000','MM/DD/YYYY')));

When you query or modify data, it is recommended that you use the TO_DATE function in the WHERE clause so that the value of the date information can be determined at compile time. However, the optimizer can prune partitions using a selection criterion on partitioning columns of type DATE when you use another format, as in the following examples:

SELECT * FROM sales_range
  WHERE sales_date BETWEEN TO_DATE('01-JUL-00', 'DD-MON-YY') 
  AND TO_DATE('01-OCT-00', 'DD-MON-YY');

SELECT * FROM sales_range
  WHERE sales_date BETWEEN '01-JUL-2000' AND '01-OCT-2000';

In this case, the date value will be complete only at runtime. Therefore you will not be able to see which partitions Oracle is accessing, as is usually shown in the PARTITION_START and PARTITION_STOP columns of the EXPLAIN PLAN statement output for the SQL statement. Instead, you will see the keyword KEY for both columns.
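You can observe this yourself with EXPLAIN PLAN; a minimal sketch, assuming the default PLAN_TABLE is available:

EXPLAIN PLAN FOR
SELECT * FROM sales_range
  WHERE sales_date BETWEEN '01-JUL-2000' AND '01-OCT-2000';

SELECT operation, options, partition_start, partition_stop
  FROM plan_table;

Per the discussion above, the string-literal predicate shows KEY in the partition columns, whereas constant TO_DATE bounds let the optimizer record the actual partitions.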

Multicolumn Partitioning Keys

When a table or index is partitioned by range on multiple columns, each partition bound and partitioning key is a list (or vector) of values. The partition bounds and keys are ordered according to ANSI SQL2 vector comparison rules. This is also the way Oracle orders multicolumn index keys.

To compare a partitioning key with a partition bound, you compare the values of their corresponding columns until you find an unequal pair and then that pair determines which vector is greater. The values of any remaining columns have no effect on the comparison.

Implicit Constraints Imposed by Partition Bounds

If you specify a partition bound other than MAXVALUE for the highest partition in a table, this imposes an implicit CHECK constraint on the table. This constraint is not recorded in the data dictionary, but the partition bound itself is recorded.

Index Partitioning

The rules for partitioning indexes are similar to those for tables:

  • An index can be partitioned unless:
    • The index is a cluster index
    • The index is defined on a clustered table
  • You can mix partitioned and nonpartitioned indexes with partitioned and nonpartitioned tables:
    • A partitioned table can have partitioned or nonpartitioned indexes.
    • A nonpartitioned table can have partitioned or nonpartitioned B-tree indexes.
  • Bitmap indexes on nonpartitioned tables cannot be partitioned.
  • A bitmap index on a partitioned table must be a local index.

However, partitioned indexes are more complicated than partitioned tables because there are three types of partitioned indexes:

  • Local prefixed
  • Local nonprefixed
  • Global prefixed

These types are described in the following section. Oracle supports all three types.

Local Partitioned Indexes

In a local index, all keys in a particular index partition refer only to rows stored in a single underlying table partition. A local index is created by specifying the LOCAL attribute.

Oracle constructs the local index so that it is equipartitioned with the underlying table. Oracle partitions the index on the same columns as the underlying table, creates the same number of partitions or subpartitions, and gives them the same partition bounds as corresponding partitions of the underlying table.

Oracle also maintains the index partitioning automatically when partitions in the underlying table are added, dropped, merged, or split, or when hash partitions or subpartitions are added or coalesced. This ensures that the index remains equipartitioned with the table.

A local index can be created UNIQUE if the partitioning columns form a subset of the index columns. This restriction guarantees that rows with identical index keys always map into the same partition, where uniqueness violations can be detected.

Local indexes have the following advantages:

  • Only one index partition needs to be rebuilt when a maintenance operation other than SPLIT PARTITION or ADD PARTITION is performed on an underlying table partition.
  • The duration of a partition maintenance operation remains proportional to partition size if the partitioned table has only local indexes.
  • Local indexes support partition independence.
  • Local indexes support smooth roll-out of old data and roll-in of new data in historical tables.
  • Oracle can take advantage of the fact that a local index is equipartitioned with the underlying table to generate better query access plans.
  • Local indexes simplify the task of tablespace incomplete recovery. In order to recover a partition or subpartition of a table to a point in time, you must also recover the corresponding index entries to the same point in time. The only way to accomplish this is with a local index. Then you can recover the corresponding table and index partitions or subpartitions together.

Local Prefixed Indexes

A local index is prefixed if it is partitioned on a left prefix of the index columns. For example, if the sales table and its local index sales_ix are partitioned on the week_num column, then index sales_ix is local prefixed if it is defined on the columns (week_num, xaction_num). On the other hand, if index sales_ix is defined on column product_num, then it is not prefixed.
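A minimal sketch of the prefixed case, assuming sales is range partitioned on week_num; the LOCAL keyword makes Oracle equipartition the index with the table:

CREATE INDEX sales_ix ON sales (week_num, xaction_num) LOCAL;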

Local prefixed indexes can be unique or nonunique.

Figure 5-4 illustrates another example of a local prefixed index.

Local Nonprefixed Indexes

A local index is nonprefixed if it is not partitioned on a left prefix of the index columns.

You cannot have a unique local nonprefixed index unless the partitioning key is a subset of the index key.

Figure 5-5 illustrates an example of a local nonprefixed index.

Figure 5-5 Local Nonprefixed Index

Description of Figure 5-5 follows
Description of “Figure 5-5 Local Nonprefixed Index”

Global Partitioned Indexes

In a global partitioned index, the keys in a particular index partition may refer to rows stored in more than one underlying table partition or subpartition. A global index can be range or hash partitioned, though it can be defined on any type of partitioned table.

A global index is created by specifying the GLOBAL attribute. The database administrator is responsible for defining the initial partitioning of a global index at creation and for maintaining the partitioning over time. Index partitions can be merged or split as necessary.

Normally, a global index is not equipartitioned with the underlying table. There is nothing to prevent an index from being equipartitioned with the underlying table, but Oracle does not take advantage of the equipartitioning when generating query plans or executing partition maintenance operations. So an index that is equipartitioned with the underlying table should be created as LOCAL.

A global partitioned index contains a single B-tree with entries for all rows in all partitions. Each index partition may contain keys that refer to many different partitions or subpartitions in the table.

The highest partition of a global index must have a partition bound all of whose values are MAXVALUE. This ensures that all rows in the underlying table can be represented in the index.
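A minimal sketch of a range-partitioned global index, with an illustrative column and bound; note the mandatory MAXVALUE bound on the highest partition:

CREATE INDEX sales_global_ix ON sales (acct_no)
GLOBAL PARTITION BY RANGE (acct_no)
(PARTITION p_low  VALUES LESS THAN (1000),
 PARTITION p_high VALUES LESS THAN (MAXVALUE));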

Prefixed and Nonprefixed Global Partitioned Indexes

A global partitioned index is prefixed if it is partitioned on a left prefix of the index columns. A global partitioned index is nonprefixed if it is not partitioned on a left prefix of the index columns. Oracle does not support global nonprefixed partitioned indexes.

Global prefixed partitioned indexes can be unique or nonunique.

Nonpartitioned indexes are treated as global prefixed nonpartitioned indexes.

Management of Global Partitioned Indexes

Global partitioned indexes are harder to manage than local indexes:

  • When the data in an underlying table partition is moved or removed (SPLIT, MOVE, DROP, or TRUNCATE), all partitions of a global index are affected. Consequently global indexes do not support partition independence.
  • When an underlying table partition or subpartition is recovered to a point in time, all corresponding entries in a global index must be recovered to the same point in time. Because these entries may be scattered across all partitions or subpartitions of the index, mixed in with entries for other partitions or subpartitions that are not being recovered, there is no way to accomplish this except by re-creating the entire global index.

Figure 5-6 Global Prefixed Partitioned Index

Description of Figure 5-6 follows
Description of “Figure 5-6 Global Prefixed Partitioned Index”

Summary of Partitioned Index Types

Table 5-2 summarizes the types of partitioned indexes that Oracle supports. The key points are:

  • If an index is local, it is equipartitioned with the underlying table. Otherwise, it is global.
  • A prefixed index is partitioned on a left prefix of the index columns. Otherwise, it is nonprefixed.

Table 5-2 Types of Partitioned Indexes

  • Local Prefixed (any partitioning method) — Equipartitioned with table: Yes. Partitioned on left prefix of index columns: Yes. UNIQUE attribute allowed: Yes. Example: table partitioning key A; index columns A, B; index partitioning key A.
  • Local Nonprefixed (any partitioning method) — Equipartitioned with table: Yes. Partitioned on left prefix of index columns: No. UNIQUE attribute allowed: Yes (see Footnote 1). Example: table partitioning key A; index columns B, A; index partitioning key A.
  • Global Prefixed (range partitioning only) — Equipartitioned with table: No (see Footnote 2). Partitioned on left prefix of index columns: Yes. UNIQUE attribute allowed: Yes. Example: table partitioning key A; index columns B; index partitioning key B.

Footnote 1 For a unique local nonprefixed index, the partitioning key must be a subset of the index key.
Footnote 2 Although a global partitioned index may be equipartitioned with the underlying table, Oracle does not take advantage of the partitioning or maintain equipartitioning after partition maintenance operations such as DROP or SPLIT PARTITION.

The Importance of Nonprefixed Indexes

Nonprefixed indexes are particularly useful in historical databases. In a table containing historical data, it is common for an index to be defined on one column to support the requirements of fast access by that column, but partitioned on another column (the same column as the underlying table) to support the time interval for rolling out old data and rolling in new data.

Consider a sales table partitioned by week. It contains a year's worth of data, divided into 13 partitions. It is range partitioned on week_no, four weeks to a partition. You might create a nonprefixed local index sales_ix on sales. The sales_ix index is defined on acct_no because there are queries that need fast access to the data by account number. However, it is partitioned on week_no to match the sales table. Every four weeks, the oldest partitions of sales and sales_ix are dropped and new ones are added.
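A minimal sketch of this nonprefixed local index; because sales is partitioned on week_no while the index key is acct_no, the LOCAL index is nonprefixed:

CREATE INDEX sales_ix ON sales (acct_no) LOCAL;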

Performance Implications of Prefixed and Nonprefixed Indexes

It is more expensive to probe into a nonprefixed index than to probe into a prefixed index.

If an index is prefixed (either local or global) and Oracle is presented with a predicate involving the index columns, then partition pruning can restrict application of the predicate to a subset of the index partitions.

For example, in Figure 5-4, if the predicate is deptno=15, the optimizer knows to apply the predicate only to the second partition of the index. (If the predicate involves a bind variable, the optimizer will not know exactly which partition but it may still know there is only one partition involved, in which case at run time, only one index partition will be accessed.)

When an index is nonprefixed, Oracle often has to apply a predicate involving the index columns to all N index partitions. This is required to look up a single key, or to do an index range scan. For a range scan, Oracle must also combine information from N index partitions. For example, in Figure 5-5, a local index is partitioned on chkdate with an index key on acctno. If the predicate is acctno=31, Oracle probes all 12 index partitions.

Of course, if there is also a predicate on the partitioning columns, then multiple index probes might not be necessary. Oracle takes advantage of the fact that a local index is equipartitioned with the underlying table to prune partitions based on the partition key. For example, if the predicate in Figure 5-4 is chkdate < 3/97, Oracle only has to probe two partitions.

So for a nonprefixed index, if the partition key is a part of the WHEREclause but not of the index key, then the optimizer determines which index partitions to probe based on the underlying table partition.

When many queries and DML statements using keys of local nonprefixed indexes have to probe all index partitions, this effectively reduces the degree of partition independence provided by such indexes.

Table 5-3 Comparing Prefixed Local, Nonprefixed Local, and Global Indexes

  • Unique possible? — Prefixed local: Yes. Nonprefixed local: Yes. Global: Yes (an index on columns other than the partitioning columns must be global to be unique).
  • Manageability — Prefixed local: easy to manage. Nonprefixed local: easy to manage. Global: harder to manage.
  • OLTP — Prefixed local: good. Nonprefixed local: bad. Global: good.
  • Long running (DSS) — Prefixed local: good. Nonprefixed local: good. Global: not good.

Guidelines for Partitioning Indexes

When deciding how to partition indexes on a table, consider the mix of applications that need to access the table. There is a trade-off between performance on the one hand and availability and manageability on the other. Here are some of the guidelines you should consider:

  • For OLTP applications:
    • Global indexes and local prefixed indexes provide better performance than local nonprefixed indexes because they minimize the number of index partition probes.
    • Local indexes support more availability when there are partition or subpartition maintenance operations on the table. Local nonprefixed indexes are very useful for historical databases.
  • For DSS applications, local nonprefixed indexes can improve performance because many index partitions can be scanned in parallel by range queries on the index key. For example, a query using the predicate acctno BETWEEN 40 AND 45 on the table checks of Figure 5-4 causes parallel scans of all the partitions of the nonprefixed index ix3. On the other hand, a query using the predicate deptno BETWEEN 40 AND 45 on the table deptno of Figure 5-5 cannot be parallelized because it accesses a single partition of the prefixed index ix1.
  • For historical tables, indexes should be local if possible. This limits the impact of regularly scheduled drop partition operations.
  • Unique indexes on columns other than the partitioning columns must be global because unique local nonprefixed indexes whose key does not contain the partitioning key are not supported.

Physical Attributes of Index Partitions

Default physical attributes are initially specified when a CREATE INDEX statement creates a partitioned index. Because there is no segment corresponding to the partitioned index itself, these attributes are only used to derive the physical attributes of member partitions. Default physical attributes can later be modified using ALTER INDEX MODIFY DEFAULT ATTRIBUTES.

Physical attributes of partitions created by CREATE INDEX are determined as follows:

  • Values of physical attributes specified (explicitly or by default) for the index are used whenever the value of a corresponding partition attribute is not specified. Handling of the TABLESPACE attribute of partitions of a LOCAL index constitutes an important exception to this rule: in the absence of a user-specified TABLESPACE value (at both partition and index levels), that of the corresponding partition of the underlying table is used.
  • Physical attributes (other than TABLESPACE, as explained in the preceding) of partitions of local indexes created in the course of processing ALTER TABLE ADD PARTITION are set to the default physical attributes of each index.

Physical attributes (other than TABLESPACE) of index partitions created by ALTER TABLE SPLIT PARTITION are determined as follows:

  • Values of physical attributes of the index partition being split are used.

Physical attributes of an existing index partition can be modified by ALTER INDEX MODIFY PARTITION and ALTER INDEX REBUILD PARTITION. Resulting attributes are determined as follows:

  • Values of physical attributes of the partition before the statement was issued are used whenever a new value is not specified. Note that ALTER INDEX REBUILD PARTITION can be used to change the tablespace in which a partition resides.

Physical attributes of global index partitions created by ALTER INDEX SPLIT PARTITION are determined as follows:

  • Values of physical attributes of the partition being split are used whenever a new value is not specified.
  • Physical attributes of all partitions of an index (along with default values) may be modified by ALTER INDEX; for example, ALTER INDEX indexname NOLOGGING changes the logging mode of all partitions of indexname to NOLOGGING.

Using Partitioning to Improve Data Warehouse Refresh

ETL (Extraction, Transformation and Loading) is done on a scheduled basis to reflect changes made to the original source system. During this step, you physically insert the new, clean data into the production data warehouse schema, and take all of the other steps necessary (such as building indexes, validating constraints, taking backups) to make this new data available to the end users. Once all of this data has been loaded into the data warehouse, the materialized views have to be updated to reflect the latest data.

The partitioning scheme of the data warehouse is often crucial in determining the efficiency of refresh operations in the data warehouse load process. In fact, the load process is often the primary consideration in choosing the partitioning scheme of data warehouse tables and indexes.

The partitioning scheme of the largest data warehouse tables (for example, the fact table in a star schema) should be based upon the loading paradigm of the data warehouse.

Most data warehouses are loaded with new data on a regular schedule. For example, every night, week, or month, new data is brought into the data warehouse. The data being loaded at the end of the week or month typically corresponds to the transactions for the week or month. In this very common scenario, the data warehouse is being loaded by time. This suggests that the data warehouse tables should be partitioned on a date column. In our data warehouse example, suppose the new data is loaded into the sales table every month. Furthermore, the sales table has been partitioned by month. These steps show how the load process will proceed to add the data for a new month (January 2001) to the table sales.

  1. Place the new data into a separate table, sales_01_2001. This data can be directly loaded into sales_01_2001 from outside the data warehouse, or this data can be the result of previous data transformation operations that have already occurred in the data warehouse. sales_01_2001 has the exact same columns, datatypes, and so forth, as the sales table. Gather statistics on the sales_01_2001 table, for example as sketched below.
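    A minimal statistics-gathering sketch using DBMS_STATS; the degree setting is an illustrative assumption:

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,              -- current schema; adjust as needed
        tabname => 'SALES_01_2001',   -- the newly loaded table
        degree  => 8);                -- gather statistics in parallel
    END;
    /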
  2. Create indexes and add constraints on sales_01_2001. Again, the indexes and constraints on sales_01_2001 should be identical to the indexes and constraints on sales. Indexes can be built in parallel and should use the NOLOGGING and COMPUTE STATISTICS options. For example:
    CREATE BITMAP INDEX sales_01_2001_customer_id_bix
      ON sales_01_2001(customer_id)
          TABLESPACE sales_idx NOLOGGING PARALLEL 8 COMPUTE STATISTICS;

    Apply all constraints to the sales_01_2001 table that are present on the sales table. This includes referential integrity constraints. A typical constraint would be:

    ALTER TABLE sales_01_2001 ADD CONSTRAINT sales_customer_id
          REFERENCES customer(customer_id) ENABLE NOVALIDATE;

    If the partitioned table sales has a primary or unique key that is enforced with a global index structure, ensure that the constraint on sales_pk_jan01 is validated without the creation of an index structure, as in the following:

    ALTER TABLE sales_01_2001 ADD CONSTRAINT sales_pk_jan01
    PRIMARY KEY (sales_transaction_id) DISABLE VALIDATE;

    Creating the constraint with the ENABLE clause would cause the creation of a unique index, which does not match a local index structure of the partitioned table. You must not have any index structure built on the nonpartitioned table to be exchanged for existing global indexes of the partitioned table; otherwise, the exchange command would fail.

  3. Add the sales_01_2001 table to the sales table. In order to add this new data to the sales table, we need to do two things. First, we need to add a new partition to the sales table, using the ALTER TABLE ADD PARTITION statement. This adds an empty partition to the sales table:
    ALTER TABLE sales ADD PARTITION sales_01_2001 
    VALUES LESS THAN (TO_DATE('01-FEB-2001', 'DD-MON-YYYY'));

    Then, we can add our newly created table to this partition using the EXCHANGE PARTITION operation. This exchanges the new, empty partition with the newly loaded table.

    ALTER TABLE sales EXCHANGE PARTITION sales_01_2001 WITH TABLE sales_01_2001 
    INCLUDING INDEXES WITHOUT VALIDATION UPDATE GLOBAL INDEXES;

    The EXCHANGE operation will preserve the indexes and constraints that were already present on the sales_01_2001 table. For unique constraints (such as the unique constraint on sales_transaction_id), you can use the UPDATE GLOBAL INDEXES clause, as shown previously. This will automatically maintain your global index structures as part of the partition maintenance operation and keep them accessible throughout the whole process. If there were only foreign-key constraints, the exchange operation would be instantaneous.

The benefits of this partitioning technique are significant. First, the new data is loaded with minimal resource utilization. The new data is loaded into an entirely separate table, and the index processing and constraint processing are applied only to the new partition. If the sales table were 50 GB and had 12 partitions, then a new month's worth of data contains approximately 4 GB. Only the new month's worth of data needs to be indexed. None of the indexes on the remaining 46 GB of data needs to be modified at all. This partitioning scheme additionally ensures that the load processing time is directly proportional to the amount of new data being loaded, not to the total size of the sales table.

Second, the new data is loaded with minimal impact on concurrent queries. All of the operations associated with data loading are occurring on a separate sales_01_2001 table. Therefore, none of the existing data or indexes of the sales table is affected during this data refresh process. The sales table and its indexes remain entirely untouched throughout this refresh process.

Third, any existing global indexes are incrementally maintained as part of the exchange command. This maintenance does not affect the availability of the existing global index structures.

The exchange operation can be viewed as a publishing mechanism. Until the data warehouse administrator exchanges the sales_01_2001 table into the sales table, end users cannot see the new data. Once the exchange has occurred, any end-user query accessing the sales table will immediately be able to see the sales_01_2001 data.

Partitioning is useful not only for adding new data but also for removing and archiving data. Many data warehouses maintain a rolling window of data. For example, the data warehouse stores the most recent 36 months of sales data. Just as a new partition can be added to the sales table (as described earlier), an old partition can be quickly (and independently) removed from the sales table. These two benefits (reduced resource utilization and minimal end-user impact) are just as pertinent to removing a partition as they are to adding a partition.

Removing data from a partitioned table does not necessarily mean that the old data is physically deleted from the database. There are two alternatives for removing old data from a partitioned table. First, you can physically delete all data from the database by dropping the partition containing the old data, thus freeing the allocated space:

ALTER TABLE sales DROP PARTITION sales_01_1998;

Alternatively, you can exchange the old partition with an empty table of the same structure; this empty table is created following steps 1 and 2 of the load process described earlier. Assuming the new empty table stub is named sales_archive_01_1998, the following SQL statement will empty partition sales_01_1998:

ALTER TABLE sales EXCHANGE PARTITION sales_01_1998 
WITH TABLE sales_archive_01_1998 INCLUDING INDEXES WITHOUT VALIDATION 
UPDATE GLOBAL INDEXES;

Note that the old data still exists as the exchanged, nonpartitioned table sales_archive_01_1998.

If the partitioned table was set up so that every partition is stored in a separate tablespace, you can archive (or transport) this table using Oracle Database's transportable tablespace framework before dropping the actual data (the tablespace).

In some situations, you might not want to drop the old data immediately, but keep it as part of the partitioned table; although the data is no longer of main interest, there are still potential queries accessing this old, read-only data. You can use Oracle’s data compression to minimize the space usage of the old data. We also assume that at least one compressed partition is already part of the partitioned table.
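One way to compress an existing partition in place is a MOVE operation; a minimal sketch, noting that a MOVE marks the partition's local indexes unusable, so they must be rebuilt afterward:

ALTER TABLE sales MOVE PARTITION sales_01_1998 COMPRESS PARALLEL 4;

ALTER TABLE sales MODIFY PARTITION sales_01_1998
REBUILD UNUSABLE LOCAL INDEXES;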

Refresh Scenarios

A typical scenario might not only need to compress old data, but also to merge several old partitions to reflect the granularity for a later backup of several merged partitions. Let us assume that the backup (partition) granularity is quarterly for any quarter where the oldest month is more than 36 months behind the most recent month. In this case, we are therefore compressing and merging sales_01_1998, sales_02_1998, and sales_03_1998 into a new, compressed partition sales_q1_1998.

  1. Create the new merged partition in parallel in another tablespace. The partition will be compressed as part of the MERGE operation:
    ALTER TABLE sales MERGE PARTITIONS sales_01_1998, sales_02_1998, sales_03_1998
     INTO PARTITION sales_q1_1998 TABLESPACE archive_q1_1998
    COMPRESS UPDATE GLOBAL INDEXES PARALLEL 4;
  2. The partition MERGE operation invalidates the local indexes for the new merged partition. We therefore have to rebuild them:
    ALTER TABLE sales MODIFY PARTITION sales_q1_1998 
    REBUILD UNUSABLE LOCAL INDEXES;

Alternatively, you can choose to create the new compressed table outside the partitioned table and exchange it back. The performance and the temporary space consumption are identical for both methods:

  1. Create an intermediate table to hold the new merged information. The following statement inherits all NOT NULL constraints from the original table by default:
    CREATE TABLE sales_q1_1998_out TABLESPACE archive_q1_1998 
    NOLOGGING COMPRESS PARALLEL 4 AS SELECT * FROM sales
    WHERE time_id >=  TO_DATE('01-JAN-1998','dd-mon-yyyy')
      AND time_id < TO_DATE('01-APR-1998','dd-mon-yyyy');
  2. Create the same index structure for table sales_q1_1998_out as for the existing table sales.
  3. Prepare the existing table sales for the exchange with the new compressed table sales_q1_1998_out. Because the table to be exchanged contains data actually covered by three partitions, we have to create one matching partition with the range boundaries we are looking for. You simply have to drop two of the existing partitions. Note that you have to drop the lower two partitions, sales_01_1998 and sales_02_1998; the lower boundary of a range partition is always defined by the upper (exclusive) boundary of the previous partition:
    ALTER TABLE sales DROP PARTITION sales_01_1998;
    ALTER TABLE sales DROP PARTITION sales_02_1998;
  4. You can now exchange table sales_q1_1998_out with partition sales_03_1998. Unlike what the name of the partition suggests, its boundaries now cover all of Q1-1998.
    ALTER TABLE sales EXCHANGE PARTITION sales_03_1998 
    WITH TABLE sales_q1_1998_out INCLUDING INDEXES WITHOUT VALIDATION 
    UPDATE GLOBAL INDEXES;

Both methods apply to slightly different business scenarios. Using the MERGE PARTITIONS approach invalidates the local index structures for the affected partition, but it keeps all data accessible all the time. Any attempt to access the affected partition through one of the unusable index structures raises an error. The limited availability time is approximately the time for re-creating the local bitmap index structures. In most cases, this can be neglected, because this part of the partitioned table should not be touched too often.

The CTAS approach, however, minimizes unavailability of any index structures to close to zero, but there is a specific time window where the partitioned table does not have all the data, because we dropped two partitions. The limited availability time is approximately the time for exchanging the table. Depending on the existence and number of global indexes, this time window varies. Without any existing global indexes, this time window is a matter of a fraction of a second to a few seconds.

These examples are a simplification of the data warehouse rolling window load scenario. Real-world data warehouse refresh characteristics are always more complex. However, the advantages of this rolling window approach are not diminished in more complex scenarios.

Note that before you add single or multiple compressed partitions to a partitioned table for the first time, all local bitmap indexes must be either dropped or marked unusable. After the first compressed partition is added, no additional actions are necessary for all subsequent operations involving compressed partitions. It is irrelevant how the compressed partitions are added to the partitioned table.

Scenarios for Using Partitioning for Refreshing Data Warehouses

This section contains two typical scenarios where partitioning is used with refresh.

Refresh Scenario 1

Data is loaded daily. However, the data warehouse contains two years of data, so that partitioning by day might not be desired.

The solution is to partition by week or month (as appropriate). Use INSERT to add the new data to an existing partition. The INSERT operation only affects a single partition, so the benefits described previously remain intact. The INSERT operation could occur while the partition remains a part of the table. Inserts into a single partition can be parallelized:

INSERT /*+ APPEND */ INTO sales PARTITION (sales_01_2001)
SELECT * FROM new_sales;

The indexes of this sales partition will be maintained in parallel as well. An alternative is to use the EXCHANGE operation: exchange the sales_01_2001 partition out of the sales table, run the INSERT operation against the standalone table, and then exchange it back. You might prefer this technique when dropping and rebuilding indexes is more efficient than maintaining them.

Refresh Scenario 2

New data feeds, although consisting primarily of data for the most recent day, week, and month, also contain some data from previous time periods.

Solution 1

Use parallel SQL operations (such as CREATE TABLE AS SELECT) to separate the new data from the data in previous time periods. Process the old data separately using other techniques.
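A minimal sketch of this separation; the staging table new_feed, its sales_date column, and the cutoff date are illustrative assumptions:

CREATE TABLE sales_recent NOLOGGING PARALLEL AS
SELECT * FROM new_feed
 WHERE sales_date >= TO_DATE('01-JAN-2001','DD-MON-YYYY');

CREATE TABLE sales_history NOLOGGING PARALLEL AS
SELECT * FROM new_feed
 WHERE sales_date < TO_DATE('01-JAN-2001','DD-MON-YYYY');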

New data feeds are not solely time based. You can also feed new data into a data warehouse with data from multiple operational systems on a business need basis. For example, the sales data from direct channels may come into the data warehouse separately from the data from indirect channels. For business reasons, it may furthermore make sense to keep the direct and indirect data in separate partitions.

Solution 2

Oracle supports composite range-list partitioning. The primary partitioning strategy of the sales table could be range partitioning based on time_id, as shown in the example. However, the subpartitioning is a list based on the channel attribute. Each subpartition can now be loaded independently of the others (for each distinct channel) and added in a rolling window operation as discussed before. This partitioning strategy addresses the business needs in the most optimal manner. A DDL sketch follows.
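A minimal range-list sketch; the column list, channel codes, and partition names are illustrative assumptions:

CREATE TABLE sales
( time_id     DATE,
  channel_id  CHAR(1),
  amount_sold NUMBER )
PARTITION BY RANGE (time_id)
SUBPARTITION BY LIST (channel_id)
SUBPARTITION TEMPLATE
 (SUBPARTITION direct   VALUES ('D'),
  SUBPARTITION indirect VALUES ('I'))
(PARTITION sales_q1_2001 VALUES LESS THAN (TO_DATE('01-APR-2001','DD-MON-YYYY')),
 PARTITION sales_q2_2001 VALUES LESS THAN (TO_DATE('01-JUL-2001','DD-MON-YYYY')));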

Optimizing DML Operations During Refresh

You can optimize DML performance through the following techniques:

Implementing an Efficient MERGE Operation

Commonly, the data that is extracted from a source system is not simply a list of new records that needs to be inserted into the data warehouse. Instead, this new data set is a combination of new records as well as modified records. For example, suppose that most of the data extracted from the OLTP systems consists of new sales transactions. These records will be inserted into the warehouse's sales table, but some records may reflect modifications of previous transactions, such as returned merchandise or transactions that were incomplete or incorrect when initially loaded into the data warehouse. These records require updates to the sales table.

As a typical scenario, suppose that there is a table called new_sales that contains both inserts and updates that will be applied to the sales table. When designing the entire data warehouse load process, it was determined that the new_sales table would contain records with the following semantics:

  • If a given sales_transaction_id of a record in new_sales already exists in sales, then update the sales table by adding the sales_dollar_amount and sales_quantity_sold values from the new_sales table to the existing row in the sales table.
  • Otherwise, insert the entire new record from the new_sales table into the sales table.

This UPDATE-ELSE-INSERT operation is often called a merge. A merge can be executed using one SQL statement.

Example 15-1 MERGE Operation

MERGE INTO sales s USING new_sales n
ON (s.sales_transaction_id = n.sales_transaction_id)
WHEN MATCHED THEN
UPDATE SET s.sales_quantity_sold = s.sales_quantity_sold + n.sales_quantity_sold,
 s.sales_dollar_amount = s.sales_dollar_amount + n.sales_dollar_amount
WHEN NOT MATCHED THEN INSERT (sales_transaction_id, sales_quantity_sold, 
sales_dollar_amount)
VALUES (n.sales_transaction_id, n.sales_quantity_sold, n.sales_dollar_amount);

In addition to using the MERGE statement for unconditional UPDATE ELSE INSERT functionality into a target table, you can also use it to:

  • Perform an UPDATE-only or INSERT-only statement.
  • Apply additional WHERE conditions for the UPDATE or INSERT portion of the MERGE statement.
  • Delete rows from the target during the UPDATE operation if a specific condition yields true.

Example 15-2 Omitting the INSERT Clause

In some data warehouse applications, you are not allowed to add new rows to historical information, but only to update them. It may also happen that you do not want to update but only insert new information. The following examples demonstrate UPDATE-only and INSERT-only functionality, respectively:

MERGE INTO Products D1            -- Destination table 1
USING Product_Changes S           -- Source/Delta table
ON (D1.PROD_ID = S.PROD_ID)       -- Search/Join condition
WHEN MATCHED THEN UPDATE          -- update if join
SET D1.PROD_STATUS = S.PROD_NEW_STATUS;

Example 15-3 Omitting the UPDATE Clause

The following statement illustrates an example of omitting an UPDATE:

MERGE INTO Products D2              -- Destination table 2
USING New_Product S                 -- Source/Delta table
ON (D2.PROD_ID = S.PROD_ID)         -- Search/Join condition
WHEN NOT MATCHED THEN               -- insert if no join
INSERT (PROD_ID, PROD_STATUS) VALUES (S.PROD_ID, S.PROD_NEW_STATUS);

When the INSERT clause is omitted, Oracle performs a regular join of the source and the target tables. When the UPDATE clause is omitted, Oracle performs an antijoin of the source and the target tables. This makes the join between the source and target table more efficient.

Example 15-4 Skipping the UPDATE Clause

In some situations, you may want to skip the UPDATE operation when merging a given row into the table. In this case, you can use an optional WHERE clause in the UPDATE clause of the MERGE. As a result, the UPDATE operation only executes when a given condition is true. The following statement illustrates an example of skipping the UPDATE operation:

MERGE INTO Products P                        -- Destination table 1
USING Product_Changes S                      -- Source/Delta table
ON (P.PROD_ID = S.PROD_ID)                   -- Search/Join condition
WHEN MATCHED THEN
UPDATE                                       -- update if join
SET P.PROD_LIST_PRICE = S.PROD_NEW_PRICE
WHERE P.PROD_STATUS <> 'OBSOLETE';           -- Conditional UPDATE

This shows how the UPDATE operation would be skipped if the condition P.PROD_STATUS <> 'OBSOLETE' is not true. The condition predicate can refer to both the target and the source table.

Example 15-5 Conditional Inserts with MERGE Statements

You may want to skip the INSERT operation when merging a given row into the table. To do so, add an optional WHERE clause to the INSERT clause of the MERGE. As a result, the INSERT operation only executes when a given condition is true. The following statement offers an example:

MERGE INTO Products P                              -- Destination table 1
USING Product_Changes S                            -- Source/Delta table
ON (P.PROD_ID = S.PROD_ID)                         -- Search/Join condition
WHEN MATCHED THEN UPDATE                           -- update if join
SET P.PROD_LIST_PRICE = S.PROD_NEW_PRICE
WHERE P.PROD_STATUS <> 'OBSOLETE'                  -- Conditional UPDATE
WHEN NOT MATCHED THEN
INSERT (PROD_ID, PROD_STATUS, PROD_LIST_PRICE)     -- insert if no join
VALUES (S.PROD_ID, S.PROD_NEW_STATUS, S.PROD_NEW_PRICE)
WHERE S.PROD_STATUS <> 'OBSOLETE';                 -- Conditional INSERT

This example shows that the INSERT operation would be skipped if the condition S.PROD_STATUS <> 'OBSOLETE' is not true; the INSERT only occurs when the condition is true. In the INSERT clause, the condition predicate can refer only to the source table.

Example 15-6 Using the DELETE Clause with MERGE Statements

You may want to cleanse tables while populating or updating them. To do this, consider using the DELETE clause in a MERGE statement, as in the following example:

MERGE INTO Products D
USING Product_Changes S ON (D.PROD_ID = S.PROD_ID)
WHEN MATCHED THEN
UPDATE SET D.PROD_LIST_PRICE = S.PROD_NEW_PRICE, D.PROD_STATUS = S.PROD_NEW_STATUS
DELETE WHERE (D.PROD_STATUS = 'OBSOLETE')
WHEN NOT MATCHED THEN
INSERT (PROD_ID, PROD_LIST_PRICE, PROD_STATUS)
VALUES (S.PROD_ID, S.PROD_NEW_PRICE, S.PROD_NEW_STATUS);

Thus, when a row is updated in products, Oracle checks the delete condition D.PROD_STATUS = 'OBSOLETE', and deletes the row if the condition yields true.

The DELETE operation is not the same as a standalone DELETE statement. Only rows from the destination of the MERGE can be deleted, and the only rows affected by the DELETE are the ones updated by this MERGE statement. Thus, even if a given row of the destination table meets the delete condition, it is not deleted if it does not join under the ON clause condition.

Example 15-7 Unconditional Inserts with MERGE Statements

You may want to insert all of the source rows into a table. In this case, the join between the source and target table can be avoided. By identifying special constant join conditions that always evaluate to FALSE, for example, 1=0, such MERGE statements are optimized and the join condition is suppressed.

MERGE INTO Products P           -- Destination table 1
USING New_Product S             -- Source/Delta table
ON (1 = 0)                      -- Search/Join condition
WHEN NOT MATCHED THEN           -- insert if no join
INSERT (PROD_ID, PROD_STATUS) VALUES (S.PROD_ID, S.PROD_NEW_STATUS);

Maintaining Referential Integrity

In some data warehousing environments, you might want to insert new data into tables in order to guarantee referential integrity. For example, a data warehouse may derive sales from an operational system that retrieves data directly from cash registers. sales is refreshed nightly. However, the data for the product dimension table may be derived from a separate operational system. The product dimension table may only be refreshed once each week, because the product table changes relatively slowly. If a new product was introduced on Monday, then it is possible for that product's product_id to appear in the sales data of the data warehouse before that product_id has been inserted into the data warehouse's product table.

Although the sales transactions of the new product may be valid, this sales data will not satisfy the referential integrity constraint between the product dimension table and the sales fact table. Rather than disallow the new sales transactions, you might choose to insert the sales transactions into the sales table. However, you might also wish to maintain the referential integrity relationship between the sales and product tables. This can be accomplished by inserting new rows into the product table as placeholders for the unknown products.

As in previous examples, we assume that the new data for the sales table will be staged in a separate table, new_sales. Using a single INSERT statement (which can be parallelized), the product table can be altered to reflect the new products:

INSERT INTO product
  (SELECT sales_product_id, 'Unknown Product Name', NULL, NULL ...
   FROM new_sales WHERE sales_product_id NOT IN
  (SELECT product_id FROM product));

Purging Data

Occasionally, it is necessary to remove large amounts of data from a data warehouse. A very common scenario is the rolling window discussed previously, in which older data is rolled out of the data warehouse to make room for new data.

However, sometimes other data might need to be removed from a data warehouse. Suppose that a retail company has previously sold products from XYZ Software, and that XYZ Software has subsequently gone out of business. The business users of the warehouse may decide that they are no longer interested in seeing any data related to XYZ Software, so this data should be deleted.

One approach to removing a large volume of data is to use parallel delete as shown in the following statement:

DELETE FROM sales WHERE sales_product_id IN (SELECT product_id 
   FROM product WHERE product_category = 'XYZ Software');
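
Note that for the delete itself to run in parallel, parallel DML must be enabled in the session first. A minimal sketch using the same tables (the degree of parallelism shown is illustrative):

ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(sales, 8) */ FROM sales
WHERE sales_product_id IN
  (SELECT product_id FROM product
   WHERE product_category = 'XYZ Software');

COMMIT;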

This SQL statement spawns one parallel process for each partition. This approach is much more efficient than a serial DELETE statement, and none of the data in the sales table needs to be moved. However, this approach also has some disadvantages. When removing a large percentage of rows, the DELETE statement leaves many empty row-slots in the existing partitions. If new data is being loaded using a rolling window technique (or using direct-path INSERT or load), then this storage space will not be reclaimed. Moreover, even though the DELETE statement is parallelized, there might be more efficient methods. An alternative method is to re-create the entire sales table, keeping the data for all product categories except XYZ Software.

CREATE TABLE sales2 NOLOGGING PARALLEL (DEGREE 8)
-- PARTITION ... (partitioning clause as appropriate)
AS SELECT sales.* FROM sales, product
WHERE sales.sales_product_id = product.product_id
AND product_category <> 'XYZ Software';
-- create indexes, constraints, and so on
DROP TABLE sales;
RENAME sales2 TO sales;

This approach may be more efficient than a parallel delete. However, it is also costly in terms of the amount of disk space, because the sales table must effectively be instantiated twice.

An alternative method that utilizes less space is to re-create the sales table one partition at a time:

CREATE TABLE sales_temp AS SELECT * FROM sales WHERE 1=0;
INSERT INTO sales_temp
SELECT s.* FROM sales PARTITION (sales_99jan) s, product
WHERE s.sales_product_id = product.product_id
AND product_category <> 'XYZ Software';
-- create appropriate indexes and constraints on sales_temp
ALTER TABLE sales EXCHANGE PARTITION sales_99jan WITH TABLE sales_temp;

Continue this process for each partition in the sales table.

Refreshing Materialized Views

When creating a materialized view, you have the option of specifying whether the refresh occurs ON DEMAND or ON COMMIT. In the case of ON COMMIT, the materialized view is changed every time a transaction commits, thus ensuring that the materialized view always contains the latest data. Alternatively, you can control the time when refresh of the materialized views occurs by specifying ON DEMAND. In this case, the materialized view can only be refreshed by calling one of the procedures in the DBMS_MVIEW package.

DBMS_MVIEW provides three different types of refresh operations:

  • DBMS_MVIEW.REFRESH: Refreshes one or more materialized views.
  • DBMS_MVIEW.REFRESH_ALL_MVIEWS: Refreshes all materialized views.
  • DBMS_MVIEW.REFRESH_DEPENDENT: Refreshes all materialized views that depend on a specified master table or materialized view, or a list of master tables or materialized views.

Performing a refresh operation requires temporary space to rebuild the indexes and can require additional space for performing the refresh operation itself. Some sites might prefer not to refresh all of their materialized views at the same time: as soon as some underlying detail data has been updated, all materialized views using this data become stale. Therefore, if you defer refreshing your materialized views, you can either rely on your chosen rewrite integrity level to determine whether or not a stale materialized view can be used for query rewrite, or you can temporarily disable query rewrite with an ALTER SYSTEM SET QUERY_REWRITE_ENABLED = FALSE statement. After refreshing the materialized views, you can re-enable query rewrite as the default for all sessions in the current database instance by setting QUERY_REWRITE_ENABLED to TRUE. Refreshing a materialized view automatically updates all of its indexes. In the case of full refresh, this requires temporary sort space to rebuild all indexes during refresh, because the full refresh truncates or deletes the table before inserting the new full data volume. If insufficient temporary space is available to rebuild the indexes, then you must explicitly drop each index or mark it UNUSABLE prior to performing the refresh operation.
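
For example, to disable query rewrite during a deferred refresh window and re-enable it afterwards (a minimal sketch):

ALTER SYSTEM SET QUERY_REWRITE_ENABLED = FALSE;
-- refresh the stale materialized views here
ALTER SYSTEM SET QUERY_REWRITE_ENABLED = TRUE;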

If you anticipate performing insert, update, or delete operations on tables referenced by a materialized view concurrently with the refresh of that materialized view, and that materialized view includes joins and aggregation, Oracle recommends you use ON COMMIT fast refresh rather than ON DEMAND fast refresh.

Complete Refresh

A complete refresh occurs when the materialized view is initially defined as BUILD IMMEDIATE, unless the materialized view references a prebuilt table. For materialized views using BUILD DEFERRED, a complete refresh must be requested before they can be used for the first time. A complete refresh may be requested at any time during the life of any materialized view. The refresh involves reading the detail tables to compute the results for the materialized view. This can be a very time-consuming process, especially if there are huge amounts of data to be read and processed. Therefore, you should always consider the time required to process a complete refresh before requesting it.

There are, however, cases when the only refresh method available for an already built materialized view is complete refresh because the materialized view does not satisfy the conditions specified in the following section for a fast refresh.

Fast Refresh

Most data warehouses have periodic incremental updates to their detail data. As described in "Materialized View Schema Design", you can use SQL*Loader or any bulk load utility to perform incremental loads of detail data. Fast refresh of your materialized views is usually efficient because, instead of recomputing the entire materialized view, only the changes are applied to the existing data. Thus, processing only the changes can result in a very fast refresh time.

Partition Change Tracking (PCT) Refresh

When there have been some partition maintenance operations on the detail tables, this is the only method of fast refresh that can be used. PCT-based refresh on a materialized view is enabled only if all the conditions described in “Partition Change Tracking” are satisfied.

In the absence of partition maintenance operations on detail tables, when you request a FAST method (method => 'F') of refresh through procedures in the DBMS_MVIEW package, Oracle uses a heuristic rule to try log-based fast refresh before choosing PCT refresh. Similarly, when you request a FORCE method (method => '?'), Oracle chooses the refresh method based on the following attempt order: log-based fast refresh, PCT refresh, and complete refresh. Alternatively, you can request the PCT method (method => 'P'), and Oracle uses the PCT method provided all PCT requirements are satisfied.

Oracle can use TRUNCATE PARTITION on a materialized view if it satisfies the conditions in "Benefits of Partitioning a Materialized View" and hence make the PCT refresh process more efficient.

ON COMMIT Refresh

A materialized view can be refreshed automatically using the ON COMMIT method. Whenever a transaction commits that has updated the tables on which a materialized view is defined, those changes are automatically reflected in the materialized view. The advantage of using this approach is that you never have to remember to refresh the materialized view. The only disadvantage is that the time required to complete the commit will be slightly longer because of the extra processing involved. However, in a data warehouse, this should not be an issue because there are unlikely to be concurrent processes trying to update the same table.

Manual Refresh Using the DBMS_MVIEW Package

When a materialized view is refreshed ON DEMAND, one of four refresh methods can be specified as shown in the following table. You can define a default option during the creation of the materialized view. Table 15-1 details the refresh options.

Table 15-1 ON DEMAND Refresh Methods

Refresh Option  Parameter  Description
COMPLETE        C          Refreshes by recalculating the defining query of the materialized view.
FAST            F          Refreshes by incrementally applying changes to the materialized view. For local materialized views, it chooses the refresh method estimated by the optimizer to be most efficient. The refresh methods considered are log-based FAST and FAST_PCT.
FAST_PCT        P          Refreshes by recomputing the rows in the materialized view affected by changed partitions in the detail tables.
FORCE           ?          Attempts a fast refresh. If that is not possible, it does a complete refresh. For local materialized views, it chooses the refresh method estimated by the optimizer to be most efficient. The refresh methods considered are log-based FAST, FAST_PCT, and COMPLETE.

Three refresh procedures are available in the DBMS_MVIEW package for performing ON DEMAND refresh. Each has its own unique set of parameters.

Refresh Specific Materialized Views with REFRESH

Use the DBMS_MVIEW.REFRESH procedure to refresh one or more materialized views. Some parameters are used only for replication, so they are not mentioned here. The required parameters to use this procedure are:

  • The comma-delimited list of materialized views to refresh
  • The refresh method: F (Fast), P (Fast_PCT), ? (Force), C (Complete)
  • The rollback segment to use
  • Refresh after errors (TRUE or FALSE). If set to TRUE, the number_of_failures output parameter is set to the number of refreshes that failed, and a generic error message indicates that failures occurred; the alert log for the instance gives details of the refresh errors. If set to FALSE, the default, refresh stops after it encounters the first error, and any remaining materialized views in the list are not refreshed.
  • The following four parameters, which are used by the replication process. For warehouse refresh, set them to FALSE, 0, 0, 0.
  • Atomic refresh (TRUE or FALSE). If set to TRUE, all refreshes are done in one transaction. If set to FALSE, the refresh of each specified materialized view is done in a separate transaction, and Oracle can optimize refresh by using parallel DML and truncate DDL on the materialized views.

For example, to perform a fast refresh on the materialized view cal_month_sales_mv, the DBMS_MVIEW package would be called as follows:

DBMS_MVIEW.REFRESH('CAL_MONTH_SALES_MV', 'F', '', TRUE, FALSE, 0,0,0, FALSE);

Multiple materialized views can be refreshed at the same time, and they do not all have to use the same refresh method. To give them different refresh methods, specify multiple method codes in the same order as the list of materialized views (without commas). For example, the following specifies that cal_month_sales_mv be completely refreshed and fweek_pscat_sales_mv receive a fast refresh:

DBMS_MVIEW.REFRESH('CAL_MONTH_SALES_MV, FWEEK_PSCAT_SALES_MV', 'CF', '', 
  TRUE, FALSE, 0,0,0, FALSE);

If the refresh method is not specified, the default refresh method as specified in the materialized view definition will be used.

Refresh All Materialized Views with REFRESH_ALL_MVIEWS

An alternative to specifying the materialized views to refresh is to use the procedure DBMS_MVIEW.REFRESH_ALL_MVIEWS. This procedure refreshes all materialized views. If any of the materialized views fails to refresh, then the number of failures is reported.

The parameters for this procedure are:

  • The number of failures (this is an OUT variable)
  • The refresh method: F (Fast), P (Fast_PCT), ? (Force), C (Complete)
  • Refresh after errors (TRUE or FALSE). If set to TRUE, the number_of_failures output parameter is set to the number of refreshes that failed, and a generic error message indicates that failures occurred; the alert log for the instance gives details of the refresh errors. If set to FALSE, the default, refresh stops after it encounters the first error, and any remaining materialized views in the list are not refreshed.
  • Atomic refresh (TRUE or FALSE). If set to TRUE, all refreshes are done in one transaction. If set to FALSE, the refresh of each specified materialized view is done in a separate transaction, and Oracle can optimize refresh by using parallel DML and truncate DDL on the materialized views.

An example of refreshing all materialized views is the following:

DBMS_MVIEW.REFRESH_ALL_MVIEWS(failures,'C','', TRUE, FALSE);

Refresh Dependent Materialized Views with REFRESH_DEPENDENT

The third procedure, DBMS_MVIEW.REFRESH_DEPENDENT, refreshes only those materialized views that depend on a specific table or list of tables. For example, suppose the changes have been received for the orders table but not for customer payments. The refresh dependent procedure can be called to refresh only those materialized views that reference the orders table.

The parameters for this procedure are:

  • The number of failures (this is an OUT variable)
  • The dependent table
  • The refresh method: F (Fast), P (Fast_PCT), ? (Force), C (Complete)
  • The rollback segment to use
  • Refresh after errors (TRUE or FALSE). If set to TRUE, the number_of_failures output parameter is set to the number of refreshes that failed, and a generic error message indicates that failures occurred; the alert log for the instance gives details of the refresh errors. If set to FALSE, the default, refresh stops after it encounters the first error, and any remaining materialized views in the list are not refreshed.
  • Atomic refresh (TRUE or FALSE). If set to TRUE, all refreshes are done in one transaction. If set to FALSE, the refresh of each specified materialized view is done in a separate transaction, and Oracle can optimize refresh by using parallel DML and truncate DDL on the materialized views.
  • Whether it is nested or not. If set to TRUE, refresh all the dependent materialized views of the specified set of tables based on a dependency order, to ensure the materialized views are truly fresh with respect to the underlying base tables.

To perform a full refresh on all materialized views that reference the customers table, specify:

DBMS_MVIEW.REFRESH_DEPENDENT(failures, 'CUSTOMERS', 'C', '', FALSE, FALSE );

To obtain the list of materialized views that are directly dependent on a given object (table or materialized view), use the DBMS_MVIEW.GET_MV_DEPENDENCIES procedure. It is useful for determining the dependent materialized views of a given table, or for deciding the order in which to refresh nested materialized views.

DBMS_MVIEW.GET_MV_DEPENDENCIES(mvlist IN  VARCHAR2, deplist OUT  VARCHAR2)

The input to this procedure is the name or names of the materialized views. The output is a comma-delimited list of the materialized views that are defined on them. For example, the following statement:

DBMS_MVIEW.GET_MV_DEPENDENCIES('JOHN.SALES_REG, SCOTT.PROD_TIME', deplist)

populates deplist with the list of materialized views defined on the input arguments. For example:

deplist <= 'JOHN.SUM_SALES_WEST, JOHN.SUM_SALES_EAST, SCOTT.SUM_PROD_MONTH'
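
Because deplist is an OUT parameter, the call must be made from PL/SQL. A minimal sketch using the materialized view names from the example above:

DECLARE
  deplist VARCHAR2(4000);
BEGIN
  DBMS_MVIEW.GET_MV_DEPENDENCIES('JOHN.SALES_REG, SCOTT.PROD_TIME', deplist);
  DBMS_OUTPUT.PUT_LINE(deplist);
END;
/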

Using Job Queues for Refresh

Job queues can be used to refresh multiple materialized views in parallel. If queues are not available, fast refresh sequentially refreshes each view in the foreground process. To make queues available, you must set the JOB_QUEUE_PROCESSES parameter. This parameter defines the number of background job queue processes and determines how many materialized views can be refreshed concurrently. Oracle tries to balance the number of concurrent refreshes with the degree of parallelism of each refresh. The order in which the materialized views are refreshed is determined by dependencies imposed by nested materialized views and the potential for efficient refresh by using query rewrite against other materialized views. This parameter is only effective when atomic_refresh is set to FALSE.
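
For example, to allow up to four concurrent refresh jobs (the value shown is illustrative):

ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 4;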

If the process that is executing DBMS_MVIEW.REFRESH is interrupted or the instance is shut down, any refresh jobs that were executing in job queue processes will be requeued and will continue running. To remove these jobs, use the DBMS_JOB.REMOVE procedure.

When Fast Refresh is Possible

Not all materialized views may be fast refreshable. Therefore, use the DBMS_MVIEW.EXPLAIN_MVIEW procedure to determine what refresh methods are available for a materialized view.

If you are not sure how to make a materialized view fast refreshable, you can use the DBMS_ADVISOR.TUNE_MVIEW procedure, which provides a script containing the statements required to create a fast refreshable materialized view.
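
A minimal sketch of using EXPLAIN_MVIEW, assuming the MV_CAPABILITIES_TABLE has already been created with the utlxmv.sql script shipped with the database:

EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW('CUST_MTH_SALES_MV');

SELECT capability_name, possible, msgtxt
FROM mv_capabilities_table
WHERE capability_name LIKE 'REFRESH%';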

Recommended Initialization Parameters for Parallelism

The following initialization parameters need to be set properly for parallelism to be effective:

  • PARALLEL_MAX_SERVERS should be set high enough to accommodate the desired degree of parallelism. Consider the number of slaves needed by the refresh statement; for example, with a degree of parallelism of eight, you need 16 slave processes.
  • PGA_AGGREGATE_TARGET should be set for the instance to manage the memory usage for sorts and joins automatically. If the memory parameters are set manually, SORT_AREA_SIZE should be less than HASH_AREA_SIZE.
  • OPTIMIZER_MODE should equal ALL_ROWS.

Remember to analyze all tables and indexes for better optimization.
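
The following sketch illustrates these settings; the values are examples only and must be sized to your own workload, and the schema name in the statistics call is hypothetical (DBMS_STATS is the modern way to gather the statistics this implies):

ALTER SYSTEM SET PARALLEL_MAX_SERVERS = 16;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 1G;
ALTER SESSION SET OPTIMIZER_MODE = ALL_ROWS;

-- Gather optimizer statistics for the warehouse schema
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('SH');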

Monitoring a Refresh

While a job is running, you can query the V$SESSION_LONGOPS view to monitor the progress of each materialized view being refreshed.

SELECT * FROM V$SESSION_LONGOPS;

To see which refresh jobs are running on which queues, use:

SELECT * FROM DBA_JOBS_RUNNING;
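
For a more focused view of refresh progress, you can restrict V$SESSION_LONGOPS to operations that report work estimates, for example:

SELECT opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done
FROM v$session_longops
WHERE totalwork > 0 AND sofar < totalwork;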

Checking the Status of a Materialized View

Three views are provided for checking the status of a materialized view: DBA_MVIEWS, ALL_MVIEWS, and USER_MVIEWS. To check whether a materialized view is fresh or stale, issue the following statement:

SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE 
FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME            STALENESS      LAST_REF       COMPILE_STATE
----------            ---------      --------       -------------
CUST_MTH_SALES_MV     NEEDS_COMPILE  FAST           NEEDS_COMPILE
PROD_YR_SALES_MV      FRESH          FAST           VALID

If the COMPILE_STATE column shows NEEDS_COMPILE, the other displayed column values cannot be trusted as reflecting the true status. To revalidate the materialized view, issue the following statement:

ALTER MATERIALIZED VIEW [materialized_view_name] COMPILE;

Then reissue the SELECTstatement.

Scheduling Refresh

Very often you will have multiple materialized views in the database. Some of these can be computed by rewriting against others. This is very common in data warehousing environments, where you may have nested materialized views or materialized views at different levels of some hierarchy.

In such cases, you should create the materialized views as BUILD DEFERRED, and then issue one of the refresh procedures in the DBMS_MVIEW package to refresh all the materialized views. Oracle Database computes the dependencies and refreshes the materialized views in the right order. Consider the example of a complete hierarchical cube described in "Examples of Hierarchical Cube Materialized Views". Suppose all the materialized views have been created as BUILD DEFERRED; this only creates the metadata for the materialized views. You can then call one of the refresh procedures in the DBMS_MVIEW package to refresh all the materialized views in the right order:

DECLARE numerrs PLS_INTEGER;
BEGIN DBMS_MVIEW.REFRESH_DEPENDENT (
   number_of_failures => numerrs, list=>'SALES', method => 'C');
DBMS_OUTPUT.PUT_LINE('There were ' || numerrs || ' errors during refresh');
END;
/

The procedure refreshes the materialized views in the order of their dependencies (first sales_hierarchical_mon_cube_mv, followed by sales_hierarchical_qtr_cube_mv, then sales_hierarchical_yr_cube_mv, and finally sales_hierarchical_all_cube_mv). Each of these materialized views gets rewritten against the one prior to it in the list.

The same kind of rewrite can also be used during PCT refresh. PCT refresh recomputes the rows in a materialized view corresponding to changed rows in the detail tables; if other fresh materialized views are available at refresh time, it can go directly against them instead of against the detail tables.

Hence, it is always beneficial to pass a list of materialized views to any of the refresh procedures in the DBMS_MVIEW package (irrespective of the method specified) and let the procedure determine the order in which to refresh the materialized views.

Tips for Refreshing Materialized Views with Aggregates

Following are some guidelines for using the refresh mechanism for materialized views with aggregates.

  • For fast refresh, create materialized view logs on all detail tables involved in a materialized view with the ROWID, SEQUENCE, and INCLUDING NEW VALUES clauses. Include in the materialized view logs all columns from the table that are likely to be used in materialized views. Fast refresh may be possible even if the SEQUENCE option is omitted from the materialized view log: if it can be determined that only inserts or deletes will occur on all the detail tables, then the materialized view log does not require the SEQUENCE clause. However, if updates to multiple tables are likely or required, or if the specific update scenarios are unknown, make sure the SEQUENCE clause is included (a combined sketch follows this list).
  • Use Oracle's bulk loader utility or direct-path INSERT (INSERT with the APPEND hint for loads). This is much more efficient than conventional insert. During loading, disable all constraints and re-enable them when finished loading. Note that materialized view logs are required regardless of whether you use direct load or conventional DML. Try to optimize the sequence of conventional mixed DML operations, direct-path INSERT, and the fast refresh of materialized views. You can use fast refresh with a mixture of conventional DML and direct loads. Fast refresh can perform significant optimizations if it finds that only direct loads have occurred, as illustrated in the following:
    1. Direct-path INSERT(SQL*Loader or INSERT /*+ APPEND */) into the detail table
    2. Refresh materialized view
    3. Conventional mixed DML
    4. Refresh materialized view

    You can use fast refresh with conventional mixed DML (INSERT, UPDATE, and DELETE) to the detail tables. However, fast refresh will be able to perform significant optimizations in its processing if it detects that only inserts or deletes have been done to the tables, such as:

    • DML INSERTor DELETEto the detail table
    • Refresh materialized views
    • DML update to the detail table
    • Refresh materialized view

    Even more optimal is the separation of INSERT and DELETE.

    If possible, refresh should be performed after each type of data change (as shown earlier) rather than issuing only one refresh at the end. If that is not possible, restrict the conventional DML to the table to inserts only, to get much better refresh performance. Avoid mixing deletes and direct loads.

    Furthermore, for refresh ON COMMIT, Oracle keeps track of the type of DML done in the committed transaction. Therefore, do not perform direct-path INSERT and DML to other tables in the same transaction, as Oracle may not be able to optimize the refresh phase.

    For ON COMMIT materialized views, where refreshes automatically occur at the end of each transaction, it may not be possible to isolate the DML statements, in which case keeping the transactions short will help. However, if you plan to make numerous modifications to the detail table, it may be better to perform them in one transaction, so that refresh of the materialized view is performed just once at commit time rather than after each update.

  • Oracle recommends partitioning the tables because it enables you to use:
    • Parallel DML. For large loads or refresh, enabling parallel DML helps shorten the time required for the operation.
    • Partition Change Tracking (PCT) fast refresh. You can fast refresh your materialized views after partition maintenance operations on the detail tables. See "Partition Change Tracking" for details on enabling PCT for materialized views.
  • Partitioning the materialized view also helps refresh performance, because refresh can update the materialized view using parallel DML. For example, assume that the detail tables and materialized view are partitioned and have a parallel clause. The following sequence enables Oracle to parallelize the refresh of the materialized view:
    1. Bulk load into the detail table.
    2. Enable parallel DML with an ALTER SESSION ENABLE PARALLEL DML statement.
    3. Refresh the materialized view.
  • For refresh using DBMS_MVIEW.REFRESH, set the parameter atomic_refresh to FALSE.
    • For COMPLETE refresh, this uses TRUNCATE to delete existing rows in the materialized view, which is faster than a delete.
    • For PCT refresh, if the materialized view is partitioned appropriately, this uses TRUNCATE PARTITION to delete rows in the affected partitions of the materialized view, which is faster than a delete.
    • For FAST or FORCE refresh, if COMPLETE or PCT refresh is chosen, this is able to use the TRUNCATE optimizations described earlier.
  • When using DBMS_MVIEW.REFRESH with JOB_QUEUES, remember to set atomic to FALSE; otherwise, JOB_QUEUES will not get used. Set the number of job queue processes greater than the number of processors. If job queues are enabled and there are many materialized views to refresh, it is faster to refresh all of them in a single command than to call them individually.
  • Use REFRESH FORCE to ensure that a materialized view is refreshed so that it can definitely be used for query rewrite. The best refresh method is chosen; if a fast refresh cannot be done, a complete refresh is performed.
  • Refresh all the materialized views in a single procedure call. This gives Oracle an opportunity to schedule the refresh of all the materialized views in the right order, taking into account dependencies imposed by nested materialized views and the potential for efficient refresh by using query rewrite against other materialized views.
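
The following sketch ties several of these tips together: a materialized view log with the recommended clauses, followed by a direct-path load and a non-atomic fast refresh. The column list and staging table (new_sales) are illustrative:

CREATE MATERIALIZED VIEW LOG ON sales
WITH ROWID, SEQUENCE (prod_id, cust_id, time_id, quantity_sold, amount_sold)
INCLUDING NEW VALUES;

ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND */ INTO sales SELECT * FROM new_sales;
COMMIT;

EXECUTE DBMS_MVIEW.REFRESH(list => 'CUST_MTH_SALES_MV', method => 'F', atomic_refresh => FALSE);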

Tips for Refreshing Materialized Views Without Aggregates

If a materialized view contains joins but no aggregates, then having an index on each of the join column rowids in the detail table will enhance refresh performance greatly, because this type of materialized view tends to be much larger than materialized views containing aggregates. For example, consider the following materialized view:

CREATE MATERIALIZED VIEW detail_fact_mv BUILD IMMEDIATE AS
SELECT s.rowid "sales_rid", t.rowid "times_rid", c.rowid "cust_rid",
   c.cust_state_province, t.week_ending_day, s.amount_sold
FROM sales s, times t, customers c 
WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id;

Indexes should be created on the columns sales_rid, times_rid, and cust_rid. Partitioning is highly recommended, as is enabling parallel DML in the session before invoking refresh, because it greatly enhances refresh performance.
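
For example (index names are illustrative; the quoted identifiers match the lower-case column aliases in the materialized view definition above):

CREATE INDEX dfmv_sales_rid_ix ON detail_fact_mv ("sales_rid");
CREATE INDEX dfmv_times_rid_ix ON detail_fact_mv ("times_rid");
CREATE INDEX dfmv_cust_rid_ix  ON detail_fact_mv ("cust_rid");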

This type of materialized view can also be fast refreshed if DML is performed on the detail table. It is recommended that the same procedure be applied to this type of materialized view as for a single table aggregate. That is, perform one type of change (direct-path INSERTor DML) and then refresh the materialized view. This is because Oracle Database can perform significant optimizations if it detects that only one type of change has been done.

Also, Oracle recommends that the refresh be invoked after each table is loaded, rather than load all the tables and then perform the refresh.

For refresh ON COMMIT, Oracle keeps track of the type of DML done in the committed transaction. Oracle therefore recommends that you do not perform direct-path and conventional DML to other tables in the same transaction, because Oracle may not be able to optimize the refresh phase. For example, the following is not recommended:

  1. Direct load new data into the fact table
  2. DML into the store table
  3. Commit

Also, try not to mix different types of conventional DML statements if possible. This would again prevent using various optimizations during fast refresh. For example, try to avoid the following:

  1. Insert into the fact table
  2. Delete from the fact table
  3. Commit

If many updates are needed, try to group them all into one transaction because refresh will be performed just once at commit time, rather than after each update.

When you use the DBMS_MVIEW package to refresh a number of materialized views containing only joins with the ATOMIC parameter set to TRUE, if you disable parallel DML, refresh performance may degrade.

In a data warehousing environment, assuming that the materialized view has a parallel clause, the following sequence of steps is recommended:

  1. Bulk load into the fact table
  2. Enable parallel DML with an ALTER SESSION ENABLE PARALLEL DML statement
  3. Refresh the materialized view

Tips for Refreshing Nested Materialized Views

All underlying objects are treated as ordinary tables when refreshing materialized views. If the ON COMMIT refresh option is specified, then all the materialized views are refreshed in the appropriate order at commit time. In other words, Oracle builds a partially ordered set of materialized views and refreshes them such that, after the successful completion of the refresh, all the materialized views are fresh. The status of the materialized views can be checked by querying the appropriate USER_, DBA_, or ALL_MVIEWS view.

If any of the materialized views are defined as ON DEMAND refresh (irrespective of whether the refresh method is FAST, FORCE, or COMPLETE), you will need to refresh them in the correct order (taking into account the dependencies between the materialized views), because the nested materialized view will be refreshed with respect to the current contents of the other materialized views (whether fresh or not). This can be achieved by invoking the refresh procedure against the materialized view at the top of the nested hierarchy and specifying the nested parameter as TRUE.
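
For example, assuming a hypothetical top-level materialized view named sales_top_mv, a forced refresh of the whole hierarchy would be:

EXECUTE DBMS_MVIEW.REFRESH('SALES_TOP_MV', method => '?', nested => TRUE);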

If a refresh fails during commit time, the list of materialized views that have not been refreshed is written to the alert log, and you must manually refresh them along with all their dependent materialized views.

Use the same DBMS_MVIEW procedures on nested materialized views that you use on regular materialized views.

These procedures have the following behavior when used with nested materialized views:

  • If REFRESH is applied to a materialized view my_mv that is built on other materialized views, then my_mv is refreshed with respect to the current contents of the other materialized views (that is, the other materialized views are not made fresh first) unless you specify nested => TRUE.
  • If REFRESH_DEPENDENT is applied to materialized view my_mv, then only materialized views that directly depend on my_mv are refreshed (that is, a materialized view that depends on a materialized view that depends on my_mv is not refreshed) unless you specify nested => TRUE.
  • If REFRESH_ALL_MVIEWS is used, the order in which the materialized views are refreshed is guaranteed to respect the dependencies between nested materialized views.
  • GET_MV_DEPENDENCIES provides a list of the immediate (or direct) materialized view dependencies for an object.

Tips for Fast Refresh with UNION ALL

You can use fast refresh for materialized views that use the UNION ALL operator by providing a maintenance column in the definition of the materialized view. For example, a materialized view with a UNION ALL operator can be made fast refreshable as follows:

CREATE MATERIALIZED VIEW fast_rf_union_all_mv AS
SELECT x.rowid AS r1, y.rowid AS r2, a, b, c, 1 AS marker
FROM x, y WHERE x.a = y.b 
UNION ALL 
SELECT p.rowid, r.rowid, a, c, d, 2 AS marker
FROM p, r WHERE p.a = r.y;

The form of a maintenance marker column, column MARKER in the example, must be numeric_or_string_literal AS column_alias, where each UNION ALL member has a distinct value for numeric_or_string_literal.

Tips After Refreshing Materialized Views

After you have performed a load or incremental load and rebuilt the detail table indexes, you need to re-enable integrity constraints (if any) and refresh the materialized views and materialized view indexes that are derived from that detail data. In a data warehouse environment, referential integrity constraints are normally enabled with the NOVALIDATE or RELY options. An important decision to make before performing a refresh operation is whether the refresh needs to be recoverable. Because materialized view data is redundant and can always be reconstructed from the detail tables, it might be preferable to disable logging on the materialized view. To disable logging and run incremental refresh non-recoverably, use the ALTER MATERIALIZED VIEW ... NOLOGGING statement prior to refreshing.
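
Using the materialized view from earlier examples:

ALTER MATERIALIZED VIEW cust_mth_sales_mv NOLOGGING;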

If the materialized view is being refreshed using the ON COMMIT method, then, following refresh operations, consult the alert log alert_SID.log and the trace file ora_SID_number.trc to check that no errors have occurred.

Using Materialized Views with Partitioned Tables

A major maintenance component of a data warehouse is synchronizing (refreshing) the materialized views when the detail data changes. Partitioning the underlying detail tables can reduce the amount of time taken to perform the refresh task. This is possible because partitioning enables refresh to use parallel DML to update the materialized view. Also, it enables the use of Partition Change Tracking.

Fast Refresh with Partition Change Tracking

In a data warehouse, changes to the detail tables can often entail partition maintenance operations, such as DROP, EXCHANGE, MERGE, and ADD PARTITION. Maintaining the materialized view after such operations used to require manual maintenance or complete refresh. You now have the option of using an addition to fast refresh known as Partition Change Tracking (PCT) refresh.

For PCT to be available, the detail tables must be partitioned. The partitioning of the materialized view itself has no bearing on this feature. If PCT refresh is possible, it occurs automatically and no user intervention is required.

The following examples illustrate the use of this feature. In "PCT Fast Refresh Scenario 1", assume sales is a partitioned table using the time_id column and products is partitioned by the prod_category column. The table times is not a partitioned table.

PCT Fast Refresh Scenario 1

  1. The following materialized view satisfies requirements for PCT.
    CREATE MATERIALIZED VIEW cust_mth_sales_mv
    BUILD IMMEDIATE
    REFRESH FAST ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT s.time_id, s.prod_id, SUM(s.quantity_sold), SUM(s.amount_sold),
           p.prod_name, t.calendar_month_name, COUNT(*),
           COUNT(s.quantity_sold), COUNT(s.amount_sold)
    FROM sales s, products p, times t
    WHERE  s.time_id = t.time_id AND s.prod_id = p.prod_id
    GROUP BY t.calendar_month_name, s.prod_id, p.prod_name, s.time_id;
  2. You can use the DBMS_MVIEW.EXPLAIN_MVIEW procedure to determine which tables will allow PCT refresh.
    MVNAME              CAPABILITY_NAME   POSSIBLE  RELATED_TEXT  MSGTXT
    -----------------   ---------------   --------  ------------  ----------------
    CUST_MTH_SALES_MV   PCT               Y         SALES
    CUST_MTH_SALES_MV   PCT_TABLE         Y         SALES
    CUST_MTH_SALES_MV   PCT_TABLE         N         PRODUCTS      no partition key
                                                                  or PMARKER
                                                                  in SELECT list
    CUST_MTH_SALES_MV   PCT_TABLE         N         TIMES         relation is not
                                                                  a partitioned
                                                                  table

    As can be seen from the partial sample output from EXPLAIN_MVIEW, any partition maintenance operation performed on the sales table will allow PCT fast refresh. However, PCT is not possible after partition maintenance operations or updates to the products table, as there is insufficient information contained in cust_mth_sales_mv for PCT refresh to be possible. Note that the times table is not partitioned and hence can never allow for PCT refresh. Oracle applies PCT refresh if it can determine that the materialized view has sufficient information to support PCT for all the updated tables.

  3. Suppose at some later point, a SPLIT operation of one partition in the sales table becomes necessary.
    ALTER TABLE SALES
    SPLIT PARTITION month3 AT (TO_DATE('05-02-1998', 'DD-MM-YYYY'))
    INTO (PARTITION month3_1 TABLESPACE summ,
          PARTITION month3 TABLESPACE summ);
  4. Insert some data into the sales table.
  5. Fast refresh cust_mth_sales_mv using the DBMS_MVIEW.REFRESH procedure.
    EXECUTE DBMS_MVIEW.REFRESH('CUST_MTH_SALES_MV', 'F',
       '',TRUE,FALSE,0,0,0,FALSE);

Fast refresh will automatically do a PCT refresh as it is the only fast refresh possible in this scenario. However, fast refresh will not occur if a partition maintenance operation occurs when any update has taken place to a table on which PCT is not enabled. This is shown in “PCT Fast Refresh Scenario 2”.

"PCT Fast Refresh Scenario 1" would also be appropriate if the materialized view was created using the PMARKER clause as illustrated in the following:

CREATE MATERIALIZED VIEW cust_sales_marker_mv
BUILD IMMEDIATE
REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE AS
SELECT DBMS_MVIEW.PMARKER(s.rowid) s_marker, SUM(s.quantity_sold),
  SUM(s.amount_sold), p.prod_name, t.calendar_month_name, COUNT(*),
  COUNT(s.quantity_sold), COUNT(s.amount_sold)
FROM sales s, products p, times t
WHERE  s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP BY DBMS_MVIEW.PMARKER(s.rowid),
         p.prod_name, t.calendar_month_name;

PCT Fast Refresh Scenario 2

In "PCT Fast Refresh Scenario 2", the first three steps are the same as in "PCT Fast Refresh Scenario 1". Then, the SPLIT partition operation to the sales table is performed, but before the materialized view refresh occurs, records are inserted into the times table.

  1. The same as in “PCT Fast Refresh Scenario 1”.
  2. The same as in “PCT Fast Refresh Scenario 1”.
  3. The same as in “PCT Fast Refresh Scenario 1”.
  4. After issuing the same SPLIT operation as shown in "PCT Fast Refresh Scenario 1", some data is inserted into the times table.
    ALTER TABLE SALES
    SPLIT PARTITION month3 AT (TO_DATE('05-02-1998', 'DD-MM-YYYY'))
    INTO (PARTITION month3_1 TABLESPACE summ,
          PARTITION month3 TABLESPACE summ);
  5. Refresh cust_mth_sales_mv.
    EXECUTE DBMS_MVIEW.REFRESH('CUST_MTH_SALES_MV', 'F',
        '',TRUE,FALSE,0,0,0,FALSE);
    ORA-12052: cannot fast refresh materialized view SH.CUST_MTH_SALES_MV

The materialized view is not fast refreshable because DML has occurred to a table on which PCT fast refresh is not possible. To avoid this occurring, Oracle recommends performing a fast refresh immediately after any partition maintenance operation on detail tables for which partition tracking fast refresh is available.

If the situation in "PCT Fast Refresh Scenario 2" occurs, there are two possibilities: perform a complete refresh or switch to the CONSIDER FRESH option outlined in the following, if suitable. However, it should be noted that CONSIDER FRESH and partition change tracking fast refresh are not compatible. Once the ALTER MATERIALIZED VIEW cust_mth_sales_mv CONSIDER FRESH statement has been issued, PCT refresh is no longer applied to this materialized view until a complete refresh is done. Moreover, you should not use CONSIDER FRESH unless you have taken manual action to ensure that the materialized view is indeed fresh.

A common situation in a data warehouse is the use of rolling windows of data. In this case, the detail table and the materialized view may contain, say, the last 12 months of data. Every month, new data for a month is added to the table and the oldest month is deleted (or maybe archived). PCT refresh provides a very efficient mechanism to maintain the materialized view in this case.

PCT Fast Refresh Scenario 3

  1. The new data is usually added to the detail table by adding a new partition and exchanging it with a table containing the new data.
    ALTER TABLE sales ADD PARTITION month_new ...
    ALTER TABLE sales EXCHANGE PARTITION month_new WITH TABLE month_new_table;
  2. Next, the oldest partition is dropped or truncated.
    ALTER TABLE sales DROP PARTITION month_oldest;
  3. Now, if the materialized view satisfies all conditions for PCT refresh, fast refresh it:
    EXECUTE DBMS_MVIEW.REFRESH('CUST_MTH_SALES_MV', 'F', '', TRUE, FALSE,0,0,0,FALSE);

Fast refresh will automatically detect that PCT is available and perform a PCT refresh.

Fast Refresh with CONSIDER FRESH

In a data warehouse, you may often wish to accumulate historical information in the materialized view even though this information is no longer in the detail tables. In this case, you could maintain the materialized view using the ALTER MATERIALIZED VIEW materialized_view_name CONSIDER FRESH statement.

Note that CONSIDER FRESH declares that the contents of the materialized view are FRESH (in sync with the detail tables). Care must be taken when using this option in this scenario in conjunction with query rewrite, because you may see unexpected results.

After using CONSIDER FRESH in a historical scenario, you will be able to apply traditional fast refresh after DML and direct loads to the materialized view, but not PCT fast refresh. This is because, if the detail table partition at one time contained data that is currently kept in aggregated form in the materialized view, PCT refresh, in attempting to resynchronize the materialized view with that partition, could delete historical data that cannot be recomputed.

Assume the sales table stores the prior year's data and cust_mth_sales_mv keeps the prior 10 years of data in aggregated form.

  1. Remove old data from a partition in the salestable:
    ALTER TABLE sales TRUNCATE PARTITION month1;

    The materialized view is now considered stale and requires a refresh because of the partition operation. However, because the detail table no longer contains all the data associated with the partition, fast refresh cannot be attempted.

  2. Therefore, alter the materialized view to tell Oracle to consider it fresh.
    ALTER MATERIALIZED VIEW cust_mth_sales_mv CONSIDER FRESH;

    This statement informs Oracle that cust_mth_sales_mv is fresh for your purposes. However, the materialized view now has a status that is neither known fresh nor known stale; instead, it is UNKNOWN. If the materialized view has query rewrite enabled in QUERY_REWRITE_INTEGRITY = STALE_TOLERATED mode, it will be used for rewrite.

  3. Insert data into sales.
  4. Refresh the materialized view.
    EXECUTE DBMS_MVIEW.REFRESH('CUST_MTH_SALES_MV', 'F', '', TRUE, FALSE,0,0,0,FALSE);

    Because the fast refresh detects that only INSERT statements occurred against the sales table, it updates the materialized view with the new data. However, the status of the materialized view remains UNKNOWN. The only way to return the materialized view to FRESH status is with a complete refresh, which also removes the historical data from the materialized view.


Indexes allow one to cluster data. The basis for this is that the index leaf nodes store the indexed columns in an ordered fashion, so that similar values are stored next to each other. That means that indexes build clusters of rows with similar values. This capability to cluster data is so important that I refer to it as the second power of indexing.

The B-tree traversal is the first power of indexing.

Clustering is the second power of indexing.

Cluster configurations can not only improve performance beyond what applications on a single computer can achieve, but also provide higher availability and reliability, and are typically much more cost-effective than single supercomputer systems with equivalent performance. The key to the capability, performance, and throughput of a computing cluster is the system software and tools used to provide the parallel job execution environment. Programming languages with implicit parallel processing features and a high degree of optimization are also needed to ensure high-performance results as well as high programmer productivity. Clusters allow the data used by an application to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data.

Clustering: Topology design considerations

This topic provides guidance for planning and administering WebSphere® MQ clusters. This information is a guide based on testing and feedback from customers.

By thinking about where user applications and internal administrative processes are going to be located in advance, many problems can either be avoided, or minimized at a later date. This topic contains information about design decisions that can improve performance, and simplify maintenance tasks as the cluster scales.

Performance of the clustering infrastructure

When an application tries to open a queue on a queue manager in a cluster, the queue manager registers its interest with the full repositories for that queue so that it can learn where the queue exists in the cluster. Any updates to the queue location or configuration are automatically sent by the full repositories to the interested queue manager. This registering of interest is internally known as a subscription (these subscriptions are not the same as the WebSphere MQ subscriptions used for publish/subscribe messaging).

All information about a cluster goes through every full repository. Full repositories are therefore always being used in a cluster for administrative message traffic. The high usage of system resources when managing these subscriptions, and the transmission of them and the resulting configuration messages, can cause a considerable load on the clustering infrastructure. There are a number of things to consider when ensuring that this load is understood and minimized wherever possible:

  • The more individual queue managers using a cluster queue, the more subscriptions are in the system, and thus the bigger the administrative overhead when changes occur and interested subscribers need to be notified, especially on the full repository queue managers. One way to minimize unnecessary traffic and full repository load is by connecting similar applications (that is, those applications that work with the same queues) to a smaller number of queue managers.
  • In addition to the number of subscriptions in the system affecting performance, the rate of change in the configuration of clustered objects can affect performance; for example, the frequent changing of a clustered queue configuration.
  • When a queue manager is a member of multiple clusters (that is, it is part of an overlapping cluster system) any interest made in a queue results in a subscription for each cluster it is a member of, even if the same queue managers are the full repositories for more than one of the clusters. This arrangement increases the load on the system, and is one reason to consider whether multiple overlapping clusters are necessary, rather than a single cluster.
  • Application message traffic (that is, the messages being sent by WebSphere MQ applications to the cluster queues) does not go via the full repositories to reach the destination queue managers. This message traffic is sent directly between the queue manager where the message enters the cluster and the queue manager where the cluster queue exists. It is therefore not necessary to accommodate high rates of application message traffic with respect to the full repository queue managers, unless the full repository queue managers happen to be either of those two queue managers mentioned. For that reason, it is recommended that full repository queue managers are not used for application message traffic in clusters where the clustering infrastructure load is significant.

Full repositories

A repository is a collection of information about the queue managers that are members of a cluster. A queue manager that hosts a complete set of information about every queue manager in the cluster has a full repository; other queue managers in the cluster hold partial repositories, containing only the subset of cluster information they have needed so far.

Full repositories must be held on servers that are reliable and as highly available as possible, and single points of failure must be avoided. The cluster design must always have two full repositories; if there is a failure of a full repository, the cluster can still operate.

Details of any updates to cluster resources made by a queue manager in a cluster (for example, clustered queues) are sent from that queue manager to at most two full repositories in that cluster (or to one if there is only one full repository queue manager in the cluster). Those full repositories hold the information and propagate it to any queue managers in the cluster that show an interest in it (that is, they subscribe to it). To ensure that each member of the cluster has an up-to-date view of the cluster resources, each queue manager must be able to communicate with at least one full repository queue manager at any one time.

If, for any reason, a queue manager cannot communicate with any full repositories, it can continue to function in the cluster based on its already cached information for a period of time, but no new updates or access to previously unused cluster resources are available.

For this reason, you must aim to keep the two full repositories available at all times. However, this arrangement does not mean that extreme measures must be taken because the cluster functions adequately for a short while without a full repository.

There is another reason that a cluster must have two full repository queue managers, beyond the availability of cluster information: to ensure that the cluster information held in the full repository cache exists in two places for recovery purposes. If there is only one full repository, and it loses its information about the cluster, then manual intervention on all queue managers within the cluster is required in order to get the cluster working again. If there are two full repositories, however, then because information is always published to and subscribed for from two full repositories, the failed full repository can be recovered with the minimum of effort.

  • It is possible to perform maintenance on full repository queue managers in a two full repository cluster design without impacting users of that cluster: the cluster continues to function with only one repository, so where possible, bring the repositories down, apply the maintenance, and bring them back up one at a time. Even if there is an outage on the second full repository, running applications are unaffected for a minimum of three days.
  • Unless there is a good reason for using a third repository, such as using a geographically local full repository, use the two repository design. Having three full repositories means that you never know which two are currently in use, and there might be administrative problems caused by interactions between multiple workload management parameters. It is not recommended to have more than two full repositories.
  • If you still need better availability, consider hosting the full repository queue managers as multi-instance queue managers or using platform specific high availability support to improve their availability.
  • You must fully interconnect all the full repository queue managers with manually defined cluster sender channels. Particular care must be taken when the cluster does have, for some justifiable reason, more than two full repositories. In this situation it is often possible to miss one or more channels without it being immediately apparent. When full interconnection does not occur, hard-to-diagnose problems often arise. They are hard to diagnose because some full repositories do not hold all repository data, so queue managers in the cluster have different views of the cluster depending on which full repositories they connect to.

Should applications use queues on full repositories?

A full repository is in most ways exactly like any other queue manager, and it is therefore possible to host application queues on the full repository and connect applications directly to these queue managers.

The commonly accepted answer is “No”. Although this configuration is possible, many customers prefer to keep these queue managers dedicated to maintaining the full repository cluster cache. Points to consider when deciding on either option are described here, but ultimately the cluster architecture must be appropriate to the particular demands of the environment.

  • Upgrades: Usually, to use new cluster features in new releases of WebSphere MQ, the full repository queue managers of that cluster must be upgraded first. When an application in the cluster wants to use new features, it is useful to be able to update the full repositories (and some subset of partial repositories) without having to test a number of co-located applications.
  • Maintenance: Similarly, if you must apply urgent maintenance to the full repositories, they can be restarted or refreshed with the REFRESH CLUSTER command (see the sketch after this list) without touching applications.
  • Performance: As clusters grow and demands on the full repository cluster cache become greater, keeping applications separate reduces the risk that cache maintenance affects application performance through contention for system resources.
  • Hardware requirements: Typically, full repositories do not need to be powerful; for example, a simple UNIX server with a good expectation of availability is sufficient. However, for very large or constantly changing clusters, the performance of the full repository computer must be considered.
  • Software requirements: Software requirements are usually the main reason for choosing to host application queues on a full repository: in a small cluster, collocation might mean fewer queue managers and servers are needed overall.
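
As a minimal illustration of the maintenance point above, and assuming a cluster named DEMOCLUS, the locally cached cluster information on a queue manager can be rebuilt with:

    REFRESH CLUSTER(DEMOCLUS)

Adding REPOS(YES) also refreshes the locally held information about the full repositories themselves, but that option cannot be issued on a queue manager that is itself a full repository.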

Managing channel definitions

Even within a single cluster, multiple channel definitions can exist, giving multiple routes between two queue managers.

There is sometimes an advantage to having parallel channels within a single cluster, but this design decision must be considered thoroughly: apart from adding complexity, it might result in channels being under-utilized, which reduces performance. This situation arises because testing usually involves sending many messages at a constant rate, so the parallel channels are fully used; under real-world conditions, with a non-constant stream of messages, the workload balancing algorithm causes performance to drop as the message flow is switched from channel to channel.

When a queue manager is a member of multiple clusters, the option exists to use a single channel definition with a cluster namelist, rather than defining a separate CLUSRCVR channel for each cluster. However, this setup can cause administration difficulties later; consider, for example, the case where SSL is to be applied to one cluster but not to a second. It is therefore preferable to create separate definitions, as in the sketch below, and the naming convention suggested in Cluster naming conventions supports this.
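
The following hypothetical MQSC fragment contrasts the two approaches for a queue manager QM1 that belongs to clusters CLUS1 and CLUS2; all object names and connection details are assumed for illustration.

    * One shared definition via a cluster namelist (harder to
    * manage later):
    DEFINE NAMELIST(BOTH.CLUSTERS) NAMES(CLUS1,CLUS2)
    DEFINE CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm1.example.com(1414)') CLUSNL(BOTH.CLUSTERS)

    * Preferred: one definition per cluster, so that, for example,
    * SSL can later be enabled on CLUS1 without affecting CLUS2.
    DEFINE CHANNEL(CLUS1.QM1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm1.example.com(1414)') CLUSTER(CLUS1)
    DEFINE CHANNEL(CLUS2.QM1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm1.example.com(1414)') CLUSTER(CLUS2)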

Workload balancing over multiple channels

This information is intended to give an advanced understanding of the subject; the basic explanation of cluster workload balancing must be understood before the information here is used.

The cluster workload management algorithm provides a large set of tools, but they must not be combined without a full understanding of how they work and interact. It might not be immediately obvious how important channels are to the workload balancing process: the round-robin algorithm treats multiple cluster channels to a queue manager that owns a clustered queue as multiple instances of that queue. This process is explained in more detail in the following example:

  1. There are two queue managers hosting a queue in a cluster: QM1 and QM2.
  2. There are five cluster receiver channels to QM1.
  3. There is only one cluster receiver channel to QM2.
  4. When an MQPUT or MQOPEN on QM3 chooses an instance, the algorithm is five times more likely to send the message to QM1 than to QM2.
  5. The situation in step 4 occurs because the algorithm sees six options to choose from (5 + 1) and round-robins across all five channels to QM1 and the single channel to QM2.
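
A hypothetical set of channel definitions that would produce this 5 + 1 situation is sketched below; the cluster name DEMOCLUS and all host names are assumptions.

    * On QM1: five cluster receiver channels, for example one per
    * network interface. DEMOCLUS.QM1.C, .D, and .E follow the
    * same pattern as .A and .B.
    DEFINE CHANNEL(DEMOCLUS.QM1.A) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm1-a.example.com(1414)') CLUSTER(DEMOCLUS)
    DEFINE CHANNEL(DEMOCLUS.QM1.B) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm1-b.example.com(1414)') CLUSTER(DEMOCLUS)

    * On QM2: a single cluster receiver channel.
    DEFINE CHANNEL(DEMOCLUS.QM2) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
           CONNAME('qm2.example.com(1414)') CLUSTER(DEMOCLUS)

With these definitions in place, a put from QM3 round-robins across six channel instances, so roughly five of every six messages arrive at QM1.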

Another subtle behavior is that, even when putting messages to a clustered queue that has an instance configured on the local queue manager, WebSphere MQ uses the state of the local cluster receiver channel to decide whether messages are put to the local instance of the queue or to remote instances. In this scenario:

  1. When putting messages the workload management algorithm does not look at individual cluster queues, it looks at the cluster channels which can reach those destinations.
  2. To reach local destinations, the local receiver channels are included in this list (although they are not used to send the message).
  3. When a local receiver channel is stopped, the workload management algorithm by default prefers an alternative (possibly remote) instance whose receiver channel is not stopped, as the sketch below illustrates.
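
As a hypothetical illustration of the last point, suppose QM1 hosts a local instance of a clustered queue and has a single cluster receiver channel named DEMOCLUS.QM1 (an assumed name). Stopping that channel causes messages put on QM1 to be routed to a remote instance by default:

    * Issued on QM1; while the local receiver channel is stopped,
    * the workload management algorithm prefers remote instances.
    STOP CHANNEL(DEMOCLUS.QM1)

Issuing START CHANNEL(DEMOCLUS.QM1) restores the local instance as a candidate.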

Server Clustering results in the following benefits and liabilities:

Benefits

  • Improved scalability. Server Clustering enables applications to handle more load.
  • Higher availability. Server Clustering helps applications avoid interruptions in service.
  • Greater flexibility. The ability of clustering to present a virtual unified computing resource provides IT personnel with more options for configuring the infrastructure to support application performance, availability, and scalability requirements.

Liabilities

  • Increased infrastructure complexity. Some clustering designs significantly increase the complexity of your solution, which may affect operational and support requirements. For example, clustering can increase the numbers of servers to manage, storage devices to maintain, and network connections to configure and monitor.
  • Additional design and code requirements. Applications may require specific design and coding changes to function properly when used in an infrastructure that uses clustering. For example, the need to manage session state can become more difficult across multiple servers and could require coding changes to accommodate maintaining state so that session information is not lost if a server fails.
  • Incompatibility. An existing application or application component may not be able to support clustering technologies. For example, a limitation in the technology used to develop the application or component may prevent it from supporting clustering even with code changes.
