Thursday, March 6, 2008

Native hash full outer join in Oracle 11g

One of the new features of Oracle 11g that does not have that much of marketing attention is the new support for native full outer joins if they can be handled by a HASH join operation, whichs means that it needs to be an equi-join. Of course you need to use the ANSI join syntax (FULL OUTER JOIN) rather than the traditional Oracle syntax for outer joins (+) in order to benefit from the new feature

In comparison to the method used so far for full outer joins the new native support should require less resource consumption as it is able to perform the full outer join in one single operation whereas the former implementation required a rewrite of the full outer join to a simple outer join (left or right) and an additional "not exists" operation, e.g. handled by a further hash-join anti which would require another access to the data, e.g. two table scans instead of a single one.

This new functionality seems to be controlled by the internal parameter "_optimizer_native_full_outer_join". It currently supports three different settings: "choose", "force" and "off". The default is "force", and setting it to any other value for me disabled the new feature, so in my tests there was no difference between "choose" and "off", but of course there might be cases where "choose" actually attempts to use the new feature.

You can modify this parameter e.g. using ALTER SESSION or the OPT_PARAM hint that has been introduced with Oracle 10g that allows to manipulate parameter values only for a specific statement execution rather than for the whole session.

Here is a small sample script that demonstrates the new feature in action.

First it creates two tables using the ALL_OBJECTS view. In a 11g default installation these tables hold approx. 53.000 records. The two tables are generated using a certain offset for the join column so that they overlap only partially. Hence a full outer join on that column is supposed to return the overlapped part and each part from each table that doesn't overlap.


SQL>
SQL> drop table native_full_outer_join_test purge;

Table dropped.

Elapsed: 00:00:00.02
SQL> drop table native_full_outer_join_test2 purge;

Table dropped.

Elapsed: 00:00:00.02
SQL>
SQL> create table native_full_outer_join_test as
2 select
3 trunc(sqrt(rownum-1)) as skewed_data,
4 rownum-1 as id,
5 lpad(rownum-1,10) id_char,
6 rpad('x',50, 'x') as filler
7 from
8 all_objects;

Table created.

Elapsed: 00:00:26.25
SQL>
SQL> create table native_full_outer_join_test2 as
2 select
3 trunc(sqrt(rownum+20000-1)) as skewed_data,
4 rownum+20000-1 as id,
5 lpad(rownum+20000-1,10) id_char,
6 rpad('x',50, 'x') as filler
7 from
8 all_objects;

Table created.

Elapsed: 00:00:37.01
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(ownname=>USER, tabname=>'native_full_outer_join_test');
3 dbms_stats.gather_table_stats(ownname=>USER, tabname=>'native_full_outer_join_test2');
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.70

Now we perform the full outer join using the default setting of force for the corresponding parameter.

SQL>
SQL> set autotrace traceonly
SQL>
SQL> select /*+ opt_param('_optimizer_native_full_outer_join', 'force') */ a.*, b.* from native_full_outer_join_test a
2 full outer join native_full_outer_join_test2 b
3 on a.id = b.id
4 order by a.id_char;

73762 rows selected.

Elapsed: 00:00:01.16

Execution Plan
----------------------------------------------------------
Plan hash value: 4125684409

---------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִִִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|TempSpc|ִCostִ(%CPU)|ִTimeִִִִִ|
---------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ6667K|ִִִִִִִ|ִִ2259ִִִ(1)|ִ00:00:28ִ|
|ִִִ1ִ|ִִSORTִORDERִBYִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ6667K|ִִִִ15M|ִִ2259ִִִ(1)|ִ00:00:28ִ|
|ִִִ2ִ|ִִִVIEWִִִִִִִִִִִִִִִִִ|ִVW_FOJ_0ִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ6667K|ִִִִִִִ|ִִִ731ִִִ(1)|ִ00:00:09ִ|
|*ִִ3ִ|ִִִִHASHִJOINִFULLִOUTER|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ7350K|ִִ4312K|ִִִ731ִִִ(1)|ִ00:00:09ִ|
|ִִִ4ִ|ִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TESTִִ|ִ53761ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|ִִִ5ִ|ִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TEST2ִ|ִ53762ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
---------------------------------------------------------------------------------------------------------------

PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------

ִִִ3ִ-ִaccess("A"."ID"="B"."ID")


Statistics
----------------------------------------------------------
ִִִִִִִִִִ1ִִrecursiveִcalls
ִִִִִִִִִִ0ִִdbִblockִgets
ִִִִִִִ1136ִִconsistentִgets
ִִִִִִִִִִ0ִִphysicalִreads
ִִִִִִִִִִ0ִִredoִsize
ִִִִ2704743ִִbytesִsentִviaִSQL*Netִtoִclient
ִִִִִִ54503ִִbytesִreceivedִviaִSQL*Netִfromִclient
ִִִִִִִ4919ִִSQL*Netִroundtripsִto/fromִclient
ִִִִִִִִִִ1ִִsortsִ(memory)
ִִִִִִִִִִ0ִִsortsִ(disk)
ִִִִִִ73762ִִrowsִprocessed

You can see that the database was able to perform the join operation in one single HASH JOIN FULL OUTER step. It is interesting to note that the cardinality estimate in this particular case does not consider the non-overlapping parts and just returns the cardinality of a complete overlapping set.

Now we revert to the pre-11g implementation by disabling the new feature.

SQL>
SQL> select /*+ opt_param('_optimizer_native_full_outer_join', 'off') */ a.*, b.* from native_full_outer_join_test a
2 full outer join native_full_outer_join_test2 b
3 on a.id = b.id
4 order by a.id_char;

73762 rows selected.

Elapsed: 00:00:01.26

Execution Plan
----------------------------------------------------------
Plan hash value: 4036012045

----------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִִִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|TempSpc|ִCostִ(%CPU)|ִTimeִִִִִ|
----------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ87522ִ|ִִִִ10M|ִִִִִִִ|ִִ3529ִִִ(1)|ִ00:00:43ִ|
|ִִִ1ִ|ִִSORTִORDERִBYִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ87522ִ|ִִִִ10M|ִִִִ25M|ִִ3529ִִִ(1)|ִ00:00:43ִ|
|ִִִ2ִ|ִִִVIEWִִִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ87522ִ|ִִִִ10M|ִִִִִִִ|ִִ1043ִִִ(1)|ִ00:00:13ִ|
|ִִִ3ִ|ִִִִUNION-ALLִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִִִִִִ|ִִִִִִִִִִ|
|*ִִ4ִ|ִִִִִHASHִJOINִOUTERִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53761ִ|ִִ7350K|ִִ4312K|ִִִ731ִִִ(1)|ִ00:00:09ִ|
|ִִִ5ִ|ִִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TESTִִ|ִ53761ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|ִִִ6ִ|ִִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TEST2ִ|ִ53762ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|*ִִ7ִ|ִִִִִHASHִJOINִRIGHTִANTI|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ33761ִ|ִִ2472K|ִִִִִִִ|ִִִ312ִִִ(1)|ִ00:00:04ִ|
|ִִִ8ִ|ִִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TESTִִ|ִ53761ִ|ִִִ262K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|ִִִ9ִ|ִִִִִִTABLEִACCESSִFULLִִ|ִNATIVE_FULL_OUTER_JOIN_TEST2ִ|ִ53762ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
----------------------------------------------------------------------------------------------------------------

PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------

ִִִ4ִ-ִaccess("A"."ID"="B"."ID"(+))
ִִִ7ִ-ִaccess("A"."ID"="B"."ID")


Statistics
----------------------------------------------------------
ִִִִִִִִִִ1ִִrecursiveִcalls
ִִִִִִִִִִ0ִִdbִblockִgets
ִִִִִִִ2272ִִconsistentִgets
ִִִִִִִִִִ0ִִphysicalִreads
ִִִִִִִִִִ0ִִredoִsize
ִִִִ2705047ִִbytesִsentִviaִSQL*Netִtoִclient
ִִִִִִ54503ִִbytesִreceivedִviaִSQL*Netִfromִclient
ִִִִִִִ4919ִִSQL*Netִroundtripsִto/fromִclient
ִִִִִִִִִִ1ִִsortsִ(memory)
ִִִִִִִִִִ0ִִsortsִ(disk)
ִִִִִִ73762ִִrowsִprocessed

It is obvious that in this particular case the database had to do more work. It required two operations, an outer join and a second anti join to perform the full outer join operation. It can also be seen that the "consistent gets" are doubled due to that second operation. On the other hand, the cardinality estimate looks closer to reality, as it considers at least partially the non-overlapping part represented by the anti join operation.

For the sake of completeness the following query shows that the new feature is only available for equi-joins, since this is the only operation supported by a hash join.

SQL>
SQL> set autotrace traceonly explain
SQL>
SQL> select a.*, b.* from native_full_outer_join_test a
2 full outer join native_full_outer_join_test2 b
3 on a.id < b.id
4 order by a.id_char;


Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4239385412

---------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִִִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|TempSpc|ִCostִ(%CPU)|ִTimeִִִִִ|
---------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִ144M|ִִִִ17G|ִִִִִִִ|ִִ4107Kִִ(1)|ִ13:41:36ִ|
|ִִִ1ִ|ִִSORTִORDERִBYִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִ144M|ִִִִ17G|ִִִִ41G|ִִ4107Kִִ(1)|ִ13:41:36ִ|
|ִִִ2ִ|ִִִVIEWִִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִ144M|ִִִִ17G|ִִִִִִִ|ִִ4107ִִ(17)|ִ00:00:50ִ|
|ִִִ3ִ|ִִִִUNION-ALLִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִ|ִִִִִִִִִִִִ|ִִִִִִִִִִ|
|ִִִ4ִ|ִִִִִMERGEִJOINִOUTERִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִִִ144M|ִִִִ18G|ִִִִִִִ|ִִ2740ִִ(25)|ִ00:00:33ִ|
|ִִִ5ִ|ִִִִִִSORTִJOINִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53761ִ|ִִ3675K|ִִ8888K|ִִ1044ִִִ(1)|ִ00:00:13ִ|
|ִִִ6ִ|ִִִִִִִTABLEִACCESSִFULL|ִNATIVE_FULL_OUTER_JOIN_TESTִִ|ִ53761ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|*ִִ7ִ|ִִִִִִSORTִJOINִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ3675K|ִִ8888K|ִִ1044ִִִ(1)|ִ00:00:13ִ|
|ִִִ8ִ|ִִִִִִִTABLEִACCESSִFULL|ִNATIVE_FULL_OUTER_JOIN_TEST2ִ|ִ53762ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|ִִִ9ִ|ִִִִִMERGEִJOINִANTIִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ51074ִ|ִִ3740K|ִִִִִִִ|ִִ1366ִִִ(1)|ִ00:00:17ִ|
|ִִ10ִ|ִִִִִִSORTִJOINִִִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53762ִ|ִִ3675K|ִִ8888K|ִִ1044ִִִ(1)|ִ00:00:13ִ|
|ִִ11ִ|ִִִִִִִTABLEִACCESSִFULL|ִNATIVE_FULL_OUTER_JOIN_TEST2ִ|ִ53762ִ|ִִ3675K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
|*ִ12ִ|ִִִִִִSORTִUNIQUEִִִִִִִ|ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ|ִ53761ִ|ִִִ262K|ִִ1288K|ִִִ322ִִִ(1)|ִ00:00:04ִ|
|ִִ13ִ|ִִִִִִִTABLEִACCESSִFULL|ִNATIVE_FULL_OUTER_JOIN_TESTִִ|ִ53761ִ|ִִִ262K|ִִִִִִִ|ִִִ156ִִִ(1)|ִ00:00:02ִ|
---------------------------------------------------------------------------------------------------------------

PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------

ִִִ7ִ-ִaccess("A"."ID"<"B"."ID"(+))
ִִִִִִִfilter("A"."ID"<"B"."ID"(+))
ִִ12ִ-ִaccess(INTERNAL_FUNCTION("A"."ID")
<INTERNAL_FUNCTION("B"."ID"))
ִִִִִִִfilter(INTERNAL_FUNCTION("A"."ID")<INTERNAL_FUNCTION("B"."ID"))

SQL>
SQL> spool off