Tuesday, March 3, 2015

12c Parallel Execution New Features: Concurrent UNION ALL - Part 1

12c introduces the concurrent UNION ALL operator that allows multiple branches below the UNION ALL to become active concurrently - which is a significant change. Before the introduction of this feature Oracle never executed multiple branches of an execution plan concurrently (in terms of Parallel Execution) - the parallelism so far was about executing the same operations of a particular branch of the execution plan by multiple processes / sessions. When we later look at the actual runtime behaviour of the new feature we'll notice that it's actually not that much different from previous behaviour than it sounds here.

By default the feature gets automatically used when there is at least one branch of the UNION ALL making use of Parallel Execution features, like a parallel full table scan for example.

If the UNION ALL operator consists of serial branches only the concurrent execution of those serial branches can be explicitly activated by using the PQ_CONCURRENT_UNION hint. To suppress the feature usage, a corresponding NO_PQ_CONCURRENT_UNION hint can be used. Both hints are officially documented.

The concurrent execution of serial branches in my opinion could be particularly useful to overcome the still existing limitation of parallelism and operations via database links: If you want to copy larger segments of data via database links you can use Parallel Execution on the source to speed up data retrieval and you can use Parallel Execution on the target to speed up the DML process, but the data will always have to go through the query coordinator process on both sides, so a copy process is always limited by the CPU and network bandwidth / latency of a single process. If you need to copy multiple objects of similar size you could simply speed up the copy process via spawning multiple jobs, but if the objects are of significantly different size you would still have to wait for the single process to cope with the largest segment.

Before 12c the only way to overcome this limitation was to employ "do-it-yourself" parallelism per (large) segment, more recently supported via DBMS_PARALLEL_EXECUTE. This allows spreading such an operation across several processes / sessions concurrently and therefore allowing to overcome the described limitation. But this approach makes the overall handling more complex (chunking, error handling etc., although DBMS_PARALLEL_EXECUTE provides a standardized interface for such tasks) and isn't a simple SQL based solution any longer.

With the concurrent UNION ALL feature in principle a SQL solution should be possible where multiple branches of a concurrent UNION ALL query (different chunks) of the same remote segment concurrently. Of course this approach still requires some manual preparation since ideally the different branches should only query disjunct parts of the segment, so for non-partitioned segments ROWID ranges should be prepared and used inside the different remote queries, similar to the DBMS_PARALLEL_EXECUTE approach. If you don't care about the duplicated work you could of course simply use something like a ORA_HASH or MOD based filter on a column expression that comes up with a reasonably equal (hash) value distribution (like a primary key).

However, due to the way the concurrent UNION ALL is implemented internally at present, this approach might not work very well, at least in 12.1.0.1 and 12.1.0.2, as I'll show in a later part.

Some other questions I would like to address regarding this feature are the following:

- How is the work distributed at runtime if you end up with a mixture of parallel and serial branches?

- If you request concurrent execution with serial branches only and no other parallel execution in the plan, how is the parallel degree to be used determined?

In this part of the blog post I want to start with the latter question first: What parallel degree gets used if a UNION ALL consists only of serial branches?

And one follow-up question could be: What happens if a more complex execution plan mixes parts with Parallel Execution outside a UNION ALL with a concurrent UNION ALL that consists only of serial branches?

Here is the table setup I'm going to use for all my tests:
-- This is the Parallel table
create table t_2
compress
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2')

alter table t_2 parallel 8;

-- This is the serial table
create table t2
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(10000) */ * from dual
connect by
        level <= 10000) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't2')

create database link loop connect to cbo_test identified by cbo_test using '//localhost:1521/orcl12'
The database link will become relevant in the later parts of this series where I look at the execution of remote branches.

So let's start with the question about the parallel degree used in case of serial branches only. I'll use the following simple query to determine that (repeat the query block inside the UNION ALL as many times as desired, I've used 20 serial branches):
-- Can I force a completely serial UNION ALL into concurrent execution using the hint? Yes, but isn't selected by the optimizer automatically
-- In 12.1.0.2 the PQ_CONCURRENT_UNION hint works as documented as statement level hint, whereas in 12.1.0.1 there is the bug that you need to specify a query block name (Bug 15851422 : PQ_CONCURRENT_HINT DOES NOT WORK WITHOUT QUERY BLOCK SPECIFICATION)
-- What about the degree chosen in such a case, is there an upper limit?
select /*+ PQ_CONCURRENT_UNION(@"SET$1") */ count(*) from (
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
.
.
.
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
);
I've performed all tests on 12.1.0.2.

If you do an EXPLAIN PLAN on above query without the PQ_CONCURRENT_UNION hint, then you'll see, as documented, the default behaviour, which is serial execution of one branch after the other (I've used 20 serial branches in my tests):
-------------------------------------
| Id  | Operation            | Name |
-------------------------------------
|   0 | SELECT STATEMENT     |      |
|   1 |  SORT AGGREGATE      |      |
|   2 |   VIEW               |      |
|   3 |    UNION-ALL         |      |
|*  4 |     TABLE ACCESS FULL| T2   |
|*  5 |     TABLE ACCESS FULL| T2   |
|*  6 |     TABLE ACCESS FULL| T2   |
|*  7 |     TABLE ACCESS FULL| T2   |
|*  8 |     TABLE ACCESS FULL| T2   |
|*  9 |     TABLE ACCESS FULL| T2   |
|* 10 |     TABLE ACCESS FULL| T2   |
|* 11 |     TABLE ACCESS FULL| T2   |
|* 12 |     TABLE ACCESS FULL| T2   |
|* 13 |     TABLE ACCESS FULL| T2   |
|* 14 |     TABLE ACCESS FULL| T2   |
|* 15 |     TABLE ACCESS FULL| T2   |
|* 16 |     TABLE ACCESS FULL| T2   |
|* 17 |     TABLE ACCESS FULL| T2   |
|* 18 |     TABLE ACCESS FULL| T2   |
|* 19 |     TABLE ACCESS FULL| T2   |
|* 20 |     TABLE ACCESS FULL| T2   |
|* 21 |     TABLE ACCESS FULL| T2   |
|* 22 |     TABLE ACCESS FULL| T2   |
|* 23 |     TABLE ACCESS FULL| T2   |
-------------------------------------
With the hint in place the plan changes to this:
-------------------------------------------------------------------------
| Id  | Operation                | Name     |   TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |      |            |
|   1 |  SORT AGGREGATE          |          |       |      |            |
|   2 |   PX COORDINATOR         |          |       |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10000 | Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE       |          | Q1,00 | PCWP |            |
|   5 |      VIEW                |          | Q1,00 | PCWP |            |
|   6 |       UNION-ALL          |          | Q1,00 | PCWP |            |
|   7 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|*  8 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|   9 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 10 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  11 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 12 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  13 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 14 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  15 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 16 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  17 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 18 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  19 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 20 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  21 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 22 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  23 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 24 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  25 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 26 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  27 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 28 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  29 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 30 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  31 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 32 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  33 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 34 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  35 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 36 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  37 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 38 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  39 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 40 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  41 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 42 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  43 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 44 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  45 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 46 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
-------------------------------------------------------------------------
Starting with 12c you'll usually get a plan note about the parallel degree the optimizer has determined, for both the traditional DOP and the in 11.2 introduced Auto DOP way (for Auto DOP this note showed already up in 11.2). However, in this case, no note section shows up.

So what degree do we then get at runtime? (XPLAN_ASH output snippet)
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |        20|        20|         1|        20|        20|
                   |                              |     |          |          |          |          |          |
So that's interesting, in my case I tested this on a laptop with 2 CPUs and Resource Manager active, which means that insane degrees should be downgraded automatically (or even already limited by the optimizer), but I really got a degree of 20, which doesn't make a lot of sense in this environment. However, in 12c obviously the limits enforced by the Resource Manager have been raised. In 11.2 the highest degree the Resource Manager allowed in this particular environment was 16, in 12c the limit was 27 instead (maybe 28, because in 11.2 I sometimes got 15, sometimes 16).

So pretty straightforward the number of serial branches seem to determine the degree attempted to use at runtime. This also means you need to be careful how many branches you code into the UNION ALL if you want to make use of the concurrent feature.

What happens if I mix now a Parallel Execution with a UNION ALL that consists only of serial branches, for example like this:
select /*+ PQ_CONCURRENT_UNION(@"SET$1") */ count(*) from (
select id, regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t_2
where regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
) a,
(
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
) b
where a.id = b.id;
Again, if I don't use the PQ_CONCURRENT_UNION hint, I'll get the traditional plan shape for the UNION ALL - which gets executed serially and then gets distributed as part of the remaining parallel operations:
-------------------------------------------------------------------------------
| Id  | Operation                     | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |        |      |            |
|   1 |  SORT AGGREGATE               |          |        |      |            |
|   2 |   PX COORDINATOR              |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |          |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN                |          |  Q1,02 | PCWP |            |
|   6 |       BUFFER SORT             |          |  Q1,02 | PCWC |            |
|   7 |        PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   8 |         PX SEND HYBRID HASH   | :TQ10000 |        | S->P | HYBRID HASH|
|   9 |          STATISTICS COLLECTOR |          |        |      |            |
|  10 |           VIEW                |          |        |      |            |
|  11 |            UNION-ALL          |          |        |      |            |
|* 12 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 13 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 14 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 15 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 16 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 17 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 18 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 19 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 20 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 21 |             TABLE ACCESS FULL | T2       |        |      |            |
|  22 |       PX RECEIVE              |          |  Q1,02 | PCWP |            |
|  23 |        PX SEND HYBRID HASH    | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  24 |         PX BLOCK ITERATOR     |          |  Q1,01 | PCWC |            |
|* 25 |          TABLE ACCESS FULL    | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("ID"="B"."ID")
  12 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  13 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  14 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  15 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  16 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  17 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  18 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  19 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  20 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  21 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  25 - filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))

Note
-----
   - Degree of Parallelism is 4 because of table property
Since I marked the table T_2 with PARALLEL 8 I would expect the overall plan to use a degree of 8, but look at the "note" section, which tells me the degree would be 4 due to a table property...

What happens at runtime:
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |         8|         8|         1|        16|        16|
                   |                              |     |          |          |          |          |          |
But at runtime I get a degree of 8, so you have to be careful with these plan notes as they quite often seem to report incorrect degrees for the traditional DOP way.

If I now use the PQ_CONCURRENT_UNION hint, I get the following plan:
------------------------------------------------------------------------------
| Id  | Operation                    | Name     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |        |      |            |
|   1 |  SORT AGGREGATE              |          |        |      |            |
|   2 |   PX COORDINATOR             |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)       | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE           |          |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN               |          |  Q1,02 | PCWP |            |
|   6 |       PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   7 |        PX SEND HYBRID HASH   | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR |          |  Q1,00 | PCWC |            |
|   9 |          VIEW                |          |  Q1,00 | PCWP |            |
|  10 |           UNION-ALL          |          |  Q1,00 | PCWP |            |
|  11 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 12 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  13 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 14 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  15 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 16 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  17 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 18 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  19 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 20 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  21 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 22 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  23 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 24 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  25 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 26 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  27 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 28 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  29 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 30 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  31 |       PX RECEIVE             |          |  Q1,02 | PCWP |            |
|  32 |        PX SEND HYBRID HASH   | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  33 |         PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|* 34 |          TABLE ACCESS FULL   | T_2      |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("ID"="B"."ID")
  12 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  14 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  16 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  18 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  20 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  22 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  24 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  26 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  28 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  30 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  34 - filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))

Note
-----
   - Degree of Parallelism is 4 because of table property
OK, this is the concurrent UNION ALL shape now, but the "note" section still tells me the parallel degree is 4.

What happens at runtime:
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |        10|        10|         1|        20|        20|
                   |                              |     |          |          |          |          |          |
Oops, at runtime I now get the degree determined by the PQ_CONCURRENT_UNION part, so the overall degree is then in my case 10, and not 4 (or 8 for that matter), since my UNION ALL had 10 serial branches in this example.

So it becomes obvious that using the new feature with serial branches you have to worry a bit about the parallel degree used at runtime.

In the next instalment we'll look at the actual runtime behaviour of the feature when it gets automatically triggered by a mixture of parallel and serial branches.

Thursday, February 19, 2015

12c Parallel Execution New Features: Hybrid Hash Distribution - Part 2

In the second part of this post (go to part 1) I want to focus on the hybrid distribution for skewed join expressions.

2. Hybrid Distribution For Skewed Join Expressions


The HYBRID HASH distribution allows to some degree addressing data distribution skew in case of HASH distributions, which I've described in detail already in the past. A summary post that links to all other relevant articles regarding Parallel Execution Skew can be found here, an overview of the relevant feature can be found here and a detailed description can be found here.

One other side effect of the truly hybrid distribution in case of skew (mixture of BROADCAST / HASH for one row source and ROUND-ROBIN / HASH for the other row source) is that HASH distributions following such a hybrid distribution need to redistribute again even if the same join / distribution keys get used by following joins. If this were regular HASH distributions the data would already be suitably distributed and no further redistribution would be required.

Here's an example of this, using the test case setup mentioned here:
-- Here the HYBRID SKEW distribution works for B->C
-- But the (B->C)->A join is affected by the same skew
-- So the HASH re-distribution of the resulting B.ID is skewed, too
-- And hence the HASH JOIN/SORT AGGREGATE (operation 4+5) are affected by the skew
-- The big question is: Why is there a re-distribution (operation 12+11)?
-- The data is already distributed on B.ID??
-- If there wasn't a re-distribution no skew would happen
-- In 11.2 no-redistribution happens no matter if C is probe or hash row source
-- So it looks like a side-effect of the hybrid distribution
-- Which makes sense as it is not really HASH distributed, but hybrid
select count(t_2_filler) from (
select  /*+ monitor
            leading(b c a)
            use_hash(c a)
            swap_join_inputs(a)
            no_swap_join_inputs(c)
            pq_distribute(a hash hash)
            pq_distribute(c hash hash)
            --optimizer_features_enable('11.2.0.4')
            pq_skew(c) 
        */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     a.id = b.id
);

-- 11.2 plan
----------------------------------------------------------------------------
| Id  | Operation                  | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |        |      |            |
|   1 |  SORT AGGREGATE            |          |        |      |            |
|   2 |   PX COORDINATOR           |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)     | :TQ10003 |  Q1,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE         |          |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN             |          |  Q1,03 | PCWP |            |
|   6 |       PX RECEIVE           |          |  Q1,03 | PCWP |            |
|   7 |        PX SEND HASH        | :TQ10000 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR  |          |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL | T_1      |  Q1,00 | PCWP |            |
|* 10 |        HASH JOIN           |          |  Q1,03 | PCWP |            |
|  11 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  12 |         PX SEND HASH       | :TQ10001 |  Q1,01 | P->P | HASH       |
|  13 |          PX BLOCK ITERATOR |          |  Q1,01 | PCWC |            |
|  14 |           TABLE ACCESS FULL| T_1      |  Q1,01 | PCWP |            |
|  15 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  16 |         PX SEND HASH       | :TQ10002 |  Q1,02 | P->P | HASH       |
|  17 |          PX BLOCK ITERATOR |          |  Q1,02 | PCWC |            |
|  18 |           TABLE ACCESS FULL| T_2      |  Q1,02 | PCWP |            |
----------------------------------------------------------------------------

-- 12.1 plan
-------------------------------------------------------------------------------------
| Id  | Operation                           | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |        |      |            |
|   1 |  SORT AGGREGATE                     |          |        |      |            |
|   2 |   PX COORDINATOR                    |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)              | :TQ10004 |  Q1,04 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                  |          |  Q1,04 | PCWP |            |
|*  5 |      HASH JOIN                      |          |  Q1,04 | PCWP |            |
|   6 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|   7 |        PX SEND HYBRID HASH          | :TQ10002 |  Q1,02 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR        |          |  Q1,02 | PCWC |            |
|   9 |          PX BLOCK ITERATOR          |          |  Q1,02 | PCWC |            |
|  10 |           TABLE ACCESS FULL         | T_1      |  Q1,02 | PCWP |            |
|  11 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|  12 |        PX SEND HYBRID HASH          | :TQ10003 |  Q1,03 | P->P | HYBRID HASH|
|* 13 |         HASH JOIN BUFFERED          |          |  Q1,03 | PCWP |            |
|  14 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  15 |           PX SEND HYBRID HASH       | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|  16 |            STATISTICS COLLECTOR     |          |  Q1,00 | PCWC |            |
|  17 |             PX BLOCK ITERATOR       |          |  Q1,00 | PCWC |            |
|  18 |              TABLE ACCESS FULL      | T_1      |  Q1,00 | PCWP |            |
|  19 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  20 |           PX SEND HYBRID HASH (SKEW)| :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  21 |            PX BLOCK ITERATOR        |          |  Q1,01 | PCWC |            |
|  22 |             TABLE ACCESS FULL       | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------
Note that both joins to A and C are based on B.ID. As you can see from the 11.2 plan therefore the final hash join (operation ID 5) doesn't need to have the output of the previous hash join (operation ID 10) redistributed, since the data is already distributed in a suitable way (and as a consequence both joins therefore will be affected by skewed values in T2.FK_ID_SKEW, but no BUFFERED join variant is required).

Now look at the 12c plan when SKEW is detected: Since the SKEW handling in fact leads to a potential mixture of HASH / BROADCAST and HASH / ROUND-ROBIN distribution, the data gets redistributed again for the final join (operation ID 11 + 12) which has several bad side effects: First it adds the overhead of an additional redistribution, as a side effect this then turns one of the hash joins into its BUFFERED variant, and since the SKEW distribution (at present) is only supported if the right side of the join is a table (and not the result of another join), this following join actually will be affected by the skew that was just addressed by the special SKEW handling in the join before (assuming the HYBRID HASH distributions in operation ID 6+7 / 11+12 operate in HASH / HASH, not BROADCAST / ROUND-ROBIN mode)...