Saturday, August 11, 2012

Parallel Execution Analysis Using ASH - The XPLAN_ASH Tool



Preface


Note: This blog post actually serves three purposes:
  1. It introduces and describes my latest contribution to the Oracle Community,  the "XPLAN_ASH" tool

  2. It accompanies a future OTN article on Parallel Execution that will be published some time in the future

  3. It is supposed to act as a teaser for my upcoming "Parallel Execution Masterclass" that will be organized by Oracle University and can be booked later this year


Table Of Contents


Introduction

Real-Time SQL Monitoring Overview

Real-Time SQL Monitoring Shortcomings

Introducing XPLAN_ASH (Download)

Options Reference

Examples


Introduction


Were you already confronted with some of the following questions?
  • Why did the batch process last night take that long?

  • Why did this Parallel Execution statement take that long this time?

  • What Degree Of Parallelism (DOP) was used for the execution of statement XYZ?

Anyone seriously using the Enterprise Edition Parallel Execution feature should consider upgrading to Oracle 11.2 and getting the Diagnostics and Tuning Pack license, then you might be able to answer some of these questions easily.

Oracle has significantly enhanced the Real-Time SQL Monitoring feature in 11.2, which was introduced in 11.1.

But there are some problems, even with the latest versions of Real-Time SQL Monitoring, one of them being simply that the reports are not persisted by default and they might no longer be available when you need them. I'll come back to this later.

When monitoring Parallel Execution, in particular the "ACTIVE" or "EM" version of the Real-Time SQL Monitoring offers invaluable insights into the SQL execution details.

On top of the general information provided and "Plan Statistics" tab that shows execution details on execution plan line level, the "Parallel", "Activity" and "Metrics" tabs allow gaining a lot of insights that weren't available in previous versions.


Real-Time SQL Monitoring Overview


Here are some example Real-Time SQL Monitoring screen shots of a Parallel Execution (you can also simply access the active report version using this link):


General Information


So we can see this statement is completed, was a cross-instance Parallel Execution with two instances each running a DOP of (maximum) 16 (shown as DOP 32 in Real-Time SQL Monitoring). We can also see that we have a larger DB Time than Wall-Clock time, so it was a true Parallel Execution, but we probably would expect a much larger DB Time / Wall-Clock Time ratio given the Parallel Degree shown - a first indication that something might not have worked out as expected.

Furthermore we can see that this statement didn't perform any physical I/O, everything came from cache, which looks like an In-Memory Parallel Execution (it was, but not using the new 11.2 In-Memory Parallel Execution feature, but "hand-made")


Plan Statistics


We can see here that three operations of the plan were sampled most, in this case always on CPU.




Parallel


Here we can see various things:

- The work wasn't distributed evenly among the involved RAC nodes

- The work wasn't distributed evenly among the Parallel Slaves, one Parallel Slave had to work much more than the others => Parallel Execution skew problem



Activity


Here it can be seen that the "Parallel Execution skew" problem that could already be identified in the previous tab really had a significant impact on the SQL execution: There is only a short peak with the expected Parallel activity and a very long tail of execution where only a single CPU was active.



Metrics


This tab confirms what we've seen so far, but in this particular case doesn't add more information, but it definitely could in other cases (PGA / TEMP consumption, I/O requests and throughput analysis for example)



Real-Time SQL Monitoring Shortcomings


So as you can see the Real-Time SQL Monitoring offers great insights for Parallel Execution, but there are - at least in the currently available versions - also some shortcomings:

  • In a busy system the Real-Time SQL Monitoring information can age out quite quickly out of the Shared Pool, although Oracle tries to keep it as long as possible after the actual execution finished. You could work around this using a custom job that somehow persists the SQL Monitoring information, but that feature is not there out of the box. Note: You could also change the undocumented retention setting if you want to keep them longer. See Doug Burns' post for more information

  • For cross-instance Parallel Execution the "Activity" and "Metrics" tabs don't group the information provided by instance, and therefore you might not be able to tell if one of the involved instances was overloaded in terms of CPU usage, memory or similar

  • By default Real-Time SQL Monitoring doesn't monitor plans that have more than 300 lines. This can be changed by modifying undocumented parameters. See Doug Burns' post about this for more information

  • The Real-Time SQL Monitoring lacks precision in some regards, for example it only reports the maximum DOP found if an execution plan consists of multiple Parallelizers (also called "Data Flow Operations", DFOs). Each DFO gets its own DOP determined (and in principle will have its own Parallel Slave Sets assigned at runtime).

    You have to look very closely at the number of executions on plan line level (only applicable to some operations depending on the shape of the execution plan) or have to navigate to the "Parallel" tab and check the multiple "Parallel Group" details how many Parallel Slaves were allocated per DFO to get more details about that, but even then you cannot easily relate this information to the execution plan, which means you don't know which part of the execution plan was executed with what DOP.

    Here is an Real-Time SQL Monitoring report for an execution plan consisting of multiple DFOs. Can you easily tell which part of the execution plan ran with what DOP?



  • If - as in the first example above - Parallel Execution skew causes problems, then unfortunately Real-Time SQL Monitoring lacks the details which operations of the execution plan exactly are/were impacted by the skew, which would allow gaining a better understanding what caused the skew and how the skew problem could be addressed - the basis for a systematic troubleshooting approach.

    Look at the various pictures from the report shown at the beginning - where do you think the skew happens and which operations are affected?

    Very likely you would turn to the operation that is responsible for the majority of the DB time (The PX SEND HASH operation in the lower part of the execution plan). This approach is in principle a good idea, but in this case leads you into the wrong direction.

    And the example is a very simple three-table join where you might be able to work through the different joins manually (tedious job and not always possible with transient data) and find out where things might go wrong, but imagine this was your real-life 150+ lines Parallel Execution plan including dozens of joins?

    In short, it is not easy to tell from the Real-Time SQL Monitoring report. You can only tell that some Parallel Slaves have to perform more work than others, but it's not that simple to get more details about the problem for systematic troubleshooting.

    Note that when looking at the "raw" (XML) output of the Real-Time SQL Monitoring and the underlying queries used to populate the XML it can be seen that the skew problem is already mentioned there but the information seems to be incomplete and/or is not displayed/processed yet in the report frontend code. May be this will change in a future release.

Introducing XPLAN_ASH



Download the latest version of the script from here

In order to address above shortcomings I've developed a SQL*Plus script that provides information about the SQL execution based on the Active Session History (ASH). Although that information is only based on sampling active sessions it still can provide a wealth of information, in particular for transient problems that already happened, are potentially hard to reproduce and you don't have a Real-Time SQL Monitoring (anymore) report at hand respectively you face one of the problems described that are not (yet) handled that well by the Real-Time SQL Monitoring.

Note that Oracle 10g already introduced ASH, but in 10g the ASH data lacks the relation to a particular SQL execution and the SQL execution plan lines, therefore the tool only processes ASH data from version 11g on. The information not related to execution plan lines could be extracted from 10g ASH (or free ASH implementations) but it would require a slightly different approach how to restrict the ASH data processed (for example, start sample time / end sample time instead of the 11g approach of identifying the data corresponding to an execution by SQL_EXEC_START / SQL_EXEC_ID).

Furthermore it should be mentioned that ASH requires at least the Enterprise Edition plus Diagnostic Pack license. For Real-Time SQL Monitoring you need the Tuning Pack license in addition.

Let's have a look what the tool provides about an SQL execution based on the ASH data. I'll refer the same SQL executions that were monitored above.

General Information


The first part of the output is general information so that you can verify that you're really looking at the execution that you want to analyse, and which options the tool uses:

General information ---------------------------------------------- SQL_ID CHILD_NUMBER SQL_EXEC_START SQL_EXEC_ID FORMAT_OPTION LAST_EXEC_SOURCE ASH_OPTIONS ASH_SOURCE ------------- ------------ ------------------- ----------- ------------------------- ---------------- ------------------------ ---------- 3pfa4kgc92z10 0 2012-07-23 15:14:39 83886081 ADVANCED ASH ASH,DISTRIB_REL,TIMELINE CURR

Note it can be controlled via options which of the following output should be shown.

ASH Summary


The next part is a high-level summary of the ASH data found for that execution, including the number instances involved (for RAC cross-instance Parallel Execution), whether the execution seems to be active or inactive (which is a good indicator that it is completed or still executing), the assumed duration of the execution, the number of samples found, how many of them were CPU samples, the number of Parallel Slaves found, and the resulting number of Average Active Sessions. This last two bits of information gives already a rough idea about the actual Parallel Execution degree.

INSTANCE_COUNT FIRST_SAMPLE LAST_SAMPLE DURATION_SECS STATUS SAMPLE_COUNT SLAVE_COUNT CPU_SAMPLE_COUNT AVERAGE_AS -------------- ------------------- ------------------- ------------- -------- ------------ ----------- ---------------- ---------- 2 2012-07-23 15:14:40 2012-07-23 15:15:17 39 INACTIVE 190 54 190 4,87

RAC Instance Summary


If a cross-instance Parallel Execution was detected, a similar overview on Instance Level will be provided, including an indication of the time the instance was active during the execution - it might happen that an instance was only involved partially in a Parallel Execution depending on the execution plan.

INSTANCE_ID FIRST_SAMPLE LAST_SAMPLE START_ACTIVE DURATION_SECS SAMPLE_COUNT SLAVE_COUNT AVERAGE_AS TIME_ACTIVE_GRAPH ----------- ------------------- ------------------- ------------ ------------- ------------ ----------- ---------- -------------------- 5 2012-07-23 15:14:40 2012-07-23 15:15:17 1 38 113 32 2,97 @@@@@@@@@@@@@@@@@@@ 6 2012-07-23 15:14:40 2012-07-23 15:14:44 1 5 77 22 15,4 @@@

Here we can see that one instance was only active for a short period of time.

DFO Summary


If a Parallel Execution was detected and a corresponding execution plan could be found (from the Shared Pool, or the AWR workload repository, depending on the options used), then there will another summary information per Instance / DFO.

Information on Parallel Degree based on ASH ----------------------------------------------- INSTANCE_ID DFO START_ACTIVE DURATION_SECS SAMPLE_COUNT SLAVE_COUNT SET_COUNT ASSUMED_DEGREE AVERAGE_AS TIME_ACTIVE_GRAPH ----------- ------ ------------ ------------- ------------ ----------- ---------- -------------- ---------- -------------------- 5 Q1 1 38 113 32 2 16 2,97 @@@@@@@@@@@@@@@@@@@ 6 Q1 1 5 76 22 2 11 15,2 @@@

This information is part of the attempt addressing above shortcomings - it will show the number of Parallel Slaves found per DFO and based on the execution plan will try to perform an educated guess about the actual DOP used. Of course this can be incorrect if not all of the Parallel Slaves involved show up in the ASH samples, so this information has to be treated as what it is: a guess.

It will also show a graphical representation of the time the DFO / Instance combination was active relative to the full execution time, which tells another important point: If there were multiple DFOs involved, were they active at the same time, or after each other? In the former case, you can end up with a Parallel Execution that requires more Parallel Slaves than potentially expected (because each DFO gets its own set of Parallel Slaves), in the latter case Oracle usually is clever enough to re-use the same Parallel Slave processes for the different DFO's slave sets.

Here is a sample output for an execution where multiple DFOs were active at the same time:

Information on Parallel Degree based on ASH ----------------------------------------------- DFO START_ACTIVE DURATION_SECS SAMPLE_COUNT PROCESS_COUNT SET_COUNT ASSUMED_DEGREE AVERAGE_AS TIME_ACTIVE_GRAPH ------ ------------ ------------- ------------ ------------- ---------- -------------- ---------- -------------------- Q1 3 173 170 2 1 2 .98 @@@@@@@@@@@@@@@@@@@@ Q2 2 167 5 4 2 2 .03 @@@@@@@@@@@@@@@@@@@

Here is a sample output where multiple DFOs were active, but not at the same time. It is actually the corresponding output for above Real-Time SQL Monitoring report for the statement consisting of multiple DFOs that ran with different DOPs. It shows clearly which DFO used which DOP:

Information on Parallel Degree based on ASH ----------------------------------------------- INSTANCE_ID DFO START_ACTIVE DURATION_SECS SAMPLE_COUNT PROCESS_COUNT SET_COUNT ASSUMED_DEGREE AVERAGE_AS TIME_ACTIVE_GRAPH ----------- ------ ------------ ------------- ------------ ------------- ---------- -------------- ---------- -------------------- 5 Q1 2 10 190 32 2 16 19 @@@@@ 5 Q2 13 18 93 8 2 4 5,17 @@@@@@@@ 6 Q3 31 13 228 31 2 16 18 @@@@@@

Average Active Session Graph


If a Parallel Execution was detected the "Average Active Session Graph" follows (if not disabled via the options). This is basically a cheap imitation of the "Activity" tab from the Real-Time SQL Monitoring, from 11.2 on mixed with some information from the "Metrics" tab, since from 11.2 ASH also includes PGA and TEMP figures per sample.

The graph shows the Average Active Sessions in a configurable number of buckets (100 by default), which means that an execution that has more samples than buckets will be compressed and averaged accordingly.

The graph shows CPU and non-CPU samples separately, furthermore the graph is always on instance level for cross-instance Parallel Execution, since it is important to see if there is a particular node in trouble. For example the overall memory or CPU usage might be reasonable for a cross-instance Parallel Execution, but you can't tell from such an cross-instance summary if one of the instances actually had to perform the majority of the work. As already mentioned above, this is again something that is not that easy to tell from a Real-Time SQL Monitoring report.

Here we can see that the graph in principle corresponds to the "Activity" tab from the Real-Time SQL Monitoring report and confirms that most of the time only a single CPU was active, with only a short spike of CPU activity at the beginning of the execution.

We can also see that the second instance was not involved in the execution most of the time.

Average Active Session Overview based on ASH ----------------------------------------------- INSTANCE_ID DURATION_SECS PGA TEMP CPU OTHER AVERAGE_AS AVERAGE_AS_GRAPH ----------- ------------- ------ ------ ---------- ---------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5 1 0 0 0 6 0 0 0 5 2 478M 16 0 16 @@@@@@@@@@@@@@@@ 6 34M 15 0 15 @@@@@@@@@@@@@@@ 5 3 35M 16 0 16 @@@@@@@@@@@@@@@@ 6 36M 16 0 16 @@@@@@@@@@@@@@@@ 5 4 35M 16 0 16 @@@@@@@@@@@@@@@@ 6 36M 16 0 16 @@@@@@@@@@@@@@@@ 5 5 35M 16 0 16 @@@@@@@@@@@@@@@@ 6 36M 16 0 16 @@@@@@@@@@@@@@@@ 5 6 35M 16 0 16 @@@@@@@@@@@@@@@@ 6 565M 14 0 14 @@@@@@@@@@@@@@ 5 7 81M 1 0 1 @ 6 0 0 0 5 8 81M 1 0 1 @ 6 0 0 0 5 9 81M 1 0 1 @ 6 0 0 0 5 10 81M 1 0 1 @ 6 0 0 0 5 11 81M 1 0 1 @ 6 0 0 0 5 12 81M 1 0 1 @ 6 0 0 0 5 13 81M 1 0 1 @ 6 0 0 0 5 14 81M 1 0 1 @ 6 0 0 0 5 15 81M 1 0 1 @ 6 0 0 0 5 16 81M 1 0 1 @ 6 0 0 0 5 17 81M 1 0 1 @ 6 0 0 0 5 18 81M 1 0 1 @ 6 0 0 0 5 19 81M 1 0 1 @ 6 0 0 0 5 20 81M 1 0 1 @ 6 0 0 0 5 21 81M 1 0 1 @ 6 0 0 0 5 22 81M 1 0 1 @ 6 0 0 0 5 23 81M 1 0 1 @ 6 0 0 0 5 24 81M 1 0 1 @ 6 0 0 0 5 25 81M 1 0 1 @ 6 0 0 0 5 26 81M 1 0 1 @ 6 0 0 0 5 27 81M 1 0 1 @ 6 0 0 0 5 28 81M 1 0 1 @ 6 0 0 0 5 29 81M 1 0 1 @ 6 0 0 0 5 30 81M 1 0 1 @ 6 0 0 0 5 31 81M 1 0 1 @ 6 0 0 0 5 32 81M 1 0 1 @ 6 0 0 0 5 33 81M 1 0 1 @ 6 0 0 0 5 34 81M 1 0 1 @ 6 0 0 0 5 35 81M 1 0 1 @ 6 0 0 0 5 36 81M 1 0 1 @ 6 0 0 0 5 37 81M 1 0 1 @ 6 0 0 0 5 38 81M 1 0 1 @ 6 0 0 0 5 39 81M 1 0 1 @ 6 0 0 0

Execution Plan ASH details


If an execution plan could be found by the DBMS_XPLAN.DISPLAY_CURSOR / AWR function used, then the ASH data will be shown along the execution plan per operation.

The following additional columns can be "injected" into the original DBMS_XPLAN output, using the same technique that was already used in the previous prototype implementation.

Note that the prototype extended Rowsource Statistics reporting functionality is still part of the script and can be combined with the ASH reporting.

  • Act: Indicates the most recent active plan lines according to ASH (only applicable if the statement is currently executing)

  • Start: Show the number of seconds since statement start the plan line was active the first time

  • Dur: Show the number of seconds the plan line is/was active
  • Time Active Graph: Show a graphical representation of the activity timeline of the plan line

  • Parallel Distribution ASH: Show the Top N processes names along with the number of samples found per SQL plan line id. A trailing "..." indicates that there are more than N processes found

  • Parallel Distribution Graph ASH: Show the distribution of the top processes names either relative to the number of samples per SQL plan line id or to the total number of samples. A trailing "..." indicates more processes than displayed

  • Activity Graph ASH: Show a graphical representation of the number of samples against that SQL plan line id relative to the total number of samples

  • Top N Activity ASH: Show the Top N activities (waits or ON CPU) from ASH for that SQL plan line id. A trailing "..." indicates that there are more than N activities found

Note that it can be configured which of those columns should be shown (or hidden).

Let's have a look at the full output for our sample execution:

SQL_ID 3pfa4kgc92z10, child number 0 ------------------------------------- select max(t1_id), max(t1_filler), max(t2_id), max(t2_filler), max(t3_id), max(t3_filler) from ( select /*+ monitor no_merge parallel(t1 16 2) PQ_DISTRIBUTE(T1 HASH HASH) */ t1.id as t1_id , regexp_replace(t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler , v1.* from ( select /*+ no_merge parallel(t2 16 2) parallel(t3 16 2) PQ_DISTRIBUTE(T2 HASH HASH) */ t2.id as t2_id , t2.filler as t2_filler , t2.id2 as t2_id2 , t3.id as t3_id , t3.filler as t3_filler from t1 t2 , t2 t3 where t3.id = t2.id (+) and regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') ) v1 , t1 where v1.t2_id2 = t1.id2 (+) + 2001 ) --where rownum > 1 Plan hash value: 1110592967 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | Start | Dur | Time Active Graph | Parallel Distribution ASH | Parallel Distribution Graph ASH | Activity Graph ASH | Top 5 Activity ASH | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | | 23 | SELECT STATEMENT | | | | 32 (100)| | | | | | | | | | | | | 1 | 0 | 22 | SORT AGGREGATE | | 1 | 2145 | | | | | | | | | | | | | | 2 | 1 | 21 | PX COORDINATOR | | | | | | | | | +5 | 1 | @ | 6-P013(1) | 0 | | ON CPU(1) | | 3 | 2 | 20 | PX SEND QC (RANDOM) | :TQ10004 | 1 | 2145 | | | Q1,04 | P->S | QC (RAND) | | | | | | | | | 4 | 3 | 19 | SORT AGGREGATE | | 1 | 2145 | | | Q1,04 | PCWP | | +38 | 1 | @ | 5-P007(1) | 0 | | ON CPU(1) | | 5 | 4 | 18 | VIEW | | 2000K| 4091M| 32 (7)| 00:00:01 | Q1,04 | PCWP | | +6 | 32 | @@@@@@@@@@@@@@@@ | 5-P007(32) | 00000000 | @@@ | ON CPU(32) | |* 6 | 5 | 17 | HASH JOIN RIGHT OUTER | | 2000K| 288M| 32 (7)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 7 | 6 | 4 | PX RECEIVE | | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 8 | 7 | 3 | PX SEND HASH | :TQ10002 | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | P->P | HASH | | | | | | | | | 9 | 8 | 2 | PX BLOCK ITERATOR | | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | PCWC | | | | | | | | | |* 10 | 9 | 1 | TABLE ACCESS STORAGE FULL | T1 | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | PCWP | | | | | | | | | | 11 | 6 | 16 | PX RECEIVE | | 2000K| 272M| 29 (4)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 12 | 11 | 15 | PX SEND HASH | :TQ10003 | 2000K| 272M| 29 (4)| 00:00:01 | Q1,03 | P->P | HASH | | | | | | | | | 13 | 12 | 14 | VIEW | | 2000K| 272M| 29 (4)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | |* 14 | 13 | 13 | HASH JOIN RIGHT OUTER BUFFERED| | 2000K| 411M| 29 (4)| 00:00:01 | Q1,03 | PCWP | | +1 | 5 | @@@ | 5-P016(1),5-P017(1),5-P018(1),5-P019(1),5-P020(1),... | 0123456789ABCDEF012 | @@ | ON CPU(19) | | 15 | 14 | 8 | PX RECEIVE | | 2000 | 212K| 2 (0)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | | 16 | 15 | 7 | PX SEND HASH | :TQ10000 | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | P->P | HASH | | | | | | | | | 17 | 16 | 6 | PX BLOCK ITERATOR | | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | PCWC | | | | | | | | | |* 18 | 17 | 5 | TABLE ACCESS STORAGE FULL | T1 | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | PCWP | | | | | | | | | | 19 | 14 | 12 | PX RECEIVE | | 2000K| 204M| 27 (4)| 00:00:01 | Q1,03 | PCWP | | +5 | 1 | @ | 6-P024(1),6-P025(1),6-P027(1) | 012 | | ON CPU(3) | | 20 | 19 | 11 | PX SEND HASH | :TQ10001 | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | P->P | HASH | +1 | 5 | @@@ | 6-P000(5),6-P001(5),6-P004(5),6-P007(5),6-P008(5),... | 0123456789ABCDEF0123456789ABCDEF | @@@@@@@@@@@@@@ | ON CPU(133) | | 21 | 20 | 10 | PX BLOCK ITERATOR | | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | PCWC | | +4 | 1 | @ | 5-P006(1) | 0 | | ON CPU(1) | |* 22 | 21 | 9 | TABLE ACCESS STORAGE FULL | T2 | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | PCWP | | | | | | | | | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note in particular that the "Activity Graph ASH" roughly corresponds to the "Activity" columns of the "Plan Statistics" tab in Real-Time SQL Monitoring, but the "Parallel Distribution" columns along with the "Time Active Graph" column make it very obvious which parts of the execution plan are affected by (this type of) Parallel Execution skew and where the time is lost. This is critical information in understanding for systematic troubleshooting and what could be changed to address the problem.

This information usually is only available by looking at the dynamic performance view V$PQ_TQSTAT. But this view has some serious limitations:

  1. It is only available from within the Query Coordinator session after statement execution. It cannot be queried from outside. It therefore requires you to reproduce the issue separately in a session which might be hard (sometimes impossible due to transient data / other run time effects) and troublesome.

  2. It is only populated after a successful completion of the statement. In case of errors (running out of TEMP space for example) or cancelled queries due to long execution times it won't be populated.

  3. Depending on the version of Oracle and the execution plan shape (multiple DFOs, Parallel DML/DDL) the output might be incomplete and misleading.

Here is a snippet from the V$PQ_TQSTAT output for this particular statement execution where you can easily tell that a particular re-distribution of the data caused the skew:

. . . 1 2 Consumer 69 1795 0 0 103 30 P015 5 1 2 Consumer 77 1916 0 0 95 23 P015 6 1 3 Producer 62563 7184881 0 0 0 0 P016 5 1 3 Producer 62384 7164427 0 0 0 0 P016 6 1 3 Producer 62546 7182963 0 0 0 0 P017 6 1 3 Producer 62655 7195741 0 0 0 0 P017 5 1 3 Producer 62801 7212428 0 0 0 0 P018 5 1 3 Producer 62404 7166769 0 0 0 0 P018 6 1 3 Producer 62275 7151967 0 0 0 0 P019 5 1 3 Producer 62227 7146379 0 0 0 0 P019 6 1 3 Producer 62371 7162847 0 0 0 0 P020 5 1 3 Producer 62401 7166200 0 0 0 0 P020 6 1 3 Producer 62357 7161366 0 0 0 0 P021 5 1 3 Producer 62604 7189675 0 0 0 0 P021 6 1 3 Producer 62531 7181198 0 0 0 0 P022 5 1 3 Producer 62265 7150921 0 0 0 0 P022 6 1 3 Producer 62949 7229230 0 0 0 0 P023 5 1 3 Producer 62744 7205863 0 0 0 0 P023 6 1 3 Producer 62414 7167882 0 0 0 0 P024 5 1 3 Producer 62379 7164069 0 0 0 0 P024 6 1 3 Producer 62342 7159771 0 0 0 0 P025 5 1 3 Producer 62367 7162557 0 0 0 0 P025 6 1 3 Producer 62451 7171954 0 0 0 0 P026 6 1 3 Producer 62454 7172544 0 0 0 0 P026 5 1 3 Producer 62449 7171993 0 0 0 0 P027 5 1 3 Producer 62436 7170411 0 0 0 0 P027 6 1 3 Producer 62530 7181313 0 0 0 0 P028 5 1 3 Producer 62909 7224693 0 0 0 0 P028 6 1 3 Producer 62878 7221204 0 0 0 0 P029 5 1 3 Producer 62791 7211203 0 0 0 0 P029 6 1 3 Producer 62335 7158907 0 0 0 0 P030 6 1 3 Producer 62108 7132695 0 0 0 0 P030 5 1 3 Producer 62750 7206487 0 0 0 0 P031 5 1 3 Producer 62330 7158191 0 0 0 0 P031 6 1 3 Consumer 0 768 0 0 35 8 P000 5 1 3 Consumer 0 768 0 0 47 3 P000 6 1 3 Consumer 0 768 0 0 48 13 P001 5 1 3 Consumer 0 768 0 0 50 5 P001 6 1 3 Consumer 0 768 0 0 50 16 P002 5 1 3 Consumer 0 768 0 0 48 10 P002 6 1 3 Consumer 0 768 0 0 39 6 P003 5 1 3 Consumer 0 768 0 0 48 2 P003 6 1 3 Consumer 0 768 0 0 40 8 P004 5 1 3 Consumer 0 768 0 0 37 2 P004 6 1 3 Consumer 0 768 0 0 44 2 P005 6 1 3 Consumer 0 768 0 0 52 15 P005 5 1 3 Consumer 0 768 0 0 57 12 P006 6 1 3 Consumer 0 768 0 0 54 19 P006 5 1 3 Consumer 2000000 229664921 0 0 49 16 P007 5 1 3 Consumer 0 768 0 0 53 8 P007 6 1 3 Consumer 0 768 0 0 61 11 P008 6 1 3 Consumer 0 768 0 0 44 11 P008 5 1 3 Consumer 0 768 0 0 54 4 P009 6 1 3 Consumer 0 768 0 0 48 17 P009 5 1 3 Consumer 0 768 0 0 48 10 P010 6 1 3 Consumer 0 768 0 0 41 5 P010 5 1 3 Consumer 0 768 0 0 47 6 P011 5 1 3 Consumer 0 768 0 0 52 3 P011 6 1 3 Consumer 0 768 0 0 48 14 P012 5 1 3 Consumer 0 768 0 0 49 3 P012 6 1 3 Consumer 0 768 0 0 55 18 P013 5 1 3 Consumer 0 768 0 0 52 9 P013 6 1 3 Consumer 0 768 0 0 51 18 P014 5 1 3 Consumer 0 768 0 0 42 3 P014 6 1 3 Consumer 0 768 0 0 61 18 P015 5 1 3 Consumer 0 768 0 0 51 10 P015 6 1 4 Producer 1 4306 0 0 0 0 P000 5 1 4 Producer 1 4306 0 0 0 0 P000 6 . . .

Execution Plan ASH details - no plan available


If no execution plan is available, then the same information on execution plan line level will be provided as a simple result set that still can be helpful in understanding the problem:

PLAN_LINE PLAN_OPERATION Act Parallel Distribution ASH Parallel Distribution Graph ASH Activity Graph ASH Top 5 Activity ASH Start Dur Time Active Graph ---------- ------------------------------ ----- ------------------------------------------------------- ---------------------------------------- -------------------------------------------------------------------------------- ------------------------- --------------- --------------- ------------------------- 2 PX COORDINATOR 6-P013(1) 0 ON CPU(1) +5 1 @ 4 SORT AGGREGATE 5-P007(1) 0 ON CPU(1) +38 1 @ 5 VIEW 5-P007(32) 00000000 ON CPU(32) @@@ +6 32 @@@@@@@@@@@@@@@@ 14 HASH JOIN RIGHT OUTER BUFFERED 5-P016(1),5-P017(1),5-P018(1),5-P019(1),5-P020(1),... 0123456789ABCDEF012 ON CPU(19) @@ +1 5 @@@ 19 PX RECEIVE 6-P024(1),6-P025(1),6-P027(1) 012 ON CPU(3) +5 1 @ 20 PX SEND HASH 6-P000(5),6-P001(5),6-P004(5),6-P007(5),6-P008(5),... 0123456789ABCDEF0123456789ABCDEF ON CPU(133) @@@@@@@@@@@@@@ +1 5 @@@ 21 PX BLOCK ITERATOR 5-P006(1) 0 ON CPU(1) +4 1 @

Historic ASH


Since the major use case of the tool is post-execution analysis, the same information can be pulled from the historic ASH information persisted on disk and available from the DBA_HIST_ACTIVE_SESS_HISTORY / DBA_HIST_SQL_PLAN views.

Although Oracle by default only writes every 10th sample to disk and hence we end up with samples of sampling, the output can still be helpful. As an example, here is what the tool can extract from the history views for the same execution:


General information ---------------------------------------------- SQL_ID PLAN_HASH_VALUE SQL_EXEC_START SQL_EXEC_ID FORMAT_OPTION LAST_EXEC_SOURCE OPTIONS ASH_SOURCE ------------- --------------- ------------------- ----------- ------------------------- ---------------- ------------------------ ---------- 3pfa4kgc92z10 1110592967 2012-07-23 15:14:39 83886081 ADVANCED ASH ASH,DISTRIB_REL,TIMELINE HIST INSTANCE_COUNT FIRST_SAMPLE LAST_SAMPLE DURATION_SECS STATUS SAMPLE_COUNT SLAVE_COUNT CPU_SAMPLE_COUNT AVERAGE_AS -------------- ------------------- ------------------- ------------- -------- ------------ ----------- ---------------- ---------- 2 2012-07-23 15:14:41 2012-07-23 15:15:08 39 INACTIVE 19 17 19 4,87 INSTANCE_ID FIRST_SAMPLE LAST_SAMPLE START_ACTIVE DURATION_SECS SAMPLE_COUNT SLAVE_COUNT AVERAGE_AS TIME_ACTIVE_GRAPH ----------- ------------------- ------------------- ------------ ------------- ------------ ----------- ---------- -------------------- 5 2012-07-23 15:14:48 2012-07-23 15:15:08 9 30 3 1 1 @@@@@@@@@@@@@@@ 6 2012-07-23 15:14:41 2012-07-23 15:14:41 2 10 16 16 16 @@@@@ Information on Parallel Degree based on ASH ----------------------------------------------- INSTANCE_ID DFO START_ACTIVE DURATION_SECS SAMPLE_COUNT SLAVE_COUNT SET_COUNT ASSUMED_DEGREE AVERAGE_AS TIME_ACTIVE_GRAPH ----------- ------ ------------ ------------- ------------ ----------- ---------- -------------- ---------- -------------------- 5 Q1 9 30 3 1 2 1 1 @@@@@@@@@@@@@@@ 6 Q1 2 10 16 16 2 8 16 @@@@@ Average Active Session Overview based on ASH ----------------------------------------------- INSTANCE_ID DURATION_SECS PGA TEMP CPU OTHER AVERAGE_AS AVERAGE_AS_GRAPH ----------- ------------- ------ ------ ---------- ---------- ---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5 3 0 0 0 6 36M 16 0 16 @@@@@@@@@@@@@@@@ 5 13 81M 1 0 1 @ 6 0 0 0 5 23 81M 1 0 1 @ 6 0 0 0 5 33 81M 1 0 1 @ 6 0 0 0 SQL_ID 3pfa4kgc92z10 -------------------- select max(t1_id), max(t1_filler), max(t2_id), max(t2_filler), max(t3_id), max(t3_filler) from ( select /*+ monitor no_merge parallel(t1 16 2) PQ_DISTRIBUTE(T1 HASH HASH) */ t1.id as t1_id , regexp_replace(t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler , v1.* from ( select /*+ no_merge parallel(t2 16 2) parallel(t3 16 2) PQ_DISTRIBUTE(T2 HASH HASH) */ t2.id as t2_id , t2.filler as t2_filler , t2.id2 as t2_id2 , t3.id as t3_id , t3.filler as t3_filler from t1 t2 , t2 t3 where t3.id = t2.id (+) and regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') ) v1 , t1 where v1.t2_id2 = t1.id2 (+) + 2001 ) --where rownum > 1 Plan hash value: 1110592967 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | Start | Dur | Time Active Graph | Parallel Distribution ASH | Parallel Distribution Graph ASH | Activity Graph ASH | Top 5 Activity ASH | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | | 23 | SELECT STATEMENT | | | | 32 (100)| | | | | | | | | | | | | 1 | 0 | 22 | SORT AGGREGATE | | 1 | 2145 | | | | | | | | | | | | | | 2 | 1 | 21 | PX COORDINATOR | | | | | | | | | | | | | | | | | 3 | 2 | 20 | PX SEND QC (RANDOM) | :TQ10004 | 1 | 2145 | | | Q1,04 | P->S | QC (RAND) | | | | | | | | | 4 | 3 | 19 | SORT AGGREGATE | | 1 | 2145 | | | Q1,04 | PCWP | | | | | | | | | | 5 | 4 | 18 | VIEW | | 2000K| 4091M| 32 (7)| 00:00:01 | Q1,04 | PCWP | | +9 | 21 | @@@@@@@@@@@ | 5-P007(3) | 000000 | @@@ | ON CPU(3) | | 6 | 5 | 17 | HASH JOIN RIGHT OUTER | | 2000K| 288M| 32 (7)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 7 | 6 | 4 | PX RECEIVE | | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 8 | 7 | 3 | PX SEND HASH | :TQ10002 | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | P->P | HASH | | | | | | | | | 9 | 8 | 2 | PX BLOCK ITERATOR | | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | PCWC | | | | | | | | | | 10 | 9 | 1 | TABLE ACCESS STORAGE FULL | T1 | 2000 | 16000 | 2 (0)| 00:00:01 | Q1,02 | PCWP | | | | | | | | | | 11 | 6 | 16 | PX RECEIVE | | 2000K| 272M| 29 (4)| 00:00:01 | Q1,04 | PCWP | | | | | | | | | | 12 | 11 | 15 | PX SEND HASH | :TQ10003 | 2000K| 272M| 29 (4)| 00:00:01 | Q1,03 | P->P | HASH | | | | | | | | | 13 | 12 | 14 | VIEW | | 2000K| 272M| 29 (4)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | | 14 | 13 | 13 | HASH JOIN RIGHT OUTER BUFFERED| | 2000K| 411M| 29 (4)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | | 15 | 14 | 8 | PX RECEIVE | | 2000 | 212K| 2 (0)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | | 16 | 15 | 7 | PX SEND HASH | :TQ10000 | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | P->P | HASH | | | | | | | | | 17 | 16 | 6 | PX BLOCK ITERATOR | | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | PCWC | | | | | | | | | | 18 | 17 | 5 | TABLE ACCESS STORAGE FULL | T1 | 2000 | 212K| 2 (0)| 00:00:01 | Q1,00 | PCWP | | | | | | | | | | 19 | 14 | 12 | PX RECEIVE | | 2000K| 204M| 27 (4)| 00:00:01 | Q1,03 | PCWP | | | | | | | | | | 20 | 19 | 11 | PX SEND HASH | :TQ10001 | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | P->P | HASH | +2 | 1 | @ | 6-P000(1),6-P001(1),6-P002(1),6-P003(1),6-P004(1),... | 00112233445566778899AABBCCDDEEFF | @@@@@@@@@@@@@@@@@ | ON CPU(16) | | 21 | 20 | 10 | PX BLOCK ITERATOR | | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | PCWC | | | | | | | | | | 22 | 21 | 9 | TABLE ACCESS STORAGE FULL | T2 | 2000K| 204M| 27 (4)| 00:00:01 | Q1,01 | PCWP | | | | | | | | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


So although we can see that a lot of information has been lost, in this case the most important aspects can still be told from the output based on historic data.

Notes


Note: Depending on the version and the particular circumstances the ASH data might be subject to bugs / instrumentation issues - so some of the samples might miss important information like the correct SQL_ID, or SQL_EXEC_START / SQL_EXEC_ID / SQL_PLAN_LINE_ID is blank. I've noticed that the Real-Time SQL Monitoring attempts to address some of these issues in a way that this script does not yet, so this might explain some differences in the output of the two. Obviously the script can only partially address such issues and the significance of the analysis depends on the accuracy of the ASH sample data.

Furthermore the treatment of recursive SQL is different between Real-Time SQL Monitoring and this script, so depending on what you exactly analyse the output might differ significantly - this script at present only shows output for ASH data that is assigned to the SQL_ID in question. Note that the way the the Real-Time SQL Monitoring handles this can have some interesting side effects, too.
I already have some ideas how this could be changed in a future version, but decided to publish the script in its current version as it should be helpful in many cases as it is now.

  • The script requires write access to the current working directory. If you get errors shown during the execution of the script like:

    SP2-0110: Cannot create save file ".xplan_ash_temp"

    plan_operations as
                    *
    ERROR at line 14:
    ORA-00933: SQL command not properly ended

    then you cannot write to the current work directory

  • There are multiple comments in the script like "If you need to debug". By following the instructions shown there you can change the verbosity of the script in order to understand what went wrong. I would need such detailed output in order to help if the script doesn't work as intended

  • A note to RAC users: If the current instance was *not* involved in executing the SQL, and the execution plan should be displayed from the Shared Pool (CURR option), in best case the execution plan cannot be found.

    In worst case an incorrect plan will be associated from the local instance Shared Pool - it is possible to have the same SQL_ID / CHILD_NUMBER with different plans in different RAC instances.

    Therefore you need to be careful with cross-instance / remote-instance executions in RAC.

    Why? The tool at present relies on DBMS_XPLAN.DISPLAY_CURSOR for showing the execution plan from the Shared Pool - but DISPLAY_CURSOR is limited to the local Shared Pool. There is already a future version in planning that addresses this shortcoming

  • At present the script assigns operations that don't have a SQL_PLAN_LINE_ID to the operation ID = 0. So you need to be careful if you see an unusual high activity of the initial plan operation. It is a good idea in such a case to cross-check the raw ASH data and see if there are many samples that lack a SQL_PLAN_LINE_ID information

  • Usually the DISTRIB_REL option is the most reasonable choice for analysing the distribution of work between Parallel Slaves using the "Parallel Distribution Graph" on execution line level, but in case of very different workload of execution plan lines (for example one line accounts for thousands of samples, but others only for far less) the skew in other lines might not be that obvious. In that case the DISTRIB option can make that skew more obvious, but the downside of that option is that insignificant plan lines that contribute only a few samples might show up as very skewed which can be misleading. You need to cross-check then the number of actual number samples for such plan lines in order to understand the significance of the output.

  • Some options of the the script don't work like similar options to Real-Time SQL Monitoring. This is a deliberate design decision, since the tool doesn't require necessarily the Tuning Pack license to function and therefore sometimes uses a different approach. One example is the last SQL executed for a given Session ID - Real-Time SQL Monitoring offers a similar option, but reports then the last monitored SQL of that session, which isn't necessarily the last SQL executed by that session.

Options Reference


Here is a reference of the (command-line) options available:


SQL_ID (or SID=<nnn>[@<inst_id>])


Specify the SQL_ID to analyse. Alternatively specify a Session Identifier, optionally with a trailing Instance ID (for RAC). The tool will attempt to get the current / previous SQL_ID / CHILD_NUMBER of that session and use that as SQL_ID / CHILD_NUMBER input


CHILD_NUMBER (or PLAN_HASH_VALUE)


Identify the CHILD cursor in the Shared Pool, or the Execution Plan for the given SQL_ID from AWR by PLAN_HASH_VALUE (if you use the other option - see below - to obtain execution plan information from AWR instead of Shared Pool)

If both SQL_ID and CHILD_NUMBER/PLAN_HASH_VALUE are omitted the previous SQL_ID / CHILD_NUMBER of the current session will be used.

Note that if the CHILD_NUMBER is not provided, but a SQL_ID only, based on the remaining options specified or not (SQL_EXEC_START / SQL_EXEC_ID) the script will attempt to lookup this information from ASH / Real-Time SQL Monitoring.


DBMS_XPLAN.DISPLAY* format option


By default the tool uses the most verbose formatting of DBMS_XPLAN. You can either change the default in the script or specify here a different format option. Note that you need to explicitly request ALLSTATS / MEMSTATS / IOSTATS or similar if you want to make use of the extended Rowsource Statistics feature of the tool


SQL_EXEC_START / SQL_EXEC_ID


This is a date / number pair that together uniquely identifies an SQL statement execution instance. The default date format is "YYYY-MM-DD HH24:MI:SS" but can be changed in the configuration section of the script.

If that information is omitted and ASH information should be processed, the tool searches for the last execution of the given SQL_ID information in either ASH or Real-Time SQL Monitoring data.


MONITOR/ASH


Where to search for the last execution, in GV$SQL_MONITOR or GV$ACTIVE_SESSION_HISTORY. By default ASH is searched, but in the presence of Real-Time SQL Monitoring (Tuning Pack license) this can be used to attempt that both tools pick up the same last execution.


ASH options


A string consisting of ASH, DISTRIB[_REL|TOT] and/or TIMELINE that defines which ASH information will be shown in the output. The DISTRIB options relate to Parallel Execution information, the TIMELINE to the "Start / Duration  Time Active Graph" info on the execution plan line level, and ASH to everything else.

By default the tool uses the DISTRIB_REL option. The other options DISTRIB_TOT and DISTRIB change the proportion of the "Parallel Distribution Graph". You can play with these different options - depending on the individual execution they might make things more or less obvious.

Note: Usually the DISTRIB_REL option is the most reasonable choice, but in case of very different workload of execution plan lines (for example one line accounts for thousands of samples, but others only for far less) the skew in other lines might not be that obvious. In that case the DISTRIB option can make that skew more obvious, but the downside of that option is that insignificant plan lines that contribute only a few samples might show up as very skewed which can be misleading. You need to cross-check then the number of actual number samples for such plan lines in order to understand the significance of the output.

If NONE (or any other invalid string) is specified as ASH option, no ASH based information will be shown. This might make sense if you only want to use the Rowsource Statistics feature of the script.


CURR/HIST/MIXED


Tell the tool where to get the ASH information from.

CURR means GV$ACTIVE_SESSION_HISTORY and DBMS_XPLAN.DISPLAY_CURSOR for plan formatting.

HIST means DBA_HIST_ACTIVE_SESS_HISTORY and DBMS_XPLAN.DISPLAY_AWR for plan formatting - you need to specify the PLAN_HASH_VALUE instead of the CHILD_NUMBER in that case.

MIXED means ASH data from GV$ACTIVE_SESSION_HISTORY  but obtain the execution plan via DBMS_XPLAN.DISPLAY_AWR, again you need to specify the PLAN_HASH_VALUE instead of the CHILD_NUMBER in that case


Columns to show / hide


This applies to the output on Execution Plan line level. You can either provide a comma separated list of columns to show, or specify which columns *not* to show by using a leading "-" (minus sign). You can find a list of valid column names in the "configuration" section of the script.


Examples


Here are some example calls to the script. Note that by default the interactive prompt for options is enabled in the script, so whatever you pass on the command line can be modified / amended at the interactive prompt:

@XPLAN_ASH

This call will use all configured defaults. The SQL_ID / CHILD_NUMBER will be taken from the current session and will attempt to provide detailed output of the previously executed statement (PREV_SQL_ID from V$SESSION)

@XPLAN_ASH SID=128@3

This call will again use all configured defaults and get the SQL_ID / CHILD_NUMBER information from GV$SESSION for SID=128 on INSTANCE_ID = 3. If the current SQL_ID for that session is blank it will use PREV_SQL_ID instead.

! Note that this is different from Real-Time SQL Monitoring, which will report the latest SQL_ID for this session that has been subject to monitoring, which isn't necessarily the latest SQL executed by that session. !

@XPLAN_ASH xyz

This call will again use all configured defaults and report activity for SQL_ID xyz.

Note that since the CHILD_NUMBER is not provided, based on the remaining options specified or not (SQL_EXEC_START / SQL_EXEC_ID) the script will attempt to lookup this information from ASH / Real-Time SQL Monitoring.

@XPLAN_ASH xyz 2 "ADVANCED ALLSTATS ALL" "" "" MONITOR "ASH,DISTRIB,TIMELINE" "" "-E_ROWS_TIMES_START,-LIO_RATIO"

This call specifies most of the options. It will analyze SQL_ID "xyz", child number 2. The call to DBMS_XPLAN.DISPLAY_CURSOR will use the "ADVANCED ALLSTATS ALL" option which includes the Rowsource Statistics for all previous executions (you need this for Parallel Execution with Rowsource Statistics, but need to be careful with multiple executions of the same statement) - assuming that child cursor 2 of SQL_ID xyz holds that additional information (GATHER_PLAN_STATISTICS hint or STATISTICS_LEVEL = ALL). The SQL_EXEC_START and SQL_EXEC_ID information is not provided, but the MONITOR option tells the script to search for the last monitored execution of that SQL_ID / CHILD_NUMBER information.

For the ASH based analysis it uses the "ASH,DISTRIB,TIMELINE" option which is different from the default provided (ASH,DISTRIB_REL,TIMELINE). See above for an explanation of these options.

The ASH source is left as default (should be CURR, but can be configured in the script), and two columns will not be shown on execution plan line level

@XPLAN_ASH xyz 11223445566 "" "2012-07-23 18:14:37" 16777616 "" "" "HIST"

This call specifies for SQL_ID xyz a PLAN_HASH_VALUE of 11223445566, because it refers to the historic ASH from DBA_HIST_ACTIVE_SESS_HISTORY and tries to get an execution plan via DBMS_XPLAN.DISPLAY_AWR from DBA_HIST_SQL_PLAN. The SQL_EXEC_START and SQL_EXEC_ID are also specified, therefore the ASH/MONITOR option isn't relevant here (the script won't search for an execution instance if a SQL_EXEC_START / SQL_EXEC_ID is explicitly specified). Finally the HIST option specifies where to get the information from.

@XPLAN_ASH xyz 1 "ADVANCED ALLSTATS LAST" "" "" "" NONE

For the given SQL_ID xyz, child cursor 1, provide only the additional information on execution plan line level based on the Rowsource Execution statistics. No ASH data will be analysed.

4 comments:

timurakhmadeev said...

Hi Randolf

thanks for sharing this very cool script and findings on multiple DFOs.
One off-topic question though: why it is called XPLAN_ASH? I'd expect PXPLAN_ASH or something like this :)

Randolf said...

Hi Timur,

thanks for the feedback.

I think you can call the script whatever you like :-) as long as you find it useful.

Randolf

goran said...

excelent !

Toleg said...

thank you very much for such incredible script.

correspondence between plan steps and parallel slaves - that's outstanding


I think spooling by default would by handy. because script produce enough big output and it's much more handy to analyze it in notepad++ than observe wrapped lines in sqlplus output