Thursday, November 24, 2011

How To Cancel A Query Running In Another Session

This is not really anything new - in fact Tanel Poder blogged about it a while ago. Tanel specifically covered the handling of "urgent" TCP packets and how they can be used to signal a "cancel" to another process; however, this only works in Unix environments and not with Windows SQL*Plus clients. Tanel's article also mentions that there is an officially documented way of doing this via the Resource Manager, if you happen to have an Enterprise Edition license.

In my quick tests, however, the call to DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS using "CANCEL_SQL" as the consumer group only errors out with ORA-29366, saying that the specified consumer group is invalid.
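For reference, this is what the documented call looks like (the placeholders stand for the target session's SID and SERIAL#):

begin
  sys.dbms_resource_manager.switch_consumer_group_for_sess(
    <sid>, <serial#>, 'CANCEL_SQL'
  );
end;
/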

So ideally there should be an approach that is independent of client and server O/S and of license details - and indeed there is one. However, it uses an undocumented event, is therefore unsupported and can only be used at your own risk.

If you set event 10237 ("ORA-10237: simulate ^C (for testing purposes)") in a session to any level greater than 0, any currently running and future execution will be "cancelled". Once the cancellation was successful the event therefore needs to be unset again, otherwise the session remains in an unusable state that cancels any further execution attempts (this applies even if the "lifetime 1" clause is used instead of "forever" when setting the event via ORADEBUG).
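As an illustration, here is a minimal ORADEBUG sketch of setting and unsetting the event manually (as SYSDBA; the Oracle PID 42 is just a placeholder that would have to be looked up in V$PROCESS for the target session):

oradebug setorapid 42
oradebug event 10237 trace name context forever, level 1
-- once the statement has been cancelled, switch the event off again
oradebug event 10237 trace name context off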

So a simple script like the following should be sufficient to cancel the current execution in another session without the need to kill that session.


--------------------------------------------------------
--
-- simulate_control_c.sql
--
-- Purpose:
--
-- Sets event 10237 in a session to simulate
-- pressing CONTROL+C for that session
--
-- Allows cancelling a running SQL statement in
-- a remote session without killing that session
--
-- If the session is stuck on the server side
-- (which means that it can't even be killed)
-- this probably won't help either
--
-- Requirements:
--
-- EXECUTE privilege on SYS.DBMS_SYSTEM
-- EXECUTE privilege on SYS.DBMS_LOCK (used for sleeping in the polling loop)
-- SELECT privilege on V$SESSION
--
-- Usage:
--
-- @simulate_control_c <SID>
--
-- Note:
--
-- The usage of that event is undocumented
-- Therefore use at your own risk!
-- Provided for free, without any warranties -
-- test this before using it on anything important
--
-- Other implementation ideas:
--
-- The following code is supposed to achieve the same in a documented way
-- on Enterprise Edition with an enabled Resource Manager
-- However, in all versions tested (10.2.0.4, 11.1.0.7, 11.2.0.1, 11.2.0.2)
-- I get ORA-29366 and it doesn't work as described
-- Note that the official documentation doesn't explicitly mention CANCEL_SQL
-- as a valid consumer group for this call

-- begin
-- sys.dbms_resource_manager.switch_consumer_group_for_sess(
-- <sid>,<serial#>,'CANCEL_SQL'
-- );
-- end;
--
-- When running on Unix KILL -URG sent to the server process
-- should also simulate a Control-C
-- This doesn't work with Windows SQL*Plus clients though
--
-- See Tanel Poder's blog post for more info
-- http://blog.tanelpoder.com/2010/02/17/how-to-cancel-a-query-running-in-another-session/
--
-- Author:
--
-- Randolf Geist
-- http://oracle-randolf.blogspot.com
--
-- Versions tested:
--
-- 11.2.0.1 Server+Client
-- 10.2.0.4 Server
-- 11.2.0.2 Server
--
--------------------------------------------------------

set echo off verify off feedback off

column sid new_value v_sid noprint
column serial# new_value v_serial noprint

-- Get details from V$SESSION
select
        sid
      , serial#
from
        v$session
where
        sid = to_number('&1')
and     status = 'ACTIVE'
;

declare
  -- Avoid compilation errors in case of SID not found
  v_sid     number := to_number('&v_sid');
  v_serial  number := to_number('&v_serial');
  v_status  varchar2(100);
  -- 60 seconds default timeout
  n_timeout number := 60;
  dt_start  date   := sysdate;
begin
  -- SID not found
  if v_sid is null then
    raise_application_error(-20001, 'SID: &1 cannot be found or is not in STATUS=ACTIVE');
  else
    -- Set event 10237 to level 1 in session to simulate CONTROL+C
    sys.dbms_system.set_ev(v_sid, v_serial, 10237, 1, '');

    -- Check session state
    loop
      begin
        select
                status
        into
                v_status
        from
                v$session
        where
                sid = v_sid;
      exception
        -- SID no longer found
        when NO_DATA_FOUND then
          raise_application_error(-20001, 'SID: ' || v_sid || ' no longer found after cancelling');
      end;

      -- Status no longer active
      -- then set event level to 0 to avoid further cancels
      if v_status != 'ACTIVE' then
        sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');
        exit;
      end if;

      -- Session still active after timeout exceeded
      -- Give up
      if dt_start + (n_timeout / 86400) < sysdate then
        sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');
        raise_application_error(-20001, 'SID: ' || v_sid || ' still active after ' || n_timeout || ' seconds');
      end if;

      -- Back off after 5 seconds
      -- Check only every second from then on
      -- Avoids burning CPU and potential contention by this loop
      -- However this means that more than a single statement potentially
      -- gets cancelled during this second
      if dt_start + (5 / 86400) < sysdate then
        dbms_lock.sleep(1);
      end if;
    end loop;
  end if;
end;
/


This is particularly useful in Windows environments, where the SQL*Plus executable by default doesn't allow cancelling a running execution by pressing Control+C - it only works while fetching, or when pressed a second time, which terminates the whole SQL*Plus client.

Note that Tanel's method is probably able to cancel queries that this approach cannot cancel, because the URGENT signal handler under Unix effectively interrupts the running process and executes the corresponding handler code, whereas the event set here has to be actively checked by the code of the running process.
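For completeness, a sketch of what Tanel's Unix approach boils down to - look up the server process of the target session and send it the URG signal (again, use at your own risk):

select p.spid
from v$process p, v$session s
where p.addr = s.paddr
and s.sid = <sid>;

-- then, at the OS prompt on the database server:
-- kill -URG <spid>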

Saturday, November 19, 2011

DOAG 2011 Unconference Wrap-Up

My sessions at DOAG all went well, and I particularly liked the Unconference ones. As promised I held two of them, and they turned out to be rather different.

The first one only had a couple of attendees (including OakTable fellow Christian Antognini from Switzerland), so we could gather around my laptop and do a real "Optimizer Hacking Session". Actually we had to do it that way, because the projector promised by DOAG wasn't there. I talked mainly about some common traps in SQL statement troubleshooting and how to avoid them, but also showed some cool stuff that I'll blog about separately soon.

The second session, two hours later, was attended by many more people than I expected, so there was no chance at all to gather around my laptop. Originally I had planned to talk about some common pitfalls that make the optimizer's estimates go wrong (even with 100% computed statistics) and what to do about them, but I then realized that I should at least give a short summary of what I had shown in the first session to those who hadn't attended it (which was most of the audience anyway). This started off an interesting discussion with many questions - a surprising number of which revolved around the usage of user-defined PL/SQL functions.

Since even the WiFi connection didn't work properly, I could only briefly mention some important articles that should be read by anyone who wants a good understanding of how the optimizer can (actually should!) be helped when dealing with PL/SQL functions.

So for reference I repeat here a summary of the relevant articles, followed below by a minimal example of one of the techniques they cover:

1. Expert Oracle Practices (the book I co-authored), Chapter 7, "PL/SQL and the CBO" by OakTable fellow Joze Senegacnik

2. OakTable Member Adrian Billington on pl/sql functions and cbo costing, which is inspired by Joze's work above

3. Adrian Billington on "setting cardinality for pipelined and table functions"

4. Adrian Billington's generic "Collection Cardinality Utility", for background info see the article above, the source code includes a description when and how to use it

5. The problem of the CBO ignoring the projection for costing, described on my blog
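To give just a flavour of what these articles cover: one of the simplest documented ways of helping the optimizer is to associate default statistics with a user-defined function. A minimal sketch - the function name and the figures are made up for illustration (the cost triple is CPU in machine instructions, I/O in data blocks and network in round trips; the selectivity is a percentage):

create or replace function f_expensive (p_value in number)
return number
deterministic
as
begin
  -- placeholder for some costly per-row processing
  return p_value;
end;
/

-- tell the CBO that a single call is expensive
-- and that, used as a filter, it passes only 1 percent of the rows
associate statistics with functions f_expensive
default cost (100000, 10, 0) default selectivity 1;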

There were some other questions in the same context - for example, what a Real Time SQL Monitoring report would look like with user-defined PL/SQL functions. Here is an example you can view / download (remember that the file can be shared since it is self-contained), taken from a query quite similar to the one described in my blog post. By the way, that query tries out an approach discussed during the session: whether there is a chance of getting the "projection" cost right by playing clever tricks that combine projection and selection. I didn't get it to work as desired - at least the "selection" cost showed up, but that doesn't address the actual problem of the "projection" cost in my particular test case. So Oracle still attempts to merge the views and then has to run the user-defined function many more times than necessary - in fact the query represents the worst case, because the user-defined function is called for both selection and projection.

Sample Real Time SQL Monitoring Report

You can see from the report that at least some PL/SQL time shows up, but it also reveals a deficiency of the report: it doesn't show detailed logical I/O values, only a summary of Buffer Gets.

And interestingly, although none of this is visible in the end-user report, the report file itself contains (in the XML data section) at least a couple of optimizer parameter settings - which addresses another question raised during the session.

All in all I really liked these sessions and I hope the attendees enjoyed them as much as I did.

Last, but not least: it's not definite yet, but very likely I'll give a few free "Optimizer Hacking Sessions" on the internet in the future, so stay tuned!

Monday, November 14, 2011

Star Transformation And Cardinality Estimates

If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.

Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:


drop table d;

purge table d;

drop table t;

purge table t;

create table t
as
select
        rownum as id
      , mod(rownum, 100) + 1 as fk1
      , 1000 + mod(rownum, 10) + 1 as fk2
      , 2000 + mod(rownum, 100) + 1 as fk3
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't')

create bitmap index t_fk1 on t (fk1);

create bitmap index t_fk2 on t (fk2);

create bitmap index t_fk3 on t (fk3);

create table d
as
select
        rownum as id
      , case when rownum between 1 and 100 then 'Y' else 'N' end as is_flag_d1
      , case when rownum between 1001 and 1010 then 'Y' else 'N' end as is_flag_d2
      , case when rownum between 2001 and 2100 then 'Y' else 'N' end as is_flag_d3
      , rpad('x', 100) as vc1
from
        dual
connect by
        level <= 10000
;

exec dbms_stats.gather_table_stats(null, 'd', method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 IS_FLAG_D1, IS_FLAG_D2, IS_FLAG_D3');


This is a simplified example of a model where multiple, potentially small, dimensions are stored in a single physical table and the separate dimensions are represented by views that filter the corresponding dimension data from the base table.

So we have a fact table with one million rows and a "collection" dimension table that holds three dimensions, among others.

In order to enable star transformation, bitmap indexes are created on the foreign key columns of the fact table.
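Remember that star transformation is not enabled by default; it either requires the STAR_TRANSFORMATION_ENABLED parameter to be set accordingly, for example:

alter session set star_transformation_enabled = temp_disable;

or a corresponding hint, as used in the examples below (TEMP_DISABLE enables the transformation but prevents the use of the temporary table transformation for the dimension filters).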

The dimension table has histograms on the flag columns to tell the optimizer about the non-uniform distribution of the column data.

Now imagine a query where we query the fact table (possibly filtering it by other means, like other dimensions or direct filters on the fact table) but need to join these three dimensions just for display purposes - the dimensions themselves are not filtered, so the join will not filter out any data.

Let's first have a look at the execution plan of such a simple query with star transformation disabled:


SQL> explain plan for
2 select /*+ no_star_transformation */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 77569906

----------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | HASH JOIN | | 940K|
|* 3 | TABLE ACCESS FULL | D | 100 |
|* 4 | HASH JOIN | | 945K|
|* 5 | TABLE ACCESS FULL | D | 100 |
|* 6 | HASH JOIN | | 950K|
|* 7 | TABLE ACCESS FULL| D | 10 |
| 8 | TABLE ACCESS FULL| T | 1000K|
----------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - filter("IS_FLAG_D3"='Y')
4 - access("F"."FK1"="D"."ID")
5 - filter("IS_FLAG_D1"='Y')
6 - access("F"."FK2"="D"."ID")
7 - filter("IS_FLAG_D2"='Y')


So clearly the optimizer got it quite right - the join to the dimensions is not going to filter out a significant amount of data; the slight reduction in rows comes from the calculations based on the generated histograms.

But now try the same again with star transformation enabled:


SQL> explain plan for
2 select /*+ star_transformation opt_param('star_transformation_enabled', 'temp_disable') */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 459231705

----------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
|* 2 | HASH JOIN | | 9 |
|* 3 | HASH JOIN | | 9 |
|* 4 | HASH JOIN | | 10 |
|* 5 | TABLE ACCESS FULL | D | 10 |
| 6 | TABLE ACCESS BY INDEX ROWID | T | 10 |
| 7 | BITMAP CONVERSION TO ROWIDS| | |
| 8 | BITMAP AND | | |
| 9 | BITMAP MERGE | | |
| 10 | BITMAP KEY ITERATION | | |
|* 11 | TABLE ACCESS FULL | D | 100 |
|* 12 | BITMAP INDEX RANGE SCAN| T_FK1 | |
| 13 | BITMAP MERGE | | |
| 14 | BITMAP KEY ITERATION | | |
|* 15 | TABLE ACCESS FULL | D | 100 |
|* 16 | BITMAP INDEX RANGE SCAN| T_FK3 | |
| 17 | BITMAP MERGE | | |
| 18 | BITMAP KEY ITERATION | | |
|* 19 | TABLE ACCESS FULL | D | 10 |
|* 20 | BITMAP INDEX RANGE SCAN| T_FK2 | |
|* 21 | TABLE ACCESS FULL | D | 100 |
|* 22 | TABLE ACCESS FULL | D | 100 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - access("F"."FK1"="D"."ID")
4 - access("F"."FK2"="D"."ID")
5 - filter("IS_FLAG_D2"='Y')
11 - filter("IS_FLAG_D1"='Y')
12 - access("F"."FK1"="D"."ID")
15 - filter("IS_FLAG_D3"='Y')
16 - access("F"."FK3"="D"."ID")
19 - filter("IS_FLAG_D2"='Y')
20 - access("F"."FK2"="D"."ID")
21 - filter("IS_FLAG_D1"='Y')
22 - filter("IS_FLAG_D3"='Y')


What an astonishing result: not only will Oracle now try to access all rows of the fact table via single-block random I/O - which by itself can be a disaster for larger real-life fact tables, in particular when dealing with Exadata features like Smart Scans that are only possible with multi-block direct-path reads - but furthermore, if this were part of a more complex execution plan, look at the cardinality estimates: they are off by five orders of magnitude - very likely a recipe for disaster for any step following afterwards.

The point here is simple: the Star Transformation calculation model obviously doesn't cope well with a "collection" of dimensions in a single table, but assumes a dimensional model where each dimension is stored in its own table(s). If you don't adhere to that model, the calculation can be badly wrong and the results possibly disastrous.

Here the Star Transformation assumes a filtering on the dimension tables that effectively doesn't filter at all, but this is something the current calculation model is not aware of. If you put the three dimensions into separate tables, as sketched below, no "artificial" filter is required and hence the calculation won't be misled.
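For illustration, a sketch of what the separate dimension tables could look like for this test case (derived from the foreign key ranges generated above - no flag columns, no histograms and no filter predicates required):

create table d1
as
select
        rownum as id
      , rpad('x', 100) as vc1
from
        dual
connect by
        level <= 100
;

create table d2
as
select
        1000 + rownum as id
      , rpad('x', 100) as vc1
from
        dual
connect by
        level <= 10
;

create table d3
as
select
        2000 + rownum as id
      , rpad('x', 100) as vc1
from
        dual
connect by
        level <= 100
;

exec dbms_stats.gather_table_stats(null, 'd1')
exec dbms_stats.gather_table_stats(null, 'd2')
exec dbms_stats.gather_table_stats(null, 'd3')

The query then simply joins F.FK1 = D1.ID, F.FK2 = D2.ID and F.FK3 = D3.ID without any flag predicates, so the star transformation calculation no longer has to deal with filters that don't actually filter.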

Of course one could argue that the star transformation optimization does a poor job here, since the normal optimization based on the same input data produces a much better estimate, but at least for the time being that's the way this transformation works, and the model you choose had better reflect this.

Thursday, November 10, 2011

Public Appearances

I've just got the confirmation that I have been accepted as a speaker for the HotSOS Symposium 2012 in March next year, so here is a quick update on my upcoming public appearances:

1. DOAG
Next week the DOAG conference 2011 (Nürnberg, Germany) will begin. It's an impressive conference with a large number of tracks, although a lot of them are not database-centred.

Nevertheless it's certainly one of the conferences to go to if you speak German and are interested in Oracle database technology (or the ever-increasing range of Oracle technology in general).

Since I've again missed the chance to negotiate a training day this year (something I hopefully will manage next year), I've decided to compensate for this a little bit: I plan to give at least two additional sessions at the DOAG "Unconference". These will be so-called "Optimizer Hacking Sessions" where I simply start up a SQL prompt and explore some of the most important aspects of the Cost Based Optimizer. The session's title is: "Optimizer issues - How to detect and prevent suboptimal execution plans". I hope this will be fun and educational for all of us - it is meant to be an interactive session where you're encouraged to participate by asking questions that we will try to answer by performing live exploration of the database. I've got plenty of material to talk about, starting from basic SQL statement performance troubleshooting and not ending with more advanced topics like histograms, cardinality estimates, virtual columns, extended statistics, the clustering factor etc. Furthermore I plan to demonstrate some cool things that you probably haven't seen yet, so I believe it's going to be a lot of fun.

If you've already received the printout with the schedule (I never understand why DOAG publishes these printouts that early, given all the possible changes to the schedule until the actual start of the conference): I agreed with Heli Helskyaho from Finland, when we met at this year's OOW, to swap our presentation slots at DOAG, so my presentation about the Cost-Based Optimizer called "Query Transformations" will not take place on Wednesday, 9 am, but on Thursday, 3 pm, room "Kiew".

Unfortunately DOAG didn't accept the second paper I submitted, which would have been an introductory session about how to read and understand execution plans, so I'll only give this more advanced session about the most important transformations the optimizer applies to a query, why you should care, and how to control them if necessary.

So this is what my preliminary schedule for DOAG looks like:

Thursday, November 17
12 pm: DOAG Unconference - Optimizer hacking session
2 pm: DOAG Unconference - Optimizer hacking session
3 pm: DOAG Conference - Query Transformations, room "Kiew"

2. UKOUG
I won't be at the UKOUG conference (December 5-7, Birmingham, UK) this year, but this doesn't mean that I don't recommend going. In fact I believe this year's conference has one of the most impressive lists of speakers ever, including a number of top speakers from Oracle USA that you won't meet anywhere else in Europe, so if you can, get there - it certainly will be a top experience.

3. HotSOS
In March 2012 the 10th HotSOS Symposium (Dallas, Texas) will be held - and I'll be speaking there for the first time (about the CBO and how it calculates joins over histograms, by the way). It's a conference that I particularly look forward to for several reasons: it is the 10th anniversary of the conference, so some special activities are planned, and since this is the only Oracle conference in the world dedicated to performance, the OakTable member "density" will naturally be extremely high.

I hope to meet some of you at one of those events (well, except UKOUG, that is)!