Monday, February 9, 2009

Index Range Scan vs. Nested Loop for IN-lists?

Which of the two is better? This question has been one of the latest topics in our performance team while looking for a viable solution for handling varying IN-lists.


Background:

We all know Oracle's shared pool is sensitive to the way SQL is written - whitespace, uppercase/lowercase differences, the number of bind variables in the IN clause.
To avoid the shared pool being bombarded with thousands of SQL statements of the same type, we currently use buckets for the IN clause.

Ex:

select * from emp where empno in (:B1,:B2,:B3,......,:B16) -- this is bucket size 16.

So, if the user passes only one bind value, the remaining binds go in as NULL; if the user passes two bind values, the remaining fourteen go in as NULL, and so on.

This way we control the number of SQL statements that need to be maintained in the shared pool.

We implement buckets of different sizes: 16, 32, 64, 128, 256, 512, 768, and 1000.
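For illustration, here is what a (hypothetical) bucket of size 4 would look like with a two-value list - the NULL padding is harmless because NULL never matches anything in an IN list:

```sql
select * from emp where empno in (:B1, :B2, :B3, :B4);
-- binds: :B1 = 7369, :B2 = 7499, :B3 = NULL, :B4 = NULL
-- behaves exactly like "empno in (7369, 7499)"
```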

While searching, I came across Tom Kyte's solution mentioned here (http://tkyte.blogspot.com/2006/06/varying-in-lists.html).
Initially I used his 8i solution, and the results were somewhat unexpected.
Here is the testcase:

SQL> create or replace type stringTableType as table of varchar2(4000);
  2  /

Type created.


SQL> create or replace function convert2Table( p_str in varchar2 )
  2  return stringTableType
  3  as
  4      v_str   long default p_str || ',';
  5      v_n     number;
  6      v_data  stringTableType := stringTableType();
  7  begin
  8      loop
  9          v_n := instr( v_str, ',' );
 10          exit when (nvl(v_n,0) = 0);
 11          v_data.extend;
 12          v_data( v_data.count ) := ltrim(rtrim(substr(v_str,1,v_n-1)));
 13          v_str := substr( v_str, v_n+1 );
 14      end loop;
 15      return v_data;
 16  end;
 17  /

Function created.
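A quick sanity check of the function (my addition, not part of the original run; on 9i and later, the 8i-style THE() wrapper can also be written with TABLE()):

```sql
select *
from THE ( select cast( convert2Table('1,2,3') as stringTableType ) from dual );
-- returns three rows: '1', '2', '3'
```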

SQL> create table test as select rownum r from dual connect by level<=100000;

Table created.

SQL> create index test_ux on test(r);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'TEST',cascade=>true)

PL/SQL procedure successfully completed.


Now running a NORMAL in-list with 10 inlist values:

SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test where r in (1,2,3,4,5,6,7,8,9,10);
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.23

Note the "Elapsed" time: about 1.2 seconds.

Now using the Collection method:

SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test
where r in ( select /*+ cardinality(10) */ * from THE ( select cast( convert2Table( '1,2,3,4,5,6,7,8,9,10' ) as stringTableType ) from dual ) );
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.41

Note the "Elapsed" time is roughly 18x that of the normal in-list.

This was a bit surprising.

I did the same test under 10046 + tkprof eyes.

SELECT COUNT(*)
FROM
 TEST WHERE R IN (1,2,3,4,5,6,7,8,9,10)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  10000      0.46       0.42          0          0          0           0
Fetch    10000      1.99       2.00          0     200000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001      2.46       2.43          0     200000          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=200000 pr=0 pw=0 time=2117708 us)
 100000   INLIST ITERATOR  (cr=200000 pr=0 pw=0 time=2271474 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=200000 pr=0 pw=0 time=1354629 us)(object id 60859168)

 

SELECT COUNT(*)
FROM
 TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
  CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS STRINGTABLETYPE ) FROM DUAL ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0         48          0           0
Execute  10000      5.40       5.42          0          0          0           0
Fetch    10000      9.57       9.65          0     120000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001     14.99      15.09          0     120048          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=120000 pr=0 pw=0 time=13856299 us)
 100000   NESTED LOOPS  (cr=120000 pr=0 pw=0 time=16447839 us)
 100000    VIEW  VW_NSO_1 (cr=0 pr=0 pw=0 time=11596021 us)
 100000     HASH UNIQUE (cr=0 pr=0 pw=0 time=11085381 us)
 100000      COLLECTION ITERATOR PICKLER FETCH CONVERT2TABLE (cr=0 pr=0 pw=0 time=3527457 us)
  20000       FAST DUAL  (cr=0 pr=0 pw=0 time=151992 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=120000 pr=0 pw=0 time=2182846 us)(object id 60859168)


The difference between the two is vast: although the latter did almost half the LIOs, the total time spent on CPU was huge.

It seems to me that a NESTED LOOP instead of an INLIST ITERATOR is not a good choice, at least for 10 values.

Now I tried with 1000 values -- I didn't capture the screenshots, but here is what I saw:

- For a loop that runs 1000 times, the "R in (1,2,3,...,1000)" version took 54 secs, while Tom Kyte's suggested method took 16 minutes.

Am I missing something?


21 comments:

Nigel said...

Sachin

I notice you used Convert2Table ... AS StringTableType. This means all your list values then need to be implicitly converted to numbers.

In that circumstance, a hash join seems like a pretty good plan (much better than nested FTS).

Change the function to return a NumberTableType, and see what difference it makes!

Regards Nigel

Sachin said...

Hi Nigel,

Thanks for your reply!

I did what you said - but it didn't seem to work as we thought.

It took more than 20 secs again to run the SQL 10000 times, as against 1.5 secs with the normal "IN" clause.

Here are the tkprof results for the NumberType:

SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS NUMBERTABLETYPE ) FROM DUAL ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0         25          0           0
Execute  10000      4.80       4.78          0          0          0           0
Fetch    10000     13.42      13.09          0     120000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001     18.24      17.89          0     120025          0       10000

Narendra said...

Sachin,

Is your code expected to run on 8i?
I could not run it as-is on an 8i database. The CONNECT BY trick to generate extra rows works from 9i onwards (not sure about 11g though), so I used a little trick that I had learned from Frank Zou, and that worked. However, if I am not mistaken, the CARDINALITY hint is available from 9i onwards and is not available in 8i.
Now, assuming this is run on 9i, I found this: http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3779680732446#15740265481549 .
I do not have a 9i database to test on, but it seems it can explain what you are seeing.

Sachin said...

Narendra,

I did make use of the cardinality hint in my original post - still I see this issue happening!

-- Sachin

Narendra said...

Sachin,

I tested your code with a slight modification: I changed the way the CARDINALITY hint is used. When tested on 10g (I don't have 9i to verify), I got the same plan as yours with the hint written the way you mentioned; with my way, the plan changed. So that might be the reason. Following is my test:

SQL> select * from v$version ;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> set timing on
SQL> declare
2 a number ;
3 begin
4 for i in 1..10000
5 loop
6 select count(*) into a from test_tbl where r in (1,2,3,4,5,6,7,8,9,10) ;
7 end loop;
8 end ;
9 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.01
SQL> declare
2 a number ;
3 begin
4 for i in 1..10000
5 loop
6 select count(*) into a from test_tbl where r in ( select /*+ CARDINALITY(t 10) */ *
7 from TABLE(cast(num2tbl('1,2,3,4,5,6,7,8,9,10') as numtbltype)) t) ;
8 end loop ;
9 end ;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.05
SQL> set autotrace on
SQL> select count(*) from test_tbl where r in (1,2,3,4,5,6,7,8,9,10) ;

COUNT(*)
----------
10

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=10 Card=1 Bytes=4)
   1    0   SORT (AGGREGATE)
   2    1     INLIST ITERATOR
   3    2       INDEX (RANGE SCAN) OF 'TEST_TBL_IDX' (INDEX) (Cost=10 Card=10 Bytes=40)





Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
20 consistent gets
0 physical reads
0 redo size
203 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from test_tbl where r in (1,2,3,4,5,6,7,8,9,10) ;

COUNT(*)
----------
10

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=10 Card=1 Bytes=4)
   1    0   SORT (AGGREGATE)
   2    1     INLIST ITERATOR
   3    2       INDEX (RANGE SCAN) OF 'TEST_TBL_IDX' (INDEX) (Cost=10 Card=10 Bytes=40)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
20 consistent gets
0 physical reads
0 redo size
217 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from test_tbl where r in ( select /*+ CARDINALITY(t 10) */ *
2 from TABLE(cast(num2tbl('1,2,3,4,5,6,7,8,9,10') as numtbltype)) t) ;

COUNT(*)
----------
10

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=30 Card=1 Bytes=6)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=30 Card=10 Bytes=60)
   3    2       SORT (UNIQUE)
   4    3         COLLECTION ITERATOR (PICKLER FETCH) OF 'NUM2TBL' (PROCEDURE)
   5    2       INDEX (RANGE SCAN) OF 'TEST_TBL_IDX' (INDEX) (Cost=1 Card=1 Bytes=4)





Statistics
----------------------------------------------------------
12 recursive calls
0 db block gets
39 consistent gets
0 physical reads
0 redo size
217 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from test_tbl where r in ( select /*+ CARDINALITY(t 10) */ *
2 from TABLE(cast(num2tbl('1,2,3,4,5,6,7,8,9,10') as numtbltype)) t) ;

COUNT(*)
----------
10

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=30 Card=1 Bytes=6)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=30 Card=10 Bytes=60)
   3    2       SORT (UNIQUE)
   4    3         COLLECTION ITERATOR (PICKLER FETCH) OF 'NUM2TBL' (PROCEDURE)
   5    2       INDEX (RANGE SCAN) OF 'TEST_TBL_IDX' (INDEX) (Cost=1 Card=1 Bytes=4)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
217 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> set autotrace off
SQL> spool off

Sachin said...

Narendra,

Thanks for checking this ..

However, I checked the ways the cardinality hint can be provided and found that both (yours and mine) should work with the same results:

See here:


SQL> select * from v$version
2 /

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bi
PL/SQL Release 10.2.0.2.0 - Production
CORE 10.2.0.2.0 Production
TNS for Linux: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production

SQL>
1 SELECT COUNT(*)
2 FROM
3 TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
4* NUM2TBL( '1,2,3,4,5,6,7,8,9,10' ) AS NUM2TBLTYPE ) FROM DUAL ) )
SYSTEM: irfc01s3> /

COUNT(*)
----------
10

Elapsed: 00:00:00.10

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=31 Card=1 Bytes=17)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=31 Card=1 Bytes=17)
   3    2       VIEW OF 'VW_NSO_1' (VIEW) (Cost=29 Card=10 Bytes=130) -- See the cardinality here -- it is 10, which is what I want.
   4    3         HASH (UNIQUE)
   5    4           COLLECTION ITERATOR (PICKLER FETCH) OF 'NUM2TBL' (PROCEDURE)
   6    5             FAST DUAL (Cost=2 Card=1)
   7    2       INDEX (RANGE SCAN) OF 'TEST_UX' (INDEX) (Cost=1 Card=1 Bytes=4)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
336 bytes sent via SQL*Net to client
427 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The difference between your run and mine is probably the SQL itself. You are using:
SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT /*+ CARDINALITY(t 10) */ * FROM
TABLE(CAST(NUM2TBL('1,2,3,4,5,6,7,8,9,10') AS NUM2TBLTYPE)) T )

and I'm using :
SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
NUM2TBL( '1,2,3,4,5,6,7,8,9,10' ) AS NUM2TBLTYPE ) FROM DUAL ) )


Apart from the cardinality hint, which has a different syntax (but is not making any change) -- you are using a table function and I'm using the DUAL table -- which seems to be making the difference.

The new SQL posted by Narendra has the following plan and CPU utilization:

SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT /*+ CARDINALITY(t 10) */ * FROM
TABLE(CAST(NUM2TBL('1,2,3,4,5,6,7,8,9,10') AS NUM2TBLTYPE)) T )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0         24          0           0
Execute  10000      4.41       4.45          0     120000          0           0
Fetch    10000      0.33       0.32          0      20000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001      4.75       4.79          0     140024          0       10000


In this case also, I see CPU utilization almost double compared to the simple IN-list.

Are the extra hard parses a good enough reason to switch to the higher-CPU model?

Narendra said...

Sachin,

You are correct. However, what made you think that the function model causes extra hard parses? In the row-source operation you posted, I could see that the statement is parsed only once for 10000 executions.
It does appear that the function approach consumes more CPU time. This seems to be because the function is called once per execution. Your test case executes the SELECT 10000 times, which means a context switch (calling a PL/SQL function from SQL) takes place 10000 times, and context switches do cause excessive CPU usage. When you use the "IN (1,2,3,4,5,6,7,8,9,10)" approach, there is no PL/SQL involved and hence no context switching. You can verify this by creating even a simple function and executing it 10000 or more times from within SQL, versus a SQL-only (or PL/SQL-only) approach. I cannot access TKPROF on my database server and hence cannot post results for my reasoning.
"Are the extra hard parses a good enough reason to switch to the higher-CPU model?" Now this seems to be a question that deserves the famous Tom Kyte answer: "It depends".
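Narendra's verification suggestion can be sketched with a trivial function (illustrative names, not from the post):

```sql
create or replace function noop return number as
begin
    return 1;  -- does nothing; exists only to force a SQL-to-PL/SQL call
end;
/
-- one context switch per row evaluated:
select count(*) from test where r = noop;
-- no PL/SQL involved, no context switches:
select count(*) from test where r = 1;
```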

Nigel said...

Narendra asked the wrong question. As Sachin explained in the original post, the real app uses bind variables - with a cunning trick of IN-list buckets to support lists of 16, 32, 64, up to 1000 elements.

So I guess we should be asking - for a set of varying-sized IN lists, which of these works best:
1) hard-coded IN lists (no binds)
2) soft-coded IN lists (buckets)
3) the function-based solution

(1) obviously causes lots of hard parses (or use CURSOR_SHARING=FORCE - which is like Oracle doing the buckets for you).

If I get the chance I will try to test all three. The test is only valid if both the list values and the length of the list vary.

Anyway, it's an interesting discussion!

Regards Nigel

Sachin said...

Thanks Narendra and Nigel for your replies ..

@Narendra - Although I haven't read much on context switching, I logically agree with your context-switching argument.
But we need to take this argument to a conclusion: in which cases could it be worse than the bucket model I suggested in my original post?

@Nigel - Thanks for your updates. cursor_sharing is a critical parameter to change in any application - so at first look, I'm against it. However, I'm trying to choose one of the two approaches (function-based or bucket).
If you get a chance, please share your views. Thanks!

-- Sachin

Nigel said...

I agree about cursor sharing; I had to do it on one system written without bind variables, and it helped a lot - but it's much better to code the app right in the first place whenever possible.

I'll try to do the tests bucket vs function.

Regards Nigel

Narendra said...

Sachin,

Personally, I am not quite comfortable with an IN clause that has more than 20-30 values in it. If the values in the IN list can go up to 1000, I would actually take a global temporary table approach (where I add all the values to a GTT, and the SQL becomes SELECT ... FROM tbl WHERE r IN (SELECT col FROM GTT)). I guess that should also take care of manually defined buckets and shared SQL.
I will be happy to know if you have faced any issues with the GTT approach.
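Narendra's GTT idea in sketch form (table and column names are illustrative, not from the post):

```sql
-- session-private scratch table; rows vanish at commit
create global temporary table in_list_gtt ( val number )
on commit delete rows;

-- the application inserts each IN-list element into the GTT, then:
select count(*) from test where r in ( select val from in_list_gtt );
```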

Sachin said...

Hi Narendra,

That is precisely my argument: I feel that whenever we use a NESTED LOOP we spend more CPU time than with a range scan, probably because we go back to the root block and traverse down to a leaf "n" times rather than getting to a leaf once and walking horizontally (INDEX RANGE SCAN). I was trying to somehow force an INDEX RANGE scan (without nested loops) using a GTT, but was not able to do so.

If you can give it a try and share your experiment - it would be great - i will also try the same.

--Sachin

Narendra said...

Sachin,

I am not sure I understood you there. You are saying: "I was trying to somehow force INDEX RANGE scan (without nested loops) using GTT - but was not able to do so." But Oracle will have to use some join mechanism when more than one "table" is involved. How can you hope to get rid of the nested loop with a GTT?
Your first query

SELECT COUNT(*)
FROM
TEST WHERE R IN (1,2,3,4,5,6,7,8,9,10 )

uses only one table (TEST) and hence Oracle does not need to use any of the JOIN mechanisms.

Your second query

SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(

CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS NUMBERTABLETYPE ) FROM DUAL ) )

is using two "tables" (TEST and the resultset generated by the function), and hence Oracle is using the NESTED LOOP join mechanism (but it is also using an INDEX RANGE scan to access the TEST table).

This will be true even for GTT. If you use a query of the form

SELECT COUNT(*)
FROM
TEST WHERE R IN ( SELECT col from GTT )

Oracle will still treat it as a query with two tables and use a join mechanism.
Am I missing something obvious here?

Sachin said...

@Narendra - this is what I'm searching for -- is there a way to use the "INLIST ITERATOR" as a substitute for a nested loop (or, for that matter, any kind of join)?
I gave push_subq a few minutes' try to force that, but I could not avoid the join.

Hoping we can find something like that.

-- Sachin

Narendra said...

Sachin,

Agreed. But I am still not sure why you think the nested loop is the cause of the high CPU usage that is resulting in poor response time. In your example, it is evident that the context switching is causing the excessive CPU usage and the poor performance. If your goal is to achieve better response time without (much) compromising on latches, CPU, etc., shouldn't you be looking at addressing the real issue?

Nigel said...

Sachin

I haven't had a chance to experiment further, but here's a thread on the Oracle-L list that gives another way of breaking a list into a join using sys_connect_by_path - this time without requiring a user function call, and so without the context switch that Narendra is worried about. See in particular Stephane Faroult's contribution (2nd in the trail).
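A pure-SQL split along those lines (a sketch, not the exact query from the thread; regexp_substr needs 10g):

```sql
-- break :list ('1,2,3,...') into rows without any PL/SQL call
select regexp_substr(:list, '[^,]+', 1, level) as val
from dual
connect by level <= length(:list) - length(replace(:list, ',')) + 1;
```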

Regards Nigel

Sachin said...

Hi Narendra,

Sorry for the late update.

My concern is: in the case of a GTT, we have to insert the values into a temporary table first. That means round trips between the client process and the database server, and only once the data (that needs to be in the IN clause) is present on the server side (in the global temporary table) can we use it.

insert into (GTT) values ('[value]');

insert into (GTT) values ('[value]');

insert into (GTT) values ('[value]');

insert into (GTT) values ('[value]');

insert into (GTT) values ('[value]');
..
..
..

select * from (tab) where (col) in (select (col) from (GTT));

The additional cost of the INSERTs can be heavy.

Do you have a better way to optimize that?

Regards,
Sachin

neo said...

sachin, r u the guy from t-474, arora hostel, baljit nagar, delhi?

Rajesh Subramaniam said...

Hi Sachin,

Try the function-based approach to insert the list into the GTT (this will avoid the many context switches) and then use the GTT in the join.
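Rajesh's idea - loading the GTT in one statement through the parsing function - might look like this (a sketch; in_list_gtt is an illustrative name, num2tbl is the function from the earlier comments):

```sql
-- one INSERT ... SELECT instead of N single-row inserts
insert into in_list_gtt
select column_value from table( num2tbl( :list ) );

select count(*) from test where r in ( select val from in_list_gtt );
```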

BTW, did you try the 9i method mentioned in Tom Kyte's blog? Could you please share the results of that? Meanwhile, I will also try that approach with your test case.

Rajesh Subramaniam said...

For me, the 9i method takes 4 seconds, while your method (using binds) takes 1 second.

So it's 4 times slower. I have to check how to reduce it further.

If I get anywhere near, I will post here.

msh games said...

thanks