Sachin Arora's Blog

Monday, February 9, 2009

Index range Scan Vs Nested loop in IN-LIST ?

Which one is better over the above? This question has been the one of the latest topic in our performance team when using a viable solution for varying In-List management.

Background:

We all know Oracle's shared pool is sensitive to the way sql is written - whitespace, uppercase/lowercase difference, number of bind variables in IN-Clause.

To avoid shared pool being bombarded with thousands of sqls of same type, we currently are using buckets for IN-clause.

Ex:

select * from emp where empno in (:B1,:B2,:B3,......,:B16) -- this is bucket size 16.

So, if user passes only one bind value, the rest of bind values will go as null.

And if user passes two bind values, the rest of bind values will go as null.

This way we control, the number of sqls which needs to be maintained in shared pool.

We implement different size of buckets: 16, 32, 64, 128, 256, 512, 768, 1000

Upon searching i came across Tomy kyte's solution mentioned here (http://tkyte.blogspot.com/2006/06/varying-in-lists.html).

Initially i used his 8i solution and results were somehow unexpected:

Here is the testcase:

SQL> create or replace type stringTableType as table of varchar2(4000);

2 /

Type created.

SQL> SQL> 2 3 4

5 v_str long default p_str || ',';

7 v_n number;

9 v_data stringTableType := stringTableType();

11 begin

13 loop

15 v_n := instr( v_str, ',' );

17 exit when (nvl(v_n,0) = 0);

19 v_data.extend;

21 v_data( v_data.count ) := ltrim(rtrim(substr(v_str,1,v_n-1)));

23 v_str := substr( v_str, v_n+1 );

25 end loop;

27 return v_data;

29 end;

31 /

Function created.

SQL> create table test as select rownum r from dual connect by level<=100000;

Table created.

SQL> create index test_ux on test(r);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'TEST',cascade=>true)

PL/SQL procedure successfully completed.

Now running a NORMAL in-list with 10 inlist values:

SQL> declare

a number;

begin

for i in 1..10000 loop

select count(*) into a from test where r in (1,2,3,4,5,6,7,8,9,10);

end loop;

end;

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.23

Note the "Elapsed" time. It is 1.5 secs.

Now using the Collection method:

SQL> SQL> declare

a number;

begin

for i in 1..10000 loop

select count(*) into a from test

where r in ( select /*+ cardinality(10) */ * from THE ( select cast( convert2Table( '1,2,3,4,5,6,7,8,9,10' ) as stringTableType ) from dual ) );

end loop;

end;

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.41

Note the "Elapsed" time is almost 15x more than normal inlist.

This was a bit surprising.

I did the same test under 10046 + tkprof eyes.

SELECT COUNT(*)

FROM

TEST WHERE R IN (1,2,3,4,5,6,7,8,9,10)

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 10000 0.46 0.42 0 0 0 0

Fetch 10000 1.99 2.00 0 200000 0 10000

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 20001 2.46 2.43 0 200000 0 10000

Rows Row Source Operation

------- ---------------------------------------------------

10000 SORT AGGREGATE (cr=200000 pr=0 pw=0 time=2117708 us)

100000 INLIST ITERATOR (cr=200000 pr=0 pw=0 time=2271474 us)

100000 INDEX RANGE SCAN TEST_UX (cr=200000 pr=0 pw=0 time=1354629 us)(object id 60859168)

SELECT COUNT(*)

FROM

TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(

CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS STRINGTABLETYPE ) FROM DUAL ) )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.01 0.01 0 48 0 0

Execute 10000 5.40 5.42 0 0 0 0

Fetch 10000 9.57 9.65 0 120000 0 10000

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 20001 14.99 15.09 0 120048 0 10000

Rows Row Source Operation

------- ---------------------------------------------------

10000 SORT AGGREGATE (cr=120000 pr=0 pw=0 time=13856299 us)

100000 NESTED LOOPS (cr=120000 pr=0 pw=0 time=16447839 us)

100000 VIEW VW_NSO_1 (cr=0 pr=0 pw=0 time=11596021 us)

100000 HASH UNIQUE (cr=0 pr=0 pw=0 time=11085381 us)

100000 COLLECTION ITERATOR PICKLER FETCH CONVERT2TABLE (cr=0 pr=0 pw=0 time=3527457 us)

20000 FAST DUAL (cr=0 pr=0 pw=0 time=151992 us)

100000 INDEX RANGE SCAN TEST_UX (cr=120000 pr=0 pw=0 time=2182846 us)(object id 60859168)

The difference between 2 is sea. Although the later did almost 1/2 LIO yet the total time spent on CPU was huge.

It seems to me that NL over INLIST ITERATOR is not a good choice ATLEAST for 10 values.

Now, I tried with 1000 values -- i didnt capture the screen shots. But here is what i saw.

- for loop that runs 1000 times -- the "R in (1,2,3...1000)" took 54 secs. And using the Tom Kyte's suggested methood - it took 16 minutes.

Am i missing something?

Friday, December 19, 2008

Partition stats usage with Bind peeking disabled

Last week, I was working on a package, which is used to collect stats.

I realized we collect stats of all partioned tables,indexes at both partition and global level.

While working on that package, i was pondering when will we actually use partition/local level stats (over global stats) in real-world scenario as most of our "where" clause

values are passed using bind variables. How will optimizer will come to know which partition (stats) to use because value passed will checked run time except in the case when it

is run for the first time and Bind peek happens.

A bit confused with this I performed a test, where my test case was inspired by David Aldridge's blog post (however i conducted a different set of tests).

I first checked the difference in 10053 trace in sqls where we pass literals Vs bind variables - when bind peek is enabled (default - enabled, but not in our case).

Test case:

============

drop table test_par

create table test_par

(

col1 number not null,

col2 number not null

)

nologging compress pctfree 0

partition by range (col1)

(partition p1_to_4 values less than (5),

partition p5 values less than (6)

)

insert /*+ append */ into test_par

select mod(rownum,4)+1,rownum

from dual

connect by level <= 100000

insert into test_par values(5,10);

commit;

create index idx02_test_par

on test_par (col2)

local nologging

begin

DBMS_STATS.GATHER_TABLE_STATS (

ownname => user,

tabname => 'test_par',

partname => null,

estimate_percent => 100,

block_sample => false,

method_opt => 'for all columns size 1',

degree => null,

granularity => 'ALL',

cascade => true,

stattab => NULL,

statid => NULL,

statown => NULL,

no_invalidate => FALSE);

end;

So - i have 75000 rows in partition p1_to_4 and 1 row in p5

SQL> select count(*),col1 from test_par partition(p1_to_4) group by col1;

COUNT(*) COL1

---------- ----------

25000 1

25000 2

25000 4

25000 3

SQL> select count(*),col1 from test_par partition(p5) group by col1;

COUNT(*) COL1

---------- ----------

1 5

Now we have the set up ready, we have to enable the trace and check which stats are used when.

Case 1: - Bind peek enable. I use literal value to query table and see which stats are used.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> select * from test_par where col1=2 and col2=5;

Now checking the trace file, we see:

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: TEST_PAR Alias: TEST_PAR Partition [0] -- Do you see Partition stats getting used?

#Rows: 100000 #Blks: 149 AvgRowLen: 7.00

Index Stats::

Index: IDX02_TEST_PAR Col#: 2 PARTITION [0]

LVLS: 1 #LB: 222 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 149.00

***************************************

So, with Bind peek enabled - we use partition stat (when we should) when literals are used

Case 2: Bind peek enable. I use Bind value to query table and see which stats are used.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> variable lcol1 number;

SQL> variable lcol2 number;

SQL> execute :lcol1:=2;

SQL> execute :lcol2:=5;

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;

Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: TEST_PAR Alias: TEST_PAR Partition [0] -- Here again, partition stats getting used

#Rows: 100000 #Blks: 149 AvgRowLen: 7.00

Index Stats::

Index: IDX02_TEST_PAR Col#: 2 PARTITION [0]

LVLS: 1 #LB: 222 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 149.00

***************************************

Since this was the first time, we used this sql, Bind peek had to happen.

This is confirmed by following section of trace file:

*******************************************

Peeked values of the binds in SQL statement

*******************************************

kkscoacd

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000000 frm=00 csi=00 siz=48 off=0

kxsbbbfp=2a973589d8 bln=22 avl=02 flg=05

value=2

Bind#1

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=24

kxsbbbfp=2a973589f0 bln=22 avl=02 flg=01

value=5

CASE 3: Bind peek enabled - I use Bind value to query table and see which stats are used. -- This will be 2nd time - so this time i dont expect peeking to happen.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> variable lcol1 number;

SQL> variable lcol2 number;

SQL> execute :lcol1:=5;

SQL> execute :lcol2:=10;

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;

Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: TEST_PAR Alias: TEST_PAR Partition [0] -- Part Stats .. again

#Rows: 100000 #Blks: 149 AvgRowLen: 7.00

Index Stats::

Index: IDX02_TEST_PAR Col#: 2 PARTITION [0]

LVLS: 1 #LB: 222 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 149.00

***************************************

In this case - we still use old execution path and old stats - which is agony of 10g.

We had only one row for this partition and we ended up using INDEX access when direct access would be really fast.

I heard about Adaptive cursor sharing in 11g will avoid this, but still to run tests on it.

Now we will run the same tests by keeping the bind peeking disabled (as in my environment)

CASE 4: Bind peek disabled - we use literals

SQL> alter system flush shared_pool;

SQL> alter session set "_optim_peek_user_binds"=false;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> select * from test_par where col1=5 and col2=10;

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: TEST_PAR Alias: TEST_PAR Partition [1] -- Picked the right stats of right partition

#Rows: 1 #Blks: 1 AvgRowLen: 6.00

Index Stats::

Index: IDX02_TEST_PAR Col#: 2 PARTITION [1]

LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00

***************************************

Literals - as seen knows where to go.

CASE 5: Bind peek disabled - we use binds

SQL> variable lcol1 number;

SQL> variable lcol2 number;

SQL> execute :lcol1:=5;

SQL> execute :lcol2:=10;

SQL> alter session set "_optim_peek_user_binds"=false;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;

Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: TEST_PAR Alias: TEST_PAR (Using composite stats) -- These are global stats

#Rows: 100001 #Blks: 75 AvgRowLen: 7.00

Index Stats::

Index: IDX02_TEST_PAR Col#: 2

USING COMPOSITE STATS

LVLS: 1 #LB: 223 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 150.00

***************************************

So - the conclusion is :

When we have BIND peeeking disabled, and with usage of bind values - we are most likely (any one of you could come up where above doesnt hold good) NOT using partiton stats.

Thursday, November 13, 2008

Oracle Bitmap indexes - implementation

Oracle uses BBC (Byte aligned Bitmap Code) 1-sided algorithm to store Bitmap indexes.

What is BBC?

Thinking of Bitmap indexes, most of us recall a simple Bitmap Index such as the one mention below:

Columns	Row 1	Row 2	Row 3	Row 4	Row 5	Row 6	Row 7	Row 8	Row 9
Red	0	0	0	0	0	0	0	1	1
Green	1	1	0	0	0	0	1	0	0
Blue	0	0	1	1	1	1	0	0	0

Fig: 1 - Table having 9 rows with data distribution shown as 0 and 1.

In the above style of storing bits is uncompressed i.e. all 0s and 1s are stored as it is. So, while doing any bitwise operation (and/or/xor), we use the uncompressed version of bitmaps.

Potential drawbacks of using this technique:

high storage cost
high query cost

Can compression be achieved?

Yes, we can compress 0s and 1s which are consecutive. Think of compressing seven 0s as one with a special header saying that this 0 is equivalent to seven 0s.

But this has its own disadvantage. For any bit-wise operation, I have to first decompress bitmap indexes in question and do the “and”/”or” operation, which outweighs any savings in storage costs.

So, the need was for a compress bitmap technique using which bit-wise operations could be done on the compressed state.

One such example of such techniques is BBC (used by Oracle) – called Byte aligned bitmap code.

This technique of bitmap compression is based on run-length encoding. Using this encoding BITMAPs are aligned along the BYTE boundaries. Each aligned BYTE is called a gap-byte (GBYTE) if all the bits of that BYTE store the same logical value (0 or 1). If it’s a mix of bits having different logical values, its called a non-gap-byte or map-byte (MBYTE). Adjacent bytes of the same class are grouped and controlled by a control byte (CBYTE), which keeps the information whether the bytes following it are MBYTE or GBYTE and if its GBYTE what is the length of bytes? And if it’s a MBYTE, which type of MBYTE (is it an offset or not? – offset is a MBYTE having only single bit different from other bits in that byte)

In the above explanation, a gap-byte could be for 0 or for 1. so, it is called 2-sided BBC.

Oracle uses 1-sided BBC, in which only 0s are candidate for GBYTE not 1s. All 1s will still make an MBYTE in Oracle. I’m not sure why Oracle uses 1-sided BBC technique. It could be due to integration complexities with other part of RDBMS-engine.

But going back on the question, why do we need Byte alignment when I can compress without it?

Computers are happy dealing with BYTEs than BITS.
I can do BIT-WISE operations as efficiently (in fact better) as I can do in decompress form.
Control Bytes (CBYTES) are very handy in decoding the compress BITMAP.

Can this BITMAP compression and query response get better?

From what I see after doing research on net, the answer is Yes.

The technique of handling the bits BYTEs way has gotten better by handling the WORD way.

The concepts are similar to the ones followed in BBC, the major difference is aligning the bits using “word” rather than “byte”. Computers today are more friendly dealing in words (multiple bytes) than on byte.

I will write more on word aligned hybrid compression technique in perspective of oracle in coming days.

Thanks for reading!

Inputs:

Patent# 5363098 – by Antoshenkov, Gennady (Oracle corp).