Monday, February 9, 2009

Index Range Scan vs. Nested Loop for IN-lists?

Which of the two is better? This question has been one of the latest topics in our performance team while evaluating a viable solution for varying in-list management.


Background:

We all know Oracle's shared pool is sensitive to the way SQL is written: whitespace, uppercase/lowercase differences, and the number of bind variables in an IN clause.
To avoid the shared pool being bombarded with thousands of SQL statements of the same type, we currently use buckets for the IN clause.

Ex:

select * from emp where empno in (:B1,:B2,:B3,......,:B16) -- this is bucket size 16.

So, if the user passes only one bind value, the remaining fifteen binds are sent as NULL; if the user passes two bind values, the remaining fourteen are NULL, and so on.

This way we control the number of SQL statements that need to be maintained in the shared pool.

We implement different bucket sizes: 16, 32, 64, 128, 256, 512, 768, 1000
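The bucket-selection logic can be sketched as follows. This is a simplified illustration in Python, not our actual code; `BUCKET_SIZES` and `pad_to_bucket` are hypothetical names:

```python
import bisect

# Hypothetical bucket sizes, matching the ones listed above.
BUCKET_SIZES = [16, 32, 64, 128, 256, 512, 768, 1000]

def pad_to_bucket(values):
    """Pad a list of bind values with None (NULL) up to the smallest
    bucket size that can hold them, so only one SQL text per bucket
    size ever reaches the shared pool."""
    idx = bisect.bisect_left(BUCKET_SIZES, len(values))
    if idx == len(BUCKET_SIZES):
        raise ValueError("more values than the largest bucket")
    size = BUCKET_SIZES[idx]
    return list(values) + [None] * (size - len(values))
```

For example, passing two values yields a 16-element list: the two values followed by fourteen NULLs, which binds cleanly into the bucket-16 statement above.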

Upon searching, I came across Tom Kyte's solution mentioned here (http://tkyte.blogspot.com/2006/06/varying-in-lists.html).
Initially I used his 8i solution, and the results were somewhat unexpected.
Here is the test case:

SQL> create or replace type stringTableType as table of varchar2(4000);
  2  /

Type created.


SQL> create or replace function convert2Table( p_str in varchar2 ) return stringTableType
  2
  3  as
  4
  5      v_str   long default p_str || ',';
  6
  7      v_n        number;
  8
  9      v_data    stringTableType := stringTableType();
 10
 11  begin
 12
 13      loop
 14
 15          v_n := instr( v_str, ',' );
 16
 17          exit when (nvl(v_n,0) = 0);
 18
 19          v_data.extend;
 20
 21          v_data( v_data.count ) := ltrim(rtrim(substr(v_str,1,v_n-1)));
 22
 23          v_str := substr( v_str, v_n+1 );
 24
 25      end loop;
 26
 27      return v_data;
 28
 29  end;
 30
 31  /

Function created.

SQL> create table test as select rownum r from dual connect by level<=100000;

Table created.

SQL> create index test_ux on test(r);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'TEST',cascade=>true)

PL/SQL procedure successfully completed.


Now running a NORMAL in-list with 10 in-list values:

SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test where r in (1,2,3,4,5,6,7,8,9,10);
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.23

Note the "Elapsed" time. It is 1.5 secs.

Now using the Collection method:

SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test
where r in ( select /*+ cardinality(10) */ * from THE ( select cast( convert2Table( '1,2,3,4,5,6,7,8,9,10' ) as stringTableType ) from dual ) );
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.41

Note the "Elapsed" time is almost 15x more than normal inlist.

This was a bit surprising.

I did the same test under 10046 + tkprof eyes.

SELECT COUNT(*)
FROM
 TEST WHERE R IN (1,2,3,4,5,6,7,8,9,10)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  10000      0.46       0.42          0          0          0           0
Fetch    10000      1.99       2.00          0     200000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001      2.46       2.43          0     200000          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=200000 pr=0 pw=0 time=2117708 us)
 100000   INLIST ITERATOR  (cr=200000 pr=0 pw=0 time=2271474 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=200000 pr=0 pw=0 time=1354629 us)(object id 60859168)

 

SELECT COUNT(*)
FROM
 TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
  CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS STRINGTABLETYPE ) FROM DUAL ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0         48          0           0
Execute  10000      5.40       5.42          0          0          0           0
Fetch    10000      9.57       9.65          0     120000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001     14.99      15.09          0     120048          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=120000 pr=0 pw=0 time=13856299 us)
 100000   NESTED LOOPS  (cr=120000 pr=0 pw=0 time=16447839 us)
 100000    VIEW  VW_NSO_1 (cr=0 pr=0 pw=0 time=11596021 us)
 100000     HASH UNIQUE (cr=0 pr=0 pw=0 time=11085381 us)
 100000      COLLECTION ITERATOR PICKLER FETCH CONVERT2TABLE (cr=0 pr=0 pw=0 time=3527457 us)
  20000       FAST DUAL  (cr=0 pr=0 pw=0 time=151992 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=120000 pr=0 pw=0 time=2182846 us)(object id 60859168)


The difference between the two is vast. Although the latter did almost half the LIOs, the total time spent on CPU was huge.

It seems to me that a NESTED LOOPS join is not a good choice over an INLIST ITERATOR, at least for 10 values.

Now, I tried with 1000 values. I didn't capture the screenshots, but here is what I saw:

- For a loop that runs 1000 times, "R in (1,2,3,...,1000)" took 54 seconds, while Tom Kyte's suggested method took 16 minutes.

Am I missing something?


Friday, December 19, 2008

Partition stats usage with Bind peeking disabled

Last week, I was working on a package which is used to collect stats.
I realized we collect stats for all partitioned tables and indexes at both partition and global level.

While working on that package, I was pondering when we would actually use partition-level (local) stats over global stats in a real-world scenario, since most of our WHERE-clause values are passed using bind variables. How will the optimizer know which partition's stats to use, when the value passed is only known at run time, except on the first execution when bind peeking happens?

A bit confused by this, I performed a test; my test case was inspired by David Aldridge's blog post (though I conducted a different set of tests).

I first checked the difference in the 10053 trace between SQLs where we pass literals vs. bind variables, with bind peeking enabled (enabled is the default, but not our setting).


Test case:
============

drop table test_par
/

create table test_par
   (
   col1 number not null,
   col2 number not null
   )
nologging compress pctfree 0
partition by range (col1)
(partition p1_to_4 values less than (5),
partition p5 values less than (6)
)
/

insert /*+ append */ into test_par
select mod(rownum,4)+1,rownum
from dual
connect by level <= 100000
/

insert into test_par values(5,10);
commit;

create index idx02_test_par
on test_par (col2)
local nologging
/

begin
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          => user,
   tabname          => 'test_par',
   partname         => null,
   estimate_percent => 100,
   block_sample     => false,
   method_opt       => 'for all columns size 1',
   degree           => null,
   granularity      => 'ALL',
   cascade          => true,
   stattab          => NULL, 
   statid           => NULL,
   statown          => NULL,
   no_invalidate    => FALSE);
end;
/

So, I have 100,000 rows in partition p1_to_4 and 1 row in p5:



SQL> select count(*),col1 from test_par partition(p1_to_4) group by col1;

  COUNT(*)       COL1
---------- ----------
     25000          1
     25000          2
     25000          4
     25000          3

SQL>  select count(*),col1 from test_par partition(p5) group by col1;

  COUNT(*)       COL1
---------- ----------
         1          5




Now that we have the setup ready, we have to enable the trace and check which stats are used when.

Case 1: Bind peeking enabled. I use literal values to query the table and see which stats are used.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> select * from test_par where col1=2 and col2=5;


Now checking the trace file, we see:

BASE STATISTICAL INFORMATION

***********************

Table Stats::


  Table: TEST_PAR  Alias: TEST_PAR  Partition [0]  -- Do you see Partition stats getting used?

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

Index Stats::

  Index: IDX02_TEST_PAR  Col#: 2  PARTITION [0]

    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

***************************************

So, with bind peeking enabled, we use partition stats (as we should) when literals are used.


Case 2: Bind peeking enabled. I use bind values to query the table and see which stats are used.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> variable lcol1 number;
SQL> variable lcol2 number;
SQL> execute :lcol1:=2;
SQL> execute :lcol2:=5;

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;

Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST_PAR  Alias: TEST_PAR  Partition [0]  -- Here again, partition stats getting used

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

Index Stats::

  Index: IDX02_TEST_PAR  Col#: 2  PARTITION [0]


    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

***************************************

Since this was the first time we used this SQL, bind peeking had to happen.

This is confirmed by the following section of the trace file:

*******************************************

Peeked values of the binds in SQL statement

*******************************************

kkscoacd


 Bind#0

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

  oacflg=03 fl2=1000000 frm=00 csi=00 siz=48 off=0

  kxsbbbfp=2a973589d8  bln=22  avl=02  flg=05

  value=2

 Bind#1

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

  oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=24

  kxsbbbfp=2a973589f0  bln=22  avl=02  flg=01

  value=5


CASE 3: Bind peeking enabled. I use bind values to query the table and see which stats are used. This is the second execution, so this time I don't expect peeking to happen.

SQL> alter session set "_optim_peek_user_binds"=true;

SQL> alter session set events '10053 trace name context forever, level 2';

SQL> variable lcol1 number;
SQL> variable lcol2 number;
SQL> execute :lcol1:=5;
SQL> execute :lcol2:=10;

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;


Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST_PAR  Alias: TEST_PAR  Partition [0] -- Part Stats .. again

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

    #Rows: 100000  #Blks:  149  AvgRowLen:  7.00

Index Stats::

  Index: IDX02_TEST_PAR  Col#: 2  PARTITION [0]


    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

    LVLS: 1  #LB: 222  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 149.00

***************************************


In this case, we still use the old execution plan and the old (peeked) stats, which is the agony of 10g.
We have only one row in this partition, yet we ended up using index access when direct access would be really fast.
I have heard that Adaptive Cursor Sharing in 11g avoids this, but I have yet to run tests on it.



Now we will run the same tests with bind peeking disabled (as in my environment).



CASE 4: Bind peeking disabled - we use literals

SQL> alter system flush shared_pool;
SQL> alter session set "_optim_peek_user_binds"=false;
SQL> alter session set events '10053 trace name context forever, level 2';
SQL> select * from test_par where col1=5 and col2=10;


***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST_PAR  Alias: TEST_PAR  Partition [1] -- Picked the right stats of right partition

    #Rows: 1  #Blks:  1  AvgRowLen:  6.00

    #Rows: 1  #Blks:  1  AvgRowLen:  6.00

Index Stats::

  Index: IDX02_TEST_PAR  Col#: 2  PARTITION [1]

    LVLS: 0  #LB: 1  #DK: 1  LB/K: 1.00  DB/K: 1.00  CLUF: 1.00

    LVLS: 0  #LB: 1  #DK: 1  LB/K: 1.00  DB/K: 1.00  CLUF: 1.00

***************************************

With literals, as seen, the optimizer knows exactly where to go.

CASE 5: Bind peeking disabled - we use binds


SQL> variable lcol1 number;
SQL> variable lcol2 number;
SQL> execute :lcol1:=5;
SQL> execute :lcol2:=10;
SQL> alter session set "_optim_peek_user_binds"=false;
SQL> alter session set events '10053 trace name context forever, level 2';

SQL> select * from test_par where col1=:lcol1 and col2=:lcol2;

Trace file:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST_PAR  Alias: TEST_PAR  (Using composite stats)  -- These are global stats 

    #Rows: 100001  #Blks:  75  AvgRowLen:  7.00

Index Stats::

  Index: IDX02_TEST_PAR  Col#: 2


    USING COMPOSITE STATS

    LVLS: 1  #LB: 223  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 150.00

***************************************


So, the conclusion is:

When we have bind peeking disabled and bind values are used, we are most likely NOT using partition stats (any one of you could come up with a case where the above doesn't hold).

Thursday, November 13, 2008

Oracle Bitmap indexes - implementation

Oracle uses the 1-sided BBC (Byte-aligned Bitmap Code) algorithm to store bitmap indexes.

 

What is BBC?

 

Thinking of bitmap indexes, most of us recall a simple bitmap index such as the one shown below:

Column   Row1  Row2  Row3  Row4  Row5  Row6  Row7  Row8  Row9
-------  ----  ----  ----  ----  ----  ----  ----  ----  ----
Red        0     0     0     0     0     0     0     1     1
Green      1     1     0     0     0     0     1     0     0
Blue       0     0     1     1     1     1     0     0     0

Fig: 1 - Table having 9 rows, with the data distribution shown as 0s and 1s.

 

In the above style, the bits are stored uncompressed, i.e., all 0s and 1s are stored as-is. So, while doing any bitwise operation (AND/OR/XOR), we use the uncompressed version of the bitmaps.
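The uncompressed form makes bitwise combination trivial. A minimal Python illustration using the Red/Green/Blue bitmaps from Fig. 1 (plain integers stand in for the raw bit vectors here; this is not Oracle's storage format):

```python
# Bitmaps from Fig. 1, one bit per row (row 1 = leftmost bit).
red   = int('000000011', 2)
green = int('110000100', 2)
blue  = int('001111000', 2)

def rows_matching(bitmap, nrows=9):
    """Return the 1-based row numbers whose bit is set."""
    bits = format(bitmap, '0{}b'.format(nrows))
    return [i + 1 for i, b in enumerate(bits) if b == '1']

# WHERE color = 'Red' OR color = 'Blue'   ->  bitwise OR
red_or_blue = red | blue
# WHERE color = 'Red' AND color = 'Blue'  ->  bitwise AND (no rows match)
red_and_blue = red & blue
```

An OR of Red and Blue yields rows 3, 4, 5, 6, 8, 9; an AND yields no rows, since no row is both colors.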

 

Potential drawbacks of using this technique:

  1. high storage cost
  2. high query cost

 

Can compression be achieved?

Yes, we can compress consecutive runs of 0s and 1s. Think of compressing seven 0s into a single 0 with a special header saying that this 0 stands for seven 0s.

 

But this has its own disadvantage: for any bitwise operation, I first have to decompress the bitmap indexes in question and then do the AND/OR operation, which outweighs any savings in storage cost.

 

So, the need was for a compressed-bitmap technique with which bitwise operations can be done directly on the compressed form.

One example of such a technique is BBC (Byte-aligned Bitmap Code), used by Oracle.

 

This technique of bitmap compression is based on run-length encoding, with the bitmaps aligned along byte boundaries. An aligned byte is called a gap byte (GBYTE) if all of its bits store the same logical value (0 or 1); a byte mixing different bit values is called a non-gap byte, or map byte (MBYTE). Adjacent bytes of the same class are grouped together and controlled by a control byte (CBYTE), which records whether the bytes following it are MBYTEs or GBYTEs; for GBYTEs, it records the length of the run, and for MBYTEs, the type of MBYTE (is it an offset or not? An offset is an MBYTE with only a single bit differing from the other bits in that byte).

 

In the above explanation, a gap byte can be made of 0s or of 1s, so this is called 2-sided BBC.

Oracle uses 1-sided BBC, in which only 0s are candidates for GBYTEs, not 1s; all-1 bytes still make an MBYTE in Oracle. I'm not sure why Oracle uses the 1-sided technique. It could be due to integration complexities with other parts of the RDBMS engine.
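A much-simplified sketch of the 1-sided idea in Python. This illustrates byte-aligned run-length encoding only; it is not Oracle's actual on-disk BBC format, and the token layout here (in place of real control bytes) is invented for clarity:

```python
def encode_1sided(data: bytes):
    """Encode a bitmap (as bytes) into (kind, payload) tokens:
    ('gap', n)  -> n consecutive all-zero bytes (the 1-sided part:
                   only 0x00 bytes are treated as gap bytes),
    ('map', bs) -> a run of literal map bytes, stored verbatim.
    In real BBC, a control byte (CBYTE) would carry this information."""
    tokens, i = [], 0
    while i < len(data):
        j = i
        if data[i] == 0x00:                       # run of gap bytes
            while j < len(data) and data[j] == 0x00:
                j += 1
            tokens.append(('gap', j - i))
        else:                                     # run of literal map bytes
            while j < len(data) and data[j] != 0x00:
                j += 1
            tokens.append(('map', data[i:j]))
        i = j
    return tokens

def decode_1sided(tokens):
    """Expand the tokens back into the original bitmap bytes."""
    out = bytearray()
    for kind, payload in tokens:
        out += b'\x00' * payload if kind == 'gap' else payload
    return bytes(out)
```

Note that a byte of all 1s (0xFF) is still stored as a map byte, which is exactly what makes this 1-sided; a 2-sided scheme would also compress runs of 0xFF bytes.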

But going back to the question: why do we need byte alignment when I can compress without it?

  1. Computers are happier dealing with bytes than with bits.
  2. Bitwise operations can be done as efficiently (in fact, better) than in decompressed form.
  3. Control bytes (CBYTEs) are very handy in decoding the compressed bitmap.

 

Can this BITMAP compression and query response get better?

From what I have seen after researching on the net, the answer is yes.

 

The technique of handling bits byte-wise has been improved by handling them word-wise.

The concepts are similar to those in BBC; the major difference is aligning the bits on word boundaries rather than byte boundaries. Computers today are friendlier dealing with words (multiple bytes) than with single bytes.

 

 

I will write more on the word-aligned hybrid compression technique from an Oracle perspective in the coming days.

 

Thanks for reading!

 

 

Inputs:

Patent# 5363098 - by Gennady Antoshenkov (Oracle Corp.)