Friday, March 23, 2007

Db file sequential read while doing full table scan?

These days, we are working on data warehouse in which we have a master table which will have 1.5m (approx) rows inserted every half hour and we have few fast refresh materialized view based on it. These mviews have some aggregate functions on it, which makes it a bit complex.

To start the experiment, each mview refreshes used to take some 18-20 mins, which is totally against the business requirement. Then we tried to figure out on why the mview refresh is taking so much time, in spite of dropping all the bitmap indexes on the mview (generally b-map indexes are not good for inserts/updates).

The 10046 trace (level 12) highlighted that there were many “db file sequential reads” on mview because of optimizer using “I_SNAP$_mview” to fetch the rows from mview and merge the rows with that of master table to make the aggregated data for the mview.

Good part of the story is access to master table was quite fast because we used direct load (using sqlldr direct=y) to insert the data in it. When you use direct load to insert the data, oracle maintains the list of rowids added to table in a view called “SYS.ALL_SUMDELTA”. So while doing fast mview refresh, news rows inserted are picked directly from table using the rowids given from ALL_SUMDELTA view and not from Mview log, so this saves time.

Concerned part was still Oracle using I_SNAP$ index while fetching the data from mview and there were many “db file sequential read” waits and it was clearly visible that Oracle waited on sequential read the most. We figured it out that full table scan (which uses scattered read, and multi block read count) was very fast in comparison to index access by running simple test against table. Also the tables are dependent mviews are only for the day. End of the day the master table and mview’s data will be pushed to historical tables and master table and mviews will be empty post midnight.

I gathered the stats of mview and then re-ran the mview refresh, and traced the session, and this time optimizer didn’t use the index which was good news.

Now the challenge was to run the mview stats gathering job every half an hour or induce wrong stats to table/index to ensure mview refresh never uses index access or may be to lock the stats using DBMS_STATS.LOCK_TABLE_STATS.

But we found another solution by creating the mview with “USING NO INDEX” clause. This way “I_SNAP$” index is not created with “CREATE MATERIALIZED VIEW’ command. As per Oracle the “I_SNAP$” index is good for fast refresh but it proved to be reverse for us because our environment is different and the data changes is quite frequent.

Now, we ran the tests again, we loaded 48 slices of data (24 hrs x 2 times within hour) and the results were above expectations. We could load the data with max 3 mins per load of data.

This is not the end of story. In the trace we could see the mview refresh using “MERGE” command and using full table scan access to mview (which we wanted) and rowid range access to master table.

The explain plan looks like:

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE  SF_ENV_DATA_MV (cr=4598 pr=5376 pw=0 time=47493463 us)
 263052   VIEW  (cr=3703 pr=3488 pw=0 time=24390284 us)
 263052    HASH JOIN OUTER (cr=3703 pr=3488 pw=0 time=24127224 us)
 263052     VIEW  (cr=1800 pr=1790 pw=0 time=14731732 us)
 263052      SORT GROUP BY (cr=1800 pr=1790 pw=0 time=14205624 us)
 784862       VIEW  (cr=1800 pr=1790 pw=0 time=3953958 us)
 784862        NESTED LOOPS  (cr=1800 pr=1790 pw=0 time=3169093 us)
      1         VIEW  ALL_SUMDELTA (cr=9 pr=0 pw=0 time=468 us)
      1          FILTER  (cr=9 pr=0 pw=0 time=464 us)
      1           MERGE JOIN CARTESIAN (cr=9 pr=0 pw=0 time=459 us)
      1            NESTED LOOPS  (cr=6 pr=0 pw=0 time=99 us)
      1             TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=56 us)
      1              INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=23 us)(object id 36)
      1             TABLE ACCESS CLUSTER USER$ (cr=3 pr=0 pw=0 time=40 us)
      1              INDEX UNIQUE SCAN I_USER# (cr=1 pr=0 pw=0 time=7 us)(object id 11)
      1            BUFFER SORT (cr=3 pr=0 pw=0 time=354 us)
      1             INDEX RANGE SCAN I_SUMDELTA$ (cr=3 pr=0 pw=0 time=243 us)(object id 158)
      0           NESTED LOOPS  (cr=0 pr=0 pw=0 time=0 us)
      0            INDEX RANGE SCAN I_OBJAUTH1 (cr=0 pr=0 pw=0 time=0 us)(object id 103)
      0            FIXED TABLE FULL X$KZSRO (cr=0 pr=0 pw=0 time=0 us)
      0           FIXED TABLE FULL X$KZSPR (cr=0 pr=0 pw=0 time=0 us)
 784862         TABLE ACCESS BY ROWID RANGE SF_ENV_SLICE_DATA (cr=1791 pr=1790 pw=0 time=2383760 us)
 708905     MAT_VIEW ACCESS FULL SF_ENV_DATA_MV (cr=1903 pr=1698 pw=0 time=6387829 us)

You can see the access pattern as above.

Interesting twist in the story is when I saw the wait events in trace file.

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                      2253        0.74          7.73
  db file scattered read                        240        1.05          6.77
  log file switch completion                 6        0.98          3.16
  log file switch                                    8        0.98          2.47
  rdbms ipc reply                                 6        0.00          0.00
  log buffer space                                3        0.42          0.61

Again, even when we are doing full table scan, there are “db file sequential reads”?

To confirm I opened the raw trace file (before tkprof), and checked the obj# on sequential read wait event, it was the mview (SF_ENV_DATA_MV) !! and there were many. To further investigate I checked if there were any scattered reads to mview or not. I found there were scattered reads but there were many sequential reads also on which Oracle waited more than that of scattered read which did most of the data fetching.

After giving some thought, I realized that we created the mviews without storage clause, which means Oracle created the mview with default storage clause.

So assuming there are 17 blocks in an mview (container table) extent and Multi block read count is 16, Oracle will use scattered read mechanism (multiple blocks) to read the first 16 blocks and for the rest 1 it will use sequential read mechanism (one block), so you will find many sequential reads wait events sandwiched between scattered reads. To overcome this we created the mview with larger extent sizes and also multiple of MBCR (multi block read count).

Also, another cause of sequential read is chained or migrated rows, if your mview (or table) rows are migrated, the pointer to the next row is maintained in old (original) block, which will always be read by a single i/o call i.e. by sequential read.You can check the count of chained rows using DBA_TABLES.CHAIN_CNT after analysing the table . So to overcome this, we created the mview with genuine pctfree so when the merge runs (as a part of mview refresh) and updates few rows, the rows are not moved to a different block and hence avoiding sequential read.


  1. Mview creation with “USING NO INDEX” does not create “I_SNAP$” index which sometimes help in fast refresh when the data changes are quite frequent and you cannot afford to collect stats after every few mins.
  2. Create mview with storage clause suiting to your environment. Default extent sizes may not be always good.
  3. PCTFREE can be quite handy to avoid sequential reads and extra block read.

Saturday, March 3, 2007

Optimizer_mode – ALL_ROWS or FIRST_ROWS?

Out of all Oracle RDBMS modules, optimizer code is actually the most complicated code and different optimizer modes seem like jack while lifting your car in case of a puncture.

This paper focuses on how optimizer behaves differently when you have optimizer mode set to ALL_ROWS or FIRST_ROWS.

Possible values for optimizer_mode = choose/ all_rows/ first_rows/ first_rows[n]

By default, the value of optimizer_mode is CHOOSE which basically means ALL_ROWS (if statistics on underlying tables exist) else RULE (if there are no statistics on underlying tables). So it is very important to have statistics collected on your tables on regular intervals or else you are living in Stone Age.

FIRST_ROWS and ALL_ROWS are both cost based optimizer features. You may use them according to their requirement.


In simple terms it ensures best response time of first few rows (n rows).

This mode is good for interactive client-server environment where server serves first few rows and by the time user scroll down for more rows, it fetches other. So user feels that he has been served the data he requested, but in reality the request is still pending and query is still fetching the data in background.

Best example for this is toad, if you click on data tab, it instantaneously start showing you data and you feel toad is faster than sqlplus, but the fact is if you scroll down, you will see the query is still running.

Ok, let us simulate this on SQLPLUS

Create a table and index over it:

SQL> create table test as select * from all_objects;

Table created.

SQL> create index test_in on test(object_type);

Index created.

SQL> exec dbms_stats.gather_table_stats(‘SAC’,'TEST')

PL/SQL procedure successfully completed.

SQL> select count(*) from test;


SQL> select count(*) from test where object_type='JAVA CLASS';


You see out of almost 38k records, 15k are of JAVA class. And now if you select the rows having object_type=’JAVA_CLASS’, it should not use index as almost half of the rows are JAVA_CLASS. It will be foolish of optimizer to read the index first and then go to table.

Check out the Explain plans

SQL> set autotrace traceonly exp
SQL> select * from test where object_type='JAVA CLASS';

Execution Plan
Plan hash value: 1357081020

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1001 | 94094 | 10 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1001 | 94094 | 10 (0)| 00:00:01 |

As you see above, optimizer has not used Index we created on this table.

Now use FIRST_ROWS hint:

SQL> select /*+ FIRST_ROWS*/ * from test where object_type='JAVA CLASS';

Execution Plan
Plan hash value: 3548301374

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14662 | 1345K| 536 (1)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 14662 | 1345K| 536 (1)| 00:00:07 |
|* 2 | INDEX RANGE SCAN | TEST_IN | 14662 | | 43 (3)| 00:00:01 |

In this case, optimizer has used the index.

Q> Why?

Ans> Because you wanted to see first few rows quickly. So, following your instructions oracle delivered you first few rows quickly using index and later delivering the rest.

See the difference in cost, although the response time (partial) of second query was faster but resource consumption was high.

But that does not mean that this optimizer mode is bad. As I said this mode may be good for interactive client-server model. In most of OLTP systems, where users want to see data fast on their screen, this mode of optimizer is very handy.

Important facts about FIRST_ROWS

  1. It gives preference to Index scan Vs Full scan (even when index scan is not good).
  2. It prefers nested loop over hash joins because nested loop returns data as selected (& compared), but hash join hashes one first input in hash table which takes time.
  3. Cost of the query is not the only criteria for choosing the execution plan. It chooses plan which helps in fetching first rows fast.
  4. It may be a good option to use this in an OLTP environment where user wants to see data as early as possible.


In simple terms, it means better throughput

While FIRST_ROWS may be good in returning first few rows, ALL_ROWS ensures the optimum resource consumption and throughput of the query. In other words, ALL_ROWS is better to retrieve the last row first.

In above example while explaining FIRST_ROWS, you have already seen how efficient ALL_ROWS is.

Important facts about ALL_ROWS

  1. ALL_ROWS considers both index scan and full scan and based on their contribution to the overall query, it uses them. If Selectivity of a column is low, optimizer may use index to fetch the data (for example ‘where employee_code=7712’), but if selectivity of column is quite high ('where deptno=10'), optimizer may consider doing Full table scan. With ALL_ROWS, optimizer has more freedom to its job at its best.
  2. Good for OLAP system, where work happens in batches/procedures. (While some of the report may still use FIRST_ROWS depending upon the anxiety level of report reviewers)
  3. Likes hash joins over nested loop for larger data sets.


Cost based optimizer gives you flexibility to choose response time or throughput. So use them based on your business requirement.

Friday, March 2, 2007

Nested loops, Hash join and Sort Merge joins – difference?

Nested loop (loop over loop)

In this algorithm, an outer loop is formed which consists of few entries and then for each entry, and inner loop is processed.


Select tab1.*, tab2.* from tabl, tab2 where tabl.col1=tab2.col2;

It is processed like:

For i in (select * from tab1) loop
For j in (select * from tab2 where col2=i.col1) loop
Display results;
End loop;
End loop;

The Steps involved in doing nested loop are:

a) Identify outer (driving) table

b) Assign inner (driven) table to outer table.

c) For every row of outer table, access the rows of inner table.

In execution plan it is seen like this:


When optimizer uses nested loops?

Optimizer uses nested loop when we are joining tables containing small number of rows with an efficient driving condition. It is important to have an index on column of inner join table as this table is probed every time for a new value from outer table.

Optimizer may not use nested loop in case:

  1. No of rows of both the table is quite high
  2. Inner query always results in same set of records
  3. The access path of inner table is independent of data coming from outer table.

Note: You will see more use of nested loop when using FIRST_ROWS optimizer mode as it works on model of showing instantaneous results to user as they are fetched. There is no need for selecting caching any data before it is returned to user. In case of hash join it is needed and is explained below.

Hash join

Hash joins are used when the joining large tables. The optimizer uses smaller of the 2 tables to build a hash table in memory and the scans the large tables and compares the hash value (of rows from large table) with this hash table to find the joined rows.

The algorithm of hash join is divided in two parts

  1. Build a in-memory hash table on smaller of the two tables.
  2. Probe this hash table with hash value for each row second table

In simpler terms it works like

Build phase

For each row RW1 in small (left/build) table loop
Calculate hash value on RW1 join key
Insert RW1 in appropriate hash bucket.
End loop;

Probe Phase

For each row RW2 in big (right/probe) table loop
Calculate the hash value on RW2 join key
For each row RW1 in hash table loop
If RW1 joins with RW2
Return RW1, RW2
End loop;
End loop;

When optimizer uses hash join?

Optimizer uses has join while joining big tables or big fraction of small tables.

Unlike nested loop, the output of hash join result is not instantaneous as hash joining is blocked on building up hash table.

Note: You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.

Sort merge join

Sort merge join is used to join two independent data sources. They perform better than nested loop when the volume of data is big in tables but not as good as hash joins in general.

They perform better than hash join when the join condition columns are already sorted or there is no sorting required.

The full operation is done in two parts:

  • Sort join operation

get first row RW1 from input1
get first row RW2 from input2.

  • Merge join operation

while not at the end of either input loop
if RW1 joins with RW2
get next row R2 from input 2
return (RW1, RW2)
else if RW1 < style=""> get next row RW1 from input 1
get next row RW2 from input 2
end loop

Note: If the data is already sorted, first step is avoided.

Important point to understand is, unlike nested loop where driven (inner) table is read as many number of times as the input from outer table, in sort merge join each of the tables involved are accessed at most once. So they prove to be better than nested loop when the data set is large.

When optimizer uses Sort merge join?

a) When the join condition is an inequality condition (like <, <=, >=). This is because hash join cannot be used for inequality conditions and if the data set is large, nested loop is definitely not an option.

b) If sorting is anyways required due to some other attribute (other than join) like “order by”, optimizer prefers sort merge join over hash join as it is cheaper.

Note: Sort merge join can be seen with both ALL_ROWS and FIRST_ROWS optimizer hint because it works on a model of first sorting both the data sources and then start returning the results. So if the data set is large and you have FIRST_ROWS as optimizer goal, optimizer may prefer sort merge join over nested loop because of large data. And if you have ALL_ROWS as optimizer goal and if any inequality condition is used the SQL, optimizer may use sort-merge join over hash join

Tuesday, February 27, 2007

10g new background processes

With 10g many new background processes were introduced.
This note highlights those.


Memory Manager (MMAN) coordinates the sizing of different memory components within SGA. MMAN keeps a track of sizes of components and allocates/de-allocates memory based on their usage. This process is used only when you enable Automatic Shared Memory Management.


Process responsible for writing flashback logs which stores pre-image of data-blocks.

These logs can be used to flash back database to past point in time. It proves very handy specially in situation of logical data corruptions like an accidental drop/truncate of a table or commit of an unwanted transaction.


Change Tracking Writer (CTWR) Process introduced in 10g to track changes to block to assist fast RMAN incremental backups.


The Memory Monitor Light (MMNL) process introduced in 10g is responsible for writing out statistics buffer to disk as needed. It works with Automatic Workload Repository (AWR)


The memory monitor (MMON) process was introduced in 10g is responsible for gathering statistics for Automatic Workload Repository. This helps Oracle doing automatic problem detection and self tuning.


MMON background slave (m000) processes.


RBAL (Re-balancer) is the ASM related process that performs rebalancing of disk resources controlled by ASM.


These processes are managed by the RBAL process and are used to do the actual rebalancing of ASM controlled disk resources. The number of ARBx processes invoked is directly influenced by the asm_power_limit parameter.


The ASMB process is used to provide information to and from the Cluster Synchronization Services used by ASM to manage the disk resources. It is also used to update statistics and provide a heartbeat mechanism.

For more detailed info check following note on metalink:

Server Architecture on UNIX and NT

Thursday, February 15, 2007

Size your Undo tablespace

What should be the size of UNDO tablespace? Generally this question is vaguely answered saying that sizing comes with experience (of DBA) or looking at load on server or sometimes even by the number of ORA-1555 or out of space errors.

This paper is to help DBA’s in calculating the size of UNDO tablespace by using a simple formula.

While designing an application, generally it is tough to know about the number of transactions and subsequently number of rows changed per second.
So I suggest having a “big undo tablespace” to start with and based on load, after doing some calculations and resize your UNDO tablespace.

In my case one of the applications was going to production (live), and I had no idea that how many transactions will happen against this database. All what I was told that there will be optimum (transactional) activity on this database. The word “optimum” itself is vague.

So I started with UNDO tablespace with size of 2GB and datafiles with autoextend “on” .

In production, you must be very careful in using this (autoextend on) as the space may grow to inifinity very fast. So my advice is either dont use this option, or use with "maxsize" or continuously monitor space (which is tough).

I month later, I noticed the activity from V$undostat.

Here is the step by step approach:

Step 1: Longest running query.

SQL> select max(maxquerylen) from v$undostat;


This gives you ideal value for UNDO_RETENTION. To be on the safer size you should add few more seconds to get the right value. So in my case, the size of undo retention should be say 2000 secs.

Step 2: Size of UNDO tablespace.

Size of UNDO needed = UNDO_RETENTION x [UNDO block Generation per sec x DB_BLOCK_SIZE] + Overhead(30xDB_BLOCK_SIZE)

Out of these we know UNDO_RETENTION and DB_BLOCK_SIZE

All we need is to find out “UNDO Blocks per second”

Which can be easily fetched from v$undostat

SQL> SELECT (SUM(undoblks))/ SUM ((end_time - begin_time) * 24*60*60) "UPS"
2 FROM v$undostat;


V$undostat stores data for every 10 mins and begin/end times are start/end time of those intervals. We multiplied it with 24*60*60 because the difference between two dates will be in days and to get to seconds, we need it to multiply with 24hrs*60mins*60secs

So now we have all the values needed.

Undo size needed = [8.12 x 2000 x 8192] + [30 x 8192] = 133283840 bytes = 127.11 MB

Wednesday, February 7, 2007

Identify and fix table fragmentation in Oracle 10g - how?

Table fragmentation – when?

If a table is only subject to inserts, there will not be any fragmentation.
Fragmentation comes with when we update/delete data in table.
The space which gets freed up during non-insert DML operations is not immediately re-used (or sometimes, may not get reuse ever at all). This leaves behind holes in table which results in table fragmentation.

To understand it more clearly, we need to be clear on how oracle manages space for tables.

“High water mark” of table actually defines the border line between (ever) used and unused (never) space. While performing full table scan, Oracle will always read the data up to HWM. And if there is lot of free space with-in HWM, that is read too, and hence degrading the performance of FTS.

Now lets see how to identify HWM, unused (never used) space and free space (used but deleted/updated) and then take a call whether the concerned table is candidate for a reorganization or not.

SQL> create table test as select * from dba_tables; -- Create a table

Table created.

SQL> analyze table test compute statistics; -- Analyze it

Table analyzed.

SQL> select blocks "Ever Used", empty_blocks "Never Used", num_rows "Total rows"
2 from user_tables where table_name='TEST'; -- The number of blocks used/free

Ever Used Never Used Total rows
---------- ---------- ----------
49 6 1680

SQL> delete from test where owner='SYS'; --- Im deleting almost half the number of rows.

764 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze table test compute statistics; -- Analyze it again

Table analyzed.

SQL> select blocks "Ever Used", empty_blocks "Never Used", num_rows "Total rows"

2 from user_tables where table_name='TEST'; -- No difference in blocks usage

Ever Used Never Used Total rows
---------- ---------- ----------
49 6 916

PL/SQL procedure successfully completed.

Even though you deleted almost half the rows, the above shows that table HWM is up to 49 blocks, and to perform any FTS, Oracle will go up to 49 blocks to search the data. If your application is so-written that there are many FTS on this table, you may consider, reorganizing this table.

Reasons to reorganization

a) Slower response time (from that table)
b) High number of chained (actually migrated) rows.
c) Table has grown many folds and the old space is not getting reused.

Note: Index based queries may not get that much benefited by reorg as compared to queries which does Full table scan.

How to reorganize?

Before Oracle 10g, there were mainly 2 ways to do the reorganization of the table
a) Export, drop, and import.
b) Alter table move (to another tablespace, or same tablespace).

Oracle 10g provides us a new way of reorganizing the data.

Shrink command: This command is only applicable for tables which are tablespace with auto segment space management.

Before using this command, you should have row movement enabled.

SQL> alter table test enable row movement;

Table altered.

There are 2 ways of using this command.

1. Break in two parts: In first part rearrange rows and in second part reset the HWM.

Part 1: Rearrange (All DML's can happen during this time)

SQL> alter table test shrink space compact;

Table altered.

Part 2: Reset HWM (No DML can happen. but this is fairly quick, infact goes unnoticed.)

SQL> alter table sa shrink space;

Table altered.

2. Do it in one go:

SQL> alter table sa shrink space; (Both rearrange and restting HWM happens in one statement)

Table altered.

Few advantages over the conventional methods

1. Unlike "alter table move ..", indexes are not in UNUSABLE state. After shrink command, indexes are updated also.

2. Its an online operation, So you dont need downtime to do this reorg.

3. It doesnot require any extra space for the process to complete.


Its a new 10g feature to shrink (reorg) the tables (almost) online which can be used with automatic segment space management.

Wednesday, January 31, 2007

Running shell script from Oracle

It’s simple!!

Especially with 10g, it is even more simpler.

Here is an example:

Step 1: Create a shell script and grant execute priv to oracle user. I created a simple one:

bash-3.00$ cat /tmp/
echo "Test succeeded `date`" >> /tmp/test.log

Note: Do not forget to start your shell script with the shell you are using.
>> #!/usr/bin/ksh should be present in your shell script otherwise, the job will fail with ORA-27369 job of type EXECUTABLE failed with

bash-3.00$ chmod 755 /tmp/

Step 2: Create the job:
Remember user doing the following should have “CREATE JOB” privilege.

3 job_name => 'TEST_SHELL',
4 job_type => 'EXECUTABLE',
5 job_action => '/tmp/',
6 start_date => sysdate,
7 repeat_interval => 'FREQ=MINUTELY; INTERVAL=1', /* every one hour */
8 enabled => TRUE,
9 comments => 'Testing shell script from Oracle');
10 END;
11 /

PL/SQL procedure successfully completed.

Step 3: Check if the job is running:

bash-3.00$ cat /tmp/test.log
Test succeeded Wed Jan 31 01:02:23 PST 2007
Test succeeded Wed Jan 31 01:03:23 PST 2007
Test succeeded Wed Jan 31 01:04:23 PST 2007

Tuesday, January 30, 2007

Index skip Scan – how?

With Oracle 9i, CBO is equipped with many more features, one of them is “Index skip scan” .This means even if you have a composite index on more than one column and you use the non-prefix column alone in your SQL, it may still use index.

I said “may” because CBO will calculate the cost of using the index and if it is more than that of full table scan, then it may not use index.

Index skip scan works differently from a normal index (range) scan.
A normal range scan works from top to bottom first and then move horizontal.
But a Skip scan includes several range scans in it. Since the query lacks the leading column it will rewrite the query into smaller queries and each doing a range scan.

SQL> create table test (a number, b number, c number);

Table created.

SQL> create index test_i on test(a,b);

Index created.

SQL> begin
2 for i in 1 .. 100000
3 loop
4 insert into test values(mod(i, 5), i, 100);
5 end loop;
6 commit;
7 end;
8 /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname => 'SAC', tabname => 'test', cascade => true);

PL/SQL procedure successfully completed.

SQL> set autotrace trace exp
SQL> select * from test where b=95267;

Execution Plan
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=22 Card=1 Bytes=10)

1 0

2 1
INDEX (SKIP SCAN) OF 'TEST_I' (INDEX) (Cost=21 Card=1)

I above example, “select * from test where b=95267” was broken down to several small range scan queries. It was effectively equivalent to following

Select * from test where a=0 and b=95267
Select * from test where a=1 and b=95267
Select * from test where a=2 and b=95267
Select * from test where a=3 and b=95267
Select * from test where a=4 and b=95267;

In concrete, saying that skip scan is not as efficient as normal “single range scan” is correct. But yet saves some disk space and overhead of maintaining another index.

Wednesday, January 24, 2007

External Tables - 10g

Until 9i external tables could only be used for read purposes.
But 10g allows you to populate external tables.

here is how it does:

Step 1: - Create a directory

SQL> create directory my_dir as '/usr/test'; --- Make sure Oracle OS user has
privilege to write on this directory.

Step 2: - Grant permission to user SCOTT

SQL> grant read,write on directory my_dir to scott;

Step 3: - Create the External Table: -

SQL> CREATE TABLE scott.external_emp_dept
LOCATION ('emp_dept.exp')
reject limit unlimited
select e.ename,d.dname from emp e, dept d where

SQL> select * from scott.external_emp_dept;

---------- --------------

---------- --------------

And you will find a file generated i.e. 'emp_dept.exp' in /usr/test directory.

Now you can take this file to the target database and create an external table
and associate the file with that table.

Step 4: - to be executed on the source database
SQL> set long 2000
SQL> Select dbms_metadata.get_ddl('TABLE','EXTERNAL_EMP_DEPT') from dual;

The above command will give you the DDL command to create the table at target

Similar note is also available on metalink 249321.1 which was also written by me when I was with Oracle Support.

Friday, January 12, 2007


They 2 considered to be the most important parameter for shared pool tuning, but I guess most of us generally don’t use them or sometimes use them incorrectly.

The idea to put them here to understand “what they do?”, “when to use them?”, “how to use them?” and finally “see the impact”.


In most of the environments, there are many SQL’s which are re-fired many a times within a session, and every time they are issued, the session searches the shared pool for the parsed state, if it doesn’t get the parsed version, it will “hard parse” it, and if it exists in shared pool, it will still do a “soft parse”.

As we know “hard parse” is a costly operation, even a “soft parse” requires library cache latch and CPU overhead, which if aggregated is a significant number.

This parameter if set to a non-zero value (default is 50), improves the “soft parse” performance by doing a softer soft parse.


If enabled, oracle maintains a local session cache which stores recently closed cursors of a session.

To avoid this space getting misused or overused, oracle maintains the cursors for which there have been 3 parsed calls in the past, so all the SQL’s issued by a session are not here. Remember each cursor if pinned here, is not freeable and hence you may require more shared pool area.

A normal cursor in shared pool is sum of 2 components:
a) Heap 0 – size 1KB
b) SQL Area – size multiple of 4k

When we use session_cached_cursors only first component of cursor which is HEAP 0 is pinned in local session cache and if there is a call for re-parse for a statement, Oracle first checks the existence of the cursor in local cache and if found, it gets the address of the rest of the cursor which is in SQL Area (assuming if it is not aged out), so hereby saving CPU overhead and library cache latch contention.

How much it is getting used ?

SQL> select max(value) from v$sesstat
2 where STATISTIC# in (select STATISTIC# from v$statname where name='session cursor cache count');


This shows maximum value for session_cached_cursors in past. If this value= “session_cached_cursors” (init.ora parameter), you should consider increasing it.

If you want to see how is your session cache doing?

SQL> select cache/tot*100 "Session cursor cache%"
2 from
3 (select value tot from v$sysstat where name='parse count (total)'),
4 ( select value cache from sys.v_$sysstat where name = 'session cursor cache hits' );

Session cursor cache%

A value near 100 is considered very good. But you may still consider increasing this parameter if MAX(VALUE) in query one shows you equal number of cached cursor which you have set.

Conclusion: In an OLTP application, where the same set of SQL is issued number of times, one must configure this parameter more than its default value (50).
Also increasing this parameter will mean extra memory required for shared pool, so you must increase your shared pool when you use this parameter.


SQL can be aged out of shared pool in 2 cases:
a) When the cursor is closed: When the cursors are closed by application, they can still be in shared pool, until there comes a request for a new cursor and oracle needs to use LRU algorithm. SESSION_CACHED_CURSORS helps you in pinning (partial because it only pins HEAP 0) when the cursors are closed.

b) When the cursor is open: Oracle requires parsed state of SQL at PARSE and EXECUTE phase. If oracle parses (soft or hard) a statement, there is a likely hood that Oracle may age out your SQL out of shared pool after PARSE state if it requires to accommodate a new SQL coming its way. So in the EXECUTE state, there is a possibility that parsed information is lost and oracle parse it again.

CURSOR_SPACE_FOR_TIME if set to TRUE, ensures that SQL is not aged out before the cursor is closed, so in EXECUTE phase, you will have the PARSE information.

But this is generally a rare case and happens in a very highly active environment because to accommodate a new SQL, Oracle first check the free space and if it doesn’t get, it checks the closed cursors and see if any cursor can be aged out and when there is no space which can be reclaimed, Oracle comes to open cursors which are not EXECUTED.
This generally happens when the space of shared pool is too less.

I don’t suggest setting this parameter to TRUE in most of the cases.

There are some other serious tradeoffs also.
When a cursor is pinned, it cant be aged out and related memory cannot be freed for any new SQL and if you set this parameter to TRUE, you are telling Oracle to keep all the open cursors pinned and not freeable.
If you use this parameter, you are pinning the whole cursor not just the HEAP 0 which is 1k, you are pinning HEAP 0 (1k) + SQL Area (multiple of 4k) which makes shared pool life tough because of space issues.


As I said, I don’t suggest setting this parameter to TRUE in most of the cases. An alternative to set this parameter is to increase shared pool size or/and check your code on how many numbers of cursors you are opening/closing. That will be a better approach. Setting this parameter is like taking paracetamol without knowing the cause of fever.

Saturday, January 6, 2007

Should I rebuild or coalesce my indexes ?

Should I rebuild or coalesce my indexes ?

My last 2 notes on this blogs were specific to rebuilding (when and to which) indexes.

There is one more critical aspect which I wanted to discuss is the cost/effort related to rebuilding indexes.

Rebuilding an index is quite a costly operation and you must evaluate the benefit Vs effort before rebuilding an index.

Rebuilding (online) an index requires additional resources like space, cpu usage, time.

Here is one more option, which is less used or probably less popular “coalesce”.

Rebuild Vs Coalesce


  • Can move an index to a different tablespace

  • Resource consuming process

  • Takes more time

  • Creates a new tree

  • Shortens the height of an index if it was increased due to DML activities

  • Rebuilding can affect future DML’s because index becomes compact and for future DML’s index has to be extend dynamically.


  • Cannot move an index to a different tablespace

  • Comparatively less resource consuming

  • Takes relatively less time

  • Only merge the adjacent free space between leaf blocks within a branch

  • Doesn’t shorten height of index

  • Since coalesce doesn’t effect the total size and only frees up the unused space, it doesn’t affect future DML’s

Image Source: Oracle doc
Coalescing the index, frees up space of adjacent leaf blocks within a branch block. This way the number of blocks or extents which an index is using will not change but there will be more number of free blocks which can be used for future inserts or updates.

In an OLTP environment, where data is highly volatile, coalesce is better because it doesn’t shrink the index and the free space remains with the index segment.

Thursday, January 4, 2007

How to find indexes which are candidate for rebuild?

I'm writing this in response to Harvinder's comment.

I know it’s a very arguable question, but let me put my words and then you can comment.

I follow different approaches to find out indexes for rebuild

- Find out indexes having height(blevel+1) > 4 i.e. Indexes having BLEVEL > 3
SQL> select owner,index_name,table_name,blevel from dba_indexes where BLEVEL>3

- Analyze indexes and find out ratio of (DEL_LF_ROWS/LF_ROWS*100) is > 20
First "Analyze the index with validate structure option" and then,

SQL> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as ratio FROM INDEX_STATS;

But (a big but), the reason to rebuild should be because of poor performance of your queries using indexes. You should/must not rebuild indexes if you find both the above reason true for index if it is not coupled with poor SQL performance.

See this example:

SQL> analyze index TEST_INDX validate structure; -- First analyze the suspect index

Index analyzed.

SQL> SELECT name,height,lf_rows,lf_blks,del_lf_rows FROM INDEX_STATS;

------------ ---------- ---------- ----------- -------------
TEST_INDX 8 938752 29575 73342

You can see height of the index is 8 and also high number of DEL_LF_ROWS

SQL> set autotrace on
SQL> set timing on
SQL> select count(*) from TEST_TABLE where TEST_COL like 'http://www.hots%';

Elapsed: 00:00:27.25

Execution Plan
Plan hash value: 870163320

Id Operation Name Rows Bytes Cost (%CPU) Time

0 SELECT STATEMENT 1 117 10 (0) 00:00:01


*2 INDEX RANGE SCAN TEST_INDX 115 13455 10 (0) 00:00:01

1 recursive calls
0 db block gets
764 consistent gets
757 physical reads

0 redo size
516 bytes sent via SQL*Net to client
468 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Now you rebuild the indexes

SQL> alter index TEST_INDX rebuild;

Index altered.

SQL> select count(*) from TEST_TABLE where TEST_COL like 'http://www.hots%';


Elapsed: 00:00:06.18

Execution Plan
Plan hash value: 870163320 - See here although it is using the same plan but still it is faster

Id Operation Name Rows Bytes Cost (%CPU) Time


0 SELECT STATEMENT 1 117 6 (0) 00:00:01


* 2 INDEX RANGE SCAN TEST_INDX 115 13455 6 (0) 00:00:01

15 recursive calls
0 db block gets
592 consistent gets
588 physical reads
0 redo size
516 bytes sent via SQL*Net to client
468 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)

SQL> SELECT name,height,lf_rows,lf_blks,del_lf_rows,distinct_keys,used_space FROM INDEX_STATS;

------------------------------ ---------- ---------- ---------- -----------
TEST_INDX 4 865410 15434 0

This clearly indicates rebuild helped my query performance. The height of index is reduced to 4 and DEL_LF_ROWS is 0

Now coming to second of part of Harvinder's comment.

Possible ways of Rebuilding.

- Online/Offline.

ONLINE Rebuild (8i onwards)

SQL> Alter index rebuild online;

This allows parallel DML to go on while Index is getting rebuild. Remember, online index requires more space, as it creates a new index before dropping the old one.

Index Rebuild is primarily a 3 step process

Prepare phase: Oracle locks the table for a fraction of second (actually not felt) to build index structure and populate data-dictionary.

Build phase: Most of the work is done in this phase. Oracle engine populates the index using the table and allows parallel DML's, parallel DML's uses a temporary journal table (b tree index like structure) to host the entries while the new index is getting populated.

Merge phase: Now the final stage, Oracle merges the new index with the journal table and drops the old index. Even during the merge phase, any changes to the table are recorded in the journal table and they get merged towards end of this phase.

9i onwards, online index rebuild feature includes Reverse Key, Function Based and Key Compressed indexes.

Offline Index rebuild.

SQL> Alter index rebuild;

This is conventional rebuild which was used(still available) before 8i. In this rebuild process, oracle drops the old index and creates a new one. In this process, no extra space is required, but parallel dml's are not supported.

Tuesday, January 2, 2007

When should I rebuild my indexes?

Need is necessary for any change. I hope all agree to this. So why many DBA’s (not all) rebuilds indexes on periodical basis without knowing the impact of it?

Let’s revisit the facts stated by many Oracle experts:

- B Tree indexes can never be unbalanced
- The distance of any leaf block (which contains the index value) from the root block is always same.
- B Tree performance is good for both small and large tables and does not degrade with the growth of table

When will rebuild help?

• When the data in index is sparse (lots of holes in index, due to deletes or updates) and your query is usually range based.

If your SQL’s use “=” predicate, rebuilding of index may not help you at all unless the DML’s (updates/deletes) on the table are cause of increasing the height of index. Due to heavy updates/deletes on a table, the space in the index can be left unused (unless it is reused) due to which block splits and if the splitting goes beyond the level of accomodation, the height of index increases by 1.
In simpler terms, unless you reduce the height (BLEVEL) of index, the time taken to reach the leaf block from root block will not change.

As explained above, the height of index can be decreased by rebuild of an index(only if height was increased to updates/deletes).

Only in case of range scan, rebuild (consider coalesce option also) may help.

Select * from emp where empno between 1000 and 2000;

In above case there could be lot of deleted/updated data which will also be read while reading between 1000 to 2000. Rebuilding will reclaim any unused space and hence the select could be faster.

• Index fast full scan/ Index full scan. (rare case)
If your SQL’s are designed in a way that mostly data is selected from index (it doesn’t goes to table). In this case the whole index is read for data not for redirecting to table.

Select count(empno) from emp; -- you have an index in empno
Select empno from emp -- you have an index in empno

Both the above SELECTs will fetch the data from Index only. Since the whole index is getting read, the space used by deleted rows is also read at the time of SELECT operation

Happy New Year - 2007

I wish everyone a very happy, prosperous, peaceful, joyful new year 2007.

With this, I want to thank you for the superb response to this blob.
I wanted to mention this earlier, but I was waiting for applauding mails to pause, so I could publish them in one go.

Great, Its really great, Thanks.

Ajay Kumar
Nice work I must say.... Well Done and BIG WOW

Mushtaque Syed

Very nice and interesting, knowledgeable,
I appreciate

Hi Sachin,

This certainly is a great blog and an excellent idea initiated. Just wanted to appreciate you of your hard work.


vinodh kumar
Hi Sachin,

What u have tried is a valuable thing.Thanks for taking time to prepare this blog.
I appreciate if u share ur knowledge more in tuning the SQL statements.But still what u have done is an excellent job.Keep going.

Vinodhkumar V.