Bad SQL design
Important point
If the statement is designed poorly, nothing much can be done by optimizer or indexes
Few known thumb rules
–Avoid Cartesian joins
–Use UNION ALL instead of UNION – if possible
–Use EXIST clause instead of IN - (Wherever appropiate)
–Use order by when you really require it – Its very costly
–When joining 2 views that themselves select from other views, check that the 2 views that you are using do not join the same tables!
–Avoid NOT in or NOT = on indexed columns. They prevent the optimizer from using indexes. Use where amount > 0 instead of where amount != 0
- Avoid writing whereis not null. nulls can prevent the optimizer from using an index
- Avoid calculations on indexed columns. Write WHERE amount > 26000/3 instead of WHERE approved_amt/3 > 26000
- The query below will return any record where bmm_code = cORE, Core, CORE, COre, etc.
select appl_appl_id where upper(bmm_code) LIKE 'CORE%'
But this query can be very inefficient as it results in a full table scan. It cannot make use of the index on bmm_code.
Instead, write it like this:
select appl_appl_id from nci_appl_elements_t where (bmm_code like 'C%' or bmm_code like 'c%') and upper(bmm_code) LIKE 'CORE%'
This results in Index Range Scan.
You can also make this more efficient by using 2 characters instead of just one:
where ((bmm_code like 'CO%' or bmm_code like 'Co%' or bmm_code like 'cO%' or bmm_code like 'co%') and upper(bmm_code) LIKE 'CORE%')
Inviting Experts
Friends, feel free to correct me. I will appreciate if you can add your comments also.
If the statement is designed poorly, nothing much can be done by optimizer or indexes
Few known thumb rules
–Avoid Cartesian joins
–Use UNION ALL instead of UNION – if possible
–Use EXIST clause instead of IN - (Wherever appropiate)
–Use order by when you really require it – Its very costly
–When joining 2 views that themselves select from other views, check that the 2 views that you are using do not join the same tables!
–Avoid NOT in or NOT = on indexed columns. They prevent the optimizer from using indexes. Use where amount > 0 instead of where amount != 0
- Avoid writing where
- Avoid calculations on indexed columns. Write WHERE amount > 26000/3 instead of WHERE approved_amt/3 > 26000
- The query below will return any record where bmm_code = cORE, Core, CORE, COre, etc.
select appl_appl_id where upper(bmm_code) LIKE 'CORE%'
But this query can be very inefficient as it results in a full table scan. It cannot make use of the index on bmm_code.
Instead, write it like this:
select appl_appl_id from nci_appl_elements_t where (bmm_code like 'C%' or bmm_code like 'c%') and upper(bmm_code) LIKE 'CORE%'
This results in Index Range Scan.
You can also make this more efficient by using 2 characters instead of just one:
where ((bmm_code like 'CO%' or bmm_code like 'Co%' or bmm_code like 'cO%' or bmm_code like 'co%') and upper(bmm_code) LIKE 'CORE%')
Inviting Experts
Friends, feel free to correct me. I will appreciate if you can add your comments also.
Consider declaring NOT NULL columns
Consider declaring NOT NULL columns
People sometimes do not bother to define columns as NOT NULL in the data dictionary, even though these columns should not contain nulls, and indeed never do contain nulls because the application ensures that a value is always supplied. You may think that this is a matter of indifference, but it is not. The optimizer sometimes needs to know that a column is not nullable, and without that knowledge it is constrained to choose a less than optimal execution plan.
1. An index on a nullable column cannot be used to drive access to a table unless the query contains one or more predicates against that column that exclude null values.
Of course, it is not normally desirable to use an index based access path unless the query contains such predicates, but there are important exceptions.
For example, if a full table scan would otherwise be required against the table and the query can be satisfied by a fast full scan (scan for which table data need not be read) against the index, then the latter plan will normally prove more efficient.
Test-case for the above reasoning
SQL> create index color_indx on automobile(color);
Index created.
SQL> select distinct color,count(*) from automobile group by color;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=
54392)
1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
2 1 TABLE ACCESS (FULL) OF 'AUTOMOBILE' (TABLE) (Cost=3 Card
=1046 Bytes=54392)
SQL> alter table automobile modify color not null;
Table altered.
SQL> select distinct color,count(*) from automobile group by color;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=
54392)
1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
2 1 INDEX (FAST FULL SCAN) OF 'COLOR_INDX' (INDEX) (Cost=3 C
ard=1046 Bytes=54392)
2. If you are calling a sub-query in a parent query using the NOT IN predicate, the indexing on column (in where clause of parent query) will not be used.
Because as per optimizer, results of parent query needs to be displayed only when there is no equi-matching from sub-query, And if the sub-query can potentially contain NULL value (UNKNOWN, incomparable), parent query will have no value to compare with NULL value, so it will not use the INDEX.
Test-case for the above Reasoning
SQL> create index sal_indx on emp(sal);
Index created.
SQL> create index ename_indx on emp(ename);
Index created.
SQL> select * from emp where sal not in (select sal from emp where ename='JONES');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=17 Card=13 Bytes=4
81)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Byt
es=518) –> you can see a full table scan even when index exist on SAL
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 C
ard=1 Bytes=10)
4 3 INDEX (RANGE SCAN) OF 'ENAME_INDX' (INDEX) (Cost=1 Car
d=1)
SQL> alter table emp modify sal not null;
Table altered.
SQL> select * from emp where sal not in (select sal from emp where ename='JONES');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=12 Bytes=56
4)
1 0 MERGE JOIN (ANTI) (Cost=5 Card=12 Bytes=564)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 C
ard=14 Bytes=518)
3 2 INDEX (FULL SCAN) OF 'SAL_INDX' (INDEX) (Cost=1 Card=1
4) -> Here you go, your index getting used now
4 1 SORT (UNIQUE) (Cost=3 Card=1 Bytes=10)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2
Card=1 Bytes=10)
6 5 INDEX (RANGE SCAN) OF 'ENAME_INDX' (INDEX) (Cost=1 C
ard=1)
The above article has been an inspiration after reading an article on ixora . The article was missing some of the testcases, so I thought of adding few for newbiews to relate to it.
People sometimes do not bother to define columns as NOT NULL in the data dictionary, even though these columns should not contain nulls, and indeed never do contain nulls because the application ensures that a value is always supplied. You may think that this is a matter of indifference, but it is not. The optimizer sometimes needs to know that a column is not nullable, and without that knowledge it is constrained to choose a less than optimal execution plan.
1. An index on a nullable column cannot be used to drive access to a table unless the query contains one or more predicates against that column that exclude null values.
Of course, it is not normally desirable to use an index based access path unless the query contains such predicates, but there are important exceptions.
For example, if a full table scan would otherwise be required against the table and the query can be satisfied by a fast full scan (scan for which table data need not be read) against the index, then the latter plan will normally prove more efficient.
Test-case for the above reasoning
SQL> create index color_indx on automobile(color);
Index created.
SQL> select distinct color,count(*) from automobile group by color;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=
54392)
1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
2 1 TABLE ACCESS (FULL) OF 'AUTOMOBILE' (TABLE) (Cost=3 Card
=1046 Bytes=54392)
SQL> alter table automobile modify color not null;
Table altered.
SQL> select distinct color,count(*) from automobile group by color;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=
54392)
1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
2 1 INDEX (FAST FULL SCAN) OF 'COLOR_INDX' (INDEX) (Cost=3 C
ard=1046 Bytes=54392)
2. If you are calling a sub-query in a parent query using the NOT IN predicate, the indexing on column (in where clause of parent query) will not be used.
Because as per optimizer, results of parent query needs to be displayed only when there is no equi-matching from sub-query, And if the sub-query can potentially contain NULL value (UNKNOWN, incomparable), parent query will have no value to compare with NULL value, so it will not use the INDEX.
Test-case for the above Reasoning
SQL> create index sal_indx on emp(sal);
Index created.
SQL> create index ename_indx on emp(ename);
Index created.
SQL> select * from emp where sal not in (select sal from emp where ename='JONES');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=17 Card=13 Bytes=4
81)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Byt
es=518) –> you can see a full table scan even when index exist on SAL
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 C
ard=1 Bytes=10)
4 3 INDEX (RANGE SCAN) OF 'ENAME_INDX' (INDEX) (Cost=1 Car
d=1)
SQL> alter table emp modify sal not null;
Table altered.
SQL> select * from emp where sal not in (select sal from emp where ename='JONES');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=12 Bytes=56
4)
1 0 MERGE JOIN (ANTI) (Cost=5 Card=12 Bytes=564)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 C
ard=14 Bytes=518)
3 2 INDEX (FULL SCAN) OF 'SAL_INDX' (INDEX) (Cost=1 Card=1
4) -> Here you go, your index getting used now
4 1 SORT (UNIQUE) (Cost=3 Card=1 Bytes=10)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2
Card=1 Bytes=10)
6 5 INDEX (RANGE SCAN) OF 'ENAME_INDX' (INDEX) (Cost=1 C
ard=1)
The above article has been an inspiration after reading an article on ixora . The article was missing some of the testcases, so I thought of adding few for newbiews to relate to it.
IN Vs Exist in SQL
IN Vs EXISTS
The two are processed quite differently.
IN Clause
Select * from T1 where x in ( select y from T2 )
is typically processed as:
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
The sub query is evaluated, distinct, indexed and then
joined to the original table -- typically.
As opposed to "EXIST" clause
select * from t1 where exists ( select null from t2 where y = x )
That is processed more like:
for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop
It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).
So, when is exists appropriate and in appropriate?
The two are processed quite differently.
IN Clause
Select * from T1 where x in ( select y from T2 )
is typically processed as:
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
The sub query is evaluated, distinct, indexed and then
joined to the original table -- typically.
As opposed to "EXIST" clause
select * from t1 where exists ( select null from t2 where y = x )
That is processed more like:
for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop
It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).
So, when is exists appropriate and in appropriate?
Lets say the result of the subquery
( select y from T2 )
is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less then the time to simply full scan T2 to build
the subquery we need to distinct on.
Lets say the result of the subquery is small -- then IN is typicaly more
appropriate.
If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.
I wrote this note after searching on same issue and found a simple explanation by Tom Kyte.
Here is the link of original





