Thursday, November 30, 2006

Precaution while defining data types in Oracle

I was reading a wonderful article on Tom Kyte’s blog about the repercussions of ill-defined data types.

Some of the examples he mentions are:

- Varchar2(40) Vs Varchar2(4000)
- Date Vs varchar2


Varchar2(40) Vs Varchar2(4000)

Generally developers ask for this to avoid issues in the application. They always want the uppermost limit to avoid any application errors. Their argument is that the “varchar2” data type does not reserve the full 4000-character space on disk, so storage is not an issue. But what they don’t realize is how costly these oversized declarations can be elsewhere.
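The storage half of that argument is, in fact, true and easy to verify; here is a minimal sketch using the built-in VSIZE function:

-- VSIZE reports the stored byte size of a value; the declared length
-- of the varchar2 does not change it:
SELECT VSIZE(CAST('abc' AS VARCHAR2(40)))   AS in_40,
       VSIZE(CAST('abc' AS VARCHAR2(4000))) AS in_4000
  FROM dual;   -- both columns return 3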

Repercussions

- Applications generally do an “array fetch” from the database, i.e. they select 100 (maybe more) rows in one go. So if you are selecting 10 varchar2 columns, the fetch buffers are sized from the declared width: 4000 (chars) x 10 (cols) x 100 (rows) = 4 MB of RAM (not storage). On the contrary, had the columns been defined with a 40-char size, the usage would have been 40 x 10 x 100 ~ 40 KB (approx)!! See the difference; also, we didn’t multiply by the “number of sessions”. That could be another shock!! (A sketch of this arithmetic follows this list.)

- Later on, it will be difficult to know what the column was made for. E.g. for first_name, if you define it varchar2(4000), it’s confusing for a new developer to know what this column was meant to hold.
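As a rough sketch of where the numbers in the first bullet come from (the table and column names are hypothetical):

-- 10 columns, each declared VARCHAR2(4000) “just in case”:
CREATE TABLE wide_t (
  c1  VARCHAR2(4000), c2  VARCHAR2(4000), c3  VARCHAR2(4000),
  c4  VARCHAR2(4000), c5  VARCHAR2(4000), c6  VARCHAR2(4000),
  c7  VARCHAR2(4000), c8  VARCHAR2(4000), c9  VARCHAR2(4000),
  c10 VARCHAR2(4000)
);

-- In SQL*Plus, ask the client to fetch 100 rows per round trip:
SET ARRAYSIZE 100
SELECT * FROM wide_t;
-- Fetch buffers are sized from the DECLARED widths, not the actual data:
--   4000 bytes x 10 cols x 100 rows = 4,000,000 bytes (~4 MB) per fetch,
--   versus 40 x 10 x 100 = 40,000 bytes (~40 KB) with VARCHAR2(40) columns.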

Date Vs varchar2

Again, lots of developers define date columns as varchar2 (or char) for their convenience. But what they forget is not only data integrity (a stored string like “01/01/03” gives no way to tell which piece was dd, mm, or yy) but also performance.

While doing “index range scans” using BETWEEN <string> AND <string> on a varchar2 column, the optimizer will not be able to use the index as efficiently as in the case of BETWEEN <date> AND <date> on a real DATE column.
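A minimal sketch of the contrast, with hypothetical table and column names:

-- A varchar2 “date” versus a real DATE column:
CREATE TABLE orders_bad  (order_dt VARCHAR2(10));
CREATE TABLE orders_good (order_dt DATE);

-- Strings compare character by character, so this range also matches
-- rows like '15/06/1997' -- and the optimizer can only guess the cardinality:
SELECT * FROM orders_bad
 WHERE order_dt BETWEEN '01/01/2006' AND '31/12/2006';

-- With a real DATE, the range is a true date interval, and index range
-- scans can be costed and used properly:
SELECT * FROM orders_good
 WHERE order_dt BETWEEN TO_DATE('01/01/2006', 'DD/MM/YYYY')
                    AND TO_DATE('31/12/2006', 'DD/MM/YYYY');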


I suggest you read the full article by the maestro himself.

3 comments:

Rob W said...

I read the full article, but as I was searching for how data types are stored in Oracle I ran across this from the PL/SQL Reference doc:

Small VARCHAR2 variables are optimized for performance, and larger ones are optimized for efficient memory use. The cutoff point is 2000 bytes. For a VARCHAR2 that is 2000 bytes or longer, PL/SQL dynamically allocates only enough memory to hold the actual value. For a VARCHAR2 variable that is shorter than 2000 bytes, PL/SQL preallocates the full declared length of the variable. For example, if you assign the same 500-byte value to a VARCHAR2(2000 BYTE) variable and to a VARCHAR2(1999 BYTE) variable, the former takes up 500 bytes and the latter takes up 1999 bytes.
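A minimal PL/SQL sketch of that cutoff (the variable names are mine):

DECLARE
  v_dynamic  VARCHAR2(2000);  -- 2000 bytes or longer: memory is allocated to fit the value
  v_prealloc VARCHAR2(1999);  -- shorter than 2000: the full declared length is preallocated
BEGIN
  v_dynamic  := RPAD('x', 500, 'x');  -- takes up about 500 bytes
  v_prealloc := RPAD('x', 500, 'x');  -- takes up 1999 bytes
END;
/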

Here is the link to the documentation:

http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14261/datatypes.htm#sthref732

Sachin said...

Hi Rob,

Yes, I agree with you, and in fact it is good that you brought some light to this.

But my understanding is that this may be true for the PL/SQL engine; for normal SQL queries, Oracle may still take more memory if a varchar2 is defined longer than 2000.
(I may be wrong.) I will do some testing and come back to you.

sachin

Anonymous said...

This is a really nice article. Many developers have argued with me about setting the max field length. Now I am clear. But one question: is the array fetch happening on the server? We do "SET ARRAYSIZE xx" in sqlplus. Is this the one Tom is talking about... Any thoughts...