Friday, March 23, 2007

Db file sequential read while doing a full table scan?

These days we are working on a data warehouse in which we have a master table that gets approximately 1.5 million rows inserted every half hour, and a few fast-refresh materialized views based on it. These mviews use aggregate functions, which makes them a bit complex.


To start with, each mview refresh used to take some 18-20 minutes, which was totally against the business requirement. We then tried to figure out why the mview refresh was taking so much time, even though we had dropped all the bitmap indexes on the mview (bitmap indexes are generally not good for inserts/updates).

A 10046 trace (level 12) highlighted many “db file sequential read” waits on the mview, because the optimizer was using the “I_SNAP$_mview” index to fetch rows from the mview and merge them with those of the master table to build the aggregated data for the mview.
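
For reference, this is roughly how such a trace can be captured; the refresh call and the tkprof options below are illustrative, not the exact commands we used:

ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- run the refresh in the same session, e.g.
EXEC DBMS_MVIEW.REFRESH('SF_ENV_DATA_MV', 'F');

ALTER SESSION SET EVENTS '10046 trace name context off';

-- then format the raw trace file from user_dump_dest, e.g.
-- tkprof <raw_trace_file>.trc refresh_trace.prf sys=no sort=exeela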

Good part of the story is access to master table was quite fast because we used direct load (using sqlldr direct=y) to insert the data in it. When you use direct load to insert the data, oracle maintains the list of rowids added to table in a view called “SYS.ALL_SUMDELTA”. So while doing fast mview refresh, news rows inserted are picked directly from table using the rowids given from ALL_SUMDELTA view and not from Mview log, so this saves time.
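
For illustration, the load and the delta view look something like this; the user, control file and data file names are placeholders:

-- OS command (direct path load):
--   sqlldr userid=dw_user/secret control=slice_data.ctl direct=true

-- rowid ranges recorded by the direct load, used by the fast refresh:
SELECT * FROM sys.all_sumdelta;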

The concerning part was that Oracle was still using the I_SNAP$ index while fetching data from the mview; there were many “db file sequential read” waits, and it was clearly visible that Oracle waited on sequential reads the most. By running a simple test against the table, we found that a full table scan (which uses scattered reads and the multiblock read count) was very fast in comparison to index access. Also, the master table and the dependent mviews hold data only for the current day; at the end of the day the data from the master table and the mviews is pushed to historical tables, and both are empty post midnight.

I gathered the stats on the mview, re-ran the mview refresh and traced the session; this time the optimizer did not use the index, which was good news.
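
The stats were gathered with a plain DBMS_STATS call along these lines (the schema name is a placeholder):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'DW_OWNER',
    tabname => 'SF_ENV_DATA_MV',
    cascade => TRUE);
END;
/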

Now the challenge was either to run the mview stats-gathering job every half hour, to induce deliberately skewed stats on the table/index so that the mview refresh never uses index access, or perhaps to lock the stats using DBMS_STATS.LOCK_TABLE_STATS.
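
Either option is a one-off DBMS_STATS call; the figures below are made up purely to bias the optimizer away from the index, and the schema name is again a placeholder:

-- Option 1: plant statistics that make index access unattractive
BEGIN
  DBMS_STATS.SET_TABLE_STATS(
    ownname => 'DW_OWNER',
    tabname => 'SF_ENV_DATA_MV',
    numrows => 1000000,
    numblks => 100000);
END;
/

-- Option 2: freeze whatever statistics are currently in place (10g and above)
EXEC DBMS_STATS.LOCK_TABLE_STATS('DW_OWNER', 'SF_ENV_DATA_MV');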

But we found another solution: creating the mview with the “USING NO INDEX” clause. This way the “I_SNAP$” index is not created by the “CREATE MATERIALIZED VIEW” command. As per Oracle, the “I_SNAP$” index is good for fast refresh, but it proved to be the reverse for us because our environment is different and the data changes are quite frequent.
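
A minimal sketch of the creation; the defining query and column names are placeholders and assume a suitable materialized view log already exists on the master table, the point being the USING NO INDEX clause:

CREATE MATERIALIZED VIEW sf_env_data_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  USING NO INDEX
AS
SELECT slice_id,
       COUNT(*)       AS row_cnt,
       COUNT(measure) AS measure_cnt,
       SUM(measure)   AS total_measure
FROM   sf_env_slice_data
GROUP  BY slice_id;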

We then ran the tests again, loading 48 slices of data (24 hours x 2 loads per hour), and the results were above expectations. Each load of data took at most 3 minutes.

This is not the end of the story. In the trace we could see the mview refresh using a “MERGE” command, with full table scan access to the mview (which we wanted) and rowid range access to the master table.

The explain plan looks like:


Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE  SF_ENV_DATA_MV (cr=4598 pr=5376 pw=0 time=47493463 us)
 263052   VIEW  (cr=3703 pr=3488 pw=0 time=24390284 us)
 263052    HASH JOIN OUTER (cr=3703 pr=3488 pw=0 time=24127224 us)
 263052     VIEW  (cr=1800 pr=1790 pw=0 time=14731732 us)
 263052      SORT GROUP BY (cr=1800 pr=1790 pw=0 time=14205624 us)
 784862       VIEW  (cr=1800 pr=1790 pw=0 time=3953958 us)
 784862        NESTED LOOPS  (cr=1800 pr=1790 pw=0 time=3169093 us)
      1         VIEW  ALL_SUMDELTA (cr=9 pr=0 pw=0 time=468 us)
      1          FILTER  (cr=9 pr=0 pw=0 time=464 us)
      1           MERGE JOIN CARTESIAN (cr=9 pr=0 pw=0 time=459 us)
      1            NESTED LOOPS  (cr=6 pr=0 pw=0 time=99 us)
      1             TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=56 us)
      1              INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=23 us)(object id 36)
      1             TABLE ACCESS CLUSTER USER$ (cr=3 pr=0 pw=0 time=40 us)
      1              INDEX UNIQUE SCAN I_USER# (cr=1 pr=0 pw=0 time=7 us)(object id 11)
      1            BUFFER SORT (cr=3 pr=0 pw=0 time=354 us)
      1             INDEX RANGE SCAN I_SUMDELTA$ (cr=3 pr=0 pw=0 time=243 us)(object id 158)
      0           NESTED LOOPS  (cr=0 pr=0 pw=0 time=0 us)
      0            INDEX RANGE SCAN I_OBJAUTH1 (cr=0 pr=0 pw=0 time=0 us)(object id 103)
      0            FIXED TABLE FULL X$KZSRO (cr=0 pr=0 pw=0 time=0 us)
      0           FIXED TABLE FULL X$KZSPR (cr=0 pr=0 pw=0 time=0 us)
 784862         TABLE ACCESS BY ROWID RANGE SF_ENV_SLICE_DATA (cr=1791 pr=1790 pw=0 time=2383760 us)
 708905     MAT_VIEW ACCESS FULL SF_ENV_DATA_MV (cr=1903 pr=1698 pw=0 time=6387829 us)




You can see the access pattern above.

The interesting twist in the story came when I looked at the wait events in the trace file.



  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                      2253        0.74          7.73
  db file scattered read                        240        1.05          6.77
  log file switch completion                      6        0.98          3.16
  log file switch                                   8        0.98          2.47
  rdbms ipc reply                                   6        0.00          0.00
  log buffer space                                  3        0.42          0.61



Again: even when we are doing a full table scan, there are “db file sequential read” waits?

To confirm, I opened the raw trace file (before tkprof) and checked the obj# on the sequential read wait events: it was the mview (SF_ENV_DATA_MV), and there were many of them! To investigate further, I checked whether there were any scattered reads against the mview as well. There were, and they did most of the data fetching, but there were also many sequential reads on which Oracle waited longer than on the scattered reads.
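
In the raw trace the wait lines carry the object number, which can be mapped back through DBA_OBJECTS; the obj# value below is just an example:

-- raw 10046 wait line (roughly):
--   WAIT #2: nam='db file sequential read' ela= 8342 file#=12 block#=40543 blocks=1 obj#=52417

SELECT owner, object_name, object_type
FROM   dba_objects
WHERE  object_id = 52417
   OR  data_object_id = 52417;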

After giving it some thought, I realized that we had created the mviews without a storage clause, which means Oracle created them with the default storage settings.

So, assuming there are 17 blocks in an mview (container table) extent and the multiblock read count is 16, Oracle will use the scattered read mechanism (multiple blocks) to read the first 16 blocks and the sequential read mechanism (one block) for the remaining one, so you will find many sequential read wait events sandwiched between scattered reads. To overcome this, we created the mview with larger extent sizes that are also a multiple of the MBRC (multiblock read count).
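
A rough sketch of the sizing logic, extending the earlier creation sketch with a STORAGE clause; the 1M figure is illustrative, and in a locally managed tablespace with uniform extents the extent size comes from the tablespace instead:

-- bytes covered by one scattered (multiblock) read
SELECT b.value * m.value AS bytes_per_multiblock_read
FROM   v$parameter b, v$parameter m
WHERE  b.name = 'db_block_size'
AND    m.name = 'db_file_multiblock_read_count';

-- e.g. 8K blocks x 16 = 128K per read, so extents sized as a multiple of 128K
CREATE MATERIALIZED VIEW sf_env_data_mv
  STORAGE (INITIAL 1M NEXT 1M PCTINCREASE 0)
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  USING NO INDEX
AS
SELECT slice_id,
       COUNT(*)       AS row_cnt,
       COUNT(measure) AS measure_cnt,
       SUM(measure)   AS total_measure
FROM   sf_env_slice_data
GROUP  BY slice_id;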

Another cause of sequential reads is chained or migrated rows: if your mview (or table) rows are migrated, the pointer to the new row location is kept in the old (original) block, which will always be read by a single I/O call, i.e. a sequential read. You can check the count of chained rows in DBA_TABLES.CHAIN_CNT after analyzing the table. To overcome this, we created the mview with a sensible PCTFREE, so that when the MERGE runs (as part of the mview refresh) and updates a few rows, the rows are not moved to a different block, thus avoiding the sequential reads.
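
The migrated-row count can be checked like this; note that CHAIN_CNT is only populated by ANALYZE, which also overwrites DBMS_STATS figures, so on a busy system you may prefer ANALYZE ... LIST CHAINED ROWS instead:

ANALYZE TABLE sf_env_data_mv COMPUTE STATISTICS;

SELECT table_name, num_rows, chain_cnt, pct_free
FROM   dba_tables
WHERE  table_name = 'SF_ENV_DATA_MV';

-- when recreating the mview, a higher PCTFREE (e.g. PCTFREE 20 before the
-- STORAGE clause above) leaves room for the MERGE updates in the same block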

Conclusion:

  1. Creating the mview with “USING NO INDEX” does not create the “I_SNAP$” index, which can help fast refresh when the data changes are quite frequent and you cannot afford to collect stats every few minutes.
  2. Create the mview with a storage clause suited to your environment; the default extent sizes may not always be good.
  3. A sensible PCTFREE can be quite handy for avoiding row migration and the resulting extra single-block (sequential) reads.

5 comments:

aditya said...

Hi Sachin,
I am working in the so-called biggest IT company in the world, but I have never worked on issues like these. Where can I get this sort of work? I don't care about money, I simply want some good work. If you can suggest some companies it would be a great help; size does not matter to me.

aditya

Sachin said...

Aditya,

This problem is common nowadays and mostly happens due to a lack of job clarity before joining a new organization. I suggest you first speak to your seniors about giving you the kind of work you are looking for, partially if not fully. If that doesn't work, then you can probably move ahead with your aspirations and find the kind of work you are interested in, and this time make sure you get what you want. People around you could be of help, including me. Don't hesitate to send your profile to me; in case I come across any future opportunity for you, I will definitely revert. All the very best.

Sachin

aditya said...

Hi Sachin,

Before joining this organization I was working in a small organization, so I used to think that big names like my present organization would have big databases and therefore pretty good work to do, but here they have a different team for every kind of work. I only do SQL tuning, database refreshes and migrations. I have asked my seniors for more work, and they have simply told me not now, wait for some time. Now I have to wait for another year to get a project change, which is really frustrating. I really want some challenging stuff where one has to think to crack the issue.
I will send you my profile if you give me your id.

Aditya

Sachin said...

My id is oraclearora@gmail.com

Anonymous said...

This is really good stuff.