Showing posts with label OLAP. Show all posts

Friday, May 9, 2008

ORA-14097 when using materialized views and partitioned tables

This was an interesting issue that came up a few days back. I spent quite some time before I solved it.

The issue was: a developer came up to me and said that he was getting "ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION" while exchanging partitions.

He is working on a warehouse design in which heavy data loads hit the base tables throughout the day, materialized views (mviews) on those tables do the mining, and the mviews' data must be moved into fact tables at midnight.

Our strategy was that at midnight, when the base tables are truncated for the next day's fresh load, we exchange the partitions of these mviews with temporary partitioned tables and then use "INSERT /*+ APPEND */" to move the data into the final fact tables. We could not have exchanged partitions with the fact tables directly, as they might not be partitioned by day.
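A rough sketch of that midnight flow (all table and partition names here are hypothetical, not from the actual system):

```sql
-- 1. Swap the mview's segment into an empty partition of the
--    temporary partitioned table (a dictionary operation, no data movement)
ALTER TABLE tmp_sales_part
  EXCHANGE PARTITION p_today WITH TABLE sales_mv;

-- 2. Direct-path load from the temporary table into the final fact table
INSERT /*+ APPEND */ INTO sales_fact
SELECT * FROM tmp_sales_part PARTITION (p_today);
COMMIT;

-- 3. Truncate the base table for the next day's fresh load
TRUNCATE TABLE sales_base;
```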

That is all background. While investigating, I was perplexed for some time when he showed me the issue: he had created a temporary partitioned table using "create table as select .. the mview where 1=2", and while doing the exchange partition he was still getting ORA-14097.

Let me give you a simple test case:

SQL> create table test(a number);

Table created.

SQL> create materialized view log on test with rowid including new values;

Materialized view log created.

SQL> create materialized view test_mv refresh fast on demand with rowid as select * from test;

Materialized view created.

SQL> insert into test values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_mview.refresh('test_mv','F')

PL/SQL procedure successfully completed.


Now both the table and the mview have one row each.

Let's try to exchange a partition of a partitioned table, TEST_PART, with this mview.
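The transcript doesn't show how TEST_PART was created; the system-generated partition name below suggests something like a hash-partitioned CTAS. This is an assumption — the original DDL isn't shown:

```sql
-- Hypothetical reconstruction of TEST_PART: same columns as the mview,
-- hash partitioned so Oracle generates the partition name (SYS_Pnnnn)
CREATE TABLE test_part
  PARTITION BY HASH (a) PARTITIONS 1
  AS SELECT * FROM test_mv WHERE 1 = 2;
```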

SQL> select partition_name from user_tab_partitions where table_name='TEST_PART';

PARTITION_NAME
------------------------------
SYS_P3446

SQL> alter table test_part exchange partition SYS_P3446 with table test_mv;
alter table test_part exchange partition SYS_P3446 with table test_mv
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION


Since I used CTAS, there is no question of a data type or column order mismatch.

SQL> desc test_part
Name Null? Type
----------------------------------------- -------- ----------------------------
A NUMBER

SQL> desc test_mv
Name Null? Type
----------------------------------------- -------- ----------------------------
A NUMBER


After doing some research, I came across Metalink article 72332.1.

According to that : "If a table has a FUNCTIONAL index on it there is an extra hidden column in COL$ which will cause an EXCHANGE to fail. Comparing USER_TAB_COLUMNS will not show this, nor will USER_UNUSED_COL_TABS but COL$ will show the difference."

SQL> select col#, name
2 from sys.col$
3 where obj# in
4 (select object_id from user_objects where object_name = 'TEST_MV');

COL# NAME
---------- ------------------------------
0 M_ROW$$
1 A


There you go - this hidden M_ROW$$ column was what was creating the problem for me.

Old saying - identifying the problem is 80% of the task of solving it.

Now I recreated test_mv using "with primary key" (instead of "with rowid"), and the whole exchange process worked fine!
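On the same test case, the fix looks roughly like this (a sketch; the partition name will differ on your system):

```sql
-- Recreate the mview log and mview using PRIMARY KEY instead of ROWID
DROP MATERIALIZED VIEW test_mv;
DROP MATERIALIZED VIEW LOG ON test;

ALTER TABLE test ADD CONSTRAINT test_pk PRIMARY KEY (a);

CREATE MATERIALIZED VIEW LOG ON test WITH PRIMARY KEY INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW test_mv
  REFRESH FAST ON DEMAND WITH PRIMARY KEY
  AS SELECT * FROM test;

-- No hidden M_ROW$$ column this time, so the exchange succeeds
ALTER TABLE test_part EXCHANGE PARTITION sys_p3446 WITH TABLE test_mv;
```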

Those who do not have a PK on their tables can consider adding a separate column seeded from a sequence and treating it as the PK to work around this issue.

Though the issue and its solution were trivial, it kept me thinking for some time!


Friday, November 24, 2006

Star Vs Snowflake schema

Star Schemas

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables, as shown in figure.

[Figure: a star schema - fact table at the center, dimension tables at the points]
Fact Tables

A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated.


Dimension Tables

A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values.
Dimension tables are generally small in size compared to fact tables.
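In DDL terms, a minimal star schema looks like this (table and column names are illustrative, not from any particular system):

```sql
-- Dimension tables: small, descriptive, one row per dimension value
CREATE TABLE dim_time (
  time_id  NUMBER PRIMARY KEY,
  day      DATE,
  month    VARCHAR2(7),
  quarter  VARCHAR2(6)
);

CREATE TABLE dim_product (
  product_id NUMBER PRIMARY KEY,
  name       VARCHAR2(100),
  category   VARCHAR2(50)
);

-- Fact table: numeric measures plus foreign keys to the dimensions
CREATE TABLE sales_fact (
  time_id    NUMBER REFERENCES dim_time,
  product_id NUMBER REFERENCES dim_product,
  revenue    NUMBER(12,2),
  quantity   NUMBER
);
```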


To take an example, assume this schema belongs to a retail chain (like Wal-Mart or Carrefour).

The fact will be revenue (money). How you want to look at that data is called a dimension.

[Figure: the revenue fact surrounded by its dimensions]
In the figure above, you can see the fact is revenue and there are many dimensions through which to view the same data. You may want to look at revenue based on time (what was the revenue last quarter?), or based on a certain product (what was the revenue for chocolates?), and so on.
In all these cases the fact is the same; the dimension changes as per the requirement.



Note: In an ideal Star schema, all the hierarchies of a dimension are handled within a single table.

Star Query

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.
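A typical star query looks like this - each dimension joins only to the fact table, never to another dimension (names are illustrative):

```sql
-- "Revenue for chocolates, by quarter" - fact joined to two dimensions
SELECT t.quarter,
       p.category,
       SUM(f.revenue) AS revenue
FROM   sales_fact  f
JOIN   dim_time    t ON t.time_id    = f.time_id
JOIN   dim_product p ON p.product_id = f.product_id
WHERE  p.category = 'Chocolates'
GROUP  BY t.quarter, p.category;
```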


[Figure: a star query joining the fact table to each dimension table]

Snowflake Schema

The snowflake schema is a variation of the star schema used in a data warehouse.

The snowflake schema (sometimes called the snowflake join schema) is more complex than the star schema because the tables which describe the dimensions are normalized.
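For example, a product dimension can be snowflaked by splitting its category attribute into a table of its own (names are illustrative):

```sql
-- The category hierarchy level becomes its own table
CREATE TABLE dim_category (
  category_id NUMBER PRIMARY KEY,
  name        VARCHAR2(50)
);

-- The product dimension now references category instead of storing it
CREATE TABLE dim_product (
  product_id  NUMBER PRIMARY KEY,
  name        VARCHAR2(100),
  category_id NUMBER REFERENCES dim_category
);
```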


[Figure: a snowflake schema with normalized dimension tables]

Drawbacks of "snowflaking"



- In a data warehouse, the fact table in which the data values are stored (and its associated indexes) is typically responsible for 90% or more of the storage requirements, so the space saved by normalizing the dimension tables is normally insignificant.

- Normalization of the dimension tables ("snowflaking") can impair the performance of a data warehouse. Whereas conventional databases can be tuned to match a regular pattern of usage, such patterns rarely exist in a data warehouse. Snowflaking increases the time taken to perform a query, and a design goal of many data warehouse projects is to minimize these response times.


Benefits of "snowflaking"



- If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.



- A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalised. A snowflake schema will hence be easier to implement.

- A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organisation of the database.

- Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools such as Cognos Powerplay to form such a query, especially if provision for these forms of query wasn't anticipated when the data warehouse was first designed.

In practice, many data warehouses will normalize some dimensions and not others, and hence use a combination of snowflake and classic star schema.


Source: Oracle documentation, Wikipedia