Tuesday, January 27, 2009

Automatically Growing Time Dimension in the Database

The following script works in Oracle and creates a view that can be used as a time dimension.

create or replace view timedimension
as
select
  -- one row per day, starting at 2009-01-01
  to_date('2009-01-01', 'YYYY-MM-DD') + level - 1 dt
from
  dual
connect by
  -- grow the view automatically: generate rows up to and including today
  level <= trunc(current_date, 'DDD') - to_date('2009-01-01', 'YYYY-MM-DD') + 1;

The view contains one row per day, with dates running from 2009-01-01 to the current date, keeping it as small as possible - which comes in handy when joining it in queries.
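For reference, the same day list can be generated outside the database, which is handy for checking the view's row count. A minimal Python sketch (the start date matches the script above; the function name is mine):

```python
from datetime import date, timedelta

def time_dimension(start=date(2009, 1, 1), end=None):
    """One date per day from `start` through `end` (default: today),
    mirroring what the timedimension view produces."""
    end = end or date.today()
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

days = time_dimension(end=date(2009, 1, 31))
# covers all of January 2009: 31 rows, first 2009-01-01, last 2009-01-31
```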

Crosstab Column Headers

Previously I described a technique to add column headers to crosstabs and make them look like lists.
I found that this method sometimes leaves empty cells on the right-hand side of the crosstab - I believe due to a bug in Cognos.

Here I describe another way to do it that works fine.

  • Add an empty first row to the crosstab by bringing in Crosstab Spaces - one for each column on the left-hand side of the crosstab.
  • Set the text of each Crosstab Space to the heading of the corresponding left-hand column.
  • Set "Define Contents = Yes" for all the fact cells in this first row.
  • Unlock the report for editing (the lock icon).
  • Drag a Text Item into each of these cells and set its text to the heading of the corresponding column along the top (horizontal) edge of the crosstab.
  • Hide the topmost row of the crosstab by setting Box Type = None.

After these steps, a crosstab that looks like this:


                                        | measure1 | measure2
----------------------------------------+----------+---------
dimension member 1 | dimension member 2 |  <1234>  |  <1234>
                   +--------------------+----------+---------
                   | dimension member 2 |  <1234>  |  <1234>
                   +--------------------+----------+---------

will look like


                                        | measure1 | measure2
                                        |          |
----------------------------------------+----------+---------
  crosstab space   |   crosstab space   | textitem | textitem
    "member1"      |     "member2"      |"measure1"|"measure2"
-------------------+--------------------+----------+---------
dimension member 1 | dimension member 2 |  <1234>  |  <1234>
                   +--------------------+----------+---------
                   | dimension member 2 |  <1234>  |  <1234>
                   +--------------------+----------+---------

Monday, January 26, 2009

From DMR to "Real" OLAP

Recently I've been working on migrating existing reports from a DMR model to "real" OLAP cubes.
I haven't posted much because I was gaining experience with Cognos PowerPlay Transformer - which I used as the OLAP server. In this article I will share some of my experiences.

Feeding a PowerCube with Data

Cognos offers several ways to set the data source for building a cube, each one stranger than the last.
It supports CSV files... great. Who wants to do serious BI and build analytical cubes from comma-separated text files?
It supports reports as a data source... also great. I guess it gathers the queries defined in a report... very, very weird.
Anyway... it offers one method I would call "normal", so let's just forget about the rest: IQD.
IQD stands for Impromptu Query Definition... Impromptu was the name of an older Cognos product.
The point is that it's SQL with some metadata around it, perfectly readable in a text editor.

Suggestions about IQD

First I used Framework Manager to generate IQDs by "externalizing" query subjects.
I found it unreliable... and it's not needed. Just write the SQL in a database utility and copy-paste it into the IQD file.
Look at an IQD file in a text editor - it's a very simple format.

I found that the name of the query doesn't matter.

The columns of the SQL's result set are listed after the SQL.
Be careful not to end these lines with commas: the comma becomes part of the column name, which is quite annoying.

In the long term I think it makes sense to have the cube update script generate the IQDs. That way it can optimize the SQL that is run to build the cube,
e.g. include a time filter in the query and build the cube incrementally.
Otherwise the same SQL is run every time the cube is built... which means regenerating the cube from scratch. Even if the cube is smart enough (partitioned etc.), the query still returns far too many records.
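A sketch of such a script in Python, generating the IQD with a time filter baked in. The header and COLUMN lines below are only illustrative - copy the exact ones from an IQD file your own installation produced, since the format varies by version; the database name and fact table are made up:

```python
from datetime import date

# Illustrative IQD skeleton (hypothetical); take the real header lines
# from an IQD generated by your own Cognos installation.
IQD_TEMPLATE = """COGNOS QUERY
DATABASE,{database}
BEGIN SQL
{sql}
END SQL
COLUMN,0,"DT"
COLUMN,1,"SALES"
"""

def build_iqd(database, since):
    # Include a time filter so an incremental build only reads new rows
    # instead of regenerating the cube from scratch every time.
    sql = ("select dt, sales from salesfact\n"
           "where dt >= to_date('{0}', 'YYYY-MM-DD')").format(since.isoformat())
    return IQD_TEMPLATE.format(database=database, sql=sql)

iqd_text = build_iqd('dwh', since=date(2009, 1, 1))
# write iqd_text next to the cube model, then kick off the cube build
```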

Differences in Dimensional Model between PowerCubes and DMR

In DMR there are multiple hierarchies per dimension and conformed dimensions. Neither is present in PowerCubes.
I simply did not find a way to have multiple hierarchies in the same dimension. If someone knows that this is possible, please leave a comment; as far as I can tell it is not supported in PowerCubes.

The lack of conformed dimensions means that if, say, we have a Sales measure and an AdvertisingDollarsSpent measure stored in two different cubes, and both cubes have a Branch dimension, then Cognos has no idea that "branch A" in one cube is the same thing as "branch A" in the other. It doesn't even know that January 1st is the same day in both...

Time Dimension

Even though Transformer supports time dimensions and can automatically generate the members (days) in them, this doesn't always work.
It doesn't work for the topmost level (all days)... so if you are not using years, months and weeks - because, like me, you are using a custom calendar - it won't work.
This is a pain because it means you need a utility table in the database that contains one record for each and every day.
This could probably be done nicely with a well-crafted view... if someone can share a script, I'd appreciate it.

Dealing with Multiple Cubes from Reports

All this leads to the fact that writing reports is a lot harder. If you want to work with more than one cube, things get tricky...

And why wouldn't you?
Of course you can always group all the measures used in a report into a single cube and the problem is solved... but I think that's bad practice. I shouldn't have to adjust the data source to the report.
Also, the cube update process can be parallelized if there are multiple simpler cubes rather than one giant cube.

So with multiple cubes, the problem is the filter page. You build the prompt on a dimension from one cube, and that prompt won't work against another cube because the other cube uses a different dimension (because, as I just said, dimensions are not conformed).

There is a way around it. If the dimensions look the same in all cubes, you can do some string replacement on the MUNs returned by the prompt macro to translate a MUN from one cube to another.

It looks something like:

#join(';',
  substitute('\[cube1\]', '[cube2]',
    split(';',
      promptmany('param1', 'MUN')
    )
  )
)#
  • the value returned by promptmany() has to be split, because otherwise substitute() would only replace the first occurrence
  • then the pieces have to be put back together with join()
  • square brackets have to be escaped with a backslash in the first argument of substitute()
  • promptmany() returning MUNs uses a semicolon as the separator

So if you built the prompt using a dimension from cube1, you can still use it in queries against cube2... all you need is something like the ugliness above instead of a simple #promptmany()#.
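What the macro does to the semicolon-separated MUN list can be sketched in Python (the cube, dimension and member names below are made up):

```python
def translate_muns(mun_list, old_cube, new_cube, sep=';'):
    # Split first, rewrite each MUN, then join the pieces back together.
    # (In the Cognos macro the split is what ensures every MUN gets
    # rewritten, since substitute() only touches the first occurrence.)
    return sep.join(m.replace(old_cube, new_cube) for m in mun_list.split(sep))

muns = '[cube1].[Branch].[Branch].[A];[cube1].[Branch].[Branch].[B]'
print(translate_muns(muns, '[cube1]', '[cube2]'))
# → [cube2].[Branch].[Branch].[A];[cube2].[Branch].[Branch].[B]
```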

Attributes

Another difference from DMR is the rather poor support for attributes. There are only four attributes to set, called "short name", "long name" etc.
The naming in Transformer does not match the naming in Report Studio... so you need to figure out the mapping once.
This also limits the number of attributes... especially since one is used for the ID and another for the caption... so we are down to two that are free for any purpose.

I also found that you should check the "refresh" checkbox in the level properties; otherwise the attributes may not be set properly when the cube is built.

Migrating Queries from DMR to Transformer

Originally I thought I would take the queries used in the DMR model as is. Then I found myself rewriting them... though it doesn't take much effort since there is already a starting point.
In DMR I did as much aggregation in the database as possible to improve performance - which is horrible in DMR anyway.
In Transformer I found this less important: since the cube persists the aggregation, the reports actually run much faster. So I decided to keep the SQL populating the cubes simple instead. This hopefully leads to fewer errors and is easier to verify.
One can set the aggregation function to deal with duplicates; this is pretty much the same as aggregating in the database with "group by".
I guess this approach can be disputed... it may make sense to aggregate on the DB server if the size of the result set returned is an issue during cube update.
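To illustrate the equivalence: a Sum aggregation function applied to duplicate detail rows in the cube produces the same totals a "group by" would return from the database. A tiny Python sketch with made-up rows:

```python
from collections import defaultdict

# Detail rows as (branch, sales); branch 'A' appears twice,
# i.e. the unaggregated result set contains duplicates.
rows = [('A', 100.0), ('B', 50.0), ('A', 25.0)]

# What "select branch, sum(sales) ... group by branch" would return,
# and equally what a Sum aggregation in the cube produces from the
# duplicate detail rows.
totals = defaultdict(float)
for branch, sales in rows:
    totals[branch] += sales

print(dict(totals))
# → {'A': 125.0, 'B': 50.0}
```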

Number format

I ran into this one and it gave me some headache. For measures that are floating-point numbers you have to set the precision to something other than the default zero... otherwise Cognos rounds them to the nearest integer (so zero precision does not mean unlimited, it means zero decimals).

Hopefully these experiences will help you decide whether you want to use PowerCubes and will save you some headaches while working with them... if you have any questions, let me know and I'll do my best to answer them in future posts.