Monday, January 26, 2009

From DMR to "Real" OLAP

Recently I've been working to migrate existing report from DMR model to "real" OLAP cubes.
I haven't posted much because I was gaining experience with Cognos PowerPlay Transformer - which I used as OLAP server. In this article I will share some of my experiences.

Feeding a PowerCube with Data

Cognos offers different ways to set the data source for building a cube. One is more weird than the other.
It supports CSV files... great. Who wants to do serious BI and build analytical cubes from comma separated text files?
It support reports as data source... also great. I guess it gathers the queries defined in a report... very very weird.
Anyways... it offers one method which I would call "normal" and let's just forget about the rest: IQD.
IQD stands for Impromptu Query Definition... Impromptu was a name of their product or something like that... 
The point is it's SQL with some metadata around it and perfectly readable in a text editor.

Suggestions about IQD

First I used Framework Manager to generate IQDs by "externalizing" query subjects.
I found it unreliable... and it's not needed. Just write the SQL in a database utility and copy-paste into the IQD file.
Look at one IQD file in a text editor, it's a very easy format.

I found that the name of the query doesn't matter.

The columns in the SQL's result set are written after the SQL.
Be careful not to end these lines with commas as the comma becomes part of the column name and that's quite annoying.

On the long term I think it makes sense to have the cube update script create the IQDs. This way it can optimize the SQL that is run to generate the cube.
e.g. include  a time filter in the query and build the cube incrementally. 
Otherwise the same SQL would be run every time the cube is built... which means regenerating the cube from scratch. Even if the cube is smart enough (partitioned etc.) the query still returns way too many records.

Differences in Dimensional Model between PowerCubes and DMR

In DMR there are multiple dimension hierarchies and conformed dimensions. None of this is present in PowerCubes.
I simply did not find a way to have multiple hierarchies in the same dimension. If someone knows that this is possible please send a comment. IMO it is not supported in PowerCubes.

The lack of conformed dimensions mean that if say we have a Sales measure and an AdvertisingDollarsSpent measure and we store them in two different cubes and both have a Branch dimension then Cognos has no idea that a "branch A" in one cube is the same thing as "branch A" in another cube. It doesn't even know that January 1st is the same day in both...

Time Dimension

Even though it supports time dimension and can automatically generate the members (days) in it this doesn't always work.
It doesn't work for the topmost level (all days)... so if you are not using years, month, weeks - because like me you are using a custom calendar - then it won't work.
this sucks because it means you need to have a utility table in the database that contains a record for each and every day.
This can be nicely done by a well crafted view probably... if someone can share a script I'd appreciate it.

Dealing with Multiple Cubes from Reports

All this leads to the facts that writing reports is a lot harder. If you want to work with more than one cube things get more tricky...

And why wouldn't you?
Of course you can always group all measures used in a report into a single cube and problem solved... but I think it's bad practice. I shouldn't have to do that. Why adjust the datasource to the report?
Also... the cube update process can be parallelized if there are multiple simpler cubes rather than one giant cube.

So dealing with multiple cubes the problem is the filter page. You need to build the prompt on one dimension and that prompt won't work on another cube because it uses a different dimension (because as I just said dimensions are not conformed)

There is a way around it. If the dimensions look the same in all cubes you can just do some string replacement on the MUNs returned by the prompt macro to translate an MUN from one cube to another.

it looks something like:
#join(';', 
substitute('\[cube1\], '[cube2]', 
split(';', 
promptmany('param1', 'MUN')
)
)
)#
  • the value returned by promptmany() has to be split because otherwise substitute() would only replace the first occurence
  • then it has to be put together again with join()
  • square brackets have to be escaped with backslash in the call of substitute()
  • promptmany() returning MUNs uses semicolon as separator

So if you built the prompt using a dimension from cube1 you can still use the prompt in queries using cube2... all you need to do is use something like the abve uglyness instead of a simple #promptmany()#

Attributes

Another difference with DMR is the rather bad support for attributes. There are only 4 attributes to set. They are called "short name", "long name" etc
Actually the naming in Transformer does not match the naming in Report Studio... so you need to figure it out once.
This also sets a limit on the number of attributes... especially since one is used for ID and another for caption... so we are down to 2 which is free for any purpose.

I also found that you want to click the "refresh" checkbox in the level properties, otherwise the attributes may not be set properly when the cube is built.

Migrating Queries from DMR to Transformer

Originally I thought I would take the queries used in the DMr model "as is". Then I found myself rewriting them... though it does not take much effort since there is already some starting point.
In DMR I did as much aggregation on the database as possible to improve performance - which is horrible in DMR anyways.
In transformer I found it less important, since the cube will persist the aggregation the report actually run much faster. I decided to rather keep the SQL populating the cubes simple. This will hopefully lead to less errors and is easier to verify.
One can set the aggregation function to deal with duplicates. this is pretty much the same as aggregating in the DB using "group by".
I guess this approach could be disputed... it may make sense to aggregate n the DB server if the size of the record set returned is an issue during cube update.

Number format

I ran into this and gave me some headache. For measures that ae floating point numbers you have to set the precision to something other than the default zero... otherwise Cognos will round it to the nearest integer (so zero precision does not mean unlimited, it means zero decimals)

Hopefully these experiences will help you evaluate if you want to use PowerCubes and will save some headache while working with them... any questions let me know and I'll try my best to answer them in future posts.

12 comments:

  1. IQD generation:
    Very interesting post which reflects our experiences with cubes. I do disagree with your statement around manually modifying IQD files, typically I think this is bad practice. For an incremental cube build the FM model would be built to join the fact table to a warehouse load control record which denotes which date range should be loaded into the cube. Hence the SQL is fixed.

    Multiple cube reports:
    The fact that two dimensions across two cubes (which are based on the same warehouse conformed dimension) have different MUNS is a nightmare. It really restricts cross subject area dashboard reporting where it would be nice to have a single prompt (for example date and location) which would filter each of the 4 subject area reports (based on 4 separate cubes).

    Alex

    ReplyDelete
  2. Hi Alex, you can use my technique to have the a single prompt working on 4 cubes.

    ReplyDelete
  3. Mr. Simon, THANK YOU!!! This post on prompting against 2 cubes is awesome. In fact, this whole blog is awesome! I would definitely buy the book version of this.

    By the way, I just resolved an issue using your technique. Basically, I had to union two very similar cubes and needed to prompt against the same dimension.

    I am surprised, amazed, and grateful for people like you who are so generous with their knowledge. Again, THANK YOU!

    ReplyDelete
  4. Hi, I accidentally found this article when looking for something else...BUT, I see you saying that you don't know about the possibility to have multiple hierarchies in a dimension, when using OLAP.
    Well, I tell you it's possible. But there is one limitation - all hierarchies have to end with the same level.

    For example, you have a product: Line, Type, Subtype, Brand, Product. Line:Product is 1:n, Type:Product 1:n, but for example Line:Type n:n etc.

    You can just create one hierarchy in Transformer, let's say Type-Line-Product.
    Then create a drill-down on Product(left click) and it creates an alternate drilldown, where you can add for example Brand only.

    That creates two drilling paths, Type-Line-Product, and Brand-Product.

    It may not be obvious, but those drillpaths are actually hierarchies :) Just have a look at the diagram in Transformer then, you are going to need some tuning...

    I don't know why the Transformer is "hiding" these facts, it's really not so obvious. But YES, you can have multiple hierarchies in OLAP cubes :)

    Matt

    ReplyDelete
  5. Thanks Matt, I'll check it out.

    ReplyDelete
  6. I think that the single hierarchy limitation was removed in either 8.3 or 8.4. This removes an annoying limitation in the Cognos implementation.

    ReplyDelete
  7. The single hierarchy limitation was removed in PowerPlay 4.1 in 1996. Cognos assumes people will either read the Transformer documentation (yes, it's fully documented) or take the class (yes, there are many demos of this feature). If you do neither, you are prone to making poor choices for your implementation.

    ReplyDelete
  8. Question: We're using MSAS cubes with cognos 8.4. When creating crosstabs in report studio, cognos pulls data & creates a temporary virtual cube from the source with a maximum of 65 million tuples. This can both be slow and cause some large reports to fail (due to the nature of our business, some of our dimensions are quite large).

    Does anyone know if that happens when running reports off of cubes created with transformer? is there another way around this? we'll do pretty much whatever it takes to get around this issue.

    ReplyDelete
    Replies
    1. Creating the cube with Transformer and then using dimensional functions to accurately extract the specific members required for your report will certainly improve reporting performance. Your build time in Transformer is likely to be the bottleneck, however once this the values are stored inside the cube and optimized for retrieval by the Cognos studios. There is no need to 'spin up' temporary cube and retrieved members will be cached for reuse.

      Delete
  9. Hi,
    I have tried your formula for promptmany and it works great! But, only if I don't have Prompt page. As soon as I create prompt page for prompts used I get error message : Cannot parse an empty expression.

    Do you have any solution for that?

    ReplyDelete
    Replies
    1. Try setting a default value in your macro prompt statement. This allows the query engine to parse the code regardless of what's in the prompt page.

      Delete
  10. You seem to be confused about DMR and OLAP in your information regarding support for attributes.
    The 4 attributes your list are those found in PowerCubes (OLAP). These names reflect the properties from the vendor's model, which in this case is Transformer. There are only 4 and they can't be renamed because a Transformer model has a fixed number of properties associated with a member.
    Attributes in DMR will also reflect properties from the vendor's model, which is Framework Manager. In this case the 'properties' are the fields in the projection list of the associated sql statement. These fields can be renamed and any combination of relevant fields can be associated with a level in the DMR hierarchy.
    In both cases Report Studio does not impose its own naming convention, it simply reflects the names in the model. Transformer does not allow you to rename, Framework Manager does.

    ReplyDelete