Tuesday, April 22, 2008

why Filters Should Be Avoided When Working with a Dimensional Model

I can see three scenarios when you may think that you need to use a filter:

Filtering along the axes of a crosstab


Say you have a typical crosstab with one dimension along the horizontal axes and another one along the vertical axes. e.g.

<Sales Total> <Date> <Date>
<Product Type> <1234> <1234>
<Product Type> <1234><1234>

which would give you something like:

Sales Total 1/1/2007 ... 04/21/2008 04/22/2008
Camping Equipment 1,000$ ... 1,000$1,000$
... ... ... ... ...

You want to filter the time dimension you only report for the year 2008. Or you want to only include product types with "camping" in the name. What you really want in this case is to build a memberset that you will then use as the axes of the crosstab. When you drag-and-drop a dimension level in Report Studio it really translates to something like members([my_Dimension].[my_Hierarchy].[my_Level])
e.g.
members([Product Dimension].[Product Type Hierarchy].[Product Type])

(Check "Corercion Rules" in the Report Studio documentation.)

Instead of getting all members you need to build a member set. You can do this by putting members together... or taking members away from the full set.

The former could be done with techniques such as
[my_Dimension].[my_Level]->?Parameter1?
e.g.
ancestors([Time Dimension].[YMD Hierarchy].[Year]->?Year?, [Time Dimension].[YMD Hierarchy].[Day])

the latter could be done using the filter() function
e.g.

filter(
[Product Dimension].[Product Type Hierarchy].[Product Type], roleValue('_memberCaption', currentMember([Product Dimension].[Product Type Hierarchy])) contains 'camping')

Filtering along a dimension not present in the crosstab
...to continue the example above you may only want to display sales that were done by a certain branch. Branch does not appear on the axes of the crosstab... so again filter may seem like a good idea. What you really want to do here is to tune the measure that gets displayed. when you drag-and-drop a measure in Report Studio is really translates to something like value(tuple([measure]))
This tuple is incomplete, and will use currentMember() for dimensions that are present in the context and defaultMember otherwise.
e.g.
value(completeTuple([Sales Total], currentMember([Product Dimension].[Product Type Hierarchy].[Product Type]), currentMember([Time Dimension].[YMD Hierarchy].[Date], defaultMember([Branch Dimension].[Location Hierarchy].[Branch])))

The default member is the "all" at the top... in other words the Sales Total will be rolled up and you get the sum of sales done in all the branches. Instead of using a filter you need to tweak the tuple expression to use something other then the defaultMember()
e.g.
tuple([Sales Total], [Branch Dimension].[Location Hierarchy].[Branch]->?Branch?)
will translate to
value(completeTuple([Sales Total], currentMember([Product Dimension].[Product Type Hierarchy].[Product Type]), currentMember([Time Dimension].[YMD Hierarchy].[Date], [Branch Dimension].[Location Hierarchy].[Branch]->?Branch?))

Filtering based on some attributes of the facts
This is the last scenario I can think of ... this is when it goes wild.
You want to apply a filter to the whole thing. e.g. only count sales where the customer payed in US dollars.
You don't have a dimension but you know that you facts do have an attribute that you could use to build a filter.
e.g. Somewhere you had a Sale record with a CurrencyUsedToPay field.

What really happened here is... you missed to create a dimension.

If you use DMR a filter would probably work... it's just not god design. If you use real OLAP then everything is preaggregated... it's too late to try to filter the facts.
You need to revisit the design of your Sales Total cube.

One last though about DMR
Not only is the use of filters "not nice"... often it can give you incorrect results. To return to our first scenario... if you filtered product types using filter and one day you want to add a new row to your crosstab to display the sales total for all product types... you would be in trouble.

Sales Total 1/1/2007 ... 04/21/2008 04/22/2008
Camping Equipment 1,000$ ... 1,000$1,000$
... ... ... ... ...
Product Types (All) ... ... ... ...

The way Cognos processes the filter would add a WHERE clause to the SQL SELECT. At the end "all product types" would become "all the selected product types".

Conclusion

The Cognos Report Studio GUI does not distinguish between relational and dimensional models. It always looks the same. This is misleading because certain features should only be used when working with one type of model and not the other.
In case of DMR it's even worse because Cognos tries to interpret these (filter, join, usion) wvwn though it conceptually does not make sense. Sometimes you get what you wanted but often not.

In my opnion detail or summary filter should only be used when working with a relational model.

7 comments:

  1. In the section "Filtering along a dimension not present in the crosstab" why do you recommend the use a custom tuple() expression rather than the standard Cognos slicer?

    ReplyDelete
  2. Hi Alex

    Slicer is sort of brute force thing. It filters all measures.
    Tuple gives you finer grain control, you can adjust each metric individually.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. I want to discuss the section "Filtering along a dimension not present in the crosstab".
    What can you suggest in such situation (described next)? Projects, Branches and Sales, user has access to some Branches, Branches are not presented in the crosstab.
    When user doesn't select a prompt value (with branch), he see Sales from all his branches.
    When user select a promt value, he see Sales from one branch.

    I can limit measure values for user by slicer. And use filter when user select a promt value.
    But how can I use tuple here?

    ReplyDelete
  5. Thanks a lot for the information on summary filters...

    Summary filters do not work on dimensionally modelled data soruces.

    ReplyDelete
  6. We are using Cognos8.2 against large Cognos Powercubes and have noticed that tuple( measure, dimension ) is significantly faster than using the equivalent slicer. So if you are using slicers in MDX reports, try the equivalent tuple() statement.

    ReplyDelete
  7. Thats is a great fromula for disaster...

    more selftaught Cognos Cowboys will keep our consultants in work forwever...

    I am grateful youre not a surgeon though...

    ReplyDelete