Wednesday, February 27, 2008

How to write a report on a dimensional model - part2

This is part 2 of my little essay about how to get the values that you want show up on the report where you want them.
The first part  talked about the tuple() function. If it wasn't clear enough, it will be... 
Crosstabs and tuple() are so related it's difficult to explain one without the other.

If you've used pivot tables in Excel then roughly that's what you can expect from a crosstab.
Crosstab is a table with values in the middle and headers along the edges. 
These headers can be 
  • labels for the numbers we see in the middle of the table
  • members of a dimension so that we give some context to those numbers.
Tables are the best way to "flatten" the dimensional data since most of us are not comfortable looking at 5 dimensional Rubik cubes :)

Crosstabs do provide a way for displaying multi ( > 2) dimensional data in a way that we can digest.
I'll provide examples going from the simplest case towards the more difficult.

A crosstab axes contains either a member set or a tuple() expression. (Even when it looks just like a simple measure.) Member sets are "expanded" during runtime - thereby creating multiple rows or columns. tuple() expressions are evaluated for every cell in the row or column.

I will use the same example as in part one:
Sales - measure
Branch, Product, Time - dimensions

1., one dimension - one measure

measure   |
----------+-------
dimension1| <1234>

The crosstab will work much like a simple list. When you think you need only a list use this instead. Lists are not meant for dimensional model.

You may notice that the horizontal axes is left empty. That's OK.

The upper left corner of the crosstab ddisplays the default measure. You can drag a measure there or into the "content" area.
Most people get confused by this and think that the crosstab can only display one measure.
It's not true. As you can see later a crosstab can display as many measures as you want.

The default measure is used to evaluate tuple() expressions where the measure is not specified.
I'll show examples of that later. Just keep it in mind.

e.g.

Sales |
------+-------
Branch| <1234>

"Sales" really becomes tuple(Sales)

Sales                    | 
-------------------------+---------
Richmond AutoMall        | 450,000
-------------------------+---------
North Vancouver AutoMall | 129,000
-------------------------+--------- 
Kingsway Nissan          |  39,000
-------------------------+---------

The dimension is expanded.
The tuple expression is evaluated in the context of the crosstab like this
tuple(Sales, currentMember(Branch), defaultMember(Time), defaultMember(product))

The defaultMember practically means Product and Time resulting in rolled-up values along these dimensions.

2., two dimensions - one measure

measure    | dimension1
-----------+-----------
dimension2 | <1234>

This is the "classical" crosstab.

e.g.

Sales | Time
------+------
Branch| <1234>

Note that "Sales" is still tuple(Sales)
that gets evaluated like
tuple(Sales, currentMember(Branch), currentMember(Time),  defaultMember(Product))

Both dimensions will be expanded resulting in something like this:

Sales                    | 1/1/2008 | ... | 1/31/2008
-------------------------+----------+ ... +----------
Richmond AutoMall        | 20,000            17,000
-------------------------+----------+ ... +----------
North Vancouver AutoMall | 23,000                 0
-------------------------+----------+ ... +----------
Kingsway Nissan          | 18,000            24,500


3., one dimension - many measures

The crosstab can have multiple measures. One of them is the default.

measure1   | measure1 | measure2  ...  measure_n
-----------+----------+---------+ ... +---------
dimension1 | <1234>   | <1234>  | ... | <1234>

In this example measure1 is the default measure.

Having a default measure we can do something like this:
fusion_sales := tuple([Product]->[Ford Fusion])

Sales  | Sales | fusion_sales
-------+-------+-------------
Branch | <1234>| <1234>

Note that the tuple() expression only specified the product dimension.
Cognos will figure out from the context that we are talking about Sales; because Sales is the default measure.
fusion_sales will be evaluated like this:
tuple(Sales, currentMember(Branch), defaultMember(Time), [Product]->[Ford Fusion])

It will give the total of Ford Fusion sales per branch.

The arrow ``->'' is Cognos's notation for specifying a member.
Try dragging in a member from a dimension... you'll see.  

(It's a powerful tool to create "custom measures" so to speak. So we won't have "ford Fusion Sales" as a separate metric in our cube..)

4., three or more dimensions - one measure

When we need more than 2 dimensions we have to nest them into one another.

measure                 | dimension3
------------------------+-----------
dimension1 | dimension2 | <1234>
           +------------+-----------
           | dimension2 | <1234>
-----------+------------+-----------

e.g.

Sales            | Time
-----------------+-------
Branch | Product | <1234>
       +---------+-------
       | Product | <1234>
-------+---------+-------

as the dimensions are expanded all combinations will be created

Sales                                    | 1/1/2008 ...
-----------------------------------------+---------
Richmond Autmoall        | Ford Fusion   | 
                         +---------------+---------
                         | Nissan Maxima |
                         +---------------+---------
                         | Honda Civic   |
-------------------------+---------------+---------
North Vancouver Automall | Ford Fusion   |    *
                         +---------------+---------
                         | Nissan Maxima |
                         +---------------+---------
                         | Honda Civic   |

* Sales of Ford Fusion at North Vancouver Automall on 1st Jan

Note how the nested dimension (Product) is repeatedly expanded for every member of the outer dimension (Branch).

5., many dimensions  - many measures

Using nesting you can go wild!

Measure1                | dimension1          |
                        +---------------------+
                        | measure1 | measure2 |
------------------------+----------------------
dimension2 | dimension3 | <1234>   | <1234>   |
           +------------+----------+-----------
           | dimension3 | <1234>   | <1234>   |
           +------------+----------+----------+                

In the next part I'll show some practical examples and talk about member sets.
Just to wet your apetite... think about not having the entire dimension along the axes, only some specific members you want...

4 comments:

  1. We just bought Cognos 8.4.1 upgrading from Cognos Express which we bought in Nov. '09. I just found this beautiful blog today while trying to fix a crosstab issue. I'll write a review of Cognos Express (which we're happy to be rid of) later this week. I plan on being a full participant of this blog and would appreciate some clarification on the following issue.

    Enviornment:
    Data source: MSAS Cube on SQL Server 08
    Cognos Version: 8.4.1 32bit running on Server 08 64bit

    Background:
    We're a small business which in only the last 10 months has started investing in BI. As our cubes are now becoming available and we bought cognos 8 were switching from Cognos Express List reports on relational data to Cognos 8 Crosstab reports on MSAS Cubes.

    Issue:
    As in sections 4 & 5 of the article "How to write a report on a dimensional model - part2" I need nesting to appropriately display data for subcategories with subtotals and so forth. The problem is I'm new to slicers and all of how they work, including syntax. Using simple slicers the data returned is accurate (verifying totals) but with all the extra rows the report is slow and useless.

    For example, July 1st 2010 should return 45 rows but the cross join between dimension1(1430 rows) and dimension2(235 rows) returns about 336k rows in the crosstab (1430x235).

    I need to figure out how to filter out rows prior to doing the cross join. I found a cognos doc (not in manual) about doing this in general but I still can't grasp how it's done. Here is the doc (http://www.adaptiveusa.com/ASI%20Links/Creating_Better_OLAP_Queries_in_Cognos_8_Presentation.pdf). I've been working with cognos support for days and provided them with the doc but they've not been helpful so far.

    An example item for dimensions to slice by is this:
    [product_type]
    - [a]
    - [b]
    - [c]
    Where our business is split into 3 primary segments based around the product type, the report is only for one of the segments. I'll also need to slice by start & end date using a prompt, but lets do 1 thing at a time :)

    Can anyone help me understand how to do this (return less rows in the crosstab by filtering rows prior to cross join & calculations)? A sample with some syntax would be amazing.

    Thank you in advance,
    CNBitter

    ReplyDelete
  2. I just found this blog yesterday and it's awesome. I'll post a review of cognos express later in the week.

    We just switched from cognos express with relational sources to cognos 8.4.1 with MSAS cubes and I'm having a problem getting the crosstab to function as needed. As section 4-5 of the article above, nesting results in multiplication of dimensions. I need to do this but remove the majority of the rows from both dimensions before the cross join. After the join I need to filter out & remove a few more rows based on some values rather than leaving the row there but fields empty. The filtering needs to be done before the cross join because dim1 has 235 rows, dim2 has 1430. With the basic slicers I get 336k rows (235x1430) where only 45 have data (which is accurate). This kills the performance and usefulness of the report.

    Does anyone know how to slice/filter/whatever to remove rows before the cross join so they're never calculated or show up in the report? I found the following congos doc on it but am not fully grasping how to do it since I'm new to crosstabs, slicers, and their syntax.(http://www.adaptiveusa.com/ASI%20Links/Creating_Better_OLAP_Queries_in_Cognos_8_Presentation.pdf)

    Thanks in advance, cnbitter

    ReplyDelete
  3. Ok, I've some time today so here's the update to the problem I previously posted just above. The issue I had at that time was a non-issue. 336k rows is not a problem. Cognos hits a cap & fails when you reach 65 million intersections/tuples. There can be performace issues before that though, depending on your setup.

    While the item I listed for 336k rows wound up being a seperate issue, the problem being addressed is a real problem that I ran into. Without following the ideas in the PDF I linked (& there is a similar document in the cognos proven practices area of their support site) one of my reports had 2.6 trillion intersections, most of which were empty. The resolution of that problem was to create restricted data items in the report then use those for the row edge items in the crosstab. One of the things I included in my data item filters was to remove items in the dimension that weren't >0. This also made it so I don't have to apply row suppression.

    There is another issue that comes up though. When using this method and nexting the row items, subtotals will be wrong if calculated within sets. This is because each level of sub/grand total will be calculated for the level the set it's associated with is at. This can be corrected by calculating the totail within detail (as if you were using detail/summary filters). But depending on your report this can kill performance to where the report becomes inoperable.

    Cool stuff, good luck.

    ReplyDelete
  4. Another update/note: analysis studio in many cases isn't actually doing aggregations. For totals often times it will just insert the next item up in the hierarchy (if you're displaying cities, enter a state/country item for subtotal). Then apply slicers (which may be duplicating what you've implemented in restricting the data items) to narrow that hierarchy item to only include what you need for a totals row/col. Depending on what you're doing, this can be a very useful solution for crosstabs.

    ReplyDelete