Monday, May 26, 2008

Design Utility

I had this idea to create a utility that can be used during the dimensional design process.

The design starts by looking at the requirements.

  1. What are the reports?
    This is usually well known...

  2. What metrics are there?
    These are sometimes less well known, as the business analyst may not recognize that different metrics are really the same, just rolled up differently.
    e.g. he/she may have a "sales total" metric and a "branch sales for the month" metric.
    I suggest using two terms to differentiate here: in the requirements document call it a metric, in the design document call it a measure. "Measure" then becomes a sort of "normalized metric".

    Start the design by collecting these; why not enter this information into two tables: Report and Metric?

  3. Collect the different levels where the metrics appear.
  4. Figure out which levels belong to the same dimension.
  5. What are the attributes that have to be displayed for each level?
  6. Categorize the levels into hierarchies. Is there a need for more than one hierarchy per dimension?

    Store all this in tables as well. So far we have added Level, Attribute, Dimension, Hierarchy and HierarchyLevel

  7. Which metric appears at which level? What is the roll-up aggregation?
  8. Which report uses which metric at which level?

    Yes... store this information in tables as well. Add MetricLevel and ReportMetricLevel

  9. ... and there you go!
You can use a simple MS Access or OpenOffice Base database to store all this information.
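
Just as a sketch, the same tables could be created in any small database; the snippet below does it in SQLite from Python. The table names follow the steps above, while the column names are my own guesses rather than a prescribed schema.

    # Minimal sketch of the design-utility tables (column names are illustrative).
    import sqlite3

    con = sqlite3.connect("design_utility.db")
    con.executescript("""
    CREATE TABLE Report            (report_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Metric            (metric_id INTEGER PRIMARY KEY, name TEXT, measure_name TEXT);
    CREATE TABLE Dimension         (dimension_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Hierarchy         (hierarchy_id INTEGER PRIMARY KEY, dimension_id INTEGER, name TEXT);
    CREATE TABLE Level             (level_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Attribute         (attribute_id INTEGER PRIMARY KEY, level_id INTEGER, name TEXT);
    CREATE TABLE HierarchyLevel    (hierarchy_id INTEGER, level_id INTEGER, depth INTEGER);
    CREATE TABLE MetricLevel       (metric_id INTEGER, level_id INTEGER, rollup_aggregation TEXT);
    CREATE TABLE ReportMetricLevel (report_id INTEGER, metric_id INTEGER, level_id INTEGER);
    """)
    con.commit()
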
Then you can create very interesting reports from this:

  1. Dimension hierarchy
    Report on the Dimension, Hierarchy, Level and Attribute tables
    Will describe what dimensions to create and how to set them up
  2. Cube design
    Report on the MetricLevel, Level, Hierarchy, Dimension tables
    Will describe what cubes to create and what the roll-up aggregations are
  3. Report usage of metrics
    Report on the ReportMetricLevel, MetricLevel, Level, Metric, Report tables
    Will describe how the requirements are met and how each report will be built from the dimensional model
  4. Star Schema design
    Report on the Metric, MetricLevel, Level, HierarchyLevel tables. Find the lowest granularity level in each hierarchy
    Will describe the star schema structure
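
To make the last report concrete, here is one way the "Star Schema design" query could be written over the sketch tables above. It assumes that a larger HierarchyLevel.depth means a finer grain; that is an assumption of my sketch, not something prescribed by the design itself.

    # For each metric and hierarchy, find the lowest-granularity level the metric
    # appears at (assuming larger depth = finer grain in the sketch schema above).
    import sqlite3

    con = sqlite3.connect("design_utility.db")
    rows = con.execute("""
        SELECT m.name AS metric, h.name AS hierarchy, l.name AS lowest_level
        FROM Metric m
        JOIN MetricLevel    ml ON ml.metric_id   = m.metric_id
        JOIN Level          l  ON l.level_id     = ml.level_id
        JOIN HierarchyLevel hl ON hl.level_id    = ml.level_id
        JOIN Hierarchy      h  ON h.hierarchy_id = hl.hierarchy_id
        WHERE hl.depth = (SELECT MAX(hl2.depth)
                          FROM MetricLevel ml2
                          JOIN HierarchyLevel hl2 ON hl2.level_id = ml2.level_id
                          WHERE ml2.metric_id = m.metric_id
                            AND hl2.hierarchy_id = h.hierarchy_id)
        ORDER BY metric, hierarchy
    """).fetchall()

    for metric, hierarchy, lowest_level in rows:
        print(metric, hierarchy, lowest_level)
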
A utility like this can be written in about a day or two in MS Access.
It helps verify the requirements and produces all the sections (that I can think of) of a "design document".

It can be reviewed and has sufficient information to start implementation.

Furthermore, I would suggest creating a placeholder model, i.e. one with no real values for the measures but all dimensions hooked up.
The bulk of the work goes into building the "real" metrics.

The placeholder model serves a double purpose:
  • Development can be split between report authors and modelers; here modeling really means implementing the logic for each metric.
  • A mock-up of each report can be created, which in my opinion should still belong to the design phase. This can be reviewed with the customer and serves to verify both the design and the requirements.
If you are interested in this utility please let me know...

Wednesday, May 21, 2008

Cognos jobs are fragile

While working with Cognos jobs to automate testing, I found that jobs are sometimes not very well suited to this.

The issue is that it's very easy to break them while working on the reports.

Certain types of updates to a report invalidate the job, e.g. if you rename the report or even if you copy and paste it through the clipboard.

Another problem arises if there are any changes to the prompt pages or the parameter values. Since these are stored in the job, it won't be able to fill out the prompt pages properly when it tries to run the reports.

Conclusion

Since I don't know of anything better, I still think it's a good idea to use jobs to automate unit and smoke testing. It's just that we have to keep an eye on them to make sure they're not broken.

Tuesday, May 6, 2008

Automating Test

I'd like to briefly share some approaches I came up with that fit into the bigger picture of the "software lifecycle" on Cognos, namely how we can do a sort of automated regression testing, or smoke testing.

What should be tested

I would suggest testing all layers individually, not just the final reports, and always testing just the extra functionality that each layer provides.

In the case of dimensional modeling, the three layers to test are:
  • star schema
  • dimensional model
  • reports
Depending on whether you use DMR or "real" OLAP, the star schema and the dimensional model may or may not be implemented using Cognos.
It doesn't really matter... they still need to be tested.

How to test 

Testing the star schema is best done outside Cognos. There are two things to verify: 
First, the structure: that you really get as many rows (and columns) as you think you do.
The star schema contains the lowest granularity of each measure, so there should be one and only one record for each combination of the dimension members.
And I would argue that there should be a record for every combination even if there was "no data". (One of the great inventions of early mathematicians was the number zero. Use it!)
Second, verify the numbers, especially for complicated metrics.
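
As a sketch of the first check, the fact table row count can be compared with the product of the dimension member counts, and "no data" rows can be checked for zeros. The table and column names below (sales_fact, date_dim, branch_dim, sales_total) are made up for illustration; substitute your own star schema and database connection.

    # Structural check on a star schema: exactly one fact row per combination of
    # dimension members, and zero (not NULL) where there was "no data".
    # All table and column names here are hypothetical examples.
    import sqlite3

    con = sqlite3.connect("warehouse.db")

    fact_rows   = con.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
    date_rows   = con.execute("SELECT COUNT(*) FROM date_dim").fetchone()[0]
    branch_rows = con.execute("SELECT COUNT(*) FROM branch_dim").fetchone()[0]

    expected = date_rows * branch_rows
    assert fact_rows == expected, f"expected {expected} fact rows, found {fact_rows}"

    # "No data" should have been loaded as zero, not left out or left NULL.
    nulls = con.execute(
        "SELECT COUNT(*) FROM sales_fact WHERE sales_total IS NULL").fetchone()[0]
    assert nulls == 0, f"{nulls} fact rows have NULL instead of zero"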

To test the dimensional model you can create simple "unit test reports".
The purpose here is to verify that the dimensions are set up properly and associated properly with the measures - especially if this is a DMR model - and that the roll-ups are working properly.
A "unit test report" can contain one or more crosstabs where the measure (or group of measures) is displayed along two dimensions.
If you have more than two dimensions then you may want to create a giant crosstab with nested dimensions or multiple smaller ones.

To test the final reports... well, just run them. It may be useful to create a checklist that you use to double-check each little thing. This would have items like "Page Number appears in footer", "title is all caps and centered" etc.

How much of this can be automated

Cognos has a feature called "jobs" that is very useful for testing.
Jobs basically let you preset the report parameters - by actually presenting the prompt pages but not running the report - and then run the report either right away or later at a scheduled time.

You can create the above-mentioned "unit test reports" and add them all to a job.

You can also add all your reports, fill out the prompt pages, and use this job for smoke testing.
If a report cannot be run the job will show that the execution failed.

If you need to run the same report several times with different parameter settings then use what Cognos calls the "report view"... it lets you do exactly this.

The outputs of the reports can be saved. I suggest saving as XHTML or XML.

Regression testing

You don't have to go over each report result by hand (or at least not every time).
You can save a reference version of each report result on your local file system and then use this to compare the results of the unit test or smoke test jobs.
You can write a simple script that downloads the latest report results (e.g. using wget) and compares them (e.g. with diffxml or the Altova XML tools) with the reference versions. This takes a couple of hours to write but it's well worth it.
Cognos supports a URL that looks something like this:
folder[name=...]/defaultOutput(report[name=...])
What I'm trying to emphasize is the defaultOutput() part, so your script does not have to know the most recent version number or date of each report result.
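
A rough sketch of such a script in Python (instead of wget plus a separate diff tool) might look like the following. The base URL, report names, authentication and the exact search-path encoding are placeholders you would have to adapt to your own Cognos installation; only the defaultOutput() idea comes from above.

    # Download the defaultOutput of each report and diff it against a saved
    # reference copy. URL format, report names and paths are placeholders.
    import difflib
    import urllib.parse
    import urllib.request
    from pathlib import Path

    BASE_URL = "http://cognos.example.com/cgi-bin/cognos.cgi?..."   # placeholder
    REPORT_NAMES = ["Unit Test - Sales", "Smoke Test - Branches"]   # your report names
    REFERENCE_DIR = Path("reference")
    LATEST_DIR = Path("latest")
    LATEST_DIR.mkdir(exist_ok=True)

    for name in REPORT_NAMES:
        # Something along the lines of folder[name=...]/defaultOutput(report[name=...]);
        # the exact path and encoding depend on your installation.
        url = BASE_URL + urllib.parse.quote(f"defaultOutput(report[name='{name}'])")
        latest = LATEST_DIR / f"{name}.xml"
        latest.write_bytes(urllib.request.urlopen(url).read())

        reference = REFERENCE_DIR / f"{name}.xml"
        diff = list(difflib.unified_diff(
            reference.read_text().splitlines(),
            latest.read_text().splitlines(),
            fromfile=str(reference), tofile=str(latest), lineterm=""))
        print(f"{name}: {'OK' if not diff else 'DIFFERS'}")
        print("\n".join(diff))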

The benefits once again:
  • smoke test job - allows a one-click smoke test
  • unit test job plus script - allows regression testing (à la JUnit)

One last thing

Of course, at least once you need to verify that your "reference version" of each report is correct. There is no way to automate that.
In my opinion the best approach is to construct some well-known test data.
Depending on the business, the logic behind each measure can get quite complicated, and the ETL part - massaging the transactional data set until you get the measure values in the star schema - can become very, very tricky.

I often hear from testers that they want to see a query to verify that the numbers in the star schema are correct.
I don't think that's a good approach. After all, I was happy to create one query... now should I create a second one to verify the first?
It's a much easier and more solid approach to work with data where the expected values are well known. Then there is no need for queries to get the value of a measure.
e.g. if there were 2 sales a day in January and 4 sales a day in February, then testing the Daily Average Number of Sales measure becomes easy...
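
Just to spell that example out: with hand-built test data the expected value can be computed up front, so the check becomes a plain assertion rather than a second query. The month lengths below assume a non-leap year; the data itself is the made-up example above.

    # With known test data the expected value is known in advance.
    days_in_jan, sales_per_day_jan = 31, 2
    days_in_feb, sales_per_day_feb = 28, 4

    total_sales   = days_in_jan * sales_per_day_jan + days_in_feb * sales_per_day_feb
    daily_average = total_sales / (days_in_jan + days_in_feb)

    assert abs(daily_average - 174 / 59) < 1e-9   # about 2.95 sales per day
    print(f"expected Daily Average Number of Sales: {daily_average:.2f}")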

I hope I gave you some ideas in regards to testing a Cognos solution...