Monday, December 29, 2008

EXISTING and query context

There have been some posts discussing multi-select problems in MDX. If, for example, you have a calculated measure using CURRENTMEMBER, the measure might not be computable when a set is selected as a WHERE condition. You can take a look at Mosha's blog for more information on this topic (see http://www.sqljunkies.com/WebLog/mosha/archive/2005/11/18/multiselect_friendly_mdx.aspx). Usually you can avoid the pitfalls of the multi select by changing the definition of your computed measures. The EXISTING operator is very helpful in this case. In this entry I concentrate on the EXISTING operator and the pitfalls you may be facing.

To illustrate the problem, here's a simple example:

WITH
MEMBER MyCounter AS
count(
Filter((
[Date].[Calendar].[Date]),
([Measures].[Gross Profit Margin] > 0.42)))
MEMBER NumDays AS
count((
[Date].[Calendar].[Date]))
SELECT
{
NumDays,
MyCounter
} ON 0
FROM [Adventure Works]

The above MDX defines a measure (MyCounter) that counts the days for which the gross profit margin (in total) was above 42%. It also returns the number of days selected. Running the query against the Adventure Works database returns something like this:

image3

So there are 1158 days in the Date attribute and 249 of them have a gross profit margin above 42%.

Please note that it's much better to rewrite the Count(Filter(...)) as a Sum(...) expression, as shown below, since this performs better on SQL Server 2008 Analysis Services. Mosha wrote an excellent entry on SQL Server 2008 block mode computations which can be found here: http://sqljunkies.com/WebLog/mosha/archive/2007/11/23/92198.aspx

Sum((
[Date].[Calendar].[Date]),
iif(
([Measures].[Gross Profit Margin] > 0.42),1,0))

For this post, however, I keep the Count(Filter(...)) version as I think it's easier to read.

Up to now there hasn't been anything special. But now we're going to add a WHERE clause to the query above:

WHERE
[Date].[Calendar].[Calendar Year].&[2004]

If you are not familiar with MDX set expressions you might expect a different query result after adding the WHERE condition. But that's not what happens: the query gives exactly the same result, seemingly ignoring our condition completely. The reason is that set expressions like the ones above are not aware of the current context defined by the WHERE condition.
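
By the way, the same is true for a real multi select, i.e. a set in the WHERE clause, for example (a sketch; the two years are just an illustration):

WHERE
{
[Date].[Calendar].[Calendar Year].&[2003],
[Date].[Calendar].[Calendar Year].&[2004]
}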

This is the reason why the new EXISTING operator was added in SQL Server 2005. The EXISTING operator forces the set expression to be evaluated in the current context. So we simply add this operator to our query like this:

WITH
MEMBER MyCounter AS
count(
Filter((EXISTING
[Date].[Calendar].[Date]),
([Measures].[Gross Profit Margin] > 0.42)))
MEMBER NumDays AS
count((EXISTING
[Date].[Calendar].[Date]))
SELECT
{
NumDays,
MyCounter
} ON 0
FROM [Adventure Works]
WHERE
[Date].[Calendar].[Calendar Year].&[2004]

With this easy change to our query we get the desired result:

image7

For example, think of a KPI or calculated measure. Without the EXISTING operator the calculation could be wrong when multiple days are selected, but with this easy change everything seems to be fine. But it's not that easy. To illustrate the problem, let's turn our computation into a cube script calculated member and access this calculation from Excel 2007:

CREATE MEMBER CURRENTCUBE.[MEASURES].MyCounter AS
count(
Filter((EXISTING
[Date].[Calendar].[Date]),
([Measures].[Gross Profit Margin] > 0.42)));

CREATE MEMBER CURRENTCUBE.[MEASURES].NumDays AS
count((EXISTING
[Date].[Calendar].[Date]));

Now when accessing this member in Excel 2007 you'll see that EXISTING fails if more than one day is selected, as shown below:

image_thumb

While the calculation works fine as long as only one day is selected, it seems to be ignored when more than one element is selected:

image_thumb1

(The value 249 is the value from our first computation that had no WHERE clause at all.)

The reason for this behavior is that Excel 2007 uses subcube queries in order to improve query performance, and for some reason EXISTING doesn't work in the subcube scope. The query that was executed by Excel can easily be captured using SQL Server Profiler. By the way, another cool way of capturing this query in Excel 2007 is to write a macro, as shown by Marco Russo in his blog.

SELECT
FROM
(
SELECT
{
[Date].[Calendar].[Date].&[1104],
[Date].[Calendar].[Date].&[1105]
} ON COLUMNS
FROM [Adventure Works]
)
WHERE
[Measures].[MyCounter]

The fact that EXISTING doesn't really work with subcubes is somewhat fatal, as we want to design our cube calculations in a way that makes them compatible with all kinds of clients. The problem is that the resulting errors are not easy to see and could be misleading.

So finally, even with the EXISTING keyword, multi selects are still an issue you really have to take care of in MS OLAP.

Friday, December 5, 2008

Formatting actual and goal for a KPI

Being able to specify the preferred format for measures and calculated measures in SSAS is a great feature since you define the format once and clients can leverage this central definition.

For a KPI expression (value, goal, status, trend), however, it's not directly possible to specify the format. But there is still a simple trick to do so: you just have to define the calculation itself as a calculated member within your cube script, like this:

create
member currentcube.Measures._BacktestingTotal as 100, visible=0

member currentcube.Measures._BacktestingValue as 5, visible=0 , format_string = "0.00%"

member currentcube.Measures._BacktestingTarget as 5, visible=0 , format_string = "0.00%"

I removed the real computation formulas for the three values and replaced them with constants because they would only be distracting here. But you can see how the format string is applied. As these members shouldn't be visible, I set visible=0.

In the KPI definition you can then simply point each expression to the corresponding calculated member like this:

image_thumb1
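
For example, the expression fields in the KPI editor might simply reference the hidden members defined above (a sketch; the member names are the ones from the script):

// Value Expression
[Measures].[_BacktestingValue]

// Goal Expression
[Measures].[_BacktestingTarget]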

By doing so, the format of the calculated member defines the format for the KPI value, target etc.

Saturday, November 15, 2008

Turning a non-natural hierarchy into a natural hierarchy

In the last post we had a very simple time dimension with a non-natural hierarchy. In this post I show how to turn this hierarchy into a natural hierarchy.

The basic idea is to modify the key column for the month attribute. As we still want to keep the original month attribute, I simply create a new one named 'H_Month' ('H' for hierarchy as we only want to use it in the hierarchy).

image_thumb

H_Month is just a simple copy of the month attribute. Now we edit the key columns of this attribute in the attribute's properties by clicking on the ellipsis:

image_thumb1

In the DataItem Collection editor we choose 'Add' which gives us a new key column:

image_thumb5

Now we can edit the source for the new key column by clicking on the ellipsis in the source field. We choose 'Column binding' as the binding type and take 'Year' as the key column as shown below:

image_thumb7

After clicking 'OK' we rearrange the key columns so that the Year-column comes first. The result looks like this:

image_thumb9

Before we can proceed we also have to set the name property of our attribute 'H_Month' because now we have more than one key column and therefore the name cannot be derived from the key. We choose the Month attribute as the name:

image_thumb13

Since we also want the months to be sorted correctly we set the OrderBy property of our new attribute to "Key":

image_thumb14

Now we can process and check our new attribute H_Month. The dimension browser shows something like this:

image_thumb16

Note that the month number is now repeated for every year, so we now have one member per month for each year. As this attribute is not what the user expects when selecting a month, we simply set the new attribute to invisible:

image_thumb18

Now we can build our attribute relationship like this:

image_thumb20

This relationship is correct because for each month in the attribute H_Month (defined by the key year/month) we only have one year as a parent. We can check this by using the BIDS Helper dimension health check:

image_thumb22

Now we only have to modify our hierarchy by switching the attribute in the second level:

image69

image_thumb29

After doing so the yellow warning sign disappears and we are finished! Since the Name property for our level remains unchanged, the hierarchy looks the same as before:

image70

You can check the dimension using the dimension browser in BIDS. Every year now has every month associated with it.

Of course, modeling attribute relationships becomes more complex the more attributes you have to take into account. The attribute lattice visualization in BIDS Helper can be very useful for checking the dimension layout in SQL Server 2005. And with SQL Server 2008 the dimension editor shows the attribute relationships in a much more convenient way.

Tuesday, October 28, 2008

Attribute relationship example

Setting proper attribute relationships for the attributes of a dimension is essential for query performance and usability. As Mosha wrote in his blog, it might be the most important piece of advice for cube designers to set the attribute relationships correctly. The full article can be viewed here and it is very detailed.

I still saw some cases recently where the improper use of attribute relationship settings caused wrong query results (which might even be worse than low performance), so I decided to post a wrap-up of this topic here.

First, you need to know that setting attribute relationships has these benefits:

  • Better query performance
  • More efficient methods for the cube to store the data of the hierarchies
  • An easier way to write MDX queries (as the dependent attribute is correctly positioned and can be referenced e.g. by currentmember), see below for an example
  • The dependent attribute may be used as metadata (especially helpful when the dependent attribute has its property "Enable attribute hierarchy" set to "false")
  • The higher ranked attribute may be used as the "Order By" attribute.

Let me provide an example for the MDX query. Think of a sample cube with a date dimension containing two attributes, month and monthname, and consider the following query:

with
member MonthMember as [Dim Time].[Monthname].currentmember.NAME

select MonthMember on rows,
[Dim Time].[Month].[Month] on columns

from [TestCube]

If there is no relationship between the month and monthname attributes, the monthname attribute is not changed when selecting the month attribute and therefore stays at its default, which is 'All'.

image_thumb1

But if there is a relationship, binding the monthname-attribute to the month-attribute, the monthname-attribute is automatically set to the corresponding member:

image_thumb31

This shows how query syntax is influenced by the proper use of attribute relationships. In the first example we would have had to try something with the LinkMember() MDX function to achieve the same result, as sketched below. So designing the attribute relationships in a proper way leads to queries that are easier to understand and perform better.
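
Just for illustration, such a workaround might look like this (a sketch; it assumes both attributes use the same key values, otherwise LinkMember cannot map between them):

with
member MonthMember as LinkMember([Dim Time].[Month].currentmember, [Dim Time].[Monthname]).NAME

select MonthMember on rows,
[Dim Time].[Month].[Month] on columns

from [TestCube]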

The data was loaded from a very simple date dimension table like shown below:

image_thumb18

The attribute relationship can be modeled in BIDS when designing a dimension. After creating the dimension using the wizard, all attributes are associated to the key attribute of the dimension as shown below:

image_thumb3

The attributes 'Month' and 'Monthname' have no relationship but are simply tied to the attribute 'Dim Time' which is the key attribute of our simple time dimension. If you build a hierarchy, like the YearMonth hierarchy in the screenshot above, you'll notice a yellow warning sign informing you that there is no relationship between Month and Year. You'll get further information when placing the mouse cursor on the warning sign:

image_thumb7

Whenever you see this yellow warning sign, we're dealing with a non-natural hierarchy.

The relationship can easily be set by dragging the 'Year' attribute to the 'Month' attribute. Now the same dimension looks like this:

image_thumb5

Notice that the yellow warning sign in the hierarchy title has disappeared. In this case we're dealing with a natural hierarchy.

But be warned: This doesn't mean that your dimension is healthy now. For our simple example we actually made a big mistake. If you check the dimension in the dimension browser, you'll see something like this:

image_thumb9

There are no months for the years 2006 to 2008 anymore! This also means that no fact data will be displayed for those months and that the aggregation will be wrong.

If you model an attribute relationship like Month -> Year you have to be absolutely sure that every month corresponds to exactly one year. You can verify this using a simple SQL query like the one below:

select count(distinct Year)
from dbo.DimTime
group by Month
having count(distinct Year)>1

This query should return no rows at all but in our case it returns

image_thumb10

So every month in our source table is mapped to four years which breaks our attribute relationship above.

If you have BIDS Helper installed on your machine you can also run a dimension health check from the dimension's context menu.

image_thumb11

In our case the result looks like this:

image_thumb17

You can clearly see the rows that break our relationship.

So in this simple example it would have been bad advice to simply set the attribute relationship as we did above. You should always check the relationship in your source data before designing such a relationship.

And even if the users say that the relationship is unique, you shouldn't rely on that. For example, a product might belong to a unique product group. But what happens if the association changes over time and you want to keep track of the changes using a slowly changing dimension of type 2? This can also easily break your attribute relationship. And as the MDX results might differ after changing the relationship, this could also have an influence on existing reports. So it's good advice to carefully design your hierarchies before you start creating reports, although it's not difficult to turn a non-natural hierarchy into a natural one. I'll post an example for this in the next post.

Tuesday, October 7, 2008

Returning MDX query results in an Excel matrix formula

In my last post I had an example of a function that returns a single value from an MDX query, which has certain performance drawbacks when used in many Excel cells.

Since Excel supports matrix formulas to do a computation on multiple values, this can also be used for MDX query results. Another benefit of this approach is that dimension members added later on are also displayed in the query result.

Before I post the function code, here's the result. For my test, I used this MDX query on the Adventure Works cube:

SELECT
{
[Measures].[Internet Sales Amount],
[Measures].[Internet Order Quantity],
[Measures].[Internet Gross Profit]
} ON 0,
[Product].[Category].[Category] ON 1
FROM [Adventure Works]

The result, when the query is run from SQL Server Management Studio, looks like this:

image4

The following screenshot shows how this result looks in Excel using the matrix MDX formula:

image9

Now, what are the #N/A entries about? The matrix formula measures the size of the resulting cell area and returns a two dimensional array of exactly that size. When entering the formula I chose a larger area on the worksheet in order to be prepared for additional entries on both axes. Of course you could think of returning a larger matrix from the function in order to properly fill out the unused cells. The best approach would be to supply width and height as parameters.

So, this is how the formula is to be entered:

image_thumb14

Be sure to press CTRL+SHIFT+ENTER when entering the formula in order to turn it into a matrix formula.

The MDGetMDX function takes five parameters:

Parameter Description Example
Server Hostname of the SSAS server localhost
InitialCatalog Name of the SSAS database Adventure Works DW
Cube Name of the SSAS cube Adventure Works
mdx MDX code of the query select ... on 0, ... on 1 from cubename
WithCaption When true, captions are printed true
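
A call might then look like the following sketch (in practice you would rather reference the MDX string from a worksheet cell, and remember to enter it as a matrix formula):

=MDGetMDX("localhost","Adventure Works DW","Adventure Works","SELECT {[Measures].[Internet Sales Amount]} ON 0, [Product].[Category].[Category] ON 1 FROM [Adventure Works]",TRUE)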

Finally, here's the code I used for the MDGetMDX(...) function:

Function MDGetMDX(Server As String, InitialCatalog As String, Cube As String, mdx As String, WithCaption As Boolean) As Variant
On Error GoTo errorhandler:
Dim cset As New ADOMD.Cellset
Dim conn As New ADODB.connection
Dim x As Variant
Dim i As Integer, j As Integer
Dim i0 As Integer, j0 As Integer ' begin of the data area
Dim i1 As Integer, j1 As Integer ' size of the data area
conn.Open "Data Source=" & Server & ";Provider=MSOLAP;Initial Catalog=" & CatalogName & "" & InitialCatalog & ""
cset.Open mdx, conn
If cset.Axes.Count > 2 Then
MDGetMDX = "More than 2 axes are not allowed!"
Exit Function
End If
If cset.Axes.Count > 0 Then i1 = cset.Axes(0).Positions.Count Else i1 = 0
If cset.Axes.Count > 1 Then j1 = cset.Axes(1).Positions.Count Else j1 = 0
If WithCaption Then
' column headings are displayed as rows
If cset.Axes.Count > 1 Then i0 = cset.Axes(1).DimensionCount Else i0 = 0
' row headings are displayed as columns
If cset.Axes.Count > 0 Then j0 = cset.Axes(0).DimensionCount Else j0 = 0
Else
i0 = 0
j0 = 0
End If
If cset.Axes.Count = 2 Then
ReDim x(j0 + j1 - 1, i0 + i1 - 1)
ElseIf cset.Axes.Count = 1 Then
ReDim x(j0, i0 + i1 - 1)
Else
ReDim x(1, 1)
End If
For i = 0 To UBound(x, 2)
For j = 0 To UBound(x, 1)
x(j, i) = ""
Next
Next
' Show caption:
If WithCaption Then
For i = 0 To i1 - 1
For j = 0 To cset.Axes(0).Positions(i).Members.Count - 1
x(j, i + i0) = cset.Axes(0).Positions(i).Members(j).Caption
Next
Next
For j = 0 To j1 - 1
For i = 0 To cset.Axes(1).Positions(j).Members.Count - 1
x(j + j0, i) = cset.Axes(1).Positions(j).Members(i).Caption
Next
Next
End If
If cset.Axes.Count = 2 Then
For i = 0 To i1 - 1
For j = 0 To j1 - 1
x(j + j0, i + i0) = nz(cset(i, j).Value, 0)
Next
Next
ElseIf cset.Axes.Count = 1 Then
For i = 0 To i1 - 1
x(j0, i + i0) = nz(cset(i).Value, "")
Next
Else
x(0, 0) = cset(0).Value
End If
MDGetMDX = x
cset.Close
conn.Close
Exit Function
errorhandler:
MDGetMDX = Err.Description
End Function

Function nz(x As Variant, other As Variant) As Variant
If Not IsNull(x) Then
nz = x
Else
nz = other
End If
End Function

Saturday, September 13, 2008

Cubevalue function for Excel 2002/2003?

Excel 2007 offers a lot of new functions for retrieving data from a SQL Server Analysis Services cube, like CUBEVALUE. While similar functions are also provided by the Excel 2002/2003 add-in, you can also add a VBA module in order to implement such functions in Excel 2002/2003.

First, simply create a new module and put this function inside the module:

Function MDGet(Server As String, InitialCatalog As String, Cube As String, ParamArray DimensionMembers() As Variant)
On Error GoTo errorhandler:
Dim cset As New ADOMD.Cellset
Dim conn As New ADODB.connection
conn.Open "Data Source=" & Server & ";Provider=MSOLAP;Initial Catalog=" & CatalogName & "" & InitialCatalog & ""
mdxstring = "Select from [" & Cube & "] where ("
For i = 0 To UBound(DimensionMembers)
mdxstring = mdxstring & DimensionMembers(i) & ","
Next i
mdxstring = Left(mdxstring, Len(mdxstring) - 1) & ")"
cset.Open mdxstring, conn
MDGet = cset(0).Value
cset.Close
conn.Close
Exit Function
errorhandler:
MDGet = Err.Description
End Function

In order to get this to work you need references to "Microsoft ActiveX Data Objects (Multi Dimensional)" and "Microsoft ActiveX Data Objects 2.8 Library" as shown in the screenshot below:

image_thumb

The new function takes three or more parameters:

Parameter Meaning Example
Server Hostname of the SSAS server localhost
InitialCatalog Name of the SSAS database on this server Adventure Works DW
Cube Name of the cube in this database Adventure Works
ParamArray Zero, one or more members for the WHERE clause (slicer) [Measures].[Internet Sales Amount], [Date].[Calendar Year].&[2004]

The function can be used like any other Excel built-in function. You can also use the dialog for entering the formula (by clicking on the Insert Function icon on the left side of the entry field).

image_thumb3

The above parameters simply result in this formula:

=MDGet("localhost","Adventure Works DW","Adventure Works","[Measures].[Internet Sales Amount]","[Date].[Calendar Year].&[2004]")

Of course you will want to take the server, database and cube name from Excel cells in order to change parameters more easily. And you can take the slicer members from other cells as well.

The final result may look like this:

image_thumb5

Note: Server, Database and Cube are aliases for B1, B2 and B3 respectively.

For larger Excel sheets using this function for many cells, you should consider some kind of connection handling in your code because opening and closing the connection for each cell has some performance drawbacks.

Another way to enhance performance is to return a matrix from the function, so one mdx query results in many field values on your Excel sheet. I'll post the code for this in my next blog entry.

Of course, the Excel 2007 functions are much more powerful. For example, in Excel 2007 you can define sets and address them in other cells. And you don't need the string concatenation we did above in order to turn the name of an element (e.g. 2004) into its unique member name (e.g. [Date].[Calendar Year].&[2004]) as this can be done more easily with the new CubeMember function in Excel 2007. But I still think the above function may be useful in cases where you don't have Excel 2007 installed.

Friday, August 15, 2008

How to round detail rows and totals?

There have been quite a few posts about rounding values in MDX or on a report. In most cases we want to do the rounding in the presentation layer of the application (as with other formatting issues) using formatting or maybe built-in round functions.

One issue with rounding is that the sum over the rounded details does not necessarily match the rounded total as in the following simple example:

Region Exact value Rounded for display
A 0.25 0
B 0.25 0
C 0.25 0
D 0.25 0
Total 1.00 1

As you can see from the last column, the sum of the rounded details is zero (as every rounded detail is zero), but the rounded total is 1 (as the total is exactly 1).

Basically you have three options for this rounding problem:

1. Round details and totals individually (like in the last column of the table above)
For our example this means that we have 0+0+0+0 giving a total of 1.

2. Compute the total as the sum over the rounded details (round first, then do the sum)
For our example this would mean 0+0+0+0 giving a total of 0

3. Adjust the detail rows so that they match the total when the total is computed by rounding the exact total
For our example this would mean something like 0+0+0+1 giving a total of 1

Option 1 is my preferred solution for most cases. Rounding always involves some kind of error, but option 1 doesn't have the side effects of options 2 and 3. Still, especially in financial reporting, option 1 may not be accepted as the detail rows don't sum up to the total. This leaves some kind of uncertainty. Is there an error in the computation? Are we missing some values? Usually you can handle this by placing a remark on the report like "Totals may not match details because of rounding".

Option 2 sounds very nice at first but has some serious drawbacks. Usually the total is more meaningful than the details, and you might want to compute the total not only by region but also e.g. by product group. Imagine the details for the product group are 0.5+0.5. Both would round to 1 so the total would be 2. Now for someone comparing the total by region (0 in this case) with the total by product group (2 in this case) the result is very confusing. Furthermore the error can get really serious when there are many records. So whatever your requirement is, I don't recommend this option.

Option 3 is more complicated than options 1 and 2. It needs to adjust one or more elements, which may also be confusing. In the example above we had 0+0+0+1=1, so the total really matches the details, but it wrongly appears as if only the fourth detail row were significant. This could lead to wrong decisions. Another disadvantage of this option is that the detail usually is a total of some kind too (an aggregated value). So you might compare the value for region D with some other report showing the same value. This could also cause confusion when those two values don't match.

So option 1 and option 3 are still under consideration. While option 1 can be easily implemented in the commonly used frontends, option 3 is more difficult. The remaining part of this post is about option 3.

As option 3 modifies the values in some way, I don't want to implement this in the cube but solve the problem in the presentation layer of the application. There is, however, one reason why you might think about implementing this kind of rounding in MDX: the presentation layer is not capable of doing it.

So maybe this post is a little bit off topic, but in the following I'm showing how to do this with Reporting Services (SSRS) instead of doing it in MDX.

As an example, I'm using a report with a simple mdx query based on the Adventure Works cube:

select {[Measures].[Amount] } on 0,
[Date].[Month of Year].[Month of Year] on 1
from [Adventure Works]
where ([Account].[Gross Sales],[Date].[Calendar Year].&[2003])

In order to get some rounding issues, let's say we want the Gross Sales in millions. I added a calculated field to the report's dataset like this:

image_thumb1
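
The calculated field, named AmountMio in the following expressions, might be defined like this (a sketch; it assumes the dataset column delivering the raw value is named Amount):

=Fields!Amount.Value/1000000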

Now, before getting into detail for the computation, let's look at the result (rounded values are rounded to one digit):

image_thumb10

The third column is what we saw as option 2 earlier in this post. We rounded the values and added the rounded values up to the total. This results in 25.3 and, as you can see, this doesn't match the rounded total as 25.574665 would round to 25.6, not to 25.3.

The last column now shows the modified detail rows. As you can see, some of the detail values are slightly different resulting in the correct rounded total.

But how can this be done? In order to make things a little easier to understand I added three more columns to the report showing the approach I took:

image_thumb7

The "Rounding error" column simply computes the error we made for each line. Therefor the expression looks like this:

=Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1)

The "Running rounded error" column computes the rounded running total over the previous column using this expression:

=Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
,1)

Actually this is the cumulated error that we have to add to the original rounding. But we are just interested in the steps of this cumulated error. For example, the first values are 0, 0.1, 0.1. For the second line our cumulated error goes from 0 to 0.1, so we want to add 0.1 to the rounded value. The difference between the second and the third line (both 0.1) is 0, so we don't need to make any correction there.

Therefore we also compute the cumulated rounding error of the previous line (simply by subtracting the current line's value) using this expression:

=Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
-(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1))
,1)

Finally we can compute the difference in the last column by combining the last two expressions into one:

=Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
,1)
-Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
-(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1))
,1)

This difference gives the correction for our rounded value column, so finally here is the "Corrected detail rows" column:

=Round(Fields!AmountMio.Value,1)+
Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
,1)
-Round(
RunningValue(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1),Sum,Nothing)
-(Fields!AmountMio.Value-Round(Fields!AmountMio.Value,1))
,1)

Again, it's rather complicated to handle this kind of rounding and the way I showed here might not be perfect. Maybe you would rather adjust the detail rows with the highest error instead of cumulating the error from line to line until the error itself rounds up and can be used to correct the next detail row.

But it might still be helpful when your requirement is to present comprehensible and traceable totals on your report and you're tempted to write complicated mdx code as the report's data source.

Saturday, July 26, 2008

Some more samples for data aware images created in a stored procedure

In a previous post I showed an example of how to return a dynamic bitmap graphic from an SSAS stored procedure and we also displayed a Win/Loss chart using this technique. Now I just want to add some other ideas for creating such sparklines.

Of course, the first idea is to plot some micro line charts. And it really isn't difficult. The result looks somewhat like this:

image_thumb1

The main code for the plot looks like this (I've removed some lines to increase readability):

// Create Bitmap
Bitmap bmp = new Bitmap(plot_width, plot_height, PixelFormat.Format32bppArgb);
Graphics gBmp = Graphics.FromImage(bmp);

// calculate min and max values
...

// do the output
for (i = 1; i <= count; i++)
{
float x1 = (float)(i * ((double)plot_width) / (double)count);
float y1 = (float)((q[i]-t_min) * scale);
gBmp.DrawLine(Pens.Black, x0, plot_height - y0, x1, plot_height - y1);
x0 = x1;
y0 = y1;
}
gBmp.Dispose();

// Serialize the output
MemoryStream IS = new MemoryStream();
bmp.Save(IS, ImageFormat.Png);
IS.Flush();
return Convert.ToBase64String(IS.ToArray());

The code is very similar to the code I posted about two weeks ago. First we create the Bitmap and Graphics objects (the canvas to paint on). Then we plot the line and finally we serialize the output to a base 64 encoded string.

Another idea might be to turn the output of the stored procedure for the quantile calculation from my last post into a nice graphic.

The result may be rendered to a histogram output like this:

image_thumb2

Another way of displaying this in a more granular fashion is shown in the example below. Like in the first plot, the red area is below the quantile.

image_thumb4

The code for both outputs is a bit lengthy but the basic idea is still very simple. Just create a canvas with the code posted before and draw the graphic on that canvas. After that, serialize the canvas to a base 64 encoded string. On the report, it just requires an image component that takes the base 64 encoded string as its source, so the string needs to be converted using the function Convert.FromBase64String(...).
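
In Reporting Services this simply means setting the image's value property to an expression like the following (assuming the query returns the encoded image in a column named plot):

=Convert.FromBase64String(Fields!plot.Value)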

So, rendering graphics in stored procedures can be a really flexible way of creating small data-aware graphics.

Such graphics can be really helpful in understanding the meaning of a report at a glance. But they can also be very confusing. So, in my opinion, it's best practice to keep sparkline graphics (and all other small graphics that are repeated per line) as simple as possible.

In most cases I really don't recommend creating sparklines or other kinds of micro graphics using the approach above. There are excellent toolsets out there, for example by BonaVista Systems or Bissantz that really simplify the process of drawing sparklines a lot and that even work with Excel, Reporting Services or other frontend clients. So in most cases you will want to use such tools. Only for the rare cases that you really need to create a special chart that is not supported by such toolsets, you might consider returning the graphic by creating your own stored procedure like shown above.

Saturday, July 19, 2008

Some more thoughts on quantiles...

In a previous post some weeks ago I wrote about calculating a quantile in MDX. Of course you could also think of using an SSAS stored procedure for the calculation, as stored procedures offer the full range of .NET programmability and libraries for doing the calculations. In my last post about quantiles I simplified the calculation a lot by assuming that our computation results in a single element. This made the calculation easy to write in MDX. But if you need the quantile at a higher precision you could interpolate between two discrete elements.

First let's take a brief look at the simplification we made when calculating the quantile in MDX: the calculation of the quantile is simple in the case that (n-1) * p% gives an integer (n = number of cases, p% = percent of the quantile). In our example we had 10001 scenarios (n=10001) and computed a 5% quantile (p%=5%) giving (n-1)*p% = 500. So we simply have to take the 501st element from the sorted set of cases.

If y:=1+(n-1)*p% is not an integer, we have to use the following calculation:

Q = S(a) + ( S(b) - S(a) ) * ( y - a )

where a is the integer part of y, b = a + 1 and S(a), S(b) are the scenario values at index a and b. The fractional part (y - a) interpolates linearly between the two neighboring values; for example, n = 11 and p% = 5% give y = 1.5, so Q lies halfway between S(1) and S(2). There are other special cases for the computation but I don't want to get into too much detail here.

The following code just shows the simple calculation (sort the elements and take the Nth element) but it can easily be extended to the more sophisticated calculation:

using System;
using System.Collections.Generic;
using Microsoft.AnalysisServices.AdomdServer;
using Microsoft.AnalysisServices;

namespace ASStatistics
{
public class Statistic
{
public static double Quantile(Set inputSet, Expression inputExpression, double quantile)
{

if (inputSet.Tuples.Count == 0) return 0;

System.Diagnostics.Debug.Assert(quantile >= 0 & quantile < 1, "Error in Quantile calculation");
double[] q=new double[inputSet.Tuples.Count];
int i=0;

foreach (Tuple tuple in inputSet)
{
q[i] = inputExpression.Calculate(tuple).ToDouble();
i++;
}

Array.Sort(q);

int position = (int)(quantile * i);
return q[position];
}

}
}

After installing this library with the stored procedure on our SSAS server (e.g. using SSMS), we can test it using MDX:

with
member [Quantile_0] as ASStatistics.Quantile([Date].[Date].[Date],[Measures].[Gross Profit],0)
member [Quantile_5] as ASStatistics.Quantile([Date].[Date].[Date],[Measures].[Gross Profit],0.05)
member [Quantile_20] as ASStatistics.Quantile([Date].[Date].[Date],[Measures].[Gross Profit],0.20)

select {[Quantile_0],[Quantile_5],[Quantile_20]} on 0
from [Adventure Works]

image_thumb

So the worst profit on a single day was -599576$, but on only 5% of the days was the profit below 986$, and on only 20% of the days was it below 4576$.

Now, the code above should be easy to adapt to all the cases for the quantile computation. If I find some time to do so, I'll post it later on.

So it's really easy to do the calculation in a stored procedure. But how about performance? I used the following queries to measure the performance (both formatted by the current release of Mosha's MDX Studio). First the query for the MDX calculation:

WITH
MEMBER [QuantileMDX] AS
max(
bottomcount(
[Date].[Date].[Date],
(int(((Count([Date].[Date].[Date]) - 1) * 0.05)) + 1),
[Measures].[Gross Profit]),
[Measures].[Gross Profit])
SELECT
[QuantileMDX] ON 0
FROM [Adventure Works]

And here's the query for the stored procedure approach:

WITH
MEMBER [QuantileStoredProcedure] AS
ASStatistics.Quantile(
[Date].[Date].[Date],
[Measures].[Gross Profit], 0.05)
SELECT
[QuantileStoredProcedure] ON 0
FROM [Adventure Works]

Before measuring performance it's important to take the OLAP cache into account. The cache can be cleared using a simple XMLA script like this (see here for more details):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Object>
<DatabaseID>Adventure Works DW</DatabaseID>
</Object>
</ClearCache>
</Batch>

So here are the results of the above queries (tested on my notebook):

  Cold Cache Warm Cache
MDX 188ms 36ms
Stored Procedure 217ms 63ms

The values differed a little so I took the average over 10 executions (removing the best and worst case first). Of course the comparison is not absolutely accurate because the calculation method, the implementation of sort in both languages etc. also have an influence on the result. Furthermore, both functions could be optimized (maybe you want to use the MDX Tail function instead of Max or try another approach).

I still think the result is significant. While there is not a big difference on a cold cache, the difference becomes much bigger on a warm cache. The MDX variant benefits more from the cache than the stored procedure approach.

While experimenting with the stored procedure I also added another variant that only takes a set and the quantile as parameters:

public static double Quantile(Set inputSet, double quantile)

The only difference to the version above is that I evaluate the set results using

MDXValue.FromTuple(tuple).ToDouble();

instead of

inputExpression.Calculate(tuple).ToDouble();

This way, I can call the function in both ways (with two or three parameters):

ASStatistics.Quantile(([Date].[Date].[Date],[Measures].[Gross Profit]), 0.05)

or

ASStatistics.Quantile([Date].[Date].[Date],[Measures].[Gross Profit], 0.05)

It's quite interesting (and I didn't expect this) that the variant with two parameters is much slower. Even with a warm cache it took about 1500ms to compute the above result (more than 9 seconds on a cold cache).

So, although passing sets to stored procedures is no longer a general performance bottleneck, you should still be aware of performance issues.

Saturday, July 12, 2008

Win/Loss chart as dynamic image created in SSAS stored procedure

Sascha Lorenz posted the source code for creating win/loss charts in his blog. I really like this kind of chart as it is especially useful for sparklines. Sascha created the bitmap for the chart dynamically on demand using ASP.NET. I thought this could be a great fit for our interactive graphics from the previous posts, so that we can easily define them inside the cube and use them in an SSRS report.

First, let's look at the result:

image4

The underlying data is just an example. It shows the days on which the order amount is higher (green) or lower (red) compared to the previous day covering the last 32 days.

The MDX query for our report looks like this:

WITH
MEMBER [WLPlot] AS
ASStatistics.WinLossChart(
tail(
nonempty(
[Date].[Date].[Date],
[Measures].[Sales Amount]),32),
[Measures].[Sales Amount], (
[Measures].[Sales Amount],
[Date].[Date].currentmember.prevmember))
SELECT
{
[WLPlot],
[Gross Profit Margin]
} ON 0,
[Product].[Category].[Category] ON 1
FROM [Adventure Works]

I turned Sascha's function into a stored procedure named WinLossChart. WinLossChart takes three arguments:

Argument Description
inputset The set over which the expressions are to be evaluated
valueExpression The expression for the value
thresholdExpression The expression for the threshold

If the value is higher than the threshold we paint a green bar; if it is lower, we paint a red bar. For the simple example above we made Sales Amount the value expression and the Sales Amount of the previous day the threshold expression. Of course, the threshold could also be a simple constant, as sketched below.
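
A constant threshold variant might look like this (just a sketch; it assumes that a numeric literal is accepted for the thresholdExpression parameter):

WITH
MEMBER [WLPlot] AS
ASStatistics.WinLossChart(
tail(
nonempty(
[Date].[Date].[Date],
[Measures].[Sales Amount]),32),
[Measures].[Sales Amount],
1000000)
SELECT [WLPlot] ON 0
FROM [Adventure Works]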

So, here's the code for the stored procedure. The first function returns the bitmap and is very close to the code Sascha posted in his blog entry:

public static Bitmap CreateWinLossBitmap(double[] q, double[] threshold)
{
int i;
int l = q.Length;

Bitmap objBitmap = new Bitmap(5 * l + 20, 22);
Graphics objGraphic = Graphics.FromImage(objBitmap);

objGraphic.FillRectangle(Brushes.White, 0, 0, 5 * l + 20, 30);
for (i = 0; i < l; ++i)
{
if (q[i] == Double.MinValue)
objGraphic.FillRectangle(Brushes.Gray, 10 + (i * 5), 9, 4, 4);
else
if (q[i] > threshold[i])
objGraphic.FillRectangle(Brushes.Green, 10 + (i * 5), 2, 4, 7);
else
objGraphic.FillRectangle(Brushes.Red, 10 + (i * 5), 10, 4, 7);
}

return objBitmap;
}

And here's the function wrapping the above code into an SSAS stored procedure. This is the function we call from MDX:

public static string WinLossChart(Set inputset, Expression valueExpression, Expression thresholdExpression)
{
double[] q = new double[inputset.Tuples.Count];
double[] thr = new double[inputset.Tuples.Count];
int idx = 0;


foreach (Tuple tuple in inputset)
{
q[idx] = valueExpression.Calculate(tuple).ToDouble();
thr[idx] = thresholdExpression.Calculate(tuple).ToDouble();
idx++;
}

Bitmap bmp = CreateWinLossBitmap(q, thr);
MemoryStream IS = new MemoryStream();
bmp.Save(IS, ImageFormat.Png);
IS.Flush();

return Convert.ToBase64String(IS.ToArray());
}


For building the library you need references to

  • Microsoft.AnalysisServices (Microsoft.AnalysisServices.dll)
  • Microsoft.AnalysisServices.AdomdClient (Microsoft.AnalysisServices.AdomdClient.dll)
  • Microsoft.AnalysisServices.XMLA (Microsoft.AnalysisServices.XMLA.dll)
  • msmgdsrv (msmgdsrv.dll)

If you don't have the above assemblies installed on your machine you can take them from a SQL Server installation and install them into the global assembly cache (GAC) using

Gacutil.exe /i assembly_name.dll

You also have to add references to these libraries in your C# project like this:


<ItemGroup>    
<Reference Include="Microsoft.AnalysisServices, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL">
<SpecificVersion>False</SpecificVersion>
</Reference>
<Reference Include="Microsoft.AnalysisServices.AdomdClient, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL">
<SpecificVersion>False</SpecificVersion>
</Reference>
<Reference Include="Microsoft.AnalysisServices.XMLA, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL">
<SpecificVersion>False</SpecificVersion>
</Reference>
<Reference Include="msmgdsrv, Version=9.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=x86">
<SpecificVersion>False</SpecificVersion>
<Private>False</Private>
</Reference>
...
</ItemGroup>

Of course the code above is just meant to show the basic idea and should be rewritten for a production environment.



Thanks to Sascha for the visualization. Again, I recommend reading his blog entry for the original code.

Saturday, July 5, 2008

Returning an image from an SSAS stored procedure

I just played with a simple way to return an image from a stored procedure by returning a base 64 encoded string of the image in a certain image format (e.g. PNG or JPEG).

Our sample stored procedure looks like this:

public static String SampleImage()
{
Bitmap bmp = new Bitmap( 80, 20, PixelFormat.Format32bppArgb);
Graphics gBmp = Graphics.FromImage(bmp);
System.Random RndObject = new System.Random();

gBmp.CompositingMode = CompositingMode.SourceCopy;

int y=10;
int y1 = 0;
Pen p = new Pen(Color.Black);
gBmp.DrawLine(p, 0, 10, 80, 10);

p.Color = Color.Blue;

p.Width = 1;
for (int i=10;i<=80;i+=10) {
y1 = RndObject.Next(1, 19);
gBmp.DrawLine(p, i - 10, y, i, y1);
y = y1;
}

MemoryStream IS = new MemoryStream();
bmp.Save(IS, ImageFormat.Png);
IS.Flush();

return Convert.ToBase64String(IS.ToArray());
}

The function draws a very simple random line chart (we provided no real data here, but later posts will provide samples) and returns the resulting in-memory bitmap as a base 64 encoded string.

Therefore, the MDX code

with member bmp as ASStatistics.SampleImage()
select {bmp} on columns,
[Product].[Product].[Product] on 1
from [Adventure Works]

returns a list of products together with the base 64 encoded image string. We use this MDX query as the report's data source. In order to show the image on a report, we have to use the .NET function Convert.FromBase64String(...). So first we place an image object on our report, then we set the value property of the image to

=Convert.FromBase64String(Fields!bmp.Value)

That's all folks. The result looks like this:

image_thumb3

Ok, this isn't exactly nice, but it still gives us some freedom in designing custom data aware graphics, like sparklines or other micro charts, and putting them on a report.

Tuesday, July 1, 2008

Online version of MDX studio available

One of the great features in the current release of Mosha's MDX studio is the ability to format MDX queries. Now this feature is available online as well using this URL: http://mdx.mosha.com/. For more details see Mosha's blog entry.

If you need to format SQL queries, you can also use online services like sqlHere.

Sunday, June 29, 2008

Binomial distribution for a KPI status (part 2)

In the last post we discussed some background on statistics. Now let's turn this into a KPI status. If the parameters of the test are well known in advance, the limits for our KPI can be precomputed, for example using Excel formulas.

But if the parameters are based on user input, the computation of the thresholds has to be done in our OLAP engine (MDX). For example, in risk management we compute the Value-at-Risk (VaR) for each day. The VaR is a measure giving us the maximum loss at a certain confidence level. If our confidence level is e.g. 95%, the VaR gives our maximum loss for 95 out of 100 days. In the remaining 5 days we expect our loss to be higher (so the VaR is actually a quantile of the probability function of our win/loss). The computation of the VaR requires some kind of model, and during the backtesting process we make sure that the model is working properly. So our hypothesis is that the model is correct, while the alternative hypothesis is that the model is wrong. Our model is correct if the VaR computed from the model matches the actually observed losses. We count the days on which the actual loss was higher than predicted by the VaR. If this happens on about 5% of the days, we can say that our model is correct. If not, the model may be incorrect or the sample isn't representative.

If the number of days (maybe the period) is a parameter (maybe a report parameter or a value taken from another dimension), we have to do the computation in the OLAP model.

For this, we have to start with the binomial distribution given by

P(X = k) = C(n,k) * p^k * q^(n-k)

where n is the total number of cases (days for which we're doing the backtesting), k is the number of observed losses higher than the predicted loss, p is the probability for such a higher loss (in our example 5%), q=1-p (the confidence level of our VaR calculation) and

C(n,k) = n! / (k! * (n-k)!)

For example for k=0 we get

P(X = 0) = q^n

As we've seen in the last post, we need the aggregated probability which looks like this:

F(k) = P(X <= k) = sum(i=0..k) C(n,i) * p^i * q^(n-i)

Now we can define our KPI status for example as

status = 1 if F(k) <= 95%, status = 0 if 95% < F(k) <= 99.99%, status = -1 otherwise

The calculation of the above function is a little tricky and cannot easily be done in MDX. We could try to call the Excel function BINOMDIST, but I didn't get it to work (not all Excel functions are available in MDX; the attempted call is sketched below). So here is an approach using an SSAS stored procedure instead.
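
For reference, the attempt would have looked something like the following (a sketch using the Excel! external function syntax; as said, I couldn't get it to work, since BINOMDIST doesn't seem to be among the Excel functions exposed to MDX):

WITH
MEMBER bindist AS
Excel!BinomDist(5, 500, 0.01, true)
SELECT bindist ON 0
FROM [Adventure Works]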

Instead of computing the sum over the binomials we actually compute the incomplete beta integral, which works much better than dealing with the large numbers resulting from the factorials. There is an excellent set of statistical functions available for free at http://www.alglib.net and I used the code from there (see http://www.alglib.net/specialfunctions/distributions/binomial.php to download the code).

An SSAS stored procedure doesn't have many requirements. In its minimal form, it consists of a single static class with one or more public static functions. The code has to be compiled to a class library (an assembly DLL file) which can then be referenced from SSAS (by simply adding it to the assemblies using Management Studio). There are many code samples available at Codeplex: http://www.codeplex.com/ASStoredProcedures

In our case, the code for the class (C#) simply looks like this:

using System;

namespace ASStatistics
{
public class Statistic
{
public static double BinomialDistribution(int k, int n, double p) {
return binomialdistr.binomialdistribution(k, n, p);
}

}
}

The function call is taken from the AlgLib library above. The MDX code for calling the library function (once the assembly is registered in SSAS) may look like this:

WITH
MEMBER bindist AS
ASStatistics.BinomialDistribution(5, 500, 0.01)
SELECT
bindist ON 0
FROM [Adventure Works]

This gives the following result:

image_thumb

So, if the number of cases is fixed, say 500, the KPI status expression for our Backtesting KPI may look like this:

case
when ASStatistics.BinomialDistribution(KPIVALUE("Backtesting"),500,0.01) <= 0.95 then 1
when ASStatistics.BinomialDistribution(KPIVALUE("Backtesting"),500,0.01) <= 0.9999 then 0
else -1
end

The above confidence levels of 95% and 99.99% correspond to the Basel II recommendation for backtesting of the VaR.