I have a set of dimensions+facts from which I can derive a histogram of frequency by age (age in days, frequency in %).
I then use that histogram to forecast the distribution of an event over time - bascially by multiplying the magnitude of the event by the histogram.
The problem is, the histogram will always be noisey, especially when there's relatively little history in the cube, so what I want is to do some sort of area-preserving curve fit, and then use that computed curve instead of the raw histogram.
I've looked a little bit at data mining, but from what I understand, I don't think it can do what I need. I need this curve to be calculated for potentially 1000's of different slices of the cube, and I need it recalculated daily (or every time a report is run, ideally). It's not possible to "train" something with the curve ahead of time (which seems to be the core concept in data mining).
Borrowing the query from a previous thread...
with
member [Measures].[Offset] as
[Age].[Age].currentMember.memberValue
+ datediff(
"d",
[Date].[Date].currentMember.name,
StrToMember(@.Date).name
)
member [Measures].[Allocated] as
(
[Measures].[IRDC],
[Date].[Date].[All],
StrToMember("[Age].[Age].[" +
CStr([Measures].[Offset]) +
"]"
)
)
member [Measures].[Total] as
(
[Measures].[IRDC],
[Date].[Date].[All],
[Age].[Age].[All]
)
member [Measures].[Share] as
[Measures].[Allocated] / [Measures].[Total]
member [Measures].[ERDA] as
[Measures].[OEDA] * [Measures].[Share]
select
{
[Age].[Age].&[0]:[Age].[Age].&[0].lead(@.Window)
* [Measures].[ERDA]
} on columns,
{
order(
StrToMember(@.Date).lag(@.Window) : StrToMember(@.Date),
[Date].[Date].currentMember.memberValue,
desc
)
} on rows
from
[Cube]
In the above query, [Measures].[Share] is the histogram.
Any suggestions how to approach such a problem? Can a CLR "stored proc" be used to good advantage to do something like this? Any good references on how-to?
Anyone?|||Here is an interesting discussion I had about the time series algorithm and whether it would be possible to get a different time series for thousands of different slices. The thought I had was that you could pivot the dataset yourself so that each slice had one row in the dataset and the time series (histogram) was represented as separate columns. Then you could train one regression model which might be decent at looking at the data for that particular row/slice and spitting out a prediction:
http://www.sqlserverdatamining.com/DMCommunity/Newsgroup/1898.aspx
Generally, if you know how you want it to predict the future, you may be able to write custom code and be smarter than any generic data mining algorithm. So looking into writing an MDX sproc would be my suggestion. As far as generic MDX stored proc examples, the following has lots of good examples:
http://www.codeplex.com/ASStoredProcedures
Also, you might keep your eyes peeled for some statistical functions to be added to that project. No ETA for those yet, but it may be soon. They are mentioned here, and though I'm not sure if any of those functions will help you, they might be good examples:
http://www.codeplex.com/ASStoredProcedures/Thread/View.aspx?ThreadId=2581
Also, SQL Server Samples contains a few stored proc examples. One might be somewhat helpful as an example as it loops over a set, does some calculations, then returns a number. If you have installed the latest (February 2007?) samples, this function can be seen at the following path: <C:\program files\Microsoft SQL Server\90\Samples\Analysis Services\Programmability\AMO\AMOAdventureWorks\CS\StoredProcedures\StoredProcedures.cs>
public static float WeightedAverage(Set set, String weightExpression, String inputExpression)
{
float expression;
float weight;
Expression weightExpr = new Expression(weightExpression);
Expression inputExpr = new Expression(inputExpression);
float total = 0;
float totalWeight = 0;
foreach (Tuple tuple in set)
{
expression = (float)inputExpr.Calculate(tuple);
weight = (float)weightExpr.Calculate(tuple);
total += expression * weight;
totalWeight += weight;
}
if (totalWeight > 0)
{
return total / totalWeight;
}
else
{
return 0;
}
}
A final thought would be that if you can model this in Excel with various Excel formulas, you can use Excel formulas directly in an MDX query.
Hope some of those suggestions help.
|||Thanks for the suggestions - I'll follow up on those in a couple weeks when I get back to the question of curve-fitting this data.
No comments:
Post a Comment