Sunday, January 31, 2016

Look into the future!

I saw Peter Myers give a talk called "Introduction to Time Series Forecasting" at this year's SQL Saturday yesterday, and it had to do with using existing business intelligence data broken down across recorded time increments (for example ice cream sales, chocolate versus strawberry, per maybe calendar week which is close to his example if memory serves) to then extrapolate what comes next. If I am the ice cream vendor and I have years of weekly data and I want to know what sales will likely be like next week, one week into the future, I should be able to do that, right? Yes, your history may predict BI data points in a chronological order sense using the "Microsoft Time Series Algorithm" which is a feature of Microsoft Analysis Services, and, yes, you have to have Microsoft Analysis Services. Some things about this of note:

  • This has sort of been around since SQL Server 2005 starting with a different algorithm called ARTXP, but as of SQL Server 2008 a second algorithm called ARIMA has been spliced in with ARTXP to create the Microsoft Time Series Algorithm. You may configure the algorithm to only use ARTXP or to only use ARIMA and a concept called prediction smoothing will furthermore let you mix a little more or a little less ARTXP with your ARIMA. This suggests that ARTXP is more for the immediacy while ARIMA is for the far future. That is the distinction. Keep in mind that predictions are just predictions, so if you can game the likelihood of their credibility you will have incentive to do so. You can mess with prediction smoothing until you have the outcome you want not unlike channel surfing between CNBC and Fox News more or less to one than the other to see the world as you want to see it.
  • Specificity decays the farther you walk out into the future. The first projected period is just based on existing data, but the second will use the projected first, and so on. The Jenga game gets progressively less stable as fingers-crossed hopefulness gets optimistically stacked on top of more fingers-crossed hopefulness.
  • Supposedly there is a way to make a new "multidimensional" project in Visual Studio which cross talks to Microsoft Analysis Services and allows for "visualizations" which may then be drug into ASP.NET apps!
  • SSDT stands for SQL Server Data Tools and DMX stands for Data Mining Extensions. These may get mixed in somehow.

I saw Jennifer McCown speak on stored procedures in a different talk right before this one and there was only one thing said that was new and it ties into this subject matter so I will just mention it here in lieu of giving it its own blog posting. She suggested that if you are recording snapshot data off to a database daily (or on some other time-based interval) that you should NOT just skip a day when Tuesday's data looks just like Monday's data. How do you really know after the fact that the skipped time period represents a carryover of the old data and not a day when your ETL just failed to run?

No comments:

Post a Comment