DATA 521 is Time Series and Forecasting
The summary page
Forecasting is predicting the future. That’s hard. There is a certain science to forecasting that we can make use of, while recognizing that using the past to get leverage on the future requires assuming that the future resembles the past in relevant ways. This point should never be lost.
We follow the general workflow outlined by Rob J Hyndman and George Athanasopoulos in Forecasting: Principles and Practice. The key is the workflow. It begins by tidying the data to organize it around an index, a proper date/time that can be understood as appropriately sequential [with this sequence denotable], and a key that describes some set of distinct time series that are [potentially] stored with the same or similar index of time. Time is central to time series and to the tsibble. With this resolved, there is graphical, decomposition, and feature understanding before the application of models. Almost all the action is in adding models to the toolbox: basic time series regressions, ETS models of varying forms, ARIMA, dynamic regressions and their integration with ARMA, using STL for seasonal adjustment with ARMA or ETS models as STL+, and advances including aggregation, hierarchical and grouped time series and forecast reconciliation, prophet, VAR, tbats, NNets, and others.
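As a rough illustration of the tidy-then-model workflow described above, here is a minimal sketch in R. It assumes the fpp3 package (which attaches tsibble, fable, and dplyr) is installed; the data, the column names month, region, and sales, and the object names are all made up for the example. Automatic ARIMA() selection may additionally require the feasts and urca packages.

```r
# Assumption: fpp3 is installed; it attaches tsibble, fable, dplyr, and friends
library(fpp3)

# Hypothetical monthly data: two series ("A" and "B") sharing one time index
set.seed(521)
raw <- tibble(
  month  = yearmonth("2015 Jan") + rep(0:59, times = 2),
  region = rep(c("A", "B"), each = 60),
  sales  = 100 + rep(0:59, times = 2) + rnorm(120, sd = 5)
)

# index = the time variable; key = the variable(s) identifying each series
sales_ts <- raw |> as_tsibble(index = month, key = region)

# Fit several candidate models to every keyed series at once
fits <- sales_ts |>
  model(
    trend = TSLM(sales ~ trend()),  # basic time series regression
    ets   = ETS(sales),             # exponential smoothing, form chosen automatically
    arima = ARIMA(sales)            # ARIMA, orders chosen automatically
  )

# Forecast 12 months ahead for each key and each model
fc <- fits |> forecast(h = 12)
```

The result fc holds one row per region, model, and forecast month, so the same pipeline scales from one series to many simply by changing the key.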
The core issue remains the criteria for evaluation. Model fit tells us how well we do in the data that we have used to fit the model. That’s important to know. But we often really want to know which model is best over a given forecast horizon. We can use stretch_tsibble(.init, .step) to decide how much data is required to get a credible first forecast (.init) and in steps of what size (.step) to repeat it. This allows us to average over a whole set of forecasts made from successive forecast origins with the same model. Using accuracy(original_tsibble) on those forecasts allows us to answer the question of which model has been best at forecasting (h periods out, with steps of .step size) over the determined horizon. Because we only have one time series and cannot explicitly re-run time [but bootstrapping/bagging], we can at least know which model has performed best in our fixed time horizon task.
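The evaluation scheme above can be sketched as follows. This is only an illustration under stated assumptions: fpp3 is installed, the series y_ts is simulated, and the choices .init = 36, .step = 3, and h = 6 are arbitrary values picked for the example. The last six observations are held back before stretching so that every forecast has an observed value to be compared against.

```r
# Assumption: fpp3 is installed; it attaches tsibble, fable, dplyr, and friends
library(fpp3)

# Hypothetical monthly series with a linear trend plus noise
set.seed(521)
y_ts <- tsibble(
  month = yearmonth("2015 Jan") + 0:59,
  y     = 50 + 0.5 * (0:59) + rnorm(60, sd = 3),
  index = month
)

# Expanding training windows: the first has 36 observations and each
# later one adds 3 more; the new .id column identifies the window
cv_data <- y_ts |>
  slice(1:(n() - 6)) |>
  stretch_tsibble(.init = 36, .step = 3)

# Fit each candidate model on every window and forecast 6 months ahead
cv_fc <- cv_data |>
  model(
    naive = NAIVE(y),
    drift = RW(y ~ drift()),
    ets   = ETS(y)
  ) |>
  forecast(h = 6)

# Compare the forecasts with the original data, averaging over all windows
cv_acc <- cv_fc |> accuracy(y_ts)

# Smallest RMSE first
cv_acc |> arrange(RMSE)
```

The accuracy table has one row per model, so sorting by RMSE (or MAE, MAPE, and so on) ranks the candidates by how they actually forecast over this horizon rather than by in-sample fit.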
The course textbook:
Software:
Complete week 4 data for week 5:
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/Ch4HA.RData"))