# Model comparison with data sweeps

This slideshow requires JavaScript.

A useful exercise in modeling economic time series is to perform a “sliding window” analysis of the data that computes models in subsets of  the data and tests for the robustness of signal extractions, forecasts, and parameter variance relative to a growing subset of the data. For instance, for a time series of length 300, one could estimate a model on a shorter subset of the data, say for the first 200 observations, and then increase the amount of observations, re-estimate, and then see how the model parameter values change as the number of observations or data subset increases. One can also see how the signal extractions and forecasts change with additional data. Ideally, if the model is specified correctly for the data, there should be a very small variance in the estimated parameters as more data is added to the time series. It signifies the stability of the model selection. Normally, such an exercise would be tedious to carry out with X-13ARIMA-SEATS, or any other software such as MATLAB or R as scripts or spec files would have to be written for each individual re-estimation and then re-plotted. In the uSimX13 module of iMetrica however, this task has been rendered an easy one with the addition of a sliding windows tool.  In this blog entry, we describe this so-called “sliding windows” process and show just how fast and seamless it is to perform model choice robustness and comparisons in iMetrica.

We begin by describing the sliding span/window tool in the iMetrica-uSimX13 module. Once time series data has been loaded into the uSimX13 module from either the uSimX13 main menu or imported from the Data Control module, the uSimX13 computation engine must first be turned on from the uSimX13 menu. Then to access the sliding windows interface,  simply click on the “Sliding Span/Window Activate” check box in the main uSimX13 menu (see Figure 1).

Figure 1. Main drop down menu for the uSimX13 module, showing the “Sliding Span/Window Activate” check box.

Once clicked, the entire plotting canvas will turn to a dark shade of blue, which indicates the windowed region in which model estimation occurs. To control the sliding window, place the mouse cursor along one of the edges of the canvas and slowly glide the mouse with the left-mouse button held down either left or right, depending on which edge of the plot canvas you are on. Moving to the left or right with the left mouse button held down, the windowed area will shrink or expand. The model parameters are estimated instantaneously as the window adjusts and in effect, all the available model statistics, diagnostics, signals, and forecasts are computed as well. For example, as the window expands or shrinks, the trend, seasonally adjusted data, and 24-step ahead forecasts can be plotted and viewed in real-time as the window changes (see Figure 2). One can also slide the window to the left or right by placing the mouse anywhere inside the blue-windowed region, holding down the left mouse button and moving along the time domain. This way, the window length will remain fixed, but the window center will move along different subsets of the data. This can be useful for seeing how model parameters can change within regions of data that exhibit regime changes, namely a sequence in the series that suddenly changes in seasonal or cyclical structure after a certain time observation. The data can now be modeled in both sections before and after the regime change occurs in order to compare the estimated parameter values.

Figure 2. The window sliding across different subsets of the data. The signal extractions, forecast, and model parameters are recomputed automatically as the window changes. Forecast comparisons with the real data as the window span moves is now trivial. Here, the plot in cyan represents the original time series data in-sample and the 24 step forecast out-of-sample, and the light green plot is the time series data adjusted for outliers, as indicated in the model box. One can select the plots using the “series components” plot box. The data in gray represents the time series data not used in the model estimation.

Data Sweep

With the ability to seamlessly capture partitions of the data and model within the given partition using the sliding window, a natural extension of this mouse-on-canvas utility is to employ it somehow in comparing different models of the time series data. We call this method of model comparison time series data sweeping (or simply data sweeping) and it involves selecting an initial window of data from the first observation to the $n$-th observation where n is some number much less than the total number of observations $latex N$ in the data set (say, one third the amount). The data sweep then computes the sliding window from $n$ as the final observation all the way to $N$, in increments of one (see Figure 3). At each addition to the length of the window, the forecast is computed for up to 24 steps ahead. Of course, since the true time series data is known in the out-of-sample region of computation, we can compute the forecast error for up to $h \leq 24$ steps ahead and sum up these errors as $n$ increases to $N$. We can do this data sweep for several models, computing the aggregate forecast errors over time. The idea is that the best model for the data will ideally have the smallest forecast error, and thus comparing this forecast error with several models will identify the model with the best overall forecasting ability.

To access the data sweep, simply go to the main uSimX13 menu, shown in Figure 1, and click “Sweep Time Series Control Panel”. This will bring up the main interface for the data sweep (shown in Figures 4-6). To begin the sweep, first select the model and regressors desired to model the data with inside the model selection panel of the main uSimX13 interface. Then choose at which observation you’d like to carry out the data sweep (starting at observation $n=60$ is the default). Lastly, select how many forecast steps you’d like to use in computing the forecast error (1-24). Once content with the settings, click the “Compute time series sweep” button and watch as the window span increases from $n$ to $N$, recomputing parameters, signals, and forecasts at each step (see slideshow at top of post).  Once the sweep is complete, the parameter statistics, Ljung-Box mean value at two different lags, and the total forecast error is displayed in the control panel. To compare this with another model, save the results of the sweep by clicking “save parameters” in the uSimX13 menu, and then choose another model and recompute (while using the same settings as the previous sweep, of course).

To give an example of this process, we begin by simulating a time series data set of length $N = 300$ from a SARIMA model of dimension $(0,1,2)(0,1,1)_{12}$, namely a seasonal auto-regressive integrated moving-average process with two non-seasonal moving-average parameters, and one seasonal moving average parameter. The data sweep is performed on the simulated data with a forecast error horizon of length 23 using three different SARIMA models, (a) $(0,1,1)(0,1,1)_{12}$, (b) $(1,1,0)(0,1,1)_{12}$, (c) $(0,1,2)(0,1,1)_{12}$, the true model. See Figures 4-6 below to see the data sweep results and the estimated parameter mean and standard deviation, the average Ljung-Box statistics at lag 12 and 0, and the forecast errors for each model. Notice the forecast error for the true model (c) (figure 6) is the lowest followed by model (b) (figure 6) and then (a) (figure 4), which is exactly what we would want.

Figure 4. Model (a) and the parameter statistics, forecast error, and data sweep controls.

Figure 5. Model (b) and the parameter statistics, forecast error, and data sweep controls.

Figure 6. Model (c) (the true model) and the parameter statistics, forecast error, and data sweep controls.