iMetrica for Linux Ubuntu 64 now available

The MDFA real-time signal extraction module

The MDFA real-time signal extraction module

My first open-source release of iMetrica for Linux Ubuntu 64 can now be downloaded at my Github, with a Windows 64 version soon to follow. iMetrica is a fast, interactive, GUI-oriented software suite for predictive modeling, multivariate time series analysis, real-time signal extraction, Bayesian financial econometrics, and much more.

The principal use of iMetrica is to provide an interactive environment for the numerical and visual analysis of (multivariate) time series modeling, real-time filtering, and signal extraction. The interactive features in iMetrica boast a modeling and graphics environment for analysts, practitioners, and students of econometrics, finance, and real-time data analysis where no coding or modeling experience is necessary. All the system needs is data which can be piped into the system in many forms, including .csv, .txt, Google/Yahoo Finance, Quandle, .RData, and more. A module for connecting to MySQL databases is currently being developed. One can also simulate their own data from a one or a combination of several different popular data generating models.

With the design intending to be interactive and self-enclosed, one can change modeling data/parameter inputs and see the effects in both graphical and numerical form automatically. This feature is designed to help understand the underlying mechanics of the modeling or filtering process. One can test many attributes of the modeling or filtering process this way both visually and numerically such as sensitivity, nonlinearity, goodness-of-fit, any overfitting issues, stability, etc.

All the computational libraries were written in GNU C and/or Fortran and have been provided as Native libraries to the Java platform via JNI, where Java provides the user-interface, control, graphics, and several other components in a module format, and where each module specializes in a different data analysis paradigm. The modules available in this open-source version of iMetrica are as follows:

1) Data simulation, modeling and fitting using several popular econometric models

  • (S)ARIMA, (E)GARCH, (Multivariate) Factor models, Stochastic Volatility, High-frequency volatility models, Cycles/Trends, and more
  • Random number generators from several different types of parameterized distributions to create shocks, outliers, regression components, etc.
  • Visualize in real-time all components of the modeling process

2) An interactive GUI for multivariate real-time signal extraction using the multivariate direct filter approach (MDFA)

  • Construct mulitvariate MA filter designs, classical ARMA ZPA filtering designs, or hybrid filtering designs.
  • Analyze all components of the filtering and signal extraction process, from time-delay and smoothing control, to regularization.
  • Adaptive real-time filtering
  • Construct financial trading signals and forecasts
  • Includes a real-time/frequency analysis module using MDFA

3) An interactive GUI for X-13-ARIMA-SEATS called uSimX13

  • Perform automatic seasonal adjustment on thousands of economic time series
  • Compare SARIMA model choices using several different novel signal extraction diagnostics and tools available only in iMetrica
  • Visualize in real-time several components of modeling process
  • Analyze forecasts and compare with other models
  • All of the most important features of X-13-ARIMA-SEATS included

4) An interactive GUI for RegComponent (State Space and Unobserved Component Models)

  • Construct unobserved signal components and time-varying regression components
  • Obtain forecasts automatically and compare with other forecasting models

5) Empirical Mode Decomposition

  • Applies a fast adaptive EMD algorithm to decompose nonlinear, nonstationary data into a trend and instrinsic modes.
  • Visualize all time-frequency components with automatically generated 2D heat maps.

6) Bayesian Time Series Modeling of ARIMA, (E)GARCH, Multivariate Stochastic Volatility, HEAVY models

  • Compute and visualize posterior distribtions for all modeling parameters
  • Easily compare different model dimensions

7) Financial Trading Strategy Engineering with MDFA

  • Construct financial trading signals in the MDFA module and backtest the strategies on any frequency of data
  • Perform analysis of the strategies using forward-walk schemes
  • Automatically optimize certain components of the signal extraction on in-sample data.
  • Features a toolkit for minimizing probability of backtest overfit

Tutorials on how to use iMetrica can be found on this blog and will be added on a weekly basis, with new tools, features, and modules being added and improved on a consistent basis.

Please send any bug reports, comments, complaints, to clisztian@gmail.com.

High-Frequency Financial Trading with Multivariate Direct Filtering I: FOREX and Futures

Animation 1: Click to see animation of the Japanese Yen filter in action on 164 hourly out-of-sample observations.

Animation 1: Click to see animation of the Japanese Yen filter in action on 164 hourly out-of-sample observations.

I recently acquired over 300 GBs of financial data that includes tick data for over 7000 financial assets traded on multiple markets for the past 5 years up until February 1st 2013. This USB drive packed with nearly every detail of world financial markets coupled with iMetrica gave me an opportunity to explore at any fashion to my desire the ability of multivariate direct filtering to produce high performance financial trading signals on nearly any high-frequency. Let me begin this article with saying that I am more than ecstatic with the results, as I hope you will too after reading this article.  In this first article in a series of high-frequency trading with MDFA and iMetrica that I plan to write, I provide some initial experiments with building and extracting financial trading signals for high-frequency intraday observations on foreign exchange (FOREX) data, and by high-frequency in the context of this article, I mean higher frequencies than the daily log-returns I’ve been working with in my previous articles. In the first part of this high-frequency series, I begin by exploring hourly, 30 minute, and 15 minute log-returns, and test different strategies, mostly using low-pass and the recently introduced multi-bandpass (MBP) filter to deduce the best approach to tackle the problem of building successful trading signals in higher frequency data.

In my previous articles, I was working uniquely with daily log-return data from different time spans from a year to a year and a half. This enabled the in-sample period of computing the filter coefficients for the signal extraction to include all the most recent annual phases and seasons of markets, from holiday effects, to the transitioning period of August to September that is regularly highly influential on stock market prices and commodities as trading volume increases a significant amount. One immediate question that is raised in migrating to higher-frequency intraday data is what kind of in-sample/out-of-sample time spans should be used to compute the filter in-sample and then for how long do we apply the filter out-of-sample to produce the trades? Another question that is raised with intraday data is how do we account for the close-to-open variation in price? Certainly, after close, the after-hour bids and asks will force a jump into the next trading day. How do we deal with this jump in an optimal manner? As the observation frequency gets higher, say from one hour to 30 minutes, this close-to-open jump/fall should most likely be larger. I will start by saying that, as you will see in the results of this article, with a clever choice of the extractor \Gamma and explanatory series, MDFA can handle these jumps beautifully (both aesthetically and financially). In fact, I would go so far as to say that the MDFA does a superb job in predicting the overnight variation.

One advantage of building trading signals for higher intraday frequencies is that the signals produce trading strategies that are immediately actionable. Namely one can act upon a signal to enter a long or short position immediately when they happen. In building trading signals for the daily log-return, this is not the case since the observations are not actionable points, namely the log difference of today’s ending price with yesterday’s ending price are produced after hours and thus not actionable during open market hours and only actionable the next trading day. Thus trading on intraday observations can lead to better efficiency in trading.

In this first installment in my series on high-frequency financial trading using multivariate direct filtering in iMetrica, I consider building trading signals on hourly returns of foreign exchange currencies. I’ve received a few requests after my recent articles on the Frequency Effect in seeing iMetrica and MDFA in action on the FOREX sector. So to satisfy those curiosities, I give a series of (financially) satisfying and exciting results in combining MDFA and the FOREX. I won’t give all my secretes away into building these signals (as that would of course wipe out my competitive advantage), but I will give some of the parameters and strategies used so any courageously curious reader may try them at home (or the office). In the conclusion, I give a series of even more tricks and hacks.  The results below speak for themselves  So without further ado, let the games begin.

Japanese Yen

Frequency: One hour returns
30 day out-of-sample ROI: 12 percent
Trade success ratio: 92 percent

Yen Filter Parameters: \lambda = 9.2 \alpha = 13.2, \omega_0 = \pi/5
Regularization: smooth = .918, decay = .139, decay2 = .79, cross = 0

In the first experiment, I consider hourly log-returns of a ETF index that mimics the Japanese Yen called FXY. As for one of the explanatory series, I consider the hourly log-returns of the price of GOLD which is traded on NASDAQ. The out-of-sample results of the trading signal built using a low-pass filter and the parameters above are shown in Figure 1.  The in-sample trading signal (left of cyan line) was built using 400 hourly observations of the Yen during US market hours dating back to 1 October 2012. The filter was then applied to the out-of-sample data for 180 hours, roughly 30 trading days up until Friday, 1 February 2013.

Figure 3: Out-of-sample results for the Japanese Yen. The in-sample trading  signal was built using 400 hourly observations of the Yen during US market hours  dating back to November 1st, 2012. The out-of-sample portion passed the cyan line is on 180 hourly observations, about 30 trading days.

Figure 1: Out-of-sample results for the Japanese Yen. The in-sample trading signal was built using 400 hourly observations of the Yen during US market hours dating back to October 1st, 2012. The out-of-sample portion passed the cyan line is on 180 hourly observations, about 30 trading days.

This beauty of this filter is that it yields a trading signal exhibiting all the characteristics that one should strive for in building a robust and successful trading filter.

  1. Consistency: The in-sample portion of the filter performs exactly as it does out-of-sample (after cyan line) in both trade success ratio and systematic trading performance. 
  2. Dropdowns: One small dropdown out-of-sample for a loss of only .8 percent (nearly the cost of the transaction).
  3. Detects the cycles as it should: Although the filter is not able to pinpoint with perfect accuracy every local small upturn during the descent of the Yen against the dollar, it does detect them nonetheless and knows when to sell at their peaks (the magenta lines).
  4. Self-correction: What I love about a robust filter is that it will tend to self-correct itself very quickly to minimize a loss in an erroneous trade. Notice how it did this in the second series of buy-sell transactions during the only loss out-of-sample. The filter detects momentum but quickly sold right before the ensuing downfall. My intuition is that only frequency-based methods such as the MDFA are able to achieve this consistently. This is the sign of a skillfully smart filter.

The coefficients for this Yen filter are shown below. Notice the smoothness of the coefficients from applying the heavy smooth regularization and the strong decay at the very end.  This is exactly the type of smooth/decay combo that one should desire. There is some obvious correlation between the first and second explanatory series in the first 30 lags or so as well. The third explanatory series seems to not provide much support until the middle lags .

Coefficients of the Yen filter. Here we use three different explanatory series to extract  the trading signal shown in Figure 1.

Figure 2: Coefficients of the Yen filter. Here we use three different explanatory series to extract the trading signal.

One of the first things that I always recommend doing when first attempting to build a trading signal is to take a glance at the periodogram. Figure 2 shows the periodogram of the log-return data of the Japanese Yen over 580 hours.  Compare this with the periodogram of the same asset using log-returns of daily data over 580 days, shown in Figure 3.  Notice the much larger prominent spectral peaks at the lower frequencies in the daily log-return data. These prominent spectral peaks renders multibandpass filters much more advantageous and to use as we can take advantage of them by placing a band-pass filter directly over them to extract that particular frequency (see my article on multibandpass filters). However, in the hourly data, we don’t see any obvious spectral peaks to consider, thus I chose a low-pass filter and set the cutoff frequency at $\pi/5$, a standard choice, and good place to begin.

Figure 1: Periodogram of hourly log-returns of the Japanese Yen over 580 hours.

Figure 3: Periodogram of hourly log-returns of the Japanese Yen over 580 hours.

Figure 3: Periodogram of Japanese Yen using 580 daily log-return observations.

Figure 4: Periodogram of Japanese Yen using 580 daily log-return observations. Many more spectral peaks are present in the lower frequencies.

Japanese Yen

Frequency: 15 minute returns
7 day out-of-sample ROI: 5 percent
Trade success ratio: 82 percent

Yen Filter Parameters: \lambda = 3.7 \alpha = 13, \omega_0 = \pi/9
Regularizationsmooth = .90, decay = .11, decay2 = .09, cross = 0

In the next trading experiment, I consider the Japanese Yen again, only this time I look at trading on even high-frequency log-return data than before, namely on 15 minute log-returns of the Yen from the opening bell to market close.  This presents slightly new challenges than before as the close-to-open jumps are much larger than before, but these larger jumps do not necessarily pose problems for the MDFA. In fact, I look to exploit these and take advantage to gain profit by predicting the direction of the jump.  For this higher frequency experiment, I considered 350 15-minute in-sample observations to build and optimize the trading signal, and then applied it over the span of 200 15-minute out-of-sample observations. This produced the results shown in the Figure 5 below. Out of 17 total trades out-of-sample, there were only 3 small losses each less than .5 percent drops and thus 14 gains during the 200 15-minute out-of-sample time period.  The beauty of this filter is its impeccable ability to predict the close-to-open jump in the price of the Yen. Over the nearly 7 day trading span, it was able to correctly deduce whether to buy or short-sell before market close on every single trading day change. In the figure below, the four largest close-to-open variation in Yen price is marked with a “D” and you can clearly see how well the signal was able to correctly deduce a short-sell before market close. This is also consistent with the in-sample performance as well, where you can notice the buys and/or short-sells at the largest close-to-open jumps (notice the large gain in the in-sample period right before the out-of-sample period begins, when the Yen jumped over 1 percent over night.  This performance is most likely aided by the explanatory time series I used for helping predict the close-to-open variation in the price of the Yen. In this example, I only used two explanatory series (the price of Yen, and another closely related to the Yen).

Figure : Out-of-sample performance of the Japanese Yen filter on 15 minute log-return data.

Figure 5: Out-of-sample performance of the Japanese Yen filter on 15 minute log-return data.

We look at the filter transfer functions to see what frequencies they are being privileged in the construction of the filter. Notice that some noise leaks out passed the frequency cutoff at \pi/9, but this is typically normal and a non-issue. I had to balance for both timeliness and smoothness in this filter using both the customization parameters \lambda and \alpha. Not much at frequency 0 is emphasized, with more emphasis stemming from the large spectral peak found right at \pi/9.

Figure : The filter transfer functions.

Figure 6: The filter transfer functions.

British Pound

Frequency: 30 minute returns
14 day out-of-sample ROI: 4 percent
Trade success ratio: 76 percent

British Pound Filter Parameters: \lambda = 5 \alpha = 15, \omega_0 = \pi/9
Regularizationsmooth = .109, decay = .165, decay2 = .19, cross = 0

In this example we consider the frequency of the data to 30 minute returns and attempt to build a robust trading signal for a derivative of the British Pound (BP) on this higher frequency. Instead of using the cash value of the BP, I use 30 minute returns of the BP Futures contract expiring in March (BPH3). Although I don’t have access to tick data from the FOREX, I do have tick data from GLOBEX for the past 5 years.  Thus the futures series won’t be an exact replication of the cash price series of the BP, but it should be quite close due to very low interest rates.

The results of the out-of-sample performance of the BP futures filter are shown in Figure 7. I constructed the filter using an initial in-sample size of 390 30 minute returns dating back to 1 December 2012. After pinpointing a frequency cutoff in the frequency domain for the \Gamma that yielded decent trading results in-sample, I then proceeded to optimize the filter in-sample on smoothness and regularization to achieve similar out-of-sample performance. Applying the resulting filter out-of-sample on 168 30-minute log-return observations of the BP futures series along with 3 explanatory series, I get the results shown below. There were 13 trades made and 10 of them were successful. Notice that the filter does an exquisite job at triggering trades near local optimums associated with the frequencies inside the cutoff of the filter.

Figure 5: The out-of-sample results of the British Pound using 30-minute return data.

Figure 7: The out-of-sample results of the British Pound using 30-minute return data.

In looking at the coefficients of the filter for each series in the extraction, we can clearly see the effects of the regularization: the smoothness of the coefficients the fast decay at the very end. Notice that I never really apply any cross regularization to stress the latitudinal likeliness between the 3 explanatory series as I feel this would detract from the predicting advantages brought by the explanatory series that I used.

Figure 6: The coefficients for the 3 explanatory series of the BP futures,

Figure 8: The coefficients for the 3 explanatory series of the BP futures,

Euro

Frequency: 30 min returns
30 day out-of-sample ROI: 4 percent
Trade success ratio: 71 percent

Euro Filter Parameters: \lambda = 0, \alpha = 6.4, \omega_0 = \pi/9
Regularizationsmooth = .85, decay = .27, decay2 = .12, cross = .001

Continuing with the 30 minute frequency of log-returns, in this example I build a trading signal for the Euro futures contract with expiration on 18 March 2013 (UROH3 on the GLOBEX). My in-sample period, being the same as my previous experiment, is from 1 December 2012 to 4 January 2013 on 30 minute returns using three explanatory time series.  In this example, after inspecting the periodogram, I decided upon a low-pass filter with a frequency cutoff of \pi/9. After optimizing the customization and applying the filter to one month of 30 minute frequency return data out-of-sample (month of January 2013, after cyan line) we see the performance is akin to the performance in-sample, exactly what one strives for. This is due primarily to the heavy regularization of the filter coefficients involved. Only four very small losses of less than .02 percent are suffered during the out-of-sample span that includes 10 successful trades, with the losses only due to the transaction costs. Without transaction costs, there is only one loss suffered at the very beginning of the out-of-sample period.

Figure : Out-of-sample performance on the 30-min log-returns of Euro futures contract UROH3.

Figure 9 : Out-of-sample performance on the 30-min log-returns of Euro futures contract UROH3.

As in the first example using hourly returns, this filter again exhibits the desired characteristics of a robust and high-performing financial trading filter. Notice the out-of-sample performance behaves akin to the in-sample performance, where large upswings and downswings are pinpointed to high-accuracy. In fact, this is where the filter performs best during these periods. No need for taking advantage of a multibandpass filter here, all the profitable trading frequencies are found at less than \pi/9.  Just as with the previous two experiments with the Yen and the British Pound, notice that the filter cleanly predicts the close-to-open variation (jump or drop) in the futures value and buys or sells as needed.  This can be seen from many of the large jumps in the out-of-sample period (after cyan line).

One reason why these trading signals perform so well is due to their approximation power of the symmetric filter. In comparing the trading signal (green) with a high-order approximation of the symmetric filter (gray line) transfer function \Gamma shown in Figure 10, we see that trading signal does an outstanding job at approximating the symmetric filter uniformly. Even at the latest observation (the right most point), the asymmetric filter hones in on the symmetric signal (gray line) with near perfection. Most importantly, the signal crosses zero almost exactly where required.  This is exactly what you want when building a high-performing trading signal.

Figure : Plot of approximation of the real-time trading signal for UROH3 with a high order approximation of the symmetric filter transfer function.

Figure 10: Plot of approximation of the real-time trading signal for UROH3 with a high order approximation of the symmetric filter transfer function.

In looking at the periodogram of the log-return data and the output trading signal differences (colored in blue), we see that the majority of the frequencies were accounted for as expected in comparing the signal with the symmetric signal. Only an inconsequential amount of noise leakage passed the frequency cutoff of \pi/9 is found.  Notice the larger trading frequencies, the more prominent spectral peaks, are located just after \pi/6. These could be taken into account with a smart multibandpass filter in order to manifest even more trades, but I wanted to keep things simple for my first trials with high-frequency foreign exchange data.  I’m quite content with the results that I’ve achieved so far.

Figure : Comparing the periodogram of the signal with the log-return data.

Figure 11: Comparing the periodogram of the signal with the log-return data.

Conclusion

I must admit, at first I was a bit skeptical of the effectiveness that the MDFA would have in building any sort of successful trading signal for FOREX/GLOBEX high frequency data. I always considered the FOREX market rather ‘efficient’ due to the fact that it receives one of the highest trading volumes in the world.  Most strategies that supposedly work well on high-frequency FOREX all seem to use some form of technical analysis or charting (techniques I’m particularly not very fond of), most of which are purely time-domain based. The direct filter approach is a completely different beast, utilizing a transformation into the frequency domain and a ‘bending and warping’ of the metric space for the filter coefficients to extract a signal within the noise that is the log-return data of financial assets.  For the MDFA to be very effective at building timely trading signals, the log-returns of the asset need to diverge from white noise a bit, giving room for pinpointing intrinsically important cycles in the data. However, after weeks of experimenting, I have discovered that building financial trading signals using MDFA and iMetrica on FOREX data is as rewarding as any other.

As my confidence has now been bolstered and amplified even more after my experience with building financial trading signals with MDFA and iMetrica for high-frequency data on foreign exchange log-returns at nearly any frequency, I’d be willing to engage in a friendly competition with anyone out there who is certain that they can build better trading strategies using time domain based methods such as technical analysis or any other statistical arbitrage technique.  I strongly believe these frequency based methods are the way to go, and the new wave in financial trading.  But it takes experience and a good eye for the frequency domain and periodograms to get used to. I haven’t seen many trading benchmarks that utilize other types of strategies, but i’m willing to bet that they are not as consistent as these results using this large of an out-of-sample to in-sample ratio (the ratios in these experiments were between .50 and .80).  If anyone would like to take me up on my offer for a friendly competition (or know anyone that would), feel free to contact me.

After working with a multitude of different financial time series and building many different types of filters, I have come to the point where I can almost eyeball many of the filter parameter choices including the most important ones being the extractor \Gamma along with the regularization parameters, without resorting to time consuming, and many times inconsistent, optimization routines.  Thanks to iMetrica, transitioning from visualizing the periodogram to the transfer functions and to the filter coefficients and back to the time domain to compare with the approximate symmetric filter in order to gauge parameter choices is an easy task, and an imperative one if one wants to build successful trading signals using MDFA.

Here are some overall tips and tricks to build your own high performance trading signals on high-frequency data at home:

  • Pay close attention to the periodogram. This is your best friend in choosing the extractor \Gamma. The best performing signals are not the ones that trade often, but trade on the most important frequencies found in the data. Not all frequencies are created equal. This is true when building either low-pass or multibandpass frequencies. 
  • When tweaking customization, always begin with \alpha, the parameter for smoothness. \lambda for timeliness should be the last resort. In fact, this parameter will most likely be next to useless due to the fact that the log-return of financial data is stationary. You probably won’t ever need it.
  • You don’t need many explanatory series. Like most things in life, quality is superior to quantity. Using the log-return data of the asset you’re trading along with one and maybe two explanatory series that somewhat correlate with the financial asset you’re trading on is sufficient. Anymore than that is ridiculous overkill, probably leading to over-fitting (even the power of regularization at your fingertips won’t help you).

In my next article, I will continue with even more high-frequency trading strategies with the MDFA and iMetrica where I will engage in the sector of Funds and ETFs. If any curious reader would like even more advice/hints/comments on how to build these trading signals on high-frequency data for the FOREX (or the coefficients built in these examples), feel free to get in contact with me via email. I’ll be happy to help.

Happy extracting!

Model comparison with data sweeps

This slideshow requires JavaScript.

A useful exercise in modeling economic time series is to perform a “sliding window” analysis of the data that computes models in subsets of  the data and tests for the robustness of signal extractions, forecasts, and parameter variance relative to a growing subset of the data. For instance, for a time series of length 300, one could estimate a model on a shorter subset of the data, say for the first 200 observations, and then increase the amount of observations, re-estimate, and then see how the model parameter values change as the number of observations or data subset increases. One can also see how the signal extractions and forecasts change with additional data. Ideally, if the model is specified correctly for the data, there should be a very small variance in the estimated parameters as more data is added to the time series. It signifies the stability of the model selection. Normally, such an exercise would be tedious to carry out with X-13ARIMA-SEATS, or any other software such as MATLAB or R as scripts or spec files would have to be written for each individual re-estimation and then re-plotted. In the uSimX13 module of iMetrica however, this task has been rendered an easy one with the addition of a sliding windows tool.  In this blog entry, we describe this so-called “sliding windows” process and show just how fast and seamless it is to perform model choice robustness and comparisons in iMetrica.

We begin by describing the sliding span/window tool in the iMetrica-uSimX13 module. Once time series data has been loaded into the uSimX13 module from either the uSimX13 main menu or imported from the Data Control module, the uSimX13 computation engine must first be turned on from the uSimX13 menu. Then to access the sliding windows interface,  simply click on the “Sliding Span/Window Activate” check box in the main uSimX13 menu (see Figure 1).

Figure 1. Main drop down menu for the uSimX13 module, showing the “Sliding Span/Window Activate” check box.

Once clicked, the entire plotting canvas will turn to a dark shade of blue, which indicates the windowed region in which model estimation occurs. To control the sliding window, place the mouse cursor along one of the edges of the canvas and slowly glide the mouse with the left-mouse button held down either left or right, depending on which edge of the plot canvas you are on. Moving to the left or right with the left mouse button held down, the windowed area will shrink or expand. The model parameters are estimated instantaneously as the window adjusts and in effect, all the available model statistics, diagnostics, signals, and forecasts are computed as well. For example, as the window expands or shrinks, the trend, seasonally adjusted data, and 24-step ahead forecasts can be plotted and viewed in real-time as the window changes (see Figure 2). One can also slide the window to the left or right by placing the mouse anywhere inside the blue-windowed region, holding down the left mouse button and moving along the time domain. This way, the window length will remain fixed, but the window center will move along different subsets of the data. This can be useful for seeing how model parameters can change within regions of data that exhibit regime changes, namely a sequence in the series that suddenly changes in seasonal or cyclical structure after a certain time observation. The data can now be modeled in both sections before and after the regime change occurs in order to compare the estimated parameter values.

Figure 2. The window sliding across different subsets of the data. The signal extractions, forecast, and model parameters are recomputed automatically as the window changes. Forecast comparisons with the real data as the window span moves is now trivial. Here, the plot in cyan represents the original time series data in-sample and the 24 step forecast out-of-sample, and the light green plot is the time series data adjusted for outliers, as indicated in the model box. One can select the plots using the “series components” plot box. The data in gray represents the time series data not used in the model estimation.

Data Sweep

With the ability to seamlessly capture partitions of the data and model within the given partition using the sliding window, a natural extension of this mouse-on-canvas utility is to employ it somehow in comparing different models of the time series data. We call this method of model comparison time series data sweeping (or simply data sweeping) and it involves selecting an initial window of data from the first observation to the n-th observation where n is some number much less than the total number of observations $latex N$ in the data set (say, one third the amount). The data sweep then computes the sliding window from n as the final observation all the way to N, in increments of one (see Figure 3). At each addition to the length of the window, the forecast is computed for up to 24 steps ahead. Of course, since the true time series data is known in the out-of-sample region of computation, we can compute the forecast error for up to h \leq 24 steps ahead and sum up these errors as n increases to N. We can do this data sweep for several models, computing the aggregate forecast errors over time. The idea is that the best model for the data will ideally have the smallest forecast error, and thus comparing this forecast error with several models will identify the model with the best overall forecasting ability.

To access the data sweep, simply go to the main uSimX13 menu, shown in Figure 1, and click “Sweep Time Series Control Panel”. This will bring up the main interface for the data sweep (shown in Figures 4-6). To begin the sweep, first select the model and regressors desired to model the data with inside the model selection panel of the main uSimX13 interface. Then choose at which observation you’d like to carry out the data sweep (starting at observation n=60 is the default). Lastly, select how many forecast steps you’d like to use in computing the forecast error (1-24). Once content with the settings, click the “Compute time series sweep” button and watch as the window span increases from n to N, recomputing parameters, signals, and forecasts at each step (see slideshow at top of post).  Once the sweep is complete, the parameter statistics, Ljung-Box mean value at two different lags, and the total forecast error is displayed in the control panel. To compare this with another model, save the results of the sweep by clicking “save parameters” in the uSimX13 menu, and then choose another model and recompute (while using the same settings as the previous sweep, of course).

To give an example of this process, we begin by simulating a time series data set of length N = 300 from a SARIMA model of dimension (0,1,2)(0,1,1)_{12}, namely a seasonal auto-regressive integrated moving-average process with two non-seasonal moving-average parameters, and one seasonal moving average parameter. The data sweep is performed on the simulated data with a forecast error horizon of length 23 using three different SARIMA models, (a) (0,1,1)(0,1,1)_{12}, (b) (1,1,0)(0,1,1)_{12}, (c) (0,1,2)(0,1,1)_{12}, the true model. See Figures 4-6 below to see the data sweep results and the estimated parameter mean and standard deviation, the average Ljung-Box statistics at lag 12 and 0, and the forecast errors for each model. Notice the forecast error for the true model (c) (figure 6) is the lowest followed by model (b) (figure 6) and then (a) (figure 4), which is exactly what we would want.

Figure 4. Model (a) and the parameter statistics, forecast error, and data sweep controls.

Figure 5. Model (b) and the parameter statistics, forecast error, and data sweep controls.

Figure 6. Model (c) (the true model) and the parameter statistics, forecast error, and data sweep controls.