Hierarchy of Financial Trading Parameters

Posted on November 28, 2012 by Christian Dallas Blakely

Figure 1: A trading signal produced in iMetrica for the daily price index of GOOG (Google) using the log-returns of GOOG and AAPL (Apple) as the explanatory data. The blue-pink line represents the account wealth over time, with an 89 percent return on investment in 16 months (GOOG recorded a 23 percent return during this time). The green line represents the trading signal built in the MDFA module using the hierarchy of parameters described in this article. The gray line is the log price of GOOG from June 6 2011 to November 16 2012.

Any computational method for constructing binary buy/sell signals for trading financial assets involves a plethora of parameters that must be taken into consideration when computing the signals and testing their in-sample effectiveness and performance. As traders and trading institutions typically rely on different financial priorities for navigating their positions, such as risk/reward priorities, minimizing trading costs/trading frequency, or maximizing return on investment, a robust set of parameters for adjusting and meeting the criteria of any of these financial aims is needed. The parameters need to clearly explain how and why their adjustments will help steer the trading signal toward the goals in mind. It is my strong belief that any computational paradigm that fails to do so should not be considered a candidate for a transparent, robust, and complete method for trading financial assets.

In this article, we give an in-depth look at the hierarchy of financial trading parameters involved in building financial trading signals using the powerful and versatile real-time multivariate direct filtering approach (MDFA, Wildi 2006, 2008, 2012), the principal method used in the financial trading interface of iMetrica. Our aim is to clearly identify the characteristics of each parameter involved in constructing trading signals using the MDFA module in iMetrica as well as what effects (if any) the parameter will have on building trading signals and their performance.

With the many different parameters at one’s disposal for computing a signal for virtually any type of financial data and using any financial priority profile, naturally there exists a hierarchy associated with these parameters, all of which have well-defined mathematical definitions and properties. We propose a categorization of these parameters into three levels according to the clarity of their effect in building robust trading signals. Below are the four main control panels used in the MDFA module for the Financial Trading Interface (shown in Figure 1). They will be referenced throughout the remainder of this article.

Figure 2: The interface for controlling many of the parameters involved in MDFA. Adjusting any of these parameters will automatically compute the new filter and signal output with the new set of parameters and plot the results on the MDFA module plotting canvases.

Figure 3: The main interface for building the target symmetric filter that is used for computing the real-time (nonsymmetric) filter and output signal. Many of the desired risk/reward properties are controlled in this interface. One can control every aspect of the target filter as well as spectral densities used to compute the optimal filter in the frequency domain.

Figure 4: The main interface for constructing Zero-Pole Combination filters, the original paradigm for real-time direct filtering. Here, one can control all the parameters involved in ZPC filtering, visualize the frequency domain characteristics of the filter, and inject the filter into the I-MDFA filter to create “hybrid” filters.

Figure 5: The basic trading regulation parameters currently offered in the Financial Trading Interface. This panel is accessed by using the Financial Trading menu at the top of the software. Here, we have direct control over setting the trading frequency, the trading costs per transaction, and the risk-free rate for computing the Sharpe Ratio, all controlled by simply sliding the bars to the desired level. One can also set the option to short sell during the trading period (provided that one is able to do so with the type of financial asset being traded).
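To make the roles of the trading cost, risk-free rate, and short-selling controls concrete, here is a minimal sketch of how account statistics could be computed from a binary signal. The function name `trade_performance`, the one-period execution delay, the flat cost per position change, and the annualization by 252 periods are all assumptions made for illustration; they are not taken from iMetrica.

```python
import numpy as np

def trade_performance(signal, log_returns, cost=0.0, risk_free=0.0,
                      allow_short=True, periods_per_year=252):
    """Account statistics for a binary signal (all conventions are assumptions)."""
    signal = np.asarray(signal, dtype=float)
    log_returns = np.asarray(log_returns, dtype=float)
    # The position held over period t is decided by the signal observed at t-1.
    position = np.sign(signal[:-1])
    if not allow_short:
        position = np.maximum(position, 0.0)      # long or flat only
    # A transaction is counted whenever the position changes.
    trades = np.abs(np.diff(np.concatenate(([0.0], position)))) > 0
    pnl = position * log_returns[1:] - cost * trades
    excess = pnl - risk_free / periods_per_year
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
    return {"total_log_return": pnl.sum(),
            "n_trades": int(trades.sum()),
            "sharpe": sharpe}

result = trade_performance([1, 1, -1, -1, 1],
                           [0.0, 0.01, 0.02, -0.03, -0.01], cost=0.001)
```

Raising `cost` penalizes every position change, which is why the trading frequency and the cost setting need to be tuned together.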

The Primary Parameters

Trading Frequency. As the name implies, the trading frequency governs how often buy/sell signals will occur during the span of the trading horizon. Regardless of whether one uses minute data, hourly data, or daily data, the trading frequency regulates when trades are signaled and is also a key parameter when considering trading costs. The parameter that controls the trading frequency is defined by the cutoff frequency in the target filter of the MDFA and is regulated in either the Target Filter Design interface (see Figure 3) or, if one is not accustomed to building target filters in MDFA, a simpler parameter is given in the Trading Parameter panel (see Figure 5). In Figure 3, the pass-band and stop-band properties are controlled by any one of the sliding scrollbars. The design of the target filter is plotted in the Filter Design canvas (not shown).
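As a rough illustration of how a cutoff frequency relates to trading frequency, the sketch below builds an ideal low-pass target on a frequency grid. The helper names `ideal_lowpass` and `cutoff_for_period` are hypothetical, and the ideal low-pass shape is only one possible target design, assumed here for simplicity.

```python
import numpy as np

def ideal_lowpass(cutoff, n_freq=200):
    """Ideal low-pass target: 1 in the pass-band [0, cutoff], 0 in the stop-band."""
    omega = np.linspace(0.0, np.pi, n_freq + 1)
    return omega, (omega <= cutoff).astype(float)

def cutoff_for_period(min_period):
    """Cutoff frequency that keeps cycles lasting at least `min_period` observations."""
    return 2.0 * np.pi / min_period

cutoff = cutoff_for_period(20)        # keep swings of 20 observations or longer
omega, gamma = ideal_lowpass(cutoff)
```

A smaller cutoff keeps only slower swings in the target signal, so fewer sign changes (and hence fewer trades) are to be expected.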

Timeliness of signal. The timeliness of the signal controls the quality of the phase characteristics in the real-time filter that computes the trading signal. Namely, it can control how well turning points (momentum changes) are detected in the financial data while minimizing the phase error in the filter. Bad timeliness properties will lead to a large delay in detecting up/downswings in momentum. Good timeliness properties lead to anticipated detection of momentum in real-time. However, the timeliness must be balanced against smoothness, as too much timeliness adds unwanted noise to the trading signal, which in turn produces unnecessary trades. The timeliness of the filter is governed by the $\lambda$ parameter that controls the phase error in the MDFA optimization. This is done by using the sliding scrollbar marked $\lambda$ in the Real-Time Filter Design in Figure 2. One can also control the timeliness property for ZPC filters using the $\lambda$ scrollbar in the ZPC Filter Design panel (Figure 4).

Smoothness of signal. The smoothness of the signal is related to how well the filter has suppressed the unwanted frequency information in the financial data, resulting in a smoother trading signal that corresponds more directly to the targeted signal and trading frequency. A signal that has been subjected to too much smoothing, however, will lose any important timeliness advantages, resulting in delayed trades or no trades at all. The smoothness of the filter can be adjusted using the $\alpha$ parameter that controls the error in the stop-band between the targeted filter and the computed concurrent filter. The smoothness parameter is found on the Real-Time Filter Design interface in the sliding scrollbar marked $W(\omega)$ (see Figure 2) and in the sliding scrollbar marked $\alpha$ in the ZPC Filter Design panel (see Figure 4).

Quantization of information. In this sense, the quantization of information relates to how much past information is used to construct the trading signal. In MDFA, it is controlled by the length of the filter $L$ and is found on the Real-Time Filter Design interface (see Figure 2). In theory, as the filter length $L$ gets larger, more past information from the financial time series is used, resulting in a better approximation of the targeted filter. However, as the saying goes, there’s no such thing as a free lunch: increasing the filter length adds more degrees of freedom, which then leads to the age-old problem of over-fitting. The result: increased nonsense at the most concurrent observation of the signal and chaos out-of-sample. Fortunately, we can relieve the problem of over-fitting by using regularization (see Secondary Parameters). The length of the filter is controlled in the sliding scrollbar marked Order- $L$ in the Real-Time Filter Design panel (Figure 2).
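The effect of the filter length on the in-sample fit can be sketched with a plain univariate, mean-square version of the frequency-domain criterion: a one-sided filter of length $L$ is fitted to the target by least squares, weighted by the periodogram. This is a simplified stand-in, not the MDFA implementation in iMetrica; the function name `dfa_mean_square` and the white-noise test data are assumptions.

```python
import numpy as np

def dfa_mean_square(periodogram, target, L):
    """Least-squares fit of a one-sided filter of length L to a real `target`.

    Minimises sum_k |target_k - sum_j b_j exp(-i j w_k)|^2 * periodogram_k
    on the grid w_k = k * pi / K, k = 0..K.
    """
    K = len(target) - 1
    omega = np.pi * np.arange(K + 1) / K
    w = np.sqrt(periodogram)
    basis = np.exp(-1j * np.outer(omega, np.arange(L))) * w[:, None]
    # Stack real and imaginary parts so the coefficients stay real.
    A = np.vstack([basis.real, basis.imag])
    y = np.concatenate([w * target, np.zeros(K + 1)])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b, np.sum((A @ b - y) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(240)                          # stand-in for log-returns
I = np.abs(np.fft.rfft(x)) ** 2 / (2.0 * np.pi * len(x))
omega = np.pi * np.arange(len(I)) / (len(I) - 1)
target = (omega <= np.pi / 6).astype(float)
criteria = {L: dfa_mean_square(I, target, L)[1] for L in (6, 12, 24)}
```

Because the shorter filters are nested in the longer ones, the in-sample criterion can only go down as $L$ grows, which is exactly why a low in-sample error by itself says little about out-of-sample behavior.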

As you might have suspected, there exists a so-called “uncertainty principle” regarding the timeliness and smoothness of the signal. Namely, one cannot achieve a perfectly timely signal (zero phase error in the filter) while at the same time remaining certain that the timely signal estimate is free of unwanted “noise” (perfectly filtered data in the stop-band of the filter). The greater the timeliness (better phase error), the lesser the smoothness (suppression of unwanted high-frequency noise). A happy combination of these two parameters is always desired, and thankfully there exists in iMetrica an interface to optimize these two parameters to achieve a perfect balance given one’s financial trading priorities. There has been much to say on this real-time direct filter “uncertainty” principle, and the interested reader can seek the gory mathematical details in an original paper by the inventor and good friend and colleague Professor Marc Wildi here.
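The trade-off can be seen even with the simplest one-sided filters. The sketch below, which uses equally weighted moving averages rather than MDFA filters (an assumption made purely for illustration, as is the helper name `amplitude_and_delay`), compares the average delay at low frequencies with the average amplitude left at high frequencies.

```python
import numpy as np

def amplitude_and_delay(b, omega):
    """Amplitude and phase delay (in observations) of a one-sided filter b."""
    transfer = np.exp(-1j * np.outer(omega, np.arange(len(b)))) @ b
    amplitude = np.abs(transfer)
    delay = -np.unwrap(np.angle(transfer)) / omega
    return amplitude, delay

low = np.linspace(0.01, np.pi / 12, 50)       # low frequencies (pass-band)
high = np.linspace(np.pi / 2, np.pi, 50)      # high frequencies (stop-band)
results = {}
for L in (3, 9):
    b = np.ones(L) / L                        # equally weighted moving average
    _, delay = amplitude_and_delay(b, low)
    amp_high, _ = amplitude_and_delay(b, high)
    results[L] = (delay.mean(), amp_high.mean())
```

The longer average suppresses more high-frequency content but reacts later: its delay at low frequencies is $(L-1)/2$ observations. The customization parameters $\lambda$ and $\alpha$ are the handles for moving along this kind of trade-off inside the optimization.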

The Secondary Parameters

Regularization of filters is the act of projecting the filter space into a lower dimensional space, reducing the effective number of degrees of freedom. Recently introduced by Wildi in 2012 (see the Elements paper), regularization has three different members to adjust according to the preferences of the signal extraction problem at hand and the data. The regularization parameters are classified as secondary parameters and are found in the Additional Filter Ingredients section in the lower portion of the Real-Time Filter Design interface (Figure 2). The regularization parameters are described as follows.

Regularization: smoothness. Not to be confused with the smoothness parameter found in the primary list of parameters, this regularization technique serves to project the filter coefficients of the trading signal into an approximation space satisfying a smoothness requirement, namely that the finite differences of the coefficients up to a certain order defined by the smoothness parameter are kept relatively small. This ultimately has the effect that the filter coefficients appear smoother as the smoothness parameter increases. Furthermore, as the approximation space becomes more “regularized” according to the requirement that solutions be “smoother”, the effective degrees of freedom decrease and the chances of over-fitting decrease as well. The direct consequences of applying this type of regularization on the signal output are typically quite subtle, and depend on how much smoothness is being applied to the coefficients. Personally, I usually begin with this parameter for my regularization needs to decrease the number of effective degrees of freedom and improve out-of-sample performance.
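A generic way to picture this kind of regularization is a least-squares fit with a penalty on the squared finite differences of the coefficients. The sketch below uses a random design matrix as a stand-in for the frequency-domain problem; the function names and the second-order difference penalty are assumptions and may differ from the exact form used in MDFA.

```python
import numpy as np

def smoothness_penalty(L, order=2):
    """Matrix Q with b @ Q @ b = sum of squared `order`-th differences of b."""
    D = np.diff(np.eye(L), n=order, axis=0)
    return D.T @ D

def regularized_fit(A, y, Q, strength):
    """Solve min ||A b - y||^2 + strength * b' Q b."""
    return np.linalg.solve(A.T @ A + strength * Q, A.T @ y)

def effective_dof(A, Q, strength):
    """Trace of the hat matrix, a common measure of effective degrees of freedom."""
    return np.trace(A @ np.linalg.solve(A.T @ A + strength * Q, A.T))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 12))     # placeholder design matrix
y = rng.standard_normal(50)
Q = smoothness_penalty(12)
dof_free = effective_dof(A, Q, 0.0)           # equals the filter length, 12
dof_smooth = effective_dof(A, Q, 100.0)       # smaller once the penalty is active
```

With the penalty switched off, the effective degrees of freedom equal the number of coefficients; increasing `strength` pulls the count down toward the dimension of the unpenalized space (here, coefficients lying on a straight line).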

Regularization: decay. Employing the decay parameter ensures that the coefficients of the filter decay to zero at a certain rate as the lag of the filter increases. In effect, it is another form of information quantization, as the trading signal will tend to lessen the importance of past information as the decay increases. This rate is governed by two decay parameters: the higher their values, the faster the coefficients decrease to zero. The first decay parameter adjusts the strength of the decay. The second parameter adjusts how fast the coefficients decay to zero. Usually, the order in which to proceed is a slight touch on the strength of the decay followed by adjusting the speed of the decay. As with the smoothing regularization, the number of effective degrees of freedom will (in most cases) decrease as the decay parameter increases, which is a good thing (in most cases).
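One plausible way to express a two-parameter decay penalty is a diagonal weight that grows geometrically with the lag, scaled by an overall strength. The form `(1 + rate) ** lag` and the names `decay_penalty` and `regularized_fit` are assumptions for illustration only; the penalty actually used in the MDFA module may be parameterized differently.

```python
import numpy as np

def decay_penalty(L, rate):
    """Diagonal penalty whose weight on lag j grows like (1 + rate) ** j."""
    return np.diag((1.0 + rate) ** np.arange(L))

def regularized_fit(A, y, Q, strength):
    """Solve min ||A b - y||^2 + strength * b' Q b."""
    return np.linalg.solve(A.T @ A + strength * Q, A.T @ y)

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 12))     # placeholder design matrix
y = rng.standard_normal(60)
Q = decay_penalty(12, rate=0.2)       # `rate` sets how fast the weights grow
b_free = regularized_fit(A, y, Q, 0.0)
b_decay = regularized_fit(A, y, Q, 50.0)   # `strength` sets how hard they bite
```

Here `strength` plays the role of the first decay parameter and `rate` the second: coefficients at long lags are charged more heavily, so the fit is pushed to rely on recent observations.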

Regularization: cross correlation. Used for building trading signals with multivariate data only, this regularization effect groups the latitudinal structure of the multivariate time series more closely, resulting in a more weighted estimate of the target filter using the target data frequency information. As the cross regularization parameter increases, the filter coefficients for each time series tend to converge towards each other. It should typically be used as a last effort to control for over-fitting and should only be used if the financial time series data are on the same scale and all highly correlated.

The Tertiary Parameters

Phase-delay customization. The phase-delay of the filter at frequency zero, defined by the instantaneous rate of change of a filter’s phase at frequency zero, characterizes important information related to the timeliness of the filter. One can directly ensure that the phase delay of the filter at frequency zero is zero by adding constraints to the filter coefficients at computation time. This is done by clicking the $i2$ option in the Real-Time Filter Design interface. To go further, one can even set the phase delay to a fixed value other than zero using the $i2$ scrollbar in the Additional Filter Ingredients box. Setting this value (between -20 and 20 in the scrollbar) ensures that the phase delay at zero of the filter reacts as anticipated. Its use and benefit are still under investigation. In any case, one can seamlessly test how this constraint affects the trading signal output in their own trading strategies directly by visualizing its performance in-sample using the Financial Trading canvas.
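For a one-sided filter with coefficients $b_0,\ldots,b_{L-1}$, the phase near frequency zero behaves like $-\omega \sum_j j\,b_j / \sum_j b_j$, so the delay at zero is $\sum_j j\,b_j / \sum_j b_j$. The sketch below computes that quantity and then forces it to zero with a simple after-the-fact projection. In MDFA the constraint is built into the optimization itself, so the projection (and the helper names) should be read as an illustrative assumption only.

```python
import numpy as np

def delay_at_zero(b):
    """Time shift of a one-sided filter at frequency zero: sum(j*b_j) / sum(b_j)."""
    b = np.asarray(b, dtype=float)
    return np.dot(np.arange(len(b)), b) / b.sum()

def impose_zero_delay(b):
    """Smallest change to b that keeps sum(b) and sets sum(j*b_j) to zero."""
    b = np.asarray(b, dtype=float)
    C = np.vstack([np.ones(len(b)), np.arange(len(b))])   # constraint rows
    wanted = np.array([b.sum(), 0.0])
    correction = C.T @ np.linalg.solve(C @ C.T, C @ b - wanted)
    return b - correction

b = np.ones(5) / 5                    # 5-term moving average, delay of 2 observations
b_constrained = impose_zero_delay(b)
```

A zero delay at frequency zero necessarily requires some negative coefficients, which is one reason the constraint can interact with smoothness in ways worth checking in-sample.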

Differencing weight. This option, found in the Real-Time Filter Design interface as the checkbox labeled “d” (Figure 2), multiplies the frequency information (periodogram or discrete Fourier transform (DFT)) of the financial data by the weighting function $f(\omega) = 1/(1 - \exp(i \omega)), \omega \in (0,\pi)$ , which is the reciprocal of the differencing operator in the frequency domain. Since the Financial Trading platform in iMetrica strictly uses log-return financial time series to build trading signals, the use of this weighting function is in a sense a frequency-based “de-differencing” of the differenced data. In many cases, using the differencing weight provides better timeliness properties for the filter and thus the trading signal.
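To see what the weighting does, the sketch below evaluates $f(\omega)$ on the Fourier frequencies and applies it to the DFT of a simulated log-return series. The simulated data and variable names are assumptions; frequency zero is skipped because the weight is undefined there.

```python
import numpy as np

def differencing_weight(omega):
    """f(w) = 1 / (1 - exp(i w)); undefined at w = 0."""
    return 1.0 / (1.0 - np.exp(1j * omega))

T = 200
rng = np.random.default_rng(1)
log_returns = 0.01 * rng.standard_normal(T)           # stand-in for real data
omega = 2.0 * np.pi * np.arange(1, T // 2 + 1) / T    # skip frequency zero
dft = np.fft.rfft(log_returns)[1:]
weighted_periodogram = (np.abs(differencing_weight(omega) * dft) ** 2
                        / (2.0 * np.pi * T))
```

Since $|f(\omega)|^2 = 1/(2 - 2\cos\omega)$, the weight is largest near zero and shrinks to $1/4$ at $\omega = \pi$, so the low-frequency content of the returns is emphasized in the optimization.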

In addition to these three levels of parameters used in building real-time trading signals, there is a collection of more exotic “parameterization” strategies that exist in the iMetrica MDFA module for fine-tuning and boosting trading performance. However, these strategies require more time to develop, a bit of experimentation, and a keen eye for filtering. We will develop more information and tutorials about these advanced filtering techniques for constructing effective trading signals in iMetrica in future articles on this blog coming soon. For now, we just summarize their main ideas.

Advanced Filtering Parameters

Hybrid filtering. In hybrid filtering, the goal is to filter a target signal additionally by injecting it with another filter of a different type that was constructed using the same data, but different paradigm or set of parameters. One method of hybrid filtering that is readily available in the MDFA module entails constructing Zero-Pole Combination filters using the ZPC Filter Design interface (Figure 4) and injecting the filter into the filter constructed in the Real-Time Filter Design interface (Figure 2) (see Wildi ZPC for more information). The combination (or hybrid) filter can then be accessed using one of the check box buttons in the filter interface and then adjusted using all the various levels of parameters above, and then used in the financial trading interface. The effect of this hybrid construction is to essentially improve either the smoothness or timeliness of any computed trading signal, while at the same time not succumbing to the nasty side-effects of over-fitting.

Forecasting and Smoothing signals. Smoothing signals in time series, as its name implies, involves obtaining a smoother estimate of a certain signal in the past. Since the real-time estimate of a signal value in the past involves using more recent values, the signal estimation becomes more symmetrical, as both past and future values relative to that point are used to estimate the value of the signal. For example, if today is after market hours on Friday, we can obtain a better estimate of the targeted signal for Wednesday since we have information from Thursday and Friday. In the opposite manner, forecasting involves projecting a signal into the future. However, since the estimate becomes even more “anti-symmetric”, the estimate becomes more polluted with noise. How these smoothed and forecasted signals can be used for constructing buy/sell trading signals in real-time is still purely experimental. With iMetrica, building and testing strategies that improve trading performance using smoothed or forecasted signals (or both) is available. To produce either a smoothed or forecasted signal, there is a lag scrollbar available in the Real-Time Filter Design interface under Additional Filter Ingredients. Setting the lag value $k$ in the scrollbar to any integer between -10 and 10 automatically computes the signal with that lag applied. For negative lag values $k$, the method produces a $|k|$-step-ahead forecast estimate of the signal. For positive values, the method produces a smoothed signal with a delay of $k$ observations.
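The role of the lag can be sketched in the frequency domain: if the filter output at time $t$ is meant to approximate the target signal at time $t-k$, the target transfer function is multiplied by $e^{-ik\omega}$. The example below fits a one-sided filter by least squares for a forecast, a real-time estimate, and a smoothed estimate. A flat spectrum (white-noise input), the ideal low-pass target, and the function name `fit_with_lag` are assumptions made to keep the sketch short.

```python
import numpy as np

def fit_with_lag(target, L, k):
    """LS fit of a length-L one-sided filter to the target shifted by lag k."""
    K = len(target) - 1
    omega = np.pi * np.arange(K + 1) / K
    shifted = target * np.exp(-1j * k * omega)        # signal at time t - k
    basis = np.exp(-1j * np.outer(omega, np.arange(L)))
    # Stack real and imaginary parts so the coefficients stay real.
    A = np.vstack([basis.real, basis.imag])
    y = np.concatenate([shifted.real, shifted.imag])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b, np.sum((A @ b - y) ** 2)

K = 120
omega = np.pi * np.arange(K + 1) / K
target = (omega <= np.pi / 6).astype(float)
errors = {k: fit_with_lag(target, 12, k)[1] for k in (-3, 0, 5)}
```

In this setting the smoothed estimate ($k=5$) fits the target better than the real-time estimate ($k=0$), which in turn fits better than the 3-step-ahead forecast ($k=-3$), in line with the Wednesday/Friday example above.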

Customized spectral weighting functions. In the spirit of customizing a trading signal to fit one’s priorities in financial trading, one also has the option of customizing the spectral density estimate of the data generating process to any design one wishes. In the computation of the real-time filter, the periodogram (or the DFTs in the multivariate case) is used as the default estimate of the spectral density weighting function. This spectral density weighting function in theory is supposed to serve as the spectrum of the underlying data generating process (DGP). However, since we have no possible idea about the underlying DGP of the price movement of publicly traded financial assets (other than it’s supposed to be pretty darn close to a random walk according to the Efficient Market Hypothesis), the periodogram is the closest thing to an unbiased estimate a mortal human can get and is the default option in the MDFA module of iMetrica. However, customization of this weighting function is certainly possible through the use of the Target Filter Design interface. Not only can one design their target filter for the approximation of the concurrent filter, but the spectral density weighting function of the DGP can also be customized using some of the options available in the interface. We will discuss these features in a soon-to-come discussion and tutorial on advanced real-time filtering methods.
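For reference, the periodogram that serves as the default weighting function can be computed directly from the discrete Fourier transform. This is a minimal sketch with the common $1/(2\pi T)$ normalization; the normalization actually used inside iMetrica is not stated in this article, so treat it as an assumption.

```python
import numpy as np

def periodogram(x):
    """I(w_k) = |sum_t x_t exp(-i t w_k)|^2 / (2 pi T) at w_k = 2 pi k / T."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    dft = np.fft.rfft(x)                      # frequencies 0 .. pi
    omega = 2.0 * np.pi * np.arange(len(dft)) / T
    return omega, np.abs(dft) ** 2 / (2.0 * np.pi * T)

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                  # stand-in for a log-return series
omega, I = periodogram(x)
```

A customized weighting function simply replaces `I` with another non-negative function on the same frequency grid before the filter is optimized.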

Adaptive filtering. As perhaps the most advanced feature of the MDFA module, adaptive filtering is an elegant way to build smarter filters based on previous filter realizations. With the goal of adaptive filtering being to improve certain properties of the output signal at each iteration without paying for it with over-fitting, the adaptive process is of course highly nonlinear. In short, adaptive MDFA filtering is an iterative process in which one begins with a desired filter, computes the output signal, and then uses the output signal as explanatory data in the next filtering round. At each iteration step, one has the freedom to change any properties of the filter that they desire, whether it be customization, regularization, adding negative lags, adding filter coefficient constraints, applying a ZPC filter, or even changing the pass-band in the target filter. The hope is to improve on certain properties of the filter at each stage of the iterative process. An in-depth look at adaptive filtering and how to easily produce an adaptive filter using iMetrica is soon to come later this week.

iMetrica: Economic and Financial Data Control

Posted on November 14, 2012 by Christian Dallas Blakely

The iMetrica software is endowed with a rich and detailed, yet quite easy-to-use module for uploading, downloading, exporting, editing, combining, transforming, building, simulating, and analyzing time series data. It contains just about anything you’d want to have in an economic or financial time series data control interface while using only simple mouse point-and-click or drag interactions to navigate or download data from the internet. Since the most important aspect of time series analysis is, well, the time series data itself, we created a dedicated data control module to handle the majority of the time series data loading and editing work, before it is exported to any one of the five iMetrica computational modules or financial trading module.

Data Control Interface

We begin this iMetrica blog entry by first giving an overview of the basic components featured in the Data Control module. Figures 1 and 2 show the interface and all the major components labeled. Here, a collection of simulated time series are being plotted together.

Figure 1. The major components of the data control module.

Figure 2. The major components of the data control module, showing the target series editor.

1. Main plotting canvas. This is where the time series data is plotted. Up to 10 different time series can be loaded into the data control at a time, and all of them can be plotted using the plot control in panel 2. When all the data is plotted together, to highlight a particular series, go to the main Data Control menu in the top left corner and place the mouse on any one of the series names; the respective series will then be highlighted.
2. Plot control panel. The time series that are uploaded into the module can be viewed by toggling their respective check box inside the plot control panel. This is helpful when different time series are scaled differently and/or have different means. One can also log-transform the data, rescale the data to have unit standard deviations, or compare data using cross-correlations. Note that the log and rescale check box actions will only apply to the data that is currently being plotted. Furthermore, to plot the cross-correlations, only two time series can be chosen at a time. When one time series is chosen, the auto-correlation plot is drawn. Here, the “Target $X(t)$” indicates a weighted aggregation of the data. To edit this, use the “Target Series” in 3. To delete all of the data stored in the data control module, simply press the “Delete” button. Careful, there’s no going back once deleted.
3. Simulated and Target Series Panels. The simulated time series panel provides interfaces for simulating a multitude of different time series. Simulating time series can be helpful when wanting to learn, practice, or explore the different modules and capabilities of iMetrica, learn more about time series analysis, or learn about the dynamics of time series models. The different types of models include (S)ARIMA models, GARCH models, correlated cycle models, trend models, multivariate factor stochastic volatility models, and HEAVY models. By simulating data and toggling the parameters, one can instantly visualize the effects of each parameter on the simulated data. The data can then be exported to any of the modules for practicing and honing one’s skills in hybrid modeling, signal extraction, and forecasting. Each model has a “parameter” button (see 4) that controls the dimensions, innovation distributions, or parameter values. When changes are made, the simulated series is recomputed automatically and replotted on its respective plotting canvas (see 4).
4. Simulated Data Control. Once the parameters have been selected and a desired simulated series has been achieved to one’s liking, it can be added to the main data control plotting canvas by clicking the “Add” button. The new simulated series is now ready to be exported to any of the modules. One can also change the random seed that controls the “burn-in” of the innovation sequence (random effects that govern the initialization and trajectory of the data). In some of the models, one can “integrate” the data to render stationary data nonstationary.
5. Parameter Controls. Once the “Parameters” button has been clicked, an additional panel will pop up where controls for all the model’s parameters can be toggled. Once any parameter has been changed using the sliders, scrollbars, or combo boxes, the simulated data is automatically recomputed and plotted, making it a great tool to understand time series model dynamics.
6. Target Series Construction. The target series is used to construct a univariate time series that is a weighted sum of one or more time series (given by the $X_i(t)$ for $i=1,\ldots,10$ series). In modules that only deal with univariate time series data (the uSimX13, EMD, and State Space Modeling), the constructed target series is the series that gets exported for analysis. For the MDFA module, this is the series that is being filtered for constructing a signal, with the other time series acting as the explanatory time series. In the BayesCronos module, this target series is ignored and only the supporting time series data $X_i(t)$ are used. In these up and down slider controls, one can adjust the weight associated with that specific series, and the aggregate target series will be automatically recomputed as it is adjusted.
7. Series Checkboxes. To ignore the series entirely in the computation of the target series, simply click the check box “off” in the associated “computed in target” check box. This will eliminate it from the target sum. In the case one is constructing data for the MDFA module, one has the option of utilizing a series in the target series, but not using it as an explaining time series variable, and vice-versa.
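The target series construction described above amounts to a weighted sum with optional exclusion of individual series. A minimal sketch follows, where the function name `target_series` and the `include` flags (mirroring the “computed in target” check boxes) are hypothetical:

```python
import numpy as np

def target_series(series, weights, include=None):
    """Weighted sum X(t) = sum_i w_i X_i(t) over the series flagged in `include`."""
    series = np.asarray(series, dtype=float)      # shape (n_series, T)
    weights = np.asarray(weights, dtype=float)
    if include is None:
        include = np.ones(len(weights), dtype=bool)
    mask = np.asarray(include, dtype=bool)
    return (weights * mask) @ series

x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]
full = target_series(x, [1.0, 0.1])                        # both series
first_only = target_series(x, [1.0, 0.1], [True, False])   # second box unchecked
```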

Loading Data from Files

Within this main data control hub, one can import univariate or multivariate time series data from a multitude of file formats, as well as download financial time series data directly from Yahoo! finance or another source such as Reuters for higher-frequency financial data. To load data from a file, simply click on the “Data Input/Export” menu when in the Data Control module and select one of the “Load” data options. The “Load Data” option pops up a “file select” panel and from there, the data file can be selected. The format of the data in this “Load Data” case is simple: a single column of data for each series. If more than one series is present, the data columns must be separated by a space. The “Load CSV” option assumes the file is stored in CSV format. See Figure 3 for the menu options of the Data Control module.
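For readers preparing files outside of iMetrica, the two layouts can be checked quickly with NumPy. The sample values below are made up, and the assumption that the CSV file carries no header row is ours; the article does not specify it.

```python
import io
import numpy as np

# One column per series, columns separated by a single space.
space_separated = io.StringIO("0.01 1.2\n-0.02 1.1\n0.03 1.3\n")
data = np.loadtxt(space_separated)                 # shape (T, n_series)

# The same data in CSV form (no header row assumed).
csv_text = io.StringIO("0.01,1.2\n-0.02,1.1\n0.03,1.3\n")
csv_data = np.loadtxt(csv_text, delimiter=",")
```

Replacing the `io.StringIO` objects with file paths reads the same layouts from disk.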

Figure 3. Showing the different options for importing data into the data control module.

Downloading Financial Data

The other option for loading data into the module is through the “Load Market Data” interface. Rather than loading data from a file that is sitting in your directory, you also conveniently have the option to download data directly from the internet or financial time series database, such as Reuters. As a fast and easy way to download financial data into iMetrica, when the “Load Market Data” is selected, a pop-up panel interface will surface that gives access to controlling the download of financial market data. This is shown in Figure 4. The options on this interface are described below.

Figure 4. The “Load Market Data” interface to download market data directly from Yahoo!. Here the daily log-returns and volume of Google (GOOG) and Apple (AAPL) are being downloaded.

Symbols(s) – In this text box, type the market ticker symbol of the desired financial series in all CAPS. Each ticker symbol must be separated only by one space and nothing else. Up to 10 ticker symbols can be entered.
Start Date – This indicates the year, month, and day from which the financial time series begins. This date must obviously be in the past. If the day falls on a non-traded day such as a weekend or holiday, the nearest date after that date will be chosen. The time series will then be loaded to the most recent date available for that asset.
Hours – This indicates the time period in which the frequency of the data is selected. In most cases, this should simply be set to “US Market Hours”.
Frequency – The frequency of the data. The options are Second, Minute, 3,5,10,15,30-Minute, Hourly, Daily, Weekly, Monthly.
New Data Set – Deletes all the data already stored in the data control module and uploads as new data.
Log Returns – Download the data in log-return format. This is usually the case when using the data to build financial trading strategies using the MDFA module. However, in addition to the log-return data, it will also download the log-transformed raw time series data of the first asset in the Symbols(s) box. This is generally used for gauging financial trading accounts in the financial trading interface of iMetrica. When Financial Trading is turned on in the data control menu this is automatically set on.
Volume Data – In addition to the asset time series data, the volume (of trades) data associated for the given frequency will also be downloaded for each market ticker symbol given in Symbols(s).
Yahoo! Source – The financial data will be downloaded from Yahoo! finance (thus you need an internet connection). If this box is not checked, then the downloader will assume a Reuters financial database (but of course for this you need an account with Reuters).
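The “Log Returns” option corresponds to a simple transformation of the price series, sketched below with made-up prices. The function name `log_returns` is ours, and no download step is shown since it depends on the data source.

```python
import numpy as np

def log_returns(prices):
    """Log-returns r_t = log(p_t) - log(p_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

prices = np.array([100.0, 110.0, 99.0, 104.0])     # illustrative prices only
returns = log_returns(prices)
log_price = np.log(prices[0]) + np.cumsum(returns) # recovers log(prices[1:])
```

Because log-returns add up over time, their cumulative sum recovers the log price, which is why the log-transformed price of the first asset is convenient for gauging the trading account.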

Once the settings are made in the interface, click “Download Market Data”. If no errors are present in the settings, then all the data should be automatically available in the plot canvas after a few seconds of downloading time. Figure 5 gives the results of the data download from the example in Figure 4. Here, the daily log-returns of Google (GOOG) and Apple (AAPL) along with their daily volumes from 6-4-2011 to today (11-14-2012) have been downloaded into the data control module and are ready for use. Notice that the scaling of the volume data (final two series) has been adjusted using the simple slider bars in the “Target Series” panel to more-or-less fit the scale of the log-return data.

Figure 5. The daily log-returns of Google (GOOG) and Apple (AAPL) along with their respective volumes loaded into the data control module and plotted on the canvas. The data was uploaded by using the “Load Market Data” interface panel.

If there were errors, then no data will be uploaded to the canvas and you have to try again. Common errors are a missing internet connection, symbols that are incorrect or not in CAPS, or a bogus starting date. Once the data is available to be plotted, simply click the check boxes associated with each plot. Edit, scale, export, analyse, compute, and/or trade away!

More options for downloading data will constantly be added to the iMetrica software. Check back to the blog regularly for more updates and additions as they come. Of course, suggestions are always welcome.

Model comparison with data sweeps

Posted on November 14, 2012 by Christian Dallas Blakely

A useful exercise in modeling economic time series is to perform a “sliding window” analysis of the data that computes models in subsets of the data and tests for the robustness of signal extractions, forecasts, and parameter variance relative to a growing subset of the data. For instance, for a time series of length 300, one could estimate a model on a shorter subset of the data, say for the first 200 observations, and then increase the amount of observations, re-estimate, and then see how the model parameter values change as the number of observations or data subset increases. One can also see how the signal extractions and forecasts change with additional data. Ideally, if the model is specified correctly for the data, there should be a very small variance in the estimated parameters as more data is added to the time series. It signifies the stability of the model selection. Normally, such an exercise would be tedious to carry out with X-13ARIMA-SEATS, or any other software such as MATLAB or R as scripts or spec files would have to be written for each individual re-estimation and then re-plotted. In the uSimX13 module of iMetrica however, this task has been rendered an easy one with the addition of a sliding windows tool. In this blog entry, we describe this so-called “sliding windows” process and show just how fast and seamless it is to perform model choice robustness and comparisons in iMetrica.
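Outside of iMetrica, the same idea can be scripted by hand. The sketch below re-estimates a model on a growing subset of a series of length 300, starting from the first 200 observations. A least-squares AR(1) fit on simulated data stands in for the X-13ARIMA-SEATS models used in uSimX13; that substitution, the simulated series, and the function names are assumptions for illustration.

```python
import numpy as np

def ar1_estimate(x):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def expanding_window(x, start, step):
    """Re-estimate on x[:n] for n = start, start + step, ... up to len(x)."""
    return [(n, ar1_estimate(x[:n])) for n in range(start, len(x) + 1, step)]

rng = np.random.default_rng(3)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()   # simulated AR(1) series
estimates = expanding_window(x, start=200, step=20)
```

If the model is well specified, the estimates in `estimates` should change little as observations are added.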

We begin by describing the sliding span/window tool in the iMetrica-uSimX13 module. Once time series data has been loaded into the uSimX13 module from either the uSimX13 main menu or imported from the Data Control module, the uSimX13 computation engine must first be turned on from the uSimX13 menu. Then to access the sliding windows interface, simply click on the “Sliding Span/Window Activate” check box in the main uSimX13 menu (see Figure 1).

Figure 1. Main drop down menu for the uSimX13 module, showing the “Sliding Span/Window Activate” check box.

Once clicked, the entire plotting canvas will turn a dark shade of blue, which indicates the windowed region in which model estimation occurs. To control the sliding window, place the mouse cursor along one of the edges of the canvas and, with the left mouse button held down, slowly glide the mouse either left or right, depending on which edge of the plot canvas you are on. As you drag, the windowed area will shrink or expand. The model parameters are re-estimated instantaneously as the window adjusts, and all the available model statistics, diagnostics, signals, and forecasts are recomputed as well. For example, as the window expands or shrinks, the trend, seasonally adjusted data, and 24-step-ahead forecasts can be plotted and viewed in real time (see Figure 2). One can also slide the window to the left or right by placing the mouse anywhere inside the blue windowed region, holding down the left mouse button, and moving along the time domain. This way, the window length remains fixed while the window center moves along different subsets of the data. This can be useful for seeing how model parameters change within regions of data that exhibit regime changes, namely a point in the series after which the seasonal or cyclical structure suddenly changes. The data can then be modeled in the sections before and after the regime change in order to compare the estimated parameter values.
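The fixed-length variant of the window can be sketched in the same spirit. The Python example below is a rough illustration, not iMetrica's implementation: it assumes a simulated series whose AR(1) coefficient switches sign halfway through (a stand-in for a regime change), and `simulate_regime_change`, `lag1_coefficient`, and `rolling_estimates` are hypothetical names chosen for this sketch.

```python
import random


def simulate_regime_change(n, break_at, phi_before, phi_after, seed=0):
    """AR(1) series whose coefficient changes at observation break_at."""
    rng = random.Random(seed)
    x = [0.0]
    for t in range(1, n):
        phi = phi_before if t < break_at else phi_after
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def lag1_coefficient(window):
    """Least-squares AR(1) coefficient estimated on one window."""
    num = sum(a * b for a, b in zip(window[1:], window[:-1]))
    den = sum(a * a for a in window[:-1])
    return num / den


def rolling_estimates(x, width):
    """Slide a window of fixed width across x, one observation at a time."""
    return [lag1_coefficient(x[s:s + width])
            for s in range(len(x) - width + 1)]


series = simulate_regime_change(300, break_at=150,
                                phi_before=0.8, phi_after=-0.5, seed=2)
rolled = rolling_estimates(series, width=100)

# The first window lies entirely before the break, the last entirely after.
print(f"first window: {rolled[0]:.2f}, last window: {rolled[-1]:.2f}")
```

Comparing the estimate from the first window with the one from the last window mirrors the before/after comparison described above.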

Figure 2. The window sliding across different subsets of the data. The signal extractions, forecast, and model parameters are recomputed automatically as the window changes. Forecast comparisons with the real data as the window span moves are now trivial. Here, the plot in cyan represents the original time series data in-sample and the 24-step forecast out-of-sample, and the light green plot is the time series data adjusted for outliers, as indicated in the model box. One can select the plots using the “series components” plot box. The data in gray represents the time series data not used in the model estimation.

Data Sweep

With the ability to seamlessly capture partitions of the data and model within the given partition using the sliding window, a natural extension of this mouse-on-canvas utility is to employ it in comparing different models of the time series data. We call this method of model comparison time series data sweeping (or simply data sweeping). It involves selecting an initial window of data from the first observation to the $n$-th observation, where $n$ is some number much less than the total number of observations $N$ in the data set (say, one third of it). The data sweep then extends the window's final observation from $n$ all the way to $N$, in increments of one (see Figure 3). At each addition to the length of the window, the forecast is computed for up to 24 steps ahead. Since the true time series data is known in the out-of-sample region of computation, we can compute the forecast error for up to $h \leq 24$ steps ahead and sum up these errors as $n$ increases to $N$. We can do this data sweep for several models, computing the aggregate forecast errors over time. The idea is that the best model for the data will ideally have the smallest forecast error, so comparing this aggregate error across several models identifies the model with the best overall forecasting ability.
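The sweep can be written down in a few lines of Python. This is a sketch under several assumptions that the post does not spell out: the error measure is taken to be the sum of squared forecast errors, the sweep stops at $N - h$ so that every $h$-step forecast can be scored against known data, and the two forecasters (last-value and seasonal-naive) are simple stand-ins for the SARIMA models fitted in uSimX13. All function names are hypothetical.

```python
import math
import random


def naive_forecast(train, h):
    """Repeat the last observed value h times."""
    return [train[-1]] * h


def seasonal_naive_forecast(train, h, period=12):
    """Repeat the last full seasonal cycle."""
    return [train[len(train) - period + (k % period)] for k in range(h)]


def data_sweep(series, start, h, forecaster):
    """Grow the window from `start` one observation at a time and
    accumulate the squared h-step forecast errors (assumed error measure)."""
    total = 0.0
    for end in range(start, len(series) - h + 1):
        train = series[:end]
        actual = series[end:end + h]
        predicted = forecaster(train, h)
        total += sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return total


# Toy monthly series: a 12-period seasonal pattern plus noise.
rng = random.Random(3)
series = [10.0 * math.sin(2.0 * math.pi * t / 12.0) + rng.gauss(0.0, 1.0)
          for t in range(300)]

# Same settings for both candidates: start at n = 60, score 12 steps ahead.
error_naive = data_sweep(series, start=60, h=12, forecaster=naive_forecast)
error_seasonal = data_sweep(series, start=60, h=12,
                            forecaster=seasonal_naive_forecast)
print(f"naive: {error_naive:.1f}, seasonal naive: {error_seasonal:.1f}")
```

As in the text, both candidates are swept with identical settings so that their aggregate errors are directly comparable.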

To access the data sweep, simply go to the main uSimX13 menu, shown in Figure 1, and click “Sweep Time Series Control Panel”. This will bring up the main interface for the data sweep (shown in Figures 4-6). To begin the sweep, first select the model and regressors you want to use inside the model selection panel of the main uSimX13 interface. Then choose the observation at which you’d like to start the data sweep (observation $n=60$ is the default). Lastly, select how many forecast steps you’d like to use in computing the forecast error (1-24). Once content with the settings, click the “Compute time series sweep” button and watch as the window span increases from $n$ to $N$, recomputing parameters, signals, and forecasts at each step (see slideshow at top of post). Once the sweep is complete, the parameter statistics, the Ljung-Box mean value at two different lags, and the total forecast error are displayed in the control panel. To compare this with another model, save the results of the sweep by clicking “save parameters” in the uSimX13 menu, and then choose another model and recompute (while using the same settings as the previous sweep, of course).

To give an example of this process, we begin by simulating a time series data set of length $N = 300$ from a SARIMA model of dimension $(0,1,2)(0,1,1)_{12}$, namely a seasonal auto-regressive integrated moving-average process with two non-seasonal moving-average parameters and one seasonal moving-average parameter. The data sweep is performed on the simulated data with a forecast error horizon of length 23 using three different SARIMA models: (a) $(0,1,1)(0,1,1)_{12}$, (b) $(1,1,0)(0,1,1)_{12}$, and (c) $(0,1,2)(0,1,1)_{12}$, the true model. See Figures 4-6 below for the data sweep results, the estimated parameter mean and standard deviation, the average Ljung-Box statistics at lag 12 and 0, and the forecast errors for each model. Notice that the forecast error for the true model (c) (Figure 6) is the lowest, followed by model (b) (Figure 5) and then model (a) (Figure 4), which is exactly what we would want.
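For readers who want to reproduce a similar test series outside of iMetrica, the following Python sketch simulates a $(0,1,2)(0,1,1)_{12}$ process, that is, $(1-B)(1-B^{12})x_t = (1+\theta_1 B+\theta_2 B^2)(1+\Theta B^{12})\epsilon_t$. The coefficient values, the zero starting values, and the function name `simulate_sarima_012_011` are assumptions made for illustration; the post does not state the parameters used in the original simulation.

```python
import random


def simulate_sarima_012_011(n, theta1, theta2, seas_theta, period=12, seed=0):
    """Simulate n observations of a SARIMA (0,1,2)(0,1,1)_period process."""
    rng = random.Random(seed)
    burn = period + 2  # extra past innovations needed by the MA polynomial
    e = [rng.gauss(0.0, 1.0) for _ in range(n + burn)]

    # Moving-average part: w_t = (1 + th1 B + th2 B^2)(1 + TH B^s) e_t
    w = []
    for t in range(burn, n + burn):
        w.append(e[t] + theta1 * e[t - 1] + theta2 * e[t - 2]
                 + seas_theta * (e[t - period]
                                 + theta1 * e[t - period - 1]
                                 + theta2 * e[t - period - 2]))

    # Undo the regular and seasonal differencing:
    # x_t = x_{t-1} + x_{t-s} - x_{t-s-1} + w_t, with zero starting values.
    x = [0.0] * (period + 1)
    for wt in w:
        x.append(x[-1] + x[-period] - x[-period - 1] + wt)
    return x[period + 1:]


# Hypothetical coefficients; N = 300 as in the example above.
simulated = simulate_sarima_012_011(300, theta1=-0.4, theta2=-0.2,
                                    seas_theta=-0.6, seed=4)
print(len(simulated), [round(v, 2) for v in simulated[:3]])
```

A series generated this way could then be fed to any of the three candidate models to repeat the comparison shown in Figures 4-6.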

Figure 4. Model (a) and the parameter statistics, forecast error, and data sweep controls.

Figure 5. Model (b) and the parameter statistics, forecast error, and data sweep controls.

Figure 6. Model (c) (the true model) and the parameter statistics, forecast error, and data sweep controls.

Hybrid Signal Extraction, Machine Learning, and Algorithmic Financial Trading

Combining Computational Methodologies for Real-Time Learning and Signal Extraction
