iMetrica for Linux Ubuntu 64 now available

The MDFA real-time signal extraction module

My first open-source release of iMetrica for Linux Ubuntu 64 can now be downloaded from my GitHub, with a Windows 64 version soon to follow. iMetrica is a fast, interactive, GUI-oriented software suite for predictive modeling, multivariate time series analysis, real-time signal extraction, Bayesian financial econometrics, and much more.

The principal use of iMetrica is to provide an interactive environment for the numerical and visual analysis of (multivariate) time series modeling, real-time filtering, and signal extraction. The interactive features in iMetrica offer a modeling and graphics environment for analysts, practitioners, and students of econometrics, finance, and real-time data analysis in which no coding or modeling experience is necessary. All the system needs is data, which can be piped into the system in many forms, including .csv, .txt, Google/Yahoo Finance, Quandl, .RData, and more. A module for connecting to MySQL databases is currently being developed. One can also simulate one's own data from one, or a combination of several, popular data-generating models.

Since the design is intended to be interactive and self-contained, one can change modeling data/parameter inputs and see the effects in both graphical and numerical form automatically. This feature is designed to help users understand the underlying mechanics of the modeling or filtering process. In this way one can examine, both visually and numerically, many attributes of the modeling or filtering process, such as sensitivity, nonlinearity, goodness-of-fit, overfitting, and stability.

All the computational libraries were written in GNU C and/or Fortran and are provided to the Java platform as native libraries via JNI. Java provides the user interface, control, graphics, and several other components in a module format, where each module specializes in a different data analysis paradigm. The modules available in this open-source version of iMetrica are as follows:

1) Data simulation, modeling and fitting using several popular econometric models

  • (S)ARIMA, (E)GARCH, (Multivariate) Factor models, Stochastic Volatility, High-frequency volatility models, Cycles/Trends, and more
  • Random number generators from several different types of parameterized distributions to create shocks, outliers, regression components, etc.
  • Visualize in real-time all components of the modeling process

2) An interactive GUI for multivariate real-time signal extraction using the multivariate direct filter approach (MDFA)

  • Construct multivariate MA filter designs, classical ARMA ZPC filtering designs, or hybrid filtering designs.
  • Analyze all components of the filtering and signal extraction process, from time-delay and smoothing control, to regularization.
  • Adaptive real-time filtering
  • Construct financial trading signals and forecasts
  • Includes a real-time/frequency analysis module using MDFA

3) An interactive GUI for X-13-ARIMA-SEATS called uSimX13

  • Perform automatic seasonal adjustment on thousands of economic time series
  • Compare SARIMA model choices using several different novel signal extraction diagnostics and tools available only in iMetrica
  • Visualize in real-time several components of the modeling process
  • Analyze forecasts and compare with other models
  • All of the most important features of X-13-ARIMA-SEATS included

4) An interactive GUI for RegComponent (State Space and Unobserved Component Models)

  • Construct unobserved signal components and time-varying regression components
  • Obtain forecasts automatically and compare with other forecasting models

5) Empirical Mode Decomposition

  • Applies a fast adaptive EMD algorithm to decompose nonlinear, nonstationary data into a trend and intrinsic modes.
  • Visualize all time-frequency components with automatically generated 2D heat maps.

6) Bayesian Time Series Modeling of ARIMA, (E)GARCH, Multivariate Stochastic Volatility, HEAVY models

  • Compute and visualize posterior distributions for all modeling parameters
  • Easily compare different model dimensions

7) Financial Trading Strategy Engineering with MDFA

  • Construct financial trading signals in the MDFA module and backtest the strategies on any frequency of data
  • Perform analysis of the strategies using forward-walk schemes
  • Automatically optimize certain components of the signal extraction on in-sample data.
  • Features a toolkit for minimizing probability of backtest overfit

Tutorials on how to use iMetrica can be found on this blog and will be added weekly, with new tools, features, and modules introduced and improved on a regular basis.

Please send any bug reports, comments, or complaints to clisztian@gmail.com.

Realizing the Future with iMetrica and HEAVY Models

In this article we steer away from multivariate direct filtering and signal extraction in financial trading and briefly indulge in the world of analyzing high-frequency financial data, an always-hot topic given the ever-increasing availability of tick data in computationally convenient formats. Not only is high-frequency intraday data the basis for higher-frequency risk monitoring and forecasting, it also provides access to building ‘smarter’ volatility prediction models using so-called realized measures of intraday volatility. Numerous studies over the past five years or so have shown these realized measures to be a considerably more robust indicator of daily volatility. While daily returns only capture close-to-close volatility, leaving much to be said about the actual volatility of the asset witnessed during the day, realized measures of volatility computed from higher-frequency data, such as second or minute data, provide a much clearer picture of open-to-close variation in trading.

In this article, I briefly describe a new type of volatility model that takes these realized measures of volatility movement into account, called High frEquency bAsed VolatilitY (HEAVY) models, developed and pioneered by Shephard and Sheppard (2009). These models take as input both close-to-close daily returns r_t and daily realized measures to yield better forecasting dynamics. The models have been shown to not only track momentum in volatility, but also adjust for mean-reversion effects and adapt quickly to structural breaks in the level of the volatility process. As the authors state in their original paper, the focus of these models is on predictive properties, rather than on non-parametric measurement of volatility. Furthermore, HEAVY models are easier and more robust to estimate than single-source models (GARCH, stochastic volatility), as they bring two sources of volatility information to bear in identifying a longer-term component of volatility.

The goal of this article is three-fold. Firstly, I briefly review these HEAVY models and give some numerical examples of the model in action using a GNU C library and Java package called heavy_model that I developed last year for the iMetrica software. The heavy_model package is available for download (either by this link or by e-mailing me) and features many options that are not available in the MATLAB code provided by Sheppard (bootstrapping methods, Bayesian estimation, track reparameterization, among others). I will then demonstrate the seamless ability to model volatility with these HEAVY models using iMetrica, where I also provide code for computing realized measures of volatility in Java with the help of an R package called highfrequency (Boudt, Cornelissen, and Payseur 2012).

HEAVY Model Definition

Let’s denote the daily returns as r_1, r_2, \ldots, r_T, where T is the total number of days in the sample we are working with. In the HEAVY model, we supplement the information in the daily returns with a so-called realized measure of intraday volatility based on higher-frequency data, such as second, minute, or hourly data. These measures are called daily realized measures and we will denote them as RM_1, RM_2, \ldots, RM_T for the total number of days in the sample. We can think of these daily realized measures as summarizing the intraday variation within a single day; they are supposed to provide a better snapshot of the ‘true’ volatility for a specific day t. Although there are numerous ways of computing a realized measure, the easiest is the realized variance, computed as RM_t = \sum_j (X_{t+t_{j,t}} - X_{t+t_{j-1,t}})^2, where t_{j,t} are the normalized times of trades on day t. Other methods for computing realized measures include kernel-based estimators, which we will discuss later in this article (see, for example, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=927483).
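To make the realized-variance formula concrete, here is a minimal Java sketch (illustrative only, not part of the heavy_model package or iMetrica) that computes RM_t for a single day from a vector of intraday log-prices sampled at the trade times:

// Minimal sketch: realized variance for one day from intraday log-prices.
// 'logPrice' holds the log-prices at the (normalized) trade times t_{j,t};
// the method name and sampling scheme are illustrative assumptions.
public static double realizedVariance(double[] logPrice) {
    double rm = 0.0;
    for (int j = 1; j < logPrice.length; j++) {
        double r = logPrice[j] - logPrice[j - 1];   // intraday log-return
        rm += r * r;                                // sum of squared intraday returns
    }
    return rm;
}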

Once the realized measures have been computed for T days, the HEAVY model is given by:

Var(r_t | \mathcal{F}_{t-1}^{HF}) = h_t = \omega_1 + \alpha RM_{t-1} + \beta h_{t-1} + \lambda r^2_{t-1}

E(RM_t | \mathcal{F}_{t-1}^{HF}) = \mu_t = \omega_2+ \alpha_R RM_{t-1} + \beta_R \mu_{t-1},

where the stability constraints are \alpha, \omega_1 \geq 0, \beta \in [0,1], and \omega_2, \alpha_R \geq 0, with \lambda + \beta \in [0,1] and \beta_R + \alpha_R \in [0,1]. Here, \mathcal{F}_{t-1}^{HF} denotes the high-frequency information set through the previous day t-1. The first equation models the close-to-close conditional variance and is akin to a GARCH-type model, whereas the second equation models the conditional expectation of the open-to-close variation.

With the formulation above, one can easily see that slight variations of the model are perfectly plausible. For example, one could consider additional lags in either the realized measure RM_t (akin to adding moving-average parameters) or the conditional mean/variance variables (akin to adding autoregressive parameters). One could also drop the dependence on the squared returns by setting \lambda to zero, which is what the original authors recommend. A third variation adds yet another equation that models a realized measure accounting for negative and positive momentum, yielding possibly better forecasts as it tracks both losses and gains. In this case, one adds the third component by introducing a new equation for a realized semivariance to parametrically model statistical leverage effects, where falls in asset prices are associated with increases in future volatility. With the realized semivariance computed for the T days as RMS_1, \ldots, RMS_T, the third equation becomes

E(RMS_t | \mathcal{F}_{t-1}^{HF}) = \phi_t = \omega_3 + \alpha_{RS} RMS_{t-1} + \beta_{RS} \phi_{t-1}

where \alpha_{RS} + \beta_{RS} < 1 and both parameters are nonnegative.
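Before turning to the C/Java implementation, the following minimal Java sketch illustrates how the filtered h_t and \mu_t sequences are generated from the two HEAVY equations above for a fixed set of parameters. It is a toy illustration of the recursions only (it is not the heavy_model code, and the initialization with sample means is an ad hoc assumption):

// Toy illustration of the HEAVY recursions (not the heavy_model library).
// r2[t] = squared daily returns, rm[t] = daily realized measures, t = 0..T-1.
public static double[][] heavyFilter(double[] r2, double[] rm,
        double w1, double w2, double alpha, double alphaR,
        double lambda, double beta, double betaR) {
    int T = r2.length;
    double[] h  = new double[T];   // close-to-close conditional variance
    double[] mu = new double[T];   // conditional expectation of the realized measure
    // initialize with the sample means (a simple, ad hoc choice)
    double meanR2 = 0.0, meanRM = 0.0;
    for (int t = 0; t < T; t++) { meanR2 += r2[t] / T; meanRM += rm[t] / T; }
    h[0] = meanR2; mu[0] = meanRM;
    for (int t = 1; t < T; t++) {
        h[t]  = w1 + alpha  * rm[t - 1] + beta  * h[t - 1] + lambda * r2[t - 1];
        mu[t] = w2 + alphaR * rm[t - 1] + betaR * mu[t - 1];
    }
    return new double[][] { h, mu };
}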

HEAVY modeling in C and Java

To incorporate these HEAVY models into iMetrica, I began by writing a GNU C library that provides a fast and efficient framework for both quasi-likelihood evaluation and a posteriori analysis of the models. The estimation structure follows the original MATLAB code provided by Sheppard very closely; however, in the C library I’ve added a few more useful tools for forecasting and distribution analysis. The Java code is essentially a wrapper for the C heavy_model library and provides a much cleaner approach to modeling and analyzing the HEAVY output, such as the parameters and forecasts. While there are many ways to declare, implement, and analyze HEAVY models using the C/Java toolkit I provide, the most basic steps are as follows.

heavyModel heavy = new heavyModel();                                     // declare the HEAVY model object
heavy.setForecastDimensions(n_forecasts, n_steps);                       // number of forecast samples and forecast steps
heavy.setParameterValues(w1, w2, alpha, alpha_R, lambda, beta, beta_R);  // initial parameter values for the optimizer
heavy.setTrackReparameter(0);                                            // optional: toggle the reparameterization (0 = off, 1 = on)
heavy.setData(n_obs, n_series, series);                                  // column-wise data: returns and realized measures
heavy.estimateHeavyModel();                                              // estimate the model by QMLE

The first line declares a HEAVY model in Java, while the second line sets the number of forecast samples to compute and how many forecast steps to take. Forecasted values are provided both for the return variable r_t (using a bootstrapping methodology) and for the h_t, \mu_t variables. In the next line, the parameter values for the HEAVY model are initialized. These are the initial points used in the quasi-maximum-likelihood optimization routine and can be set to any values that satisfy the model constraints. Here, w1 = \omega_1 and w2 = \omega_2.

The fourth line is completely optional and toggles (0 = off, 1 = on) a reparameterization of the HEAVY model so that the intercepts of both equations are explicitly related to the unconditional means of the squared returns r^2_t and the realized measures RM_t. The reparameterization has the advantage of eliminating the estimation of \omega_1, \omega_2, using the unconditional means instead and leaving two fewer degrees of freedom in the optimization (a rough sketch of this relation is given after the example output below). See page 12 of the Shephard and Sheppard 2009 paper for a detailed explanation of the reparameterization. After setting the initial values, the data is set for the model by inputting the total number of observations T, the number of series (normally set to 2), and the data in column-wise format (namely, a double array of length n_obs x n_series, where the first column is the return data r_t and the second column is the daily realized measure data). Finally, with the data set and the parameters initialized, we estimate the model in the sixth line. Once the model has finished estimating (this should take a few seconds, depending on the number of observations), the heavyModel Java object stores the parameter values, forecasts, model residuals, likelihood values, and more. For example, one can print out the estimated model parameters and plot the forecasts of h_t using the following:


heavy.printModelParameters();
heavy.plotForecasts();
Output:
w_1 = 0.063 w_2 = 0.053
beta = 0.855 beta_R = 0.566
alpha = 0.024 alpha_R = 0.375
lambda = 0.087
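Returning to the reparameterization toggled in the fourth line above: taking unconditional expectations of the two HEAVY equations (assuming stationarity, with E(h_t) = E(r^2_t) = \mu_r and E(\mu_t) = E(RM_t) = \mu_{RM}) gives a rough sketch of how the intercepts relate to the unconditional means; the exact form used in the estimation routine is detailed on page 12 of Shephard and Sheppard (2009):

\omega_1 = (1 - \beta - \lambda) \mu_r - \alpha \mu_{RM}, \qquad \omega_2 = (1 - \alpha_R - \beta_R) \mu_{RM}.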

Figure 1 shows the plot of the filtered h_t, \mu_t values for 300 trading days of AAPL from June 2011 to June 2012, with the final 20 points being the forecasted values. Notice that the multistep-ahead forecast shows momentum, which is one of the attractive characteristics of HEAVY models mentioned in the original paper by Shephard and Sheppard.

Figure 1: Plots of the filtered returns and realized measures with 20 step forecasts for Verizon for 300 trading days.

We can also easily plot the estimated joint distribution function F_{\zeta, \eta} by simply using the filtered h_t, \mu_t and computing the devolatilized values \zeta_t = r_t/ \sqrt{h_t}, \eta_t = (RM_t/\mu_t)^{1/2}, leading to the innovations for the model for t = 2,\ldots,T.
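As an illustration, these devolatilized sequences can be computed directly from the filtered values. The following minimal Java sketch is illustrative only (it uses plain arrays rather than the heavyModel accessors, whose names are not shown here):

// Illustrative sketch: devolatilized innovations zeta_t = r_t / sqrt(h_t) and
// eta_t = sqrt(RM_t / mu_t), computed from the filtered values for t = 1,...,T-1.
public static double[][] devolatilize(double[] r, double[] rm, double[] h, double[] mu) {
    int T = r.length;
    double[] zeta = new double[T], eta = new double[T];
    for (int t = 1; t < T; t++) {
        zeta[t] = r[t] / Math.sqrt(h[t]);    // roughly a martingale difference with unit variance
        eta[t]  = Math.sqrt(rm[t] / mu[t]);  // eta_t^2 should have unit conditional mean
    }
    return new double[][] { zeta, eta };
}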

Figure 2 below shows the empirical distribution of F_{\zeta, \eta} for 600 days (nearly two years of daily observations from AAPL). The \zeta_t sequence should be roughly a martingale difference sequence with unit variance, and the \eta_t sequence should have unit conditional mean and, of course, be uncorrelated. The empirical results agree with these theoretical values.

Figure 2: Scatter plot of the empirical distribution of devolatilized values for h and mu.

In order to compile and run the heavy_model library and the accompanying Java wrapper, one must first be sure to meet the requirements for installation. The programs were extensively tested on a 64-bit Linux machine running Ubuntu 12.04. The heavy_model library written in C uses the GNU Scientific Library (GSL) for the matrix-vector routines, along with a statistical package in GNU C called apophenia (Klemens, 2012) for the optimization routine. I’ve also included a wrapper for the GSL library called multimin.c, which enables use of the optimization routines from the GSL library, though these were not heavily tested. The first version (version 00) of the heavy_model library and Java wrapper can be downloaded at sourceforge.net/projects/highfrequency. As a precautionary warning, I must confess that none of the files are heavily commented, as this is still a project in progress. Improvements in code, efficiency, and documentation will be continuously forthcoming.

After downloading the .tar.gz package, first ensure that GSL and Apophenia are properly installed and that the libraries are in the appropriate path for your GNU C compiler. Second, to compile the .c code, copy the makefile.test file to Makefile and then type make. To compile the heavyModel library and utilize the Java heavyModel wrapper (recommended), copy makefile.lib to Makefile and then type make. After it builds libheavy.so, compile the heavyModel.java file by typing javac heavyModel.java. Note that the Java files were compiled successfully using the Oracle Java 7 SDK. If you have any questions about this or any of the C or Java files, feel free to contact me. All the files were written by me (except for the optional multimin.c/h files for the optimization), and some of the subroutines (such as the HEAVY model simulation) are based on the MATLAB code by Sheppard. Even though I have fully tested and reproduced the results found in other experiments exploring HEAVY models, there could still be bugs in the code. I have not fully tested every aspect (especially the Bayesian estimation components, an ongoing effort), and if anyone would like to add, edit, test, or comment on any of the routines in either the C or Java code, I’d be more than happy to welcome it.

HEAVY Modeling in iMetrica

The Java wrapper to the C heavy_model library was installed in the iMetrica software package and can be used for GUI-style modeling of high-frequency volatility. The HEAVY modeling environment is a feature of the BayesCronos module in iMetrica, which also features other stochastic models for capturing and forecasting volatility, such as (E)GARCH, stochastic volatility, multivariate stochastic factor modeling, and ARIMA modeling, all using either standard (Q)MLE model estimation or a Bayesian estimation interface (with histograms showing the MCMC results of the parameter chains).

Modeling volatility with HEAVY models is done by first uploading the data into the BayesCronos module (shown in Figure 3) through the use of either the BayesCronos Menu (featured on the top panel) or by using the Data Control Panel (see my previous article on Data Control).

Figure 3: BayesCronos interface in iMetrica for HEAVY modeling.

In the BayesCronos control panel shown above, we estimate a HEAVY model for the uploaded data (600 observations of r_t, RM_t), which were simulated from a model with \omega_1 = 0.05, \omega_2 = 0.10, \beta = 0.8, \beta_R = 0.3, \alpha = 0.02, \alpha_R = 0.3 (the simulation was done in the Data Control Module).

The model type is selected in the panel under the Model combo box. The number of forecasting steps and forecasting samples (for the r_t variable) are selected in the Forecasting panel. Once those values are set, the model estimates are computed by pressing the “MLE” button in the lower-left corner. After the computation is done, all of the plots for analyzing the HEAVY model become available by simply clicking the appropriate plotting checkboxes directly below the plotting canvas. These include up to 5 forecasts, the original data, the filtered h_t, \mu_t values, the residuals/empirical distributions of the returns and realized measures, and the pointwise likelihood evaluations for each observation. To see the estimated parameter values, simply click the “Parameter Values” button in the “Model and Parameters” panel and a pop-up control panel will appear showing the estimated values for all the parameters.

Realized Measures in iMetrica

Figure 4: Computing Realized measures in iMetrica using a convenient realized measure control panel.

Importing and computing realized volatility measures in iMetrica is accomplished using the control panel shown in Figure 4. With access to high-frequency data, one simply types the ticker symbol into the “Choose Instrument” box, sets the starting and ending dates in the standard CCYY-MM-DD format, and then selects the kernel used for assembling the intraday measurements. The Time Scale sets the frequency of the data (seconds, minutes, or hours) and the period scrollbar sets the alignment of the data. The Lags combo box determines the bandwidth of the kernel measuring the volatility. Once all the options have been set, clicking the “Compute Realized Volatility” button will produce three data sets for the period between the start date and the end date: 1) the daily log-returns of the asset r_1, \ldots, r_T, 2) the log-price of the asset, and 3) the realized volatility measure RM_1, \ldots, RM_T. Once the Java-R highfrequency routine has finished computing the realized measures, the data sets are automatically available in the Data Control Module of iMetrica. From here, one can annualize the realized measures using the weight adjustments in the Data Control Module (see Figure 5). Once content with the weighting, the data can then be exported to the MDFA module or the BayesCronos module for estimating and forecasting the volatility of the asset (here GOOG) using HEAVY models.

Figure 5: The log-return data (blue) and the (annualized) realized measure data using 5 minute returns (pink) for Google from 1-1-2011 to 6-19-2012.

The Realized Measure uploading in iMetrica utilizes a fantastic R package for studying and working with high-frequency financial data called highfrequency (Boudt, Cornelissen, and Payseur 2012). To handle high-frequency financial data in Java, I wrote a Java wrapper to the functions of the highfrequency R package, enabling the GUI interaction shown above and loading the data into Java and then into iMetrica. The Java environment uses a library called RCaller that opens a live R session in the Java runtime environment, from which I can call R routines and load the data directly into Java. The initializing sequence looks like this.


caller.getRCode().addRCode("require (Runiversal)");
caller.getRCode().addRCode("require (FinancialInstrument)");
caller.getRCode().addRCode("require (highfrequency)");
caller.getRCode().addRCode("loadInstruments('/HighFreqDataDirectoryHere/Market/instruments.rda')");
caller.getRCode().addRCode("setSymbolLookup.FI('/HighFreqDataDirectoryHere/Market/sec',use_identifier='X.RIC',extension='RData')");

Here, I’m declaring the R packages that I will be using (first three lines) and then declaring where my high-frequency financial data symbol-lookup directory is on my computer (next two lines). This enables me to extract high-frequency tick data directly into Java. After loading in the desired instrument ticker symbol names, I proceed to extract the daily log-returns for the given time frame, and then compute the realized measures of each asset using the rKernelCov function in the highfrequency R package. This looks something like:

for(i = 0; i < n_assets; i++)
{
   // restrict the series to market hours only (America/New_York)
   String mark = instrum[i] + "<-" + instrum[i] + "['T09:30/T16:00',]";
   caller.getRCode().addRCode(mark);

   // build the rKernelCov call with the user-defined kernel, lags, frequency, and period
   String rv = "rv"+i+"<-rKernelCov("+instrum[i]+"$Trade.Price, kernel.type ="+kernels[kern]+", kernel.param="+lags
             + ", kernel.dofadj = FALSE, align.by ="+frequency[freq]+", align.period="+period+", cts=TRUE, makeReturns=TRUE)";
   caller.getRCode().addRCode(rv);

   // name the result and store it in an R list for the Java runtime to read back
   caller.getRCode().addRCode("names(rv"+i+")<-'rv"+i+"'");
   rvs[i] = "rv_list"+i;
   caller.getRCode().addRCode("rv_list"+i+"<-lapply(as.list(rv"+i+"), coredata)");
}

The loop runs through all the asset symbols, creating Java strings that are loaded into the RCaller as commands. The first command restricts the data to market hours only (America/New_York time), and the next builds the call to the rKernelCov function in R, passing in all the user-defined parameters as strings. Finally, the last two commands name the result and put it into an R list from which the Java runtime environment will read.
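To complete the round trip, the realized-measure vectors stored in the rv_list variables can be pulled back into Java. The following is only a rough sketch: it assumes the (older) RCaller API used at the time, where runAndReturnResult and getParser().getAsDoubleArray exist with these signatures, and the element name to extract follows the names(rv0) <- 'rv0' assignment above.

// Rough sketch (assumes an older RCaller API): run the accumulated R code and
// read one realized-measure vector back into Java as a double array.
caller.runAndReturnResult("rv_list0");                      // execute the R code and return rv_list0
double[] rm0 = caller.getParser().getAsDoubleArray("rv0");  // element name assumed from names(rv0) <- 'rv0'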

Conclusion

In this article I discussed a recently introduced high-frequency-based volatility model by Shephard and Sheppard and gave an introduction to three different high-performance tools beyond MATLAB and R that I’ve developed for analyzing these new stochastic models. The heavyModel C/Java package that I made available for download gives a workable starting point for experimenting, in a fast and efficient framework, with the benefits of using high-frequency financial data, and most notably realized measures of volatility, to produce better forecasts. The package will be continuously updated with improvements in documentation, bug fixes, and overall presentation. Finally, the R package highfrequency, embedded in Java and utilized in iMetrica, gives a fully GUI-driven experience for stochastic modeling of high-frequency financial data that is both easy to use and fast.

Happy Extracting and Volatilitizing!

Model comparison with data sweeps

A useful exercise in modeling economic time series is to perform a “sliding window” analysis of the data, estimating models on subsets of the data and testing the robustness of signal extractions, forecasts, and parameter variance relative to a growing subset of the data. For instance, for a time series of length 300, one could estimate a model on a shorter subset of the data, say the first 200 observations, then increase the number of observations, re-estimate, and see how the model parameter values change as the data subset grows. One can also see how the signal extractions and forecasts change with additional data. Ideally, if the model is specified correctly for the data, there should be very small variance in the estimated parameters as more data is added to the time series; this signifies the stability of the model selection. Normally, such an exercise would be tedious to carry out with X-13ARIMA-SEATS, or any other software such as MATLAB or R, as scripts or spec files would have to be written for each individual re-estimation and the results re-plotted. In the uSimX13 module of iMetrica, however, this task is made easy with the addition of a sliding-window tool. In this blog entry, we describe this so-called “sliding windows” process and show just how fast and seamless it is to perform model-choice robustness checks and comparisons in iMetrica.

We begin by describing the sliding span/window tool in the iMetrica-uSimX13 module. Once time series data has been loaded into the uSimX13 module from either the uSimX13 main menu or imported from the Data Control module, the uSimX13 computation engine must first be turned on from the uSimX13 menu. Then to access the sliding windows interface,  simply click on the “Sliding Span/Window Activate” check box in the main uSimX13 menu (see Figure 1).

Figure 1. Main drop down menu for the uSimX13 module, showing the “Sliding Span/Window Activate” check box.

Once clicked, the entire plotting canvas will turn a dark shade of blue, which indicates the windowed region in which model estimation occurs. To control the sliding window, place the mouse cursor along one of the edges of the canvas and, with the left mouse button held down, slowly glide the mouse left or right; the windowed area will shrink or expand accordingly. The model parameters are re-estimated instantaneously as the window adjusts, and in effect all the available model statistics, diagnostics, signals, and forecasts are recomputed as well. For example, as the window expands or shrinks, the trend, seasonally adjusted data, and 24-step-ahead forecasts can be plotted and viewed in real time as the window changes (see Figure 2). One can also slide the window to the left or right by placing the mouse anywhere inside the blue windowed region, holding down the left mouse button, and moving along the time domain. This way, the window length remains fixed, but the window center moves along different subsets of the data. This can be useful for seeing how model parameters change within regions of data that exhibit regime changes, namely a sequence in the series that suddenly changes in seasonal or cyclical structure after a certain time observation. The data can then be modeled in the sections both before and after the regime change occurs in order to compare the estimated parameter values.

Figure 2. The window sliding across different subsets of the data. The signal extractions, forecast, and model parameters are recomputed automatically as the window changes. Forecast comparisons with the real data as the window span moves is now trivial. Here, the plot in cyan represents the original time series data in-sample and the 24 step forecast out-of-sample, and the light green plot is the time series data adjusted for outliers, as indicated in the model box. One can select the plots using the “series components” plot box. The data in gray represents the time series data not used in the model estimation.

Data Sweep

With the ability to seamlessly capture partitions of the data and model within a given partition using the sliding window, a natural extension of this mouse-on-canvas utility is to employ it in comparing different models of the time series data. We call this method of model comparison time series data sweeping (or simply data sweeping). It involves selecting an initial window of data from the first observation to the n-th observation, where n is some number much less than the total number of observations N in the data set (say, one third of the total). The data sweep then computes the sliding window with final observation running from n all the way to N, in increments of one (see Figure 3). At each addition to the length of the window, the forecast is computed for up to 24 steps ahead. Of course, since the true time series data is known in the out-of-sample region of computation, we can compute the forecast error for up to h \leq 24 steps ahead and sum up these errors as n increases to N (a schematic sketch of this sweep loop is given below). We can do this data sweep for several models, computing the aggregate forecast errors over time. The idea is that the best model for the data should have the smallest forecast error, so comparing this forecast error across several models identifies the model with the best overall forecasting ability.
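The logic of the sweep can be summarized in a few lines. The sketch below is schematic Java only; the ForecastModel interface and its fit-and-forecast method are hypothetical stand-ins, since in iMetrica the sweep is driven entirely from the GUI.

import java.util.Arrays;

// Schematic sketch of the data sweep (not iMetrica code): a hypothetical model is
// refit on y[0..end-1] and its h-step forecasts are compared against the held-out data.
public class DataSweep {

    // Hypothetical stand-in for whatever model is being swept (e.g., a SARIMA fit)
    interface ForecastModel {
        double[] fitAndForecast(double[] inSample, int steps); // returns 'steps' forecasts
    }

    static double sweepError(double[] y, int n, int h, ForecastModel model) {
        double totalError = 0.0;
        // grow the in-sample window one observation at a time, from n up to N - h
        for (int end = n; end + h <= y.length; end++) {
            double[] inSample = Arrays.copyOfRange(y, 0, end);
            double[] forecast = model.fitAndForecast(inSample, h);
            for (int k = 0; k < h; k++) {
                double e = y[end + k] - forecast[k];   // out-of-sample forecast error
                totalError += e * e;
            }
        }
        return totalError;  // aggregate squared forecast error used to compare models
    }
}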

To access the data sweep, simply go to the main uSimX13 menu, shown in Figure 1, and click “Sweep Time Series Control Panel”. This will bring up the main interface for the data sweep (shown in Figures 4-6). To begin the sweep, first select the model and regressors desired for modeling the data inside the model selection panel of the main uSimX13 interface. Then choose at which observation you’d like to begin the data sweep (observation n = 60 is the default). Lastly, select how many forecast steps you’d like to use in computing the forecast error (1-24). Once content with the settings, click the “Compute time series sweep” button and watch as the window span increases from n to N, recomputing parameters, signals, and forecasts at each step (see slideshow at top of post). Once the sweep is complete, the parameter statistics, the Ljung-Box mean values at two different lags, and the total forecast error are displayed in the control panel. To compare this with another model, save the results of the sweep by clicking “save parameters” in the uSimX13 menu, then choose another model and recompute (using the same settings as the previous sweep, of course).

To give an example of this process, we begin by simulating a time series data set of length N = 300 from a SARIMA model of dimension (0,1,2)(0,1,1)_{12}, namely a seasonal auto-regressive integrated moving-average process with two non-seasonal moving-average parameters and one seasonal moving-average parameter. The data sweep is performed on the simulated data with a forecast error horizon of length 23 using three different SARIMA models: (a) (0,1,1)(0,1,1)_{12}, (b) (1,1,0)(0,1,1)_{12}, and (c) (0,1,2)(0,1,1)_{12}, the true model. See Figures 4-6 below for the data sweep results, showing the estimated parameter mean and standard deviation, the average Ljung-Box statistics at lags 12 and 0, and the forecast errors for each model. Notice that the forecast error for the true model (c) (Figure 6) is the lowest, followed by model (b) (Figure 5) and then model (a) (Figure 4), which is exactly what we would want.

Figure 4. Model (a) and the parameter statistics, forecast error, and data sweep controls.

Figure 5. Model (b) and the parameter statistics, forecast error, and data sweep controls.

Figure 6. Model (c) (the true model) and the parameter statistics, forecast error, and data sweep controls.

iMetrica and Hybridometrics: Introduction

The high-frequency Financial Trading interface of iMetrica. Easily construct in-sample trading strategies with an array of optimizers unique to iMetrica and then employ the strategies out-of-sample to test and fine-tune the trading performance.

This blog serves as an introduction and tutorial to Hybridometrics using iMetrica. Hybridometrics is a term used to describe the analysis, modeling, signal extraction, and forecasting of univariate and multivariate financial and economic time series data using a combination of model-based and non-model-based methodologies. Ideal combinations of computational paradigms and methodologies used in hybridometrics include, but are not limited to, traditional stochastic models such as (S)ARIMA models, GARCH models, and multivariate stochastic volatility models combined with empirical mode decomposition techniques and the multivariate direct filter approach (MDFA). The goal of hybridometric modeling is to obtain signal extractions and forecasts, from official or government use all the way to high-frequency financial trading strategies, that perform better than model-based or non-model-based methods alone. In other words, hybridometrics seeks to combine the advantages of different paradigms to outperform traditional approaches to time series modeling. The iMetrica software package offers the most versatile and computationally efficient portal to this newly proposed time series modeling paradigm, all while remaining surprisingly easy to use.

The iMetrica software package is a unique system of econometric and financial trading tools that focuses on speed, user interaction, visualization tools, and point-and-click simplicity in building models for time series data of all types. Written entirely in GNU C and Fortran with a rich interactive interface written in Java, the iMetrica software offers an abundance of econometric tools for signal extraction and forecasting in multivariate time series, all easily accessible with the click of a mouse and fast, with results computed and plotted instantaneously without the need to create output data files or call external plotting programs.

One powerful feature that is unique to the iMetrica software is the innate capability of easily combining both model-based and non-model-based methodologies for designing data forecasts, signal extraction filters, or high-frequency financial trading strategies. Furthermore, the strategies can be computed and tested both in-sample and out-of-sample using an easy-to-use built-in data partitioner that splits the data into an in-sample portion, where models and filters are computed, and an out-of-sample portion, where new data is applied to the in-sample strategy to test for robustness, overfitting, and many other desired properties. This gives the user complete liberty in creating a fast and efficient test bed for implementing signal extractions, forecasting regimes, or financial trading strategies.

The iMetrica software environment includes five interacting time series analysis modules for building hybrid forecasts, signal extractions, and trading strategies.

  • uSimX13 – A computational environment for univariate seasonal auto-regressive integrated moving-average (SARIMA) modeling and simulation using X-13ARIMA-SEATS. Features an interactive approach to modeling seasonal economic time series with SARIMA models and automatic outlier detection, trading day, and holiday regressor effects. Also includes a suite of model comparison tools using both modern and goodness-of-fit signal extraction diagnostics.
  • BayesCronos – An interactive time series module for signal extraction and forecasting of multivariate economic and financial time series, focusing on Bayesian computation and simulation. This module includes a multitude of models including ARIMA, GARCH, EGARCH, Stochastic Volatility, Multivariate Factor Stochastic Volatility, Dynamic Factor, and Multivariate High-Frequency-Based Volatility (HEAVY) models, with more continuously being added. For most of the models featured, one can compute the Bayesian and/or the quasi-maximum-likelihood estimated model fits using either a Metropolis-Hastings Markov Chain Monte Carlo approach (Bayesian) or a QMLE formulation for computing the model parameter estimates. Using a convenient model selection panel interface, complete access to the model type, model parameter dimensions, and prior distribution parameters is seamlessly available. In the case of Bayesian estimation, one has complete control over the prior distributions of the model parameters, with interactive visualization of the Markov Chain Monte Carlo parameter samples. For each model, up to 10 sample 36-step-ahead forecasts can be produced and visualized instantaneously, along with other important model features such as model residuals, computed volatility, forecasted volatility, factor models, and more. The results can then be easily exported to other modules in iMetrica for additional filtering and/or modeling.
  • MDFA – An interactive interface to the most comprehensive multivariate real-time direct filter analysis and computation environment in the world. Build real-time filters using both I-MDFA and Zero-Pole Combination (ZPC) filter constructions. The module includes interactive access to timeliness, smoothing, and accuracy controls for filter customization along with parameters for filter regularization to control overfitting. More advanced features include an interface for building adaptive filters, and many controls for filter optimization, customization, data forecasting, and target filter construction.
  • State Space Modeling – A module for building unobserved-component ARIMA and regression models for univariate economic time series. Similar to the uSimX13 module, the State Space Modeling environment focuses on modeling and forecasting economic time series data, but with much more generality than SARIMA models. An aggregation of unobserved stochastic components in the form of ARIMA models is stipulated for the time series data (for example trend + seasonal + irregular), and then regression components to model outliers, holiday, and trading-day effects are added to the stochastic components, giving ultimate flexibility in model building. The module uses regCMPNT, a suite of Fortran code written at the US Census Bureau, for the maximum-likelihood and Kalman filter computational routines.
  • EMD – The EMD module offers a time-frequency decomposition environment for the analysis of time series data. The module offers both the original empirical mode decomposition technique of Huang et al., using cubic splines, and an adaptive approach using reproducing kernels and direct filtering. This empirical decomposition technique decomposes nonlinear and nonstationary time series into amplitude-modulated and frequency-modulated (AM-FM) components and then computes the intrinsic phase and instantaneous frequency components from the FM components. All plots of the components as well as the time-frequency heat maps are generated instantaneously.

Along with these modules, there is also a data control module that handles all aspects of time series data input and export. Within this main data control hub, one can import multivariate time series data from a multitude of file formats, as well as download financial time series data directly from Yahoo! Finance or another source such as Reuters for higher-frequency financial data. Once the data is loaded, it can be normalized, scaled, demeaned, and/or log-transformed with simple slider and button controls, with the effects plotted on the graphics canvas instantaneously.

Another great feature of the iMetrica software is the ability to learn more about time series modeling through the use of data simulators. The data control module includes an array of data-simulation panels for simulating data from a multitude of both univariate and multivariate time series models. With control over the number of observations, the random seed for the innovation process, the innovation distribution, and the model parameters, simulated data can be constructed for any type of economic or financial time series imaginable. The different types of models include (S)ARIMA models, GARCH models, correlated cycle models, trend models, multivariate factor stochastic volatility models, and HEAVY models. By simulating data and toggling the parameters, one can instantly visualize the effect of each parameter on the simulated data. The data can then be exported to any of the modules for practicing and honing one’s skills in hybrid modeling, signal extraction, and forecasting.

Keep visiting this blog frequently for continuous updates, tutorials, and proposals in the fields of econometrics, signal extraction, forecasting, and high-frequency financial trading using hybridometrics and iMetrica.