iMetricaFX: An interactive JavaFX app for the MDFA-Toolkit

Selection_072.png

Figure 1: The main interactive iMetricaFX user interface.  

Introduction

We introduce an interactive app completely written in Java/FX for doing real-time signal extraction in multivariate time series.  iMetricaFX is a completely redesigned application from the previous iMetrica, focusing mostly on the multivariate direct filter approach and it’s derivatives for generating real-time signals and analyzing multivariate time series. It retains all the features of the MDFA module in iMetrica, but now with better responsive 2D graphics written in JavaFX, and focusing on real-time data analysis applications.

One might want to use iMetricaFX for any of the following reasons:

  • To learn visually how MDFA hyperparameters interact with each other, and how their changes affect filter coefficients, frequency domain, signal extraction results.
  • To understand how MDFA signal extraction parameters, transformations on time series, additional explanatory variables, or other features affect the behavior of out-of-sample signal quality on future data or some  performance metrics (MSE, Sharpe ratios, seasonal/cycle adjustments, etc).
  • To experiment with MDFA parameter definitions for use in the MDFA-DeepLearning package.
  • To engineer financial trading signals and track their performance out-of-sample given specific trading requirements
  • To analyze correlations between a collection of (non)stationary time series and how they affect signal extractions

Overview

We now given an overview of the different components and features of the iMetricaFX system. The first, most obvious step, in getting iMetricaFX rolling is to define the data source. The easiest data source is a collection of .csv files that have DateTime stamps for the index, with the data values in the following columns. Each column should have a header to describe that column, (e.g. “DateTime”, “Timestamp”, and “Bid” or “Open”).

In the resources folder, there is a collection of .csv files which contain daily “Close” values of a few dozen NASDAQ stocks and etfs with historical data dating back the past 6 years. All files contain the same date range period.

Add files is selected from the top menu bar, and in selecting multiple files holding the ‘ctrl’ button will upload all files for streaming data from each simultaneously for multivariate signal extraction. The DateTime format of the files can be selected in the top menu as well, in the TimeFormat menu.

Once the data files have been selected and loaded, to compute the initial default signal using the default MDFA hyperparameter settings, use the “Compute Filter” button at the very bottom left. Once the data has been loaded and the initial filter coefficients have been computed for the initial time series observations, one can then construct several types of signals, apply out-of-sample data, adjust time series transformation, change filter parameters, add additional explanatory series, and more. Here is a list of all the different current interface controls.

  • Menu:File Open .csv data files for time series observations, save filter parameters, load filter parameters.
  • Menu:Signals Create/select new signals. When a new signal is added, the filter hyperparameters will be applied to the currently selected signal. All other signal parameters will remain fixed.
  • Menu:Target Series In the multivariate case, select the target series among all the series loaded. The target series represents the series from which the signal will be built.
  • Menu:TimeFormat The DateTime stamp format of the time series. Usually for daily data this looks like “yyyy-MM-dd” or for minute data “yyyy-MM-dd HH:mm:ss”
  • Menu:Options A variety of options can be chosen here. For now, only Prefiltering on/off is available.
  • Menu:Windows Select the windows that plot the various signal extraction properties. Options are MDFA Coefficients, Frequency Response Functions, and Time/Phase delay. More window types will be added in the future.
  • Compute Filter Compute the filter given the latest Series size observations and current filter hyperparamter settings
  • New Observation This button adds a new (multivariate) time series observation from the referenced .csv files. If there are no more values left in the .csv file, then no new values will be given.
  • Filter length Change the length of the filter from 4 to 100
  • Series size Change the number of in-sample time series observations for computing MDFA coefficients
  • FractionalD The fractional difference exponent between 0 and 1 (first-order differencing)
  • Filter Customization Adjust the smoothness and timeliness parameters
  • Forecasting/Smoothing Adjust the forecasting (negative value) or smoothing lag (positive value)
  • Target Filter Adjust the frequency range for the signal using two frequency cutoffs
  • Filter Constraints Toggle the i1 and/or i2 constraint and the set the Phase Shift for the i2 constraint
  • Filter Regularization Adjust the smoothness, decay, decay strength, and cross regularization for the filter coefficients
Advertisements

MDFA-DeepLearning Package: Hybrid MDFA-RNN networks for machine learning in multivariate time series

Overview

MDFA-DeepLearning is a library for building machine learning applications on large numbers of multivariate time series data, with a heavy emphasis on noisy (non)stationary data.  The goal of MDFA-DeepLearning is to learn underlying patterns, signals, and regimes in multivariate time series and to detect, predict, or forecast them in real-time with the aid of both a real-time feature extraction system based on the multivariate direct filter approach (MDFA) and deep recurrent neural networks (RNN). The feature extraction system utilizes the MDFA-Toolkit to construct K multivariate signals in real-time (the features) where each of the K features targets a certain frequency range in the underlying time series.  Furthermore, each (or some) of these features can also be forecasted multiple steps ahead, or smoothed, creating many possibilities of signal or regime learning in time series.

For the deep learning components, in this package we focus on two network structures, namely a recurrent weighted average network (RWA  Ostmeyer and Cowell) and a standard long-short term memory network.  The RWA cell is a type of RNN cell that computes a recurrent weighted average over every past processing timestep, unlike standard RNN cells which only process information from previous timesteps.

The overall general architecture of the proposed network is given in Figure 1 in the case of an RWA network, which we will discuss in more detail below.  For a given sequence of N multivariate time series values which have been transformed appropriately to a stationary sequence, which we denote Y_1, Y_2, …, Y_N,  a real-time feature extraction process is applied at each observation which is then used as input to an RWA (or LSTM) network, where the univariate output is a targeted signal value (regression) or a regime value (classification)..

Selection_070.png

Figure 1: Proposed network design using RWA cells to learn form the real-time feature extractor. The Y_t values are the (multivariate) transformed time series values, and the S_t are univariate outputs describing a target signal or regime   

Why Use MDFA-DeepLearning

One might want to develop predictive models in multivariate time series data using MDFA-DeepLearning if the time series exhibit any of the following properties:

  • High-Dimensionality (many (un)correlated nonstationary time series)
  • Difficult to forecast using traditional model-based methods (VARIMA/GARCH) or traditional deep learning methods (RNN/LSTM, component decomposition, etc)
  • Emphasis needed on out-of-sample real-time signal extraction and forecasting
  • Regime changing in the underlying dynamics (or data generating process) of the time series is a common occurrence

The MDFA-DeepLearning approach differs from most machine learning methods in time series analysis in that an emphasis on real-time feature extraction is utilized where the features extractors are build using the multivariate direct filter approach. The motivation behind this coupling of MDFA with machine learning is that, while many time series decomposition methodologies exist (from empirical mode decomposition to stochastic component analysis methods), all of these rely on either in-sample decompositions on historical data (useless for future data), and/or assumptions about the boundary values, neither of which are attractive when fast, real-time out-of-sample predictions are the emphasis.  Furthermore, simply applying standard recurrent neural networks for step-ahead forecasting or signal extraction directly on the original noisy data is a useless exercise – the recurrent networks typically will only learn noise, producing signals and forecasts of little to no value (in most cases, the latter).

As mentioned, the back-end used for the novel feature extraction is the multivariate direct filter approach (MDFA), and is used to extract both local (higher-frequency) and global (low-frequency) features in real-time, out-of-sample, and output these features in a multivariate time series as inputs into an RWA or LSTM recurrent neural network. Thus the package is divided into essentially four different components which all need to be defined properly in order to produce predictive models for time series data:

  • Labeling interface
  • Feature extractors
  • DataSetIterator interface
  • Learning interface

Labeling interface

The package includes an interface for labeling time series. The labeling process takes segments of historical data, and labels each time series observation in some manner. There are three types of labels that can be used:

  • Observational labeling: every time series observation is labeled by a signal value (for example a target value computed by a symmetric target filter). This is sequence-to-sequence labeling for time series regression.
  • Fixed Period labeling: every period (day, week, etc) is labeled, typically by a one-hot vector. This is sequence-to-value labeling. The end of the period is labeled and the rest of the values are not (masked by nonvalues in the code).
  • Regime labeling: every value in a specific regime is labeled, either by a one-hot vector (for example, long (1,0) short (0,1) neutral (0,0), or trend (1,0) and mean-reverting (0,1)). This is another example of sequence-to-sequence, but using one-hot vectors and now in the form of sequence classification.

Other labeling strategies can certainly be used, but these are the three most common. We will give an outline on how to create a custom labeling strategy in a future article.

Feature Extractors

The package contains a feature extraction class called MDFAFeatureExtraction which, when instantiated, is used as the input to the a DataSetIterator. The MDFAFeatureExtraction contains a default automated feature extraction builder where a value of K is given as the number of features and a lag to indicate smoothing or forecasting steps-ahead.

One application of the MDFA feature extraction tool is to decompose a multivariate time series into K components in real-time which are close to being “orthogonal”, meaning in this sense that the frequency information from each of the components are relatively disjoint.  A precise mathematical formulation of this property and examples of the MDFAFeatureExtraction to follow.  Another example used for turning-point detection in trends is to decompose the multivariate series into K number of low-frequency components with different speeds and forecast/smoothing characteristics.

DataSetIterator

The DataSetIterator is an interface for ND4J that handles fast N-d array manipulation akin to numpy in Python. More specifically, the DataSetIterator handles traversing through a dataset and preparing data for a recurrent neural network. Our datasets in this package are outputs from the TimeSeries through the MDFAFeatureExtraction objects which then become the input to the RNN network. The DataSetIterator also performs the labeling and how output will be arranged. Thus it is essentially the communication from the underlying time series to the extraction process and then how it is treated as input and output to the RNN. In the package, we have designed two examples of DataSetIterators, one for regression and one for classification, that will be described in more detail in a later article.

Learning interface

Finally, the learning interface is where the final network is defined, all the parameters of the network, the activation and loss functions, and number/type of layers (LSTM, FeedForward, etc). The underlying computational framework for this component uses DeepLearning4J.

Requirements and Example

MDFA-DeepLearning requires both the MDFA-Toolkit package for constructing the time series feature extractors and the Eclipse Deeplearning4j (dl4j) library for the deep recurrent neural network constructors. The dl4j library is freely available at github.com/deeplearning4j, but is included in the build of this package using Gradle as the dependency management tool.

The back-end for the dl4j package will depend on your computational hardware, but is available on a local native basis using CPUs, or can take advantage of GPUs using CUDA libraries (CUDA 8.0 was used to test current version of MDFA-DeepLearnng). In this package I have included a reference to both versions (assuming a standard linux64 architecture).

The back-end used for the novel feature extraction technique, as mentioned, is the MDFA-Toolkit (available here), which will run on the ND4J package. The feature extraction begins by defining K MDFA objects, called MDFABase from the MDFA-Toolkit, with the fixed parameters set for each MDFA object. For example, here we define K=4 MDFABase objects, that will be used to extract different types of trends at different speeds in a fractionally differenced time series. Please refer to the MDFA-Toolkit documentation for more information on the definition of each MDFA parameter.

MDFABase[] anyMDFAs = new MDFABase[4];

anyMDFAs[0] = (new MDFABase()).setLowpassCutoff(Math.PI/8.0)
.setI1(1)
.setSmooth(.2)
.setLag(-3)
.setLambda(2.0)
.setAlpha(2.0)
.setFilterLength(5);

anyMDFAs[1] = (new MDFABase()).setLowpassCutoff(Math.PI/10.0)
.setLag(-2)
.setAlpha(2.0)
.setFilterLength(5);

anyMDFAs[2] = (new MDFABase()).setLowpassCutoff(Math.PI/4.0)
.setDecayStart(.1)
.setDecayStrength(.2)
.setLag(-1)
.setFilterLength(5);

anyMDFAs[3] = (new MDFABase()).setLowpassCutoff(Math.PI/14.0)
.setSmooth(.2)
.setDecayStart(.1)
.setDecayStrength(.1)
.setFilterLength(5);

More concrete, in-depth step by step examples and tutorials will be given in the source code on github and in this blog, but here we will just give a brief overview on an example main program using these features.


/* Define the .csv data file from where we built train/test dataIterators */
String[] dataFiles = new String[]{"AAPL.daily.csv"};

/* Information about the .csv timeseries file */
TimeSeriesFile fileInfo = new TimeSeriesFile("yyyy-MM-dd", "Index", "Open");

/* Define network parameters */
int miniBatchSize = 100;
int totalTrainExamples = 1500;
int totalTestExamples = 300;
int timeStepLength = 60;
int nHiddenLayers = 2;
int nHidden = 216;
int nEpochs = 400;
int seed = 123;
int iterations = 40;
double learningRate = .001;
double gradientNormThreshold = 10.0;

IUpdater updater = new Nesterovs(learningRate, .4);

/* Instantiate Feature Extractors as an array of MDFABase objects */
MDFAFeatureExtraction features = new MDFAFeatureExtraction(anyMDFAs); 

/* Instantiate a new RecurrentMdfaRegression network using the features defined above */
RecurrentMdfaRegression myNet = new RecurrentMdfaRegression(features);

/* Set the data and the DataIterator parameters */
myNet.setTrainingTestData(dataFiles, fileInfo, miniBatchSize, totalTrainExamples, totalTestExamples, timeStepLength);

/* Usually a good idea to normalize the data */
myNet.normalizeData();

/* Build the LSTM (default network) layers */
myNet.buildNetworkLayers(nHiddenLayers, nHidden,
			RecurrentMdfaRegression.setNeuralNetConfiguration(seed, iterations, learningRate, gradientNormThreshold, 0, updater));

/* An optional dl4j control panel to in the browser */
myNet.setupUserInterface();

/* Train on the number of Epochs */
myNet.train(nEpochs);

/* Print/plot results and stats */
myNet.printPredicitions();
myNet.plotBatches(10);

The main points here are that essentially three components need to be defined:

  1. The .csv time series data file from which the DataIterator will extract the time series data for both labeling and learning. Two data sets will be created from this, a train set and a test set. Referrencing to multiple files from which to extract training and test sets is also possible. In dl4j, training and test data is built in the form of a DataSetIterator interface (org.nd4j.linalg.dataset.api.iterator).  In the package, we have defined a MDFADataSetIterator and a MDFARegressionDataSetIterator. More DataSetIterators for various applications will be added on an ongoing basis.
  2. The network RecurrentMdfaRegression is initiated, and needs to contain the feature signal extractors. Any set of feature extractors can be added, here we used the ones defined above as an example.
  3. Finally, the LSTM (or recurrent weighted average) network parameters need to be defined, this will then be used to construct the layers of the recurrent network.

With these three steps defined the network should be ready to train and test. The challenge is of course defining the feature extraction parameters. In later articles, we will give tips and tricks into what works best for what type of learning applications in large time series.

 

MDFA-Toolkit: A JAVA package for real-time signal extraction in large multivariate time series

The multivariate direct filter approach (MDFA) is a generic real-time signal extraction and forecasting framework endowed with a richly parameterized interface allowing for adaptive and fully-regularized data analysis in large multivariate time series. The methodology is based primarily in the frequency domain, where all the optimization criteria is defined, from regularization, to forecasting, to filter constraints. For an in-depth tutorial on the mathematical formation, the reader is invited to check out any of the many publications or tutorials on the subject from blog.zhaw.ch.

This MDFA-Toolkit (clone here) provides a fast, modularized, and adaptive framework in JAVA for doing such real-time signal extraction for a variety of applications. Furthermore, we have developed several components to the package featuring streaming time series data analysis tools not known to be available anywhere else. Such new features include:

  • A fractional differencing optimization tool for transforming nonstationary time-series into stationary time series while preserving memory (inspired by Marcos Lopez de Prado’s recent book on Advances in Financial Machine Learning, Wiley 2018).
  • Easy to use interface to four different signal generation outputs:
    Univariate series -> univariate signal
    Univariate series -> multivariate signal
    Multivariate series -> univariate signal
    Multivariate series -> multivariate signal
  • Generalization of optimization criterion for the signal extraction. One can use a periodogram, or a model-based spectral density of the data, or anything in between.
  • Real-time adaptive parameterization control – make slight adjustments to the filter process parameterization effortlessly
  • Build a filtering process from simpler user-defined filters, applying customization and reducing degrees of freedom.

This package also provides an API to three other real-time data analysis frameworks that are now or soon available

  • iMetricaFX – An app written entirely in JavaFX for doing real-time time series data analysis with MDFA
  • MDFA-DeepLearning – A new recurrent neural network methodology for learning in large noisy time series
  • MDFA-Tradengineer – An automated algorithmic trading platform combining MDFA-Toolkit, MDFA-DeepLearning, and Esper – a library for complex event processing (CEP) and streaming analytics

To start the most basic signal extraction process using MDFA-Toolkit, three things need to be defined.

  1. The data streaming process which determines from where and what kind of data will be streamed
  2. A transformation of the data, which includes any logarithmic transform, normalization, and/or (fractional) differencing
  3. A signal extraction definition which is defined by the MDFA parameterization

Data streaming

In the current version, time series data is providing by a streaming CSVReader, where the time series index is given by a String DateTime stamp is the first column, and the value(s) are given in the following columns. For multivariate data, two options are available for streaming data.  1) A multiple column .csv file, with each value of the time series in a separate column 2) or in multiple referenced single column time-stamped .csv files. In this case, the time series DateTime stamps will be checked to see if in agreement. If not, an exception will be thrown. More sophisticated multivariate time series data streamers which account for missing values will soon be available.

Transforming the data

Depending on the type of time series data and the application or objectives of the real time signal extraction process, transforming the data in real-time might be an attractive feature. The transformation of the data can include (but not limited to) several different things

  • A Box-Cox transform, one of the more common transformations in financial and other non-stationary time series.
  • (fractional)-differencing, defined by a value d in [0,1]. When d=1, standard first-order differencing is applied.
  • For stationary series, standard mean-variance normalization or a more exotic GARCH normalization which attempts to model the underlying volatility is also available.

Signal extraction definition

Once the data streaming and transformation procedures have been defined, the signal extraction parameters can then be set in a univariate or multivariate setting. (Multiple signals can be constructed as well, so that the output is a multivariate signal. A signal extraction process can be defined by defining and MDFABase object (or an array of MDFABase objects in the mulivariate signal case). The parameters that are defined are as follows:

  • Filter length: the length L in number of lags of the resulting filter
  • Low-pass/band-pass frequency cutoffs: which frequency range is to be filtered from the time-series data
  • In-sample data length: how much historical data need to construct the MDFA filter
  • Customization: α (smoothness) and λ (timeliness) focuses on emphasizing smoothness of the filter by mollifying high-frequency noise and optimizing timeliness of filter by emphasizing error optimization in phase delay in frequency domain
  • Regularization parameters: controls the decay rate and strength, smoothness of the (multivariate) filter coefficients, and cross-series similarity in the multivariate case
  • Lag: controls the forecasting (negative values) or smoothing (positive values)
  • Filter constraints i1 and i2: Constrains the filter coefficients to sum to one (i1) and/or the dot product with (0,1…, L) is equal to the phase shift, where L is the filter length.
  • Phase-shift: the derivative of the frequency response function at the zero frequency.

All these parameters are controlled in an MDFABase object, which holds all the information associated with the filtering process. It includes it’s own interface which ensures the MDFA filter coefficients are updated automatically anytime the user changes a parameter in real-time.

Selection_069.png

Figure 1: Overview of the main module components of MDFA-Toolkit and how they are connected

As shown in Figure 1, the main components that need to be defined in order to define a signal extraction process in MDFA-Toolkit.  The signal extraction process begins with a handle on the data streaming process, which in this article we will demonstrate using a simple CSV market file reader that is included in the package. The CSV file should contain the raw time series data, and ideally a time (or date) stamp column. In the case there is no time stamp column, such a stamp will simply be made up for each value.

Once the data stream has been defined, these are then passed into a time series transformation process, which handles automatically all the data transformations which new data is streamed. As we’ll see, the TargetSeries object defines such transformations and all streaming data is passed added directly to the TargetSeries object. A MultivariateFXSeries is then initiated with references to each TargetSeries objects. The MDFABase objects contain the MDFA parameters and are added to the MultivariateFXSeries to produce the final signal extraction output.

To demonstrate these components and how they come together, we illustrate the package with a simple example where we wish to extract three independent signals from AAPL daily open prices from the past 5 years. We also do this in a multivariate setting, to see how all the components interact, yielding a multivariate series -> multivariate signal.


//Define three data source files, the first one will be the target series
String[] dataFiles = new String[]{"AAPL.daily.csv", "QQQ.daily.csv", "GOOG.daily.csv"};

//Create a CSV market feed, where Index is the Date column and Open is the data
CsvFeed marketFeed = new CsvFeed(dataFiles, "Index", "Open");

/* Create three independent signal extraction definitions using MDFABase:
One lowpass filter with cutoff PI/20 and two bandpass filters
*/
MDFABase[] anyMDFAs = new MDFABase[3];
anyMDFAs[0] = (new MDFABase()).setLowpassCutoff(Math.PI/20.0)
                              .setI1(1)
                              .setHybridForecast(.01)
                              .setSmooth(.3)
                              .setDecayStart(.1)
                              .setDecayStrength(.2)
                              .setLag(-2.0)
                              .setLambda(2.0)
                              .setAlpha(2.0)
                              .setSeriesLength(400); 

anyMDFAs[1] = (new MDFABase()).setLowpassCutoff(Math.PI/10.0)
                              .setBandPassCutoff(Math.PI/15.0)
                              .setSmooth(.1)
                              .setSeriesLength(400); 

anyMDFAs[2] = (new MDFABase()).setLowpassCutoff(Math.PI/5.0)
                              .setBandPassCutoff(Math.PI/10.0)
                              .setSmooth(.1)
                              .setSeriesLength(400);

/*
Instantiate a multivariate series, with the MDFABase definitions,
and the Date format of the CSV market feed
*/
MultivariateFXSeries fxSeries = new MultivariateFXSeries(anyMDFAs, "yyyy-MM-dd");

/*
Now add the three series, each one a TargetSeries representing the series
we will receive from the csv market feed. The TargetSeries
defines the data transformation. Here we use differencing order with
log-transform applied
*/
fxSeries.addSeries(new TargetSeries(1.0, true, "AAPL"));
fxSeries.addSeries(new TargetSeries(1.0, true, "QQQ"));
fxSeries.addSeries(new TargetSeries(1.0, true, "GOOG"));
/*
Now start filling the fxSeries will data, we will start with
600 of the first observations from the market feed
*/
for(int i = 0; i < 600; i++) {
   TimeSeriesEntry observation = marketFeed.getNextMultivariateObservation();
   fxSeries.addValue(observation.getDateTime(), observation.getValue());
}

//Now compute the filter coefficients with the current data
fxSeries.computeAllFilterCoefficients();

//You can also chop off some of the data, he we chop off 70 observations
fxSeries.chopFirstObservations(70);

//Plot the data so far
fxSeries.plotSignals("Original");

Selection_067.png

Figure 2: Output of the three signals on the target series (red) AAPL

In the first line, we reference three data sources (AAPL daily open, GOOG daily open, and SPY daily open), where all signals are constructed from the target signal which is by default, the first series referenced in the data market feed. The second two series act as explanatory series.  The filter coeffcients are computed using the latest 400 observations, since in this example 400 was used as the insample setSeriesLength, value for all signals. As a side note, different insample values can be used for each signal, which allows one to study the affects of insample data sizes on signal output quality.  Figure 2 shows the resulting insample signals created from the latest 400 observations.

We now add 600 more observations out-of-sample, chop off the first 400, and then see how one can change a couple of parameters on the first signal (first MDFABase object).


for(int i = 0; i < 600; i++) {
   TimeSeriesEntry observation = marketFeed.getNextMultivariateObservation();
   fxSeries.addValue(observation.getDateTime(), observation.getValue());
}

fxSeries.chopFirstObservations(400);
fxSeries.plotSignals("New 400");

/* Now change the lowpass cutoff to PI/6
   and the lag to -3.0 in the first signal (index 0) */
fxSeries.getMDFAFactory(0).setLowpassCutoff(Math.PI/6.0);
fxSeries.getMDFAFactory(0).setLag(-3.0);

/* Recompute the filter coefficients with new parameters */
fxSeries.computeFilterCoefficients(0);
fxSeries.plotSignals("Changed first signal");

Selection_067.png

Figure 3: After adding 600 new observations out-of-sample signal values

After adding the 600 values out-of-sample and plotting, we then proceed to change the lowpass cutoff of the first signal to PI/6, and the lag to -3.0 (forecasting three steps ahead). This is done by accessing the MDFAFactory and getting handle on first signal (index 0), and setting the new parameters. The filter coefficients are then recomputed on the newest 400 values (but now all signal values are insample).

In the MDFA-Toolkit, plotting is done using JFreeChart, however iMetricaFX provides an app for building signal extraction pipelines with this toolkit providing the backend where all the automated plotting, analysis, and graphics are handled in JavaFX, creating a much more interactive signal extraction environment. Many more features to the MDFA-Toolkit are being constantly added, especially in regard to features boosting applications in Machine Learning, such as we will see in the next upcoming article.

Big Data analytics in time series

We also implement in MDFA-Toolkit an interface to Apache Spark-TS, which provides a Spark RDD for Time series objects, geared towards high dimension multivariate time series. Large-scale time-series data shows up across a variety of domains. Distributed as the spark-ts package, a library developed by Cloudera’s Data Science team essentially enables analysis of data sets comprising millions of time series, each with millions of measurements. The Spark-TS package runs atop Apache Spark. A tutorial on creating an Apache Spark-TS connection with MDFA-Toolkit is currently being developed.

Forecasting, Seasonal Adjustment, and Signal Extraction with uSimX13 in iMetrica

Selection_091.png

Figire 0. The uSimX13 module in iMetrica provides interactive and dynamic forecasting and signal extraction powered by X-13-ARIMA-SEATS. 

 

The uSimX13-SEATS (uSimX13) module featured in the iMetrica software suite is an interactive graphical user-interfaced time series modeling and simulation environment. The main attraction of uSimX13 is that it features computational modeling routines from the X- 13ARIMA-SEATS (X-13A-S) software developed and published by the Census Bureau of the US Department of Commerce. The uSimX13 environment offers a unique time series modeling software with the primary goal of analyzing economic time series data using the most commonly used features of X-13ARIMA-SEATS, while providing a large array of classical and modern goodness-of-fit tests to assess different model fits of the data, many different graphical representations of the time series data, adaptive time series decomposition capabilities, and much more all while being accessible to both beginners in the field of econometrics wanting to visualize frequently used tools, and practitioners wanting to obtain forecasts, seasonal/trend adjustments, and/or test and apply regression components to their data.

While there are other X-13A-S “engines” and interfaces in existence, including the original Fortran program and the excellent R package entitled seasonal , uSimX13 in comparison serves to provide most of the commonly used and important features of X-13-ARIMA-SEATS without the use of any programming interface – one simply loads the data into the module (I will show how in this article), and then all the aspects of the modeling, including forecasting, seasonal adjustment, auto-model, and model selection, can be done with just using the iMetrica user-interface. In addition, several interactive features are available to aid in model selection in determining the best model fit for one’s data, some of which are not available in the original Fortran program nor the R package.

To get started, once iMetrica has been launched, the easiest way to get data into the uSimX13 module is to click on the uSimX13 tab, and then access the menu for the uSimX13 tab at the top in the menu bar, as shown in Figure 1.

Menu_090.png

Figure 1. Opening the uSimX13 menu and selecting Open Data File

For a single data file, click ‘Open Data File’ and a file selection dialog box will appear to choose your data file. I have included a few dozen example real economic time series in the folder called ‘data’ that comes with the iMetrica distribution on my Github. The data files accepted for uSimX13 are very trivial, in that they are simply numerical values given for each time period in each row. If there is a date associated with each time series observation, the date and the observation must be seperated by either a comma or a space. There is also the option of loading in many time series, and this can be achieved by selecting ‘Open Metafile’, and choosing a file that lists all the files to the time series that need to be loaded. It is assumed that all the files are found in the same folder as the Metafile. An example is also given in the ‘data’ directory. Scrolling though the different series that were uploaded can be achieved by accessing the menu, then selecting ‘Simulator Panel’. This will bring up a satellite panel, where on the bottom right you will see a scroll bar with all the loaded time series.

Once the data has been loaded in uSimX13, you will see a plot of the data automatically on the main plotting canvas. The plot should be gray at this point. To turn on the automatic features of the module and begin analyzing, click on ‘Activate uSimX13 engine’ as showm io Figure 1.

Menu_092.png

With the uSimX13 engine activated, this essentially turns on all the automatic estimation components of X-13-AS. The original data will be plotted in cyan, while all the extraction signals will be accessible through clicking on the checkbars in the control panel. One can change the modeling SARIMA dimensions, add outlier or other regression detection components, and visualize automatically the changes in the extracted components. All signal extraction and goodness-of-fit diagnoistcs are shown on the bottom of the control panel.

Once the data has been loaded in, one exploratory feature on the module is the ‘Sliding Span Activate’ that offers a unique approach to model selection and goodness-of-fit by addressing multi-step ahead forecasting error on ‘test’ portions of the data. Such analysis can be readily achieved by using the ‘Sliding Span Activate’ component along with the ‘Sweep Time Series Control Panel’. Further details of this interactive model selection feature can be found in one of my previous articles on model selection here.

That will get you started with the basic features in loading data into the model. To learn more, click on the .pdf file here usimx13 for a full guide on how to use all the components in the uSimX13, including a quick guide on the inference of model selection with signal extraction goodness-of-fit diagnostics that was featured in a recent paper by myself and colleague Tucker McElroy here .

Coming next week: Interactive modeling with State Space and RegComponent models using the State Space Modeling module in iMetrica.

 

 

 

 

 

iMetrica for Linux Ubuntu 64 now available

The MDFA real-time signal extraction module

The MDFA real-time signal extraction module

My first open-source release of iMetrica for Linux Ubuntu 64 can now be downloaded at my Github, with a Windows 64 version soon to follow. iMetrica is a fast, interactive, GUI-oriented software suite for predictive modeling, multivariate time series analysis, real-time signal extraction, Bayesian financial econometrics, and much more.

The principal use of iMetrica is to provide an interactive environment for the numerical and visual analysis of (multivariate) time series modeling, real-time filtering, and signal extraction. The interactive features in iMetrica boast a modeling and graphics environment for analysts, practitioners, and students of econometrics, finance, and real-time data analysis where no coding or modeling experience is necessary. All the system needs is data which can be piped into the system in many forms, including .csv, .txt, Google/Yahoo Finance, Quandle, .RData, and more. A module for connecting to MySQL databases is currently being developed. One can also simulate their own data from a one or a combination of several different popular data generating models.

With the design intending to be interactive and self-enclosed, one can change modeling data/parameter inputs and see the effects in both graphical and numerical form automatically. This feature is designed to help understand the underlying mechanics of the modeling or filtering process. One can test many attributes of the modeling or filtering process this way both visually and numerically such as sensitivity, nonlinearity, goodness-of-fit, any overfitting issues, stability, etc.

All the computational libraries were written in GNU C and/or Fortran and have been provided as Native libraries to the Java platform via JNI, where Java provides the user-interface, control, graphics, and several other components in a module format, and where each module specializes in a different data analysis paradigm. The modules available in this open-source version of iMetrica are as follows:

1) Data simulation, modeling and fitting using several popular econometric models

  • (S)ARIMA, (E)GARCH, (Multivariate) Factor models, Stochastic Volatility, High-frequency volatility models, Cycles/Trends, and more
  • Random number generators from several different types of parameterized distributions to create shocks, outliers, regression components, etc.
  • Visualize in real-time all components of the modeling process

2) An interactive GUI for multivariate real-time signal extraction using the multivariate direct filter approach (MDFA)

  • Construct mulitvariate MA filter designs, classical ARMA ZPA filtering designs, or hybrid filtering designs.
  • Analyze all components of the filtering and signal extraction process, from time-delay and smoothing control, to regularization.
  • Adaptive real-time filtering
  • Construct financial trading signals and forecasts
  • Includes a real-time/frequency analysis module using MDFA

3) An interactive GUI for X-13-ARIMA-SEATS called uSimX13

  • Perform automatic seasonal adjustment on thousands of economic time series
  • Compare SARIMA model choices using several different novel signal extraction diagnostics and tools available only in iMetrica
  • Visualize in real-time several components of modeling process
  • Analyze forecasts and compare with other models
  • All of the most important features of X-13-ARIMA-SEATS included

4) An interactive GUI for RegComponent (State Space and Unobserved Component Models)

  • Construct unobserved signal components and time-varying regression components
  • Obtain forecasts automatically and compare with other forecasting models

5) Empirical Mode Decomposition

  • Applies a fast adaptive EMD algorithm to decompose nonlinear, nonstationary data into a trend and instrinsic modes.
  • Visualize all time-frequency components with automatically generated 2D heat maps.

6) Bayesian Time Series Modeling of ARIMA, (E)GARCH, Multivariate Stochastic Volatility, HEAVY models

  • Compute and visualize posterior distribtions for all modeling parameters
  • Easily compare different model dimensions

7) Financial Trading Strategy Engineering with MDFA

  • Construct financial trading signals in the MDFA module and backtest the strategies on any frequency of data
  • Perform analysis of the strategies using forward-walk schemes
  • Automatically optimize certain components of the signal extraction on in-sample data.
  • Features a toolkit for minimizing probability of backtest overfit

Tutorials on how to use iMetrica can be found on this blog and will be added on a weekly basis, with new tools, features, and modules being added and improved on a consistent basis.

Please send any bug reports, comments, complaints, to clisztian@gmail.com.

TWS-iMetrica: The Automated Intraday Financial Trading Interface Using Adaptive Multivariate Direct Filtering

Figure 1: The TWS-iMetrica automated financial trading platform. Featuring fast performance optimization, analysis, and trading design features unique to iMetrica for building direct real-time filters to generate automated trading signals for nearly any tradeable financial asset. The system was built using Java, C, and the Interactive Brokers IB API in Java.

Figure 1: The TWS-iMetrica automated financial trading platform. Featuring fast performance optimization, analysis, and trading design features unique to iMetrica for building direct real-time filters to generate automated trading signals for nearly any tradeable financial asset. The system was built using Java, C, and the Interactive Brokers IB API in Java.

Introduction

I realize that I’ve been MIA (missing in action for non-anglophones) the past three months on this blog, but I assure you there has been good reason for my long absence. Not only have I developed a large slew of various optimization, analysis, and statistical tools in iMetrica for constructing high-performance financial trading signals geared towards intraday trading which I will (slowly) be sharing over the next several months (with some of the secret-sauce-recipes kept to myself and my current clients of course), but I have also built, engineered, tested, and finally put into commission on a daily basis the planet’s first automated financial trading platform completely based on the recently developed FT-AMDFA (adaptive multivariate direct filtering approach for financial trading). I introduce to you iMetrica’s little sister, TWS-iMetrica.

Coupled with the original software I developed for hybrid econometrics, time series analysis, signal extraction, and multivariate direct filter engineering called iMetrica, the TWS-iMetrica platform was built in a way to provide an easy to use yet powerful, adaptive, versatile, and automated trading machine for intraday financial trading with a variety of options for building your own day trading strategies using MDFA based on your own financial priorities.  Being written completely in Java and gnu c, the TWS-iMetrica system currently uses the Interactive Brokers (IB) trading workstation (TWS) Java API in order to construct the automated trades, connect to the necessary historical data feeds, and provide a variety of tick data. Thus in order to run, the system will require an activated IB trading account. However, as I discuss in the conclusion of this article, the software was written in a way to be seamlessly adapted to any other brokerage/trading platform API, as long as the API is available in Java or has Java wrappers available.

The steps for setting up and building an intraday financial trading environment using iMetrica + TWS-iMetrica are easy. There are four of them. No technical analysis indicator garbage is used here, no time domain methodologies, or stochastic calculus. TWS-iMetrica is based completely on the frequency domain approach to building robust real-time multivariate filters that are designed to extract signals from tradable financial assets at any fixed observation of frequencies (the most commonly used in my trading experience with FT-AMDFA being 5, 15, 30, or 60 minute intervals). What makes this paradigm of financial trading versatile is the ability to construct trading signals based on your own trading priorities with each filter designed uniquely for a targeted asset to be traded. With that being said, the four main steps using both iMetrica and TWS-iMetrica are as follows:

  1. The first step to building an intraday trading environment is to construct what I call an MDFA portfolio (which I’ll define in a brief moment). This is achieved in the TWS-iMetrica interface that is endowed with a user-friendly portfolio construction panel shown below in Figure 4.
  2. With the desired MDFA portfolio, selected, one then proceeds in connecting TWS-iMetrica to IB by simply pressing the Connect button on the interface in order to download the historical data (see Figure 3).
  3. With the historical data saved, the iMetrica software is then used to upload the saved historical data and build the filters for the given portfolio using the MDFA module in iMetrica (see Figure 2). The filters are constructed using a sequence of proprietary MDFA optimization and analysis tools. Within the iMetrica MDFA module, three different types of filters can be built 1) a trend filter that extracts a fast moving trend 2) a band-pass filter for extracting local cycles, and 3) A multi-bandpass filter that extracts both a slow moving trend and local cycles simultaneously.
  4. Once the filters are constructed and saved in a file (a .cft file), the TWS-iMetrica system is ready to be used for intrady trading using the newly constructed and optimized filters (see Figure 6).
Figure 4: The iMetrica MDFA module for constructing the trading filters. Features dozens of design, analysis, and optimization components to fit the trading priorities of the user and is used in conjunction with the TWS-iMetrica interface.

Figure 2: The iMetrica MDFA module for constructing the trading filters. Features dozens of design, analysis, and optimization components to fit the trading priorities of the user and is used in conjunction with the TWS-iMetrica interface.

In the remaining part of this article, I give an overview of the main features of the TWS-iMetrica software and how easily one can create a high-performing automated trading strategy that fits the needs of the user.

The TWS-iMetrica Interface

The main TWS-iMetrica graphical user interface is composed of several components that allow for constructing a multitude of various MDFA intraday trading strategies, depending on one’s trading priorities. Figure 3 shows the layout of the GUI after first being launched. The first component is the top menu featuring TWS System, some basic TWS connection variables which, in most cases, these variables are left in their default mode, and the Portfolio menu. To access the main menu for setting up the MDFA trading environment, click Setup MDFA Portfolio under the Portfolio menu. Once this is clicked, a panel is displayed (shown in Figure 4) featuring the required a priori parameters for building the MDFA trading environment that should all be filled before MDFA filter construction and trading is to take place. The parameters and their possible values are given below Figure 4.

Figure 3 - The TWS-iMetrica interface when first launched and everything blank.

Figure 3 – The TWS-iMetrica interface when first launched and everything blank.

The Setup MDFA Portfolio panel featuring all the setting necessary to construct the automated trading MDFA environment.

Figure 4 – The Setup MDFA Portfolio panel featuring all the setting necessary to construct the automated trading MDFA environment.

  1. Portfolio – The portfolio is the basis for the MDFA trading platform and consists of two types of assets 1) The target asset from which we construct the trading signal, engineer the trades, and use in building the MDFA filter 2) The explanatory assets that provide the explanatory data for the target asset in the multivariate filter construction. Here, one can select up to four explanatory assets.
  2. Exchange – The exchange on which the assets are traded (according to IB).
  3. Asset Type – If the input portfolio is a selection of Stocks or Futures (Currencies and Options soon to be available).
  4. Expiration – If trading Futures, the expiration date of the contract, given as a six digit number of year then month (e.g. 201306 for June 2013).
  5. Shares/Contracts – The number of shares/contracts to trade (this number can also be changed throughout the trading day through the main panel).
  6. Observation frequency – In the MDFA financial trading method, we consider uniformly sampled observations of market data on which to do the trading (in seconds). The options are 1, 2, 3, 5, 15, 30, and 60 minute data. The default is 5 minutes.
  7. Data – For the intraday observations, determines the nature of data being extracted. Valid values include TRADES, MIDPOINT, BID, ASK, and BID_ASK. The default is MIDPOINT
  8. Historical Data – Selects how many days are used to for downloading the historical data to compute the initial MDFA filters. The historical data will of course come in intervals chosen in the observation frequency.

Once all the values have been set for the MDFA portfolio, click the Set and Build button which will first begin to check if the values entered are valid and if so, create the necessary data sets for TWS-iMetrica to initialize trading. This all must be done while TWS-iMetrica is connected to IB (not set in trading mode however). If the build was successful, the historical data of the desired target financial asset up to the most recent observation in regular market trading hours will be plotted on the graphics canvas. The historical data will be saved to a file named (by default) “lastSeriesData.dat” and the data will be come in columns, where the first column is the date/time of the observation, the second column is the price of the target asset, and remaining columns are log-returns of the target and explanatory data. And that’s it, the system is now setup to be used for financial trading. These values entered in the Setup MDFA Portfolio will never have to be set again (unless changes to the MDFA portfolio are needed of course).

Continuing on to the other controls and features of TWS-iMetrica, once the portfolio has been set, one can proceed to change any of the settings in main trading control panel. All these controls can be used/modified intraday while in automated MDFA trading mode. In the left most side of the panel at the main control panel (Figure 5) of the interface includes a set of options for the following features:

Figure 7 - The main control panel for choosing and/or modifying all the options during intraday trading.

Figure 5 – The main control panel for choosing and/or modifying all the options during intraday trading.

  1. In contracts/shares text field, one enters the amount of share (for stocks) or contracts (for futures)  that one will trade throughout the day. This can be adjusted during the day while the automated trading is activated, however, one must be certain that at the end of the day, the balance between bought and shorted contracts is zero, otherwise, you risk keeping contracts or shares overnight into the next trading day.Typically, this is set at the beginning before automated trading takes place and left alone.
  2. The data input file for loading historical data. The name of this file determines where the historical data associated with the MDFA portfolio constructed will be stored. This historical data will be needed in order to build the MDFA filters. By default this is “lastSeriesData.dat”. Usually this doesn’t need to be modified.
  3. The stop-loss activation and stop-loss slider bar, one can turn on/off the stop-loss and the stop-loss amount. This value determines how/where a stop-loss will be triggered relative to the price being bought/sold at and is completely dependent on the asset being traded.
  4. The interval search that determines how and when the trades will be made when the selected MDFA signal triggers a buy/sell transaction. If turned off, the transaction (a limit order determined by the bid/ask) will be made at the exact time that the buy/sell signal is triggered by the filter. If turned on, the value in the text field next to it gives how often (in seconds) the trade looks for a better price to make the transaction. This search runs until the next observation for the MDFA filter. For example, if 5 minute return data is being used to do the trading, the search looks every seconds for 5 minutes for a better price to make the given transaction. If at the end of the 5 minute period no better price has been found, the transaction is is made at the current ask/bid price. This feature has been shown to be quite useful during sideways or highly volatile markets.

The middle of the main control panel features the main buttons for both connecting to disconnecting from Interactive Brokers, initiating the MDFA automated trading environment, as well as convenient buttons used for instantaneous buy/sell triggers that supplement the automated system. It also features an on/off toggle button for activating the trades given in the MDFA automated trading environment. When checked on, transactions according to the automated MDFA environment will proceed and go through to the IB account. If turned off, the real-time market data feeds and historical data will continue to be read into the TWS-iMetrica system and the signals according to the filters will be automatically computed, but no actual transactions/trades into the IB account will be made.

Finally, on the right hand side of the main control panel features the filter uploading and selection boxes. These are the MDFA filters that are constructed using the MDFA module in iMetrica. One convenient and useful feature of TWS-iMetrica is the ability to utilize up to three direct real-time filters in parallel and to switch at any given moment during market hours between the filters. (Such a feature enhances the adaptability of the trading using MDFA filters. I’ll discuss more about this in further detail shortly).  In order to select up to three filters simultaneously, there is a filter selection panel (shown in bottom right corner of Figure 6 below) displaying three separate file choosers and a radio button corresponding to each filter. Clicking on the filter load button produces a file dialog box from which one selects a filter (a *.cft file produced by iMetrica). Once the filter is loaded properly, on success, the name of the filter is displayed in the text box to the right, and the radio button to the left is enabled. With multiple filters loaded, to select between any of them, simply click on their respective radio button and the corresponding signal will be plotted on the plot canvas (assuming market data has been loaded into the TWS-iMetrica using the market data file upload and/or has been connected to the IB TWS for live market data feeds). This is shown in Figure 6.

Figure 5 - The TWS-iMetrica main trading interface features many control options to design your own automated MDFA trading strategies.

Figure 6 – The TWS-iMetrica main trading interface features many control options to design your own automated MDFA trading strategies.

And finally, once the historical data file for the MDFA portfolio has been created, up to three filters have been created for the portfolio and entered in the filter selection boxes, and the system is connected to Interactive Brokers by pressing the Connect button, the market and signal plot panel can then be used for visualizing the different components that one will need for analyzing the market, signal, and performance of the automated trading environment. In the panel just below the plot canvas features and array of checkboxes and radiobuttons. When connected to IB and the Start MDFA Trading has been pressed, all the data and plots are updated in real-time automatically at the specific observation frequency selected in the MDFA Portfolio setup. The currently available plots are as follows:

Figure 8 - The plots for the trading interface. Features price, log-return, account cumulative returns, signal, buy/sell lines, and up to two additional  auxiliary signals.

Figure 8 – The plots for the trading interface. Features price, log-return, account cumulative returns, signal, buy/sell lines, and up to two additional auxiliary signals.

  • Price – Plots in real-time the price of the asset being traded, at the specific observation frequency selected for the MDFA portfolio.
  • Log-returns – Plots in real-time the log-returns of the price, which is the data that is being filtered to construct the trading signal.
  • Account – Shows the cumulative returns produced by the currently chosen MDFA filter over the current and historical data period (note that this does not necessary reflect the actual returns made by the strategy in IB, just the theoretical returns over time if this particular filter had been used).
  • Buy/Sell lines – Shows dashed lines where the MDFA trading signal has produced a buy/sell transaction. The green lines are the buy signals (entered a long position) and magenta lines are the sell (entered a short position).
  • Signal – The plot of the signal in real-time. When new data becomes available, the signal is automatically computed and replotted in real-time. This gives one the ability to closely monitory how the current filter is reacting to the incoming data.
  • Aux Signal 1/2 – (If available) Plots of the other available signals produced by the (up to two) other filters constructed and entered in the system. To make either of these auxillary signals the main trading signal simply select the filter associated with the signal using the radio buttons in the filter selection panel.

Along with these plots, to track specific values of any of these plots at anytime, select the desired plot in the Track Plot region of the panel bar. Once selected, specific values and their respective times/dates are displayed in the upper left corner of the plot panel by simply placing the mouse cursor over the plot panel. A small tracking ball will then be moved along the specific plot in accordance with movements by the mouse cursor.

With the graphics panel displaying the performance in real-time of each filter, one can seamlessly switch between a band-pass filter or a timely trend (low-pass) filter according to the changing intraday market conditions. To give an example, suppose at early morning trading hours there is an unusual high amount of volume pushing an uptrend or pulling a downtrend. In such conditions a trend filter is much more appropriate, being able to follow the large-variation in log-returns much better than a band-pass filter can. One can glean from the effects of the trend filter on the morning hours of the market. After automated trading using the trend filter, with the volume diffusing into the noon hour, the band-pass filter can then be applied in order to extract and trade at certain low frequency cycles in the log-return data. Towards the end of the day, with volume continuously picking up, the trend filter can then be selected again in order to track and trade any trending movement automatically.

I am in the process of currently building an automated algorithm to “intelligently” switch between the uploaded filters according to the instantaneous market conditions (with triggering of the switching being set by volume and volatility. Otherwise, for the time being, currently the user must manually switch between different filters, if such switching is at all desired (in most cases, I prefer to leave one filter running all day. Since the process is automated, I prefer to have minimal (if any) interaction with the software during the day while it’s in automated trading mode).

Conclusion

As I mentioned earlier, the main components of the TWS-iMetrica were written in a way to be adaptable to other brokerage/trading APIs. The only major condition is that the API either be available in Java, or at least have (possibly third-party?) wrappers available in Java. That being said, there are only three main types of general calls that are made automated to the connected broker 1) retrieve historical data for any asset(s), at any given time, at most commonly used observation frequencies (e.g. 1 min, 5 min, 10 min, etc.), 2) subscribe to automatic feed of bar/tick data to retrieve latest OHLC and bid/ask data, and finally 3) Place an order (buy/sell) to the broker with different any order conditions (limit, stop-loss, market order, etc.) for any given asset.

If you are interested in having TWS-iMetrica be built for your particular brokerage/trading platform (other than IB of course) and the above conditions for the API are met, I am more than happy to be hired at certain fixed compensation, simply get in contact with me. If you are interested seeing how well the automated system has performed thus far, interested in future collaboration, or becoming a client in order to use the TWS-iMetrica platform, feel free to contact me as well.

Happy extracting!

High-Frequency Financial Trading on Index Futures with MDFA and R: An Example with EURO STOXX50

Figure 1: Out-of-sample performance of the trading signal for the Euro Stoxx50 index futures with expiration March 18th (STXE H3)  during the period of 1-9-2013 and 2-1-2013, using 15 minute log-returns. The black dotted lines indicate a buy/long signal and the blue dotted lines indicate a sell/short (top).

Figure 1: In-sample and Out-of-sample performance (observations 240-457) of the trading signal for the Euro Stoxx50 index futures with expiration March 18th (STXE H3) during the period of 1-9-2013 and 2-1-2013, using 15 minute log-returns. The black dotted lines indicate a buy/long signal and the blue dotted lines indicate a sell/short (top).

In this second tutorial on building high-frequency financial trading signals using the multivariate direct filter approach in R, I focus on the first example of my previous article on signal engineering in high-frequency trading of financial index futures where I consider 15-minute log-returns of the Euro STOXX50 index futures with expiration on March 18th, 2013 (STXE H3).  As I mentioned in the introduction, I added a slightly new step in my approach to constructing the signals for intraday observations as I had been studying the problem of close-to-open variations in the frequency domain. With 15-minute log-return data, I look at the frequency structure related to the close-to-open variation in the price, namely when the price at close of market hours significantly differs from the price at open, an effect I’ve mentioned in my previous two articles dealing with intraday log-return data. I will show (this time in R) how MDFA can take advantage of this variation in price and profit from each one by ‘predicting’ with the extracted signal the jump or drop in the price at the open of the next trading day. Seems to good to be true, right? I demonstrate in this article how it’s possible.

The first step after looking at the log-price and the log-return data of the asset being traded is to construct the periodogram of the in-sample data being traded on.  In this example, I work with the same time frame I did with my previous R tutorial by considering the in-sample portion of my data to be from 1-4-2013 to 1-23-2013, with my out-of-sample data span being from 1-23-2013 to 2-1-2013, which will be used to analyze the true performance of the trading signal. The STXE data and the accompanying explanatory series of the EURO STOXX50 are first loaded into R and then the periodogram is computed as follows.


#load the log-return and log-price SXTE data in-sample
load(paste(path.pgm,&quot;stxe_insamp15min.RData&quot;,sep=&quot;&quot;))
load(paste(path.pgm,&quot;stxe_priceinsamp15min.RData&quot;,sep=&quot;&quot;))
#load the log-return and log-price SXTE data out-of-sample
load(paste(path.pgm,&quot;stxe_outsamp15min.RData&quot;,sep=&quot;&quot;))
load(paste(path.pgm,&quot;stxe_priceoutsamp15min.RData&quot;,sep=&quot;&quot;))

len_price&lt;-557
out_samp_len&lt;-210
in_samp_len&lt;-347

price_insample&lt;-stxeprice_insamp
price_outsample&lt;-stxeprice_outsamp

#some mdfa definitions
x&lt;-stxe_insamp
len&lt;-length(x[,1])

#my range for the 15-min close-to-open cycle
cutoff&lt;-.32
ub&lt;-.32
lb&lt;-.23

#------------ Compute DFTs ---------------------------
spec_obj&lt;-spec_comp(len,x,0)
weight_func&lt;-spec_obj$weight_func
stxe_periodogram&lt;-abs(spec_obj$weight_func[,1])^2
K&lt;-length(weight_func[,1])-1

#----------- compute Gamma ----------------------------
Gamma&lt;-((0:K)&lt;(K*ub/pi))&amp;((0:K)&gt;(K*lb/pi))

colo&lt;-rainbow(6)
xaxis&lt;-0:K*(pi/(K+1))
plot(xaxis, stxe_periodogram, main=&quot;Periodogram of STXE&quot;, xlab=&quot;Frequency&quot;, ylab=&quot;Periodogram&quot;,
xlim=c(0, 3.14), ylim=c(min(stxe_periodogram), max(stxe_periodogram)),col=colo[1],type=&quot;l&quot; )
abline(v=c(ub,lb),col=4,lty=3)

You’ll notice in the periodogram of the in-sample STXE log-returns that I’ve pinpointed a spectral peak between two blue dashed lines. This peak corresponds to an intrinsically important cycle in the 15-minute log-returns of index futures that gives access to predicting the close-to-open variation in the price. As you’ll see, the cycle flows fluidly through the 26 15-minute intervals during each trading day and will cross zero at (usually) one to two points during each trading day to signal whether to go long or go short on the index for the next day. I’ve deduced this optimal frequency range in a prior analysis of this data that I did using my target filter toolkit in iMetrica (see previous article). This frequency range will depend on the frequency of intraday observations, and can also depend on the index (but in my experiments, this range is typically consistent to be between .23 and .32 for most index futures using 15min observations). Thus in the R code above, I’ve defined a frequency cutoff at .32 and upper and lower bandpass cutoffs at .32 and .23, respectively.

Figure 2: Periodogram of the log-return STXE data. The spectral peak is extracted and highlighted between the two red dashed lines.

Figure 2: Periodogram of the log-return STXE data. The spectral peak is extracted and highlighted between the two red dashed lines.

In this first part of the tutorial, I extract this cycle responsible for marking the close-to-open variations and show how well it can perform. As I’ve mentioned in my previous articles on trading signal extraction, I like to begin with the mean-square solution (i.e. no customization or regularization) to the extraction problem to see exactly what kind of parameterization I might need. To produce the plain vanilla mean-square solution, I set all the parameters to 0.0 and then compute the filter by calling the main MDFA function (shown below). The function IMDFA returns an object with the filter coefficients and the in-sample signal. It also plots the concurrent transfer function for both of the filters along with the filter coefficients for increasing lag, shown in Figure 3.

L&lt;-86
lambda_smooth&lt;-0.0
lambda_cross&lt;-0.0
lambda_decay&lt;-c(0.00,0.0)
i1&lt;-F
i2&lt;-F
lambda&lt;-0
expweight&lt;-0
i_mdfa_obj&lt;-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)
Figure 3: Concurrent transfer functions for the STXE (red) and explanatory series (cyan) (top). Coefficients for the STXE and explanatory series.

Figure 3: Concurrent transfer functions for the STXE (red) and explanatory series (cyan) (top). Coefficients for the STXE and explanatory series (bottom).

Notice the noise leakage past the stopband in the concurrent filter and the roughness of both sets of filter coefficients (due to overfitting). We would like to smooth both of these out, along with allowing the filter coefficients to decay as the lag increases. This ensures more consistent in-sample and out-of-sample properties of the filter. I first apply some smoothing to the stopband by applying an expweight parameter of 16, and to compensate slightly for this improved smoothness, I improve the timeliness by setting the lambda parameter to 1. After noticing the improvement in the smoothness of filter coefficients, I then proceed with the regularization and conclude with the following parameters.

lambda_smooth&lt;-0.90
lambda_decay&lt;-c(0.08,0.11)
lambda&lt;-1
expweight&lt;-16
Figure 4: Transfer functions and coefficients after smoothing and regularization.

Figure 4: Transfer functions and coefficients after smoothing and regularization.

A vast improvement over the mean-squared solution. Virtually no noise leakage in the stopband passed \omega_1 =.32 and the coefficients decay beautifully with perfect smoothness achieved. Notice the two transfer functions perfectly picking out the spectral peak that is intrinsic to the close-to-open cycle that I mentioned was between .23 and .32. To verify these filter coefficients achieve the extraction of the close-to-open cycle, I compute the trading signal from the imdfa object and then plot it against the log-returns of STXE. I then compute the trades in-sample using the signal and the log-price of STXE. The R code is below and the plots are shown in Figures 5 and 6.

bn&lt;-i_mdfa_obj$i_mdfa$b
trading_signal&lt;-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]

plot(x[L:len,1],col=colo[1],type=&quot;l&quot;)
lines(trading_signal[L:len],col=colo[4])
trade&lt;-trading_logdiff(trading_signal[L:len],price_insample[L:len],0)
Figure : The in-sample signal and the log-returns of SXTE in 15 minute observations from 1-9-2013 to 1-23-2013

Figure 5: The in-sample signal and the log-returns of SXTE in 15 minute observations from 1-9-2013 to 1-23-2013

Figure 5 shows the log-return data and the trading signal extracted from the data. The spikes in the log-return data represent the close-to-open jumps in the STOXX Europe 50 index futures contract, occurring every 27 observations. But notice how regular the signal is, and how consistent this frequency range is found in the log-return data, almost like a perfect sinusoidal wave, with one complete cycle occurring nearly every 27 observations. This signal triggers trades that are shown in Figure 6, where the black dotted lines are buys/long and the blue dotted lines are sells/shorts. The signal is extremely consistent in finding the opportune times to buy and sell at the near optimal peaks, such as at observations 140, 197, and 240. It also ‘predicts’ the jump or fall of the EuroStoxx50 index future for the next trading day by triggering the necessary buy/sell signal, such as at observations 19, 40, 51, 99, 121, 156, and, 250.  The performance of this trading in-sample is shown in Figure 7.

Figure 6: The in-sample trades. Black dotted lines are buy/long and the blue dotted lines are sell/short.

Figure 6: The in-sample trades. Black dotted lines are buy/long and the blue dotted lines are sell/short.

Figure 7: The in-sample performance of the trading signal.

Figure 7: The in-sample performance of the trading signal.

Now for the real litmus test in the performance of this extracted signal, we need to apply the filter out-of-sample to check for consistency in not only performance, but also in trading characteristics. To do this in R, we bind the in-sample and out-of-sample data together and then apply the filter to the out-of-sample set (needing the final L-1 observations from the in-sample portion). The resulting signal in shown in Figure 8.

x_out&lt;-rbind(stxe_insamp,stxe_outsamp)
xff&lt;-matrix(nrow=out_samp_len,ncol=2)

  for(i in 1:out_samp_len)
  {
    xff[i,]&lt;-0
    for(j in 2:3)
    {
      xff[i,j-1]&lt;-xff[i,j-1]+bn[,j-1]%*%x_out[in_samp_len+i:(i-L+1),j]
    }
  }
  trading_signal_outsamp&lt;-xff[,1] + xff[,2]

plot(stxe_outsamp[,1],col=colo[1],type=&quot;l&quot;)
lines(trading_signal_outsamp,col=colo[4])

The signal and log-return data Notice that the signal performs consistently out-of-sample until right around observation 170 when the log-returns become increasingly volatile. The intrinsic cycle between frequencies .23 and .32 has been slowed down due to this increased volatility and might affect the trading performance.

Figure 9: Out-of-sample signal and log-return data of STXE

Figure 8: Signal produced out-of-sample on 210 observations and log-return data of STXE

The total in-sample plus out-of-sample trading performance is shown in Figure 9 and 10, with the final 210 points being out-of-sample.  The out-of-sample performance is very much akin to the in-sample performance we had, with a very clear systematic trading exposed by ‘predicting’ the next day close-to-open jump or fall in a consistent manner, by triggering the necessary buy/sell signal, such as at observations 310, 363, 383, and 413, with only one loss up until the final day trading.  The higher volatility during the final day of the in-sample period damages the cyclical signal and fails to trade systematically as it had been during the first 420 observations.

Figure 9: The total in-sample plus out-of-sample buys and sells.

Figure 9: The total in-sample plus out-of-sample buys and sells.

Figure 10: Total performance over in-sample and out-of-sample periods.

Figure 10: Total performance over in-sample and out-of-sample periods.

With this kind of performance both in-sample and out-of-sample, and the beautifully consistent yet methodological trading patterns this signal provides, it would seem like attempting to improve upon it would be a pointless task. Why attempt to fix what’s not “broken”. But being the perfectionist that I am, I strive for an even “smarter” filter. If only there was a way to 1) keep the consistent cyclical trading effects as before 2)  ‘predict’ the next day close-to-open jump/fall in the Euro Stoxx50 index future as before, and 3) avoid volatile periods to eliminate erroneous trading, where the signal performed worse. After hours spent in iMetrica, I figured how to do it. This is where advanced trading signal engineering comes into play.

The first step was to include all the lower frequencies below .23, which were not included in my previous trading signal. Due to the low amount of activity in these lower frequencies, this should only provide the effect or a ‘lift’ or a ‘push’ or the signal locally, while still retaining the cyclical component. So after changing my \Gamma to a low-pass filter with cutoff set at \omega = .32, I then computed the filter with the new lowpass design. The transfer functions for the filter coefficients are shown below in Figure 11, with the red colored plot the transfer function for the STXE. Notice that the transfer function for the explanatory series still privileges spectral peak between .23 and .32, with only a slight lift at frequency zero (compare this with the bandpass design in Figure 4, not much has changed).  The problem is that the peak exceeds 1.0 in the passband, and this will amplify the cyclical component extracted from the log-return. It might be okay, trading wise, but not what I’m looking to do.  For the STXE filter, we get slightly more of a lift at frequency zero, however this has been compensated with a decreased cycle extraction between frequencies .23 and .32.  Also, a slight amount of noise has entered in the stopband, another factor we must mollify.


#---- set Gamma to low-pass
cutoff&lt;-.32
Gamma&lt;-((0:K)&lt;(cutoff*K/pi))

#---- compute new filter ----------
i_mdfa_obj&lt;-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)
Figure 11: The concurrent transfer functions after changing to lowpass filter.

Figure 11: The concurrent transfer functions after changing to lowpass filter.

To improve the concurrent filter properties for both, I increase the smoothing expweight to 26, which will in turn affect the lambda_smooth, so I decrease it to .70. This gives me a much better transfer function pair, shown in Figure 12.  Notice the peak in the explanatory series transfer function is now much closer to 1.0, exactly what we want.

Figure 11: The concurrent transfer functions after changing to lowpass filter.

Figure 12: The concurrent transfer functions after changing to lowpass filter, increasing expweight to 26, and decreasing lambda_smooth to .70.

I’m still not satisfied with the lift at frequency zero for the STXE series. At roughly .5 at frequency zero, the filter might not provide enough push or pull that I need. The only way to ensure a guaranteed lift in the STXE log-return series is to employ constraints on the filter coefficients so that the transfer function is one at frequency zero. This can be achieved by setting i1 to true in the IMDFA function call, which effectively ensures that the sum of the filter coefficients at \omega = 0 is one. After doing this, I get the following transfer functions and the respective filter coefficients.

#---- Update the regularization parameters
lambda_smooth&lt;-0.68
lambda_cross&lt;-0.0
lambda_decay&lt;-c(0.083,0.11)

#---- update customization parameters
lambda&lt;-0
expweight&lt;-28

#---- set filter constraint -------
i1&lt;-T
weight_constraint[1]&lt;-1
Figure 13: Transfer function and filter coefficients after setting the coefficient constraint i1 to true.

Figure 13: Transfer function and filter coefficients after setting the coefficient constraint i1 to true.

Now this is exactly what I was looking for. Not only does the transfer function for the explanatory series keep the important close-to-open cycle intact, but I have also enforced the lift I need for the STXE series. The coefficients still remain smooth with a nice decaying property at the end.  With the new filter coefficients, I then applied them to the data both in-sample and out-of-sample, yielding the trading signal shown in Figure 14.  It posses exactly the properties that I was seeking. The close-to-open cyclical component is still being extracted (thanks in part to the explanatory series), and is still relatively consistent, although not as much as the pure bandpass design. The feature that I like is the following: When the log-return data diverges away from the cyclical component, with increasing volatility, the STXE filter reacts by pushing the signal down to avoid any erroneous trading. This can be seen in observations 100 through 120 and then at observations 390 through the end of trading. Figure 15 (same as Figure 1 at the top of the article) show the resulting trades and performance produced in-sample and out-of-sample by this signal. This is the art of meticulous signal engineering folks.

Figure 14: In-sample and out-of-sample signal produced from the low-pass with i1 coefficient constraints.

Figure 14: In-sample and out-of-sample signal produced from the low-pass with i1 coefficient constraints.

With only two losses suffered out-of-sample during the roughly 9 days trading, the filter performs much more methodologically than before. Notice during the final two days trading, when volatility picked up, the signal ceases to trade as it is being pushed down. It even continues to ‘predict’ the close-to-open jump/fall correctly, such as at observations 288, 321, and 391. The last trade made was a sell/short sell position, with the signal trending down at the end. The filter is in position to make a huge gain from this timely signaling of a short position at 391, correctly determining a large fall the next trading day, and then waiting out the volatile trading. The gain should be large no matter what happens.

Figure 15: In-sample and out-of-sample performance of the i1 constrained filter design.

Figure 15: In-sample and out-of-sample performance of the i1 constrained filter design.

One thing I mention before concluding is that I made a slight adjustment to my filter design after employing the i1 constraint to get the results shown in Figure 13-15. I’ll leave this as an exercise for the reader to deduce what I have done. Hint: Look at the freezed degrees of freedom before and after applying the i1 constraint. If you still have trouble finding what I’ve done, email me and I’ll give you further hints.

Conclusion

The overall performance of the first filter built, in regards to total return on investment out-of-sample, was superior to the second. However, this superior performance comes only with the assumption that the cycle component defined between frequencies .23 and .32 will continue to be present in future observations of STXE up until the expiration. If volatility increases and this intrinsic cycle ceases to exist in the log-return data, the performance will deteriorate.

For a better more comfortable approach that deals with changing volatile index conditions, I would opt for ensuring that the local-bias is present in the signal, This will effectively push or pull the signal down or up when the intrinsic cycle is weak in the increasing volatility, resulting in a pullback in trading activity.

As before, you can acquire the high-freq data used in this tutorial by requesting it via email.

Happy extracting!