High-Frequency Financial Trading on Index Futures with MDFA and R: An Example with EURO STOXX50


Figure 1: In-sample and out-of-sample performance (observations 240-457) of the trading signal for the Euro Stoxx50 index futures with expiration March 18th (STXE H3) during the period 1-9-2013 to 2-1-2013, using 15-minute log-returns. The black dotted lines indicate a buy/long signal and the blue dotted lines indicate a sell/short (top).

In this second tutorial on building high-frequency financial trading signals using the multivariate direct filter approach (MDFA) in R, I focus on the first example of my previous article on signal engineering in high-frequency trading of financial index futures, where I consider 15-minute log-returns of the Euro STOXX50 index futures with expiration on March 18th, 2013 (STXE H3). As I mentioned in the introduction, I added a slightly new step in my approach to constructing signals for intraday observations, as I had been studying the problem of close-to-open variations in the frequency domain. With 15-minute log-return data, I look at the frequency structure related to the close-to-open variation in the price, namely when the price at the close of market hours significantly differs from the price at the open, an effect I've mentioned in my previous two articles dealing with intraday log-return data. I will show (this time in R) how MDFA can take advantage of this variation in price and profit from each one by 'predicting' with the extracted signal the jump or drop in the price at the open of the next trading day. Seems too good to be true, right? I demonstrate in this article how it's possible.

The first step after looking at the log-price and the log-return data of the asset being traded is to construct the periodogram of the in-sample data. In this example, I work with the same time frame as in my previous R tutorial, taking the in-sample portion of my data from 1-4-2013 to 1-23-2013, with the out-of-sample span running from 1-23-2013 to 2-1-2013, which will be used to analyze the true performance of the trading signal. The STXE data and the accompanying explanatory series of the EURO STOXX50 are first loaded into R, and then the periodogram is computed as follows.


#load the log-return and log-price STXE data in-sample
load(paste(path.pgm,"stxe_insamp15min.RData",sep=""))
load(paste(path.pgm,"stxe_priceinsamp15min.RData",sep=""))
#load the log-return and log-price STXE data out-of-sample
load(paste(path.pgm,"stxe_outsamp15min.RData",sep=""))
load(paste(path.pgm,"stxe_priceoutsamp15min.RData",sep=""))

len_price<-557
out_samp_len<-210
in_samp_len<-347

price_insample<-stxeprice_insamp
price_outsample<-stxeprice_outsamp

#some mdfa definitions
x<-stxe_insamp
len<-length(x[,1])

#my range for the 15-min close-to-open cycle
cutoff<-.32
ub<-.32
lb<-.23

#------------ Compute DFTs ---------------------------
spec_obj<-spec_comp(len,x,0)
weight_func<-spec_obj$weight_func
stxe_periodogram<-abs(spec_obj$weight_func[,1])^2
K<-length(weight_func[,1])-1

#----------- compute Gamma ----------------------------
Gamma<-((0:K)<(K*ub/pi))&((0:K)>(K*lb/pi))

colo<-rainbow(6)
xaxis<-0:K*(pi/(K+1))
plot(xaxis, stxe_periodogram, main="Periodogram of STXE", xlab="Frequency", ylab="Periodogram",
xlim=c(0, 3.14), ylim=c(min(stxe_periodogram), max(stxe_periodogram)),col=colo[1],type="l" )
abline(v=c(ub,lb),col=4,lty=3)

You'll notice in the periodogram of the in-sample STXE log-returns that I've pinpointed a spectral peak between two blue dashed lines. This peak corresponds to an intrinsically important cycle in the 15-minute log-returns of index futures that gives access to predicting the close-to-open variation in the price. As you'll see, the cycle flows fluidly through the 26 15-minute intervals during each trading day and will cross zero at (usually) one to two points during each trading day to signal whether to go long or go short on the index for the next day. I deduced this optimal frequency range in a prior analysis of this data using my target filter toolkit in iMetrica (see previous article). This frequency range depends on the frequency of intraday observations, and can also depend on the index (though in my experiments this range has been fairly consistent, falling between .23 and .32 for most index futures using 15-minute observations). Thus in the R code above, I've defined a frequency cutoff at .32 and upper and lower bandpass cutoffs at .32 and .23, respectively.

Figure 2: Periodogram of the log-return STXE data. The spectral peak is extracted and highlighted between the two blue dashed lines.


In this first part of the tutorial, I extract this cycle responsible for marking the close-to-open variations and show how well it can perform. As I've mentioned in my previous articles on trading signal extraction, I like to begin with the mean-square solution (i.e. no customization or regularization) to the extraction problem to see exactly what kind of parameterization I might need. To produce the plain vanilla mean-square solution, I set all the parameters to 0.0 and then compute the filter by calling the main MDFA function (shown below). The function IMDFA returns an object with the filter coefficients and the in-sample signal. It also plots the concurrent transfer functions for both filters along with the filter coefficients for increasing lag, shown in Figure 3.

L<-86
lambda_smooth<-0.0
lambda_cross<-0.0
lambda_decay<-c(0.00,0.0)
i1<-F
i2<-F
lambda<-0
expweight<-0
i_mdfa_obj<-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)

Figure 3: Concurrent transfer functions for the STXE (red) and explanatory series (cyan) (top). Coefficients for the STXE and explanatory series (bottom).

Notice the noise leakage into the stopband of the concurrent filter and the roughness of both sets of filter coefficients (due to overfitting). We would like to smooth both of these out, along with allowing the filter coefficients to decay as the lag increases. This ensures more consistent in-sample and out-of-sample properties of the filter. I first apply some smoothing to the stopband with an expweight parameter of 16, and to compensate slightly for this improved smoothness, I improve the timeliness by setting the lambda parameter to 1. After noticing the improvement in the smoothness of the filter coefficients, I then proceed with the regularization and conclude with the following parameters.

lambda_smooth<-0.90
lambda_decay<-c(0.08,0.11)
lambda<-1
expweight<-16
Figure 4: Transfer functions and coefficients after smoothing and regularization.


A vast improvement over the mean-square solution. There is virtually no noise leakage into the stopband past \omega_1 = .32, and the coefficients decay beautifully, with perfect smoothness achieved. Notice the two transfer functions perfectly picking out the spectral peak intrinsic to the close-to-open cycle that I mentioned lies between .23 and .32. To verify that these filter coefficients achieve the extraction of the close-to-open cycle, I compute the trading signal from the i_mdfa object and plot it against the log-returns of STXE. I then compute the trades in-sample using the signal and the log-price of STXE. The R code is below and the plots are shown in Figures 5 and 6.

bn<-i_mdfa_obj$i_mdfa$b
trading_signal<-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]

plot(x[L:len,1],col=colo[1],type="l")
lines(trading_signal[L:len],col=colo[4])
trade<-trading_logdiff(trading_signal[L:len],price_insample[L:len],0)

Figure 5: The in-sample signal and the log-returns of STXE in 15-minute observations from 1-9-2013 to 1-23-2013

Figure 5 shows the log-return data and the trading signal extracted from the data. The spikes in the log-return data represent the close-to-open jumps in the EURO STOXX 50 index futures contract, occurring every 27 observations. But notice how regular the signal is, and how consistently this frequency range is found in the log-return data, almost like a perfect sinusoidal wave, with one complete cycle occurring nearly every 27 observations. This signal triggers the trades shown in Figure 6, where the black dotted lines are buys/longs and the blue dotted lines are sells/shorts. The signal is extremely consistent in finding the opportune times to buy and sell at near-optimal peaks, such as at observations 140, 197, and 240. It also 'predicts' the jump or fall of the Euro Stoxx50 index futures for the next trading day by triggering the necessary buy/sell signal, such as at observations 19, 40, 51, 99, 121, 156, and 250. The performance of this trading in-sample is shown in Figure 7.
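The trading_logdiff routine used above isn't reproduced here, but the idea behind a sign-based rule like the one described — go long while the signal is positive, short while it's negative, accumulating the log-price differences earned by each position — can be sketched in a few lines. The function and variable names below are my own illustration, not the MDFA package implementation:

```r
# Minimal sketch of a sign-based trading rule on a signal and a log-price series.
# Hypothetical stand-in for trading_logdiff(); transaction costs ignored.
simple_trading_logdiff <- function(signal, logprice) {
  n <- length(signal)
  position <- sign(signal[-n])          # +1 long, -1 short, held one period
  gains <- position * diff(logprice)    # log-return earned each period
  cumsum(gains)                         # cumulative log performance
}

# Toy example: a sinusoidal 'signal' applied to a log-price that follows the cycle
sig   <- sin(2*pi*(1:100)/27)
price <- cumsum(0.001*sig)
perf  <- simple_trading_logdiff(sig, price)
```

Because the toy price moves with the cycle the signal tracks, the cumulative performance `perf` trends upward; on real data the quality of the extracted signal determines whether that happens.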

Figure 6: The in-sample trades. Black dotted lines are buy/long and the blue dotted lines are sell/short.


Figure 7: The in-sample performance of the trading signal.


Now for the real litmus test of the performance of this extracted signal: we need to apply the filter out-of-sample to check for consistency not only in performance, but also in trading characteristics. To do this in R, we bind the in-sample and out-of-sample data together and then apply the filter to the out-of-sample set (needing the final L-1 observations from the in-sample portion). The resulting signal is shown in Figure 8.

x_out<-rbind(stxe_insamp,stxe_outsamp)
xff<-matrix(nrow=out_samp_len,ncol=2)

  for(i in 1:out_samp_len)
  {
    xff[i,]<-0
    for(j in 2:3)
    {
      xff[i,j-1]<-xff[i,j-1]+bn[,j-1]%*%x_out[in_samp_len+(i:(i-L+1)),j]
    }
  }
  trading_signal_outsamp<-xff[,1] + xff[,2]

plot(stxe_outsamp[,1],col=colo[1],type="l")
lines(trading_signal_outsamp,col=colo[4])

Notice that the signal performs consistently out-of-sample until right around observation 170, when the log-returns become increasingly volatile. The intrinsic cycle between frequencies .23 and .32 has been slowed by this increased volatility, which might affect the trading performance.


Figure 8: Signal produced out-of-sample on 210 observations and log-return data of STXE

The total in-sample plus out-of-sample trading performance is shown in Figures 9 and 10, with the final 210 points being out-of-sample. The out-of-sample performance is very much akin to the in-sample performance, with a clear systematic pattern exposed by 'predicting' the next-day close-to-open jump or fall in a consistent manner, triggering the necessary buy/sell signal at observations such as 310, 363, 383, and 413, with only one loss up until the final day of trading. The higher volatility during the final days of the out-of-sample period damages the cyclical signal, and the filter fails to trade as systematically as it had during the first 420 observations.

Figure 9: The total in-sample plus out-of-sample buys and sells.


Figure 10: Total performance over in-sample and out-of-sample periods.


With this kind of performance both in-sample and out-of-sample, and the beautifully consistent yet methodical trading patterns this signal provides, it would seem that attempting to improve upon it would be a pointless task. Why attempt to fix what's not "broken"? But being the perfectionist that I am, I strive for an even "smarter" filter. If only there were a way to 1) keep the consistent cyclical trading effects as before, 2) 'predict' the next-day close-to-open jump/fall in the Euro Stoxx50 index futures as before, and 3) avoid volatile periods, where the signal performed worse, to eliminate erroneous trading. After hours spent in iMetrica, I figured out how to do it. This is where advanced trading signal engineering comes into play.

The first step was to include all the lower frequencies below .23, which were not included in my previous trading signal. Due to the low amount of activity in these lower frequencies, this should only provide the effect of a 'lift' or a 'push' of the signal locally, while still retaining the cyclical component. So after changing my \Gamma to a low-pass filter with cutoff set at \omega = .32, I then computed the filter with the new lowpass design. The transfer functions for the filter coefficients are shown below in Figure 11, with the red-colored plot the transfer function for the STXE. Notice that the transfer function for the explanatory series still privileges the spectral peak between .23 and .32, with only a slight lift at frequency zero (compare this with the bandpass design in Figure 4; not much has changed). The problem is that the peak exceeds 1.0 in the passband, and this will amplify the cyclical component extracted from the log-returns. It might be okay trading-wise, but not what I'm looking to do. For the STXE filter, we get slightly more of a lift at frequency zero; however, this has been compensated by decreased cycle extraction between frequencies .23 and .32. Also, a slight amount of noise has entered the stopband, another factor we must mollify.


#---- set Gamma to low-pass
cutoff<-.32
Gamma<-((0:K)<(cutoff*K/pi))

#---- compute new filter ----------
i_mdfa_obj<-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)
Figure 11: The concurrent transfer functions after changing to lowpass filter.


To improve the concurrent filter properties for both, I increase the smoothing expweight to 26, which will in turn affect the lambda_smooth, so I decrease it to .70. This gives me a much better transfer function pair, shown in Figure 12.  Notice the peak in the explanatory series transfer function is now much closer to 1.0, exactly what we want.


Figure 12: The concurrent transfer functions after changing to lowpass filter, increasing expweight to 26, and decreasing lambda_smooth to .70.

I'm still not satisfied with the lift at frequency zero for the STXE series. At roughly .5 at frequency zero, the filter might not provide enough push or pull. The only way to ensure a guaranteed lift in the STXE log-return series is to employ constraints on the filter coefficients so that the transfer function is one at frequency zero. This can be achieved by setting i1 to true in the IMDFA function call, which effectively constrains the sum of the filter coefficients to one (the value of the transfer function at \omega = 0). After doing this, I get the following transfer functions and the respective filter coefficients.

#---- Update the regularization parameters
lambda_smooth<-0.68
lambda_cross<-0.0
lambda_decay<-c(0.083,0.11)

#---- update customization parameters
lambda<-0
expweight<-28

#---- set filter constraint -------
i1<-T
weight_constraint<-rep(0,length(x[1,])-1) #constraint vector, one entry per explanatory series
weight_constraint[1]<-1
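As a quick sanity check on what the i1 constraint accomplishes (my own illustration, not part of the tutorial code): the transfer function of a filter at frequency zero is simply the sum of its coefficients, so constraining that value to one amounts to requiring the coefficients of the constrained series to sum to one. With an arbitrary (hypothetical) coefficient vector, this is easy to verify directly:

```r
# Transfer function of coefficients b at frequency omega:
#   sum_l b[l+1] * exp(-1i*omega*l);  at omega = 0 this reduces to sum(b).
transfer_at <- function(b, omega) sum(b * exp(-1i*omega*(0:(length(b)-1))))

# A hypothetical decaying, oscillating coefficient vector of length L = 86,
# renormalized so that it satisfies the i1-style constraint sum(b) = 1:
b <- exp(-0.1*(0:85)) * cos(0.27*(0:85))
b <- b / sum(b)

Mod(transfer_at(b, 0))   # equals 1 by construction
```

The actual MDFA routine enforces this algebraically during estimation rather than by renormalizing afterward, but the end condition on the coefficients is the same.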
Figure 13: Transfer function and filter coefficients after setting the coefficient constraint i1 to true.


Now this is exactly what I was looking for. Not only does the transfer function for the explanatory series keep the important close-to-open cycle intact, but I have also enforced the lift I need for the STXE series. The coefficients still remain smooth, with a nice decaying property at the end. With the new filter coefficients, I then applied them to the data both in-sample and out-of-sample, yielding the trading signal shown in Figure 14. It possesses exactly the properties that I was seeking. The close-to-open cyclical component is still being extracted (thanks in part to the explanatory series), and is still relatively consistent, although not as much as in the pure bandpass design. The feature that I like is the following: when the log-return data diverges away from the cyclical component with increasing volatility, the STXE filter reacts by pushing the signal down to avoid any erroneous trading. This can be seen at observations 100 through 120 and then from observation 390 through the end of trading. Figure 15 (same as Figure 1 at the top of the article) shows the resulting trades and performance produced in-sample and out-of-sample by this signal. This is the art of meticulous signal engineering, folks.

Figure 14: In-sample and out-of-sample signal produced from the low-pass with i1 coefficient constraints.


With only two losses suffered out-of-sample during roughly nine days of trading, the filter performs much more methodically than before. Notice that during the final two days of trading, when volatility picked up, the signal ceases to trade as it is pushed down. It even continues to 'predict' the close-to-open jump/fall correctly, such as at observations 288, 321, and 391. The last trade made was a short position, with the signal trending down at the end. The filter is positioned to make a large gain from this timely signaling of a short at 391, correctly determining a large fall the next trading day, and then waiting out the volatile trading. The gain should be large no matter what happens.

Figure 15: In-sample and out-of-sample performance of the i1 constrained filter design.


One thing I'll mention before concluding is that I made a slight adjustment to my filter design after employing the i1 constraint to get the results shown in Figures 13-15. I'll leave it as an exercise for the reader to deduce what I have done. Hint: look at the freezed degrees of freedom before and after applying the i1 constraint. If you still have trouble finding what I've done, email me and I'll give you further hints.

Conclusion

The overall performance of the first filter built, in regards to total return on investment out-of-sample, was superior to the second. However, this superior performance comes only with the assumption that the cycle component defined between frequencies .23 and .32 will continue to be present in future observations of STXE up until the expiration. If volatility increases and this intrinsic cycle ceases to exist in the log-return data, the performance will deteriorate.
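One simple way to monitor whether that cycle is still alive in incoming data (my own sketch, not part of the MDFA toolkit) is to track the share of periodogram mass falling inside the band [.23, .32]; when that share collapses, the premise of the bandpass filter is gone:

```r
# Fraction of (one-sided) periodogram mass in the frequency band [lb, ub].
band_power_share <- function(x, lb = 0.23, ub = 0.32) {
  n <- length(x)
  per   <- Mod(fft(x - mean(x)))^2 / n        # raw periodogram
  freqs <- 2*pi*(0:(n-1))/n                   # frequencies in radians
  keep  <- freqs <= pi                        # one-sided half
  inband <- keep & freqs >= lb & freqs <= ub
  sum(per[inband]) / sum(per[keep])
}

# A series dominated by a cycle near .27 scores high; white noise scores low.
t <- 1:500
cyclic <- sin(0.27*t)
set.seed(42)
noise  <- rnorm(500)
band_power_share(cyclic) > band_power_share(noise)   # TRUE
```

Computed on a rolling window of recent log-returns, a statistic like this could serve as a crude regime check before trusting the bandpass signal.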

For a more robust approach that deals with changing, volatile index conditions, I would opt for ensuring that the local bias is present in the signal. This will effectively push or pull the signal down or up when the intrinsic cycle is weak in the increasing volatility, resulting in a pullback in trading activity.

As before, you can acquire the high-freq data used in this tutorial by requesting it via email.

Happy extracting!

High-Frequency Financial Trading on FOREX with MDFA and R: An Example with the Japanese Yen


Figure 1: In-sample (observations 1-250) and out-of-sample performance of the trading signal built in this tutorial using MDFA. (Top) The log price of the Yen (FXY) in 15 minute intervals and the trades generated by the trading signal. Here black line is a buy (long), blue is sell (short position). (Bottom) The returns accumulated (cash) generated by the trading, in percentage gained or lost.

In my previous article on high-frequency trading in iMetrica on the FOREX/GLOBEX, I introduced some robust signal extraction strategies in iMetrica using the multivariate direct filter approach (MDFA) to generate high-performance signals for trading on the foreign exchange and futures markets. In this article I take a brief leave-of-absence from my world of developing financial trading signals in iMetrica and migrate into R, an uber-popular language used in finance due to its exuberant array of packages, quick data management and graphics handling, and of course the fact that it's free (as in speech and beer) on nearly any computing platform in the world.

This article gives an intro tutorial on using R for high-frequency trading on the FOREX market using the R package for MDFA (offered by Herr Doktor Marc Wildi von Bern) and some strategies that I've developed for generating financially robust trading signals. For this tutorial, I consider the second example given in my previous article, where I engineered a trading signal for 15-minute log-returns of the Japanese Yen (from opening bell to market close EST). This presented slightly different challenges than before, as the close-to-open jump variations are much larger than those generated by hourly or daily returns. But as I demonstrated, these larger variations in the close-to-open price posed no problems for the MDFA. In fact, it exploited these jumps and made large profits by predicting the direction of the jump. Figure 1 at the top of this article shows the in-sample (observations 1-250) and out-of-sample (observations 251 onward) performance of the filter I will be building in the first part of this tutorial.

Throughout this tutorial, I attempt to replicate the results that I built in iMetrica and expand on them a bit using the R language and the implementation of the MDFA available here. The data that we consider are 15-minute log-returns of the Yen from January 4th to January 17th, saved as an .RData file given by ld_fxy_insamp. I have an additional explanatory series embedded in the .RData file that I'm using to predict the price of the Yen. I will also be using price_fxy_insamp, which is the log price of the Yen, used to compute the performance (buys/sells) of the trading signal. The ld_fxy_insamp will be used as the in-sample data to construct the filter and trading signal for FXY. To obtain this data so you can perform these examples at home, email me and I'll send you all the necessary .RData files (the in-sample and out-of-sample data) in a .zip file. Taking a quick glance at the ld_fxy_insamp data, we see log-returns of the Yen at every 15 minutes starting at market open (time zone UTC). The target data (Yen) is in the first column, along with the two explanatory series (the Yen itself and another asset co-integrated with the movement of the Yen).

> head(ld_fxy_insamp)
[,1]           [,2]          [,3]
2013-01-04 13:30:00  0.000000e+00   0.000000e+00  0.0000000000
2013-01-04 13:45:00  4.763412e-03   4.763412e-03  0.0033465833
2013-01-04 14:00:00 -8.966599e-05  -8.966599e-05  0.0040635638
2013-01-04 14:15:00  2.597055e-03   2.597055e-03 -0.0008322064
2013-01-04 14:30:00 -7.157556e-04  -7.157556e-04  0.0020792190
2013-01-04 14:45:00 -4.476075e-04  -4.476075e-04 -0.0014685198

Moving on, to begin constructing the first trading signal for the Yen, we begin by uploading the data into our R environment, define some initial parameters for the MDFA function call, and then compute the DFTs and periodogram for the Yen.

load(paste(path.pgm,"ld_fxy_in15min.RData",sep=""))    #load in-sample log-returns of Yen
load(paste(path.pgm,"price_fxy_in15min.RData",sep="")) #load in-sample log-price of Yen

in_samp_len<-335
price_insample<-price_fxy_insamp

#setup some MDFA variables
x<-ld_fxy_insamp
len<-length(x[,1])
shift_constraint<-rep(0,length(x[1,])-1)
weight_constraint<-rep(0,length(x[1,])-1)
d<-0
plots<-T
lin_expweight<-F

# Compute DFTs and periodogram for initial analysis
spec_obj<-spec_comp(len,x,d)
weight_func<-spec_obj$weight_func
K<-length(weight_func[,1])-1
fxy_periodogram<-abs(spec_obj$weight_func[,1])^2

As I've mentioned in my previous articles, my step-by-step strategy for building trading signals always begins with a quick analysis of the periodogram of the asset being traded. Holding the key to providing insight into the characteristics of how the asset trades, the periodogram is an essential tool for navigating how the extractor \Gamma is chosen. Here, I look for principal spectral peaks that correspond in the time domain to how and where my signal will trigger buy/sell trades. Figure 2 shows the periodogram of the 15-minute log-returns of the Japanese Yen during the in-sample period from January 4 to January 17, 2013. The arrows point to the main spectral peaks that I look for and provide a guide to how I will define my \Gamma function. The black dotted lines indicate the two frequency cutoffs that I will consider in this example, the first being \pi/12 and the second at \pi/6. Notice that both cutoffs are set directly after a spectral peak, something that I highly recommend. In high-frequency trading on the FOREX using MDFA, as we'll see, the trick is to seek out the spectral peak which accounts for the close-to-open variation in the price of the foreign currency. We want to take advantage of this spectral peak, as this is where the big gains in foreign-currency trading using MDFA will occur.

Figure 2: Periodogram of FXY (Japanese Yen) along with spectral peaks and two different frequency cutoffs.


In our first example we consider the larger frequency as the cutoff for \Gamma by setting it to \pi/6 (the rightmost line in the figure of the periodogram). I then initially set the timeliness and smoothness parameters, lambda and expweight, to 0, along with setting all the regularization parameters to 0 as well. This will give me a barometer for where and how much to adjust the filter parameters. In selecting the filter length L, my empirical studies over numerous experiments in building trading signals using iMetrica have demonstrated that a 'good' choice is anywhere between 1/4 and 1/5 of the total in-sample length of the time series data. Of course, the length depends on the frequency of the data observations (i.e. 15-minute, hourly, daily, etc.), but in general you will most likely never need L greater than 1/4 the in-sample size. Otherwise, regularization can become too cumbersome to handle effectively. In this example, the total in-sample length is 335, and thus I set L = 82, which I'll stick to for the remainder of this tutorial. In any case, the length of the filter is not the most crucial parameter to consider in building good trading signals. For a good robust selection of the filter parameters coupled with appropriate explanatory series, the results of the trading signal with L = 80 compared with, say, L = 85 should hardly differ. If they do, then the parameterization is not robust enough.
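This rule of thumb is easy to encode; a tiny helper (my own naming, not a package function) that picks L as roughly a quarter of the in-sample length reproduces the choice used here:

```r
# Heuristic from the text: choose L between 1/5 and 1/4 of the in-sample size.
# Not a package function -- just a convenience wrapper around the rule of thumb.
choose_L <- function(n_insample, frac = 0.245) {
  stopifnot(frac >= 1/5, frac <= 1/4)
  round(frac * n_insample)
}

choose_L(335)   # 82, the value used in this tutorial
```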

After uploading both the in-sample log-return data and the corresponding log price of the Yen for computing the trading performance, we then proceed in R to set the initial filter parameters for the MDFA routine and compute the filter using the IMDFA_comp function. This returns an i_mdfa object holding the coefficients, frequency response functions, and statistics of the filter, along with the signal produced for each explanatory series. We combine these signals to get the final trading signal in-sample. This is all done in R as follows:


cutoff<-pi/6 #set frequency cutoff
Gamma<-((0:K)<(cutoff*K/pi)) #define Gamma

grand_mean<-F
Lag<-0
L<-82
lambda_smooth<-0
lambda_cross<-0
lambda_decay<-c(0.,0.) #regularization - decay

lambda<-0
expweight<-0
i1<-F
i2<-F
# compute the filter for the given parameter definitions
i_mdfa_obj<-IMDFA_comp(Lag,K,L,lambda,weight_func,Gamma,expweight,cutoff,i1,i2,weight_constraint,
lambda_cross,lambda_decay,lambda_smooth,x,plots,lin_expweight,shift_constraint,grand_mean)

# after computing filter, we save coefficients
bn<-i_mdfa_obj$i_mdfa$b

# now we build trading signal
trading_signal<-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]

The resulting frequency response functions of the filter and the coefficients are plotted in the figure below.


Figure 3: The Frequency response functions of the filter (top) and the filter coefficients (below)

Notice the abundance of noise still present past the cutoff frequency. This is mollified by increasing the expweight smoothness parameter. The coefficients for each explanatory series show some correlation in their movement as the lags increase. However, the smoothness and decay of the coefficients leave much to be desired. We will remedy this by introducing regularization parameters. Plots of the in-sample trading signal and its performance are shown in the two figures below. Notice that the trading signal behaves quite nicely in-sample. However, looks can be deceiving. This stellar performance is due in large part to a filtering phenomenon called overfitting. One can deduce that overfitting is the culprit here by simply looking at the non-smoothness of the coefficients along with the number of freezed degrees of freedom, which in this example is roughly 174 (out of 174), way too high. We would like to get this number to around half the total number of degrees of freedom (number of explanatory series x L).
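The roughness of the coefficient paths can also be quantified directly; a simple diagnostic (my own sketch, unrelated to the freezed-degrees-of-freedom statistic reported by MDFA) is the sum of squared second differences of the coefficients, which should drop sharply once regularization is applied:

```r
# Roughness of a coefficient vector: sum of squared second differences.
# A smooth, regularized filter should score far lower than an overfitted one.
roughness <- function(b) sum(diff(b, differences = 2)^2)

set.seed(1)
b_smooth <- exp(-0.05*(0:81))                        # smoothly decaying coefficients
b_noisy  <- b_smooth + rnorm(82, sd = 0.1)           # same shape, jagged from overfitting

roughness(b_noisy) > roughness(b_smooth)   # TRUE
```

Comparing this score for `bn` before and after setting lambda_smooth and lambda_decay gives a quick numerical confirmation of what the coefficient plots show visually.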

Figure 4: The trading signal and the log-return data of the Yen.


The in-sample performance of this filter demonstrates the type of results we would like to see after regularization is applied. But now come the sobering effects of overfitting. We apply these filter coefficients to 200 15-minute observations of the Yen and the explanatory series from January 18 to February 1, 2013 and compare with the characteristics in-sample. To do this in R, we first load the out-of-sample data into the R environment, and then apply the filter to the out-of-sample data that I defined as x_out.

load(paste(path.pgm,"ld_fxy_out15min.RData",sep=""))
load(paste(path.pgm,"price_fxy_out15min.RData",sep=""))
x_out<-rbind(ld_fxy_insamp,ld_fxy_outsamp) #bind the in-sample with out-of-sample data
out_samp_len<-200                          #number of out-of-sample observations
xff<-matrix(nrow=out_samp_len,ncol=2)

#apply filter built in-sample
for(i in 1:out_samp_len)
{
  xff[i,]<-0
  for(j in 2:3)
  {
      xff[i,j-1]<-xff[i,j-1]+bn[,j-1]%*%x_out[335+(i:(i-L+1)),j]
  }
}
trading_signal_outsamp<-xff[,1] + xff[,2]     #assemble the trading signal out-of-sample
price_outsample<-price_fxy_outsamp            #out-of-sample log-price (object name assumed from the .RData file)
trade_outsamp<-trading_logdiff(trading_signal_outsamp,price_outsample,.0005)  #compute the performance

The plot in Figure 5 shows the out-of-sample trading signal. Notice that the signal is not nearly as smooth as it was in-sample. Overshooting of the data in some areas is also obviously present. Although the out-of-sample overfitting characteristics of the signal are not horribly suspicious, I would not trust this filter to produce stellar returns in the long run.


Figure 5 : Filter applied to 200 15 minute observations of Yen out-of-sample to produce trading signal (shown in blue)

Following the previous analysis of the mean-squared solution (no customization or regularization), we now proceed to clean up the overfitting that was apparent in the coefficients, while also mollifying the noise in the stopband (frequencies past \pi/6). In choosing the parameters for smoothing and regularization, one approach is to apply the smoothness parameter first, as this generally smooths the coefficients while acting as a 'pre'-regularizer, and then advance to selecting appropriate regularization controls. Looking at the coefficients (Figure 3), we can see that a fair amount of smoothing is necessary, with only a slight touch of decay. To select these parameters in R, one option is to use the Troikaner optimizer (found here) to find a suitable combination (I have a secret-sauce algorithmic approach I developed for iMetrica for choosing optimal combinations of parameters given an extractor \Gamma and a performance indicator, but it's lengthy (even in GNU C) and cumbersome to use, so I typically prefer the strategy discussed in this tutorial). In this example, I began by setting lambda_smooth to .5 and the decay to (.1,.1), along with an expweight smoothness parameter of 8.5. After viewing the coefficients, that still wasn't enough smoothness, so I proceeded to add more, finally reaching .63, which did the trick. I then chose lambda to balance the effects of the smoothing expweight (lambda is always the last-resort tweaking parameter).

lambda_smooth<-0.63           #smoothness regularization on the coefficients
lambda_cross<-0.              #no cross-series regularization
lambda_decay<-c(0.119,0.099)  #slight decay toward zero at the larger lags
lambda<-9                     #timeliness customization (affects the passband)
expweight<-8.5                #smoothness customization (suppresses stopband noise)

i_mdfa_obj<-IMDFA_comp(Lag,K,L,lambda,weight_func,Gamma,expweight,cutoff,i1,i2,weight_constraint,
lambda_cross,lambda_decay,lambda_smooth,x,plots,lin_expweight,shift_constraint,grand_mean)

bn<-i_mdfa_obj$i_mdfa$b    #save the filter coefficients

trading_signal<-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]  #compute the trading signal
trade<-trading_logdiff(trading_signal[L:len],price_insample[L:len],0) #compute the in-sample performance

Figure 6 shows the resulting frequency response functions for both explanatory series (Yen in red). Notice that the largest spectral peak, found directly before the frequency cutoff at \pi/6, is emphasized and only slightly mollified (value near .8 instead of 1.0). The other spectral peaks below \pi/6 are also present. For the coefficients, just enough smoothing and decay was applied to keep the lag, cyclical, and correlated structure of the coefficients intact, but they now look much nicer in their smoothed form. The number of freezed degrees of freedom has been reduced to approximately 102.
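For readers who want to inspect a frequency response function directly from a column of coefficients (such as bn[,1]), it can be computed by evaluating the transfer function's amplitude on a frequency grid. This is a generic sketch, not a function from the MDFA package; the moving-average coefficients at the end are just a stand-in.

```r
# Sketch: amplitude of the frequency response implied by filter coefficients b
# of length L, i.e. |sum_j b_j * exp(-i*j*omega)| on a grid over [0, pi].
freq_response <- function(b, n_freq = 600) {
  omega <- seq(0, pi, length.out = n_freq)
  lags  <- 0:(length(b) - 1)
  sapply(omega, function(w) abs(sum(b * exp(-1i * w * lags))))
}

# Example with a simple length-12 moving average (stand-in for bn[,1]):
amp <- freq_response(rep(1/12, 12))
amp[1]   # amplitude at frequency zero is (approximately) 1
```

Plotting amp against the grid reproduces the kind of amplitude curves shown in Figure 6.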

Figure 6: The frequency response functions after regularization and smoothing have been applied (top). The smoothed coefficients with slight decay at the end (bottom). Number of freezed degrees of freedom is approximately 102 (out of 172).

With an improved count of freezed degrees of freedom and no apparent havoc from overfitting, we apply this filter to the 200 out-of-sample observations in order to verify the improvement in the structure of the filter coefficients (shown below in Figure 7). Notice the tremendous improvement in the properties of the trading signal (compared with Figure 5). The overshooting of the data has been eliminated and the overall smoothness of the signal has significantly improved, because we've eradicated the presence of overfitting.

Figure 7: Out-of-sample trading signal with regularization.

With all indications of a filter endowed with exactly the characteristics we need for robustness, we now apply the trading signal both in-sample and out-of-sample to activate the buy/sell trades and see the performance of the trading account in cash value. When the signal crosses below zero, we sell (enter a short position), and when the signal rises above zero, we buy (enter a long position).
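The zero-crossing rule above can be sketched in a few lines of R. This is an illustrative helper of my own, not package code: it simply maps the signal to a position of +1 (long) when above zero and -1 (short) otherwise.

```r
# Sketch of the zero-crossing rule: long (+1) while the signal is above zero,
# short (-1) while it is at or below zero. A trade fires wherever the
# position flips sign.
signal_to_position <- function(sig) ifelse(sig > 0, 1, -1)

sig <- c(-0.2, -0.1, 0.3, 0.5, -0.4)
signal_to_position(sig)   # -1 -1 1 1 -1  (a sell-hold-buy-hold-sell pattern)
```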

The top plot of Figure 8 shows the log-price of the Yen at 15-minute intervals, and the dotted lines mark exactly where the trading signal generated trades (crossed zero). The black dotted lines represent a buy (long position) and the blue lines indicate a sell (short position). Notice that the signal predicted all the close-to-open jumps for the Yen (thanks in part to the explanatory series). This is exactly what we were striving for when we added regularization and customization to the filter. The cash account of the trades over the in-sample period is shown below, with transaction costs set at .05 percent. In-sample, the signal earned roughly 6 percent over 9 trading days with a 76 percent trade success ratio.
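To make the cash-account computation concrete, here is a minimal sketch of what a function like trading_logdiff might compute. This is a hypothetical re-implementation for illustration only (the actual package code may differ, e.g. in how it aligns positions with returns): the cumulative log-return of the strategy, charging a proportional cost at entry and at every position flip.

```r
# Hypothetical sketch of a trading P&L with proportional transaction costs.
# sig: trading signal; log_ret: log-returns of the traded asset; cost: per-trade cost.
strategy_pnl <- function(sig, log_ret, cost = 0.0005) {
  pos    <- ifelse(sig > 0, 1, -1)         # long/short from zero crossings
  trades <- c(1, abs(diff(pos)) > 0)       # cost at entry and at each flip
  cumsum(pos * log_ret - trades * cost)    # cumulative strategy log-return
}

strategy_pnl(sig = c(1, 1, -1), log_ret = c(0.01, -0.02, 0.01))
# 0.0095 -0.0105 -0.0210
```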

Figure 8: In-sample performance of the new filter and the trades that are generated.

Now for the ultimate test of how well the filter performs in producing a winning trading signal, we apply the filter to the 200 15-minute out-of-sample observations of the Yen and the explanatory series from January 18th to February 1st and make trades based on the zero crossings. The results are shown below in Figure 9. The black lines represent the buys and the blue lines the sells (shorts). Notice the filter is still able to predict the close-to-open jumps, even out-of-sample, thanks to the regularization. The filter succumbs to only three tiny losses of less than .08 percent each between observations 160 and 180, and one small loss at the beginning, with an out-of-sample trade success ratio of 82 percent and an ROI of just over 4 percent over the 9-day interval.

Figure 9: Out-of-sample performance of the regularized filter on 200 out-of-sample 15-minute returns of the Yen. The filter achieved 4 percent ROI over the 200 observations and an 82 percent trade success ratio.

Compare this with the results achieved in iMetrica using the same MDFA parameter settings. In Figure 10, both the in-sample and out-of-sample performance are shown. The performance is nearly identical.

Figure 10: In-sample and out-of-sample performance of the Yen filter in iMetrica. Nearly identical with the performance obtained in R.

Example 2

Now we take a stab at producing another trading filter for the Yen, only this time we wish to identify only the lowest frequencies, generating a trading signal that trades less often and seeks only the largest cycles. As with the previous filter, we still wish to target the frequencies that might be responsible for the large close-to-open variations in the price of the Yen. To do this, we select our cutoff to be \pi/12, which will effectively keep the largest three spectral peaks intact in the low-pass band of \Gamma.
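Constructing such an ideal low-pass target is straightforward: on a grid of K+1 frequency ordinates \omega_k = k\pi/K, the target \Gamma is one in the passband and zero in the stopband. The grid resolution K below is an assumed value for illustration, not the one used in the filter runs above.

```r
# Sketch: ideal low-pass target Gamma on K+1 ordinates omega_k = k*pi/K,
# with the cutoff lowered to pi/12. K is an assumed illustration value.
K      <- 300
cutoff <- pi / 12
omega  <- (0:K) * pi / K
Gamma  <- as.numeric(omega <= cutoff + 1e-9)  # tolerance keeps the boundary ordinate in

sum(Gamma)   # 26 passband ordinates (k = 0, ..., 25)
```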

For this new filter, we keep things simple by reusing the regularization parameters chosen for the previous filter, as they produced good results out-of-sample. The \lambda and expweight customization parameters, however, need to be adjusted to account for the new noise-suppression requirements in the stopband and the phase properties in the smaller passband. Thus I increased the smoothing parameter and decreased the timeliness parameter (which only affects the passband) to account for this change. The new frequency response functions and filter coefficients for this smaller low-pass design are shown below in Figure 11. Notice that the second spectral peak is accounted for and only slightly mollified under the new changes. The coefficients still have the noticeable smoothness and decay at the largest lags.

Figure 11: Frequency response functions of the two filters and their corresponding coefficients.

To test the effectiveness of this new lower-trading-frequency design, we apply the filter coefficients to the 200 out-of-sample observations of the 15-minute Yen log-returns. The performance is shown below in Figure 12. We clearly see that this filter still succeeds in correctly predicting the large close-to-open jumps in the price of the Yen. Only three total losses are observed during the 9-day period. The overall performance is not as appealing as the previous filter design since fewer trades are made, with a near 2 percent ROI and a 76 percent trade success ratio. However, this design could fit the priorities of a trader much more sensitive to transaction costs.
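The trade success ratio quoted throughout is simply the fraction of trades that closed with a positive return. A sketch (with a hypothetical helper name; the package reports this figure as part of its trade output):

```r
# Sketch: trade success ratio = fraction of per-trade returns that are positive.
success_ratio <- function(trade_returns) mean(trade_returns > 0)

# Example with four hypothetical per-trade log-returns (three wins, one loss):
success_ratio(c(0.004, -0.0008, 0.002, 0.001))   # 0.75
```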

Figure 12: Out-of-sample performance of the filter with the lower cutoff.

Conclusion

Verification and cross-validation are important, just as the most interesting man in the world will tell you.

The point of this tutorial was to show some of the main concepts and strategies I use when approaching the problem of building a robust and highly efficient trading signal for any given asset at any frequency. I also wanted to see whether I could achieve results with the R MDFA package similar to those from my iMetrica software package; the results turned out to be nearly identical, apart from some minor differences. The main points I was attempting to highlight were, first, analyzing the periodogram to seek out the important spectral peaks (such as those associated with close-to-open variations), and second, demonstrating how the choice of the cutoff affects the systematic trading. Here's a quick recap of good strategies and hacks to keep in mind.

Summary of strategies for building trading signal using MDFA in R:

  • As I mentioned before, the periodogram is your best friend. Apply the cutoff directly after any range of spectral peaks that you want to consider. These peaks are what generate the trades.
  • Utilize a filter length L no greater than 1/4 of the in-sample length. Anything larger is unnecessary.
  • Begin by computing the filter in the mean-square sense, namely without using any customization or regularization, and see exactly what needs to be improved upon by viewing the frequency response functions and coefficients for each explanatory series. Good performance of the trading signal in-sample (and even out-of-sample in most cases) is meaningless unless the coefficients have solid, robust characteristics in both the frequency domain and the lag domain.
  • I recommend beginning by tweaking the smoothness customization parameter expweight and the lambda_smooth regularization parameter first. Then proceed with only slight adjustments to the lambda_decay parameters. Finally, as a last resort, the lambda customization parameter. I rarely bother with lambda_cross; it has seldom helped in any significant manner. Since the data we are using to target and build trading signals are log-returns, there is no need to ever bother with i1 and i2. Those are for the truly advanced and patient signal extractors, and should be left to those endowed with iMetrica 😉
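Since the first bullet names the periodogram as the starting point, here is a self-contained sketch of computing one in base R. The scaling is one common convention (squared modulus of the DFT over 2*pi*n); the simulated series is a stand-in for the 15-minute log-returns.

```r
# Sketch: periodogram of a (demeaned) return series via the FFT, keeping
# the ordinates on [0, pi]. One common scaling convention is used.
periodogram <- function(x) {
  n <- length(x)
  I <- Mod(fft(x - mean(x)))^2 / (2 * pi * n)
  k <- 0:(n %/% 2)
  list(freq = 2 * pi * k / n, spec = I[k + 1])
}

set.seed(42)
pg <- periodogram(rnorm(512))   # stand-in for in-sample 15-minute log-returns
length(pg$freq)                  # 257 ordinates from frequency 0 up to pi
```

Plotting pg$spec against pg$freq reveals the spectral peaks; the cutoff of \Gamma is then placed directly after the range of peaks one wants to trade on.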

If you have any questions, or would like the high-frequency Yen data I used in these examples, feel free to contact me and I’ll send them to you. Until next time, happy extracting!