# High-Frequency Financial Trading on Index Futures with MDFA and R: An Example with EURO STOXX50

Figure 1: In-sample and Out-of-sample performance (observations 240-457) of the trading signal for the Euro Stoxx50 index futures with expiration March 18th (STXE H3) during the period of 1-9-2013 and 2-1-2013, using 15 minute log-returns. The black dotted lines indicate a buy/long signal and the blue dotted lines indicate a sell/short (top).

In this second tutorial on building high-frequency financial trading signals using the multivariate direct filter approach in R, I focus on the first example of my previous article on signal engineering in high-frequency trading of financial index futures where I consider 15-minute log-returns of the Euro STOXX50 index futures with expiration on March 18th, 2013 (STXE H3).  As I mentioned in the introduction, I added a slightly new step in my approach to constructing the signals for intraday observations as I had been studying the problem of close-to-open variations in the frequency domain. With 15-minute log-return data, I look at the frequency structure related to the close-to-open variation in the price, namely when the price at close of market hours significantly differs from the price at open, an effect I’ve mentioned in my previous two articles dealing with intraday log-return data. I will show (this time in R) how MDFA can take advantage of this variation in price and profit from each one by ‘predicting’ with the extracted signal the jump or drop in the price at the open of the next trading day. Seems to good to be true, right? I demonstrate in this article how it’s possible.

The first step after looking at the log-price and the log-return data of the asset being traded is to construct the periodogram of the in-sample data being traded on.  In this example, I work with the same time frame I did with my previous R tutorial by considering the in-sample portion of my data to be from 1-4-2013 to 1-23-2013, with my out-of-sample data span being from 1-23-2013 to 2-1-2013, which will be used to analyze the true performance of the trading signal. The STXE data and the accompanying explanatory series of the EURO STOXX50 are first loaded into R and then the periodogram is computed as follows.


#load the log-return and log-price SXTE data in-sample
#load the log-return and log-price SXTE data out-of-sample

len_price&lt;-557
out_samp_len&lt;-210
in_samp_len&lt;-347

price_insample&lt;-stxeprice_insamp
price_outsample&lt;-stxeprice_outsamp

#some mdfa definitions
x&lt;-stxe_insamp
len&lt;-length(x[,1])

#my range for the 15-min close-to-open cycle
cutoff&lt;-.32
ub&lt;-.32
lb&lt;-.23

#------------ Compute DFTs ---------------------------
spec_obj&lt;-spec_comp(len,x,0)
weight_func&lt;-spec_obj$weight_func stxe_periodogram&lt;-abs(spec_obj$weight_func[,1])^2
K&lt;-length(weight_func[,1])-1

#----------- compute Gamma ----------------------------
Gamma&lt;-((0:K)&lt;(K*ub/pi))&amp;((0:K)&gt;(K*lb/pi))

colo&lt;-rainbow(6)
xaxis&lt;-0:K*(pi/(K+1))
plot(xaxis, stxe_periodogram, main=&quot;Periodogram of STXE&quot;, xlab=&quot;Frequency&quot;, ylab=&quot;Periodogram&quot;,
xlim=c(0, 3.14), ylim=c(min(stxe_periodogram), max(stxe_periodogram)),col=colo[1],type=&quot;l&quot; )
abline(v=c(ub,lb),col=4,lty=3)


You’ll notice in the periodogram of the in-sample STXE log-returns that I’ve pinpointed a spectral peak between two blue dashed lines. This peak corresponds to an intrinsically important cycle in the 15-minute log-returns of index futures that gives access to predicting the close-to-open variation in the price. As you’ll see, the cycle flows fluidly through the 26 15-minute intervals during each trading day and will cross zero at (usually) one to two points during each trading day to signal whether to go long or go short on the index for the next day. I’ve deduced this optimal frequency range in a prior analysis of this data that I did using my target filter toolkit in iMetrica (see previous article). This frequency range will depend on the frequency of intraday observations, and can also depend on the index (but in my experiments, this range is typically consistent to be between .23 and .32 for most index futures using 15min observations). Thus in the R code above, I’ve defined a frequency cutoff at .32 and upper and lower bandpass cutoffs at .32 and .23, respectively.

Figure 2: Periodogram of the log-return STXE data. The spectral peak is extracted and highlighted between the two red dashed lines.

In this first part of the tutorial, I extract this cycle responsible for marking the close-to-open variations and show how well it can perform. As I’ve mentioned in my previous articles on trading signal extraction, I like to begin with the mean-square solution (i.e. no customization or regularization) to the extraction problem to see exactly what kind of parameterization I might need. To produce the plain vanilla mean-square solution, I set all the parameters to 0.0 and then compute the filter by calling the main MDFA function (shown below). The function IMDFA returns an object with the filter coefficients and the in-sample signal. It also plots the concurrent transfer function for both of the filters along with the filter coefficients for increasing lag, shown in Figure 3.

L&lt;-86
lambda_smooth&lt;-0.0
lambda_cross&lt;-0.0
lambda_decay&lt;-c(0.00,0.0)
i1&lt;-F
i2&lt;-F
lambda&lt;-0
expweight&lt;-0
i_mdfa_obj&lt;-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)


Figure 3: Concurrent transfer functions for the STXE (red) and explanatory series (cyan) (top). Coefficients for the STXE and explanatory series (bottom).

Notice the noise leakage past the stopband in the concurrent filter and the roughness of both sets of filter coefficients (due to overfitting). We would like to smooth both of these out, along with allowing the filter coefficients to decay as the lag increases. This ensures more consistent in-sample and out-of-sample properties of the filter. I first apply some smoothing to the stopband by applying an expweight parameter of 16, and to compensate slightly for this improved smoothness, I improve the timeliness by setting the lambda parameter to 1. After noticing the improvement in the smoothness of filter coefficients, I then proceed with the regularization and conclude with the following parameters.

lambda_smooth&lt;-0.90
lambda_decay&lt;-c(0.08,0.11)
lambda&lt;-1
expweight&lt;-16


Figure 4: Transfer functions and coefficients after smoothing and regularization.

A vast improvement over the mean-squared solution. Virtually no noise leakage in the stopband passed $\omega_1 =.32$ and the coefficients decay beautifully with perfect smoothness achieved. Notice the two transfer functions perfectly picking out the spectral peak that is intrinsic to the close-to-open cycle that I mentioned was between .23 and .32. To verify these filter coefficients achieve the extraction of the close-to-open cycle, I compute the trading signal from the imdfa object and then plot it against the log-returns of STXE. I then compute the trades in-sample using the signal and the log-price of STXE. The R code is below and the plots are shown in Figures 5 and 6.

bn&lt;-i_mdfa_obj$i_mdfa$b
trading_signal&lt;-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]

plot(x[L:len,1],col=colo[1],type=&quot;l&quot;)


Figure 5: The in-sample signal and the log-returns of SXTE in 15 minute observations from 1-9-2013 to 1-23-2013

Figure 5 shows the log-return data and the trading signal extracted from the data. The spikes in the log-return data represent the close-to-open jumps in the STOXX Europe 50 index futures contract, occurring every 27 observations. But notice how regular the signal is, and how consistent this frequency range is found in the log-return data, almost like a perfect sinusoidal wave, with one complete cycle occurring nearly every 27 observations. This signal triggers trades that are shown in Figure 6, where the black dotted lines are buys/long and the blue dotted lines are sells/shorts. The signal is extremely consistent in finding the opportune times to buy and sell at the near optimal peaks, such as at observations 140, 197, and 240. It also ‘predicts’ the jump or fall of the EuroStoxx50 index future for the next trading day by triggering the necessary buy/sell signal, such as at observations 19, 40, 51, 99, 121, 156, and, 250.  The performance of this trading in-sample is shown in Figure 7.

Figure 6: The in-sample trades. Black dotted lines are buy/long and the blue dotted lines are sell/short.

Figure 7: The in-sample performance of the trading signal.

Now for the real litmus test in the performance of this extracted signal, we need to apply the filter out-of-sample to check for consistency in not only performance, but also in trading characteristics. To do this in R, we bind the in-sample and out-of-sample data together and then apply the filter to the out-of-sample set (needing the final L-1 observations from the in-sample portion). The resulting signal in shown in Figure 8.

x_out&lt;-rbind(stxe_insamp,stxe_outsamp)
xff&lt;-matrix(nrow=out_samp_len,ncol=2)

for(i in 1:out_samp_len)
{
xff[i,]&lt;-0
for(j in 2:3)
{
xff[i,j-1]&lt;-xff[i,j-1]+bn[,j-1]%*%x_out[in_samp_len+i:(i-L+1),j]
}
}

plot(stxe_outsamp[,1],col=colo[1],type=&quot;l&quot;)


The signal and log-return data Notice that the signal performs consistently out-of-sample until right around observation 170 when the log-returns become increasingly volatile. The intrinsic cycle between frequencies .23 and .32 has been slowed down due to this increased volatility and might affect the trading performance.

Figure 8: Signal produced out-of-sample on 210 observations and log-return data of STXE

The total in-sample plus out-of-sample trading performance is shown in Figure 9 and 10, with the final 210 points being out-of-sample.  The out-of-sample performance is very much akin to the in-sample performance we had, with a very clear systematic trading exposed by ‘predicting’ the next day close-to-open jump or fall in a consistent manner, by triggering the necessary buy/sell signal, such as at observations 310, 363, 383, and 413, with only one loss up until the final day trading.  The higher volatility during the final day of the in-sample period damages the cyclical signal and fails to trade systematically as it had been during the first 420 observations.

Figure 9: The total in-sample plus out-of-sample buys and sells.

Figure 10: Total performance over in-sample and out-of-sample periods.

With this kind of performance both in-sample and out-of-sample, and the beautifully consistent yet methodological trading patterns this signal provides, it would seem like attempting to improve upon it would be a pointless task. Why attempt to fix what’s not “broken”. But being the perfectionist that I am, I strive for an even “smarter” filter. If only there was a way to 1) keep the consistent cyclical trading effects as before 2)  ‘predict’ the next day close-to-open jump/fall in the Euro Stoxx50 index future as before, and 3) avoid volatile periods to eliminate erroneous trading, where the signal performed worse. After hours spent in iMetrica, I figured how to do it. This is where advanced trading signal engineering comes into play.

The first step was to include all the lower frequencies below .23, which were not included in my previous trading signal. Due to the low amount of activity in these lower frequencies, this should only provide the effect or a ‘lift’ or a ‘push’ or the signal locally, while still retaining the cyclical component. So after changing my $\Gamma$ to a low-pass filter with cutoff set at $\omega = .32$, I then computed the filter with the new lowpass design. The transfer functions for the filter coefficients are shown below in Figure 11, with the red colored plot the transfer function for the STXE. Notice that the transfer function for the explanatory series still privileges spectral peak between .23 and .32, with only a slight lift at frequency zero (compare this with the bandpass design in Figure 4, not much has changed).  The problem is that the peak exceeds 1.0 in the passband, and this will amplify the cyclical component extracted from the log-return. It might be okay, trading wise, but not what I’m looking to do.  For the STXE filter, we get slightly more of a lift at frequency zero, however this has been compensated with a decreased cycle extraction between frequencies .23 and .32.  Also, a slight amount of noise has entered in the stopband, another factor we must mollify.


#---- set Gamma to low-pass
cutoff&lt;-.32
Gamma&lt;-((0:K)&lt;(cutoff*K/pi))

#---- compute new filter ----------
i_mdfa_obj&lt;-IMDFA(L,i1,i2,cutoff,lambda,expweight,lambda_cross,lambda_decay,lambda_smooth,weight_func,Gamma,x)


Figure 11: The concurrent transfer functions after changing to lowpass filter.

To improve the concurrent filter properties for both, I increase the smoothing expweight to 26, which will in turn affect the lambda_smooth, so I decrease it to .70. This gives me a much better transfer function pair, shown in Figure 12.  Notice the peak in the explanatory series transfer function is now much closer to 1.0, exactly what we want.

Figure 12: The concurrent transfer functions after changing to lowpass filter, increasing expweight to 26, and decreasing lambda_smooth to .70.

I’m still not satisfied with the lift at frequency zero for the STXE series. At roughly .5 at frequency zero, the filter might not provide enough push or pull that I need. The only way to ensure a guaranteed lift in the STXE log-return series is to employ constraints on the filter coefficients so that the transfer function is one at frequency zero. This can be achieved by setting i1 to true in the IMDFA function call, which effectively ensures that the sum of the filter coefficients at $\omega = 0$ is one. After doing this, I get the following transfer functions and the respective filter coefficients.

#---- Update the regularization parameters
lambda_smooth&lt;-0.68
lambda_cross&lt;-0.0
lambda_decay&lt;-c(0.083,0.11)

#---- update customization parameters
lambda&lt;-0
expweight&lt;-28

#---- set filter constraint -------
i1&lt;-T
weight_constraint[1]&lt;-1


Figure 13: Transfer function and filter coefficients after setting the coefficient constraint i1 to true.

Now this is exactly what I was looking for. Not only does the transfer function for the explanatory series keep the important close-to-open cycle intact, but I have also enforced the lift I need for the STXE series. The coefficients still remain smooth with a nice decaying property at the end.  With the new filter coefficients, I then applied them to the data both in-sample and out-of-sample, yielding the trading signal shown in Figure 14.  It posses exactly the properties that I was seeking. The close-to-open cyclical component is still being extracted (thanks in part to the explanatory series), and is still relatively consistent, although not as much as the pure bandpass design. The feature that I like is the following: When the log-return data diverges away from the cyclical component, with increasing volatility, the STXE filter reacts by pushing the signal down to avoid any erroneous trading. This can be seen in observations 100 through 120 and then at observations 390 through the end of trading. Figure 15 (same as Figure 1 at the top of the article) show the resulting trades and performance produced in-sample and out-of-sample by this signal. This is the art of meticulous signal engineering folks.

Figure 14: In-sample and out-of-sample signal produced from the low-pass with i1 coefficient constraints.

With only two losses suffered out-of-sample during the roughly 9 days trading, the filter performs much more methodologically than before. Notice during the final two days trading, when volatility picked up, the signal ceases to trade as it is being pushed down. It even continues to ‘predict’ the close-to-open jump/fall correctly, such as at observations 288, 321, and 391. The last trade made was a sell/short sell position, with the signal trending down at the end. The filter is in position to make a huge gain from this timely signaling of a short position at 391, correctly determining a large fall the next trading day, and then waiting out the volatile trading. The gain should be large no matter what happens.

Figure 15: In-sample and out-of-sample performance of the i1 constrained filter design.

One thing I mention before concluding is that I made a slight adjustment to my filter design after employing the i1 constraint to get the results shown in Figure 13-15. I’ll leave this as an exercise for the reader to deduce what I have done. Hint: Look at the freezed degrees of freedom before and after applying the i1 constraint. If you still have trouble finding what I’ve done, email me and I’ll give you further hints.

### Conclusion

The overall performance of the first filter built, in regards to total return on investment out-of-sample, was superior to the second. However, this superior performance comes only with the assumption that the cycle component defined between frequencies .23 and .32 will continue to be present in future observations of STXE up until the expiration. If volatility increases and this intrinsic cycle ceases to exist in the log-return data, the performance will deteriorate.

For a better more comfortable approach that deals with changing volatile index conditions, I would opt for ensuring that the local-bias is present in the signal, This will effectively push or pull the signal down or up when the intrinsic cycle is weak in the increasing volatility, resulting in a pullback in trading activity.

As before, you can acquire the high-freq data used in this tutorial by requesting it via email.

Happy extracting!

# High-Frequency Financial Trading with Multivariate Direct Filtering Part Deux: Index Futures

Out-of-sample performance (cash, blue-to-pink line) of an MDFA low-pass filter built using the approach discussed in this article on the STOXX Europe 50 index futures with expiration March 18th (STXEH3) for 200 15-minute interval log-return observations. Only one small loss out-of-sample was recorded from the period of Jan 18th through February 1st, 2013.

The frequency of observations on the index that are to be considered for building trading filters using MDFA is typically only a question of taste and priorities.  The beauty of MDFA lies in not only the versatility and strength in building trading signals for virtually any financial trading priorities, but also in the independence on the underlying observation frequency of the data. In my previous articles, I’ve considered and built high-performing strategies for daily, hourly, and 15 minute log returns, where the focus of the strategy in building the signal began with viewing the periodogram as the main barometer in searching for optimal frequencies on which one should set the low-pass cutoff for the extracting target filter $\Gamma$ function.

Index futures, as in a futures contract on a financial index, as we will see in this article present no new challenges. With the success I had on the 15-minute return observation frequency that I utilized in my previous article in building a signal for the Japanese Yen, I will continue to use the 15 minute intervals for the index futures where I hope to shed some more light on the filter selection process. This includes deducing properties of the intrinsically optimal spectral peaks to trade on. To do this, I present a simple approach I take in these examples by first setting up a bandpass filter over the spectral peak in the periodogram and then study the in-sample and out-of-sample characteristics of this signal, both in performance and consistency. So without further ado, I present my experiments with financial trading on index futures using MDFA, in iMetrica.

### STOXX Europe 50 Index (STXE H3, Expiration March 18 2013)

The STOXX Europe 50 Index, Europe’s leading Blue-chip index, provides a representation of sector leaders in Europe. The index covers 50 stocks from 18 European countries and has the highest trading volume of any European index. One of the first things to notice with the 15-minute log-returns of STXE are the frequent large spikes. These spikes will occur every 27 observations at 13:30 (UTC time zone) due to the fact that there are 26 15-minute periods during the trading hours. These spikes represent the close-to-open jumps that the STOXX Europe 50 index has been subjected to and then reflected in the price of the futures contract. With this ‘seasonal’ pattern so obviously present in the log-return data, the frequency effects of this pattern should be clearly visible in the periodogram of the data. The beauty of MDFA (and iMetrica) is that we have the ability to explicitly engineer a trading signal to take advantage of this ‘seasonal’ pattern by building an appropriate extractor $\Gamma$.

Figure 1: Log-returns of STXE for the 15-min observations from 1-4-2013 to 2-1-2013

Regarding the periodogram of the data, Figure 2 depicts the periodograms for the 15 minute log-returns of STXE (red) and the explanatory series (pink) together on the same discretized frequency domain. Notice that in both log-return series, there is a principal spectral peak found between .23 and .32.   The trick is to localize the spectral peak that accounts for the cyclical pattern that is brought about by the close-to-open variation between 20:00 and 13:30 UTC.

Figure 2: The periodograms for the 15 minute log-returns of STXE (red) and the explanatory series (pink).

In order to see the effects of the MDFA filter when localizing this spectral peak, I use my target $\Gamma$ builder interface in iMetrica to set the necessary cutoffs for the bandpass filter directly covering both spectral peaks of the log-returns, which are found between .23 and .32. This is shown in Figure 3, where the two dashed red lines show indicate the cutoffs and spectral peak is clearly inside these two cutoffs, with the spectral peak for both series occurring in the vicinity of $\pi/12$.  Once the bandpass target $\Gamma$ was fixed on this small frequency range, I set the regularization parameters for the filter coefficients to be $\lambda_{smooth} = .86$,   $\lambda_{decay} = .11$,  and $\lambda_{decay2} = .11$.

Figure 3: Choosing the cutoffs for the band pass filter to localize the spectral peak.

Pinpointing this frequency range that zooms in on the largest spectral peak generates a filter that acts on the intrinsic cycles found in the 15 minute log-returns of the STXE futures index. The resulting trading signal produced by this spectral peak extraction is shown in Figure 4, with the returns (blue to pink line) generated from the trading signal (green) , and the price of the STXE futures index in gray. The cyclical effects in the signal include the close-to-open variations in the data. Notice how the signal predicts the variation of the close-to-open price in the index quite well, seen from the large jumps or falls in price every 27 observations. The performance of this simple design in extracting the spectral peak of STXE yields a 4 percent ROI on 200 observations out-of-sample with only 3 losses out of 20 total trades (85 percent trade success rate), with two of them being accounted for towards the very end of the out-of-sample observations in an uncharacteristic volatile period occurring on January 31st 2013.

Figure 4: The performance in-sample and out-of-sample of the spectral peak localizing bandpass filter.

The two concurrent frequency response (transfer) functions for the two filters acting on the STXE log-return data (purple) and the explanatory series (blue), respectively, are plotted below in Figure 5. Notice the presence of the spectral peaks for both series being accounted for in the vicinity of the frequency $\pi/12$, with mild damping at the peak. Slow damping of the noise in the higher frequencies is aided by the addition of a smoothing expweight parameter that was set at $\alpha = 4$.

Figure 5: The concurrent frequency response functions of the localizing spectral peak band-pass filter.

With the ideal characteristics of a trading signal quite present in this simple bandpass filter, namely smooth decaying filter coefficients, in-sample and out-of-sample performance properties identical, and accurate, consistent trading patterns, it would be hard to imagine on improving the trading signal for this European futures index even more. But we can. We simply keep the spectral peak frequencies intact, but also account for the local bias in log-return data by extending the lower cutoff to frequency zero. This will provide improved systematic trading characteristics by not only predicting the close-to-open variation and jumps, but also handling upswings and downswings, and highly volatile periods much better.

In this new design, I created a low-pass filter by keeping the upper cutoff $\omega_1$ from the band-pass design and setting the lower cutoff to 0. I also increased the smoothing parameter to $\alpha = 32$. In this newly designed filter, we see a vast improvement in the trading structure. As before, the filter was able to deduce the direction of every single close-to-open jump during the 200 out-of-sample observations, but notice that it was also able to become much more flexible in the trading during any upswing/downswing and volatile period. This is seen in more detail in Figure 7, where I added the letter ‘D’ to each of the 5 major buy/sell signals occurring before close.

Figure 6: Performance of filter both in-sample (left of cyan line) and on 210 observations out-of-sample (right of cyan line).

Notice that the signal predicted the jump correctly for each of these major jumps, resulting in large returns. For example, at the first “D” indicator,  the signal indicated sell/short (magenta dashed line) the STXE future index 5 observations before close, namely at 18:45 UTC, before market close at 20:00 UTC. Sure enough, the price of the STXE contract went down during overnight trading hours and opened far below the previous days close, with the filter signaling a buy (green dashed line) 45 minutes into trading. At the mark of the second “D”, we see that on the final observation before market close, the signal produced a buy/long indication, and indeed, the next day the price of the future jumped significantly.

Figure 7: Zooming in on the out-of-sample performance and showing all the signal responses that predicted the major close-to-open jumps.

Only two very small losses of less than .08 percent were accounted for.  One advantage of including the frequency zero along with the spectral peak frequency of STXE is that the local bias can help push-up or pull-down the signal resulting in a more ‘patient’ or ‘diligent’ filter that can better handle long upswings/downswings or volatile periods. This is seen in the improvement of the performance towards the end of the 200 observations out-of-sample, where the filter is more patient in signaling a sell/short after the previous buy. Compare this with the end of the performance from the band-pass filter, Figure 4.  With this trading signal out-of-sample, I computed a 5 percent ROI on the 200 observations out-of-sample with only 2 small losses. The trading statistics for the entire in-sample combined with out-of-sample are shown in Figure 8.

Figure 8: The total performance statistics of the STXEH3 trading signal in-sample plus out-of-sample. The max drop indicates -0 due to the fact that there was a truncation to two decimal places. Thus the losses were less than .01.

### S&P 500 Futures Index (ES H3, Expiration March 18 2013)

In this experiment trading S&P 500 future contracts (E-mini) on observations of 15 minute intervals from Jan 4th to Feb 1st 2013, I apply the same regimental approach as before.  In looking at the log-returns of ESH3 shown in Figure 10, the effect of close-to-open variation seem to be much less prominent here compared to that on the STXE future index. Because of this, the log-returns seem to be much closer to ‘white noise’ on this index. Let that not perturb our pursuit of a high performing trading signal however. The approach I take for extracting the trading signal, as always, begins with the periodogram.

Figure 10: The log-return data of ES H3 at 15 minute intervals from 1-4-2013 to 2-1-2013.

As the large variations in the close-to-open price are not nearly as prominent, it would make sense that the same spectral peak found before at near $\pi/12$ is not nearly as prominent either. We can clearly see this in the periodogram plotted below in Figure 11. In fact, the spectral peak at $\pi/12$ is slightly larger in the explanatory series (pink), thus we should still be able to take advantage of any sort of close-to-open variation that exists in the E-min future index.

Figure 11: Periodograms of ES H3 log-returns (red) and the explanatory series (pink). The red dashed vertical lines are framing the spectral peak between .23 and .32.

With this spectral peak extracted from the series, the resulting trading signal is shown in Figure 12 with the performance of the bandpass signal shown in Figure 13.

Figure 12: The signal built from the extracted spectral peak and the log-return ESH3 data.

One can clearly see that the trading signal performs very well during the consistent cyclical behavior in the ESH3 price, However, when breakdown occurs in this stochastic structure and follows more prominently another frequency, the trading signal dies and no longer trades systematically taking advantage of the intrinsic cycle found near $\pi/12$. This can be seen in the middle 90 or so observations. The price can be seen to follow more closely a random walk and the trading becomes inconsistent. After this period of 90 or so observations however, just after the beginning of the out-of-sample period, the trajectory of the ESH3 follows back on its consistent course with a $\pi/12$ cyclical component it had before.

Figure 13: The performance in-sample and out-of-sample of the simple bandpass filter extracting the spectral peak.

Now to improve on these results, we include the frequency zero by moving the lower cutoff of the previous band-pass filter to $\latex \omega_0 = 0$.  As I mentioned before, this lifts or pushes down the signal from the local bias and will trade much more systematically. I then lessened the amount of smoothing in the expweight function to $\alpha = 24$, down from $\alpha = 36$ as I had on the band-pass filter.  This allows for slightly higher frequencies than $\pi/12$ to be traded on. I then proceeded to adjust the regularization parameters to obtain a healthy dosage of smoothness and decay in the coefficients. The result of this new low-pass filter design is shown in Figure 14.

Figure 14: Performance out-of-sample (right of cyan line) of the ES H3 filter on 200 15 minute observations.

The improvement in the overall systematic trading and performance is clear. Most of the major improvements came from the middle 90 points where the trading became less cyclical. With 6 losses in the band-pass design during this period alone, I was able to turn those losses into two large gains and no losses.  Only one major loss was accounted for during the 200 observation out-of-sample testing of filter from January 18th to February 1st, with an ROI of nearly 4 percent during the 9 trading days. As with the STXE filter in the previous example, I was able to successfully build a filter that correctly predicts close-to-open variations, despite the added difficulty that such variations were much smaller. Both in-sample and out-of-sample, the filter performs consistently, which is exactly what one strives for thanks to regularization.

### ASX Futures (YAPH3, Expiration March 18, 2013)

In the final experiment, I build a trading signal for the Australian Stock Exchange futures, during the same period of the previous two experiments. The log-returns show moderately large jumps/drops in price during the entire sample from Jan 4th to Feb 1st, but not quite as large as in the STXE index. We still should be able to take advantage of these close-to-open variations.

Figure 15: The log-returns of the YAPH3 in 15-minute interval observations.

In looking at the periodograms for both the YAPH3 15 minute log-returns (red) and the explanatory series (pink), it is clear that the spectral peaks don’t align like they did in the previous two exampls. In fact, there hardly exists a dominant spectral peak in the explanatory series, whereas the $\pi/12$ spectral peak in YAPH3 is very prominent. This ultimately might effect the performance of the filter, and consequently the trades. After building the low-pass filter and setting a high smoothing expweight parameter $\alpha = 26.5$, I then set the regularization parameters to be  $\lambda_{smooth} = .85$,   $\lambda_{decay} = .11$,  and $\lambda_{decay2} = .11$ (same as first example).

Figure 16: The periodograms for YAPH3 and explanatory series with spectral peak in YAPH3 framed by the red dashed lines.

The performance of the filter in-sample and out-of-sample is shown in Figure 18. This was one of the more challenging index futures series to work with as I struggled finding an appropriate explanatory series (likely because I was lazy since it was late at night and I was getting tired). Nonetheless, the filter still seems to predict the close-to-open variation on the Australian stock exchange index fairly well. All the major jumps in price are accounted for if you look closely at the trades (green dashed lines are buys/long and magenta lines are sells/shorts) and the corresponding action on the price of the futures contract.   Five losses out-of-sample for a trade success ratio of 72 percent and an ROI out-of-sample on 200 observations of 4.2 percent.  As with all other experiments in building trading signals with MDFA, we check the consistency of the in-sample and out-of-sample performance, and these seem to match up nicely.

Figure 18 : The out-of-sample performance of the low-pass filter on YAPH3.

The filter coefficients for the YAPH3 log-difference series is shown in Figure 19. Notice the perfectly smooth undulating yet decaying structure of the coefficients as the lag increases. What a beauty.

Figure 19: Filter coefficients for the YAPH3 series.

### Conclusion

Studying the trading performance of spectral peaks by first constructing band-pass filters to extract the signal corresponding to the peak in these index futures enabled me to understand how I can better construct the lowpass filter to yield even better performance. In these examples, I demonstrated that the close-to-open variation in the index futures price can be seen in the periodogram and thus be controlled for in the MDFA trading signal construction. This trading frequency corresponds to roughly $\pi/12$ in the 15 minute observation data that I had from Jan 4th to Feb 1st. As I witnessed in my empirical studies using iMetrica, this peak is more prominent when the close-to-open variations are larger and more often, promoting a very cyclical structure in the log-return data.  As I look deeper and deeper into studying the effects of extracting spectral peaks in the periodogram of financial data log-returns and the trading performance, I seem to improve on results even more and building the trading signals becomes even easier.

Stay tuned very soon for a tutorial using R (and MDFA) for one of these examples on high-frequency trading on index futures.  If you have any questions or would like to request a certain index future (out of one of the above examples or another) to be dissected in my second and upcoming R tutorial, feel free to shoot me an email.

Happy extracting!

# High-Frequency Financial Trading on FOREX with MDFA and R: An Example with the Japanese Yen

Figure 1: In-sample (observations 1-250) and out-of-sample performance of the trading signal built in this tutorial using MDFA. (Top) The log price of the Yen (FXY) in 15 minute intervals and the trades generated by the trading signal. Here black line is a buy (long), blue is sell (short position). (Bottom) The returns accumulated (cash) generated by the trading, in percentage gained or lost.

In my previous article on high-frequency trading in iMetrica on the FOREX/GLOBEX, I introduced some robust signal extraction strategies in iMetrica using the multidimensional direct filter approach (MDFA) to generate high-performance signals for trading on the foreign exchange and Futures market. In this article I take a brief leave-of-absence from my world of developing financial trading signals in iMetrica and migrate into an uber-popular language used in finance due to its exuberant array of packages, quick data management and graphics handling, and of course the fact that it’s free (as in speech and beer) on nearly any computing platform in the world.

This article gives an intro tutorial on using R for high-frequency trading on the FOREX market using the R package for MDFA (offered by Herr Doktor Marc Wildi von Bern) and some strategies that I’ve developed for generating financially robust trading signals. For this tutorial, I consider the second example given in my previous article where I engineered a trading signal for 15-minute log-returns of the Japanese Yen (from opening bell to market close EST).  This presented slightly new challenges than before as the close-to-open jump variations are much larger than those generated by hourly or daily returns. But as I demonstrated, these larger variations on close-to-open price posed no problems for the MDFA. In fact, it exploited these jumps and made large profits by predicting the direction of the jump. Figure 1 at the top of this article shows the in-sample (observations 1-250) and out-of-sample (observations 251 onward) performance of the filter I will be building in the first part of this tutorial.

Throughout this tutorial, I attempt to replicate these results that I built in iMetrica and expand on them a bit using the R language and the implementation of the MDFA available in here.  The data that we consider are 15-minute log-returns of the Yen from January 4th to January 17th and I have them saved as an .RData file given by ld_fxy_insamp. I have an additional explanatory series embedded in the .RData file that I’m using to predict the price of the Yen. Additionally, I also will be using price_fxy_insamp which is the log price of Yen, used to compute the performance (buy/sells) of the trading signal. The ld_fxy_insamp will be used as the in-sample data to construct the filter and trading signal for FXY. To obtain this data so you can perform these examples at home, email me and I’ll send you all the necessary .RData files (the in-sample and out-of-sample data) in a .zip file. Taking a quick glance at the ld_fxy_insamp data, we see log-returns of the Yen at every 15 minutes starting at market open (time zone UTC). The target data (Yen) is in the first column along with the two explanatory series (Yen and another asset co-integrated with movement of Yen).
 > head(ld_fxy_insamp) [,1]           [,2]          [,3] 2013-01-04 13:30:00  0.000000e+00   0.000000e+00  0.0000000000 2013-01-04 13:45:00  4.763412e-03   4.763412e-03  0.0033465833 2013-01-04 14:00:00 -8.966599e-05  -8.966599e-05  0.0040635638 2013-01-04 14:15:00  2.597055e-03   2.597055e-03 -0.0008322064 2013-01-04 14:30:00 -7.157556e-04  -7.157556e-04  0.0020792190 2013-01-04 14:45:00 -4.476075e-04  -4.476075e-04 -0.0014685198 

Moving on, to begin constructing the first trading signal for the Yen, we begin by uploading the data into our R environment, define some initial parameters for the MDFA function call, and then compute the DFTs and periodogram for the Yen.

load(paste(path.pgm,&quot;ld_fxy_in15min.RData&quot;,sep=&quot;&quot;))    #load in-sample log-returns of Yen

in_samp_lenprice_insample&lt;-price_fxy_insamp

#setup some MDFA variables
x&lt;-ld_fxy_insamp
len&lt;-length(x[,1])
shift_constraint&lt;-rep(0,length(x[1,])-1)
weight_constraint&lt;-rep(0,length(x[1,])-1)
d&lt;-0
plots&lt;-T
lin_expweight&lt;-F

# Compute DFTs and periodogram for initial analysis
spec_obj&lt;-spec_comp(len,x,d)
weight_func&lt;-spec_obj$weight_func K&lt;-length(weight_func[,1])-1 fxy_periodogram&lt;-abs(spec_obj$weight_func[,1])^2


As I’ve mentioned in my previous articles, my step-by-step strategy for building trading signals always begin by a quick analysis of the periodogram of the asset being traded on. Holding the key to providing insight into the characteristics of how the asset trades, the periodogram is an essential tool for navigating how the extractor $\Gamma$ is chosen. Here, I look for principal spectral peaks that correspond in the time domain to how and where my signal will trigger buy/sell trades. Figure 2 shows the periodogram of the 15-minute log-returns of the Japanese Yen during the in-sample period from January 4 to January 17 2013. The arrows point to the main spectral peaks that I look for and provides a guide to how I will define my $\Gamma$ function. The black dotted lines indicate the two frequency cutoffs that I will consider in this example, the first being $\pi/12$ and the second at $\pi/6$. Notice that both cutoffs are set directly after a spectral peak, something that I highly recommend.  In high-frequency trading on the FOREX using MDFA, as we’ll see, the trick is to seek out the spectral peak which accounts for the close-to-open variation in the price of the foreign currency. We want to take advantage of this spectral peak as this is where the big gains in foreign currency trading using MDFA will occur.

Figure 2: Periodogram of FXY (Japanese Yen) along with spectral peaks and two different frequency cutoffs.

In our first example we consider the larger frequency as the cutoff for $\Gamma$ by setting it to $\pi/6$ (the right most line in the figure of the periodogram). I then initially set the timeliness and smoothness parameters, $lambda$ and expweight to 0 along with setting all the regularization parameters to 0 as well. This will give me a barometer for where and how much to adjust the filter parameters. In selecting the filter length $L$, my empirical studies over numerous experiments in building trading signals using iMetrica have demonstrated that a ‘good’ choice is anywhere between 1/4 and 1/5 of the total in-sample length of the time series data.  Of course, the length depends on the frequency of the data observations (i.e. 15 minute, hourly, daily, etc.), but in general you will most likely never need more than $L$ being greater than 1/4 the in-sample size. Otherwise, regularization can become too cumbersome to handle effectively. In this example, the total in-sample length is 335 and thus I set $L= 82$ which I’ll stick to for the remainder of this tutorial. In any case, the length of the filter is not the most crucial parameter to consider in building good trading signals. For a good robust selection of the filter parameters couple with appropriate explanatory series, the results of the trading signal with $L= 80$ compared with, say, $L= 85$ should hardly differ. If they do, then the parameterization is not robust enough.

After uploading both the in-sample log-return data along with the corresponding log price of the Yen for computing the trading performance, we the proceed in R to setting initial filter settings for the MDFA routine and then compute the filter using the IMDFA_comp function. This returns both the i_mdfa& object holding coefficients, frequency response functions, and statistics of filter, along with the signal produced for each explanatory series. We combine these signals to get the final trading signal in-sample. All this is all done in R as follows:


cutoff&lt;-pi/6 #set frequency cutoff
Gamma&lt;-((0:K)&lt;(cutoff*K/pi)) #define Gamma

grand_mean&lt;-F
Lag&lt;-0
L&lt;-82
lambda_smooth&lt;-0
lambda_cross&lt;-0
lambda_decay&lt;-c(0.,0.) #regularization - decay

lambda&lt;-0
expweight&lt;-0
i1&lt;-F
i2&lt;-F
# compute the filter for the given parameter definitions
i_mdfa_obj&lt;-IMDFA_comp(Lag,K,L,lambda,weight_func,Gamma,expweight,cutoff,i1,i2,weight_constraint,
lambda_cross,lambda_decay,lambda_smooth,x,plots,lin_expweight,shift_constraint,grand_mean)

# after computing filter, we save coefficients
bn&lt;-i_mdfa_obj$i_mdfa$b

# now we build trading signal
trading_signal&lt;-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]


The resulting frequency response functions of the filter and the coefficients are plotted in the figure below.

Figure 3: The Frequency response functions of the filter (top) and the filter coefficients (below)

Notice the abundance of noise still present passed the cutoff frequency. This is mollified by increasing the expweight smoothness parameter. The coefficients for each explanatory series show some correlation in their movement as the lags increase. However, the smoothness and decay of the coefficients leaves much to be desired. We will remedy this by introducing regularization parameters. Plots of the in-sample trading signal and the performance in-sample of the signal are shown in the two figures below. Notice that the trading signal behaves quite nicely in-sample. However, looks can be deceiving. This stellar performance is due in large part to a filtering phenomenon called overfitting. One can deduce that overfitting is the culprit here by simply looking at the nonsmoothness of the coefficients along with the number of freezed degrees of freedom, which in this example is roughly 174 (out of 174), way too high. We would like to get this number at around half the total amount of degrees of freedom (number of explanatory series x L).

Figure 4: The trading signal and the log-return data of the Yen.

The in-sample performance of this filter demonstrates the type of results we would like to see after regularization is applied.  But now comes for the sobering effects of overfitting. We apply these filter coeffcients to 200 15-minute observations of the Yen and the explanatory series from January 18 to February 1 2013 and compare with the characteristics in-sample. To do this in R, we first load the out-of-sample data into the R environment, and then apply the filter to the out-of-sample data that I defined as x_out.

load(paste(path.pgm,&quot;ld_fxy_out15min.RData&quot;,sep=&quot;&quot;))
x_out&lt;-rbind(ld_fxy_insamp,ld_fxy_outsamp) #bind the in-sample with out-of-sample data
xff&lt;-matrix(nrow=out_samp_len,ncol=2)

#apply filter built in-sample
for(i in 1:out_samp_len)
{
xff[i,]&lt;-0
for(j in 2:3)
{
xff[i,j-1]&lt;-xff[i,j-1]+bn[,j-1]%*%x_out[335+i:(i-L+1),j]
}
}


The plot in Figure 5 shows the out-of-sample trading signal. Notice that the signal is not nearly as smooth as it was in-sample. Overshooting of the data in some areas is also obviously present. Although the out-of-sample overfitting characteristics of the signal are not horribly suspicious, I would not trust this filter to produce stellar returns in the long run.

Figure 5 : Filter applied to 200 15 minute observations of Yen out-of-sample to produce trading signal (shown in blue)

Following the previous analysis of the mean-squared solution (no customization or regularization), we now proceed to clean up the problem of overfitting that was apparent in the coefficients along with mollifying the noise in the stopband (frequencies after $\pi/6$).  In order to choose the parameters for smoothing and regularization, one approach is to first apply the smoothness parameter first, as this will generally smooth the coefficients while acting as a ‘pre’-regularizer, and then advance to selecting appropriate regularization controls. In looking at the coefficients (Figure 3), we can see that a fair amount of smoothing is necessary, with only a slight touch of decay. To select these two parameters in R, one option is to use the Troikaner optimizer (found here) to find a suitable combination (I have a secret sauce algorithmic approach I developed for iMetrica for choosing optimal combinations of parameters given an extractor $\Gamma$ and a performance indicator, although it’s lengthy (even in GNU C) and cumbersome to use, so I typically prefer the strategy discussed in this tutorial).   In this example, I began by setting the lambda_smooth to .5 and the decay to (.1,.1) along with an expweight smoothness parameter set to 8.5. After viewing the coefficients, it still wasn’t enough smoothness, so I proceeded to add more finally reaching .63, which did the trick. I then chose lambda to balance the effects of the smoothing expweight (lambda is always the last resort tweaking parameter).

lambda_smooth&lt;-0.63
lambda_cross&lt;-0.
lambda_decay&lt;-c(0.119,0.099)
lambda&lt;-9
expweight&lt;-8.5

i_mdfa_obj&lt;-IMDFA_comp(Lag,K,L,lambda,weight_func,Gamma,expweight,cutoff,i1,i2,weight_constraint,
lambda_cross,lambda_decay,lambda_smooth,x,plots,lin_expweight,shift_constraint,grand_mean)

bn&lt;-i_mdfa_obj$i_mdfa$b    #save the filter coefficients

trading_signal&lt;-i_mdfa_obj$xff[,1] + i_mdfa_obj$xff[,2]  #compute the trading signal


Figure 6 shows the resulting frequency response function for both explanatory series (Yen in red). Notice that the largest spectral peak found directly before the frequency cutoff at $\pi/6$ is being emphasized and slightly mollified (value near .8 instead of 1.0). The other spectral peaks below $\pi/6$ are also present. For the coefficients, just enough smoothing and decay was applied to keep the lag, cyclical, and correlated structure of the coefficients intact, but now they look much nicer in their smoothed form. The number of freezed degrees of freedom has been reduced to approximately 102.

Figure 6: The frequency response functions and the coefficients after regularization and smoothing have been applied (top). The smoothed coefficients with slight decay at the end (bottom). Number of freezed degrees of freedom is approximately 102 (out of 172).

Along with an improved freezed degrees of freedom and no apparent havoc of overfitting, we apply this filter out-of-sample to the 200 out-of-sample observations in order to verify the improvement in the structure of the filter coefficients (shown below in Figure 7).  Notice the tremendous improvement in the properties of the trading signal (compared with Figure 5). The overshooting of the data has be eliminated and the overall smoothness of the signal has significantly improved. This is due to the fact that we’ve eradicated the presence of overfitting.

Figure 7: Out-of-sample trading signal with regularization.

With all indications of a filter endowed with exactly the characteristics we need for robustness, we now apply the trading signal both in-sample and out of sample to activate the buy/sell trades and see the performance of the trading account in cash value. When the signal crosses below zero, we sell (enter short position) and when the signal rises above zero, we buy (enter long position).

The top plot of Figure 8 is the log price of the Yen for the 15 minute intervals and the dotted lines represent exactly where the trading signal generated trades (crossing zero). The black dotted lines represent a buy (long position) and the blue lines indicate a sell (and short position).  Notice that the signal predicted all the close-to-open jumps for the Yen (in part thanks to the explanatory series). This is exactly what we will be striving for when we add regularization and customization to the filter. The cash account of the trades over the in-sample period is shown below, where transaction costs were set at .05 percent. In-sample, the signal earned roughly 6 percent in 9 trading days and a 76 percent trading success ratio.

Figure 8: In-sample performance of the new filter and the trades that are generated.

Now for the ultimate test to see how well the filter performs in producing a winning trading signal, we applied the filter to the 200 15-minute out-of-sample observation of the Yen and the explanatory series from Jan 18th to February 1st and make trades based on the zero crossing. The results are shown below in Figure 9. The black lines represent the buys and blue lines the sells (shorts). Notice the filter is still able to predict the close-to-open jumps even out-of-sample thanks to the regularization. The filter succumbs to only three tiny losses at less than .08 percent each between observations 160 and 180 and one small loss at the beginning, with an out-of-sample trade success ratio hitting 82 percent and an ROI of just over 4 percent over the 9 day interval.

Figure 9: Out-of-sample performance of the regularized filter on 200 out-of-sample 15 minute returns of the Yen. The filter achieved 4 percent ROI over the 200 observations and an 82 percent trade success ratio.

Compare this with the results achieved in iMetrica using the same MDFA parameter settings. In Figure 10, both the in-sample and out-of-sample performance are shown. The performance is nearly identical.

Figure 10: In-sample and out-of-sample performance of the Yen filter in iMetrica. Nearly identical with performance obtained in R.

#### Example 2

Now we take a stab at producing another trading filter for the Yen, only this time we wish to identify only the lowest frequencies to generate a trading signal that trades less often, only seeking the largest cycles. As with the performance of the previous filter, we still wish to target the frequencies that might be responsible to the large close-to-open variations in the price of Yen. To do this, we select our cutoff to be $\pi/12$ which will effectively keep the largest three spectral peaks intact in the low-pass band of $\Gamma$.

For this new filter, we keep things simple by continuing to use the same regularization parameters chosen in the previous filter as they seemed to produce good results out-of-sample. The $\lambda$ and expweight customization parameters however need to be adjusted to account for the new noise suppression requirements in the stopband and the phase properties in the smaller passband. Thus I increase the smoothing parameter and decreased the timeliness parameter (which only affects the passband) to account for this change. The new frequency response functions and filter coefficients for this smaller lowpass design are shown below in Figure 11. Notice that the second spectral peak is accounted for and only slightly mollified under the new changes. The coefficients still have the noticeable smoothness and decay at the largest lags.

Figure 11: Frequency response functions of the two filters and their corresponding coefficients.

To test the effectiveness of this new lower trading frequency design, we apply the filter coefficients to the 200 out-of-sample observations of the 15-minute Yen log-returns. The performance is shown below in Figure 12. In this filter, we clearly see that the filter still succeeds in predicting correctly the large close-to-open jumps in the price of the Yen. Only three total losses are observed during the 9 day period. The overall performance is not as appealing as the previous filter design as less amount of trades are made, with a near 2 percent ROI and 76 percent trade success ratio. However, this design could fit the priorities for a trader much more sensitive to transaction costs.

Figure 12: Out-of-sample performance of filter with lower cutoff.

#### Conclusion

Verification and cross-validation is important, just as the most interesting man in the world will tell you.

The point of this tutorial was to show some of the main concepts and strategies that I undergo when approaching the problem of building a robust and highly efficient trading signal for any given asset at any frequency. I also wanted to see if I could achieve similar results with the R MDFA package as my iMetrica software package. The results ended up being nearly parallel except for some minor differences. The main points I was attempting to highlight were in first analyzing the periodogram to seek out the important spectral peaks (such as ones associate with close-to-open variations) and to demonstrate how the choice of the cutoff affects the systematic trading.  Here’s a quick recap on good strategies and hacks to keep in mind.

Summary of strategies for building trading signal using MDFA in R:

• As I mentioned before, the periodogram is your best friend. Apply the cutoff directly after any range of spectral peaks that you want to consider. These peaks are what generate the trades.
• Utilize a choice of filter length $L$ no greater than 1/4. Anything larger is unnecessary.
• Begin by computing the filter in the mean-square sense, namely without using any customization or regularization and see exactly what needs to be approved upon by viewing the frequency response functions and coefficients for each explanatory series.  Good performance of the trading signal in-sample (and even out-of-sample in most cases) is meaningless unless the coefficients have solid robust characteristics in both the frequency domain and the lag domain.
• I recommend beginning with tweaking the smoothness customization parameter expweight and the lambda_smooth regularization parameters first. Then proceed with only slight adjustments to the lambda_decay parameters. Finally, as a last resort, the lambda customization. I really never bother to look at lambda_cross. It has seldom helped in any significant manner.  Since the data we are using to target and build trading signals are log-returns, no need to ever bother with i1 and i2. Those are for the truly advanced and patient signal extractors, and should only be left for those endowed with iMetrica 😉

If you have any questions, or would like the high-frequency Yen data I used in these examples, feel free to contact me and I’ll send them to you. Until next time, happy extracting!

# High-Frequency Financial Trading with Multivariate Direct Filtering I: FOREX and Futures

Animation 1: Click to see animation of the Japanese Yen filter in action on 164 hourly out-of-sample observations.

In my previous articles, I was working uniquely with daily log-return data from different time spans from a year to a year and a half. This enabled the in-sample period of computing the filter coefficients for the signal extraction to include all the most recent annual phases and seasons of markets, from holiday effects, to the transitioning period of August to September that is regularly highly influential on stock market prices and commodities as trading volume increases a significant amount. One immediate question that is raised in migrating to higher-frequency intraday data is what kind of in-sample/out-of-sample time spans should be used to compute the filter in-sample and then for how long do we apply the filter out-of-sample to produce the trades? Another question that is raised with intraday data is how do we account for the close-to-open variation in price? Certainly, after close, the after-hour bids and asks will force a jump into the next trading day. How do we deal with this jump in an optimal manner? As the observation frequency gets higher, say from one hour to 30 minutes, this close-to-open jump/fall should most likely be larger. I will start by saying that, as you will see in the results of this article, with a clever choice of the extractor $\Gamma$ and explanatory series, MDFA can handle these jumps beautifully (both aesthetically and financially). In fact, I would go so far as to say that the MDFA does a superb job in predicting the overnight variation.

One advantage of building trading signals for higher intraday frequencies is that the signals produce trading strategies that are immediately actionable. Namely one can act upon a signal to enter a long or short position immediately when they happen. In building trading signals for the daily log-return, this is not the case since the observations are not actionable points, namely the log difference of today’s ending price with yesterday’s ending price are produced after hours and thus not actionable during open market hours and only actionable the next trading day. Thus trading on intraday observations can lead to better efficiency in trading.

In this first installment in my series on high-frequency financial trading using multivariate direct filtering in iMetrica, I consider building trading signals on hourly returns of foreign exchange currencies. I’ve received a few requests after my recent articles on the Frequency Effect in seeing iMetrica and MDFA in action on the FOREX sector. So to satisfy those curiosities, I give a series of (financially) satisfying and exciting results in combining MDFA and the FOREX. I won’t give all my secretes away into building these signals (as that would of course wipe out my competitive advantage), but I will give some of the parameters and strategies used so any courageously curious reader may try them at home (or the office). In the conclusion, I give a series of even more tricks and hacks.  The results below speak for themselves  So without further ado, let the games begin.

#### Japanese Yen

Frequency: One hour returns
30 day out-of-sample ROI: 12 percent

Yen Filter Parameters: $\lambda$ = 9.2 $\alpha$ = 13.2, $\omega_0 = \pi/5$
Regularization: smooth = .918, decay = .139, decay2 = .79, cross = 0

In the first experiment, I consider hourly log-returns of a ETF index that mimics the Japanese Yen called FXY. As for one of the explanatory series, I consider the hourly log-returns of the price of GOLD which is traded on NASDAQ. The out-of-sample results of the trading signal built using a low-pass filter and the parameters above are shown in Figure 1.  The in-sample trading signal (left of cyan line) was built using 400 hourly observations of the Yen during US market hours dating back to 1 October 2012. The filter was then applied to the out-of-sample data for 180 hours, roughly 30 trading days up until Friday, 1 February 2013.

Figure 1: Out-of-sample results for the Japanese Yen. The in-sample trading signal was built using 400 hourly observations of the Yen during US market hours dating back to October 1st, 2012. The out-of-sample portion passed the cyan line is on 180 hourly observations, about 30 trading days.

This beauty of this filter is that it yields a trading signal exhibiting all the characteristics that one should strive for in building a robust and successful trading filter.

1. Consistency: The in-sample portion of the filter performs exactly as it does out-of-sample (after cyan line) in both trade success ratio and systematic trading performance.
2. Dropdowns: One small dropdown out-of-sample for a loss of only .8 percent (nearly the cost of the transaction).
3. Detects the cycles as it should: Although the filter is not able to pinpoint with perfect accuracy every local small upturn during the descent of the Yen against the dollar, it does detect them nonetheless and knows when to sell at their peaks (the magenta lines).
4. Self-correction: What I love about a robust filter is that it will tend to self-correct itself very quickly to minimize a loss in an erroneous trade. Notice how it did this in the second series of buy-sell transactions during the only loss out-of-sample. The filter detects momentum but quickly sold right before the ensuing downfall. My intuition is that only frequency-based methods such as the MDFA are able to achieve this consistently. This is the sign of a skillfully smart filter.

The coefficients for this Yen filter are shown below. Notice the smoothness of the coefficients from applying the heavy smooth regularization and the strong decay at the very end.  This is exactly the type of smooth/decay combo that one should desire. There is some obvious correlation between the first and second explanatory series in the first 30 lags or so as well. The third explanatory series seems to not provide much support until the middle lags .

Figure 2: Coefficients of the Yen filter. Here we use three different explanatory series to extract the trading signal.

One of the first things that I always recommend doing when first attempting to build a trading signal is to take a glance at the periodogram. Figure 2 shows the periodogram of the log-return data of the Japanese Yen over 580 hours.  Compare this with the periodogram of the same asset using log-returns of daily data over 580 days, shown in Figure 3.  Notice the much larger prominent spectral peaks at the lower frequencies in the daily log-return data. These prominent spectral peaks renders multibandpass filters much more advantageous and to use as we can take advantage of them by placing a band-pass filter directly over them to extract that particular frequency (see my article on multibandpass filters). However, in the hourly data, we don’t see any obvious spectral peaks to consider, thus I chose a low-pass filter and set the cutoff frequency at $\pi/5$, a standard choice, and good place to begin.

Figure 3: Periodogram of hourly log-returns of the Japanese Yen over 580 hours.

Figure 4: Periodogram of Japanese Yen using 580 daily log-return observations. Many more spectral peaks are present in the lower frequencies.

#### Japanese Yen

Frequency: 15 minute returns
7 day out-of-sample ROI: 5 percent

Yen Filter Parameters: $\lambda$ = 3.7 $\alpha$ = 13, $\omega_0 = \pi/9$
Regularizationsmooth = .90, decay = .11, decay2 = .09, cross = 0

In the next trading experiment, I consider the Japanese Yen again, only this time I look at trading on even high-frequency log-return data than before, namely on 15 minute log-returns of the Yen from the opening bell to market close.  This presents slightly new challenges than before as the close-to-open jumps are much larger than before, but these larger jumps do not necessarily pose problems for the MDFA. In fact, I look to exploit these and take advantage to gain profit by predicting the direction of the jump.  For this higher frequency experiment, I considered 350 15-minute in-sample observations to build and optimize the trading signal, and then applied it over the span of 200 15-minute out-of-sample observations. This produced the results shown in the Figure 5 below. Out of 17 total trades out-of-sample, there were only 3 small losses each less than .5 percent drops and thus 14 gains during the 200 15-minute out-of-sample time period.  The beauty of this filter is its impeccable ability to predict the close-to-open jump in the price of the Yen. Over the nearly 7 day trading span, it was able to correctly deduce whether to buy or short-sell before market close on every single trading day change. In the figure below, the four largest close-to-open variation in Yen price is marked with a “D” and you can clearly see how well the signal was able to correctly deduce a short-sell before market close. This is also consistent with the in-sample performance as well, where you can notice the buys and/or short-sells at the largest close-to-open jumps (notice the large gain in the in-sample period right before the out-of-sample period begins, when the Yen jumped over 1 percent over night.  This performance is most likely aided by the explanatory time series I used for helping predict the close-to-open variation in the price of the Yen. In this example, I only used two explanatory series (the price of Yen, and another closely related to the Yen).

Figure 5: Out-of-sample performance of the Japanese Yen filter on 15 minute log-return data.

We look at the filter transfer functions to see what frequencies they are being privileged in the construction of the filter. Notice that some noise leaks out passed the frequency cutoff at $\pi/9$, but this is typically normal and a non-issue. I had to balance for both timeliness and smoothness in this filter using both the customization parameters $\lambda$ and $\alpha$. Not much at frequency 0 is emphasized, with more emphasis stemming from the large spectral peak found right at $\pi/9$.

Figure 6: The filter transfer functions.

#### British Pound

Frequency: 30 minute returns
14 day out-of-sample ROI: 4 percent

British Pound Filter Parameters: $\lambda$ = 5 $\alpha$ = 15, $\omega_0 = \pi/9$
Regularizationsmooth = .109, decay = .165, decay2 = .19, cross = 0

In this example we consider the frequency of the data to 30 minute returns and attempt to build a robust trading signal for a derivative of the British Pound (BP) on this higher frequency. Instead of using the cash value of the BP, I use 30 minute returns of the BP Futures contract expiring in March (BPH3). Although I don’t have access to tick data from the FOREX, I do have tick data from GLOBEX for the past 5 years.  Thus the futures series won’t be an exact replication of the cash price series of the BP, but it should be quite close due to very low interest rates.

The results of the out-of-sample performance of the BP futures filter are shown in Figure 7. I constructed the filter using an initial in-sample size of 390 30 minute returns dating back to 1 December 2012. After pinpointing a frequency cutoff in the frequency domain for the $\Gamma$ that yielded decent trading results in-sample, I then proceeded to optimize the filter in-sample on smoothness and regularization to achieve similar out-of-sample performance. Applying the resulting filter out-of-sample on 168 30-minute log-return observations of the BP futures series along with 3 explanatory series, I get the results shown below. There were 13 trades made and 10 of them were successful. Notice that the filter does an exquisite job at triggering trades near local optimums associated with the frequencies inside the cutoff of the filter.

Figure 7: The out-of-sample results of the British Pound using 30-minute return data.

In looking at the coefficients of the filter for each series in the extraction, we can clearly see the effects of the regularization: the smoothness of the coefficients the fast decay at the very end. Notice that I never really apply any cross regularization to stress the latitudinal likeliness between the 3 explanatory series as I feel this would detract from the predicting advantages brought by the explanatory series that I used.

Figure 8: The coefficients for the 3 explanatory series of the BP futures,

#### Euro

Frequency: 30 min returns
30 day out-of-sample ROI: 4 percent

Euro Filter Parameters: $\lambda$ = 0, $\alpha$ = 6.4, $\omega_0 = \pi/9$
Regularizationsmooth = .85, decay = .27, decay2 = .12, cross = .001

Continuing with the 30 minute frequency of log-returns, in this example I build a trading signal for the Euro futures contract with expiration on 18 March 2013 (UROH3 on the GLOBEX). My in-sample period, being the same as my previous experiment, is from 1 December 2012 to 4 January 2013 on 30 minute returns using three explanatory time series.  In this example, after inspecting the periodogram, I decided upon a low-pass filter with a frequency cutoff of $\pi/9$. After optimizing the customization and applying the filter to one month of 30 minute frequency return data out-of-sample (month of January 2013, after cyan line) we see the performance is akin to the performance in-sample, exactly what one strives for. This is due primarily to the heavy regularization of the filter coefficients involved. Only four very small losses of less than .02 percent are suffered during the out-of-sample span that includes 10 successful trades, with the losses only due to the transaction costs. Without transaction costs, there is only one loss suffered at the very beginning of the out-of-sample period.

Figure 9 : Out-of-sample performance on the 30-min log-returns of Euro futures contract UROH3.

As in the first example using hourly returns, this filter again exhibits the desired characteristics of a robust and high-performing financial trading filter. Notice the out-of-sample performance behaves akin to the in-sample performance, where large upswings and downswings are pinpointed to high-accuracy. In fact, this is where the filter performs best during these periods. No need for taking advantage of a multibandpass filter here, all the profitable trading frequencies are found at less than $\pi/9$.  Just as with the previous two experiments with the Yen and the British Pound, notice that the filter cleanly predicts the close-to-open variation (jump or drop) in the futures value and buys or sells as needed.  This can be seen from many of the large jumps in the out-of-sample period (after cyan line).

One reason why these trading signals perform so well is due to their approximation power of the symmetric filter. In comparing the trading signal (green) with a high-order approximation of the symmetric filter (gray line) transfer function $\Gamma$ shown in Figure 10, we see that trading signal does an outstanding job at approximating the symmetric filter uniformly. Even at the latest observation (the right most point), the asymmetric filter hones in on the symmetric signal (gray line) with near perfection. Most importantly, the signal crosses zero almost exactly where required.  This is exactly what you want when building a high-performing trading signal.

Figure 10: Plot of approximation of the real-time trading signal for UROH3 with a high order approximation of the symmetric filter transfer function.

In looking at the periodogram of the log-return data and the output trading signal differences (colored in blue), we see that the majority of the frequencies were accounted for as expected in comparing the signal with the symmetric signal. Only an inconsequential amount of noise leakage passed the frequency cutoff of $\pi/9$ is found.  Notice the larger trading frequencies, the more prominent spectral peaks, are located just after $\pi/6$. These could be taken into account with a smart multibandpass filter in order to manifest even more trades, but I wanted to keep things simple for my first trials with high-frequency foreign exchange data.  I’m quite content with the results that I’ve achieved so far.

Figure 11: Comparing the periodogram of the signal with the log-return data.

#### Conclusion

I must admit, at first I was a bit skeptical of the effectiveness that the MDFA would have in building any sort of successful trading signal for FOREX/GLOBEX high frequency data. I always considered the FOREX market rather ‘efficient’ due to the fact that it receives one of the highest trading volumes in the world.  Most strategies that supposedly work well on high-frequency FOREX all seem to use some form of technical analysis or charting (techniques I’m particularly not very fond of), most of which are purely time-domain based. The direct filter approach is a completely different beast, utilizing a transformation into the frequency domain and a ‘bending and warping’ of the metric space for the filter coefficients to extract a signal within the noise that is the log-return data of financial assets.  For the MDFA to be very effective at building timely trading signals, the log-returns of the asset need to diverge from white noise a bit, giving room for pinpointing intrinsically important cycles in the data. However, after weeks of experimenting, I have discovered that building financial trading signals using MDFA and iMetrica on FOREX data is as rewarding as any other.

As my confidence has now been bolstered and amplified even more after my experience with building financial trading signals with MDFA and iMetrica for high-frequency data on foreign exchange log-returns at nearly any frequency, I’d be willing to engage in a friendly competition with anyone out there who is certain that they can build better trading strategies using time domain based methods such as technical analysis or any other statistical arbitrage technique.  I strongly believe these frequency based methods are the way to go, and the new wave in financial trading.  But it takes experience and a good eye for the frequency domain and periodograms to get used to. I haven’t seen many trading benchmarks that utilize other types of strategies, but i’m willing to bet that they are not as consistent as these results using this large of an out-of-sample to in-sample ratio (the ratios in these experiments were between .50 and .80).  If anyone would like to take me up on my offer for a friendly competition (or know anyone that would), feel free to contact me.

After working with a multitude of different financial time series and building many different types of filters, I have come to the point where I can almost eyeball many of the filter parameter choices including the most important ones being the extractor $\Gamma$ along with the regularization parameters, without resorting to time consuming, and many times inconsistent, optimization routines.  Thanks to iMetrica, transitioning from visualizing the periodogram to the transfer functions and to the filter coefficients and back to the time domain to compare with the approximate symmetric filter in order to gauge parameter choices is an easy task, and an imperative one if one wants to build successful trading signals using MDFA.

Here are some overall tips and tricks to build your own high performance trading signals on high-frequency data at home:

• Pay close attention to the periodogram. This is your best friend in choosing the extractor $\Gamma$. The best performing signals are not the ones that trade often, but trade on the most important frequencies found in the data. Not all frequencies are created equal. This is true when building either low-pass or multibandpass frequencies.
• When tweaking customization, always begin with $\alpha$, the parameter for smoothness. $\lambda$ for timeliness should be the last resort. In fact, this parameter will most likely be next to useless due to the fact that the log-return of financial data is stationary. You probably won’t ever need it.
• You don’t need many explanatory series. Like most things in life, quality is superior to quantity. Using the log-return data of the asset you’re trading along with one and maybe two explanatory series that somewhat correlate with the financial asset you’re trading on is sufficient. Anymore than that is ridiculous overkill, probably leading to over-fitting (even the power of regularization at your fingertips won’t help you).

In my next article, I will continue with even more high-frequency trading strategies with the MDFA and iMetrica where I will engage in the sector of Funds and ETFs. If any curious reader would like even more advice/hints/comments on how to build these trading signals on high-frequency data for the FOREX (or the coefficients built in these examples), feel free to get in contact with me via email. I’ll be happy to help.

Happy extracting!

# Realizing the Future with iMetrica and HEAVY Models

In this article we steer away from multivariate direct filtering and signal extraction in financial trading and briefly indulge ourselves a bit in the world of analyzing high-frequency financial data, an always hot topic with the ever increasing availability of tick data in computationally convenient formats. Not only has high-frequency intraday data been the basis of higher frequency risk monitoring and forecasting, but it also provides access to building ‘smarter’ volatility prediction models using so-called realized measures of intraday volatility. These realized measures have been shown in numerous studies over the past 5 years or so to provide a solidly more robust indicator of daily volatility.   While daily returns only capture close-to-close volatility, leaving much to be said about the actual volatility of the asset that was witnessed during the day, realized measures of volatility using higher frequency data such as second or minute data provide a much clearer picture of open-to-close variation in trading.

In this article, I briefly describe a new type of volatility model that takes into account these realized measures for volatility movement called  High frEquency bAsed VolatilitY (HEAVY) models developed and pioneered by Shephard and Sheppard 2009. These models take as input both close-to-close daily returns $r_t$ as well as daily realized measures to yield better forecasting dynamics. The models have been shown to be endowed with the ability to not only track momentum in volatility, but also adjust for mean reversion effects as well as adjust quickly to structural breaks in the level of the volatility process.  As the authors (Sheppard and Shephard, 2009) state in their original paper, the focus of these models is on predictive properties, rather than on non-parametric measurement of volatility. Furthermore, HEAVY models are much easier and more robust estimation wise than single source equations (GARCH, Stochastic Volatility) as they bring two sources of volatility information to identify a longer term component of volatility.

The goal of this article is three-fold. Firstly, I briefly review these HEAVY models and give some numerical examples of the model in action using a gnu-c library and Java package called heavy_model that I develped last year for the iMetrica software. The heavy_model package is available for download (either by this link or e-mail me) and features many options that are not available in the MATLAB code provided by Sheppard (bootstrapping methods, Bayesian estimation, track reparameterization, among others). I will then demonstrate the seamless ability to model volatility with these High frEquency bAsed VolatilitY models using iMetrica, where I also provide code for computing realized measures of volatility in Java with the help of an R package called highfrequency (Boudt, Cornelissen, and Payseur 2012).

#### HEAVY Model Definition

Let’s denote the daily returns as $r_1, r_2, \ldots, r_T$, where $T$ is the total amount of days in the sample we are working with. In the HEAVY model, we supplement information to the daily returns by a so-called realized measure of intraday volatility based on higher frequency data, such as second, minute or hourly data. The measures are called daily realized measures and we will denote them as $RM_1, RM_2, \ldots, RM_T$ for the total number of days in the sample.  We can think of these daily realized measures as an average of variance autocorrelations during a single day. They are supposed to provide a better snapshot of the ‘true’ volatility for a specific day $t$. Although there are numerous ways of computing a realized measure, the easiest is the realized variance computed as $RM_t = \sum_j (X_{t+t_{j,t}} - X_{t+t_{j-1,t}})^2$ where $t_{j,t}$ are the normalized times of trades on day $t$. Other methods for providing realized measures includes using Kernel based methods which we will discuss later in this article (see for example http://papers.ssrn.com/sol3/papers.cfm?abstract_id=927483).

Once the realized measures have been computed for $T$ days, the HEAVY model is given by:

$Var(r_t | \mathcal{F}_{t-1}^{HF}) = h_t = \omega_1 + \alpha RM_{t-1} + \beta h_{t-1} + \lambda r^2_t$

$E(RM_t | \mathcal{F}_{t-1}^{HF}) = \mu_t = \omega_2+ \alpha_R RM_{t-1} + \beta_R \mu_{t-1},$

where the stability constraints are  $\alpha, \omega_1 \geq 0, \beta \in [0,1]$ and $\omega_2, \alpha_R \geq 0$ with $\lambda + \beta \in [0,1]$ and $\beta_R + \alpha_R \in [0,1]$. Here, the $\mathcal{F}_{t-1}^{HF}$ denotes the high-frequency information from the previous day $t-1$. The first equation models the close-to-close conditional variance and is akin to a GARCH type model, whereas the second equation models the conditional expectation of the open-to-close variation.

With the formulation above, one can easily see that slight variations to the model are perfectly plausible. For example, one could consider additional lags in either the realized measure $RM_t$ (akin to adding additional moving average parameters) or the conditional mean/variance variable (akin to adding autoregression parameters). One could also leave out the dependence on the squared returns $r^2_t$ by setting $\lambda$ to zero, which is what the original others recommended. A third variation is adding yet another equation to the pack that models a realized measure that takes into account negative and positive momentum to yield possibly better forecasts as it tracks both losses and gains in the model. In this case, one would add the third component by introducing a new equation for a realized semivariance to parametrically model statistical leverage eﬀects, where falls in asset prices are associated with increases in future volatility.  With realized semivariance computed for the $T$ days as $RMS_1, \ldots RMS_T$, the third equation becomes

$E(RMS_t | \mathcal{F}_{t-1}^{HF}) = \phi_t = \omega_3 + \alpha_{RS} RMS_{t-1} + \beta_{RS} \phi_{t-1}$

where $\alpha_{RS} + \beta_{RS} < 1$ and both positive.

#### HEAVY modeling in C and Java

To incorporate these HEAVY models into iMetrica, I began by writing a gnu-c library for providing a fast and efficient framework for both quasi-likelihood evaluation and a posteriori analysis of the models. The structure of estimating the models follows very closely to the original MATLAB code provided by Sheppard. However, in the c library I’ve added a few more useful tools for forecasting and distribution analysis. The Java code is essentially a wrapper for the c heavy_model library to provide a much cleaner approach to modeling and analyzing the HEAVY data such as the parameters and forecasts.  While there are many ways to declare, implement, and analyze HEAVY models using the c/java toolkit I provide, the most basic steps involved are as follows.

 heavyModel heavy = new heavyModel(); heavy.setForecastDimensions(n_forecasts, n_steps); heavy.setParameterValues(w1, w2, alpha, alpha_R, lambda, beta, beta_R); heavy.setTrackReparameter(0); heavy.setData(n_obs, n_series, series); heavy.estimateHeavyModel(); 

The first line declares a HEAVY model in java, while the second line sets the number of forecasts samples to compute and how many forecast steps to take. Forecasted values are provided for both the return variable $r_t$ (using a bootstrapping methodology) and the $h_t$, $\mu_t$ variables. In the next line, the parameter values for the HEAVY model are initialized. These are the initial points that are utilized in the quasi-maximum likelihood optimization routine and can be set to any values that satisfy the model constraints.   Here, $w1 = \omega_1, w2 = \omega_2$.

The fourth line is completely optional and is used for toggling (0=off, 1=on) a reparameterization of the HEAVY model so the intercepts of both equations in the HEAVY model are explicitly related to the unconditional mean of squared returns $r^2$ and realized measures $RM_t$. The reparameterization of the model has the advantage that it eliminates the estimation of $\omega_1, \omega_2$ and instead uses the unconditional means, leaving two less degrees of freedom in the optimization. See page 12 of the Shephard and Sheppard 2009 paper for a detailed explanation of the reparameterization. After setting the initial values, the data is set for the model by inputting the total number of observation $T$, the number of series (normally set to 2 and the data in column-wise format (namely a double array of length n_obs x n_series, where the first column is the return data $r_t$ and the second column is the daily realized measure data.  Finally, with the data set and the parameters initialized  we estimate the model in the 6th line. Once the model is finished estimating (should take a few seconds, depending on the number of observations), the heavyModel java object stores the parameter values, forecasts, model residuals, likelihood values, and more. For example, one can print out the estimated model parameters and plot the forecasts of $h_t$ using the following:

 heavy.printModelParameters(); heavy.plotForecasts(); Output: w_1 = 0.063 w_2 = 0.053 beta = 0.855 beta_R = 0.566 alpha = 0.024 alpha_R = 0.375 lambda = 0.087 

Figure 1 shows the plot of the filtered $h_t, \mu_t$ values for 300 trading days from June 2011 to June 2012 of AAPL with the final 20 points being the forecasted values. Notice that the multistep ahead forecast shows momentum which is one of the attractive characteristics of the HEAVY models as mentioned in the original paper by Shephard and Sheppard.

Figure 1: Plots of the filtered returns and realized measures with 20 step forecasts for Verizon for 300 trading days.

We can also easily plot the estimated joint distribution function $F_{\zeta, \eta}$ by simply using the ﬁltered $h_t, \mu_t$ and computing the devolatilized values $\zeta_t = r_t/ \sqrt{h_t}$, $\eta_t = (RM_t/\mu_t)^{1/2}$, leading to the innovations for the model for $t = 2,\ldots,T$.

Figure 2 below shows the empirical distribution of $F_{\zeta, \eta}$ for 600 days (nearly two years of daily observations from AAPL).  The $\zeta_t$ sequence should be roughly a martingale diﬀerence sequences with unit variance and the $\eta_t$ sequence  should have unit conditional means and of course be uncorrelated.  The empirical results validate the theoretical values.

Figure 2: Scatter plot of the empirical distribution of devolatilized values for h and mu.

In order to compile and run the heavy_model library and the accompanying java wrapper, one must first be sure to meet the requirements for installation. The programs were extensively tested on a 64bit Linux machine running Ubuntu 12.04. The heavy_model library written in c uses the GNU Scientific Library (GSL) for the matrix-vector routines along with a statistical package in gnu-c called apophenia (Klemens, 2012) for the optimization routine. I’ve also included a wrapper for the GSL library called multimin.c which enables using the optimization routines from the GSL library, but were not heavily tested.  The first version (version 00) of the heavy_model library and java wrapper can be downloaded at sourceforge.net/projects/highfrequency.  As a precautionary warning, I must confess that none of the files are heavily commented in any way as this is still a project in progress. Improvements in code, efficiency, and documentation will be continuously coming.

After downloading the .tar.gz package, first ensure that GSL and Apophenia are properly installed and the libraries are correctly installed to the appropriate path for your gnu c compiler. Second, to compile the .c code, copy the makefile.test file to Makefile and then type make. To compile the heavyModel library and utilize the java heavyModel wrapper (recommended), copy makefile.lib to Makefile, then type make. After it constructs the libheavy.so, compile the heavyModel.java file by typing javac heavyModel.java. Note that the java files were complied successfully using the Oracle Java 7 SDK.  If you have any questions about this or any of the c or java files, feel free to contact me. All the files were written by me (except for the optional multimin.c/h files for the optimization) and some of the subroutines (such as the HEAVY model simulation) are based on the MATLAB code by Sheppard. Even though I fully tested and reproduced the results found in other experiments exploring HEAVY models, there still could be bugs in the code. I have not fully tested every aspect (especially the Bayesian estimation components, an ongoing effort) and if anyone would like to add, edit, test, or comment on any of the routines involved in either the c or java code, I’d be more than happy to welcome it.

#### HEAVY Modeling in iMetrica

The Java wrapper to the gnu-c heavy_model library was installed in the iMetrica software package and can be used for GUI style modeling of high-frequency volatility. The HEAVY modeling environment is a feature of the BayesCronos module in iMetrica that also features other stochastic models for capturing and forecasting volatility such as (E)GARCH, stochastic volatility, mutlivariate stohastic factor modeling, and ARIMA modeling, all using either standard (Q)MLE model estimation or a Bayesian estimation interface (with histograms showing the MCMC results of the parameter chains).

Modeling volatility with HEAVY models is done by first uploading the data into the BayesCronos module (shown in Figure 3) through the use of either the BayesCronos Menu (featured on the top panel) or by using the Data Control Panel (see my previous article on Data Control).

Figure 3: BayesCronos interface in iMetrica for HEAVY modeling.

In the BayesCronos control panel shown above, we estimate a HEAVY model for the uploaded data (600 observations of $r_t, RM_t$) that were simulated from a model with omega_1 = 0.05, omega_2 = 0.10, beta = 0.8 beta_R = 0.3, alpha = 0.02, alpha_R = 0.3 (the simulation was done in the Data Control Module).

The model type is selected in the panel under the Model combobox. The number of forecasting steps and forecasting samples (for the $r_t$ variable) are selected in the Forecasting panel. Once those values are set, the model estimates are computed by pressing the “MLE” button in the bottom lower left corner. After the computing is done, all the available plots to analyze the HEAVY model are available by simply clicking the appropriate plotting checkboxes directly below the plotting canvas.   This includes up to 5 forecasts, the original data, the filtered $h_t, \mu_t$ values,  the residuals/empirical distributions of the returns and realized measures, and the pointwise likelihood evaluations for each observation. To see the estimated parameter values, simply click the “Parameter Values” button in the “Model and Parameters” panel and pop-up control panel will appear showing the estimated values for all the parameters.

#### Realized Measures in iMetrica

Figure 4: Computing Realized measures in iMetrica using a convenient realized measure control panel.

Importing and computing realized volatility measures in iMetrica is accomplished by using the control panel shown in Figure 4. With access to high frequency data, one simply types in the ticker symbol in the “Choose Instrument” box, sets the starting and ending date in the standard CCYY-MM-DD format, and then selects the kernel used for assembling the intraday measurements. The Time Scale sets the frequency of the data (seconds, minutes  hours) and the period scrollbar sets the alignment of the data. The Lags combo box determines the bandwidth of the kernel measuring the volatility. Once all the options have been set, clicking on the “Compute Realized Volatility” button will then produce three data sets for the time period given between start date and end data: 1) The daily log-returns of the asset $r_1, \ldots, r_T$ 2) The log-price of the asset, and 3) The realized volatility measure $RM_1, \ldots, RM_T$. Once the Java-R highfrequency routine has finished computing the realized measures, the data sets are automatically available in the Data Control Module of iMetrica. From here, one can annualize the realized measures using the weight adjustments in the Data Control Module (see Figure 5). Once content with the weighting, the data can then be exported to the MDFA module or the BayesCronos module for estimating and forecasting the volatility of GOOG using HEAVY models.

Figure 5: The log-return data (blue) and the (annualized) realized measure data using 5 minute returns (pink) for Google from 1-1-2011 to 6-19-2012.

The Realized Measure uploading in iMetrica utilizes a fantastic R package for studying and working with high frequency financial data called highfrequency (Boudt, Cornelissen, and Payseur 2012). To handle the analysis of high frequency financial data in java, I began by writing a Java wrapper to the R functions of the highfrequency R package to enable GUI interaction shown above in order to download the data into java and then iMetrica. The java environment uses library called RCaller that opens a live R kernel in the Java runtime environment from which I can call and R routines and directly load the data into Java. The initializing sequence looks like this.

 caller.getRCode().addRCode("require (Runiversal)"); caller.getRCode().addRCode("require (FinancialInstrument)"); caller.getRCode().addRCode("require (highfrequency)"); caller.getRCode().addRCode("loadInstruments('/HighFreqDataDirectoryHere/Market/instruments.rda')"); caller.getRCode().addRCode("setSymbolLookup.FI('/HighFreqDataDirectoryHere/Market/sec',use_identifier='X.RIC',extension='RData')"); 
Here, I’m declaring the R packages that I will be using (first three lines) and then I declare where my HighFrequency financial data symbol lookup directory is on my computer (next two lines). This
then enables me to extract high frequency tick data directly into Java. After loading in the desired intrument ticker symbol names, I then proceed to extract the daily log-returns for the given time frame, and then compute the realized measures of each asset using the rKernelCov function in highfrequency R package. This looks something like
 for(i=0;i<n_assets;i++) { String mark = instrum[i] + "<-" + instrum[i] + "['T09:30/T16:00',]"; 

caller.getRCode().addRCode(mark);

String rv = "rv"+i+"<-rKernelCov("+instrum[i]+"\$Trade.Price,kernel.type ="+kernels[kern]+", kernel.param="+lags+",kernel.dofadj = FALSE, align.by ="+frequency[freq]+", align.period="+period+", cts=TRUE, makeReturns=TRUE)"

caller.getRCode().addRCode(rv);

caller.getRCode().addRCode("names(rv"+i+")<-'rv"+i+"'"); rvs[i] = "rv_list"+i;

caller.getRCode().addRCode("rv_list"+i+"<-lapply(as.list(rv"+i+"), coredata)"); }

In the first line, I’m looping through all the asset symbols (I create Java strings to load into the RCaller as commands). The second line effectively retrieves the data during market hours only (America/New_York time), then creates a string to call the rKernelCov function in R. I give it all the user defined parameters defined by strings as well. Finally, in the last two lines, I extract the results and put them into an R list from which the java runtime environment will read.

#### Conclusion

In this article I discussed a recently introduced high frequency based volatility model by Shephard and Sheppard and gave an introduction to three different high-performance tools beyond MATLAB and R that I’ve developed for analyzing these new stochastic models. The heavyModel c/java package that I made available for download gives a workable start for experimenting in a fast and efficient framework the benefit of using high frequency financial data and most notably realized measures of volatility to produce better forecasts. The package will continuously be updated for improvements in both documentation, bug fixes, and overall presentation. Finally, the use of the R package highfrequency embedded in java and then utilized in iMetrica gives a fully GUI experience to stochastic modeling of high frequency financial data that is both conveniently easy to use and fast.

Happy Extracting and Volatilitizing!