iMetricaFX: An interactive JavaFX app for the MDFA-Toolkit


Figure 1: The main interactive iMetricaFX user interface.  

Introduction

We introduce an interactive application, written entirely in Java/JavaFX, for real-time signal extraction in multivariate time series. iMetricaFX is a complete redesign of the previous iMetrica application, focusing on the multivariate direct filter approach (MDFA) and its derivatives for generating real-time signals and analyzing multivariate time series. It retains all the features of the MDFA module in iMetrica, but now with more responsive 2D graphics written in JavaFX and a focus on real-time data analysis applications.

One might want to use iMetricaFX for any of the following reasons:

  • To learn visually how the MDFA hyperparameters interact with each other, and how changing them affects the filter coefficients, the frequency domain, and the signal extraction results.
  • To understand how MDFA signal extraction parameters, transformations on the time series, additional explanatory variables, or other features affect out-of-sample signal quality on future data or other performance metrics (MSE, Sharpe ratios, seasonal/cycle adjustments, etc.).
  • To experiment with MDFA parameter definitions for use in the MDFA-DeepLearning package.
  • To engineer financial trading signals and track their out-of-sample performance given specific trading requirements.
  • To analyze correlations between a collection of (non)stationary time series and how they affect signal extraction.

Overview

We now give an overview of the different components and features of the iMetricaFX system. The first and most obvious step in getting iMetricaFX rolling is to define the data source. The easiest data source is a collection of .csv files that have DateTime stamps as the index, with the data values in the following columns. Each column should have a header describing it (e.g. “DateTime” or “Timestamp” for the index, and “Bid” or “Open” for the values).

In the resources folder there is a collection of .csv files containing daily “Close” values of a few dozen NASDAQ stocks and ETFs, with historical data spanning the past six years. All files cover the same date range.
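For example, a minimal single-series file (the values here are made up for illustration) might look like:

```
DateTime,Close
2012-06-01,22.53
2012-06-04,22.84
2012-06-05,22.61
```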

Files are added via Add files in the top menu bar; selecting multiple files while holding the ‘ctrl’ key will load all of them, so that data can be streamed from each simultaneously for multivariate signal extraction. The DateTime format of the files can also be selected in the top menu bar, in the TimeFormat menu.

Once the data files have been selected and loaded, use the “Compute Filter” button at the very bottom left to compute the initial signal with the default MDFA hyperparameter settings. Once the data has been loaded and the initial filter coefficients have been computed for the initial time series observations, one can construct several types of signals, apply out-of-sample data, adjust time series transformations, change filter parameters, add additional explanatory series, and more. Here is a list of the current interface controls.

  • Menu:File Open .csv data files for time series observations, save filter parameters, load filter parameters.
  • Menu:Signals Create/select new signals. When a new signal is added, the filter hyperparameters will be applied to the currently selected signal. All other signal parameters will remain fixed.
  • Menu:Target Series In the multivariate case, select the target series among all the series loaded. The target series represents the series from which the signal will be built.
  • Menu:TimeFormat The DateTime stamp format of the time series. Usually for daily data this looks like “yyyy-MM-dd” or for minute data “yyyy-MM-dd HH:mm:ss”
  • Menu:Options A variety of options can be chosen here. For now, only Prefiltering on/off is available.
  • Menu:Windows Select the windows that plot the various signal extraction properties. Options are MDFA Coefficients, Frequency Response Functions, and Time/Phase delay. More window types will be added in the future.
  • Compute Filter Compute the filter given the latest Series size observations and current filter hyperparameter settings
  • New Observation This button adds a new (multivariate) time series observation from the referenced .csv files. If there are no more values left in the .csv file, then no new values will be given.
  • Filter length Change the length of the filter from 4 to 100
  • Series size Change the number of in-sample time series observations for computing MDFA coefficients
  • FractionalD The fractional differencing exponent d, between 0 and 1 (d = 1 gives standard first-order differencing)
  • Filter Customization Adjust the smoothness and timeliness parameters
  • Forecasting/Smoothing Adjust the forecasting (negative value) or smoothing lag (positive value)
  • Target Filter Adjust the frequency range for the signal using two frequency cutoffs
  • Filter Constraints Toggle the i1 and/or i2 constraint and set the Phase Shift for the i2 constraint
  • Filter Regularization Adjust the smoothness, decay, decay strength, and cross regularization for the filter coefficients
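The TimeFormat patterns above follow standard Java date-time conventions; as a quick sketch (independent of iMetricaFX’s own parsing code), the two patterns mentioned parse as follows with java.time:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimeFormatSketch {
    public static void main(String[] args) {
        // Daily data, e.g. "2018-06-15"
        DateTimeFormatter daily = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        LocalDate day = LocalDate.parse("2018-06-15", daily);

        // Minute (or second) data, e.g. "2018-06-15 09:30:00"
        DateTimeFormatter minute = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime stamp = LocalDateTime.parse("2018-06-15 09:30:00", minute);

        System.out.println(day + " / " + stamp);
    }
}
```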

MDFA-DeepLearning Package: Hybrid MDFA-RNN networks for machine learning in multivariate time series

Overview

MDFA-DeepLearning is a library for building machine learning applications on large numbers of multivariate time series data, with a heavy emphasis on noisy (non)stationary data.  The goal of MDFA-DeepLearning is to learn underlying patterns, signals, and regimes in multivariate time series and to detect, predict, or forecast them in real-time with the aid of both a real-time feature extraction system based on the multivariate direct filter approach (MDFA) and deep recurrent neural networks (RNN). The feature extraction system utilizes the MDFA-Toolkit to construct K multivariate signals in real-time (the features) where each of the K features targets a certain frequency range in the underlying time series.  Furthermore, each (or some) of these features can also be forecasted multiple steps ahead, or smoothed, creating many possibilities of signal or regime learning in time series.

For the deep learning components, we focus in this package on two network structures, namely the recurrent weighted average network (RWA; Ostmeyer and Cowell) and the standard long short-term memory (LSTM) network. The RWA cell is a type of RNN cell that computes a recurrent weighted average over every past processing timestep, unlike standard RNN cells which only propagate information from the previous timestep.
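To give a feel for the mechanism, here is a toy scalar sketch of the running weighted average (the actual RWA cell of Ostmeyer and Cowell uses learned weight matrices and a more elaborate bounded attention term; the fixed scalar weights below are purely illustrative):

```java
public class RwaToy {
    public static void main(String[] args) {
        double[] x = {0.5, -0.2, 0.8, 0.1};
        // Fixed toy "weights"; a real RWA learns these.
        double wU = 1.0, wA = 0.7;
        double n = 0.0, d = 0.0, h = 0.0;
        for (double xt : x) {
            double u = wU * xt;            // candidate value for this step
            double a = wA * (xt + h);      // toy attention score
            double w = Math.exp(a);        // unnormalized weight
            n += u * w;                    // running weighted numerator
            d += w;                        // running total of the weights
            h = Math.tanh(n / d);          // hidden state: a weighted average over ALL past steps
            System.out.printf("h_t = %.4f%n", h);
        }
    }
}
```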

The overall architecture of the proposed network is given in Figure 1 for the case of an RWA network, which we discuss in more detail below. For a given sequence of N multivariate time series values that have been transformed appropriately into a stationary sequence, which we denote Y_1, Y_2, …, Y_N, a real-time feature extraction process is applied at each observation; the features are then used as input to an RWA (or LSTM) network, whose univariate output is a targeted signal value (regression) or a regime value (classification).


Figure 1: Proposed network design using RWA cells to learn from the real-time feature extractor. The Y_t values are the (multivariate) transformed time series values, and the S_t are univariate outputs describing a target signal or regime.

Why Use MDFA-DeepLearning

One might want to develop predictive models in multivariate time series data using MDFA-DeepLearning if the time series exhibit any of the following properties:

  • High-Dimensionality (many (un)correlated nonstationary time series)
  • Difficult to forecast using traditional model-based methods (VARIMA/GARCH) or traditional deep learning methods (RNN/LSTM, component decomposition, etc)
  • Emphasis needed on out-of-sample real-time signal extraction and forecasting
  • Regime changing in the underlying dynamics (or data generating process) of the time series is a common occurrence

The MDFA-DeepLearning approach differs from most machine learning methods in time series analysis in its emphasis on real-time feature extraction, where the feature extractors are built using the multivariate direct filter approach. The motivation behind coupling MDFA with machine learning is that, while many time series decomposition methodologies exist (from empirical mode decomposition to stochastic component analysis methods), all of them rely on either in-sample decompositions of historical data (useless for future data) and/or assumptions about the boundary values, neither of which is attractive when fast, real-time, out-of-sample prediction is the goal. Furthermore, simply applying standard recurrent neural networks for step-ahead forecasting or signal extraction directly to the original noisy data is a mostly fruitless exercise: the recurrent networks typically learn only noise, producing signals and forecasts of little to no value.

As mentioned, the back-end used for the novel feature extraction is the multivariate direct filter approach (MDFA), which extracts both local (higher-frequency) and global (low-frequency) features in real-time, out-of-sample, and outputs these features as a multivariate time series that serves as input to an RWA or LSTM recurrent neural network. The package is thus divided into essentially four components, all of which need to be defined properly in order to produce predictive models for time series data:

  • Labeling interface
  • Feature extractors
  • DataSetIterator interface
  • Learning interface

Labeling interface

The package includes an interface for labeling time series. The labeling process takes segments of historical data, and labels each time series observation in some manner. There are three types of labels that can be used:

  • Observational labeling: every time series observation is labeled by a signal value (for example a target value computed by a symmetric target filter). This is sequence-to-sequence labeling for time series regression.
  • Fixed Period labeling: every period (day, week, etc) is labeled, typically by a one-hot vector. This is sequence-to-value labeling. The end of the period is labeled and the rest of the values are not (masked by nonvalues in the code).
  • Regime labeling: every value in a specific regime is labeled by a one-hot vector (for example, long (1,0), short (0,1), neutral (0,0); or trend (1,0) and mean-reverting (0,1)). This is another example of sequence-to-sequence labeling, but using one-hot vectors, now in the form of sequence classification.
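As a toy illustration of regime labeling (the rule and the values are made up; the package’s labeling interface is more general), one could label each observation long (1,0) or short (0,1) according to the slope of a short moving average:

```java
public class RegimeLabelToy {
    public static void main(String[] args) {
        double[] series = {10.0, 10.2, 10.5, 10.4, 10.1, 9.8, 9.9, 10.3};
        int window = 3;
        for (int t = window; t < series.length; t++) {
            // Toy regime rule: slope of a 3-point moving average
            double maNow  = (series[t] + series[t - 1] + series[t - 2]) / 3.0;
            double maPrev = (series[t - 1] + series[t - 2] + series[t - 3]) / 3.0;
            int[] oneHot = (maNow >= maPrev) ? new int[]{1, 0}   // long regime
                                             : new int[]{0, 1};  // short regime
            System.out.println("t=" + t + " -> (" + oneHot[0] + "," + oneHot[1] + ")");
        }
    }
}
```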

Other labeling strategies can certainly be used, but these are the three most common. We will give an outline on how to create a custom labeling strategy in a future article.

Feature Extractors

The package contains a feature extraction class called MDFAFeatureExtraction which, when instantiated, is used as the input to a DataSetIterator. The MDFAFeatureExtraction contains a default automated feature extraction builder, where a value K is given as the number of features along with a lag to indicate smoothing or forecasting steps-ahead.

One application of the MDFA feature extraction tool is to decompose a multivariate time series in real-time into K components which are close to being “orthogonal”, meaning in this sense that the frequency information of the components is relatively disjoint. A precise mathematical formulation of this property and examples of the MDFAFeatureExtraction will follow in a later article. Another example, used for turning-point detection in trends, is to decompose the multivariate series into K low-frequency components with different speeds and forecast/smoothing characteristics.

DataSetIterator

The DataSetIterator is an interface from ND4J, the library that handles fast n-dimensional array manipulation akin to numpy in Python. More specifically, the DataSetIterator handles traversing through a dataset and preparing the data for a recurrent neural network. In this package, our datasets are the outputs of the TimeSeries passed through the MDFAFeatureExtraction objects, which then become the input to the RNN. The DataSetIterator also performs the labeling and determines how the output will be arranged. It is thus essentially the bridge from the underlying time series, through the extraction process, to the input and output of the RNN. In the package we have designed two example DataSetIterators, one for regression and one for classification, which will be described in more detail in a later article.
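To make the iterator’s role concrete, here is a plain-Java sketch (independent of the package’s actual iterators) of the windowing such an iterator performs: sliding windows over a feature matrix are arranged into the [miniBatch, nFeatures, timeSteps] shape that dl4j recurrent layers expect, with a label aligned to the end of each window:

```java
public class WindowingToy {
    public static void main(String[] args) {
        // Toy feature matrix: nFeatures x nObservations (think: K MDFA signals)
        double[][] features = new double[2][20];
        double[] labels = new double[20];
        for (int t = 0; t < 20; t++) {
            features[0][t] = Math.sin(t / 3.0);
            features[1][t] = Math.cos(t / 5.0);
            labels[t] = features[0][t] + features[1][t]; // toy regression target
        }
        int timeSteps = 5, nFeatures = 2;
        int nWindows = 20 - timeSteps + 1;
        // dl4j recurrent layers take input of shape [miniBatch, nFeatures, timeSteps]
        double[][][] batch = new double[nWindows][nFeatures][timeSteps];
        double[] batchLabels = new double[nWindows];
        for (int w = 0; w < nWindows; w++) {
            for (int f = 0; f < nFeatures; f++) {
                System.arraycopy(features[f], w, batch[w][f], 0, timeSteps);
            }
            batchLabels[w] = labels[w + timeSteps - 1]; // label at the window's end
        }
        System.out.println(nWindows + " windows of shape [" + nFeatures + "," + timeSteps + "]");
    }
}
```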

Learning interface

Finally, the learning interface is where the final network is defined, all the parameters of the network, the activation and loss functions, and number/type of layers (LSTM, FeedForward, etc). The underlying computational framework for this component uses DeepLearning4J.

Requirements and Example

MDFA-DeepLearning requires both the MDFA-Toolkit package for constructing the time series feature extractors and the Eclipse Deeplearning4j (dl4j) library for the deep recurrent neural network constructors. The dl4j library is freely available at github.com/deeplearning4j, but is included in the build of this package using Gradle as the dependency management tool.

The back-end for the dl4j package will depend on your computational hardware: it can run natively on CPUs, or take advantage of GPUs using the CUDA libraries (CUDA 8.0 was used to test the current version of MDFA-DeepLearning). In this package I have included a reference to both versions (assuming a standard linux64 architecture).

The back-end used for the novel feature extraction technique, as mentioned, is the MDFA-Toolkit (available here), which will run on the ND4J package. The feature extraction begins by defining K MDFA objects, called MDFABase from the MDFA-Toolkit, with the fixed parameters set for each MDFA object. For example, here we define K=4 MDFABase objects, that will be used to extract different types of trends at different speeds in a fractionally differenced time series. Please refer to the MDFA-Toolkit documentation for more information on the definition of each MDFA parameter.

MDFABase[] anyMDFAs = new MDFABase[4];

anyMDFAs[0] = (new MDFABase()).setLowpassCutoff(Math.PI/8.0)
                              .setI1(1)
                              .setSmooth(.2)
                              .setLag(-3)
                              .setLambda(2.0)
                              .setAlpha(2.0)
                              .setFilterLength(5);

anyMDFAs[1] = (new MDFABase()).setLowpassCutoff(Math.PI/10.0)
                              .setLag(-2)
                              .setAlpha(2.0)
                              .setFilterLength(5);

anyMDFAs[2] = (new MDFABase()).setLowpassCutoff(Math.PI/4.0)
                              .setDecayStart(.1)
                              .setDecayStrength(.2)
                              .setLag(-1)
                              .setFilterLength(5);

anyMDFAs[3] = (new MDFABase()).setLowpassCutoff(Math.PI/14.0)
                              .setSmooth(.2)
                              .setDecayStart(.1)
                              .setDecayStrength(.1)
                              .setFilterLength(5);

More concrete, in-depth, step-by-step examples and tutorials will be given in the source code on github and in this blog, but here we give a brief overview of an example main program using these features.


/* Define the .csv data file from where we built train/test dataIterators */
String[] dataFiles = new String[]{"AAPL.daily.csv"};

/* Information about the .csv timeseries file */
TimeSeriesFile fileInfo = new TimeSeriesFile("yyyy-MM-dd", "Index", "Open");

/* Define network parameters */
int miniBatchSize = 100;
int totalTrainExamples = 1500;
int totalTestExamples = 300;
int timeStepLength = 60;
int nHiddenLayers = 2;
int nHidden = 216;
int nEpochs = 400;
int seed = 123;
int iterations = 40;
double learningRate = .001;
double gradientNormThreshold = 10.0;

IUpdater updater = new Nesterovs(learningRate, .4);

/* Instantiate Feature Extractors as an array of MDFABase objects */
MDFAFeatureExtraction features = new MDFAFeatureExtraction(anyMDFAs); 

/* Instantiate a new RecurrentMdfaRegression network using the features defined above */
RecurrentMdfaRegression myNet = new RecurrentMdfaRegression(features);

/* Set the data and the DataIterator parameters */
myNet.setTrainingTestData(dataFiles, fileInfo, miniBatchSize, totalTrainExamples, totalTestExamples, timeStepLength);

/* Usually a good idea to normalize the data */
myNet.normalizeData();

/* Build the LSTM (default network) layers */
myNet.buildNetworkLayers(nHiddenLayers, nHidden,
			RecurrentMdfaRegression.setNeuralNetConfiguration(seed, iterations, learningRate, gradientNormThreshold, 0, updater));

/* An optional dl4j control panel in the browser */
myNet.setupUserInterface();

/* Train on the number of Epochs */
myNet.train(nEpochs);

/* Print/plot results and stats */
myNet.printPredicitions();
myNet.plotBatches(10);

The main points here are that essentially three components need to be defined:

  1. The .csv time series data file from which the DataSetIterator will extract the time series data for both labeling and learning. Two data sets will be created from this, a training set and a test set. Referencing multiple files from which to extract training and test sets is also possible. In dl4j, training and test data is built in the form of a DataSetIterator interface (org.nd4j.linalg.dataset.api.iterator). In the package, we have defined an MDFADataSetIterator and an MDFARegressionDataSetIterator. More DataSetIterators for various applications will be added on an ongoing basis.
  2. The network RecurrentMdfaRegression is instantiated and needs to contain the feature signal extractors. Any set of feature extractors can be added; here we used the ones defined above as an example.
  3. Finally, the LSTM (or recurrent weighted average) network parameters need to be defined; these are then used to construct the layers of the recurrent network.

With these three steps defined, the network should be ready to train and test. The challenge is, of course, defining the feature extraction parameters. In later articles, we will give tips and tricks on what works best for which type of learning application in large time series.

 

MDFA-Toolkit: A JAVA package for real-time signal extraction in large multivariate time series

The multivariate direct filter approach (MDFA) is a generic real-time signal extraction and forecasting framework endowed with a richly parameterized interface, allowing for adaptive and fully-regularized data analysis in large multivariate time series. The methodology is based primarily in the frequency domain, where all the optimization criteria are defined, from regularization, to forecasting, to filter constraints. For an in-depth tutorial on the mathematical formulation, the reader is invited to check out any of the many publications or tutorials on the subject from blog.zhaw.ch.

This MDFA-Toolkit (clone here) provides a fast, modularized, and adaptive framework in JAVA for doing such real-time signal extraction for a variety of applications. Furthermore, we have developed several components to the package featuring streaming time series data analysis tools not known to be available anywhere else. Such new features include:

  • A fractional differencing optimization tool for transforming nonstationary time-series into stationary time series while preserving memory (inspired by Marcos Lopez de Prado’s recent book on Advances in Financial Machine Learning, Wiley 2018).
  • Easy-to-use interface to four different signal generation outputs:
    Univariate series -> univariate signal
    Univariate series -> multivariate signal
    Multivariate series -> univariate signal
    Multivariate series -> multivariate signal
  • Generalization of optimization criterion for the signal extraction. One can use a periodogram, or a model-based spectral density of the data, or anything in between.
  • Real-time adaptive parameterization control – make slight adjustments to the filter process parameterization effortlessly
  • Build a filtering process from simpler user-defined filters, applying customization and reducing degrees of freedom.

This package also provides an API to three other real-time data analysis frameworks that are either available now or soon will be:

  • iMetricaFX – An app written entirely in JavaFX for doing real-time time series data analysis with MDFA
  • MDFA-DeepLearning – A new recurrent neural network methodology for learning in large noisy time series
  • MDFA-Tradengineer – An automated algorithmic trading platform combining MDFA-Toolkit, MDFA-DeepLearning, and Esper – a library for complex event processing (CEP) and streaming analytics

To start the most basic signal extraction process using MDFA-Toolkit, three things need to be defined.

  1. The data streaming process which determines from where and what kind of data will be streamed
  2. A transformation of the data, which includes any logarithmic transform, normalization, and/or (fractional) differencing
  3. A signal extraction definition which is defined by the MDFA parameterization

Data streaming

In the current version, time series data is provided by a streaming CSVReader, where the time series index is given by a String DateTime stamp in the first column and the value(s) are given in the following columns. For multivariate data, two options are available for streaming: 1) a single multiple-column .csv file, with each component of the time series in a separate column, or 2) multiple referenced single-column time-stamped .csv files. In the latter case, the DateTime stamps of the series will be checked for agreement; if they do not agree, an exception is thrown. More sophisticated multivariate time series data streamers which account for missing values will soon be available.
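A minimal sketch of the DateTime-agreement check described above (the stamps and the exception type are illustrative, not the toolkit’s actual code):

```java
import java.util.Arrays;
import java.util.List;

public class StampAgreementToy {
    public static void main(String[] args) {
        // Stamps as read from two hypothetical single-column .csv files
        List<String> stampsA = Arrays.asList("2018-06-14", "2018-06-15", "2018-06-18");
        List<String> stampsB = Arrays.asList("2018-06-14", "2018-06-15", "2018-06-18");
        // The two series must share an identical DateTime index
        if (!stampsA.equals(stampsB)) {
            throw new IllegalStateException("DateTime stamps of the series do not agree");
        }
        System.out.println("Stamps agree: " + stampsA.size() + " observations");
    }
}
```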

Transforming the data

Depending on the type of time series data and the objectives of the real-time signal extraction process, transforming the data in real-time might be an attractive feature. The transformation of the data can include (but is not limited to) several different things:

  • A Box-Cox transform, one of the more common transformations in financial and other non-stationary time series.
  • (fractional)-differencing, defined by a value d in [0,1]. When d=1, standard first-order differencing is applied.
  • For stationary series, standard mean-variance normalization or a more exotic GARCH normalization which attempts to model the underlying volatility is also available.
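For the fractional differencing item, the weights come from the binomial expansion of (1-B)^d (see de Prado, 2018); a minimal sketch, independent of the toolkit’s own implementation (note that d = 1 recovers the first-difference weights 1, -1):

```java
public class FracDiffToy {
    public static void main(String[] args) {
        double d = 0.4;      // fractional differencing exponent in [0,1]
        int nWeights = 5;    // truncation length of the weight sequence
        double[] w = new double[nWeights];
        w[0] = 1.0;
        // w_k = -w_{k-1} * (d - k + 1) / k, the binomial expansion of (1 - B)^d
        for (int k = 1; k < nWeights; k++) {
            w[k] = -w[k - 1] * (d - k + 1) / k;
        }
        // Apply to a toy series: x_t^(d) = sum_k w_k * x_{t-k}
        double[] x = {1.0, 1.1, 1.3, 1.2, 1.5, 1.4};
        for (int t = nWeights - 1; t < x.length; t++) {
            double value = 0.0;
            for (int k = 0; k < nWeights; k++) value += w[k] * x[t - k];
            System.out.printf("t=%d  %.4f%n", t, value);
        }
    }
}
```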

Signal extraction definition

Once the data streaming and transformation procedures have been defined, the signal extraction parameters can be set, in either a univariate or multivariate setting. (Multiple signals can be constructed as well, so that the output is a multivariate signal.) A signal extraction process is defined by an MDFABase object (or an array of MDFABase objects in the multivariate signal case). The parameters to define are as follows:

  • Filter length: the length L in number of lags of the resulting filter
  • Low-pass/band-pass frequency cutoffs: which frequency range is to be filtered from the time-series data
  • In-sample data length: how much historical data is needed to construct the MDFA filter
  • Customization: α (smoothness) emphasizes the smoothness of the filter by mollifying high-frequency noise, while λ (timeliness) optimizes the timeliness of the filter by emphasizing the phase-delay error in the frequency domain
  • Regularization parameters: control the decay rate and strength, the smoothness of the (multivariate) filter coefficients, and cross-series similarity in the multivariate case
  • Lag: controls the forecasting (negative values) or smoothing (positive values)
  • Filter constraints i1 and i2: constrain the filter coefficients to sum to one (i1) and/or constrain their dot product with (0, 1, …, L-1) to equal the phase shift (i2), where L is the filter length.
  • Phase-shift: the derivative of the frequency response function at the zero frequency.
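Numerically, the two constraints say that coefficients b_0, …, b_{L-1} satisfy Σ b_l = 1 (i1) and Σ l·b_l equals the phase shift (i2). A quick illustrative check on a made-up coefficient set (not from a real MDFA fit):

```java
public class ConstraintCheckToy {
    public static void main(String[] args) {
        // Hypothetical filter coefficients b_0..b_4
        double[] b = {0.4, 0.3, 0.2, 0.1, 0.0};
        double sum = 0.0, dot = 0.0;
        for (int l = 0; l < b.length; l++) {
            sum += b[l];       // i1: coefficients sum to one
            dot += l * b[l];   // i2: dot product with (0,1,...,L-1) equals the phase shift
        }
        System.out.printf("i1 sum = %.2f, i2 dot = %.2f%n", sum, dot);
    }
}
```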

All these parameters are controlled in an MDFABase object, which holds all the information associated with the filtering process. It includes its own interface which ensures that the MDFA filter coefficients are updated automatically any time the user changes a parameter in real-time.


Figure 1: Overview of the main module components of MDFA-Toolkit and how they are connected

Figure 1 shows the main components that need to be defined in order to set up a signal extraction process in MDFA-Toolkit. The process begins with a handle on the data streaming process, which in this article we demonstrate using a simple CSV market file reader included in the package. The CSV file should contain the raw time series data and, ideally, a time (or date) stamp column. If there is no time stamp column, such a stamp will simply be fabricated for each value.

Once the data stream has been defined, it is passed into a time series transformation process, which automatically handles all the data transformations as new data is streamed. As we’ll see, the TargetSeries object defines such transformations, and all streaming data is added directly to the TargetSeries object. A MultivariateFXSeries is then initiated with references to each TargetSeries object. The MDFABase objects contain the MDFA parameters and are added to the MultivariateFXSeries to produce the final signal extraction output.

To demonstrate these components and how they come together, we illustrate the package with a simple example where we wish to extract three independent signals from AAPL daily open prices from the past 5 years. We also do this in a multivariate setting, to see how all the components interact, yielding a multivariate series -> multivariate signal.


//Define three data source files, the first one will be the target series
String[] dataFiles = new String[]{"AAPL.daily.csv", "QQQ.daily.csv", "GOOG.daily.csv"};

//Create a CSV market feed, where Index is the Date column and Open is the data
CsvFeed marketFeed = new CsvFeed(dataFiles, "Index", "Open");

/* Create three independent signal extraction definitions using MDFABase:
One lowpass filter with cutoff PI/20 and two bandpass filters
*/
MDFABase[] anyMDFAs = new MDFABase[3];
anyMDFAs[0] = (new MDFABase()).setLowpassCutoff(Math.PI/20.0)
                              .setI1(1)
                              .setHybridForecast(.01)
                              .setSmooth(.3)
                              .setDecayStart(.1)
                              .setDecayStrength(.2)
                              .setLag(-2.0)
                              .setLambda(2.0)
                              .setAlpha(2.0)
                              .setSeriesLength(400); 

anyMDFAs[1] = (new MDFABase()).setLowpassCutoff(Math.PI/10.0)
                              .setBandPassCutoff(Math.PI/15.0)
                              .setSmooth(.1)
                              .setSeriesLength(400); 

anyMDFAs[2] = (new MDFABase()).setLowpassCutoff(Math.PI/5.0)
                              .setBandPassCutoff(Math.PI/10.0)
                              .setSmooth(.1)
                              .setSeriesLength(400);

/*
Instantiate a multivariate series, with the MDFABase definitions,
and the Date format of the CSV market feed
*/
MultivariateFXSeries fxSeries = new MultivariateFXSeries(anyMDFAs, "yyyy-MM-dd");

/*
Now add the three series, each one a TargetSeries representing the series
we will receive from the csv market feed. The TargetSeries
defines the data transformation. Here we use differencing of order 1.0
with a log-transform applied
*/
fxSeries.addSeries(new TargetSeries(1.0, true, "AAPL"));
fxSeries.addSeries(new TargetSeries(1.0, true, "QQQ"));
fxSeries.addSeries(new TargetSeries(1.0, true, "GOOG"));
/*
Now start filling the fxSeries with data; we will start with
the first 600 observations from the market feed
*/
for(int i = 0; i < 600; i++) {
   TimeSeriesEntry observation = marketFeed.getNextMultivariateObservation();
   fxSeries.addValue(observation.getDateTime(), observation.getValue());
}

//Now compute the filter coefficients with the current data
fxSeries.computeAllFilterCoefficients();

//You can also chop off some of the data, here we chop off 70 observations
fxSeries.chopFirstObservations(70);

//Plot the data so far
fxSeries.plotSignals("Original");


Figure 2: Output of the three signals on the target series (red) AAPL

In the first line, we reference three data sources (AAPL, QQQ, and GOOG daily open prices); all signals are constructed on the target series, which is by default the first series referenced in the data market feed. The other two series act as explanatory series. The filter coefficients are computed using the latest 400 observations, since in this example 400 was used as the in-sample setSeriesLength value for all signals. As a side note, a different in-sample length can be used for each signal, which allows one to study the effects of in-sample data size on signal output quality. Figure 2 shows the resulting in-sample signals created from the latest 400 observations.

We now add 600 more observations out-of-sample, chop off the first 400, and then see how one can change a couple of parameters on the first signal (first MDFABase object).


for(int i = 0; i < 600; i++) {
   TimeSeriesEntry observation = marketFeed.getNextMultivariateObservation();
   fxSeries.addValue(observation.getDateTime(), observation.getValue());
}

fxSeries.chopFirstObservations(400);
fxSeries.plotSignals("New 400");

/* Now change the lowpass cutoff to PI/6
   and the lag to -3.0 in the first signal (index 0) */
fxSeries.getMDFAFactory(0).setLowpassCutoff(Math.PI/6.0);
fxSeries.getMDFAFactory(0).setLag(-3.0);

/* Recompute the filter coefficients with new parameters */
fxSeries.computeFilterCoefficients(0);
fxSeries.plotSignals("Changed first signal");


Figure 3: The signals after adding 600 new out-of-sample observations

After adding the 600 out-of-sample values and plotting, we then change the lowpass cutoff of the first signal to PI/6 and the lag to -3.0 (forecasting three steps ahead). This is done by accessing the MDFAFactory, getting a handle on the first signal (index 0), and setting the new parameters. The filter coefficients are then recomputed on the newest 400 values (though now all signal values are in-sample).

In the MDFA-Toolkit, plotting is done using JFreeChart. iMetricaFX, however, provides an app for building signal extraction pipelines with this toolkit as the backend, where all the automated plotting, analysis, and graphics are handled in JavaFX, creating a much more interactive signal extraction environment. Many more features are constantly being added to the MDFA-Toolkit, especially features boosting applications in machine learning, as we will see in the next article.

Big Data analytics in time series

We also implement in MDFA-Toolkit an interface to Spark-TS, which provides a Spark RDD for time series objects geared towards high-dimensional multivariate time series. Large-scale time series data shows up across a variety of domains. Distributed as the spark-ts package and developed by Cloudera’s Data Science team, the library enables analysis of data sets comprising millions of time series, each with millions of measurements, and runs atop Apache Spark. A tutorial on creating a Spark-TS connection with MDFA-Toolkit is currently being developed.