## Comparison of a very simple regression in tensorflow and keras

In this short post we perform a comparative analysis of a very simple regression problem in tensorflow and keras.

We start off with an eye-catching plot showing how an optimizer based on stochastic gradient descent moves over the loss surface. The plot is explained in more detail below.

A rotatable 3D version of the loss function of the regression problem, hosted via the free service provided by plotly. The black line is the path taken by the optimizer. W and b are the slope and offset parameters of the model. A full view can be found here: https://plot.ly/~hfwittmann/5/loss-function/#/
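The surface in the plot is the mean squared error evaluated over a grid of (W, b) pairs. Below is a minimal numpy sketch of how such a surface could be computed. It assumes synthetic data with a true slope of 3 and an offset of 2. These values, the grid ranges and the helper name `loss_surface` are illustrative assumptions and are not taken from the notebooks.

```python
import numpy as np

def loss_surface(x, y, w_grid, b_grid):
    """Mean squared error of y_hat = W * x + b for every (W, b) pair on a grid."""
    # After meshgrid, W and b both have shape (len(b_grid), len(w_grid))
    W, b = np.meshgrid(w_grid, b_grid)
    # Broadcasting: residuals has shape (len(b_grid), len(w_grid), n_samples)
    residuals = W[..., None] * x + b[..., None] - y
    return W, b, np.mean(residuals ** 2, axis=-1)

# Hypothetical data: true slope 3.0, true offset 2.0, unit Gaussian noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 2.0 + rng.normal(size=200)

w_grid = np.linspace(-5.0, 5.0, 41)
b_grid = np.linspace(-5.0, 5.0, 41)
W, b, loss = loss_surface(x, y, w_grid, b_grid)

# Grid point with the smallest loss, i.e. the bottom of the "bowl"
i, j = np.unravel_index(np.argmin(loss), loss.shape)
print(W[i, j], b[i, j])
```

The arrays `W`, `b` and `loss` could then be passed to a 3D surface plot, for instance plotly's `go.Surface`.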

The focus is on the first principles of gradient descent. We replicate the results of [1, 2]. The post uses a GradientTape, which in turn makes use of automatic differentiation [3, 4].
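A training loop of this kind can be sketched as follows. This is a minimal version that assumes TensorFlow 2 with eager execution and hypothetical data (true W = 3, true b = 2, unit Gaussian noise). The learning rate, the number of steps and the initial values are our own choices rather than those of [1, 2].

The sketch works in three steps:
- The tape records the forward pass.
- `tape.gradient` returns the derivatives via automatic differentiation.
- The update rule is written out by hand.

```python
import tensorflow as tf

# Hypothetical synthetic data: true slope 3.0, true offset 2.0
TRUE_W, TRUE_B = 3.0, 2.0
NUM_EXAMPLES = 1000
tf.random.set_seed(0)
x = tf.random.normal(shape=[NUM_EXAMPLES])
noise = tf.random.normal(shape=[NUM_EXAMPLES])
y = x * TRUE_W + TRUE_B + noise

# Model parameters, deliberately initialised away from the true values
W = tf.Variable(5.0)
b = tf.Variable(0.0)

def model(x):
    return W * x + b

def loss_fn(y_pred, y_true):
    # Mean squared error
    return tf.reduce_mean(tf.square(y_pred - y_true))

def train_step(x, y, learning_rate):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        current_loss = loss_fn(model(x), y)
    # Automatic differentiation: d(loss)/dW and d(loss)/db
    dW, db = tape.gradient(current_loss, [W, b])
    # Plain gradient-descent update, written out by hand
    W.assign_sub(learning_rate * dW)
    b.assign_sub(learning_rate * db)
    return current_loss

path = []  # (W, b) after each step: the optimizer's trajectory over the loss surface
for step in range(50):
    current_loss = train_step(x, y, learning_rate=0.1)
    path.append((W.numpy(), b.numpy()))

print(f"W = {W.numpy():.3f}, b = {b.numpy():.3f}, loss = {current_loss.numpy():.3f}")
```

This sketch uses the full dataset in every step, i.e. plain batch gradient descent. A stochastic variant would compute each gradient on a random mini-batch instead. The list `path` holds the kind of trajectory drawn as the black line in the plot.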

In the original implementation [1], the training and testing data are not separated. The motivation behind the original version is doubtless to keep things as simple as possible and to omit everything unimportant. We feel, however, that omitting the train/test split might be confusing. Therefore the notebooks covered in this post do use a train/test split.
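A train/test split needs little more than a random permutation of the sample indices. Below is a numpy sketch that assumes an 80/20 split. The helper `train_test_split` is hypothetical; scikit-learn offers a function of the same name with the same return order. The closed-form least-squares fit via `np.polyfit` stands in for gradient descent only to keep the example short.

```python
import numpy as np

def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle the indices once, then cut them into test and train parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Hypothetical data for the regression y = 3 x + 2 + noise
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 3.0 * x + 2.0 + rng.normal(size=1000)

x_train, x_test, y_train, y_test = train_test_split(x, y)

# Fit on the training part only (closed-form least squares for brevity) ...
W, b = np.polyfit(x_train, y_train, deg=1)
# ... and report the loss on data the fit has never seen
test_mse = np.mean((W * x_test + b - y_test) ** 2)
print(len(x_train), len(x_test), round(test_mse, 3))
```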


Here we first present a “split variation” of the original version, in which the training and testing data are in fact split.

We add two more notebooks that replicate the split variation:

• A tensorflow-based replication with a standard optimizer
• A tensorflow/keras implementation.

Please note that all three notebooks are self-contained. Moreover, they produce exactly the same results.
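In the tensorflow/keras variant, built-in components replace the hand-written model, loss and update rule. A single `Dense` unit without activation computes exactly W x + b. The sketch below assumes `tf.keras` from TensorFlow 2 and hypothetical data (true slope 3, offset 2, first 800 rows for training and the rest for testing). The epochs, learning rate and batch size are illustrative, not the notebooks' settings.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: y = 3 x + 2 + noise, shaped (n_samples, 1) for keras
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1)).astype("float32")
y = (3.0 * x + 2.0 + rng.normal(size=(1000, 1))).astype("float32")
x_train, x_test = x[:800], x[800:]
y_train, y_test = y[:800], y[800:]

# One Dense unit without activation is exactly y_hat = W x + b
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# batch_size equal to the training-set size gives full-batch gradient descent
model.fit(x_train, y_train, epochs=100, batch_size=len(x_train), verbose=0)

kernel, bias = model.layers[-1].get_weights()   # kernel: (1, 1), bias: (1,)
test_loss = model.evaluate(x_test, y_test, verbose=0)
print(kernel[0, 0], bias[0], test_loss)
```

With a smaller `batch_size`, `fit` would perform mini-batch stochastic gradient descent instead.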

As usual, the code and notebooks can be found on GitHub:

https://github.com/hfwittmann/comparison-tensorflow-keras

## Knime – Multivariate time series

Intro:

Knime is a very powerful machine learning tool, particularly suitable for managing complicated workflows and for rapid prototyping.

It has recently become even more useful with the arrival of easy-to-use Python nodes. This matters because even the large set of built-in nodes may not provide exactly the functionality you need, whereas Python is flexible enough to fill such gaps. The combination of the two is therefore very powerful.
This post is about transferring the three previous time series prediction posts into a knime workflow.
This gives a good overview of the dataflow and thus makes it easier to manage.
The workflow code is available on GitHub:

## Multivariate Time Series Forecasting with Neural Networks (3) – multivariate signal noise mixtures

In this follow-up post we apply the methods we developed previously to a different dataset: in this third post we mix the previous two datasets.

So on the one hand we have noise signals, and on the other hand we have innovators and followers.

## Multivariate Time Series Forecasting with Neural Networks (1)

In this post we present the results of a competition between various forecasting techniques applied to multivariate time series. The techniques we use are several neural networks and, as a benchmark, arima.

Specifically, the neural networks we consider are long short-term memory (lstm) networks and dense networks.
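The two architectures differ in how they consume a window of past values. The dense network sees the flattened window as one feature vector, whereas the lstm processes it step by step. Below is a minimal keras sketch. The window length, the number of series, the layer sizes and the optimizer are hypothetical, not the configuration used in this post.

```python
import numpy as np
import tensorflow as tf

N_LAGS = 10     # hypothetical: length of the input window (time steps)
N_SERIES = 5    # hypothetical: number of parallel time series

# Dense network: the lag window is flattened into one feature vector
dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_LAGS, N_SERIES)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_SERIES),  # one-step-ahead forecast for every series
])

# LSTM network: the window is consumed step by step as a sequence
lstm_model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_LAGS, N_SERIES)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(N_SERIES),
])

for m in (dense_model, lstm_model):
    m.compile(optimizer="adam", loss="mse")

# Both map a batch of windows (batch, N_LAGS, N_SERIES) to forecasts (batch, N_SERIES)
windows = np.random.default_rng(0).normal(size=(32, N_LAGS, N_SERIES)).astype("float32")
print(dense_model.predict(windows, verbose=0).shape,
      lstm_model.predict(windows, verbose=0).shape)
```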

In this setting the winner is lstm, followed by dense neural networks, followed by arima.

Of course, arima is typically applied to univariate time series, where it works extremely well. For arima we therefore treat the multivariate time series as a collection of univariate time series. As stated, arima is not the main focus of this post and serves only as a benchmark.
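The "one univariate model per column" approach can be sketched as follows. A real arima fit would typically use a library such as statsmodels. To keep this sketch dependency-light, we swap in a pure autoregression AR(p), i.e. ARIMA(p, 0, 0), estimated by ordinary least squares. The helper names and the data are hypothetical.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model with intercept.
    Returns coefficients [c, phi_1, ..., phi_p]."""
    n = len(series)
    # Column k holds the series lagged by k+1 steps, aligned with the target
    lags = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, series[p:], rcond=None)
    return coef

def forecast_next(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = series[::-1][:p]  # most recent value first, matching phi_1..phi_p
    return coef[0] + coef[1:] @ recent

# Hypothetical multivariate data: three independent AR(1) series
rng = np.random.default_rng(0)
phis = np.array([0.8, -0.5, 0.0])
T = 2000
data = np.zeros((T, len(phis)))
for t in range(1, T):
    data[t] = phis * data[t - 1] + rng.normal(size=len(phis))

# One independent model per column; no information is shared between columns
models = [fit_ar(data[:, j], p=1) for j in range(data.shape[1])]
forecasts = np.array([forecast_next(data[:, j], c) for j, c in enumerate(models)])
print([round(c[1], 2) for c in models], forecasts.shape)
```

Because each column is modelled in isolation, such a benchmark cannot exploit dependencies between series.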

To test these forecasting techniques we use random time series. We distinguish between innovator time series and follower time series. Innovator time series consist of random innovations and therefore cannot be forecast. Follower time series are functions of lagged innovator time series and can therefore, in principle, be forecast.
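Below is a minimal numpy sketch of how such data might be generated. The following choices are illustrative assumptions, not the exact setup of this post:
- the number of innovators;
- the lags (one and two steps);
- the two follower functions (a sum and a product).

```python
import numpy as np

def make_innovators_and_followers(n_steps, n_innovators=2, seed=0):
    """Innovators: pure random innovations, unforecastable by construction.
    Followers: deterministic functions of lagged innovators, so forecastable
    once the lagged values have been observed."""
    rng = np.random.default_rng(seed)
    innov = rng.normal(size=(n_steps, n_innovators))
    lag1 = np.roll(innov[:, 0], 1)   # innovator 0, lagged by one step
    lag2 = np.roll(innov[:, 1], 2)   # innovator 1, lagged by two steps
    followers = np.column_stack([lag1 + lag2, lag1 * lag2])  # a sum and a product
    # np.roll wraps around; drop the first two rows, which received wrapped values
    return innov[2:], followers[2:]

innovators, followers = make_innovators_and_followers(1000)
print(innovators.shape, followers.shape)
```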

It turns out that our dense network can only forecast simple functions of innovator time series. For instance, it can forecast the sum of two lagged innovator series, whereas a product is already more difficult for it.
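Here is some intuition for why a product is harder than a sum. Consider a model that is linear in the lagged inputs, i.e. a dense network without a hidden layer. It reproduces the sum exactly but explains essentially none of the product's variance, because the product of two independent zero-mean variables is uncorrelated with each factor. The numpy sketch below illustrates this with ordinary least squares on hypothetical Gaussian data. It is not a reproduction of the post's dense-network experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
a_lag, b_lag = rng.normal(size=(2, 5000))   # two hypothetical lagged innovator values

targets = {"sum": a_lag + b_lag, "product": a_lag * b_lag}
X = np.column_stack([np.ones_like(a_lag), a_lag, b_lag])  # intercept + linear terms

r2 = {}
for name, target in targets.items():
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    # Coefficient of determination: share of the target's variance explained
    r2[name] = 1.0 - residual.var() / target.var()
print(r2)
```

A network with hidden layers can approximate the product, but it has to learn the interaction between the two inputs from data.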

In contrast, the lstm network handles products easily, as well as more complicated functions.