Knime – Multivariate time series

Intro:

KNIME is a very powerful machine learning tool, particularly well suited to managing complicated workflows and to rapid prototyping.

It has recently become even more useful with the arrival of easy-to-use Python nodes. Although KNIME’s set of native nodes is large, it may still lack the exact functionality you need; Python, on the other hand, is flexible enough to fill almost any gap. The marriage of the two is therefore very powerful.
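
To give a flavour of what such a node looks like, here is a minimal sketch of a Python Script node body. The exact variable names depend on the KNIME version; this assumes the bundled knime.scripting.io interface, and the column name "value" is purely illustrative:

```python
# Minimal sketch of a KNIME Python Script node body (knime.scripting.io API).
# Assumption: the incoming table has a numeric column called "value".
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()      # incoming KNIME table as a pandas DataFrame
df["value_lag1"] = df["value"].shift(1)    # e.g. add a lagged column that no native node provides
knio.output_tables[0] = knio.Table.from_pandas(df)
```
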
This post transfers the three previous time series prediction posts into a KNIME workflow.
This gives a good overview of the dataflow and makes it easier to manage.
The workflow code is available on GitHub:

Continue reading “Knime – Multivariate time series”

Multivariate Time Series Forecasting with Neural Networks (3) – multivariate signal noise mixtures

In this third follow-up post we apply the methods developed previously to a different dataset: a mixture of the two previous datasets.

So on the one hand we have noise signals; on the other hand we have innovators and followers.

Continue reading “Multivariate Time Series Forecasting with Neural Networks (3) – multivariate signal noise mixtures”

Multivariate Time Series Forecasting with Neural Networks (1)

In this post we present the results of a competition between various forecasting techniques applied to multivariate time series. The techniques we use are neural networks and, as a benchmark, ARIMA.

In particular, the neural networks we consider are long short-term memory (LSTM) networks and dense networks.

The winner in this setting is the LSTM, followed by the dense neural networks and then ARIMA.

Of course, ARIMA is typically applied to univariate time series, where it works extremely well. For ARIMA we therefore treat the multivariate time series as a collection of univariate time series. As stated, ARIMA is not the main focus of this post; it serves only as a benchmark.
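
One way to set up such a per-series benchmark is sketched below; the model order is an arbitrary illustrative choice, not necessarily the one used in the post:

```python
# Hedged sketch: treat an (n_steps, n_series) array as independent univariate
# series and fit one ARIMA model per column with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def arima_benchmark(data, order=(2, 0, 1), horizon=1):
    """Forecast each series separately; returns an array of shape (horizon, n_series)."""
    forecasts = []
    for col in range(data.shape[1]):
        fitted = ARIMA(data[:, col], order=order).fit()
        forecasts.append(fitted.forecast(steps=horizon))
    return np.column_stack(forecasts)
```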

To test these forecasting techniques we use random time series. We distinguish between innovator time series and follower time series. Innovator time series are composed of random innovations and therefore cannot be forecast. Follower time series are functions of lagged innovator time series and can therefore, in principle, be forecast.
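
As an illustration, innovator and follower series could be generated as below; the lag of one step and the particular functions (sum and product) are assumptions for the sketch, not necessarily the exact construction of the post:

```python
# Minimal sketch of the innovator/follower construction. The lag of one step and
# the particular functions (sum, product) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Innovators: pure random innovations, unforecastable by construction.
innov1 = rng.normal(size=n)
innov2 = rng.normal(size=n)

# Followers: deterministic functions of *lagged* innovators, hence forecastable in principle.
follower_sum = np.roll(innov1, 1) + np.roll(innov2, 1)    # sum of the lagged innovators
follower_prod = np.roll(innov1, 1) * np.roll(innov2, 1)   # product of the lagged innovators

# Stack into one multivariate series and drop the first row, where np.roll wraps around.
data = np.column_stack([innov1, innov2, follower_sum, follower_prod])[1:]
```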

It turns out that our dense network can only forecast simple functions of the innovator time series. For instance, the sum of two lagged innovator series can be forecast by our dense network; a product, however, is already more difficult for it.

In contrast, the LSTM network deals easily with products and with more complicated functions.
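
For concreteness, an LSTM set-up of the kind used in such experiments might look as follows; this is a Keras sketch, and the layer size, lookback window and training settings are assumptions rather than the exact configuration of the post:

```python
# Hedged Keras sketch: predict the next value of every series from a lookback window.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(data, lookback=10):
    """Slice a (n_steps, n_series) array into (samples, lookback, n_series) windows."""
    X = np.stack([data[i:i + lookback] for i in range(len(data) - lookback)])
    y = data[lookback:]
    return X, y

data = np.random.default_rng(0).normal(size=(1000, 4))  # placeholder; substitute the innovator/follower data from above
X, y = make_windows(data)

model = Sequential([
    LSTM(32, input_shape=(X.shape[1], X.shape[2])),
    Dense(y.shape[1]),               # one output per series
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)
```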

Continue reading “Multivariate Time Series Forecasting with Neural Networks (1)”

Game of Nim, Reinforcement Learning

Intro

In this post we report success in using reinforcement learning to learn the game of nim. We had previously cited two theses (Erik Järleberg (2011) and Paul Graham & William Lord (2015)) that used Q-learning to learn the game of nim. However, in this setting the scaling issues with Q-learning are much more severe than with value learning. In this post we use a value-based approach with a table. Because the value-based approach is much more efficient than Q-learning, no function approximation is needed up to reasonable heap sizes.
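
The table itself can simply be a dictionary keyed by the sorted heap configuration. The following is a minimal sketch of the idea via epsilon-greedy self-play; the learning rate, exploration rate and the particular update scheme are assumptions, not necessarily the exact setup of the post:

```python
# Minimal sketch of tabular value learning for nim via epsilon-greedy self-play.
# V[state] estimates the win probability for the player *to move* in that state
# (normal play: taking the last bean wins). Hyperparameters are illustrative.
import random
from collections import defaultdict

ALPHA, EPSILON, EPISODES = 0.1, 0.1, 50_000
HEAPS = (3, 4, 5)                          # initial position; an arbitrary example

V = defaultdict(lambda: 0.5)               # unseen positions start at a neutral value
V[(0,) * len(HEAPS)] = 0.0                 # empty board: the player to move has already lost

def moves(state):
    """All positions reachable in one move (take b >= 1 beans from one heap)."""
    for h, size in enumerate(state):
        for b in range(1, size + 1):
            nxt = list(state)
            nxt[h] -= b
            yield tuple(sorted(nxt))       # sorting collapses symmetric duplicates

for _ in range(EPISODES):
    state = tuple(sorted(HEAPS))
    while any(state):
        successors = list(moves(state))
        if random.random() < EPSILON:
            nxt = random.choice(successors)
        else:
            nxt = min(successors, key=lambda s: V[s])   # leave the opponent the worst position
        # TD update: my value is one minus the value of the position I hand to the opponent.
        V[state] += ALPHA * ((1.0 - V[nxt]) - V[state])
        state = nxt
```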

Continue reading “Game of Nim, Reinforcement Learning”

Game of Nim, Supervised Learning

There are entire theses devoted to reinforcement learning of the game of nim, in particular those of Erik Järleberg (2011) and Paul Graham & William Lord (2015).

Both were successful in training a reinforcement-learning agent to play the game of nim with a high percentage of accurate moves. However, they used lookup tables as their evaluation functions, which leads to scalability problems. Further, there is no particular advantage in using Q-learning as opposed to a value-based approach. This is because the environment’s response to a particular action (“take b beans from heap h”) is entirely known, and particularly simple. This is different from, e.g., games whose rules are not stated explicitly and must be learned by the agent, as is the case in the video. In the game of nim the rules are stated explicitly. Indeed, if the action “take b beans from heap h” is possible, i.e. there are at least b beans on heap h, then the update rule is:

heapSize(h) -> heapSize(h) - b

In other words, the size of heap h is reduced by the b beans taken from it. Therefore, as stated, there is no advantage in using Q-learning over a value-based approach. The curse of dimensionality, however, is worse for the Q-learning setup: for a heap vector (h0, h1, …, hn-1) there is a single value, but, without paying special attention to duplicates, h0 + h1 + … + hn-1 possible actions, and therefore just as many Q-values for that one position. We will therefore use a value-based approach.
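
Both the update rule and the action count can be made explicit in a few lines; the example heap vector below is arbitrary:

```python
# Sketch: the environment's response is fully known, and the number of legal
# actions in a position (h0, ..., hn-1) is simply h0 + ... + hn-1.
def apply_move(heaps, h, b):
    """Take b beans from heap h -- the complete 'environment response' in nim."""
    assert 1 <= b <= heaps[h], "illegal move"
    new_heaps = list(heaps)
    new_heaps[h] -= b
    return tuple(new_heaps)

def legal_actions(heaps):
    return [(h, b) for h, size in enumerate(heaps) for b in range(1, size + 1)]

heaps = (3, 4, 5)                       # arbitrary example
print(apply_move(heaps, h=2, b=4))      # (3, 4, 1)
print(len(legal_actions(heaps)))        # 12 = 3 + 4 + 5 actions, but only one value for this state
```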

In other words, we want to use a neural network approximation for the evaluation function. A priori, however, it is by no means clear that this type of function approximation will work. Yet the game of nim is in a sense easy, as a complete solution of the game exists. We can use this solution to our advantage: it lets us estimate whether the network approximation is likely to work.

A simple way is to use supervised learning as a test.

The simplest case is to test a classification of a position as winning or losing.
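
Thanks to Bouton’s solution the training labels come for free: a position is losing for the player to move exactly when the nim-sum (the bitwise XOR of the heap sizes) is zero. A sketch of such a supervised test is shown below; the network size and training settings are assumptions, not necessarily those of the post:

```python
# Sketch: label random positions with Bouton's rule (nim-sum == 0 -> losing) and
# train a small dense classifier on them. Sizes and hyperparameters are illustrative.
import numpy as np
from functools import reduce
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

rng = np.random.default_rng(0)
N_HEAPS, MAX_HEAP, N_SAMPLES = 3, 15, 20_000

X = rng.integers(0, MAX_HEAP + 1, size=(N_SAMPLES, N_HEAPS))
y = np.array([1 if reduce(lambda a, b: a ^ b, row) else 0 for row in X])  # 1 = winning for the player to move

model = Sequential([
    Dense(64, activation="relu", input_shape=(N_HEAPS,)),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X / MAX_HEAP, y, epochs=20, batch_size=64, verbose=0)
```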

Continue reading “Game of Nim, Supervised Learning”

Perfect Play in the game of nim

This post reports on the creation of a Python package for the game of nim. The package contains a function that finds the perfect move in a given position and reports whether the position is winning. The package makes use of Charles Leonard Bouton’s solution of the game of nim. We use the so-called normal variant, where the player who makes the last move wins.
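
The heart of such a function is only a few lines. Here is a minimal sketch of Bouton’s rule; the function names are illustrative and not necessarily those of the package:

```python
# Sketch of Bouton's solution for normal-play nim: a position is winning for the
# player to move iff the nim-sum (bitwise XOR of all heap sizes) is non-zero.
from functools import reduce

def nim_sum(heaps):
    return reduce(lambda a, b: a ^ b, heaps, 0)

def is_winning(heaps):
    return nim_sum(heaps) != 0

def perfect_move(heaps):
    """Return (heap_index, beans_to_take) for a perfect move, or None if the position is lost."""
    s = nim_sum(heaps)
    if s == 0:
        return None                    # losing position: every move leaves a winning position
    for h, size in enumerate(heaps):
        target = size ^ s              # the size this heap should be reduced to
        if target < size:
            return h, size - target

print(is_winning((3, 4, 5)), perfect_move((3, 4, 5)))   # True (0, 2): take 2 beans from heap 0
```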

The creation of the package is preparatory work for the reinforcement learning of the game of nim.

The code is available on GitHub.

Continue reading “Perfect Play in the game of nim”

R shiny custom docker server with caching

So now we’re ready to deploy our own custom R Shiny server with caching.

We had previously discussed the pros and cons of

  1. hosting your own server, by which we mean a Docker-based server in the cloud;
  2. signing up at https://www.shinyapps.io/.

See https://arthought.com/r-shiny-stock-analysis. At the time we opted for option 2, mainly to avoid complexity. We chose the free plan; however, there is a fairly tight limit on the number of hours that free-plan shinyapps.io apps will run. The paid shinyapps.io plans, on the other hand, while very practical, might be a stretch for some people’s budgets; the starter package is currently at $9/month.

Therefore, in this post we deploy a server on DigitalOcean on the smallest droplet, which is currently at $5/month (see the DigitalOcean pricing options).

We want to achieve the same functionality as the predecessor post mentioned above, namely plotting various analyses of a DAX stock against the DAX index itself.

In order for this to work as smoothly as possible we make use of caching, as discussed previously.

The app code can be found on GitHub.

The other part of this post covers how to spin up the Docker-based R Shiny server; the necessary files are also on GitHub.

Continue reading “R shiny custom docker server with caching”

R caching with financial data

In the previous post we looked at a simple data-caching example, which we used to explore the workings of the R package DataCache.

In this post we continue this exploration. Instead of just using the system time as the data feed, we now use a more realistic example: financial data. This is again in preparation for running a custom Shiny server.

Continue reading “R caching with financial data”

R-caching (and scheduling)

This is in preparation for running a custom Shiny server. We want to accelerate the server by using caching. In this post we take a look at a candidate caching package.

In this post we’ll explore the package DataCache. It is a very useful package; however, I found that for some reason the provided weather-data example was not working. So I wanted to simulate a data feed using a scheduler, preferably within R. There is a scheduler package for R, tcltk2. It worked for me from the command line; however, when running it in RStudio or via Rscript there is a small complication, which we cover further below.

The data function here outputs the system time, using Sys.time(). When the value is served from the cache it is a previously stored time, and is therefore smaller than the current Sys.time(). In general,

Current time >= Cached time.

Let’s look at the output first:

So we can see that it works. Basically, the scheduler runs a cycle roughly every 200 ms, whereas the cached time is only updated every second, which implies that the update happens every 5 = 1000 ms / 200 ms cycles.
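
As a language-agnostic illustration of this mechanism (a Python sketch, not the DataCache API): the data function is only re-evaluated once its lifetime has elapsed, so the intermediate polling cycles see the stored, older timestamp.

```python
# Illustrative sketch (Python, not the R DataCache API): a value cached with a
# 1-second lifetime is re-computed only after the lifetime has elapsed, so a
# poller cycling every 200 ms sees the same cached timestamp for ~5 cycles.
import time

_cache = {"value": None, "stamp": 0.0}

def cached_time(lifetime=1.0):
    now = time.time()
    if _cache["value"] is None or now - _cache["stamp"] >= lifetime:
        _cache["value"] = now          # the "data feed": just the current time
        _cache["stamp"] = now
    return _cache["value"]

for _ in range(10):                    # poll every ~200 ms, like the scheduler
    print(f"current={time.time():.3f}  cached={cached_time():.3f}")
    time.sleep(0.2)
```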

Let’s discuss the code. We have three parts:

Continue reading “R-caching (and scheduling)”