Game of Nim, Supervised Learning

There are entire theses devoted to reinforcement learning of the game of Nim, in particular those of Erik Järleberg (2011) and Paul Graham & William Lord (2015).

Both succeeded in training a reinforcement-learning agent to play the game of Nim with a high percentage of accurate moves. However, they used lookup tables as their evaluation functions, which leads to scalability problems. Further, there is no particular advantage in using Q-learning as opposed to a value-based approach. This is because the “environment’s response” to a particular action (“take b beans from heap h”) is entirely known and particularly simple. This differs from, e.g., games where the rules are not stated explicitly and must be learned by the agent, as is the case in the video. In the game of Nim the rules are stated explicitly. Indeed, if the action “take b beans from heap h” is possible, i.e. there are at least b beans on heap h, then the update rule is:

heapSize(h) -> heapSize(h) – b

In other words, the size of heap h is reduced by the number of beans taken away from it. Therefore, as stated, there is no advantage in using Q-learning over a value-based approach. The curse of dimensionality, however, is worse for the Q-learning setup: for a heap vector (h0, h1, …, hn-1) there is one value but, without paying special attention to duplicates, ~ h0 + h1 + … + hn-1 actions (any of 1 to hi beans can be taken from heap i), and therefore ~ h0 + h1 + … + hn-1 Q-values. We will therefore use a value-based approach.
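The update rule can be sketched in a few lines of Python. This is only an illustration: the names `legal_moves`, `apply_move` and `afterstates` are hypothetical and not taken from the theses or from any package. Because the transition is deterministic and known, a value-based agent can evaluate the position reached after each move directly instead of storing one Q-value per (position, action) pair.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (heap index h, number of beans b); hypothetical convention


def legal_moves(heaps: List[int]) -> List[Move]:
    """List every move "take b beans from heap h" with 1 <= b <= heapSize(h)."""
    return [(h, b) for h, size in enumerate(heaps) for b in range(1, size + 1)]


def apply_move(heaps: List[int], move: Move) -> List[int]:
    """Apply the update rule heapSize(h) -> heapSize(h) - b and return a new list."""
    h, b = move
    if not 0 <= h < len(heaps) or not 1 <= b <= heaps[h]:
        raise ValueError(f"illegal move {move} for heaps {heaps}")
    new_heaps = list(heaps)  # copy, so the input position is left untouched
    new_heaps[h] -= b
    return new_heaps


def afterstates(heaps: List[int]) -> List[List[int]]:
    """All positions reachable in one move; a value function can score these."""
    return [apply_move(heaps, move) for move in legal_moves(heaps)]
```

For example, `apply_move([3, 4, 5], (1, 2))` returns `[3, 2, 5]`.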

In other words, we want to use a neural network approximation for the evaluation function. A priori, however, it is by no means clear that this type of function approximation will work. Yet the game of Nim is in a sense easy, as there is a complete solution of the game. We can use this solution to our advantage to estimate whether the network approximation is likely to work.

A simple way is to use supervised learning as a test.

The simplest case is to test a classification of a position as winning or losing.
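A minimal sketch of how labelled training data for such a test could be generated is shown below. It assumes the normal variant of Nim, where a position is winning for the player to move exactly when the bitwise XOR of the heap sizes (the nim-sum) is non-zero. The binary input encoding and the names `encode` and `make_dataset` are assumptions made for illustration, not the setup used in the post.

```python
import random
from typing import List, Tuple


def nim_sum(heaps: List[int]) -> int:
    """Bitwise XOR of all heap sizes."""
    total = 0
    for size in heaps:
        total ^= size
    return total


def is_winning(heaps: List[int]) -> bool:
    """Normal play: the player to move wins iff the nim-sum is non-zero."""
    return nim_sum(heaps) != 0


def encode(heaps: List[int], bits: int) -> List[int]:
    """Assumed feature encoding: each heap as `bits` binary digits, lowest bit first."""
    return [(size >> k) & 1 for size in heaps for k in range(bits)]


def make_dataset(n_samples: int, n_heaps: int = 3, max_heap: int = 15,
                 seed: int = 0) -> List[Tuple[List[int], int]]:
    """Random positions paired with labels: 1 = winning, 0 = losing."""
    rng = random.Random(seed)
    bits = max_heap.bit_length()
    data = []
    for _ in range(n_samples):
        heaps = [rng.randint(0, max_heap) for _ in range(n_heaps)]
        data.append((encode(heaps, bits), int(is_winning(heaps))))
    return data
```

The resulting (features, label) pairs can be fed to any binary classifier; how well a given network learns them is exactly what the supervised test is meant to find out.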


Perfect Play in the game of nim

This post reports on the creation of a Python package for the game of Nim. The package contains a function that finds the perfect move in a given position and reports whether the position is winning. The package makes use of Charles Leonard Bouton’s solution of the game of Nim. We use the so-called normal variant, where the player with the last move wins.
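Bouton’s solution for the normal variant can be sketched as follows. This is a stand-alone illustration, not the package’s actual code: the function name `perfect_move` and its return convention are hypothetical. The idea is to compute the nim-sum (XOR of all heap sizes); if it is zero the position is losing, otherwise some heap can be reduced so that the nim-sum becomes zero.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple


def perfect_move(heaps: List[int]) -> Optional[Tuple[int, int]]:
    """Return (heap index, beans to take) for a winning move in normal play,
    or None if the position is losing (nim-sum is zero)."""
    total = reduce(xor, heaps, 0)
    if total == 0:
        return None  # every move hands the opponent a winning position
    for h, size in enumerate(heaps):
        target = size ^ total  # heap size that would make the nim-sum zero
        if target < size:      # reachable only by removing beans
            return (h, size - target)
    return None  # not reached when total != 0; kept for completeness


# Usage: heaps (3, 4, 5) have nim-sum 2, so taking 2 beans from heap 0
# leaves (1, 4, 5) with nim-sum 0.
move = perfect_move([3, 4, 5])
```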

The creation of the package is preparatory work for the reinforcement learning of the game of nim.

The code is available on github.


R shiny custom docker server with caching

So now we’re ready to deploy our own custom R Shiny server with caching.

We had previously already discussed the pros and cons of

  1. hosting your own server, by this we mean a docker based server in the cloud
  2. Signing up at

See this. We had then opted for option 2, mainly to avoid complexity. We chose the free plan; however, there is a fairly tight limit on the number of hours that free-plan apps will run. On the other hand, the paid pricing options, while very practical, might be a stretch for some people’s budget: the starter package is currently $9/month.

Therefore, in this post we deploy a server on DigitalOcean on the smallest droplet, which is currently $5/month. This is a link to the DigitalOcean pricing options.

We want to achieve the same functionality as the already mentioned predecessor post, namely plotting various analyses of a DAX-stock vs the DAX index itself.

In order for this to work as smoothly as possible we make use of caching, as discussed previously.

The app code can be found on github.

The other part of this post concerns how to spin up the Docker-based R Shiny server. The necessary files are also on github.


R-caching (and scheduling)

This is in preparation for running a custom Shiny server. We want to accelerate the server by using caching. In this post we take a look at a candidate caching package.

In this post we’ll explore the package DataCache. It is a very useful package; however, I found that for some reason the provided weather data example was not working. So I wanted to simulate a data feed using a scheduler, preferably within R. There is a scheduler package for R, tcltk2. It worked for me from the command line; however, when running it in RStudio or Rscript there is a small complication, which we will cover further below.

The data function here outputs the system time, using Sys.time(). When the value is cached, the previously stored time is returned, which is therefore earlier than the current Sys.time(). In general

Current time >=Cached time  .

Let’s look at the output first:

So we can see that it works. Basically, the scheduler runs a cycle roughly every 200 ms, whereas the cached time is only updated every second, which implies that an update happens every 1000 ms / 200 ms = 5 cycles.
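The arithmetic can be checked with a small simulation. The sketch below is written in Python with a simulated clock rather than in R, and it does not use DataCache or tcltk2; the function name `simulate` and its parameters are assumptions for illustration only. It polls every 200 ms and refreshes the cached value once it is at least 1000 ms old.

```python
from typing import List, Tuple


def simulate(cycle_ms: int = 200, refresh_ms: int = 1000,
             n_cycles: int = 11) -> List[Tuple[int, int]]:
    """Return (current time, cached time) in milliseconds for each scheduler cycle."""
    cached_time = 0  # the cache is filled at time 0
    rows = []
    for i in range(n_cycles):
        now = i * cycle_ms  # simulated clock, no real waiting
        if now - cached_time >= refresh_ms:
            cached_time = now  # cache is stale: refresh it
        rows.append((now, cached_time))
    return rows


rows = simulate()
# Cycles at which the cached value changed compared to the previous cycle.
refresh_cycles = [i for i in range(1, len(rows)) if rows[i][1] != rows[i - 1][1]]
```

In every row the current time is at least the cached time, and the refreshes fall five cycles apart.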

Let’s discuss the code. We have three parts:


Game of Nim

This post starts a mini series of two posts in which we want to solve the Game of Nim using Reinforcement learning. The first part of this mini series is devoted to having a look at using our own Nim-specific custom environment for OpenAI.

The Game of Nim is a simple two player game. The rules of the game are indeed very simple, however, to play it well is difficult for the uninitiated.


Windy Walk (part 2) – addendum


  1. Previously we constructed a very simple class to emulate the type of environment which is provided by OpenAI. Windy Walk (part 1)
  2. Then we implemented the same windy walk model as an extension to OpenAI. For this we created a custom python package, we named it gym_drifty_walk, you can grab it from github .
  3. We used these two versions to produce exactly the same “Windy Walks” results, as shown by the plots.

In this addendum post we’ll take a look at the package code in gym_drifty_walk.

There are excellent articles about how to create a Python package and I have no intention of duplicating them; I recommend python-packaging. One warning: the packaging article is written for Python 2, so be aware of that.

There are things missing in this package (e.g. tests) that you would typically add. We chose the most basic approach that works, which we believe lowers the barrier to comprehension.

The most basic steps you need are these:

  • Choose a package name
    We have already done that:  gym_drifty_walk
  • ✓ Follow the basic package structure
    Here the structure is like this :

    (Remark: this tree representation of the directory structure can be obtained using tree.)


Windy Walk (part 2)

Recap: Previously we constructed a very simple class to emulate the type of environment which is provided by OpenAI. Then we simulated two windy walks and used the result of these walks to produce some plots.

Now we want to produce the same results again but via a different and more interesting route. This time we want a windy walk model that is implemented as an extension to OpenAI.
