Perfect Play in the game of nim – Artificial thoughts

This post reports on the creation of a python package for the game of nim. The package contains a function that finds the perfect move in a given position and informs on whether the position is winning. In the package we make use of Charles Leonard Bouton’s solution of the game of Nim. We use the so-called normal variant, where the player with the last move wins.

The creation of the package is preparatory work for the reinforcement learning of the game of nim.

The code is available on github.

In this post we follow the standard nomenclature as used e.g. on wikipedia and in this article .

Remarks on ⊕, the bitwise Xor Operator

⊕ is the symbol normally used in maths for the bitwise xor operator. Another way of describing ⊕ is as bitwise addition modulo-2. An explanation of how this ⊕ operator works is given below.
In python there is an implementation of the bitwise xor operator. The symbol is however ^. So a⊕b = a^b (in python). This may be confusing to people familiar with other programming languages, as in some languages ^ refers instead to exponentiation.
To avoid confusion in the package we define our own bitwise xor operator, calling it instead |NIM_PLUS|. For this we make use of the infix package https://pypi.python.org/pypi/infix/. The infix package allows the definition of custom operators, which aid readability for those accustomed with mathematical operators.
Also we define a function for the bitwise sum of a list of integers, calling it NIM_SUM

Examples

Examples of the bitwise xor operator, |NIM_PLUS| resp. ⊕, in action

Example 13 ⊕ 7 = 3 |NIM_PLUS| 7 = 4.
Explanation:
3 : has the following bitwise 8-bit representation : [0, 0, 0, 0, 0, 0, 1, 1]
7 : has the following bitwise8-bit representation : [0, 0, 0, 0, 0, 1, 1, 1]The bitwise sum is [0, 0, 0, 0, 0, 1, 2, 2]. Taking mod 2 this becomes[0, 0, 0, 0, 0, 1, 0, 0] = 4
13 ⊕ 17 = 13 |NIM_PLUS| 17 = 28

Explanation:
13 : has the following bitwise 8-bit representation : [0, 0, 0, 0, 1, 1, 0, 1]
11 : has the following bitwise 8-bit representation : [0, 0, 0, 1, 0, 0, 0, 1]

The bitwise-xor-operation yields [0, 0, 0, 1, 1, 1, 0, 0] = 28

As mentioned, the package provides a generalised version of NIM_PLUS, called NIM_SUM, which expects a list of integers as input.

For a list of two this reduces to NIM_PLUS, in the following way

NIM_SUM([a , b]) =

a |NIM_PLUS| b =

a ⊕ b =

a ^ b ( in python)

For a longer list of three this becomes

NIM_SUM([a , b, c]) =

a |NIM_PLUS| b |NIM_PLUS| c=

a ⊕ b ⊕ c =

a ^ b ^ c ( in python)

The package also provides

a command line function.
After installing the package one can run the e.g. the following command

> nim-game -H 1 3 5 7

The output is:

{‘winning’: False, ‘description’: ‘From heap 3 take this number of beans : 1’, ‘move’: [3, 1], ‘next_position’: [1, 3, 5, 6], ‘position_before_move’: [1, 3, 5, 7]}

various unit tests, e.g. of the expected properties of NIM_PLUS and NIM_SUM. Further tests include a playout, with the expectation of alternating winning and losing positions, the command line functionality (including parsing).
To find out from which heap to take away the “beans”, we follow the approach described in the nim-article on Wikipedia with this statement: “To find out which move to make, let X be the nim-sum of all the heap sizes. Find a heap where the nim-sum of X and heap-size is less than the heap-size – the winning strategy is to play in such a heap, reducing that heap to the nim sum of its original size with X. ” There is a function in the package devoted to finding such a heap, if there are more solutions than one the function choses the first heap among those. The function in the package is called find_heap_with_power.

Finally let’s look at an example playout. We use this code, which is almost identical to one of the unit tests in the package:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 29 19:31:29 2017

@author: hfwittmann
"""

import numpy as np
from unittest import TestCase
from nim_perfect_play.nim_perfect_play import findWinningMove, NIM_PLUS, NIM_SUM, find_heap_with_power

# initialize
done = False
position = [1,3,5,7]

winning = findWinningMove(position)['winning']
previous_winning = not winning

while not done:
    
    result = findWinningMove(position)
    
    next_position =  result['next_position']
    winning = result['winning']
    assert winning ^ previous_winning, \
        'If the previous position was winning, after an optimal move the new position should be lost, \
        If the previous position was lost the new position should be winning. The position is : {}'.format(position)

    

    if sum(next_position) == 0:
        done = True
    
    print ('position :{}, next_position: {}, winning : {}, done: {}'.format(position, next_position, winning, done))
    
    position = next_position.copy()
    previous_winning = winning

The code above produces the following output for the playout:

position :[1, 3, 5, 7], next_position: [1, 3, 5, 6], winning : False, done: False
position :[1, 3, 5, 6], next_position: [0, 3, 5, 6], winning : True, done: False
position :[0, 3, 5, 6], next_position: [0, 3, 5, 5], winning : False, done: False
position :[0, 3, 5, 5], next_position: [0, 0, 5, 5], winning : True, done: False
position :[0, 0, 5, 5], next_position: [0, 0, 4, 5], winning : False, done: False
position :[0, 0, 4, 5], next_position: [0, 0, 4, 4], winning : True, done: False
position :[0, 0, 4, 4], next_position: [0, 0, 3, 4], winning : False, done: False
position :[0, 0, 3, 4], next_position: [0, 0, 3, 3], winning : True, done: False
position :[0, 0, 3, 3], next_position: [0, 0, 2, 3], winning : False, done: False
position :[0, 0, 2, 3], next_position: [0, 0, 2, 2], winning : True, done: False
position :[0, 0, 2, 2], next_position: [0, 0, 1, 2], winning : False, done: False
position :[0, 0, 1, 2], next_position: [0, 0, 1, 1], winning : True, done: False
position :[0, 0, 1, 1], next_position: [0, 0, 0, 1], winning : False, done: False
position :[0, 0, 0, 1], next_position: [0, 0, 0, 0], winning : True, done: True

The output always shows the current position, the next position, whether the current position is winning, and finally whether the game is finished. We observe that in the sequence the winning state alternates between True and False as it must for an optimally played game.

Also the output prints … done: True .. when the game is finished.