The Game of Nim is a simple two player game. The rules of the game are indeed very simple, however, to play it well is difficult for the uninitiated.
It is a mathematical game and technically it is a zero sum complete information game, which sounds complicated but it is not. It means there are no dice in this game and a loss for one side is a win for the other.
More information can be found here: https://en.wikipedia.org/wiki/Nim
In the popular variant, which we adopt here, the starting position is formed by three random sized heaps of objects (“beans”). One Player “A” starts by taking from any one heap any number of beans. Then the second Player “B” does likewise, i.e. s/he takes any number of beans from any one heap. The players take turns.
The game is typically played in the misère variant, that is the last person who plays a move loses (misère is french for destitution, that’s a bad situation, in case you did not know).
We’ve discussed how to extend the environment in previous posts. The main structural addition here are unit tests. These tests aim to add stability to the code and guidance to the user if he tries an action which is not allowed.
The action is supposed to be a list of two numbers, specifying the heap and the amount of beans taken, e.g. action = [0, 40] means from heap 0 take 40 (bear in mind that heap numbering starts at 0, so this is the first heap).
We use various assertions in the GameNimEnv to check that the action conforms to this pattern. In the unit test we specify non-conforming actions, in order to check whether these assertions actually work.
So first here are the assertions from GameNimEnv:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
assert isinstance(action, list), 'Wrong type.' + \ "Type is: {}. Should be <class 'list'>".format(type(action)) assert len(action) == 2, 'Wrong length.' + \ 'Length is:{}. Should by 2'.format(len(action)) heapnumber = action[0] beansnumber = action[1] heapsize = self.state[heapnumber] assert heapsize>=beansnumber, 'Heap is not big enough' + \ "Heap number is: {}. You tried to take {}, but there are only {} beans in this heap".format(heapnumber, beansnumber, heapsize) |
and here are the unit tests checking the proper working of the assertions by submitting non-conforming actions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
G.reset() # use first heap firstHeap = G.state[0] # produce failures to check asserts are working # 1 failure with self.assertRaises(AssertionError) as context: # use wrong type for action (action should be a list of 2 values) G.step(0) self.assertTrue('Wrong type' in str(context.exception), str(context.exception)) # 2 failure with self.assertRaises(AssertionError) as context: # use wrong list length for action (action should be a list of 2 values) G.step([0]) self.assertTrue('Wrong length' in str(context.exception), str(context.exception)) # 3 failure with self.assertRaises(AssertionError) as context: # try to take more than is in the first heap to cause error G.step([0, firstHeap + 1]) |
To simulate the game we use our own custom OpenAI environment : https://github.com/hfwittmann/gym_game_nim.
1 2 3 4 5 6 7 8 9 |
# import packages import sys import numpy as np import matplotlib.style as style import matplotlib.pyplot as plt style.use('ggplot') # for debugging from IPython.core.debugger import set_trace |
Now import custom environment …
1 2 3 |
import gym import gym_game_nim env = gym.make('game_nim-v0') |
We reset the environment, to intialize, then we do a random playout
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
env.reset() state = env.state.copy() done = False reward = 0 # game development REWARDS = [] BEANSNUMBERS = [] STATES = [] HEAPSIZES = [] HEAPNUMBERS = [] while not done: # first choose heap with non-zero size nonZeroHeaps = np.arange(3) [env.state != 0] # zero index heapnumber heapnumber = np.random.choice(nonZeroHeaps) # size of the chosen heap heapsize = env.state[heapnumber] # take a random number of beans, of course fewer than the heap size beansnumber = np.random.choice(heapsize) + 1 # np.random.choice(heapsize) starts at zero, therefore + 1 next_state, reward, done, info = env.step([heapnumber, beansnumber]) # set_trace() STATES.append(state) REWARDS.append(reward) BEANSNUMBERS.append(beansnumber) HEAPSIZES.append(heapsize) HEAPNUMBERS.append(heapnumber) state = next_state.copy() |
Let’s now plot the game’s progress, and make a few observations:
1) The heap sizes should go down as a fucntion of the number of steps taken. In Plot ‘Random Playouts: Heap size’ we see that they do.
1 2 3 4 5 6 7 8 9 10 11 |
STATES = np.array(STATES) for heapnumber in range(3): plt.plot(STATES[:,heapnumber], label = 'Heap' + str(heapnumber) ) plt.title('Random Playouts: Heap size') plt.legend(loc='best') plt.xlabel('Steps taken') plt.ylabel('Heap size') plt.show() |
2) At each step, the maximum of the current heap sizes should be bigger (or equal) than the chosen’s heap size, which in turn should be bigger than the chosen number of beans that are taken away. In plot ‘Random Playouts Inequalities’ we see that they are.
1 2 3 4 5 6 |
plt.plot(STATES.max(axis=1), label = 'Max Heapsize') plt.plot(HEAPSIZES, label = "Chosen Heaps' sizes") plt.plot(BEANSNUMBERS, label = "Chosen beansnumbers") plt.title("Random Playouts Inequalities: \n Max Heap size is bigger o.e. than \nChosen Heaps' sizes is bigger o.e. than \nChosen beansnumbers\n (o.e. or equal )") plt.legend(loc='best') plt.show() |
In this post we had a look at extending OpenAI with a custom environment. We then played a random game and analysed with some plots.
In the second part, we’ll take a bite at learning the optimal strategy, using reinforcement learning.