Comparison of a very simple regression in TensorFlow and Keras

In this short post we perform a comparative analysis of a very simple regression problem in TensorFlow and Keras.

We start off with an eye-catching plot, showing an optimizer at work using the stochastic gradient method. The plot is explained in more detail further below.


A 3D rotatable version of the loss function of the regression problem, hosted using the free service by Plotly. The black line is the path taken by the optimiser. W and b are the slope and offset parameters of the model. A full view can be found here: https://plot.ly/~hfwittmann/5/loss-function/#/

The focus is on the first principles of gradient descent. We replicate the results of [1, 2]. The post uses a gradient tape, which in turn makes use of automatic differentiation [3, 4].
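As a minimal illustration of how a gradient tape drives automatic differentiation (a sketch assuming TensorFlow with eager execution enabled, as is the default in TensorFlow 2; the variable and values are illustrative):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# The tape records operations performed on watched variables …
with tf.GradientTape() as tape:
    y = x * x  # y = x^2

# … and can then differentiate through them: dy/dx = 2x, evaluated at x = 3
dy_dx = tape.gradient(y, x)
print(float(dy_dx))  # 6.0
```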

In the original implementation in [1], the training and testing data are not separate. The motivation behind the original version is – doubtless – to keep things as simple as possible, and to omit everything unimportant. We feel, however, that it might be confusing not to have a training/testing split. Therefore we use a train/test split in the notebooks covered in this post.
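A sketch of the kind of split we have in mind, using numpy both to generate the data and to split it (the seed, sample size, and 80/20 ratio are illustrative choices, not taken from the notebooks; the true values W = 3, b = 2 are those used later in the post):

```python
import numpy as np

rng = np.random.RandomState(42)  # illustrative seed

# Synthetic data scattered around the true line y = 3x + 2
TRUE_W, TRUE_b = 3.0, 2.0
n = 1000
x = rng.randn(n).astype("float32")
y = TRUE_W * x + TRUE_b + rng.randn(n).astype("float32")

# Shuffle the indices, then take 80% for training and 20% for testing
idx = rng.permutation(n)
x_train, y_train = x[idx[:800]], y[idx[:800]]
x_test, y_test = x[idx[800:]], y[idx[800:]]
```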

Custom_training_basics_standard_optimizer

Here we first present a “split-variation” of the original version, in which the training and testing data are in fact split.

We add two more notebooks that replicate the split-variation, in particular:

  • A TensorFlow-based replication with a standard optimizer
  • A TensorFlow/Keras implementation.

Please note that all three notebooks are self-contained. Moreover, the results are exactly the same across the notebooks.

As usual, the code/notebooks can be found on GitHub:

https://github.com/hfwittmann/comparison-tensorflow-keras

Requirements

Hardware

  • A computer with at least 4 GB of RAM

Software

  • The computer can run Linux, macOS or Windows

Wetware

  • Familiarity with Python

 

Let’s get started

We provide 3 notebooks with increasing differences from the original. These 3 notebooks are:

  1. Custom_training_basics_standard_optimizer.ipynb
  2. Custom_training_basics_train_test.ipynb
  3. Custom_training_basics_keras.ipynb

We will now discuss these in turn.

Custom_training_basics_train_test

The differences in Custom_training_basics_train_test from the original are as follows:

  • We use the numpy package instead of TensorFlow to generate random numbers
  • We split the data into training and testing data
  • We modify the plotting accordingly, to take the split into account.
  • Further, we add some exploratory plots that show the “goodness”, or rather the badness, of the new model before and after training. The badness is captured by the loss function.
  • Also, we show the evolution of the loss function during the gradient-based optimisation.
  • Additionally, we define a grad function instead of the train function used in the original notebook, because it fits more easily with the alterations in the subsequent notebooks.
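The grad function and the first-principles parameter update can be sketched as follows (assuming TensorFlow 2-style eager code; the toy data, learning rate, and number of steps are illustrative, with W = 5, b = 0 as the deliberately bad start shown in the diagram below):

```python
import tensorflow as tf

# Deliberately bad starting values, cf. W = 5, b = 0
W = tf.Variable(5.0)
b = tf.Variable(0.0)

def loss(x, y):
    # Mean squared error of the straight-line model W*x + b
    return tf.reduce_mean(tf.square(W * x + b - y))

def grad(x, y):
    # Record the forward pass on a tape and differentiate the loss
    with tf.GradientTape() as tape:
        current_loss = loss(x, y)
    return tape.gradient(current_loss, [W, b])

# Noiseless toy data from the true line y = 3x + 2
x = tf.linspace(-2.0, 2.0, 101)
y = 3.0 * x + 2.0

learning_rate = 0.1
for _ in range(200):
    dW, db = grad(x, y)
    W.assign_sub(learning_rate * dW)  # manual gradient-descent step
    b.assign_sub(learning_rate * db)  # no optimizer object involved
```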
Starting point of the optimizer. The data is shown in blue, the initial model guess in black. The model is a straight line, characterised by a slope (W) and an offset (b). The initial values are chosen as W = 5, b = 0, corresponding to a bad fit, as shown in the diagram.

 

State of the model after some rounds of optimisation. Again, the data is shown in blue, the fitted model in black. The model is a straight line, characterised by a slope (W) and an offset (b). The fitted values are W = 3.23, b = 1.74, corresponding to a good fit, as shown in the diagram. (The true values are W = 3, b = 2.)

In Custom_training_basics_standard_optimizer the further differences from the original are as follows:

  • The manual calculation of the parameter updates from the gradient is replaced by a standard optimizer, GradientDescentOptimizer, from the TensorFlow package.
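A sketch of the same loop with a standard optimizer. GradientDescentOptimizer lives in tf.train in TensorFlow 1.x; under TensorFlow 2 the equivalent plain gradient-descent optimizer is tf.keras.optimizers.SGD, which we use here (data, learning rate, and step count are again illustrative):

```python
import tensorflow as tf

W = tf.Variable(5.0)
b = tf.Variable(0.0)

# Noiseless toy data from the true line y = 3x + 2
x = tf.linspace(-2.0, 2.0, 101)
y = 3.0 * x + 2.0

# Plain gradient descent; TF 1.x spells this tf.train.GradientDescentOptimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(200):
    with tf.GradientTape() as tape:
        current_loss = tf.reduce_mean(tf.square(W * x + b - y))
    gradients = tape.gradient(current_loss, [W, b])
    # The optimizer applies the update step instead of a manual assign_sub
    optimizer.apply_gradients(zip(gradients, [W, b]))
```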

Custom_training_basics_keras

In Custom_training_basics_keras the further differences from the original are as follows:

  • The model definition, compilation, fitting and prediction are run via Keras.
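The Keras variant can be sketched like this (the layer setup, epoch count, and batch size are illustrative choices, not copied from the notebook):

```python
import numpy as np
import tensorflow as tf

# Noiseless toy data from the true line y = 3x + 2, shaped (n_samples, 1)
x = np.linspace(-2.0, 2.0, 101, dtype="float32").reshape(-1, 1)
y = 3.0 * x + 2.0

# A single Dense unit is exactly the straight-line model W*x + b
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# Full-batch training, so each epoch is one gradient-descent step
model.fit(x, y, epochs=100, batch_size=101, verbose=0)

W, b = model.layers[0].get_weights()  # fitted slope and offset
```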

This concludes the post.

 

1. Eager Execution | TensorFlow. https://www.tensorflow.org/guide/eager. Published December 12, 2018. Accessed December 29, 2018.
2. Custom training: basics | TensorFlow. https://www.tensorflow.org/tutorials/eager/custom_training. Published December 17, 2018. Accessed December 29, 2018.
3. Automatic differentiation and gradient tape | TensorFlow. https://www.tensorflow.org/tutorials/eager/automatic_differentiation. Published December 12, 2018. Accessed December 29, 2018.
4. Automatic differentiation. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Automatic_differentiation. Published November 15, 2018. Accessed December 29, 2018.