In this short post we perform a comparative analysis of a very simple regression problem in TensorFlow and Keras.
We start off with an eye-catching plot illustrating how an optimizer based on stochastic gradient descent works; the plot is explained in more detail further below.
The focus is on the first principles of gradient descent. We replicate the results of 1,2. The post uses a gradient tape (tf.GradientTape), which in turn makes use of automatic differentiation 3,4.
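As a minimal illustration of how a gradient tape drives automatic differentiation, the sketch below differentiates y = x² at x = 3; the variable names are ours, not taken from the notebooks:

```python
import tensorflow as tf

# Record operations on a variable so the tape can differentiate them.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x  # y = x^2

# d(x^2)/dx = 2x, so at x = 3 the gradient is 6.
dy_dx = tape.gradient(y, x)
print(float(dy_dx))  # 6.0
```

The same mechanism scales to the regression model's trainable parameters: any computation recorded inside the tape's context can be differentiated with respect to the variables it touches.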
In the original implementation in 1, the training and testing data are not separate. The motivation behind the original version is doubtless to keep things as simple as possible and to omit everything unimportant. We feel, however, that the absence of a training/testing split might be confusing, so we use a train/test split in the notebooks covered in this post.
We therefore first present a “split variation” of the original version, in which the training and testing data are in fact separate.
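A sketch of such a split, assuming (purely as an illustration) noisy linear data of the form y = 3x + 2 and an 80/20 hold-out:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Synthetic linear data: y = 3x + 2 plus noise (coefficients are illustrative).
NUM_EXAMPLES = 1000
xs = rng.normal(size=NUM_EXAMPLES)
ys = 3.0 * xs + 2.0 + rng.normal(size=NUM_EXAMPLES)

# Hold out the last 20% of the examples for testing.
split = int(0.8 * NUM_EXAMPLES)
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]
print(x_train.shape, x_test.shape)  # (800,) (200,)
```

Since the examples are generated independently, a simple positional split suffices; for ordered real-world data one would shuffle first.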
We add two more notebooks that replicate this split variation, namely:
- A TensorFlow-based replication with a standard optimizer
- A TensorFlow/Keras implementation
Please note that all three notebooks are self-contained. Moreover, the results are identical across the notebooks.
As usual, the code/notebooks can be found on GitHub:
- A computer with at least 4 GB of RAM
- Linux, macOS, or Windows as the operating system
- Familiarity with Python
Let’s get started.
We provide three notebooks with increasing differences from the original. These three notebooks are:
We will now discuss these in turn.
The differences between Custom_training_basics_train_test and the original are as follows:
- We use the numpy package instead of tensorflow to generate random numbers.
- We split the data into training and testing data.
- We modify the plotting accordingly to take the split into account.
- We further add some exploratory plots that show the “goodness”, or rather the badness, of the new model before and after training; the badness is captured by the loss function.
- We also show the evolution (trace) of the loss function during the gradient-based optimisation.
- Additionally, we define a grad function instead of the train function used in the original notebook, because it fits more easily with the alterations in the subsequent notebooks.
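Such a grad function can be sketched roughly as follows; the linear model y = Wx + b, the initial parameter values, and the learning rate are illustrative assumptions, not the notebook's exact code:

```python
import tensorflow as tf

# Illustrative linear model: two trainable scalars W and b.
W = tf.Variable(5.0)
b = tf.Variable(0.0)

def loss_fn(x, y):
    # Mean squared error of the prediction W*x + b.
    return tf.reduce_mean(tf.square(W * x + b - y))

def grad(x, y):
    """Return the current loss and its gradients w.r.t. W and b."""
    with tf.GradientTape() as tape:
        loss = loss_fn(x, y)
    return loss, tape.gradient(loss, [W, b])

# One manual gradient-descent step on noiseless targets y = 3x + 2.
x = tf.constant([1.0, 2.0, 3.0])
y = 3.0 * x + 2.0
loss, (dW, db) = grad(x, y)
learning_rate = 0.1
W.assign_sub(learning_rate * dW)
b.assign_sub(learning_rate * db)
```

Separating gradient computation (grad) from the parameter update is what makes the later notebooks easy: the update step can be swapped for an optimizer without touching grad.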
In Custom_training_basics_standard_optimizer the further difference from the original is as follows:
- The manual calculation of the parameter changes from the gradient is replaced by a standard optimizer, GradientDescentOptimizer, from the TensorFlow package.
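The TF1-era GradientDescentOptimizer corresponds to plain SGD (tf.keras.optimizers.SGD) in TensorFlow 2. A sketch of the replacement, reusing the illustrative model from above (variable names and learning rate are our assumptions):

```python
import tensorflow as tf

# Same illustrative linear model as before, not the notebook's exact code.
W = tf.Variable(5.0)
b = tf.Variable(0.0)

x = tf.constant([1.0, 2.0, 3.0])
y = 3.0 * x + 2.0  # noiseless targets

# Plain SGD is the TF2 counterpart of GradientDescentOptimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(W * x + b - y))

# apply_gradients replaces the manual W.assign_sub(learning_rate * dW) update.
grads = tape.gradient(loss, [W, b])
optimizer.apply_gradients(zip(grads, [W, b]))
```

For vanilla gradient descent the two formulations perform the identical update; the optimizer object becomes genuinely useful once one switches to momentum, Adam, or other variants.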
In Custom_training_basics_keras the further differences from the original are as follows:
- The model definition, compilation, fitting, and prediction are run via Keras.
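A minimal Keras version of the same workflow might look as follows; the data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Illustrative noiseless data on the line y = 3x + 2.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(800, 1)).astype("float32")
y_train = 3.0 * x_train + 2.0

# A single Dense unit is exactly the linear model y = W*x + b.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# fit replaces the hand-written training loop; predict replaces calling the model manually.
model.fit(x_train, y_train, epochs=10, verbose=0)
predictions = model.predict(x_train[:5], verbose=0)
```

The gradient tape has not disappeared here: fit runs the same tape-and-apply-gradients loop internally, which is why the three notebooks can produce matching results.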
This concludes the post.