This post is a follow-up to ^{1}.

We start with an eye-catching plot that shows how an optimiser using the stochastic gradient method behaves. The plot is explained in more detail further below.

Visualisation of a loss surface with multiple minima. The surface is shown in gray, and an example path taken by the optimiser is shown in colour.

We previously explored a visualisation of gradient-based optimisation. To this end, we plotted an optimisation path against the background of its corresponding loss surface.

Remark: the loss surface is the loss function viewed over the space of fitting parameters. For each choice of parameters, it measures how badly the model fits the data.

Following the logic of a TensorFlow tutorial^{2}, in the previous post we looked at a very simple optimisation, perhaps one of the simplest of all: linear regression. As data points we used a straight line with mild noise. The fitting parameters were the offset and the slope. As a loss function we used a quadratic function, which is a very typical choice. This produces a very well-behaved loss surface with a simple valley, which the optimiser has no trouble finding.
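To make the earlier setup concrete, here is a minimal sketch in plain Python. It is not the code from the previous post or from the tutorial: the function names, the data parameters (`offset=1.0`, `slope=2.0`, `noise=0.1`), the learning rate and the step count are all assumptions chosen for illustration, and it uses full-batch gradient descent rather than a stochastic variant.

```python
import random


def make_noisy_line(n=50, offset=1.0, slope=2.0, noise=0.1, seed=0):
    """Sample points on a straight line and add mild Gaussian noise (assumed values)."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [offset + slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def quadratic_loss(offset, slope, xs, ys):
    """Mean squared residual of the line y = offset + slope * x."""
    return sum((offset + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_gradient(offset, slope, xs, ys):
    """Partial derivatives of the quadratic loss with respect to offset and slope."""
    n = len(xs)
    residuals = [offset + slope * x - y for x, y in zip(xs, ys)]
    d_offset = 2.0 * sum(residuals) / n
    d_slope = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return d_offset, d_slope


def descend(xs, ys, offset=0.0, slope=0.0, rate=0.1, steps=2000):
    """Full-batch gradient descent; returns the path of (offset, slope, loss)."""
    path = [(offset, slope, quadratic_loss(offset, slope, xs, ys))]
    for _ in range(steps):
        d_offset, d_slope = loss_gradient(offset, slope, xs, ys)
        offset -= rate * d_offset
        slope -= rate * d_slope
        path.append((offset, slope, quadratic_loss(offset, slope, xs, ys)))
    return path


xs, ys = make_noisy_line()
path = descend(xs, ys)
final_offset, final_slope, final_loss = path[-1]
print(f"offset={final_offset:.3f} slope={final_slope:.3f} loss={final_loss:.4f}")
```

The `path` list holds exactly what such a plot needs: the parameter pair at every step, together with the loss value at that point. Because the loss is quadratic in both parameters, the path should settle close to the offset and slope used to generate the data.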

As a special case, the surface can be one-dimensional. In this post, for simplicity, we use only one optimisation parameter, the straight line’s angle, so the surface is one-dimensional. Despite this simplicity, we construct a surface that is reasonably hard for the gradient-based optimiser and can in fact derail it.
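As a rough illustration of a one-parameter loss, the sketch below evaluates a quadratic loss as a function of the line's angle alone. It makes several assumptions that are not taken from the post: the line passes through the origin, the slope is `tan(angle)`, and the data come from a single noisy line with a hypothetical true angle of 0.6 rad. It does not reproduce the multi-minima construction; it only shows what scanning a one-dimensional loss looks like.

```python
import math
import random


def make_line_through_origin(n=50, angle=0.6, noise=0.05, seed=1):
    """Noisy points on a line through the origin with the given angle (assumed setup)."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [math.tan(angle) * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def angle_loss(angle, xs, ys):
    """Mean squared residual of the line y = tan(angle) * x."""
    slope = math.tan(angle)
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = make_line_through_origin()

# Scan the angle on a grid that stays away from +-pi/2, where tan() diverges.
angles = [-1.4 + 0.01 * i for i in range(281)]
losses = [angle_loss(a, xs, ys) for a in angles]

# Grid point with the lowest loss.
best_index = min(range(len(angles)), key=losses.__getitem__)
best_angle = angles[best_index]

# Interior grid points that are strictly lower than both neighbours.
local_minima = [
    angles[i]
    for i in range(1, len(angles) - 1)
    if losses[i] < losses[i - 1] and losses[i] < losses[i + 1]
]

print(f"best angle ~ {best_angle:.2f} rad, local minima found: {len(local_minima)}")
```

For data from a single noisy line, this loss is a convex quadratic in `tan(angle)`, so the scan finds only one valley. A curve with several minima therefore needs differently constructed data or a different loss, which this sketch does not attempt.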

The primary purpose of this post is to construct, from artificial data, and visualise a loss surface that is not well behaved and, in particular, has many minima.
