Loss surface with multiple valleys

This post is a follow-up to an earlier post [1].

We start off with an eye-catching plot showing an optimiser at work, using the stochastic gradient method. The plot is explained in more detail further below.

Visualisation of a loss surface with multiple minima. The surface is in grey; the exemplary path taken by the optimiser is in colour.

We had previously explored a visualisation of gradient-based optimisation: we plotted an optimisation path against the background of its corresponding loss surface.

Remark: the loss surface is the function typically used to describe the badness of fit of a model, as a function of its fitting parameters.
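For concreteness (the notation here is ours, not taken from the previous post): for a straight-line model with offset $a$ and slope $b$, scored with a quadratic loss over data points $(x_i, y_i)$, the loss surface is

$$L(a, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (a + b\,x_i) \bigr)^2 .$$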

Following the logic of a tutorial from TensorFlow [2], in the previous post we looked at a very simple optimisation, maybe one of the simplest of all: linear regression. As data points we used a straight line with mild noise (a little bit of noise, but not too much), and as fitting parameters we used the offset and the slope. As a loss function we used the typical quadratic one. It turns out that this produces a very well behaved loss surface with a single valley, which the optimiser has no problem finding.
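That earlier, well-behaved setup can be condensed into a minimal NumPy sketch. This is not the original code; the data-generating offset, slope, noise level, learning rate, and step count are all illustrative assumptions:

```python
import numpy as np

# Artificial data: a straight line with mild noise.
# Offset (0.5), slope (2.0) and noise level are assumed for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.05, size=x.shape)

def loss(a, b):
    """Quadratic (mean squared error) loss for offset a and slope b."""
    return np.mean((y - (a + b * x)) ** 2)

# Plain gradient descent on the two fitting parameters.
a, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(500):
    residual = y - (a + b * x)
    grad_a = -2.0 * np.mean(residual)       # dL/da
    grad_b = -2.0 * np.mean(residual * x)   # dL/db
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"fitted offset {a:.3f}, slope {b:.3f}")  # close to 0.5 and 2.0
```

Because the loss is quadratic in both parameters, the surface is a single bowl and plain gradient descent walks straight into it.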

As a special case, the loss surface can be one-dimensional. In this post, for simplicity, we use only one optimisation parameter, so the surface is one-dimensional: the fitting parameter is the straight line's angle. Despite this one-dimensional simplicity, we construct a surface that is reasonably hard for a gradient-based optimiser, and which can in fact derail it.
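The exact angle parameterisation is not spelled out above, so the following is one plausible reading: a line through the origin whose slope is tan(θ), giving a loss surface over the single parameter θ.

```python
import numpy as np

# One-parameter model (an assumption): a line through the origin
# with slope tan(theta), so theta is the line's angle.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(0.0, 0.05, size=x.shape)

def loss(theta):
    """The one-dimensional loss surface: MSE as a function of the angle."""
    return np.mean((y - np.tan(theta) * x) ** 2)

# Sample the surface on a grid, staying clear of +-pi/2 where tan diverges.
thetas = np.linspace(-1.4, 1.4, 300)
surface = np.array([loss(t) for t in thetas])
```

For this benign data the sampled curve still has a single valley, near arctan(2) ≈ 1.11; the point of this post is to choose data for which that is no longer true.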

The primary purpose of this post is to construct from artificial data, and to visualise, a loss surface that is not well behaved, and which in particular has many minima.
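The construction itself is not shown above, so the following is only a hypothetical sketch of one way a one-dimensional surface can acquire many valleys: several clusters of points, each consistent with a different angle, scored with a saturating (robust) loss so that each cluster carves out its own valley. The cluster angles, the saturation scale, and the use of a Geman-McClure-style loss are all assumptions for illustration, not the post's own recipe.

```python
import numpy as np

# Hypothetical construction: several clusters of points, each lying on a
# line through the origin at its own angle, scored with a saturating
# loss so each cluster carves its own valley into the 1-D loss surface.
rng = np.random.default_rng(0)
cluster_angles = [-1.0, -0.3, 0.4, 1.1]   # assumed, for illustration
r = np.linspace(0.5, 1.5, 25)             # horizontal coordinates per cluster
x = np.concatenate([r for _ in cluster_angles])
y = np.concatenate([np.tan(a) * r + rng.normal(0.0, 0.02, r.size)
                    for a in cluster_angles])

def loss(theta, scale=0.3):
    """Saturating (robust, Geman-McClure style) loss on the perpendicular
    residual: residuals much larger than `scale` each contribute roughly 1,
    so far-away clusters stop pulling on the fit and every cluster
    leaves its own local minimum."""
    res = x * np.sin(theta) - y * np.cos(theta)   # perpendicular residual
    return np.mean(res**2 / (scale**2 + res**2))

thetas = np.linspace(-1.5, 1.5, 600)
surface = np.array([loss(t) for t in thetas])
# 'surface' now dips near each of the four cluster angles; an optimiser
# started in the wrong valley will happily settle there.
```

Plotting `surface` against `thetas` gives a curve with a valley near each cluster angle, which is the kind of badly behaved surface the opening plot illustrates.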
