## Corona dashboard

Everyone’s at it: corona dashboards are spreading like a virus. So here’s another one:

https://covid19.arthought.com/

Probably the most interesting story is the large disparity in national death rates. For instance, Germany has a much lower death rate than Italy.

As of 20 March 2020 the numbers are (rounded):

So the ratio of the death rates is a factor of roughly 30.

This topic has already been much discussed, but to me it remains a huge mystery, whose solution will likely be important in defeating this virus.

That is it for now.

## Matrix operations with pytorch – optimizer – addendum

This blog post is an addendum to a three-post miniseries [1].

Here we present two notebooks.

• 02a—SVD-with-pytorch-optimizer-SGD.ipynb is a drop-in replacement for the stochastic gradient + momentum method shown earlier [2], but uses the built-in pytorch SGD optimizer.

• The second notebook uses the built-in pytorch Adam optimizer rather than the SGD optimizer. As is known in the literature, the Adam optimizer often shows better results [3].
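For illustration, here is a minimal sketch of why the replacement is "drop-in" (a toy least-squares problem, not the notebooks' code): the only difference between the two runs is which built-in optimizer is constructed.

```python
import torch

# Toy least-squares problem: find w minimising mean((A w - b)^2).
torch.manual_seed(0)
A = torch.randn(10, 3)
b = torch.randn(10)

def fit(make_optimizer, steps=500):
    w = torch.zeros(3, requires_grad=True)
    optimizer = make_optimizer([w])
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.mean((A @ w - b) ** 2)
        loss.backward()
        optimizer.step()
    return w, loss.item()

# The only difference between the two runs is the optimizer constructor:
w_sgd, loss_sgd = fit(lambda p: torch.optim.SGD(p, lr=0.05, momentum=0.9))
w_adam, loss_adam = fit(lambda p: torch.optim.Adam(p, lr=0.05))
```

Both runs should end up near the same least-squares residual; everything else in the training loop stays identical.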

## Matrix operations with pytorch – optimizer – part 3

SVD with pytorch optimizer

This blog post is part of a three-post miniseries.

Today’s post in particular covers SVD with a pytorch optimizer.

The point of the entire miniseries is to reproduce matrix operations such as the matrix inverse and SVD using pytorch’s automatic differentiation capability.

These algorithms are already implemented in pytorch itself and in other libraries such as scikit-learn. However, we will solve the problem in a general way using gradient descent. We hope that this will provide an understanding of the power of the gradient method in general and of the capabilities of pytorch in particular.
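To make the approach concrete, here is a minimal sketch (an assumed setup, not the notebook's exact code) that recovers an SVD-style factorisation M ≈ U·diag(s)·Vᵀ by gradient descent, with penalty terms nudging U and V towards orthogonality:

```python
import torch

torch.manual_seed(0)
M = torch.randn(4, 4)

# Factors to be learned: M ≈ U @ diag(s) @ V.T
U = torch.randn(4, 4, requires_grad=True)
s = torch.randn(4, requires_grad=True)
V = torch.randn(4, 4, requires_grad=True)
I = torch.eye(4)

optimizer = torch.optim.Adam([U, s, V], lr=0.01)
for _ in range(5000):
    optimizer.zero_grad()
    reconstruction = U @ torch.diag(s) @ V.T
    # Reconstruction error plus penalties pushing U and V towards orthogonality.
    loss = (
        torch.mean((M - reconstruction) ** 2)
        + torch.mean((U.T @ U - I) ** 2)
        + torch.mean((V.T @ V - I) ** 2)
    )
    loss.backward()
    optimizer.step()
```

The penalty-based treatment of orthogonality is one simple choice among several; it keeps the whole problem a plain unconstrained gradient descent.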

## Matrix operations with pytorch – optimizer – part 2

pytorch – matrix inverse with pytorch optimizer

This blog post is part of a three-post miniseries.

Today’s post in particular covers the matrix inverse with a pytorch optimizer.

The point of the entire miniseries is to reproduce matrix operations such as the matrix inverse and SVD using pytorch’s automatic differentiation capability.

These algorithms are already implemented in pytorch itself and in other libraries such as scikit-learn. However, we will solve the problem in a general way using gradient descent. We hope that this will provide an understanding of the power of the gradient method in general and of the capabilities of pytorch in particular.
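A minimal sketch of the idea (illustrative, not the notebook's exact code): treat the entries of a candidate inverse X as the fitting parameters and drive A·X towards the identity by gradient descent.

```python
import torch

torch.manual_seed(0)
# A well-conditioned matrix to invert; A A^T + 4I is symmetric and far from singular.
G = torch.randn(4, 4)
A = G @ G.T + 4 * torch.eye(4)
I = torch.eye(4)

# X holds the candidate inverse; its entries are the fitting parameters.
X = torch.zeros(4, 4, requires_grad=True)

optimizer = torch.optim.SGD([X], lr=0.01)
for _ in range(2000):
    optimizer.zero_grad()
    # The loss measures how far A @ X is from the identity.
    loss = torch.mean((A @ X - I) ** 2)
    loss.backward()
    optimizer.step()
```

Because the loss is quadratic in X and A is well conditioned, plain SGD converges without trouble; an ill-conditioned A would need many more steps or a better optimizer.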

## Matrix operations with pytorch – optimizer – part 1

pytorch – playing with tensors

This blog post is part of a three-post miniseries.

The point of the entire miniseries is to reproduce matrix operations such as the matrix inverse and SVD using pytorch’s automatic differentiation capability.

These algorithms are already implemented in pytorch itself and in other libraries such as scikit-learn. However, we will solve the problem in a general way using gradient descent. We hope that this will provide an understanding of the power of the gradient method in general and of the capabilities of pytorch in particular.

To avoid reader fatigue, we present the material in three posts:

• An introductory section, pytorch – playing with tensors, demonstrates some basic tensor usage and shows how to calculate various derivatives.
• A main section, pytorch – matrix inverse with pytorch optimizer, shows how to calculate the matrix inverse [1] using gradient descent.
• An advanced section, SVD with pytorch optimizer, shows how to do singular value decomposition [1,2] with gradient descent.
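As a small taste of the introductory material (an illustrative sketch, not the notebook's code), pytorch's automatic differentiation computes derivatives of tensor expressions directly:

```python
import torch

# A scalar function of a tensor: f(x) = sum(x^3)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
f = (x ** 3).sum()

# Automatic differentiation: df/dx = 3 x^2, evaluated at x
f.backward()
print(x.grad)  # tensor([ 3., 12., 27.])
```

The same mechanism, applied to a loss function of matrix-valued parameters, is what powers the matrix inverse and SVD posts.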

The code for this post is provided in a Jupyter notebook on GitHub:

https://github.com/hfwittmann/matrix-operations-with-pytorch/blob/master/Matrix-operations-with-pytorch-optimizer/00—pytorch-playing-with-tensors.ipynb

Remark: the following part of the post is written directly in a Jupyter notebook. It is displayed via a very nice WordPress plugin, nbconvert [3].


## Colab, MLflow and papermill

Machine learning with the maximum of free GPU power currently available, plus the ability to keep a neat log of your data science experiments. Interested? This article presents a deep-dive solution.

Quick summary

Colab, MLflow and papermill are individually great. Together they form a dream team.

Colab is great for running notebooks, MLflow keeps records of your results, and papermill can parametrise a notebook, run it and save a copy.

All three are backed by top-tier American companies: Colab by Google, MLflow by Databricks and papermill by Netflix.

## Loss surface with multiple valleys

This post is a follow-up of [1].

We start off with an eye-catching plot, representing the functioning of an optimiser using the stochastic gradient method. The plot is explained in more detail further below.

Visualisation of a loss surface with multiple minima. The surface is in gray; the exemplary path taken by the optimiser is in colours.

We had previously explored a visualisation of gradient-based optimisation. To this end we plotted an optimisation path against the background of its corresponding loss surface.

Remark: the loss surface is a function that is typically used to describe the badness of fit of an optimisation.

Following the logic of a tutorial from tensorflow [2], in the previous case we looked at a very simple optimisation, maybe one of the simplest of all: linear regression. As data points we used a straight line with mild noise (a little noise, but not too much), and we used the offset and the slope as fitting parameters. As a loss function we used a quadratic function, which is very typical. It turns out that this produces a very well-behaved loss surface with a single valley, which the optimiser has no problem finding.

As a special case, the surface can be one-dimensional. In this post, for simplicity, we use only one optimisation parameter, the straight line’s angle, so the surface is one-dimensional. Despite this simplicity, we construct a surface that is reasonably hard for a gradient-based optimiser and can in fact derail it.

The primary purpose of this post is to construct, from artificial data, and visualise a loss surface that is not well behaved and in particular has many minima.
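A minimal sketch of how such a surface can arise (illustrative assumptions, not this post's exact construction: data on a straight line through the origin, slope parametrised by its angle θ). Since tan is π-periodic, the quadratic loss repeats in θ and shows multiple valleys:

```python
import torch

# Artificial data: a straight line through the origin with mild noise.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 100)
y = 0.5 * x + 0.05 * torch.randn(100)

def loss_fn(theta):
    # Parametrise the slope by the line's angle theta; tan is pi-periodic,
    # so the loss surface repeats and has multiple valleys (and poles).
    return torch.mean((y - torch.tan(theta) * x) ** 2)

# Evaluate the one-dimensional loss "surface" on a grid of angles.
thetas = torch.linspace(-4.0, 4.0, 400)
surface = torch.stack([loss_fn(t) for t in thetas])

# A gradient-based optimiser descends into whichever valley it starts near,
# and steep regions near the poles can derail it.
theta = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([theta], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(theta)
    loss.backward()
    opt.step()
```

Plotting `surface` against `thetas`, together with the path of `theta`, gives exactly the kind of multi-valley picture described above.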

## Comparison of a very simple regression in tensorflow and keras

In this short post we perform a comparative analysis of a very simple regression problem in tensorflow and keras.

We start off with an eye-catching plot, representing the functioning of an optimizer using the stochastic gradient method. The plot is explained in more detail further below.

A 3D rotatable version of the loss function of the regression problem. For the hosting we use the free service by plotly. The black line is the path taken by the optimiser. W and b are the slope and offset parameters of the model. A full view can be found here: https://plot.ly/~hfwittmann/5/loss-function/#/

The focus is on the first principles of gradient descent. We replicate the results of [1,2]. The post uses a GradientTape, which in turn makes use of automatic differentiation [3,4].

In the original implementation in [1], the training and testing data are not separated. The motivation behind the original version is, doubtless, to keep things as simple as possible and to omit everything unimportant. We feel, however, that it might be confusing not to have a training/testing split. Therefore we use a train/test split in the notebooks covered in this post.

Custom_training_basics_standard_optimizer

Here we first present a “split-variation” of the original version, where the training and testing data are in fact split.

We add two more notebooks that replicate the split-variation:

• A tensorflow-based replication with a standard optimizer
• A tensorflow/keras implementation.

Please note that all three workbooks are self-contained. Moreover, the results are identical across the notebooks.
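A minimal sketch of the common setup (illustrative, not the workbooks' exact code): linear regression trained from first principles with tf.GradientTape, including the train/test split discussed above.

```python
import numpy as np
import tensorflow as tf

# Artificial data: y = 3 x + 2 with mild noise, then a train/test split
# (an addition relative to the original tutorial, which did not split).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200).astype("float32")
y = (3.0 * x + 2.0 + 0.1 * rng.standard_normal(200)).astype("float32")

x_train, x_test = x[:160], x[160:]
y_train, y_test = y[:160], y[160:]

# Model parameters: slope W and offset b.
W = tf.Variable(0.0)
b = tf.Variable(0.0)

def loss_fn(xs, ys):
    return tf.reduce_mean(tf.square(ys - (W * xs + b)))

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
for _ in range(200):
    # GradientTape records the forward pass so gradients can be taken.
    with tf.GradientTape() as tape:
        loss = loss_fn(x_train, y_train)
    grads = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))

# Evaluate on held-out data only.
test_loss = loss_fn(x_test, y_test)
```

The keras variant replaces the explicit tape loop with `model.compile`/`model.fit`, but with the same data and optimizer it lands on the same parameters.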

As usual, the code/notebooks can be found on GitHub:

https://github.com/hfwittmann/comparison-tensorflow-keras