Matrix operations with pytorch – optimizer – part 1

pytorch – playing with tensors

This blog post is part of a three-post miniseries.

The point of the entire miniseries is to reproduce matrix operations such as the matrix inverse and the SVD using pytorch’s automatic differentiation capability.

These algorithms are already implemented in pytorch itself and other libraries such as scikit-learn. However, we will solve this problem in a general way using gradient descent. We hope that this will provide an understanding of the power of the gradient method in general and the capabilities of pytorch in particular.

To avoid reader fatigue, we present the material in 3 posts:

  • An introductory section: pytorch – playing with tensors demonstrates some basic tensor usage. It also shows how to calculate various derivatives.
  • A main section: pytorch – matrix inverse with pytorch optimizer shows how to calculate the matrix inverse​1​ using gradient descent.
  • An advanced section: SVD with pytorch optimizer shows how to do singular value decomposition​1,2​ with gradient descent.

The code of this post is provided in a jupyter notebook on github:

https://github.com/hfwittmann/matrix-operations-with-pytorch/blob/master/Matrix-operations-with-pytorch-optimizer/00—pytorch-playing-with-tensors.ipynb

Remark: the following part of the post is written directly in a Jupyter notebook. It is displayed via a very nice WordPress plugin, nbconvert​3​.

This notebook pytorch - playing with tensors demonstrates some basic tensor usage. It also shows how to calculate various derivatives.

In [0]:
# We start by importing a few packages

import torch
from torch import tensor, manual_seed, rand
import math
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import plotly.graph_objects as go
from plotly import express as px

Basic use of tensor library

Tensors are to the torch package what arrays are to numpy. If you are familiar with one, you can easily handle the other.
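
As a small aside (a sketch not in the original notebook), converting between the two is a one-liner in each direction:

In [0]:
# minimal interop sketch: numpy arrays and torch tensors convert back and forth
a_np = np.array([1.0, 2.0, 3.0])
a_torch = torch.from_numpy(a_np)           # tensor sharing memory with the numpy array
assert np.allclose(a_torch.numpy(), a_np)  # .numpy() converts back again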

Create one-dimensional tensor

In [0]:
_123 = tensor([1,2,3])
_123
Out[0]:
tensor([1, 2, 3])
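
As a quick aside (not in the original notebook), a tensor carries its shape and dtype with it; python integers default to 64-bit integers:

In [0]:
# inspect shape and dtype of the tensor created above
assert _123.shape == (3,)          # one dimension with three entries
assert _123.dtype == torch.int64   # python ints default to int64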

... multiply with a constant

In [0]:
_123 * 7
Out[0]:
tensor([ 7, 14, 21])

... dot product

In [0]:
_456 = tensor([4,5,6])

Dot product by hand:

Let's calculate it "by hand":

1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32

Dot product basic:

Calculate it using basic operators: Multiply the elements of the vectors, and sum up:

In [0]:
dot_basic = (_123 * _456).sum()
assert tensor(32) == dot_basic, 'Should match dot_basic'

Dot product elegant:

Now let's calculate the dot product with the matrix multiplication operator @

In [0]:
dot_elegant = _123 @ _456
assert tensor(32) == dot_elegant, 'Should match dot_elegant'
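
For completeness (an aside not in the original post), torch also provides an explicit dot routine for one-dimensional tensors, which gives the same result:

In [0]:
# same dot product via the explicit routine for 1-d tensors
dot_explicit = torch.dot(_123, _456)
assert tensor(32) == dot_explicit, 'Should match dot_explicit'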

Random Matrix

In [0]:
# set seed for reproducibility
manual_seed(314)

rand(size=[2,3])
Out[0]:
tensor([[0.7196, 0.6295, 0.6667],
        [0.3385, 0.8522, 0.3126]])
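
To illustrate what the reproducibility buys us (a small sketch added here), re-seeding with the same value reproduces exactly the same random matrix:

In [0]:
# re-seeding with the same value reproduces the same draw
manual_seed(314)
A = rand(size=[2,3])
manual_seed(314)
B = rand(size=[2,3])
assert torch.equal(A, B), 'Identical seeds should give identical matrices'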

Magic Differentiation

This part relies on the magic of automatic differentiation. In pytorch this magic is available via the autograd package. It underpins all the recent successes of neural networks, because with it you avoid the tedious calculation of derivatives by hand that would otherwise be necessary. Derivatives, in turn, are needed for stochastic gradient descent, the workhorse optimisation technique of deep learning. Without automatic differentiation there would be no deep learning renaissance. So let's do it. Let's use the magic.

In [0]:
x = tensor(3.,requires_grad=True)
y = tensor(7.,requires_grad=True)
z = x * y**5
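
Before taking derivatives, we can sanity-check the value of z itself (a quick check added here): z = 3 * 7**5 = 50421.

In [0]:
# sanity check: z = x * y**5 = 3 * 7**5
assert z.item() == 3 * 7**5, 'z should equal 50421'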
In [0]:
# calculate derivatives, with respect to scalars
z.backward()

We have

  • z = x*y**5

Therefore, by basic calculus, we know that the derivative of z:

  • with respect to x is y**5
  • with respect to y is x * 5 * y**4

Remark: In technical lingo these "derivatives with respect to ..." are known as partial derivatives.

Let's check whether we can calculate this using pytorch.

First we note some values

In [0]:
7**5, 5 * 7**4, 3 * 5 * 7**4
Out[0]:
(16807, 12005, 36015)

Let's check the derivative with respect to x

In [0]:
assert x.grad == tensor(16807), 'Derivative of z with respect to x should match 16807'
assert x.grad == tensor(7**5), 'Derivative of z with respect to x should match 7^5'
assert x.grad == y**5, 'Derivative of z with respect to x should match y^5'

Let's check the derivative with respect to y

In [0]:
assert y.grad == tensor(36015), 'Derivative of z with respect to y should match 36015'
assert y.grad == tensor(3 * 5 * 7**4), 'Derivative of z with respect to y should match 3 * 5 * 7**4'
assert y.grad == 3 * 5 * y**4, 'Derivative of z with respect to y should match 3 * 5 * y**4'
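
As an aside (a sketch not in the original notebook), the same partial derivatives can be obtained without .backward() by calling torch.autograd.grad, which returns them directly instead of accumulating them into .grad:

In [0]:
# alternative: compute the partial derivatives directly, without .backward()
x2 = tensor(3., requires_grad=True)
y2 = tensor(7., requires_grad=True)
z2 = x2 * y2**5
dz_dx, dz_dy = torch.autograd.grad(z2, (x2, y2))
assert dz_dx == tensor(7.**5), 'dz/dx should be y**5'
assert dz_dy == tensor(3 * 5 * 7.**4), 'dz/dy should be 5 * x * y**4'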

Let's ramp it up a little, calculating the derivatives of a function (version 1)

In [0]:
# first we initialise an x tensor (in this case a vector), starting at 0 and ending at 2*pi
# note: steps=100 makes the number of points explicit (recent pytorch versions require the steps argument)
x = torch.linspace(0, 2*math.pi, steps=100, requires_grad=True)

# next we initialise the y values as the sin of the x values
y = torch.sin(x)
In [0]:
# now we calculate derivatives
# https://stackoverflow.com/questions/55749202/getting-gradient-of-vectorized-function-in-pytorch
y.backward(torch.ones_like(x))  # this gradient argument is needed because y is not a scalar, see link above
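
Equivalently (an aside not in the original notebook), one can first reduce y to a scalar with .sum() and call backward on that; for this element-wise function the resulting gradients are the same as with torch.ones_like(x):

In [0]:
# equivalent formulation: reduce to a scalar first, then call backward
x_alt = torch.linspace(0, 2*math.pi, steps=100, requires_grad=True)
torch.sin(x_alt).sum().backward()
assert torch.allclose(x_alt.grad, torch.cos(x_alt).detach()), 'Gradient should match cos(x)'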
In [0]:
dy_vs_dx_version1 = x.grad  # collect the derivatives with respect to the x-values
In [0]:
# prepare data for plotting
# --- to plot we must transform pytorch tensors to numpy arrays
# --- by using .detach().numpy()
X = x.detach().numpy()
Y = y.detach().numpy()
dY_vs_dX_version1 = dy_vs_dx_version1.detach().numpy()

Let us plot the result using plotly

In [0]:
data = []  # the data for the plot is collected in a list of traces
trace0 = go.Scatter(x=X, y=Y, name='Sin')
trace1 = go.Scatter(x=X, y=dY_vs_dX_version1, name='Aut. diff of Sin', mode='markers')
trace2 = go.Scatter(x=X, y=np.cos(X), name='Cos', mode='lines')

data.append(trace0)
data.append(trace1)
data.append(trace2)

# the layout sets some important plotting aspects such as the title
layout = go.Layout(
    title='Automatic Differentiation of function',
    legend_orientation="h"
    )

# the figure has two inputs data and layout
fig = go.Figure(data=data, layout=layout)

fig.show()

As you can see by zooming in, the result of automatic differentiation faithfully matches the expected cos function. Hurray!

This is the magic in action!

Let's dress it up as an optimisation with a loss function (version 2)

We may also dress this calculation up in a fashion that is closer to the way neural network optimisations are done, namely by using a loss function. Note, however, that to obtain the desired derivative we need an atypical loss function: instead of, say, the mean squared error that is typical for regressions, we simply sum the differences, loss = sum(y_hat - y_true). With y_hat = sin(x) and y_true = 0, the derivative of this loss with respect to each x value is again cos(x), which is exactly the quantity we want.

In [0]:
# define a special loss function for this particular purpose
def myloss(y_hat, y): return (y_hat-y).sum()
In [0]:
x = torch.linspace(0, 2*math.pi, steps=100, requires_grad=True)
y = torch.sin(x)

Y_hat = y
Y_true = 0 * Y_hat

Define loss and calculate derivatives

In [0]:
loss = myloss(Y_hat, Y_true)

# now we calculate derivatives
loss.backward()
dy_vs_dx_version2 = x.grad  # collect the derivatives with respect to the x-values
In [0]:
Y_hat.grad  # Y_hat is not a leaf tensor, so no gradient is accumulated here and this returns None
In [0]:
# prepare data for plotting
# --- to plot we must transform pytorch tensors to numpy arrays
# --- by using .detach().numpy()
X = x.detach().numpy()
Y = y.detach().numpy()
dY_vs_dX_version2 = dy_vs_dx_version2.detach().numpy()
In [0]:
import plotly.graph_objects as go
from plotly import express as px

data = []  # the data for the plot is collected in a list of traces
trace0 = go.Scatter(x=X, y=Y, name='Sin')
trace1 = go.Scatter(x=X, 
                    y=dY_vs_dX_version1, 
                    name='Aut. diff of Sin',
                    mode='markers', 
                    marker=dict(
                        symbol='circle',
                        opacity=0.5,size=10)
                    )

trace2 = go.Scatter(x=X, y=np.cos(X), name='Cos', mode='lines')
trace3 = go.Scatter(x=X, 
                    y=dY_vs_dX_version2, 
                    name='Aut. diff of Sin loss', 
                    mode='markers', 
                    marker=dict(
                        symbol='star-triangle-up',
                        opacity=0.5, 
                        size=10))

data.append(trace0)
data.append(trace1)
data.append(trace2)
data.append(trace3)

# the layout sets some important plotting aspects such as the title
layout = go.Layout(title='Automatic Differentiation of function')

# the figure has two inputs data and layout
fig = go.Figure(data=data, layout=layout)

# this is a way of updating some figure parameters after the figure has been created
fig.update_layout(legend_orientation="h")

fig.show()

Again one can see that the result of the second automatic differentiation, using the loss function, faithfully matches the expected cos function. Hurray!
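
For a quick numerical confirmation beyond eyeballing the plot (a small check added here), both automatically computed derivatives agree with numpy's cosine to within floating point tolerance:

In [0]:
# numerical check: both versions of the derivative should match cos(x)
assert np.allclose(dY_vs_dX_version1, np.cos(X), atol=1e-5)
assert np.allclose(dY_vs_dX_version2, np.cos(X), atol=1e-5)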

This concludes the first part of the miniseries, pytorch – playing with tensors.

  1. Invertible matrix. Wikipedia. https://en.wikipedia.org/wiki/Invertible_matrix. Published February 6, 2020. Accessed February 6, 2020.
  2. Singular value decomposition. Wikipedia. https://en.wikipedia.org/wiki/Singular_value_decomposition. Published February 6, 2020. Accessed February 6, 2020.
  3. Challis A. PHP: nbconvert – A WordPress plugin for Jupyter notebooks. https://www.andrewchallis.co.uk/. Published May 1, 2019. Accessed February 6, 2020.