Warning: jsMath requires JavaScript to process the mathematics on this page.
If your browser supports JavaScript, be sure it is enabled.

Gradient Descent Methods

This tour explores the use of gradient descent method for unconstrained and constrained optimization of a smooth function

Contents

Installing toolboxes and setting up the path.

You need to download the following files: signal toolbox and general toolbox.

You need to unzip these toolboxes in your working directory, so that you have toolbox_signal and toolbox_general in your directory.

For Scilab user: you must replace the Matlab comment '%' by its Scilab counterpart '//'.

Recommandation: You should create a text file named for instance numericaltour.sce (in Scilab) or numericaltour.m (in Matlab) to write all the Scilab/Matlab command you want to execute. Then, simply run exec('numericaltour.sce'); (in Scilab) or numericaltour; (in Matlab) to run the commands.

Execute this line only if you are using Matlab.

getd = @(p)path(p,path); % scilab users must *not* execute this

Then you can add the toolboxes to the path.

getd('toolbox_signal/');
getd('toolbox_general/');

Gradient Descent for Unconstrained Problems

We consider the problem of finding a minimum of a function \(f\), hence solving \[ \umin{x \in \RR^d} f(x) \] where \(f : \RR^d \rightarrow \RR\) is a smooth function.

Note that the minimum is not necessarily unique. In the general case, \(f\) might exhibit local minima, in which case the proposed algorithms is not expected to find a global minimizer of the problem. In this tour, we restrict our attention to convex function, so that the methods will converge to a global minimizer.

The simplest method is the gradient descent, that computes \[ x^{(k+1)} = x^{(k)} - \tau_k \nabla f(x^{(k)}), \] where \(\tau_k>0\) is a step size, and \(\nabla f(x) \in \RR^d\) is the gradient of \(f\) at the point \(x\), and \(x^{(0)} \in \RR^d\) is any initial point.

In the convex case, if \(f\) is of class \(C^2\), in order to ensure convergence, the step size should satisfy \[ 0 < \tau_k < \frac{2}{ \sup_x \norm{Hf(x)} } \] where \(Hf(x) \in \RR^{d \times d}\) is the Hessian of \(f\) at \(x\) and \( \norm{\cdot} \) is the spectral operator norm (largest eigenvalue).

Gradient Descent in 2-D

We consider a simple problem, corresponding to the minimization of a 2-D quadratic form \[ f(x) = \frac{1}{2} \pa{ x_1^2 + \eta x_2^2, } \] where \( \eta>0 \) controls the anisotropy, and hence the difficulty, of the problem.

Anisotropy parameter \(\eta\).

eta = 10;

Function \(f\).

f = @(x)( x(1)^2 + eta*x(2)^2 ) /2;

Background image of the function.

t = linspace(-.7,.7,101);
[u,v] = meshgrid(t,t);
F = ( u.^2 + eta*v.^2 )/2 ;

Display the function as a 2-D image.

clf; hold on;
imagesc(t,t,F); colormap jet(256);
contour(t,t,F, 20, 'k');
axis off; axis equal;

Gradient.

Gradf = @(x)[x(1); eta*x(2)];

The step size should satisfy \(\tau_k < 2/\eta\). We use here a constrant step size.

tau = 1.8/eta;

Exercice 1: (the solution is exo1.m) Perform the gradient descent using a fixed step size \(\tau_k=\tau\). Display the decay of the energy \(f(x^{(k)})\) through the iteration. Save the iterates so that X(:,k) corresponds to \(x^{(k)}\).

exo1;

Display the iterations.

clf; hold on;
imagesc(t,t,F); colormap jet(256);
contour(t,t,F, 20, 'k');
h = plot(X(1,:), X(2,:), 'k.-');
set(h, 'LineWidth', 2);
set(h, 'MarkerSize', 15);
axis off; axis equal;

Exercice 2: (the solution is exo2.m) Display the iteration for several different step sizes.

exo2;

Gradient and Divergence of Images

Local differential operators like gradient, divergence and laplacian are the building blocks for variational image processing.

Load an image \(g \in \RR^N\) of \(N=n \times n\) pixels.

n = 256;
g = rescale( load_image('lena',n) );

Display it.

clf;
imageplot(g);

For continuous functions, the gradient reads \[ \nabla g(x) = \pa{ \pd{g(x)}{x_1}, \pd{g(x)}{x_2} } \in \RR^2. \] (note that here, the variable \(x\) denotes the 2-D spacial position).

We discretize this differential operator using first order finite differences. \[ (\nabla g)_i = ( g_{i_1,i_2}-g_{i_1-1,i_2}, g_{i_1,i_2}-g_{i_1,i_2-1} ) \in \RR^2. \] Note that for simplity we use periodic boundary conditions.

Compute its gradient, using finite differences.

s = [n 1:n-1];
grad = @(f)cat(3, f-f(s,:), f-f(:,s));

One thus has \( \nabla : \RR^N \mapsto \RR^{N \times 2}. \)

v = grad(g);

One can display each of its components.

clf;
imageplot(v(:,:,1), 'd/dx', 1,2,1);
imageplot(v(:,:,2), 'd/dy', 1,2,2);

One can also display it using a color image.

clf;
imageplot(v);

One can display its magnitude \(\norm{\nabla g(x)}\), which is large near edges.

clf;
imageplot( sqrt( sum3(v.^2,3) ) );

The divergence operator maps vector field to images. For continuous vector fields \(v(x) \in \RR^2\), it is defined as \[ \text{div}(v)(x) = \pd{v_1(x)}{x_1} + \pd{v_2(x)}{x_2} \in \RR. \] (note that here, the variable \(x\) denotes the 2-D spacial position). It is minus the adjoint of the gadient, i.e. \(\text{div} = - \nabla^*\).

It is discretized, for \(v=(v^1,v^2)\) as \[ \text{div}(v)_i = v^1_{i_1+1,i_2} - v^1_{i_1,i_2} + v^2_{i_1,i_2+1} - v^2_{i_1,i_2} . \]

t = [2:n 1];
div = @(v)v(t,:,1)-v(:,:,1) + v(:,t,2)-v(:,:,2);

The Laplacian operatore is defined as \(\Delta=\text{div} \circ \nabla = -\nabla^* \circ \nabla\). It is thus a negative symmetric operator.

delta = @(f)div(grad(f));

Display \(\Delta f_0\).

clf;
imageplot(delta(g));

Check that the relation \( \norm{\nabla f} = - \dotp{\Delta f}{f}. \)

dotp = @(a,b)sum(a(:).*b(:));
fprintf('Should be 0: %.3i\n', dotp(grad(g), grad(g)) + dotp(delta(g),g) );
Should be 0: 1.933e-12

Gradient Descent in Image Processing

We consider now the problem of denoising an image \(y \in \RR^d\) where \(d = n \times n\) is the number of pixels (\(n\) being the number of rows/columns in the image).

Add noise to the original image, to simulate a noisy image.

sigma = .1;
y = g + randn(n)*sigma;

Display the noisy image \(y\).

clf;
imageplot(clamp(y));

Denoising is obtained by minimizing the following functional \[ \umin{x \in \RR^d} f(x) = \frac{1}{2} \norm{ y-x}^2 + \la J(x) \] where \(J(x)\) is a smoothed total variation of the image. \[ J(x) = \sum_i \norm{ (G x)_i }_{\epsilon} \] where \( (Gx)_i \in \RR^2 \) is an approximation of the gradient of \(x\) at pixel \(i\) and for \(u \in \RR^2\), we use the following smoothing of the \(L^2\) norm in \(\RR^2\) \[ \norm{u}_\epsilon = \sqrt{ \epsilon^2 + \norm{u}^2 }, \] for a small value of \(\epsilon>0\).

The gradient of the functional read \[ \nabla f(x) = x-y + \lambda \nabla J(x) \] where the gradient of the smoothed TV norm is \[ \nabla J(x)_i = G^*( u ) \qwhereq u_i = \frac{ (G x)_i }{\norm{ (G x)_i }_\epsilon} \] where \(G^*\) is the adjoint operator of \(G\) which corresponds to minus a discretized divergence.

Value for \(\lambda\).

lambda = .3/5;

Value for \(\epsilon\).

epsilon = 1e-3;

TV norm.

Amplitude = @(u)sqrt(epsilon^2 + sum(u.^2,3));
J = @(x)sum(sum(Amplitude(Grad(x))));

Function to minimize.

f = @(x)1/2*norm(x-y,'fro')^2 + lambda*J(x);

Gradient of TV norm. Note that div implement \(-G^*\).

Normalize = @(u)u./repmat(Amplitude(u), [1 1 2]);
GradJ = @(x)-div( Normalize(grad(x)) );

Gradient of the functional.

Gradf = @(x)x-y+lambda*GradJ(x);

The step size should satisfy \[ 0 < \tau_k < \frac{2}{ 1 + 4 \lambda / \epsilon }. \] Here we use a slightly larger step size, which still work in practice.

tau = 1.8/( 1 + lambda*8/epsilon );
tau = tau*4;

Exercice 3: (the solution is exo3.m) Implement the gradient descent. Monitor the decay of \(f\) through the iterations.

exo3;
Warning: Could not find an exact (case-sensitive) match for 'Grad'.
/Users/gabrielpeyre/Documents/Dropbox/numerical-tour/matlab/toolbox_signal/grad.m is a
case-insensitive match and will be used instead.
You can improve the performance of your code by using exact
name matches and we therefore recommend that you update your
usage accordingly. Alternatively, you can disable this warning using
warning('off','MATLAB:dispatcher:InexactCaseMatch').
This warning will become an error in future releases. 

Display the resulting denoised image.

clf;
imageplot(clamp(x));