Video Coding
This numerical tour explores video coding, in partiular optical flow computation and residual coding.
Contents
Installing toolboxes and setting up the path.
You need to download the following files: signal toolbox and general toolbox.
You need to unzip these toolboxes in your working directory, so that you have toolbox_signal and toolbox_general in your directory.
For Scilab user: you must replace the Matlab comment '%' by its Scilab counterpart '//'.
Recommandation: You should create a text file named for instance numericaltour.sce (in Scilab) or numericaltour.m (in Matlab) to write all the Scilab/Matlab command you want to execute. Then, simply run exec('numericaltour.sce'); (in Scilab) or numericaltour; (in Matlab) to run the commands.
Execute this line only if you are using Matlab.
getd = @(p)path(p,path); % scilab users must *not* execute this
Then you can add the toolboxes to the path.
getd('toolbox_signal/'); getd('toolbox_general/');
Loading Warped Images
To evaluate the performance of optical flow computation, we compute a pair of image obtained by a smooth warping (small deformation), here a simple rotation.
Original frame #1.
n = 256;
name = 'lena';
M1 = rescale( load_image(name,n) );
The second image M2 is obtaind by rotating the first one.
% angle of rotation theta = .03 * pi/2; % original coordinates [Y,X] = meshgrid(1:n,1:n); % rotated coordinates X1 = (X-n/2)*cos(theta) + (Y-n/2)*sin(theta) + n/2; Y1 =-(X-n/2)*sin(theta) + (Y-n/2)*cos(theta) + n/2; % boundary handling X1 = mod(X1-1,n)+1; Y1 = mod(Y1-1,n)+1; % interpolation M2 = interp2(Y,X,M1,Y1,X1); M2(isnan(M2)) = 0;
Display the two images.
clf; imageplot(M1, 'Frame #1', 1,2,1); imageplot(M2, 'Frame #2', 1,2,2);
Optical Flow Computation with Block Matching
An optical flow is a vector field that describes the movement between to consecutive frames of the video.
The flow can be computed by block matching. A block of (2*k+1,2*k+1) pixels in frame 1 around a location (x,y) is compared to the blocks at locations (x+dx,y+dy) for -q<=dy,dx<=q in the frame 2.
% width of the block w = 8; % search width q = 4; % sub-pixelic search if <1 dq = .5;
Number of flow vector is m^2.
m = ceil(n/w);
Precompute movements vectors.
[X0,Y0,dX,dY] = ndgrid( 0:w-1, 0:w-1, -q:dq:q,-q:dq:q); [dy,dx] = meshgrid(-q:dq:q,-q:dq:q);
Start with empty optical flow. Each f=F(x,y,:) is a 2D vector mapping the patch at location (x,y) to the patch (x+f(1),y+f(2).
F = zeros(n,n,2);
Example of block number for wich the flow is computed. Each index should be less than m
i = 3; j = 40;
Pixel numbers.
x = (i-1)*w+1; y = (j-1)*w+1;
Block pixels index.
selx = clamp( (i-1)*w+1:i*w, 1,n); sely = clamp( (j-1)*w+1:j*w, 1,n);
A special care should be taken at the boundary : we simply clamp values outside boundaries
X = clamp(x + X0 + dX,1,n); Y = clamp(y + Y0 + dY,1,n);
Compute base patch of M2 at which the flow is computed.
P2 = M2(selx,sely);
Compute patches of M1 that are matched. Use interpolation to handle non indeger pixel indexes.
P1 = interp2( 1:n,1:n, M1, Y,X );
Compute the distance between P1 and all the patches of P2.
d = sum(sum( (P1-repmat(P2,[1 1 size(P1,3) size(P1,4)])).^2 ) );
Compute best match and report its value.
[tmp,I] = compute_min(d(:)); F(selx,sely,1) = dx(I); F(selx,sely,2) = dy(I);
Exercice 1: (the solution is exo1.m) Compute the whole optical flow F, by cycling through the pixels.
exo1;
Display the flow as a color image and as arrows.
clf; imageplot(F, '', 1,2,1); subplot(1,2,2); t = w/2 + ((0:m-1)*w); [V,U] = meshgrid(t,t); hold on; imageplot(M1); quiver(t,t,F(1:w:n,1:w:n,2), F(1:w:n,1:w:n,1)); axis('ij');
Residual Computation
The optical flow F allows one to compute the residual R between frame M2 and an extrapolated version of M1 along the flow F.
One can translate the first frame M1 along the flow F.
% compute the grid, translated along the flow [Y,X] = meshgrid(1:n,1:n); X = clamp(X+F(:,:,1),1,n); Y = clamp(Y+F(:,:,2),1,n); % compute the first fame, translated along the flow Ms = interp2( 1:n,1:n, M1, Y,X );
One can compare the residual with and without the flow
% residual without flow R0 = M2-M1; % residual along the flow R = M2-Ms; % ensure same dynamic range (just for display) v = max( [max(abs(R0(:))) max(abs(R(:)))] ); R(1)=v; R(2)=-v; R0(1)=v; R0(2)=-v; % display clf; imageplot(R0, 'Residual without flow', 1,2,1); imageplot(R, 'Residual with flow', 1,2,2);
Wavelet Coding
This is an intermediate section that explains how to perform a wavelet coding of an image. This will be usefull to code the frame and the residual in the video coder.
We simulate a wavelet coding of an image (might be M(:,:,i) or a residual) by counting the number of bits of quantized wavelet coefficients.
An image to code.
A = M1;
Compute its wavelet coefficients.
Jmin = 4; AW = perform_wavelet_transf(A, Jmin, +1);
Step size for coding.
T = .2;
Quantize the coefficients.
Quantized integer values.
AWi = floor(abs(AW/T)).*sign(AW);
De-quantized values from vI
AWq= sign(AWi) .* (abs(AWi)+.5) * T;
Approximated image.
A1 = perform_wavelet_transf(AWq, Jmin, -1);
Probability distribution of the quantized coefficients.
t = min(AWi(:)):max(AWi(:)); h = hist(AWi(:), t); h = h/sum(h);
Estimate the number of bits using the entropy lower bound.
h = max(h,1e-9); % avoid division by 0
nbits = -sum( h.*log2(h) )*n^2;
Display.
clf; imageplot(A, 'Original', 1,2,1); imageplot(clamp(A1), [num2str(nbits/n^2,2) 'bpp, SNR=' num2str(snr(A,A1),3) 'dB'], 1,2,2);
Video Data.
A video data M is a [n,n,p] matrix where n is the dimension of the image and p is the number of frames.
We load 3 consecutive frames from a video.
% number of frames in the video p = 3; % load each frame name = 'rubik'; n = 200; M = []; for i=1:p A = load_image(strcat([name num2str(i)]),n); if nb_dims(A)==3 % remove color information A = mean(A,3); end M(:,:,i) = A; end M = rescale(M);
Display the frames.
clf; imageplot(M(:,:,1), 'Frame #1', 1,3,1); imageplot(M(:,:,2), 'Frame #2', 1,3,2); imageplot(M(:,:,3), 'Frame #3', 1,3,3);
One can compute the residual between two frames of the video.
clf; imageplot(M(:,:,1)-M(:,:,2), 'Residual 1', 1,2,1); imageplot(M(:,:,2)-M(:,:,3), 'Residual 2', 1,2,2);
Wavelet Video Coding
We describe here the step of a prototype video coding, that compresses using wavelets the first frame and the residual, and code apart the flow.
We code these two frames (you might want to try on your own data).
M1 = M(:,:,1); M2 = M(:,:,2);
Exercice 2: (the solution is exo2.m) Compute the optical flow F between the two frames using a given window size w. The number of bits needes to code the flow is nbitsF = 2*m^2*log2(2*q/dq+1).
exo2;
Exercice 3: (the solution is exo3.m) Compress the frames M1 and M2 using a threshold T to obtain compresed images M1c and M2c together with a number of bits nbitsM1 and nbitsM2.
exo3;
Exercice 4: (the solution is exo4.m) Compute the residual R=M2-M1cF between M2 and the warped version M1cF of M1c along the flow F. Note that you should use M1c and not M1.
exo4;
Exercice 5: (the solution is exo5.m) Compress the residual R using wavelet and a threshold lambda*T, where lambda is an constant that should be tuned to give the best results. You can start by using lambda=1/2 for instance. Record the number of bits nbitsR needed to code R and the decompressed approximation Rc.
exo5;
Exercice 6: (the solution is exo6.m) Perform the decompression by computing the decompressed version of M2 along the flow, M2cF=M1cF+Rc. Record the approximation errors for the video coder EVid = norm(M1-M1c,'fro')^2+norm(M2-M2cF,'fro')^2 and with the trivial coder (that code M1 and M2 separately) EWav = norm(M1-M1c,'fro')^2+norm(M2-M2c,'fro')^2. Store also the number of bits nbitsVid=nbitsM1+nbitsR+nbitsF and nbitsWav=nbitsM1+nbitsM2.
exo6;
Exercice 7: (the solution is exo7.m) Draw the SNR rate distortion curves -10*log10(Evid/E) and -10*log10(Ewav/E), where E=norm(M1,'fro')^2+norm(M2,'fro')^2 and a function of the number of bits nbitsVid and nbitsWav. This is achieved by varying the value of the threshold T. Check the influence of the parameters, namely w and lambda on the result.
exo7;