0%
April 16, 2022

Discrete Cosine Transform and JPEG Compression Implementation in Python

math

python

Introduction: Why Care DCT?

We have been using Discrete Cosine Transform (DCT) without possibly awaring of it: DCT is actually the JPEG compression standard!

In audio analysis, DCT is used in computing MFCC coefficients to extract feature vector (for more on MFCCs, see [KC] in reference section), which makes deep learning on sound possible by

  • Treating the sequence of MFCC coefficients as a time sequence of data (at each "window" of suitable "hop length", we get a float vector of fixed size) or;

  • Stacking the MFCC coefficients and treat it as an image.

In the former case the use of sequence model such as LSTM or Transformer becomes possible. The later case makes it possible to use convolution layer such as Conv1D in Keras.

Basic Knowledge to Work With

Definitions

In what follows for we will denote

For each fixed we will consider as a function on a discrete domain. The vectors

form an orthonormal basis in , by stacking all them together column by column

then is orthonormal: .

The orthogonality of and the construction of the basis will be explained in Mathematics behind DCT section. For now let's enjoy the coding and see the results.

Computations

We make explicit calculations in order to make readers comfortable with the definition above.

First, given a gray-scale image there are always unique coefficients such that

In fact by using defined in the previous section. Denote also as the matrix with entry at -th row and -th column, if we define , then

You can expand the RHS () to convince yourself that eventually you get the same expression as in . On the other hand, by using this we have the reverse:

as this is not a summation of 's of fixed frequencies, that's why we don't define .

Our coding will strictly follow the notation and calculation in this section.

Implementation of JPEG Compression in Python

For readers who are more familiar with matlab, you may also follow the youtube video listed in [ET]. Most of the content is a translation from matlab to python based on my understanding.

Notation and definition may be different from the video.

Basic Import
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import img_to_array, load_img
%matplotlib inline
Constants
N = 8
IMAGE_PATH = "512_512_image.jpg"
Basis of Cosine Transform
def range2D(n):
  indexes = []
  for i in range(0, n):
    for j in range(0, n):
      indexes.append((i, j))
  return indexes

def alpha(p):
  return 1/np.sqrt(N) if p == 0 else np.sqrt(2/N)

def cos_basis(i,j):
  def cal_result(a):
    x=a[0]
    y=a[1]

    return alpha(i) * alpha(j) * np.cos((2*x+1) * i * np.pi/(2*N)) *  np.cos((2*y+1) * j * np.pi/(2*N))
  return cal_result
Plot of the 2D Cosine Basis Functions
fig = plt.figure()
fig.set_figheight(12)
fig.set_figwidth(12)
xy_range = range2D(N)
xy_plane_NxN = (np.array(xy_range).reshape(N,N,2))

for i in range(0, N):
  for j in range(0, N):
    xy_result = np.apply_along_axis(cos_basis(i,j), -1, xy_plane_NxN)
    plt.subplot(N, N, 1 + N*i + j)
    plt.imshow(xy_result, cmap="gray")
Compute Coefficients for Cosine Transform

Let's prepare the following matrix that convert the original pixels into compacted energy distribution:

U = np.zeros((N,N))

for i in range(0, N):
  for j in range(0, N):
    U[i, j] = alpha(j) * np.cos((2*i+1)*j*np.pi/(2*N))

U_t =  U.transpose()

You may check that U is indeed orthonormal by:

np.matmul(U, U_t)

# Result:
# array([[ 1.00000000e+00, -1.01506949e-16, -3.78598224e-17,
#         -1.03614850e-17, -1.33831237e-16,  7.88118312e-17,
#          2.00856868e-16, -1.70562493e-16],
#        [-1.01506949e-16,  1.00000000e+00,  1.70298111e-17,
#         -1.13980777e-17,  1.37840552e-16, -2.88419699e-16,
#         -1.24636773e-16,  2.51800326e-17],
#        [-3.78598224e-17,  1.70298111e-17,  1.00000000e+00,
#         -1.61419161e-16, -2.81922100e-17, -1.25319880e-16,
#         -1.83252893e-16, -2.79432802e-17],
#        [-1.03614850e-17, -1.13980777e-17, -1.61419161e-16,
#          1.00000000e+00, -1.92659768e-17,  2.75108042e-16,
#          2.02487213e-16,  2.22263103e-18],
#        [-1.33831237e-16,  1.37840552e-16, -2.81922100e-17,
#         -1.92659768e-17,  1.00000000e+00, -8.54966705e-17,
#         -2.36183557e-17, -3.71508058e-16],
#        [ 7.88118312e-17, -2.88419699e-16, -1.25319880e-16,
#          2.75108042e-16, -8.54966705e-17,  1.00000000e+00,
#          1.21575873e-16,  8.41357861e-17],
#        [ 2.00856868e-16, -1.24636773e-16, -1.83252893e-16,
#          2.02487213e-16, -2.36183557e-17,  1.21575873e-16,
#          1.00000000e+00,  3.92417644e-16],
#        [-1.70562493e-16,  2.51800326e-17, -2.79432802e-17,
#          2.22263103e-18, -3.71508058e-16,  8.41357861e-17,
#          3.92417644e-16,  1.00000000e+00]])
Mask or Truncate the Energy Distribution

Here we use a naive approach, we screen out the fourier coefficents by simply retaining those from the upper-left corner. For that, we create a mask:

def create_mask(closure):
  mask = np.zeros((N, N))
  for i in range(N):
    for j in range(N):
      if closure(i,j):
        mask[i, j] = 1
  return mask

For example, by calling mask = create_mask(lambda i,j: 0<=i+j<=2) our mask will be like:

[[1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]]
Result: Apply Cosine Transform
original = load_img(IMAGE_PATH, target_size=(512, 512), color_mode="grayscale")
original = img_to_array(original).astype("uint8")
original = np.squeeze(original)
original.shape

mask = create_mask(lambda i,j: 0<=i+j<=2)
print(mask)
steps = int(original.shape[0]/N) # we choose 512x512 image, therefore steps = 64
print(steps)
compressed = np.zeros_like(original)

for x in range(steps):
  for y in range(steps):
    sub_pixels = original[N * y: N*(y+1), N * x: N*(x+1)]
    sub_pixels = sub_pixels - 127.5
    fourier_coefficients = np.matmul(np.matmul(U_t, sub_pixels), U)
    fourier_coefficients = fourier_coefficients * mask
    reverted_pixels = np.matmul(np.matmul(U, fourier_coefficients), U_t)
    reverted_pixels = reverted_pixels + 127.5
    compressed[N * y: N*(y+1), N * x: N*(x+1)] = reverted_pixels

fig.set_figheight(20)
fig.set_figwidth(20)
plt.subplot(1, 2, 1)
plt.imshow(original, cmap="gray")
plt.subplot(1, 2, 2)
plt.imshow(compressed, cmap="gray")
plt.savefig("DCT_result", dpi=200, bbox_inches="tight")

Mathematics behind DCT

Notation. Every vector will be considered as a column vector whenever the computation does not make sense when they are represented as a row.

From DFT

Let be an integer and denote , define then the matrix

is orthogonal simply because their inner product

is when , where . Therefore the basis forms an orthogonal basis in .

But how to find a similar basis in ? It is not as simple as taking the real part of the linear combination since the coordinate even .

DCT-2
Strategy of Construction

Consider the symmetric second difference matrix:

which arises when a second derivate is approximated by the central second difference:

as .

Let denote the values of a function evaluated on the discretized domain of , i.e., for some , where : means stacking values by running through all indexes at that position.

Our cosine transform is based on decomposing a function into a linear combination of cosine functions (of different frequencies) in discrete case. To find them, we will choose such that . By the fact that:

Fact. Eigenvectors associated to different eigenvalues are linearly independent.

we then obtain a set of eigenvectors (of different eigenvalues) solving the approximated problem

Which function would solve the ODE ? Which is the trigonometric function!

Why and ?

These two assignments are determined by imposing boundary conditions. More specific, we extend the domain from to , imagine we are solving (append one point at the beginning and the tail of respectively).

We have not imposed any boundary condition to our solution yet. We will require our function be symmetric at (or at if , ), this implies .

At another boundary due to the shift above, we try to require , this implies .

and plug this condition into the second difference formula to get:

These become the necessary condition and thus have determined our and , and therefore ensure our solution is a cosine function.

Different values of 's and 's will correspond to different boundary condition imposed on for other combinations, see [GS] for a complete classification.

The solution corresponding to and is usally called the basis of DCT-2 (or simply DCT).

Derivation of the Basis of Discrete Cosine Transform

We have spent many effort to determine the matrix to work with, let's start with computing the basis directly for our only candidate:

Let for some , denote

note that we have whenever for integer . As we will see this is indeed the case later for .

Since , then for , coordinate-wise:

Therefore for whatever linear we choose (here ).

By taking real part on both sides, as is real, we have

here denotes the standard basis in . It remains to require satisfies and , from which we can determine and thus .

The equation implies , since is even, it is sufficient to require

The equation implies

for this to hold, it is sufficient to require (as is always symmetric about for every ), altogether we have , and thus for ,

Finally, we have

for suitably chosen 's, .

Reference