Morphing images using Linear Algebra
(This post is supposed to be the second for this year after a period of inactivity. But I could not decide how to present a light introduction to Linear Algebra, so I decided to just complete this one. We will tackle a unique application of linear algebra.)
Linear Algebra is everywhere in ML, so I had to re-learn the former to understand the inner workings of the latter. I studied linear algebra in my engineering undergrad over two decades ago. The discussions then were very theoretical and abstract. I just accepted the process of matrix and vector calculations as they were presented. I rarely used them in computer programming. The one time I used them was a few months in early 2000 when I needed affine transformations for wireframe graphics. I re-encountered matrices and vectors, and the magic of vectorized operations in Numpy, while studying ML in 2016. Before then, code of this sort was crudely accomplished with for-loops.
Today, we will use linear algebra to morph images, dissolving one image while slowly fading in a new one. We commonly see this in scene transition effects in movies. It sounds like a hard process, but it is not. And linear algebra makes it even easier. For illustration purposes, let's limit ourselves to morphing as-is, without the far more complicated problem of mapping/re-mapping corresponding pixels.
So, let's start loading standard libraries, including the Image class of PIL.
# import libraries
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
#from scipy import misc
from PIL import Image
Let's load our test images. To make this simple, we will only work with grayscale images. We will also invert pixel values, else we will have a film-negative type of image.
im1=Image.open('img1.jpg')
im2=Image.open('img2.jpg')
y,x=im1.size
im_dim=(x,y)
img1,img2=np.array(im1),np.array(im2)
# convert RGB to grayscale
r,g,b=img1[:,:,0],img1[:,:,1],img1[:,:,2]
img1=0.2989*r+0.5870*g+0.1140*b
r,g,b=img2[:,:,0],img2[:,:,1],img2[:,:,2]
img2=0.2989*r+0.5870*g+0.1140*b
# invert colors
img1=np.abs(img1-255)
img2=np.abs(img2-255)
# flatten arrays
img1=img1.flatten()
img2=img2.flatten()
How to morph...
To morph, we could mix the two images using different 'amounts' from each image. We should do this for each pixel, but since we know that Numpy handles the correct pixel pairings, we can just do the calculation at the array level, which is also faster than looping through all the image pixels.
The middle morphed image would be a 50-50 split, 50% from image1 and 50% from image2. We could then implement the morphing by calculating and displaying these mixed images. If we want 10 transition morphs between the two images, we can simply run 10 iterations and calculate mixing proportions per run. For good measure, let's do it in both directions.
def morph_mixing(img1,img2,img3,im_dim,num_pix):
plt.figure(figsize=(num_pix,num_pix))
for i in range(num_pix):
ax=plt.subplot(1,num_pix,i+1)
ax.set_axis_off()
# mixing proportions
t=i/max(range(num_pix))
a=(1-t)*img1+t*img2
a=np.reshape(a,(im_dim))
ax.imshow(a,cmap='Greys',interpolation='nearest')
plt.show()
morph_mixing(img1,img2,img2-img1,im_dim,11)
morph_mixing(img2,img1,img1-img2,im_dim,11)
It is easy to animate this. This effect is even more striking in its subtlety when morphing two faces with facial features aligned!
Abstracting images into points
We could think of the proportionate mixing geometrically. The two images are, in effect, points in hyperspace, in this case, a hyperspace whose the number of dimensions is the image width times the image height. Each pixel is a dimension. Each dimension can have grayscale values of 0-255.
The mixing calculation is similar to moving through a line that connects the two image points: as we move through the line, we also change the mixing weights proportionate to our distance to either end points. Geometrically, this is also similar to calculating the proportions for each dimension individually, then reassembling all the coordinates. This per-component calculation is implied in the above code. The mixing weight $t$ and $1-t$ is broadcast to each pixel entry in the image arrays img1 and img2.
Enter Linear Algebra
In a linear algebra world, the image points can be represented by position vectors, so the above code is basically a scalar times a vector. Recall also that broadcasting in Numpy is exactly how scalar-vector multiplication works in Linear Algebra.
We can take this further into 'purer' linear algebra. If we think of each image as a vector in hyperspace, by vector addition, there is a vector $\mathbf v_3$ that connects the vector heads of the two images' position vectors (that was a mouthful, but parse it slowly). That is, $\mathbf v_1+\mathbf v_3=\mathbf v_2$, or $\mathbf v_2+\mathbf v_3=\mathbf v_1$ if $\mathbf v_3$ is the opposite direction. This vector $\mathbf v_3$ is therefore equal to $\mathbf v_2-\mathbf v_1$, or $\mathbf v_1-\mathbf v_2$ depending on the direction desired. This vector $\mathbf v_3$ is collinear with a line (the vector is effectively a line if stretched on both ends). We can then traverse through this n-dimensional line easily, exactly like we did in the mixing weights calculation above.
So, how do we define this line $\ell_3$ in linear algebra? Easy peasy. It is:
$$\begin{align}\ell_3 &=\{\mathbf v_1 + t*(\mathbf v_2-\mathbf v_1)\,:\, t \in \mathbb R \}\\ &=\{\mathbf v_2 + t*(\mathbf v_1-\mathbf v_2)\,:\, t \in \mathbb R \}\end{align}$$Notice that we add one of the vectors $\mathbf v_1$ or $\mathbf v_2$ to shift the difference vector to the tip of the head of either $\mathbf v_1$ or $\mathbf v_2$.
If you need a review of the parameterized equation of a line --and Linear Algebra in general-- I highly recommend Khan Academy's Linear Algebra course and OCW Scholar's Intro to Linear Algebra (note the "Scholar" qualifier; Prof. Strang's course has been elevated to a special status among the OCW universe; you might still find previous versions on YouTube or on OCW itself).
We know that any point defined by a vector can be reached by scaling another vector, as long as they are collinear. Or put another way, we can multiply a scalar to a vector to produce any arbitrary vector (length). In our case, we need the line segment from $\mathbf v_1$ to $\mathbf v_2$, which is via $\mathbf v_3=\mathbf v_2-\mathbf v_1$.
From $\mathbf v_1$ to $\mathbf v_2$ through $\mathbf v_3$, if we multiply 0.5 to $\mathbf v_3$, we should be halfway, and if we multiply by 1.0, we should reach $\mathbf v_2$. This is exactly the same as the above mixing weights solution.
$$\mathbf v_3=\mathbf v_2-\mathbf v_1=\{\mathbf v_1 + t*(\mathbf v_2-\mathbf v_1)\,:\, t \in [0,1] \}$$Let's write the code for this. While we are writing the code, let's also write this in a mathematical set notation (i.e., set builder notation/list comprehension to generate the images line segment).
def display_vector(img,im_dim,num_pix):
plt.figure(figsize=(img.shape[0]*1,img.shape[0]*1))
for i,im in enumerate(img):
ax=plt.subplot(1,len(img),i+1)
ax.set_axis_off()
a=np.reshape(im,(im_dim))
ax.imshow(a,cmap='Greys',interpolation='nearest')
plt.show()
# build a line segment/set of 'point images' via a parameterized line equation
img3_set=np.array([img1+t*(img2-img1) for t in np.linspace(0,1,num=11,endpoint=True)])
display_vector(img3_set,im_dim,11)
# reverse
img3_set=np.array([img2+t*(img1-img2) for t in np.linspace(0,1,num=11,endpoint=True)])
display_vector(img3_set,im_dim,11)
This result matches our previous image sequence.
Closing thoughts
The above example is a very simple use of linear algebra. To the uninitiated, the morphing mechanism appears complex, but a mathematical abstraction makes it straightforward to calculate. The application of basic linear algebra concepts makes this process even easier and faster. This technique can be generalized to any image. There are some refinements needed to handle RGB, but the mechanism is similar.
In the following post, we will explore what we can do with this difference vector. Here's a clue: we can hide an image under another image (sort of obvious from the above picture sequence). I was not thinking of hiding images and has never explored the techniques around image masking, but the possibility jumped at me after thinking through its algebra. So, while I did not look forward to re-learning this subject, I found it satisfying that a linear algebra abstraction made this topic, and extended thought experiments, easy to visualize.
No comments:
Post a Comment