Wednesday, December 28, 2016

One algorithm learning another: kNN as an edge detector, part 1


'Learning' an algorithm: kNN as an edge detector

Let's start with a question: can a machine learning algorithm 'learn' another algorithm just by looking at examples?

This is a broad question, and the answer is equally broad: yes, no, or maybe, depending on the process. Let's narrow this down to something fairly simple, but not so simple that it is an easily predictable process.

Let's look at edge detection, a very common problem in computer vision and image processing. Given an image, we have to find (contrasting) edges. Ideally, the edges should also outline distinct objects to help object identification. The more realistic, and easier, challenge is to at least establish the location of possible edges (that is, the high-contrast gradients in the pixels).

Edge detection algorithms

There are many algorithms out there. For this problem, we will focus on two: the Sobel and Canny edge detectors. We will not discuss the inner workings and merits of these algorithms, and that's the point: we hope to 'learn' these edge detectors just by looking at their output relative to the original image.

Since we know the algorithms are calculated exactly, we at least know that there has to be a pattern. The algorithms could also be probabilistic (I do not recall whether they have a probabilistic component in how they use a Gaussian convolution, if at all; frankly, I am going into this problem blind, without checking exactly how these algorithms work). As long as an algorithm follows a discernible pattern, we should be able to apply machine learning to it. This therefore sounds like an easy problem.

It could also be a difficult problem. Maybe the patterns are so similar that the guesses end up being wrong. After all, images are rather noisy and filled with unpredictably oriented edges. So this might be quite a challenge.

How do we go about this?

This should be an easy implementation. First, we run an edge detector. Then we extract a small subset of the image as our training set, in two parts: one for the predictions (labels), and one for the pixel inputs/features. For ease of computation, we settle on a 3x3 grid around each pixel, so we have nine features (including the center pixel itself). For each center pixel, we extract the prediction value from the edge detector's output image. We then train our algorithm and test it against the entire image (including the small training subset).
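As a rough sketch of the data layout (pixel_row is a hypothetical helper, not part of the implementation below; it assumes grayscale numpy arrays), each training row pairs the detector's label for a pixel with that pixel's flattened 3x3 neighborhood:

import numpy as np

# hypothetical sketch: one training row for the pixel at (i,j)
def pixel_row(image,label_image,i,j):
    features=image[i-1:i+2,j-1:j+2].flatten()   # the 3x3 neighborhood, row-major
    label=label_image[i,j]                      # edge detector output at the center
    return label,features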

Step 1: Load image and run edge detector

We use the sobel and canny functions from skimage. The Sobel filter produces a real number for each pixel, i.e., a varying degree of being an edge (skimage's canny, for its part, already returns a binary mask, though we run it through the same thresholding step for uniformity). We could keep real-valued outputs and use a regression algorithm, but since we want to make this easy, let's stick with binary predictions, i.e., we will deploy a classifier.

So we need to reformat the edges image into a black-and-white image. We can easily do this by thresholding the pixel values, assigning '1' if above a cutoff and '0' otherwise. Here we use a standard thresholding technique, Otsu's method, so we do not have to think about what threshold is appropriate.
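As a quick illustration (a made-up 3x3 array, not the actual image), Otsu picks the cutoff automatically:

import numpy as np
from skimage.filters import threshold_otsu

edges=np.array([[0.05,0.10,0.90],
                [0.08,0.85,0.95],
                [0.02,0.07,0.88]])
cutoff=threshold_otsu(edges)
print(cutoff)                      # a value separating the dim and bright pixels
print((edges>cutoff).astype(int))  # the binarized 'edge map'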

These images are displayed below.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.filters import sobel, threshold_otsu
from skimage import feature

def get_image_outline(image,edge_type,display=True):
    # run both detectors, then binarize each with an Otsu threshold
    edge_sobel=sobel(image)
    edge_canny=feature.canny(image, sigma=3)
    bw_sobel=edge_sobel>threshold_otsu(edge_sobel)
    bw_canny=edge_canny>threshold_otsu(edge_canny)
    
    img=[image,edge_sobel,bw_sobel,edge_canny,bw_canny]
    titles=['original','sobel','sobel, binary','canny','canny, binary']

    if display:
        fig,ax=plt.subplots(ncols=5,figsize=(12.5,2.5))
        for idx,a in enumerate(ax):
            a.imshow(img[idx],cmap='gray')
            a.axis('off')
            a.set_title(titles[idx])
        plt.show()
        
    return img[2 if edge_type==0 else 4]   # 0: binary sobel, otherwise binary canny

orig_image=data.camera()
edge_type=0
print('image.shape:',orig_image.shape)
edge_img=get_image_outline(orig_image,edge_type)
image.shape: (512, 512)


Step 2: Extract training patch

We then select a small box from the image; for simplicity, a 50x50 patch from the middle. From the b/w image, we extract the expected prediction for each pixel within the patch. From the original image, we extract the grayscale values of the 3x3 grid centered on each of the 2,500 pixels within the patch.

Let's also show the location of this patch for easy reference.

In [2]:
# build a training set based on a 50x50 patch from the middle of the image
def make_training_set(image,target,start_patch_x,start_patch_y):
    for i in range(start_patch_x,start_patch_x+50):
        for j in range(start_patch_y,start_patch_y+50):
            # label at the window center plus the flattened 3x3 neighborhood
            r=np.c_[target[i+1,j+1],[image[i:i+3,j:j+3].flatten()]]
            if i==start_patch_x and j==start_patch_y:
                patch=r.copy()
            else:
                patch=np.r_[patch,r]
    train_data=np.array(patch)
    #print('train_data.shape:',train_data.shape)
    return train_data

# create the test set based on the entire image
def make_test_set(image,target):
    full_img=[]
    for i in range(1,len(image)-2):
        for j in range(1,len(image[0,:])-2):
            r=[]
            r.append(target[i+1,j+1])

            r.append(image[i,j])
            r.append(image[i,j+1])
            r.append(image[i,j+2])

            r.append(image[i+1,j])
            r.append(image[i+1,j+1])
            r.append(image[i+1,j+2])

            r.append(image[i+2,j])
            r.append(image[i+2,j+1])
            r.append(image[i+2,j+2])

            r=np.array(r)
            full_img.append(r)
    test_data=np.array(full_img)
    #print('test_data.shape:',test_data.shape)
    return test_data

def create_experiment_data(image_train,image_test,edge_img_train,edge_img_test,
                           start_patch_x,start_patch_y,display=True):
    patch=make_training_set(image_train,edge_img_train,start_patch_x,start_patch_y)
    full_img=make_test_set(image_test,edge_img_test)

    # detector labels for the full image, reshaped back into image form
    full_img_box=np.reshape(full_img[:,0],((len(image_test)-3),(len(image_test[0,:])-3)))
    
    X_patch_box=np.reshape(patch[:,5],(50,50))   # column 5: center pixel of each window
    y_patch_box=np.reshape(patch[:,0],(50,50))   # column 0: detector label
    
    # mark the patch location with bright lines on copies of the inputs
    X_outline=image_train.copy()
    X_outline[[start_patch_x,start_patch_x+1,start_patch_x+2,
              start_patch_x+50,start_patch_x+51,start_patch_x+52],:]=255
    X_outline[:,[start_patch_y,start_patch_y+1,start_patch_y+2,
                 start_patch_y+50,start_patch_y+51,start_patch_y+52]]=255
    
    y_outline=edge_img_train.copy()
    y_outline[[start_patch_x,start_patch_x+1,start_patch_x+2,
              start_patch_x+50,start_patch_x+51,start_patch_x+52],:]=1
    y_outline[:,[start_patch_y,start_patch_y+1,start_patch_y+2,
                 start_patch_y+50,start_patch_y+51,start_patch_y+52]]=1
    
    img=[X_patch_box,y_patch_box,X_outline,y_outline]
    titles=['patch, training set X','patch, training set y','patch source, X','patch source, y']

    if display:
        fig,ax=plt.subplots(ncols=4,figsize=(10,2.5))
        for idx,a in enumerate(ax):
            a.imshow(img[idx],cmap='gray')
            a.axis('off')
            a.set_title(titles[idx])
        plt.show()
            
    return patch,full_img,full_img_box

start_patch_x,start_patch_y=int(len(orig_image)/2)-25,int(len(orig_image[0,:])/2)-25
patch,full_img,full_img_box=create_experiment_data(orig_image,orig_image,edge_img,edge_img,start_patch_x,start_patch_y)


Step 3: Train and run classifiers

We now train the classifiers and test them on the entire image. We use a plain kNN and a PCA+kNN for these tests, with five principal components for the PCA.

For a nine-feature problem, a PCA reduction to five components will not matter much. If anything, given the nature of the problem (we can assume the center pixel depends on the values of all eight pixels surrounding it), reducing to five dimensions is arguably ill-advised. PCA would be more relevant if, for instance, we had a 5x5 grid and then limited the PCA to nine components. But let's do it anyway.
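If you want to check how much information five components actually retain, here is a quick side check (not part of the original flow; it assumes the patch array built in Step 2):

from sklearn.decomposition import PCA
pca=PCA(n_components=5).fit(patch[:,1:])
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept by 5 of 9 components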

For those familiar with machine learning metrics, you will also note that since we have very few positive values relative to negative values (few edge pixels relative to the black empty space), an accuracy score will not be very enlightening. We could have a terrible output image (e.g., 10,000 incorrect pixels), yet accuracy would still be high: with roughly 500x500 = 250,000 pixels across the entire image, that is still about 96% 'correct'.

So we use an F-score instead (its formula can be read off the code below), which penalizes both false negatives and false positives.
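For reference, here is the F-score that the code below computes, built from precision and recall:

$$\text{precision}=\frac{TP}{TP+FP},\qquad \text{recall}=\frac{TP}{TP+FN},\qquad F=\frac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$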

In [3]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA

# accuracy and F-score for binary (0/1) predictions
def binary_metrics(pred,yt):
    acc=np.count_nonzero(pred==yt)/len(pred)
    tp=np.count_nonzero((pred==yt)&(pred==1))   # true positives
    fp=np.count_nonzero((pred!=yt)&(pred==1))   # false positives
    fn=np.count_nonzero((pred!=yt)&(pred!=1))   # false negatives
    prec=tp/(tp+fp)
    recall=tp/(tp+fn)
    f_score=2*prec*recall/(prec+recall)
    return acc,f_score

def learn_edge_detector_experiment(orig_img,patch,full_img,full_img_box,edge_type):
    X,y=patch[:,1:],patch[:,0]
    Xt,yt=full_img[:,1:],full_img[:,0]
    num_pca=5
    k=1

    # PCA is fitted on the training patch only, then applied to both sets
    pca1=PCA(n_components=num_pca).fit(X)
    X_pca=pca1.transform(X)
    Xt_pca=pca1.transform(Xt)

    kNN1=KNeighborsClassifier(n_neighbors=k).fit(X_pca,y)
    pred1=kNN1.predict(Xt_pca)

    kNN2=KNeighborsClassifier(n_neighbors=k).fit(X,y)
    pred2=kNN2.predict(Xt)

    acc1,f_score1=binary_metrics(pred1,yt)
    acc2,f_score2=binary_metrics(pred2,yt)

    ps1=np.reshape(pred1,(len(full_img_box),len(full_img_box[0,:])))
    ps2=np.reshape(pred2,(len(full_img_box),len(full_img_box[0,:])))
    img=[orig_img,full_img_box,ps1,ps2]
    detector_name=['sobel','canny']
    titles=['orig image',detector_name[edge_type],'pca+knn','knn']
    acc=[1.0,1.0,acc1,acc2]
    f_score=[1.0,1.0,f_score1,f_score2]

    fig,ax=plt.subplots(ncols=4,figsize=(10,2.5))
    for idx,a in enumerate(ax):
        a.imshow(img[idx],cmap='gray')
        a.axis('off')
        if idx==0:
            a.set_title(titles[idx])
        else:
            a.set_title(str(titles[idx])+'\n'+'(accuracy: {:.2f}\nf-score: {:.2f})'.format(acc[idx],f_score[idx]))
    plt.show()

learn_edge_detector_experiment(orig_image,patch,full_img,full_img_box,edge_type)


The output is not bad at all. We do get some grainy patches on the ground, but overall we seem to have captured a good representation of the edges. Notice the high accuracy here alongside a relatively low F-score.

Let's re-run to put the images closer together for comparison. Note that we are using, and 'learning', the Sobel detector.

In [4]:
edge_type=0
edge_img=get_image_outline(orig_image,edge_type)
start_patch_x,start_patch_y=int(len(orig_image)/2)-25,int(len(orig_image[0,:])/2)-25
patch,full_img,full_img_box=create_experiment_data(orig_image,orig_image,edge_img,edge_img,start_patch_x,start_patch_y)
learn_edge_detector_experiment(orig_image,patch,full_img,full_img_box,edge_type)


Test 2: Same as above but using a different image

Having seen that an algorithm can somewhat 'learn' an edge detector, let's try another image for good measure. Knowing how a kNN works, one could say this is more like recalling the closest copy from memory, not necessarily 'learning'. You would be right, but that's machine learning for you (the kNN, anyway).
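To make the 'memory' point concrete, here is a quick sanity check (a sketch reusing the patch from Step 2): a 1-NN classifier scores essentially perfectly on its own training data, because prediction is just recall of the stored examples (barring duplicate feature vectors with conflicting labels).

from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=1).fit(patch[:,1:],patch[:,0])
print(knn.score(patch[:,1:],patch[:,0]))  # ~1.0: 1-NN 'recalls' its training set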

In [5]:
orig_image=data.coins()
edge_type=0
edge_img=get_image_outline(orig_image,edge_type)
start_patch_x,start_patch_y=int(len(orig_image)/2)-25,int(len(orig_image[0,:])/2)-25
patch,full_img,full_img_box=create_experiment_data(orig_image,orig_image,edge_img,edge_img,start_patch_x,start_patch_y)
learn_edge_detector_experiment(orig_image,patch,full_img,full_img_box,edge_type)


It still works, sort of. There is that fuzzy galaxy of pixels around the first row of coins. If we look closely, we notice that the shading around the first row differs from the shading in the middle of the image (where we took the patch!).

This is therefore to be expected. The machine has not seen this kind of 'extreme' shading. It can extrapolate (it is nearest neighbor, not exact duplicate, so it extrapolates), but it can get thrown off if the behavior of the process that generated the training set differs from that of the process that created the test set. Although both came from the same image, the top left is more lightly shaded than the middle and bottom parts. That led to the incorrect pixels.

How do we solve this?

Test 3: Same as Test 2, but with a different local patch

What if we select the patch near the top, to at least 'train' the kNN with the lightly-shaded data?

In [6]:
orig_image=data.coins()
edge_type=0
edge_img=get_image_outline(orig_image,edge_type,False)
start_patch_x,start_patch_y=int(len(orig_image)/4)-25,int(len(orig_image[0,:])/4)-25
patch,full_img,full_img_box=create_experiment_data(orig_image,orig_image,edge_img,edge_img,start_patch_x,start_patch_y)
learn_edge_detector_experiment(orig_image,patch,full_img,full_img_box,edge_type)


It sort of works. We got rid of the noisy pixels near the top row, but we have now lost detail in the lower part. Still, the average F-score (between kNN and PCA+kNN) is better this time, so perhaps this is a better-trained model after all. That is, the upper-left training patch is a better choice than the initial middle patch.

(Continued on Part 2....)
