This project explores 2D convolutions, filtering, and image manipulation in the frequency domain.
First, I show the image partial derivatives Fx, Fy with respect to the x and y axes. This is done by 2D convolutions with the finite difference operators.
Gradient magnitude is computed as sqrt(Fx ** 2 + Fy ** 2). The gradient image has the following stats: min: 0.000 | max: 360.624 | mean: 9.098. To binarize, I set the threshold at pixel value 80.
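The steps above can be sketched in numpy; the finite difference kernels and `scipy.signal.convolve2d` options are my assumptions about the implementation:

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference operators (a common choice; exact kernels are my assumption)
Dx = np.array([[1, -1]])
Dy = np.array([[1], [-1]])

def gradient_magnitude(im, threshold=80):
    """Partial derivatives, gradient magnitude, and a binarized edge map."""
    Fx = convolve2d(im, Dx, mode="same", boundary="symm")
    Fy = convolve2d(im, Dy, mode="same", boundary="symm")
    grad = np.sqrt(Fx**2 + Fy**2)
    edges = (grad > threshold).astype(np.uint8)  # binarize at the chosen threshold
    return Fx, Fy, grad, edges
```

On a vertical step edge of height 100, the gradient magnitude peaks at 100, which clears the threshold of 80.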
I convolve the original image with a 5x5 Gaussian kernel to get a blurred image. Then, I calculate partial derivatives with difference operators the same as before.
The gradient image has the following stats: min: 0.000 | max: 175.655 | mean: 5.865. Notice that the maximum value decreases significantly, indicating a reduction in noisy pixels. Qualitatively, we see this difference in the deeper, more defined edges of the partial derivatives, as well as the smoother gradient magnitude. The final binarized image has smoother, thicker, and more solid lines compared to the previous part. The chosen threshold is 28.
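A minimal sketch of the blur-then-differentiate pipeline, assuming a separable 5x5 Gaussian built from an outer product (the sigma value is my placeholder):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(ksize=5, sigma=1.0):
    """2D Gaussian as an outer product of 1D Gaussians."""
    ax = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

G = gaussian_kernel(5, 1.0)
Dx = np.array([[1, -1]])
Dy = np.array([[1], [-1]])

def smoothed_gradient(im):
    """Blur first, then take finite differences of the blurred image."""
    blurred = convolve2d(im, G, mode="same", boundary="symm")
    Fx = convolve2d(blurred, Dx, mode="same", boundary="symm")
    Fy = convolve2d(blurred, Dy, mode="same", boundary="symm")
    return np.sqrt(Fx**2 + Fy**2)
```

On the same step edge as before, the smoothed gradient peaks well below 100, mirroring the drop in the max statistic reported above.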
We can also convolve the Gaussian kernel with the difference operators before applying to the image. Since convolutions are commutative and associative, this yields the same results.
Below I display the DoG X and Y filters as images. In my experiments, I used the 5x5 kernels; the 500x500 kernels are shown only to illustrate the filters at a higher resolution.
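As a sanity check on the associativity claim, here is a small numpy sketch (kernel choices are my assumptions) showing that folding the difference operator into the Gaussian gives the same interior result as applying the two filters in sequence:

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(ksize=5, sigma=1.0):
    ax = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

G = gaussian_kernel(5, 1.0)
Dx = np.array([[1, -1]])

# Derivative-of-Gaussian filter: convolve the kernel with the difference operator
DoG_x = convolve2d(G, Dx)  # 'full' mode keeps the whole 5x6 filter

im = np.random.default_rng(0).random((32, 32))
two_pass = convolve2d(convolve2d(im, G, mode="same"), Dx, mode="same")
one_pass = convolve2d(im, DoG_x, mode="same")
# (im * G) * Dx == im * (G * Dx), up to boundary handling near the image edges
```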
High frequency features can be extracted from an image by subtracting a blurred version from the original image. The sharpen operation with an unsharp mask is derived as follows, with Gaussian blurring filter G, identity mask I, and hyperparameter alpha for controlling the amount of sharpening.
sharpened = original + alpha x high_freq_features
sharpened = original + alpha x (original - original * G)
sharpened = original + alpha x original * (I - G)
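The derivation above maps directly to code; this sketch follows the first line of the derivation (the Gaussian parameters and clipping range are my assumptions):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(ksize=5, sigma=1.0):
    ax = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

def unsharp_mask(im, alpha=1.0, ksize=5, sigma=1.0):
    """sharpened = original + alpha * (original - original * G)."""
    G = gaussian_kernel(ksize, sigma)
    blurred = convolve2d(im, G, mode="same", boundary="symm")
    high_freq = im - blurred
    return np.clip(im + alpha * high_freq, 0, 255)  # assumes 8-bit value range
```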
I show results of sharpened low resolution images with varying hyperparameter values. Natural photos perform well with alpha < 5. Illustrations tend to yield many artifacts. Pictures with motion blur need higher alpha values to have significant improvement.
For evaluation, I blur a sharp image and try to resharpen it with my implementation. The resharpened Eiffel Tower image looks much more contrasted than the original. Some of the features are inevitably lost after the Gaussian blur, but the resharpened image restores a lot of the initial structure. The results on my cat Turbo are difficult to evaluate since he is so fluffy, but one can make out more defined fur in the sharpened images. In other words, he is looking very sharp.
To create hybrid images, I pass one image through a lowpass (Gaussian) filter and the other through a highpass (identity - Gaussian) filter to extract the respective low and high frequency features. The combined image is an addition of the extracted features.
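A minimal sketch of the hybrid construction, using `scipy.ndimage.gaussian_filter` for both passes; the cutoff sigmas are hyperparameters tuned per image pair, and the values here are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(im_low, im_high, sigma_low=6.0, sigma_high=3.0):
    """Low frequencies of im_low plus high frequencies of im_high."""
    low = gaussian_filter(im_low.astype(float), sigma_low)            # lowpass
    high = im_high.astype(float) - gaussian_filter(im_high.astype(float), sigma_high)  # highpass
    return low + high
```

Viewed up close, the high-frequency image dominates; from a distance, the low-frequency image takes over.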
The Fourier analysis of grayscale images is shown below in log scale.
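The log-scale spectra can be computed with numpy's FFT; the small epsilon to avoid log(0) is my addition:

```python
import numpy as np

def log_spectrum(gray):
    """Log-magnitude Fourier spectrum of a grayscale image, DC shifted to center."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1e-8)
```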
Below are some more results. I think the lion with my cat Agave is a very nice result, since their facial expressions line up perfectly. I also like the way the floppy ears of the real and the crochet bunnies align, as well as their colors. The one with Peepo and Agave is a failure case. I cannot get it to look nice despite how much the parameters are tuned, since Peepo's face is so much wider and the green color is very hard to compensate for.
I implemented the bells & whistles for this section by filtering all the color channels. Colors tend to improve the look of the hybrid image when there are already overlapping colors.
The Gaussian stack is a set of lowpass-filtered images, produced by applying the Gaussian filter successively. The Laplacian stack is a set of bandpass images, consisting of differences between consecutive levels of the Gaussian stack, which capture the mid-band frequencies. Each displayed level of the Laplacian stack is normalized for visualization.
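A minimal sketch of both stacks, assuming successive blurring with `scipy.ndimage.gaussian_filter` (level count and sigma are my placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, levels=5, sigma=2.0):
    """Repeatedly blur without downsampling (a stack, not a pyramid)."""
    stack = [im.astype(float)]
    for _ in range(levels - 1):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack

def laplacian_stack(gstack):
    """Differences of consecutive Gaussian levels; last level is the final Gaussian."""
    return [a - b for a, b in zip(gstack, gstack[1:])] + [gstack[-1]]
```

A useful property: summing all Laplacian levels (including the final Gaussian layer) reconstructs the original image exactly, since the differences telescope.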
The following are displayed: apple Gaussian stack, apple Laplacian stack, orange Gaussian stack, and orange Laplacian stack. The last layer of the Laplacian stack (the last Gaussian layer) is not shown here, but will be used for part 2.4.
Start with a Gaussian stack of an image mask and two Laplacian stacks of the target images. Each blended layer is calculated as im1[i] * mask[i] + im2[i] * (1 - mask[i]). The final image is the sum of all the blended Laplacian layers along with the blended last Gaussian layers (i.e., the lowest-resolution base layer). I found that I had the best results with fewer levels, fine-grained adjustments, and details extracted from smaller kernels, as shown by the more subtle blurring between levels.
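The blending procedure can be sketched as follows, with the stack helpers repeated for self-containment; the level counts and sigmas are hyperparameters and the values here are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, levels=5, sigma=2.0):
    stack = [im.astype(float)]
    for _ in range(levels - 1):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack

def laplacian_stack(gstack):
    return [a - b for a, b in zip(gstack, gstack[1:])] + [gstack[-1]]

def multires_blend(im1, im2, mask, levels=5, sigma=2.0, mask_sigma=2.0):
    """Blend each Laplacian level with a Gaussian-smoothed mask, then sum."""
    l1 = laplacian_stack(gaussian_stack(im1, levels, sigma))
    l2 = laplacian_stack(gaussian_stack(im2, levels, sigma))
    gm = gaussian_stack(mask.astype(float), levels, mask_sigma)  # one mask per level
    return sum(a * m + b * (1 - m) for a, b, m in zip(l1, l2, gm))
```

With a mask of all ones, the result reduces to the first image exactly, which is a handy correctness check.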
Here are some more blending results. I was very happy with how well the fruit slices lined up! I got the corgi picture and mask from the Segment Anything demo. The corgi doesn't need to be 'blended' into the background as much, so I set the Gaussian kernel for the mask to be very small for this image.