CS 180/280A Project 1: Colorizing the Prokudin-Gorskii Photo Collection

Gina Wu (Fall 2024)

Prokudin-Gorskii images consist of three glass plate negatives, exposed through red, green, and blue filters respectively. The goal of this project is to take these digitized plates and automatically align them to produce a single color image. Since the exposures are slightly offset from one another, we want to find the translation for each color plate that minimizes visual artifacts.


Approach

Each digitized input is given as three vertically stacked plates. I first divide the image evenly into thirds along its height to isolate the color plates. In the naive single-scale implementation, I exhaustively search over a window of possible displacements by sliding the green and red plates over the blue plate; the window is set to 10 percent of the height and width of the image being aligned. For every candidate displacement, I compute a similarity score and keep the translation that scores best. I implemented two simple metrics: mean squared error (MSE) and normalized cross-correlation (NCC). NCC yields much better results than MSE even on simple images such as monastery.jpg.

cathedral
MSE
icon
NCC
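The single-scale exhaustive search can be sketched as follows. This is a minimal version assuming grayscale float plates; it uses np.roll (with wraparound) for shifting, and the function names are illustrative, not the actual implementation:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-shape plates."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def align_single_scale(plate, ref, window):
    """Exhaustively try displacements in [-window, window]^2 and return
    the (dy, dx) shift of `plate` that maximizes NCC against `ref`."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(plate, (dy, dx), axis=(0, 1))
            score = ncc(shifted, ref)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Swapping `ncc` for a (negated) MSE function gives the other metric; the search loop itself is unchanged.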

While the naive exhaustive search works for smaller images, it quickly becomes intractable for high-resolution inputs. To speed up alignment, I instead search with an image pyramid. The idea is to build a stack of the same image at different resolutions, with the lowest resolution at the top and the highest at the bottom (hence the pyramid), and to search coarse-to-fine: start at the lowest resolution, then repeatedly move down the pyramid, shrinking the search window at each level, until reaching the full-resolution image. The intuition is that if a feature can be localized at low resolution, we already know the approximate window where it lies at higher resolutions, so each finer level only needs a small refinement search around the previous estimate.

I shrink each image until its shortest side is about 100 pixels, which gives log2(min(input.shape) / 100) pyramid levels, and set the initial search window to 20 percent of the coarsest image's width and height. Starting from the coarsest level, I run the exhaustive search at the current scale and update the best translation found so far. I then reduce the search window by a factor of 4 and double the image resolution. I repeat this refinement until reaching the original resolution, each time aligning the image according to its current scale and the previous best displacement.
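The coarse-to-fine loop might be sketched like this. It is self-contained under stated assumptions: an NCC helper, and a simple 2x2 block-average downscale standing in for a proper anti-aliased resize; all names are illustrative:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-shape plates."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def downscale2x(im):
    """Halve resolution by 2x2 block averaging (a stand-in for a proper
    anti-aliased resize such as skimage.transform.rescale)."""
    h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2
    im = im[:h, :w]
    return (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2]) / 4.0

def pyramid_align(plate, ref, min_side=100):
    """Coarse-to-fine alignment: exhaustive search at the coarsest level,
    then double the estimate and refine in a shrinking window per level."""
    levels = max(0, int(np.log2(min(ref.shape) / min_side)))
    plates, refs = [plate], [ref]
    for _ in range(levels):                # build pyramids, coarsest last
        plates.append(downscale2x(plates[-1]))
        refs.append(downscale2x(refs[-1]))
    dy = dx = 0
    window = max(refs[-1].shape) // 5      # ~20% of the coarsest image
    for i, (p, r) in enumerate(zip(reversed(plates), reversed(refs))):
        if i > 0:
            dy, dx = 2 * dy, 2 * dx        # carry estimate to finer level
            window = max(2, window // 4)   # shrink the search window
        best, arg = -np.inf, (dy, dx)
        for ddy in range(dy - window, dy + window + 1):
            for ddx in range(dx - window, dx + window + 1):
                s = ncc(np.roll(p, (ddy, ddx), axis=(0, 1)), r)
                if s > best:
                    best, arg = s, (ddy, ddx)
        dy, dx = arg
    return dy, dx
```

The key cost saving is that the expensive full-resolution level only searches a tiny window around the estimate inherited from the coarser levels.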

Additionally, when calculating similarity scores I crop away the border pixels, keeping only the central 85 percent of each image. This refinement significantly improves alignment performance, especially on large images like lady.tif as shown below.

lady_nocrop
NCC, no crop
lady_crop
NCC, crop

Below are the execution times for each image alignment, using NCC with cropping as the similarity metric and the multiscale pyramid search, on a Google Colab CPU.

        Image: cathedral.jpg             | Time: 6.149 sec
        Image: church.tif                | Time: 2.424 sec
        Image: emir.tif                  | Time: 2.459 sec
        Image: harvesters.tif            | Time: 2.489 sec
        Image: icon.tif                  | Time: 4.627 sec
        Image: lady.tif                  | Time: 2.583 sec
        Image: melons.tif                | Time: 2.509 sec
        Image: monastery.jpg             | Time: 4.476 sec
        Image: onion_church.tif          | Time: 4.137 sec
        Image: sculpture.tif             | Time: 3.678 sec
        Image: self_portrait.tif         | Time: 2.706 sec
        Image: tobolsk.jpg               | Time: 4.157 sec
        Image: three_generations.tif     | Time: 3.334 sec
        Image: train.tif                 | Time: 4.919 sec
    

This approach yields satisfactory results.

cathedral
cathedral.jpg
icon
icon.tif
train
train.tif

Bells & Whistles

Edge Detection: While most images aligned well with NCC computed on raw RGB intensities, emir.tif had a particularly misaligned red plate. I apply a Gaussian blur to smooth and denoise each plate, featurize the plates with Canny edge detection, and compute the similarity metric on the resulting edge maps. This scores alignments based on the perceived structure of objects rather than raw pixel differences. The tradeoff is a much slower runtime, so I don't apply this method to images that already look well aligned.

emir_ncc
RGB Similarity
emir_edge
Canny Edge Detection
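A sketch of the edge featurization. The actual pipeline uses Canny edges (e.g. skimage.feature.canny); to keep this sketch dependency-light it substitutes Sobel gradient magnitude from scipy.ndimage, which similarly scores structure rather than raw intensity:

```python
import numpy as np
from scipy import ndimage

def edge_features(plate, sigma=2.0):
    """Denoise with a Gaussian blur, then featurize by gradient magnitude.
    (A simpler stand-in for Canny edge detection: same idea of comparing
    structure instead of raw pixel values.)"""
    smooth = ndimage.gaussian_filter(plate.astype(float), sigma)
    gy = ndimage.sobel(smooth, axis=0)
    gx = ndimage.sobel(smooth, axis=1)
    return np.hypot(gy, gx)
```

The alignment search then runs unchanged, but on `edge_features(plate)` instead of the raw intensities.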

Results

The final results are all computed with NCC on RGB similarity, except for emir.tif, which used edge detection. Artifacts along the image borders were cropped manually.

cathedral
cathedral.jpg
G: (2, 5); R: (3, 12)
church
church.tif
G: (4, 26); R: (-4, 58)
emir
emir.tif
G: (24, 50); R: (40, 106)
harvesters
harvesters.tif
G: (16, 60); R: (14, 124)
icon
icon.tif
G: (16, 40); R: (22, 90)
lady
lady.tif
G: (8, 56); R: (12, 116)
melons
melons.tif
G: (10, 82); R: (14, 178)
monastery
monastery.jpg
G: (2, -3); R: (2, 3)
onion_church
onion_church.tif
G: (26, 52); R: (36, 108)
sculpture
sculpture.tif
G: (-10, 34); R: (-26, 140)
self_portrait
self_portrait.tif
G: (28, 78); R: (36, 176)
tobolsk
tobolsk.jpg
G: (3, 3); R: (3, 6)
three_generations
three_generations.tif
G: (14, 54); R: (12, 112)
train
train.tif
G: (6, 42); R: (32, 88)