CS 180/280A Project 4: [Auto]Stitching Photo Mosaics

Gina Wu (Fall 2024)

This project builds a pipeline for automatically stitching images together into a larger panoramic mosaic.


Part A: Image Warping and Mosaicing

In the first part, I create image mosaics through projective warping and pyramid blending, using manually labeled correspondence points.

1: Shoot Pictures

For the dataset, I take images with significant overlap, shot from the same pivot point under consistent lighting conditions. This helps the later projection steps look more seamless.

[Images: overlapping source photographs]

2: Recover Homographies

The homography transformation is a 3x3 matrix with 8 degrees of freedom. Given a set of correspondence points {(x, y), (x', y')}, we can formulate the mapping as the matrix equation Ha = b, where a = (x, y, 1)^T is a source point and b = (w*x', w*y', w)^T is its target. Note that we are operating in homogeneous coordinates, so the result must be normalized by w later on.

    [ h1  h2  h3 ] [ x ]   [ w*x' ]
    [ h4  h5  h6 ] [ y ] = [ w*y' ]
    [ h7  h8  1  ] [ 1 ]   [ w    ]

This is simply a system of linear equations that can be expanded:

    w*x' = h1*x + h2*y + h3
    w*y' = h4*x + h5*y + h6
    w    = h7*x + h8*y + 1

Substituting the third equation for w in the first two eliminates the unknown scale:

    x' = h1*x + h2*y + h3 - h7*x*x' - h8*y*x'
    y' = h4*x + h5*y + h6 - h7*x*y' - h8*y*y'

Rewriting this in matrix form, each correspondence contributes two linear equations in the eight unknown entries of H:

    [ x  y  1  0  0  0  -x*x'  -y*x' ]       [ x' ]
    [ 0  0  0  x  y  1  -x*y'  -y*y' ] h  =  [ y' ]

where h = (h1, h2, h3, h4, h5, h6, h7, h8)^T.

We can now plug in manually labeled correspondence points to calculate the homography transformation for any pair of images. Four correspondences determine H exactly; with more, the system is overdetermined, so I stack the rows for all points and use least squares to estimate the homography matrix. I used the labeling tool from project 3 to establish the correspondences.
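
A minimal sketch of this estimation step (function and variable names are my own):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Estimate the 3x3 homography mapping pts1 -> pts2 by least squares.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two rows per correspondence, matching the matrix form above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)   # bottom-right entry fixed to 1
```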

3: Warp Images

I use inverse warping, similar to project 3. I calculate the forward and backward homography matrices separately from the two point sets (instead of directly inverting one matrix) to work around homogeneous-coordinate normalization. Then, I map the corner points of the input image through the forward homography to obtain a warped "bounding box". The new bounds may contain negative coordinates, so I shift them until they are all positive. With the bounding box coordinates, I obtain every pixel inside the warped polygon via skimage.draw.polygon and interpolate each pixel's color from the original image with scipy.interpolate.griddata. These steps are very similar to what was done in project 3, but some care must be taken in normalizing the coordinates and shifting the bounding box correctly.
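
A sketch of the warp following these steps; for brevity it inverts H directly, whereas my actual implementation estimates the backward homography from the swapped point sets (helper names are my own):

```python
import numpy as np
import scipy.interpolate
import skimage.draw

def warp_image(im, H):
    """Inverse-warp `im` (H x W x 3, floats) with a homography H
    that maps source coordinates (x, y, 1) to destination coordinates."""
    h, w = im.shape[:2]
    # Forward-map the source corners to find the warped bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]                 # normalize homogeneous coords
    shift = -np.minimum(warped.min(axis=1), 0)      # shift negative bounds to zero
    warped += shift[:, None]

    # Rasterize the warped quadrilateral to get the destination pixels.
    rr, cc = skimage.draw.polygon(warped[1], warped[0])

    # Map each destination pixel back into the source image.
    back = np.linalg.inv(H) @ np.vstack(
        [cc - shift[0], rr - shift[1], np.ones(len(rr))])
    back = back[:2] / back[2]

    # Interpolate source colors at the back-projected locations.
    ys, xs = np.mgrid[0:h, 0:w]
    src_pts = np.column_stack([xs.ravel(), ys.ravel()])
    out = np.zeros((rr.max() + 1, cc.max() + 1, 3))
    for ch in range(im.shape[2]):
        out[rr, cc, ch] = scipy.interpolate.griddata(
            src_pts, im[..., ch].ravel(), (back[0], back[1]), fill_value=0)
    return out, shift
```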

4: Image Rectification

I test the correctness of my warping implementation by rectifying images. The corner points for the daisy granny square and the textbook are manually chosen, and the target shape is chosen to match the known square or rectangle proportions.
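
As a usage example, rectification is a direct application of the two sketches above; the filename and all coordinates below are placeholders, not the actual values used:

```python
import numpy as np
import skimage.io

im = skimage.io.imread('textbook.jpg') / 255.0   # hypothetical filename

# Hand-picked (x, y) corners of the textbook cover (placeholder values),
# ordered top-left, top-right, bottom-right, bottom-left.
corners = np.array([[112, 85], [530, 70], [545, 610], [98, 640]], dtype=float)

# Target: an axis-aligned rectangle matching the cover's proportions.
target = np.array([[0, 0], [400, 0], [400, 520], [0, 520]], dtype=float)

H = compute_homography(corners, target)
rectified, _ = warp_image(im, H)
```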

[Image: Daisy Granny Square]
[Image: Corner Points]
[Image: Rectification]
[Image: Textbook]
[Image: Corner Points]
[Image: Rectification]

5: Mosaic Blending

For every pair of images, I warp one image toward the other, which is left unchanged as the target. If the two are naively added together, the overlapping region becomes very apparent and overly bright, as shown here.

[Image: Image 1 Warped]
[Image: Image 2 Target]
[Image: Naive Mosaic]

In order to blend, I calculate a mask based on the distance transform of each image: every pixel in the overlap is assigned to whichever image has the larger distance-transform value at that pixel. I show the results of this mask creation below; note how the overlapping region is divided.

[Image: Image 1 Distance Transform]
[Image: Image 2 Distance Transform]
[Image: Combined Mask]
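
A minimal sketch of this mask construction, assuming boolean coverage masks for the two images on the shared canvas (function names are my own):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def seam_mask(alpha1, alpha2):
    """Binary mask assigning each pixel to the image whose interior is nearer.

    alpha1, alpha2: boolean coverage masks of the two aligned images
    (same canvas size).
    """
    # Distance from each covered pixel to the edge of its image's footprint.
    d1 = distance_transform_edt(alpha1)
    d2 = distance_transform_edt(alpha2)
    # 1 where image 1 "wins", 0 where image 2 does.
    return (d1 >= d2).astype(float)
```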

With the mask, I blend the warped image with the target using a Laplacian stack from project 2!
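
A two-level sketch of that blend (the full version uses a deeper stack), assuming the seam_mask output from above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend(im1, im2, mask, sigma=20):
    """Two-band blend: low frequencies get a feathered seam,
    high frequencies a hard seam."""
    soft = gaussian_filter(mask, sigma)        # feathered mask for low freqs
    out = np.zeros_like(im1)
    for ch in range(im1.shape[2]):
        low1 = gaussian_filter(im1[..., ch], sigma)
        low2 = gaussian_filter(im2[..., ch], sigma)
        high1, high2 = im1[..., ch] - low1, im2[..., ch] - low2
        out[..., ch] = (soft * low1 + (1 - soft) * low2
                        + mask * high1 + (1 - mask) * high2)
    return out
```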

[Image: Image 1]
[Image: Image 2]
[Image: Mosaic]
[Image: Image 1]
[Image: Image 2]
[Image: Mosaic]
[Image: Image 1]
[Image: Image 2]
[Image: Mosaic]

Part B: Feature Matching for Autostitching

In the second part, I implement a system for automatically stitching images into a mosaic, as proposed by Brown et al. in "Multi-Image Matching using Multi-Scale Oriented Patches" (2005).

1: Corner Detection

I retrieve the corners with the provided code, which uses skimage.feature.corner_harris. The corner detector first computes the second-moment (Harris) matrix from Gaussian-smoothed image gradients at each pixel and uses its eigenvalues to compute a corner strength/response. Based on this response, each pixel can be classified as a flat region, an edge, or a corner. Below, I show the Harris response images (brightness correlates with corner strength) as well as the Harris corner points overlaid on the original images.

[Image: Image 1 Harris Response]
[Image: Image 2 Harris Response]
[Image: Image 1 Harris Corners]
[Image: Image 2 Harris Corners]
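
A sketch of this detection step in the spirit of the provided starter code (exact parameters may differ):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im, edge_discard=20):
    """Harris response map plus peak coordinates for a grayscale image."""
    h = corner_harris(im, method='eps', sigma=1)   # response at each pixel
    coords = peak_local_max(h, min_distance=1)     # candidate (row, col) corners
    # Discard corners too close to the border, where descriptors won't fit.
    keep = ((coords[:, 0] > edge_discard) & (coords[:, 0] < im.shape[0] - edge_discard)
            & (coords[:, 1] > edge_discard) & (coords[:, 1] < im.shape[1] - edge_discard))
    return h, coords[keep]
```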

The target images are under 1000 pixels on each side, yet the naive detector still returns corners numbering in the thousands. As shown above, features like my cats' faces and the words on my computer screen are perfectly outlined by the corner detector. These features are not always useful, and the computational cost of matching grows superlinearly with the number of points, so we would like to restrict the number of extracted interest points.

2: Adaptive Non-Maximal Suppression

Here, I implement Adaptive Non-Maximal Suppression (ANMS) to select a fixed number of interest points. Instead of simply discarding points with low corner strength, ANMS suppresses points that have a nearby neighbor with sufficiently larger strength. As a result, ANMS does not just choose the brightest points in the Harris response map; it is more robust and yields a more uniform spatial distribution of interest points. In my implementation, I compute the suppression radius for each point (the distance to its nearest sufficiently stronger neighbor) and return the 500 points with the largest radii.
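
A sketch of ANMS; the robustness constant 0.9 is the paper's value, the implementation details are my own:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (N, 2) corner coordinates; strengths: (N,) Harris responses
    at those coordinates (e.g. h[coords[:, 0], coords[:, 1]]).
    """
    # Pairwise squared distances between all corner points.
    d2 = ((coords[:, None, :].astype(float) - coords[None, :, :]) ** 2).sum(-1)
    # Point j can suppress point i only if it is sufficiently stronger.
    suppresses = c_robust * strengths[None, :] > strengths[:, None]
    d2 = np.where(suppresses, d2, np.inf)
    radii = d2.min(axis=1)                    # distance to nearest suppressor
    keep = np.argsort(radii)[::-1][:n_keep]   # largest radii first
    return coords[keep]
```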

[Image: Image 1 Interest Points After ANMS]
[Image: Image 2 Interest Points After ANMS]

3: Feature Descriptor Extraction

Now I extract features from the 500 interest points. As described in the paper, I take a 40x40 window around each interest point and downsample it to an 8x8 patch, yielding blurred feature descriptors that tolerate small misalignments. The patches are also normalized to mean 0 and std 1. Some example features from corners and edges are shown below.
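
A sketch of the descriptor extraction (the blur parameter is illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(im, coords):
    """8x8 descriptors sampled from 40x40 windows.

    im: grayscale image; coords: (N, 2) array of (row, col) interest points,
    assumed to be at least 20 pixels from the border.
    """
    blurred = gaussian_filter(im, sigma=2)      # blur before subsampling
    descs = []
    for r, c in coords:
        patch = blurred[r - 20:r + 20:5, c - 20:c + 20:5]      # 8x8, stride 5
        patch = (patch - patch.mean()) / (patch.std() + 1e-8)  # mean 0, std 1
        descs.append(patch.ravel())
    return np.array(descs)
```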

[Images: example feature patches at corners and edges]

4: Feature Matching

I match the features according to the paper. For every feature in one image, I find its two nearest neighbors among the features of the other image according to L2 distance. Using the L2 distance as the error, I calculate Lowe's ratio err_nearest_first / err_nearest_second. If this ratio is below a chosen threshold (I used 0.5), the feature and its first nearest neighbor are considered a match; otherwise, I discard the pairing. Intuitively, a correct match should be much closer than the runner-up, so this eliminates a lot of false positive matches.
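
A sketch of the matching step (names are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.5):
    """Lowe's-ratio matching; returns index pairs (i in desc1, j in desc2)."""
    d = cdist(desc1, desc2)                    # all pairwise L2 distances
    order = np.argsort(d, axis=1)
    nn1, nn2 = order[:, 0], order[:, 1]        # two nearest neighbors
    rows = np.arange(len(desc1))
    ratios = d[rows, nn1] / d[rows, nn2]       # first NN error / second NN error
    keep = ratios < ratio
    return np.column_stack([rows[keep], nn1[keep]])
```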

Here are some examples of matched feature patches:

[Images: matched feature patches]

Examples of automatically matched correspondences:

[Images: automatically matched correspondences]

The correspondences look good and a lot of noise is eliminated. However, there are still incorrect matches. We want to make sure that the homography calculation is robust against outliers.

5: RANSAC

RANSAC is implemented to make the least-squares homography calculation robust to outliers. The loop proceeds as follows: randomly sample 4 matches, compute the exact homography from them, count the inliers (matches that the homography maps to within a small distance of their counterparts), and repeat for a fixed number of iterations while keeping the largest inlier set. The final homography is then recomputed with least squares over this largest set of inliers.
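
A sketch of the loop, reusing compute_homography from part A (the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """RANSAC over 4-point homographies; pts1, pts2 are (N, 2) matched points."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])   # exact on 4 points
        # Project pts1 and measure distance to the matched pts2.
        p = H @ np.vstack([pts1.T, np.ones(len(pts1))])
        p = (p[:2] / p[2]).T
        inliers = np.linalg.norm(p - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final homography: least squares over the largest inlier set.
    return compute_homography(pts1[best_inliers], pts2[best_inliers])
```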

6: Mosaic with Autostitching

Finally, everything is combined into a fully automatic mosaic creation pipeline! Here I show results from both parts A and B.

[Image: Manual Correspondences]
[Image: Autostitching]
[Image: Manual Correspondences]
[Image: Autostitching]
[Image: Manual Correspondences]
[Image: Autostitching]

You can clearly see the differences in warping and projection between manual correspondences and autostitching, particularly in the first and third sets of images. The results from part B appear much more consistent. For example, focus on the lights in the first set of results: the manual correspondence points produce an awkward, puzzle-like misalignment that cannot be fully resolved, while autostitching returns a seamless blend!


This project was really satisfying to see come together. My biggest frustration was with image rectification in part A. I found that even when the homography calculation was correct, choosing only the four corners of a square or rectangle gave very high-variance results: the final rectified image could come out badly even when the original image or the manual correspondences were off by only a little. This made the step unreliable to debug, because I couldn't always tell whether a failure was a code issue or a data issue.

In comparison, I found part B to be much easier overall. The paper was very easy to read and straightforward to implement and debug, and it was really cool to work so closely with a classic computer vision paper! The algorithms requiring nearest-neighbor calculations were also surprisingly fast to compute.