This week marks the end of Zhejiang University’s 2019 Information Product Design (DIP) (no, I don’t know why it’s not IPD either) module. For those who are out of the loop, it’s essentially Introduction to Design / Engineering Design Innovation in Chinese (for those who don’t know what these are either, they can be roughly translated into ‘pain’ and ‘suffering’ respectively). Those of you who know me personally will know that I haven’t had particularly stellar experiences with the Big D in the past – but let’s not get into that. Oh, and it’s compulsory for everyone in the AI Empowered Design theme to join a group and participate in DIP.

DIP is what many of my theme mates have been pouring their blood, sweat, and tears into for the past month, and is the primary reason why progress has been so slow with regard to our theme project. Fortunately, with its conclusion, we’ll finally be able to get into the really good stuff.

For the AI Empowered Design theme project, I’ll be working on live video abstraction. This essentially involves generating key frames or video skims which are able to capture the important and meaningful parts from a live video sample (for instance, from a live-streaming source). As mentioned earlier, though, we’re a little bit behind; we’ve only just started learning the very basics of OpenCV and image processing.

Inspired by this photographic mosaic of a cat, I decided to use my rudimentary knowledge of image processing to write a small program that’d be able to create similar photographic mosaics.

I went for the most direct approach: downscaling a reference image and splitting it up into its component pixels. Each of these pixels would then be assigned a corresponding image from the set of component images, based on each component image’s average colour. When the mosaic is viewed from afar, each individual component image will (hopefully) be read by the viewer as its average colour instead, and together these average colours would reproduce the original, larger reference image.
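The ‘average colour’ here is nothing fancy – just the per-channel mean over all the pixels of an image. A minimal sketch with OpenCV and NumPy (the helper name is mine):

```python
import cv2
import numpy as np

def average_colour(path):
    """Per-channel mean (B, G, R) over all pixels of an image."""
    img = cv2.imread(path)                  # OpenCV loads images as BGR uint8 arrays
    return img.reshape(-1, 3).mean(axis=0)  # flatten to (num_pixels, 3), then average
```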

Following from this, the next issue to tackle was the actual assignment of images to pixels. To solve this, I came up with an arbitrary ‘error’ measurement between the average colour of a component image and the actual pixel colour. I defined this to be the square of the Euclidean distance between the pixel colour and average image colour.
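In code, that error is just the squared Euclidean distance between two colour triples – something like this (the function name is mine):

```python
import numpy as np

def error(pixel_colour, avg_colour):
    """Squared Euclidean distance between two (B, G, R) colours."""
    diff = np.asarray(pixel_colour, dtype=float) - np.asarray(avg_colour, dtype=float)
    return float((diff ** 2).sum())  # sum of squared per-channel differences
```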

This simplifies the image-pixel assignment into an optimization problem of minimizing the total error score (under the assumption that the number of component images is equal to the number of pixels in the downscaled reference image). As will be obvious to those of you who have taken the 10.007 Modeling the Systems World course, this can, in fact, be modeled as an assignment problem (and to think I’d thought that I’d never touch my 10.007 notes ever again!).

An assignment problem is one where a number of ‘workers’ must be assigned to an equal number of ‘tasks’. There is a cost associated with a certain worker performing a certain task, and we wish to minimize the overall cost.

In this case, the ‘tasks’ become the pixels, the ‘workers’ are the component images, and the ‘cost’ of each respective assignment is the error measurement between the average component image colour and pixel colour.
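Concretely, the cost matrix is just the error of every possible (pixel, component image) pairing – roughly along these lines (the function and its arguments are mine; the nested loop is the part that turns out to be slow below):

```python
import cv2
import numpy as np

def build_cost_matrix(reference_path, component_avgs, size):
    """cost[i][j] = error of assigning component image j to pixel i of the downscaled reference."""
    ref = cv2.resize(cv2.imread(reference_path), (size, size))  # downscale the reference image
    pixels = ref.reshape(-1, 3).astype(float)                   # one row per pixel, in BGR
    avgs = np.asarray(component_avgs, dtype=float)              # one average colour per image
    cost = np.zeros((len(pixels), len(avgs)))
    for i, pixel in enumerate(pixels):                          # every pixel...
        for j, avg in enumerate(avgs):                          # ...paired with every image
            cost[i, j] = ((pixel - avg) ** 2).sum()             # squared Euclidean distance
    return cost
```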

To efficiently solve the assignment problem, I made use of Google’s OR-Tools operations research library. The most troublesome and time-consuming operation in the program at this point, though, was generating the cost matrix. Since the error for every possible image-pixel pair had to be calculated, the time required scaled with the square of the total number of pixels in the reference image (which itself scales with the square of the side length). For a 30×30 reference image, this only took 3-4 seconds; once the size went up to 50×50, it took 25-30 seconds to generate the cost matrix.
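The solver side, by comparison, is only a few lines. A sketch using OR-Tools’ SimpleLinearSumAssignment (the wrapper function is mine; the solver works on integer costs, hence the rounding):

```python
from ortools.graph import pywrapgraph

def solve_assignment(cost):
    """Assign one component image to each pixel so that the total cost is minimized."""
    solver = pywrapgraph.SimpleLinearSumAssignment()
    num_pixels, num_images = cost.shape
    for i in range(num_pixels):
        for j in range(num_images):
            solver.AddArcWithCost(i, j, int(round(cost[i, j])))  # integer costs only
    if solver.Solve() == solver.OPTIMAL:
        # image_for_pixel[i] is the index of the component image assigned to pixel i
        return [solver.RightMate(i) for i in range(num_pixels)]
    raise RuntimeError('No optimal assignment found')
```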

At this point, I briefly considered exploring other approaches to this project. But as it turns out, I only had enough pictures to create a 26×26 image, which meant that the cost matrix limitation wasn’t really too problematic.

The next decision I had to make was which picture to use as the reference image. I went for the mathematical approach: the reference image would be the one in my image library whose average colour most closely matched the mean of all the images’ average colours. Of course, there are many limitations to this method – for one, matching average colours does not guarantee that the distributions of colours will be similar. But it served as a quick and easy way to choose a picture, so I went for it. I ended up with this image:

This was from my first culture experience visit to Ningbo Fotile Kitchen Ware.
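Incidentally, that selection step is just an argmin over distances to the ‘average of averages’ – a rough sketch (the names are mine):

```python
import numpy as np

def pick_reference(paths, avg_colours):
    """Pick the image whose average colour is closest to the mean of all the average colours."""
    avgs = np.asarray(avg_colours, dtype=float)      # one (B, G, R) average per image
    target = avgs.mean(axis=0)                       # the 'average of averages'
    distances = ((avgs - target) ** 2).sum(axis=1)   # squared distance from each image to the target
    return paths[int(np.argmin(distances))]
```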

While there wasn’t really anything wrong with this picture, it wasn’t particularly interesting either – and neither were the next few closest candidates. So of course, I threw the whole mathematical approach out of the window and decided to go for this:

My first ice cream in Hangzhou.

With this step done, I got down to actually writing and implementing the program – here’s some of the code which I used.

The pre-processing turned out to be a bit more of an annoyance than I expected. Initially, I had worked under the assumption that the reference and component image inputs would already have undergone all necessary processing – but now that I was actually implementing the project, all the cropping, resizing, and stitching had to be done as well. Fortunately, there weren’t any unexpected issues which popped up in this phase, and I soon had the final product.
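The pre-processing itself is nothing exotic – centre-crop each component image to a square, resize it to the tile size, and paste it into the right cell of a big canvas. A rough sketch (the tile size and function names are mine):

```python
import cv2
import numpy as np

def square_crop_and_resize(img, tile_size=50):
    """Centre-crop an image to a square, then resize it to tile_size x tile_size."""
    h, w = img.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    return cv2.resize(img[y0:y0 + side, x0:x0 + side], (tile_size, tile_size))

def stitch_mosaic(component_paths, image_for_pixel, grid_size, tile_size=50):
    """Paste the assigned component image into each cell of a grid_size x grid_size canvas."""
    canvas = np.zeros((grid_size * tile_size, grid_size * tile_size, 3), dtype=np.uint8)
    for i, image_index in enumerate(image_for_pixel):
        row, col = divmod(i, grid_size)              # pixel i sits at (row, col) of the grid
        tile = square_crop_and_resize(cv2.imread(component_paths[image_index]), tile_size)
        canvas[row * tile_size:(row + 1) * tile_size,
               col * tile_size:(col + 1) * tile_size] = tile
    return canvas
```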

Now, unless you’re on psychedelics while reading this post, it should be quite obvious that this looks absolutely nothing like an ice cream.

So, what went wrong?

Below is the same final photographic mosaic, but with each component image replaced by its average colour.

From this mosaic, the problem is immediately obvious – the range of average colours available for use from the component images is highly limited, mainly to grey, dark grey, and light grey. This makes it very difficult to form any meaningful colour pictures, even if this arrangement of images is optimally configured.

Let’s go back to the original photographic mosaic image of the cat.

Unlike the cat mosaic, which is predominantly orange (as are its component images), my library of photos has no dominant colour or subject. Most of them are composed of a large range and mix of colours. The consequence of this is that each image’s average colour tends towards grey, and that, when the images are placed in a photographic mosaic, their average colours are not clearly discernible. Instead, the result ends up looking like a random montage of pictures; the range of colours in each component image is wide enough that resolving each of them into its average colour does not give a good representation of what the image looks like from afar.
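As a quick sanity check on why that happens: averaging a pure red, a pure green, and a pure blue pixel – (255, 0, 0), (0, 255, 0), and (0, 0, 255) – gives (85, 85, 85), a dark grey that looks nothing like any of the three.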

It should also be noted that in my ice cream mosaic, every image was used exactly once and without any colour adjustment. In contrast, the cat mosaic has a component image resolution of about 50×50 – approximately 2500 tiles – yet only 120 unique images were used. Besides being able to match colours more accurately, reusing images would also allow me to increase the component image resolution – as it is, even the actual 26×26 downscaled original image is barely recognizable as an ice cream.

Another potentially interesting approach towards photographic mosaics would be edge matching – where the component images would be assigned not only based on colour, but on how well their internal boundaries match with certain areas of the original resolution reference image. This seems to be something that is applied in the software used to create the cat mosaic.

Overall, while this project was a complete and absolute failure in terms of creating any sort of a recognizable image, it was still a pretty worthwhile endeavour. I haven’t had any prior experience with image processing and manipulation, so all of this stuff was completely new to me (which means I got to learn a lot, yay).

That being said, I’m not sure if I’ll be revisiting this project (or topic) again in subsequent weeks. This was a bit of a tangent from the actual theme project, so it’s far more likely that I’ll be exploring video processing, feature detection, and the like in the near future. After all, with the end of DIP, work on the theme project is sure to pick up.
