A New Idea about Realtime Image Processing

Written 2009-06-12

Tags:imageprocessing YouTube Segmentation (image processing) OpenCV Electrical Engineering Technology Pixel 

The more image processing I deal with, the more I'm convinced that a better solution is needed than pre-made image toolkits. Evaluating a 640x480 image, even just once, can be processor intensive. Every processing toolkit I've seen has the same basic parts: an image class/interface, some I/O functions, and some image modification functions. OpenCV at least gives you raw pointer access to the image, but the problem is that each of these transforms is always applied sequentially.

Some time ago, I wrote a simple image-segmentation algorithm. Because this was a realtime task, it needed to be made as fast as possible (see here). What I ended up with was a giant mess of C/C++, with loops hand-unrolled to remove edge conditions from them. With optimization, it ran at 50 fps worst-case (P4, 3.2 GHz). Later, however, I needed to add some functionality to the segmentation algorithm, and re-opening this code was a nightmare. Even though I had plenty of comments, the code would have been about one-sixth the size if I hadn't been crunched for CPU power.

So, raw image access and image-optimized code are still necessary for real-time image processing (I look forward to doing it in Python, whenever that becomes doable).

Also, the fewer passes you make through an image, the better (usually). The problem with most (perhaps all) premade image toolkits is that they encourage sequential execution of functions. For example, if I wanted to double the contrast, then apply a threshold, and then halve the contrast, the whole image would be contrast-adjusted, then the whole image thresholded, then the whole image contrast-adjusted again. We've made a total of three passes through the image to do our three functions. O(N) in the number of functions isn't bad, but it isn't necessary either. It would be faster to say Pixel(x,y) = (composition of multiple rules). This way each pixel gets a single, more CPU-intensive step, but only has to be read and written once. I don't know of a single compiler that can optimize this case.
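To make the contrast example concrete, here is a minimal sketch in C++. The function names (`double_contrast`, `threshold`, `halve_contrast`, `three_passes`, `one_pass`), the 8-bit grayscale layout, and the exact contrast formula (stretching around mid-gray 128) are all my assumptions for illustration; they are not taken from OpenCV or any other toolkit. Both versions compute the same result, but the second touches each pixel only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel rules on 8-bit grayscale values.
inline std::uint8_t double_contrast(std::uint8_t p) {
    // Stretch around mid-gray (128), then clamp to [0, 255].
    int v = (static_cast<int>(p) - 128) * 2 + 128;
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

inline std::uint8_t threshold(std::uint8_t p, std::uint8_t cutoff) {
    // Pixels below the cutoff go to black; the rest are kept.
    return p < cutoff ? std::uint8_t{0} : p;
}

inline std::uint8_t halve_contrast(std::uint8_t p) {
    // Compress around mid-gray; the result always fits in [64, 191].
    return static_cast<std::uint8_t>((static_cast<int>(p) - 128) / 2 + 128);
}

// Toolkit style: one full pass over the image per function (three passes).
void three_passes(std::vector<std::uint8_t>& img, std::uint8_t cutoff) {
    for (auto& p : img) p = double_contrast(p);
    for (auto& p : img) p = threshold(p, cutoff);
    for (auto& p : img) p = halve_contrast(p);
}

// Fused style: Pixel(x,y) = composition of the rules, so each pixel is
// read once and written once.
void one_pass(std::vector<std::uint8_t>& img, std::uint8_t cutoff) {
    for (auto& p : img) {
        p = halve_contrast(threshold(double_contrast(p), cutoff));
    }
}
```

This only works so directly because all three rules are point operations: each output pixel depends on nothing but the same input pixel. Anything that looks at neighbors (a blur, for instance) would need more care before it could be folded into the same loop.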

If we can break our image processing down to the pixel level, we can apply multiple complex transformations in one step. If we can do that, we can minimize memory bandwidth usage and improve our cache utilization.
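One way this might look in practice, without going back to hand-written monster loops, is a small helper that takes any number of per-pixel rules and applies them all inside a single loop. The sketch below is only an illustration under my own assumptions: `apply_rules` and `transform_once` are hypothetical names, the image is a flat 8-bit grayscale buffer, and every rule is a callable that takes one pixel and returns one pixel. Because the rules are template parameters, the compiler sees the whole chain at once and has the chance to inline it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Base case: no rules left, so the pixel is returned unchanged.
template <typename Pixel>
Pixel apply_rules(Pixel p) {
    return p;
}

// Recursive case: apply the first rule, then feed the result to the rest.
template <typename Pixel, typename Rule, typename... Rest>
Pixel apply_rules(Pixel p, Rule rule, Rest... rest) {
    return apply_rules(rule(p), rest...);
}

// Hypothetical helper: a single pass over the image, with every rule
// applied to a pixel while it is still in a register.
template <typename... Rules>
void transform_once(std::vector<std::uint8_t>& img, Rules... rules) {
    for (auto& p : img) {
        p = apply_rules(p, rules...);
    }
}
```

A call might look like `transform_once(img, invert, halve)`, where `invert` and `halve` are lambdas or small functions written by the caller. Adding another step to the pipeline then means adding one more argument, not another trip through memory.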