I recently came across a post on /r/programming about resizing images in PHP using Entropy. I was excited to read the post because this sounded interesting. I was soon disappointed to find it only explained how to use an existing library and not how the technology worked.
For those who have not heard of it before, Image Cropping based on Entropy is a system for automatically finding the ‘interesting’ parts of an image when generating a cropped version. This is useful for generating thumbnails of images for blog posts or e-commerce stores.
I decided to read the code and write up an explanation. After investigating, it turned out things were not exactly as I had expected in the code. It is important to point out that two different authors will be referenced in this post. First, there is the blog author, who wrote the original article that prompted this post. Second, there is the code author, who is not the same person as the blog author. Additionally, this post will include a critique of another developer’s code. I don’t intend to disparage this person for what they wrote, only to show where it is wrong. The code author should be commended for taking the time to both create and share something with the world. It is entirely possible that I am failing to understand the intent, or using the code incorrectly. All issues I have found will be submitted in a pull request by the time of publishing. You can view the changes I made, including a branch with test-bed code, on my public GitHub repository.
tl;dr: Although the code author describes the algorithm as follows, step 3 did not work as described and is, in fact, deeply flawed. As such, I was unable to reproduce the blog author’s results using the code he linked to. I therefore rewrote the code to work as I would expect it to, getting the results presented in the original article.
- Take the image and turn it into black and white.
- Run an edge filter so that you are left with only edges.
- Find a piece in the picture that has the highest entropy (i.e. most edges).
- Return coordinates that make sure that this piece of the picture is not cropped ‘away.’
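The “entropy” in step 3 is typically the Shannon entropy of a region’s pixel histogram: a flat patch of pixels carries no information, while a busy, edge-filled patch carries a lot. As a minimal sketch (my own illustration in Python, not the code author’s implementation):

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of grayscale values (0-255)."""
    counts = Counter(pixels)
    total = len(pixels)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# A flat region carries no information; a varied one carries more.
flat = [128] * 64          # entropy 0.0
varied = list(range(64))   # 64 equally likely values -> log2(64) = 6.0 bits
```

A higher value means the region is “busier,” which is why edge-filtering first makes the measure favor detailed areas.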
Let’s step through the process and illustrate how it works. Internally, the algorithm re-sizes the source image before running. Here is the source image resized (source image provided by https://littlevisuals.co/):
The list above states that it will first turn the image black and white and then run edge-detection filters on it. However, in the code you will find that they start instead with edge detection and then turn the image black and white (and de-saturate). So, following the code, the first thing we do is run edge-detection on the original image. It produces the following:
After running edge-detection, the code does some modulation, de-saturation, and then blurring of the original image. The end result can be seen below.
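To make the edge-then-desaturate ordering concrete, here is a toy sketch (pure Python, my own illustration — not the library’s code): desaturation as a weighted luminance average, and a crude one-dimensional gradient as the “edge filter.”

```python
def desaturate(rgb_row):
    """Convert a row of (r, g, b) tuples to luminance values (ITU-R BT.601 weights)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_row]

def edge_filter(gray_row):
    """Crude 1-D edge detector: absolute difference between neighboring pixels."""
    return [abs(gray_row[i + 1] - gray_row[i]) for i in range(len(gray_row) - 1)]

# A hard step from dark to light produces one strong edge response.
row = [10, 10, 10, 200, 200, 200]
edges = edge_filter(row)  # flat regions score 0, the step scores high
```

Real edge filters (Sobel, Laplacian) work in two dimensions, but the principle is the same: the output is bright only where the image changes quickly.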
Now that we have a black-and-white approximation of the original image, we start to process the entropy. Here, however, is where the original code starts to fall apart. Here is the theory on what the code was trying to do, in pseudo-code:
```
imageWidth = image.width // 346 pixels in this example
targetWidth = 270        // arbitrary
sliceSize = ceil((imageWidth - targetWidth) / 25) // 25 is the number of slices to test
start = 0                        // beginning of image
end = imageWidth - sliceSize     // end of image
sliceA = image.getSliceOfPixels(start, sliceSize)
sliceB = image.getSliceOfPixels(end, sliceSize)
while (end - start) > targetWidth:
    if entropy(sliceA) < entropy(sliceB):
        start += sliceSize
        sliceA = image.getSliceOfPixels(start, sliceSize)
    else:
        end -= sliceSize
        sliceB = image.getSliceOfPixels(end, sliceSize)
    endif
endwhile
return start
```
Essentially, this code compares a strip of pixels at each end of the image, repeatedly discarding whichever strip has the lower entropy, until the distance between the two strips fits within the desired bounds of the resulting image. It does this horizontally and then vertically to find the left and top positions for cropping. Fundamentally, this is incorrect.
If we first only consider the horizontal dimension, we can reduce our image into an array of integers that represent each strip’s entropy.
```
entropyStart = [0, 1, 10, 9, 9, 9, 9, 4, 2, 1]
entropyEnd   = [1, 1, 0, 0, 0, 11, 5, 0, 1, 0]
```
Now, following the algorithm, the following table illustrates what our resulting value would be:
[table id=6 /]
The final image would include the single highest-entropy strip from the right, at index 5, but would exclude the long run of high-entropy strips whose individual values were each lower than that maximum. This can’t be right!
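We can demonstrate the flaw concretely. The sketch below (my own illustration; the entropy values are made up) pits the original greedy comparison against a brute-force search for the best window of strips:

```python
def greedy_crop(strips, window):
    """The original approach: repeatedly drop whichever end has the lower-entropy strip."""
    lo, hi = 0, len(strips) - 1
    while hi - lo + 1 > window:
        if strips[lo] < strips[hi]:
            lo += 1
        else:
            hi -= 1
    return lo

def best_crop(strips, window):
    """Brute force: the window whose strips sum to the highest total entropy."""
    sums = [sum(strips[i:i + window]) for i in range(len(strips) - window + 1)]
    return sums.index(max(sums))

# One isolated high-entropy strip on the left, a sustained run of edges further right.
strips = [9, 1, 1, 8, 8, 8, 0, 0]
```

The greedy version clings to the single strip of 9 and returns the window starting at index 0 (total entropy 11), while the run of 8s starting at index 3 (total entropy 24) is plainly the better crop.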
On top of that, I found the implementation of “getSliceOfPixels” to be wrong: the order and value of the operands were incorrect, resulting in comparisons between vastly different image subsections. The implementation of “getSliceOfPixels” is the ImageMagick method cropImage, which has the following signature:
bool Imagick::cropImage ( int $width , int $height , int $x , int $y )
A reduced example of the original code for slicing the image horizontally is as follows:
```php
$geometry = $image->getImageGeometry();
$start = 0;
$end = $geometry['width'];
$sliceSize = 5;
$targetWidth = 270; // arbitrary
$originalSize = $geometry['width'];

$sliceA = clone $image;
// width=500, height=5, x=0, y=0
$sliceA->cropImage($originalSize, $sliceSize, $start, 0);

$sliceB = clone $image;
// width=500, height=5, x=500 - 5 = 495, y=0
$sliceB->cropImage($originalSize, $sliceSize, $end - $sliceSize, 0);
```
Now, horizontal processing means we are working left to right, so our slices should be vertical. But this code clearly makes $sliceA a horizontal strip starting at 0,0 that is 500 x 5 pixels in size. Worse yet, $sliceB ends up being a square starting at 495,0 that is only 5 x 5 pixels in size. This is because if you ask ImageMagick to crop beyond the boundaries of an image, it simply clips to the boundary. That is to say, if I ask for 500 pixels of width starting at position 495 of a 500-pixel-wide image, it returns 5 pixels because that is all that is available.
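That clamping behavior is easy to model. This sketch (my own simulation, not Imagick itself) mimics how a crop request that overruns the image gets clipped to what is actually available:

```python
def clamped_crop_size(image_w, image_h, crop_w, crop_h, x, y):
    """Mimic ImageMagick's clipping: a crop extending past the image
    boundary is limited to the pixels that actually exist."""
    return (min(crop_w, image_w - x), min(crop_h, image_h - y))

# sliceA: a full-width horizontal strip, 500 x 5
slice_a = clamped_crop_size(500, 300, 500, 5, 0, 0)
# sliceB: asks for 500 px of width at x=495, but only 5 px remain -> 5 x 5
slice_b = clamped_crop_size(500, 300, 500, 5, 495, 0)
```

So one “slice” is 2,500 pixels and the other 25, which is exactly the mismatch described below.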
I also found entire sections of code that could simply be removed. The intent seemed to be some early-out optimizations, but since they were never finished they served only to confuse.
So we see it is comparing the entropy of 2500 pixels to the entropy of 25. That cannot give correct results, and it seems it does not. When running the original image from the blog author’s post through the latest code on GitHub I do not get the results he sees at all, which you will see in the conclusion.
Fixing the algorithm
I set about trying to fix the algorithm, though I don’t know if I am correct. I only went on intuition, and could ultimately be completely off base. However, the results I found did seem to indicate a somewhat more correct implementation.
By studying what was going on during execution, and outputting intermediate images, I was able to see what the intention was and work toward that. The first thing I did was correct the image slicing code to cut correct slices. Although this improved things, the results were still not correct in my opinion. We were still only processing some of the image and basing our decision on the single highest valued strip. I thought we could do better.
Going back to the example where I represented the entropy as an array of integers, I changed the algorithm to process the image entirely, from one side to the other, building up a list of entropy values. Then, I iterate over that list finding the subset of strips that maximizes the entropy and meets the required target dimension. Pseudo-code for the algorithm is:
```
imageWidth = image.width
targetWidth = 270
sliceSize = imageWidth / 25
requiredSlices = targetWidth / sliceSize
start = 0
entropies = []
while start < imageWidth:
    slice = image.getSliceOfPixels(start, sliceSize)
    entropies.push([point => start, entropy => entropy(slice)])
    start += sliceSize
endwhile

bestStart = 0
bestEntropy = 0
// Sum the entropies of each window of requiredSlices consecutive strips
for i is 0 to count(entropies) - requiredSlices:
    temp = 0
    for j is 0 to requiredSlices:
        temp += entropies[i + j].entropy
    endfor
    if temp > bestEntropy:
        bestEntropy = temp
        bestStart = entropies[i].point
    endif
endfor
return bestStart
```
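The same approach as runnable Python (a sketch over a precomputed list of strip entropies rather than a real image; the slice size and entropy values here are illustrative, reusing the earlier example):

```python
def find_best_start(entropies, slice_size, target_width):
    """Scan every window of required_slices consecutive strips and return
    the pixel offset of the window with the highest total entropy."""
    required_slices = target_width // slice_size
    best_start, best_entropy = 0, float("-inf")
    for i in range(len(entropies) - required_slices + 1):
        total = sum(entropies[i:i + required_slices])
        if total > best_entropy:
            best_entropy = total
            best_start = i * slice_size  # convert strip index back to pixels
    return best_start

# 10 strips of 10 px each; a 40 px target needs the best 4-strip window.
strip_entropies = [0, 1, 10, 9, 9, 9, 9, 4, 2, 1]
left = find_best_start(strip_entropies, 10, 40)  # the run of high values wins
```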
We calculate the entropy of 25 slices of the image and then find the subset that maximizes our entropy. This is inefficient and could, I think, be improved with dynamic programming. However, it appears to work. See the results below:
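The quadratic re-summing can indeed be avoided: keep a running window sum so each strip is added and removed exactly once. A linear-time sketch (my own suggestion, not part of the patched library code):

```python
def find_best_start_linear(entropies, slice_size, target_width):
    """Same answer as the quadratic scan, but O(n): slide the window sum
    one strip at a time instead of re-summing every window."""
    k = target_width // slice_size
    window = sum(entropies[:k])
    best_sum, best_index = window, 0
    for i in range(k, len(entropies)):
        window += entropies[i] - entropies[i - k]  # slide the window one strip
        if window > best_sum:
            best_sum, best_index = window, i - k + 1
    return best_index * slice_size
```

For thumbnail-sized inputs the difference is negligible, but it costs nothing to get right.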
The results appear to speak for themselves. I am unsure how the blog author produced the results in his post using the linked code, but my results turned out nearly identical to his (with a slight vertical offset between them). I hope you found this interesting, and please let me know all the ways I am incorrect. It’s the best way to learn! And I want to thank the original blog author and code author. You went out and created when you didn’t have to, and that gave me something interesting to think about. Thanks!