Generating checksums for files with cryptographic hash functions such as MD5 or SHA-256 is hardly anything new, and it is one of the most efficient ways to verify the integrity of a file or to check whether two files are identical.
However, generating a file that contains its own checksum as part of its content is a daunting task, and a seemingly impossible one, because of the paradox involved.
That has not stopped a researcher from creating a PNG image that contains the file’s MD5 checksum, visible within the matrix of pixels that make up the image.
A leet image with a 1337 hash
Reverse engineer and researcher David Buchanan has yet again left everyone surprised after sharing an image on Twitter that contains its own hash.
BleepingComputer confirmed the checksum of the image in question is 1337e2ef42b9bee8de06a4d223a51337, which are the characters displayed vertically within the image itself.
Note: The image embedded below has been compressed and as such has lost this property. Readers can attempt the experiment with the original image shared by the researcher. For redundancy, we have preserved the original image: 1, 2.
A checksum is a small piece of data, sometimes just a single digit, derived from a larger set of digital data as a means to detect errors or corruption that may have occurred. The idea is that any change to the original file or piece of data, however minor, will alter its checksum, indicating that the integrity of the data can no longer be trusted.
Most modern digital technologies use cryptographic hash functions such as MD5, SHA-1, or SHA-256 to generate checksums of files fairly quickly.
For any file you may have or create, you can trivially calculate its MD5 checksum on your PC, Mac, or another device. The slightest change to the file’s contents, even by a single character or pixel, will drastically change its checksum. You can try this in practice by recalculating the checksum of your altered file.
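A minimal sketch of that experiment, using Python’s standard `hashlib` module. The two byte strings are arbitrary stand-ins for a file’s contents before and after a one-character edit, and `md5_hex` is a helper name chosen for this illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Two inputs that differ only in their final character.
original = b"hello world"
altered = b"hello worle"

digest_original = md5_hex(original)
digest_altered = md5_hex(altered)

# Count how many of the 32 hex characters differ between the digests.
differing = sum(a != b for a, b in zip(digest_original, digest_altered))

print(digest_original)
print(digest_altered)
print(f"{differing} of 32 hex characters differ")
```

To hash a real file instead, the bytes returned by `open(path, "rb").read()` can be passed to the same helper.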
This makes the inclusion of a file’s checksum within its content by ordinary means a scenario that is quite paradoxical.
You need the checksum or hash of a file first to include this information within the content of the file itself. But doing so by editing or altering the file will effectively change the file’s checksum, therefore making this practice seem impossible.
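The circularity can be seen in a small sketch. It assumes a hypothetical one-line text “file” that states its own MD5, and tries the obvious approach of writing the hash in, rehashing, and repeating. The names `embed` and `naive_search` are invented for this illustration.

```python
import hashlib


def embed(digest_hex: str) -> bytes:
    """Build a tiny 'file' whose text claims the given MD5 digest."""
    return b"My MD5 is " + digest_hex.encode("ascii") + b"\n"


def naive_search(rounds: int = 1000):
    """Write the hash into the file, rehash, and repeat.

    Returns a digest that matches its own file, or None if no such
    fixed point turns up within the given number of rounds.
    """
    guess = "0" * 32
    for _ in range(rounds):
        actual = hashlib.md5(embed(guess)).hexdigest()
        if actual == guess:
            return guess
        # Embedding the new digest changes the file, and so its hash.
        guess = actual
    return None


print(naive_search())
```

Each attempt has roughly a 1 in 2^128 chance of matching, so guessing is not a practical route; the technique described later in the article relies on hash collisions instead.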
But, reacting to a 2013 challenge posted by security researcher 0xabad1dea (‘a bad idea’), Buchanan solved the puzzle this week by creating such a file.
Trick I want to see: a document in a conventional format (such as PDF) which mentions its own MD5 or SHA1 hash in the text and is right
— badidea (@0xabad1dea) August 9, 2013
“The image in this tweet displays its own MD5 hash,” tweets Buchanan.
“You can download and hash it yourself, and it should still match – 1337e2ef42b9bee8de06a4d223a51337”
“I think this is the first PNG/MD5 hashquine.”
Hashquines: files containing their own checksums
What Buchanan essentially created is colloquially called a “hashquine,” a term coined in 2017 by hardware and software enthusiast foone to refer to files that show their own hash.
The same year, Google security engineers spq and Ange Albertini successfully demonstrated the concept by generating, respectively, GIF and PostScript files that displayed their own hash as part of the file’s contents.
Another researcher, Rogdham, later provided “GIF-MD5-hashquine” source code on GitHub, enabling just about anyone to generate such GIFs using MD5 as the hash of choice.
What Buchanan has demonstrated today, however, essentially makes the MD5 hashquine technique possible for PNG files.
“I think I first became aware of hashquines after seeing spq’s GIF hashquine in 2017,” Buchanan told BleepingComputer in an email interview.
“Ever since, I wanted to make a PNG hashquine. I thought about it for a while, but couldn’t figure it out – the same tricks used for the GIF file format can’t be directly applied to PNG.”
Since then, the researcher worked on several projects involving PNG images.
In 2021, Buchanan produced a mysterious image that looked very different on Apple and non-Apple devices, as first reported by BleepingComputer.
Prior to this, the researcher demonstrated using Twitter images to pack entire ZIP archives and MP3 files.
“Through these, I learnt a lot more about the gory details of the PNG file format, and also the limits of how badly you can mangle an image before Twitter won’t let you upload it.”
And it seems the researcher has figured out how to create a perfect PNG-MD5 hashquine that Twitter won’t block or alter, at least for now.
“Armed with my improved knowledge of the PNG file format, and a much faster PC than I had in 2017, I finally figured out a viable method.”
Buchanan has shared a detailed technical breakdown in a Twitter thread on how he was able to land on his hashquine, and it has to do with leveraging hash collisions:
The adler32 checksum was collided to a chosen value using 48 FastColl collisions, with a meet-in-the-middle technique.
After the adler32, the crc32 was collided similarly, using another 48 FastColl blocks.
As the name suggests FastColl is fast, and this part only took ~minutes.
— David Buchanan (@David3141593) September 23, 2022
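The meet-in-the-middle idea in the thread can be illustrated with a toy sketch. It assumes that each colliding pair offers a free binary choice that leaves the MD5 unchanged, and that the goal is to pick one block from each pair so that the Adler-32 of the whole data lands on a chosen value. The random 8-byte pairs below are stand-ins rather than real FastColl output, there are 20 pairs instead of 48, and every function name is hypothetical; this is not Buchanan’s code.

```python
import itertools
import random
import zlib

MOD = 65521  # Adler-32 keeps two running sums modulo this prime


def adler_state(a: int, b: int, data: bytes):
    """Advance the Adler-32 running sums (a, b) over data."""
    for byte in data:
        a = (a + byte) % MOD
        b = (b + a) % MOD
    return a, b


def assemble(pairs, bits) -> bytes:
    """Pick block 0 or 1 from each pair and concatenate the picks."""
    return b"".join(pair[bit] for pair, bit in zip(pairs, bits))


def meet_in_the_middle(pairs, target: int):
    """Find one choice per pair so that the Adler-32 equals target."""
    half = len(pairs) // 2
    left, right = pairs[:half], pairs[half:]
    right_len = sum(len(pair[0]) for pair in right)
    target_a, target_b = target & 0xFFFF, target >> 16

    # Forward pass: index every left-half choice by the state it hands
    # on, with b pre-adjusted for the bytes the right half will add.
    table = {}
    for bits in itertools.product((0, 1), repeat=len(left)):
        a1, b1 = adler_state(1, 0, assemble(left, bits))
        table[(a1, (b1 + right_len * a1) % MOD)] = bits

    # Backward pass: for each right-half choice, compute the left
    # state that would be needed and look it up in the table.
    for bits in itertools.product((0, 1), repeat=len(right)):
        s2, t2 = adler_state(0, 0, assemble(right, bits))
        needed = ((target_a - s2) % MOD, (target_b - t2) % MOD)
        if needed in table:
            return table[needed] + bits
    return None


rng = random.Random(0)
# Stand-in pairs: two equal-length random blocks each (not MD5 collisions).
pairs = [
    (bytes(rng.randrange(256) for _ in range(8)),
     bytes(rng.randrange(256) for _ in range(8)))
    for _ in range(20)
]
# Derive the target from a hidden choice so a solution surely exists.
secret = tuple(rng.randrange(2) for _ in range(20))
target = zlib.adler32(assemble(pairs, secret))

found = meet_in_the_middle(pairs, target)
print(found is not None and zlib.adler32(assemble(pairs, found)) == target)
```

Splitting the pairs into two halves is the point of the design: with n pairs, the search enumerates about 2 × 2^(n/2) half-choices rather than all 2^n combinations. For 48 pairs that is two passes of 2^24 instead of 2^48.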
“Most (all?) existing hashquines rely on using collisions to change the boundaries of different sections within the file. I couldn’t think how to do this with PNG (especially while remaining twitter-compatible),” says Buchanan.
“Instead, I put all my collisions within the image data itself. Colliding blocks contain random garbage data, which would look like this:”
In crafting a solution for this challenge, Buchanan derived a “clever PNG palette” that has 256 entries, the first being red and all others being black.
“As long as my colliding blocks don’t contain any bytes that are 0, all the random garbage ends up simply as black pixels. Sometimes the colliding blocks do contain zeroes – I just keep trying again until they don’t.”
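The palette trick can be sketched as follows. In an indexed-colour PNG each pixel byte is a lookup into the palette, so with red at index 0 and black everywhere else, any block without a zero byte is drawn as solid black. The `os.urandom` call is only a stand-in for a collision generator, and `render` and `zero_free_block` are names invented for this illustration.

```python
import os

RED = (255, 0, 0)
BLACK = (0, 0, 0)

# 256-entry palette: index 0 is red, every other index is black.
PALETTE = [RED] + [BLACK] * 255


def render(indices: bytes):
    """Map each pixel byte to its palette colour."""
    return [PALETTE[i] for i in indices]


def zero_free_block(make_block, max_tries: int = 1000) -> bytes:
    """Keep generating blocks until one contains no zero byte."""
    for _ in range(max_tries):
        block = make_block()
        if 0 not in block:
            return block
    raise RuntimeError("no zero-free block found")


# Assumption: 128 random bytes stand in for one colliding block.
block = zero_free_block(lambda: os.urandom(128))
print(all(colour == BLACK for colour in render(block)))
```

With uniformly random bytes, roughly six in ten 128-byte blocks contain no zero, so the retry loop in this sketch usually ends after a handful of attempts. Real collision blocks need not behave like uniformly random data.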
Another key insight the researcher shared with BleepingComputer is that he had to carefully design the font used to print the MD5 hash within his PNG, so that the “garbage bytes” resulting from one collision would not flow into the next pixel and alter the image.
“Here’s a closeup of one of the digits, you can see that there is only one red pixel per row,” the researcher tells BleepingComputer.
“I had to carefully design the font like this, because otherwise the garbage bytes from one collision would run into the next pixel.”
Pointing to the image, Buchanan further explained, “the garbage on the right side is from the digit collisions, but the garbage along the lower edge is from the collisions used to correct the adler32 checksum (the crc32 collisions are not visible, they actually occur after the end).”
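The two checksums Buchanan mentions belong to different layers of a PNG file, which is why changing the image data means both have to be corrected. A minimal sketch, assuming a simplified pixel buffer (a real PNG also prefixes each row with a filter byte) and a hypothetical helper `png_chunk`: the zlib stream carrying the pixels ends with the Adler-32 of the uncompressed data, and every PNG chunk ends with a CRC-32 over its type and data.

```python
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC-32."""
    crc = zlib.crc32(chunk_type + data)  # the CRC covers type and data
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", crc))


# Simplified stand-in for the uncompressed pixel data.
raw_pixels = bytes(range(1, 65))

# A zlib stream ends with the big-endian Adler-32 of the raw data.
stream = zlib.compress(raw_pixels)
trailer = struct.unpack(">I", stream[-4:])[0]

idat = png_chunk(b"IDAT", stream)
stored_crc = struct.unpack(">I", idat[-4:])[0]

print(trailer == zlib.adler32(raw_pixels))
print(stored_crc == zlib.crc32(b"IDAT" + stream))
```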
It seems it will be a while before Buchanan, like Rogdham, is able to release his code for PNG-MD5 hashquines.
The researcher tells BleepingComputer he is still refining the code, which at present is “a bit of a rube goldberg machine,” and is potentially working on a paper.
Source: www.bleepingcomputer.com