While cleaning up nebol.se I found this text I wrote in 2005, about a compression algorithm I was working on… putting it here in the blog instead (for laughs) and removing yet another hardcoded html page from the unfathomable depths of nebol.se…
“Mission Goal: Creating the Ultimate Compression Algorithm. My priority is compression ratio, ie creating small files. Compression speed is more or less irrelevant. Decompression speed is more important, but less so than compression ratio.
050314: So there I was, re-compressing my MAME roms-collection using ZipMax, to squeeze out a few more saved bytes… My friend Mgt was working on his assignment, creating a Huffman encoder. Fashinating stuff. I read up on the Huffman algorithm on the web, as well as the LZW algorithm, to get a better idea of what we were talking about.
Previously I kind of had the idea that we were near the limits of compression, but I saw many problems with the algorithms and I had some ideas on how to improve them. I lay awake at night thinking up an improvement on the Huffman algorithm.
050315: Some researching on Huffman revealed that my new improved Huffman algorithm already existed. It’s called “Adaptive Huffman”, and it was even a bit smarter than my own “new” algorithm. Darn. Back to the drawing board. Luckily I still have a few ideas on improvements, both on Huffman and LZW.
050315: I had an idea on how to improve the Adaptive Huffman algorithm. Unfortunately, I now found that, again, I was too late:
“For adaptive Huffman coding, Gallager suggests an “aging” scheme, whereby recent occurrences of a character contribute more to its frequency count than do earlier occurrences [Gallager 1978]. This strategy introduces the notion of locality into the adaptive Huffman scheme. Cormack and Horspool describe an algorithm for approximating exponential aging [Cormack and Horspool 1984]. However, the effectiveness of this algorithm has not been established.”
Strange, the effectiveness has not been established? Just code it and try it out! What’s stopping them? It has been 20 years!!! Anyway, my algorithm is simpler and smarter than “approximating exponential aging”. Maybe I’ll implement my algorithm and run a few tests.
A few links:
Adaptive Huffman: http://www.cs.sfu.ca/cs/CC/365/li/squeeze/AdaptiveHuff.html
The original LZW paper by Terry Welch: http://sochi.net.ru/~maxime/doc/welch.shtml
LZW article by Mark Nelsson in DDJ: http://www.dogma.net/markn/articles/lzw/lzw.htm
ZIP format (by PKWARE): http://www.pkware.com/company/standards/appnote/
050315: I’ll definitely use my own format for the compressed archive. It’s time to get rid of ZIP and old, ugly, inefficient, bloated file formats like that.
050318: I searched the net for some decent tree class, but found nothing.. Nothing, I tell you, nothing! My old jeTree class that I coded a long time ago was not exactly what I needed, so I started coding on a new jeTree class. The base class is now complete and I will sooon finish the jeAdaptiveHuffmanTree class…
I learned that I am not alone in my quest. MGT’s Power Packer 2005 is coming along nicely and could be a serious threat. We shall have to wait and see which will be the Ultimate Compressor…”
The modesty, it burns! Anyway, it’s safe to say that I will never finish that project. Same old story: No time, no money. Gotta prioritize.