Puzzle time: Compression
-
Somewhat related to Jon's most recent puzzle.
We all know zip files and the like: their purpose is to reduce the size of data in such a way that the original data can be reconstructed exactly (this is known as lossless compression).
Sometimes, however, something funny happens: When you try to compress a file, the compressed version is actually bigger than the original data.
My question is this: can there be a lossless compression method that does not suffer from this problem, i.e. one that reduces the size most of the time but never produces a compressed version that is bigger than the original (not even by a single bit)?
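To see the expansion effect mentioned above in practice, here is a quick sketch using Python's zlib as a stand-in for zip-style DEFLATE (the exact byte counts depend on the data and the compressor, so treat the numbers as illustrative):

```python
import os
import zlib

# Random bytes are essentially incompressible, so the compressor's own
# framing overhead makes the "compressed" output slightly larger.
original = os.urandom(10_000)
compressed = zlib.compress(original, 9)

print(len(original))    # 10000
print(len(compressed))  # typically a little over 10000

# Still lossless: the original is recovered exactly.
assert zlib.decompress(compressed) == original
```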
-
I don't think so. It's tempting to say that if the compressed size were larger, you'd just switch to an identity compression algorithm, but you'd have to add info to the 'identity compressed' data to indicate that it's unaltered from the original. That information can't be carried at zero size.
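For illustration, here is a minimal sketch of that "switch to identity" idea with a hypothetical one-byte method flag in front (real formats such as zip have a similar "stored" mode); the flag is exactly the information that can't be free, so incompressible input always grows by at least that much:

```python
import os
import zlib

STORED, DEFLATED = b"\x00", b"\x01"  # hypothetical 1-byte method flag

def pack(data: bytes) -> bytes:
    # Try real compression; fall back to storing the data unchanged.
    packed = zlib.compress(data, 9)
    if len(packed) < len(data):
        return DEFLATED + packed
    return STORED + data             # worst case: original size + 1 byte

def unpack(blob: bytes) -> bytes:
    flag, payload = blob[:1], blob[1:]
    return payload if flag == STORED else zlib.decompress(payload)

# Lossless round trip, but incompressible input grows by the flag byte.
data = os.urandom(1_000)
blob = pack(data)
assert unpack(blob) == data
print(len(data), len(blob))          # 1000 1001
```

Using a single bit instead of a whole byte would shrink the overhead but can't eliminate it, which is exactly the point.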
-
Seems like it’s impossible based on the pigeonhole principle alone. If the mapping from uncompressed to compressed is reversible, then some compressed files must be larger than the originals.
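A quick counting check of the pigeonhole argument (no particular compressor assumed): there are 2^n bit strings of length n, but only 2^n - 1 strings of any strictly shorter length, so no reversible (injective) scheme can shorten every length-n string:

```python
# Strings of length exactly n versus all strictly shorter binary strings.
for n in range(1, 9):
    exactly_n = 2 ** n
    shorter = sum(2 ** k for k in range(n))  # 2^0 + ... + 2^(n-1) = 2^n - 1
    print(f"n={n}: {exactly_n} strings of length n, {shorter} shorter targets")
```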
-
@jon-nyc said in Puzzle time: Compression:
Seems like it’s impossible based on the pigeonhole principle alone. If the mapping from uncompressed to compressed is reversible, then some compressed files must be larger than the originals.
Exactly. A lossless compressor must be injective, and there is no injection from a larger finite set into a smaller one (equivalently, no bijection between sets of different cardinality).
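Spelling the argument out for the exact condition in the puzzle (the notation $C$ and $S_n$ is mine, not from the thread): a lossless compressor $C$ is injective, and if it never increases length it maps the finite set $S_n$ of all bit strings of length at most $n$ into itself. An injection of a finite set into itself is a bijection, and

$$|S_n| = \sum_{k=0}^{n} 2^k = 2^{n+1} - 1 .$$

By induction on the length, the empty string must map to itself, the two length-1 strings must map to length-1 strings, and in general every string must map to a string of exactly its own length. So a compressor that is never bigger "not even by one bit" can never be smaller either, which rules out "reduces the size most of the time".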
-
@horace said in Puzzle time: Compression:
I don't think so. It's tempting to say that if the compressed size were larger, you'd just switch to an identity compression algorithm, but you'd have to add info to the 'identity compressed' data to indicate that it's unaltered from the original. That information can't be carried at zero size.
That’s why I added the condition that it cannot be even one bit larger. With one extra bit it would be trivial, as you describe.