Interesting behavior of various compression software programs

Your result can only be explained by a mistake in how you constructed the file with “8 different letters”. Your construction of the file with seven different letters appears correct: each letter carries log2(7) bits of information but is stored in 8 bits, so the compression ratio should be log2(7)/8, which is about 0.351. For the same thing with eight letters, the compression ratio should be log2(8)/8 = 3/8, which is 0.375.

Perhaps your file has a repeating pattern in the eight letters.


You are using rand() to generate your “random” distribution. Unfortunately, the classic implementation of rand() has very poor randomness in its lowest few bits, which cycle through short repeating patterns. Your % 7 depends on all of the bits from rand(), but % 8 depends only on the low three bits, since % 8 is equivalent to & 7.
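To make the effect concrete, here is a minimal sketch. It assumes a hypothetical stand-in for an old-style rand(): a linear congruential generator with modulus 2^31 that returns its whole state. The names lcg_next and low_bits_repeat are made up for illustration, and your C library's actual rand() may be implemented differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for an old-style rand(): a linear congruential
   generator with modulus 2^31 that returns its entire state.
   This is an assumption for illustration, not any specific libc's code. */
static uint32_t lcg_state = 1;

static uint32_t lcg_next(void)
{
    lcg_state = (lcg_state * 1103515245u + 12345u) & 0x7fffffffu;
    return lcg_state;
}

/* Returns 1 if the first `count` outputs, reduced with % 8, form a
   sequence that repeats with period 8; returns 0 otherwise. */
int low_bits_repeat(uint32_t seed, int count)
{
    unsigned char first[8];

    lcg_state = seed;
    for (int i = 0; i < count; i++) {
        unsigned char v = (unsigned char)(lcg_next() % 8);
        if (i < 8)
            first[i] = v;               /* remember the first eight values */
        else if (v != first[i % 8])
            return 0;                   /* pattern broke: not period 8 */
    }
    return 1;
}
```

With a generator of this shape, the low three bits depend only on the low three bits of the previous state, so a file built from % 8 is the same eight letters repeated over and over, which compresses far better than 0.375.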

Use random() instead, which generates random numbers whose low bits behave as randomly as all of the other bits.
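A minimal sketch of the replacement, assuming a POSIX system where random() and srandom() are declared in <stdlib.h>. The function name fill_letters and the alphabet "abcdefgh" are made up for illustration and are not taken from your code.

```c
#define _XOPEN_SOURCE 600   /* expose POSIX random()/srandom() under strict -std modes */
#include <stdlib.h>

/* Fill buf with n letters drawn from the first `letters` letters of
   "abcdefgh" (1 <= letters <= 8), using POSIX random(). */
void fill_letters(char *buf, size_t n, int letters, unsigned seed)
{
    static const char alphabet[] = "abcdefgh";

    srandom(seed);
    for (size_t i = 0; i < n; i++)
        buf[i] = alphabet[random() % letters];
}
```

Writing the buffer to a file with letters set to 7 and then to 8 should give compressed sizes close to the two ratios computed above, provided the compressor is good.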

