how many parts of data can be compressed into a single byte?

In the most general terms, when you have a value that goes from 1 to 7 (I’m assuming inclusive,) which is equivalent of the range [0..6], you can think of it as a digit in base 7.

This means that a string of them forms a number in base 7. Then, the easiest thing you can do is to store that number in a computer variable (in effect, turning it into base 2, but you don’t usually have to think about that, since a number is a number.)

Now lets do some calculations about how many of your values will fit in how many bits. Obviously, one value will fit in 3 bits (and you will waste 1 out of 8 values, or 12.5% of the 3 bits.) Two values – which is equivalent to a two-digit number in base 7 – will have a range of 0 to 48, and will fit in 6 whole bits (and will waste 15 out of 64 binary values, or 23.4%.) Three values will have a range of 0 to 342, and will fit in 9 whole bits and waste a lot of values.

I think you can figure out the rest. Note that the actual (fractional) number of bits required to store each number (in this case 7) is by definition its logarithm in base 2.

To actually encode and decode in such a way, you just need to successively multiply by 7 (encode) or divide by 7 (decode) the same what that you would convert numbers into any other bases. For example, if your values are x, y, and z, the following will hold:

// x,y,z must be in [0..6], so probably have to subtract 1 from each first...
// then:
encoded = x * 1 + y * 7 + z * 49;  // encoded will fit in 9 bits.

decoded_x = (encoded /  1) % 7;     // also need a plus one.
decoded_y = (encoded /  7) % 7;     // same
decoded_z = (encoded / 49) % 7;     // ditto

In general, to map and pack any number of values, each in the range [0..m-1] into a number of bits, you do the same thing as above. Think of each of your values as a single digit in a (possibly large) number in base m, and then store that number in any variable that can hold it.

Note 1: Usually, you get less wasted bits the more values you pack into a single number. For example, some comments and other answers suggest packing each base-7 value into 3 bits. If you do that, you can only store 10 of them in a 32-bit variable. But you can actually pack 11 (whole) values in 32 bits. Admittedly, storing 1 value per 3 bits is faster to encode and decode (using only shifts and masking,) which makes the choice dependent on your needs.

Note 2: There are ways to talk about (and reason about) fractional bit counts. For example, each base-7 digit will take up 2.81 bits. And there are other ways to encode and store them as a collection. But that’s a more complex matter in the realm of coding and compression. If you are curious, you can read about “arithmetic coding”.

Note 3: Talking about compression, there are far better ways to pack several values into a number of bits, if you know more about them. That is, if some of them are more likely to be specific values, or equal to each other, or whatever. Also, if there is a large enough number of values, you can extract these relationships from the data itself. This is – in essense – what a general purpose lossless compressor does.

Note 4: Keep in mind that you can even do this kind of encode and decode (the way I’ve suggested) if the ranges of your values aren’t the same. For example, if x and z are in the range [0..6], but y is in [0..4], then you can encode and decode them like this:

encoded = x * 1 + y * 7 + z * (7 * 5);  // encoded will fit in 8 bits now!

decoded_x = (encoded /  1) % 7;
decoded_y = (encoded /  7) % 5;
decoded_z = (encoded / 35) % 7;

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top