An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

Okay, so I sent my little rotor bug issue to a friend of mine that is a really good systems level programmer.  He quickly enumerates all of the issues I had found and goes a step further than I was thinking.  He starts to actually recode the APIs with some more intelligent schemes in mind that actually work.  So the two APIs in question are:

BinaryWriter.Write7BitEncodedInt and BinaryReader.Read7BitEncodedInt.  Each is a sample of Microsoft trying to save space when writing out integers.  Basically, the same space is used to store a 64 bit integer value as a 32 bit integer value if their values are the same.  Once the 64 bit integer starts using it's high bits, then the encoding begins taking up more space.  In many cases you can store a 32 bit integer in less than 4 bytes as well, which saves some space.  Since many common portions of the BinaryFormatter use small numbers this can result in a huge space savings (all of the ID values and the enumerations can fit in the first or second bytes of the encoding process).

Anyway, we first decide that the darn thing should support negative numbers.  Which it doesn't right now.  Check this code out:
        protected void Write7BitEncodedInt(int value) {
      // Write out an int 7 bits at a time.  The high bit of the byte,
      // when on, tells reader to continue reading more bytes.
            uint v = (uint) value;   // support negative numbers
            while (value >= 0x80) {
                Write((byte) (value | 0x80));
                value >>= 7;
            }
            Write((byte)value);
     }

Notice that it casts the value to a (uint) to support negative numbers, but then continues to use value instead of v.  Doh!  If you try to write out negative numbers they get completely fudged.  So don't do that.  Fixing this method is easy.  Just replace the use of value with v and everything is cool.  You can now write out negative numbers without any issues (at least we haven't discovered any issues yet ;-)

Now, let's go look at the next strangely written method.
        protected int Read7BitEncodedInt() {
            // Read out an int 7 bits at a time.  The high bit
            // of the byte when on means to continue reading more bytes.
            int count = 0;
            int shift = 0;
            byte b;
            do {
                b = ReadByte();
                count |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return count;
        }

Okay, so this method isn't all that bad.  However, it doesn't have any sort of count.  If you have a stream with a bunch of 0xFF bytes written, then the method loops indefinitely until the end of the stream.  Note this method use a ReadByte() helper method that does a bunch of buffer fills, etc...  If we actually want to hi-jack this method to rewrite it we have to work directly on a stream.  This is what my friend has done, so here is what he came up with.  Note that there is some exception tossing code that tosses custom exceptions, but that really isn't all that important.  Basically what we needed to add was some form of bounds checking.  He may kick my ass for posting his humanly unreadable code, but who cares.  I had immediate issues with his code, so after checking it out.  See if you can answer the questions I've come up with (note, I'll post answers later).
    //++RIPRIP
    //
    public
    int
    __Read7BitEncodedInt (
     BinaryReader Reader )
        {
        int _count = 0;
        int _shift = 0;
        int _counter = 0;
        byte _b;

        do
            {
            //++
            // Add a catch here to grab stream errors and munge it into
            // our own shit.
            //
            try
                {
                _b = Reader.ReadByte();
                }
            catch( Exception _exception )
                {
                throw new Invalid7BitDataException("[invalid 7-bit encoded int]",_exception);
                }

            //++
            // Add a counter to check the number of bytes we've read.
            //
            if( _counter++ > (Marshal.SizeOf(_count)))
                {
                throw new Invalid7BitDataException("[overflow 7-bit encoded int]");
                }

            _count |= ((_b & 0x7F) << _shift);
            _shift += 7;
            }
        while ((_b & 0x80) != 0);

        return _count;
        }

1.  He uses a _counter variable to check and make sure he isn't reading too many bytes.  Does he need to use the counter or would another variable work?  Just to note, he did point out that he could have used another variable instead but chose not to.  I still don't know the methods of his madness so I won't comment yet.
2.  Now, does this method work for 32, 64, and even larger integers?  If not, what about it doesn't allow for arbitrarily large integers?  If you think it works, let me know why you think it works as well.

To let you know, I've encapsulated these methods (with my own changes) into a SafeBinaryReader/SafeBinaryWriter class so you can take advantage of integer compression if you want to.  The code will take arbitrarily long numbers, and we could update the read/write methods to work with longs, shorts, etc...

Published Thursday, February 19, 2004 7:12 PM by Justin Rogers

Comments

Friday, February 20, 2004 5:23 AM by Darren Neimke

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

I really don't have much to add in terms of your questions but, it always interests me to see people who are concerned enough to put conditions at the bottom of loop structures like that (i.e. do...while's as opposed to while loops). It's a good reminder that there's more than one way to do stuff but, left unchecked it opens up another possibility for failure.
Friday, February 20, 2004 5:31 AM by Justin Rogers

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

For the most part, this is straight out of the rotor code, except for our updates. I have to say, putting the condition at the end of the loop here is IMO the right thing to do. It makes the code much more elegant than it would otherwise be.

Now, would I have written the loop the same way had I not been looking at the rotor code when writing my method? Maybe, maybe not. It would certainly have taken me some thought to come to this method, because it is not an oft used method for me.
Friday, February 20, 2004 7:00 AM by Darren Neimke

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

So, what does:

( ( _b & 0x80 ) != 0 )

.. test for? As in, what is 0x80?
Friday, February 20, 2004 7:08 AM by Justin Rogers

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

_b & 0x80 tests the highest bit of the byte.

So basically:
0x80 translates to 10000000 binary

The way 7 bit encoded integers work, is this. Read a byte, take the bottom 7 bits and add it to your accumulator, shifting if necessary (by a multiple of 7). Then check the highest bit, or the 8th bit, and see if it is set. If it is set, then we aren't done yet. We need to read another byte from the stream and continue the decoding process.
Friday, February 20, 2004 1:31 PM by Bill

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

<<
do...while's as opposed to while loops
>>

In the Write... code a while loop is used because you don't always want to enter the loop. In the Read... code, a do loop is used because you always want to go through the loop at least once. You could re-code each method to use the other construct, but you'd have to add more code to handle special cases.

<<
left unchecked it opens up another possibility for failure.
>>

Not any more than other constructs. Unless I'm missing something about do..while loops.
Monday, January 11, 2010 3:38 PM by Marcel Roma

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

Justin, you write that Write7BitEncodedInt() supports negative integers (after modification). How can it be?

If four bytes are written to the stream, the algorithm will set the highest bit to aknowledge that more bytes are to follow. So how can we differentiate between the highest bit set because of the encoding algorithm and the highest bit set to signalize a negative number?

Tuesday, July 31, 2012 10:30 PM by Reannef Bogomilo

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

Thanks for these tips. One thing I also believe is that often credit cards featuring a 0% rate of interest often lure consumers with zero interest, instant acceptance and easy on the web balance transfers, however beware of the real factor that can void your own 0% easy road annual percentage rate and to throw one out into the bad house quickly.

Sunday, April 7, 2013 11:47 AM by coachfactoryoutlet66.com

# re: An algorithm puzzle from a rotor code examination. Weigh in your thoughts...

That may possibly nuptials without requiring really like, it will likely be really like without requiring nuptials.