Card Games: Testing the Poker Coder (a short before I start back into the math)...
Testing the poker coder is actually fairly fun. We know quite a bit about the sample data, and we know exactly how many combinations there are. So let's take a look at some naive approaches.
for(int i = 0; i < 52; i++) {
for(int j = 0; j < 52; j++) {
for(int k = 0; k < 52; k++) { ... All the way through m!
Well, we'll get some definite redundancy there. On the order of over 300 million calculations when we only need about 2.6 million of them. You see 5 nested loops iterating over the range 0 through 51 is going to result in 52^5 operations. Darn that sucks. Even worse if the algorithm is expecting non-repeated data, we'll have to parse out all non-repeating patterns.
if ( i == j || i ==k || i == l || i == m || j == k || j == l || j == m || k == l || k == m || l == m ) { continue; }
Can you believe I actually wrote that! Well, when you are spec'ing out tests sometimes you write that sort of stuff. I'm guessing everyone has had a comical loop or two, and this is probably the most comical loop I've ever written. How can we fix the loop to operate more quickly, while achieving our desired results? Well, what if we clamped each loop to it's desired range. For instance, we know that since values like 51, 51, 51, 51, 51 are invalid, that only one of the loops needs to traverse that value. Perhaps the last loop m should remain the same, but all the others should get kicked down a notch based on their tier.
for(int i = 0; i < 48; i++) {
for(int j = 0; j < 49; j++) {
for(int k = 0; k < 50; k++) { ... All the way through m again!
Well, that gets rid of a few thousand iterations. The outer loop is going to run 4 less times meaning j is going to operate 49*48 instead of 52*52, that is a big savings, and it cascades all the way down the loop structure. It is good to know how to come up with these values so you can work out how long your tests are actually taking. For instance, the total work saved is actually 52^5-(48*49*50*51*52)... I think that gets rid of around 68 million?
The lower range is also going to need to be fixed up quite a bit. For instance, values like 0, 0, 0, 0, 0 are completely unneeded. When i = 0, loop j should start at 1, and of course k should be based on j, etc...
for(int i = 0; i < 48; i++) {
for(int j = i + 1; j < 49; j++) {
for(int k = j + 1; k < 50; k++) { ... All the way through m. Getting the hang of it!
By the time you are done, you'll have gone from the first loop featuring 300 million or so loops, most of which were redundant to the final loop which features our target of 2.6 million or so operations. That is a huge change. Not only can we reduce the number of operations, but we can simplify the loop by taking out the duplicate checks. Because each inner loop is guaranteed to always be at least one greater than its containing loop, and because the outermost loops never try to step on the last few values, we'll never have an indice that is duplicated.
Since I was using this to test the poker coder submitted by Tim through some comments on another entry, I figured I'd post this test routine. He had some routines of his own, but this is definitely invaluable for testing any other algorithms that need to operate over all combinations of hands. To ensure that things were working properly I went ahead and threw in a Hashtable for good measure. Each encoded value was then placed in the Hashtable and checked for duplicates. On a decent machine 2.6 million entries doesn't really hurt all that much.
That doesn't ensure success though. What happens if the values were in the range of 5.2 million and half the numbers were being skipped? That doesn't really check the routine as a compression routine at all now does it? Well, if you can pack 2.6 million entries into a hashtable, and have a maximum value of 2.6 million, then I'm pretty sure we just did some saturation coverage and ensured the algorithm was successful. Since he used the int data-type, we also have to check for negative values. So we should ensure all values are greater than 0.
Testing very failure scenario is a real pain in the rear. Most often I try to negate bugs early on by using data types that reflect the data I am going to use. I know it isn't CLR compliant, but I often reach for the uint or ulong data-types in place of the int or long data-types when I have no need of a negative number. The range checks you might do on this value turn into a single range check and you get much more range out of it.
We aren't done with the poker coder just yet, but we have some math to jump into next. Tim used a special coding method in order to turn a series of cards into what it's value would be in a look-up table, if one existed. Rather than make the look-up table, he reverses the value right back out into a hand of cards. In this manner he is able to pack the deck of cards into exactly the number of combinations. Since that particular number fits into 22 bits, so does his packing routine. Very nice, but definitely needs some explanation. We also had a number of other routines that were tried. I used an arithmetic coder, which I'll post separately since it can be used to more generically pack numeric data, and was capable of packing repeated cards in 24 bits. Tim had a great idea to limit the number of bits required to encode the suits of the cards and wound up stumbling upon Tetrahedral and Pyramidal numbers. Fun stuff!