Math Quickie: Introduction to combination formulas
Over the past few days there has been a lot of comment traffice on one particular entry. It covers the coding of a particular poker hand in as few bits as possible. Timothy Fries finally came up with an optimal coder that packs the hand into the fewest bits achievable. But how do we know that we packed the hand in the fewest achievable bits? For that we need to investigate some problems.
Pick a Card any Card
If I have a full deck, what are the odds of picking a specific card? 1:52... Now, if I'm going to pick a card, notate it, put it back, shuffle, and pick another card. What are the odds of picking the same card a second time (barring magicians)? We need to combine the probabilities through multiplication. We end up with 1:2704... What is the number 52 and 2704 in each of these ratios? Further, what is the number 1?
// Ratio1
n = number of items being selected from = 52
m = number of selections being made = 1
n^m = 52^1 = 52
// Ratio 2
n = number of items being selected from = 52
m = number of selections being made = 2
n^m = 52^2 = 2704
In the ratios, the 52 and 2704 is the number of possible arrangements given by the equation n^m. The 1 gives the arrangement we are searching for and represents the number of ways we can get a desired result. If we ask for a specific card, 1 card, then there is only one way for that to happen. If we ask for the same card twice, there is again only one way for that to happen. The equation n^m is the number of ordered selections with repetition.
Quit Shuffling that Deck
I don't recall any games where you pick a card and put it back only to pick another card. That could just be my own ignorance though. Many more games rely on the selection of cards from the deck without reshuffling or reinserting the cards. A poker hand for instance, would comprise selecting 5 cards from the deck. After you select each card, the number of possibilities for the next card you pick change. The first card has a 1:52 chance, while the second card has a 1:51 chance of coming up. This isn't optimal for representing poker hands because the ordering of cards in the hand doesn't matter, but if you were going to represent a pile of hidden cards that you are going to reveal in order it makes more sense.
n = number of items being selected from
m = number of selections being made
n!/(n-m)!
! is the factorial mark. It means multiply all the numbers from 1 to the given number
n! = (n*n-1*n-2*...*1)
(n-m)! = special case!
I mark (n-m)! as a special case because of the expansion potential. The entire equation becomes (n*n-1*n-2*...*n-m+1)... From a computational standpoint it now becomes easy to work the above equations without evaluating the large factorials. The equation n!/(n-m)! is the number of ordered selections taken without repetition.
22 Bit Poker Hands
A real poker hand gets compressed into 22 bits. That is because 22 bits is enough bits to encode the 2.6 million or so combinations that are available? Where does that number come from? Well, it comes from a different class of combination methods. These methods are unordered, meaning the order in which the items are placed in the resulting sequence doesn't matter. Previously combinations such as Ace/2 and 2/Ace would be considered two different combinations, but in effect, we don't need the ordering information and we can throw it out.
The number of unordered selections is most often called a combination while the number of ordered selections is considered an arrangement. Every combination that we pull will also have some number of arrangements associated with it. For instance, the Ace/2 combination has two arrangements and the Ace/2/3 combination would have six arrangements. The number of arrangements of a particular combination is simply n!... It is actually n!/(n-m)!, just like our previously examined method, however, m == n. What does that do? Well, the bottom term becomes 0!, which in turn is 1. Dividing by one is an identity, so we lose the division altogether.
Okay, how does this help us? Well, if we use our previous equation for arrangements, and then remove the arrangement through division we get a new equation... If the number of arrangments, for instance, of 3 cards is 6, then if we divide by that number, we get the number of combinations. Sweet huh? The new equation is C(n, m) = n!/((n-m)!m!) where the m! term counts the number of arrangements.
n = 52
m = 5
52!/(47!5!) = 2598960
I think a lot of people know the equations here, but don't put together the concept of arrangements and combinations and how they are combined to derive the above equation. Timothy Fries came up with a very performant way to compute the above values that I think is very useful. Remember the expansion we did before where a bunch of the terms in the factorials cancelled out? Well, with the new equations, the same cancellations apply, but we now have a small factorial remaining on the bottom. The new expansion is (52*51*50*49*48)/(5*4*3*2*1).
items = 52; int numerator = 1; int denominator = 1;
for(int i = 0; i < picks; i++) { numerator *= (items-i); denominator *= (i+1); }
result = numerator/denominator;
This is great. It emphasizes a single division for performance and only works over the non-cancelled terms. The above method is crucial to Tim's packing routine, which I'm going to cover in a single post, and examine the result spatially with some cool diagrams. To reiterate, the equation C(n, m) is called the combinatorial method and represents the number of unordered selections without repetition.
Suit Combinations for 5 Cards
We have 5 cards in our hand, and each card has a suit. If we add up the counts of each suit, how many different combinations will we come up with. We are allowed to pick suits multiple times (aka counting), and the order of the suits doesn't matter. We need to extend our previous equation to allow repeated selection of the same suit.
n = 4
m = 5
C(n + m - 1, m) = C(4 + 5 - 1, 4) = C(8, 5)
8!/(5!3!) = (8*7*6)/(3*2) = 8*7 = 56
You can use this method for quite a few different counting routines. Another abstraction might be with scoring a game. You can count the number of possible scores that might exists if say 5 people are going to split 6 points between them. Trivial combinations exist where a single player accrues all 6 points. You have 6 of those, and then you start getting to the partioning where 1 player has 5, and you pass the single remaining point between 4 other players. There are 4*5 of these, etc.., etc...
n = 5
m = 6
C(n + m - 1, m) = C(5 + 6 - 1, 6) = C(10, 6)
10!/(6!4!) = 210
// If we do something smaller we can verify the results.
n = 5, m = 2
20000, 02000, 00200, 00020, 00002
11000, 10100, 10010, 10001
01100, 01010, 01001
00110, 00101
00011 = 5+4+3+2+1 = 15
C(6, 2) = 6!/(4!2!) = (6*5)/2 = 15
Picking the Correct Equation
Sometimes picking the correct equation to examine the possibilities is quite hard. We saw that there are at least 2 equations for representing a poker hand, one more compact than another. The best way to test would be simulating a smaller example that you can work and expand by hand to verify the equation matches the results. If it does, you might have a winner. Run a few more expansions just to make sure.