Thanks to Ralf Anderson for error detection. Also see David Serrano's version.
The C++ code is here; you can copy and paste it into your favorite editor.
To find a multinomial coefficient, invoke:
    multinomial::multi<result_type>(std::vector<size_t> const & vec)
where the components of vec are the arguments to the multinomial function. The return value is of type result_type.
To find the number of partitions of an integer rem, where no element is larger than top, invoke:
    multinomial::parti(size_t const rem, size_t const top)
The return value is of type size_t.
Detailed explanation.
When mathematicians solve problems in combinatorics, they often need to calculate multinomial coefficients. Unfortunately, the most direct definition of these numbers requires factorials, which are often inconveniently large.
The multinomial coefficient function, here written multi (…), takes several arguments which are nonnegative integers. It is defined in plain English as
the factorial of the sum of the arguments
divided by the product of the factorials of the individual arguments 
For instance, with three or five arguments:
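Written out from the plain-English definition above, the three- and five-argument cases take the following form. The argument names a through e are chosen here purely for illustration.

```latex
\operatorname{multi}(a, b, c) = \frac{(a + b + c)!}{a!\, b!\, c!}
\qquad
\operatorname{multi}(a, b, c, d, e) = \frac{(a + b + c + d + e)!}{a!\, b!\, c!\, d!\, e!}
```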
The sum of the arguments is the order of the multi (…) invocation.
This function is commutative: the sequence in which the arguments are written makes no difference.
The purpose of this web page is to provide an efficient scheme for calculating multi (…) in the computer programming language C++. A similar implementation can be developed in other languages.
A numerical overflow problem can be demonstrated with as few as three arguments. For instance:
Even though the result (25,740) is of reasonable size, the numerator (6,227,020,800) will not fit into the 32-bit unsigned integer offered on many computers, where the maximum value is 4,294,967,295.
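The arithmetic can be checked with a short C++ sketch. It assumes the invocation in question is multi (7, 4, 2), which is one three-argument case consistent with the figures quoted (a result of 25,740 and a numerator of 13!). The function names are hypothetical and are not part of namespace multinomial.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: n! in 64-bit arithmetic (exact only for n <= 20).
std::uint64_t factorial_u64(unsigned n) {
    std::uint64_t f = 1;
    for (unsigned k = 2; k <= n; ++k) f *= k;
    return f;
}

// Assumed example: multi(7, 4, 2) computed from the direct definition,
// 13! / (7! * 4! * 2!), using 64-bit intermediates.
std::uint64_t multi_7_4_2_direct() {
    return factorial_u64(13) /
           (factorial_u64(7) * factorial_u64(4) * factorial_u64(2));
}

// True when the numerator 13! exceeds the largest 32-bit unsigned value.
bool numerator_overflows_32_bits() {
    return factorial_u64(13) > std::numeric_limits<std::uint32_t>::max();
}
```

Under that assumption, factorial_u64(13) is 6,227,020,800 and multi_7_4_2_direct() is 25,740, so the quotient would fit in 32 bits even though the numerator does not.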
Of little help is the 64-bit unsigned integer available on most newer machines, because it has a maximum of 18,446,744,073,709,551,615, which is less than 21!, as seen here:
The result fits into a 32-bit integer, the denominator into a 64-bit, but the numerator into neither.
Extended-precision integer packages are readily available, and they solve the overflow problem, but often at a considerable speed penalty. Writing computer code to cancel common factors from numerator and denominator is possible, but complicated, and offers little hope of fast performance.
The key to efficient calculation is to exploit a recursive property of multinomial coefficients. Although it is awkward to write in the general case, the four-argument formula as an example ought to make it clear:
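For reference, the four-argument case can be written as the standard identity below. It holds whenever the order is at least one, with any term that has a negative argument counted as zero. The symbols a, b, c, d are chosen here for illustration.

```latex
\operatorname{multi}(a, b, c, d) =
    \operatorname{multi}(a - 1, b, c, d)
  + \operatorname{multi}(a, b - 1, c, d)
  + \operatorname{multi}(a, b, c - 1, d)
  + \operatorname{multi}(a, b, c, d - 1)
```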
More concretely:
If an invocation of multi (…) would have a negative argument, the invocation is replaced by zero, and recursion of that branch ends.
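A minimal sketch of this recursion in C++, without any caching, might look as follows. The name multi_naive is hypothetical, and this is not the code of namespace multinomial; it only illustrates the rule that each argument is decremented in turn and that branches with a negative argument contribute zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the recursion; uses addition only.
// The argument is taken by value so it can be modified in place.
std::uint64_t multi_naive(std::vector<std::size_t> args) {
    bool all_zero = true;
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == 0) continue;  // decrementing would go negative: branch is zero
        all_zero = false;
        --args[i];                   // multi(..., a_i - 1, ...)
        sum += multi_naive(args);
        ++args[i];                   // restore before trying the next argument
    }
    // Order zero, including the empty argument list, is defined as 1.
    return all_zero ? 1 : sum;
}
```

For small inputs this reproduces the direct definition, for example multi_naive({2, 1, 1}) is 12. The running time grows with the size of the answer, so the sketch is suitable only for small orders.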
The first virtue of this scheme is that no intermediate result requires larger storage than the ultimate answer. A second benefit is that calculations involve neither multiplication nor division, but only addition; computers can typically add faster than they can multiply, and multiply faster than they can divide. If a large result is anticipated and extended-precision integers are chosen, the time savings with recursion can be substantial.
One disadvantage of the recursive formula, if it is implemented naïvely, is that it leads to extensive repetition in the calculation. For instance, both multi (4, 9, 8, 4) and multi (5, 8, 8, 4) need multi (4, 8, 8, 4) as an intermediate result. This redundancy can be eliminated, however, by establishing a cache.
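As a rough illustration of how a cache removes the repetition, the sketch below memoizes results in a std::map keyed by the argument list with zeroes dropped and the rest sorted in descending order. This is a swapped-in technique for demonstration: the implementation described on this page stores its cache in a std::vector with computed subscripts, and the names Args and multi_cached are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Args = std::vector<std::size_t>;  // hypothetical alias for this sketch

// Hypothetical memoized recursion; each distinct argument set is computed once.
std::uint64_t multi_cached(Args args) {
    static std::map<Args, std::uint64_t> cache;

    // Canonical key: zeroes removed, remaining arguments in descending order.
    args.erase(std::remove(args.begin(), args.end(), std::size_t(0)), args.end());
    std::sort(args.begin(), args.end(), std::greater<std::size_t>());

    if (args.empty()) return 1;  // order zero: multi() = 1

    auto hit = cache.find(args);
    if (hit != cache.end()) return hit->second;

    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        Args next = args;
        --next[i];               // every remaining argument is at least 1
        sum += multi_cached(next);
    }
    cache[args] = sum;
    return sum;
}
```

With this arrangement a shared intermediate such as multi (4, 8, 8, 4) is evaluated only once, no matter how many larger coefficients depend on it.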
In this implementation the cache is initialized with the multinomial coefficient of order zero, namely multi ( ) = 1. Later, when a multinomial coefficient of order n is requested,
For instance, if the first request is for multi (2, 1, 1), which is of order 4, the following will be calculated and encached, in this sequence:
If a second request is for multi (1, 1, 1), no calculation is necessary.
If a third request is for multi (3, 2), the seven multinomial coefficients of order 5 will be calculated and encached, but the multinomial coefficients of order 4 and below will not need to be recalculated.
The first 45 entries in the multi cache, if that many are needed, look like this:



The C++ function to invoke multi (…) is called multinomial::multi and is declared
    namespace multinomial
    {
        template <typename result_type>
        result_type multi(std::vector<size_t> const &);
    }
Presumably, result_type will be an unsigned integer, perhaps of extended precision. For many purposes size_t will be satisfactory, even though its range can vary from machine to machine. A disadvantage of size_t, however, is that the programmer cannot expect to be alerted in case of overflow. If exactitude is not required, result_type could instead be floating point.
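Because the recursion relies only on addition, one way to be alerted to overflow is to check each addition before performing it. The helper below is a hypothetical sketch of that idea; it is not part of namespace multinomial, and whether it can be combined with multinomial::multi depends on details of that code not described here.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: adds two size_t values, throwing instead of wrapping.
std::size_t checked_add(std::size_t a, std::size_t b) {
    // a + b would exceed the maximum exactly when a > max - b.
    if (a > std::numeric_limits<std::size_t>::max() - b) {
        throw std::overflow_error("size_t addition overflowed");
    }
    return a + b;
}
```

Unsigned arithmetic in C++ wraps around silently on overflow, which is why the test is made before the addition rather than after it.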
Any zeroes in the argument std::vector<size_t> do not affect the answer. This is convenient because when a std::vector<size_t> is constructed with a given size, its components are by default set to size_t(0).
Among other things, namespace multinomial contains these items for convenience:
    namespace multinomial
    {
        typedef std::vector<size_t> SVI;
        void view (std::ostream &, SVI const &);
    }
With those declarations, this sample code:
    using namespace multinomial;
    SVI v(4);
    v.at(0) = 7;
    v.at(1) = 9;
    v.at(2) = 4;
    v.at(3) = 6;
    std::cout << "\n multi ";
    view (std::cout, v);
    std::cout << " = " << multi<unsigned long long int>(v);
gives this output:
multi (7, 9, 4, 6) = 12760912164000
Had we instead invoked the function as
    multi<double>(v)
an answer such as 1.27609e+13 would have ensued.
The cache is stored in a std::vector<result_type> in class multinomial::combo. Within the vector, multinomial coefficients of order n are stored at lower addresses than multinomial coefficients of order n + 1, as shown in the table above. The vector grows as needed, and it is not necessary for the programmer to predict its ultimate size.
Efficient subscripting of this vector is a nontrivial matter, because there are as many distinct multinomial coefficients of order n as there are integer partitions of n, and partition counts are complicated to compute. Class multinomial::index uses recursion to generate the values, and stores them in its own cache, which is separate from that of class multinomial::combo.
Programmers who are interested only in finding multinomial coefficients may ignore class multinomial::index. However, programmers who are enumerating partitions can easily use, through a wrapper function parti, the code of class index that calculates partition quantities:
namespace multinomial { size_t parti (size_t const rem, size_t const top); }
This tells how many partitions of the integer rem exist, with no component greater than top. For example, multinomial::parti (7, 4) equals 11, because there are 11 ways to break 7 down into addends that lie between 1 and 4:
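The partitions in question can be enumerated with a short sketch such as the one below. The function collect_partitions is hypothetical and is unrelated to class multinomial::index; it simply lists each partition with its addends in non-increasing order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends to `out` every partition of `rem` whose
// addends are at most `top`, written largest addend first.
void collect_partitions(std::size_t rem, std::size_t top,
                        const std::string& prefix,
                        std::vector<std::string>& out) {
    if (rem == 0) {              // nothing left to split: one complete partition
        out.push_back(prefix);
        return;
    }
    for (std::size_t part = std::min(rem, top); part >= 1; --part) {
        std::string next = prefix.empty()
                               ? std::to_string(part)
                               : prefix + " + " + std::to_string(part);
        // Later addends may not exceed the one just chosen.
        collect_partitions(rem - part, part, next, out);
    }
}
```

Calling collect_partitions(7, 4, "", out) on an empty vector fills it with 11 strings, beginning with "4 + 3" and ending with "1 + 1 + 1 + 1 + 1 + 1 + 1".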
The recursion formula is
    parti (rem - top, top) + parti (rem, top - 1)
with two conditions:
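A minimal, uncached sketch of this recursion is given below. The boundary handling is an assumption chosen so that the recursion terminates and counts correctly, and it may be phrased differently from the conditions used by the actual code: a remainder of zero counts as one partition, a top of zero with a positive remainder counts as none, and the first term is taken as zero when top exceeds rem. The name parti_sketch is hypothetical.

```cpp
#include <cstddef>

// Hypothetical sketch: number of partitions of `rem` with no addend above `top`.
std::size_t parti_sketch(std::size_t rem, std::size_t top) {
    if (rem == 0) return 1;  // assumed condition: the empty partition
    if (top == 0) return 0;  // assumed condition: no addends available
    // Partitions that use at least one addend equal to `top`
    // (impossible when top > rem, which would make rem - top negative).
    std::size_t with_top = (rem >= top) ? parti_sketch(rem - top, top) : 0;
    // Partitions whose addends are all smaller than `top`.
    return with_top + parti_sketch(rem, top - 1);
}
```

Under these assumptions parti_sketch(7, 4) returns 11, in agreement with the example above, and parti_sketch(5, 5) returns 7, the number of multinomial coefficients of order 5.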
The first 45 entries in the parti cache, if that many are needed, look like this:


