Problem 92 from Project Euler asks us to find the number of non-Happy numbers under 10 million:
A number chain is created by continuously adding the square of the digits in a number to form a new number until it has been seen before.
44 -> 32 -> 13 -> 10 -> 1 -> 1
85 -> 89 -> 145 -> 42 -> 20 -> 4 -> 16 -> 37 -> 58 -> 89
Therefore any chain that arrives at 1 or 89 will become stuck in an endless loop. What is most amazing is that EVERY starting number will eventually arrive at 1 or 89.
How many starting numbers below ten million will arrive at 89?
The numbers that have a sequence ending in 1 are called the ‘Happy’ numbers (making the rest non-Happy).
As with most Project Euler problems, this one can be solved by brute force. Following the definition of the chains, we can run through every number from 1 to 10 million. For each number, we split it into its constituent digits, square each digit and sum the squares. If the sum is 89, we’ve found a match; if it is 1, the chain is happy and we can stop. Otherwise, we repeat the process on the new number (split digits, square etc). The problem with this approach is that we have to follow at least 10 million chains. A naive implementation also ends up evaluating some numbers many times. For example, the chain starting at 85 above passes through 145, and we would evaluate 145 again when we reach the chain that starts at 145. Clearly, this approach does not scale.
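As a point of reference, the brute-force idea can be sketched in a few lines of Python. The function names here (digit_square_sum, ends_in_89, count_brute_force) are my own choices for illustration, and the example uses a small limit because the full 10 million takes a while in pure Python.

```python
def digit_square_sum(n):
    """Return the sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


def ends_in_89(n):
    """Follow the chain from n (n >= 1) until it reaches 1 or 89."""
    while n != 1 and n != 89:
        n = digit_square_sum(n)
    return n == 89


def count_brute_force(limit):
    """Count starting numbers in [1, limit) whose chain arrives at 89."""
    return sum(1 for n in range(1, limit) if ends_in_89(n))


print(count_brute_force(100))  # 80
```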
The first optimization is to ensure we evaluate each number only once. To do this, we keep a lookup table of the numbers whose chains are known to end in 89. Once we evaluate a chain that ends in 89, we know that every number in that chain ends in 89 as well (e.g. 85, 145, 20 etc from above).
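One way this caching could look is sketched below. It assumes a plain dictionary as the lookup table and records the outcome for both kinds of chain (89 and 1), which goes slightly beyond the 89-only table described above; the names are hypothetical.

```python
def digit_square_sum(n):
    """Return the sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


def count_with_cache(limit):
    """Count starting numbers in [1, limit) that arrive at 89, caching results."""
    # known[n] is True if the chain from n arrives at 89, False if it arrives at 1.
    known = {1: False, 89: True}
    count = 0
    for start in range(1, limit):
        chain = []
        n = start
        # Walk the chain until we hit a number whose outcome is already known.
        while n not in known:
            chain.append(n)
            n = digit_square_sum(n)
        result = known[n]
        # Every number visited along the way shares that outcome.
        for visited in chain:
            known[visited] = result
        if result:
            count += 1
    return count


print(count_with_cache(100))  # 80
```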
The second optimization comes when we notice that the largest possible sum of the squares of the digits of a number in our range is 9² × 7 = 567 (from the number 9999999). This reduces the number of chain evaluations we have to make. We can construct a lookup table recording which numbers between 1 and 567 end in 89. Then for each number in our range from 1 to 10 million, we simply calculate the sum of the squares of its digits and check the lookup table to see whether that value would terminate in 89.
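A minimal sketch of the small-table approach follows, assuming the range stays below 10 million so that every digit-square sum fits in the table. The names MAX_SUM, build_table and count_with_table are illustrative only.

```python
def digit_square_sum(n):
    """Return the sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


MAX_SUM = 9 * 9 * 7  # 567, reached by 9999999


def build_table():
    """table[s] is True when the chain starting at s (1 <= s <= 567) arrives at 89."""
    table = [False] * (MAX_SUM + 1)
    for s in range(1, MAX_SUM + 1):
        n = s
        while n != 1 and n != 89:
            n = digit_square_sum(n)
        table[s] = (n == 89)
    return table


def count_with_table(limit):
    """Count n in [1, limit) that arrive at 89; assumes limit <= 10**7."""
    table = build_table()
    # One digit-square sum per number, then a single table lookup.
    return sum(1 for n in range(1, limit) if table[digit_square_sum(n)])


print(count_with_table(100))  # 80
```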
The third optimization is based on the realization that the sum-of-squares-of-digits function does not change if you reorder the digits. That is, the sum of squares for 123 is the same as for 213, 312, 321 etc. So if the chain starting at 123 ends at 89, we know that the chains for 312, 213 etc will end at 89 as well. Formally, this means that the sum-of-squares-of-digits function partitions the set of integers into equivalence classes such that a~b if a and b contain the same digits. Using this optimization, we only need to check chains for 11,440 numbers (the number of ways to choose 7 digits from 10 when repetition is allowed), almost 1000 times fewer than the 10 million checks that the brute force approach would have us make. However, to use this approach, we need to determine the size of each equivalence class. That way, if we determine that a particular number ends in 89, we can update our count of matching numbers by adding the full size of that number’s equivalence class.
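Both claims in this step are easy to check. The short sketch below confirms that every ordering of the digits 1, 2, 3 gives the same sum, and computes the 11,440 figure as the number of multisets of 7 digits drawn from 10, using the standard "stars and bars" count C(10 + 7 - 1, 7). It assumes Python 3.8 or later for math.comb.

```python
from itertools import permutations
from math import comb


def digit_square_sum(n):
    """Return the sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


# Every reordering of the digits 1, 2, 3 produces the same sum.
sums = {digit_square_sum(int("".join(p))) for p in permutations("123")}
print(sums)  # {14}

# Number of ways to choose 7 digits from 10 with repetition allowed.
num_classes = comb(10 + 7 - 1, 7)
print(num_classes)  # 11440
```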
We turn to combinatorics to determine the size of each class. For any one number, its equivalence class is constructed by creating permutations of its digits. The size of the class is therefore given by the number of unique permutations of the digits in a sequence. We need to account for repetitions of indistinguishable objects: that is, given the sequence 100335, we need to account for the fact that the 3s and the 0s can be interchanged in permutations without resulting in different integers. The formula and reasoning for such a count is explained at http://www.andrews.edu/~calkins/math/webtexts/prod02.htm under the title “Permutations with repeated elements”. The resulting formula is:
P(n; r_1, r_2, …, r_k) = n! / (r_1! r_2! … r_k!)
where we are arranging n elements in which the first distinct element occurs r_1 times, the second r_2 times, and so on.
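The formula translates directly into code. The sketch below uses a hypothetical helper, class_size, that takes the digits as a string and counts its distinct orderings.

```python
from collections import Counter
from math import factorial


def class_size(digits):
    """Number of distinct orderings of the given digit string."""
    size = factorial(len(digits))
    # Divide by r! for each digit that is repeated r times.
    for repeats in Counter(digits).values():
        size //= factorial(repeats)
    return size


print(class_size("100335"))  # 6! / (2! * 2!) = 180
```

Note that this counts orderings that begin with 0 as well. That is what we want when shorter numbers are padded with leading zeros to a fixed length, since each such ordering then stands for a distinct integer.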
We still need to figure out how to make sure we pick one (and only one) element from each equivalence class for our chain calculations. Each class is composed of numbers that contain the same digits in some order. We can represent each class by a string containing the non-decreasing sequence of digits that defines the class. Numbers with fewer than 7 digits are padded with leading zeros, so every number in our range corresponds to a string of length 7. Generating all strings of length 7 with a non-decreasing sequence of digits will therefore give us exactly one representative from each equivalence class.
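In Python, one convenient way to produce these non-decreasing strings is itertools.combinations_with_replacement, which yields sorted tuples when its input is sorted. This is a sketch of that idea; the generator name representatives is an assumption of mine.

```python
from itertools import combinations_with_replacement


def representatives(length=7):
    """Yield every non-decreasing digit string of the given length."""
    for digits in combinations_with_replacement("0123456789", length):
        yield "".join(digits)


reps = list(representatives())
print(len(reps))  # 11440
print(reps[:3])   # ['0000000', '0000001', '0000002']
```

The first representative, 0000000, stands for the number 0, which lies outside the range of the problem and needs to be skipped when counting.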
With these optimizations, we solve the problem in 0.5 seconds using Python. We can also count all the non-Happy numbers under 10^100 in under a minute. We can further reduce our computations by noticing that there are far fewer happy numbers than unhappy numbers: only 20 of the numbers from 1 to 100 are happy. Since all numbers are either Happy or non-Happy, we can determine the number of non-Happy numbers by counting the happy numbers and subtracting that count from the count of numbers in the range.
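Putting the pieces together, a compact version of the whole approach might look like the following. This is a sketch rather than the exact program timed above: it counts the happy numbers by equivalence class and subtracts them from the size of the range, and the function name and its num_digits parameter are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def count_arriving_at_89(num_digits=7):
    """Count n in [1, 10**num_digits) whose chain arrives at 89."""
    happy = 0
    # One non-decreasing tuple of digits per equivalence class.
    for digits in combinations_with_replacement(range(10), num_digits):
        n = sum(d * d for d in digits)
        if n == 0:
            continue  # the all-zero tuple is the number 0, outside the range
        while n != 1 and n != 89:
            n = sum(int(c) ** 2 for c in str(n))
        if n == 1:
            # Class size: n! / (r_1! r_2! ... r_k!)
            size = factorial(num_digits)
            for repeats in Counter(digits).values():
                size //= factorial(repeats)
            happy += size
    # Everything in the range that is not happy arrives at 89.
    return (10 ** num_digits - 1) - happy


print(count_arriving_at_89(2))  # 80
print(count_arriving_at_89(7))  # 8581146
```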