Monday, January 24, 2011

Sudoku puzzles revisited (Part 1)


Previously, I compared some functional programming aspects of Mathematica and Haskell. As promised, this is part two in a set of essays designed to reflect on functional programming and its expressiveness. I mostly use Mathematica for testing ideas, but then I meander through the language forest, with all its dead and still-living trees, to find a tall one to climb and look at an algorithmic problem from higher up.



Why Haskell?

If you’re not interested in a bit of computer language trivia, or theory, then feel free to skip to the next section!

I do not advocate for any one programming language out there; honestly, I think they all suck! Somewhere on the Lambda the Ultimate blog I read that in 70 years of computer history a not exactly known number of computer languages have been invented, perhaps in the range of 8000, so roughly twice a week somebody comes up with a new language. That number, if even vaguely true, gives testimony to the very unsatisfying state of programming languages.

In my opinion, the ideal programming language would somehow reflect extreme simplicity, the kind of simplicity that physicists strive to discover in the basic workings of the universe. Lately, I stumbled upon the Urbit mother lode of languages. If any language deserves to be called the most ideal and basic, it may be the Urbit one! Alas, it is not well supported (yet?) here on Earth.

Let’s see what we have; as I said, the starting point is Mathematica. Then I’d like to look for a living language with a compiler, one that translates well from Mathematica’s functional programming style and lets one go down to the basic machine-supported data types.

I think the Java/C#/C++ route is hopeless: one spends way too much time designing the usual data plumbing before solving anything real. The main culprit is these languages’ inadequate ability to separate data structure implementation from algorithms. Generics or templates in those languages constitute a type system in which the programmer ends up typing a lot to achieve some separation of the algorithmic core from the data structure implementation.

I probably digress a bit here, but it’s worth taking a closer look at that last statement. Classes in any object-oriented language are also a sort of type, and they additionally encapsulate a finite state machine with (mutating) methods. As a concrete example, a generic algorithm that tests two trees of objects for equality should make no reference to the objects themselves. However, when the method for object comparison is defined in the object class itself, you can’t really have a true separation between data structure implementation and algorithm unless the compiler is smart enough to figure out the embedding context of operations and functions. Take a closer look here at how ugly things can get with generics in C#!

I think the gist is that the C# compiler is simply not smart enough with generics: it can’t infer the type needed to place the “==” operator into the proper context. The C++ compiler is smarter, and with “==” operator overloading the generic core algorithm can avoid making references to the objects. Overall, type inference and type-level computation are key to the solution, and that’s where Haskell excels, with succinct syntax! C++ needs a lot of hand typing to achieve the same abstraction, and everything looks cumbersome and quite fragile, especially when it comes to template-template parameters.

Luckily, Mathematica doesn’t need generics, since it’s not a compiled language. Here, a generic algorithm can work with vastly different data, thanks to the pattern matching capability.
But that’s beside the point here. Enough of the language trivia, let’s get to solving problems!

Some Sudoku questions

Back to solving Sudoku puzzles! It is quite easy to solve Sudoku puzzles with a computer. The Wolfram Library Archive has at least three Sudoku solvers published in Mathematica. The Haskell Wiki page has its own section on Sudoku puzzles. Numerous implementations in other languages do exist too.

Solving Sudoku puzzles on a computer amounts to exploring a huge combinatorial space. Very simple-minded approaches rely mostly on guessing, but guessing alone is not very successful at finding solutions quickly. The most efficient Sudoku solvers implement a version of constraint propagation with search. Constraint propagation is based on imposing constraints, or equivalently eliminating choices, through the repeated application of elimination rules. The strategies humans use to deterministically solve a Sudoku puzzle can be cast into program code; this is the backbone of the constraint propagation approach.

Peter Norvig, after having solved many human-made Sudoku puzzles with his solver, conducted a random search of the vast space of puzzles and found that a slim percentage takes an inordinate amount of time to solve via guessing and the application of elementary logic.

My question now is: what is the role of more complicated solution strategies? Do they lead to faster solutions on average, and what about particularly hard puzzles? Can we quantify the power of a particular solution strategy? It would be nice to obtain some information-theoretic quantities, like the average information gain of a solution strategy.

Among the hardest questions is finding the smallest set of constraint logic rules that allows one to deterministically solve any unique Sudoku puzzle, or prove its inconsistency, without resorting to guesses. How does that set of rules change if we are allowed to guess n = 1, 2, 3, … times? Is there a way for a computer to learn the solution strategies automatically? Hello, Bayesian computer scientists?

Also, wouldn’t a Sudoku solver make a good candidate for a quantum computing (QC) algorithm? The solution space implements Fermi-like exclusion rules, hence a QC approach might be expected to perform really well! Any volunteers to take on the Sudoku QC challenge? Here are some tools. Of course, there are also all the questions of scaling the Sudoku puzzle to higher dimensions!

OK, I’ll stop teasing! Let’s answer some very simple questions first.


How to represent a Sudoku puzzle?


In Mathematica, I use nested lists to represent a Sudoku puzzle. A simple text string serves as input:

sudokuPuzzle = "002000080050400600100073020008006203060050090901300700010260007007005060030000900"
With some nice presentation transformations

(* parse the 81-character puzzle string into a 9x9 matrix; "." and "-" also denote empty cells;
   the prepended "1" preserves leading zeros and is dropped again by Rest *)
Initialize[sudoku_String] :=
 Partition[
  Take[Rest@IntegerDigits@
     FromDigits["1" <> StringReplace[sudoku, {"." -> "0", "-" -> "0"}]],
   81], {9}]

(* display a 9x9 matrix as a Sudoku grid; zeros become empty placeholders *)
SudokuForm[sudoku_List] := GridBox[sudoku /. 0 -> \[Placeholder],
    ColumnsEqual -> True,
    GridBoxDividers -> {"Columns" -> {True, True, True,
            AbsoluteThickness[2], True, True,
            AbsoluteThickness[2], {True}, True}, "ColumnsIndexed" -> {},
        "Rows" -> {True, True, True,
            AbsoluteThickness[2], True, True,
            AbsoluteThickness[2], {True}, True}, "RowsIndexed" -> {}},
    RowsEqual -> True] // DisplayForm

we get this pleasant-to-read, Sudoku-typical output:

(s0 = Initialize[sudokuPuzzle] // Normal) // SudokuForm

where s0 is the usual dense matrix representation with zeros at places with no hints.

How to solve a Sudoku puzzle systematically?

Humans solve Sudoku puzzles using a number of strategies. These strategies help us reduce the number of wild guesses and the necessary bookkeeping, tasks which humans are not really good at!

For a computer solution, we need a somewhat more detailed perspective on what needs to be achieved. Think of a Sudoku puzzle as a sort of combinatorial space that contains all hypotheses. A hypothesis is a wild guess at what the final solution could be if we ignored all but the simplest Sudoku rules.

A Sudoku puzzle is solved when every row, column and 3x3 sub-grid contains the values 1 to 9.
So let’s use these rules to construct an initial hypotheses space:

(* curried helpers, so the dimension extractors below compose nicely *)
CurriedMap[f_Function, level_: {1}] = Function[{set}, Map[f, set, level]];
CurriedPartition[level_] = Function[{set}, Partition[set, level]];
CurriedFlatten[level___] = Function[{set}, Flatten[set, level]];

(* the three dimensions: rows, columns and the nine 3x3 subgrids (blocks) *)
Rows = Identity@# &;
Columns = Transpose@# &;
SubGrids = CurriedFlatten[{{1, 2}, {3, 4}}]@CurriedPartition[{3, 3}]@# &;

(* values still missing from a unit, and a completeness test for a unit *)
MissingValues = Complement[Range[1, 9], Flatten[#]] &;
HasAllValuesQ = Equal[9, Length@Union@Flatten[#]] &;
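
These helpers already give us a completeness test for a finished grid. Here is a minimal sketch (my own addition; SolvedQ is a hypothetical helper, not part of the solver): a dense matrix like s0, once fully filled in, is a valid solution when every unit in every dimension passes HasAllValuesQ:

SolvedQ[grid_List] :=
 And @@ Map[HasAllValuesQ, Join[Rows@grid, Columns@grid, SubGrids@grid]]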

Then the intersection of the independent column, row and subgrid hypotheses gives us a first estimate of how many possibilities are left to try:

RowsHypotheses = CurriedMap[MissingValues];
ColumnsHypotheses = RowsHypotheses@Columns@# &;
SubGridsHypotheses = CurriedMap[MissingValues]@SubGrids@# &;

(* for every empty cell, intersect its row, column and subgrid candidates;
   cells with a given hint become singleton lists *)
HypothesesSpace = Function[{sudoku},
 MapThread[
  If[#1 == 0, #2, {#1}] &,
  {sudoku,
   SubGrids@MapThread[
     Function[{subgridHyp, block},
      Map[subgridHyp~Intersection~# &, block]],
     {SubGridsHypotheses@sudoku,
      SubGrids@
       Outer[Intersection, RowsHypotheses@sudoku, ColumnsHypotheses@sudoku,
        1]
      }
     ]}
  , 2]
 ];

Let’s try that for the above-mentioned Sudoku puzzle:

(hyp0 = HypothesesSpace[s0]) // MatrixForm


And the hypotheses matrix is:
[hypotheses matrix hyp0 displayed here]

A cell with a known value is represented by a singleton list; otherwise the list holds the remaining guesses.




The essential algorithm behind solving a Sudoku puzzle is to eliminate choices until one remains per cell, simple! You can either use logic rules or guess. The size of the initial hypotheses space is far too big to allow testing all possibilities:



(* size of the hypotheses space: the product of all per-cell candidate counts *)
HypothesesSpaceSize = Times @@ Flatten@Map[Length, #, {2}] &;
HypothesesSpaceSize[hyp0]

72 781 537 356 620 035 522 560 000

Even if we used a big machine that could evaluate ten billion guesses per second, it would still take more than

200 000 000 years

to find the answer with certainty. Clearly, guessing alone is not a strategy for solving general Sudoku puzzles!
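
That estimate is just the size of the hypotheses space divided by the guess rate (a back-of-the-envelope check, assuming ten billion guesses per second):

HypothesesSpaceSize[hyp0]/(10^10*3600*24*365.)  (* seconds converted to years, roughly 2.3*10^8 *)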

For information theorists, here are some numbers. The entropy decreases to a constant as the puzzle gets solved, and it is measured in “nonets” (logarithm base 9) instead of bits (logarithm base 2)!

InformationEntropy =
Function[{hyp},
 Plus @@ Flatten@Map[Length[#]/9 Log[9, 9/Length[#]] &, hyp, {2}]];
InformationEntropy@hyp0 // N

11.6489

and the information gained, the lower bound below, increases towards 81 as the puzzle gets solved:

InformationLowerBound =
Function[{hyp}, N@Log[9, Times @@ Flatten@Map[9/Length[#] &, hyp, {2}]]];
InformationLowerBound@hyp0

53.8979
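
As a sanity check on both measures, consider a completely solved grid (a small sketch of mine; solvedGrid stands for any finished 9x9 solution matrix): wrapping every value in a singleton list yields the limiting values of 9 nonets of entropy and 81 nonets of information.

solvedHyp = Map[List, solvedGrid, {2}];  (* every cell becomes a singleton hypothesis *)
InformationEntropy@solvedHyp             (* 81*(1/9) Log[9, 9] = 9 *)
InformationLowerBound@solvedHyp          (* Log[9, 9^81] = 81 *)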

Pruning the hypotheses space

Let’s find and apply some good logic to reduce the hypotheses space! First we need a bit more terminology. I call the rows, the columns and the 3x3 subgrids (blocks) the three dimensions of the puzzle. A unit is a list of cells (squares), either within a particular dimension (a single row, column or block) or spanning dimensions. Now, there are strategies that only consider the hypotheses within a single dimension; I call this same-dimensional logic!

Peter Norvig’s solver also defines a cross-dimensional unit, the peer group. The peers of a cell are the 20 cells that directly influence it: the other cells in its row, column and block, excluding the cell itself. This definition is useful for implementing the consistency logic.
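
For reference, here is a small sketch of how the peer group could be computed on the dense matrix representation (my own illustration; Peers is a hypothetical helper and is not used by the solver below):

Peers[{r_, c_}] := Module[{row, col, box},
  row = Table[{r, j}, {j, 9}];                 (* same row *)
  col = Table[{i, c}, {i, 9}];                 (* same column *)
  box = Tuples[{3 Quotient[r - 1, 3] + Range[3],
      3 Quotient[c - 1, 3] + Range[3]}];       (* same 3x3 block *)
  Complement[Union[row, col, box], {{r, c}}]]

Length@Peers[{5, 5}]   (* 20 *)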


With the following set of four reduction rules, iteratively applied to all dimensions separately, the Mathematica Sudoku solver finds the solutions to all of the (at this time) 49151 known distinct Sudoku puzzles with 17 hints, with minimal guessing.

Let’s look at the logic!

Level 0 logic is what already happens in forming the initial hypotheses space: if a cell in a row, column or block is solved, eliminate its value from the rest of the unit!

(* Level 0: remove already-solved values from the other cells of a unit;
   duplicate solutions or an empty cell signal an inconsistent hypothesis *)
EnforceConsistency[sS_List] := Module[{known = Cases[sS, {a_Integer} -> a]},
  If[Length[Union[known]] < Length[known] || Count[sS, {}] > 0, Throw[{}, Inconsistent],
   Map[If[Length[#] === 1, #, Complement[#, known]] &, sS]]];
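
Applied to a single unit it looks like this (a toy unit I made up for illustration; the known values 1, 5 and 2 are removed from the unsolved cells):

EnforceConsistency[{{1}, {5}, {1, 2, 5, 9}, {3, 4, 5}, {2}, {6, 7}, {8, 9}, {6, 8, 9}, {4, 7}}]

{{1}, {5}, {9}, {3, 4}, {2}, {6, 7}, {8, 9}, {6, 8, 9}, {4, 7}}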

Level 1 logic is rather simple: we tally up all candidate values in a given unit; if a value occurs only once, it must be the solution of the cell that contains it!

(* Level 1: a value that occurs in only one cell of the unit must be that cell's solution *)
LocalizeSingles =
 Function[{sS},
  sS /. Map[{___, #, ___} -> {#} &,
    Cases[Tally@Flatten@sS, {a_Integer, 1} -> a]]];

Level 2 logic is a bit more complicated: say that in a given unit we find exactly two cells with the hypothesis {1,9}. Then obviously the values 1 and 9 can only occur in those two cells, and all other occurrences of 1 and 9 can be eliminated.

(* Level 2 ("naked pairs"): if exactly two cells share the same two-value hypothesis,
   those two values can be removed from every other cell of the unit *)
MakePattern = Function[{case}, Pattern[v, Alternatives @@ case[[1]] ]];
LocalizePairs =
 Function[{sS},
  sS /. Map[
    RuleDelayed[
        x : {_Integer, __Integer} /; MemberQ[x, #1] && UnsameQ[x, #2],
        DeleteCases[x, #1]] &[MakePattern@#, First@#] &,
    Cases[Tally@sS, {{a_, b_}, 2}]]];
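
Again on a made-up unit: the two cells that hold exactly {1, 9} lock those values in place, so 1 and 9 disappear from all the other candidate lists:

LocalizePairs[{{1, 9}, {1, 9}, {1, 2, 3}, {3, 4, 9}, {5}, {6}, {7}, {8}, {2, 3, 4}}]

{{1, 9}, {1, 9}, {2, 3}, {3, 4}, {5}, {6}, {7}, {8}, {2, 3, 4}}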

Level 3 logic resembles the level 2 logic: in a given unit, if we find exactly three cells with the hypothesis {1,7,8}, then all other occurrences of 1, 7 and 8 in that unit can be eliminated.

(* Level 3: the analogous rule for "naked triples" *)
LocalizeTriples =
Function[{sS},
 sS /. Map[
   RuleDelayed[
       x : {_Integer, __Integer} /; MemberQ[x, #1] && UnsameQ[x, #2],
       DeleteCases[x, #1]] &[MakePattern@#, First@#] &,
   Cases[Tally@sS, {{a_, b_, c_}, 3}]]];


You see the pattern here! Even higher-order logic is really easy to implement with Mathematica pattern rules, without writing tons of code. However, higher-level same-dimensional logic becomes less and less useful in general, because it applies to ever fewer instances.

Anyway, you can apply these rules to any unit and see what they do, e.g. for all the units in the row dimension:

hyp1 = Map[LocalizeSingles, hyp0];

Let’s chain the logic together via replacement rules

TheWholeNineYard[unit_] := LocalizeTriples@LocalizePairs@LocalizeSingles@EnforceConsistency@unit

SameDimensionRules = {
  hyp_ :>  Map[TheWholeNineYard, hyp],
  hyp_ :>  Columns@Map[TheWholeNineYard, Columns@hyp],
  hyp_ :>  SubGrids@Map[TheWholeNineYard, SubGrids@hyp]
  };

and via a fixed-point iteration we implement the constraint propagation:

PruneHypothesesSpace[hyp_] :=
FixedPoint[Function[{step}, Fold[#1 /. #2 &, step, SameDimensionRules]],
 hyp]
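
Applying it to the initial hypotheses space shows how far pure logic carries us before any guessing (a usage sketch; hypPruned is just a name I picked, and the resulting numbers depend on the puzzle):

hypPruned = PruneHypothesesSpace[hyp0];
HypothesesSpaceSize[hypPruned]     (* possibilities left after constraint propagation *)
InformationLowerBound[hypPruned]   (* how close logic alone gets to the 81 nonets of a full solution *)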

The Search strategy

Not much needs to be said about a classical backtracking algorithm; except for the Throw/Catch statements it is mostly functional programming style and recursion:

(* the search is done when the product of candidate counts is down to one possibility *)
CompleteQ[hypS_] := HypothesesSpaceSize[hypS] <= 1

Search[hypS_?CompleteQ, ___] := hypS
(* otherwise branch on the first unsolved cell; inconsistent guesses are caught and discarded *)
Search[hypS0_] := Module[
 {
  hypses = ExpandFirst[hypS0],
  hypS1
  },
 Select[hypses,
  (hypS1 = Function[{hypGuess},
        Catch[
         Search[PruneHypothesesSpace[hypGuess]],
         Inconsistent]
        ][#[[1]]]) != {} &, 1];
 hypS1
 ]

(* split the space in two: fix the first candidate of the first unsolved cell, or keep the rest *)
ExpandFirst[hypS_] := Module[
 {firstunsolved = First[Cases[hypS, {_Integer, __Integer}, {2}]],
  pos
  },
 pos = First[Position[hypS, firstunsolved]];
 {{ReplacePart[hypS, Take[firstunsolved, 1], pos]}, {ReplacePart[hypS,
    Rest[firstunsolved], pos]}}
 ]
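
Putting it all together, solving the example puzzle from above becomes a two-liner (a usage sketch; for a well-posed puzzle the top-level call needs no Catch, since only guessed branches throw Inconsistent):

solution = Search[PruneHypothesesSpace[hyp0]];
Map[First, solution, {2}] // SudokuForm   (* unwrap the singleton lists and display the grid *)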



Anyway, you can grab the Mathematica Sudoku Solver and experiment with it freely!


The Fast Haskell Sudoku Solver

Even though I was quite pleased with my initial solver, it came nowhere near the speed required for some really nice experimentation with lots of hard Sudoku puzzles!

Well, let’s see what Haskell has to offer. I leave the details and some Sudoku analysis for the next section.

However, if you’re interested in the performance comparisons for solving the 49151 Sudoku puzzles take a look at the table below!

Core Duo 2.4 GHz, Mathematica 7, GHC 7.0.1

  MathematicaSolver        9000 s
  FastSudokuSolver2.hs      390 s
  FastSudokuSolver3.hs      220 s
  FastSudokuSolver4.hs      100 s
  FastSudokuSolver5.hs       50 s



I translated the original Mathematica code into Haskell (FastSudokuSolver2), then iteratively changed the data structures to make them more and more machine-efficient.

You’re welcome to extend the list and build a faster one!