Solving Scrabble with Crystal

I recently solved a problem I’ve always wanted to solve: writing a library that can unscramble words. I know, this is a pretty geeky and pointless endeavor. Geeky, though, is what I do.

Originally, the library focused on exact matches. Given some string, I could find all words that can be formed with those letters. After accomplishing my goal, I decided to extend the project to finding the longest word possible. I noticed that this sounded a lot like finding solutions for the game Scrabble. While the library is still a work-in-progress, I thought it might be fun to share my progress. This is the story of how I wrote a library and simple web service for solving Scrabble with Crystal.

The Tools

For my solution, I didn’t want to just produce a library; I wanted to build a web service. To accomplish this, I used kemal, which is a fantastic Sinatra-like framework for Crystal. For building quick and simple APIs, I love using Sinatra, and Kemal is pretty close.

Here’s what my shard.yml (the file used to manage dependencies for Crystal) looks like:

Here is the absolute basics for my health check in Kemal:

Other than Kemal, I was able to get away with the standard library for Crystal. Less dependencies means a slimmer binary and easier maintenance, so that’s a win.

Original Solution

The first and arguably most obvious requirement for unscrambling words is a dictionary of actual words. The better the dictionary, the more solutions are possible. I found this word list on GitHub and it seems to do the job. The simplest approach to using this list of words is to load it directly into memory as an Array. Here’s the easiest way to do that:

Once we have a dictionary, the rest of the work is in the actual implementation. We need to be able to covert a string of characters to a list of possible words.

My first thought was to test all permutations of the input to see which words were buried in there. Crystal provides a method for Array called permutations (technically, this comes from the Indexable module that Array includes).

Here’s a simple example:

Here’s what my original web service looked like:

Assuming the above is saved in src/, running this looks like:

With the app running, a GET request like curl http://localhost:4567/unscramble/hstu will respond with something like {"anagrams":["huts","shut","thus"]}.

This works great for short input, but trying to unscramble something just a little longer, like “napiyeccedol” (which unscrambles to “encyclopedia”), will take a very, very long time. The reason is actually pretty straightforward: for “hstu”, there are 4! (or 24) permutations, so our dictionary Array is searched 24 times. For “napiyeccedol”, that is 12! (or 479,001,600) permutations, or 12! searches. Crystal is fast, but not fast enough to make up for that! Yes, I know this isn’t technically a problem for the actual game Scrabble, but I wanted my solution to be better than this.

Improving the Solution

It dawned on me that I could do some preprocessing of the input to dramatically improve performance. All permutations of a given set of characters are, by definition using the same characters, just in a different order. If I converted my original dictionary to a Hash, I could key off of a sorted version of those characters and the value could be the actual words that can be formed from that sorted list of characters. This way, all I need to do on a lookup is sort the input and see if there is a key that matches that sorted input. This requires only one dictionary lookup.

My new dictionary looked like this:

Ah, the magic of Hashes. This takes a bit longer to create, but it performs consistently and very well. On average, Crystal Hash lookups average O(1) (with a worst case of O(n)) thanks to the use of a hash table. Even on a formidable dictionary like the one above, key lookups are consistently less than 100µs (yes, that’s microseconds).

The new and improved version of the service looks like this:

Adding Functionality

We have a fast way to find words with all of a precise set of characters, but that isn’t the only thing I would want when playing Scrabble. What if there isn’t a way to use all the letters? I need to be able to find the longest word I can make with a given set of letters.

To do this, we’ll need to go back to something like permutations again, but we don’t need all of them (in most cases). What we actually need are combinations. We can take advantage of Crystal’s Iterators and their ability to lazily process huge collections.

Here is how I implemented this initially:

This works well for many circumstances but in some of the worst cases (like long input or input that can’t form a word) it can be quite slow. If I limit the input to what is possible from actual Scrabble games (7 tiles), this is safe enough. Here’s how I added some input validation:

I defined a simple input validation method that verifies text input and that it is less than or equal to some maximum length. Now we can add this to the /longest endpoint:

What’s Left

This isn’t the end. One thing I haven’t implemented here is support for “blank” tiles. I’d also like to support informing the service about bonus spots on the board and about tile point values so I can optimize for points over word length. In any case, this was a fun exercise and I got to learn a bit more about the wonderful Kemal framework for Crystal.

It is amazing how well Crystal performs. With the input validation added, I couldn’t get a response from any of the above endpoints that took longer than a 200µs, and that’s before I compiled a “release” version of the application. I’m really impressed by how easy it is to create simple APIs that perform extremely well.

Spread the love