I recently solved a problem I’ve always wanted to solve: writing a library that can unscramble words. I know, this is a pretty geeky and pointless endeavor. Geeky, though, is what I do.
Originally, the library focused on exact matches. Given some string, I could find all words that can be formed with those letters. After accomplishing my goal, I decided to extend the project to finding the longest word possible. I noticed that this sounded a lot like finding solutions for the game Scrabble. While the library is still a work-in-progress, I thought it might be fun to share my progress. This is the story of how I wrote a library and simple web service for solving Scrabble with Crystal.
The Tools
For my solution, I didn’t want to just produce a library; I wanted to build a web service. To accomplish this, I used kemal, which is a fantastic Sinatra-like framework for Crystal. For building quick and simple APIs, I love using Sinatra, and Kemal is pretty close.
Here’s what my shard.yml
(the file used to manage dependencies for Crystal) looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
name: scrabbler version: 0.1.0 authors: - Jonathan Gnagy <jonathan.gnagy@therubyist.org> dependencies: kemal: github: kemalcr/kemal targets: anagram: main: src/scrabbler.cr crystal: 0.36.1 license: MIT |
Here is the absolute basics for my health check in Kemal:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
require "kemal" serve_static false before_get do |env| env.response.content_type = "application/json" end get "/ping" do { ping: "pong" }.to_json end Kemal.run(ENV.fetch("LISTEN_PORT", "4567").to_i) |
Other than Kemal, I was able to get away with the standard library for Crystal. Less dependencies means a slimmer binary and easier maintenance, so that’s a win.
Original Solution
The first and arguably most obvious requirement for unscrambling words is a dictionary of actual words. The better the dictionary, the more solutions are possible. I found this word list on GitHub and it seems to do the job. The simplest approach to using this list of words is to load it directly into memory as an Array. Here’s the easiest way to do that:
1 2 |
WORDS = File.read("./words.txt").downcase.split("\n") |
Once we have a dictionary, the rest of the work is in the actual implementation. We need to be able to covert a string of characters to a list of possible words.
My first thought was to test all permutations of the input to see which words were buried in there. Crystal provides a method for Array called permutations
(technically, this comes from the Indexable module that Array includes).
Here’s a simple example:
1 2 3 4 |
input = "eht" input.chars.permutations.map {|p| p.join }.select { |p| WORDS.includes?(p) } # => ["the"] |
Here’s what my original web service looked like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
require "kemal" WORDS = File.read(ENV.fetch("WORD_LIST", "/data/words.txt")).downcase.split("\n") serve_static false before_get do |env| env.response.content_type = "application/json" end error 404 do { "error": "Not Found" }.to_json end get "/ping" do { ping: "pong" }.to_json end get "/unscramble/:input" do |env| input = env.params.url["input"].downcase words = input.chars.permutations.map {|p| p.join }.select { |p| WORDS.includes?(p) } { anagrams: words }.to_json end Kemal.run(ENV.fetch("LISTEN_PORT", "4567").to_i) |
Assuming the above is saved in src/scrabbler.cr
, running this looks like:
1 2 3 |
shards install crystal run src/scrabbler.cr |
With the app running, a GET request like curl http://localhost:4567/unscramble/hstu
will respond with something like {"anagrams":["huts","shut","thus"]}
.
This works great for short input, but trying to unscramble something just a little longer, like “napiyeccedol” (which unscrambles to “encyclopedia”), will take a very, very long time. The reason is actually pretty straightforward: for “hstu”, there are 4!
(or 24) permutations, so our dictionary Array is searched 24 times. For “napiyeccedol”, that is 12!
(or 479,001,600) permutations, or 12!
searches. Crystal is fast, but not fast enough to make up for that! Yes, I know this isn’t technically a problem for the actual game Scrabble, but I wanted my solution to be better than this.
Improving the Solution
It dawned on me that I could do some preprocessing of the input to dramatically improve performance. All permutations of a given set of characters are, by definition using the same characters, just in a different order. If I converted my original dictionary to a Hash, I could key off of a sorted version of those characters and the value could be the actual words that can be formed from that sorted list of characters. This way, all I need to do on a lookup is sort the input and see if there is a key that matches that sorted input. This requires only one dictionary lookup.
My new dictionary looked like this:
1 2 3 4 5 6 7 |
WORDS = Hash(String, Array(String)).new File.read("./words.txt").downcase.split("\n").each do |line| sorted = line.chars.sort.join WORDS[sorted] ||= [] of String WORDS[sorted] << line end |
Ah, the magic of Hashes. This takes a bit longer to create, but it performs consistently and very well. On average, Crystal Hash lookups average O(1)
(with a worst case of O(n)
) thanks to the use of a hash table. Even on a formidable dictionary like the one above, key lookups are consistently less than 100µs (yes, that’s microseconds).
The new and improved version of the service looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
require "kemal" # Dictionary, will return an empty array on failed lookups WORDS = Hash(String, Array(String)).new([] of String) File.read(ENV.fetch("WORD_LIST", "/data/words.txt")).downcase.split("\n").each do |line| sorted = line.chars.sort.join WORDS[sorted] ||= [] of String WORDS[sorted] << line end serve_static false before_get do |env| env.response.content_type = "application/json" end error 404 do { "error": "Not Found" }.to_json end get "/ping" do { ping: "pong" }.to_json end get "/unscramble/:input" do |env| input = env.params.url["input"].downcase words = WORDS[input.chars.sort.join] { anagrams: words }.to_json end Kemal.run(ENV.fetch("LISTEN_PORT", "4567").to_i) |
Adding Functionality
We have a fast way to find words with all of a precise set of characters, but that isn’t the only thing I would want when playing Scrabble. What if there isn’t a way to use all the letters? I need to be able to find the longest word I can make with a given set of letters.
To do this, we’ll need to go back to something like permutations again, but we don’t need all of them (in most cases). What we actually need are combinations
. We can take advantage of Crystal’s Iterators and their ability to lazily process huge collections.
Here is how I implemented this initially:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
get "/longest/:input" do |env| input = env.params.url["input"].downcase result = "" # Start with the full input size and work down input.size.downto(1) do |n| # memoize attempts so we don't repeat them attempts = [] of String # take each combination of `n` characters and attempt a lookup input.chars.sort.each_combination(n) do |combo| this_word = combo.join # skip previous attempts next if attempts.includes?(this_word) attempts << this_word # lookup the possible word words = WORDS[this_word.chars.sort.join] unless words.empty? # grab the first word (we only need one) result = words.first # end our loop since we're done break end end # end the outer loop if we're done break unless result.empty? end { longest: result }.to_json end |
This works well for many circumstances but in some of the worst cases (like long input or input that can’t form a word) it can be quite slow. If I limit the input to what is possible from actual Scrabble games (7 tiles), this is safe enough. Here’s how I added some input validation:
1 2 3 4 5 6 7 8 9 |
class InvalidInput < Exception end def validate_input(input, max_length : Int32 = 7) unless input.match(/^[a-z]+$/) && input.size <= max_length raise InvalidInput.new({ "error": "Invalid Input" }.to_json) end end |
I defined a simple input validation method that verifies text input and that it is less than or equal to some maximum length. Now we can add this to the /longest
endpoint:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
get "/longest/:input" do |env| input = env.params.url["input"].downcase validate_input(input) result = "" # Start with the full input size and work down input.size.downto(1) do |n| # memoize attempts so we don't repeat them attempts = [] of String # take each combination of `n` characters and attempt a lookup input.chars.sort.each_combination(n) do |combo| this_word = combo.join # skip previous attempts next if attempts.includes?(this_word) attempts << this_word # lookup the possible word words = WORDS[this_word.chars.sort.join] unless words.empty? # grab the first word (we only need one) result = words.first # end our loop since we're done break end end # end the outer loop if we're done break unless result.empty? end { longest: result }.to_json rescue e: InvalidInput halt env, status_code: 422, response: e.message end |
What’s Left
This isn’t the end. One thing I haven’t implemented here is support for “blank” tiles. I’d also like to support informing the service about bonus spots on the board and about tile point values so I can optimize for points over word length. In any case, this was a fun exercise and I got to learn a bit more about the wonderful Kemal framework for Crystal.
It is amazing how well Crystal performs. With the input validation added, I couldn’t get a response from any of the above endpoints that took longer than a 200µs, and that’s before I compiled a “release” version of the application. I’m really impressed by how easy it is to create simple APIs that perform extremely well.