A good book makes you think. But what if you make the book think, instead?
I’ve long been fascinated with the idea of inanimate objects that compute, or otherwise do things outside the bounds of their normal function. Examples of this include mechanical calculators, slide rules, and choose your own adventure books; a more recent and directly relevant example is Tic Tac Tome, a book that will play you at Tic-tac-toe.
What these all have in common is that they’re an otherwise inert object that, when a human operator follows a set of directions, can ‘execute’ algorithms or produce behaviours beyond that which the object itself can achieve. They’re a real life example of a chinese room.
When a friend recently asked if a book could play Hangman, it set me thinking about these sort of objects, again, and I decided to see just how practical it was. It turns out that yes, you can ‘teach’ a book to play Hangman, and it does a decent job of it, too.
The first task is to pick a word list to use. I wanted something that included many common words, and which I could tune to the size I needed; that eliminated ordinary word lists. I also wanted to restrict it to nouns, since (in my experience, at least) that’s what people most often guess in hangman.
In the end, the Google Books n-gram corpus proved to be a good match. The 1-gram list provides an enormous dataset of words, frequency counts (by book and total occurrences) part of speech, and publication date, and with a little work can be used to extract, say, the most common 50,000 nouns in books published since 1980. A little sanitisation is necessary; the list contains many nonwords and proper nouns, and particularly short words (fewer than 3 characters) and particularly long ones (longer than, say, 15 characters) aren’t particularly interesting for Hangman. Eliminating the nonwords is easily done by comparng the list against a regular dictionary; I used the scrabble dictionary.
Then, it’s a matter of figuring out the best way to transcribe a game of hangman - or, really, all possible games of hangman - into book form. This also turns out to be relatively straightforward, and works like this:
- Pick the most common letter from your list of words, and guess that.
- Separate the words into groups according to the pattern they make with that letter guessed (eg, 'dead' when guessing 'e' produces the pattern '_e__')
- For each group created, repeat from step 1 with just those words.
This recursive procedure creates a tree structure, with each node corresponding to a guess the book makes. Leaf nodes represent final guesses, where the book declares “your word is ‘dead’!”. A few more tweaks and improvements are necessary; for instance, there’s no point guessing a letter if every remaining word contains it, and there’s likewise no point in using up your last guess if there’s only two options and you can only get one of them right.
After building the tree structure and cleaning it up a bit, formatting it into book form is a relatively straightforward matter of writing CSS and HTML. I found it worked best when sections are sorted topologically, so you always proceed forwards through the book as it makes guesses, and each section tends to be close to the previous one.
One advantage of the way we chose the wordlist at the beginning is that you can tune the length of the wordlist to suit the size of the book you want to produce. I wanted something that was a reasonable, usable size, so I settled on 500 pages; this works out to about 4000 words. A simple second pass that identifies words that can be added without requiring new sections brings the total count up to about 6800 words.
It’s interesting to note that shorter words are much harder to guess in hangman than longer ones, because they have fewer letters, and form fewer patterns with the letters they do have. You’re better thinking of ‘sky’ than ‘disciplinarian’ when it comes to hangman. As a result, there are a few short common words that the book knows of but can’t guess in the alotted number of wrong guesses (6, in the version I always played growing up), none of them longer than 5 characters. In the other direction, if you pick a long word that it knows, it may well figure it out after just a single guessed letter!
And, to answer the question you’re undoubtedly asking right now: Yes, this exists as a real book. I’m calling it Dead Tree, and thanks to the wonders of print-on-demand, it’s available now from amazon.com and amazon.co.uk, just in time for the holiday season. If you know a geek who’s as fascinated as me with things that compute, this might make a good gift for them.
What’s more, the book is open-source, and you can find the source files - wordlist, generating program, and everything else, here on GitHub.
This is a companion discussion topic for the original entry at http://www.arachnidlabs.com/blog/2015/12/17/hangman/
Hi Nick,
This is a fascinating area of thought
It'd be interesting to see something like a book for playing chess - which could then discuss what your options are at each move and why you'd choose them. Obviously it'd only include two or three move selections at each point - the more common openings or reasonable responses - but this would give the reader a kind of intermediate chess strategy lesson. And having a few wins for the reader would be good too
The other resonance is to what I've heard called a biological 'key' - a book which helps you identify a particular organism based on notable characteristics. You've just seen a bird in your garden - how big is it? What colour were its wings? How log was its tail? How sharp was its beak? Did it have a crest? The book then provides some index pages for those characteristics (green wings: page 37, white wings: page 38, etc) and then asks further questions to narrow down the possible selection of birds until hopefully you get it identified.
I've seen this concept more generally as a guessing game where you think of an object and the computer tries to guess what it is based on binary or few-answer questions. Is it larger than a bread box? Animal, vegetable, mineral? Another variation on this is the game of "Guess Who?", where one player asks another questions about the people displayed - "Does he have a beard?" "Is she smiling?" - and removes candidates until only one is left. That'd be a fairly easy game to code into a book - the challenge would be to see if you can pick the one person that takes the longest for the book to guess.
Ultimately I think your reference to the Chinese Room concept is worth exploring. What about a book where you converse with a 'person'? Like in some computer games, you choose from two or three scripted responses and it follows through with a response, and you get to explore a scenario or try to solve a mystery.
There's a lot of good ideas here!
Have fun,
Paul