Using ‘grep’ to play a word game
Sometimes I need to take a break from what I’m doing and let my mind relax. And a fun way to do that is to play a simple puzzle game. You might be familiar with Wordle, the word puzzle game where you make successive attempts to guess a secret five-letter word that changes every day. For each guess, the game tells you which letters are correct and in the correct location (green), which letters are correct but in the wrong position (yellow), and which letters don’t actually appear in the secret word (gray).
I find that this can be a relaxing game to play when I need a quick break. And when I play the game, I like to use the grep command to exercise regular expressions. Using grep isn’t really cheating; it’s just a way to help narrow down my options.
Start with a list of words
To get started, you’ll need to have a list of five-letter words. Most Linux systems provide a word list in the /usr/share/dict/words file, but this file contains all kinds of words, including names and other proper nouns (like Linus), some number-based words (such as 12-point and 1st), and acronyms (like SPARC). Wordle doesn’t allow these kinds of words; it only uses all-lowercase words. To get a list of all-lowercase five-letter words, we can use the character pattern [a-z], which matches a single lowercase letter from a to z. If we use this five times, and combine it with ^ to match the start of a line and $ for the end of a line, we’ll have a list of words that are all-lowercase and exactly five letters long:
$ grep '^[a-z][a-z][a-z][a-z][a-z]$' /usr/share/dict/words > wordlist
This looks for words in /usr/share/dict/words that are composed of exactly five lowercase letters, and saves the output in a new file called wordlist in the current directory. On my system, that list is over 15,000 words long!
$ wc -l wordlist
15034 wordlist
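Writing [a-z] five times works, but the same pattern can be expressed more compactly with an interval expression. Here is a minimal sketch, assuming a grep that supports the -E option for extended regular expressions. It uses a small made-up sample file (not the real dictionary) so the result is easy to check, and it sets LC_ALL=C as a precaution, since in some locales [a-z] can also match uppercase letters:

```shell
# Create a tiny sample dictionary (made-up contents, for illustration only)
sample=$(mktemp)
printf '%s\n' Linus 1st SPARC apple boots houses jerky stare tree > "$sample"

# {5} repeats the preceding bracket expression exactly five times
five=$(LC_ALL=C grep -E '^[a-z]{5}$' "$sample")
printf '%s\n' "$five"

rm -f "$sample"
```

Only apple, boots, jerky, and stare are printed. The other sample words are too short, too long, or contain characters outside a to z.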
Narrow down the options
Start the game by guessing a word that has five letters. To help narrow down the options, I like to pick a word that has five unique letters, rather than a word with repeated letters, like boots. Some of the most commonly used letters in English include E, S, T, and R, so I’ll start by guessing the word stare.
Let’s use grep to help narrow down my possible next guesses. The gray and yellow letter tiles tell me that today’s secret word doesn’t contain the letters S, T, or A. The secret word does contain R and E, but R is not the fourth letter and E is not the last letter.
First, let’s narrow down the options by eliminating words that contain S, T, or A. The -v option for grep is very handy here to “invert” a search. For example, if we “invert” the search for any words with S, T, or A, grep will return only the words that do not contain those letters. This already reduces our options from about 15,000 possible words for the first guess to about 3,600 possible words for our second guess:
$ grep -v '[sta]' wordlist > guess2a
$ wc -l guess2a
3640 guess2a
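To see the effect of -v on a small scale, here is a sketch that runs the same inverted search on a handful of sample words (chosen only for illustration). It also uses grep’s -c option, which prints a count of the selected lines, as an alternative to piping the output through wc -l:

```shell
# A few sample words standing in for the full wordlist file
words=$(printf '%s\n' berry boots chide jerky stare)

# -v keeps only the lines that do NOT match: no s, t, or a anywhere
kept=$(printf '%s\n' "$words" | grep -v '[sta]')
printf '%s\n' "$kept"

# -c counts the selected lines; combined with -v, the non-matching ones
count=$(printf '%s\n' "$words" | grep -vc '[sta]')
printf '%s\n' "$count"
```

Here boots and stare are dropped, leaving berry, chide, and jerky, for a count of 3.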
But guess2a still includes words that have no R or E at all, as well as words like chide, which has the letter E in the last position, or berry, which has an R in the next-to-last position. Wordle colored those letter tiles yellow after our first guess to indicate that the secret word has both R and E in it, but not in those positions. So to narrow down our list of possible guesses, we need to keep only the words that contain both E and R, then eliminate any words with an E as the last letter or an R as the second-to-last letter. This brings the list down to about 550 possible words:
$ grep e guess2a | grep -v 'e$' | grep r | grep -v 'r.$' > guess2b
$ wc -l guess2b
553 guess2b
The period in r.$ is a placeholder for any single character. In this case, since our list only contains words with five letters, this regular expression effectively means “the letter R as the next-to-last letter.”
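The yellow-tile pipeline can be tried on a few sample words to see which ones survive each clue. This is only a sketch with a made-up word list, not the full dictionary:

```shell
# Sample words (illustrative only); none of them contain s, t, or a
words=$(printf '%s\n' berry biker chide jerky wreck)

# Keep words that contain e but do not end in e, and that contain r
# but do not have r as the next-to-last letter
survivors=$(printf '%s\n' "$words" |
    grep e | grep -v 'e$' | grep r | grep -v 'r.$')
printf '%s\n' "$survivors"
```

In this sample, chide is removed because it ends in E, and berry is removed because its fourth letter is R. That leaves biker, jerky, and wreck.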
Make another guess
As I look through my list of words to make my next guess, I want to pick an “everyday” word that has five unique letters. For example, the word creek is good, but it has two E’s. Instead, I decided to guess the word biker.
Guessing a word that has five unique letters provides me additional information about what letters might appear in the word. For example, the gray and yellow tiles tell me that the secret word does not contain the letters B or I. It does have a K in it, but not as the middle letter.
We can use grep again to further narrow down the options. As before, the first step is to eliminate any words that have B or I. This narrows the list to just over 300 possible words:
$ grep -v '[bi]' guess2b > guess3a
$ wc -l guess3a
325 guess3a
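Then, filter the list to keep only words that contain K, E, and R, but not in the positions where they appeared in biker: K cannot be the middle letter, E cannot be the next-to-last letter, and R cannot be the last letter. Since we already filtered the word list to only contain words with R and E, we don’t need to run grep for those letters again, but we do need to grep for any words with K: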
$ grep k guess3a | grep -v '^..k' | grep -v 'e.$' | grep -v 'r$' > guess3b
$ wc -l guess3b
9 guess3b
This brings the list down to only nine possible words:
$ cat guess3b
dreck
freck
jerky
kerch
kreng
perky
reeky
renky
wreck
Guess the word
From here, guessing the secret word within six total attempts should be pretty easy. Wordle tends to use “everyday” words, so we can pick a word like wreck for the next guess.
This is getting close! We now know the word doesn’t contain W or C, and that R, E, and K are in the word but not as the second, third, and last letters, respectively:
$ grep -v w guess3b | grep -v c | grep -v '^.r' | grep -v '^..e' | grep -v 'k$' > guess4a
$ wc -l guess4a
3 guess4a
This narrows down the list of possible words to just three:
$ cat guess4a
jerky
perky
renky
I’ll guess the word jerky, which happens to be correct!
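The same two steps repeat in every round: drop words with gray letters, then keep words that contain each yellow letter somewhere other than where it was tried. As a sketch, those steps could be wrapped in a small shell function. The name wordle_filter and its argument format are my own invention for this example, not a standard command, and the function assumes at least one gray letter and single-digit positions:

```shell
# wordle_filter is a hypothetical helper, not a standard command.
# Usage: wordle_filter GRAY [CLUE...] < wordlist
#   GRAY  letters that are not in the word, such as "sta" (must not be empty)
#   CLUE  a yellow letter and the 1-based position where it was tried, like "r4"
# The body runs in a subshell ( ... ) so its variables stay private.
wordle_filter() (
    gray=$1
    shift
    # Drop every word that contains a gray letter
    words=$(grep -v "[$gray]")
    for clue in "$@"; do
        letter=${clue%?}   # strip the last character, leaving the letter
        pos=${clue#?}      # strip the first character, leaving the position
        # Build a prefix of (pos - 1) dots, so "^...r" means "r as 4th letter"
        dots=''
        i=1
        while [ "$i" -lt "$pos" ]; do
            dots="$dots."
            i=$((i + 1))
        done
        # Keep words that contain the letter, but not at the tried position
        words=$(printf '%s\n' "$words" | grep "$letter" | grep -v "^$dots$letter")
    done
    printf '%s\n' "$words"
)

# Replay the three rounds from this article on a few sample words
answer=$(printf '%s\n' berry biker chide dreck jerky perky stare wreck |
    wordle_filter sta r4 e5 |
    wordle_filter bi k3 e4 r5 |
    wordle_filter wc r2 e3 k5)
printf '%s\n' "$answer"
```

With this small sample list, the three rounds leave jerky and perky. To run it against the real list, the first stage would read from the wordlist file instead of the printf sample.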
Regular expressions for the win
The grep command is a powerful tool that lets you find words in a list based on regular expressions. This example shows how to use grep to help narrow down the options in a word puzzle game, but you can use grep in the same way to match other things. For example, system administrators might use grep to find errors in a log file, such as the /var/log/messages file, but only for a particular day. With grep, you can match text at the beginning of a line, the end of a line, or anywhere in between, or find lines that do not contain the text pattern. Add grep to your system administrator “toolkit” to make your work easier.
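As a sketch of the log file idea, the example below searches a small made-up log for errors from one day. The sample lines imitate the traditional syslog timestamp layout (month, day padded to two columns, then the time). The actual format of /var/log/messages on your system may differ, so treat the date pattern as an assumption to adapt:

```shell
# Made-up log lines for illustration; a real log would already exist as a file
log=$(mktemp)
cat > "$log" <<'EOF'
Mar  3 10:15:01 host kernel: error: disk read failed
Mar  3 10:16:00 host sshd[812]: session opened for user jim
Mar  4 09:00:00 host kernel: error: request timed out
EOF

# ^ anchors the date to the start of the line; -i ignores case in "error"
hits=$(grep '^Mar  3 ' "$log" | grep -i 'error')
printf '%s\n' "$hits"

rm -f "$log"
```

Only the first line is printed: it is the one entry that both starts with the March 3 date and mentions an error.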