By now, you might have played through our regular expressions tutorial, Breaking the Ice With Regular Expressions. We think regular expressions are pretty cool, but would you believe we used them to help make some of the challenges for the course? That’s right — we used regular expressions to find patterns in text that we then turned into challenges for a course that teaches you how to write regular expressions to find patterns in text (hashtag /regex\sinception\!/i)! So today I’m going to walk through creating a challenge for the course, and how we used regular expressions to do it.
One of the course’s challenges starts by describing a problem: “Try writing a single pattern that matches at least 2 characters in all of these strings: Canada, Japan, New Zealand, Albania.” It sounds simple enough, but at Code School, we want you to be immersed in the fun storylines of our courses, so we needed to find out which countries are accessible by boat.
I headed over to Wikipedia, which, sure enough, had an article, but it listed a lot of countries, and the data wasn’t in a very usable format.
Copied as plain text, that table turns into a block of text that could ruin anyone’s day, but regular expressions can help make it more readable. I pasted the entire plain-text contents of the Wikipedia table into regex101.com, which is a great tool for testing regular expressions because it highlights matches in the subject string as you type, and it’s similar to the environment we’ve created for you in the course.
Since I just wanted to find country names, and those were the first item in each row of the list, I started by looking for [A-Za-z]+, which selects the entire first word. Some country names, though, are multiple words or contain special characters that are then followed by more letters, so I also added a group that repeats any of those characters or another letter, like this: (\s|\,|\'|\(|\)|[A-Za-z\u00E9])+. That way, names like “Macedonia, Republic of,” “Cocos (Keeling) Islands,” and “Réunion” would show up in the results.
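To make the difference between those two attempts concrete, here is a minimal Python sketch. The sample rows are made up for illustration (the numbers are placeholders, not real data from the table), and Python's re module stands in for regex101, so treat it as an approximation of what the tool shows rather than the exact setup used for the course.

```python
import re

# Hypothetical rows: a country name followed by placeholder numbers.
rows = [
    "Canada 1 2",
    "New Zealand 3 4",
    "Cocos (Keeling) Islands 5 6",
    "Réunion 7 8",
]

# First attempt: only the first run of ASCII letters.
first_word = re.compile(r"[A-Za-z]+")

# Second attempt: the first word, then any run of whitespace, commas,
# apostrophes, parentheses, ASCII letters, or é (\u00E9).
full_name = re.compile(r"[A-Za-z]+(?:\s|,|'|\(|\)|[A-Za-z\u00E9])*")

for row in rows:
    short = first_word.match(row).group()
    # The group also swallows the space before the numbers, so trim it.
    long = full_name.match(row).group().strip()
    print(f"{short!r} -> {long!r}")
```

With these rows, the first pattern stops at "New", "Cocos", and even "R" (because é is not in A-Za-z), while the second one picks up the whole name.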
At this point, a whole host of strange results came back, because every set of parentheses creates its own capture group. So I finished off the pattern by wrapping the whole thing in parentheses and marking the inner groups as non-capturing with ?: so they aren’t included in the results.
The final pattern looked something like this: /([A-Za-z]+(?:(?:\s|,|\(|\)|'|[A-Za-z\u00E9])+)?)/, and you can play around with it here on regex101.com.
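If you want to see why stray capture groups clutter the results, here is a small Python sketch. The input string is just an example, and re.findall is used as a rough stand-in for the match list regex101 displays. The only difference between the two patterns is whether the innermost group starts with ?:.

```python
import re

text = "New Zealand"  # example input, not the full Wikipedia table

# Inner group left as a capturing group: every match reports two groups.
capturing = r"([A-Za-z]+(?:(\s|,|\(|\)|'|[A-Za-z\u00E9])+)?)"

# Inner group marked non-capturing with ?: so only the outer group is reported.
non_capturing = r"([A-Za-z]+(?:(?:\s|,|\(|\)|'|[A-Za-z\u00E9])+)?)"

with_extra_group = re.findall(capturing, text)
clean = re.findall(non_capturing, text)

print(with_extra_group)  # [('New Zealand', 'd')]
print(clean)             # ['New Zealand']
```

The stray 'd' is the last character the repeated inner group happened to capture, which is exactly the kind of strange result the ?: marker gets rid of.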
Wait, What Am I Trying to Do?
Let’s stop for a moment and think about why we even went down this path to begin with: we wanted a list of countries that border an ocean and have at least 2 letters in common.
So, I copied that result set over to another regex101 window so I had a fresh canvas, and then I went to work on a much simpler pattern, (\w*an\w*), which checks for any number of word characters, followed by the letters a and n, followed by any number of word characters. That left me with a few countries, like Canada, Rwanda, Japan, and France, and 4 of the matches (Canada, Japan, New Zealand, and Albania) made it into the challenge!
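Here is a rough Python sketch of that filtering step. The list of names is a small hypothetical sample, not the full result set. It also compares two quantifiers, since the choice matters: + requires at least one character on each side of "an", while * allows "an" to sit at the very start or end of a word.

```python
import re

# Hypothetical subset of the extracted country names.
countries = ["Canada", "Rwanda", "Japan", "France", "New Zealand", "Albania", "Brazil"]

# + needs at least one word character before and after "an".
at_least_one = re.compile(r"\w+an\w+")

# * accepts zero or more word characters on either side of "an".
any_amount = re.compile(r"\w*an\w*")

strict_matches = [c for c in countries if at_least_one.search(c)]
loose_matches = [c for c in countries if any_amount.search(c)]

print(strict_matches)
print(loose_matches)
```

In this sample, "Japan" only shows up in the second list, because nothing follows its "an". "Brazil" has no "an" at all, so it drops out of both.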
We Learned Something, Right?!?#$?
There are all kinds of things regex can help you with, and while most of you aren’t going to finish reading this and start writing your own challenges, it’s pretty easy to see the power of regular expressions. And once you learn how they work, it’s likely you’ll find many more places where you’ll rely on them for a variety of tasks. Ready to dive in? Play our RegEx course, Breaking the Ice With Regular Expressions, to start learning (tip: the first level is free), and let us know what you think in the comments below!