Matching Text to extract zip codes


#1

I’m looking to understand the different patterns that can be used for matching specific text/numbers.

What I have is a full address with a 9-digit zip code (e.g. 12345-9876) from a Calendar invite, extracted to text. I then want to extract just the 5-digit zip code from that text. How can I achieve this? Thanks!!


#2

The direct solution to your problem is this pattern: (\d{5})-\d{4}

Download a demo here: https://www.icloud.com/shortcuts/e8440235e4224271ad842e334d7bffa3
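If you want to try the pattern outside Shortcuts, here’s a minimal sketch in Python. It assumes the invite has already been turned into a plain string (the address below is made up). Python’s `re` module is a different regex engine from the one behind Match Text, but for a simple pattern like this one they should behave alike.

```python
import re

# Made-up address text, as it might come out of a calendar invite
invite_text = "Meeting at 100 Main St, Springfield, IL 62704-1234"

# Five digits (captured), a literal dash, then four more digits
match = re.search(r"(\d{5})-\d{4}", invite_text)

if match:
    zip5 = match.group(1)  # text of the first capture group
    print(zip5)  # prints 62704
```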

Now to answer your broader question. Let me explain how regular expressions work. No, there is too much. Let me sum up. :wink:

Seriously though, “regular expressions” are a special language that’s used in the Match Text action. They are also used in lots of programming languages, and although there are all kinds of small differences between them, the basics are pretty much the same everywhere. That’s good news for us, because it means there is a lot of information on the internet.

I personally learned about them using https://www.regular-expressions.info/, which used to be a very popular website about regex back in the day. :wink:

Just to get you started though, I’ll explain the regular expression from this Shortcut:

(\d{5})-\d{4}

It starts with a (. Just ignore the parentheses for now; we’ll come back to them. :slight_smile:

After that we see \d. This means “any number (or actually: any digit)”. So this would match either 0, 1, 2, 3, 4, 5, 6, 7, 8 or, you guessed it, 9!
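A quick way to see that \d stands for exactly one digit is to test single characters against it. This is a small Python sketch; one caveat is that in Python (as in several other engines) \d also accepts non-Latin digits unless you pass the `re.ASCII` flag.

```python
import re

samples = ["0", "7", "9", "a", "-"]

# fullmatch succeeds only if the whole string is exactly one digit
results = {ch: re.fullmatch(r"\d", ch) is not None for ch in samples}
print(results)  # {'0': True, '7': True, '9': True, 'a': False, '-': False}
```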

After that I put {5}. This means “we actually want five of those in a row”. So “1248” would not match (that’s only four), but “12483” will match, because that’s five digits.
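Here’s the same idea in Python, as a sketch. `fullmatch` checks whether the whole string is five digits, while `search` behaves more like Match Text and looks for five digits anywhere in the text, so it will also happily take the first five digits of a longer number.

```python
import re

five_digits = re.compile(r"\d{5}")

# fullmatch requires the entire string to be exactly five digits
print(five_digits.fullmatch("1248") is not None)   # False: only four digits
print(five_digits.fullmatch("12483") is not None)  # True: five digits

# search finds five digits anywhere, even inside a longer number
print(five_digits.search("123456").group())        # 12345
```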

Then we simply match the “-” by putting in a literal “-”. And now we need another 4 digits, so we use \d{4}.
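Putting the pieces together, a Python sketch can show which inputs the full pattern accepts. The last part is an optional tweak that isn’t in the Shortcut above: because the match can start anywhere, digits glued to a longer number can still match, and wrapping the pattern in \b word boundaries should rule that out. \b is widely supported, but test it in Match Text before relying on it.

```python
import re

zip_plus4 = re.compile(r"(\d{5})-\d{4}")

print(zip_plus4.search("12345-9876") is not None)  # True
print(zip_plus4.search("12345 9876") is not None)  # False: a space is not "-"
print(zip_plus4.search("12345-987") is not None)   # False: only three digits after "-"

# A search can also find the ZIP+4 shape inside a longer run of digits
print(zip_plus4.search("123456-98765").group(1))   # 23456

# Optional variant: \b word boundaries reject matches glued to other digits
bounded = re.compile(r"\b(\d{5})-\d{4}\b")
print(bounded.search("123456-98765") is None)      # True: no match at all
print(bounded.search("12345-9876").group(1))       # 12345
```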

Ok, so back to the start. What about those parentheses?

We use () for so-called capture groups. In Shortcuts this means you can use Get Group from Matched Text to get the text from such a capture group. In this case we only have one, because we only need one part of the text, but if we also wanted to match the second part (the 4 digits) we would have put the \d{4} in parentheses too. We could then use Group at Index (1) to get the first 5 digits, and Group at Index (2) for the last 4 digits.
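To make the two-group version concrete, here’s a Python sketch with both parts in parentheses (the address is made up). Python also numbers capture groups from 1, with group 0 being the whole match, so `group(1)` and `group(2)` line up with Group at Index (1) and (2).

```python
import re

# Both parts wrapped in parentheses: two capture groups
pattern = re.compile(r"(\d{5})-(\d{4})")
match = pattern.search("Springfield, IL 62704-1234")

if match:
    print(match.group(1))  # 62704 (first five digits)
    print(match.group(2))  # 1234 (last four digits)
    print(match.groups())  # ('62704', '1234')
```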

Hope this all made sense. Regular expressions can be hard to grasp (you’ll probably end up with two problems :wink:), but who knows, you might actually save the day by learning them!


#3

Wow. I’m blown away by the awesome help. Thank you very much!!! That really helped and I’ve spent all day falling down the rabbit hole that is “RegEx”.

Your solution really helped me. Thanks!!