Extracting first and last names

CRT · June 9, 2022, 3:55am

I am using a shortcut to extract personal details from a photo. I have extracted the line with their name, and it is in a format such as:
SMITH, MS JENNY ANGELA

I am trying to use the MATCH command to get the first name and have
^*(.+),\s
as I thought that you could use brackets in a RegExp to return part of the results of the expression, but when I look at the match results I get
SMITH,
(with the comma)

The reason I am using .+ to match is that I want to handle names such as O’HARA and FANCY-SURNAME as well as more typical single-word surnames.

Is there an easier way that I am just overlooking? I can’t really find a lot of detailed explanation or examples for this when I actually want to return more than one piece of data from a single match statement.
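To illustrate why the match result above includes the comma, here is a minimal sketch in Python's re module rather than in the shortcut itself. It assumes the stray * after ^ is dropped from the pattern (Python rejects ^* as written). The whole match covers everything the pattern consumed, including the comma and the whitespace after it, while the bracketed group holds only the surname.

```python
import re

line = "SMITH, MS JENNY ANGELA"

# Same idea as the pattern in the post, with the stray * after ^ removed.
m = re.match(r"^(.+),\s", line)

full_match = m.group(0)  # everything the pattern consumed: "SMITH, "
surname = m.group(1)     # only the bracketed part: "SMITH"

print(repr(full_match), repr(surname))
```

Whether the group can be read back this way depends on what the matching tool exposes; the sketch only shows how the two results differ.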

sylumer · June 9, 2022, 6:11am

You are matching. Groups (sets in parentheses) are for substitutions. I think you would always need a two-step process if you rely on the comma for the match: match first, then remove the comma in a subsequent step.

CRT · June 9, 2022, 8:10am

Ahhh, yes, that’s where I did see groups discussed. I had thought I could use them for matching, but you are right, they are used for substitutions.

So I would need a multistep process:

Match for last name (will also return comma)
Tidy match to just name
Match for first name (will also return prefix and middle name if present)
Tidy match to just first name
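The four steps above could be sketched roughly as follows, in Python rather than as shortcut actions. The function name split_name and the TITLES set are hypothetical, made up for illustration, and the sketch assumes the line contains exactly one comma.

```python
import re

# Hypothetical list of prefixes; extend it to suit the real data.
TITLES = {"MR", "MRS", "MS", "MISS", "DR"}

def split_name(line):
    """Return (last_name, first_name) from a line like 'SMITH, MS JENNY ANGELA'."""
    # Step 1: match the last name (the match still includes the comma).
    last_match = re.match(r"^.+,", line)
    if last_match is None:
        raise ValueError("expected a comma after the last name")
    # Step 2: tidy the match down to just the name.
    last_name = last_match.group(0).rstrip(",").strip()
    # Step 3: match everything from the comma onwards
    # (this still includes any prefix and middle names).
    rest = re.search(r",.*$", line).group(0)
    # Step 4: tidy to just the first name: drop the comma,
    # then skip a leading prefix if one is present.
    words = rest.lstrip(", ").split()
    if words and words[0] in TITLES:
        words = words[1:]
    first_name = words[0] if words else ""
    return last_name, first_name

print(split_name("SMITH, MS JENNY ANGELA"))
```

Because the last name is taken as everything before the comma, hyphenated names and names with apostrophes need no special handling.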

sylumer · June 9, 2022, 8:32am

Or match both names with the comma and then just split the result on ", " to put them into a list.

CRT · June 9, 2022, 8:46am

Not sure what you mean here. If I match first and last names, aren’t I getting back what I already have? And if I split on the comma, I still need to trim out the prefix.

eg:
SMITH, MS JENNY ANGELA

If I split on the comma, won’t I get
“SMITH”
“MS JENNY ANGELA”
and still need to parse out the first name?

sylumer · June 9, 2022, 8:55am

Sorry. Forgot that was your example. Yes, you’re right, unless you remove the comma and split by spaces, I guess. That would be three steps to give access to the two items.
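As a rough sketch of the remove-the-comma-and-split idea, again in Python rather than in the shortcut: it assumes the surname is a single word with no spaces and that a prefix is always present, so the first name always lands in the third position. Neither assumption is guaranteed by the data described in this thread.

```python
line = "SMITH, MS JENNY ANGELA"

# Step 1: remove the comma. Step 2: split on whitespace.
parts = line.replace(",", "").split()

# Step 3: pick the items by position.
last_name = parts[0]   # first word is the surname
first_name = parts[2]  # third word, assuming a prefix such as MS comes second

print(parts, last_name, first_name)
```

A surname containing a space, or a line without a prefix, would shift the positions and break the lookup.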

oldblueday · June 10, 2022, 3:37am

What if you used a lookahead? Something like:
.*(?=,)
would match any characters that are followed by a comma, but not including the comma.

And I suppose for the rest, you could use
(?<=, ).*
so it matches everything after a comma-space, but not including that.
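Here is a minimal sketch of both lookaround patterns, run with Python's re module. That the shortcut's regex engine behaves the same way on this input is an assumption, though lookahead and fixed-width lookbehind are widely supported.

```python
import re

line = "SMITH, MS JENNY ANGELA"

# Lookahead: match up to, but not including, the comma.
before_comma = re.search(r".*(?=,)", line).group(0)

# Lookbehind: match everything after the comma-space, excluding it.
after_comma = re.search(r"(?<=, ).*", line).group(0)

print(repr(before_comma))  # 'SMITH'
print(repr(after_comma))   # 'MS JENNY ANGELA'
```

The second result still contains the prefix and middle name, so picking out the first name would need a further step.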