Find and copy specific text in an article

I come across a lot of articles in my research that include Bible references. Generally I don’t have time to stop and look up each verse, but I do like to take note of them so that I can refer to them at a later date.

So far I have the web articles split into new lines with a repeating IF script. But what would be the best way to find AND copy each occurrence of those scripture references, and only the reference (without the entire line)?

I hope this makes sense.

Do you have a link to a specific website that I could see?

If there is a reasonable structure, regular expression matching is probably going to be a reliable approach.

If there is no such structure, then you’ll probably be a bit stuck and need a human to step in until such time as Shortcuts starts including machine learning action steps for text processing.

This blog post is a good example:

As @sylumer mentioned, it should be easy to get a regular expression working on this website. If it is a bunch of different websites, though, it will need to be done manually. If you visit this site regularly, it could be done for just this site. I’ll work on it and send you an example.

In most cases the verses appear as a two-, three-, or four-letter abbreviation of one of the 66 books of the Bible, followed by a chapter and verse separated by a colon. No book has more than 150 chapters and no chapter has more than 176 verses.

Some examples:
Psalm 119:175
Mt 23:7
Luk 12:2
Gen 1:9-18
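
For what it's worth, the shape of those examples can be sketched as a regular expression. This is a rough illustration in Python rather than a Shortcuts action, and it rests on assumptions: the book is a single capitalised word (optionally preceded by a number such as "1"), the chapter and verse are one to three digits, and a range uses a plain hyphen. The names `REFERENCE` and `find_references` are made up for this sketch.

```python
import re

# Assumed shape: optional book number (1-3), a capitalised book name or
# abbreviation, a chapter, a colon, a verse, and an optional "-verse" range.
REFERENCE = re.compile(
    r"\b(?:[1-3] ?)?[A-Z][a-z]+ \d{1,3}:\d{1,3}(?:-\d{1,3})?\b"
)


def find_references(text):
    """Return every substring of text that looks like a Bible reference."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return REFERENCE.findall(text)


if __name__ == "__main__":
    sample = "As Mt 23:7 shows, and later Gen 1:9-18; see also Luk 12:2."
    for reference in find_references(sample):
        print(reference)
```

A pattern this loose will also pick up things that merely look similar, such as a capitalised word followed by a time like 10:30, so the results would still need a quick check by eye.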

If they are all from different articles with different formats, then for regular expressions you would need quite a set of variations on the book names alone. The structure after that should be possible, assuming all of the article authors structure things the same way in terms of combinations of digits, colons and dashes.
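
To give an idea of what a set of book variations might look like, here is a minimal Python sketch that builds the pattern from an explicit list of spellings. The list `BOOK_FORMS` is hypothetical and deliberately incomplete, and `extract_references` is a name invented for this example; a real list would need every form the articles actually use.

```python
import re

# Hypothetical, incomplete list of book spellings; extend as needed.
BOOK_FORMS = [
    "Genesis", "Gen",
    "Psalms", "Psalm", "Ps",
    "Matthew", "Matt", "Mt",
    "Luke", "Luk", "Lk",
]

# Longest forms first as a precaution, so a short form such as "Ps"
# is never tried before "Psalms".
_books = "|".join(sorted(map(re.escape, BOOK_FORMS), key=len, reverse=True))

# Book, optional trailing full stop, chapter:verse, optional "-verse" range.
BOOK_REFERENCE = re.compile(
    rf"\b(?:{_books})\.? \d{{1,3}}:\d{{1,3}}(?:-\d{{1,3}})?\b"
)


def extract_references(text):
    """Return only the references whose book is in BOOK_FORMS."""
    return [match.group(0) for match in BOOK_REFERENCE.finditer(text)]
```

Restricting the match to known book names trades coverage for precision: an unlisted abbreviation is silently missed, but ordinary capitalised words next to a time or a ratio no longer match.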

Also, would an article’s content ever contain multiple entries, and how would you handle that?

Yes, it’s very tricky. Even if you had the right regular expression for this particular post, there is no guarantee the author won’t change the way he formats verses, which would break your regular expression.