Regex negative assertions in the middle of a pattern

Hi automators,
I have a regex that I could use some help with. I write code in Matlab for work, and a common task I want to do is find lines of code that generate print output. In Matlab, most lines that don’t end with a semicolon print the result of the line to the console. What I want to do is find lines that don’t end with a semicolon, and optionally append a semicolon to the end.

I’ve thrown together a regex for use in BBEdit and iterated it over time so it mostly works. There’s one edge case I’ve recently discovered that I can’t figure out how to fix, so that’s what I want help with. My regex is:

(?x
^\h* (?# Indentation)
(?!while|for|if|assert|<other keywords>) (?# Statements that don't produce output: ignore lines that start with these)
[^%\s] (?# Ensure there are non-empty characters before a comment on the line: otherwise would match empty lines)
[%;\v]+ (?# Matches the rest of the line, not allowing ;)
(?<!\.\.\.) (?# Line continuation: if this is present, ignore the line, since output is generated at the last line of the continuation)
(\h*%[^\v]*)? (?# Allow for a comment at the end of the line)
$ (?# End of line)
)

I have a replacement pattern \0; that I can use with find/replace to automate suppressing output in my code.

The case I’m having trouble with is when I have both a line comment (all characters after %) and a line continuation (lines ending with ... continue onto the next, like \ in Python). I want the regex to notice when there’s a ... at the end of a line and ignore the line if so. I also want to allow for comments after the ..., which take up the rest of the line until a line break. In the examples below, the first three lines behave correctly, but the line starting with fails= is captured by the regex. I don’t want it to be, since output is suppressed at the end of the continued line:

break_works=1*2* ...
3*4; % Neither line is captured!
comment_works=1*2*3*4; % This also isn't captured!

fails=1*2 ... % This line is captured by the regex!
*3*4;

I think the issue is that the negative assertion can move around. I think that the assertion fails if the pattern captures the comment, so it instead doesn’t capture the comment and moves the assertion to the end of the line, which doesn’t match. Is there a way to deal with this kind of complex pattern, where I have quantifiers before and after a subpattern I want to reject?
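One way to check that guess is to look at what the greedy character class has consumed at the moment the lookbehind is tested. The sketch below is a Python port, not the BBEdit pattern itself. Python's `re` has no `\h` or `\v`, so `[ \t]` and `\n` stand in for them. The keyword lookahead is left out for brevity, a `^` is assumed inside the second character class so that it excludes `;`, and the name `PROBE` is made up for illustration.

```python
import re

# Group 1 is the code part; the lookbehind is tested right after it.
PROBE = re.compile(
    r"^[ \t]*([^%\s][^%;\n]+)(?<!\.\.\.)([ \t]*%[^\n]*)?$",
    re.MULTILINE,
)

line = "fails=1*2 ... % This line is captured by the regex!"
m = PROBE.search(line)
code_part = m.group(1)
print(repr(code_part))       # text consumed before the lookbehind is tested
print(repr(code_part[-3:]))  # the three characters the lookbehind inspects

# Same line without the space between "..." and "%": no match at all.
no_space = PROBE.search("ok=1*2 ...% comment")
print(no_space)
```

If the code part ends in a space, the three characters in front of the assertion are not `...`, so the assertion passes where it stands without needing to move.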

If you have another method that works, that would be great too! It just needs to account for keywords, line continuations, and comments.

Thanks for your help!

I am not sure exactly what you are asking and I do not know all the nuances of Matlab 🥴 so take what I write as only something to think about.

Is it possible to deal with this issue with a multi-step RegEx instead of trying to pile everything into a single RegEx statement?

  1. Find: ^(.*)(\s+?)\.\.\.\n
    Replace: \1

  2. Find: ^(.*)(\s+)\.\.\.(\s+)%(.*)\n(.*);
    Replace: \1\5;\3%\4

Perhaps initially passing your code through these two steps using BBEdit, you will end up with something that is easier to further process.
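A rough Python sketch of those two passes, applied to the examples from the question. The find and replace patterns are copied from the steps above. Running them through Python's `re.sub` with `re.MULTILINE` instead of BBEdit is an assumption of this sketch, and the names `join_plain` and `join_commented` are made up for illustration.

```python
import re

code = "\n".join([
    "break_works=1*2* ...",
    "3*4; % Neither line is captured!",
    "comment_works=1*2*3*4; % This also isn't captured!",
    "",
    "fails=1*2 ... % This line is captured by the regex!",
    "*3*4;",
])

def join_plain(text):
    # Step 1: drop a bare trailing "..." together with the line break after it.
    return re.sub(r"^(.*)(\s+?)\.\.\.\n", r"\1", text, flags=re.MULTILINE)

def join_commented(text):
    # Step 2: pull the continued code up and move the comment after the ";".
    return re.sub(r"^(.*)(\s+)\.\.\.(\s+)%(.*)\n(.*);", r"\1\5;\3%\4",
                  text, flags=re.MULTILINE)

joined = join_commented(join_plain(code))
print(joined)
```

After both passes no continuation is left in this sample, so a simpler "line does not end with ;" search can run on the result.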

I’ve thrown the regex into https://regex101.com to see where the problem might be, and it turns out that the regex doesn’t handle the space between ... and %. If you’re certain that there will only ever be 0 or 1 spaces between a line continuation and the start of a comment, then the easiest fix is to change the line (?<!\.\.\.) to (?<!\.\.\.)(?<!\.\.\. ). The second assertion is needed because lookbehinds need to be fixed width, so *, +, ? and groups whose alternatives have different lengths won’t work inside them.
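A minimal Python sketch of the stacked assertions, to show what they do and do not reject. It is a port under stated assumptions rather than the BBEdit pattern: `[ \t]` replaces `\h`, `\n` replaces `\v`, the keyword list is shortened, a `^` is added to the class that must exclude `;`, and the name `STACKED` and the sample lines are invented for illustration.

```python
import re

STACKED = re.compile(r"""
    ^[ \t]*                   # indentation
    (?!while|for|if|assert)   # statements that never print
    [^%\s]                    # something other than blank or comment
    [^%;\n]+                  # rest of the code, no ';' and no '%'
    (?<!\.\.\.)               # code must not end in "..."
    (?<!\.\.\.[ ])            # nor in "... " (space bracketed, see note)
    ([ \t]*%[^\n]*)?          # optional trailing comment
    $
""", re.VERBOSE | re.MULTILINE)

samples = [
    "fails=1*2 ... % one space before the comment",
    "loose=1*2 ...  % two spaces before the comment",
    "shown=1*2 % prints to the console",
]
flagged = [s for s in samples if STACKED.search(s)]
print(flagged)
```

The line with one space is no longer flagged, while the line with two spaces still is, which is the limitation described above. Python ignores a bare space in free-spacing mode, so the sketch writes the space as `[ ]`. The same may apply to a pattern running under `(?x)` in other engines.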

If this is not the case, then you either have to cover every case that occurs, as above, or you need a much more complex version.
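One possible shape for a more general version, offered as a sketch that has not been tried in BBEdit: drop the lookbehinds and put a single negative lookahead right after the indentation. It rejects the line when the code part ends in `...` followed by any amount of horizontal whitespace and an optional comment. Lookaheads are not bound by the fixed-width rule. As before, this is a Python port (`[ \t]` for `\h`, `\n` for `\v`, shortened keyword list, `^` added to the class that excludes `;`), and `GENERAL` and the sample lines are hypothetical. It does not try to handle `%` or `...` inside string literals.

```python
import re

GENERAL = re.compile(r"""
    ^[ \t]*                                # indentation
    (?!while|for|if|assert)                # statements that never print
    (?![^%\n]*\.\.\.[ \t]*(?:%[^\n]*)?$)   # reject: code ends in "...", then optional comment
    [^%\s]                                 # something other than blank or comment
    [^%;\n]+                               # rest of the code, no ';' and no '%'
    ([ \t]*%[^\n]*)?                       # optional trailing comment
    $
""", re.VERBOSE | re.MULTILINE)

samples = [
    "fails=1*2 ... % one space",
    "loose=1*2 ...  % two spaces",
    "tabbed=1*2 ...\t% a tab",
    "bare=1*2 ...",
    "shown=1*2 % dots ... inside the comment",
    "quiet=1*2; % already suppressed",
]
flagged = [s for s in samples if GENERAL.search(s)]
print(flagged)
```

Only the `shown` line is flagged here: the lookahead stops at the first `%`, so dots inside a comment do not count as a continuation.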

I’ve also noticed that a ^ is missing from the character class in [%;\v]+ (?# Matches the rest of the line, not allowing ;); as written, the pattern didn’t match anything for me.

Thanks for the replies! I should have provided some examples, sorry. I would like to keep this in a single regex if possible so I can use it in BBEdit’s search form, but separating it is a good idea.

While it’s not general, using 2 assertions works for me! This is all code I write, and I almost always just use a single space before comments. So since the problem is restricted to my own workflow, that works. I never would have thought of stacking assertions like that, that’s a good trick to remember.