Hi automators,
I have a regex that I could use some help with. I write code in Matlab for work, and a common task I want to do is find lines of code that generate print output. In Matlab, most lines that don’t end with a semicolon print the result of the line to the console. What I want to do is find lines that don’t end with a semicolon, and optionally append a semicolon to the end.
I’ve thrown together a regex for use in BBEdit and iterated it over time so it mostly works. There’s one edge case I’ve recently discovered that I can’t figure out how to fix, so that’s what I want help with. My regex is:
(?x
^\h* (?# Indentation)
(?!while|for|if|assert|<other keywords>) (?# Statements that don't produce output: ignore lines that start with these)
[^%\s] (?# Ensure there are non-empty characters before a comment on the line: otherwise would match empty lines)
[^%;\v]+ (?# Matches the rest of the line, not allowing ; or the start of a comment)
(?<!\.\.\.) (?# Line continuation: if this is present, ignore the line, since output is generated at the last line of the continuation)
(\h*%[^\v]*)? (?# Allow for a comment at the end of the line)
$ (?# End of line)
)
I have a replacement pattern `\0;` that I can use with find/replace to automate suppressing output in my code.
The case I’m having trouble with is when a line has both a line continuation (lines ending with `...` continue onto the next, like `\` in Python) and a line comment (all characters after `%`). I want the regex to notice when there’s a `...` at the end of a line and ignore the line if so. I also want to allow for comments after the `...`, which take up the rest of the line until a line break. For example, the `fails` line below is captured by the regex, but I don’t want it to be, since output is suppressed at the end of the continued line:
break_works=1*2* ...
3*4; % Neither line is captured!
comment_works=1*2*3*4; % This also isn't captured!
fails=1*2 ... % This line is captured by the regex!
*3*4;
I think the issue is that the negative assertion can move around. I think that the assertion fails if the pattern captures the comment, so it instead doesn’t capture the comment and moves the assertion to the end of the line, which doesn’t match. Is there a way to deal with this kind of complex pattern, where I have quantifiers before and after a subpattern I want to reject?
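One way to check this is to port the pattern to Python's `re` module and run it on the example. The sketch below is only an approximation of the BBEdit pattern and makes several assumptions: `\h` becomes `[ \t]`, `\v` becomes `\n`, the rest-of-line class is read as negated (`[^%;\n]+`, as its comment describes), and the `<other keywords>` placeholder is dropped. In this port, the `fails` line seems to be captured because the character class also consumes the space between `...` and `%`, so the three characters in front of the lookbehind are `.. ` rather than `...`. One possible workaround, not necessarily the best one, is a negative lookahead near the start of the line that rejects any `...` appearing before a `%`. The names `ORIGINAL`, `FIXED`, and `suppress` are hypothetical, and `suppress` places the semicolon before the trailing comment instead of appending it to the whole match.

```python
import re

# Port of the BBEdit pattern to Python's re module.
# Assumptions: \h -> [ \t], \v -> \n, "<other keywords>" placeholder dropped.
ORIGINAL = re.compile(r"""
    ^[ \t]*                    # indentation
    (?!while|for|if|assert)    # statements that don't produce output
    [^%\s]                     # at least one non-blank, non-comment character
    [^%;\n]+                   # rest of the code, not allowing ; or %
    (?<!\.\.\.)                # intended: reject a trailing line continuation
    ([ \t]*%[^\n]*)?           # optional comment at the end of the line
    $
""", re.VERBOSE | re.MULTILINE)

# Hypothetical variant: the continuation check is a lookahead that scans the
# code part of the line (everything before the first %), so whitespace or a
# comment after the ... can no longer hide it.
FIXED = re.compile(r"""
    ^(?P<code>
        [ \t]*                   # indentation
        (?!while|for|if|assert)  # statements that don't produce output
        (?![^%\n]*\.\.\.)        # reject the line if ... occurs before any %
        [^%;\s]                  # first non-blank, non-comment character
        (?:[^%;\n]*[^%;\s])?     # rest of the code, ending on non-whitespace
    )
    (?P<tail>[ \t]*(?:%[^\n]*)?) # optional trailing whitespace and comment
    $
""", re.VERBOSE | re.MULTILINE)

SAMPLE = """\
break_works=1*2* ...
3*4; % Neither line is captured!
comment_works=1*2*3*4; % This also isn't captured!
fails=1*2 ... % This line is captured by the regex!
*3*4;
"""


def suppress(source: str) -> str:
    """Insert ';' after the code part of every line that FIXED matches."""
    return FIXED.sub(r"\g<code>;\g<tail>", source)


if __name__ == "__main__":
    # Lines captured by the port of the original pattern.
    print([m.group(0) for m in ORIGINAL.finditer(SAMPLE)])
    # Lines captured by the variant with the lookahead.
    print([m.group(0) for m in FIXED.finditer(SAMPLE)])
    # A ... inside a comment is not treated as a continuation.
    print(suppress("x = 1 % note ... here"))
```

In BBEdit syntax the extra lookahead would presumably be written `(?![^%\v]*\.\.\.)`, though I have only checked the Python port. A known limitation of this sketch is that a `...` inside a string literal would also cause the line to be skipped.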
If you have another method that works, that would be great too! It just needs to account for keywords, line continuations, and comments.
Thanks for your help!