Parse HTML output with regex?

lbutlr · January 1, 2019, 2:10pm

I have a URL that gives me the current server maintenance schedule. It’s an odd page in that it returns plain text, but the text is HTML (that is, if I load it in a browser it shows the literal HTML, BODY, etc. tags).

What I want to do is parse that text for a string like “beginning on {day of week} {Long Month} {day}, {time am/pm (PST)} blah blah text here {time am/pm (PST)}”. What I want to get out of that for my shortcuts is the month, the day, the start time, and the end time.

All I need is a short alert that says something like “SERVER DOWN TIME: 5 JAN 0200-0600”

All the text is contained within a single <p> tag and there is no extra tagging to worry about.

Here’s a sample of the results I get from the URL:

    ALERT:
    <html><body><p>Server downtime scheduled beginning on Tuesday, January 1st, 2:00 AM (PST) for backup and database integrity checks until 6:00 AM (PST). During this window the servers will not be available.<br><br>Please contact <person@email> for more information.</p></body></html>

dustinknopoff · January 1, 2019, 2:22pm

Here’s an example. This assumes that it always begins with “downtime scheduled beginning on”.
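For anyone reading along without the shortcut, here is a rough Python sketch of the same idea. It is not the shortcut itself, just an illustration under a few assumptions: the text always contains “downtime scheduled beginning on”, both times are followed by “(PST)”, and the day may carry an ordinal suffix like “1st”. The names `PATTERN`, `to_24h`, and `downtime_alert` are made up for this example, and it uses named capture groups plus string formatting in place of a separate replace step.

```python
import re

# Sample text copied from the question.
SAMPLE = (
    "<html><body><p>Server downtime scheduled beginning on Tuesday, "
    "January 1st, 2:00 AM (PST) for backup and database integrity checks "
    "until 6:00 AM (PST). During this window the servers will not be "
    "available.<br><br>Please contact <person@email> for more information."
    "</p></body></html>"
)

# Assumed layout: fixed lead-in, day of week, long month, day, start time,
# arbitrary text, end time.
PATTERN = re.compile(
    r"downtime scheduled beginning on \w+, "            # lead-in and day of week
    r"(?P<month>[A-Za-z]+) (?P<day>\d{1,2})(?:st|nd|rd|th)?, "
    r"(?P<start>\d{1,2}:\d{2} [AP]M) \(PST\)"           # start time
    r".*?"                                              # "blah blah text here"
    r"(?P<end>\d{1,2}:\d{2} [AP]M) \(PST\)",            # end time
    re.DOTALL,
)


def to_24h(clock):
    """Convert a string like '2:00 AM' to 24-hour 'HHMM', e.g. '0200'."""
    hhmm, meridiem = clock.split()
    hour, minute = hhmm.split(":")
    hour = int(hour) % 12 + (12 if meridiem == "PM" else 0)
    return f"{hour:02d}{minute}"


def downtime_alert(html):
    """Return the short alert string, or None if the text does not match."""
    match = PATTERN.search(html)
    if match is None:
        return None
    month = match.group("month")[:3].upper()
    day = int(match.group("day"))
    start = to_24h(match.group("start"))
    end = to_24h(match.group("end"))
    return f"SERVER DOWN TIME: {day} {month} {start}-{end}"


print(downtime_alert(SAMPLE))
```

With the sample from the question this prints `SERVER DOWN TIME: 1 JAN 0200-0600`. If the wording of the page changes, `downtime_alert` returns `None` rather than guessing.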

lbutlr · January 1, 2019, 2:30pm

Aha. Must not have had enough coffee this morning. Of course: match and then replace. I was trying multiple matches and was getting way out there in those weeds!