Decoding HTML entities

I’m grabbing the contents of some web pages to extract some details and then send them to a log. The HTML I get sometimes includes “HTML entities” such as &rsquo; and &amp;. These were necessary in ASCII-only days, but aren’t needed now that we have UTF-8 in most places.

I could write a series of find-and-replaces, but I’d prefer to re-use someone else’s tested work.

(The URL Encode/Decode action doesn’t deal with these, at least for me. It deals with %-encodings only, it seems.)
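
For anyone doing the same job outside the app, here is a minimal sketch in Python (not the app's own scripting language, so treat it as an analogue rather than a drop-in action). It leans on the standard library's `html.unescape`, which is already tested against the full HTML5 entity table. The helper name `decode_entities` is made up for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric HTML entities with their Unicode characters.

    Hypothetical helper name; the real work is done by html.unescape.
    """
    return html.unescape(text)


sample = "It&rsquo;s fish &amp; chips"
print(decode_entities(sample))  # It’s fish & chips

# %-encodings are a separate scheme and pass through untouched.
print(decode_entities("a%20b"))  # a%20b
```

This only decodes entities; any tags in the input are left in place.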

If you are interested only in the text and not the tags, convert the HTML to rich text and then take the text from that.
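
As a rough analogue of that idea outside the app, here is a sketch in Python that keeps only the text and drops the tags, decoding entities along the way. It uses the standard library's `html.parser.HTMLParser`; the names `TextExtractor` and `html_to_text` are hypothetical, and this is not the rich-text conversion itself, just a similar "text only" result under simple assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags; entities are decoded by the parser."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn &amp; etc. into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside a tag.
        # Note: this also receives the contents of <script> and <style>.
        self.parts.append(data)


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)


print(html_to_text("<p>It&rsquo;s <b>fish</b> &amp; chips</p>"))  # It’s fish & chips
```

Unlike a real rich-text conversion, this does nothing about layout: block elements such as paragraphs are not turned into line breaks.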

Neat thought, thanks, @sylumer.