Here’s a bit of a challenge. I regularly get Word docs that include a bulleted list in the following format:
• Some Explanatory Text: https://www.someurl.com
There’s always a bullet, followed by plain text, a colon and then the URL. I would like to automate a conversion of each line to HTML:
Any ideas? If I had to do it on iOS, I’d take that too. (Asking for them to be formatted differently at creation is not on the table, unfortunately.) Thanks.
I would say I’m most comfortable with AppleScript. I recognize that there’s a pattern to match, but I was hoping someone could get me started in the right direction.
By the way, if you are the one who creates these documents and want a less complicated script, you could reverse the order (URL first, then the explanation); it would be MUCH easier. But I will work with what we’ve got for now.
Thank you, that’s a big help. I am running into a snag. I know how to split the line based on a delimiter, and I can see why we can’t just split on the colon (there being two colons in there), but I just don’t know how to split based on a beginning and an ending delimiter. Thanks for your help. I’ve been an AppleScript tinkerer for years but haven’t really progressed beyond a beginner/intermediate level.
I don’t know if I understand your entire question. I do see that you are trying to split it, and splitting it by the colon will create three strings. I don’t think there is a way in AppleScript to split a string at only the first occurrence of a colon. So what I would recommend is splitting it into three strings (Some Explanatory Text, https, and //www.someurl.com).
Then, when you build the final concatenated string at the end, put those pieces together in the rearranged order and add the “https:” back as a literal string; that is probably your best bet.
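For comparison, here is a rough sketch of the “split only at the first separator” idea in Python rather than AppleScript. It splits on the colon followed by a space, so the colon inside https:// is never touched. The function names are made up for illustration, and the list-item markup is an assumption about the HTML you want.

```python
def split_bullet_line(line):
    """Split '• Text: https://url' into (text, url) at the first ': '."""
    # Drop surrounding whitespace and the leading bullet, if there is one.
    cleaned = line.strip().lstrip("•").strip()
    # partition() splits only at the first occurrence of the separator,
    # so the colon inside 'https://' is left alone.
    text, sep, url = cleaned.partition(": ")
    if not sep:
        raise ValueError("no ': ' separator found in line: " + repr(line))
    return text, url.strip()


def to_list_item(line):
    """Rearrange the two pieces into a list item (assumed target markup)."""
    text, url = split_bullet_line(line)
    return '<li><a href="' + url + '">' + text + "</a></li>"


print(to_list_item("• Some Explanatory Text: https://www.someurl.com"))
# prints: <li><a href="https://www.someurl.com">Some Explanatory Text</a></li>
```

One caveat: if the explanatory text itself contains a colon followed by a space, this splits too early. Using rpartition(": ") instead would split at the last such separator.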
If you have Python 3, the script below will run on macOS and in Pythonista. It uses a regular expression to extract the two parts you need.
Python
import re
import sys


def find_and_format(contents):
    """find_and_format

    File must be a series of text in the format:
    • Some Explanatory Text: https://www.someurl.com
    separated by newlines

    Arguments:
        contents {[str]} -- [contents of a file]
    """
    # "name" is the text after the bullet; "url" is everything after ": ".
    r = re.compile(r'(?<=• )(?P<name>.*(?=: )): (?P<url>.*)')
    # findall returns one (name, url) tuple per matching line.
    matches = re.findall(r, contents)
    html_out = ""
    for match in matches:
        name = match[0]
        url = match[1]
        html_out += "<li><a href=\"" + url + "\">" + name + "</a></li>"
    print(html_out)


if __name__ == '__main__':
    # The first command-line argument is the path to the text file.
    with open(sys.argv[1], 'r') as f:
        contents = f.read()
    find_and_format(contents)
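If the explanatory text or the URLs can contain characters such as & or quotes, it may be worth escaping them before building the HTML. The variant below is only a sketch: the names LINE_PATTERN and bullets_to_html are made up, the pattern is a slightly stricter take on the one in the script above, and wrapping the items in a <ul> is an assumption about the output you want.

```python
import html
import re

# Bullet, explanatory text, ': ', then a URL with no spaces in it.
LINE_PATTERN = re.compile(r"^\s*•\s*(?P<name>.+): (?P<url>\S+)\s*$", re.MULTILINE)


def bullets_to_html(contents):
    """Turn every '• Text: URL' line in contents into an <li> inside a <ul>."""
    items = []
    for match in LINE_PATTERN.finditer(contents):
        # Escape both parts so '&', '<' and quotes cannot break the markup.
        name = html.escape(match.group("name"))
        url = html.escape(match.group("url"), quote=True)
        items.append('<li><a href="' + url + '">' + name + "</a></li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


sample = (
    "• Tom & Jerry: https://example.com/?a=1&b=2\n"
    "• Some Explanatory Text: https://www.someurl.com\n"
)
print(bullets_to_html(sample))
```

Lines that do not fit the bullet/text/URL shape are skipped silently, so it is worth comparing the number of items in the output with the number of bullets in the source.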
Otherwise, you can try using pandoc, a command-line tool that converts between many text formats (docx, HTML, PDF, Markdown, and a lot more).
Thanks for your help, everyone. I decided to go in a different direction and work it out in Keyboard Maestro. It’s a bit more manual than I’d had it before, but I will continue to refine it to make it more automated. So now, I paste the bulleted list into BBEdit, strip out the bullets, select the URL in the first line, invoke the KM macro, select the next URL and repeat until done, and then have BBEdit add the HTML for the list items back in. Perhaps not as quick as it can be yet, but I got 90% of what I wanted with minimal effort, and that’s good enough for now.
Thanks again for helping me think through the structure of the problem.