With PowerShell I am able to send a request to a website and parse the body. I know that there are two different types of request I can make with Scriptable, „WebView“ and „Request“.
Could someone help me get back the HTML body of a website, so that I can search it for specific text?
Thank you for your time and help.
As an example, this will dump the HTML of the Automators page on the Relay website to the console.
let req = new Request("https://www.relay.fm/automators");
console.log(await req.loadString());
Great. That worked. So now I could save the code and search and parse for any words on the site, right?
Yes, it is just a string of uninterpreted text, the same as it is when you retrieve it in PowerShell. You can slice, dice and parse the string in any way you want.
Perfect. Thank you. I will try my best to find the right commands for that.
If you get stuck and can be specific about what you want to do (e.g. find the location of the first occurrence of the text, return all text before the third occurrence of the text, return everything after (and including) the last occurrence of the text, return every line with the text in it as an array, return every sentence with the text in it as a single string with sentences separated by two newline characters), I’m sure we can assist further.
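To make a few of those concrete, here is a minimal sketch in plain JavaScript, which is what Scriptable runs. The helper names and the sample text are made up for illustration; in practice the text would be whatever string the request returned.

```javascript
// Hypothetical helpers; none of these names are part of Scriptable.

// Position of the first occurrence of `needle`, or -1 if it is absent.
function firstIndex(text, needle) {
  return text.indexOf(needle);
}

// Everything before the n-th occurrence (1-based); null if there are fewer than n.
function beforeNth(text, needle, n) {
  let pos = -1;
  for (let i = 0; i < n; i++) {
    pos = text.indexOf(needle, pos + 1);
    if (pos === -1) return null;
  }
  return text.slice(0, pos);
}

// Everything from the last occurrence onward (inclusive); null if it is absent.
function fromLast(text, needle) {
  const pos = text.lastIndexOf(needle);
  return pos === -1 ? null : text.slice(pos);
}

// Every line that contains `needle`, as an array.
function linesWith(text, needle) {
  return text.split(/\r?\n/).filter(line => line.includes(needle));
}

const sample = "red news\nblue sky\nold news, new news";
console.log(firstIndex(sample, "news"));
console.log(beforeNth(sample, "news", 2));
console.log(fromLast(sample, "news"));
console.log(linesWith(sample, "news"));
```

These only use `indexOf`, `lastIndexOf`, `slice`, `split` and `includes`, so they treat the HTML purely as text and know nothing about tags.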
At work we have an internal website where daily news is published. I thought it would be nice to have some kind of widget, like Apple News on iOS, that shows the latest four or six news items. For this reason I analysed the source code and found a section where every news item is listed.
When I made plans for the widget I thought the first thing I would need to do would be to isolate the links for the news, as well as the pictures.
I will post a snippet of the code at the end. Maybe my thoughts are too complicated and there is an easier way. So if you have any ideas, or even if it doesn’t make sense, please let me know.
<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2> <div class="textwidget"><p><a href="https://internalurl.local"><img loading="lazy" class="" src="http://img.internal.local.to/Zugvvd.jpg" alt="" width="170" height="251" /></a><a href="https://internalurl.local/"><img loading="lazy" class="" src="http://img.internal.local.to/XGFtzs.jpg" alt="" width="167" height="250" /></a></p>
This should give you a starting point for images which you can also adapt to links.
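A starting point along those lines might look like the sketch below. It assumes the page source is already in a string and uses a global regular expression with `matchAll()` to collect the `src` attribute of every `img` tag. The function name `imageSources` and the shortened sample markup are illustrative only.

```javascript
// Hypothetical helper: returns the src of every <img> tag found in `html`.
function imageSources(html) {
  // Capture group 1 is the text between the quotes of the src attribute.
  const pattern = /<img\b[^>]*?\bsrc="([^"]*)"/g;
  return [...html.matchAll(pattern)].map(match => match[1]);
}

// Shortened sample based on the snippet above.
const sampleHtml =
  '<p><a href="https://internalurl.local"><img loading="lazy" src="http://img.internal.local.to/Zugvvd.jpg" alt="" /></a>' +
  '<a href="https://internalurl.local/"><img loading="lazy" src="http://img.internal.local.to/XGFtzs.jpg" alt="" /></a></p>';

console.log(imageSources(sampleHtml));
```

For links, the same shape should work with `<a\b[^>]*?\bhref="([^"]*)"` as the pattern.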
Great! Thank you. But how would you filter for the news section so that you don’t get all links out of the source code but only the ones in the filtered section?
I would either use a regular expression to match against the boundary tags (that widget-title one with “News” after it, and probably the /aside that follows) and return what’s between them (i.e. the news section), or do a couple of split() calls to chop off the content before and after those boundary tags, leaving just the news section.
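Both options can be sketched roughly as follows. The exact heading markup is taken from the snippet posted above; the function names and the surrounding sample page are assumptions made up for the example.

```javascript
// Assumed boundary: the heading exactly as it appears in the posted snippet.
const startTag = '<h2 class="widget-title">News</h2>';

// Option 1: two split() calls. Returns null if the heading is not found.
function newsSectionBySplit(html) {
  const afterStart = html.split(startTag)[1];
  if (afterStart === undefined) return null;
  return afterStart.split("</aside>")[0];
}

// Option 2: one regular expression; [\s\S] also matches newlines.
function newsSectionByRegex(html) {
  const match = html.match(/<h2 class="widget-title">News<\/h2>([\s\S]*?)<\/aside>/);
  return match ? match[1] : null;
}

// Made-up page with unrelated links before and after the news section.
const page =
  '<aside id="text-2" class="widget"><h2 class="widget-title">Links</h2><a href="https://other.local">x</a></aside>' +
  '<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2> <div class="textwidget"><p><a href="https://internalurl.local">n</a></p></div></aside>' +
  '<footer><a href="https://footer.local">f</a></footer>';

console.log(newsSectionBySplit(page));
console.log(newsSectionByRegex(page));
```

Either result can then be searched for links and images without picking up anything from the rest of the page.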
There may also be a good way to do it via loading it in and using a DOM approach, but I’m a bit tired to think that one through right now, and the options above should suffice from a purely string processing point of view.
Hope that helps.
Is it not possible to build a DOM tree from returned HTML? Then tree walking gets you what you want in a more robust fashion.
Hello everyone, I hope you had a great weekend and a good start to the week.
I’ve been trying to do some regex and so far it seems to work. The only problem is that it only returns one string and not everything the expression finds. I double-checked my regex on https://regex101.com/ and found that it matches everything I need, but my code doesn’t return it all. I also tried an array, but maybe I did something wrong. It would be really nice if someone could take a closer look.
let string = '<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2> <div class="textwidget"><p><a href="https://internalurl.local"><img loading="lazy" class="" src="http://img.internal.local.to/Zugvvd.jpg" alt="" width="170" height="251" /></a><a href="https://internalurl.local/"><img loading="lazy" class="" src="http://img.internal.local.to/XGFtzs.jpg" alt="" width="167" height="250" /></a></p></div></aside>'
let regexsection = /News.*\/aside/s;
let section = regexsection.exec(string)
//Creating Regex for News Links
let regexlinkstonews = /<a href="(.*?)".*?\>/gs;
let linkstonews = regexlinkstonews.exec(section)
//Creating Regex for Image Links
let regexlinkstoimage = /src="(.*?)".*?\/a>/gs;
let linkstoimage = regexlinkstoimage.exec(section)
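The likely reason only one result comes back is that `exec()` returns a single match per call, even with the `g` flag. The flag only makes the regex remember where it stopped, so repeated calls move forward through the string. Note also that `exec()` returns an array (or null), so the matched section text is element 0 of the first result. A minimal sketch of two ways around it, with a made-up function name and the sample markup from this thread:

```javascript
// Sample markup from this thread (image host tidied up for the example).
const newsHtml = '<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2> <div class="textwidget"><p><a href="https://internalurl.local"><img loading="lazy" class="" src="http://img.internal.local.to/Zugvvd.jpg" alt="" width="170" height="251" /></a><a href="https://internalurl.local/"><img loading="lazy" class="" src="http://img.internal.local.to/XGFtzs.jpg" alt="" width="167" height="250" /></a></p></div></aside>';

// Hypothetical helper: collects all link and image URLs in the news section.
function linksAndImages(html) {
  // Element 0 of the exec() result is the matched text itself.
  const sectionMatch = /News.*\/aside/s.exec(html);
  const section = sectionMatch ? sectionMatch[0] : "";

  // Option 1: call exec() in a loop; each call resumes after the previous match.
  const linkPattern = /<a href="(.*?)"/g;
  const links = [];
  let match;
  while ((match = linkPattern.exec(section)) !== null) {
    links.push(match[1]);
  }

  // Option 2: matchAll() yields every match at once.
  const images = [...section.matchAll(/src="(.*?)"/g)].map(m => m[1]);

  return { links, images };
}

console.log(linksAndImages(newsHtml));
```

Taking capture group 1 directly means the results should not need slicing afterwards.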
That actually did the trick. Thank you very much. After I put everything in the array I had to slice some parts but after all I had the result I was looking for.
Now I will try to build a widget. Anyway. Thanks again for your fast and efficient help.
Have a great day.
I’m about to pursue my “DOM Tree” idea - as I think it has merit.
I don’t mind whether it works on Mac or iOS. I’m wondering what can take HTML (as that’s easy to get to from Markdown) and create a DOM tree. Drafts? Scriptable? A web browser? curl?
I would suggest using an existing solution instead of reinventing the wheel, and the best thing for that is a browser.
Since I don’t know Drafts, how about the WebView of Scriptable?
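As a rough sketch of that idea: Scriptable's `WebView` can load an HTML string and evaluate JavaScript inside the page, where the usual DOM methods are available. The selector logic below assumes the markup posted earlier in this thread, and the names `domQuery` and `newsFromDom` are invented for the example. I have not checked it against a real page, so treat it as a starting point.

```javascript
// Script evaluated inside the web view, where `document` exists.
// It returns plain objects so the result can be handed back to Scriptable.
const domQuery = `
(function () {
  var asides = Array.from(document.querySelectorAll("aside"));
  var news = asides.find(function (a) {
    var title = a.querySelector(".widget-title");
    return title && title.textContent.trim() === "News";
  });
  if (!news) { return []; }
  return Array.from(news.querySelectorAll("a")).map(function (a) {
    var img = a.querySelector("img");
    return {
      link: a.getAttribute("href"),
      image: img ? img.getAttribute("src") : null
    };
  });
})()
`;

// Scriptable-only part: WebView does not exist in other JavaScript environments.
async function newsFromDom(html) {
  const view = new WebView();
  await view.loadHTML(html);
  return view.evaluateJavaScript(domQuery);
}
```

In Scriptable this could be called as `let items = await newsFromDom(html)`, with `html` being the string returned by a `Request`. Compared with regular expressions, this keeps working if attributes change order or extra whitespace appears in the markup.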
Actually, since I commented, that is precisely what I did. That, and getting x-callback-url working between Drafts and Scriptable…
This was really an exercise rather than a practical application but it’s got me the basics of an idea.
What I don’t know is whether I could’ve done it all in Drafts with its idea of a web view. Obviously I’d prefer not to round trip through a different application.
Then I had several more unrelated brainstorms and so this is left as a proof of concept.
I could write it up in my blog - but would only do so if someone confirms there isn’t a DOM tree walking web view in Drafts. (Otherwise it’s a waste of my readers’ time.)