Getting HTML Body or specific Items

julianrd · March 27, 2021, 1:42pm

Hello everyone,
I am new to Scriptable and especially to JavaScript. I am used to scripting with Microsoft PowerShell for work, but I wanted to try something new.
With PowerShell I am able to send a request to a website and parse the body. I know that there are two different ways to make a request in Scriptable, “WebView” and “Request”.
Could someone help me get the HTML body back from a website, so that I can search it for specific text?

Thank you for your time and help.
Julian

sylumer · March 27, 2021, 2:01pm

As an example, this will dump the HTML of the Automators page on the Relay FM website to the console.

let req = new Request("https://www.relay.fm/automators");
console.log(await req.loadString());

julianrd · March 27, 2021, 4:33pm

Great. That worked. So now I could save the code and search and parse for any words on the site, right?

sylumer · March 27, 2021, 4:36pm

Yes, it is just a string of uninterpreted text the same way it is if you retrieve it in Powershell. You can slice, dice and parse the string in any way you want.

julianrd · March 27, 2021, 4:47pm

Perfect. Thank you. I will try my best to find the right commands for that.

sylumer · March 27, 2021, 4:58pm

If you get stuck and can be specific about what you want to do (e.g. find the location of the first occurrence of the text, return all text before the third occurrence of the text, return everything after (and including) the last occurrence of the text, return every line with the text in it as an array, return every sentence with the text in it as a single string with sentence separated by two newline characters), I’m sure we can assist further.
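To make a few of those cases concrete, here is a minimal sketch using only built-in string methods. The sample text, the search term and the helper name indexOfNth are made up for illustration; in Scriptable the text would be whatever loadString() returned.

```javascript
// Placeholder input; in practice this would be the loaded HTML string.
const text = "one news\ntwo\nthree news here\nfour news";
const needle = "news";

// Location of the first occurrence (-1 if the text is absent).
const first = text.indexOf(needle); // 4

// Everything after (and including) the last occurrence.
const fromLast = text.slice(text.lastIndexOf(needle)); // "news"

// Hypothetical helper: index of the n-th occurrence, or -1 if there are fewer.
function indexOfNth(haystack, term, n) {
  let pos = -1;
  for (let i = 0; i < n; i++) {
    pos = haystack.indexOf(term, pos + 1);
    if (pos === -1) return -1;
  }
  return pos;
}

// All text before the third occurrence (whole text if there is no third one).
const third = indexOfNth(text, needle, 3);
const beforeThird = third === -1 ? text : text.slice(0, third);

// Every line containing the text, as an array.
const lines = text.split("\n").filter((line) => line.includes(needle));

console.log(first, fromLast, lines);
```

indexOf and lastIndexOf are case-sensitive, so a case-insensitive search would need the text lowercased first or a regular expression with the i flag.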

julianrd · March 27, 2021, 8:03pm

At work we have an internal website where daily news items are published. I thought it would be nice to have some kind of widget, like Apple News on iOS, that shows the latest four or six news items. For this reason I analysed the source code and found a section where every news item is listed.
When I made plans for the widget I thought the first thing I would need to do would be to isolate the links for the news, as well as the pictures.

I will post a snippet of the code at the end. Maybe my thoughts are too complicated and there is an easier way. So if you have any ideas, or even if it doesn’t make sense, please let me know.


<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2>			<div class="textwidget"><p><a href="https://internalurl.local"><img loading="lazy" class="" src="http://img.internal.local.to/Zugvvd.jpg" alt="" width="170" height="251" /></a><a href="https://internalurl.local/"><img loading="lazy" class="" src="http://img.internal.local.to/XGFtzs.jpg" alt="" width="167" height="250" /></a></p>
</div>
		</aside>

sylumer · March 27, 2021, 8:30pm

This should give you a starting point for images which you can also adapt to links.

The approach requires an understanding of the basics of JavaScript and some familiarity with regular expressions.
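As an illustration of that kind of regex approach, here is a minimal sketch run against sample markup modelled on the snippet in the question. It is one possible way to do it, not a definitive implementation: the html string, its URLs and the variable names are placeholders, and matchAll() is assumed to be available in the JavaScript engine in use.

```javascript
// Placeholder markup modelled on the news snippet from the question.
const html =
  '<p><a href="https://internalurl.local"><img loading="lazy" src="http://img.internal.local.to/Zugvvd.jpg" alt="" /></a>' +
  '<a href="https://internalurl.local/"><img loading="lazy" src="http://img.internal.local.to/XGFtzs.jpg" alt="" /></a></p>';

// Capture the src attribute of every <img> tag.
const imgSrcPattern = /<img\b[^>]*?\bsrc="([^"]*)"/g;
const imageUrls = [...html.matchAll(imgSrcPattern)].map((m) => m[1]);

// The same idea adapted to links: capture the href of every <a> tag.
const linkPattern = /<a\b[^>]*?\bhref="([^"]*)"/g;
const linkUrls = [...html.matchAll(linkPattern)].map((m) => m[1]);

console.log(imageUrls);
console.log(linkUrls);
```

Each match object holds the full match at index 0 and the captured group at index 1, which is why only m[1] is kept.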

julianrd · March 27, 2021, 8:55pm

Great! Thank you. But how would you filter for the news section so that you don’t get all links out of the source code but only the ones in the filtered section?

sylumer · March 27, 2021, 10:09pm

I would either use a regular expression to match against the boundary tags (that h2 widget-title one with “News” after it, and probably the /aside that follows), and return what’s between them (i.e. the news section), or do a couple of split() calls to chop the content prior to and after from those boundary tags to leave me with just the news section.
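Both options can be sketched roughly as follows. The page string, the boundary tags and the function names are assumptions made up for illustration; the real page will need its own markers.

```javascript
// Placeholder page with two widgets; only the "News" one is wanted.
const page =
  '<aside id="text-2"><h2 class="widget-title">Links</h2><a href="https://other.local">other</a></aside>' +
  '<aside id="text-3"><h2 class="widget-title">News</h2><a href="https://internalurl.local">news item</a></aside>';

const startTag = '<h2 class="widget-title">News</h2>';
const endTag = "</aside>";

// Option 1: two split() calls. Keep what follows the start tag,
// then keep what precedes the next end tag.
function sectionBySplit(html, start, end) {
  const afterStart = html.split(start)[1];
  if (afterStart === undefined) return null; // start tag not found
  return afterStart.split(end)[0]; // rest of the text if the end tag is missing
}

// Option 2: one regular expression with a lazy capture between the boundaries.
// The s flag lets "." match newlines in multi-line HTML.
function sectionByRegex(html) {
  const m = /<h2 class="widget-title">News<\/h2>(.*?)<\/aside>/s.exec(html);
  return m ? m[1] : null;
}

console.log(sectionBySplit(page, startTag, endTag));
console.log(sectionByRegex(page));
```

Any link or image pattern can then be run against the returned section instead of the whole page, so matches from other widgets are excluded.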

There may also be a good way to do it via loading it in and using a DOM approach, but I’m a bit tired to think that one through right now, and the options above should suffice from a purely string processing point of view.

Hope that helps.

Martin_Packer · March 28, 2021, 8:07am

Is it not possible to build a DOM tree from returned HTML? Then tree walking gets you what you want in a more robust fashion.

julianrd · March 29, 2021, 2:26pm

Hello everyone, I hope you had a great weekend and a good start to the week.
I’ve been trying out some regex and so far it seems to work. The only problem is that it only returns one string and not everything the expression finds. I double-checked my regex on https://regex101.com/ and found that it matches everything I need, but my code doesn’t return all of it. I also tried an array, but maybe I did something wrong. It would be really nice if someone could take a closer look.


let string = '<aside id="text-3" class="widget widget_text"><h2 class="widget-title">News</h2>			<div class="textwidget"><p><a href="https://internalurl.local"><img loading="lazy" class="" src="http://img.internal.local.to/Zugvvd.jpg" alt="" width="170" height="251" /></a><a href="https://internalurl.local/"><img loading="lazy" class="" src="http://img.internal.local.to/XGFtzs.jpg" alt="" width="167" height="250" /></a></p></div></aside>'

let regexsection = /News.*\/aside/s;
let section = regexsection.exec(string)
//console.log(section)

//Creating Regex for News Links
let regexlinkstonews = /<a href="(.*?)".*?\>/gs;
let linkstonews = regexlinkstonews.exec(section)
console.log(linkstonews)

//Creating Regex for Image Links
let regexlinkstoimage = /src="(.*?)".*?\/a>/gs;
let linkstoimage = regexlinkstoimage.exec(section)
console.log(linkstoimage)

sylumer · March 29, 2021, 2:32pm

With the g flag, each call to exec returns only the next match, so you need to loop over successive exec calls to collect them all into an array.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

Try a match to get them all in one go in an array.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
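The difference between the two can be sketched like this, with matchAll() added as a third option that keeps the capture groups. The section string and its URLs are placeholders, and the pattern is a simplified version of the one in the question.

```javascript
// Placeholder section with two links and two images.
const section =
  '<a href="https://a.local"><img src="http://img.local/1.jpg" /></a>' +
  '<a href="https://b.local"><img src="http://img.local/2.jpg" /></a>';

const linkPattern = /<a href="(.*?)"/g;

// exec with the g flag: one match per call, so loop until it returns null.
const viaExec = [];
let m;
while ((m = linkPattern.exec(section)) !== null) {
  viaExec.push(m[1]); // m[1] is the captured URL
}

// match with the g flag: all matches at once, but only the full matched text,
// without capture groups, so the results still need trimming.
const fullMatches = section.match(linkPattern);

// matchAll: all matches at once, capture groups included.
const viaMatchAll = Array.from(section.matchAll(linkPattern), (x) => x[1]);

console.log(viaExec);
console.log(fullMatches);
console.log(viaMatchAll);
```

Here viaExec and viaMatchAll contain just the two URLs, while each entry of fullMatches still starts with `<a href="`.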

julianrd · March 30, 2021, 8:51am

That actually did the trick. Thank you very much. After I put everything in the array I had to slice off some parts, but in the end I had the result I was looking for.
Now I will try to build a widget. Anyway. Thanks again for your fast and efficient help.
Have a great day.

Martin_Packer · March 30, 2021, 9:02am

I’m about to pursue my “DOM Tree” idea - as I think it has merit.

I don’t mind whether it works on Mac or iOS. I’m wondering what can take HTML (as that’s easy to get to from Markdown) and create a DOM tree. Drafts? Scriptable? A web browser? curl?

schl3ck · April 1, 2021, 6:45pm

I would suggest using an existing solution instead of reinventing the wheel, and the best thing for that is a browser.

Since I don’t know Drafts, how about the WebView of Scriptable?

Martin_Packer · April 2, 2021, 7:50am

Actually, since I commented, that is precisely what I did. That and get x-callback-url working between Drafts and Scriptable…

My simple case was to convert the current draft to HTML in Drafts and then call Scriptable to count the number of “H2” elements in the HTML. All in JavaScript on both sides.
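For readers who want to try the same idea, the Scriptable side might look roughly like the sketch below. It is an assumption about one way to do it, not the script actually used here: the function name countH2 and the sample HTML are made up, and it relies on Scriptable's WebView class with its loadHTML() and evaluateJavaScript() methods, so it only works inside Scriptable.

```javascript
// Counts <h2> elements by letting a Scriptable WebView parse the HTML into a DOM.
// WebView is a Scriptable global; this function cannot run in other environments.
async function countH2(html) {
  const webView = new WebView();
  await webView.loadHTML(html);
  // The evaluated script runs inside the page, so the normal DOM API is available.
  // The value of its last expression is handed back to the calling script.
  return webView.evaluateJavaScript('document.querySelectorAll("h2").length');
}

// Placeholder input; in the Drafts case this would be the converted draft.
const sampleHtml = "<h1>Title</h1><h2>One</h2><p>text</p><h2>Two</h2>";
```

Inside Scriptable it could be called with `console.log(await countH2(sampleHtml))`. Because the page is parsed by a real browser engine, selectors such as `aside h2` or `img[src]` can replace hand-written regular expressions.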

This was really an exercise rather than a practical application but it’s got me the basics of an idea.

What I don’t know is whether I could’ve done it all in Drafts with its idea of a web view. Obviously I’d prefer not to round trip through a different application.

Then I had several more unrelated brainstorms and so this is left as a proof of concept.

I could write it up in my blog - but would only do so if someone confirms there isn’t a DOM tree walking web view in Drafts. (Otherwise it’s a waste of my readers’ time.)