How can I scrape sites in Scriptable?

Moh · January 26, 2021, 10:27am

Usually in JS, if I wanted to scrape a site, I'd use Node.js with Puppeteer, but I can't do that without Node. Is there any other way to scrape sites?

sylumer · January 26, 2021, 12:39pm

1. Download the raw content of a page and parse it to plain text? Won’t work for pages that dynamically render, but should for the rest.
2. Use an app other than Scriptable with node capabilities or some page extraction functionality built in.
3. Have Scriptable hand off to something else to do the scraping (a web service, a box running your node setup, etc.) and process the result.
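
For the first option, a minimal sketch might look like the following. The helper names htmlToText and fetchPageText are made up for illustration, and the tag stripping is deliberately crude, so treat it as a starting point for simple static pages rather than a general HTML parser. The only Scriptable API assumed is new Request(url).loadString().

```javascript
// Rough HTML-to-text conversion. It is regex-based, so it only suits simple,
// static pages and will not handle every HTML edge case.
function htmlToText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ") // drop script/style bodies
    .replace(/<[^>]+>/g, " ") // drop the remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&") // decode &amp; last so "&amp;lt;" becomes "&lt;"
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Scriptable-only part (hypothetical helper): Scriptable's Request class
// downloads the raw page source as a string.
async function fetchPageText(url) {
  const html = await new Request(url).loadString();
  return htmlToText(html);
}
```

Inside Scriptable, something like `console.log(await fetchPageText(url))` would then print the page text. As noted above, content that the page builds with its own JavaScript will be missing from the result.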

Herr_K · January 27, 2021, 2:08pm

Hi,
I scrape a commercial weather site with new Request(url).loadString() and parse the result with RegExp.
But it doesn't work for all content, as mentioned by sylumer above.
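
A rough sketch of that approach, for anyone following along. The markup in sampleHtml and the function name extractTemperature are invented for illustration, since the real weather site is not shown here; the pattern would need adapting to whatever the actual page source contains.

```javascript
// Hypothetical markup; a real weather page will look different.
const sampleHtml = '<div class="now"><span class="temp">21.5 °C</span></div>';

// Return the first temperature found in the raw HTML as a number,
// or null when the expected markup is not present.
function extractTemperature(html) {
  const match = html.match(
    /<span class="temp">\s*(-?\d+(?:\.\d+)?)\s*°C\s*<\/span>/
  );
  return match ? Number(match[1]) : null;
}

// In Scriptable the HTML would come from the network instead:
//   const html = await new Request(url).loadString();
//   const temperature = extractTemperature(html);
```

Returning null when nothing matches makes it easier to notice when the site changes its markup, which is the usual way this kind of scraper breaks.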
ck

chrillek · January 28, 2021, 11:08am

Parsing HTML with RegExp might work in some cases. Generally, REs are not the right tool for this task:

and not only because of dynamically generated content.

Herr_K · January 28, 2021, 3:57pm

Hi chrillek,

acknowledged.

ck

DillaD · January 31, 2021, 12:45am

In some situations, doesn't using a WebView:

let url = " "
let wv = new WebView()
await wv.loadURL(url)
let html = await wv.getHTML()

rather than Request help with content loading?
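
Building on the WebView idea, one possible sketch is to let the page's own DOM do the extraction via evaluateJavaScript instead of parsing the HTML string afterwards. The helper names buildExtractionScript and scrapeRendered are hypothetical, and this assumes the content is already in the DOM once loadURL resolves, which may not hold for pages that keep loading data afterwards.

```javascript
// Build the snippet that runs inside the loaded page. JSON.stringify quotes
// the selector so it can be embedded safely in the script text.
function buildExtractionScript(selector) {
  return (
    "Array.from(document.querySelectorAll(" + JSON.stringify(selector) + "))" +
    ".map(function (el) { return el.textContent.trim(); })"
  );
}

// Scriptable-only part (hypothetical helper): WebView exists in Scriptable,
// not in Node or a browser.
async function scrapeRendered(url, selector) {
  const wv = new WebView();
  await wv.loadURL(url);
  // evaluateJavaScript hands back the value the snippet evaluates to.
  return wv.evaluateJavaScript(buildExtractionScript(selector));
}
```

In Scriptable, `await scrapeRendered(url, "h2")` should then give an array with the text of each matching element, since the selector is resolved against the rendered page rather than the raw source.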

I’ve also had some moderate success with using

and

to access some handy modules.