Test for new JSON data?

I’m building a system in Bear for reviewing random reading notes. I have found ways to turn Kindle and Instapaper highlights into JSON data. I import that JSON into Shortcuts, then use the Dictionary tools to process the data and create individual Bear notes with tags for each highlight. A second shortcut then searches Bear for specific tags and returns a random note.

Everything is working great, but I wonder now how to deal with new highlights. The systems I use don’t make it easy to export JSON for only the “new” highlights; each export is simply “all” of your highlights at that point in time. How can I “diff” the new JSON data against the old and feed only the new JSON into Shortcuts to create Bear notes?


Diffing structured data is actually harder than diffing a flat data file, because both the structure and the data can vary, and if one item of data in one structure changes, other related data elements may also need to be included in the diff. Trying to come up with something generic for JSON on a platform like iOS is what I would describe as “non-trivial”; meaning it is likely a lot of work to develop and test.

Can you clear down the old data at the source(s) after import so all data is effectively new and equivalent to a diff against the old?

In this case I don’t think that I can, as that would mean deleting my existing highlights every time, which isn’t something I’m willing to do.

However, the data structures here are very simple. New data is added as new objects, no existing objects are changed in any way.
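For reference, each export is basically one big JSON array of highlight objects, with new objects appended at the end. Something like this (the highlight field is the one I end up matching on; the other field names are just illustrative, not the exact export format):

[
  {
    "highlight": "A passage I highlighted in an article.",
    "title": "Example Article"
  },
  {
    "highlight": "A passage I highlighted in a book.",
    "title": "Example Book"
  }
]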

I’m thinking it might be worth trying a find/replace with the existing data. I will archive the old JSON before grabbing the newly updated file. Then I can use the contents of the old file as the “find” text and replace it with a blank line, leaving only the new entries. Perhaps I can just add a bracket or brace back to the remaining text and it will still be valid JSON. I’ll report back if that has any success.
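In Drafts scripting terms, the idea would be roughly this (only a sketch, assuming the new export really is the old export with objects appended to the end of the same JSON array; the /Test/ file paths are just the ones I’m experimenting with):

// Read both exports from iCloud Drive via the Drafts file manager.
let fmCloud = FileManager.createCloud();
let oldText = fmCloud.read("/Test/instapaper-old.txt").trim();
let newText = fmCloud.read("/Test/instapaper-new.txt").trim();

// Drop the closing "]" from the old export so what's left is a prefix of the new export.
let oldPrefix = oldText.replace(/\]\s*$/, "");

if (newText.startsWith(oldPrefix)) {
    // What follows the prefix is ", {...new objects...} ]"; strip the leading comma
    // and re-open the array so the remainder is valid JSON on its own.
    let remainder = newText.slice(oldPrefix.length).replace(/^\s*,/, "");
    fmCloud.write("/Test/instapaper-newonly.txt", "[" + remainder);
}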

When I’ve been faced with similar problems in other contexts I’ve done one of the following:

  • use a time stamp if the original data has one; I’m guessing it’s unlikely that two of your highlights would share the same one.

  • create a unique identifier and use that to test for presence; for example, concatenate two or three fields that, together, are likely to be unique (see the sketch below).

With either of these, if you store just the time stamp/unique identifier in a separate list, you can test whether a new one is in that list.
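As a rough sketch of the second idea in Drafts-style JavaScript (the file names and the fields used to build the identifier are only examples):

// Sketch of the "unique identifier" approach: build an id per highlight,
// keep the ids already imported in a separate file, and only pass on
// highlights whose id isn't in that list.
let fmCloud = FileManager.createCloud();

let highlights = JSON.parse(fmCloud.read("/Test/instapaper-new.txt"));

// Previously seen ids, one per line (empty on the first run).
let seenText = fmCloud.read("/Test/seen-ids.txt") || "";
let seen = new Set(seenText.split("\n").filter(s => s.length > 0));

// Concatenate a couple of fields that together should be unique.
let idFor = h => (h.title || "") + "|" + h.highlight;

let fresh = highlights.filter(h => !seen.has(idFor(h)));

// Remember the new ids for next time.
fresh.forEach(h => seen.add(idFor(h)));
fmCloud.write("/Test/seen-ids.txt", Array.from(seen).join("\n"));

// Only the genuinely new highlights go on to Shortcuts/Bear.
fmCloud.write("/Test/instapaper-newonly.txt", JSON.stringify(fresh));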

And of course, both are easier if you use them in a database. I know iOS has good support for SQLite databases, but I don’t know if you can use them in Shortcuts very easily.

I’m not sure if any of those will work in your case, but thought I’d share.

For anyone else with a similar question: the following works for me when run as a Drafts action:

// Read the old and new exports from iCloud Drive via the Drafts FileManager.
let fmCloud = FileManager.createCloud();
let newText = fmCloud.read("/Test/instapaper-new.txt");
let oldText = fmCloud.read("/Test/instapaper-old.txt");
let b = JSON.parse(newText); // new export (will be trimmed down)
let a = JSON.parse(oldText); // old export

// Remove from b every highlight that already appears in a,
// matching on the highlight text itself.
function remove_duplicates(a, b) {
    for (let i = 0; i < a.length; i++) {
        // Iterate backwards so splicing doesn't skip the element
        // that shifts into the removed slot.
        for (let j = b.length - 1; j >= 0; j--) {
            if (a[i].highlight === b[j].highlight) {
                b.splice(j, 1);
            }
        }
    }

    console.log(a);
    console.log(b);
}

remove_duplicates(a, b);

// b now contains only the highlights that weren't in the old export.
let clean = JSON.stringify(b);

fmCloud.write("/Test/instapaper-deduped.txt", clean);

Thanks to the author of this “jsfiddle”: http://jsfiddle.net/Snippet/FC2fY/2/