I work for an architectural firm. I’m trying to better coordinate our drawings with our specifications, which are document files that give more detailed requirements than are shown on the drawings. We receive quarterly updates for about 500 .docx files. I’d like an app to open those documents, find Part 2 of each .docx file, and list all of the paragraphs in Part 2 in a three-column spreadsheet (.xls, CSV or Numbers) containing the specification number (the Construction Specifications Institute code, which is part of the file name), the short description and the long description. The short description and long description are separated by a “:” in the document file, so I think an app should be able to parse this. Would an app like Keyboard Maestro be the best choice, or are there others I should investigate?
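The parsing described here is simple to express in code. Below is a minimal Python sketch, assuming the CSI section number is the first six digits at the start of each file name (hypothetical examples: `08 71 00 Door Hardware.docx`, `087100.docx`) and that only the first “:” in a paragraph separates the short description from the long one:

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: the file name starts with the six-digit CSI section number,
# optionally separated by spaces, dots or hyphens ("08 71 00", "08-71-00", "087100").
SECTION_NUMBER = re.compile(r"^\s*(\d{2})[ .\-]?(\d{2})[ .\-]?(\d{2})")


def spec_number_from_filename(filename: str) -> Optional[str]:
    """Return the section number as 'NN NN NN', or None if the name has none."""
    match = SECTION_NUMBER.match(Path(filename).name)
    return " ".join(match.groups()) if match else None


def split_entry(paragraph: str) -> Tuple[str, str]:
    """Split 'Short description: long description' at the first colon only."""
    short, colon, rest = paragraph.partition(":")
    if not colon:
        # No colon found: keep the whole paragraph as the long description.
        return "", paragraph.strip()
    return short.strip(), rest.strip()
```

Splitting on the first colon only keeps any later colons inside the long description intact. Level-3 numbers such as `08 71 00.13` would need a longer pattern.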
I’d do this in AppleScript in the apps themselves; MS Office has very robust scripting.
Robust, but hard-to-understand, AppleScript support. You really have to understand the VBA object model, which took me a while.
To summarise the responses you’ve had already, this will need some scripting.
An application like Keyboard Maestro can launch a script, or apply a regular expression to a string already extracted from Word, but there is really no alternative to scripting (whether VBA, osascript or XQuery) for extracting the plain text from the XML inside .docx files and building a report.
I would personally do this by writing a query in XQuery, which could then be run whenever needed:
- with NSXMLDocument (launched by Keyboard Maestro if you like)
- or in an XQuery engine like BaseX or eXist-db
Try, for example, a search like:
[xquery docx ](https://www.google.co.uk/search?q=xquery+docx)
(An advantage of XQuery, in the context of 500 .docx files, is that it can work quite swiftly and directly with the files themselves, rather than having to load them into MS Word.)
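The same “work on the files directly” idea can be sketched without XQuery: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, so Python’s standard library can read it without opening Word. This swaps XQuery for `zipfile` and `xml.etree.ElementTree`; it is only a sketch, which drops tabs and line breaks and does not treat headers, footers or text boxes specially:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the plain text of every paragraph (w:p) in a .docx's main document."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    # iter() walks all descendants, so paragraphs inside tables are included too
    for para in root.iter(f"{W}p"):
        # Word splits a paragraph's text across runs (w:r), each holding w:t elements
        paragraphs.append("".join(node.text or "" for node in para.iter(f"{W}t")))
    return paragraphs
```

Looping `docx_paragraphs()` over all 500 files never launches Word, which is where the speed advantage mentioned above comes from.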
Thank you all for the suggestions! I’ll dig into it a bit, but my suspicion is that it might be a little above my pay grade. I’ll stick to designing buildings and leave the scripting to the pros.
If there are any Pythonistas in your office, then this may provide a lighter approach:
- writing a Python script which imports the docx and simplify-docx libraries to read the text in a docx file.
- launching it from a Keyboard Maestro Execute Shell Script action.
(again, faster than loading files into MS Word and using VBA or osascript)
[python-docx · PyPI](https://pypi.org/project/python-docx/)
[microsoft/Simplify-Docx: Simplify DOCX files to JSON](https://github.com/microsoft/Simplify-Docx)
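A minimal sketch of the Python route, assuming python-docx is installed (`pip install python-docx`) and that each section’s Part 2 starts at a paragraph whose text begins “PART 2” and ends at the one beginning “PART 3” (the usual CSI three-part layout). The script name and folder in the comment are hypothetical:

```python
import re
import sys

# Assumption: Part headings are literal paragraph text such as
# "PART 2 - PRODUCTS" (the common CSI three-part layout).
PART_HEADING = re.compile(r"^\s*PART\s+(\d+)\b", re.IGNORECASE)


def part2_paragraphs(paragraphs):
    """Return the non-empty paragraphs between the PART 2 and PART 3 headings."""
    selected = []
    inside_part2 = False
    for text in paragraphs:
        heading = PART_HEADING.match(text)
        if heading:
            # Entering Part 2 switches collection on; any other Part switches it off.
            inside_part2 = heading.group(1) == "2"
            continue
        if inside_part2 and text.strip():
            selected.append(text.strip())
    return selected


def read_paragraphs(path):
    """Return the text of each body paragraph using python-docx."""
    import docx  # the python-docx package is imported as "docx"

    return [paragraph.text for paragraph in docx.Document(path).paragraphs]


if __name__ == "__main__":
    # e.g. from a Keyboard Maestro "Execute Shell Script" action:
    #   python3 part2.py ~/Specs/*.docx    (hypothetical script name and folder)
    for path in sys.argv[1:]:
        for text in part2_paragraphs(read_paragraphs(path)):
            print(text)
```

One caveat: if the “PART 2” labels come from Word’s automatic list numbering, they are not part of `paragraph.text`, so the heading test would need to match something else, such as the “PRODUCTS” title. python-docx’s `Document.paragraphs` also skips paragraphs inside tables.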
I’d also follow the Python route rather than sliding down the AppleScript/VBA whirlpool.
For exporting the information to an Excel sheet, as OP requested, I recommend the openpyxl module:
[openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files — openpyxl 3.0.5 documentation](https://openpyxl.readthedocs.io/en/stable/)
Automation is a video game in which the road to reward (clearer focus or greater reach) is closely flanked to left and right by rabbit holes and tar-pits – Scylla and Charybdis – while all the while the xkcd jokes circle above like hungry vultures, and bellowing yaks distract the hapless pilgrim.
I’d like to print and frame what you just wrote
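For the openpyxl export recommended above, here is a minimal sketch that writes the three columns OP asked for. The column headings are illustrative, and the CSV branch (standard library only) is included because OP said CSV would also do:

```python
import csv
from pathlib import Path

# Illustrative column headings for the three columns OP described
HEADER = ["Spec number", "Short description", "Long description"]


def write_rows(rows, out_path):
    """Write (spec number, short, long) rows to .xlsx with openpyxl, otherwise to CSV."""
    out_path = Path(out_path)
    if out_path.suffix.lower() == ".xlsx":
        from openpyxl import Workbook  # pip install openpyxl

        workbook = Workbook()
        sheet = workbook.active
        sheet.append(HEADER)
        for row in rows:
            sheet.append(list(row))
        workbook.save(str(out_path))
    else:
        # utf-8-sig adds a byte-order mark so Excel detects UTF-8;
        # newline="" is what the csv module expects for files it writes
        with out_path.open("w", newline="", encoding="utf-8-sig") as handle:
            writer = csv.writer(handle)
            writer.writerow(HEADER)
            writer.writerows(rows)
```

openpyxl writes .xlsx/.xlsm rather than the older .xls format; Numbers opens both .xlsx and CSV files.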
I know this post is old, but I had to comment in case someone else comes along asking the same question and finds this thread. There’s a lot of wisdom in what @k.a.ll.e suggests. Python has been around a long time, and a great deal of the web runs on it. In nine cases out of ten, you’ll find someone has already written the script you’re looking for, and you’ll just need a few tweaks to make it work for your situation. Why reinvent the wheel?
I also have to add that asking someone which app to use is like asking a person what is the best color to use. I can tell you green is the best color and you’ll find a billion people who will say that blue is the best color by far.
Work from what you know best and branch out from there. It doesn’t hurt to ask people what they recommend, but some of your investigation should go into trying a bunch of things and seeing what “speaks” to you: which one do you “get”, or which seems easier to you right away?
You could spend five years working with a program, making little progress, only to stumble on another one that you get the hang of more quickly and that seems easier to you. I love Apple and it’s my system of choice, but they will only disappoint you over and over and over again. Try searching for an AppleScript course on LinkedIn Learning: you’ll get zero hits.