I work for an architectural firm and am trying to better coordinate our drawings with our specifications, which are document files that give more detailed requirements than are shown on the drawings. We receive quarterly updates for about 500 .docx files. I’d like an app to open those documents, find Part 2 of each .docx file, then list all paragraphs in Part 2 in a three-column .xls (CSV or Numbers) that contains the specification number (Construction Specifications Institute code), which is part of the file name, the short description and the long description. The short description and long description are separated by a “:” in the document file, so I think an app should be able to parse this. Would an app like Keyboard Maestro be the best, or are there others that I should investigate?
I’d do this in AppleScript in the apps themselves - MS Office has very robust scripting.
Robust but hard to understand AppleScript support. You really have to understand the VBA object model - which took me a while.
To summarise the responses you’ve had already, this will need some scripting.
An application like Keyboard Maestro can launch a script, or apply a regular expression to a string already extracted from Word, but there is really no alternative to scripting (whether VBA or osascript or XQuery) for extracting the plain text from .docx XML files and building a report.
I would personally do this by writing a query in XQuery, which could then be run whenever needed:
- through NSXMLDocument (launched by Keyboard Maestro if you like)
- or in an XQuery engine like BaseX or eXist-db
Try, for example, a search like:
[xquery docx ](https://www.google.co.uk/search?q=xquery+docx)
(An advantage of XQuery, in the context of 500 .docx files, is that it can work quite swiftly and directly with the files themselves, rather than having to load them into MS Word.)
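To illustrate what "working directly with the files" means: a .docx file is a zip archive whose main text lives in word/document.xml, so the paragraphs can be read without opening Word at all. The sketch below shows the idea in Python with only the standard library rather than in XQuery; the function name docx_paragraphs is made up for this example, and it ignores tabs, footnotes and automatic paragraph numbering.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the text of each non-empty paragraph in a .docx file, in order."""
    # A .docx file is a zip archive; the body text is in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread over one or more <w:t> runs.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs
```

Calling docx_paragraphs on one specification file should give a plain list of strings, which is the raw material for finding Part 2 and splitting each paragraph at the ":".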
Thank you all for the suggestions! I’ll dig into it a bit, but my suspicion is it might be a little above my pay grade. I’ll stick to designing buildings and leave the scripting to the pros.
If there are any Pythonistas in your office, then this may provide a lighter approach:
- writing a Python script which imports the docx and simplify-docx libraries to read the text in a docx file.
- launching it from a Keyboard Maestro Execute shell script action.
(again, faster than loading files into MS Word and using VBA or osascript)
[python-docx · PyPI](https://pypi.org/project/python-docx/)
[microsoft/Simplify-Docx: Simplify DOCX files to JSON](https://github.com/microsoft/Simplify-Docx)
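For anyone who wants to try this route, here is a rough sketch of what such a script might look like, using python-docx and writing a CSV (which Excel or Numbers can open). Several things in it are assumptions, not facts about the OP's files: that Part 2 begins at a paragraph starting with "PART 2" and ends at one starting with "PART 3", that the file name begins with a six-digit CSI number such as "09 91 23" or "099123", and all of the function names. Note also that python-docx's paragraph text does not include automatic numbering, so if the "PART 2" label is generated by Word's numbering the heading pattern would need adjusting.

```python
import csv
import re
from pathlib import Path

# Assumed conventions (not confirmed in the thread): part headings look like
# "PART 2 - PRODUCTS", and file names start with a six-digit CSI number.
PART_HEADING = re.compile(r"^\s*PART\s+(\d+)\b", re.IGNORECASE)
SPEC_NUMBER = re.compile(r"^\s*(\d{2})\s?(\d{2})\s?(\d{2})")


def spec_number(filename):
    """Return the CSI number at the start of a file name, or "" if absent."""
    m = SPEC_NUMBER.match(Path(filename).stem)
    return " ".join(m.groups()) if m else ""


def part_two_rows(paragraphs):
    """Return (short, long) description pairs for paragraphs inside Part 2."""
    rows, in_part_two = [], False
    for text in paragraphs:
        heading = PART_HEADING.match(text)
        if heading:
            # Any "PART n" heading switches the flag; only "PART 2" turns it on.
            in_part_two = heading.group(1) == "2"
            continue
        if in_part_two and text.strip():
            # Split at the first ":" only; later colons stay in the long text.
            short, _, long_desc = text.partition(":")
            rows.append((short.strip(), long_desc.strip()))
    return rows


def read_paragraphs(path):
    """Read body paragraphs with python-docx (text inside tables is skipped)."""
    # Imported here so the helpers above work without python-docx installed.
    from docx import Document
    return [p.text for p in Document(str(path)).paragraphs]


def main(folder, out_csv):
    """Write one CSV row per Part 2 paragraph for every .docx file in folder."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(
            ["Specification number", "Short description", "Long description"]
        )
        for path in sorted(Path(folder).glob("*.docx")):
            if path.name.startswith("~$"):
                continue  # skip Word's temporary lock files
            for short, long_desc in part_two_rows(read_paragraphs(path)):
                writer.writerow([spec_number(path.name), short, long_desc])
```

A call such as main("specs", "part2.csv"), with a hypothetical folder name, could then be run from a Keyboard Maestro Execute shell script action each quarter. Paragraphs without a ":" end up with an empty long description, which makes them easy to spot and check by hand.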
I’d also follow the Python route rather than sliding down the AppleScript/VB whirlpool.
For exporting the information to an Excel sheet, as the OP requested, I recommend the module
[openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files — openpyxl 3.0.5 documentation](https://openpyxl.readthedocs.io/en/stable/)
Automation is a video game in which the road to reward (clearer focus or greater reach) is closely flanked to left and right by rabbit holes and tar-pits – Scylla and Charybdis – while all the while the xkcd jokes circle above like hungry vultures, and bellowing yaks distract the hapless pilgrim.
I’d like to print and frame what you just wrote