Need to automate conversion of MS Word docs to plain text

#1

My wife has thousands of client notes in MS Word that I would like to convert to plain text.
Obviously, doing this one at a time is out of the question, so I need a way to automate it, but I have no idea how to do so. I welcome suggestions!

#2

I’ve mentioned Pandoc a few times as a way to convert text-based files. I think it would be ideally suited to this.

Someone has posted a shell script on GitHub that will convert DOCX to TXT (and a few other formats), so you could re-use that and remove the Markdown and HTML output lines.
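If you would rather roll your own, here is a minimal sketch of the same idea. It is not the GitHub script, just a plain loop around Pandoc, and the function name docx_to_txt_pandoc is made up for illustration. It assumes Pandoc is installed and on your PATH and that the files are .docx (Pandoc does not read the older .doc format).

```shell
# Hypothetical helper: convert every .docx in one folder to plain text.
# Each .txt is written next to its source file.
# Usage: docx_to_txt_pandoc /path/to/folder
docx_to_txt_pandoc() {
  dir="${1:-.}"
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue   # nothing matched the pattern, skip
    # -f docx: input format, -t plain: plain-text output, -o: output file
    pandoc -f docx -t plain -o "${f%.docx}.txt" "$f" || echo "failed: $f" >&2
  done
}
```

Paste the function into Terminal, then run it against a copy of a few files first, e.g. docx_to_txt_pandoc ~/Desktop/CONVERT, and check the output before letting it loose on everything.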

Other options are likely to centre on using AppleScript to drive Word: open each document, do a Save As (TXT), then close the document. If you take that approach, making it a service might be a good option, since you could then run it on each selected file in Finder rather than having to build in searching and filtering of any sort.

#3

Thank you so much for the advice!
I’ll try Pandoc first.

#4

If you haven’t found a solution yet, I used this approach with success several years ago to convert my DOCX files to TXT:

I just tested it again this morning by copying several DOCX files into a CONVERT directory on the desktop, and this terminal command produced the corresponding TXT files:

textutil -convert txt /Users/James/Desktop/CONVERT/*.docx

I didn’t have thousands as you do, so there could be some limit that never cropped up with me.
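The one limit I can think of is the shell's maximum command-line length: with thousands of files, *.docx might expand to more than a single command is allowed to take. A sketch that should sidestep this, assuming the same textutil command, is to let find pass the files along in batches. The function name convert_docx_tree is just something I made up, and unlike the command above it also descends into subfolders.

```shell
# Hypothetical helper: convert every .docx under a folder with textutil.
# "-exec ... {} +" makes find hand over the files in batches that fit
# within the command-line length limit, instead of all at once.
# Usage: convert_docx_tree /path/to/folder
convert_docx_tree() {
  dir="${1:-.}"
  find "$dir" -type f -name '*.docx' -exec textutil -convert txt {} +
}
```

For example, convert_docx_tree /Users/James/Desktop/CONVERT should leave a TXT file beside each DOCX, the same as the single command does.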

1 Like