Renaming files to a specific format

I have a bunch of files that I’ve named over the years, and I now want to standardize the file name format to e.g. 2020-02-28.pdf

The existing files are in several different formats

e.g. 1-12-09.pdf, 1-5-10.pdf, 1-1-2020.pdf, 12-23-2020.pdf

I’ve matched the file names using: (\d{1,2})-(\d{1,2})-(\d{2,4})\.pdf

and using back referencing, I’ve switched the groups around using NameChanger for Mac

$3-$1-$2.pdf

I end up with the files renamed:

09-1-12.pdf, 10-1-5.pdf, 2020-1-1.pdf, 2020-12-23.pdf

My question is, how can I add the missing digits to conform to my desired format?

e.g. change 09-1-12 to 2009-01-12

Can this be done with regex alone or will I need to use some type of script?

I think a script would be more robust. The major scripting languages (Perl, Python, Ruby) have excellent libraries for parsing dates on the input side and formatting them on the output side. Given that your file names are already in good shape, it would be simple for Python’s dateutil library, for example, to recognize the dates and transform them into a consistent format.
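To give a feel for the parse-then-format approach, here is a minimal sketch that uses only the standard library's datetime.strptime instead of dateutil (a swapped-in technique, not the dateutil API). The function name to_iso and the list of formats are illustrative assumptions, and it assumes the names are in month-day-year order.

```python
from datetime import datetime

# Formats to try, in order: four-digit year first, then two-digit year.
# Assumption: every name is month-day-year.
FORMATS = ("%m-%d-%Y", "%m-%d-%y")

def to_iso(stem):
    """Return stem as YYYY-MM-DD, or None if it matches none of FORMATS."""
    for fmt in FORMATS:
        try:
            # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
            return datetime.strptime(stem, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

for stem in ("1-12-09", "1-5-10", "1-1-2020", "12-23-2020"):
    print("{} -> {}".format(stem, to_iso(stem)))
```

A dedicated parser such as dateutil accepts many more input styles than this fixed list of formats, which is the main reason to prefer it for messier names.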

Thanks for the guidance. Do you have any recommendations for a Python dateutil tutorial for a beginner?

The docs have a pretty good example https://dateutil.readthedocs.io/en/stable/


Yes, the dateutil docs are good. Look into the parser submodule. You could also look here for a couple of simple examples. This weekend is busy, but I can probably sketch out some starter code that you could flesh out to meet your specific needs.

Converting dates from one format to another is probably best done with a date utility, as Dr. Drang says. However… I did recently run into a neat trick in some Javascript I was looking at:

If you want leading zeros, but you’re starting with month and day values that don’t have them, just add a zero to the front of whatever you do have (as a string) and take the last two digits.

So month or day 1 becomes 01 and you take both digits. Month or day 12 becomes 012 and you take the last two digits (using slice in this case, but it would also work in Python with slice notation: "012"[-2:]).

It doesn’t help with the year, however.
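In Python, the padding trick might look something like the sketch below. The names pad2 and standardize are made up for illustration, the pattern is a slightly tightened version of the one in the question, and the year handling rests on a loud assumption: every two-digit year is taken to be in the 2000s.

```python
import re

# Month-day-year names with a one- or two-digit month and day
# and a two- or four-digit year.
PATTERN = re.compile(r"^(\d{1,2})-(\d{1,2})-(\d{2}|\d{4})\.pdf$")

def pad2(value):
    """Prepend a zero and keep the last two characters."""
    return ("0" + value)[-2:]

def standardize(name):
    """Return name as YYYY-MM-DD.pdf, or unchanged if it doesn't match."""
    match = PATTERN.match(name)
    if match is None:
        return name
    month, day, year = match.groups()
    if len(year) == 2:
        # Assumption: two-digit years all belong to the 2000s.
        year = "20" + year
    return "{}-{}-{}.pdf".format(year, pad2(month), pad2(day))

for name in ("1-12-09.pdf", "1-5-10.pdf", "1-1-2020.pdf", "12-23-2020.pdf"):
    print("{} -> {}".format(name, standardize(name)))
```

This only rearranges digits. It doesn't check that the result is a real date, so something like 13-45-20.pdf would pass straight through.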

This started out short and grew as I realized how many ways it could be used and how many edge cases there might be. I’m sure there are still situations I haven’t covered, so please please please test this before unleashing it and definitely have a backup of all your files before you do anything else.

The script is intended to be named redate and saved in your $PATH. As explained in the usage message near the top of the script, it has one main option, -t, which doesn’t do any renaming but shows you how the script would rename the files passed to it.

It can accept any number of files as arguments, so you can run it as

redate *.pdf

from within a folder of your dated PDFs. It will also work with nested folders, so if you are in a folder that contains folders of dated PDFs, you can run

redate */*.pdf

It skips files that are already in the desired YYYY-MM-DD format and files for which the stem (the file name without the extension) can’t be parsed as a date. It assumes the US convention for numeric dates, so 3-2-20 is taken as March 2, not February 3.

If there are multiple files that have different names but parse to the same date, it adds ~n to the file stem. So

02-28-20.pdf
02-28-2020.pdf
2-28-20.pdf

get renamed to

2020-02-28.pdf
2020-02-28~2.pdf
2020-02-28~3.pdf

I’m pretty sure it could be turned into a Quick Action (what we used to call a Service) through Automator, but I haven’t tried that.

Although I normally work in Python 3, this was written to run in the Python 2.7 that comes with macOS. It uses no libraries that aren’t installed with the system. I think the comments are good enough for you to modify it if it doesn’t quite meet your needs.

Again, don’t trust me to have covered all the bases. Make sure you have backups and test before you leap.

#!/usr/bin/python

from dateutil.parser import parse
import os.path
import os
import sys
from getopt import getopt, GetoptError

usage = """Usage: redate [-th] FILES
Rename files with date names to YYYY-MM-DD format.

  -t   Don't rename files; test by showing how they'd be renamed
  -h   Print this help message

Skip files with names that cannot be parsed as dates and files
that are already in the desired format. If files with different
names parse to the same date, e.g., 2-28-20.txt and 02-28-20.txt,
add ~2, ~3, etc. to the base file name.
"""

# Initialize the test conditional and the list of new filenames
test = False
newpaths = []

# Handle any command-line options
try:
	opts, args = getopt(sys.argv[1:], 'th')
except GetoptError as err:
	print str(err)
	print usage
	# Exit with a nonzero status so callers can tell the options were bad
	sys.exit(2)

for o, v in opts:
	if o == "-t":
		test = True
	else:
		print usage
		sys.exit()		

# Loop through all the file path arguments
for path in args:

	# Get all the parts of the absolute file path
	oldpath = os.path.abspath(path)
	directory, name = os.path.split(oldpath)
	stem, ext = os.path.splitext(name)
	
	# Parse the date and construct a new file name and absolute path
	try:
		filedate = parse(stem, yearfirst=False, dayfirst=False)
	# dateutil can also raise OverflowError on very long runs of digits
	except (ValueError, OverflowError):
		sys.stderr.write("Date parsing error on {}. Skipping...\n".format(name))
		continue
	
	newstem = filedate.strftime("%Y-%m-%d")
	newname = "{}{}".format(newstem, ext)
	newpath = os.path.join(directory, newname)
	
	# Skip if the new name is the same as the old
	if newpath == oldpath:
		continue
	
	# Handle naming conflicts by appending ~n to the date
	if os.path.exists(newpath) or newpath in newpaths:
		n = 2
		newnamen = "{}~{}{}".format(newstem, n, ext)
		newpathn = os.path.join(directory, newnamen)
		while os.path.exists(newpathn) or newpathn in newpaths:
			n += 1
			newnamen = "{}~{}{}".format(newstem, n, ext)
			newpathn = os.path.join(directory, newnamen)
		newname = newnamen
		newpath = newpathn
	
	# Rename the file or show how it would be renamed
	if test:
		print "{}\n{} -> {}".format(directory, name, newname)
		print
	else:
		os.rename(oldpath, newpath)

	newpaths.append(newpath)

This is way more help than I expected. Thanks so much! After looking at some of the automation that can be accomplished with Python, I’m going to try my hand at it with some online courses. This script will be a great reference. I truly appreciate it.

You’re welcome, and good luck. It was fun to write. Please be careful with it, though. Python’s os.rename function is unforgiving: on macOS it will silently overwrite an existing file if the arguments aren’t set right. I did my best to avoid that happening, but I may have missed something.

As with any script from an untrusted source (me), back up before you use it.
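For anyone adapting the script, one way to add an extra guard against overwriting is to wrap the rename in a check, roughly as sketched here. The helper name safe_rename is hypothetical and is not part of the script above. The check and the rename are two separate steps, so this is a safeguard against mistakes, not protection against another program creating the file in between.

```python
import os

def safe_rename(src, dst):
    """Rename src to dst, but refuse if something already exists at dst.

    Hypothetical helper for illustration only.
    """
    if os.path.exists(dst):
        raise OSError("Refusing to overwrite existing file: {}".format(dst))
    os.rename(src, dst)
```

Swapping this in for the bare os.rename call would turn an accidental overwrite into an error message instead of a lost file.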