Read CSV, split a field, export the updated CSV

Philrob · March 3, 2021, 6:47pm

Hello All.

Assistance would be much appreciated - I have a prototype AppleScript file that successfully reads the CSV file and splits the required field. At the moment it just logs the data to the console.

Two issues:

Issue 1 - It fails once run with the error

Result:
error "Can’t get item 1 of {}." number -1728 from item 1 of {}

Which will be something basic but I can’t see what it is.

Issue 2 -

What is the best way to write the output to a new csv file with the same name & path as the one chosen by the user but with the suffix “- processed” appended? e.g. test.csv becomes test-processed.csv

Here is the prototype file - comments and suggestions very welcome.

-- Open CSV File and import to list

-- select the file
set theFile to (choose file of type {"public.comma-separated-values-text"} with prompt "Select the CSV file")

-- read the file contents
set f to read theFile


-- break the file into rows (paragraphs)
repeat with row in ((paragraphs of f))
	
	-- parse the row into comma-delimited fields
	set fields to parseCSV(row as text)
	
	-- now we have a line of data:
	set payAmount to item 1 of fields
	set paySortCode to characters 1 through 6 of item 2 of fields
	set payAccount to characters -8 through -1 of item 2 of fields
	set payName to item 3 of fields
	set payRef to item 5 of fields
	
	-- now do something with the data:
	log payAmount & " " & paySortCode & " " & payAccount & " " & payName & " " & payRef
	
end repeat


-- subroutine to process the CSV and make the list
on parseCSV(theText)
	set {od, my text item delimiters} to {my text item delimiters, ","}
	set parsedText to text items of theText
	set my text item delimiters to od
	return parsedText
end parseCSV

Background: The accounts package spits out a csv file. Unfortunately the output contains a single field that I need to split into two fields (the sort code and the account number) so that I can import it into the bank without worrying about errors creeping in as someone hand edits a payment file!!!

For completeness here is the full events log from the ScriptEditor:

tell application "Script Editor"
	choose file of type {"public.comma-separated-values-text"} with prompt "Select the CSV file"
end tell
tell current application
	read alias "Macintosh HD:Users:philiproberts:Downloads:test.csv"
	(*22.64 123456 12345678 Fred Bloggs MCI Logistics*)
	(*88.64 654321 87654321 Elephant Enterprises 88246 MCI*)
Result:
error "Can’t get item 1 of {}." number -1728 from item 1 of {}

and the sample csv file (yes there is an empty 4th field)

22.64,12345612345678,Fred Bloggs,,MCI Logistics
88.64,65432187654321,Elephant Enterprises,,88246 MCI

sylumer · March 3, 2021, 7:52pm

I realise that you are asking about AppleScript, but I wonder if you had considered using awk?

I think this one-liner does what you want.

Obviously, just point awk at your own CSV file, but here’s the code from the above for easier copying.

#!/bin/zsh
awk -F "," '{print $1","substr($2,1,6)","substr($2,7)","$3","$4","$5}' /Users/stephen/scripts/test.csv

I’m not an awk expert, so there may be a simpler command, but what the above does is split the file records by commas to specify fields, then output the fields 1 though 5, though with field 2, it further subdivides based on number of characters into two output fields.

Hope that makes sense, and that might be a viable alternative for you.

Martin_Packer · March 3, 2021, 8:00pm

Would a blank line cause the error?

drdrang · March 4, 2021, 12:36am

I suspect @Martin_Packer is on the right track. If your file has a trailing linefeed, AppleScript will treat the empty string after that as another paragraph. Then your parseCSV returns an empty list and out pops an error message.

One way around this is to wrap an if statement around the section of code that sets the variables. Test for enough items in the fields variable.

@sylumer is right that awk is pretty much tailor-made for this kind of problem. It does all the initial splitting and parsing of lines for you.

One thing that neither AppleScript nor awk will handle gracefully is a CSV that wraps its fields in quotation marks because one or more of them contain commas. If you know you’ll never get that kind of input, great; but if you do have to parse more general CSV files, you might look into Python’s csv module. It’s also good for writing CSV files.

Martin_Packer · March 4, 2021, 8:29am

I’m glad you mentioned Python’s csv module. My code relies heavily on it and it has the most robust support one could want. (It’s in two of my open source projects.)

One thing I like - in potentia - about the csv module is its support for dialects. That might be needed here to cope with this very problem.

Philrob · March 4, 2021, 3:30pm

Many thanks - awk works great, not something that I had used before but it does the job well (almost as if it were designed for that task )

I have managed to put the shell script in an Automator quick action (right mouse click) with the workflow receiving current files or folders in Finder.app

The script currently looks like this:

for f in "$1"
	do
			if [[ "$f" = *.csv ]]; then
				cd "$(dirname "$f")"
				awk -F "," '{print $1","substr($2,1,6)","substr($2,7)","$3","$4","$5}' "$f" > XeroPayments.csv
			else
				afplay /System/Library/Sounds/Funk.aiff
			fi
	done

A couple of things I have done that might be useful to others (there may be a better way)

The cd “$(dirname “$f”)” means that the output file is placed in the same folder that the original file that I select for the quick action resides.

I have a test to check that it is a CSV file and to make a noise if it isn’t.

I think the [ for f in “$1” ] rather than using [ for f in “$@” ] restricts the script to the first file (in the event that more than one file is selected) - is that right? is there a better way?

Further development (but not urgent) - base the name of the output file on the original fill with -processed suffixed to the end.

drdrang · March 5, 2021, 2:33pm

Try this in your Quick Action:

# Process the file and path names of the first argument
directory=$(dirname "$1")                  # directory path
oldfile=$(basename "$1")                   # original file name with extension
oldname="${oldfile%.*}"                    # original file name w/o extension
extension="${oldfile##*.}"                 # extension only
newfile="$oldname"-processed."$extension"  # new file name with extension

# Process the file itself
if [[ "$extension" = csv ]]; then
	cd "$directory"
	awk -F "," '{print $1","substr($2,1,6)","substr($2,7)","$3","$4","$5}' "$oldfile" > "$newfile"
else
	afplay /System/Library/Sounds/Funk.aiff
fi

It should work with either zsh or bash as the shell. Zsh has some cute suffixes you can use to extract the parts of file paths, but I went with the more traditional functions and pattern matching tools because I’m not as comfortable in zsh.

As far as I know, there’s nothing wrong with the way you’re using a loop to restrict the script to the first selected file, but I think it’s simpler to do it this way.

Philrob · March 8, 2021, 10:47am

Thank you @drdrang works perfectly - Including the comments in the script helps understanding as well.

@Martin_Packer - you were right, there is a linefeed on the export from the accounting software.

Martin_Packer · March 9, 2021, 8:16am

Glad you got it sorted out. And I don’t suppose you control the source to be able to eliminate these new lines. So accommodating them is the right thing to do.