Problems automating combining CSV files with embedded bash script in Hazel

Dana_Portenier · June 27, 2020, 1:29pm

This may not be the best forum to ask this question about Hazel and the bash script. Happy to put elsewhere if anyone has suggestions

I am pretty new to bash scripting but have watched several Lynda.com courses recently and figured out what I want to do in the terminal but now want to automate with Hazel. Likely could do in the terminal but I’m not that advanced. But willing to consider if people feel its best

I am receiving web based customer data from a web form in a CSV file. The headers for the data are in column 1 and customer input in column 2. Each customer input is a separate CSV file

I want to automatically combine each CSV file so I can dashboard some of the data. Ultimately hope to display with the Charty App.

In the terminal, I have used awk to extract the headers and place them in row 1 of a new file ill refer to as combined.csv

I have created another awk command to extract column 2 ( the customer data) and put it in a row separated by commas. The command I’m using is: awk ‘BEGIN {FS=“,”}{RS=“\r”}{ORS=“,”}{print $2}’ file_name.csv >> combined.csv

One issue I had is this places the new data on the same row as the headers which I can’t figure out how to overcome. I can make my combined not have the headers if necessary but sure there is a way to fix.

Next, I decided to automate with Hazel. I have pointed it folder where the individual CSV files get placed. They have no tag so I want Hazel to identify the files with no tag and run an embedded bash script

awk ‘BEGIN{FS=","}{RS="\r"}{ORS=","}{print $2}’ $1 >>

filepath Combined.csv

In this reportedly the $1 is how you denote the current file Hazel is working on.

After running I would like Hazel to tag the file with “complete” so it won’t be triggered again.

Hazel reports back an error saying it can’t run the shell script.

Sorry for the long post. Any help would be appreciated.

drdrang · June 27, 2020, 3:23pm

I can’t help you with Hazel, but I might be able to help with the shell stuff.

First, with ORS set to a comma, awk is ending each command ouput with a comma instead of a newline character (\n). So when you >> the next output, it goes onto the end of your combined file. You could pipe the awk output through sed to fix this. To get the header line, run

awk 'BEGIN{FS=",";ORS=","}{print $1}' cust1.txt | sed 's/,$//' > combined.csv

And then for each of the data files, run

awk 'BEGIN{FS=",";ORS=","}{print $2}' custN.txt | sed 's/,$//' >> combined.csv

where custN.txt represents any customer file.

You’ll note that I’m not including RS="\r" in my code. That’s because I don’t understand why it’s in your code. Mac files have lines that end with \n (and that’s what Unix commands expect), Windows files typically have lines that end with \r\n. No files have line endings of just \r, at least not since the classic Mac OS days. Now, you may well have good reason for including RS="\r", and it may be necessary because of the input files you’re getting from your customers. If that’s the case, keep it in there. It just seems weird to me.

By the way, I tested the above commands with a cust1.txt file of

A,1
B,2
C,3

and a cust2.txt file of

A,9
B,8
C,7

When I was done, combined.csv was

A,B,C
1,2,3
9,8,7

which I think is what you’re looking for.

Dana_Portenier · June 27, 2020, 4:58pm

Thanks…I will play with what you have sent.

The \r really only came from multiple search after i couldnt get my initial attempts to work. Since I am new to this I am certain your method is likley better.

Appreciate the help and will let you know how it goes

Martin_Packer · June 28, 2020, 7:51am

By the way Python has CSV support built in. I use it a lot. What you’re doing sounds like it’s getting complex enough to use Python instead of awk.

Just a thought.

Dana_Portenier · July 19, 2020, 4:19pm

drdrang I really enjoyed your recent appearance on The Automators. I don’t have a coding back ground but have been progressing slowly and learning allot along the way. I certainly never thought I would venture into command line scripting but here I am.

I appreciate your previous response. I haven’t gotten it to work with Hazel. The Hazel forum has suggested I should try an external bash script to run rather than inline script.

Any advice on syntax for looping through all files in a specific folder and performing the above awk/sed commands you provided. Would want it to act on files without a tag and place a tag at the end if possible to mark them as added.

drdrang · July 19, 2020, 8:46pm

Again, I don’t know how to do this in Hazel, but I can do it from the Terminal. First, let’s assume the following:

The file you’re collecting all the information into is named “combined.csv”.
“combined.csv” been set up with the headers as described in my previous answer.
The tag you’re going to use to identify files that have already been added is “dana”.
“combined.csv” has been given the “dana” tag.
You have installed J.D. Berry’s tag command. I know it’s not entirely fair to ask you to install a new program, but manipulating tags without tag is horrible. It’s easy to install if you have Homebrew.

OK, with all that in place, here’s a shell script that adds the data from CSV files in the current directory that do not have a tag:

#!/bin/bash

for f in `tag -m ''`; do
  if [ ${f: -4} == ".csv" ]; then
    awk 'BEGIN{FS=",";ORS=","}{print $2}' "$f" | sed 's/,$//' >> combined.csv
    tag -a dana "$f"
  fi
done

In Terminal, you’d cd to the directory with the data files and run this command.

What it does is

Use the tag command to get all the files in the current directory that have no tag. So we’re assuming not only that the new files don’t have the “dana” tag, but that they have no tags whatsoever. I hope this will work for you; unfortunately, the tag command doesn’t have a “negate” option.
Go through those file names and check if their last four characters are “.csv”.
For each file with that extension, add the data to “combined.csv” using the code in my previous answer and then tag that file with “dana”.

This is not the most robust script in the world. It assumes that the individual CSV files all have the same ordering of their data, that they don’t have any extra lines, and that they all have a “.csv” extension. File hygiene is important. If the incoming files aren’t clean, you’re probably better off switching from a shell script to a “real” programming language with libraries already written to handle dirty data. As @Martin_Packer suggests, Python would be good choice.

Good luck!