Some more find snippets

Exec complex commands with find

find * -exec is a great way to run a simple command on multiple files, as shown in Aug 21, 2020 - More Unix tools. But what do you do if you want to run more complex commands, where the “file” variable isn’t necessarily at the end of the command, or where you need more variables?

Let’s say you have a similar task: a lot of tar files you want to extract. But now each tar file is in its own directory, and you want to extract the files inside that directory. tar has the -C target_directory option, which does exactly that, so for a single file it would be tar xJf SomeDirectory/Something.tar.xz -C SomeDirectory. But how do you do this in combination with find * -exec for multiple files/directories?

You can do this via -exec sh -c '...'. For example:

find * -type f -name "*.xz" -exec sh -c 'dir=$(dirname "$0"); tar xJf "$0" -C "$dir"' {} \;

As usual, you pass the found file to the -exec command via {} and terminate the command with \;. In this case, though, the command that -exec launches is a shell. The shell runs the script you specify in -c '...', and the first argument after that script (here the found file) is available inside the script as $0.
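A quick way to see how sh -c binds its arguments is to print them. The sketch below is illustrative only: the temporary directory, the file name and the echo that stands in for tar are assumptions, not part of the original task. It also shows a common variant that passes a placeholder (here the word sh) as $0, so that the found file arrives as $1.

```shell
# The first argument after the script text becomes $0, the next one $1.
bound=$(sh -c 'echo "0=$0 1=$1"' first second)
echo "$bound"

# Hypothetical demo tree: one file in a directory whose name contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/some dir"
touch "$demo/some dir/archive.tar.xz"

# Variant: "sh" fills $0, so the found file is $1 inside the script.
# echo stands in for tar here, so nothing is actually extracted.
planned=$(find "$demo" -type f -name "*.xz" -exec sh -c \
  'dir=$(dirname "$1"); echo "tar xJf $(basename "$1") -C $(basename "$dir")"' sh {} \;)
echo "$planned"

# Clean up the demo tree.
rm -rf "$demo"
```

Using $1 instead of $0 is mostly a matter of convention: $0 is normally the name of the script, and shell error messages are prefixed with it.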

Find + awk to extract information via regex groups

Recently I had to extract the image dimensions from a few thousand tif images. Again find came to the rescue, in combination with awk. As single and double quotes and their combinations are quite important for both, find * -exec didn’t work directly in that case. I had to iterate over the find output with a for loop.

echo "Filename , Width , Height" >> image_dims.csv
for i in `find * -type f -name "*.tif"`; do { echo -n "$i , "; tiffinfo "$i" | head -3 | tr '\n' ' ' | awk 'match($0,/.+Width:\ ([0-9]+).+Length:\ ([0-9]+).+/,a) {print a[1],",",a[2]}'; } >> image_dims.csv; done;

To break it down:

find * -type f -name "*.tif" generates a list of the tif files; the for loop iterates over them and runs the commands between do and done for each one.
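One caveat with iterating over backtick output: the shell splits it on whitespace, so a file name containing a space becomes two loop items. A minimal sketch, assuming a throwaway directory with two made-up file names, compares that loop with find -exec ... {} +, which hands each file to the inner shell as a single argument.

```shell
# Hypothetical demo directory with one plain name and one name containing a space.
demo=$(mktemp -d)
touch "$demo/plain.tif" "$demo/with space.tif"

# Backtick loop: the output of find is split on whitespace before looping.
count_loop=0
for i in `find "$demo" -type f -name "*.tif"`; do
  count_loop=$((count_loop + 1))
done

# find -exec ... {} + passes every file as one argument; "for f do" loops over them.
count_exec=$(( $(find "$demo" -type f -name "*.tif" \
  -exec sh -c 'for f do echo "$f"; done' sh {} + | wc -l) ))

echo "loop items: $count_loop, exec items: $count_exec"

# Clean up the demo directory.
rm -rf "$demo"
```

For file names without whitespace both approaches see the same items, so the for loop is fine as long as the names are known to be simple.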

tiffinfo prints lots of information about a tif image (but only the first 3 lines are needed), e.g.:

TIFF Directory at offset 0x8 (8)
  Subfile Type: multi-page document (2 = 0x2)
  Image Width: 2960 Image Length: 2960

tr '\n' ' ' replaces the line breaks with spaces, so everything’s on one line.

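In the awk script, match($0,/.+Width:\ ([0-9]+).+Length:\ ([0-9]+).+/,a) looks for two groups of digits, one after “Width: ” and one after “Length: ”. These two values are stored in the array a as a[1] and a[2]. Note that the three-argument form of match is a gawk extension, so this requires GNU awk.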

print a[1],",",a[2] simply prints the two values.
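On systems where awk is not GNU awk, the same two groups can be captured with sed instead. This is a sketch of an alternative, not the approach used above; it assumes a sed that supports -E (GNU and BSD sed both do) and reuses the sample tiffinfo output as input.

```shell
# Sample input: the first three lines of the tiffinfo output shown earlier.
sample='TIFF Directory at offset 0x8 (8)
  Subfile Type: multi-page document (2 = 0x2)
  Image Width: 2960 Image Length: 2960'

# -n suppresses the default output; the p flag prints only lines where the
# substitution matched. \1 and \2 are the two captured groups of digits.
dims=$(printf '%s\n' "$sample" | head -3 |
  sed -nE 's/.*Width: ([0-9]+).*Length: ([0-9]+).*/\1 , \2/p')
echo "$dims"
```

Because sed works line by line and both values sit on the same line, the tr step is not needed in this variant.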