Some more find snippets
Exec complex commands with find
find * -exec is a great way to run a simple command on multiple files, as shown in “More Unix tools” (Aug 21, 2020). But what do you do if you want to run more complex commands, where the file name doesn't simply go at the end of the command, or where you need more variables?
Let's say you have a similar task: a lot of tar files you want to extract. But now each tar file is in its own directory, and you want to extract its contents there. tar has the -C target_directory option, which does exactly that, so for a single file it would be tar xJf SomeDirectory/Something.tar.xz -C SomeDirectory.
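A minimal sketch of that single-file case in a throwaway directory. To avoid depending on xz, it uses an uncompressed archive (tar cf / tar xf) in place of the .tar.xz; the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: extract a single archive into its own directory with tar -C.
# Assumption: an uncompressed .tar stands in for .tar.xz, so xz is not needed.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a hypothetical SomeDirectory/Something.tar that contains hello.txt
mkdir -p SomeDirectory src
echo "hello" > src/hello.txt
tar cf SomeDirectory/Something.tar -C src hello.txt
rm -r src

# Extract into the archive's own directory instead of the current one
tar xf SomeDirectory/Something.tar -C SomeDirectory

result=$(cat SomeDirectory/hello.txt)
echo "$result"

cd /
rm -rf "$work"
```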
But how do you do this in combination with find * -exec for multiple files and directories?
You have to do it via -exec sh -c ''! For example:
find * -type f -name "*.xz" -exec sh -c 'dir=$(dirname "$0"); tar xJf "$0" -C "$dir"' {} \;
As usual, you pass the found file to the exec command as the last argument via {} \;. In this case, though, the exec command launches a shell, which takes the file name and passes it on to the script you specify in -c '...' as its first argument, $0.
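The argument handling can be checked directly: the first argument after the -c script becomes $0, the next one $1, and so on. The sketch below then runs the find command from above on a small throwaway tree; as before, it assumes uncompressed .tar files (and -name "*.tar") so that xz is not required, and the directory names are hypothetical. The last find line shows a common variant that is not from the original: passing a dummy name (sh) as $0 and using $1 for the file, since $0 is also what the shell uses in its error messages.

```shell
#!/bin/sh
# Sketch: how sh -c receives the file name that find passes via {}.
set -eu

# The first argument after the script is $0, the second is $1.
args=$(sh -c 'echo "0=$0 1=$1"' first second)
echo "$args"    # 0=first 1=second

work=$(mktemp -d)
cd "$work"

# Hypothetical layout: two directories, each with its own archive.
for d in alpha beta; do
  mkdir -p "$d" src
  echo "$d" > src/content.txt
  tar cf "$d/$d.tar" -C src content.txt
  rm -r src
done

# Same pattern as the article, with .tar instead of .tar.xz:
# find hands each archive to the inline script as $0.
find * -type f -name "*.tar" -exec sh -c 'dir=$(dirname "$0"); tar xf "$0" -C "$dir"' {} \;

# Variant: a dummy $0 ("sh") and the file as $1 (re-extracts the same files).
find * -type f -name "*.tar" -exec sh -c 'tar xf "$1" -C "$(dirname "$1")"' sh {} \;

alpha=$(cat alpha/content.txt)
beta=$(cat beta/content.txt)

cd /
rm -rf "$work"
```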
Find + awk to extract information via regex groups
Recently I had to extract the image dimensions from a few thousand TIF images. Again find came to the rescue, in combination with awk. As single and double quotes and their combinations are quite important for both, find * -exec didn’t work directly in that case. I had to iterate over the find output with a for loop instead.
echo "Filename , Width , Height" > image_dims.csv
for i in $(find * -type f -name "*.tif"); do echo -n "$i , "; tiffinfo "$i" | head -3 | tr '\n' ' ' | awk 'match($0,/.+Width:\ ([0-9]+).+Length:\ ([0-9]+).+/,a) {print a[1],",",a[2]}'; done >> image_dims.csv
To break it down:
find * -type f -name "*.tif" generates a list of the TIF files; for iterates over them and runs the commands between do and done. (This assumes file names without spaces, since the for loop splits the find output on whitespace.)
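The whitespace caveat is easy to reproduce. This sketch uses two made-up file names, one of them containing a space: the for loop sees three items. For comparison it also shows one common alternative (not the method used above): letting find pass the names to sh -c directly with {} +, which keeps each name as a single argument.

```shell
#!/bin/sh
# Sketch: word splitting in `for i in $(find ...)` vs. find -exec sh -c.
set -eu

work=$(mktemp -d)
cd "$work"
touch "plain.tif" "with space.tif"   # hypothetical file names

# $(...) output is split on whitespace: 3 items for 2 files.
split_count=0
for i in $(find * -type f -name "*.tif"); do
  split_count=$((split_count + 1))
done
echo "for loop saw $split_count items"

# find passes each name as one argument; {} + hands several at once
# to the inline script, which loops over "$@".
whole_count=$(find * -type f -name "*.tif" \
  -exec sh -c 'for f in "$@"; do echo "$f"; done' sh {} + | wc -l | tr -d ' ')
echo "find -exec saw $whole_count items"

cd /
rm -rf "$work"
```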
tiffinfo spits out lots of information about a TIF image, but only the first three lines are needed, e.g.:
TIFF Directory at offset 0x8 (8)
Subfile Type: multi-page document (2 = 0x2)
Image Width: 2960 Image Length: 2960
tr '\n' ' ' replaces the line breaks with spaces, so everything is on one line.
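Since tiffinfo may not be installed everywhere, this sketch assumes a printf-based stand-in that emits the three sample lines shown above, plus one placeholder line to show that head -3 drops it. The result of head -3 | tr '\n' ' ' is a single line (with a trailing space where the last newline was).

```shell
#!/bin/sh
# Sketch: join the first three lines of tiffinfo-style output into one line.
# sample_output is a hypothetical stand-in for `tiffinfo file.tif`.
set -eu

sample_output() {
  printf '%s\n' \
    'TIFF Directory at offset 0x8 (8)' \
    'Subfile Type: multi-page document (2 = 0x2)' \
    'Image Width: 2960 Image Length: 2960' \
    'placeholder line that head -3 drops'
}

joined=$(sample_output | head -3 | tr '\n' ' ')
echo "$joined"
```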
awk 'match($0,/.+Width:\ ([0-9]+).+Length:\ ([0-9]+).+/,a) looks for two groups of digits, one after “Width: ” and one after “Length: ”. The two captured values are stored in the array a, as a[1] and a[2]. Note that this three-argument form of match() is a GNU awk (gawk) extension.
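Because the array argument of match() needs gawk, the one-liner may fail with other awk implementations. One portable alternative, swapped in here and not the approach used above, trims everything around each number with POSIX sub() instead of using capture groups. The sketch feeds it the joined sample line from the tiffinfo output shown earlier.

```shell
#!/bin/sh
# Sketch: extract Width and Length with POSIX awk only
# (sub() instead of gawk's match(..., array)).
set -eu

line='TIFF Directory at offset 0x8 (8) Subfile Type: multi-page document (2 = 0x2) Image Width: 2960 Image Length: 2960 '

dims=$(printf '%s\n' "$line" | awk '{
  w = $0; sub(/.*Width: /, "", w); sub(/[^0-9].*/, "", w)    # digits after "Width: "
  l = $0; sub(/.*Length: /, "", l); sub(/[^0-9].*/, "", l)   # digits after "Length: "
  print w, ",", l                                            # same "W , L" layout
}')
echo "$dims"
```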
print a[1],",",a[2] simply prints the two values with a comma between them.