MD5 hash generation (Mac)
Overview
As mentioned in this article, creating an md5 hash (.md5) of your file can help assure the integrity of your file after transport through the SFTP layer.
This article provides a client-side script for generating md5 hash files. The script also handles nested subfolders.
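The main challenge in creating md5 hash files is that you need to first cd into the folder that contains the file before generating the md5 hash. This is because the md5 command prints the path exactly as you passed it, so a relative path ends up in the output line that gets saved to the .md5 file. The hash value itself does not change.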
For example, you can run the md5 command on a file in the current directory:
md5 file1.txt
MD5 (file1.txt) = 400c186509c898fce0a803838b1da036
But you don't want to run it on a nested file from a parent directory, because the relative path gets included in the output:
md5 subfolderA/subfolderB/file3.txt
MD5 (subfolderA/subfolderB/file3.txt) = c474464ca58e59a6472929a755ab4980
Bash script for generating hash files
This is the bash script for generating .md5 files:
find . -type f -exec sh -c '(cd $(dirname {}) && FILENAME=$(basename {}) && if [[ ${FILENAME: -4} != ".md5" ]]; then md5 $FILENAME > $FILENAME.md5; fi )' \;
To use this script, first cd to where your files are located on your SFTP client machine.
Then run the script.
It should create an .md5 file for every file in that directory (including nested directories).
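The one-liner above pastes each path straight into the inner shell command, so a file or folder name containing spaces would be split into several words. The following is a minimal sketch of the same idea with quoting added, not a replacement for the article's script. The function name make_md5_files and its optional second argument are assumptions made for illustration. It passes each path to the inner shell as "$1" and lets find skip existing .md5 files with ! -name.

```shell
# Hypothetical wrapper; the function name and the optional second argument
# are assumptions for illustration, not part of the original script.
make_md5_files() {
  # $1: top-level directory to process
  # $2: hash command to run (defaults to the macOS md5 command)
  (
    cd "$1" || exit 1
    HASH_CMD="${2:-md5}" find . -type f ! -name '*.md5' -exec sh -c '
      # "$1" is the path handed over by find; quoting keeps spaces intact.
      cd "$(dirname "$1")" || exit 1
      name=$(basename "$1")
      "$HASH_CMD" "$name" > "$name.md5"
    ' _ {} \;
  )
}
```

After defining the function, you could call make_md5_files . from the directory that holds your files. Because the work happens inside a subshell, your current directory is unchanged afterwards.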
Explanation of the bash script
The above script is not easy to reason about, so here's an explanation of how it works.
You can recursively find all of the files in the current directory:
find . -type f
This gives you the following output:
./file1.txt
./subfolderA/file2.txt
./subfolderA/subfolderB/file3.txt
You can add the -exec flag to the find command to run md5 on every file it finds, like this:
find . -type f -exec md5 {} \;
And this gives you the output:
MD5 (./file1.txt) = 400c186509c898fce0a803838b1da036
MD5 (./subfolderA/file2.txt) = e18158cfa7ce8f19a7b40b858450f383
MD5 (./subfolderA/subfolderB/file3.txt) = c474464ca58e59a6472929a755ab4980
The problem here is that the output includes the relative path.
So, the next step is to first cd into the directory prior to running md5:
find . -type f -exec sh -c '(cd $(dirname {}) && md5 $(basename {}))' \;
This gives you the output:
MD5 (file1.txt) = 400c186509c898fce0a803838b1da036
MD5 (file2.txt) = e18158cfa7ce8f19a7b40b858450f383
MD5 (file3.txt) = c474464ca58e59a6472929a755ab4980
Here is some further explanation of the added syntax:
(cd ... && ...): Wrapping the commands in parentheses runs them in a subshell, so the directory of the parent shell does not change.
$(dirname {}): Gives you the relative path of the containing directory (e.g. ./subfolderA/subfolderB)
$(basename {}): Gives you the file name (e.g. file3.txt)
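To see these three pieces in isolation, here is a small sketch that uses one of the example paths from the find output. The variable names are chosen for illustration only, and cd / stands in for any directory change.

```shell
# Illustrative path taken from the find output above.
path=./subfolderA/subfolderB/file3.txt

dir=$(dirname "$path")     # directory part: ./subfolderA/subfolderB
name=$(basename "$path")   # file name part: file3.txt
echo "dir=$dir name=$name"

# The parentheses start a subshell, so the cd is undone when it exits.
before=$(pwd)
(cd / && echo "inside subshell: $(pwd)")
after=$(pwd)
echo "before=$before after=$after"
```

The last line prints the same directory twice, which is why the script can cd into each folder without losing its place.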
Skipping ahead, this is the final script:
find . -type f -exec sh -c '(cd $(dirname {}) && FILENAME=$(basename {}) && if [[ ${FILENAME: -4} != ".md5" ]]; then md5 $FILENAME > $FILENAME.md5; fi )' \;
Here is some further explanation of the added syntax:
FILENAME=$(basename {}): Sets a variable FILENAME to the basename. This gets used multiple times later on.
md5 $FILENAME > $FILENAME.md5: Creates the md5 hash file, with an .md5 extension
${FILENAME: -4}: Gets the last four characters of the variable ${FILENAME} for the purpose of checking the file extension (i.e. .md5).
if then: Prevents creating a hash file for existing .md5 files.
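The [[ ... ]] test and the ${FILENAME: -4} substring expansion are bash features and are not part of POSIX sh. If you want the same extension check in a form that any POSIX shell accepts, a case pattern is one option. This is only a sketch of an alternative, and the helper name is_md5_file is hypothetical.

```shell
# Hypothetical helper (not part of the article's script): succeeds when the
# given file name ends in .md5.
is_md5_file() {
  case "$1" in
    *.md5) return 0 ;;
    *)     return 1 ;;
  esac
}

# Try the check on a few sample names.
for name in file1.txt file1.txt.md5 notes.md5.bak; do
  if is_md5_file "$name"; then
    echo "$name: skip (already a hash file)"
  else
    echo "$name: hash it"
  fi
done
```

Only file1.txt.md5 is skipped here, because the pattern matches the end of the name, not any occurrence of .md5 inside it.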