Tracing difficult shell scripts
Whether you’re an IT systems administrator, developer or DevOps engineer working with Linux/UNIX shell scripts, you’re likely to come across some that are poorly documented and, as a result, difficult to trace (read and understand).
While such scripts can discourage others from tracing them, you can trace even the most difficult shell scripts if you follow good tracing procedures. Moreover, by tracing a difficult shell script, you build your knowledge of specific Linux/UNIX components, as well as of scripting in general.
To demonstrate this, let’s trace an undocumented treed shell script that I made several decades ago for my students. If you execute this shell script, it performs a hierarchical listing of the contents of a directory whose name you must specify as an argument (much like the MS-DOS tree command, which has no standard equivalent on Linux/UNIX systems). If you execute it without supplying an argument, it returns usage information as shown below:
$ ./treed
usage: ./treed directory
$
To view a list of files and subdirectories under the classfiles directory, you could run the script with the directory name as an argument:
$ ./treed classfiles
classfiles
|
|__ Miscellaneous
| |__ mystery
| |__ letter
|__ Poems
| |__ Blake
| | |__ jerusalem
| | |__ tiger
| |__ Shakespeare
| | |__ sonnet5
| | |__ sonnet2
| | |__ sonnet3
| | |__ sonnet4
| | |__ sonnet1
| |__ Yeats
| | |__ mooncat
| | |__ old
| | |__ whitebirds
| |__ rhyme
| |__ nursery
| |__ twister
|__ proposal1
|__ proposal2
$
The classfiles directory shown above contains two files (proposal1 and proposal2) as well as two subdirectories. The Miscellaneous subdirectory contains two files (mystery and letter), and the Poems subdirectory contains three files (rhyme, nursery and twister) as well as three subdirectories (Blake, Shakespeare and Yeats) with additional files.
Now, let’s examine the contents of the shell script that produced these results:
:
[ $# -eq 0 ] && {
echo "usage: $0 directory" >&2
exit 1
}
base=$1
export base
echo "$base"
find $base -print | sed '
s:^'$base':|:
s:/\([^/]*\)$:?? \1:
s:/[^ ?/]*: |:g
s:?:_:g
'
Were you able to trace it easily? If not, you’re definitely not alone! Let’s examine some shell script tracing strategies that we can apply to this shell script.
1. Start with what you know
With a basic knowledge of shell scripting from a course or other learning resource, you’ve likely learned about standard input/output redirection, positional parameters (shell script command line arguments), as well as if statements and their equivalent conditional ANDs and ORs. Thus, you’ll probably be able to easily trace the following part of the shell script, which tests whether the number of positional parameters ($#) is equal to zero AND (&&), if so, prints a usage line to the screen via standard error (>&2) and stops the shell script with a false (non-zero) exit status (exit 1):
[ $# -eq 0 ] && {
echo "usage: $0 directory" >&2
exit 1
}
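To see that the && block is just a compact if statement, here is a minimal sketch that performs the same check with if. The function name check_args is hypothetical, and its body runs in a subshell ( ... ) so that exit 1 ends only the function call rather than your current shell.

```shell
# Hypothetical helper: treed's argument check rewritten as an if statement.
# The ( ... ) body is a subshell, so "exit 1" leaves only the function call.
check_args() (
  if [ $# -eq 0 ]; then
    echo "usage: $0 directory" >&2
    exit 1
  fi
  echo "ok: $1"
)

check_args classfiles                              # prints: ok: classfiles
check_args 2>/dev/null || echo "exit status: $?"   # prints: exit status: 1
```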
You’ve also likely learned about variables, so you know that $0 stores the name (path) used to run the shell script, $1 stores the first positional parameter, and that the following code copies the first positional parameter to a new variable called base, makes that variable available to other commands run by the shell (export base), and prints it to the screen via standard output using the echo command (the first line you see when executing the shell script):
base=$1
export base
echo "$base"
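The following sketch writes a small throwaway script (the name args_demo.sh is hypothetical) into a temporary directory created with mktemp, runs it with two arguments, and shows what $0, $1 and $# hold. It also shows the effect of export: a child shell started with sh -c sees the exported base variable but not the unexported second one (assuming second isn’t already set in your environment).

```shell
demo=$(mktemp -d)
# Quoted 'EOF' keeps the script text literal; \$0 etc. print as labels.
cat > "$demo/args_demo.sh" <<'EOF'
echo "script name (\$0): $0"
echo "first argument (\$1): $1"
echo "argument count (\$#): $#"
base=$1
second=$2
export base
# The child shell inherits base (exported) but not second (not exported).
sh -c 'echo "child sees base: $base"; echo "child sees second: [$second]"'
EOF
args_out=$(sh "$demo/args_demo.sh" classfiles other)
printf '%s\n' "$args_out"
rm -r "$demo"
```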
So, by focusing on your existing basic shell script knowledge, you can trace the code that produces usage information if you don’t supply an argument to the shell script, as well as trace the code that prints the top of the directory listing in the shell script output.
2. Research specific command usage
The next part of the shell script is the hardest to trace, as it leverages two powerful Linux/UNIX commands: find and sed (the stream editor). The find command is fairly common knowledge for any Linux/UNIX user, so you’ll be able to identify that the following line generates a recursive list of files and subdirectories starting from the directory supplied as an argument to the shell script (the $base variable) and sends that recursive list to the sed command for processing via a pipe (|):
find $base -print | sed '
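Before tackling sed, it can help to look at exactly what find passes down the pipe. This sketch builds a tiny sample tree whose names are borrowed from the classfiles example (an assumption; any directory works) and prints the raw find output. The order of entries within a directory depends on the filesystem, so your listing may be ordered differently.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/classfiles/Poems/Blake"
touch "$tmp/classfiles/proposal1" "$tmp/classfiles/Poems/Blake/tiger"
# Run find from inside $tmp (in a subshell) so the paths start with
# "classfiles", just as they do when treed is run as ./treed classfiles.
find_out=$(cd "$tmp" && find classfiles -print)
printf '%s\n' "$find_out"   # classfiles first, then one path per line
rm -r "$tmp"
```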
Where it likely gets difficult to trace for most people is in the single-quoted, multi-line argument that follows the sed command:
find $base -print | sed '
s:^'$base':|:
s:/\([^/]*\)$:?? \1:
s:/[^ ?/]*: |:g
s:?:_:g
'
This isn’t about knowing shell scripting per se, but about understanding the complex usage of a command that is used within the shell script (there are hundreds of such tools). Of course, you’ll need to spend some time looking through the manual page (man sed) or some examples online. While it may take you several minutes (maybe even an hour or more) to learn how sed is used here, you’ll also be learning a powerful tool that you can use in other shell scripts for processing data with search-and-replace statements. As the old Linux/UNIX saying goes: “He said. She said. We all said, use sed.” ;-)
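After researching sed usage, you’ll learn that s stands for substitute (search and replace), and that the first character after s is the delimiter for each statement (: in this case). The first sed command searches for lines that start with the value of $base and replaces that leading directory name with a pipe (|) character. Note the quoting: '$base' ends the single-quoted sed script, lets the shell substitute the directory name, and then starts a new single-quoted section, so sed actually receives s:^classfiles:|: when you run ./treed classfiles: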
s:^'$base':|:
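You can test the first sed command on its own by feeding it a few sample paths with printf. The sketch below assumes base=classfiles and also prints the sed script after the shell has expanded it. Because $base sits outside the quotes, a directory name containing a space or a : would break the sed script, which is worth keeping in mind if you reuse this technique.

```shell
base=classfiles
# What sed actually receives once the shell has spliced in $base:
printf '%s\n' 's:^'$base':|:'            # s:^classfiles:|:
# Apply the first command to sample find output.
printf '%s\n' classfiles classfiles/Poems classfiles/Poems/Blake/tiger |
  sed 's:^'$base':|:'
# Output:
# |
# |/Poems
# |/Poems/Blake/tiger
```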
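The second sed command matches a / followed by any number of characters other than a slash ([^/]*) up to the end of the line ($), which is the last component of each path (the name of a file or subdirectory). The \( and \) save that name so that \1 can reinsert it, which means the final / is replaced with ?? and a space. This ?? is a marker: the third sed command stops matching when it reaches it, and the fourth sed command turns it into underscores: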
s:/\([^/]*\)$:?? \1:
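Feeding the second sed command the kind of lines the first one produces shows the ?? marker being inserted. The sample lines below are hypothetical stage-one output for the classfiles, Poems and tiger paths.

```shell
# Stage-two input: what the first sed command emits for three sample paths.
printf '%s\n' '|' '|/Poems' '|/Poems/Blake/tiger' |
  sed 's:/\([^/]*\)$:?? \1:'
# Output:
# |                       (no slash, so nothing to replace)
# |?? Poems
# |/Poems/Blake?? tiger   (only the last / is replaced)
```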
The third sed command matches a / followed by any characters up to (but not including) the next space, ? or / character. It then replaces every such match on the line (the g flag) with a space and a pipe ( |). This replaces each remaining parent directory name with a pipe symbol, so the number of pipes on each line shows how deeply that entry is nested:
s:/[^ ?/]*: |:g
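The same approach works for the third sed command. The second printf in this sketch uses a hypothetical variant that leaves ? out of the bracket expression, which shows why the original excludes it: without that, the match runs straight through the ?? marker and removes it.

```shell
# Stage-three input: what the second sed command emits.
printf '%s\n' '|?? Poems' '|/Poems/Blake?? tiger' |
  sed 's:/[^ ?/]*: |:g'
# Output:
# |?? Poems
# | | |?? tiger

# Hypothetical variant without ? in the brackets: the ?? marker is swallowed.
printf '%s\n' '|/Poems/Blake?? tiger' | sed 's:/[^ /]*: |:g'
# Output:
# | | | tiger
```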
And the fourth sed command replaces all ? characters with underscore (_) characters, globally throughout each line, which turns each ?? marker into the __ that precedes every name in the output:
s:?:_:g
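A last quick check shows the fourth sed command at work, along with the role of the g flag: without g, sed replaces only the first ? on each line.

```shell
# Stage-four input: what the third sed command emits.
printf '%s\n' '|?? Poems' '| | |?? tiger' | sed 's:?:_:g'
# Output:
# |__ Poems
# | | |__ tiger

# Without the g flag, only the first ? on the line changes.
printf '%s\n' '|?? Poems' | sed 's:?:_:'
# Output:
# |_? Poems
```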
3. Verify your trace using script output or functionality
After tracing enough of your script, make sure you correlate your trace with the actual functionality of the script, which you can often do by examining the output generated by the script or the tasks that the script performs. In our example, after the $base variable is printed to the screen, the sed command modifies the recursive find listing by replacing the parent directories on each line with spaces and pipe symbols, and by marking each name with underscore characters (via a ?? placeholder). You can easily verify this trace logic by examining the classfiles directory output generated by the script.
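You can also automate this check. The sketch below saves the treed listing from this article into a temporary directory, builds a hypothetical sample tree with only one entry per level (so that find’s output order is predictable), and compares the script’s actual output with the output the trace predicts. It assumes a POSIX-style shell, find and sed; since treed has no shebang line, the calling shell runs it itself.

```shell
work=$(mktemp -d)
# Save treed exactly as listed in this article.
cat > "$work/treed" <<'EOF'
:
[ $# -eq 0 ] && {
echo "usage: $0 directory" >&2
exit 1
}
base=$1
export base
echo "$base"
find $base -print | sed '
s:^'$base':|:
s:/\([^/]*\)$:?? \1:
s:/[^ ?/]*: |:g
s:?:_:g
'
EOF
chmod +x "$work/treed"
# Hypothetical sample tree: one entry per level keeps find's order fixed.
mkdir -p "$work/classfiles/Poems/Blake"
touch "$work/classfiles/Poems/Blake/tiger"
actual=$(cd "$work" && ./treed classfiles)
# The output predicted by tracing each sed command by hand.
expected='classfiles
|
|__ Poems
| |__ Blake
| | |__ tiger'
if [ "$actual" = "$expected" ]; then echo "trace confirmed"; else echo "trace mismatch"; fi
rm -r "$work"
```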
Furthermore, you can often deduce things you didn’t know by examining the output of a script. For example, if you didn’t know what $0 represented, you could run the script without arguments to see the message generated, and then compare it to the script contents to see that $0 holds the script path (./treed in the usage example).
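The same experiment is easy to reproduce with a hypothetical one-line script, show0, that prints a treed-style usage line. Running it in two different ways shows that $0 holds whatever path was used to invoke the script.

```shell
dir=$(mktemp -d)
# show0 is a hypothetical one-line script (no shebang, like treed).
printf '%s\n' 'echo "usage: $0 directory"' > "$dir/show0"
chmod +x "$dir/show0"
via_dot=$(cd "$dir" && ./show0)   # usage: ./show0 directory
via_sh=$(cd "$dir" && sh show0)   # usage: show0 directory
printf '%s\n' "$via_dot" "$via_sh"
rm -r "$dir"
```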
4. Identify the purpose of any remaining script syntax
Sometimes shell scripts contain syntax whose purpose isn’t immediately clear when tracing the shell script. In our example, the : at the beginning of the shell script serves no visible function. This : is functionally the same as the /bin/true command in that it generates a true exit status, and nothing more. Normally, a shell script starts with a hashpling/shebang line such as #!/bin/bash to tell the system to interpret all remaining lines using the Bash shell. However, starting a script with a : was common practice on old UNIX systems. It was a way to identify scripts that could safely execute in any shell on the system.
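A quick way to confirm what : does is to compare its exit status with that of true. The last two lines of this sketch show that : still expands its arguments, which is why it is also used to assign default values; treed_demo_dir is a hypothetical variable name, assumed to be unset beforehand.

```shell
: ; echo "exit status of : is $?"         # exit status of : is 0
true ; echo "exit status of true is $?"   # exit status of true is 0
# : ignores its arguments after expanding them, so ${var:=default}
# assigns the default as a side effect.
: "${treed_demo_dir:=classfiles}"
echo "treed_demo_dir is now $treed_demo_dir"
```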
Another example, though not used in treed, is standard output redirection placed after a code block (e.g. after the fi that ends an if statement block). This syntax sends the standard output of all commands in the code block to a single file.
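Although treed doesn’t use it, this sketch shows the technique with a temporary file from mktemp: one redirection after fi captures both echo commands, and the same approach works with a { ...; } group.

```shell
outfile=$(mktemp)
# One redirection after fi captures the output of every command in the block.
if [ -d / ]; then
  echo "first line"
  echo "second line"
fi > "$outfile"
# The same works for a { ...; } group; >> appends instead of overwriting.
{
  echo "third line"
} >> "$outfile"
content=$(cat "$outfile")
printf '%s\n' "$content"
rm "$outfile"
```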
Regardless, any remaining script syntax can be easily Googled and committed to memory for future use!