Performance & efficiency

Before we begin, I would like to clarify that the power of shell scripting stems from the native Unix utilities. In this post, I will focus on what I believe to be the most crucial aspect of any programming language: efficiency.

One should do as little as possible in the shell script itself and use it only to connect the existing logic in the rich set of utilities available on a UNIX system!

It is worth noting that even ChatGPT, while powerful in its own right, does not always write efficient shell code, which may reflect limitations of its training data. Therefore, there is a risk of it producing suboptimal code.

To illustrate why efficiency is critical, let us consider a straightforward shell function that stores keys and values from a file.

read_keys_values(){
  while read line; do

    key=$(echo $line | awk '{print $1}')

    value=$(echo $line | awk '{print $2}')

  done < file.txt
}
# file.txt 2966 lines of keys and values (Example: key1 567189)

Note: This function is for demonstration purposes only. It is not really practical, since key and value are overwritten on each iteration.

  • Now let’s measure the execution time of the above function using the time command.
time ./read_keys_values

real	0m28.966s
user	0m25.133s
sys	0m5.073s

It took about 29 seconds 😮.

Tip: usage of the time command

time ./script.sh
time SomeCommand
# Or you can use it inside your script to time specific functions 

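Now let’s see what’s wrong! Every iteration runs two command substitutions, each piping echo into a new awk process, so the 2966-line file starts 5932 awk processes.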

  • unnecessary use of awk
  • unnecessary use of echo
  • unnecessary use of the while loop

A more efficient example would be

read_keys_values(){
key="$(cut -d' ' -s -f1 file.txt)"
value="$(cut -d' ' -s -f2 file.txt)"
}
# file.txt 2966 lines of keys and values (Example: key1 567189)
time ./read_keys_values
real	0m0.003s
user	0m0.003s
sys	0m0.000s

It took 0.003 seconds for a 2966-line file 🤓! Here key and value each hold an entire column, one entry per line.
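
If you do need to handle the file line by line, the loop does not have to be slow. Here is a minimal sketch, assuming each line holds two space-separated fields and the second one is an integer. read can split the line into variables by itself, so no echo, awk or command substitution is started per line. The function name sum_values is hypothetical and not part of the original script.

```shell
# Hypothetical helper (not from the original script): adds up the second
# column of a "key value" file using only shell builtins.
sum_values() {
    total=0
    # read splits each line on whitespace: first field -> key, rest -> value
    while read -r key value; do
        [ -n "$value" ] || continue      # skip blank or one-field lines
        total=$((total + value))
    done < "$1"
    printf '%s\n' "$total"
}
```

Call it as sum_values file.txt and compare it with the original loop using time.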

Now why not awk?

  • cut is a smaller, simpler tool than awk and is usually faster for plain field extraction, so if you don’t really need awk, don’t use it!

Stop using cat if you don’t need it!

# Bad practice
cat file.txt | cut -d' ' -f1
cat file.txt | grep "Search For Something"

# Good practice
cut -d' ' -f1 file.txt
grep "Search For Something" file.txt

  • The same is true for other utilities that accept a file argument (grep, sed, awk, sort, etc.). For tr, which only reads standard input, use a redirection instead: tr 'a-z' 'A-Z' < file.txt
  • Remember, time is your friend!

Use Streams

  • Using streams instead of writing to files can be more efficient and avoids unnecessary disk I/O. When you write to a file, the data has to be written to disk, which can slow down your script if you are writing a lot of data.
  • Use variables, arrays, etc. instead of storing data in a file.

command1 | command2

This sends the output of “command1” to “command2” without having to write it to a file first. This can be especially useful when dealing with large amounts of data or when working with sensitive information that you don’t want to save to disk.
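
As a small sketch of the idea, the pipeline below counts distinct keys without creating any intermediate file. The function name count_unique_keys is made up for this example, and it assumes the same "key value" line format as file.txt above.

```shell
# Hypothetical helper: reads "key value" lines on stdin and prints the
# number of distinct keys. Each stage streams into the next; nothing is
# written to disk.
count_unique_keys() {
    cut -d' ' -f1 | sort -u | wc -l
}

# The temp-file version this replaces would look like:
#   cut -d' ' -f1 file.txt > keys.tmp
#   sort -u keys.tmp > unique.tmp
#   wc -l < unique.tmp
#   rm keys.tmp unique.tmp
```

Call it as count_unique_keys < file.txt.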

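  • When you use a temporary file, make sure you clean up!
some_function() {
    # Assumption: the temporary file is created with mktemp
    tmpfile=$(mktemp) || return 1

    cleanup() {
        # Remove the temporary file when something goes wrong (or when the script finishes)
        rm -f "$tmpfile"
    }
    trap cleanup EXIT
    trap 'exit 1' INT QUIT TERM    # signals exit the script, which runs the EXIT trap

    # Doing something with "$tmpfile"
}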

Stop using sed for simple stuff

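Use ${a// /_} to replace spaces in the value of a variable with underscores instead of piping it through sed. Note that this expansion is a bash feature and is not available in plain POSIX sh.

# Bad practice
echo "$a" | sed 's/ /_/g'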
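
A short sketch of the difference, assuming bash (the ${var//pattern/replacement} form is not part of POSIX sh). The variable names and the sample value are arbitrary.

```shell
#!/usr/bin/env bash
a="my report final.pdf"

# External process: a pipe, a subshell and a sed invocation
with_sed=$(printf '%s\n' "$a" | sed 's/ /_/g')

# Builtin expansion: no new process is started
with_expansion=${a// /_}

printf '%s\n' "$with_expansion"   # prints my_report_final.pdf
```

Both variables end up with the same value; the expansion simply gets there without leaving the shell.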

Best Practices for File Naming

Use “./*.pdf” instead of “*.pdf”

To improve safety, prefix glob patterns with ./ when passing files to a command. Both *.pdf and ./*.pdf match only the PDF files in the current directory; neither descends into subdirectories. The difference lies in what the command receives: with *.pdf, a file whose name begins with a dash, such as -rf.pdf, expands to an argument that the command may interpret as an option.

With ./*.pdf, every expanded name starts with ./, so it can never be mistaken for an option. This is a useful safeguard against accidental or malicious file names changing what a command does.
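
The effect is easy to see in a scratch directory. This sketch assumes mktemp is available and uses a made-up file name, -n.pdf, that looks like an option.

```shell
# Work in a throwaway directory so nothing real is touched
old_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
: > ./-n.pdf                 # create an empty file whose name starts with a dash

set -- *.pdf                 # bare glob
bare=$1                      # -n.pdf : a command could read this as the option -n

set -- ./*.pdf               # prefixed glob
prefixed=$1                  # ./-n.pdf : unambiguously a path

printf '%s\n%s\n' "$bare" "$prefixed"

cd "$old_dir" && rm -rf "$demo_dir"
```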

Why use sh over bash?

While bash is a more powerful shell language than sh, it also has more complexity and features that can make scripts more difficult to read and maintain.

Here are some reasons why you might choose sh over bash:

  • Portability: sh is more widely available on different Unix-like systems than bash, which may not be installed by default on some systems. This means that scripts written in sh are more likely to work on different systems without modifications.

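  • Efficiency: sh is often a simpler and more lightweight shell than bash (on Debian and Ubuntu, for example, sh is dash), which can make scripts start faster and use fewer system resources. On systems where sh is just a link to bash, the gain is smaller.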

  • Simplicity: sh has a simpler syntax and fewer features than bash, which can make scripts easier to read and maintain.

Use set -e to exit on errors

Add set -e at the top of your script to exit immediately when a command returns a non-zero status code. This can help catch errors early and prevent your script from continuing in an invalid state. Be aware that set -e does not trigger for commands tested in an if or while condition, or for any command in a && or || list other than the last one.
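
A minimal sketch of the behaviour, run in a child shell via sh -c so that the example does not terminate the shell you try it in. The variable names are arbitrary.

```shell
status=0
# The child shell stops at `false`, so the echo is never reached
output=$(sh -c 'set -e; false; echo "this line is not reached"') || status=$?

printf 'output=[%s] status=%s\n' "$output" "$status"
```

The captured output is empty and the status is 1, the exit code of false.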

Use $(command) instead of backticks

Note: backticks look like this: `someCommand`

Use $(command) instead of backticks to execute commands and capture their output. Backticks can be difficult to read and can cause syntax errors in some cases.
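
Nesting is where the difference shows most clearly. A sketch with an arbitrary example path:

```shell
path="/usr/local/bin/tool"

# Nested $(...) reads left to right and needs no escaping
parent=$(basename "$(dirname "$path")")

# The backtick equivalent needs escaped inner backticks:
#   parent=`basename \`dirname "$path"\``

printf '%s\n' "$parent"      # prints bin
```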

Debugging tricks

set -x			# activate debugging from here
# Some Logic 
set +x			# stop debugging from here

Use printf instead of echo

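Use printf instead of echo for more consistent and portable output formatting. How echo handles options such as -n and backslash escapes differs between shells, while the behaviour of printf is specified much more tightly. printf also supports more advanced formatting options, such as field widths and padding.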

Styling and readability

# Use this to conditionally execute a command based on the value of a variable
[ "$var" ] && command1     # If var is empty, command1 will not execute

# Instead of this, which can lead to unexpected behavior if var is empty or contains whitespace or special characters (it is unquoted)
[ ! -z $var ] && something

# Use this to check if the value of a variable is equal to a specific string
[ "$var" = "find" ] && echo found

# Instead of this, which is longer and less readable
# (note: strings are compared with =, while -eq only compares integers)
if [ "$var" = 'find' ]; then
    echo found
fi

# Use this to set a default value for a variable
# (the leading ":" stops the shell from running the expanded value as a command)
: "${var:=value}"

# Instead of this, which is longer
[ "$var" ] || var="value"

If you have any insights or suggestions, I would love to hear them 🙂.