Performance & efficiency
Before we begin: the power of shell scripting comes from the utilities that ship with Unix systems. This post focuses on what I consider the most crucial aspect of any programming language: efficiency.
Do as little as possible in the shell script itself, and use it mainly to connect the logic that already exists in the rich set of utilities available on a UNIX system.
It is worth noting that even ChatGPT, while powerful in its own right, is not trained specifically to write efficient code, due to the limitations of its training data. There is therefore a risk that it produces suboptimal code.
To illustrate why efficiency is critical, let us consider a straightforward shell function that stores keys and values from a file.
read_keys_values(){
    # Deliberately slow: every line spawns two subshells, two echo calls and two awk processes
    while read line; do
        key=$(echo $line | awk '{print $1}')
        value=$(echo $line | awk '{print $2}')
    done < file.txt
}
# file.txt 2966 lines of keys and values (Example: key1 567189)
Note (for demo purposes only): the function above is not really practical, since key and value are overwritten on each iteration.
- Now let's measure the execution time of the above function using the time command.
time ./read_keys_values
real 0m28.966s
user 0m25.133s
sys 0m5.073s
It took about 29 seconds 😮.
Tip: ways to use the time command
time ./script.sh
time SomeCommand
# Or you can use it inside your script to time specific functions
Now let’s see what’s wrong!
- unnecessary use of awk
- unnecessary use of echo
- unnecessary use of the while loop
A more efficient example would be
read_keys_values(){
key="$(cut -d' ' -s -f1 file.txt)"
value="$(cut -d' ' -s -f2 file.txt)"
}
# file.txt 2966 lines of keys and values (Example: key1 567189)
time ./read_keys_values
real 0m0.003s
user 0m0.003s
sys 0m0.000s
It took 0.003 seconds for a 2966-line file 🤓!
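If you genuinely need to handle each key/value pair separately, the loop can stay while awk and echo go: the read builtin already splits a line into fields. The following is a minimal sketch, assuming a two-column, space-separated file; the sample data and the total and last_key variables are made up for illustration.

```shell
# Sample data standing in for file.txt (hypothetical contents)
data=$(mktemp)
printf '%s\n' 'key1 567189' 'key2 42' > "$data"

total=0
# read splits each line on whitespace: first field -> key, the rest -> value
while read -r key value; do
    total=$((total + value))
    last_key=$key
done < "$data"

rm -f "$data"
printf 'total=%s last_key=%s\n' "$total" "$last_key"
```

No external process is started inside the loop, which is where the slow version lost most of its time.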
Now why not awk? cut is much faster than awk for simple field extraction like this, so if you don't really need awk, don't use it!
Stop using cat if you don't need it!
# Bad practice
cat file.txt | cut -d' ' -f1
cat file.txt | grep "Search For Something"
# Good practice
cut -d' ' -f1 file.txt
grep "Search For Something" file.txt
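- The same is true for other utilities (grep, sed, tr, etc.): pass the file as an argument, or redirect it with < when the utility only reads standard input, as tr does
- Remember: time is your friend!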
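Some utilities, tr being the usual example, take no file argument at all, which is often the excuse for reaching for cat. Input redirection covers that case. A small sketch with a throwaway sample file (the file contents are an assumption for the demo):

```shell
sample=$(mktemp)
printf 'hello world\n' > "$sample"

# No cat needed: the shell connects the file to tr's standard input
upper=$(tr '[:lower:]' '[:upper:]' < "$sample")

rm -f "$sample"
printf '%s\n' "$upper"
```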
Use Streams
- Using streams instead of writing to files can be more efficient and helps avoid unnecessary disk I/O. When you write to a file, the data has to be written to disk, which can slow down your script if you are writing a lot of data.
- Use variables, arrays, etc. instead of storing data in a file
command1 | command2
This sends the output of “command1” to “command2” without having to write it to a file first. This can be especially useful when dealing with large amounts of data or when working with sensitive information that you don’t want to save to disk.
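As a rough illustration, here is a three-stage pipeline that extracts the distinct keys from some key/value lines without any intermediate file. The input lines are invented for the example.

```shell
# Each stage reads the previous stage's output directly; nothing touches the disk
unique_keys=$(printf '%s\n' 'key1 10' 'key2 20' 'key1 30' |
    cut -d' ' -f1 |
    sort -u)

printf '%s\n' "$unique_keys"
```

The result lands in a variable, which also covers the "use variables instead of files" advice above.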
- When you use a temporary file, make sure you clean up!
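some_function() {
    # Create the temporary file (assumed here to come from mktemp) and register the cleanup first
    tmpfile=$(mktemp) || return 1
    trap cleanup INT QUIT TERM EXIT
    # Doing something
}
cleanup() {
    # Remove the temporary file when something goes wrong (or when the script finishes)
    [ -f "$tmpfile" ] && rm -f "$tmpfile"
}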
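To see the cleanup actually happen, here is a self-contained sketch. The function name process_with_tmpfile is hypothetical, and the function body is written as a subshell so that the EXIT trap fires as soon as the function is done rather than at the end of the whole script.

```shell
# The ( ... ) body runs in a subshell, so its EXIT trap fires when the function ends
process_with_tmpfile() (
    tmpfile=$(mktemp) || exit 1
    trap 'rm -f "$tmpfile"' EXIT
    printf 'scratch data\n' > "$tmpfile"
    # Report which path was used so the caller can check it afterwards
    printf '%s\n' "$tmpfile"
)

used_path=$(process_with_tmpfile)

if [ -e "$used_path" ]; then
    echo "temporary file still exists"
else
    echo "temporary file was cleaned up"
fi
```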
Stop using sed for simple stuff
Use ${a// /_} to replace spaces in a variable's value with underscores (this expansion works in bash, ksh and zsh, but not in plain POSIX sh) instead of
# Bad practice
echo "$a" | sed 's/ /_/g'
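In the same spirit, a few parameter expansions that are part of POSIX sh cover other jobs that often get handed to sed, basename or dirname. A short sketch with a made-up path:

```shell
path="/home/user/docs/report.final.pdf"   # hypothetical example path

file=${path##*/}    # strip the longest prefix ending in "/"  -> file name
stem=${file%.pdf}   # strip the shortest suffix ".pdf"        -> name without extension
dir=${path%/*}      # strip the shortest suffix from last "/" -> directory

printf '%s\n' "$file" "$stem" "$dir"
```

All three run inside the shell itself, so no extra process is started.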
Best Practices for File Naming
Use “./*.pdf” instead of “*.pdf”
To improve security, prefix glob patterns with ./ when passing PDF files to a command. A bare *.pdf expands to plain file names, so a file whose name begins with a dash (for example -rf.pdf) reaches the command looking like an option. Use the pattern ./*.pdf instead.
With ./*.pdf, every expanded name starts with ./ and is therefore always treated as a path. Neither pattern descends into subdirectories; both match only PDF files in the current directory. The prefix is an important safeguard against accidental or malicious option injection through file names.
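One concrete risk with a bare *.pdf is a file whose name starts with a dash, which a command may read as an option. The sketch below creates such a file in a scratch directory (the name -l.pdf is chosen only for the demo) and compares what the two patterns expand to.

```shell
demo_dir=$(mktemp -d) || exit 1
: > "$demo_dir/-l.pdf"     # a file whose name looks like an option

# What a command would receive with each pattern
bare=$(cd "$demo_dir" && printf '%s\n' *.pdf)
safe=$(cd "$demo_dir" && printf '%s\n' ./*.pdf)

rm -rf "$demo_dir"
printf 'bare: %s\nsafe: %s\n' "$bare" "$safe"
```

With the bare pattern the command sees an argument starting with a dash; with the ./ prefix it can only be read as a path.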
Why use sh over bash?
While bash is a more powerful shell language than sh, it also has more complexity and features that can make scripts more difficult to read and maintain.
Here are some reasons why you might choose sh over bash:
- Portability: sh is more widely available on Unix-like systems than bash, which may not be installed by default on some systems. Scripts written in sh are therefore more likely to work on different systems without modification.
- Efficiency: sh is a simpler and more lightweight language than bash, which can make scripts run faster and use fewer system resources.
- Simplicity: sh has a simpler syntax and fewer features than bash, which can make scripts easier to read and maintain.
Use set -e to exit on errors
Add set -e at the top of your script to exit immediately if any command returns a non-zero status code. This can help catch errors early and prevent your script from continuing in an invalid state.
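A quick way to watch this behavior is to run a tiny script in a child shell and look at what it printed and how it exited. The two-step script below is invented for the demo; false stands in for any failing command.

```shell
status=0
# The child shell stops at the failing command, so "step 2" is never printed
out=$(sh -c 'set -e; echo "step 1"; false; echo "step 2"') || status=$?

printf 'output: %s\nexit status: %s\n' "$out" "$status"
```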
Use $(command) instead of backticks
Note: backticks look like `someCommand`
Use $(command) instead of backticks to execute commands and capture their output. Backticks are harder to read, and nesting them requires escaping the inner backticks, which is an easy source of syntax errors.
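Nesting is where the difference shows most clearly. This sketch finds the name of the directory that contains a file, once with each syntax; the path is a made-up example.

```shell
tool_path="/usr/local/bin/tool"   # hypothetical path

# With $( ): each level nests naturally and can be quoted on its own
parent=$(basename "$(dirname "$tool_path")")

# With backticks: the inner pair has to be escaped
parent_bt=`basename \`dirname $tool_path\``

printf '%s %s\n' "$parent" "$parent_bt"
```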
Debugging tricks
set -x # activate debugging from here
# Some Logic
set +x # stop debugging from here
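The trace is written to standard error, one line per executed command. A small sketch that captures it from a child shell so you can inspect it; the variable names greeting and unseen are placeholders, and the exact trace prefix depends on the shell's PS4 setting.

```shell
trace=$(sh -c '
    set -x                 # start tracing
    greeting="hello"
    set +x                 # stop tracing
    unseen="not traced"
' 2>&1)

# Only the commands between set -x and set +x show up
printf '%s\n' "$trace"
```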
Use printf instead of echo
Use printf instead of echo for more consistent and portable output formatting. printf also supports more advanced formatting options.
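Two things printf handles well are fixed-width columns and strings that echo might mistake for options. A minimal sketch with invented values:

```shell
# Left-align a string in 6 columns, right-align a number in 5
row=$(printf '%-6s|%5d|' key1 42)

# A value that looks like an option is printed literally
flag=$(printf '%s' '-n')

printf '%s\n' "$row" "$flag"
```

With echo, the result of printing -n depends on the shell and its settings, which is exactly the portability problem printf avoids.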
Styling and readability
# Use this to conditionally execute a command based on the value of a variable
[ "$var" ] && command1 # If var is empty or unset, command1 will not execute
# Instead of this, which can lead to unexpected behavior if var contains whitespace or special characters, because $var is unquoted
[ ! -z $var ] && something
# Use this to check if the value of a variable is equal to a specific string
[ "$var" = "find" ] && echo found
# Instead of this, which is longer and less readable (note: use = for strings; -eq only compares integers)
if [ "$var" = 'find' ]; then
    echo found
fi
# Use this to set a default value for a variable (the leading ":" is a no-op, so the expansion is not run as a command)
: "${var:=value}"
# Instead of this, which is longer
[ "$var" ] || var="value"
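The default-value expansions come in an assigning and a non-assigning flavor, which is easy to mix up. A short sketch; app_mode and app_color are hypothetical variable names.

```shell
unset app_mode app_color

# := assigns the default when the variable is unset or empty
: "${app_mode:=production}"

# :- only substitutes the default; app_color itself stays unset
color=${app_color:-blue}

printf 'app_mode=%s color=%s app_color is %s\n' \
    "$app_mode" "$color" "${app_color+set}"
```

The leading colon is the null command: it evaluates its arguments and does nothing else, so the expansion happens without the result being executed.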