Multithreaded Shell Script

I had a queue of jobs that I wanted to process, and I wanted to use more than one thread without simply creating a thread for every job in the queue. The goal is a system that scales both by adding cores to a virtual machine and by adding more machines (discussed later).

Proof of Value: I have a process that uses tesseract to convert a bunch of JPGs into PDF files, then pdfunite to join those into a single PDF. Working on a single file took 81 seconds, but working with 4 threads (on a 4-core VM), I was able to create 4 identical PDFs in about 104 seconds. That’s four times the output for only 28% more time, or roughly 3.1 times the throughput (4 PDFs in 104 seconds versus 1 PDF in 81 seconds). As core count and processing power increase, we can expect throughput to keep improving with additional threads.
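The arithmetic behind those figures can be checked with a few lines of shell. This is only a sketch using the two timings quoted above; the variable names are illustrative, and awk is used because bash arithmetic is integer-only.

```shell
single_secs=81   # one PDF, one worker (timing from the text)
batch_secs=104   # four PDFs, four workers (timing from the text)
jobs=4

# awk does the floating-point division that bash arithmetic cannot
summary=$(awk -v s="$single_secs" -v b="$batch_secs" -v n="$jobs" 'BEGIN {
    printf "throughput: %.2fx, wall-clock time: +%.0f%%", (n / b) / (1 / s), (b / s - 1) * 100
}')
echo "$summary"
```

With the numbers above this prints a throughput ratio of 3.12x for 28% more wall-clock time.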

I am going to use the number of cores on the computer to determine how many simultaneous threads my script should be limited to. This is absolutely a ‘best guess’ scenario, and you may prefer to run two or more times as many threads as there are cores. I am making no attempt to pin threads to cores or to identify how busy a core is (the OS should be doing that). In reality, threads from other processes are already running on all the cores.
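One way to keep that choice adjustable is to derive the limit from nproc but allow it to be scaled or overridden. In this sketch, worker_limit, JOBS_PER_CORE and MAX_JOBS are hypothetical names made up for illustration; the script in this article simply uses nproc directly.

```shell
# Print the worker limit.
# $1 = core count, $2 = workers per core (default 1), $3 = explicit override (optional)
worker_limit() {
    if [ -n "${3:-}" ]; then
        echo "$3"
    else
        echo $(( $1 * ${2:-1} ))
    fi
}

cores=$(nproc 2>/dev/null || echo 4)   # fall back to 4 if nproc is unavailable
limit=$(worker_limit "$cores" "${JOBS_PER_CORE:-1}" "${MAX_JOBS:-}")
echo "Cores: $cores, worker limit: $limit"
```

Calling worker_limit 4 2 would allow 8 workers on a 4-core machine, while a third argument pins the limit regardless of core count.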

The script will be run from a cron job, every minute. Once started, it will run for 50 years, or until the server is shut down. However, it uses an exclusive lock to make sure that only one instance of the script is running at a time; later invocations fail to get the lock and exit (except for the child threads of course, which are really forked copies of the script running as separate processes).

The demo script will scan for a set of files, then fork itself (once for each thread) to sleep for a random number of seconds and then delete the file.

The script starts with a standard header and the declaration of a temp folder for our files:

#!/bin/bash
#define our own tmp folder name, should be unique
tmpdir='/tmp/mtdemo'

Then, it tests to see if we got a command line argument. If we did, it means the script was invoked with information to do some job in the child thread. In our example, we’re just sleeping and deleting some flag and data files to simulate doing real work:

if [ "$1" != "" ]; then
    # If we got an argument, we were told to do something. This is where the child works.
    # Register the cleanup first, so the files that control synchronization are
    # removed whether the child finishes normally or exits early with an error.
    trap 'rm -f -- "$HOME/$1" "$tmpdir/$1" "$tmpdir/slot.$2"' EXIT
    RANDOM=$$                  # seed with our PID so each child sleeps a different time
    R=$(( RANDOM % 8 + 2 ))    # 2 to 9 seconds
    echo "Child thread in slot $2 started with $1.  Simulating processing with sleep $R"
    sleep "$R"
    echo "Child thread in slot $2 finished with $1."
else

This demo code will show you what ‘slot’ the thread is running in and the file it’s processing, along with start and stop messages. Note how we’re using trap to clean up the flag and data files: it is registered before the work starts, so nothing is orphaned if the child exits early with an error.
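To see the cleanup behaviour in isolation, here is a small sketch, separate from the demo script. The function name simulate_child and the temporary flag file are made up for illustration; the point is only that an EXIT trap registered up front still runs when the work ends with a non-zero exit.

```shell
# The function body is a subshell (note the parentheses), so the EXIT trap
# fires when that subshell ends, just as it would when a child script exits.
simulate_child() (
    flag=$(mktemp)                    # stands in for a slot file
    trap 'rm -f -- "$flag"' EXIT      # registered before any work is done
    echo "$flag"                      # report the path so the caller can check it
    exit 3                            # simulate a failure part-way through
)

status=0
flag_path=$(simulate_child) || status=$?
if [ -e "$flag_path" ]; then
    echo "flag file left behind (exit status $status)"
else
    echo "flag file cleaned up (exit status $status)"
fi
```

The simulated child exits with status 3, yet the flag file is gone by the time the caller looks for it.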

The following code snippet shows how to ensure we are running only a single copy of the main file monitor, which will invoke a child thread as needed. The logic for identifying a free ‘slot’ and a file to be processed goes inside the do … done block below:

(
    # Wait up to 1 second for an exclusive lock on /tmp/mtdemo_exclusivelock (fd 200)
    flock -x -w 1 200 || exit 1
    end=$(date -ud "50 year" +%s)
    while [[ $(date -u +%s) -le $end ]]
    do
       # If needed, invoke ourselves as a forked process (&) with some parameters.
       # "something" is a placeholder for the slot and file checks developed below.
       if something; then
          "$0" "$key" "$slot" &
       fi
    done
) 200>"/tmp/mtdemo_exclusivelock"

The parentheses create a subshell, and the redirection after the closing parenthesis opens /tmp/mtdemo_exclusivelock as file descriptor 200 for everything inside it. Inside that block, flock tries to take an exclusive lock on that descriptor, waiting up to 1 second. If the lock can’t be obtained, exit is invoked; otherwise execution proceeds into the block, and the lock is held for as long as that descriptor stays open. The end variable and the while loop create a loop that will execute for 50 years (or until the server shuts down).
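The effect of the lock can be tried out without the rest of the script. The sketch below is an illustration only: it uses a throwaway lock file from mktemp and descriptors 9 and 8 rather than 200 (any free descriptor number works), and -n (fail immediately) instead of -w 1 so the result is instant. It assumes the flock utility from util-linux is installed.

```shell
lockfile=$(mktemp)   # throwaway lock file for the demonstration
if command -v flock >/dev/null 2>&1; then
    result=$(
        (
            flock -x -n 9 || exit 1          # the first "instance" takes the lock
            # A second open of the same file gets its own descriptor (8),
            # so its lock request conflicts with the one held on 9.
            if ( flock -x -n 8 ) 8>"$lockfile"; then
                echo "second instance acquired the lock"
            else
                echo "second instance refused"
            fi
        ) 9>"$lockfile"
    )
else
    result="flock not installed"
fi
echo "$result"
rm -f -- "$lockfile"
```

While the outer block holds the lock, the inner attempt is turned away, which is what happens to each new cron invocation while the first copy of the script is still running.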

In order to identify a free ‘slot’ for threading, we will first get the number of cores with the following:

cpus=$(nproc)

Then, we use files to identify free slots. For example, let’s say we have 4 cores. We test whether the file slot.1 exists. If it does, a thread is already running in that slot, and we go on to test slot.2, and so on. If it doesn’t exist, the slot is free: the main script creates slot.1 as a flag, starts a child thread with ‘1’ as its slot parameter, and the child deletes slot.1 when it finishes its processing.

Here’s the loop that tests for a free slot:

slot=0
for i in $(seq 1 $cpus); do
   if [ ! -f "$tmpdir/slot.$i" ]; then
      slot=$i
   fi
done

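At the end of the loop above, slot holds the highest-numbered free slot (the loop does not stop at the first match), or 0 if every slot is busy. If slot is > 0, then a free slot was found. The bash test would be: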

if [ $slot -gt 0 ]; then
   ... Do something
fi
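The slot search can be exercised on its own against a scratch directory. In this sketch the function name find_free_slot and the directory from mktemp are assumptions made for illustration; the logic mirrors the loop above, keeping the last free slot it sees.

```shell
# Print the highest-numbered free slot in directory $1 (slots 1..$2), or 0 if none is free
find_free_slot() {
    slot=0
    i=1
    while [ "$i" -le "$2" ]; do
        [ -f "$1/slot.$i" ] || slot=$i     # no flag file means the slot is free
        i=$(( i + 1 ))
    done
    echo "$slot"
}

demo_dir=$(mktemp -d)                      # scratch directory, not /tmp/mtdemo
touch "$demo_dir/slot.4" "$demo_dir/slot.2"
first=$(find_free_slot "$demo_dir" 4)      # slots 1 and 3 are free
touch "$demo_dir/slot.1" "$demo_dir/slot.3"
second=$(find_free_slot "$demo_dir" 4)     # every slot is busy
echo "first pass: $first, second pass: $second"
rm -rf -- "$demo_dir"
```

The first pass reports slot 3 and the second reports 0, the value the main loop treats as "no free slot".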

The Do something placeholder above is where we look for files to process. For our demo, we’re just going to look for .tmp files in our home folder. The ‘processing’ done for each file is the same sleep and file delete with messages discussed above. This loop shows how we look for .tmp files to ‘process’:

for d in ~/*.tmp ; do
   [ -f "$d" ] || continue
   key=$(basename -- "$d")
   if [ ! -f "$tmpdir/$key" ]; then
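      # Claim the slot and the file before forking, so the next pass
      # of the monitor loop sees both as busy
      echo "Working $d" > "$tmpdir/slot.$slot"
      echo "Working slot $slot" > "$tmpdir/$key"
      "$0" "$key" "$slot" &
      break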
   fi
done

Here’s a key component: When a file to process is found, the script runs itself in a fork (&), passing it the name of the file to process and the slot number. If no files are found, no fork takes place.
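The file selection can also be tested separately from the forking. The sketch below is illustrative: next_unclaimed is a made-up function name, and two directories from mktemp stand in for the home folder and /tmp/mtdemo.

```shell
# Print the name of the first *.tmp file in $1 that has no marker file in $2
next_unclaimed() {
    for d in "$1"/*.tmp; do
        [ -f "$d" ] || continue            # the glob matched nothing
        key=$(basename -- "$d")
        if [ ! -f "$2/$key" ]; then
            echo "$key"
            return 0
        fi
    done
    return 1                               # every file is already claimed
}

inbox=$(mktemp -d)                         # stand-in for the home folder
markers=$(mktemp -d)                       # stand-in for /tmp/mtdemo
echo "Bogus File" > "$inbox/1.tmp"
echo "Bogus File" > "$inbox/2.tmp"
echo "Working slot 4" > "$markers/1.tmp"   # 1.tmp is already being processed
picked=$(next_unclaimed "$inbox" "$markers")
echo "next file to dispatch: $picked"
rm -rf -- "$inbox" "$markers"
```

Because 1.tmp already has a marker, the function skips it and picks 2.tmp.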

In order to test the script, let’s first make several bogus .tmp files in our home folder (make twice as many as nproc reports, which is 4 in my case):

echo Bogus File > 1.tmp
echo Bogus File > 2.tmp
echo Bogus File > 3.tmp
echo Bogus File > 4.tmp
echo Bogus File > 5.tmp
echo Bogus File > 6.tmp
echo Bogus File > 7.tmp
echo Bogus File > 8.tmp
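The same files can be generated with a loop sized from the core count. This is a sketch only: make_test_files is a made-up helper, and it writes into a scratch directory from mktemp so it can be tried safely; point target at the home folder to feed the demo script.

```shell
# Create $2 bogus .tmp files, named 1.tmp upward, in directory $1
make_test_files() {
    i=1
    while [ "$i" -le "$2" ]; do
        echo "Bogus File" > "$1/$i.tmp"
        i=$(( i + 1 ))
    done
}

target=$(mktemp -d)                      # use "$HOME" instead to feed the demo script
cores=$(nproc 2>/dev/null || echo 4)     # fall back to 4 if nproc is unavailable
count=$(( cores * 2 ))                   # twice the core count, as suggested above
make_test_files "$target" "$count"
created=$(( $(ls "$target" | wc -l) ))   # arithmetic expansion strips any padding from wc
echo "created $created files in $target"
```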

With the script in the home folder, run it. The results should look like this:

[root@fpsrv5 ~]# ./demo.sh
Cores found: 4
Child thread in slot 4 started with 1.tmp.  Simulating processing with sleep 9
Child thread in slot 3 started with 2.tmp.  Simulating processing with sleep 3
Child thread in slot 2 started with 3.tmp.  Simulating processing with sleep 4
Child thread in slot 1 started with 4.tmp.  Simulating processing with sleep 4
Child thread in slot 3 finished with 2.tmp.
Child thread in slot 3 started with 5.tmp.  Simulating processing with sleep 7
Child thread in slot 2 finished with 3.tmp.
Child thread in slot 1 finished with 4.tmp.
Child thread in slot 2 started with 6.tmp.  Simulating processing with sleep 7
Child thread in slot 1 started with 7.tmp.  Simulating processing with sleep 5
Child thread in slot 4 finished with 1.tmp.
Child thread in slot 4 started with 8.tmp.  Simulating processing with sleep 4
Child thread in slot 1 finished with 7.tmp.
Child thread in slot 3 finished with 5.tmp.
Child thread in slot 2 finished with 6.tmp.
Child thread in slot 4 finished with 8.tmp.

So we can see the slots 4, 3, 2, and 1 were child threads for files 1.tmp, 2.tmp, 3.tmp, and 4.tmp respectively. Note how slot 3 started on 2.tmp with a 3 second sleep. Because of that, it finished first, and you can see that slot 3 then started processing file 5.tmp (with a sleep of 7 seconds).

To complete the practical use of this script, we would remove any of the informational ‘echo’ statements we didn’t want, and add it to a cron job so that it would run all the time.
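A crontab entry for the every-minute schedule might be built like this. The script path is an assumption (the transcript above ran ./demo.sh from the home folder), so adjust it to wherever the script actually lives.

```shell
# Assumed install path; change it to match your system
script_path="$HOME/demo.sh"
# Five asterisks mean "every minute"; output is discarded so cron does not mail it
cron_line="* * * * * $script_path >/dev/null 2>&1"
echo "$cron_line"
# To install it, append the line to the current user's crontab:
#   ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```

Because of the flock test at the top of the script, each extra invocation waits at most one second for the lock and then exits, so starting it every minute costs very little.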

Entire Script

#!/bin/bash
#define our own tmp folder name, should be unique
tmpdir='/tmp/mtdemo'
 
if [ "$1" != "" ]; then
   # If we got an argument, we were told to do something. This is where the child works.
   # Register the cleanup first, so the files that control synchronization are
   # removed whether the child finishes normally or exits early with an error.
   trap 'rm -f -- "$HOME/$1" "$tmpdir/$1" "$tmpdir/slot.$2"' EXIT
   RANDOM=$$                  # seed with our PID so each child sleeps a different time
   R=$(( RANDOM % 8 + 2 ))    # 2 to 9 seconds
   echo "Child thread in slot $2 started with $1.  Simulating processing with sleep $R"
   sleep "$R"
   echo "Child thread in slot $2 finished with $1."
else
   #execute the following only if we get an exclusive lock.  The first run will hold the lock for 50 years.
   (
      # Wait up to 1 second for an exclusive lock on /tmp/mtdemo_exclusivelock (fd 200)
      flock -x -w 1 200 || exit 1

      # Start by wiping out the sync files, in case we were rebooted mid-job
      rm -r -f "$tmpdir"
      mkdir -p "$tmpdir"

      #determine how many processors we have to control thread count
      cpus=$(nproc)
      echo "Cores found: $cpus"
      end=$(date -ud "50 year" +%s)
      while [[ $(date -u +%s) -le $end ]]
      do
         # Locate a free slot for a CPU based on file existence
         slot=0
         for i in $(seq 1 "$cpus"); do
            if [ ! -f "$tmpdir/slot.$i" ]; then
               slot=$i
            fi
         done

         # If we got a free slot, do we have a file that needs processing?
         if [ "$slot" -gt 0 ]; then
            for d in ~/*.tmp ; do
               [ -f "$d" ] || continue
               key=$(basename -- "$d")
               if [ ! -f "$tmpdir/$key" ]; then
                  echo "Working $d" > "$tmpdir/slot.$slot"
                  echo "Working slot $slot" > "$tmpdir/$key"
                  "$0" "$key" "$slot" &
                  break
               fi
            done
         fi
         # Pause briefly so the monitor loop does not spin a core at 100% while polling
         sleep 1
      done
   ) 200>"/tmp/mtdemo_exclusivelock"
fi