Thursday, July 25, 2013

Kill All Child Processes

If you run a script that launches child processes which might hang and need to be killed by the parent script, and if those child processes in turn launch grandchildren, then it's not enough for the parent script to just kill the children, because the grandchildren will be left running as orphans.

Most answers to this problem suggest passing a negative process group ID to kill (kill -9 -PGID), but that won't work if you don't want to kill the whole process group, i.e. if you just want to kill the hung child process and its descendants.

#!/bin/sh
PIDFILE="/tmp/my.pid"
PKGS="binary_might_hang another_might_hang one_more_might_hang"
for PKG in $PKGS
do
  # create a new process to execute the following two script commands
  # the first command will launch yet another process that might hang
  ( "$PKG"; rm -f "$PIDFILE"; ) &
  # record the pid of the new script process
  PID=$!
  # create the pid file (the subshell above deletes it on completion), then poll for it
  echo "$PID" > "$PIDFILE"
  COUNT=600
  while [ $COUNT -gt 0 ]; do
    sleep 2
    COUNT=`expr $COUNT - 2`
    # if the pid file has been deleted then the process completed
    if [ ! -e "$PIDFILE" ]; then
      break;
    fi
  done
  # if the pid file hasn't been deleted then the process failed to complete
  if [ -e "$PIDFILE" ]; then
    # ack! this only kills the script process, not its child that hung
    kill -9 $PID
    rm -f "$PIDFILE"
  fi
done

Here's an example of the PIDs when the above is run.

# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root     13522 13515  0 08:38 pts/0    00:00:00 -bash
root     13619 13522  0 08:49 pts/0    00:00:00 /bin/sh /my_script.sh
root     13631 13619  0 08:49 pts/0    00:00:00 /bin/sh /my_script.sh
root     13633 13631  3 08:49 pts/0    00:00:03 /binary_might_hang
root     14158 13619  0 08:57 pts/0    00:00:00 sleep 2

# ps x -o  "%p %r %y %x %c "
  PID  PGID TTY          TIME COMMAND
13619 13619 pts/0    00:00:00 my_script.sh
13631 13619 pts/0    00:00:00 my_script.sh
13633 13619 pts/0    00:00:03 binary_might_hang
13868 13619 pts/0    00:00:00 sleep

So you can't kill the 13619 group, because that will kill the top level script as well.

# this kills everything
kill -9 -13619

And you can't kill just 13631 and its children, because 13631 isn't a process group leader, so no process group has that ID.

# kill -9 -13631
-bash: kill: (-13631) - No such process
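
To check in advance whether a negative-PID kill has a group to signal, you can compare a process's PGID with its PID. The following is only a sketch: the function names pgid_of and is_group_leader are made up for illustration, and it assumes a ps that accepts the POSIX -o and -p options.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Print the process group id (PGID) of the given pid, padding stripped.
pgid_of() {
  ps -o pgid= -p "$1" | tr -d ' '
}

# Succeed only if the pid leads its own process group (PGID == PID),
# which is the case where "kill -9 -PID" has a group to signal.
is_group_leader() {
  [ "$(pgid_of "$1")" = "$1" ]
}

# Example: a background job started from a non-interactive script
# normally stays in the script's own process group.
sleep 30 &
CHILD=$!
if is_group_leader "$CHILD"; then
  echo "$CHILD leads its own group"
else
  echo "$CHILD belongs to group $(pgid_of "$CHILD")"
fi
kill "$CHILD"
```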

But you can get a list of direct children of a given process.

# ps -ef | grep 15907
root     15907     1  0 09:21 pts/0    00:00:00 /bin/sh /my_script.sh
root     15909 15907  0 09:21 pts/0    00:00:03 /binary_might_hang
root     16420 13586  0 10:41 pts/1    00:00:00 grep 15907

# ps -ef | sed -n "s/^root  *\([0-9][0-9]*\)  *15907 .*/\1/p"
15909
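
The sed pattern above only matches processes owned by root. As a possible alternative, here is a sketch that asks ps for just the PID and PPID columns and filters on the parent with awk. The function name children_of is made up for illustration, and it assumes a ps that accepts the POSIX -e and -o options.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.

# Print the pids of the direct children of the given pid, one per line.
# "pid=" and "ppid=" request those two columns with no header line.
children_of() {
  ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent { print $1 }'
}

# Example: list the children of this script. The list can also include
# the short-lived processes of the ps | awk pipeline itself.
sleep 30 &
echo "children of $$:"
children_of "$$"
kill $!
```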

And you can use that in a script to kill them all.

#!/bin/sh
PID=15907
ps -ef | sed -n "s/^root  *\([0-9][0-9]*\)  *$PID .*/\1/p" | while read LINE; do
  echo "killing $LINE"
  kill -9 $LINE
done
echo "killing $PID"
kill -9 $PID

As an improvement, you could wrap that in a function to kill multiple levels of children. Or you could write the pids to a file and kill parents first and then children, which prevents the situation where, after you kill a child, the parent launches a new child you don't know about.
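
One way to sketch both ideas at once is to take a single snapshot of the process table, walk it from the hung process downwards, and kill in that order so parents die before their children. This is an illustration rather than a tested drop-in: list_tree and kill_tree are made-up names, the pids are held in a variable instead of a file, and it assumes a ps with the POSIX -e and -o options and an awk that supports user-defined functions.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Print the given pid followed by all of its descendants, parents first.
# The process table is read once, so the whole walk sees one snapshot.
list_tree() {
  ps -e -o pid= -o ppid= | awk -v root="$1" '
    # Build a space-separated list of child pids for every parent pid.
    { kids[$2] = kids[$2] " " $1 }
    # Print p, then recurse into each of its children.
    function walk(p,   n, i, arr) {
      print p
      n = split(kids[p], arr, " ")
      for (i = 1; i <= n; i++) walk(arr[i])
    }
    END { walk(root) }'
}

# Kill a process and every descendant found in the snapshot, parents first.
kill_tree() {
  for p in $(list_tree "$1"); do
    echo "killing $p"
    kill -9 "$p" 2>/dev/null
  done
}
```

In the watchdog script at the top, the kill line would then become kill_tree "$PID". A process that forks between the snapshot and the kill can still escape, so treat this as a best-effort approach.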

FYI, here's an explanation of the above sed command.

sed -n "s/^root  *\([0-9][0-9]*\)  *$PID .*/\1/p"

-n     suppress the automatic printing of every line
s      do substitution
/      delimiter
^      match the start of the line
root   followed by the word root
       followed by a space
 *     followed by zero or more spaces
\(     start of the part we'll reference later
[0-9]  followed by a digit
[0-9]* followed by zero or more digits
\)     end of the part we'll reference later
       followed by a space
 *     followed by zero or more spaces
$PID   followed by the value of our $PID variable
       followed by a space
.*     followed by anything (which brings us to the end of the line)
/      delimiter
\1     replace the matched portion (which happens to be the whole line) with the part we referenced via \(\)
/      delimiter
p      print the lines where a substitution was made
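
To see the pattern at work without touching live processes, you can feed it two lines copied from the ps -ef output shown earlier. This is just a sketch; the function name extract_children is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command explained above.
# Reads "ps -ef" style lines on stdin and prints the pid of every
# root-owned process whose parent pid is $1.
extract_children() {
  sed -n "s/^root  *\([0-9][0-9]*\)  *$1 .*/\1/p"
}

# Two sample lines taken from the ps -ef listing earlier in this post.
printf '%s\n' \
  'root     15907     1  0 09:21 pts/0    00:00:00 /bin/sh /my_script.sh' \
  'root     15909 15907  0 09:21 pts/0    00:00:03 /binary_might_hang' |
  extract_children 15907
# prints 15909
```

The first line is skipped because 15907 appears there in the PID column, not the PPID column, so only the child's pid is printed.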