Kill All Child Processes
If you run a script that launches child processes which might hang and need to be killed by the parent script, and those child processes in turn launch grandchildren, then it's not enough for the parent script to just kill the children, because the grandchildren will be left running as orphans.
Most answers to this problem suggest killing the whole process group by passing a negative process group ID to kill, but that won't work if you don't want to kill the whole process tree, i.e. if you just want to kill the hung child process and the grandchildren it launched.
    #!/bin/sh

    PIDFILE="/tmp/my.pid"
    PKGS="binary_might_hang another_might_hang one_more_might_hang"

    for PKG in $PKGS
    do
        # create a new process to execute the following two script commands
        # the first command will launch yet another process that might hang
        ( "$PKG"; rm -f "$PIDFILE"; ) &

        # record the pid of the new script process
        PID=$!

        # create the pid file to be deleted by the above then sleep
        echo "$PID" > "$PIDFILE"

        COUNT=600
        while [ $COUNT -gt 0 ]; do
            sleep 2
            COUNT=$((COUNT - 2))

            # if the pid file has been deleted then the process completed
            if [ ! -e "$PIDFILE" ]; then break; fi
        done

        # if the pid file hasn't been deleted then the process failed to complete
        if [ -e "$PIDFILE" ]; then
            # ack! this only kills the script process, not its child that hung
            kill -9 "$PID"
            rm -f "$PIDFILE"
        fi
    done

Here's an example of the PIDs when the above is run.
    # ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root     13522 13515  0 08:38 pts/0    00:00:00 -bash
    root     13619 13522  0 08:49 pts/0    00:00:00 /bin/sh /my_script.sh
    root     13631 13619  0 08:49 pts/0    00:00:00 /bin/sh /my_script.sh
    root     13633 13631  3 08:49 pts/0    00:00:03 /binary_might_hang
    root     14158 13619  0 08:57 pts/0    00:00:00 sleep 2

    # ps x -o "%p %r %y %x %c "
      PID  PGID TTY          TIME COMMAND
    13619 13619 pts/0    00:00:00 my_script.sh
    13631 13619 pts/0    00:00:00 my_script.sh
    13633 13619 pts/0    00:00:03 binary_might_hang
    13868 13619 pts/0    00:00:00 sleep
So you can't kill the 13619 group, because that will kill the top level script as well.
    # this kills everything
    kill -9 -13619
And you can't kill just 13631 and its children that way, because 13631 isn't a process group of its own (it belongs to group 13619).
    # kill -9 -13631
    -bash: kill: (-13631) - No such process
But you can get a list of direct children of a given process.
    # ps -ef | grep 15907
    root     15907     1  0 09:21 pts/0    00:00:00 /bin/sh /my_script.sh
    root     15909 15907  0 09:21 pts/0    00:00:03 /binary_might_hang
    root     16420 13586  0 10:41 pts/1    00:00:00 grep 15907

    # ps -ef | sed -n "s/^root *\([0-9][0-9]*\) *15907 *.*/\1/p"
    15909
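The sed pattern above only matches processes owned by root. As a rough alternative, you could compare the PPID column as a whole field with awk, which also avoids partial digit matches. This is only a sketch: the function name children_from_ps_ef is made up for this example, and it assumes the usual ps -ef layout where the PID is column 2 and the PPID is column 3.

```shell
#!/bin/sh

# children_from_ps_ef PARENT (hypothetical helper, not part of the original script)
# Reads `ps -ef` style output on stdin and prints the PID (column 2) of every
# row whose PPID (column 3) equals PARENT. Assumes the UID PID PPID ... layout.
children_from_ps_ef() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}
```

You would call it as `ps -ef | children_from_ps_ef 15907`. Because it only looks at column 3, the grep line that merely mentions 15907 in its command is not reported.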
And you can use that in a script to kill them all.
    #!/bin/sh

    PID=15907

    ps -ef | sed -n "s/^root *\([0-9][0-9]*\) *$PID .*/\1/p" | while read LINE; do
        echo "killing $LINE"
        kill -9 $LINE
    done

    echo "killing $PID"
    kill -9 $PID
As an improvement, you could wrap that in a function to kill multiple levels of children. Or you could write the pids to a file and kill parents first and then children, to prevent the situation where, after you kill a child, the parent launches a new child you don't know about.
FYI, here's an explanation of the above sed command.
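Here's a minimal sketch of the multi-level idea, not a hardened implementation. The helper names (children_of, list_tree, kill_tree) are made up for this example, and it assumes your ps accepts the POSIX `-o pid= -o ppid=` form. It takes one snapshot of the process table, walks it recursively, and kills parents before their children. That narrows the window in which a parent can launch a new child, but it doesn't close it completely.

```shell
#!/bin/sh

# children_of PARENT (hypothetical helper)
# Reads "PID PPID" lines on stdin and prints each PID whose PPID is PARENT.
children_of() {
    awk -v parent="$1" '$2 == parent { print $1 }'
}

# list_tree ROOT SNAPSHOT (hypothetical helper)
# Prints ROOT, then every descendant found in SNAPSHOT (a string of
# "PID PPID" lines), depth first, so a parent is always listed before its children.
list_tree() {
    echo "$1"
    for child in $(printf '%s\n' "$2" | children_of "$1"); do
        list_tree "$child" "$2"
    done
}

# kill_tree ROOT (hypothetical helper)
# Snapshots the process table once, then kills ROOT and its descendants
# in parent-first order. Assumes ps supports -o pid= -o ppid=.
kill_tree() {
    snapshot=$(ps -e -o pid= -o ppid=)
    for pid in $(list_tree "$1" "$snapshot"); do
        echo "killing $pid"
        kill -9 "$pid" 2>/dev/null
    done
}
```

In the timeout script you would then call `kill_tree "$PID"` in place of the single kill. Processes started after the snapshot was taken are not in the list, so this is still a best effort.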
    sed -n "s/^root *\([0-9][0-9]*\) *$PID .*/\1/p"

    -n          print nothing unless told to
    s           do substitution
    /           delimiter
    ^           find the start of the line
    root        followed by the word root
    (space)*    followed by zero or more spaces
    \(          start of the part we'll reference later
    [0-9]       followed by a digit
    [0-9]*      followed by zero or more digits
    \)          end of the part we'll reference later
    (space)*    followed by zero or more spaces
    $PID        followed by the value from our $PID variable
    (space)     followed by a space
    .*          followed by anything (which brings us to the end of the line)
    /           delimiter
    \1          replace the matched portion (which happens to be the whole line) with the part we referenced via \(\)
    /           delimiter
    p           print the lines where a substitution was made
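To see the pattern in action without touching real processes, you can feed it a few canned lines. This is just an illustration: the function name extract_children is made up, and the sample lines are the ps -ef rows shown earlier with the column padding collapsed to single spaces.

```shell
#!/bin/sh

# extract_children PARENT (hypothetical helper)
# Applies the sed command explained above to stdin, with PARENT in place of $PID.
extract_children() {
    sed -n "s/^root *\([0-9][0-9]*\) *$1 .*/\1/p"
}

# Only the second line has 15907 in the PPID position, so only its PID is printed.
printf '%s\n' \
    'root 15907 1 0 09:21 pts/0 00:00:00 /bin/sh /my_script.sh' \
    'root 15909 15907 0 09:21 pts/0 00:00:03 /binary_might_hang' \
    'root 16420 13586 0 10:41 pts/1 00:00:00 grep 15907' |
    extract_children 15907
# prints: 15909
```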