psmon - Process Table Monitoring Script
$Id: psmon.html,v 1.15 2005/05/06 14:36:36 nicolaw Exp $
Syntax: psmon [--conf=filename] [--daemon] [--cron] [--user=user]
              [--nouser] [--adminemail=emailaddress] [--dryrun]
              [--verbose] [--help] [--version]

    --help            Display this help
    --version         Display full version information
    --dryrun          Dry run (do not actually kill or spawn any processes)
    --daemon          Spawn in to background daemon
    --cron            Disables 'already running' errors with the --daemon option
    --conf=str        Specify alternative config filename
    --user=str        Only scan the process table for processes running as str
    --nouser          Force scanning for all users when not run as superuser
    --adminemail=str  Force all notification emails to be sent to str
    --verbose         Output more verbose information
Single user account crontab operation:
MAILTO="nicolaw@cpan.org"
HOME=/home/nicolaw
USER=nicolaw
*/5 * * * * psmon --daemon --cron --conf=$HOME/etc/psmon.conf --user=$USER --adminemail=$MAILTO
Regular system-wide call from cron:
*/5 * * * * psmon --daemon --cron
Only check processes during working office hours:
* 9-17 * * * psmon
This script monitors the process table using Proc::ProcessTable, and will respawn or kill processes based on a set of rules defined in an Apache style configuration file.
Processes will be respawned if a spawn command is defined for a process, and no occurrences of that process are running. If the --user command line option is specified, then the process will only be spawned if no instances are running as the specified userid.
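The respawn rule above can be sketched in a few lines. This is an illustrative Python sketch only, not psmon's own Perl code: the `Proc` record and the `should_spawn` function are hypothetical names, and the process table is assumed to be a simple list of (pid, name, user) rows.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Proc:
    """One row of a process table snapshot (hypothetical shape)."""
    pid: int
    name: str
    user: str


def should_spawn(procs: Iterable[Proc], name: str,
                 spawncmd: Optional[str], user: Optional[str] = None) -> bool:
    """Return True when a spawn command exists and no matching process runs.

    When `user` is given, only processes owned by that user count as running.
    """
    if not spawncmd:
        return False  # no spawn command defined, so never respawn
    for p in procs:
        if p.name == name and (user is None or p.user == user):
            return False  # at least one instance is already running
    return True


table = [Proc(101, "sshd", "root"), Proc(202, "httpd", "apache")]
print(should_spawn(table, "sshd", "/sbin/service sshd start"))
print(should_spawn(table, "sshd", "/sbin/service sshd start", user="nicolaw"))
```

The second call reports that a spawn is due because the only running sshd belongs to root, which mirrors the --user behaviour described above.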
Processes can be killed off if they have been running for too long, use too much CPU or memory resources, or have too many concurrent versions running. Exceptions can be made to kill rulesets using the pidfile and lastsafepid directives.
If a PID file is declared for a process, psmon will never kill the process ID that is contained within the PID file. This is useful if, for example, you have a script which spawns hundreds of child processes which you may need to kill automatically, but you do not want to kill the parent process.
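As a rough illustration of how the kill rules and the PID file exemption interact, consider the Python sketch below. It is not psmon's implementation: `Proc`, `kill_candidates` and the `rule` dictionary are hypothetical, and the choice to flag the oldest surplus copies first when the instances limit is exceeded is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Proc:
    pid: int
    age: int        # seconds since the process started
    pctcpu: float   # CPU utilisation, percent
    pctmem: float   # memory utilisation, percent


def kill_candidates(procs: List[Proc], rule: Dict[str, float],
                    protected_pid: Optional[int] = None) -> List[int]:
    """Return the PIDs that break a rule, never including protected_pid."""
    doomed = set()
    for p in procs:
        if "ttl" in rule and p.age > rule["ttl"]:
            doomed.add(p.pid)
        if "pctcpu" in rule and p.pctcpu > rule["pctcpu"]:
            doomed.add(p.pid)
        if "pctmem" in rule and p.pctmem > rule["pctmem"]:
            doomed.add(p.pid)
    if "instances" in rule:
        # Assumption: when over the limit, the oldest surplus copies go first.
        surplus = max(len(procs) - int(rule["instances"]), 0)
        killable = [p for p in procs if p.pid != protected_pid]
        killable.sort(key=lambda p: p.age, reverse=True)
        for p in killable[:surplus]:
            doomed.add(p.pid)
    doomed.discard(protected_pid)  # the PID named in the PID file is exempt
    return sorted(doomed)


procs = [
    Proc(10, 5000, 1.0, 1.0),   # parent, listed in the PID file
    Proc(11, 4000, 2.0, 1.0),   # child that has outlived the ttl
    Proc(12, 100, 95.0, 1.0),   # child hogging the CPU
    Proc(13, 50, 1.0, 1.0),     # well-behaved child
]
print(kill_candidates(procs, {"ttl": 3600, "pctcpu": 90}, protected_pid=10))
# prints [11, 12]
```

The parent (PID 10) exceeds the ttl too, but it is left alone because it is the protected PID.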
Any actions performed will be logged to the DAEMON syslog facility by default. There is optional support for also sending notification emails to an administrator on a global or per-rule basis.
In addition to Perl 5.005_03 or higher, the following Perl modules are required:
Proc::ProcessTable Config::General Getopt::Long POSIX IO::File File::Basename
These two additional modules are not required, but will provide enhanced functionality if present.
Net::SMTP Unix::Syslog
The POSIX module is usually supplied with Perl as standard, as are IO::File and File::Basename. All these modules can be obtained from CPAN. Visit http://search.cpan.org and http://www.cpan.org for further details. For the lazy people reading this, you can try the following command to install these modules:
for m in Config::General Proc::ProcessTable Net::SMTP \
    Unix::Syslog Getopt::Long; do perl -MCPAN -e"install $m"; done
Alternatively you can run the install.sh script which comes in the distribution tarball. It will attempt to install the right modules, install the script and configuration file, and generate UNIX man page documentation.
By default psmon will look for its runtime configuration in /etc/psmon.conf, although a different file can be specified on the command line with the --conf option. For system-wide installations it is recommended that you install your psmon in to the default location.
The default configuration file location is /etc/psmon.conf. A different configuration file can be declared from the command line. You will find an example configuration file supplied in the etc/ directory of the distribution tarball. It is recommended that you use this as a guide to writing your own configuration file by hand. Alternatively you can use the psmon-config script which will interactively create a configuration for you.
Syntax of the configuration file is based upon that which is used by Apache. Each process to be monitored is declared with a Process scope directive like this example which monitors the OpenSSH daemon:
<Process sshd>
    spawncmd   /sbin/service sshd start
    pidfile    /var/run/sshd.pid
    instances  50
    pctcpu     90
</Process>
There is a special * process scope which applies to all running processes. This special scope should be used with extreme care. It does not support the use of the spawncmd, pidfile, instances or ttl directives. A typical example of this scope might be as follows:
<Process *>
    pctcpu  95
    pctmem  80
</Process>
Global directives which are not specific to any one process should be placed outside of any Process scopes.
Configuration directives are not case sensitive, but the values that they define are.
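To make the scoping and case rules concrete, here is a minimal Python sketch of a parser for this style of file. It is only an illustration: psmon reads its configuration with the Perl module Config::General, and the `parse_config` function, the `SAMPLE` text and the handling of `#` comment lines are assumptions for the example. Directive names are folded to lower case while their values are kept exactly as written.

```python
import re
from typing import Dict, Tuple

OPEN = re.compile(r"^<\s*process\s+(\S+)\s*>$", re.IGNORECASE)
CLOSE = re.compile(r"^</\s*process\s*>$", re.IGNORECASE)


def parse_config(text: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Split a config into global directives and per-process directives."""
    global_directives: Dict[str, str] = {}
    processes: Dict[str, Dict[str, str]] = {}
    current = None  # directives of the scope being read, None when global
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        opened = OPEN.match(line)
        if opened:
            current = processes.setdefault(opened.group(1), {})
            continue
        if CLOSE.match(line):
            current = None
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()                       # names: case-insensitive
        value = parts[1] if len(parts) > 1 else ""   # values: kept verbatim
        target = global_directives if current is None else current
        target[key] = value
    return global_directives, processes


SAMPLE = """
AdminEmail pager@noc.company.com
<Process sshd>
    SpawnCmd  /sbin/service sshd start
    PIDFile   /var/run/sshd.pid
    instances 50
</Process>
"""

global_directives, processes = parse_config(SAMPLE)
print(global_directives)
print(processes["sshd"])
```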
<Process syslogd>
    spawncmd   /sbin/service syslogd restart
    pidfile    /var/run/syslogd.pid
    instances  1
    pctcpu     70
    pctmem     30
</Process>
Syslog is a good example of a process which can get a little full of itself under certain circumstances, and excessively hog CPU and memory. Here we will kill off any syslogd process that exceeds 70% CPU or 30% memory utilization.
Older running copies of syslogd will be killed if they are running, while leaving the most recently spawned copy which will be listed in the PID file defined.
<Process httpd>
    spawncmd    /sbin/service httpd restart
    pidfile     /var/run/httpd.pid
    loglevel    LOG_CRIT
    adminemail  pager@noc.company.com
</Process>
Here we are monitoring Apache to ensure that it is restarted if it dies. The pidfile directive in this example is actually redundant because we have not defined any rule where we should consider killing any httpd processes.
All notifications relating to this process will be logged with the syslog priority of critical (LOG_CRIT), and all emailed to pager@noc.company.com which could typically forward to a pager.
Any failed attempts to kill or restart a process will automatically be logged at a syslog priority one level higher than that specified. If a restart of Apache were to fail in this example, a wall notification would be broadcast to all interactive terminals connected to the machine, since the next log priority up from LOG_CRIT is LOG_EMERG.
Note that the functionality to log information to syslog requires the Unix::Syslog module. In the event that Unix::Syslog is not installed, PSMon will write all status messages that would have been destined for syslog, to STDERR instead.
<Process find>
    noemail  True
    ttl      3600
</Process>
Kill old find processes which have been running for over an hour. Do not send an email notification since it's not too important.
psmon is not especially fast. Much of its time is spent reading the process table. If the process table is particularly large this can take a number of seconds. Although this is rarely a major problem on today's speedy machines, I have run a few tests so you can take a look at the times and decide if you can afford the wait.
Approximate figures from release 1.0.3:
CPU           OS            Open Files/Procs  1m Load  Real Time
PIII 1.1G     Mandrake 9.0  10148 / 267       0.01     0m0.430s
PIII 1.2G     Mandrake 9.0  16714 / 304       0.44     0m0.640s
Celeron 500   Red Hat 6.1   1780 / 81         1.27     0m0.880s
PII 450       Red Hat 6.0   300 / 23          0.01     0m1.050s
2x Xeon 1.8G  Mandrake 9.0  90530 / 750       0.38     0m1.130s
Celeron 500   Red Hat 6.1   1517 / 77         1.00     0m1.450s
PIII 866      Red Hat 8.0   3769 / 76         0.63     0m1.662s
PIII 750      Red Hat 6.2   754 / 35          3.50     0m2.170s
These production machines were running the latest patched stock distribution kernels. I have listed the total number of open file descriptors, processes running and 1 minute load average to give you a slightly better context of the performance.
Approximate figures from release 1.17:
CPU                    OS         1m Load  CPU Time
UltraSPARC-IIe 500Mhz  SunOS 5.9  0.10     0m0.550s
Athlon XP 2400+ 2Ghz   RHEL 3.0   1.00     0m0.150s
Reads the current process table, checks and then executes any appropriate action to be taken. Does not accept any parameters.
Attempts to kill a process with its killcmd, or failing that using the kill() function. Accepts the process name, syslog log level, email notification to address and a reference to the %slay hash.
Slurps up the contents of a temporary log file and returns it as a chomped array after unlinking the temporary log file.
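The slurp-then-unlink behaviour described above might look roughly like this in Python. The function name `slurp_and_unlink` and the sample log lines are hypothetical; the real routine is written in Perl.

```python
import os
import tempfile
from typing import List


def slurp_and_unlink(path: str) -> List[str]:
    """Read every line of `path`, strip line endings, then delete the file."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        lines = [line.rstrip("\r\n") for line in fh]
    os.unlink(path)  # the temporary log is no longer needed once read
    return lines


# Create a throwaway log file to demonstrate the round trip.
fd, tmp = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w", encoding="utf-8") as fh:
    fh.write("spawned sshd\nkilled find 4242\n")

result = slurp_and_unlink(tmp)
print(result)
```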
Prints a Red Hat sysvinit style status message. Accepts an array of messages to display in sequence.
Attempts to spawn a process. Accepts the process name, syslog log level, mail notification to address and spawn command.
Displays command line help.
Determine what UID to scan for in the process table.
Reads in runtime configuration options.
An evil bastard fudge to ensure that we're only dealing with numerics when necessary, from the config file and Proc::ProcessTable scan.
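One way such a numeric clean-up could work is sketched below in Python. The exact rules psmon applies are not documented here, so `to_number`, its regular expression and its default of 0 are all assumptions chosen for illustration: the sketch keeps a leading unsigned number and discards anything after it.

```python
import re

# Leading unsigned integer or decimal, ignoring surrounding whitespace.
NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)")


def to_number(value, default=0):
    """Coerce a config or process-table value to int or float.

    Returns `default` when the value does not start with a number.
    """
    match = NUMERIC.match(str(value))
    if not match:
        return default
    text = match.group(1)
    return float(text) if "." in text else int(text)


for sample in ["90%", " 3600 ", "12.5", "unlimited", None]:
    print(repr(sample), "->", to_number(sample))
```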
Launches the process in to the background. Checks to see if there is already an instance running.
The __DATA__ section of the PSMon code contains a stub version of the Unix::Syslog module. It is automatically loaded in the event that the real Unix::Syslog module is not present and/or cannot be loaded. This stub module provides very basic functionality to output the messages generated by the PSMon::Logging module to STDERR, instead of simply dropping them.
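The same fall-back idea can be shown in Python, purely as an analogy: try to load a real syslog backend, and write to STDERR when it is missing. The `log_message` function, its `backend` and `stream` parameters, and the message format are hypothetical and do not reflect the stub's actual output.

```python
import io
import sys

try:
    import syslog as _syslog  # real backend, present on most Unix systems
except ImportError:
    _syslog = None            # no backend: fall back to STDERR below


def log_message(priority_name: str, message: str,
                backend=_syslog, stream=None) -> str:
    """Log via syslog when a backend exists, otherwise write to STDERR.

    Returns the name of the destination that was used.
    """
    if backend is not None:
        backend.syslog(getattr(backend, priority_name), message)
        return "syslog"
    out = stream if stream is not None else sys.stderr
    out.write(f"{priority_name}: {message}\n")
    return "stderr"


# Force the fall-back path and capture what would have gone to STDERR.
buf = io.StringIO()
used = log_message("LOG_CRIT", "Failed to spawn httpd", backend=None, stream=buf)
print(used)
print(buf.getvalue(), end="")
```

Passing the backend as a parameter keeps the fall-back path easy to exercise without touching the system log.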
Hopefully none. ;-) Send any bug reports to me at nicolaw@cpan.org along with any patches and details of how to replicate the problem. Please only send reports for bugs which can be replicated in the latest version of the software. The latest version can always be found at http://search.cpan.org/~nicolaw/
The following functionality will be added soon:
nsmon
Written by Nicola Worthington, <nicolaw@cpan.org>. Copyright (C) 2002,2003,2004,2005 Nicola Worthington.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
Nicola Worthington <nicolaw@cpan.org>
http://search.cpan.org/~nicolaw/
http://www.psmon.com
http://www.nicolaworthington.com