Performance Issues / fork() errors
Re: Performance Issues / fork() errors
I spoke too soon.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Performance Issues / fork() errors
chrisp wrote: I spoke too soon.
That didn't work?
Re: Performance Issues / fork() errors
Nope. I'm losing the will to live.
Re: Performance Issues / fork() errors
Well, don't notarize the living will just yet. Maybe the sleep is not long enough, or the order is not correct. You could also have race conditions concerning those 3 services. Maybe sleep between each service start? Ideally you will not be rebooting often, but I understand the desire and requirement for production boxes to come back up unattended after a reboot.
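One way to replace a fixed sleep is to poll each service until it reports that it is running before starting the next one. The following is only a sketch under assumptions: the function names, the `SERVICE_CMD` variable, and the 30 second timeout are hypothetical, and the service names and order are borrowed from the stuffstarter script posted later in this thread rather than from any confirmed startup order.

```shell
#!/bin/sh
# Sketch: start services in order, polling each until its status output
# contains "is running" before moving on to the next one.
# SERVICE_CMD, wait_for_running and start_in_order are hypothetical names.

SERVICE_CMD="${SERVICE_CMD:-/sbin/service}"

# wait_for_running NAME TIMEOUT_SECONDS
# Returns 0 once "$SERVICE_CMD NAME status" prints "is running",
# or 1 if that does not happen within roughly TIMEOUT_SECONDS.
wait_for_running() {
    name=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        if "$SERVICE_CMD" "$name" status 2>&1 | grep -q 'is running'; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# start_in_order TIMEOUT_SECONDS NAME...
# Starts each service and stops at the first one that never comes up.
start_in_order() {
    timeout=$1
    shift
    for name in "$@"; do
        "$SERVICE_CMD" "$name" start
        if ! wait_for_running "$name" "$timeout"; then
            echo "${name}: did not report running within ${timeout}s" >&2
            return 1
        fi
    done
}

# Example call (commented out so the file can be sourced safely):
# start_in_order 30 rrdcached ndo2db nagios
```

Polling this way makes the wait as short as the service allows and reports which service failed to come up, which a blanket sleep cannot do.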
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
scottwilkerson
Re: Performance Issues / fork() errors
chrisp wrote:
Code: Select all
nagios: No lock file found in /usr/local/nagios/var/nagios.lock
ndo2db: ndo2db is not running but subsystem locked
Looking at this again, the first item looks like the nagios.lock file is not being created; the second item will only appear if the lock file exists but the process is not running. Can we look at the following permissions?
Code: Select all
ls -l /usr/local/nagiosxi/var/subsys
ls -ld /usr/local/nagiosxi/var/subsys
Re: Performance Issues / fork() errors
@abrist: I toyed with varying sleeps, all the way up to 300s
Code: Select all
# ls -l /usr/local/nagiosxi/var/subsys
total 4
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 nagios
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 ndo2db
-rw-r--r-- 1 nagios nagios 4 Feb 20 22:21 npcd.pid
# ls -ld /usr/local/nagiosxi/var/subsys
drwxr-xr-x 2 nagios nagios 4096 Feb 20 22:22 /usr/local/nagiosxi/var/subsys
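The listing shows zero-byte `nagios` and `ndo2db` marker files in the subsys directory. A status message such as "not running but subsystem locked" corresponds to a marker that exists while no matching process does. The sketch below classifies the four possible combinations. It is an illustration only: the function names and message wording are hypothetical, it is not the logic of the actual Nagios XI init scripts, and it looks only at the subsys directory, not at /usr/local/nagios/var/nagios.lock.

```shell
#!/bin/sh
# Sketch: classify a service from two facts: is its subsys marker file
# present, and is a process with that name running?
# lock_state and is_running are hypothetical helper names.

SUBSYS_DIR="${SUBSYS_DIR:-/usr/local/nagiosxi/var/subsys}"

# is_running NAME: succeeds if a process with exactly this name exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# lock_state NAME: prints which of the four combinations applies.
lock_state() {
    if [ -e "${SUBSYS_DIR}/$1" ]; then locked=yes; else locked=no; fi
    if is_running "$1"; then running=yes; else running=no; fi
    case "${locked}/${running}" in
        yes/yes) echo "$1: running and locked" ;;
        yes/no)  echo "$1: not running but lock file present (stale lock)" ;;
        no/yes)  echo "$1: running without a lock file" ;;
        no/no)   echo "$1: stopped, no lock file" ;;
    esac
}

# Example call (commented out so the file can be sourced safely):
# for svc in nagios ndo2db; do lock_state "$svc"; done
```

A "stale lock" result after a reboot would suggest the marker survived from the previous boot or was created by a start attempt whose process then exited.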
-
scottwilkerson
Re: Performance Issues / fork() errors
We are trying to replicate this but are not having any luck. Is there any special configuration on this system?
Re: Performance Issues / fork() errors
RAMdisk is in play, but nothing exotic, unless Gavin says otherwise...
This is sorting it out while we figure out stuff: -
Code: Select all
# crontab -l
## Restart any errant programs
* * * * * /home/admin/bin/stuffstarter 2>&1 >/dev/null
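A note on the redirection in that crontab entry: the shell applies redirections left to right, so `2>&1 >/dev/null` discards stdout but still lets stderr through, whereas `>/dev/null 2>&1` discards both. Under cron, any output that survives is normally mailed to the crontab owner. Whether the order above was intentional is not stated in the thread. The sketch below demonstrates the difference, with `emit` as a hypothetical stand-in for any command that writes to both streams.

```shell
#!/bin/sh
# Sketch: redirections are applied left to right, so their order matters.
# emit is a hypothetical stand-in for a command writing to both streams.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is pointed at the current stdout first, then stdout alone is
# discarded: only the stderr text is captured.
first=$(emit 2>&1 >/dev/null)

# stdout is discarded first, then stderr follows it: nothing is captured.
second=$(emit >/dev/null 2>&1)

echo "2>&1 >/dev/null  -> '${first}'"
echo ">/dev/null 2>&1  -> '${second}'"
```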
Code: Select all
# cat /home/admin/bin/stuffstarter
#!/bin/sh
## stuffstarter - cuz sometimes stuff just doesn't start...
EMAIL="chrisp"
HOSTNAME=$(hostname)
TIME=$(date)
SERVICES="rrdcached ndo2db nagios"
for SERVICE in ${SERVICES}
do
    # A service counts as healthy only if its status output says "is running"
    if /sbin/service "${SERVICE}" status 2>&1 | grep -q 'is running'
    then
        echo "${SERVICE}: OK"
    else
        echo "${SERVICE}: Problem"
        # Mail the current status and the restart output for the failed service
        (
            echo ""
            echo "${HOSTNAME} @ ${TIME}"
            echo ""
            echo "$0 is restarting ${SERVICE}"
            echo ""
            /sbin/service "${SERVICE}" status
            echo ""
            /sbin/service "${SERVICE}" restart
            echo ""
        ) | /bin/mail -s "Restarting ${SERVICE}" "${EMAIL}"
    fi
done
exit 0
-
scottwilkerson
Re: Performance Issues / fork() errors
We will continue to try to replicate this problem here and post back what we find.