Nagios frozen, not updating service status.

sh4ka
Posts: 2
Joined: Mon Nov 08, 2010 9:17 am

Nagios frozen, not updating service status.

Post by sh4ka »

Hi everybody,

I have a very strange problem on my Nagios server.

I run Nagios 3.2.0, compiled from source, on a plain RHEL 5.5 server.
It had been working fine since it was first installed about a year ago.

However, about 12 hours ago it stopped working. The admin interface loads fine and I can browse and use all the menu options, but the web interface is no longer updating the status of any of the service checks on my 60 servers.

In fact, the checks are not being executed at all: I see no Nagios activity in the logs, and it doesn't check any local or remote service. These are simple checks, such as a TCP response on port 80 or mail on port 25, and NRPE shows no activity for the internal load average or disk space checks either.

When I launch the NRPE or TCP checks by hand from the shell, they work fine and report good results:

Code:

[root@server.myserver.com:~]/home/nagios/libexec/check_tcp -H REMOTE.SRV.IP -p 80
TCP OK - 0.001 second response time on port 80|time=0.000535s;;;0.000000;10.000000
[root@server.myserver.com:~]

Code:

[root@server.myserver.com:~]/home/nagios/libexec/check_nrpe -H REMOTE.SRV.IP -c check_load
OK - load average: 2.35, 6.33, 4.46|load1=2.350;15.000;30.000;0; load5=6.330;10.000;25.000;0; load15=4.460;5.000;20.000;0;
[root@server.myserver.com:~]

However, these results never get updated in the Nagios web interface. It's as if it were frozen.
This was the last thing Nagios wrote to the logs before it froze:

Code:

[1289196000] CURRENT SERVICE STATE: server223_01;Particion /mnt/disk2;OK;HARD;1;DISK OK - free space: /mnt/disk2 193225 MB (88% inode=99%):
[1289196279] Auto-save of retention data completed successfully.
[1289199879] Auto-save of retention data completed successfully.
[1289203479] Auto-save of retention data completed successfully.
[1289207079] Auto-save of retention data completed successfully.
[1289210679] Auto-save of retention data completed successfully.
[1289214279] Auto-save of retention data completed successfully.
[1289217879] Auto-save of retention data completed successfully.
[1289221479] Auto-save of retention data completed successfully.
[1289221859] Caught SIGTERM, shutting down...
[1289221859] Successfully shutdown... (PID=18009)
[1289221860] Nagios 3.2.0 starting... (PID=29329)
[1289221860] Local time is Mon Nov 08 07:11:00 CST 2010
[1289221860] LOG VERSION: 2.0
[1289221860] Finished daemonizing... (New PID=29330)
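
The bracketed numbers in the log are Unix epoch seconds, so the spacing between entries can be read directly. As a minimal sketch (the helper names `parse_line` and `gaps` are hypothetical and not part of Nagios), the following converts log lines to UTC times and reports the seconds between consecutive entries. On the excerpt above, the retention auto-saves are exactly 3600 seconds apart and no check results appear between them.

```python
import re
from datetime import datetime, timezone

# Nagios log lines look like "[<epoch seconds>] <message>".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")

def parse_line(line):
    """Return (aware UTC datetime, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return ts, m.group(2)

def gaps(lines):
    """Return the seconds elapsed between consecutive parseable log lines."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

if __name__ == "__main__":
    # Three lines copied from the log excerpt in this post.
    sample = [
        "[1289196279] Auto-save of retention data completed successfully.",
        "[1289199879] Auto-save of retention data completed successfully.",
        "[1289221860] Nagios 3.2.0 starting... (PID=29329)",
    ]
    for line in sample:
        ts, msg = parse_line(line)
        print(ts.isoformat(), msg)
    print(gaps(sample))
```

The times are printed in UTC; the server in this thread logs its local time as CST, six hours behind.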

Now the only thing I get when I restart the service is this:

Code:

[1289238578] Caught SIGTERM, shutting down...
[1289238578] Successfully shutdown... (PID=13234)
[1289238579] Nagios 3.2.0 starting... (PID=13293)
[1289238579] Local time is Mon Nov 08 11:49:39 CST 2010
[1289238579] LOG VERSION: 2.0
[1289238579] Finished daemonizing... (New PID=13294)

Code:

[root@server.myserver.com:~]pidof nagios
13294
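
A running PID only shows that the daemon is alive, not that it is scheduling checks. One rough way to tell the two apart is to see how recently the status file was rewritten, since the web interface reads its data from that file. A minimal sketch, assuming the file is at /home/nagios/var/status.dat (a guess based on the install prefix used here; the real location is whatever the `status_file` directive in nagios.cfg says) and using a hypothetical helper name `file_age_seconds`:

```python
import os
import time

# Assumed path, derived from the /home/nagios prefix used in this thread;
# check the status_file directive in nagios.cfg for the real one.
STATUS_FILE = "/home/nagios/var/status.dat"

def file_age_seconds(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)

if __name__ == "__main__":
    try:
        age = file_age_seconds(STATUS_FILE)
    except OSError as exc:
        print(f"cannot stat {STATUS_FILE}: {exc}")
    else:
        print(f"{STATUS_FILE} last written {age:.0f} seconds ago")
```

A healthy daemon normally rewrites this file frequently, so an age measured in hours would be consistent with the frozen behaviour described in this post.
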
Any ideas are appreciated!

Thanks!
Last edited by sh4ka on Mon Nov 08, 2010 9:21 am, edited 1 time in total.
sh4ka
Posts: 2
Joined: Mon Nov 08, 2010 9:17 am

Re: Nagios frozen, not updating service status.

Post by sh4ka »

The configuration verification run of the nagios binary (-v) also shows everything OK:

Code:

[root@server.myserver.com:~]/home/nagios/bin/nagios -v /home/nagios/etc/nagios.cfg

Nagios Core 3.2.0
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2009
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/home/nagios/etc/objects/commands.cfg'...
Processing object config file '/home/nagios/etc/objects/contacts.cfg'...
Processing object config file '/home/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/home/nagios/etc/objects/templates.cfg'...
Processing object config file '/home/nagios/etc/objects/localhost.cfg'...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking services...
    Checked 496 services.
Checking hosts...
    Checked 69 hosts.
Checking host groups...
    Checked 1 host groups.
Checking service groups...
    Checked 0 service groups.
Checking contacts...
    Checked 1 contacts.
Checking contact groups...
    Checked 1 contact groups.
Checking service escalations...
    Checked 0 service escalations.
Checking service dependencies...
    Checked 0 service dependencies.
Checking host escalations...
    Checked 0 host escalations.
Checking host dependencies...
    Checked 0 host dependencies.
Checking commands...
    Checked 32 commands.
Checking time periods...
    Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

tommi
Posts: 1
Joined: Thu Nov 18, 2010 4:30 pm

Re: Nagios frozen, not updating service status.

Post by tommi »

I had a similar problem where the Nagios process froze. I was using an event broker module with ndo2db, and the MySQL session had stopped, which hung Nagios completely.

Maybe your problem is similar.
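
To see whether an event broker module is loaded at all, the main config file can be scanned for the `broker_module` and `event_broker_options` directives. A minimal sketch, assuming the config path shown earlier in this thread (/home/nagios/etc/nagios.cfg); the function name `broker_settings` is hypothetical:

```python
# Path taken from the "nagios -v" command earlier in this thread.
NAGIOS_CFG = "/home/nagios/etc/nagios.cfg"

def broker_settings(cfg_text):
    """Return (list of broker_module values, event_broker_options value or None)."""
    modules, options = [], None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "broker_module":
            modules.append(value)
        elif key == "event_broker_options":
            options = value
    return modules, options

if __name__ == "__main__":
    try:
        with open(NAGIOS_CFG) as fh:
            modules, options = broker_settings(fh.read())
    except OSError as exc:
        print(f"cannot read {NAGIOS_CFG}: {exc}")
    else:
        print("event_broker_options:", options)
        for module in modules:
            print("broker_module:", module)
```

If no `broker_module` line turns up, this particular cause can probably be ruled out.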