Service check pending

mtkaschools
Posts: 58
Joined: Tue Sep 14, 2010 7:53 am

After I upgraded to R1.3G, a few of my services are stuck at "service check pending." Because they are pending, I can't add comments to them, even though they show up on the problem board. I also have a host that is reported as down, even though all of the hosts are up when I look at the host list.

Any suggestions?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

You could try scheduling an immediate check in the host/service details pages.

The other thing you might want to do is run this from the command-line just to make sure a second nagios process didn't get spawned from the upgrade:

Code: Select all

killall -9 nagios
service nagios start
If the above doesn't work, you could also double-check the host or service by running its check command from the command line. For example:
/usr/local/nagios/libexec/check_icmp -H <hostaddress>

You can use this to verify the output that you're getting.
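
Before resorting to killall -9, it can help to confirm whether more than one Nagios daemon is actually running. The following is only a minimal sketch: it assumes a procps-style ps and a daemon binary named nagios, and the function name count_nagios_daemons is made up for illustration.

```shell
# Count top-level Nagios daemons from `ps -eo pid,ppid,args` style input.
# A process is counted when its parent is PID 1 and its executable is named
# "nagios"; processes whose parent is the daemon itself are not counted.
count_nagios_daemons() {
    awk '$2 == 1 && $3 ~ /(^|\/)nagios$/ { n++ } END { print n + 0 }'
}

# Usage on a live server (assumption: ps supports -eo pid,ppid,args):
#   ps -eo pid,ppid,args | count_nagios_daemons
```

Check workers forked by the daemon can show up briefly, so a single high reading is not conclusive. A count that stays above 1 across several runs would point toward a duplicate daemon.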
mtkaschools
Posts: 58
Joined: Tue Sep 14, 2010 7:53 am

That didn't seem to help much. When I look at the host, it says "host check pending," just as the services do. Even rebooting the whole server doesn't clear up whatever might be going on in the background.

Something told me not to upgrade!
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

What kind of output do you get when you run the check manually from the command-line?
mtkaschools
Posts: 58
Joined: Tue Sep 14, 2010 7:53 am

What specifically would I run from the cmd line?
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Something in the format of

Code: Select all

/usr/local/nagios/libexec/check_icmp -H <hostaddress>
The specifics can be determined by going through the Core Config Manager to see what arguments were supplied to the check, and filling them in as appropriate.
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/
mtkaschools
Posts: 58
Joined: Tue Sep 14, 2010 7:53 am

It comes back with 'OK', yet the Nagios XI interface still shows 'service check is pending...'
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Can you run the following command on the command line and send us its output?

Code: Select all

tail -50 /usr/local/nagios/var/nagios.log
I'd like to see if Nagios has "orphaned" checks.
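
A tail of 50 lines may miss older entries, so searching the whole log can be quicker. This is a rough sketch that assumes the orphan warnings written by Nagios contain the word "orphaned" and that the log sits at the path used above; the function name count_orphaned is hypothetical.

```shell
# Count log lines that mention orphaned checks (case-insensitive).
# Prints 0 when there are no matches or the file cannot be read.
count_orphaned() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    n=$(grep -ci 'orphaned' "$log" 2>/dev/null) || n=0
    echo "${n:-0}"
}

# Usage (assumption: default log location):
#   count_orphaned
#   count_orphaned /path/to/another/nagios.log
```

A result of 0 suggests the pending checks are not being reported as orphaned, at least not in that log file.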
mtkaschools
Posts: 58
Joined: Tue Sep 14, 2010 7:53 am

Code: Select all

Last login: Wed Dec 15 11:54:48 2010
[root@noc ~]# tail -50 /usr/local/nagios/var/nagios.log
[1292586822] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292586942] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587062] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587182] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587302] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587422] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587542] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587662] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587782] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292587833] Auto-save of retention data completed successfully.
[1292587902] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588022] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588142] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588262] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588382] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588502] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588622] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588742] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588862] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292588982] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589102] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589222] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589352] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: rta nan, lost 100%
[1292589472] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: rta nan, lost 100%
[1292589582] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589702] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589822] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;CRITICAL;HARD;1;CRITICAL - 10.10.220.31: Net unreachable @ 10.5.1.1. rta nan, lost 100%
[1292589942] SERVICE ALERT: Printer - MHS-1422-LJ2015;Ping;OK;HARD;1;OK - 10.10.220.31: rta 0.367ms, lost 0%
[1292589952] HOST ALERT: Printer - MHS-1422-LJ2015;UP;HARD;1;OK - 10.10.220.31: rta 11.786ms, lost 0%
[1292589962] SERVICE ALERT: Printer - MHS-1422-LJ2015;Printer Status;OK;HARD;2;Printer ok - ("Ready")
[1292591433] Auto-save of retention data completed successfully.
[1292591972] SERVICE ALERT: Server - SHAREPOINTDB;CPU Usage;CRITICAL;SOFT;1;Connection reset by peer
[1292592032] SERVICE ALERT: Server - SHAREPOINTDB;CPU Usage;OK;SOFT;2;CPU Load 1% (5 min average)
[1292595033] Auto-save of retention data completed successfully.
[1292598172] SERVICE ALERT: Server - SHAREPOINT1;Service - Sophos Anti-Virus;WARNING;SOFT;1;could not fetch information from server
[1292598232] SERVICE ALERT: Server - SHAREPOINT1;Service - Sophos Anti-Virus;OK;SOFT;2;SAVService: Started
[1292598633] Auto-save of retention data completed successfully.
[1292600972] SERVICE ALERT: Printer - DSC-TL-LJ3800-COLOR;Printer Status;WARNING;SOFT;1;Printer Offline ("Checking printer")
[1292600972] SERVICE ALERT: Printer - DSC-TL-LJ3800-BLACK;Printer Status;WARNING;SOFT;1;Printer Offline ("Checking printer")
[1292601032] SERVICE ALERT: Printer - DSC-TL-LJ3800-COLOR;Printer Status;OK;SOFT;2;Printer ok - ("Processing job from tray 2")
[1292601032] SERVICE ALERT: Printer - DSC-TL-LJ3800-BLACK;Printer Status;OK;SOFT;2;Printer ok - ("Processing job from tray 2")
[1292602233] Auto-save of retention data completed successfully.
[1292604932] SERVICE ALERT: Server - SHAREPOINT1;Uptime;WARNING;SOFT;1;could not fetch information from server
[1292604992] SERVICE ALERT: Server - SHAREPOINT1;Uptime;OK;SOFT;2;System Uptime - 2 day(s) 20 hour(s) 55 minute(s)
[1292605833] Auto-save of retention data completed successfully.
[1292609433] Auto-save of retention data completed successfully.
[1292611662] SERVICE ALERT: Server - BB2;Service - Sophos Agent;CRITICAL;SOFT;1;Connection reset by peer
[1292611722] SERVICE ALERT: Server - BB2;Service - Sophos Agent;OK;SOFT;2;Sophos Agent: Started
[1292613033] Auto-save of retention data completed successfully.
[1292616633] Auto-save of retention data completed successfully.
[root@noc ~]#
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Can you check whether Nagios Core shows the same issue? Access the Core interface by going to http://<yourserver>/nagios. I want to see whether the issue is with Core or with the XI interface.

Is there any relationship between the checks that are coming back as pending? (For example, are they all using the same check command?)
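
One way to look for such a relationship is to read the Core status file directly. The sketch below assumes status.dat lives at /usr/local/nagios/var/status.dat and uses "servicestatus { key=value ... }" blocks with a has_been_checked field; the function name list_pending_services is made up for illustration.

```shell
# Print "host;service;check_command" for every service whose status block
# has has_been_checked=0, i.e. no check result has been recorded yet.
list_pending_services() {
    awk '
        /^[ \t]*servicestatus[ \t]*\{/ {
            in_svc = 1; host = svc = cmd = checked = ""; next
        }
        in_svc && /^[ \t]*\}/ {
            if (checked == "0") print host ";" svc ";" cmd
            in_svc = 0; next
        }
        in_svc {
            line = $0
            sub(/^[ \t]+/, "", line)           # strip leading indentation
            eq = index(line, "=")              # split on the first "=" only
            if (eq == 0) next
            key = substr(line, 1, eq - 1); val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "check_command") cmd = val
            else if (key == "has_been_checked") checked = val
        }
    ' "${1:-/usr/local/nagios/var/status.dat}"
}

# Usage: group pending services by check command (arguments after "!" dropped):
#   list_pending_services | cut -d';' -f3 | cut -d'!' -f1 | sort | uniq -c
```

If most of the pending services share one command, that command or its plugin is a reasonable place to look next.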