Warning: The check of service 'XXXXX' on host 'XXXXXX' looks

fran.pastor
Posts: 24
Joined: Tue Nov 22, 2011 3:17 am

Post by fran.pastor »

Hello, since updating to version 3.4.1 we see this message in the Event Log:

"Warning: The check of service 'XXXXX' on host 'XXXXXX' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service ..."

We've seen that this has happened to other people, and we have tried all the solutions that were discussed (modifying ulimit, deleting checkresults, deleting objects.cache, etc.) without any change.
We can tell which hosts/services are about to be orphaned by looking at the Scheduling Queue: after 10 or 15 minutes they are still on the list, with a scheduled execution time more than 10 minutes in the past. Shortly afterwards the WARNING appears in the Event Log.
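
To see which services are affected most often, the warnings can be counted straight from the log. This is a minimal Python sketch, not something from the original post: it assumes the usual "[epoch] " prefix on nagios.log lines and the message text quoted above, and the function name count_orphans is made up for illustration.

```python
import re
from collections import Counter

# Matches the orphaned-service warning quoted above. The "[epoch] " prefix
# is an assumption about the nagios.log line format.
ORPHAN_RE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: The check of service '(?P<service>.+?)' "
    r"on host '(?P<host>.+?)' looks like it was orphaned"
)


def count_orphans(lines):
    """Count orphan warnings per (host, service) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts
```

Passing an open log file to count_orphans() and printing .most_common(20) should show whether the same few checks are orphaned every time or whether it is spread evenly over everything.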

Nagios is very important to our company. We are monitoring more than 500 hosts with more than 3000 services.
Any ideas?
Thank you in advance.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy »

I had hoped I would never see that message ever again. The most common cause of this error is that there are two instances of Nagios running. Stop Nagios and all its related components, then run "ps -ef | grep nagios"; you shouldn't see any results. If you do see instances of Nagios, kill those PIDs.
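
The same check can be scripted so that it ignores the grep process itself. This is only a sketch in Python: the helper names are made up, and it assumes a ps that accepts "-eo pid,args" (as procps on Linux does).

```python
import subprocess


def leftover_nagios_processes(ps_output):
    """Return (pid, command) pairs from `ps -eo pid,args` output that mention nagios."""
    leftovers = []
    for line in ps_output.splitlines()[1:]:  # skip the "PID COMMAND" header
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = int(parts[0]), parts[1]
        if "nagios" in command.lower():
            leftovers.append((pid, command))
    return leftovers


def current_ps_output():
    """Run ps and return its text output (assumes procps-style options)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With Nagios stopped, leftover_nagios_processes(current_ps_output()) should come back empty. Like the grep, it matches anything with "nagios" in its command line, including the script itself if it lives under a nagios directory.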

Is your Nagios install residing on a SAN? Is it virtualised? I've also seen this error caused by high CPU or IO load on the filers where it was taking exorbitant amounts of time writing files and then getting upset about it.
fran.pastor
Posts: 24
Joined: Tue Nov 22, 2011 3:17 am

Post by fran.pastor »

Thanks for the reply. Yes, I had already done this check and completely stopped all Nagios processes.
The server has 16 CPU cores (Xeon), 20 GB of RAM, and runs Red Hat 5.7 x86_64.
The load average is 4.01, 3.13, 2.76, and the Nagios installation is stored on local disks.

I'm thinking of going back to the previous version, 3.3.1; I'd swear that this did not happen there.
What do you think?
Thanks!
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy »

Hmmm, unusual. If it's impacting your business then by all means roll back. If this is a test server then it might be worth persisting with some testing, because if a new bug was introduced that brings this Nagios horror back from the dead, it would be nice to know.
fran.pastor
Posts: 24
Joined: Tue Nov 22, 2011 3:17 am

Post by fran.pastor »

Hello. While investigating this anomaly I noticed that in the "Scheduling Queue" the checks at the top of the list are scheduled in the past. The correct behaviour is for all scheduled checks to be in the future, but the first 100 or so are 1 to 4 minutes in the past.
Our Nagios installation is quite large: we monitor 500 hosts and about 3100 services. All the check scheduling settings in nagios.cfg are at their defaults (inter_check_delay, check_spread, etc.).
I'll try changing these parameters, but I don't think they are the reason for our problem. What do you think?
We also observed that commands (Acknowledge, reschedule service check, etc.) take a long time to run; for example, an Acknowledge takes about 1 minute to apply.
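
The overdue checks can also be listed without the web interface by reading the status file. The Python sketch below is an assumption-heavy illustration: it assumes status.dat contains "servicestatus { ... }" blocks with host_name, service_description and next_check (epoch seconds) fields, and the function name and 60-second grace period are made up.

```python
import time


def overdue_services(status_text, now=None, grace=60):
    """Return (seconds_late, host, service) for service checks scheduled in the past.

    Only servicestatus blocks are inspected; a next_check of 0 is treated
    as "not scheduled" and skipped.
    """
    now = time.time() if now is None else now
    overdue = []
    block = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}":
            if block is not None:
                next_check = int(block.get("next_check") or "0")
                if next_check and now - next_check > grace:
                    overdue.append((
                        int(now - next_check),
                        block.get("host_name", ""),
                        block.get("service_description", ""),
                    ))
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value
    return sorted(overdue, reverse=True)
```

Running this every minute against the contents of status.dat and logging len(overdue_services(text)) would give a rough trend of how far the scheduler is falling behind.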

Any suggestions?
:|

P.S.: I've attached a screenshot of the Scheduling Queue.
nagios.JPG
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy »

Yeah the check delay and interval stuff should definitely be fine at the defaults, I run about 3500 hosts on the defaults.

I have no idea what would cause this behaviour though, I'm not even sure where to begin debugging it... the incorrect scheduling would probably be the crux of the problem but what would cause that I have no idea. I'm not even 100% sure where the appropriate place to ask would be... maybe the nagios-devel mailing list?
fran.pastor
Posts: 24
Joined: Tue Nov 22, 2011 3:17 am

Post by fran.pastor »

Hi jsmurphy, thanks for the help. I've been researching this over the past few days and I think I've found the cause(s).
We found that a script we developed and an obsolete NagVis version were both using obsolete functionality of mk-Livestatus, and this was affecting Nagios performance. After updating mk-Livestatus, NagVis and our scripts, performance improved significantly.
Also, some plugins that used the snmpwalk command would hang if the host did not respond; we changed the plugin and now it works perfectly. We made a few more changes as well. For example, the host check now uses fping instead of check_ping, so if the host replies only a single ping is sent, and the average runtime went from 4 seconds to 0.4 seconds.
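
The post does not show how the plugin was changed. One common way to stop a check hanging on an unresponsive host is to put a hard timeout around the external command, sketched here in Python (3.7 or later assumed). The function name is hypothetical, and returning UNKNOWN on timeout is a design choice, not something taken from the post.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_timeout(argv, timeout_s=10.0):
    """Run argv and return (exit_code, output), giving up after timeout_s seconds."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return UNKNOWN, f"UNKNOWN - {argv[0]} timed out after {timeout_s}s"
    except FileNotFoundError:
        return UNKNOWN, f"UNKNOWN - {argv[0]} not found"
    if proc.returncode != 0:
        return UNKNOWN, (
            f"UNKNOWN - {argv[0]} exited with {proc.returncode}: "
            f"{proc.stderr.strip()}"
        )
    return OK, proc.stdout
```

A plugin could call it as run_with_timeout(["snmpwalk", "-v2c", "-c", "public", "192.0.2.10", "sysUpTime"], timeout_s=10), where the community string and address are placeholders. snmpwalk's own -t (timeout) and -r (retries) options are worth setting too, so that it normally gives up before the hard limit is reached.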
See the screenshot; the times are more normal now.
nagios.JPG
Thanks for everything!
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy »

Hmmmm that is pretty interesting, I wouldn't have thought livestatus would have touched the information before processing! Glad you worked it out though... I'll have to remember that :D.