OCSP problem after upgrade to Nagios 4.0 (NSCA)

Post by MalcolmPreen »

In my configuration, we have a number of tablets on remote sites, running CentOS 6.4 and Nagios 3.5.0.
These tablets report back to the centre using NSCA. We have two hosts receiving data: one sits within the company network (and therefore needs VPN access), whilst the other is within our DMZ.

One of the tablets was having issues which seemed to relate to load. Having read about Nagios 4.0 and its worker processes, an upgrade seemed worth a test.

The old problem was that, rather than running every 10 minutes, some of the services would go 20-30 minutes between checks. Following the upgrade, this issue appears to be resolved, which is good.

However, I've got an oddity: some of the results are not being delivered by NSCA. This is only the case for the VPN-connected host; the other works without issue.

What I've found in /var/log/messages is the following:

Oct 23 17:02:57 TABLETNAME nagios: wproc:   command: /usr/local/nagios/libexec/submit_check_result_to_vision1 HOSTNAME 'SHOWOSLEVEL' OK '7100-01-02-1150|version=7.1 release=01 ml=02'
Oct 23 17:02:57 TABLETNAME nagios: wproc:   host=HOSTNAME; service=SHOWOSLEVEL; contact=(none)
Oct 23 17:02:57 TABLETNAME nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Oct 23 17:02:57 TABLETNAME nagios: Warning: OCSP command '/usr/local/nagios/libexec/submit_check_result_to_vision1 HOSTNAME 'SHOWOSLEVEL' OK '7100-01-02-1150|version=7.1 release=01 ml=02'' for service 'SHOWOSLEVEL' on host 'HOSTNAME' timed out after 0.00 seconds

My OCSP command is:

ocsp_command=submit_check_result_to_vision1

Can anyone explain "error_code=62" in the third log line above?
And how can it have "timed out after 0.00 seconds"? (I'm assuming this is related to error 62.)

This was never an issue in Nagios 3.5, but since the upgrade to 4.0 it affects 4 services on this one host, whilst the other 16 work like clockwork.

Any suggestions?

Thanks, Malcolm
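
On the "error_code=62" question: on Linux, errno 62 is ETIME ("Timer expired"), which would be consistent with early_timeout=1 in the same log entry. That the worker's error_code field carries an errno value is an assumption here, not something confirmed in this thread. A quick way to look the number up on the monitoring host is sketched below; Python is used purely for illustration and `describe_errno` is a made-up helper name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # errno numbering differs between operating systems, so run this
    # on the same kind of machine that produced the log entry.
    print(describe_errno(62))
```

This only decodes the number; it does not explain why the reported duration is 0.00 seconds.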

Post by abrist »

Is the VPN connection up?
Can you run the command from the CLI?
Can you show me the output of a working OCSP command?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by MalcolmPreen »

Yes, the VPN is up.

It appears to always work from the CLI (from the few tests I've done).

I'll investigate further next week and post more info (it's been a hell of a week, and I need a weekend off).

Thanks, Malcolm

Post by abrist »

np. See you next week!

Post by MalcolmPreen »

OK, I've done a bit more debugging, and it appears to be a timeout issue.

I'm investigating further... but it'll be tomorrow at the earliest before I have more information.

Malcolm

Post by sreinhardt »

Cool, thanks for letting us know you're still looking at it!
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Post by MalcolmPreen »

OK, I seem to have a workaround, but as I'm relatively new to Nagios I'd like to run it past the experts.

The problem appears to lie with my ocsp_command and ochp_command, or rather with the scripts they call.

The scripts are very similar. Basically:

verification of input
for each central host to send_nsca to
do
    verify host alive
    send_nsca
    check if send OK or error
done
report to local log files
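
The outline above could be sketched roughly as follows. This is not the actual script from this thread: the central host names, file paths and the 10-second per-host limit are placeholders, the "verify host alive" and logging steps are left out, and Python is used only for illustration. The idea is to cap the time spent on each central host, so that one slow receiver cannot use up the whole ocsp_timeout on its own.

```python
import subprocess
import sys

# Hypothetical values: the real hosts, paths and limits are not given in the thread.
CENTRAL_HOSTS = ["central1.example.com", "central2.example.com"]
SEND_NSCA = "/usr/local/nagios/bin/send_nsca"
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"
PER_HOST_LIMIT = 10  # seconds allowed per central host (assumed figure)

STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def build_nsca_line(host, service, state, output):
    """Format one passive service result as a tab-delimited line for send_nsca."""
    code = STATE_CODES.get(state, 3)  # unrecognised states are sent as UNKNOWN
    return "{}\t{}\t{}\t{}\n".format(host, service, code, output)


def send_to_host(central, line, limit=PER_HOST_LIMIT, runner=subprocess.run):
    """Send one result to one central host; return True on success."""
    cmd = [SEND_NSCA, "-H", central, "-c", SEND_NSCA_CFG]
    try:
        result = runner(cmd, input=line.encode(), timeout=limit,
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.TimeoutExpired, OSError):
        # Too slow, or send_nsca could not be started: treat as a failed send.
        return False
    return result.returncode == 0


def submit(host, service, state, output,
           centrals=CENTRAL_HOSTS, sender=send_to_host):
    """Send the same result to every central host, one after another."""
    line = build_nsca_line(host, service, state, output)
    return {central: sender(central, line) for central in centrals}


if __name__ == "__main__" and len(sys.argv) == 5:
    results = submit(*sys.argv[1:5])
    sys.exit(0 if all(results.values()) else 1)
```

Called as `script HOSTNAME SHOWOSLEVEL OK 'plugin output'`, it exits non-zero if any central host could not be reached within its limit.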

The problem appears to be that one of the hosts I am reporting to sometimes takes a long time (15+ seconds) to respond, which causes the timeout.

I can get round the problem by increasing ocsp_timeout and ochp_timeout from their defaults. However, I'm currently at 40 seconds, and even that fails sometimes.

That number seems much too high (is it?), so I think I actually need to understand why the one host is taking a long time to respond to send_nsca.
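
One way to reason about the timeout value: if the script works through the central hosts one after another, its worst-case run time is the sum of the per-host delays, and the whole run has to finish inside ocsp_timeout. A small arithmetic sketch, where the delay figures are made up for illustration and only the 15-second figure echoes the slow host described above:

```python
def worst_case_runtime(per_host_seconds):
    """Total time for a script that contacts each central host sequentially."""
    return sum(per_host_seconds)


def fits_timeout(per_host_seconds, ocsp_timeout):
    """True if the sequential run finishes before the OCSP timeout expires."""
    return worst_case_runtime(per_host_seconds) < ocsp_timeout


if __name__ == "__main__":
    delays = [1, 1, 15]  # hypothetical seconds per central host; 15 is the slow one
    print(worst_case_runtime(delays))
    print(fits_timeout(delays, 5), fits_timeout(delays, 40))
```

If the slow host occasionally takes longer than the timeout minus the time spent on the other hosts, the command will still be killed, whatever the setting.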

As background, I currently have three hosts I am reporting to:
- two machines within our company network, requiring VPN access: one is a physical server, one is a VMware client
- one machine in the company DMZ, again a VMware client

The machine which is taking a long time to respond is the physical server, so I guess I need to focus my efforts there.

Is there any issue with setting ocsp_timeout and ochp_timeout so high?

Thanks for any feedback.

Malcolm

Post by sreinhardt »

Considering the other devices or servers are responding within normal time frames, I don't see an issue with extending it that long, other than the occasional false positive for the one that is taking so long. I completely agree that looking at why that host is responding so slowly is the right course of action. Are any of the other services on that host slow, or are there other devices you can check on that network that are responding similarly?

Post by MalcolmPreen »

Sorry for the delay in updating. The problem appears to be load related: too many NSCA messages (i.e. too many outlying hosts reporting) seems to make the problem occur. I am still investigating, but I noticed that someone else has a very similar problem logged on the tracker:
(http://tracker.nagios.org/view.php?id=529)

I'm not convinced that my problem is actually a Nagios problem. My gut feel is a local resource or configuration problem, but I'll endeavour to keep this thread updated on my investigations.

Malcolm

Post by slansing »

Out of curiosity, how many service results are being reported via NSCA over a 5-minute time frame?