NSCA: big delay processing servicechecks

sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Post by sebastiaopburnay »

Hi!

I'm having a big problem with NSCA.

In the environment depicted in the attached picture, my central Nagios host receives the service checks submitted via NSCA:

Code:

Feb 14 15:11:38 SRVNAGIOSCORE01 nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;0;Yellow Cartridge HP CB382A is at 59% - OK! Yellow Image Drum HP CB386A is at 52% - OK!|Yellow Cartridge HP CB382A=59;10;5; Yellow Image Drum HP CB386A=52;10;5;
But a long time passes between receipt of that result and the state change showing up on the web interface and in the NDOUtils DB.

On my remote Nagios server (which actively monitors hosts) I see strange errors in the logs:

Code:

[1392388136] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;OK;Yellow Toner 507A HP CE402A is at 59% - OK!
[1392388136] External command error: Command failed -> PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;OK;Yellow Toner 507A HP CE402A is at 59% - OK!
Do you have any idea why?
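
To compare entries like these across the two servers, it can help to split them into fields. A minimal Python sketch, assuming the nagios.log layout shown in the excerpt above; `PassiveResult` and `parse_passive_result` are hypothetical names used only for illustration, not part of Nagios:

```python
import re
from typing import NamedTuple, Optional


class PassiveResult(NamedTuple):
    epoch: int          # Unix timestamp from the leading [..] field
    host: str
    service: str
    return_code: str    # kept as text: the logs show both "0" and "OK" here
    output: str


# Layout assumed from the log excerpt:
# [epoch] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
LINE_RE = re.compile(
    r"^\[(\d+)\] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;"
    r"([^;]*);([^;]*);([^;]*);(.*)$"
)


def parse_passive_result(line: str) -> Optional[PassiveResult]:
    """Return the parsed fields, or None if the line has a different layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    epoch, host, service, code, output = match.groups()
    return PassiveResult(int(epoch), host, service, code, output)
```

Lines that do not follow this layout, such as the "External command error" entry, simply return None.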
Attachments
Drawing6.jpg
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NSCA: big delay delay processing servicechecks

Post by abrist »

Can you check the date/times on both the remote and master server? Make sure they are in sync.
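
One rough way to do that: run `date +%s` on both servers at about the same moment and compare the two numbers. A small Python sketch of the comparison follows; the function names and the 5-second tolerance are assumptions for illustration, not Nagios or NSCA settings:

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch: int) -> str:
    """Render a Unix timestamp (as seen in nagios.log) as UTC wall-clock time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def clock_skew_seconds(local_epoch: float, remote_epoch: float) -> float:
    """Absolute difference between two clock readings taken at the same moment."""
    return abs(local_epoch - remote_epoch)


def in_sync(local_epoch: float, remote_epoch: float, tolerance: float = 5.0) -> bool:
    """True if the two readings differ by no more than `tolerance` seconds.

    The default tolerance is an arbitrary illustrative value.
    """
    return clock_skew_seconds(local_epoch, remote_epoch) <= tolerance
```

`epoch_to_utc` is also handy for lining up epoch-stamped nagios.log entries with syslog-style timestamps.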
Former Nagios employee
sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Re: NSCA: big delay delay processing servicechecks

Post by sebastiaopburnay »

abrist wrote:Can you check the date/times on both the remote and master server? Make sure they are in sync.
That is a good idea. I've noticed they are almost two minutes out of sync with each other.

As both servers are on networks with restrictive policies, I'll have to contact each domain admin to request NTP access to Portugal's official time servers, or a list of accessible NTP servers.

I'll get back to you with the results.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NSCA: big delay delay processing servicechecks

Post by tmcdonald »

We'll keep the thread open until you return.
Former Nagios employee
sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Re: NSCA: big delay delay processing servicechecks

Post by sebastiaopburnay »

Well,

I don't think this is an NTP sync issue: I had both servers synced by the end of the day on Friday and the problem remains.

Only stopping and then starting Nagios gets the checks updated.

I've seen a similar issue before, but it had to do with active and passive checks being enabled simultaneously on my remote monitoring server.

Yet, in this remote environment, only active checks are enabled.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NSCA: big delay delay processing servicechecks

Post by slansing »

What version of Nagios Core, and NDO are you using?
sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Re: NSCA: big delay delay processing servicechecks

Post by sebastiaopburnay »

slansing wrote:What version of Nagios Core, and NDO are you using?
I'm using Nagios 3.5.0 on my central passive server and Nagios 4.0.2rc1 on my remote active server.

On my Central server I have installed NDO2DB 1.5.2.

On the remote active server, I've increased servicecheck intervals in order to reduce the impact of servicechecks and nsca messages, but it made no difference.

On the Central server I've noticed with ps that there were 15851 nsca processes '/usr/sbin/nsca --daemon -c /etc/nsca.cfg'.

I think this could be related.
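
To keep an eye on that number over time, something like the following Python sketch could be run periodically. It assumes a Linux `ps` that accepts `-eo args`; the function names are hypothetical:

```python
import subprocess


def count_matching(ps_output: str, needle: str) -> int:
    """Count lines of `ps` output whose command line contains `needle`."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def count_running(needle: str) -> int:
    """Count live processes whose full command line contains `needle`.

    Assumes a procps-style `ps` where `-eo args` prints one command line
    per process.
    """
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return count_matching(result.stdout, needle)
```

Calling `count_running("/usr/sbin/nsca --daemon")` every minute and logging the result would show whether the process count keeps growing or levels off.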
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NSCA: big delay delay processing servicechecks

Post by slansing »

I agree, they may not be closing. What kind of latency / packet loss do you get through the pipe from one server to the other? You can check this with a standard ping. Also, how many services are you forwarding via NSCA from the remote server to the central one?
sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Re: NSCA: big delay delay processing servicechecks

Post by sebastiaopburnay »

Well, on individual checks I get no problem.

Code:

root@CentralServer:~# ping -c4 <remote-Server-IP>
PING <remote-Server-IP> (<remote-Server-IP>) 56(84) bytes of data.
64 bytes from <remote-Server-IP>: icmp_req=1 ttl=61 time=3.34 ms
64 bytes from <remote-Server-IP>: icmp_req=2 ttl=61 time=6.23 ms
64 bytes from <remote-Server-IP>: icmp_req=3 ttl=61 time=4.93 ms
64 bytes from <remote-Server-IP>: icmp_req=4 ttl=61 time=2.92 ms

--- <remote-Server-IP> ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.927/4.359/6.233/1.316 ms
The remote server sends NSCA data for 362 hosts with 2065 services.
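
As a rough feel for the load this implies, here is a back-of-the-envelope Python sketch. The 300-second interval is purely a hypothetical figure (the real check intervals are not stated here), and it assumes every service result is forwarded once per interval:

```python
def results_per_second(services: int, interval_seconds: float) -> float:
    """Average passive results per second if each service reports once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return services / interval_seconds


# 2065 services is the figure from this post; 300 s is an assumed interval.
rate = results_per_second(2065, 300)
print(f"{rate:.2f} results per second on average")
```

Plugging in the actual interval gives the average rate the central server has to absorb; bursts can be much higher if checks are not spread evenly.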

Furthermore, I've tweaked nagios.cfg to make some logging more verbose, and I got errors/warnings which suggest problems handling perfdata, such as:

Code:

[1392845971] Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata"
This gave me the impression that the resources on the server are scarce.

In vCenter I've noticed alerts and warnings on CPU and memory.

I will ask the virtualization farm's owner to provide more memory and CPU to my central Nagios host and then watch how things evolve.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NSCA: big delay delay processing servicechecks

Post by slansing »

Hmm, okay, so this is happening 24/7, correct? Both the error processing the check results and the long latency between a result being received and it actually being displayed in the CGIs? Are you noticing a large influx of files in:

Code:

/var/spool/checkresults
Or piling up in:

Code:

/tmp
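
A quick way to watch for that from a script is to count the regular files in each directory at intervals. A minimal Python sketch; `count_files` is a hypothetical helper, and the two paths are simply the ones mentioned above:

```python
import os


def count_files(path: str) -> int:
    """Count regular files directly inside `path` (not recursive, symlinks skipped)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))


def report(paths=("/var/spool/checkresults", "/tmp")) -> None:
    """Print a file count per directory, skipping ones that cannot be read."""
    for path in paths:
        try:
            print(f"{path}: {count_files(path)} files")
        except OSError as exc:
            print(f"{path}: not readable ({exc})")
```

Running `report()` a few minutes apart shows whether result files are piling up faster than they are being reaped.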