NSCA: big delay delay processing servicechecks
Posted: Fri Feb 14, 2014 10:20 am
by sebastiaopburnay
Hi!
I'm having a big problem with NSCA.
In the environment shown in the attached picture, my central Nagios host receives the service check results submitted via NSCA:
Code: Select all
Feb 14 15:11:38 SRVNAGIOSCORE01 nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;0;Yellow Cartridge HP CB382A is at 59% - OK! Yellow Image Drum HP CB386A is at 52% - OK!|Yellow Cartridge HP CB382A=59;10;5; Yellow Image Drum HP CB386A=52;10;5;
But a long time passes between receiving that result and the state change showing up in the web interface and in the NDOUtils DB.
On my remote Nagios server (which actively monitors hosts) I see strange errors in the logs:
Code: Select all
[1392388136] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;OK;Yellow Toner 507A HP CE402A is at 59% - OK!
[1392388136] External command error: Command failed -> PROCESS_SERVICE_CHECK_RESULT;Monitored_Host_04;Ptinter_Toner_Yellow;OK;Yellow Toner 507A HP CE402A is at 59% - OK!
Do you have any idea why?
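One visible difference between the two log excerpts above: the central server's entry has a numeric return code (0) in the fourth field, while the remote server's entry has the text OK there. The documented PROCESS_SERVICE_CHECK_RESULT format uses a numeric return code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). Whether that is what triggers the "Command failed" message here is not established in this thread. The following is only a minimal sketch of building the command line with a numeric code; format_passive_result is a hypothetical helper name, not part of Nagios or NSCA.

```python
import time

# Standard Nagios plugin return codes.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output, timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line with a numeric return code.

    `state` may be a state name ("OK") or a number (0, "0").
    This is a hypothetical helper for illustration only.
    """
    if isinstance(state, str) and not state.strip().isdigit():
        # Map a textual state such as "OK" to its numeric code.
        code = STATE_CODES[state.strip().upper()]
    else:
        code = int(state)
    if code not in STATE_CODES.values():
        raise ValueError(f"return code must be 0-3, got {code}")
    if timestamp is None:
        timestamp = int(time.time())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


if __name__ == "__main__":
    print(format_passive_result(
        "Monitored_Host_04", "Ptinter_Toner_Yellow", "OK",
        "Yellow Toner 507A HP CE402A is at 59% - OK!",
        timestamp=1392388136))
```

Comparing the generated line with what the remote server actually logs makes it easy to see whether the return-code field differs.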
Re: NSCA: big delay delay processing servicechecks
Posted: Fri Feb 14, 2014 11:59 am
by abrist
Can you check the date/times on both the remote and master server? Make sure they are in sync.
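When comparing clocks through the logs, note that the nagios.log lines quoted above are stamped with Unix epoch seconds, while the syslog line on the central server shows wall-clock time. A small sketch for converting the epoch prefix to a readable UTC time (nagios_log_time is a hypothetical helper name):

```python
import re
from datetime import datetime, timezone

# Matches the "[1392388136]" prefix at the start of a nagios.log line.
_LOG_TS = re.compile(r"^\[(\d+)\]")


def nagios_log_time(line):
    """Return the timestamp of a nagios.log line as an aware UTC datetime."""
    match = _LOG_TS.match(line)
    if match is None:
        raise ValueError("no [epoch] prefix found")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


if __name__ == "__main__":
    sample = "[1392388136] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;..."
    print(nagios_log_time(sample).isoformat())
```

The result is in UTC, so the local timezone of each server still has to be taken into account before comparing it with syslog timestamps.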
Re: NSCA: big delay delay processing servicechecks
Posted: Fri Feb 14, 2014 2:05 pm
by sebastiaopburnay
abrist wrote: Can you check the date/times on both the remote and master server? Make sure they are in sync.
That is a good idea. I've noticed they are almost two minutes out of sync with each other.
As both servers are on networks with restrictive policies, I'll have to contact each domain admin to request NTP access to Portugal's official time servers, or a list of NTP servers that are accessible.
I'll get back to you with the results.
Re: NSCA: big delay delay processing servicechecks
Posted: Fri Feb 14, 2014 3:05 pm
by tmcdonald
We'll keep the thread open until you return.
Re: NSCA: big delay delay processing servicechecks
Posted: Mon Feb 17, 2014 12:47 pm
by sebastiaopburnay
Well,
I don't think this is an NTP sync issue. I had both servers synced by the end of the day on Friday and the problem remains.
Only stopping and then starting Nagios gets the checks updated.
I've seen a similar issue before, but it had to do with active and passive checks being enabled simultaneously on my remote monitoring server.
Yet, in this remote environment, only active checks are enabled.
Re: NSCA: big delay delay processing servicechecks
Posted: Tue Feb 18, 2014 11:36 am
by slansing
What version of Nagios Core, and NDO are you using?
Re: NSCA: big delay delay processing servicechecks
Posted: Wed Feb 19, 2014 6:51 am
by sebastiaopburnay
slansing wrote: What version of Nagios Core, and NDO are you using?
I'm using Nagios 3.5.0 on my Central passive server and Nagios 4.0.2rc1 on my Remote active server.
On my Central server I have installed NDO2DB 1.5.2.
On the remote active server, I've increased the service check intervals to reduce the load from service checks and NSCA messages, but it made no difference.
On the Central server I've noticed with ps that there were 15851 nsca processes '/usr/sbin/nsca --daemon -c /etc/nsca.cfg'.
I think this could be related.
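To watch whether that number keeps growing, the count can be taken repeatedly from the ps output. A minimal sketch, assuming a Linux procps-style `ps -eo pid,args`; count_processes and current_ps_output are hypothetical helper names:

```python
import subprocess


def count_processes(ps_output, command_prefix):
    """Count lines of `ps -eo pid,args` output whose command starts with a prefix."""
    count = 0
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 1)  # -> [pid, full command line]
        if len(parts) == 2 and parts[1].startswith(command_prefix):
            count += 1
    return count


def current_ps_output():
    """Return the current process list (assumes procps `ps` is available)."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(count_processes(current_ps_output(), "/usr/sbin/nsca"))
```

Matching on the start of the command line avoids counting unrelated processes that merely mention nsca in their arguments, such as a grep.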
Re: NSCA: big delay delay processing servicechecks
Posted: Wed Feb 19, 2014 4:21 pm
by slansing
I agree, they may not be closing. What kind of latency / packet loss do you get through the pipe from one server to the other? You can check this with a standard ping. Also, how many services are you forwarding via NSCA to the central server from the remote one?
Re: NSCA: big delay delay processing servicechecks
Posted: Wed Feb 19, 2014 4:43 pm
by sebastiaopburnay
Well, on individual checks I get no problem.
Code: Select all
root@CentralServer:~# ping -c4 <remote-Server-IP>
PING <remote-Server-IP> (<remote-Server-IP>) 56(84) bytes of data.
64 bytes from <remote-Server-IP>: icmp_req=1 ttl=61 time=3.34 ms
64 bytes from <remote-Server-IP>: icmp_req=2 ttl=61 time=6.23 ms
64 bytes from <remote-Server-IP>: icmp_req=3 ttl=61 time=4.93 ms
64 bytes from <remote-Server-IP>: icmp_req=4 ttl=61 time=2.92 ms
--- <remote-Server-IP> ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.927/4.359/6.233/1.316 ms
The remote server sends NSCA data for 362 hosts with 2065 services.
Furthermore, I've changed nagios.cfg to make some of the logging more verbose, and I got errors/warnings which suggest problems handling perfdata, such as:
Code: Select all
[1392845971] Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata"
This gave me the impression that the resources on the server are scarce.
In vCenter I've noticed alerts and warnings on CPU and memory.
I will ask the virtualization farm's owner to provide more memory and CPU to my central Nagios host and then watch how things evolve.
Re: NSCA: big delay delay processing servicechecks
Posted: Thu Feb 20, 2014 12:15 pm
by slansing
Hmm, okay, so this is happening 24/7, correct? Both the error processing the check results and the large latency between a result being received and it actually being displayed in the CGIs? Are you noticing a large influx of files in:
Or piling up in: