Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
We operate over 50 Nagios Core 3 servers and have discovered strange behavior on several of them: external commands written to the command pipe are only processed after several minutes. On one server we saw a delay of over 20 minutes. We are running Nagios Core 3.2.3 on 64-bit Red Hat 5 or 6. The debug log on one of those servers showed the following:
[1406623333.323090] [001.0] [pid=24222] process_external_command1()
[1406623333.323095] [128.2] [pid=24222] Raw command entry: [1406622495] PROCESS_SERVICE_CHECK_RESULT;58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
[1406623333.323147] [001.0] [pid=24222] process_external_command2()
[1406623333.323155] [128.1] [pid=24222] External Command Type: 30
[1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495
[1406623333.323165] [128.1] [pid=24222] Command Arguments: 58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
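To put a number on the delay, the epoch timestamp at the start of each debug line can be compared with the "Command Entry Time" value. Below is a minimal Python sketch of that comparison, assuming the debug log keeps the format shown above; the function name command_delays and the regular expressions are illustrative and not part of Nagios.

```python
import re

# Debug-log prefix: "[<epoch>.<micros>] [<level>] [pid=<n>] <message>"
LOG_LINE = re.compile(r"^\[(\d+)\.(\d+)\] \[[\d.]+\] \[pid=\d+\] (.*)$")
ENTRY_TIME = re.compile(r"^Command Entry Time: (\d+)$")


def command_delays(lines):
    """Yield (entry_time, processed_time, delay_seconds) for each command."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue
        processed = int(m.group(1))  # when Nagios handled the command
        entry = ENTRY_TIME.match(m.group(3))
        if entry:
            submitted = int(entry.group(1))  # timestamp written into the pipe
            yield submitted, processed, processed - submitted


# Two lines copied from the log excerpt above.
sample = [
    "[1406623333.323155] [128.1] [pid=24222] External Command Type: 30",
    "[1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495",
]
for submitted, processed, delay in command_delays(sample):
    print(f"submitted={submitted} processed={processed} delay={delay}s")
```

For the excerpt above this gives a delay of 838 seconds, i.e. roughly 14 minutes between the command being written and being processed.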
When updating configurations (from a central database) we send a SIGHUP, so the Nagios process is never stopped. After stopping and restarting Nagios, external commands are processed without any delay, but the delay sometimes starts growing again, even within a day. Some things we have already determined:
No issue with the number of command buffer slots (we had that issue in the past and solved it).
No issues with high iowait (we had issues on VMs with slow SAN storage, but it also happens on real hardware with very fast local storage).
Please let us know if you have any ideas for investigating this issue. We have a workaround (restarting Nagios every 24 hours), but then we are unable to investigate the issue any further.
On two or three larger nodes external_command_buffer_slots has been increased (but not on the system this topic is about). SELinux has been set to permissive; we have not yet tried disabling it completely.
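For comparing these settings across servers, something like the following Python sketch could pull the relevant directives out of the main configuration file. The helper name read_directives, the config path mentioned in the comment, and the buffer slot value in the example text are assumptions for illustration; only the directive names come from this thread.

```python
def read_directives(cfg_text, names):
    """Return {name: value} for the requested directives.

    If a directive appears more than once, this helper keeps the last one.
    """
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in names:
            found[key.strip()] = value.strip()
    return found


WANTED = ("command_check_interval", "external_command_buffer_slots")

# Hypothetical excerpt. On a real server, read the main config file instead,
# e.g. open("/usr/local/nagios/etc/nagios.cfg").read() (path is an assumption).
example_cfg = """\
# external command settings
check_external_commands=1
command_check_interval=-1
external_command_buffer_slots=4096
"""

settings = read_directives(example_cfg, WANTED)
for name in WANTED:
    print(name, "=", settings.get(name, "(not set)"))
```

Running this against each server's configuration would make it easy to spot nodes where the two directives differ.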
Very odd. command_check_interval is set to -1, so Nagios should be checking for external commands as often as possible. Let's try forcing it to a smallish number:
Former Nagios employee