I'm running Nagios XI 5.4.10 on CentOS 7 (64-bit, manual install), with Mod-Gearman installed from the instructions at https://assets.nagios.com/downloads/nag ... ios_XI.pdf and two workers (also CentOS 7). This is a clone of my production Nagios XI host, so downtime isn't a concern.
The first problem I have is that the performance is abysmal. Checks are going stale by hours. Mod-Gearman seems to work for everyone else, so I'm sure I'm overlooking something.
2017-11-02 16:13:33 - localhost:4730 - v0.33
Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------
check_results | 1 | 0 | 0
eventhandler | 27 | 0 | 0
host | 27 | 0 | 3
service | 27 | 0 | 2
worker_den-gearman1 | 1 | 0 | 0
worker_den-gearman2 | 1 | 0 | 0
----------------------------------------------------------------------
The other problem I have is executing scripts on the monitored targets with check_by_ssh. Checks using SNMP or check_http (for example) work fine, but checks over SSH return "CRITICAL: Return code of 255 is out of bounds. (worker: den-gearman2)". I exchanged SSH keys between the workers and the monitored hosts, so I can log in as root without a password, but that didn't help. I suspect that either I need to do that for another user, or that I have a problem quoting the command argument.
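For reference, here is how I currently read that error, as a minimal sketch rather than anything taken from the Nagios or Mod-Gearman source. The function name describe_return_code is one I made up for illustration. It assumes the standard plugin exit codes 0 to 3 are the only valid ones, and that the 255 may come from the ssh client itself, which exits with 255 when ssh fails rather than the remote command. If that assumption holds, the problem would be the SSH login for whatever user the worker runs as, not the remote script.

```python
# Standard Nagios plugin exit codes; anything outside 0-3 is what the
# "Return code of N is out of bounds" message is complaining about.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(rc: int) -> str:
    """Hypothetical helper: name the state for a plugin exit code."""
    if rc in NAGIOS_STATES:
        return NAGIOS_STATES[rc]
    if rc == 255:
        # Assumption: the OpenSSH client exits with 255 when ssh itself
        # fails (for example, authentication or connection errors).
        return "out of bounds (255: possibly an ssh-level failure)"
    return f"out of bounds ({rc})"


if __name__ == "__main__":
    for code in (0, 2, 255):
        print(code, "->", describe_return_code(code))
```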
Thanks!