Hi All
Below is my Nagios architecture:
Machine A - Linux VM - manual Nagios installation + Mod-Gearman server
Machine B - Linux VM - manual Nagios installation + Mod-Gearman server
Machine C - Linux VM - Mod-Gearman worker
Machine D - Linux VM - MariaDB
Machines A and B are set up as master/slave using DRBD + Pacemaker, so at any given time only one of them is active.
Service checks through Mod-Gearman work fine no matter which machine is the Nagios master.
For host checks, I get the log entry below when the Nagios master is A. When the master is B, host checks work fine.
host check orphaned, is the mod-gearman worker on queue 'host' running?
I have checked the Mod-Gearman module.conf file on both servers and it looks fine. After enabling debug logging on the worker, I can see that the worker gets the check output in both cases. But somehow, when the master is A, the result never shows up on the dashboard.
Can anyone help here?
Host check not working with modgearman on 1 node
-
rajsshah86
- Posts: 5
- Joined: Wed Sep 12, 2018 4:46 am
-
swolf
Re: Host check not working with modgearman on 1 node
Hi @rajsshah86,
I would highly recommend contacting Linbit about this issue, as their support team will be better equipped to solve issues related to DRBD/pacemaker and to distributed monitoring in general. Regardless, there are a few things you can check:
- Can you run the host checks from Machine A manually? I'm guessing this will work fine, but if the plugin is timing out it might produce a similar error (though usually it will say that the plugin timed out instead)
- Did you check the worker.conf files? These might exist on Machine A/B as well as Machine C. If host checks are disabled on any of these machines, this might be the cause of your issues.
If your issues persist, please show output for gearman_top (or gearman_top2 if you're using mod gearman 2) for both machine A and machine B when each one is active.
-
rajsshah86
- Posts: 5
- Joined: Wed Sep 12, 2018 4:46 am
Re: Host check not working with modgearman on 1 node
- Can you run the host checks from Machine A manually? I'm guessing this will work fine, but if the plugin is timing out it might produce a similar error (though usually it will say that the plugin timed out instead)
Yes, host checks run directly from Machine A and Machine B work fine.
- Did you check the worker.conf files? These might exist on Machine A/B as well as Machine C. If host checks are disabled on any of these machines, this might be the cause of your issues.
Since the worker daemon is disabled and not running on Machines A and B, the file is not being used there. I still checked it on both, and the two copies are identical.
Machine C has the config below in worker.conf. I hope that is the correct way of writing it:
# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
server=MACHINE A:4730
server=MACHINE B:4730
# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>
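As a quick sanity check on a file like the one above, the server and dupserver lines can be verified mechanically. The following is only a minimal sketch: the function name check_server_lines and the host names are hypothetical, and it assumes every entry is a single host:port pair as in this file, which is stricter than what Mod-Gearman itself may accept.

```python
import re

# Matches "server=host:port" or "dupserver=host:port"; the port must be numeric.
# Assumption: one host:port pair per line, as in the worker.conf shown above.
ENTRY = re.compile(r"^(server|dupserver)=([^:\s]+):(\d+)$")

def check_server_lines(conf_text):
    """Return (valid, malformed) lists for the server/dupserver lines."""
    valid, malformed = [], []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Skip settings other than server/dupserver.
        if not line.startswith(("server=", "dupserver=")):
            continue
        match = ENTRY.match(line)
        if match:
            valid.append((match.group(1), match.group(2), int(match.group(3))))
        else:
            malformed.append(line)
    return valid, malformed

# Hypothetical host names, used only for illustration.
sample = """
server=nagios-a.example:4730
server=nagios-b.example::4730
#dupserver=<host>:<port>
"""
valid, malformed = check_server_lines(sample)
print("valid:", valid)
print("malformed:", malformed)
```

Any line reported as malformed is worth a second look before restarting the worker.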
Output of gearman_top2:
Machine A:
2019-02-28 15:01:10 - localhost:4730 - v0.33
Queue Name             | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------
check_results          |                1 |            0 |            0
eventhandler           |                6 |            0 |            0
host                   |                6 |            0 |            0
service                |                6 |            0 |            2
worker_weeus01plnagi03 |                1 |            0 |            0
-----------------------------------------------------------------------
Machine B:
2019-02-28 14:30:35 - localhost:4730 - v0.33
Queue Name             | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------
check_results          |                1 |            0 |            0
eventhandler           |               16 |            0 |            0
host                   |               16 |            0 |            0
service                |               16 |            0 |            7
worker_weeus01plnagi03 |                1 |            0 |            0
-----------------------------------------------------------------------
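For comparing the two job servers side by side, the same counters can be read from gearmand directly with its plain-text "status" admin command. The sketch below is an illustration only: the function names parse_status and fetch_status are hypothetical, the sample reply is built from the Machine A figures above, and it assumes that "Jobs Waiting" corresponds to the total column minus the running column.

```python
import socket

def parse_status(text):
    """Parse a gearmand 'status' reply into {queue: counters}.

    Each reply line is: name<TAB>total<TAB>running<TAB>available_workers,
    and a line containing only "." ends the reply.
    """
    queues = {}
    for line in text.splitlines():
        if line == ".":
            break
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # ignore anything that is not a status row
        name, total, running, workers = parts
        queues[name] = {
            "workers": int(workers),
            "running": int(running),
            # Assumption: waiting jobs = total queued jobs minus running jobs.
            "waiting": int(total) - int(running),
        }
    return queues

def fetch_status(host, port=4730, timeout=5.0):
    """Send 'status' to a gearmand admin port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not data.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", "replace")

# Sample reply built from the Machine A figures; no network access is needed.
sample_reply = "check_results\t0\t0\t1\nhost\t0\t0\t6\nservice\t2\t2\t6\n.\n"
status = parse_status(sample_reply)
for name, counters in status.items():
    print(name, counters)
```

Calling fetch_status against both job servers while each node is active, and passing the replies to parse_status, would show whether the host queue has workers registered on the node that reports orphaned checks.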
-
rajsshah86
- Posts: 5
- Joined: Wed Sep 12, 2018 4:46 am
Re: Host check not working with modgearman on 1 node
Hey @swolf
I think I figured out the issue, though the behaviour is strange. Note the server and dupserver parameters in worker.conf on Machine C.
When I stopped the Mod-Gearman job server on Machine B (even though it was the slave), the issue was resolved.
This suggests that even though Machine A was sending checks to Machine C, Machine C was returning the host check results to Machine B instead of A.
I would have expected the same behaviour for service checks, but that was not the case: service checks work fine regardless of which machine is the master.
Have you come across this issue before?
I think the best option for me is to put Machine A under the server parameter and Machine B under the dupserver parameter.
Anyway, thanks for the reply. This will definitely help someone in the future.
I also noticed that the timestamps in my debug logs are 1 hour behind my server time, which is strange. Any comments on that are welcome.
-
swolf
Re: Host check not working with modgearman on 1 node
Now that I've had more time to think about it, I do remember coming across an issue like this when we were testing Mod Gearman 3 (that is, under some specific circumstances host checks would be orphaned while service checks weren't). It wasn't fixed in v3.0.7, but it may be fixed in v3.0.8 - the changelog here shows that something related to orphaned checks was fixed, but I haven't had the chance to try it out yet.
There's one other thing I'd like you to check: on a server where you're having the orphaned host check issue, can you verify that service checks are actually coming in? If I remember correctly, we found an issue where services would show the last status for a returned check, but wouldn't update (so you might, for instance, see a check result that was over an hour old).
As far as your timezone issues: if you're based in Europe, it could be that logs are being written in UTC rather than being converted. Otherwise, I'm not sure what would be causing it.
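To illustrate the UTC explanation with numbers, here is a minimal sketch. It assumes the server runs on Central European Time (UTC+1 in winter); the actual timezone was not stated in the thread, and the timestamp is simply the one from the Machine A gearman_top2 output.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time is CET (UTC+1). A fixed offset is used
# so the example does not depend on the system's timezone database.
CET = timezone(timedelta(hours=1), "CET")

# Timestamp taken from the Machine A gearman_top2 output.
local_time = datetime(2019, 2, 28, 15, 1, 10, tzinfo=CET)

# The same instant written in UTC, as a log without conversion would show it.
utc_time = local_time.astimezone(timezone.utc)

print("server local time:", local_time.isoformat())
print("same instant, UTC:", utc_time.isoformat())
print("difference:", local_time.utcoffset())
```

Under that assumption the UTC rendering is exactly one hour behind the local clock, which matches the offset described for the debug logs.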
-
rajsshah86
- Posts: 5
- Joined: Wed Sep 12, 2018 4:46 am
Re: Host check not working with modgearman on 1 node
Services were being updated with the latest time. The issue was only with host checks. Thanks a lot for your response.
I might bother you with new questions (if I come across any).
-
swolf
Re: Host check not working with modgearman on 1 node
Alright, since everything seems to be okay, I'm going to lock this up now.
If any issues come up again, or if you have other concerns, feel free to create a new thread.