Monitoring Event Engine Queue
Hi,
We are seeing issues with the Monitoring Event Engine queue.
I am not seeing anything in the logs related to NDO or too many messages.
Even though I am not seeing the "NDOUtils - Message Queue Exceeded" error, I followed the recommendations in the link below:
https://support.nagios.com/kb/article.php?id=139
This is what I see:
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x50020080 327680 nagios 600 361473024 353001
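For reference, dividing used-bytes by messages in the ipcs -q output above gives the average size of each queued message. A minimal sketch, using the pasted row as sample input rather than a live queue (the variable names are only for illustration):

```shell
# Sample row copied from the `ipcs -q` output above; on a live system the
# row would come from running `ipcs -q` itself.
row='0x50020080 327680 nagios 600 361473024 353001'

# Columns: key msqid owner perms used-bytes messages
used_bytes=$(echo "$row" | awk '{ print $5 }')
messages=$(echo "$row" | awk '{ print $6 }')

# Integer average size of one queued message, in bytes.
avg_bytes=$(( used_bytes / messages ))
echo "average message size: ${avg_bytes} bytes"
```

For this row the result is 1024 bytes per message, so the 353001 queued messages account for roughly 345 MiB.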
I don't see issues in MySQL. The database is healthy, and we don't seem to have corruption in any of the tables or anything like that.
I do see jobs being processed: when I run gearman_top2, I see jobs running and being processed. I also checked the mod_gearman_worker logs, and I see jobs being processed there as well.
I have also tried disabling mod_gearman in nagios.cfg, but it didn't make a difference.
I have also tried restarting Nagios, but again it doesn't make any difference.
[1506450666] Nagios 4.2.4 starting... (PID=68547)
[1506450666] Local time is Tue Sep 26 14:31:06 EDT 2017
[1506450666] LOG VERSION: 2.0
[1506450666] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1506450666] qh: core query handler registered
[1506450666] nerd: Channel hostchecks registered successfully
[1506450666] nerd: Channel servicechecks registered successfully
[1506450666] nerd: Channel opathchecks registered successfully
[1506450666] nerd: Fully initialized and ready to rock!
[1506450666] wproc: Successfully registered manager as @wproc with query handler
[1506450666] wproc: Registry request: name=Core Worker 68549;pid=68549
[1506450666] wproc: Registry request: name=Core Worker 68550;pid=68550
[1506450666] wproc: Registry request: name=Core Worker 68551;pid=68551
[1506450666] wproc: Registry request: name=Core Worker 68552;pid=68552
[1506450666] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1506450666] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1506450666] ndomod: NDOMOD 2.1.2 (11-14-2016) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1506450666] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1506450666] ndomod registered for process data
[1506450666] ndomod registered for log data'
[1506450666] ndomod registered for system command data'
[1506450666] ndomod registered for event handler data'
[1506450666] ndomod registered for notification data'
[1506450666] ndomod registered for comment data'
[1506450666] ndomod registered for downtime data'
[1506450666] ndomod registered for flapping data'
[1506450666] ndomod registered for program status data'
[1506450666] ndomod registered for host status data'
[1506450666] ndomod registered for service status data'
[1506450666] ndomod registered for adaptive program data'
[1506450666] ndomod registered for adaptive host data'
[1506450666] ndomod registered for adaptive service data'
[1506450666] ndomod registered for external command data'
[1506450666] ndomod registered for aggregated status data'
[1506450666] ndomod registered for retention data'
[1506450666] ndomod registered for contact data'
[1506450666] ndomod registered for contact notification data'
[1506450666] ndomod registered for acknowledgement data'
[1506450666] ndomod registered for state change data'
[1506450666] ndomod registered for contact status data'
[1506450666] ndomod registered for adaptive contact data'
[1506450666] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Monitoring Event Engine Queue
bosecorp wrote: we are seeing issues with the Monitoring Event Engine queue.
What exactly is the issue you are experiencing?
Also, can you post the output of the following command:
ps -ef|grep bin/nagios
bosecorp wrote: I have also tried restarting nagios, but again it doesn't make any difference
Generally speaking, restarting Nagios makes the queue swell temporarily, as all the data is pushed to the DB to update it with what is running.
Is this a local or offloaded DB?
Re: Monitoring Event Engine Queue
Yes, my DB is offloaded.
The issue that I am experiencing is that I am not seeing any activity when I look at the Monitoring Event Engine Queue.
# ps -ef | grep nagios.cfg
nagios 91709 1 15 16:39 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 99507 8098 0 16:40 pts/1 00:00:00 grep --color=auto nagios.cfg
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html
# ps -ef|grep bin/nagios
nagios 91709 1 10 16:39 ? 00:00:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91711 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91712 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91713 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91714 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 106586 8098 0 16:42 pts/1 00:00:00 grep --color=auto bin/nagios
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html
scottwilkerson
Re: Monitoring Event Engine Queue
It's almost like there was a time shift on your server. Does the time on your DB server match the time on your XI server?
If not, we will want to get them synced to the same NTP server.
As an FYI, this queue is quite different from the system message queues referenced in https://support.nagios.com/kb/article.php?id=139
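The clock comparison suggested above can be sketched as follows. This is only an illustration: the two epoch values are placeholders standing in for `date +%s` run on each server at the same moment, and the 5-second tolerance is an assumed threshold, not a Nagios XI setting.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Placeholder readings; in practice, capture `date +%s` on each host.
xi_epoch=1506450666   # hypothetical reading from the XI server
db_epoch=1506450670   # hypothetical reading from the DB server
max_skew=5            # assumed tolerance in seconds

skew=$(clock_skew "$xi_epoch" "$db_epoch")
if [ "$skew" -le "$max_skew" ]; then
    echo "clocks agree within ${max_skew}s (skew=${skew}s)"
else
    echo "clocks differ by ${skew}s; sync both hosts to the same NTP server"
fi
```

Taking the absolute value means it does not matter which of the two servers is ahead.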
Re: Monitoring Event Engine Queue
How can you tell that?
Can you elaborate on the queue used for the Monitoring Event Engine queue? What queue is used for that?
scottwilkerson
Re: Monitoring Event Engine Queue
bosecorp wrote: How can you tell that?
Can you elaborate on the queue used for the Monitoring Event Engine queue? What queue is used for that?
I know specifically because I developed the code.
This queue holds events that need to be processed, such as email notifications and state changes. They are stored in the nagiosxi database and are processed by the eventman cron job.
Re: Monitoring Event Engine Queue
Interesting, thanks for the explanation.
So I checked the time and everything looks OK on both the XI server and the DB server.
I enabled debugging on ndo2db. I see data going into the database, but the Monitoring Event Engine queue is still blank.
If the events go to the nagiosxi database, should I look in the PostgreSQL log files and see if I find anything there?
scottwilkerson
Re: Monitoring Event Engine Queue
bosecorp wrote: If the events go to the nagiosxi database, should I look in the PostgreSQL log files and see if I find anything there?
Yes, this would make sense. You may also want to vacuum your Postgres DB:
echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
echo "vacuum;vacuum analyse;vacuum full;"|psql postgres postgres
Re: Monitoring Event Engine Queue
It didn't help.
scottwilkerson
Re: Monitoring Event Engine Queue
Is your "Monitoring Engine Check Statistics" always all zeros too?
What version of XI are you running?