Monitoring Event Engine Queue

bosecorp · Post by **bosecorp** » Tue Sep 26, 2017 1:28 pm

Hi,

We are seeing issues with the Monitoring Event Engine queue.

I am not seeing anything in the logs related to ndo or too many messages.

Even though I am not seeing the "NDOUtils - Message Queue Exceeded" error, I followed the recommendations in the link below:

https://support.nagios.com/kb/article.php?id=139

This is what I see

# ipcs -q

------ Message Queues --------
key msqid owner perms used-bytes messages
0x50020080 327680 nagios 600 361473024 353001
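
For readers comparing their own numbers, here is a minimal sketch that condenses `ipcs -q` output into one line per queue. The function name `summarize_queues` is made up for illustration; it assumes the six-column layout shown above (key, msqid, owner, perms, used-bytes, messages).

```shell
# Summarize `ipcs -q` output read from stdin: one line per queue with
# msqid, message count, MiB used, and average bytes per message.
# Assumes the six-column layout shown in the post above.
summarize_queues() {
    awk '$1 ~ /^0x/ {
        avg = ($6 > 0) ? $5 / $6 : 0
        printf "msqid=%s messages=%d used_mib=%.1f avg_bytes=%.0f\n", $2, $6, $5 / 1048576, avg
    }'
}

# Usage:
#   ipcs -q | summarize_queues
```

On the figures posted above this works out to roughly 345 MiB across 353001 messages, about 1 KiB each.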

I don't see issues in MySQL. The database is healthy, and we don't seem to have corruption in any of the tables or anything like that.

I do see jobs being processed: when I run gearman_top2 I see jobs running and being processed. I also checked the mod_gearman_worker logs and I see jobs being processed there as well.

I have also tried disabling mod_gearman in nagios.cfg, but it didn't make a difference.

I have also tried restarting nagios, but again it doesn't make any difference.

[1506450666] Nagios 4.2.4 starting... (PID=68547)
[1506450666] Local time is Tue Sep 26 14:31:06 EDT 2017
[1506450666] LOG VERSION: 2.0
[1506450666] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1506450666] qh: core query handler registered
[1506450666] nerd: Channel hostchecks registered successfully
[1506450666] nerd: Channel servicechecks registered successfully
[1506450666] nerd: Channel opathchecks registered successfully
[1506450666] nerd: Fully initialized and ready to rock!
[1506450666] wproc: Successfully registered manager as @wproc with query handler
[1506450666] wproc: Registry request: name=Core Worker 68549;pid=68549
[1506450666] wproc: Registry request: name=Core Worker 68550;pid=68550
[1506450666] wproc: Registry request: name=Core Worker 68551;pid=68551
[1506450666] wproc: Registry request: name=Core Worker 68552;pid=68552
[1506450666] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1506450666] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1506450666] ndomod: NDOMOD 2.1.2 (11-14-2016) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1506450666] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1506450666] ndomod registered for process data
[1506450666] ndomod registered for log data'
[1506450666] ndomod registered for system command data'
[1506450666] ndomod registered for event handler data'
[1506450666] ndomod registered for notification data'
[1506450666] ndomod registered for comment data'
[1506450666] ndomod registered for downtime data'
[1506450666] ndomod registered for flapping data'
[1506450666] ndomod registered for program status data'
[1506450666] ndomod registered for host status data'
[1506450666] ndomod registered for service status data'
[1506450666] ndomod registered for adaptive program data'
[1506450666] ndomod registered for adaptive host data'
[1506450666] ndomod registered for adaptive service data'
[1506450666] ndomod registered for external command data'
[1506450666] ndomod registered for aggregated status data'
[1506450666] ndomod registered for retention data'
[1506450666] ndomod registered for contact data'
[1506450666] ndomod registered for contact notification data'
[1506450666] ndomod registered for acknowledgement data'
[1506450666] ndomod registered for state change data'
[1506450666] ndomod registered for contact status data'
[1506450666] ndomod registered for adaptive contact data'
[1506450666] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.

scottwilkerson · Post by **scottwilkerson** » Tue Sep 26, 2017 3:05 pm

bosecorp wrote: we are seeing issues with the Monitoring Event Engine queue.

What exactly is the issue you are experiencing?

Also, can you post the output of the following command:

ps -ef|grep bin/nagios

bosecorp wrote: I have also tried restarting nagios, but again it doesn't make any difference

Generally speaking, restarting nagios makes the queue swell temporarily, as all the data is pushed to the DB to update it with what is running.

Is this a local or offloaded DB?

bosecorp · Post by **bosecorp** » Tue Sep 26, 2017 3:42 pm

Yes, my DB is offloaded.

The issue I am experiencing is that I am not seeing any activity when I look at the Monitoring Event Engine Queue.

# ps -ef | grep nagios.cfg
nagios 91709 1 15 16:39 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 99507 8098 0 16:40 pts/1 00:00:00 grep --color=auto nagios.cfg
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html
# ps -ef|grep bin/nagios
nagios 91709 1 10 16:39 ? 00:00:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91711 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91712 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91713 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91714 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 106586 8098 0 16:42 pts/1 00:00:00 grep --color=auto bin/nagios
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html

scottwilkerson · Post by **scottwilkerson** » Tue Sep 26, 2017 4:58 pm

It's almost as if there was a time shift on your server. Does the time on your DB server match the time on your XI server?

If not, we will want to get them synced to the same NTP server.
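
A quick way to compare the two clocks is to take an epoch timestamp on each host and look at the difference. This is only a sketch: `clock_skew` is a made-up helper name, `db.example.com` is a placeholder for the offloaded DB host, and it assumes SSH access from the XI server to the DB server.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Usage (db.example.com is a placeholder hostname, not from this thread):
#   clock_skew "$(date -u +%s)" "$(ssh db.example.com date -u +%s)"
```

The SSH round trip adds a little latency, so a difference of a second or two does not by itself indicate a problem.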

As an FYI, this queue is quite different from the system message queues referenced in https://support.nagios.com/kb/article.php?id=139

bosecorp · Post by **bosecorp** » Wed Sep 27, 2017 7:48 am

How can you tell that?

Can you elaborate on the queue used for the Monitoring Event Engine Queue? What queue is used for that?

scottwilkerson · Post by **scottwilkerson** » Wed Sep 27, 2017 11:31 am

bosecorp wrote: how can you tell that?

can you elaborate on the queue used for the monitoring event engine queue? what queue is used for that

I know specifically because I've developed the code.

This queue holds events that need to be processed, such as email notifications and state changes. They are stored in the nagiosxi database and are processed by the eventman cron job.
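
Since the queue depends on the eventman cron job, one thing worth confirming is that the job is scheduled and has written to its log recently. The sketch below assumes the usual XI defaults for the cron file and log path; neither path is confirmed in this thread, and the helper names are made up, so adjust them for your install. `stat -c` is the GNU form.

```shell
# Assumed default XI locations (not confirmed in this thread); override as needed.
CRON_FILE=${CRON_FILE:-/etc/cron.d/nagiosxi}
EVENTMAN_LOG=${EVENTMAN_LOG:-/usr/local/nagiosxi/var/eventman.log}

# Print the cron entries that mention eventman, or a note if none are found.
eventman_cron() {
    grep eventman "$1" 2>/dev/null || echo "no eventman entry in $1"
}

# Print how many seconds ago a file was last modified (GNU stat).
log_age() {
    echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}

# Usage:
#   eventman_cron "$CRON_FILE"
#   log_age "$EVENTMAN_LOG"
```

If the log has not been touched for much longer than the cron interval, the job may not be running at all.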

bosecorp · Post by **bosecorp** » Wed Sep 27, 2017 12:10 pm

Interesting, thanks for the explanation.

So I checked the time and everything looks OK on both the XI server and the DB server.

I enabled debugging on ndo2db. I see data going into the database, but the Monitoring Event Engine queue is still blank.

If the events go to the nagiosxi database, should I look in the PostgreSQL log files and see if I find anything there?

scottwilkerson · Post by **scottwilkerson** » Wed Sep 27, 2017 1:50 pm

bosecorp wrote: if the events go to the nagiosxi database, should look in the Postgresql logs files and see if I find anything there

Yes, this would make sense. You may also want to vacuum your Postgres DB:

echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
echo "vacuum;vacuum analyse;vacuum full;"|psql postgres postgres

bosecorp · Post by **bosecorp** » Thu Sep 28, 2017 9:17 am

It didn't help.

scottwilkerson · Post by **scottwilkerson** » Thu Sep 28, 2017 10:23 am

Is your "Monitoring Engine Check Statistics" always all zeros too?

What version of XI are you running?
