Monitoring Event Engine Queue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Monitoring Event Engine Queue

Post by bosecorp »

Hi,

We are seeing issues with the Monitoring Event Engine queue.

I am not seeing anything in the logs related to NDO or to too many messages.

Even though I am not seeing the "NDOUtils - Message Queue Exceeded" error, I followed the recommendations in the link below:

https://support.nagios.com/kb/article.php?id=139


This is what I see

# ipcs -q

------ Message Queues --------
key msqid owner perms used-bytes messages
0x50020080 327680 nagios 600 361473024 353001
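
For readers who want to track this kernel queue over time, the `ipcs -q` output above can be condensed with a small helper. This is only a sketch: the function name `summarize_queues` is made up for illustration, and it assumes the Linux column layout shown above (key, msqid, owner, perms, used-bytes, messages). It only looks at the System V message queue, not at anything inside Nagios XI itself.

```shell
#!/bin/sh
# summarize_queues (hypothetical helper): read `ipcs -q` output on stdin and
# print one summary line per message queue.
# Assumed column layout: key msqid owner perms used-bytes messages
summarize_queues() {
    awk '
        # Data rows start with a hex key such as 0x50020080; headers do not.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 {
            # Average message size, guarding against an empty queue.
            avg = ($6 > 0) ? int($5 / $6) : 0
            printf "msqid=%s owner=%s messages=%s used_bytes=%s avg_bytes=%d\n", $2, $3, $6, $5, avg
        }
    '
}
```

Typical use would be `ipcs -q | summarize_queues`, run a few times in a row to see whether the message count is draining or growing.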

I don't see issues in MySQL. The database is healthy, and we don't seem to have corruption in any of the tables or anything like that.

I do see jobs being processed: when I run gearman_top2 I see jobs running and being processed. I also checked the mod_gearman_worker logs and I see jobs being processed there as well.

I have also tried disabling mod_gearman in nagios.cfg, but it didn't make a difference.

I have also tried restarting Nagios, but again it doesn't make any difference.

[1506450666] Nagios 4.2.4 starting... (PID=68547)
[1506450666] Local time is Tue Sep 26 14:31:06 EDT 2017
[1506450666] LOG VERSION: 2.0
[1506450666] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1506450666] qh: core query handler registered
[1506450666] nerd: Channel hostchecks registered successfully
[1506450666] nerd: Channel servicechecks registered successfully
[1506450666] nerd: Channel opathchecks registered successfully
[1506450666] nerd: Fully initialized and ready to rock!
[1506450666] wproc: Successfully registered manager as @wproc with query handler
[1506450666] wproc: Registry request: name=Core Worker 68549;pid=68549
[1506450666] wproc: Registry request: name=Core Worker 68550;pid=68550
[1506450666] wproc: Registry request: name=Core Worker 68551;pid=68551
[1506450666] wproc: Registry request: name=Core Worker 68552;pid=68552
[1506450666] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1506450666] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1506450666] ndomod: NDOMOD 2.1.2 (11-14-2016) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1506450666] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1506450666] ndomod registered for process data
[1506450666] ndomod registered for log data'
[1506450666] ndomod registered for system command data'
[1506450666] ndomod registered for event handler data'
[1506450666] ndomod registered for notification data'
[1506450666] ndomod registered for comment data'
[1506450666] ndomod registered for downtime data'
[1506450666] ndomod registered for flapping data'
[1506450666] ndomod registered for program status data'
[1506450666] ndomod registered for host status data'
[1506450666] ndomod registered for service status data'
[1506450666] ndomod registered for adaptive program data'
[1506450666] ndomod registered for adaptive host data'
[1506450666] ndomod registered for adaptive service data'
[1506450666] ndomod registered for external command data'
[1506450666] ndomod registered for aggregated status data'
[1506450666] ndomod registered for retention data'
[1506450666] ndomod registered for contact data'
[1506450666] ndomod registered for contact notification data'
[1506450666] ndomod registered for acknowledgement data'
[1506450666] ndomod registered for state change data'
[1506450666] ndomod registered for contact status data'
[1506450666] ndomod registered for adaptive contact data'
[1506450666] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Event Engine Queue

Post by scottwilkerson »

bosecorp wrote:we are seeing issues with the Monitoring Event Engine queue.
What exactly is the issue you are experiencing?

Also, can you post the output of the following

ps -ef|grep bin/nagios

bosecorp wrote:I have also tried restarting nagios, but again it doesn't make any difference
Generally speaking, restarting Nagios makes the queue swell temporarily as all the data is pushed to the DB to update it with what is running.

Is this a local or offloaded DB?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Monitoring Event Engine Queue

Post by bosecorp »

Yes, my DB is offloaded.

The issue that I am experiencing is that I am not seeing any activity when I look at the Monitoring Event Engine Queue.

# ps -ef | grep nagios.cfg
nagios 91709 1 15 16:39 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 99507 8098 0 16:40 pts/1 00:00:00 grep --color=auto nagios.cfg
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html
# ps -ef|grep bin/nagios
nagios 91709 1 10 16:39 ? 00:00:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 91711 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91712 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91713 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91714 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 91774 91709 0 16:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 106586 8098 0 16:42 pts/1 00:00:00 grep --color=auto bin/nagios
[email protected]:(09-26 13:17): /usr/local/nagiosxi/html
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Event Engine Queue

Post by scottwilkerson »

It's almost like there was a time shift on your server. Does the time on your DB server match the time on your XI server?

If not, we will want to get them synced to the same NTP server.
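
A quick way to compare the two clocks is to take a Unix timestamp on each host and look at the difference. The sketch below is illustrative only: `clock_skew` and `skew_ok` are made-up helper names, and the 5-second default tolerance is an arbitrary assumption, not a Nagios XI requirement.

```shell
#!/bin/sh
# clock_skew A B (hypothetical helper): print the absolute difference in
# seconds between two Unix timestamps.
clock_skew() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# skew_ok A B [MAX] (hypothetical helper): succeed when the two timestamps
# are within MAX seconds of each other. MAX defaults to 5, an arbitrary value.
skew_ok() {
    max=${3:-5}
    [ "$(clock_skew "$1" "$2")" -le "$max" ]
}
```

For example, `clock_skew "$(date +%s)" "$(ssh dbhost date +%s)"` would print the skew in seconds, where `dbhost` is a placeholder for the offloaded DB server. The SSH round trip adds a little latency, so a result of a second or two does not by itself indicate a problem.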

As an FYI, this queue is quite different from the system message queues referenced in https://support.nagios.com/kb/article.php?id=139
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Monitoring Event Engine Queue

Post by bosecorp »

How can you tell that?

Can you elaborate on the queue used for the Monitoring Event Engine queue? What queue is used for that?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Event Engine Queue

Post by scottwilkerson »

bosecorp wrote:How can you tell that?

Can you elaborate on the queue used for the Monitoring Event Engine queue? What queue is used for that?
I know specifically because I developed the code.

This queue holds events that need to be processed, such as email notifications and state changes. They are stored in the nagiosxi database and are processed by the eventman cron job.
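
Since the queue is drained by a cron job, one rough health check is whether that job has written to its log recently. The sketch below is an assumption-heavy illustration: `file_age_seconds` and `is_stale` are made-up helpers, they rely on GNU `stat -c %Y` (typical on Linux), and the log path in the usage note is an assumed location that may differ on your install.

```shell
#!/bin/sh
# file_age_seconds PATH (hypothetical helper): print the number of seconds
# since PATH was last modified. Requires GNU stat.
file_age_seconds() {
    mtime=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}

# is_stale PATH MAX (hypothetical helper): succeed when PATH is missing or
# was last modified more than MAX seconds ago.
is_stale() {
    age=$(file_age_seconds "$1" 2>/dev/null) || return 0
    [ "$age" -gt "$2" ]
}
```

Something like `is_stale /usr/local/nagiosxi/var/eventman.log 300 && echo "eventman log has not been updated recently"` could then flag a stalled job, assuming that is where eventman logs on your system. A fresh log only shows that the job is running, not that events are actually reaching the nagiosxi database.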
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Monitoring Event Engine Queue

Post by bosecorp »

Interesting, thanks for the explanation.

So I checked the time and everything looks OK on both the XI server and the DB server.

I enabled debugging on ndo2db. I see data going into the database, but the Monitoring Event Engine queue is still blank.

If the events go to the nagiosxi database, should I look in the PostgreSQL log files and see if I find anything there?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Event Engine Queue

Post by scottwilkerson »

bosecorp wrote:If the events go to the nagiosxi database, should I look in the PostgreSQL log files and see if I find anything there?
Yes, this would make sense. You may also want to vacuum your Postgres DB:

echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
echo "vacuum;vacuum analyse;vacuum full;"|psql postgres postgres
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Monitoring Event Engine Queue

Post by bosecorp »

It didn't help.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Event Engine Queue

Post by scottwilkerson »

Is your "Monitoring Engine Check Statistics" always all zeros too?

What version of XI are you running?