
Notifications although in scheduled downtime

Posted: Mon Nov 21, 2016 6:49 am
by sib
Hi

We currently sometimes get notifications for services that are in scheduled downtime, which puzzles me. The GUI also does not always show that a service is in scheduled downtime.

We use Nagios XI 5.3.2

I added the host and all its services to scheduled downtime using the command file.

Code:

[1479728291] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;lbnss22;1479728291;1479731891;1;0;3600;ch002854;Schedule Downtime
[1479728291] EXTERNAL COMMAND: SCHEDULE_HOST_SVC_DOWNTIME;lbnss22;1479728291;1479731891;1;0;3600;ch002854;Scheduled Downtime
[1479728291] HOST DOWNTIME ALERT: lbnss22;STARTED; Host has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Worker choked;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Total Service Problems;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Total Host Problems;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveServiceChecks 5min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveServiceChecks 1min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveServiceChecks 15min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveHostChecks 5min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveHostChecks 1min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;PassiveHostChecks 15min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;NRPE Service;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Local Filesystems;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Load Average;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;HighCommandBufferUsage;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;FS corruption;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ExternalCommandsUsed 5min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ExternalCommandsUsed 1min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Check Nagios command file;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Check Nagios API Service;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;Check Linux API Service;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgServiceExecTime;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgPassiveHostLatency;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgHostExecTime;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgCommandBufferUsage;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgActiveServiceLatency;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;AvgActiveHostLatency;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveServiceChecks 5min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveServiceChecks 1min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveServiceChecks 15min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveHostChecks 5min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveHostChecks 1min;STARTED; Service has entered a period of scheduled downtime
[1479728291] SERVICE DOWNTIME ALERT: lbnss22;ActiveHostChecks 15min;STARTED; Service has entered a period of scheduled downtime
The log file states that the checks have entered a period of scheduled downtime.
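
For reference, the two external commands logged above could be submitted with a small shell helper along these lines. This is only a sketch: the function names are made up for illustration, and the command file path /usr/local/nagios/var/rw/nagios.cmd is the usual default and is assumed here, not taken from this system.

```shell
#!/bin/sh
# Sketch: build and submit a fixed downtime for a host and all of its services.
# CMD_FILE is an assumed default path; adjust it to your installation.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# build_downtime_cmd COMMAND HOST START DURATION AUTHOR COMMENT
# Prints one external command line with fixed=1, trigger_id=0, end=START+DURATION.
build_downtime_cmd() {
    cmd=$1 host=$2 start=$3 duration=$4 author=$5 comment=$6
    end=$((start + duration))
    printf '[%s] %s;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$start" "$cmd" "$host" "$start" "$end" "$duration" "$author" "$comment"
}

# schedule_host_and_services HOST DURATION AUTHOR COMMENT
# Writes both commands to the command file (a named pipe) if it is present.
schedule_host_and_services() {
    now=$(date +%s)
    if [ ! -p "$CMD_FILE" ]; then
        echo "command file $CMD_FILE not found" >&2
        return 1
    fi
    build_downtime_cmd SCHEDULE_HOST_DOWNTIME "$1" "$now" "$2" "$3" "$4" > "$CMD_FILE"
    build_downtime_cmd SCHEDULE_HOST_SVC_DOWNTIME "$1" "$now" "$2" "$3" "$4" > "$CMD_FILE"
}
```

A one-hour downtime like the one in the log would then be requested with `schedule_host_and_services lbnss22 3600 ch002854 "Scheduled Downtime"`.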

But I still get alerted:

Code:

[1479728579] SERVICE ALERT: lbnss22;Check Nagios API Service;CRITICAL;HARD;1;CRITICAL: Status CRITICAL.URLError[[Errno 111] Connection refused], url:http://10.32.30.22:8080
[1479728579] SERVICE NOTIFICATION: Linux Team E-Mail;lbnss22;Check Nagios API Service;CRITICAL;notify-service-by-email;CRITICAL: Status CRITICAL.URLError[[Errno 111] Connection refused], url:http://10.32.30.22:8080
[1479728579] SERVICE NOTIFICATION: Linux Team SMS;lbnss22;Check Nagios API Service;CRITICAL;notify-service-by-inConsole-group;CRITICAL: Status CRITICAL.URLError[[Errno 111] Connection refused], url:http://10.32.30.22:8080
Also, in the GUI the downtime icon is sometimes there and sometimes not. In the screenshot, all services should have the scheduled downtime icon.
downtime.PNG
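
To double-check that the notification really fell inside the downtime window, the epoch timestamps from the two log excerpts can be compared directly. A minimal sketch (the function name is hypothetical, and treating both ends of the window as inclusive is an assumption):

```shell
# in_downtime_window TS START END
# Succeeds (exit 0) if START <= TS <= END; all values are epoch seconds.
in_downtime_window() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Values taken from the log excerpts above.
alert=1479728579
start=1479728291
end=1479731891

if in_downtime_window "$alert" "$start" "$end"; then
    echo "alert is $((alert - start))s into the downtime window"
else
    echo "alert is outside the downtime window"
fi
```

With these values the alert is 288 seconds after the start of a 3600 second window, so the notification should have been suppressed.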

Re: Notifications although in scheduled downtime

Posted: Mon Nov 21, 2016 10:43 am
by rkennedy
Could you PM a profile over? This will have quite a few files to look at that should have some interesting information. (Admin -> System Profile -> Download Profile)

Also, a few things to verify -
1. Can you show us a screenshot of the scheduled downtime on the XI interface?
2. Is it applying properly in the Core interface? Could you post a screenshot of the Core interface - http://ip.of.nagios/nagios/ (replace ip.of.nagios with the IP/hostname of the machine) - then login with an admin account.
3. Please run the following commands and post the output, to verify the timezone set on your XI machine.

Code:

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
date
mysql -unagiosxi -pn@gweb -e "SELECT NOW();"
EDIT: profile received.

Re: Notifications although in scheduled downtime

Posted: Mon Nov 21, 2016 11:07 am
by sib
rkennedy wrote:Could you PM a profile over? This will have quite a few files to look at that should have some interesting information. (Admin -> System Profile -> Download Profile)

Also, a few things to verify -
1. Can you show us a screenshot of the scheduled downtime on the XI interface?
2. Is it applying properly in the Core interface? Could you post a screenshot of the Core interface - http://ip.of.nagios/nagios/ (replace ip.of.nagios with the IP/hostname of the machine) - then login with an admin account.
3. Please run the following commands and post the output, to verify the timezone set on your XI machine.

Code:

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
date
mysql -unagiosxi -pn@gweb -e "SELECT NOW();"
1.
scheduled_downtime.PNG
2.
The URL does not exist.

3.
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = Europe/Zurich

[root][lbnss22][/usr/local/nagios/libexec][git::master][0]
# ls -l /etc/localtime
lrwxrwxrwx 1 root root 33 Mar 2 2015 /etc/localtime -> /usr/share/zoneinfo/Europe/Zurich

[root][lbnss22][/usr/local/nagios/libexec][git::master][0]
# php -r 'echo date("D M j G:i:s T Y")."\n";'
No log handling enabled - turning on stderr logging
/usr/local/nagioslogserver/mibs/NAGIOS-ROOT-MIB.txt: No such file or directory
/usr/local/nagioslogserver/mibs/NAGIOS-NOTIFY-MIB.txt: No such file or directory
/usr/local/nagioslogserver/mibs/NAGIOS-ROOT-MIB.txt: No such file or directory
/usr/local/nagioslogserver/mibs/NAGIOS-NOTIFY-MIB.txt: No such file or directory
Mon Nov 21 17:05:54 CET 2016

[root][lbnss22][/usr/local/nagios/libexec][git::master][0]
# date
Mon Nov 21 17:05:55 CET 2016

[root][lbnss22][/usr/local/nagios/libexec][git::master][0]
# mysql -unagiosxi -pn@gweb -e "SELECT NOW();"
ERROR 1045 (28000): Access denied for user 'nagiosxi'@'localhost' (using password: YES)

[root][lbnss22][/usr/local/nagios/libexec][git::master][1]
# mysql -unagiosql -pn@gweb -e "SELECT NOW();"
+---------------------+
| NOW() |
+---------------------+
| 2016-11-21 17:06:29 |
+---------------------+

Re: Notifications although in scheduled downtime

Posted: Mon Nov 21, 2016 4:58 pm
by ssax
What is the output of these commands?

Code:

ipcs -q
ps aux | grep nagios.cfg
Thank you

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 2:35 am
by sib
Hi

Code:

# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xc1010002 1507328    nagios     600        0            0
0x01010002 1736705    nagios     600        0            0

Code:

# ps aux | grep nagios.cfg
nagios    4842  0.3  0.0  23524  3728 ?        Ss   Nov17  28:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    5823  0.0  0.0  22884   576 ?        S    Nov17   0:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   16854  0.3  0.0  23552  6216 ?        Ss   08:01   0:08 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   17046  0.0  0.0  22908  3040 ?        S    08:01   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     24737  0.0  0.0 103332   920 pts/0    S+   08:35   0:00 grep nagios.cfg

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 3:08 am
by sib
Hi

The issue seems to be fixed. Because of some ghost services, I removed the object cache file and rebooted the whole server.

Looking good so far.

I moved the following file:

Code:

mv /usr/local/nagios/var/objects.cache /usr/local/nagios/var/objects.cache.old

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 10:17 am
by ssax
Glad you're up and running!

Regarding what you posted before it was resolved: you should only have one message queue for nagios, like below:

Code:

[root@ssc66xid ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x63000002 98304      nagios     600        0            0
If you have more than one, you would need to perform the following steps to fix the message queues (the reboot likely fixed it):

Code:

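# Stop Nagios and kill any leftover daemon processes
service nagios stop
killall -9 nagios
service ndo2db stop
service mysqld restart
# Remove every message queue listed for nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
service ndo2db start
service nagios start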
And you should only have two nagios daemon processes; any more and you have a problem:

Code:

[root@ssc66xid ~]# ps aux | grep nagios.cfg | grep -v grep
nagios    8762  0.0  0.1  21224  2300 ?        Ss   Nov21   0:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8778  0.0  0.0  20580  1248 ?        S    Nov21   0:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
The service nagios stop and the killall -9 nagios above would fix the multiple nagios daemon processes issue.
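
The two checks above (one message queue, two daemon processes) could be wrapped in a quick script, roughly as follows. This is a sketch under the assumption that `ipcs -q` and `ps aux` print the column layout shown in this thread; the function names are made up for illustration.

```shell
# count_nagios_queues: reads `ipcs -q` output on stdin and prints the number
# of message queues whose owner column is "nagios".
count_nagios_queues() {
    awk '$3 == "nagios" { n++ } END { print n + 0 }'
}

# count_nagios_daemons: reads `ps aux` output on stdin and prints the number
# of processes started as ".../nagios -d .../nagios.cfg".
count_nagios_daemons() {
    awk '/\/nagios -d .*nagios\.cfg/ { n++ } END { print n + 0 }'
}

# check_nagios_health: compares the live counts against the expected values
# mentioned above (1 queue, 2 daemon processes); fails if either is exceeded.
check_nagios_health() {
    queues=$(ipcs -q | count_nagios_queues)
    daemons=$(ps aux | count_nagios_daemons)
    echo "message queues: $queues (expected 1)"
    echo "daemon processes: $daemons (expected 2)"
    [ "$queues" -le 1 ] && [ "$daemons" -le 2 ]
}
```

Running `check_nagios_health` on the Nagios server prints both counts and returns a non-zero exit status when either exceeds the expected value. Fed the output posted earlier in this thread, the counters would report 2 queues and 4 daemon processes.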

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 10:24 am
by sib
hi

that would explain why a reboot helped.

thanks

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 12:45 pm
by avandemore
Great, it seems your issue is resolved. Are we okay to lock this thread?

Re: Notifications although in scheduled downtime

Posted: Tue Nov 22, 2016 3:38 pm
by sib
yes. thank you