Nagios Support Forum

Posted: **Mon Jul 31, 2017 1:24 am**

Hi Team,

We are facing a problem with maintenance mode in Nagios. Yesterday I put all the devices in the NA region into maintenance mode, and it seems to have overridden the existing downtime on all hosts/services that had been under maintenance mode (MM) for one month or longer (Windows and AIX servers that were to be decommissioned).

We are now receiving incidents from Nagios for those hosts. They had been under maintenance for a month or more, but were removed from their old downtime by last night's downtime activity. Because of this, availability is being affected and the issue has been escalated badly.

Could you please help us resolve the issue on priority?

Posted: **Mon Jul 31, 2017 1:41 am**

Also, there are a few servers for which downtime ends prematurely and a host down alert is sent.

Nagios version-Nagios XI 5.4.4

Posted: **Mon Jul 31, 2017 9:09 am**

Did you use a regular scheduled downtime, or a recurring downtime?

How were your Windows / AIX servers scheduled in downtime?

My guess is they'll need to reproduce this.
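If it helps with reproducing this, the overlapping windows could be submitted as Nagios Core external commands rather than through the UI. A minimal sketch, assuming the standard `SCHEDULE_HOST_DOWNTIME` external command format; the helper name `downtime_cmd` and the host `web01` are hypothetical, and the command file path in the usage note is only the default location (check `command_file` in nagios.cfg):

```shell
# downtime_cmd HOST START END AUTHOR COMMENT
# Prints a SCHEDULE_HOST_DOWNTIME external command for a fixed window.
# START and END are epoch seconds. fixed=1, trigger_id=0 (not triggered),
# and duration is END - START (only relevant for flexible downtime).
downtime_cmd() {
    host=$1; start=$2; end=$3; author=$4; comment=$5
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$(date +%s)" "$host" "$start" "$end" "$((end - start))" "$author" "$comment"
}
```

For example, with `now=$(date +%s)`, running `downtime_cmd web01 "$now" "$((now + 7200))" admin "network activity" > /usr/local/nagios/var/rw/nagios.cmd` would request a 2-hour fixed window on a host that already has a longer one.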

Posted: **Mon Jul 31, 2017 10:06 am**

raamardhani7, can you describe in detail all of the steps you took to place your devices in maintenance mode, prior to encountering the issue? Screenshots of the previously scheduled and/or recurring downtimes would be helpful, along with relevant logs. We will try to recreate the issue in-house.
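Alongside screenshots, a text dump of the downtime entries Nagios currently holds may be useful. A minimal sketch, assuming downtime entries appear in the status file as `hostdowntime` / `servicedowntime` blocks with `key=value` lines; `list_downtimes` is a hypothetical helper, and the status file path depends on `status_file` in nagios.cfg:

```shell
# list_downtimes FILE: print one line per downtime block in a Nagios status file.
# Output: type;host_name;service_description;start_time;end_time ("-" if absent).
list_downtimes() {
    awk '
        /^[ \t]*(hostdowntime|servicedowntime)[ \t]*\{/ {
            inblock = 1
            type = $1
            host = "-"; svc = "-"; start = "-"; stop = "-"
            next
        }
        inblock && /^[ \t]*\}/ {
            print type ";" host ";" svc ";" start ";" stop
            inblock = 0
            next
        }
        inblock {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "start_time") start = val
            else if (key == "end_time") stop = val
        }
    ' "$1"
}
```

Running `list_downtimes /usr/local/nagios/var/status.dat` (path assumed) before and after scheduling the short window would show whether the long-running entries disappear.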

Posted: **Fri Nov 03, 2017 8:55 am**

Hi,

If a device has been kept in maintenance mode for a month, and there is then an activity for which multiple devices have to be put in maintenance mode for 2 hours, the previous one-month maintenance period gets canceled/overridden.

We have been facing this issue since June. Earlier, the host kept the longest of the scheduled windows.
For example, if a host is put in maintenance mode today for a week, and 2 days later someone puts the same server in maintenance mode for 15 days, the host stays in maintenance mode for 17 days (2 + 15).
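The earlier behaviour described here (overlapping windows combining so the host stays in downtime until the latest end) can be modelled as an interval merge. This is only a sketch of the expected arithmetic, not how Nagios implements downtime; `merge_windows` is a hypothetical helper and the numbers are days counted from today:

```shell
# merge_windows: read "start end" pairs (one per line) on stdin and print
# the merged, non-overlapping windows in start order.
merge_windows() {
    sort -n -k1,1 | awk '
        NR == 1 { s = $1; e = $2; next }
        $1 <= e { if ($2 > e) e = $2; next }   # overlaps current window: extend it
        { print s, e; s = $1; e = $2 }          # gap: emit current window, start a new one
        END { if (NR > 0) print s, e }
    '
}
```

With the example above, `printf '0 7\n2 17\n' | merge_windows` prints `0 17`: one window covering all 17 days.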

We have 3 Nagios XI servers, but only this one has the issue. It mostly happens whenever there is network activity and all the servers configured in that Nagios XI are put in maintenance mode.
Also, one more finding: during such activity our Nagios XI disk space reaches 100% and the database crashes. The event handler is filling rapidly which causes the disk space to reach 100%. Not sure whether this has something to do with servers coming out of scheduled downtime in Nagios.
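To narrow down what is actually filling the disk during such an activity, the largest directories under a suspect path can be listed. A minimal sketch, assuming GNU `du` (for `--max-depth`); `largest_dirs` is a hypothetical helper and the path in the usage note is only a guess at where to start:

```shell
# largest_dirs DIR [N]: print the N largest entries (in KiB) one level below DIR,
# including the total for DIR itself, staying on the same filesystem.
largest_dirs() {
    du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `largest_dirs /usr/local/nagios/var 5` could be run while the disk is filling, then repeated one level deeper on whichever directory stands out.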

Currently we have 1128 servers and 180501 services configured on that NagiosXI

Posted: **Fri Nov 03, 2017 12:59 pm**

Hello, @raamardhani7.

> Currently we have 1128 servers and 180501 services configured on that NagiosXI

> The event handler is filling rapidly which causes the disk space to reach 100%.

Have you considered splitting the XI load between two servers? I don't know your hardware configuration, but 180501 services seems like a lot to handle for only one XI server.

> Nagios version-Nagios XI 5.4.4

I'd start with upgrading your Nagios XI to the latest version. That might automatically fix the issue.

Also, could you upload the timeperiods.cfg file from /usr/local/nagios/etc/?

Posted: **Mon Nov 20, 2017 8:33 am**

Please find the timeperiod.cfg file attached.
We are still facing the same issue.

Posted: **Mon Nov 20, 2017 12:06 pm**

@raamardhani7, I think upgrading Nagios XI to the latest version may fix this issue. There have been a few bug fixes related to scheduled downtime since version 5.4.4. However, before you upgrade, I highly recommend doing some optimizations on your system. Can you post the output of df -h? Chances are your system needs more memory to be able to function normally. Also, you said that this XI is responsible for 180501 service checks. That seems like a lot! Did you mean to say 18501 by chance?
PS: For a faster resolution, you may also create a support ticket: https://support.nagios.com/tickets
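When posting the output, it may help to flag any filesystem that is already close to full. A minimal sketch that filters `df -P` output; `over_threshold` is a hypothetical helper, the 90% limit is arbitrary, and mount points containing spaces are not handled:

```shell
# over_threshold LIMIT: read "df -P" output on stdin and print the mount point
# and use% of every filesystem at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, use "%"
        }
    '
}
```

Usage: `df -P | over_threshold 90`. Since df only reports disk usage, the output of `free -m` would be needed as well to judge memory.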

Posted: **Tue Nov 21, 2017 10:09 am**

I have attached the df -h output.
Yes, there are only 18501 services.

Posted: **Tue Nov 21, 2017 11:42 am**

Let's check a few things. Run the following commands and show the output:

```shell
# These two commands will show us the ramdisk entries in the /etc/init.d/nagios file, and the entire /etc/sysconfig/nagios file
grep -i ramdisk /etc/init.d/nagios
cat /etc/sysconfig/nagios
```

```shell
# These commands will show us the nagiosramdisk entries in various config files
grep nagiosramdisk /usr/local/nagios/etc/nagios.cfg
grep nagiosramdisk /usr/local/nagiosmobile/include.inc.php
grep nagiosramdisk /usr/local/nrdp/server/config.inc.php
grep nagiosramdisk /usr/local/nagiosxi/html/config.inc.php
grep nagiosramdisk /usr/local/nagios/etc/pnp/npcd.cfg
grep nagiosramdisk /usr/local/nagios/etc/commands.cfg
```

```shell
# These commands will show the permissions on the checkresults directory, and how many check result files are in it
ls -lad /var/nagiosramdisk/spool/checkresults
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
```

```shell
# Let's see if the perfdataproc cron job is running
# (the pattern is quoted so the shell cannot expand [p] as a glob)
ps -ef | grep '[p]erfdataproc'
```

Server Moving out from maintenance
