Server Moving out from maintenance
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Server Moving out from maintenance
Hi Team,
We are facing a problem with maintenance mode in Nagios. Yesterday I set all the devices in the NA region to maintenance mode, and it appears to have overridden the existing downtime for hosts/services that had been under maintenance mode (MM) for a month or longer (Windows and AIX servers that are to be decommissioned).
We are now receiving incidents from Nagios for those hosts: they had been in maintenance for a month or more, but were removed from their old downtime by last night's downtime activity. This is affecting our availability figures, and the issue has been escalated.
Could you please help us resolve this on priority?
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Re: Server Moving out from maintenance
Also, there are a few servers for which downtime ends prematurely and a host down alert is sent.
Nagios version: Nagios XI 5.4.4
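One way to see when a downtime ended early is to pull the downtime events for a single host out of the Nagios log. The sketch below assumes the log contains `HOST DOWNTIME ALERT` lines with `STARTED`, `STOPPED`, or `CANCELLED` states; the helper name `downtime_events` and the host name are hypothetical, and the log path in the comment is an assumption about a default install.

```shell
#!/bin/sh
# Hypothetical helper: list downtime start/stop/cancel events for one host.
# Arguments: host name, path to a Nagios log file.
downtime_events() {
    host=$1
    logfile=$2
    # -F treats the pattern as a fixed string, so dots in host names match literally.
    grep -F "HOST DOWNTIME ALERT: ${host};" "$logfile"
}

# Example call (log path assumed, adjust to your install):
#   downtime_events "example-host" /usr/local/nagios/var/nagios.log
```

A `CANCELLED` entry close to the time the region-wide downtime was scheduled would be worth including in the report.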
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Server Moving out from maintenance
did you use a regular scheduled downtime, or a recurring downtime?
how were your windows / aix servers scheduled in downtime?
my guess is they'll need to reproduce this.
Re: Server Moving out from maintenance
raamardhani7, can you describe in detail all of the steps you took to place your devices in maintenance mode, prior to encountering the issue? Screenshots of the previously scheduled and/or recurring downtimes would be helpful, along with relevant logs. We will try to recreate the issue in-house.
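If the downtime was scheduled from a script rather than the web interface, the exact command line is useful to include in the description. As a rough sketch, the Nagios external command `SCHEDULE_HOST_DOWNTIME` takes the fields shown in the comment below; the helper name `build_host_downtime_cmd`, the host name, and the author are hypothetical, and the command file path is an assumption about a default install.

```shell
#!/bin/sh
# Hypothetical helper: print a SCHEDULE_HOST_DOWNTIME line for a fixed downtime.
# Arguments: host name, start (epoch seconds), length (seconds), author, comment.
build_host_downtime_cmd() {
    host=$1; start=$2; length=$3; author=$4; comment=$5
    end=$((start + length))
    # Fields: host;start;end;fixed(1);trigger_id(0);duration;author;comment
    # The bracketed timestamp is normally the submission time; the start time
    # is reused here only to keep the example deterministic.
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$start" "$host" "$start" "$end" "$length" "$author" "$comment"
}

# Two-hour window for a hypothetical host.
build_host_downtime_cmd "example-host" 1530000000 7200 "nagiosadmin" "Network activity"
# To submit, the line would be appended to the command file, assumed here to be
# /usr/local/nagios/var/rw/nagios.cmd.
```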
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Re: Server Moving out from maintenance
Hi,
If a device has been in maintenance mode for a month and an activity requires putting multiple devices in maintenance mode for 2 hours, the earlier one-month maintenance period gets cancelled/overridden.
We have been facing this issue since June. Earlier, the host stayed in downtime until the longest of its scheduled windows ended.
For example, if a host is put in maintenance mode today for a week, and 2 days later someone puts the same server in maintenance mode for 15 days, the host stays in maintenance mode for 17 days (2 + 15).
We have 3 Nagios XI servers, but only this one has the issue. It mostly happens when there is network activity and all the servers configured in that Nagios XI are put in maintenance mode.
One more finding: during such activity, disk space on our Nagios XI server reaches 100% and the database crashes. The event handler is filling rapidly, which causes the disk space to reach 100%. We are not sure whether this has something to do with servers coming out of scheduled downtime in Nagios.
Currently we have 1128 servers and 180501 services configured on that Nagios XI.
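The 17-day example above can be written out as a small calculation. This only illustrates the behaviour we expect (the host stays in downtime until the latest overlapping window ends); it says nothing about how Nagios implements it, and the helper name `latest_downtime_end` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "start end" pairs (in days) on stdin and print the
# day on which the last of the downtime windows ends.
latest_downtime_end() {
    awk 'NF >= 2 { if ($2 > max) max = $2 } END { print max + 0 }'
}

# Window 1: scheduled on day 0 for a week (days 0 to 7).
# Window 2: scheduled on day 2 for 15 days (days 2 to 17).
printf '0 7\n2 17\n' | latest_downtime_end
```

With the current behaviour, the host instead leaves downtime when the shorter, newer window ends.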
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Server Moving out from maintenance
Hello, @raamardhani7.
"Currently we have 1128 servers and 180501 services configured on that NagiosXI"
Have you considered splitting the XI load between two servers? I don't know your hardware configuration, but 180501 services seems like a lot for a single XI server to handle.
"The event handler is filling rapidly which causes the disk space to reach 100%."
I'd start with upgrading your Nagios XI to the latest version. That might automatically fix the issue.
"Nagios version-Nagios XI 5.4.4"
Also, could you upload the timeperiods.cfg file from /usr/local/nagios/etc/?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Re: Server Moving out from maintenance
Please find the timeperiods.cfg file attached.
We are still facing the same issue.
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Server Moving out from maintenance
@raamardhani7, I think upgrading Nagios XI to the latest version may fix this issue. There have been a few bug fixes related to scheduled downtime since version 5.4.4. However, before you upgrade, I highly recommend doing some optimizations on your system. Can you post the output of df -h? Chances are your system needs more memory to function normally. Also, you said that this XI is responsible for 180501 service checks, which seems like a lot! Did you mean to say 18501 by chance?
PS: For a faster resolution, you may also create a support ticket: https://support.nagios.com/tickets
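For the df check requested above, a small filter can pick out the filesystems that are close to full. This is a minimal sketch: the helper name `full_filesystems` and the 90% threshold are arbitrary choices, it reads `df -P` style output on stdin, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage is at or above a
# threshold (percent). Reads `df -P` output on stdin, so it also works on
# saved output.
full_filesystems() {
    threshold=$1
    awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6 }'
}

# Check the live system against a 90% threshold.
df -P | full_filesystems 90
```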
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Re: Server Moving out from maintenance
I have attached the df -h output.
Yes, there are only 18501 services.
Re: Server Moving out from maintenance
Let's check a few things. Run the following commands and show the output:
# These two commands will show us the ramdisk entries in the /etc/init.d/nagios file, and the entire /etc/sysconfig/nagios file
grep -i ramdisk /etc/init.d/nagios
cat /etc/sysconfig/nagios

# These commands will show us the nagiosramdisk entries in various config files
grep nagiosramdisk /usr/local/nagios/etc/nagios.cfg
grep nagiosramdisk /usr/local/nagiosmobile/include.inc.php
grep nagiosramdisk /usr/local/nrdp/server/config.inc.php
grep nagiosramdisk /usr/local/nagiosxi/html/config.inc.php
grep nagiosramdisk /usr/local/nagios/etc/pnp/npcd.cfg
grep nagiosramdisk /usr/local/nagios/etc/commands.cfg

# These commands will show the permissions on the checkresults directory, and how many perfdata files are in it
ls -lad /var/nagiosramdisk/spool/checkresults
ls /var/nagiosramdisk/spool/checkresults/ | wc -l

# Let's see if the perfdataproc cron job is running
ps -ef | grep [p]erfdataproc