We have changed the configuration to 5-2-8 (check interval: 5 minutes, retry interval: 2 minutes, max check attempts: 8), but alerts are still being generated in less than 8 minutes (instead of 16 minutes). Please refer to the attachments and suggest a fix.
We have also set the timeout for the check ping host command to 120 seconds.
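As a rough sanity check on the expected delay for the 5-2-8 settings, here is a minimal sketch of the arithmetic. It assumes the usual Nagios Core behaviour that the first failed check already counts as attempt 1, so only max attempts minus one retries remain; the function name is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper (not a Nagios tool): nominal minutes from the first
# failed check to the final attempt, assuming the first failure counts as
# attempt 1 and each later attempt runs one retry interval after the last.
minutes_to_hard_state() {
    retry_interval=$1   # minutes between retries
    max_attempts=$2     # max check attempts
    echo $(( (max_attempts - 1) * retry_interval ))
}

minutes_to_hard_state 2 8   # prints 14
```

Under that assumption the nominal figure is 14 minutes after the first failed check rather than 16, so an alert in under 8 minutes is still well short of what the settings suggest.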
Alerts are getting generated before the time limit.
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Alerts are getting generated before the time limit.
It's unlikely to ever be exact, due to scheduling to reduce load. That said, you may have multiple nagios processes.
Regarding the instructions below, if you do not have killall, you can install it via the following command:
# yum install psmisc
If psmisc is not in your repos, then instead you can check to make sure nagios is not running with
# ps -aef | grep nagios
The following are not all required, but if you are going to bring nagios down, it seems like a good time to bring everything else down briefly. Also, you should not have to clear any of these lock files, but checking to see if there is a lock takes just as much time as clearing the lock, so I just have people go ahead and clear.
You ***must*** use mariadb instead of mysqld in the commands below, ***if*** you have mariadb.
# service nagios stop
# service ndo2db stop
# service mysqld stop
# service crond stop
# service httpd stop
# killall -9 nagios
# killall -9 ndo2db
# rm -f /usr/local/nagios/var/rw/nagios.cmd
# rm -f /usr/local/nagios/var/nagios.lock
# rm -f /usr/local/nagios/var/ndo.sock
# rm -f /usr/local/nagios/var/ndo2db.lock
# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
# service mysqld start
# service ndo2db start
# service nagios start
# service httpd start
# service crond start
If that does not resolve the issue, can you PM me your Profile? You can download it by going to Admin > System Config > System Profile and clicking the ***Download Profile*** button towards the top. If for whatever reason you *cannot* download the profile, please put the output of View System Info (5.3.4+, Show Profile if older) in the thread (that will at least get us some info). This will give us access to many of the logs we would otherwise ask for individually. If security is a concern, you can unzip the profile, take out what you like, and then zip it up again. We may end up needing something you remove, but we can ask for that specifically.
You can also generate a profile manually using the script at /usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
That should generate a profile in /usr/local/nagiosxi/var/components/ which you can get off the server with an application such as FileZilla.
After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.
If you get an error that PROFILE BUILD FAILED, please see https://support.nagios.com/kb/article.p ... ategory=44
-
raamardhani7
- Posts: 459
- Joined: Tue Jun 02, 2015 12:36 am
Re: Alerts are getting generated before the time limit.
Hi
We followed your instructions and obtained the following result:
[root@lussvpnagxi00 ~]# service nagios stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mysqld start
service ndo2db start
Stopping nagios:service nagios start
service httpd start
service crond start. done.
You have new mail in /var/spool/mail/root
[root@lussvpnagxi00 ~]# service ndo2db stop
Stopping ndo2db: done.
[root@lussvpnagxi00 ~]# service mysqld stop
Stopping mysqld: [ OK ]
[root@lussvpnagxi00 ~]# service crond stop
Stopping crond: [ OK ]
[root@lussvpnagxi00 ~]# service httpd stop
Stopping httpd: [ OK ]
[root@lussvpnagxi00 ~]# killall -9 nagios
nagios: no process killed
[root@lussvpnagxi00 ~]# killall -9 ndo2db
ndo2db: no process killed
[root@lussvpnagxi00 ~]# rm -f /usr/local/nagios/var/rw/nagios.cmd
[root@lussvpnagxi00 ~]# rm -f /usr/local/nagios/var/nagios.lock
[root@lussvpnagxi00 ~]# rm -f /usr/local/nagios/var/ndo.sock
[root@lussvpnagxi00 ~]# rm -f /usr/local/nagios/var/ndo2db.lock
[root@lussvpnagxi00 ~]# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
[root@lussvpnagxi00 ~]# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
[root@lussvpnagxi00 ~]# service mysqld start
Starting mysqld: [ OK ]
[root@lussvpnagxi00 ~]# service ndo2db start
Starting ndo2db: done.
[root@lussvpnagxi00 ~]# service nagios start
Starting nagios: done.
[root@lussvpnagxi00 ~]# service httpd start
Starting httpd: [ OK ]
[root@lussvpnagxi00 ~]# service crond start
Starting crond: [ OK ]
[root@lussvpnagxi00 ~]# ps -ef | grep -i nagios.cfg | grep -v grep
nagios 1545 1 19 23:18 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 1671 1545 0 23:18 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[root@lussvpnagxi00 ~]#
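In the listing above, PID 1671 has PID 1545 as its parent, which is consistent with one daemon and its child rather than two independent instances. A minimal sketch for checking this automatically follows; it assumes each independent daemon is reparented to PID 1 (as 1545 is here), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count Nagios daemons whose parent is PID 1.
# Reads `ps -eo pid,ppid,args` style lines on stdin; the ps header line
# is ignored because its second field ("PPID") is not equal to 1.
count_nagios_daemons() {
    awk '$2 == 1 && /nagios\.cfg/ { n++ } END { print n + 0 }'
}

# Usage on a live system:
ps -eo pid,ppid,args | count_nagios_daemons
```

A result greater than 1 would suggest multiple independent nagios processes; a result of 1 matches the output shown above.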
We tried to test this by putting one of our servers into a down state, using the following path:
Advanced settings --> Submit passive check result --> Check Result = Down && Check Output = Test
But we are not getting the desired result yet. Two (out of 8) checks were performed on the server before it came up again, but the interval between the two consecutive checks was less than a minute. Please find attached the screenshot of this result.
We have generated the profile, but it could not be attached here. How can we send it to you? Please share an email ID for this.
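For repeatable testing from the command line, the same kind of passive result can be written to the Nagios command file. This is only a sketch: it assumes the web form corresponds to the Nagios Core PROCESS_HOST_CHECK_RESULT external command, and both the function name and the host name "myhost" are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: format a PROCESS_HOST_CHECK_RESULT external command.
# $1 = epoch timestamp, $2 = host name,
# $3 = status code (0=UP, 1=DOWN, 2=UNREACHABLE), $4 = plugin output
format_passive_host_result() {
    printf '[%s] PROCESS_HOST_CHECK_RESULT;%s;%s;%s\n' "$1" "$2" "$3" "$4"
}

# Example usage ("myhost" is a placeholder), writing to the command file
# path used elsewhere in this thread:
# format_passive_host_result "$(date +%s)" myhost 1 Test \
#     > /usr/local/nagios/var/rw/nagios.cmd
```

Submitting the result this way makes it easier to note the exact submission time and compare it with the check times in the log.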
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Alerts are getting generated before the time limit.
It looks like you were able to upload the profile, but if you want to email, the email is [email protected]
That's pretty close to a minute. As mentioned, it's unlikely to be exact. You can turn off rescheduling in your nagios.cfg, but given the size of your instance, I don't think it's going to go well.
https://assets.nagios.com/downloads/nag ... gmain.html
Auto-Rescheduling Option
Format: auto_reschedule_checks=<0/1>
Example: auto_reschedule_checks=1
This option determines whether or not Nagios will attempt to automatically reschedule active host and service checks to "smooth" them out over time. This can help to balance the load on the monitoring server, as it will attempt to keep the time between consecutive checks consistent, at the expense of executing checks on a more rigid schedule.
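Before changing anything, it may help to see what the option is currently set to. The following is a minimal sketch with a hypothetical helper name; it assumes the directive is written as name=value at the start of a line, and it uses the nagios.cfg path shown in the ps output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the value(s) of a directive in a
# nagios.cfg-style file. $1 = config file, $2 = directive name.
# Commented-out lines do not match because their first field starts with "#".
get_directive() {
    awk -F= -v key="$2" '$1 == key { print $2 }' "$1"
}

# Example usage:
# get_directive /usr/local/nagios/etc/nagios.cfg auto_reschedule_checks
```

If this prints 1, rescheduling is enabled; setting the option to 0 and restarting nagios would turn it off, with the load caveat mentioned above.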