Service fails but succeeds if retried manually
Posted: Tue Apr 04, 2017 6:55 am
Hi
I have a weird issue with Nagios XI that looks like a false positive.
I wrote a small check, run through NRPE, that verifies the accessibility of a mountpoint.
I had to write my own script because the mountpoints I'm checking are managed by autofs and are not always mounted. The Nagios mountpoint wizard fails to monitor them correctly (even with the write test): it always fails.
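A check of this kind can be sketched roughly as follows. This is a minimal illustration and not the script actually used here: the function names, the script name and the 5-second timeout are assumptions. The idea is that listing the directory is what makes autofs mount it on demand, and the worker thread keeps a slow mount from blocking the check indefinitely.

```python
import os
import sys
import threading

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe(path, result):
    """List the directory; on autofs this access triggers the mount."""
    try:
        os.listdir(path)
        result["error"] = None
    except OSError as exc:
        result["error"] = str(exc)


def check_mountpoint(path, timeout=5.0):
    """Return (exit_code, message) for one mountpoint.

    The timeout value is an assumption; it should stay below whatever
    timeout the NRPE side is configured with.
    """
    result = {}
    worker = threading.Thread(target=probe, args=(path, result), daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return CRITICAL, f"CRITICAL - {path} did not respond within {timeout}s"
    if result["error"] is not None:
        return CRITICAL, f"CRITICAL - {path} not accessible: {result['error']}"
    return OK, f"OK - {path} is accessible"


def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_autofs_mount.py <path>")
        return UNKNOWN
    code, message = check_mountpoint(argv[1])
    print(message)
    return code

# When installed as a plugin, the script would end with:
#     sys.exit(main(sys.argv))
```

If the first access to an unmounted autofs path takes longer than the timeout, a check like this would report CRITICAL once and then pass on the next run, when the filesystem is already mounted.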
The only problem is that, although it works most of the time, I always have around 50-70 checks that fail (and then succeed if I retry manually or the next time Nagios checks them).
This generates A LOT of mail notifications, which makes notifications useless.
What am I doing wrong? The Nagios server is pretty strong (2x E5450 3 GHz, 16 GB RAM, RAID 1 on enterprise SAS disks). I tried changing some of the values in nagios.cfg that I found on this forum, but nothing seems to change; the event queue seems stuck at around 200 (the small dashlet graph).
I also verified that my NTP server is accurate.
Any help would be appreciated.
Thanks