Nagios XI Service Checks Randomly Change Status Without Configuration Changes

MelvinAtkins
Posts: 1
Joined: Tue Dec 23, 2025 12:24 am


Post by MelvinAtkins »

Hi everyone,

I’ve noticed that some service checks on my Nagios XI instance are changing status from OK to WARNING or CRITICAL even though I haven’t made any configuration updates and the services are functioning normally. The logs don’t show obvious errors, and the last state changes seem to happen at different intervals.

Has anyone experienced similar behavior? Could this be related to check timing, plugin versions, or resource limits on the server? I’d appreciate suggestions on how to troubleshoot or identify what’s causing these unexpected state changes.

Thanks!
cdietsch
Posts: 62
Joined: Wed Aug 06, 2025 9:12 am

Re: Nagios XI Service Checks Randomly Change Status Without Configuration Changes

Post by cdietsch »

Hello @MelvinAtkins,

This could be related to any of the things you mentioned. It could also be caused by network saturation, which is the cause I have seen most often in the past.

What are the hardware specs on your XI server? What XI version and OS version are you running? Is your XI server under a high amount of load?

I would recommend enabling flap detection on some of the affected service checks to see whether it helps mitigate the issue.
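
For context on what flap detection measures, here is a minimal Python sketch of the idea. It is a simplification, not Nagios's actual code: Nagios weights recent state changes more heavily than older ones, while this version counts them all equally. The function names and the threshold values (20.0 and 50.0) are just illustrative assumptions.

```python
def percent_state_change(states):
    """Unweighted percentage of transitions in a history of check states.

    Simplification: Nagios gives newer state changes more weight than
    older ones; here every transition counts the same.
    """
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def update_flapping(was_flapping, pct, low=20.0, high=50.0):
    """Apply low/high threshold hysteresis (threshold values are illustrative).

    A service starts flapping once pct rises above `high` and only stops
    once pct drops below `low`, so it does not toggle near a single cutoff.
    """
    if not was_flapping and pct > high:
        return True
    if was_flapping and pct < low:
        return False
    return was_flapping


# Hypothetical history for one service, oldest result first.
history = ["OK", "WARNING", "OK", "CRITICAL", "OK", "OK", "WARNING", "OK"]
pct = percent_state_change(history)
flapping = update_flapping(False, pct)
```

If a service like the hypothetical one above is marked as flapping, notifications for it are suppressed until it settles, which reduces the noise while the root cause is tracked down.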
Cheers,
- Cole
DoubleDoubleA
Posts: 286
Joined: Thu Feb 09, 2017 5:07 pm

Re: Nagios XI Service Checks Randomly Change Status Without Configuration Changes

Post by DoubleDoubleA »

Can you give an example or two of the service checks this is happening with?
lunahart6374
Posts: 1
Joined: Mon Jan 05, 2026 2:01 am

Re: Nagios XI Service Checks Randomly Change Status Without Configuration Changes

Post by lunahart6374 »

I’ve seen similar behavior on a few Nagios XI deployments, and in our case it was a combination of check timing and transient resource issues, not actual service failures. I’d recommend enabling flap detection, as suggested, and also reviewing the check execution time graphs and system performance (CPU, RAM, I/O wait) around the timestamps when the state changes occur.
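
To get those timestamps in one place, something like the following Python sketch could help. It parses `SERVICE ALERT` entries in the usual nagios.log layout (`[epoch] SERVICE ALERT: host;service;state;state type;attempt;output`) and counts SOFT versus HARD state changes per service. The sample log lines, host names, and function names are made up for illustration; point it at your own log file instead.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# [epoch] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def parse_service_alerts(lines):
    """Return one dict per SERVICE ALERT line; other log lines are skipped."""
    alerts = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue
        ts, host, service, state, state_type, attempt, output = match.groups()
        alerts.append({
            "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
            "host": host,
            "service": service,
            "state": state,
            "state_type": state_type,
            "attempt": int(attempt),
            "output": output,
        })
    return alerts


def count_by_service(alerts):
    """Count alerts per (host, service, state type)."""
    return Counter((a["host"], a["service"], a["state_type"]) for a in alerts)


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "[1766500000] SERVICE ALERT: web01;HTTP;WARNING;SOFT;1;HTTP WARNING: response time 4.2s",
    "[1766500060] SERVICE ALERT: web01;HTTP;OK;SOFT;2;HTTP OK: 200 in 0.3s",
    "[1766503600] SERVICE ALERT: db01;Disk /;CRITICAL;HARD;3;DISK CRITICAL - free space: / 2%",
    "[1766503700] Auto-save of retention data completed successfully.",
]

alerts = parse_service_alerts(sample_log)
counts = count_by_service(alerts)
```

A service that shows many SOFT changes that recover before reaching a HARD state points toward transient problems (timeouts, load spikes) rather than real outages, and the parsed `time` values can be lined up against the CPU, RAM, and I/O wait graphs.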