Hi,
Found strange behaviour of Nagios checks.
For one host several checks were not working, as if disabled for several hours. The graph is in the attachment.
Tried to find something in the Event Log, but found nothing suspicious. The host has about 50 checks, and only a few of them did not work. All other hosts were polled successfully. The affected check is executed via SNMP.
Please advise where I could look to find the reason for this behaviour.
We have about 2500 checks and a lot of hosts, the issue was only with one host.
Nagios XI version is 5.5.2. (no support).
Some checks were not working
-
goldmund84
- Posts: 42
- Joined: Mon Jan 06, 2014 6:48 am
Some checks were not working
Re: Some checks were not working
Try this: run a State History report for the host and all of its services. It will show what errors the plugins were generating at that time, and whether they correlate with the checks that were not working.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
goldmund84
- Posts: 42
- Joined: Mon Jan 06, 2014 6:48 am
Re: Some checks were not working
Nothing suspicious in State History, just the normal flow of events. As on the graph, the affected service shows a state change at 22:16 and the next one only the following day at 15:20.
Between 22:16 and 15:20 - there is a gap with no events. But in reality, there were events and we missed the accident.
Any more thoughts?
Re: Some checks were not working
Can you clarify what you mean by "there were events and we missed the accident"?
Are you saying that the host in question had issues during the time the graph stopped?
If the device had issues and stopped responding to SNMP polling, that would produce the gap you are seeing.
What was the last state change at 22:16?
-
goldmund84
- Posts: 42
- Joined: Mon Jan 06, 2014 6:48 am
Re: Some checks were not working
The host didn't have issues. There were issues in services related to the host, but not on the host itself. The host received less traffic than usual, and that was the condition we needed to be alerted about. Everything was working on the host itself. Other SNMP checks on the same host worked correctly, with no gap.
At 22:16 the response reported OK State.
Re: Some checks were not working
If the services related to the host caused the check to fail and not return performance data, that would show in the graph just as you are seeing.
If a check returns no performance data, there is no data for the graph, and a gap is displayed.
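To illustrate the point about performance data: a plugin's output line carries the graph data after a `|` separator, so a result without that part gives the grapher nothing to plot. Below is a minimal Python sketch of that split. The sample output strings are made up for illustration, and it only looks at a single output line.

```python
def split_perfdata(plugin_output):
    """Split one line of plugin output into (status text, performance data)."""
    # Everything after the first "|" is treated as performance data.
    text, _, perf = plugin_output.partition("|")
    return text.strip(), perf.strip()


def has_perfdata(plugin_output):
    """Return True if the output line carries any performance data."""
    return bool(split_perfdata(plugin_output)[1])


# Illustrative outputs only, not taken from the affected host.
good = "SNMP OK - 120 Mbps | traffic_in=120"
bad = "SNMP problem - No data received from host"

print(split_perfdata(good))
print(has_perfdata(bad))
```

If the affected service returned lines like the second one during the gap, the graph would be empty for that period even though the check itself was running.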
Look in the archived log files. Do you see the check running for that service during the time the issue happened?
If so, post the entries so we can view them.
Code:
/usr/local/nagios/var/archives/nagios-02-26-2020-00.log
/usr/local/nagios/var/archives/nagios-02-27-2020-00.log
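One way to pull the relevant entries out of those archived logs is a small script like the sketch below. It assumes the usual core log layout of `[epoch] TYPE: host;service;...`. The host name `router1`, the service name `Traffic eth0`, the sample lines, and the UTC time window are all hypothetical placeholders, so substitute your own values.

```python
from datetime import datetime, timezone


def service_entries(lines, host, service, start, end):
    """Yield (timestamp, text) for log lines about host;service within [start, end]."""
    needle = f"{host};{service};"
    for line in lines:
        line = line.rstrip("\n")
        # Expected shape: [1582755360] SERVICE ALERT: host;service;OK;HARD;1;output
        if not line.startswith("["):
            continue
        stamp, _, text = line[1:].partition("] ")
        if not stamp.isdigit() or needle not in text:
            continue
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        if start <= when <= end:
            yield when, text


# Hypothetical sample lines; with real data, pass an open log file instead, e.g.
# open("/usr/local/nagios/var/archives/nagios-02-27-2020-00.log")
sample = [
    "[1582755360] SERVICE ALERT: router1;Traffic eth0;OK;HARD;1;SNMP OK - 120 Mbps",
    "[1582756000] SERVICE ALERT: router1;CPU Load;WARNING;SOFT;1;load high",
    "[1582816800] SERVICE ALERT: router1;Traffic eth0;CRITICAL;HARD;3;SNMP CRITICAL - 2 Mbps",
    "not a log line",
]

start = datetime(2020, 2, 26, 22, 0, tzinfo=timezone.utc)
end = datetime(2020, 2, 27, 16, 0, tzinfo=timezone.utc)
hits = list(service_entries(sample, "router1", "Traffic eth0", start, end))

for when, text in hits:
    print(when.isoformat(), text)
```

Keep in mind that the core log mostly records state changes, so a quiet stretch between two alerts does not by itself prove the check was not being executed.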
-
goldmund84
- Posts: 42
- Joined: Mon Jan 06, 2014 6:48 am
Re: Some checks were not working
Hi,
Checked those logs and there are no reported states on the affected service during the "gap period". As if somebody disabled it for the period. Is there a way to check whether it was put in the Scheduled Downtime or Acknowledged/Disabled or something else?
Re: Some checks were not working
You might be able to search the Audit Log in the Admin > Audit Log menu in the XI GUI.
It is an Enterprise feature so you need that license enabled.
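If the Audit Log is not available, the same archived core logs may still help: when external command logging is enabled (`log_external_commands=1` in nagios.cfg), actions such as scheduling downtime or disabling a check are written as `EXTERNAL COMMAND` lines. The sketch below scans for a few such commands against one host. The command list is not exhaustive, and the host name and sample lines are hypothetical.

```python
import re

# Standard Nagios Core external commands that suppress checks or alerts.
# Not an exhaustive list; extend it as needed.
SUSPECT = (
    "SCHEDULE_SVC_DOWNTIME",
    "SCHEDULE_HOST_SVC_DOWNTIME",
    "DISABLE_SVC_CHECK",
    "DISABLE_HOST_SVC_CHECKS",
    "DISABLE_SVC_NOTIFICATIONS",
    "ACKNOWLEDGE_SVC_PROBLEM",
)

LINE = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z_]+);(.*)$")


def suspect_commands(lines, host):
    """Return (epoch, command, args) for suspect commands aimed at the given host."""
    found = []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        stamp, command, args = match.groups()
        # For every command listed above, the first argument is the host name.
        if command in SUSPECT and args.split(";")[0] == host:
            found.append((int(stamp), command, args))
    return found


# Hypothetical sample lines for illustration only.
sample = [
    "[1582755400] EXTERNAL COMMAND: DISABLE_SVC_CHECK;router1;Traffic eth0",
    "[1582755500] EXTERNAL COMMAND: DISABLE_SVC_CHECK;router2;Traffic eth0",
    "[1582755600] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;router1;Ping;0;OK",
    "[1582755700] SERVICE ALERT: router1;Traffic eth0;OK;HARD;1;SNMP OK",
]

for stamp, command, args in suspect_commands(sample, "router1"):
    print(stamp, command, args)
```

An empty result for the gap period would suggest nobody disabled the check or put it into downtime through the usual command interface.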
-
goldmund84
- Posts: 42
- Joined: Mon Jan 06, 2014 6:48 am
Re: Some checks were not working
Is there any way to read the audit log from the console instead of the GUI? We don't have an Enterprise license.
Re: Some checks were not working
Go to the Admin > System Settings menu and see if your version has the Audit Log enabled. If so, it will show you the path to the file.