Nagios core web interface sometimes shows wrong information

Posted: Thu Jun 06, 2024 4:26 am
by wleight
Dear experts,
When I reconnect to the nagios core web interface and go to the service status for the switch I am monitoring, it shows me something like this:
Screenshot 2024-06-06 at 10.39.43 AM.png
But this represents a configuration that is now several weeks old. In the interim, I fixed the issues shown here and added a number of additional checks. Usually, after a little while the correct service status shows up:
Screenshot 2024-06-06 at 11.11.31 AM.png
Sometimes I can get it to update by clicking around on the web interface, but sometimes I just have to wait. But even once the right list of services comes up, the notifications are still sometimes wrong:
Screenshot 2024-06-06 at 11.12.10 AM.png
I tried restarting the nagios process again -- I had already restarted it after updating the configuration, of course -- but it didn't make any difference. Do you know what could be causing this?

many thanks,

will

Re: Nagios core web interface sometimes shows wrong information

Posted: Thu Jun 06, 2024 9:25 am
by gwesterman
Hi @wleight,

I found two potentially similar posts that were both resolved here and here. If that looks to be the same issue you are seeing, give their fix a try. There are a few other forum posts seemingly related but less neatly resolved.

If this is not your issue or otherwise doesn't work for you, could you look into a few things for us?
- Can you replicate this by running the command from the CLI? What does it give you?
- Has this always happened, or did it start after a recent change?
- What is your Core version, core plugin version, and distribution?

Please let us know what you find.

Thank you!

Re: Nagios core web interface sometimes shows wrong information

Posted: Fri Jun 07, 2024 4:56 am
by wleight
Hi @gwesterman,

Thanks for your reply, but I don't think that this is the same issue I'm seeing. For instance, if you look at the first screenshot, the first error is for "NESE Link 1 Status". Currently this is configured as follows:

define service {
    use                     generic-service
    host_name               atlas-rt-1-1
    service_description     NESE Link 1 Status
    check_command           check_snmp!-C <community> -o ifOperStatus.554 -r 1 -m RFC1213-MIB
}

If I try to run this from the command line, I get the correct value:

root@c4c537d8bc4c:/# /opt/nagios/libexec/check_snmp -C <community> -o ifOperStatus.554 -m RFC1213-MIB -H 172.20.170.1
SNMP OK - up(1) |

But if I go to the webpage for this particular service, it tells me:

External command error: Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
Failed object: RFC1213-MIB::ifOperStatus.573

The weird thing is that I did make a mistake the very first time I configured this: I told it to use the wrong MIB, and it gave me a corresponding error. So it seems like Nagios is somehow remembering that mistake.

The other weird thing is that the web interface changes what it shows. For instance, if I leave the service state information page for this service open, it will go back and forth between "OK" and "Unknown" with the above error, while from the command line I always get SNMP OK - up(1).

To answer your specific questions:
- The CLI always gives me the same value, as shown above.
- This has always happened, but "always" in this case only covers about a month. I'm still pretty new to Nagios.
- I'm using the jasonrivers docker build (https://github.com/JasonRivers/Docker-Nagios), which uses Nagios Core 4.5.0 running on Ubuntu 22.04 LTS. The check_snmp version is check_snmp v2.4.10.git (nagios-plugins 2.4.10).

thanks again for your help,

will

Re: Nagios core web interface sometimes shows wrong information

Posted: Fri Jun 07, 2024 9:30 am
by gwesterman
Hi @wleight,

So apparently this MIB is particularly finicky. I found multiple results for this exact error from this exact MIB. This thread in particular seems relevant. Their issue was solved by changing the index used for ifOperStatus (and you can find the correct index using snmpwalk against the device). I am not sure if this explains why it works from the CLI but not the interface. Still worth a shot.
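
To make the snmpwalk suggestion a bit more concrete, here is a minimal Python sketch for turning a walk of ifDescr into an index lookup. The sample text and the port names in it are made up for illustration and are not from your switch. Real output would come from something like `snmpwalk -v2c -c <community> 172.20.170.1 ifDescr`, where the `-v2c` is an assumption about which SNMP version your device accepts. The helper names `parse_ifdescr` and `find_index` are hypothetical as well.

```python
import re

# Matches lines such as: IF-MIB::ifDescr.554 = STRING: port-a
# The MIB prefix and the quotes around the value are both optional.
LINE_RE = re.compile(r'^(?:[\w-]+::)?ifDescr\.(\d+)\s*=\s*STRING:\s*"?(.*?)"?\s*$')


def parse_ifdescr(walk_output):
    """Map interface index -> description from `snmpwalk ... ifDescr` output."""
    table = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            table[int(match.group(1))] = match.group(2)
    return table


def find_index(table, name):
    """Return every interface index whose description equals `name`."""
    return [idx for idx, descr in sorted(table.items()) if descr == name]


# Hypothetical sample output, for illustration only.
SAMPLE = """\
IF-MIB::ifDescr.554 = STRING: port-a
IF-MIB::ifDescr.573 = STRING: port-b
"""

if __name__ == "__main__":
    table = parse_ifdescr(SAMPLE)
    print(find_index(table, "port-a"))
```

The index it returns for the port you care about is the number that would go after `ifOperStatus.` in the check_command.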

Probably worth restarting nagios, restarting your server, etc. as well.
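
Before restarting, it may also be worth checking whether more than one Nagios daemon is running. This is only a guess on my part and not something confirmed in this thread, but a leftover daemon that was started with an older configuration would keep writing results from the old checks, which could look like a status flipping back and forth. Below is a rough Python sketch that reads the output of `ps -eo pid,ppid,args` and lists the nagios processes that are not children of another nagios process. The function name `find_root_daemons`, the sample process list, and the paths in it are all hypothetical.

```python
import os


def find_root_daemons(ps_output):
    """Return PIDs of nagios processes whose parent is not another nagios process.

    Worker processes (started with --worker) are ignored, and so are children
    forked by a daemon, so each independent daemon is counted once.
    """
    procs = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "PID PPID ARGS".
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        pid, ppid, argv = int(parts[0]), int(parts[1]), parts[2].split()
        if os.path.basename(argv[0]) == "nagios" and "--worker" not in argv:
            procs[pid] = ppid
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


# Hypothetical `ps -eo pid,ppid,args` output, for illustration only.
SAMPLE_PS = """\
  PID  PPID COMMAND
    1     0 /bin/bash /usr/local/bin/start_nagios
   25     1 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
   31    25 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
   40    25 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
  812     1 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
"""

if __name__ == "__main__":
    roots = find_root_daemons(SAMPLE_PS)
    if len(roots) > 1:
        print("More than one independent nagios daemon:", roots)
    else:
        print("Only one nagios daemon found:", roots)
```

If more than one PID comes back, stopping all of them and starting a single daemon (or restarting the whole container) should leave only the one using the current configuration.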

Let us know what you find.

Thank you!

Re: Nagios core web interface sometimes shows wrong information

Posted: Mon Jun 17, 2024 9:42 am
by wleight
Hi @gwesterman,

Sorry for the delayed reply; I was traveling. In the end, restarting the container where Nagios was running did the trick.

thanks for your help,

will