Discrepancy between alerts and Error when connecting to API.

Posted: Tue Nov 05, 2024 11:24 am
by arthurkroth
Hi all,

I recently migrated my Nagios XI server from CentOS 7.9 (EOL) to Ubuntu 22.04 LTS, and everything has been working smoothly so far.

I have kept both servers running side by side. My CentOS server is running Nagios XI 5.11.1, and my Ubuntu 22.04 LTS server is running Nagios XI 2024R1.3.

I have noticed that most notifications are duplicated (which is expected, since both servers are sending messages). However, some notifications are sent by one server but not the other, and sometimes there is a gap of 10 to 15 minutes before the notifications for the same problem/warning come through.

Is there any known difference between these versions in the notification system, or in the way services are monitored?


Another odd behaviour I have noticed is that my new Nagios server (Ubuntu 22.04 LTS / Nagios XI 2024R1.3) sometimes reports an UNKNOWN service status like this:

  State: UNKNOWN
  Info:
  UNKNOWN: An error occurred connecting to API. (Connection error: [Errno -3] Temporary failure in name resolution)
  Date/Time: 04/11/2024 11:17:30

Could this happen because I'm running two Nagios servers against the same API? I'm only running both side by side to check the functionality of the new server (Ubuntu 22.04) before decommissioning the old one (CentOS).


Thank you very much for your time :)

Arthur.

Re: Discrepancy between alerts and Error when connecting to API.

Posted: Wed Nov 06, 2024 2:42 pm
by jmichaelson
Hi Arthur,

Given that you're receiving some notifications but not others, it seems unlikely that this comes down to back-end differences between the two versions. If you want to take a deeper dive into that, a support ticket might be the best option. (You're correct that duplicated notifications are expected, since both instances of XI are still running.)

I am curious about the "one but not the other" aspect of this. Is it happening both ways? I.e., are both servers sending some notifications that the other one isn't, or is it only one way?

As for the last oddity, could you be a little more specific? What kind of service check is it, and how does it obtain the service status? Also, if the service check uses a host name instead of an IP address, are the DNS settings identical on the two servers (DNS servers, search domains, and /etc/hosts)? If they are, you may want to try recreating the host and service on the new server, if that's easy to do.
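One way to compare DNS behaviour between the two servers is to run a small resolution test on each and diff the output. The Python sketch below is only an illustration, not part of Nagios XI or NCPA: the `resolve` helper is a hypothetical name, and port 5693 is assumed simply because it is NCPA's default listening port. On Linux, `[Errno -3]` in the error above is `EAI_AGAIN`, meaning the resolver could not get an answer at all (for example, the configured DNS server is unreachable), which is different from a name that genuinely does not exist.

```python
import socket
import sys


def resolve(hostname: str, port: int = 5693) -> tuple[bool, str]:
    """Resolve hostname and return (ok, detail).

    Port 5693 is an assumption (NCPA's default port); it only affects the
    address tuples returned, not whether the name resolves.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_AGAIN:
            # [Errno -3] on Linux: no answer from the resolver, e.g. the
            # configured DNS server is unreachable or not responding.
            return False, f"temporary DNS failure: {exc}"
        if exc.errno == socket.EAI_NONAME:
            # The DNS server answered, but the name is unknown.
            return False, f"name not found: {exc}"
        return False, f"resolution error: {exc}"
    # Each entry's sockaddr starts with the IP address; de-duplicate them.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


if __name__ == "__main__":
    # Usage: python3 resolve_check.py host1 host2 ...  (run on both servers)
    for host in sys.argv[1:]:
        ok, detail = resolve(host)
        print(f"{host}: {'OK' if ok else 'FAIL'} ({detail})")
```

If a host shows `OK` on the old server but `temporary DNS failure` on the new one, the difference is in the new server's resolver configuration rather than in the check itself.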

Re: Discrepancy between alerts and Error when connecting to API.

Posted: Thu Nov 14, 2024 5:26 am
by arthurkroth
Hi Folks,

I wanted to provide an update regarding the issues I was facing.

It turns out that both problems were related. Most of the servers I monitor were running NCPA version 2.4.0, which is six versions behind the latest release, version 3.1.1. After updating NCPA on all my servers, I saw a significant reduction in the notification discrepancies.
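If you keep a list of which NCPA version each host runs, a short script can flag the agents that still need upgrading. This is a minimal sketch under an assumption: the host-to-version mapping is hypothetical and has to be collected separately (how it is gathered is out of scope here); the code only compares dotted version strings numerically, so that "2.4.0" sorts below "3.1.1" and "3.1" counts as "3.1.0".

```python
def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2.4.0' into (2, 4, 0), padding short versions with zeros."""
    parts = [int(part) for part in version.strip().split(".")]
    parts += [0] * (width - len(parts))  # no-op if already long enough
    return tuple(parts)


def outdated_agents(inventory: dict[str, str], minimum: str) -> list[str]:
    """Return the hosts whose agent version is older than `minimum`, sorted."""
    floor = parse_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_version(version) < floor
    )


if __name__ == "__main__":
    # Hypothetical inventory; host names are placeholders.
    inventory = {"web01": "2.4.0", "db01": "3.1.1", "dc02": "2.4.0"}
    print(outdated_agents(inventory, "3.1.1"))
```

Comparing integer tuples avoids the usual string-comparison trap where "10.0" would sort before "9.0".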

Additionally, while monitoring my firewall, I observed a large number of dropped packets directed to an internal IP address. Upon investigation, I discovered that this IP was assigned to an old domain controller (DC). My new Nagios server was still using that DC for DNS resolution, so it could not resolve the monitored servers' names, which led to the errors I was seeing:

  State: UNKNOWN
  Info:
  UNKNOWN: An error occurred connecting to API. (Connection error: [Errno -3] Temporary failure in name resolution)
  Date/Time: 04/11/2024 11:17:30

After I pointed the new server's DNS setting to the correct domain controller's IP address, the name resolution issue was resolved.
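To catch a stale DNS server like this earlier, you could check which nameservers the Nagios host actually uses. The Python sketch below is illustrative only. It assumes a systemd-resolved setup, as is typical on Ubuntu 22.04, where /etc/resolv.conf usually lists only the local stub 127.0.0.53 and the upstream servers are written to /run/systemd/resolve/resolv.conf (`resolvectl status` shows the same information interactively). The expected address 192.0.2.10 is a placeholder for your current domain controller.

```python
from pathlib import Path

# On systemd-resolved systems the upstream servers live in the first file;
# fall back to /etc/resolv.conf otherwise.
CANDIDATES = [Path("/run/systemd/resolve/resolv.conf"), Path("/etc/resolv.conf")]


def parse_resolv_conf(text: str) -> dict[str, list[str]]:
    """Collect nameserver and search entries from resolv.conf-style text."""
    config: dict[str, list[str]] = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not raw.strip() or raw[0] in "#;":
            continue
        key, *values = raw.split()
        if key == "nameserver" and values:
            config["nameserver"].append(values[0])
        elif key in ("search", "domain"):
            # search and domain are mutually exclusive; the last one wins.
            config["search"] = values
    return config


def unexpected_nameservers(config: dict[str, list[str]], expected: set[str]) -> list[str]:
    """Return configured nameservers that are not in the expected set."""
    return [ns for ns in config["nameserver"] if ns not in expected]


def load_config() -> dict[str, list[str]]:
    for path in CANDIDATES:
        if path.exists():
            return parse_resolv_conf(path.read_text())
    return {"nameserver": [], "search": []}


if __name__ == "__main__":
    expected = {"192.0.2.10"}  # placeholder: your current DC's address
    config = load_config()
    print("nameservers:", config["nameserver"])
    print("search domains:", config["search"])
    print("unexpected:", unexpected_nameservers(config, expected))
```

Running this on both the old and new servers and comparing the output covers the DNS-server and search-domain part of the comparison; /etc/hosts entries would still need to be checked separately.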

Thanks for your time :)

Arthur.

Re: Discrepancy between alerts and Error when connecting to API.

Posted: Thu Nov 14, 2024 12:15 pm
by jsimon
Thanks for the update @arthurkroth! Glad you were able to get your issue resolved.

I'll go ahead and lock this thread.