Nagios Xi email alerts stopped to work
Hello Nagios Support Team,
We are experiencing problems with email alerting on one of our Nagios XI servers. Email alerting worked as expected for a while, but then it stopped working.
The server is set to send notifications using SMTP and is pointed at a relay on standard port 25 without authentication.
The email settings have not been changed since they were first configured. Sending a test email from the web interface works and I receive the test email; however, host and service notifications are not being sent.
Ping and telnet to the relay are successful, so the Nagios server can communicate with the relay. Alert settings on hosts/services are set to send notifications 24x7, and contacts are in place and configured to receive notifications.
Please assist with this email problem, as this is critical for our infrastructure monitoring.
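Since the test email arrives but host and service notifications do not, one thing worth checking is whether Nagios is logging any notification attempts at all. The following is only a sketch: the log path /usr/local/nagios/var/nagios.log is an assumed default (it is not stated in this post), and count_notifications is a hypothetical helper name. It counts HOST NOTIFICATION and SERVICE NOTIFICATION log lines per contact.

```shell
#!/bin/sh
# Assumed default location of the Nagios Core log; not confirmed in this thread.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Hypothetical helper: read Nagios log lines on stdin and print
# "<count> <contact>" for each contact named in a notification line.
# Contact names containing spaces are not handled by this sketch.
count_notifications() {
    grep -E '^\[[0-9]+\] (HOST|SERVICE) NOTIFICATION: ' \
        | sed -E 's/^\[[0-9]+\] (HOST|SERVICE) NOTIFICATION: ([^;]*);.*/\2/' \
        | sort | uniq -c | awk '{print $1, $2}'
}
```

It could be run as `count_notifications < "$NAGIOS_LOG"`. No output would suggest Nagios is not generating notifications at all, which points away from the SMTP relay.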
System profile:
Nagios XI Installation Profile
System:
Nagios XI Version : 5.2.5
nagxiliv02.caf.org.uk 3.10.0-327.10.1.el7.x86_64 x86_64
CentOS Linux release 7.2.1511 (Core)
Gnome Installed
Apache Information
PHP Version: 5.4.16
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
Server Name: nagxiliv02.caf.org.uk
Server Address: 10.120.0.22
Server Port: 443
Date/Time
PHP Timezone: Europe/London
PHP Time: Thu, 21 Apr 2016 11:27:48 +0100
System Time: Thu, 21 Apr 2016 11:27:48 +0100
Nagios XI Data
License ends in: RPSNPT
nagios (pid 21419) is running...
NPCD running (pid 956).
ndo2db (pid 1198) is running...
CPU Load 15: 0.44
Total Hosts: 21
Total Services: 190
Function 'get_base_uri' returns: https://nagxiliv02.caf.org.uk/nagiosxi/
Function 'get_base_url' returns: https://nagxiliv02.caf.org.uk/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: https://nagxiliv02.caf.org.uk/nagiosxi/ ... rofile.php
Function 'get_backend_url(internal_call=true)' returns: https://localhost/nagiosxi/backend/
Ping Test localhost
Running:
/bin/ping -c 3 localhost 2>&1
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.104 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.124 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.127 ms
--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.104/0.118/0.127/0.013 ms
Test wget To localhost
WGET From URL: https://localhost/nagiosxi/includes/components/ccm/
Running:
/usr/bin/wget https://localhost/nagiosxi/includes/components/ccm/
--2016-04-21 11:27:50-- https://localhost/nagiosxi/includes/components/ccm/
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:443... connected.
ERROR: cannot verify localhost's certificate, issued by '/C=UK/ST=Kent/L=West Malling/O=CAF/OU=IT/CN=nagxiliv01/emailAddress=[email protected]':
Self-signed certificate encountered.
ERROR: certificate common name 'nagxiliv01' doesn't match requested host name 'localhost'.
To connect to localhost insecurely, use `--no-check-certificate'.
Network Settings
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:50:56:8a:4a:d2 brd ff:ff:ff:ff:ff:ff
inet 10.120.0.22/16 brd 10.120.255.255 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe8a:4ad2/64 scope link
valid_lft forever preferred_lft forever
default via 10.120.0.3 dev ens160 proto static metric 100
10.120.0.0/16 dev ens160 proto kernel scope link src 10.120.0.22 metric 100
Re: Nagios Xi email alerts stopped to work
Can you run /usr/local/nagiosxi/scripts/repair/repair_databases.sh and see if that helps? Usually, if notifications stop out of nowhere, it's related to SQL.
If that's not it, can you PM over your profile for me to take a look at? (Admin -> System Profile -> Download Profile)
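As a rough sketch of running the repair script mentioned above and capturing its exit status (run_repair is a hypothetical wrapper, and root privileges are assumed to be needed):

```shell
#!/bin/sh
# Path as quoted in this reply; override REPAIR_SCRIPT to point elsewhere.
REPAIR_SCRIPT="${REPAIR_SCRIPT:-/usr/local/nagiosxi/scripts/repair/repair_databases.sh}"

# Hypothetical wrapper: run the given script only if it exists and is
# executable, then report its exit status.
run_repair() {
    script="$1"
    if [ ! -x "$script" ]; then
        echo "repair script not found or not executable: $script" >&2
        return 1
    fi
    if "$script"; then
        status=0
    else
        status=$?
    fi
    echo "repair script exit status: $status"
    return "$status"
}
```

Calling `run_repair "$REPAIR_SCRIPT"` prints the status on the last line, which makes it easier to tell whether the script itself reported a failure.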
Former Nagios Employee
Re: Nagios Xi email alerts stopped to work
I ran the DB repair script on Friday and left Nagios running over the weekend. The script ran OK; only one message at the end reported an ERROR (see attachment). Today I checked my email for alerts and there were some, but these alerts were for non-existent (old) hosts/services that were deleted from Nagios a long time ago. It looks like Nagios is using a rolled-back (repaired) DB rather than the current one, which is really confusing. That raises a question: where is the current DB, and where is new data stored? Are there any other ways to repair Nagios alerts?
Re: Nagios Xi email alerts stopped to work
I'm posting our Nagios Xi System Profile zip for troubleshooting.
Last edited by rkennedy on Mon Apr 25, 2016 10:48 am, edited 1 time in total.
Reason: Deleted profile as it may contain sensitive information.
Re: Nagios Xi email alerts stopped to work
Code: Select all
160406 9:31:14 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_servicestatus' is marked as crashed and should be repaired
160406 9:31:14 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_servicestatus' is marked as crashed and should be repaired
160406 9:31:14 [Note] /usr/libexec/mysqld: Normal shutdown
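To list every table the MySQL log reports as crashed, a small filter like the one below may help. It is a sketch that assumes the log lines follow the format quoted above; crashed_tables is a hypothetical helper name, and the location of the mysqld log on this server is not stated in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read mysqld log lines on stdin and print each
# distinct "database/table" that is reported as crashed.
crashed_tables() {
    sed -n "s/.*Table '\.\/\([^']*\)' is marked as crashed.*/\1/p" | sort -u
}
```

Fed the three lines above, it prints `nagios/nagios_servicestatus` once.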
How many CPUs do you have allocated to this machine? Your load looks abnormally high.
Additionally, I noticed:
Code: Select all
nagios 3970 2.2 0.0 144884 10828 ? S 11:45 0:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 172.18.21.1 -C ######## -k 10625
Former Nagios Employee
Re: Nagios Xi email alerts stopped to work
Currently we have 2 vCPUs and 4GB of memory allocated to the Nagios host. If CPU is the problem, we can increase it to 4 at any time. I will repair the MySQL tables tomorrow, as office hours in London are over. In terms of SNMP, we are not using any SNMP checks at this point (I have to double-check). This Nagios instance is monitoring only 21 hosts, so it shouldn't be a CPU issue because there is not much load on the server. Apart from the MySQL table repair, would you suggest adding more CPU, memory, or disk space?
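For a quick sanity check of load against CPU count, the load average can be divided by the number of CPUs. This is a sketch only: load_per_cpu is a hypothetical helper, and treating a per-CPU value below 1.0 as "not saturated" is a common rule of thumb, not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the load average divided by the CPU count,
# rounded to two decimals.
# $1 = load average (e.g. the 15-minute value), $2 = number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) { exit 1 }
        printf "%.2f\n", load / cpus
    }'
}
```

With the "CPU Load 15: 0.44" value from the posted profile and 2 vCPUs, `load_per_cpu 0.44 2` prints 0.22. On the server itself, the live values could be supplied as `load_per_cpu "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)"`.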
I will check your reply tomorrow as soon as I am in the office, carry out the necessary repair work, and post the results.
Many Thanks!
Re: Nagios Xi email alerts stopped to work
I would make sure you repair the tables before you do anything. I've seen this cause major performance issues.
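If a manual repair of individual tables is needed, the statements can be generated from a list of table names. This sketch assumes the affected tables are MyISAM (the "marked as crashed" error is typical of that engine), for which MySQL's REPAIR TABLE statement applies. repair_statements is a hypothetical helper and the "database/table" input format is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: turn "database/table" names on stdin into
# MySQL REPAIR TABLE statements on stdout. Malformed lines are skipped.
repair_statements() {
    while IFS=/ read -r db table; do
        [ -n "$db" ] && [ -n "$table" ] || continue
        printf 'REPAIR TABLE `%s`.`%s`;\n' "$db" "$table"
    done
}
```

For example, `echo nagios/nagios_servicestatus | repair_statements` prints a single REPAIR TABLE statement, which could be reviewed and then piped to the mysql client with suitable credentials.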
Former Nagios Employee.
- Box293
Re: Nagios Xi email alerts stopped to work
What is the output of:
Code: Select all
free -m
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios Xi email alerts stopped to work
The output from free -m:
You do not have the required permissions to view the files attached to this post.
Re: Nagios Xi email alerts stopped to work
I have run the repair script, which seems to repair the database, but email alerts are still not being sent.
Also, I have performed the manual table repair as described in the Repairing_The_Nagios_XI_Database.pdf document, and still no luck.
Attaching Profile zip file.
OK, let's narrow it down a bit. I found an error when running the repairmysql.sh script for the nagiosxi database, and it is as follows:
You do not have the required permissions to view the files attached to this post.