Azure server nagios checks

Posted: Wed Mar 14, 2018 9:21 am
by Shwele
Heya guys,

I've been having issues with one server on Azure. For the whole day, Nagios won't stop showing UNKNOWN for shiny-server, sshd, apache2, etc., as you can see below. I checked whether anything was out of order and everything looked fine. I restarted the services that were causing the issues and rebooted the server, but nothing changed.

Code:

Service Unknown	2018-03-14 15:10:21	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 15:10:11	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;3;1 process named shiny-server (> 0)
Service Unknown	2018-03-14 15:09:27	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 15:09:21	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 15:08:21	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 15:04:21	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;3;2 process named sshd (> 0)
Service Unknown	2018-03-14 15:03:32	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 15:02:35	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;3;1 process named mysqld (> 0)
Service Unknown	2018-03-14 15:02:32	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 15:02:17	SERVICE ALERT: givingbalkans.org;MySQL Connection Time;OK;SOFT;2;OK - 0.21 seconds to connect as nagios
Service Unknown	2018-03-14 15:01:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Warning	2018-03-14 15:01:24	SERVICE ALERT: givingbalkans.org;MySQL Connection Time;WARNING;SOFT;1;WARNING - 1.22 seconds to connect as nagios
Service Recovery	2018-03-14 15:01:13	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;3;12 process named apache2 (> 0)
Service Unknown	2018-03-14 15:00:41	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 15:00:21	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:59:54	SERVICE ALERT: givingbalkans.org;Swap Usage;OK;SOFT;2;Swap space: 0%used(0MB/0MB) (<80%) : OK
Service Unknown	2018-03-14 14:59:27	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Critical	2018-03-14 14:59:14	SERVICE ALERT: givingbalkans.org;Swap Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "13.80.152.28".
Service Recovery	2018-03-14 14:57:27	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;2;2 process named sshd (> 0)
Service Unknown	2018-03-14 14:56:32	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:55:35	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;5;1 process named mysqld (> 0)
Service Unknown	2018-03-14 14:54:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;4;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:53:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:52:42	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:52:18	SERVICE ALERT: givingbalkans.org;Memory Usage;OK;SOFT;2;Physical memory: 11%used(1778MB/16030MB) (<80%) : OK
Service Unknown	2018-03-14 14:51:42	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:51:28	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;2;1 process named sshd (> 0)
Service Critical	2018-03-14 14:51:22	SERVICE ALERT: givingbalkans.org;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "13.80.152.28".
Service Unknown	2018-03-14 14:50:32	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:46:38	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;2;1 process named mysqld (> 0)
Service Recovery	2018-03-14 14:46:03	SERVICE ALERT: givingbalkans.org;CPU Usage;OK;SOFT;3;4 CPU, average load 1.0% < 80% : OK
Service Unknown	2018-03-14 14:45:42	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:45:28	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;2;1 process named sshd (> 0)
Service Unknown	2018-03-14 14:45:03	SERVICE ALERT: givingbalkans.org;CPU Usage;UNKNOWN;SOFT;2;No answer from host
Service Unknown	2018-03-14 14:44:34	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:44:02	SERVICE ALERT: givingbalkans.org;CPU Usage;UNKNOWN;SOFT;1;No answer from host
Service Recovery	2018-03-14 14:43:17	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;4;1 process named shiny-server (> 0)
Service Unknown	2018-03-14 14:42:21	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:41:22	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:40:36	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;4;1 process named mysqld (> 0)
Service Unknown	2018-03-14 14:40:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Process Information	2018-03-14 14:40:10	Auto-save of retention data completed successfully.
Service Unknown	2018-03-14 14:39:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:39:23	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;3;12 process named apache2 (> 0)
Service Unknown	2018-03-14 14:38:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:38:29	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:37:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:37:28	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:35:23	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;2;1 process named shiny-server (> 0)
Service Recovery	2018-03-14 14:34:30	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;6;1 process named sshd (> 0)
Service Unknown	2018-03-14 14:34:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:33:34	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;5;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:32:34	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;4;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:32:24	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;4;12 process named apache2 (> 0)
Service Unknown	2018-03-14 14:31:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:31:34	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:30:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:30:34	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:29:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:29:34	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:24:35	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;2;1 process named sshd (> 0)
Service Unknown	2018-03-14 14:23:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:22:37	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;4;1 process named mysqld (> 0)
Service Unknown	2018-03-14 14:21:43	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:20:44	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:20:03	SERVICE ALERT: givingbalkans.org;/ Disk Usage;OK;SOFT;2;/: 51%used(15206MB/29715MB) (<80%) : OK
Service Unknown	2018-03-14 14:19:45	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:19:30	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;2;12 process named apache2 (> 0)
Service Recovery	2018-03-14 14:19:24	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;3;1 process named shiny-server (> 0)
Service Unknown	2018-03-14 14:19:09	SERVICE ALERT: givingbalkans.org;/ Disk Usage;UNKNOWN;SOFT;1;ERROR: No response from remote host "13.80.152.28" during discovery.
Service Recovery	2018-03-14 14:18:36	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;9;1 process named sshd (> 0)
Service Unknown	2018-03-14 14:18:31	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:18:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:17:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;8;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:17:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:16:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;7;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:15:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;6;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:14:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;5;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:14:30	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;3;1 process named mysqld (> 0)
Service Unknown	2018-03-14 14:13:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;4;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:13:34	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:12:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;3;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:12:34	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:12:24	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;3;1 process named shiny-server (> 0)
Service Unknown	2018-03-14 14:11:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:11:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:10:40	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 14:10:28	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:08:26	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;2;12 process named apache2 (> 0)
Service Unknown	2018-03-14 14:07:43	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 14:02:39	SERVICE ALERT: givingbalkans.org;apache2;OK;SOFT;2;12 process named apache2 (> 0)
Service Unknown	2018-03-14 14:01:43	SERVICE ALERT: givingbalkans.org;apache2;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 13:55:35	SERVICE ALERT: givingbalkans.org;sshd;OK;SOFT;2;1 process named sshd (> 0)
Service Unknown	2018-03-14 13:54:43	SERVICE ALERT: givingbalkans.org;sshd;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 13:52:35	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;2;1 process named mysqld (> 0)
Service Unknown	2018-03-14 13:51:40	SERVICE ALERT: givingbalkans.org;mysqld;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 13:50:33	SERVICE ALERT: givingbalkans.org;shiny-server;OK;SOFT;3;1 process named shiny-server (> 0)
Service Unknown	2018-03-14 13:49:37	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;2;ERROR: Alarm signal (Nagios time-out)
Service Unknown	2018-03-14 13:48:35	SERVICE ALERT: givingbalkans.org;shiny-server;UNKNOWN;SOFT;1;ERROR: Alarm signal (Nagios time-out)
Service Recovery	2018-03-14 13:46:36	SERVICE ALERT: givingbalkans.org;mysqld;OK;SOFT;8;1 process named mysqld (> 0)
Does anyone else run an Azure server and see issues like this, or is this a known bug with their servers?

If it persists till tomorrow, will let ya know.

Thanks in advance.

Re: Azure server nagios checks

Posted: Wed Mar 14, 2018 10:55 am
by npolovenko
Welcome back, @Shwele!
It shows unknown because they timed out. Have they always been in the unknown state? Can you show us the commands and service definitions?
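
To see which checks time out most often, the alert history pasted above can be tallied per service. This is only a rough sketch: `count_timeouts` is a made-up helper name, and it assumes the alert lines have been saved to a text file in the same semicolon-separated format.

```shell
# Hypothetical helper: count "Nagios time-out" alerts per service.
# Reads alert lines on stdin. The service name is the second
# semicolon-separated field of each SERVICE ALERT line.
count_timeouts() {
  awk -F';' '/Nagios time-out/ { n[$2]++ }
             END { for (s in n) print n[s], s }' \
    | sort -k1,1nr -k2,2   # highest count first, then by service name
}

# Usage (alerts.txt is an assumed file name holding the pasted log):
#   count_timeouts < alerts.txt
```

Checks that failed with a different message, such as "No response from remote host", are not counted by this filter.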

Re: Azure server nagios checks

Posted: Thu Mar 15, 2018 4:08 am
by Shwele
Heya @npolovenko, nice of you to still remember me :D

Yea, they have kept timing out since yesterday. That wasn't an issue before; it started around this time yesterday. They are only short time-outs, so we don't get mail notifications. At least it doesn't wake me in the middle of the night, heh.

Sure, here are commands and service definitions:

Code:

$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -r -n apache2
$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -r -n shiny-server
$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -r -n sshd
$USER1$/check_snmp_storage_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -m 'Swap' -w 80 -c 90 -f
$USER1$/check_snmp_load_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -w 80 -c 90 -f
$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$ --login=nagios --passwd=###### --privpass=####### --protocols=sha,aes -r -n mysqld
They are the same for 2 other hosts that are not showing issues like this.
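
Since the definitions are identical across hosts, one way to compare the Azure host with the two healthy ones is to run the same plugin by hand from the Nagios server and time it. The wrapper below is a minimal sketch: `time_check` is a hypothetical name, and the plugin path in the usage comment is only the common default for `$USER1$`, so it may differ on your install.

```shell
# Hypothetical helper: run any command, then report its exit code and
# wall-clock duration in whole seconds.
# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
time_check() {
  tc_start=$(date +%s)
  tc_status=0
  "$@" || tc_status=$?          # keep the plugin's exit code
  tc_end=$(date +%s)
  echo "exit=$tc_status elapsed=$((tc_end - tc_start))s"
}

# Usage (path assumed; reuse the arguments from the command definition):
#   time_check /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <host> <same arguments> -r -n sshd
```

If the elapsed time on the problem host is consistently much higher than on the other two, that points at the network path or the SNMP agent rather than at the check definitions.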

Re: Azure server nagios checks

Posted: Thu Mar 15, 2018 11:39 am
by lmiltchev
@Shwele, these errors:
ERROR: Alarm signal (Nagios time-out)
are usually caused by snmpd not running on the remote server. Can you verify if snmpd is running and try to start/restart it?
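
As a rough sketch of that check: the comments assume the remote server uses systemd, and the function assumes the net-snmp command-line tools are installed on the Nagios server. `snmp_probe`, `SNMP_AUTH_PASS` and `SNMP_PRIV_PASS` are placeholder names; the user name and the SHA/AES protocols are taken from the command definitions posted earlier.

```shell
# On the remote server (assuming systemd):
#   systemctl status snmpd
#   sudo systemctl restart snmpd

# From the Nagios server: ask the agent for sysUpTime (1.3.6.1.2.1.1.3.0)
# over SNMPv3 with a 5 second timeout and one retry.
# SNMP_AUTH_PASS and SNMP_PRIV_PASS are placeholders for the real secrets.
snmp_probe() {
  snmpget -v3 -l authPriv -u nagios \
    -a SHA -A "$SNMP_AUTH_PASS" \
    -x AES -X "$SNMP_PRIV_PASS" \
    -t 5 -r 1 "$1" 1.3.6.1.2.1.1.3.0
}

# Usage:
#   SNMP_AUTH_PASS=... SNMP_PRIV_PASS=... snmp_probe <host address>
```

A quick answer here while the plugin still times out would suggest the agent is reachable but slow on larger queries; no answer at all would suggest snmpd or the network path.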

Let us know if this fixed the issue.

Re: Azure server nagios checks

Posted: Mon Mar 19, 2018 8:48 am
by Shwele
Yea, it's running. For now there are no more issues; it could be that the connection to the server itself was lagging for a few days.

Here is the output:

Code:

echo -e "quit" | nc localhost 25
220 civicatalyst.org ESMTP Postfix (Ubuntu)
221 2.0.0 Bye
as well as telnet:

Code:

telnet localhost 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 civicatalyst.org ESMTP Postfix (Ubuntu)
quit
221 2.0.0 Bye
So yea, it is fixed and you can lock the thread. If the issue comes back, I will open a new one and let you know.

Thanks!!

Re: Azure server nagios checks

Posted: Mon Mar 19, 2018 1:55 pm
by lmiltchev
I am glad the issue has been resolved! I am locking this post. Feel free to start a new thread if the issue resurfaces or if you have more questions. Thank you!