Service Alert Error "ERROR: General time-out (Alarm signal)"

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
h2j
Posts: 4
Joined: Mon Jan 28, 2019 12:42 pm

Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by h2j »

We have over 500 hosts spread across a few different subnets (all Windows hosts). Since roughly the beginning of December, random hosts have been raising the alert "ERROR: General time-out (Alarm signal)", but only for specific services. For example, C:\ drive usage will report fine, but D:\ usage will show the error. Sometimes the alarm clears on its own. Other times it does not, and we have to either restart the SNMP service (several times) or reboot the server itself to resolve the alarm.

These devices are on local subnets as well as at some remote sites, so I don't believe bandwidth is coming into play, especially since half the services on an affected host still report correctly. Overall, I'd estimate that 95% of hosts report correctly 100% of the time.

I've seen past threads suggesting modifying the check commands for these services, but since the command is static across all servers (except for the host variable), I don't see that as the issue.
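The point about the command being static can be made concrete. In Nagios, a check command is a template: macros such as $USER1$, $HOSTADDRESS$ and $ARG1$ are substituted per host and per service before the plugin runs, so two services can share one command definition yet run with different arguments (for example, a different drive in $ARG1$). The Python sketch below imitates that substitution for illustration only. It is not Nagios's own implementation, the $USER1$ value is the usual Nagios XI default (an assumption), and the host address and $ARG1$ arguments are hypothetical.

```python
import re


def expand_macros(command_line: str, macros: dict[str, str]) -> str:
    """Replace each $NAME$ token with its value; leave unknown macros as-is.

    Illustrative imitation of Nagios macro substitution, not Nagios's code.
    """
    def lookup(match: re.Match) -> str:
        return macros.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", lookup, command_line)


TEMPLATE = "$USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $ARG1$ -t 30"

# Hypothetical per-service values: only HOSTADDRESS and ARG1 vary between
# services that share this command definition.
c_drive = expand_macros(TEMPLATE, {
    "USER1": "/usr/local/nagios/libexec",   # typical $USER1$ value (assumption)
    "HOSTADDRESS": "192.0.2.10",            # hypothetical host address
    "ARG1": "-C public -m ^C: -w 80 -c 90", # hypothetical service arguments
})
print(c_drive)
```

Comparing the expanded command lines of a failing service and a working one on the same host is a quick way to see exactly what differs between them.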
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by benjaminsmith »

Hi @h2j,

Looks like this is your first post; thank you for joining the Nagios Support Forum! Since this issue is happening randomly on specific services, have you tried running ping tests on those hosts to rule out network congestion as the cause?

Additionally, I'd like to review the logs in the system profile to make sure everything looks normal. Can you send us your system profile along with the exact name of the hosts and services having the error message? Thanks.

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
h2j

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by h2j »

benjaminsmith wrote: have you tried running ping tests on those hosts to rule out this could be caused by network congestion?
Hi, the ping response times are always consistent, with no drops, even in extended pings.
benjaminsmith wrote: Save the profile.zip file and share this in a private message and then reply to this post to bring it up in the queue.

I am unable to send private messages; I get the message below:

"We are sorry, but you are not authorised to use this feature. You may have just registered here and may need to participate more to be able to use this feature."
benjaminsmith

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by benjaminsmith »

Hi,

You should be able to send it now. Please try again. Thanks.
h2j

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by h2j »

I've sent the profile.zip export you requested. Thanks!
benjaminsmith

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by benjaminsmith »

Hi @h2j,

Thanks for sending over the system profile. The profile does not contain the entire Nagios archive, but in the logs it does include, the timeout occurs on only one host, for the following services:

OSCRDC01;Server Physical Memory Usage;UNKNOWN;SOFT;2;ERROR: General time-out (Alarm signal)
OSCRDC01;Server Drive C: Disk Usage;UNKNOWN;SOFT;1;ERROR: General time-out (Alarm signal)

Both of these services use the check_snmp_storage.pl plugin. Let's raise this plugin's timeout from its default to -t 30 and see whether that resolves the issue. Since the other services are without issue, it seems that this host may be responding more slowly than the others.
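The two log lines above follow the field order Nagios uses for SERVICE ALERT entries: host; service; state; state type; attempt; plugin output. Tallying the timeout message per host and service across a longer stretch of the log is one way to confirm whether the problem really is confined to one host. A minimal Python sketch, assuming the lines are fed in as plain strings, with or without the "[timestamp] SERVICE ALERT: " prefix found in the main Nagios log:

```python
from collections import Counter

TIMEOUT_TEXT = "General time-out (Alarm signal)"


def count_timeouts(lines):
    """Count timeout alerts per (host, service) in SERVICE ALERT-style lines."""
    counts = Counter()
    for line in lines:
        # Accept both bare records and full log lines that carry a
        # "[timestamp] SERVICE ALERT: " prefix.
        _, found, rest = line.partition("SERVICE ALERT: ")
        record = rest if found else line
        # maxsplit=5 keeps any semicolons inside the plugin output intact.
        fields = record.strip().split(";", 5)
        if len(fields) != 6:
            continue  # not an alert record
        host, service, _state, _state_type, _attempt, output = fields
        if TIMEOUT_TEXT in output:
            counts[(host, service)] += 1
    return counts


sample = [
    "OSCRDC01;Server Physical Memory Usage;UNKNOWN;SOFT;2;ERROR: General time-out (Alarm signal)",
    "OSCRDC01;Server Drive C: Disk Usage;UNKNOWN;SOFT;1;ERROR: General time-out (Alarm signal)",
]
timeouts = count_timeouts(sample)
# Number of distinct services with timeouts, per host.
print(Counter(host for host, _service in timeouts))  # Counter({'OSCRDC01': 2})
```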

You can change the timeout by adjusting the check command settings for check_xi_service_snmp_win_storage. Let me know if you need instructions on modifying check commands in Nagios XI.

$USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $ARG1$ -t 30

I did notice some crashed database tables that seem to be OK now, but I would re-run the database repair script.

/usr/local/nagiosxi/scripts/repair_databases.sh

Lastly, I'd like to check the Apply Configuration process for errors. Please post the output of the following command:

/usr/local/nagiosxi/scripts/reconfigure_nagios.sh
h2j

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by h2j »

I updated the timeout to 30 seconds as you suggested. It now works partially: when I run the check manually, it seems to fail about half the time, but when it runs at the default 5-minute check interval it seems to be okay (no errors or warnings in about 30 minutes). However, it doesn't actually seem to wait the full 30 seconds before timing out; it fails after around 20 seconds. I also tried setting it to 60 seconds, with the same result. A constant ping shows no dropped packets and an average response time of 5 ms.
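Since manual runs fail about half the time and appear to give up after roughly 20 seconds rather than the configured 30, it may help to record the exit status and elapsed time of each manual run instead of estimating them. The Python sketch below does that with subprocess. It maps the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and kills any run that exceeds a hard limit. The plugin path assumes $USER1$ is /usr/local/nagios/libexec, and the host address, community string and -m storage pattern in EXAMPLE_CHECK are hypothetical placeholders for the service's real $ARG1$ values.

```python
import subprocess
import time

# Standard Nagios plugin exit codes.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Hypothetical command line: replace the address and arguments with the
# failing service's real values before running it on the Nagios server.
EXAMPLE_CHECK = [
    "/usr/local/nagios/libexec/check_snmp_storage.pl",
    "-H", "192.0.2.10", "-C", "public", "-m", "^D:", "-w", "80", "-c", "90",
    "-t", "30",
]


def time_runs(cmd, runs=10, hard_limit=120.0):
    """Run a check repeatedly; return (status, seconds, first output line) per run."""
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=hard_limit)
            status = STATUS.get(proc.returncode, f"EXIT {proc.returncode}")
            first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            status, first_line = "KILLED", f"no result after {hard_limit}s"
        results.append((status, round(time.monotonic() - start, 1), first_line))
    return results
```

On the Nagios server, time_runs(EXAMPLE_CHECK, runs=5) returns one tuple per run. If the UNKNOWN results cluster at the same elapsed time regardless of the -t value, that might suggest a limit being applied somewhere other than the -t option.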

Here is the check command including the timeout flag (I cloned the existing check_xi_service_snmp_win_storage check command in CCM):

$USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $ARG1$ -t 30

benjaminsmith wrote: I did notice some crashed database tables that seem to be ok now, but I would re-run database repair script.

/usr/local/nagiosxi/scripts/repair_databases.sh

This was successful and repaired a few issues that were found.
benjaminsmith wrote: Lastly, I'd like to check the apply configuration command for errors, post the output to the following command:

/usr/local/nagiosxi/scripts/reconfigure_nagios.sh

See the results below:

--- reset_config_perms.sh ------------
> Setting CCM script permissions
> Setting script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting NOM checkpoint user:group permissions
> + Setting Nagios Core corelog.newobjects user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

--- ccm_import.php -------------------
> Setting import directory: /usr/local/nagios/etc/import/
> Importing config files into the CCM
  No files to import
--------------------------------------

--- ccm_export.php -------------------
> Writing CCM configuration to Nagios files
  Finished writing out configuraton
--------------------------------------

--------------------------------------
> Verifying configuration with Nagios Core
> Output:
Nagios Core 4.4.3
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-01-15
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 1029 services.
        Checked 649 hosts.
        Checked 24 host groups.
        Checked 19 service groups.
        Checked 12 contacts.
        Checked 4 contact groups.
        Checked 142 commands.
        Checked 19 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 649 hosts
        Checked 0 service dependencies
        Checked 246 host dependencies
        Checked 19 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
> Return Code: 0
--------------------------------------
Stopping nagios: .done.
Starting nagios: done.
You have new mail in /var/spool/mail/root
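The lines that matter in the output above are the Total Warnings and Total Errors counts from the pre-flight check, which Nagios Core also prints when verifying a configuration directly (nagios -v). A small Python sketch that checks those two counts in captured output; the function name is hypothetical, and it deliberately treats missing totals as a failure:

```python
import re


def preflight_ok(output: str) -> bool:
    """True if pre-flight output reports 0 warnings and 0 errors."""
    warnings = re.search(r"Total Warnings:\s*(\d+)", output)
    errors = re.search(r"Total Errors:\s*(\d+)", output)
    if warnings is None or errors is None:
        return False  # totals not found: treat the run as unverified
    return int(warnings.group(1)) == 0 and int(errors.group(1)) == 0


sample = "Total Warnings: 0\nTotal Errors:   0\n"
print(preflight_ok(sample))  # True
```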
benjaminsmith

Re: Service Alert Error "ERROR: General time-out (Alarm signal)"

Post by benjaminsmith »

Hello @h2j,

The output of the reconfigure script looks good. Since the ping checks are coming back fine, it looks like the SNMP service on this particular Windows system is simply not responding as reliably as on the others. I would recommend restarting the SNMP service on this system.

The other option is to adjust the Check Settings (check intervals and max check attempts) for this service to avoid unnecessary notifications.
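The log lines earlier in this thread record the timeouts as SOFT states at attempts 1 and 2, meaning those particular timeouts had not reached a HARD state when they were logged. A SOFT problem only becomes HARD (and notifies) once max_check_attempts consecutive checks fail, with the re-checks spaced retry_interval apart (in minutes with the default interval_length of 60). The Python sketch below models that simplified rule. It ignores flapping, dependencies and notification delays, and the settings in the example (max_check_attempts 5, retry_interval 1) are hypothetical rather than this service's actual values.

```python
def minutes_to_hard(max_check_attempts: int, retry_interval: float) -> float:
    """Minutes from the first failed check until the problem turns HARD.

    Simplified model: the first failure is attempt 1, re-checks run every
    retry_interval minutes, and the state becomes HARD at max_check_attempts.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval


def reaches_hard(results: list[bool], max_check_attempts: int) -> bool:
    """True if the check results (True = OK) contain max_check_attempts
    consecutive failures, i.e. a transient timeout would have gone HARD."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= max_check_attempts:
            return True
    return False


# Hypothetical settings: max_check_attempts 5, retry_interval 1 minute.
print(minutes_to_hard(5, 1))                  # 4
print(reaches_hard([False, False, True], 5))  # False: cleared at SOFT attempt 2
print(reaches_hard([False] * 5, 5))           # True
```

Raising max_check_attempts or retry_interval widens the window a transient timeout has to outlast before anyone is notified, at the cost of slower notification for a genuine outage.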