Nagios UI is saying something is down when it is not

Posted: Thu Mar 12, 2020 9:38 am
by Alan
Hello, I have two APC units and a camera that Nagios is monitoring. The Nagios UI shows their status as down, and the Host State Information page shows Host Status down. I have seen this in the past, but after a day or so the status changed back to up, even though the devices were never actually down. I am not sure whether I set something incorrectly in my config file. I have attached several screenshots to show what I am seeing.

Re: Nagios UI is saying something is down when it is not

Posted: Thu Mar 12, 2020 3:25 pm
by scottwilkerson
Can you share the configuration you have for the APC1 host and any underlying templates and commands?

Re: Nagios UI is saying something is down when it is not

Posted: Fri Mar 13, 2020 10:10 am
by Alan
One thing I just noticed is that the host uses generic-switch while the service definitions use generic-service. I am not sure if this could be the issue?

Code:

define host{
        use             generic-switch
        host_name       APC1
        alias           APC1
        address         172.17.20.100
        hostgroups      DataCenter
        }
		
define service{
        use                     generic-service
        host_name               APC1
        service_description     PING
        check_command           check_ping!200.0,20%!600.0,60%
        normal_check_interval   5
        retry_check_interval    1
        }
		
define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Air Filter Run Hours
    check_command       check_snmp!-C CMPublic -o enterprises.318.1.1.13.3.4.1.3.1.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Air Flow
    freshness_threshold 0
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.4.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Compressor Status
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.32.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Rack Inlet Temp
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.6.0
}

Code:

###############################################################################
# TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES
#
#
# NOTES: This config file provides you with some example object definition
#        templates that are referred to by other host, service, contact, etc.
#        definitions in other config files.
#       
#        You don't need to keep these definitions in a separate file from your
#        other object definitions.  This has been done just to make things
#        easier to understand.
#
###############################################################################

# Generic service definition template - This is NOT a real service, just a template!

define service{
        name                            generic-service 	; The 'name' of this service template
        active_checks_enabled           1       		; Active service checks are enabled
        passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
        parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       		; We should obsess over this service (if necessary)
        check_freshness                 0       		; Default is to NOT check service 'freshness'
        notifications_enabled           1       		; Service notifications are enabled
        event_handler_enabled           1       		; Service event handler is enabled
        flap_detection_enabled          1       		; Flap detection is enabled
        process_perf_data               1       		; Process performance data
        retain_status_information       1       		; Retain status information across program restarts
        retain_nonstatus_information    1       		; Retain non-status information across program restarts
        is_volatile                     0       		; The service is not volatile
        check_period                    24x7			; The service can be checked at any time of the day
        max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10			; Check the service every 10 minutes under normal conditions
        retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r			; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60			; Re-notify about service problems every hour
        notification_period             24x7			; Notifications can be sent out at any time
        register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
		
# Define a template for switches that we can reuse

define host{
	name			    generic-switch	; The name of this host template
	use			    generic-host	; Inherit default values from the generic-host template
	check_period		    24x7		    ; By default, switches are monitored round the clock
	check_interval		    0.3				; Switches are checked every 0.3 time units (18 seconds if interval_length is the default 60)
	retry_interval		    1				; Schedule host check retries at 1 minute intervals
	max_check_attempts	    2				; Check each switch 2 times (max)
	check_command		    check-host-alive	; Default command to check if routers are "alive"
	notification_period	    24x7			; Send notifications at any time
	notification_interval	    30				; Resend notifications every 30 minutes
	notification_options	    d,r,u				; Only send notifications for specific host states
	contact_groups		    admins,calls		; Notifications get sent to the admins by default
	register		    0				; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}
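
For reference, Nagios Core expresses check_interval and retry_interval in time units of interval_length seconds, which is 60 by default in nagios.cfg. A minimal sketch of that arithmetic, assuming interval_length has not been changed on this server (the helper name is made up for illustration):

```python
def interval_seconds(units: float, interval_length: int = 60) -> float:
    """Convert a Nagios interval given in time units to seconds."""
    return units * interval_length


if __name__ == "__main__":
    # Values taken from the generic-switch template posted above.
    print(round(interval_seconds(0.3), 3))  # check_interval
    print(round(interval_seconds(1), 3))    # retry_interval
```

Under that assumption, hosts using generic-switch would be checked roughly every 18 seconds rather than every 5 minutes; a check_interval of 5 would give the 5 minutes mentioned in the original comment.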

Re: Nagios UI is saying something is down when it is not

Posted: Fri Mar 13, 2020 4:23 pm
by scottwilkerson
Looking at the bottom screenshot above, I noticed that Last Check Time is 06-05-2031.

Is the date way off on this machine?

If it was and you fixed it, you may need to force an immediate check to reset this.

Re: Nagios UI is saying something is down when it is not

Posted: Mon Mar 16, 2020 11:09 am
by Alan
So I looked at the date and time on the CentOS server and they are correct. I attached a screenshot of the time, and this is what it shows when I SSH into the server:

Code:

[alan@Svr-Monitor ~]$ date
Mon Mar 16 09:03:39 PDT 2020
[alan@Svr-Monitor ~]$

How do I force an immediate check to reset it?

Re: Nagios UI is saying something is down when it is not

Posted: Mon Mar 16, 2020 11:15 am
by Alan
I did look at a server that shows as up, and its date is at least in 2020. I checked all three devices that show as down, and all of them have Last State Change: 06-05-2031 19:19:20.

I also looked at several other devices that are up, and they have Last State Change: 03-02-2020 09:31:29. I am not sure why those three have that weird date.
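
One way to find every host carrying a timestamp in the future, instead of clicking through the UI, is to scan the status file. The following is only a sketch: it assumes the usual Nagios Core status.dat layout of `hoststatus { key=value ... }` blocks and a source-install path of /usr/local/nagios/var/status.dat (check status_file in nagios.cfg). The function names are hypothetical.

```python
import re
import time

# Matches one "hoststatus { ... }" block in a Nagios Core status.dat file.
HOSTSTATUS_RE = re.compile(r"hoststatus\s*\{(.*?)\n\s*\}", re.DOTALL)


def parse_block(body: str) -> dict:
    """Turn the key=value lines of one block into a dict."""
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


def hosts_with_future_times(status_text: str, now: float) -> list:
    """Return (host_name, field, timestamp) for timestamps later than now."""
    hits = []
    for match in HOSTSTATUS_RE.finditer(status_text):
        fields = parse_block(match.group(1))
        for key in ("last_check", "last_state_change"):
            value = fields.get(key, "0")
            if value.isdigit() and int(value) > now:
                hits.append((fields.get("host_name", "?"), key, int(value)))
    return hits


if __name__ == "__main__":
    # Assumed path for a source install; adjust to status_file in nagios.cfg.
    path = "/usr/local/nagios/var/status.dat"
    with open(path) as handle:
        text = handle.read()
    for host, field, stamp in hosts_with_future_times(text, time.time()):
        when = time.strftime("%m-%d-%Y %H:%M:%S", time.localtime(stamp))
        print(host, field, when)
```

If only the three down devices are printed, that would support the idea that the problem is the stored timestamps rather than the devices themselves.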

Re: Nagios UI is saying something is down when it is not

Posted: Mon Mar 16, 2020 3:26 pm
by scottwilkerson
Can you go to each of these and, under the "Commands" box, click "Re-schedule the next check of this host"?
Then click "Commit".
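
If the UI route is not available, the same request can in principle be sent through the external command file using the SCHEDULE_FORCED_HOST_CHECK command. This is a sketch under assumptions, not a confirmed fix for this install: it assumes external commands are enabled and that the command file is at /usr/local/nagios/var/rw/nagios.cmd (see command_file in nagios.cfg). The helper names are made up for illustration.

```python
import time


def forced_host_check_command(host_name: str, now: int) -> str:
    """Format a SCHEDULE_FORCED_HOST_CHECK external command line."""
    return f"[{now}] SCHEDULE_FORCED_HOST_CHECK;{host_name};{now}\n"


def submit(command_file: str, line: str) -> None:
    """Write one command line to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Assumed command file path for a source install; see command_file in nagios.cfg.
    command_file = "/usr/local/nagios/var/rw/nagios.cmd"
    # APC1 is the host from this thread; add the other affected hosts by host_name.
    for host in ["APC1"]:
        submit(command_file, forced_host_check_command(host, int(time.time())))
```

The script would need to run as a user with write access to the command file, typically the nagios user or a member of its command group.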

Re: Nagios UI is saying something is down when it is not

Posted: Tue Mar 17, 2020 1:33 pm
by Alan
I apologize but I have not been able to find where that is. Is this in the Nagios UI?

Re: Nagios UI is saying something is down when it is not

Posted: Tue Mar 17, 2020 1:41 pm
by scottwilkerson
Alan wrote: I apologize but I have not been able to find where that is. Is this in the Nagios UI?
Yes. Is your user added to these lines in the cgi.cfg?

Code:

authorized_for_all_host_commands=
authorized_for_all_hosts=
authorized_for_all_service_commands=
authorized_for_all_services=
authorized_for_configuration_information=
authorized_for_system_commands=
authorized_for_system_information=
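
To check all seven directives at once, a short script can read cgi.cfg and report which of them do not list a given user. This is a rough sketch: the path /usr/local/nagios/etc/cgi.cfg is an assumption for a source install, the function name is hypothetical, and it only looks for an explicit user name in each comma-separated list.

```python
AUTH_KEYS = [
    "authorized_for_all_host_commands",
    "authorized_for_all_hosts",
    "authorized_for_all_service_commands",
    "authorized_for_all_services",
    "authorized_for_configuration_information",
    "authorized_for_system_commands",
    "authorized_for_system_information",
]


def missing_authorizations(cgi_text: str, user: str) -> list:
    """Return the directives from AUTH_KEYS that do not list the given user."""
    granted = {}
    for line in cgi_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        granted[key.strip()] = {name.strip() for name in value.split(",")}
    return [key for key in AUTH_KEYS if user not in granted.get(key, set())]


if __name__ == "__main__":
    # Assumed path for a source install; adjust if cgi.cfg lives elsewhere.
    with open("/usr/local/nagios/etc/cgi.cfg") as handle:
        for key in missing_authorizations(handle.read(), "nagiosadmin"):
            print("user not listed in", key)
```

No output would mean the user appears in every one of the directives listed above.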

Re: Nagios UI is saying something is down when it is not

Posted: Tue Mar 17, 2020 4:48 pm
by Alan
I do have the username added to all of these in the cgi.cfg file. It is just the nagiosadmin user.