Nagios UI is saying something is down when it is not

Alan
Posts: 86
Joined: Wed Aug 21, 2019 4:14 pm

Post by Alan »

Hello, I have two APC units and a camera that Nagios is monitoring. The Nagios UI shows their status as down, and Host State Information shows Host Status as DOWN. I have seen this before, and after a day or so the status changed back to up, but the devices were never actually down. I am not sure whether I set something incorrectly in my config file. I have attached several screenshots to show what I am seeing.
Attachments
Service Overview for all host groups (2).png
Pinging.png
Host State information.png
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios UI is saying something is down when it is not

Post by scottwilkerson »

Can you share the configuration you have for the APC1 host and any underlying templates and commands?
Former Nagios employee

Post by Alan »

One thing I just noticed is that the host uses generic-switch while the service definitions use generic-service. Could this be the issue?

Code: Select all

define host{
        use             generic-switch
        host_name       APC1
        alias           APC1
        address         172.17.20.100
        hostgroups      DataCenter
        }
		
define service{
        use                     generic-service
        host_name               APC1
        service_description     PING
        check_command           check_ping!200.0,20%!600.0,60%
        normal_check_interval   5
        retry_check_interval    1
        }
		
define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Air Filter Run Hours
    check_command       check_snmp!-C CMPublic -o enterprises.318.1.1.13.3.4.1.3.1.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Air Flow
    freshness_threshold 0
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.4.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Compressor Status
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.32.0
}

define service {
    use                 generic-service ; Inherit values from a template
    host_name           APC1
    service_description Rack Inlet Temp
    check_command       check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.6.0
}

Code: Select all

###############################################################################
# TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES
#
#
# NOTES: This config file provides you with some example object definition
#        templates that are referred to by other host, service, contact, etc.
#        definitions in other config files.
#       
#        You don't need to keep these definitions in a separate file from your
#        other object definitions.  This has been done just to make things
#        easier to understand.
#
###############################################################################

# Generic service definition template - This is NOT a real service, just a template!

define service{
        name                            generic-service 	; The 'name' of this service template
        active_checks_enabled           1       		; Active service checks are enabled
        passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
        parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       		; We should obsess over this service (if necessary)
        check_freshness                 0       		; Default is to NOT check service 'freshness'
        notifications_enabled           1       		; Service notifications are enabled
        event_handler_enabled           1       		; Service event handler is enabled
        flap_detection_enabled          1       		; Flap detection is enabled
        process_perf_data               1       		; Process performance data
        retain_status_information       1       		; Retain status information across program restarts
        retain_nonstatus_information    1       		; Retain non-status information across program restarts
        is_volatile                     0       		; The service is not volatile
        check_period                    24x7			; The service can be checked at any time of the day
        max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10			; Check the service every 10 minutes under normal conditions
        retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r			; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60			; Re-notify about service problems every hour
        notification_period             24x7			; Notifications can be sent out at any time
        register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
		
# Define a template for switches that we can reuse

define host{
	name			    generic-switch	; The name of this host template
	use			    generic-host	; Inherit default values from the generic-host template
	check_period		    24x7		    ; By default, switches are monitored round the clock
	check_interval		    0.3				; Check every 0.3 time units (18 seconds with the default interval_length of 60)
	retry_interval		    1				; Schedule host check retries at 1 minute intervals
	max_check_attempts	    2				; Check each switch 2 times (max)
	check_command		    check-host-alive	; Default command to check if routers are "alive"
	notification_period	    24x7			; Send notifications at any time
	notification_interval	    30				; Resend notifications every 30 minutes
	notification_options	    d,r,u				; Only send notifications for specific host states
	contact_groups		    admins,calls		; Notifications get sent to the admins by default
	register		    0				; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}
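The check_interval and retry_interval values in the templates above are counted in time units, which Nagios multiplies by the interval_length setting from nagios.cfg (60 seconds by default). A minimal sketch of that arithmetic, assuming the default interval_length; the function name is made up for illustration:

```python
def interval_seconds(units, interval_length=60):
    """Convert a Nagios interval (in time units) to seconds.

    interval_length defaults to 60, the Nagios default; a different value
    in nagios.cfg changes the result.
    """
    return round(units * interval_length, 3)


# Values taken from the definitions posted above.
for label, units in [
    ("generic-switch check_interval", 0.3),
    ("generic-switch retry_interval", 1),
    ("PING normal_check_interval", 5),
]:
    print(f"{label}: {interval_seconds(units)} seconds")
```

Running this against the posted values is a quick way to see how often each check is actually scheduled.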

Post by scottwilkerson »

Looking at the bottom screenshot above, I noticed that Last Check Time is 06-05-2031.

Is the date way off on this machine?

If it was and you have since fixed it, you may need to force an immediate check to reset this.

Post by Alan »

I checked the date and time on the CentOS server and they are correct. I attached a screenshot of the time, and this is what it shows when I SSH into the server:

Code: Select all

[alan@Svr-Monitor ~]$ date
Mon Mar 16 09:03:39 PDT 2020
[alan@Svr-Monitor ~]$ 
How do I force an immediate check to reset it?
Attachments
CentOS Time.png

Post by Alan »

I looked at a server that shows as up, and its date is at least in 2020. I checked all three devices that show as down, and all of them have Last State Change: 06-05-2031 19:19:20.

I also looked at several other devices that are up, and they have Last State Change: 03-02-2020 09:31:29. I am not sure why those three have that odd date.
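One way to find every host carrying a future-dated timestamp, rather than clicking through each one, is to scan the Nagios status file. The sketch below assumes the usual status.dat layout (hoststatus blocks of key=value lines with epoch timestamps); the file's location is whatever status_file points to in nagios.cfg, and the sample data, host names, and function name are made up for illustration.

```python
import re
import time

# Epoch-timestamp fields to inspect (a subset chosen for this sketch).
TIME_FIELDS = ("last_check", "last_state_change")

# Hypothetical excerpt in status.dat layout; the first host has 2031 timestamps.
SAMPLE = """hoststatus {
	host_name=APC1
	last_check=1938219560
	last_state_change=1938219560
	}

hoststatus {
	host_name=Server1
	last_check=1584374619
	last_state_change=1583170289
	}
"""


def find_future_hosts(status_text, now=None, slack=300):
    """Return {host_name: {field: epoch}} for hosts dated more than slack seconds ahead."""
    now = time.time() if now is None else now
    flagged = {}
    # Each host is stored as a "hoststatus { key=value ... }" block.
    for block in re.findall(r"hoststatus\s*\{(.*?)\n\s*\}", status_text, re.S):
        fields = dict(
            line.strip().split("=", 1)
            for line in block.splitlines()
            if "=" in line
        )
        late = {
            key: int(fields[key])
            for key in TIME_FIELDS
            if fields.get(key, "").isdigit() and int(fields[key]) > now + slack
        }
        if late:
            flagged[fields.get("host_name", "?")] = late
    return flagged


print(find_future_hosts(SAMPLE, now=1584374619))
```

To run it against a live system, read the real status file into a string and pass it in place of SAMPLE.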
Attachments
Server that is up.png

Post by scottwilkerson »

Can you go to each of these hosts and, in the "Commands" box, click "Re-schedule the next check of this host"?
Then click "Commit".
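If the web UI command is hard to locate, the same request can be submitted through the Nagios external command file. This is a minimal sketch using the documented SCHEDULE_FORCED_HOST_CHECK external command; it assumes check_external_commands=1 in nagios.cfg, and the command file path shown is only a common default for source installs (the real path is the command_file setting in nagios.cfg). The function names are made up for illustration.

```python
import time


def forced_host_check_command(host_name, when=None):
    """Build one SCHEDULE_FORCED_HOST_CHECK line in external command format."""
    when = int(time.time()) if when is None else int(when)
    # Format: [entry_time] COMMAND;host_name;check_time
    return f"[{when}] SCHEDULE_FORCED_HOST_CHECK;{host_name};{when}\n"


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the Nagios command pipe (the path is an assumption)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


# Print the line that would be submitted for the APC1 host from this thread.
print(forced_host_check_command("APC1"), end="")
```

Calling submit(forced_host_check_command("APC1")) as a user with write access to the command file should queue the check; the other affected hosts would need the same call with their own host_name values.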

Post by Alan »

I apologize, but I have not been able to find where that is. Is this in the Nagios UI?

Post by scottwilkerson »

Alan wrote: I apologize, but I have not been able to find where that is. Is this in the Nagios UI?
Yes. Is your user added to these lines in cgi.cfg?

Code: Select all

authorized_for_all_host_commands=
authorized_for_all_hosts=
authorized_for_all_service_commands=
authorized_for_all_services=
authorized_for_configuration_information=
authorized_for_system_commands=
authorized_for_system_information=
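A quick way to confirm which of these directives actually list a given user is to parse cgi.cfg and report the gaps. The sketch below assumes the standard key=comma,separated,users layout and treats `*` as matching every authenticated user; the sample text and function name are made up for illustration, and the real file's location depends on the install.

```python
# The directives listed in the post above.
DIRECTIVES = [
    "authorized_for_all_host_commands",
    "authorized_for_all_hosts",
    "authorized_for_all_service_commands",
    "authorized_for_all_services",
    "authorized_for_configuration_information",
    "authorized_for_system_commands",
    "authorized_for_system_information",
]


def missing_authorizations(cgi_text, user, directives=DIRECTIVES):
    """Return the directives in cgi_text that do not list user (or '*')."""
    granted = {}
    for line in cgi_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        granted[key.strip()] = {name.strip() for name in value.split(",")}
    return [d for d in directives if not granted.get(d, set()) & {user, "*"}]


# Hypothetical cgi.cfg excerpt where only two directives include nagiosadmin.
sample = """
authorized_for_all_hosts=nagiosadmin,alan
authorized_for_all_host_commands=nagiosadmin
# authorized_for_all_services=nagiosadmin
authorized_for_system_commands=alan
"""
for directive in missing_authorizations(sample, "nagiosadmin"):
    print("not granted:", directive)
```

An empty result for the real file would suggest the command links are hidden for some other reason than cgi.cfg authorization.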

Post by Alan »

I do have the username added to all of these in the cgi.cfg file. It is just the nagiosadmin user.