Nagios UI is saying something is down when it is not
Hello, I have two APC units and a camera that Nagios is monitoring. The Nagios UI shows their status as down, and Host State Information also shows Host Status as down. I have seen this before, and after a day or so the status changed back to up, but the devices were never actually down. I am not sure whether I set something incorrectly in my config file. I have attached several screenshots to show what I am seeing.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios UI is saying something is down when it is not
Can you share the configuration you have for the APC1 host and any underlying templates and commands?
Re: Nagios UI is saying something is down when it is not
One thing I just noticed is that the host uses generic-switch while the service definitions use generic-service. Could this be the issue?
Code: Select all
define host{
    use                     generic-switch
    host_name               APC1
    alias                   APC1
    address                 172.17.20.100
    hostgroups              DataCenter
}
define service{
    use                     generic-service
    host_name               APC1
    service_description     PING
    check_command           check_ping!200.0,20%!600.0,60%
    normal_check_interval   5
    retry_check_interval    1
}
define service {
    use                     generic-service ; Inherit values from a template
    host_name               APC1
    service_description     Air Filter Run Hours
    check_command           check_snmp!-C CMPublic -o enterprises.318.1.1.13.3.4.1.3.1.0
}
define service {
    use                     generic-service ; Inherit values from a template
    host_name               APC1
    service_description     Air Flow
    freshness_threshold     0
    check_command           check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.4.0
}
define service {
    use                     generic-service ; Inherit values from a template
    host_name               APC1
    service_description     Compressor Status
    check_command           check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.32.0
}
define service {
    use                     generic-service ; Inherit values from a template
    host_name               APC1
    service_description     Rack Inlet Temp
    check_command           check_snmp!-C CMPublic -o .1.3.6.1.4.1.318.1.1.13.3.4.1.2.6.0
}
Code: Select all
###############################################################################
# TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES
#
#
# NOTES: This config file provides you with some example object definition
# templates that are referred to by other host, service, contact, etc.
# definitions in other config files.
#
# You don't need to keep these definitions in a separate file from your
# other object definitions. This has been done just to make things
# easier to understand.
#
###############################################################################
# Generic service definition template - This is NOT a real service, just a template!
define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1 ; Active service checks are enabled
    passive_checks_enabled          1 ; Passive service checks are enabled/accepted
    parallelize_check               1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1 ; We should obsess over this service (if necessary)
    check_freshness                 0 ; Default is to NOT check service 'freshness'
    notifications_enabled           1 ; Service notifications are enabled
    event_handler_enabled           1 ; Service event handler is enabled
    flap_detection_enabled          1 ; Flap detection is enabled
    process_perf_data               1 ; Process performance data
    retain_status_information       1 ; Retain status information across program restarts
    retain_nonstatus_information    1 ; Retain non-status information across program restarts
    is_volatile                     0 ; The service is not volatile
    check_period                    24x7 ; The service can be checked at any time of the day
    max_check_attempts              3 ; Re-check the service up to 3 times in order to determine its final (hard) state
    normal_check_interval           10 ; Check the service every 10 minutes under normal conditions
    retry_check_interval            2 ; Re-check the service every two minutes until a hard state can be determined
    contact_groups                  admins ; Notifications get sent out to everyone in the 'admins' group
    notification_options            w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
    notification_interval           60 ; Re-notify about service problems every hour
    notification_period             24x7 ; Notifications can be sent out at any time
    register                        0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
# Define a template for switches that we can reuse
define host{
    name                    generic-switch ; The name of this host template
    use                     generic-host ; Inherit default values from the generic-host template
    check_period            24x7 ; By default, switches are monitored round the clock
    check_interval          0.3 ; Switches are checked every 5 minutes
    retry_interval          1 ; Schedule host check retries at 1 minute intervals
    max_check_attempts      2 ; Check each switch 10 times (max)
    check_command           check-host-alive ; Default command to check if routers are "alive"
    notification_period     24x7 ; Send notifications at any time
    notification_interval   30 ; Resend notifications every 30 minutes
    notification_options    d,r,u ; Only send notifications for specific host states
    contact_groups          admins,calls ; Notifications get sent to the admins by default
    register                0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
scottwilkerson
Re: Nagios UI is saying something is down when it is not
Looking at the bottom screenshot above, I noticed that Last Check Time is 06-05-2031.
Is the date way off on this machine?
If it was and you have since fixed it, you may need to force an immediate check to reset this.
Re: Nagios UI is saying something is down when it is not
I checked the date and time on the CentOS server and they are correct. I attached a screenshot of the time, and this is what it shows when I SSH into the server:
Code: Select all
[alan@Svr-Monitor ~]$ date
Mon Mar 16 09:03:39 PDT 2020
[alan@Svr-Monitor ~]$
How do I force an immediate check to reset it?
Re: Nagios UI is saying something is down when it is not
I looked at a server that shows as up, and its dates are at least in 2020. I checked all three devices that show as down, and all of them have Last State Change: 06-05-2031 19:19:20.
I also looked at several other devices that are up, and they have Last State Change: 03-02-2020 09:31:29. I am not sure why those three have that odd date.
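One way to list every host carrying a timestamp in the future is to scan the Nagios status file. The following is a minimal sketch, not an official tool: it assumes the usual `hoststatus { key=value ... }` block layout of status.dat, the function name `future_timestamps` is hypothetical, and the file path in the usage note is only the common default for a source install.

```python
import re
import time

# Epoch-second fields in a hoststatus block that should never be in the future
# (next_check is deliberately left out, since it is normally ahead of "now").
TIME_FIELDS = ("last_check", "last_state_change", "last_hard_state_change")


def future_timestamps(status_text, now=None):
    """Return (host_name, field, value) for host timestamps later than now."""
    now = int(time.time() if now is None else now)
    findings = []
    # Each host appears as a block: "hoststatus {" ... "}" with key=value lines.
    for block in re.findall(r"hoststatus\s*\{(.*?)\n\s*\}", status_text, re.S):
        fields = dict(
            line.strip().split("=", 1)
            for line in block.splitlines()
            if "=" in line
        )
        for name in TIME_FIELDS:
            value = fields.get(name, "")
            if value.isdigit() and int(value) > now:
                findings.append((fields.get("host_name", "?"), name, int(value)))
    return findings
```

To use it, read the status file (often /usr/local/nagios/var/status.dat; the actual location is the status_file setting in nagios.cfg) and pass its contents to `future_timestamps`. Any host it reports was last checked while the system clock was ahead of the current time.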
scottwilkerson
Re: Nagios UI is saying something is down when it is not
Can you go to each of these hosts and, under the "Commands" box, click "Re-schedule the next check of this host"?
Then click "Commit".
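If the "Commands" box is hard to find, a forced host check can also be submitted through the Nagios external command file. This is a minimal sketch under stated assumptions: the command file path below is the common default for a source install (the real path is the command_file setting in nagios.cfg), external commands must be enabled, and the helper names are hypothetical. SCHEDULE_FORCED_HOST_CHECK itself is a standard Nagios Core external command.

```python
import time

# Assumed default path; check the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def forced_host_check_command(host_name, now=None):
    """Build an external command line that forces a host check at time 'now'."""
    now = int(time.time() if now is None else now)
    # Format: [timestamp] SCHEDULE_FORCED_HOST_CHECK;<host_name>;<check_time>
    return f"[{now}] SCHEDULE_FORCED_HOST_CHECK;{host_name};{now}\n"


def submit(command, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command pipe."""
    # The command file is a named pipe, so this blocks until Nagios reads it.
    with open(command_file, "w") as pipe:
        pipe.write(command)


# Example (run as a user allowed to write to the command file):
#     submit(forced_host_check_command("APC1"))
```

Calling `submit` once per affected host should have the same effect as clicking "Re-schedule the next check of this host" with the force option and then "Commit".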
Re: Nagios UI is saying something is down when it is not
I apologize but I have not been able to find where that is. Is this in the Nagios UI?
scottwilkerson
Re: Nagios UI is saying something is down when it is not
Alan wrote: I apologize but I have not been able to find where that is. Is this in the Nagios UI?
Yes. Is your user added to these lines in the cgi.cfg?
Code: Select all
authorized_for_all_host_commands=
authorized_for_all_hosts=
authorized_for_all_service_commands=
authorized_for_all_services=
authorized_for_configuration_information=
authorized_for_system_commands=
authorized_for_system_information=
Re: Nagios UI is saying something is down when it is not
I do have the username added to all of these in the cgi.cfg file. It is just the nagiosadmin user.
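To double-check the cgi.cfg entries without reading them by eye, the directives can be parsed and compared against a user name. A minimal sketch, assuming cgi.cfg uses plain `key=value` lines with comma-separated user lists and that `*` grants access to any authenticated user; the function name `missing_authorizations` is hypothetical.

```python
# The seven directives listed earlier in this thread.
REQUIRED = (
    "authorized_for_all_host_commands",
    "authorized_for_all_hosts",
    "authorized_for_all_service_commands",
    "authorized_for_all_services",
    "authorized_for_configuration_information",
    "authorized_for_system_commands",
    "authorized_for_system_information",
)


def missing_authorizations(cgi_cfg_text, user):
    """Return the required directives that do not list 'user' (or '*')."""
    granted = {}
    for line in cgi_cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        granted[key.strip()] = {name.strip() for name in value.split(",")}
    return [
        key
        for key in REQUIRED
        if user not in granted.get(key, set())
        and "*" not in granted.get(key, set())
    ]
```

An empty list for "nagiosadmin" would confirm that the user is authorized for all of these directives, so the missing "Commands" box would have to have another cause.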