I recently upgraded one Nagios box from 3.8 to 4.0.8.
Yesterday and today we had patching interventions, so I scheduled downtime for the host and all of its services. However, I was still notified for one service while it was down!
The same thing happened yesterday. It's quite odd. Any suggestions?
[1440500468] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STARTED; Service has entered a period of scheduled downtime
[1440502528] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:
[1440502558] SERVICE ALERT: ad.stage;windows_services;OK;SOFT;2;OK: All services are in their appropriate state.
[1440502852] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:
[1440502858] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:
[1440502882] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL:
[1440502888] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL:
[1440502913] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL:
[1440502918] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL:
[1440502942] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL:
[1440502948] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL:
[1440502948] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;CRITICAL;notify-by-email;CRITICAL:
[1440503251] SERVICE ALERT: ad.stage;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440503253] SERVICE ALERT: ad.stage;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440503253] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;OK;notify-by-email;OK: All services are in their appropriate state.
[1440509457] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STOPPED; Service has exited from a period of scheduled downtime
The duplicate service alerts logged only seconds apart suggest that the configs were duplicated when the upgrade happened.
Can you check the config settings and see if that happened?
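As a rough way to confirm the duplicate-alert pattern before digging through the configs, something like the sketch below could be run over the log. It assumes the log lines follow the same `SERVICE ALERT` layout as the excerpt above; the function name `find_duplicate_alerts` and the 10-second window are made up for illustration and are not part of Nagios.

```python
import re

# Matches lines such as:
# [1440502852] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);"
)


def find_duplicate_alerts(lines, window=10):
    """Return (earlier_ts, later_ts, key) for identical alerts logged
    within `window` seconds of each other (hypothetical helper)."""
    last_seen = {}   # alert key -> timestamp of the most recent occurrence
    duplicates = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not a SERVICE ALERT line
        ts = int(match.group(1))
        # key = (host, service, state, state type, attempt number)
        key = match.groups()[1:]
        prev = last_seen.get(key)
        if prev is not None and 0 <= ts - prev <= window:
            duplicates.append((prev, ts, key))
        last_seen[key] = ts
    return duplicates


# Lines copied from the log excerpt posted above.
SAMPLE = [
    "[1440502852] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:",
    "[1440502858] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:",
    "[1440502882] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL:",
    "[1440502888] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL:",
    "[1440502913] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL:",
    "[1440502918] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL:",
    "[1440502942] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL:",
    "[1440502948] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL:",
]

duplicates = find_duplicate_alerts(SAMPLE)
for earlier, later, key in duplicates:
    print(f"{';'.join(key)} logged twice, {later - earlier}s apart")
```

If every alert shows up in pairs like this, the same check is most likely being executed twice, which fits the duplicated-config theory.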
Can you run the following and post the output here?
define service {
service_description windows_services
display_name Windows Services
check_command check_nrpe!check_services
use generic-service
hostgroup_name windows
_criticality medium
}
I use this template on all of my Nagios servers, but it is failing on only one of them.
# generic service template definition
define service{
use remote
name generic-service ; The 'name' of this service template
;active_checks_enabled 1 ; Active service checks are enabled
;passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
;obsess_over_service 1 ; We should obsess over this service (if necessary)
;check_freshness 1 ; Default is to NOT check service 'freshness'
;freshness_threshold 900
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_interval 15 ; Re-send notifications every 15 time units while the problem persists (0 would notify only on state change)
is_volatile 0
;check_period 24x7
normal_check_interval 10
retry_check_interval 1
max_check_attempts 4
notification_period 24x7
notification_options u,c,r,f
#contact_groups admins
_nrpecheck check_nrpe
_criticality normal
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
[1440745254] SERVICE DOWNTIME ALERT: windows_pet_serv;windows_services;STARTED; Service has entered a period of scheduled downtime
[1440745741] SERVICE ALERT: windows_pet_serv;windows_services;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 30 seconds.
[1440745744] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;2;Connection refused by host
[1440745771] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;3;Connection refused by host
[1440745801] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;HARD;4;Connection refused by host
[1440745818] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;1;Connection refused by host
[1440745863] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;2;CRITICAL: SERVICE1: stopped (critical), SERVICE2: stopped (critical), SERVICE3: stopped (critical)
[1440745878] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;3;CRITICAL: SERVICE1: stopped (critical), SERVICE3: stopped (critical)
[1440745908] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;HARD;4;CRITICAL: SERVICE1: stopped (critical), SERVICE3: stopped (critical)
[1440745908] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;CRITICAL;service-notify-by-email2;CRITICAL: SERVICE1: stopped (critical), UALSVC: stopped (critical)
[1440745908] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;CRITICAL;notify-by-email;CRITICAL: SERVICE1: stopped (critical), UALSVC: stopped (critical)
[1440746101] SERVICE ALERT: windows_pet_serv;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440746208] SERVICE ALERT: windows_pet_serv;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440746208] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;OK;service-notify-by-email2;OK: All services are in their appropriate state.
[1440746208] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;OK;notify-by-email;OK: All services are in their appropriate state.
When downtime is scheduled, seeing alerts in the log is normal, but you should not receive a notification (email) during this time.
Did you receive an email when that service was down during the scheduled downtime?
Here is a quick description of Scheduled Downtime. https://assets.nagios.com/downloads/nag ... ntime.html
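To make the "alert vs. notification" distinction concrete, here is a minimal sketch that lists any `SERVICE NOTIFICATION` entries logged between a service's downtime `STARTED` and `STOPPED` markers. It assumes the log format shown in this thread; the function name `notifications_during_downtime` is hypothetical, and the sketch only tracks service-level downtime with one downtime window per service at a time.

```python
import re

# [ts] SERVICE DOWNTIME ALERT: host;service;STARTED|STOPPED|CANCELLED; text
DOWNTIME_RE = re.compile(
    r"^\[(\d+)\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED|CANCELLED);"
)
# [ts] SERVICE NOTIFICATION: contact;host;service;state;command;output
NOTIFY_RE = re.compile(
    r"^\[(\d+)\] SERVICE NOTIFICATION: ([^;]+);([^;]+);([^;]+);([^;]+);"
)


def notifications_during_downtime(lines):
    """Return (ts, host, service, state, seconds_into_downtime) for every
    notification logged while that service was in scheduled downtime.

    Simplification: host-level downtime and overlapping downtime windows
    for the same service are not handled.
    """
    active = {}  # (host, service) -> timestamp at which downtime started
    hits = []
    for line in lines:
        line = line.strip()
        match = DOWNTIME_RE.match(line)
        if match:
            ts, host, service, event = match.groups()
            if event == "STARTED":
                active[(host, service)] = int(ts)
            else:
                active.pop((host, service), None)
            continue
        match = NOTIFY_RE.match(line)
        if match:
            ts, _contact, host, service, state = match.groups()
            start = active.get((host, service))
            if start is not None:
                hits.append((int(ts), host, service, state, int(ts) - start))
    return hits


# Lines taken from the first log excerpt in this thread.
SAMPLE = [
    "[1440500468] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STARTED; Service has entered a period of scheduled downtime",
    "[1440502948] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL:",
    "[1440502948] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;CRITICAL;notify-by-email;CRITICAL:",
    "[1440503253] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;OK;notify-by-email;OK: All services are in their appropriate state.",
    "[1440509457] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STOPPED; Service has exited from a period of scheduled downtime",
]

hits = notifications_during_downtime(SAMPLE)
for ts, host, service, state, offset in hits:
    print(f"{host};{service} sent a {state} notification {offset}s into downtime")
```

Any line this prints is a notification that should have been suppressed, so the output is a handy thing to paste back into the thread.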
That's the problem. We are getting emails while the service is in scheduled downtime, but only for one of the monitored services, not for all of them!