
freshness checks stop working periodically

Posted: Sat May 07, 2016 4:33 am
by WillemDH
Hello,

So I set up freshness checks for a passive service that receives about 20 critical passive events a day. I set the freshness threshold to 300 seconds. This seemed to work fine, but this is now the fourth time that it has suddenly stopped working: the service is no longer reset after 5 minutes.
Please check the screenshot for more information. As you can see, the last critical passive check arrived at 07:00. I did a manual reset at 11:23.

The weird thing is that it only seems to fail with the first event after 07:00. At 07:00 an automatic Apply Configuration is done with Reactor and the REST API.

Please advise how to keep the freshness check working as intended (resetting critical states after 5 minutes).
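For reference, each passive result reaches Nagios as an external command written to the command_file pipe. This is only a hedged sketch of that wire format (host/service names from this setup; state code 2 = CRITICAL; the output text is illustrative, and the real sender is whatever forwards the events):

```shell
# Sketch: the external-command line that carries one passive CRITICAL result.
now=$(date +%s)
line=$(printf '[%s] PROCESS_SERVICE_CHECK_RESULT;cash0001;EVT_Cash_Quota;2;error 47 PayTracs' "$now")
echo "$line"
# In production this line is written to /usr/local/nagios/var/rw/nagios.cmd.
```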

Grtz

Willem

Re: freshness checks stop working periodically

Posted: Sun May 08, 2016 7:36 pm
by Box293
Can you look in your objects.cache file and post one of the service definitions that is not working correctly?

Re: freshness checks stop working periodically

Posted: Mon May 09, 2016 1:59 am
by WillemDH
Here you go. Host:

Code: Select all

define host {
        host_name       cash0001
        alias   cash0001
        address 10.54.86.128
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!20%!5000.0!80%!!!!
        contacts        steven.reenders,nagiosadmin,nagiosadmin
        contact_groups  cg_dummy
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      7
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        icon_image      win_server.png
        statusmap_image win_server.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowsserver
        }
Service:

Code: Select all

define service {
        host_name       cash0003
        service_description     EVT_Cash_Quota
        display_name    EVT_System
        check_period    xi_timeperiod_24x7
        check_command   check_dummy!0!"Dummy check passed"!!!!!!
        contacts        steven.reynders,nagiosadmin,nagiosadmin
        contact_groups  cg_dummy
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  1440.000000
        retry_interval  1.000000
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  0
        flap_detection_options  a
        freshness_threshold     300
        check_freshness 1
        notification_options    a
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        o,w,u,c
        process_perf_data       0
        icon_image      windowseventlog.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowseventlog
        }
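For context on the mechanism: with check_freshness 1 and freshness_threshold 300, Nagios runs this service's check_command (check_dummy!0!"Dummy check passed") whenever no passive result has arrived for about 5 minutes, which is what resets the state to OK. A minimal stand-in for check_dummy, written as a shell function for illustration (assumption: the real plugin in libexec behaves the same way):

```shell
# Stand-in for the check_dummy plugin: print the supplied message and
# return the requested state code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
check_dummy() {
    state="$1"; shift
    echo "$*"
    return "$state"
}

check_dummy 0 "Dummy check passed"   # what the forced freshness check reports
```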
Let me know if you need any more info.

Re: freshness checks stop working periodically

Posted: Mon May 09, 2016 4:23 pm
by tmcdonald
We'd like to see some logging, if there's nothing sensitive in it:

Code: Select all

grep "cash0001" /usr/local/nagios/var/nagios.log | tail -100

Re: freshness checks stop working periodically

Posted: Tue May 10, 2016 2:02 am
by WillemDH
This is the result for cash0001:

Code: Select all

grep "cash0001" /usr/local/nagios/var/nagios.log | tail -100
[1462831200] CURRENT HOST STATE: cash0001;DOWN;HARD;7;CRITICAL - 10.54.86.128: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0001;DRV_C_Load;OK;HARD;1;OK: Drive C: Avg of 5 samples: {Rate (Read: 0.00000MB/s)(Write: 0.01867MB/s)} {Avg Nr of (Reads: 0.00000r/s)(Writes: 2.18149w/s)} {Latency (Read: 0.00000ms)(Write: 1.77600ms)} {Queue Length (Read: 0.00000ql)(Write: 0.00580ql)}
[1462831200] CURRENT SERVICE STATE: cash0001;DRV_C_Usage;OK;HARD;1;OK: C:: Total: 55.8G - Used: 25.2G (45%) - Free: 30.6G (55%)
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_Application;OK;HARD;1;OK: Dummy check passed
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_System;OK;HARD;1;OK - Manual Reset
[1462831200] CURRENT SERVICE STATE: cash0001;NET_Connections;OK;HARD;1;OK: {TCP: (Total: 00037)(Established: 7)(Listening: 30)(Time_Wait: 0)(Close_Wait: 0)(Other: 0)}{UDP: (Total: 15)}
[1462831200] CURRENT SERVICE STATE: cash0001;NET_Load;OK;HARD;1;OK: Realtek PCIe GBE Family Controller: Avg of 2 seconds: {Total Link Utilisation: 0,00012%}{Rate (Total: 0,00014 MB/sec)(Received: 0,00000 MB/sec)(Sent: 0,00014 MB/sec)}
[1462831200] CURRENT SERVICE STATE: cash0001;PRC_Tracs;OK;HARD;1;OK: All processes are running.
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_CPU_Usage;OK;HARD;1;OK: 1m: 0%, 5m: 2%, 15m: 2%
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Certificates;OK;HARD;1;All certificates are OK.
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Memory;OK;HARD;1;OK: physical memory: Total: 3.39G - Used: 1.1G (32%) - Free: 2.29G (68%), paged bytes: Total: 6.78G - Used: 1.01G (14%) - Free: 5.78G (86%)
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Ping;CRITICAL;SOFT;1;CRITICAL - 10.54.86.128: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Uptime;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 20 seconds.
[1462831200] CURRENT SERVICE STATE: cash0001;SVC_McAfee;OK;HARD;1;OK: All services are in their appropriate state.
[1462831200] CURRENT SERVICE STATE: cash0001;SVC_Windows;OK;HARD;1;OK: All services are in their appropriate state.
[1462831566] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0001' are stale by 0d 5h 6m 13s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462856484] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0001' are stale by 0d 12h 1m 31s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 2.00 EUR: 0 stuk(s).  ERT: cash0001;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462856519] HOST ALERT: cash0001;UP;HARD;7;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856519] HOST NOTIFICATION: nagiosadmin;cash0001;UP;xi_host_notification_handler;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856519] HOST NOTIFICATION: steven.reynders;cash0001;UP;xi_host_notification_handler;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856735] SERVICE ALERT: cash0001;SRV_Ping;OK;SOFT;2;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856786] SERVICE ALERT: cash0001;SRV_Uptime;OK;SOFT;2;OK: uptime: 0:6
And for cash0002:

Code: Select all

grep "cash0002" /usr/local/nagios/var/nagios.log | tail -100
[1462831200] CURRENT HOST STATE: cash0002;DOWN;HARD;7;CRITICAL - 10.54.86.148: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0002;DRV_C_Load;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1462831200] CURRENT SERVICE STATE: cash0002;DRV_C_Usage;OK;HARD;1;OK: C:: Total: 55.8G - Used: 21.5G (38%) - Free: 34.3G (62%)
[1462831200] CURRENT SERVICE STATE: cash0002;EVT_Application;OK;HARD;1;OK: Dummy check passed
 * Cutter OK: Trueeend: True STATE: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462831200] CURRENT SERVICE STATE: cash0002;EVT_System;OK;HARD;1;OK - Manual Reset
[1462831200] CURRENT SERVICE STATE: cash0002;NET_Connections;OK;HARD;1;OK: {TCP: (Total: 00045)(Established: 11)(Listening: 32)(Time_Wait: 1)(Close_Wait: 1)(Other: 0)}{UDP: (Total: 15)}
[1462831200] CURRENT SERVICE STATE: cash0002;NET_Load;CRITICAL;SOFT;1;Timeout while attempting connection
[1462831200] CURRENT SERVICE STATE: cash0002;PRC_Tracs;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_CPU_Usage;OK;HARD;1;OK: 1m: 2%, 5m: 2%, 15m: 2%
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Certificates;OK;HARD;1;All certificates are OK.
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Memory;OK;HARD;1;OK: physical memory: Total: 3.39G - Used: 1.12G (33%) - Free: 2.27G (67%), paged bytes: Total: 6.78G - Used: 1.01G (14%) - Free: 5.77G (86%)
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Ping;CRITICAL;SOFT;1;CRITICAL - 10.54.86.148: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Uptime;OK;HARD;1;OK: uptime: 9:38
[1462831200] CURRENT SERVICE STATE: cash0002;SVC_McAfee;OK;HARD;1;OK: All services are in their appropriate state.
[1462831200] CURRENT SERVICE STATE: cash0002;SVC_Windows;OK;HARD;1;OK: All services are in their appropriate state.
[1462831566] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 5h 3m 5s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462831742] HOST ALERT: cash0002;UP;HARD;7;OK - 10.54.86.148: rta 0.361ms, lost 0%
[1462831742] HOST NOTIFICATION: nagiosadmin;cash0002;UP;xi_host_notification_handler;OK - 10.54.86.148: rta 0.361ms, lost 0%
[1462831742] HOST NOTIFICATION: steven.reynders;cash0002;UP;xi_host_notification_handler;OK - 10.54.86.148: rta 0.361ms, lost 0%
 2.00 EUR: 0 stuk(s).  ERT: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462831832] SERVICE ALERT: cash0002;NET_Load;OK;SOFT;2;OK: Realtek PCIe GBE Family Controller: Avg of 2 seconds: {Total Link Utilisation: 0,00018%}{Rate (Total: 0,00021 MB/sec)(Received: 0,00021 MB/sec)(Sent: 0,00000 MB/sec)}
[1462831847] SERVICE ALERT: cash0002;SRV_Ping;OK;SOFT;2;OK - 10.54.86.148: rta 0.316ms, lost 0%
[1462831871] SERVICE ALERT: cash0002;DRV_C_Load;OK;SOFT;2;OK: Drive C: Avg of 5 samples: {Rate (Read: 0.16296MB/s)(Write: 15.47346MB/s)} {Avg Nr of (Reads: 21.64613r/s)(Writes: 18.79993w/s)} {Latency (Read: 1.25207ms)(Write: 7.92500ms)} {Queue Length (Read: 0.05148ql)(Write: 0.23789ql)}
[1462831907] SERVICE ALERT: cash0002;PRC_Tracs;OK;SOFT;2;OK: All processes are running.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462856424] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462856424] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462856784] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462856958] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket printer  * Cutter OK: Trueeend: True
[1462856958] SERVICE NOTIFICATION: steven.reynders;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket prin * Cutter OK: Trueeend: True
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462857324] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462857324] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462857324] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462857684] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858044] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858404] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858764] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859124] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859483] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859844] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462860204] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462860561] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket printer  * Cutter OK: Trueeend: True
[1462860561] SERVICE NOTIFICATION: steven.reynders;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket prin * Cutter OK: Trueeend: True
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462860924] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462860924] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462860924] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462861283] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462861643] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462861944] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862303] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862604] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862963] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462863323] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
I'm not really sure what these results mean. As you can see, the threshold is (threshold=0d 0h 5m 0s). So why does it sometimes force a check at weird times, such as "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 5h 3m 5s" and "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s). I'm forcing an immediate check of the service."?

Please let me know if this means anything to you.

Re: freshness checks stop working periodically

Posted: Tue May 10, 2016 4:34 pm
by ssax
Your check_interval is 5 minutes, so when it says "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s). I'm forcing an immediate check of the service." it means the last result was 5 minutes and 1 second old when the freshness check ran, so it was considered stale. That looks normal to me. The freshness checks only run on the scheduled check_interval, so the last check would have been 5 minutes earlier.
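The arithmetic behind those warnings can be sketched like this (a simplified model with hypothetical timestamps; real Nagios also folds check latency into the expiration time along with the additional_freshness_latency from nagios.cfg):

```shell
# Simplified freshness test: a result is considered stale once its age
# exceeds freshness_threshold plus additional_freshness_latency.
now=$(date +%s)
last_check=$(( now - 316 ))    # hypothetical: last passive result 316s ago
freshness_threshold=300        # from the service definition
additional_latency=15          # additional_freshness_latency in nagios.cfg
age=$(( now - last_check ))
if [ "$age" -gt $(( freshness_threshold + additional_latency )) ]; then
    echo "stale by ${age}s (threshold=${freshness_threshold}s); forcing a check"
else
    echo "fresh"
fi
```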

I have no idea why you are getting stale by 0d 5h 3m 5s. What do you have set in your /usr/local/nagios/etc/nagios.cfg for these:

Code: Select all

additional_freshness_latency
check_host_freshness
check_service_freshness
host_freshness_check_interval
service_freshness_check_interval
Edit: I don't think that's right; it should check based on the host/service_freshness_check_interval in your nagios.cfg.

Re: freshness checks stop working periodically

Posted: Wed May 11, 2016 1:14 am
by WillemDH
Sean,

My nagios.cfg:

Code: Select all

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg

# Mod Gearman module
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf

# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/var/nagiosramdisk/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/var/nagiosramdisk/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static
# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler
# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/var/nagiosramdisk/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
#command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/var/nagiosramdisk/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=250
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/var/nagiosramdisk/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/var/nagiosramdisk/tmp
use_aggressive_host_checking=0
#####use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
host_down_disable_service_checks=1
Or the freshness settings only:

Code: Select all

cat /usr/local/nagios/etc/nagios.cfg | grep freshness
additional_freshness_latency=15
check_host_freshness=0
check_service_freshness=1
host_freshness_check_interval=60
service_freshness_check_interval=60
Not sure what's going on...

Re: freshness checks stop working periodically

Posted: Wed May 11, 2016 4:22 pm
by ssax
Are you able to replicate this at all? I'm wondering what it says if you enable logging in there. I talked to the developer, and he was saying that maybe something got stuck in the queue.

Do you have multiple message queues?

Code: Select all

ipcs -q
Does this output anything?

Code: Select all

echo "select * from nagios_timedevents; select * from nagios_timedeventqueue;" | mysql -pnagiosxi nagios

Re: freshness checks stop working periodically

Posted: Thu May 12, 2016 8:59 am
by WillemDH
Nope, sorry, I can't replicate this, unless it happens again on Saturday; I noticed it happened the last two Saturdays, starting from 07:00. I'll let you know next week.
Code: Select all

ipcs -q

------ Message Queues --------
key        msqid    owner   perms  used-bytes  messages
0xc2010002 1310720  nagios  600    4611072     4503

Code: Select all

echo "select * from nagios_timedevents; select * from nagios_timedeventqueue;" | mysql -pnagiosxi nagios
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

Re: freshness checks stop working periodically

Posted: Thu May 12, 2016 4:53 pm
by ssax
Ok.

Did you offload your DB or change the root MySQL password?