freshness checks stop working periodically

WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Post by WillemDH »

Hello,

I set up freshness checks for this passive service, which receives about 20 critical passive events a day. I set the freshness threshold to 300 seconds. This seems to work fine, but this is now the fourth time that it has suddenly stopped working: the service is no longer reset after 5 minutes.
Please check the screenshot for more information. As you can see, the last critical passive check arrived at 07:00. I did a manual reset at 11:23.

Now the weird thing is that it only seems to fail with the first event after 07:00. At 07:00, an automatic Apply Configuration is run via Reactor and the REST API.

Please advise how to get the freshness check working as intended (resetting critical states after 5 minutes).

Regards,

Willem
Nagios XI 5.8.1
https://outsideit.net
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: freshness checks stop working periodically

Post by Box293 »

Can you look in your objects.cache file and post one of the service definitions that is not working correctly?
WillemDH

Re: freshness checks stop working periodically

Post by WillemDH »

Here you go: Host:

Code: Select all

define host {
        host_name       cash0001
        alias   cash0001
        address 10.54.86.128
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!20%!5000.0!80%!!!!
        contacts        steven.reynders,nagiosadmin,nagiosadmin
        contact_groups  cg_dummy
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      7
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        icon_image      win_server.png
        statusmap_image win_server.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowsserver
        }
Service:

Code: Select all

define service {
        host_name       cash0003
        service_description     EVT_Cash_Quota
        display_name    EVT_System
        check_period    xi_timeperiod_24x7
        check_command   check_dummy!0!"Dummy check passed"!!!!!!
        contacts        steven.reynders,nagiosadmin,nagiosadmin
        contact_groups  cg_dummy
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  1440.000000
        retry_interval  1.000000
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  0
        flap_detection_options  a
        freshness_threshold     300
        check_freshness 1
        notification_options    a
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        o,w,u,c
        process_perf_data       0
        icon_image      windowseventlog.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowseventlog
        }
Let me know if you need any more info.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: freshness checks stop working periodically

Post by tmcdonald »

Might wanna see some logging if there's nothing sensitive:

grep "cash0001" /usr/local/nagios/var/nagios.log | tail -100
Former Nagios employee
WillemDH

Re: freshness checks stop working periodically

Post by WillemDH »

This is the result for cash0001:

Code: Select all

grep "cash0001" /usr/local/nagios/var/nagios.log | tail -100
[1462831200] CURRENT HOST STATE: cash0001;DOWN;HARD;7;CRITICAL - 10.54.86.128: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0001;DRV_C_Load;OK;HARD;1;OK: Drive C: Avg of 5 samples: {Rate (Read: 0.00000MB/s)(Write: 0.01867MB/s)} {Avg Nr of (Reads: 0.00000r/s)(Writes: 2.18149w/s)} {Latency (Read: 0.00000ms)(Write: 1.77600ms)} {Queue Length (Read: 0.00000ql)(Write: 0.00580ql)}
[1462831200] CURRENT SERVICE STATE: cash0001;DRV_C_Usage;OK;HARD;1;OK: C:: Total: 55.8G - Used: 25.2G (45%) - Free: 30.6G (55%)
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_Application;OK;HARD;1;OK: Dummy check passed
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462831200] CURRENT SERVICE STATE: cash0001;EVT_System;OK;HARD;1;OK - Manual Reset
[1462831200] CURRENT SERVICE STATE: cash0001;NET_Connections;OK;HARD;1;OK: {TCP: (Total: 00037)(Established: 7)(Listening: 30)(Time_Wait: 0)(Close_Wait: 0)(Other: 0)}{UDP: (Total: 15)}
[1462831200] CURRENT SERVICE STATE: cash0001;NET_Load;OK;HARD;1;OK: Realtek PCIe GBE Family Controller: Avg of 2 seconds: {Total Link Utilisation: 0,00012%}{Rate (Total: 0,00014 MB/sec)(Received: 0,00000 MB/sec)(Sent: 0,00014 MB/sec)}
[1462831200] CURRENT SERVICE STATE: cash0001;PRC_Tracs;OK;HARD;1;OK: All processes are running.
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_CPU_Usage;OK;HARD;1;OK: 1m: 0%, 5m: 2%, 15m: 2%
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Certificates;OK;HARD;1;All certificates are OK.
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Memory;OK;HARD;1;OK: physical memory: Total: 3.39G - Used: 1.1G (32%) - Free: 2.29G (68%), paged bytes: Total: 6.78G - Used: 1.01G (14%) - Free: 5.78G (86%)
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Ping;CRITICAL;SOFT;1;CRITICAL - 10.54.86.128: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0001;SRV_Uptime;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 20 seconds.
[1462831200] CURRENT SERVICE STATE: cash0001;SVC_McAfee;OK;HARD;1;OK: All services are in their appropriate state.
[1462831200] CURRENT SERVICE STATE: cash0001;SVC_Windows;OK;HARD;1;OK: All services are in their appropriate state.
[1462831566] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0001' are stale by 0d 5h 6m 13s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462856484] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0001' are stale by 0d 12h 1m 31s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 2.00 EUR: 0 stuk(s).  ERT: cash0001;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462856519] HOST ALERT: cash0001;UP;HARD;7;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856519] HOST NOTIFICATION: nagiosadmin;cash0001;UP;xi_host_notification_handler;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856519] HOST NOTIFICATION: steven.reynders;cash0001;UP;xi_host_notification_handler;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856735] SERVICE ALERT: cash0001;SRV_Ping;OK;SOFT;2;OK - 10.54.86.128: rta 0.355ms, lost 0%
[1462856786] SERVICE ALERT: cash0001;SRV_Uptime;OK;SOFT;2;OK: uptime: 0:6
And for cash0002:

Code: Select all

grep "cash0002" /usr/local/nagios/var/nagios.log | tail -100
[1462831200] CURRENT HOST STATE: cash0002;DOWN;HARD;7;CRITICAL - 10.54.86.148: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0002;DRV_C_Load;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1462831200] CURRENT SERVICE STATE: cash0002;DRV_C_Usage;OK;HARD;1;OK: C:: Total: 55.8G - Used: 21.5G (38%) - Free: 34.3G (62%)
[1462831200] CURRENT SERVICE STATE: cash0002;EVT_Application;OK;HARD;1;OK: Dummy check passed
 * Cutter OK: Trueeend: True STATE: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462831200] CURRENT SERVICE STATE: cash0002;EVT_System;OK;HARD;1;OK - Manual Reset
[1462831200] CURRENT SERVICE STATE: cash0002;NET_Connections;OK;HARD;1;OK: {TCP: (Total: 00045)(Established: 11)(Listening: 32)(Time_Wait: 1)(Close_Wait: 1)(Other: 0)}{UDP: (Total: 15)}
[1462831200] CURRENT SERVICE STATE: cash0002;NET_Load;CRITICAL;SOFT;1;Timeout while attempting connection
[1462831200] CURRENT SERVICE STATE: cash0002;PRC_Tracs;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_CPU_Usage;OK;HARD;1;OK: 1m: 2%, 5m: 2%, 15m: 2%
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Certificates;OK;HARD;1;All certificates are OK.
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Memory;OK;HARD;1;OK: physical memory: Total: 3.39G - Used: 1.12G (33%) - Free: 2.27G (67%), paged bytes: Total: 6.78G - Used: 1.01G (14%) - Free: 5.77G (86%)
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Ping;CRITICAL;SOFT;1;CRITICAL - 10.54.86.148: rta nan, lost 100%
[1462831200] CURRENT SERVICE STATE: cash0002;SRV_Uptime;OK;HARD;1;OK: uptime: 9:38
[1462831200] CURRENT SERVICE STATE: cash0002;SVC_McAfee;OK;HARD;1;OK: All services are in their appropriate state.
[1462831200] CURRENT SERVICE STATE: cash0002;SVC_Windows;OK;HARD;1;OK: All services are in their appropriate state.
[1462831566] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 5h 3m 5s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462831742] HOST ALERT: cash0002;UP;HARD;7;OK - 10.54.86.148: rta 0.361ms, lost 0%
[1462831742] HOST NOTIFICATION: nagiosadmin;cash0002;UP;xi_host_notification_handler;OK - 10.54.86.148: rta 0.361ms, lost 0%
[1462831742] HOST NOTIFICATION: steven.reynders;cash0002;UP;xi_host_notification_handler;OK - 10.54.86.148: rta 0.361ms, lost 0%
 2.00 EUR: 0 stuk(s).  ERT: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462831832] SERVICE ALERT: cash0002;NET_Load;OK;SOFT;2;OK: Realtek PCIe GBE Family Controller: Avg of 2 seconds: {Total Link Utilisation: 0,00018%}{Rate (Total: 0,00021 MB/sec)(Received: 0,00021 MB/sec)(Sent: 0,00000 MB/sec)}
[1462831847] SERVICE ALERT: cash0002;SRV_Ping;OK;SOFT;2;OK - 10.54.86.148: rta 0.316ms, lost 0%
[1462831871] SERVICE ALERT: cash0002;DRV_C_Load;OK;SOFT;2;OK: Drive C: Avg of 5 samples: {Rate (Read: 0.16296MB/s)(Write: 15.47346MB/s)} {Avg Nr of (Reads: 21.64613r/s)(Writes: 18.79993w/s)} {Latency (Read: 1.25207ms)(Write: 7.92500ms)} {Queue Length (Read: 0.05148ql)(Write: 0.23789ql)}
[1462831907] SERVICE ALERT: cash0002;PRC_Tracs;OK;SOFT;2;OK: All processes are running.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462856424] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462856424] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462856784] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462856958] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket printer  * Cutter OK: Trueeend: True
[1462856958] SERVICE NOTIFICATION: steven.reynders;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket prin * Cutter OK: Trueeend: True
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462857324] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462857324] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462857324] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462857684] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858044] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858404] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462858764] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859124] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859483] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462859844] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462860204] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
 * Cutter OK: Trueeend: Truecash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Ticket printer melding:
[1462860561] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket printer  * Cutter OK: Trueeend: True
[1462860561] SERVICE NOTIFICATION: steven.reynders;cash0002;EVT_Cash_Quota;CRITICAL;xi_service_notification_handler;error 47 PayTracs: Ticket prin * Cutter OK: Trueeend: True
 2.00 EUR: 106 stuk(s).  T: cash0002;EVT_Cash_Quota;CRITICAL;HARD;1;error 47 PayTracs: Hopperniveau kritisch
[1462860924] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462860924] SERVICE ALERT: cash0002;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed
[1462860924] SERVICE NOTIFICATION: nagiosadmin;cash0002;EVT_Cash_Quota;OK;xi_service_notification_handler;OK: Dummy check passed
[1462861283] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462861643] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462861944] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862303] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862604] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462862963] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1462863323] Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 1m 0s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
I'm not really sure what the results mean. As you can see, the threshold is 5 minutes (threshold=0d 0h 5m 0s). So why does it sometimes force a check at odd times, such as "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 5h 3m 5s" and "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s). I'm forcing an immediate check of the service."?

Please let me know if this means anything to you.
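
The bracketed numbers at the start of each nagios.log line are Unix epoch seconds, which makes the gaps between events hard to read at a glance. A minimal sketch of a helper that rewrites them as readable times follows. The function name `humanize` is made up for illustration, and the output is in UTC rather than the server's local time zone.

```python
import re
from datetime import datetime, timezone

# Matches the leading "[<epoch>]" prefix that Nagios writes on each log line.
STAMP = re.compile(r"^\[(\d{10})\]\s?(.*)$")


def humanize(line):
    """Return the line with its epoch prefix rewritten as a UTC date and time.

    Lines without a leading timestamp (for example the overwritten
    multi-line plugin output in the logs above) are returned unchanged.
    """
    match = STAMP.match(line)
    if match is None:
        return line
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return f"[{when:%Y-%m-%d %H:%M:%S} UTC] {match.group(2)}"


# One line copied from the cash0001 log above.
sample = "[1462831200] CURRENT SERVICE STATE: cash0001;EVT_Cash_Quota;OK;HARD;1;OK: Dummy check passed"
print(humanize(sample))
```

Piping the grep output through something like this makes it easier to line up the warnings with the 07:00 Apply Configuration.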
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: freshness checks stop working periodically

Post by ssax »

Your check_interval is 5 minutes, so when it says "Warning: The results of service 'EVT_Cash_Quota' on host 'cash0002' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 0s). I'm forcing an immediate check of the service." it is saying that the result was 5 minutes and 1 second old when the check_interval ran, so it's considered stale. That looks normal to me. The freshness checks will only run on the scheduled check_interval, so the last check would have been 5 minutes ago.

I have no idea why you are getting "stale by 0d 5h 3m 5s". What do you have set in your /usr/local/nagios/etc/nagios.cfg for these:

Code: Select all

additional_freshness_latency
check_host_freshness
check_service_freshness
host_freshness_check_interval
service_freshness_check_interval
Edit: I don't think that's right; it should check based on the host/service_freshness_check_interval in your nagios.cfg.
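
For what it's worth, the "stale by" figures in the cash0002 log look consistent with a simple model: a result expires at last check time plus freshness_threshold, and the periodic freshness sweep reports how far past that expiry it is when it runs. The sketch below is only that simplified model, not Nagios Core's actual code. It ignores additional_freshness_latency and restarts, it assumes each forced check updates the last check time in the same second as the warning, and the function names are made up for illustration.

```python
def stale_by(last_check, now, freshness_threshold=300):
    """Seconds past expiry, or None if the result is still fresh.

    Simplified assumption: expiry = last_check + freshness_threshold.
    """
    expires_at = last_check + freshness_threshold
    if now <= expires_at:
        return None
    return now - expires_at


def fmt(seconds):
    """Format a duration the way the log message does (Nd Nh Nm Ns)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# Pairs of consecutive warning timestamps taken from the cash0002 log above.
print(fmt(stale_by(1462857684, 1462858044)))  # 360 s apart
print(fmt(stale_by(1462861643, 1462861944)))  # 301 s apart
```

The two pairs come out as 0d 0h 1m 0s and 0d 0h 0m 1s, matching the log. Under the same model, "stale by 0d 5h 3m 5s" would mean the result had already been expired for about five hours by the time a freshness check acted on it, which would point at the sweep not running for that service rather than at the threshold.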
WillemDH

Re: freshness checks stop working periodically

Post by WillemDH »

Sean,

My nagios.cfg:

Code: Select all

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg

# Mod Gearman module
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf

# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/var/nagiosramdisk/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/var/nagiosramdisk/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static
# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler
# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/var/nagiosramdisk/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
#command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/var/nagiosramdisk/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=250
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/var/nagiosramdisk/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/var/nagiosramdisk/tmp
use_aggressive_host_checking=0
#####use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
host_down_disable_service_checks=1
Or the freshness settings only:

Code: Select all

cat /usr/local/nagios/etc/nagios.cfg | grep freshness
additional_freshness_latency=15
check_host_freshness=0
check_service_freshness=1
host_freshness_check_interval=60
service_freshness_check_interval=60
Not sure what's going on.
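
If the settings need to be checked by a script rather than by eye, the same filter can be sketched in Python. This is an illustration only: `freshness_settings` is a made-up helper, and the sample text is copied from the grep output above (plus one commented-out line from the full file) instead of being read from /usr/local/nagios/etc/nagios.cfg.

```python
def freshness_settings(cfg_text):
    """Collect key=value directives whose name mentions 'freshness'."""
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if "freshness" in key:
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
additional_freshness_latency=15
check_host_freshness=0
check_service_freshness=1
#command_check_interval=-1
host_freshness_check_interval=60
service_freshness_check_interval=60
"""

found = freshness_settings(SAMPLE)
for name, value in found.items():
    print(f"{name} = {value}")
```

With these values, service freshness checking is enabled and the sweep interval is 60 seconds, so the global settings by themselves do not explain a five-hour gap.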
ssax

Re: freshness checks stop working periodically

Post by ssax »

Are you able to replicate this at all? I'm wondering what it says if you enable logging in there. I talked to the developer, and he said that maybe something got stuck in the queue.

Do you have multiple message queues?

Code: Select all

ipcs -q
Does this output anything?

Code: Select all

echo "select * from nagios_timedevents; select * from nagios_timedeventqueue;" | mysql -pnagiosxi nagios
WillemDH

Re: freshness checks stop working periodically

Post by WillemDH »

Nope, sorry, I can't replicate this, unless it happens again on Saturday: I noticed it happened on the last two Saturdays, starting from 07:00. I'll let you know next week.
ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xc2010002 1310720    nagios     600        4611072      4503

Code: Select all

echo "select * from nagios_timedevents; select * from nagios_timedeventqueue;" | mysql -pnagiosxi nagios
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
ssax

Re: freshness checks stop working periodically

Post by ssax »

Ok.

Did you offload your DB or change the root MySQL pass?