Apr 12 09:45:09 nagios: SERVICE DOWNTIME ALERT: host1.com.au;ZPOOL Status;STARTED; Service has entered a period of scheduled downtime
Apr 12 09:45:09 nagios: SERVICE DOWNTIME ALERT: host2;CPU Usage;STARTED; Service has entered a period of scheduled downtime
Apr 12 09:45:09 nagios: SERVICE DOWNTIME ALERT: host2;Swap Usage;STARTED; Service has entered a period of scheduled downtime
Apr 12 09:45:09 nagios: HOST DOWNTIME ALERT: host3;STARTED; Host has entered a period of scheduled downtime
Apr 12 09:45:09 nagios: SERVICE DOWNTIME ALERT: host4;/ Disk Free - Linux;STOPPED; Service has exited from a period of scheduled downtime
Apr 12 09:45:09 nagios: HOST DOWNTIME ALERT: host6;STOPPED; Host has exited from a period of scheduled downtime
Apr 12 09:45:09 nagios: Caught SIGSEGV, shutting down...
I've checked the debug.log file (Level=1) and it looks like it is trying to process the scheduled downtimes:
[1460428580.088375] [001.0] [pid=15596] create_notification_list_from_host()
[1460428580.088401] [001.0] [pid=15596] should_host_notification_be_escalated()
[1460428580.088418] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088430] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088442] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088456] [001.0] [pid=15596] check_time_against_period()
[1460428580.088490] [001.0] [pid=15596] _get_matching_timerange()
[1460428580.088511] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088520] [001.0] [pid=15596] check_time_against_period()
[1460428580.088537] [001.0] [pid=15596] _get_matching_timerange()
[1460428580.088556] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088570] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088583] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088593] [001.0] [pid=15596] check_time_against_period()
[1460428580.088614] [001.0] [pid=15596] _get_matching_timerange()
[1460428580.088633] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088642] [001.0] [pid=15596] check_time_against_period()
[1460428580.088659] [001.0] [pid=15596] _get_matching_timerange()
[1460428580.088677] [001.0] [pid=15596] check_contact_host_notification_viability()
[1460428580.088686] [001.0] [pid=15596] check_time_against_period()
[1460428580.088716] [001.0] [pid=15596] _get_matching_timerange()
[1460428580.088996] [001.0] [pid=15596] find_downtime()
[1460428580.089165] [001.0] [pid=15596] handle_scheduled_downtime()
[1460428580.117268] [001.0] [pid=15610] clear_volatile_macros_r()
Is there any way I can delete the downtime entries to confirm whether they are causing the crash?
Since Nagios is not running, I can't remove them by writing commands to the nagios.cmd command file.
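One possible workaround that does not need the command file, sketched here under assumptions: on Nagios 3.x/4.x, scheduled downtimes are saved in the state retention file (the `retention_file` setting in nagios.cfg) as `hostdowntime { ... }` and `servicedowntime { ... }` blocks and are reloaded at startup. The Python script below copies the file to `<file>.bak` and then drops only those blocks while Nagios is stopped, leaving everything else (including comment blocks) untouched. The path in the usage line is hypothetical; use the one from your own nagios.cfg.

```python
import shutil
import sys

# Block headers that hold scheduled downtime entries in retention.dat
# (assumed Nagios 3.x/4.x retention file format).
DOWNTIME_BLOCKS = ("hostdowntime {", "servicedowntime {")


def strip_downtime(lines):
    """Return (kept_lines, removed_count) with every downtime block removed."""
    kept = []
    removed = 0
    skipping = False
    for line in lines:
        stripped = line.strip()
        if not skipping and stripped in DOWNTIME_BLOCKS:
            # Start of a downtime block: skip it up to its closing brace.
            skipping = True
            removed += 1
            continue
        if skipping:
            if stripped == "}":
                skipping = False
            continue
        kept.append(line)
    return kept, removed


def main(path):
    # Keep an untouched copy so the change can be reverted.
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        lines = f.readlines()
    kept, removed = strip_downtime(lines)
    with open(path, "w") as f:
        f.writelines(kept)
    print(f"removed {removed} downtime block(s); backup at {path}.bak")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: strip_downtime.py /path/to/retention.dat")
```

Usage (script name and path are hypothetical): stop Nagios, run `python3 strip_downtime.py /usr/local/nagios/var/retention.dat`, then start Nagios again. If the SIGSEGV goes away, the downtime entries were the trigger; copying the `.bak` file back restores them.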
This is pretty urgent, as this is our Prod site. Any suggestions would be appreciated.
regards... Fred