This KB article explains how to clear the Solaris Maintenance Status on a service, specifically the Nagios Core or NRPE service. It focuses on Nagios Core, but the steps apply to NRPE as well.
When the Nagios Core service starts, it verifies the configuration files. If the configuration is invalid, the Nagios Core service will NOT start, and you must fix the problem Nagios Core is complaining about. This is normal Nagios Core behaviour and is not specific to Solaris.
On Solaris, however, after a service has failed to start several times, Solaris puts the service into what is called a maintenance state. This state prevents a small problem from becoming a bigger problem. Even after fixing the problem Nagios Core is complaining about, you must also clear the maintenance state before Solaris allows the service to be started again.
This KB article will show you how to determine what the problem is and how to resolve the issue.
Diagnose The Problem
A common scenario is when you have rebooted your Solaris server and the Nagios Core service fails to start.
The first step is to determine why the service did not start.
Execute the following command to see detailed status information:
svcs -xv nagios
The output resembles the following:
svc:/application/nagios:default (?)
State: maintenance since March 6, 2017 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.
The service is clearly in a maintenance state, but the only detail given about the cause is that the Start method failed repeatedly. The output does, however, provide the name of a log file: /var/svc/log/application-nagios:default.log.
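When scripting the diagnosis, the log path can be pulled out of the svcs -xv output rather than copied by hand. The following is a minimal sketch, assuming a POSIX-compatible shell such as ksh or bash. The function name smf_log_paths is hypothetical and is not part of Solaris or Nagios.

```shell
# Print every local file path that "svcs -xv" lists on a "See:" line.
# smf_log_paths is a hypothetical helper name used for illustration.
smf_log_paths() {
    # Keep only "See:" lines whose second field starts with "/",
    # which skips the http:// reference.
    svcs -xv "$1" | awk '$1 == "See:" && $2 ~ /^\// { print $2 }'
}

# Example, on the Solaris host, assuming a single log path is listed:
#   tail -20 "$(smf_log_paths nagios)"
```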
Execute the following command to perform further troubleshooting:
tail -20 /var/svc/log/application-nagios:default.log
The output might resemble the following:
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Error: Invalid max_check_attempts value for host 'test'
Error: Could not register host (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 33)
Error processing object config files!
***> One or more problems was encountered while processing the config files...
Check your configuration file(s) to ensure that they contain valid
directives and data defintions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.
[ Mar 6 16:57:38 Method "start" exited with status 8. ]
This is a common Nagios Core problem: the object definition in the configuration file is missing required directives. In this example the host definition did not reference a template (the use directive was forgotten), so the common options it would have inherited, including max_check_attempts, were missing.
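Because the start method is retried several times, the log can be long and the same messages may repeat. A minimal sketch for showing only the lines Nagios Core flagged as errors follows. The helper name show_nagios_errors is hypothetical, and the default path is simply the log file named in the svcs output above.

```shell
# Print only the lines that Nagios Core prefixed with "Error:".
# show_nagios_errors is a hypothetical helper name; pass a different
# log file as the first argument if yours lives elsewhere.
show_nagios_errors() {
    log_file=${1:-/var/svc/log/application-nagios:default.log}
    grep '^Error:' "$log_file"
}

# Example:
#   show_nagios_errors
```

The config file name and line number in the "Could not register" message point to the object definition that needs fixing.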
Before proceeding you need to fix the error that is being reported. Once you think you've resolved the issue, you can run the verify command to check:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
The output of a successful verify should end like this:
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
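The verify step can also be wrapped so that a script refuses to continue while the configuration is still broken. This is a sketch only, assuming the verify command exits with a non-zero status when it finds errors. The helper name check_nagios_config is hypothetical, and the default paths are the ones used in this article.

```shell
# Run the Nagios Core pre-flight check and report the result.
# check_nagios_config is a hypothetical helper name; the default paths
# are the ones used in this article and may differ on your system.
check_nagios_config() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    if verify_output=$("$nagios_bin" -v "$nagios_cfg" 2>&1); then
        echo "Configuration verified."
    else
        # Show only the lines Nagios flagged, then signal failure.
        printf '%s\n' "$verify_output" | grep '^Error'
        return 1
    fi
}

# Example:
#   check_nagios_config && echo "safe to start the service"
```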
At this point, however, if you try to start the service you will find that it still does not start:
svcadm enable nagios
svcs -xv nagios
The output will be like this:
svc:/application/nagios:default (?)
State: maintenance since March 6, 2017 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.
Don't be fooled into thinking there is still a Nagios configuration issue. Pay careful attention to the date and time on the State line: it has not changed, which means Solaris never attempted another start. Solaris will refuse to start the service until the maintenance state is cleared.
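For a quick check that does not require reading the full svcs -xv output, the state can be printed on its own. The sketch below assumes a POSIX-compatible shell. The helper names smf_state and in_maintenance are hypothetical.

```shell
# Print the current SMF state of a service, for example "online" or
# "maintenance". smf_state is a hypothetical helper name.
smf_state() {
    # -H omits the header line, -o state prints only the state column;
    # awk trims any padding around the value.
    svcs -H -o state "$1" | awk '{ print $1 }'
}

# Return success only when the service is in the maintenance state.
in_maintenance() {
    [ "$(smf_state "$1")" = "maintenance" ]
}

# Example:
#   in_maintenance nagios && echo "nagios must be cleared first"
```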
Clear Maintenance State
Execute the following command to clear the maintenance state:
svcadm clear nagios
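Execute the following command to start Nagios (if the service was already enabled, Solaris normally attempts a start as soon as the state is cleared, so this command is harmless but may be redundant):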
svcadm enable nagios
Now check the state of the service:
svcs -xv nagios
The output resembles the following:
svc:/application/nagios:default (?)
State: online since March 6, 2017 05:17:21 PM EST
See: /var/svc/log/application-nagios:default.log
Impact: None.
Problem solved: Nagios Core is running again.
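The clear, enable, and check steps above can be combined into one small function. This is a sketch under stated assumptions rather than a supported tool: the helper name recover_service is hypothetical, it assumes the configuration error has already been fixed, and it uses the -s option of svcadm enable to wait until the service comes online.

```shell
# Clear the maintenance state only when it is actually set, then make
# sure the service is enabled. recover_service is a hypothetical helper
# name. Fix and verify the Nagios configuration before calling it, or
# the service will fall back into maintenance.
recover_service() {
    service=${1:-nagios}
    state=$(svcs -H -o state "$service" | awk '{ print $1 }')
    if [ "$state" = "maintenance" ]; then
        svcadm clear "$service"
    fi
    # -s waits for the service to reach the online or degraded state.
    svcadm enable -s "$service"
    # Report the state afterwards so a repeat failure is visible.
    svcs -H -o state "$service"
}

# Example:
#   recover_service nagios
```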
Final Thoughts
For any support-related questions, please visit the Nagios Support Forums at:
http://support.nagios.com/forum/
Article ID: 565
Created On: Mon, Mar 6, 2017 at 1:22 AM
Last Updated On: Mon, Mar 6, 2017 at 1:22 AM
Authored by: tlea
Online URL: https://support.nagios.com/kb/article/how-to-clear-solaris-service-maintenance-status-565.html