How To Clear Solaris Service Maintenance Status


Problem Description

This KB article explains how to clear the Solaris maintenance state on a service, specifically the Nagios Core or NRPE service. It focuses on Nagios Core, but the same steps apply to NRPE.

When the Nagios Core service starts, it verifies the configuration files. If the configuration is invalid, the Nagios Core service will NOT start, and you must fix the problem Nagios Core is complaining about. This is normal Nagios Core behaviour and is not specific to Solaris.

On Solaris, however, after a service has failed to start several times, Solaris puts the service into what is called a maintenance state. This state prevents a small problem from becoming a bigger problem. Even after fixing the problem Nagios Core is complaining about, you must also clear the maintenance state before Solaris allows the service to be started again.

This KB article will show you how to determine what the problem is and how to resolve the issue.

 

Diagnose The Problem

A common scenario is when you have rebooted your Solaris server and the Nagios Core service fails to start.

The first step is to determine why the service did not start.

Execute the following command to see detailed status information:

svcs -xv nagios

 

The output resembles this:

svc:/application/nagios:default (?)
 State: maintenance since March  6, 2017 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
   See: http://support.oracle.com/msg/SMF-8000-KS
   See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

 

It's clear that the service is in a maintenance state, but the only detail about the cause is that the start method failed repeatedly. The output does, however, provide the name of a log file: /var/svc/log/application-nagios:default.log.
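The log path can also be pulled out of the svcs output instead of being copied by hand. The following is a minimal sketch, assuming a POSIX-compatible shell and awk; smf_log_paths is a hypothetical helper name, not part of Solaris or Nagios.

```shell
# Hypothetical helper: print the log file path(s) found in `svcs -xv`
# output read from standard input.
# Only "See:" lines whose target starts with "/" are kept, which skips
# the support URL.
smf_log_paths() {
    awk '$1 == "See:" && $2 ~ /^\// { print $2 }'
}
```

Piping the status output through it, for example with svcs -xv nagios | smf_log_paths, prints /var/svc/log/application-nagios:default.log for the sample output shown above.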

Execute the following command to perform further troubleshooting:

tail -20 /var/svc/log/application-nagios:default.log

 

The output might resemble this:

License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Invalid max_check_attempts value for host 'test'
Error: Could not register host (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 33)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[ Mar  6 16:57:38 Method "start" exited with status 8. ]

 

This is a common Nagios Core problem: the object definition in the configuration file is missing required directives. In this example the template was forgotten, so all the common options were missing.
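When the log is long, it can help to show only the lines Nagios Core flags as errors. This is a small sketch, assuming the errors start with "Error:" at the beginning of the line as in the output above; nagios_log_errors is a hypothetical helper name.

```shell
# Hypothetical helper: show the "Error:" lines from the last 50 lines
# of a service log file.
# $1: path to the log file, e.g. /var/svc/log/application-nagios:default.log
nagios_log_errors() {
    tail -n 50 "$1" | grep '^Error:'
}
```

Calling nagios_log_errors /var/svc/log/application-nagios:default.log against the log shown above would list the two "Error:" lines and leave out the banner and the advice text.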

Before proceeding you need to fix the error that is being reported. Once you think you've resolved the issue, you can run the verify command to check:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

 

The output of a successful verify should end like this:

Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
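If you want to script this check, the exit status of the verify command can be used instead of reading the output. The sketch below assumes the default paths used in this article and that the verify command exits with a non-zero status when it finds errors; config_is_valid, NAGIOS_BIN and NAGIOS_CFG are hypothetical names chosen for illustration.

```shell
# Paths from this article; override them if your installation differs.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Hypothetical helper: succeed only if the pre-flight check passes.
# The verify output is discarded; only the exit status is used.
config_is_valid() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1
}
```

For example, config_is_valid && echo "configuration OK" prints the message only when the pre-flight check succeeds.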

 

At this point however if you try to restart the service you'll find it won't start:

svcadm enable nagios

svcs -xv nagios

 

The output will be like this:

svc:/application/nagios:default (?)
 State: maintenance since March  6, 2017 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
   See: http://support.oracle.com/msg/SMF-8000-KS
   See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

 

Don't be fooled into thinking there is still a Nagios configuration issue. Pay careful attention to the date and time on the state: it hasn't changed. Solaris will refuse to start the service until the maintenance state is cleared.
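The state can also be tested directly, which avoids comparing timestamps by eye. This is a minimal sketch, assuming a POSIX-compatible shell (such as ksh or bash) and a service name that matches a single instance; in_maintenance is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if SMF reports the given service as
# being in the maintenance state.
# -H suppresses the header line, -o state prints only the STATE column.
in_maintenance() {
    [ "$(svcs -H -o state "$1")" = "maintenance" ]
}
```

For example, in_maintenance nagios && echo "nagios still needs svcadm clear" prints the reminder only while the maintenance state is set.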

 

 

Clear Maintenance State

Execute the following command to clear the maintenance state:

svcadm clear nagios

 

Execute the following command to start Nagios:

svcadm enable nagios

 

Now check the state of the service:

svcs -xv nagios

 

The output resembles this:

svc:/application/nagios:default (?)
 State: online since March  6, 2017 05:17:21 PM EST
   See: /var/svc/log/application-nagios:default.log
Impact: None.

 

Problem solved: Nagios Core is now running again.
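The whole procedure can be collected into one function. The following is only a sketch under stated assumptions: a POSIX-compatible shell (such as ksh or bash), the default paths used in this article, and a service name that matches a single instance. recover_nagios, NAGIOS_BIN and NAGIOS_CFG are hypothetical names, not part of Solaris or Nagios.

```shell
# Paths from this article; override them if your installation differs.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Hypothetical helper: verify the configuration, clear the maintenance
# state if it is set, enable the service, then print its state.
# $1: service name (defaults to nagios)
recover_nagios() {
    svc=${1:-nagios}

    # Do not touch the service while the configuration is still invalid.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        echo "configuration still invalid; fix it before clearing" >&2
        return 1
    fi

    # Clear only when SMF actually reports the maintenance state.
    if [ "$(svcs -H -o state "$svc")" = "maintenance" ]; then
        svcadm clear "$svc" || return 1
    fi

    svcadm enable "$svc" || return 1

    # Report the state as SMF sees it right now.
    svcs -H -o state "$svc"
}
```

Run it as recover_nagios nagios. The state printed at the end is read immediately after the enable request, so it may still be a transitional one; run svcs -xv nagios afterwards to confirm the service is online.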

 

 

Final Thoughts

For any support related questions please visit the Nagios Support Forums at:

http://support.nagios.com/forum/



Article ID: 565
Created On: Mon, Mar 6, 2017 at 1:22 AM
Last Updated On: Mon, Mar 6, 2017 at 1:22 AM
Authored by: tlea

Online URL: https://support.nagios.com/kb/article/how-to-clear-solaris-service-maintenance-status-565.html