
Re: Service check works in Core, but not XI

Posted: Tue Apr 29, 2014 5:41 pm
by abrist
Goofy behavior like this can sometimes be blamed on fork/orphan errors. Check the system messages for such issues:

Code: Select all

tail -500 /var/log/messages | grep "fork\|orphan\|segfault"

Additionally, check the db for crashed tables:

Code: Select all

tail -100 /var/log/mysqld.log | grep "crashed"

Re: Service check works in Core, but not XI

Posted: Tue Apr 29, 2014 7:50 pm
by rseiwert
It looks like XI is truncating the string. It's too long. Remove some performance data and I bet it works.
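
To put a number on "too long", here is a rough sketch that counts the bytes of perfdata in a line of plugin output. The function name perfdata_len is made up for this example, and it only looks at the first line of output (plugins can emit more perfdata on later lines), so treat the result as a lower bound.

```shell
# Print the byte length of the perfdata portion (everything after the
# first "|") of the first line of plugin output read from stdin.
# perfdata_len is a hypothetical helper name, not part of Nagios.
perfdata_len() {
    # read returns non-zero on a missing trailing newline but still fills $line
    IFS= read -r line || true
    case "$line" in
        *"|"*) perf=${line#*"|"} ;;   # strip up to and including the first "|"
        *)     perf="" ;;             # no perfdata on this line
    esac
    printf '%s' "$perf" | wc -c | tr -d ' '
}
```

Run the plugin by hand as the nagios user with the same arguments the service uses, pipe its output into perfdata_len, and compare the result with the width of the perfdata column in the nagios database.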

Re: Service check works in Core, but not XI

Posted: Wed Apr 30, 2014 9:34 am
by lmiltchev
snapon_admin, can you give us an update on your issue? Did you check the system messages for fork/orphan errors as suggested by abrist?

Re: Service check works in Core, but not XI

Posted: Wed Apr 30, 2014 11:37 am
by snapon_admin
Neither of those commands returned any results. I also tried deleting and re-adding the service from scratch, even though this check works on every other host, just to be sure. I even turned off the process perf data option. There would definitely be more perf data on this host than on the others, but that doesn't explain why the 2 host checks with similar issues aren't working either.

Re: Service check works in Core, but not XI

Posted: Wed Apr 30, 2014 1:44 pm
by lmiltchev
snapon_admin wrote: "I also tried deleting and re-adding the service from scratch, even though this check works on every other host, just to be sure."
Can you show us the definition of the service that is not working and one that is working? Hide sensitive info.

Re: Service check works in Core, but not XI

Posted: Wed Apr 30, 2014 2:15 pm
by snapon_admin
Working:

Code: Select all

define service {
        host_name                       USSNAPMKEWI-Core
        service_description             Hardware health
        check_command                   check_nwc_hardware_health!SNMPCOMMUNITYSTRING!!!!!!!
        max_check_attempts              5
        check_interval                  10
        retry_interval                  2
        active_checks_enabled           1
        check_period                    xi_timeperiod_24x7
        notification_period             xi_timeperiod_24x7
        notifications_enabled           0
        contacts                        nagiosadmin
        _xiwizard                       switch
        register                        1
        }
Not working:

Code: Select all

define service {
        host_name                       USSNAPLSAIL-Core
        service_description             Hardware health
        check_command                   check_nwc_hardware_health!SNMPCOMMUNITYSTRING!!!!!!!
        initial_state                   o
        max_check_attempts              5
        check_interval                  10
        retry_interval                  2
        active_checks_enabled           1
        passive_checks_enabled          1
        check_period                    xi_timeperiod_24x7
        event_handler                   xi_service_notification_handler
        event_handler_enabled           1
        low_flap_threshold              10
        high_flap_threshold             40
        flap_detection_enabled          1
        process_perf_data               0
        notification_interval           60
        notification_period             xi_timeperiod_24x7
        notification_options            w,c,r,
        notifications_enabled           0
        contacts                        nagiosadmin
        _xiwizard                       switch
        register                        1
        }


Re: Service check works in Core, but not XI

Posted: Wed Apr 30, 2014 4:41 pm
by abrist
A number of things are disabled on this object - performance data, notifications, etc. When you re-created this check, did you use the copy mechanism, or did you re-create it by hand?
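
For spotting differences like these, a sketch along the following lines lists the directives that appear in one definition but not the other. It assumes each definition has been saved to its own file; the function names directives and only_in_second are made up for this example.

```shell
# List the directive names used in a service definition file, sorted.
# Skips the "define service {" line and the closing brace.
directives() {
    awk '$1 != "define" && $1 != "}" && NF >= 2 { print $1 }' "$1" | sort -u
}

# Print directives that appear only in the second file.
only_in_second() {
    a=$(mktemp) && b=$(mktemp) || return 1
    directives "$1" > "$a"
    directives "$2" > "$b"
    comm -13 "$a" "$b"    # -13 hides lines unique to file 1 and lines in both
    rm -f "$a" "$b"
}
```

Called with the working definition first and the non-working one second, this prints the extra directives (initial_state, event_handler, process_perf_data and so on) one per line. It compares names only, not values.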

Re: Service check works in Core, but not XI

Posted: Thu May 01, 2014 10:41 am
by snapon_admin
I copied it from another service check using the check_nwc plugin, but I removed all templates and such and reconfigured pretty much every part of it. Perfdata is disabled because this particular check has a ton of perfdata and I wanted to see if disabling it would correct the issue, which it didn't. I also disabled notifications in case something went wrong with adding the check; I didn't want to confuse my ops team with a false alert.

Re: Service check works in Core, but not XI

Posted: Thu May 01, 2014 12:57 pm
by scottwilkerson
Actually, if the check has a TON of perfdata, this could keep it from making it into the DB, depending on your mysql version (whether perfdata processing is disabled or not).

Here is a workaround: increase the perfdata column size in the nagios DB.

Code: Select all

echo "ALTER TABLE nagios_servicestatus MODIFY perfdata VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_servicechecks MODIFY perfdata VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hoststatus MODIFY perfdata VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hostchecks MODIFY perfdata VARCHAR(65536);"|mysql -pnagiosxi nagios
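
To confirm the change took effect, something like the following sketch prints one SHOW COLUMNS statement per altered table. The function name perfdata_column_queries is made up for this example. Depending on the MySQL version and SQL mode, the server may report a TEXT type rather than VARCHAR for a length this large, or it may have rejected the ALTER outright, so it is worth looking at what actually comes back.

```shell
# Emit one SHOW COLUMNS statement for the perfdata column of each
# table touched by the ALTER commands above.
# perfdata_column_queries is a hypothetical helper name.
perfdata_column_queries() {
    for table in nagios_servicestatus nagios_servicechecks \
                 nagios_hoststatus nagios_hostchecks; do
        printf "SHOW COLUMNS FROM %s LIKE 'perfdata';\n" "$table"
    done
}

# Review the statements, then feed them to mysql with the same
# credentials used for the ALTER commands, for example:
#   perfdata_column_queries | mysql -pnagiosxi nagios
```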

Re: Service check works in Core, but not XI

Posted: Thu May 01, 2014 1:37 pm
by snapon_admin
Alright, I'll try that. Once I've made that change, should removing and re-adding the service check be all I need to do, or are there any other steps I need to take?