
Mismatched result between web and command line

Posted: Tue Feb 02, 2021 2:00 pm
by smoren
Hello,
I'm struggling with some strange behavior in Nagios XI. I'm trying to set up monitoring of DB2 databases using check_db2_health. After I configure a service to check the database, the web interface reports a critical state (redacted):

Code:

CRITICAL - cannot connect to 10.x.x.x. [IBM][CLI Driver] SQL0332N Character conversion from the source code page "819" to the target code page "UNKNOWN" is not supported. SQLSTATE=57017
The same check returns an OK state from the command line (redacted):

Code:

# su - nagios
$ /usr/local/nagios/libexec/check_db2_health --hostname 10.x.x.x --username USER --password pass --database DBNAME --mode connection-time --warning 2 --critical 5 --port 50004
OK - 0.09 seconds to connect as USER | connection_time=0.0860;2;5
The strange thing is that this check returns an OK state in the web interface right after a restart of the entire server. But as soon as I apply the configuration in CCM (even with no changes), the service goes back to Critical.

Do you have any ideas what might be wrong?
Thanks.

Environment: Nagios XI 5.7.3, RHEL6.

Re: Mismatched result between web and command line

Posted: Tue Feb 02, 2021 4:27 pm
by dchurch
Are you using passive checks to monitor that host? A passive service check and an active one with the exact same name will clobber each other and lead to confusing service states.

Is the "Initial State" toggle set to Critical? If it is, it'll show critical until the first check result comes in.

If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

Re: Mismatched result between web and command line

Posted: Tue Feb 02, 2021 5:36 pm
by smoren
Hello,
It is an active check. I couldn't find any duplicate services (same name, same config, etc.). The pre-flight check on the configuration data returns 0 errors and 0 warnings.
You have the profile in PM.
Thanks.

Re: Mismatched result between web and command line

Posted: Wed Feb 03, 2021 1:10 pm
by dchurch
The profile I received from you doesn't seem to be a complete profile. As such, I was only able to see limited information about your system.

Get a complete profile by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

Code:

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.

Re: Mismatched result between web and command line

Posted: Wed Feb 03, 2021 2:44 pm
by smoren
You have the full profile in PM.
Rene

Re: Mismatched result between web and command line

Posted: Wed Feb 03, 2021 3:47 pm
by dchurch
What's the output of the following commands when run from the command line?

Code:

export LANG=C
/usr/local/nagios/libexec/check_db2_health --hostname 10.x.x.x --username USER --password pass --database DBNAME --mode connection-time --warning 2 --critical 5 --port 50004
Does it still work?

If it doesn't, then the following explanation may shed some light: for some reason Apache runs with a different locale setting than your command-line session. The DB2 driver apparently relies on this setting to choose a code page for character conversion. It's choking because it can't work out a usable code page from the locale it inherits. You can check the current locale at any time with the "locale" command.
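
To see which locale a running process actually inherited, rather than the one in your login shell, you can read its environment from /proc. This is only a minimal sketch, assuming a Linux system; the helper name print_locale_env is made up for illustration, and whether the process to inspect is the Nagios daemon or Apache is an assumption you would need to check on your own install.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): print the locale-related
# variables that a process was started with.
# Assumes Linux, where /proc/<pid>/environ holds NUL-separated KEY=VALUE pairs.
print_locale_env() {
    pid="$1"
    # Turn the NUL separators into newlines, then keep only LANG and LC_* entries.
    # Prints nothing (and returns non-zero) if the process has none of them set.
    tr '\0' '\n' < "/proc/${pid}/environ" | grep -E '^(LANG|LC_[A-Z]+)='
}
```

Run as root, something like print_locale_env "$(pgrep -o -x nagios)" would show the variables of the oldest process named exactly "nagios" (the process name is an assumption). If no LANG line appears, or it shows C, that would be consistent with the failure seen in the web interface.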

Re: Mismatched result between web and command line

Posted: Wed Feb 03, 2021 4:33 pm
by smoren
Hello,
We are getting closer :-)
I executed these commands (redacted):

Code:

# su - nagios
$ /usr/local/nagios/libexec/check_db2_health ...
OK - 0.09 seconds to connect as USER | connection_time=0.0851;2;5
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ export LANG=C
$ /usr/local/nagios/libexec/check_db2_health ...
CRITICAL - cannot connect to 10.x.x.x. [IBM][CLI Driver] SQL0332N  Character conversion from the source code page "819" to the target code page "UNKNOWN" is not supported.  SQLSTATE=57017
$ export LANG=en_US.UTF-8
$ /usr/local/nagios/libexec/check_db2_health ...
OK - 0.08 seconds to connect as USER | connection_time=0.0831;2;5
So it seems the LANG variable is indeed the issue. Do you have any recommendations on how/where to set it properly?
Thanks.

Re: Mismatched result between web and command line

Posted: Thu Feb 04, 2021 10:24 am
by dchurch
A couple of options are available to fix this:
  • You could change the service invocation to be

    Code:

    LANG=en_US.utf8 $USER1$/check_db2_health --hostname 10.x.x.x --username USER --password pass --database DBNAME --mode connection-time --warning 2 --critical 5 --port 50004
  • You can edit the check_db2_health script to add a line near the top that sets LANG, e.g.

    Code:

    #!/usr/bin/perl
    # Fix server codepage mismatch 2021-02-04
    $ENV{LANG} = 'en_US.utf8';
    Consider submitting a bug report or patch to the authors so the fix can be included upstream.
  • You can apply a fix to PHP to make it always run in a UTF-8 locale. It looks like there are a couple of different ways to do this, but it might make Nagios XI behave in undefined ways.
Pick ONE method of fixing this.
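
Whichever method you pick, you can try the effect from a shell first by running the plugin with an explicit LANG, which is essentially what the first option does. The following is a minimal sketch; run_with_lang is a hypothetical helper name, not something shipped with Nagios XI or the plugin.

```shell
#!/bin/sh
# Hypothetical helper: run a command with LANG forced to the given value.
# Note: if LC_ALL were set it would take precedence over LANG; in the
# "locale" output earlier in this thread it is empty, so LANG is enough here.
run_with_lang() {
    lang="$1"
    shift
    # env sets LANG only for the command it launches; the calling shell
    # keeps its own locale.
    env LANG="$lang" "$@"
}
```

For example, run_with_lang C followed by the full check_db2_health command line should reproduce the CRITICAL result, and run_with_lang en_US.utf8 followed by the same command should return OK, matching the results posted above.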