Service check "pending" in Nagiosxi but works in Core
-
- Posts: 9
- Joined: Wed Jul 07, 2021 2:04 am
Hi,
we are facing an issue with some service checks on our NagiosXI server.
These service checks seem to be in pending state forever, and there is no button to schedule an immediate check:
The state of active checks, notifications, etc. appears as "disabled", but it is not possible to enable them (i.e. clicking on the buttons has no effect and they remain disabled):
However, in Nagios Core the service check works fine, and it also works when run from the command line or via the "test check command" button.
The only difference compared to other working services is the very long output and performance data returned by the plugin:
Is there any limitation on where the data is stored, or some sort of constraint which prevents the result from being shown in NagiosXI interface?
How could we increase this limit, or unblock this service in NagiosXI UI?
Thanks for your help
Zac
Re: Service check "pending" in Nagiosxi but works in Core
Hello @Sistemisti Sisal
Thanks for reaching out. We would like to get a copy of your System Profile so we can see what is going on.
To send us your System Profile:
- Log in to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" Menu
- Click the "Download Profile" button
- Save the profile.zip file and send via Private Message
Perry
Re: Service check "pending" in Nagiosxi but works in Core
Hello @Sistemisti Sisal
Thanks for following up and providing the System Profile.
After reviewing it, I did not see any applyconfig logged, so I would like you to run through a reindex.
Reindex the Core Configuration Manager (CCM) configs:
- Clear the import directory: rm -rf /usr/local/nagios/etc/import/*
- 1: List all running /bin/nagios processes: ps -aux | grep -E '/bin/nagios'
- 2: Stop them: killall -9 nagios (or pkill nagios)
- 3: Check that the /bin/nagios processes are stopped
- 4: Restart nagios.service: systemctl restart nagios
- 5: In the Nagios XI web console, go to Core Configuration Manager (CCM) ==> Config File Management ==> [Delete Files] ==> [Write Files] ==> [Verify Files]
- 6: In Core Configuration Manager (CCM), under Quick Tools, click "Apply Configuration"
- 7: Restart nagios.service again:
Code: Select all
systemctl restart nagios
Verify that the host and services look good in pre-flight with no errors in core by running:
Code: Select all
/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg
Thanks,
Perry
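The pre-flight check above prints its verdict in the "Total Warnings" and "Total Errors" lines. On a large configuration it can be handy to reduce the output to just those two numbers. The following is a minimal sketch, not part of Nagios: `preflight_summary` is a hypothetical helper name, and it assumes the totals are printed in the same "Total Warnings: N" / "Total Errors: N" form as in this thread.

```shell
#!/usr/bin/env bash
# Sketch: summarize the pre-flight (verification) output of Nagios Core.
# preflight_summary is a hypothetical helper name, not a Nagios command.

# Reads verification output on stdin and prints "warnings=N errors=M".
# Exit status: 0 if the error total is 0, 1 if it is non-zero,
# 2 if the totals could not be found in the input.
preflight_summary() {
    awk '
        /^Total Warnings:/ { warnings = $NF }
        /^Total Errors:/   { errors = $NF }
        END {
            if (warnings == "" || errors == "") {
                print "totals not found"
                exit 2
            }
            printf "warnings=%s errors=%s\n", warnings, errors
            exit ((errors + 0 == 0) ? 0 : 1)
        }
    '
}

# Usage, with the command from the post above:
# /usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg | preflight_summary
```

The exit status makes it possible to gate a restart on a clean check, for example `... | preflight_summary && systemctl restart nagios`.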
Re: Service check "pending" in Nagiosxi but works in Core
Hi Perry,
I followed your suggestion and did a clean reindex of the whole configuration, followed by a restart and an apply.
I was not able to spot the applyconfig in the logs either. Where is it usually logged?
There is no error during the pre-flight check:
Code: Select all
Running pre-flight check on configuration data...
Checking services...
Checked 66589 services.
Checking hosts...
Checked 3542 hosts.
Checking host groups...
Checked 85 host groups.
Checking service groups...
Checked 9 service groups.
Checking contacts...
Checked 244 contacts.
Checking contact groups...
Checked 21 contact groups.
Checking service escalations...
Checked 0 service escalations.
Checking service dependencies...
Checked 0 service dependencies.
Checking host escalations...
Checked 0 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 199 commands.
Checking time periods...
Checked 269 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Still, the issue with the "pending" service persists:
Thanks for your support
Zac
Re: Service check "pending" in Nagiosxi but works in Core
Hello Zac,
Thanks for the update. Reviewing the screenshot, we see that the host is 'vdwhdbpsp01', but in the System Profile I don't see a config file that matches it (only cbetdbpsp01.cfg and vsppdbpsp01.cfg). I assume that it is an active service check for that host; one option is to remove it and reconfigure it.
You will want to review the logs found in '/usr/local/nagiosxi/var/....': look at eventman.log, cmdsubsys.log, and the audit log.
Please let us know if you have any questions while parsing through those.
Thanks,
Perry
Re: Service check "pending" in Nagiosxi but works in Core
Hi Perry,
Both vdwhdbpsp01 and vsppdbpsp01 have the same check configured, which is causing this issue (sorry for the wrong server screenshot).
Other similar checks (based on database queries), both on the same servers and on different servers, give no problems:
We already tried:
- cloning the service check from other hosts on which it works fine
- deleting and creating it again via the UI
- creating it again by importing the "define service" block, taken from another working .cfg file
all with the same effect.
No indication of any problem in /usr/local/nagiosxi/var logfiles.
Still, the command executed by command line returns a lot of data, both output and performance data.
Could this be the cause of our trouble?
Re: Service check "pending" in Nagiosxi but works in Core
Hello Zac @Sistemisti Sisal
Thanks for following up and providing the details on the two hosts' configs. You state that when you run the command from the command line, you are not receiving results without errors, from what you can tell.
Here is a one-liner that I use to get a quick look across the system and see whether anything stands out. It gives a nice overview; when you spot a key piece, grab it and dig into it further:
Code: Select all
grep -Eir 'warn|error|fail|terminated|timeout|time-out|sigsegv' /var/log/mariadb/mariadb.log /var/log/mysql/mysql.log /var/log/httpd/error_log /var/log/apache2 /var/log/syslog /var/log/messages /usr/local/nagios/var/ /usr/local/nagiosxi/var/ -A 2 -B 2 --color=always | less -SR
Also note that 'applyconfig' runs through a script that checks permissions; you can run it from '/usr/local/nagiosxi/scripts/reconfigure_nagios.sh'.
For database issues, check '/var/log/...databaseurusing...'. Check the columns, and repair only if needed with '/usr/local/nagiosxi/scripts/repair_database.sh'.
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Please let us know how we can help going forward,
Perry
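The grep one-liner in this post lists log paths for several distributions at once (for example both /var/log/httpd and /var/log/apache2), so on any single host some of them will not exist and grep will complain about them. One way to keep the overview clean is to filter the list first. This is a minimal sketch under that assumption; `existing_paths` is a hypothetical helper name, not something shipped with Nagios XI.

```shell
#!/usr/bin/env bash
# Sketch: keep only the log paths that exist on this host before grepping.
# existing_paths is a hypothetical helper name.

# Prints each argument that exists (file or directory), one per line.
existing_paths() {
    local p
    for p in "$@"; do
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# Usage, with the same candidate paths and pattern as the one-liner:
# mapfile -t logs < <(existing_paths /var/log/mariadb/mariadb.log /var/log/mysql/mysql.log /var/log/httpd/error_log /var/log/apache2 /var/log/syslog /var/log/messages /usr/local/nagios/var/ /usr/local/nagiosxi/var/)
# grep -Eir 'warn|error|fail|terminated|timeout|time-out|sigsegv' "${logs[@]}" -A 2 -B 2 --color=always | less -SR
```

If none of the paths exist, `logs` is empty and a recursive grep would fall back to the current directory, so it is worth checking the array length before running the search.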
Re: Service check "pending" in Nagiosxi but works in Core
Hi Perry,
Sorry for not being clear again...
When I run the script from the command line, it works as expected; have a look at (part of) the output:
The same command works fine using the "test check command" button, as you can see:
However, the status of the check never changes from "pending" in the main Nagios interface.
Running your one-liner reveals no serious issues in the system, database, or Nagios logs.
Do you have any idea of what could be causing this issue?
Thanks,
Zac
Re: Service check "pending" in Nagiosxi but works in Core
Hello @Sistemisti Sisal
Thanks for following up. Yes, this is a one-off with no clear answer as to why the command works while the UI gives strange responses.
When you use the option to change the 'Service Attributes' from the UI, you should see an action being processed. When you open the Network tab in the web browser's Development Tools and trigger the change on the Service Attribute, do you see an action taking place? Also, follow (tail -f) the Apache logs in /var/log/.... At the same time, when the change is triggered, you should see a response in 'nagios.log' and 'cmdsubsys.log' as well.
Code: Select all
tail -F /usr/local/nagios/var/nagios.log /usr/local/nagiosxi/var/cmdsubsys.log
The next option is to "physically" move (copy) the check command to the host's NCPA plugin directory and verify:
Code: Select all
/usr/local/nagios/libexec/check_ncpa.py -H your_linux_server_ip_address -t your_linux_server_ncpa_token -P 5693 -M 'plugins/check_your_check_here'
Please let us know what you are seeing,
Perry
Re: Service check "pending" in Nagiosxi but works in Core
Hi Perry,
I was able to make this check work by removing performance data from the plugin output (i.e. appending --noperfdata to the plugin script).
Now the service looks fine also in NagiosXI:
The downside of this approach is that we are dropping part of the data, so we get no graphs, statistics, and so on.
Now that we have identified part of the cause, is there something we can do to fix it, or some parameter we can tune, so that performance data works as well?
For instance, where is this performance data stored once it is received from the plugin?
- Could it be a database table with some character limit?
- Maybe the HTML does not allow passing so much data?
- Any idea is really appreciated.
thanks for your patience
Zac
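Since removing the performance data made the check work, it may help to measure how much text and how much performance data the plugin actually emits, so the numbers can be compared against whatever limit turns out to apply. The following is a minimal sketch, not a Nagios XI tool: `plugin_output_sizes` is a hypothetical helper name, and it assumes the common plugin convention that everything after the first "|" is performance data. No particular limit value is assumed here.

```shell
#!/usr/bin/env bash
# Sketch: report how many bytes of text and of performance data a plugin prints.
# plugin_output_sizes is a hypothetical helper name.
# Assumption: everything after the first "|" counts as performance data;
# multi-line output with several "|" sections is not split any further.

plugin_output_sizes() {
    local out text perf
    out=$(cat)                       # whole plugin output from stdin
    text=${out%%"|"*}                # part before the first "|"
    if [ "$text" = "$out" ]; then
        perf=""                      # no "|" at all: no performance data
    else
        perf=${out#*"|"}             # part after the first "|"
    fi
    printf 'text_bytes=%s perfdata_bytes=%s\n' \
        "$(printf '%s' "$text" | wc -c | tr -d ' ')" \
        "$(printf '%s' "$perf" | wc -c | tr -d ' ')"
}

# Usage (the plugin path and arguments are placeholders):
# /path/to/your_plugin --your-args | plugin_output_sizes
```

Running it once with and once without --noperfdata shows how much of the total size comes from the performance data alone.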