Nagios XI Services not working

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Re: Nagios XI Services not working

Post by uidaho »

I am having exactly the same problem. I updated to the latest version of XI this morning and rebooted the host. It's a RHEL 6 box.

From nagios.log:
wproc: Registry request: name=Core Worker 32367;pid=32367
wproc: Registry request: name=Core Worker 32368;pid=32368
wproc: Registry request: name=Core Worker 32369;pid=32369
Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.

From /var/log/messages:

Segmentation fault

Jan 24 09:32:02 monitor01 ndo2db: Trimming eventhandlers.
Jan 24 09:32:03 monitor01 kernel: nagios[32320]: segfault at 372a000 ip 000000391927f791 sp 00007ffe1c8880d8 error 6 in libc-2.12.so[3919200000+18a000]
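For anyone comparing crash reports, the kernel line above can be decoded a little further. The sketch below is a bash function (decode_segfault is a made-up name for this example) that assumes the exact log format shown. It reads the low three bits of the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and subtracts the library's load address from the instruction pointer. For this line it should report a user-mode write to a page that was not present, at offset 0x7f791 inside libc-2.12.so.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a kernel "segfault" line of the form
#   ... segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]
decode_segfault() {
  local line=$1 ip code lib base
  ip=$(sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p' <<<"$line")
  code=$(sed -n 's/.* error \([0-9]*\) in .*/\1/p' <<<"$line")
  lib=$(sed -n 's/.* in \([^[]*\)\[.*/\1/p' <<<"$line")
  base=$(sed -n 's/.*\[\([0-9a-f]*\)+.*/\1/p' <<<"$line")
  if [[ -z $ip || -z $code || -z $lib || -z $base ]]; then
    echo "unrecognized segfault line" >&2
    return 1
  fi

  # Low bits of the x86 page-fault error code.
  local access=read mode=kernel cause="page not present"
  (( code & 2 )) && access=write                 # bit 1 set: write access
  (( code & 4 )) && mode=user                    # bit 2 set: fault in user mode
  (( code & 1 )) && cause="protection violation" # bit 0 set: page was present

  printf 'library: %s\n' "$lib"
  # Instruction pointer minus the library load address = offset in the library.
  printf 'offset:  0x%x\n' $(( 0x$ip - 0x$base ))
  printf 'access:  %s\n' "$access"
  printf 'mode:    %s\n' "$mode"
  printf 'cause:   %s\n' "$cause"
}

# Example using the line from this post:
decode_segfault 'Jan 24 09:32:03 monitor01 kernel: nagios[32320]: segfault at 372a000 ip 000000391927f791 sp 00007ffe1c8880d8 error 6 in libc-2.12.so[3919200000+18a000]'
```

With glibc debug symbols installed, that offset could then be looked up with a tool such as gdb to see which libc function was executing; that step is not shown here.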


I'd love to hear what you find out!
askewdread
Posts: 69
Joined: Wed Nov 16, 2016 4:54 pm

Re: Nagios XI Services not working

Post by askewdread »

This looks pretty much the same as ours, except ours is CentOS 7.3.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI Services not working

Post by tgriep »

Thanks for the retention.dat file. I noticed that it contains a service check called "Check WMI Physical Disk IO" whose output is being cut off at 8192 bytes, and that may be what is causing the issue.
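To see which services have oversized output stored in retention.dat, a rough bash/awk sketch like the one below could help. It assumes the usual retention.dat layout of `service { ... }` blocks made of `key=value` lines, the default path /usr/local/nagios/var/retention.dat, and an arbitrary 8000-byte threshold; find_long_output is a hypothetical helper name. It prints host;service;field;length for every plugin_output, long_plugin_output, or performance_data value at or above the threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report large output fields per service in retention.dat.
# Usage: find_long_output [retention_file] [min_bytes]
find_long_output() {
  local file=${1:-/usr/local/nagios/var/retention.dat}
  local limit=${2:-8000}   # arbitrary threshold, in bytes
  # LC_ALL=C makes awk length() count bytes rather than characters.
  LC_ALL=C awk -F= -v limit="$limit" '
    { sub(/^[ \t]+/, "") }                      # tolerate leading indentation
    /^service[ \t]*[{]/ { in_svc = 1; host = svc = ""; n = 0; next }
    in_svc && /^[}]/ {                          # end of a service block
      for (i = 1; i <= n; i++) printf "%s;%s;%s\n", host, svc, hit[i]
      in_svc = 0
      next
    }
    in_svc && $1 == "host_name"           { host = substr($0, length($1) + 2) }
    in_svc && $1 == "service_description" { svc  = substr($0, length($1) + 2) }
    in_svc && ($1 == "plugin_output" || $1 == "long_plugin_output" || $1 == "performance_data") {
      len = length($0) - length($1) - 1         # bytes after the first "="
      if (len >= limit) hit[++n] = $1 ";" len
    }
  ' "$file"
}
```

For example, `find_long_output /usr/local/nagios/var/retention.dat 8000` run as root. Printing is deferred to the end of each block so the host and service names are known even if the fields appear in a different order.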
For a test, can you disable that service check and see if the issue is resolved?
Thanks.
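As an alternative to the web interface, active checks for a service can be disabled at runtime through the Nagios external command file. A minimal sketch, assuming the command file is at /usr/local/nagios/var/rw/nagios.cmd (confirm with the command_file setting in nagios.cfg) and using the standard DISABLE_SVC_CHECK external command; disable_service_check is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit DISABLE_SVC_CHECK through the external command file.
# Usage: disable_service_check HOST SERVICE [command_file]
disable_service_check() {
  local host=$1 service=$2
  local cmdfile=${3:-/usr/local/nagios/var/rw/nagios.cmd}  # assumed default path
  if [[ ! -w $cmdfile ]]; then
    echo "command file not writable: $cmdfile" >&2
    return 1
  fi
  # External command format: [unix_timestamp] COMMAND;arg1;arg2
  printf '[%s] DISABLE_SVC_CHECK;%s;%s\n' "$(date +%s)" "$host" "$service" > "$cmdfile"
}
```

For example, `disable_service_check "your-host" "Check WMI Physical Disk IO"`, where the host name is a placeholder. This only stops active checks; output already saved in retention.dat stays there.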
Be sure to check out our Knowledgebase for helpful articles and solutions!
askewdread
Posts: 69
Joined: Wed Nov 16, 2016 4:54 pm

Re: Nagios XI Services not working

Post by askewdread »

Hey,

Thanks for that. I had to remove the retention.dat file again this time, but I suspect that's expected since it still had those checks in it. I'll keep an eye on it and let you know :)
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios XI Services not working

Post by dwhitfield »

Thanks for letting us know. We await results.
askewdread
Posts: 69
Joined: Wed Nov 16, 2016 4:54 pm

Re: Nagios XI Services not working

Post by askewdread »

Hi,

Unfortunately, this recurred this morning. The latest retention.dat is attached.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios XI Services not working

Post by tmcdonald »

Can you run a MySQL command for me? As root, from the command line:

echo "use nagios;select count(*) from nagios_servicestatus where LENGTH(output) >= 255;" | mysql -u root -pnagiosxi

You may need to change the last part (-pnagiosxi) if your MySQL root password is not nagiosxi. I have a theory that the segfault might be related to long output being parsed, but we have not been able to replicate this internally to test it.
Former Nagios employee
askewdread
Posts: 69
Joined: Wed Nov 16, 2016 4:54 pm

Re: Nagios XI Services not working

Post by askewdread »

It returns:


count(*)
0
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI Services not working

Post by tgriep »

Can you run this command and post the output so we can see if the MySQL table settings are correct?


echo 'desc nagios_servicestatus;' | mysql  -t -pnagiosxi nagios
Can you either post or PM me the full /var/log/messages and the /usr/local/nagios/var/nagios.log files so we can view them?
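Before sending /var/log/messages, the Nagios crash lines can be pulled out with a short bash filter. This sketch assumes the kernel message format shown earlier in the thread (`kernel: nagios[PID]: segfault at ... in LIB[...]`); nagios_segfaults is a hypothetical helper name. It prints the timestamp, PID, and faulting library for each crash, which makes it easier to see how often the process dies and whether it is always in the same library.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nagios segfaults from a syslog file.
# Usage: nagios_segfaults [logfile]   (reading /var/log/messages usually needs root)
nagios_segfaults() {
  local log=${1:-/var/log/messages}
  # Keep only kernel segfault reports for the nagios process, then reduce each
  # line to: <syslog timestamp> pid=<PID> lib=<library the fault happened in>
  grep -E 'kernel: nagios\[[0-9]+\]: segfault at ' "$log" |
    sed -E 's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]+) .*nagios\[([0-9]+)\]: segfault at .* in ([^[]+)\[.*/\1 pid=\2 lib=\3/'
}
```

For the log excerpt earlier in the thread, this should print `Jan 24 09:32:03 pid=32320 lib=libc-2.12.so`.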
askewdread
Posts: 69
Joined: Wed Nov 16, 2016 4:54 pm

Re: Nagios XI Services not working

Post by askewdread »

MySQL output:


+-------------------------------+--------------+------+-----+---------------------+----------------+
| Field                         | Type         | Null | Key | Default             | Extra          |
+-------------------------------+--------------+------+-----+---------------------+----------------+
| servicestatus_id              | int(11)      | NO   | PRI | NULL                | auto_increment |
| instance_id                   | smallint(6)  | NO   | MUL | 0                   |                |
| service_object_id             | int(11)      | NO   | UNI | 0                   |                |
| status_update_time            | datetime     | NO   | MUL | 0000-00-00 00:00:00 |                |
| output                        | varchar(255) | NO   |     |                     |                |
| long_output                   | text         | NO   |     | NULL                |                |
| perfdata                      | text         | NO   |     | NULL                |                |
| current_state                 | smallint(6)  | NO   | MUL | 0                   |                |
| has_been_checked              | smallint(6)  | NO   |     | 0                   |                |
| should_be_scheduled           | smallint(6)  | NO   |     | 0                   |                |
| current_check_attempt         | smallint(6)  | NO   |     | 0                   |                |
| max_check_attempts            | smallint(6)  | NO   |     | 0                   |                |
| last_check                    | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| next_check                    | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| check_type                    | smallint(6)  | NO   | MUL | 0                   |                |
| last_state_change             | datetime     | NO   | MUL | 0000-00-00 00:00:00 |                |
| last_hard_state_change        | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| last_hard_state               | smallint(6)  | NO   |     | 0                   |                |
| last_time_ok                  | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| last_time_warning             | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| last_time_unknown             | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| last_time_critical            | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| state_type                    | smallint(6)  | NO   | MUL | 0                   |                |
| last_notification             | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| next_notification             | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| no_more_notifications         | smallint(6)  | NO   |     | 0                   |                |
| notifications_enabled         | smallint(6)  | NO   | MUL | 0                   |                |
| problem_has_been_acknowledged | smallint(6)  | NO   | MUL | 0                   |                |
| acknowledgement_type          | smallint(6)  | NO   |     | 0                   |                |
| current_notification_number   | smallint(6)  | NO   |     | 0                   |                |
| passive_checks_enabled        | smallint(6)  | NO   | MUL | 0                   |                |
| active_checks_enabled         | smallint(6)  | NO   | MUL | 0                   |                |
| event_handler_enabled         | smallint(6)  | NO   | MUL | 0                   |                |
| flap_detection_enabled        | smallint(6)  | NO   | MUL | 0                   |                |
| is_flapping                   | smallint(6)  | NO   | MUL | 0                   |                |
| percent_state_change          | double       | NO   | MUL | 0                   |                |
| latency                       | double       | NO   | MUL | 0                   |                |
| execution_time                | double       | NO   | MUL | 0                   |                |
| scheduled_downtime_depth      | smallint(6)  | NO   | MUL | 0                   |                |
| failure_prediction_enabled    | smallint(6)  | NO   |     | 0                   |                |
| process_performance_data      | smallint(6)  | NO   |     | 0                   |                |
| obsess_over_service           | smallint(6)  | NO   |     | 0                   |                |
| modified_service_attributes   | int(11)      | NO   |     | 0                   |                |
| event_handler                 | varchar(255) | NO   |     |                     |                |
| check_command                 | varchar(255) | NO   |     |                     |                |
| normal_check_interval         | double       | NO   |     | 0                   |                |
| retry_check_interval          | double       | NO   |     | 0                   |                |
| check_timeperiod_object_id    | int(11)      | NO   |     | 0                   |                |
+-------------------------------+--------------+------+-----+---------------------+----------------+
I have PM'd the other files.