event_handler.log is flooded

noweda
Posts: 52
Joined: Fri Dec 06, 2013 2:47 am

event_handler.log is flooded

Post by noweda »

Hello support,

We are facing the problem that the file /usr/local/nagiosxi/var/event_handler.log fills up the disk within a few hours. Our workaround was to delete the file, which stopped the problem. The file is then recreated, but owned by root:root, and about once a minute the following line is logged:

Code: Select all

/bin/sh: nagios: command not found
When we set the ownership to nagios:nagios, the file is flooded again with entries like these:

Code: Select all

Array
(
    [eventqueue_id] => 1802888
    [event_time] =>
    [event_source] => 2
    [event_type] => 1
    [event_meta] => YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7czo4OiJQUjAxMzE0MCI7czo3OiJzZXJ2aWNlIjtzOjI0OiIwMC1zbm1wX1ByaW50ZXJQYWdlQ291bnQiO3M6MTE6Imhvc3RhZGRyZXNzIjtzOjEyOiIxODguMS4zMy4xNDAiO3M6OToiaG9zdHN0YXRlIjtzOjQ6IkRPV04iO3M6MTE6Imhvc3RzdGF0ZWlkIjtzOjE6IjEiO3M6MTE6Imhvc3RldmVudGlkIjtzOjc6IjI5MzY0NDciO3M6MTM6Imhvc3Rwcm9ibGVtaWQiO3M6NzoiMTM0NTUxMyI7czoxMjoic2VydmljZXN0YXRlIjtzOjg6IkNSSVRJQ0FMIjtzOjE0OiJzZXJ2aWNlc3RhdGVpZCI7czoxOiIyIjtzOjE2OiJsYXN0c2VydmljZXN0YXRlIjtzOjg6IkNSSVRJQ0FMIjtzOjE4OiJsYXN0c2VydmljZXN0YXRlaWQiO3M6MToiMiI7czoxNjoic2VydmljZXN0YXRldHlwZSI7czo0OiJTT0ZUIjtzOjE0OiJjdXJyZW50YXR0ZW1wdCI7czoxOiI0IjtzOjExOiJtYXhhdHRlbXB0cyI7czoxOiI0IjtzOjE0OiJzZXJ2aWNlZXZlbnRpZCI7czo3OiIyOTM2NDg0IjtzOjE2OiJzZXJ2aWNlcHJvYmxlbWlkIjtzOjc6IjEzNDU1MzQiO3M6MTM6InNlcnZpY2VvdXRwdXQiO3M6NTU6IkNSSVRJQ0FMIC0gUGx1Z2luIHRpbWVkIG91dCB3aGlsZSBleGVjdXRpbmcgc3lzdGVtIGNhbGwiO3M6MTc6ImxvbmdzZXJ2aWNlb3V0cHV0IjtiOjA7czoxNToic2VydmljZWRvd250aW1lIjtzOjE6IjEiO30=
)
Array
(
    [eventqueue_id] => 1802889
    [event_time] =>
    [event_source] => 2
    [event_type] => 1
    [event_meta] => YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7czo4OiJQUjEwMzE0MSI7czo3OiJzZXJ2aWNlIjtzOjIxOiIwMS1zbm1wX1ByaW50ZXJTdGF0dXMiO3M6MTE6Imhvc3RhZGRyZXNzIjtzOjEzOiIxMC4xMTAuMzMuMTQxIjtzOjk6Imhvc3RzdGF0ZSI7czoyOiJVUCI7czoxMToiaG9zdHN0YXRlaWQiO3M6MToiMCI7czoxMToiaG9zdGV2ZW50aWQiO3M6NzoiMjkzNjY3OSI7czoxMzoiaG9zdHByb2JsZW1pZCI7czoxOiIwIjtzOjEyOiJzZXJ2aWNlc3RhdGUiO3M6NzoiV0FSTklORyI7czoxNDoic2VydmljZXN0YXRlaWQiO3M6MToiMSI7czoxNjoibGFzdHNlcnZpY2VzdGF0ZSI7czo3OiJVTktOT1dOIjtzOjE4OiJsYXN0c2VydmljZXN0YXRlaWQiO3M6MToiMyI7czoxNjoic2VydmljZXN0YXRldHlwZSI7czo0OiJIQVJEIjtzOjE0OiJjdXJyZW50YXR0ZW1wdCI7czoxOiI0IjtzOjExOiJtYXhhdHRlbXB0cyI7czoxOiI0IjtzOjE0OiJzZXJ2aWNlZXZlbnRpZCI7czo3OiIyOTM2NzA2IjtzOjE2OiJzZXJ2aWNlcHJvYmxlbWlkIjtzOjc6IjEzNDUxMDQiO3M6MTM6InNlcnZpY2VvdXRwdXQiO3M6NDE6IldBUk5JTkcgLSBTdGF0dXM6IG9mZmxpbmUsIHNlcnZpY2UgbmVlZGVkIjtzOjE3OiJsb25nc2VydmljZW91dHB1dCI7YjowO3M6MTU6InNlcnZpY2Vkb3dudGltZSI7czoxOiIxIjt9
)
Array
(
    [eventqueue_id] => 1802890
    [event_time] =>
    [event_source] => 2
    [event_type] => 1
    [event_meta] => YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7czo4OiJQUjE3MzE0MSI7czo3OiJzZXJ2aWNlIjtzOjIxOiIwMS1zbm1wX1ByaW50ZXJTdGF0dXMiO3M6MTE6Imhvc3RhZGRyZXNzIjtzOjEzOiIxODguMTcuMzMuMTQxIjtzOjk6Imhvc3RzdGF0ZSI7czoyOiJVUCI7czoxMToiaG9zdHN0YXRlaWQiO3M6MToiMCI7czoxMToiaG9zdGV2ZW50aWQiO3M6NzoiMjkzNjM5MyI7czoxMzoiaG9zdHByb2JsZW1pZCI7czoxOiIwIjtzOjEyOiJzZXJ2aWNlc3RhdGUiO3M6NzoiV0FSTklORyI7czoxNDoic2VydmljZXN0YXRlaWQiO3M6MToiMSI7czoxNjoibGFzdHNlcnZpY2VzdGF0ZSI7czo3OiJXQVJOSU5HIjtzOjE4OiJsYXN0c2VydmljZXN0YXRlaWQiO3M6MToiMSI7czoxNjoic2VydmljZXN0YXRldHlwZSI7czo0OiJTT0ZUIjtzOjE0OiJjdXJyZW50YXR0ZW1wdCI7czoxOiIyIjtzOjExOiJtYXhhdHRlbXB0cyI7czoxOiI0IjtzOjE0OiJzZXJ2aWNlZXZlbnRpZCI7czo3OiIyOTM2NzAzIjtzOjE2OiJzZXJ2aWNlcHJvYmxlbWlkIjtzOjc6IjEzNDU2MzYiO3M6MTM6InNlcnZpY2VvdXRwdXQiO3M6NDE6IldBUk5JTkcgLSBTdGF0dXM6IG9mZmxpbmUsIHNlcnZpY2UgbmVlZGVkIjtzOjE3OiJsb25nc2VydmljZW91dHB1dCI7YjowO3M6MTU6InNlcnZpY2Vkb3dudGltZSI7czoxOiIxIjt9
)
What can we do? How can we find the root cause for that?
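
One way to see what these queued entries contain is to decode the [event_meta] values. They appear to be base64-encoded PHP serialized arrays (the first one begins with `a:20:{` once decoded). A minimal sketch, assuming GNU `base64` is available; the function names `decode_event_meta` and `pretty_event_meta` are made up for illustration and are not part of Nagios XI:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting event_meta values from the log.

# Decode one base64 value read from stdin.
# tr removes whitespace that may have crept in when copying from the log.
decode_event_meta() {
    tr -d ' \n\r' | base64 -d
}

# Crude pretty-printer: put each serialized item on its own line.
# It splits on every ";", so a ";" inside a value also causes a line break.
pretty_event_meta() {
    decode_event_meta | tr ';' '\n'
}

# Example (run on the Nagios XI server):
#   grep 'event_meta' /usr/local/nagiosxi/var/event_handler.log \
#       | tail -n 1 | sed 's/.*=> //' | pretty_event_meta
```

The decoded text names the host, the service, and the service output for each event, which shows which checks are producing the queue entries.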

Best regards
Christoph
noweda
Posts: 52
Joined: Fri Dec 06, 2013 2:47 am

Re: event_handler.log is flooded

Post by noweda »

Hello support,

It seems that I have found a solution. In /etc/cron.d/nagiosxi I found the job that writes to that file, and I checked the PHP file:

Code: Select all

root@ux010162:~# grep -ir event /etc/cron*
/etc/cron.d/nagiosxi:*   * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1
/etc/cron.d/nagiosxi:*   * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1
In the table xi_eventqueue I saw that the entries do not seem to change: the number of rows stayed fixed, even though the table should be truncated at every run of event_handler.php:

Code: Select all

root@ux010162:~# echo "select count(*) from xi_eventqueue" | mysql -uroot -pnagiosxi -D nagiosxi
count(*)
63
So I ran the command from the cron file as root:

Code: Select all

/usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
After that the table was empty. I guess that, for whatever reason, the user nagios was not able to truncate the table.
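
A single count does not show whether the queue is draining. A minimal sketch that compares two counts taken a few minutes apart, assuming the same database credentials as in the command above; `count_rows` and `queue_status` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios XI.

# Print the number of rows in a table of the nagiosxi database.
# -N suppresses the "count(*)" column header.
count_rows() {
    echo "select count(*) from $1" | mysql -N -uroot -pnagiosxi -D nagiosxi
}

# Classify two counts taken some minutes apart.
queue_status() {
    before=$1
    after=$2
    if [ "$after" -eq 0 ]; then
        echo "empty"
    elif [ "$after" -lt "$before" ]; then
        echo "draining"
    else
        echo "not draining"
    fi
}

# Example (run on the Nagios XI server):
#   before=$(count_rows xi_eventqueue); sleep 120
#   after=$(count_rows xi_eventqueue)
#   queue_status "$before" "$after"
```

A count that never drops across several cron runs suggests that event_handler.php is not processing the queue. Running the cron command as the nagios user (for example with `sudo -u nagios`) instead of as root would show whether it fails for that user.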

Then I changed the owner of event_handler.log back to nagios:nagios. Now the file grows normally, and only a few lines are logged every minute.

But the file /usr/local/nagiosxi/var/eventman.log was still flooded. When I did a "tail -f" on the file, the lines raced across the screen.

When I look at the table xi_events, it is growing slowly, by 1 or 2 entries per minute, but it already holds more than 84k entries:

Code: Select all

Every 2.0s: echo "select count(*) from xi_events" | mysql -uroot -pnagiosxi -D nagiosxi                               Wed Jul 25 11:14:29 2018

count(*)
84358
So I truncated the tables xi_events and xi_meta:

Code: Select all

root@ux010162:~# echo "truncate table xi_events; truncate table xi_meta;" | mysql -uroot -pnagiosxi -D nagiosxi 
Now the eventman.log seems to be OK:

Code: Select all

root@ux010162:/tmp/pg2mysql-master# tail -f /usr/local/nagiosxi/var/eventman.log
    [event_source] => 2
    [event_type] => 1
    [event_time] => 2018-07-25 09:51:39
    [event_meta] =>
    [logging_enabled] => 1
)
SNMP TRAP SENDER NOT CONFIGURED!
....../bin/sh: nagios: command not found
.
PROCESSED 407 EVENTS
.................../bin/sh: nagios: command not found

PROCESSED 0 EVENTS
..........
Another permission problem in the database?

Can we do something else to make sure the problems don't occur again, and to verify that all other tables are OK?

Best regards,
Christoph
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: event_handler.log is flooded

Post by scottwilkerson »

I would also run the database repair script:
https://assets.nagios.com/downloads/nag ... tabase.pdf

I'm not sure how the permissions on those log files changed, but if they are root:root the cron job will not be able to run.

If the cron job doesn't run, the queue will never empty, causing the problem you saw.
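
The thread does not establish where the `/bin/sh: nagios: command not found` line comes from. In general, one common cause of that kind of message is a line in /etc/cron.d format (with a user column) that has ended up in a per-user crontab, where cron treats `nagios` as the command to run. Whether that applies here is an assumption. A minimal sketch to check for it; the function name `find_misplaced_user_field` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios XI.

# Read crontab lines on stdin and print the non-comment lines whose sixth
# field is the literal word "nagios". In a per-user crontab the sixth field
# is where the command starts, so such a line would try to run "nagios".
find_misplaced_user_field() {
    awk '$1 !~ /^#/ && $6 == "nagios"'
}

# Example (run as root on the Nagios XI server):
#   crontab -l -u root   | find_misplaced_user_field
#   crontab -l -u nagios | find_misplaced_user_field
#
# Ownership of the log files can be checked with GNU stat:
#   stat -c '%U:%G %n' /usr/local/nagiosxi/var/event_handler.log \
#                      /usr/local/nagiosxi/var/eventman.log
```

Any line printed for a per-user crontab would be a candidate for removal, since the same jobs already exist in /etc/cron.d/nagiosxi.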


It is safe to change the lines in /etc/cron.d/nagiosxi to the following, which overwrites each log on every run instead of appending to it and so prevents the logs from getting too big:

Code: Select all

* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php > /usr/local/nagiosxi/var/event_handler.log 2>&1
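
The difference between the two redirections can be seen in isolation. A small sketch using temporary files (the file names are placeholders, not Nagios XI paths):

```shell
#!/bin/sh
# Compare ">>" (append) with ">" (truncate, then write) over three "runs".
log_dir=$(mktemp -d)

for run in 1 2 3; do
    echo "run $run" >> "$log_dir/append.log"    # grows on every run
    echo "run $run" >  "$log_dir/truncate.log"  # only the latest run survives
done

# $(( ... )) normalizes any padding that wc may add to the number.
append_lines=$(( $(wc -l < "$log_dir/append.log") ))
truncate_lines=$(( $(wc -l < "$log_dir/truncate.log") ))
last_entry=$(cat "$log_dir/truncate.log")

echo "append.log: $append_lines lines, truncate.log: $truncate_lines lines"

rm -rf "$log_dir"
```

With `>`, each log only ever holds the output of the most recent run, so earlier output is no longer available for troubleshooting.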