Nagios Support Forum

Posted: **Mon Jul 09, 2018 7:30 pm**

We had a network interface issue last week.
Since then, the event_handler log and eventman.log have grown a lot.
I had to change the cron job's redirect from >> (append) to > (overwrite) for the two log files.
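
For readers unfamiliar with the difference: `>>` appends to a file on every run, while `>` truncates it first, so the log only ever holds the output of the latest run. A small sketch on a throwaway file (the file and the messages are made up for illustration, not taken from the actual cron job):

```shell
#!/bin/sh
# Demonstrate append (>>) versus overwrite (>) on a temporary file.
log=$(mktemp)

echo "run 1" >> "$log"   # append: file now has 1 line
echo "run 2" >> "$log"   # append: file now has 2 lines
lines_after_append=$(grep -c '' "$log")

echo "run 3" > "$log"    # overwrite: file is truncated, then written
lines_after_overwrite=$(grep -c '' "$log")

echo "after >>: $lines_after_append lines; after >: $lines_after_overwrite line"
rm -f "$log"
```

Switching to `>` caps the file size but also discards history, so it only hides the symptom.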

The PostgreSQL log file has also grown tremendously.
It is full of entries like this:

Code: Select all

 OR eventqueue_id = 5311130 OR eventqueue_id = 5311131 OR eventqueue_id = 5311132 OR eventqueue_id = 5311133 OR eventqueue_id = 5311136 OR eventqueue_id = 5311137 OR eventqueue_id = 5311138 OR eventqueue_id = 5311139 OR eventqueue_id = 5311140 OR eventqueue_id = 5311141 OR eventqueue_id = 5311142 OR eventqueue_id = 5311143 OR eventqueue_id = 5311144 OR eventqueue_id = 5311145 OR eventqueue_id = 5311146 OR eventqueue_id = 5311147 OR eventqueue_id = 5311148 OR eventqueue_id = 5311149 OR eventqueue_id = 5311150 OR eventqueue_id = 5311151 OR eventqueue_id = 5311153 OR eventqueue_id = 5311154 OR eventqueue_id = 5311155 OR eventqueue_id = 5311156 OR eventqueue_id = 5311157 OR eventqueue_id = 5311158 OR eventqueue_id = 5311159 OR eventqueue_id = 5311160 OR eventqueue_id = 5311161 OR eventqueue_id = 5311162 OR eventqueue_id = 5311163 OR eventqueue_id = 5311164 OR eventqueue_id = 5311165 OR eventqueue_id = 5311166 OR eventqueue_id = 5311167 OR eventqueue_id = 5311168 OR eventqueue_id = 5311169 OR eventqueue_id = 5311170 OR eventqueue_id = 5311171 OR eventqueue_id = 5311172 OR eventqueue_id = 5311173 OR eventqueue_id = 5311174 OR eventqueue_id = 5311175 OR eventqueue_id = 5311176 OR eventqueue_id = 5311177 OR eventqueue_id = 5311178 OR eventqueue_id = 5311179 OR eventqueue_id = 5311180 OR eventqueue_id = 5311181 OR eventqueue_id = 5311182 OR eventqueue_id = 5311183 OR eventqueue_id = 5311184 OR eventqueue_id = 5311185 OR eventqueue_id = 5311186 OR eventqueue_id = 5311187 OR eventqueue_id = 5311188 OR eventqueue_id = 5311189 OR eventqueue_id = 5311190 OR eventqueue_id = 5311191 OR eventqueue_id = 5311192 OR eventqueue_id = 5311193 OR eventqueue_id = 5311194 OR eventqueue_id = 5311195 OR eventqueue_id = 5311196 OR eventqueue_id = 5311197 OR eventqueue_id = 5311198 OR eventqueue_id = 5311199 OR eventqueue_id = 5311200 OR eventqueue_id = 5311201 OR eventqueue_id = 5311202 OR eventqueue_id = 5311203 OR eventqueue_id = 5311204 OR eventqueue_id = 5311205 OR eventqueue_id = 5311206 
OR eventqueue_id = 5311207 OR eventqueue_id = 5311208 OR eventqueue_id = 5311209 OR eventqueue_id = 5311210 OR eventqueue_id = 5311211 OR eventqueue_id = 5311212 OR eventqueue_id = 5311213 OR eventqueue_id = 5311214 OR eventqueue_id = 5311215 OR eventqueue_id = 5311216 OR eventqueue_id = 5311217 OR eventqueue_id = 5311218 OR eventqueue_id = 5311219 OR eventqueue_id = 5311220 OR eventqueue_id = 5311221 OR eventqueue_id = 5311222 OR eventqueue_id = 5311223 OR eventqueue_id = 5311224 OR eventqueue_id = 5311225 OR eventqueue_id = 5311226 OR eventqueue_id = 5311227 OR eventqueue_id = 5311228 OR eventqueue_id = 5311229 OR eventqueue_id = 5311230 OR eventqueue_id = 5311231 OR eventqueue_id = 5311232 OR eventqueue_id = 5311233 OR eventqueue_id = 5311234 OR eventqueue_id = 5311235 OR eventqueue_id = 5311236 OR eventqueue_id = 5311237 OR eventqueue_id = 5311238 OR eventqueue_id = 5311239 OR eventqueue_id = 5311240 OR eventqueue_id = 5311241 OR eventqueue_id = 5311242 OR eventqueue_id = 5311243 OR eventqueue_id = 5311244 OR eventqueue_id = 5311245 OR eventqueue_id = 5311246
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
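
To get a feel for how often the warning above is being logged, the matching lines can be counted. This is a minimal sketch; the helper name `count_checkpoint_warnings` is hypothetical, and the log path is whatever your PostgreSQL installation writes to (no default location is assumed here):

```shell
#!/bin/sh
# Count "checkpoints are occurring too frequently" warnings in a PostgreSQL log.
# $1 is the path to the log file, supplied by the caller.
count_checkpoint_warnings() {
    grep -c 'checkpoints are occurring too frequently' "$1"
}

# Usage (not run here; substitute the real log path):
# count_checkpoint_warnings /path/to/postgresql.log
```

A count that keeps climbing between runs suggests the write load has not settled down.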

event_handler.log

Code: Select all

    [event_time] =>
    [2] => 2
    [event_source] => 2
    [3] => 1
    [event_type] => 1
    [4] => YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7czoxNzoiSkhSX0JSX0JnblBlbGFuZ2kiO3M6Nzoic2VydmljZSI7czoyODoiUG9ydCAwNiBTdGF0dXMgLSBTZXJpYWwwLzAvMCI7czoxMToiaG9zdGFkZHJlc3MiO3M6MTE6IjEwLjEuMTUuMjU0IjtzOjk6Imhvc3RzdGF0ZSI7czoyOiJVUCI7czoxMToiaG9zdHN0YXRlaWQiO3M6MToiMCI7czoxMToiaG9zdGV2ZW50aWQiO3M6ODoiMTAzMjkyODMiO3M6MTM6Imhvc3Rwcm9ibGVtaWQiO3M6MToiMCI7czoxMjoic2VydmljZXN0YXRlIjtzOjI6Ik9LIjtzOjE0OiJzZXJ2aWNlc3RhdGVpZCI7czoxOiIwIjtzOjE2OiJsYXN0c2VydmljZXN0YXRlIjtzOjc6IlVOS05PV04iO3M6MTg6Imxhc3RzZXJ2aWNlc3RhdGVpZCI7czoxOiIzIjtzOjE2OiJzZXJ2aWNlc3RhdGV0eXBlIjtzOjQ6IkhBUkQiO3M6MTQ6ImN1cnJlbnRhdHRlbXB0IjtzOjE6IjIiO3M6MTE6Im1heGF0dGVtcHRzIjtzOjE6IjIiO3M6MTQ6InNlcnZpY2VldmVudGlkIjtzOjg6IjEwMzMyMjM1IjtzOjE2OiJzZXJ2aWNlcHJvYmxlbWlkIjtzOjE6IjAiO3M6MTM6InNlcnZpY2VvdXRwdXQiO3M6MjU6Ik9LIC0gU2VyaWFsMC8wLzAgaXMgdXAvdXAiO3M6MTc6ImxvbmdzZXJ2aWNlb3V0cHV0IjtiOjA7czoxNToic2VydmljZWRvd250aW1lIjtzOjE6IjAiO30=
    [event_meta] => YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7czoxNzoiSkhSX0JSX0JnblBlbGFuZ2kiO3M6Nzoic2VydmljZSI7czoyODoiUG9ydCAwNiBTdGF0dXMgLSBTZXJpYWwwLzAvMCI7czoxMToiaG9zdGFkZHJlc3MiO3M6MTE6IjEwLjEuMTUuMjU0IjtzOjk6Imhvc3RzdGF0ZSI7czoyOiJVUCI7czoxMToiaG9zdHN0YXRlaWQiO3M6MToiMCI7czoxMToiaG9zdGV2ZW50aWQiO3M6ODoiMTAzMjkyODMiO3M6MTM6Imhvc3Rwcm9ibGVtaWQiO3M6MToiMCI7czoxMjoic2VydmljZXN0YXRlIjtzOjI6Ik9LIjtzOjE0OiJzZXJ2aWNlc3RhdGVpZCI7czoxOiIwIjtzOjE2OiJsYXN0c2VydmljZXN0YXRlIjtzOjc6IlVOS05PV04iO3M6MTg6Imxhc3RzZXJ2aWNlc3RhdGVpZCI7czoxOiIzIjtzOjE2OiJzZXJ2aWNlc3RhdGV0eXBlIjtzOjQ6IkhBUkQiO3M6MTQ6ImN1cnJlbnRhdHRlbXB0IjtzOjE6IjIiO3M6MTE6Im1heGF0dGVtcHRzIjtzOjE6IjIiO3M6MTQ6InNlcnZpY2VldmVudGlkIjtzOjg6IjEwMzMyMjM1IjtzOjE2OiJzZXJ2aWNlcHJvYmxlbWlkIjtzOjE6IjAiO3M6MTM6InNlcnZpY2VvdXRwdXQiO3M6MjU6Ik9LIC0gU2VyaWFsMC8wLzAgaXMgdXAvdXAiO3M6MTc6ImxvbmdzZXJ2aWNlb3V0cHV0IjtiOjA7czoxNToic2VydmljZWRvd250aW1lIjtzOjE6IjAiO30=
)
    <p><pre>SQL Error [nagiosxi] : ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth", after ensuring the platform's stack depth limit is adequate.</pre></p>
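
The long `event_meta` strings in this log are base64-encoded, PHP-serialized arrays, so they can be decoded to see which host and service an event refers to. A minimal sketch, assuming GNU coreutils `base64` (with its `-d` flag) is available; the helper name `decode_event_meta` is hypothetical, and only the first 68 characters of the value above are used to keep the example short:

```shell
#!/bin/sh
# Decode a base64 string taken from an event_meta field.
# Assumes GNU coreutils base64, which accepts -d for decoding.
decode_event_meta() {
    printf '%s' "$1" | base64 -d
}

# First 68 characters of the event_meta value shown above.
sample='YToyMDp7czoxMjoiaGFuZGxlci10eXBlIjtzOjc6InNlcnZpY2UiO3M6NDoiaG9zdCI7'
decode_event_meta "$sample"
echo
# prints: a:20:{s:12:"handler-type";s:7:"service";s:4:"host";
```

Decoding the full string gives the same fields that eventman.log prints in readable form.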

eventman.log is dumping historical events. It is now 8:30 am and everything is fine for the hosts and services, but this event from about 8 hours ago still shows up in the log:

Code: Select all

SNMP TRAP SENDER NOT CONFIGURED!
PROCESS EVENT: ID=167049178, SOURCE=2, TYPE=1, TIME=2018-07-10 00:00:18.478106
*** GLOBAL HANDLER...
Array
(
    [event_id] => 167049178
    [event_source] => 2
    [event_type] => 1
    [event_time] => 2018-07-10 00:00:18.478106
    [event_meta] => Array
        (
            [handler-type] => service
            [host] => MBCMAIL01
            [service] => CPU Load
            [hostaddress] => 10.23.18.25
            [hoststate] => UP
            [hoststateid] => 0
            [hosteventid] => 10219291
            [hostproblemid] => 0
            [servicestate] => UNKNOWN
            [servicestateid] => 3
            [lastservicestate] => UNKNOWN
            [lastservicestateid] => 3
            [servicestatetype] => SOFT
            [currentattempt] => 3
            [maxattempts] => 6
            [serviceeventid] => 10301295
            [serviceproblemid] => 4560075
            [serviceoutput] => CHECK_NRPE: Socket timeout after 60 seconds.
            [longserviceoutput] =>
            [servicedowntime] => 0
        )

    [logging_enabled] => 1
)

Please advise on how to fix this issue.

Posted: **Mon Jul 09, 2018 7:52 pm**

Look at how much the database backups have grown. Something is not right.

PostgreSQL

[root@nagiosprodxi1 nagiosxi]# ls -lrt
total 3120688
-rw-r--r-- 1 root root 723798 Jul 4 07:00 nagiosxi_2018-07-04.Wednesday.sql.gz
-rw-r--r-- 1 root root 714193 Jul 5 07:00 nagiosxi_2018-07-05.Thursday.sql.gz
-rw-r--r-- 1 root root 524439 Jul 6 07:00 nagiosxi_2018-07-06.Friday.sql.gz
-rw-r--r-- 1 root root 686496215 Jul 8 07:01 nagiosxi_2018-07-08.Sunday.sql.gz
-rw-r--r-- 1 root root 712355038 Jul 9 07:01 nagiosxi_2018-07-09.Monday.sql.gz
-rw-r--r-- 1 root root 1794722718 Jul 10 07:07 nagiosxi_2018-07-10.Tuesday.sql.gz
[root@nagiosprodxi1 nagiosxi]# pwd
/store/backups/postgresql/daily/nagiosxi
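
A jump like the one in the listing above can be caught automatically by comparing each backup with the previous one. This is a rough sketch, not part of Nagios XI; the helper name `flag_growth` and the 10x threshold are assumptions chosen for illustration:

```shell
#!/bin/sh
# Flag any backup more than FACTOR times larger than the one before it.
# Input on stdin: lines of "<size-in-bytes> <filename>", oldest first.
flag_growth() {
    awk -v factor="${1:-10}" '
        NR > 1 && prev > 0 && $1 > prev * factor {
            printf "%s is %.0fx larger than the previous backup\n", $2, $1 / prev
        }
        { prev = $1 }
    '
}

# Sizes and names copied from the listing above.
flag_growth 10 <<'EOF'
524439 nagiosxi_2018-07-06.Friday.sql.gz
686496215 nagiosxi_2018-07-08.Sunday.sql.gz
712355038 nagiosxi_2018-07-09.Monday.sql.gz
1794722718 nagiosxi_2018-07-10.Tuesday.sql.gz
EOF
# prints: nagiosxi_2018-07-08.Sunday.sql.gz is 1309x larger than the previous backup
```

With GNU `stat`, something like `stat -c '%s %n' *.sql.gz | flag_growth 10` could feed it real files, assuming the shell's name ordering matches the chronological order.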

Posted: **Tue Jul 10, 2018 10:40 am**

The network issue probably corrupted the database, so old entries are not being removed, which is causing the database to grow.
To fix this, the processes that Nagios uses need to be stopped and the database tables truncated and vacuumed.
To do this, run the following as root.

Code: Select all

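# Stop Nagios, ndo2db, and cron so nothing writes to the database
service nagios stop
service ndo2db stop
service crond stop
# Kill any remaining processes owned by the nagios user
pkill -9 -u nagios
# Empty the event tables, then reclaim the freed space
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi
echo "vacuum;vacuum analyze;vacuum full;" | psql nagiosxi postgres
# Bring everything back up
service postgresql restart
service crond start
service ndo2db start
service nagios start
service npcd restart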
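
Before running the truncate step, it may be worth recording how many rows have piled up in each table, so the effect can be compared afterwards. A minimal sketch; the helper name `count_rows_sql` is hypothetical and it only generates the SQL, which would then be piped into the same `psql nagiosxi nagiosxi` invocation used above:

```shell
#!/bin/sh
# Hypothetical helper: print one COUNT(*) query per table name given.
count_rows_sql() {
    for table in "$@"; do
        printf "SELECT '%s' AS table_name, count(*) AS row_count FROM %s;\n" "$table" "$table"
    done
}

# Usage (not run here; requires a working psql login):
# count_rows_sql xi_events xi_meta xi_eventqueue | psql nagiosxi nagiosxi
```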

After that, keep an eye on the system to see whether the database starts growing abnormally again.
You can run this to display the sizes of the Postgres tables and indexes.

Code: Select all

echo "SELECT relname AS objectname, relkind AS objecttype, reltuples, pg_size_pretty(relpages::bigint*8*1024) AS size FROM pg_class WHERE relpages >= 8 ORDER BY relpages DESC;" | psql nagiosxi nagiosxi
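
The `size` column in that query is simply `relpages` multiplied by the page size. The arithmetic can be checked by hand, assuming PostgreSQL's default 8 kB (8192-byte) block size; the helper name `pages_to_kb` is made up for this illustration:

```shell
#!/bin/sh
# Convert a relpages count to kB, assuming the default 8192-byte block size.
pages_to_kb() {
    echo $(( $1 * 8192 / 1024 ))
}

pages_to_kb 8     # prints 64: the smallest size the "relpages >= 8" filter lets through
pages_to_kb 127   # prints 1016
```

Note that `relpages` is an estimate refreshed by VACUUM and ANALYZE, so the figures are approximate until those have run.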

Let us know how it works out.

Posted: **Tue Jul 10, 2018 6:46 pm**

tgriep wrote: The network issue probably corrupted the database, so old entries are not being removed, which is causing the database to grow.

Thanks. The pgsql directory shrank from 37 GB to 2.1 GB.

Code: Select all

[root@nagiosprodxi1 ~]# echo "SELECT relname AS objectname, relkind AS objecttype, reltuples, pg_size_pretty(relpages::bigint*8*1024) AS size FROM pg_class WHERE relpages >= 8 ORDER BY relpages DESC;" | psql nagiosxi nagiosxi
           objectname            | objecttype | reltuples |  size
---------------------------------+------------+-----------+---------
 xi_options                      | r          |      4114 | 1016 kB
 pg_proc                         | r          |      2220 | 432 kB
 pg_attribute                    | r          |      2397 | 336 kB
 pg_depend                       | r          |      5659 | 336 kB
 xi_options_name_idx             | i          |      4114 | 312 kB
 pg_depend_reference_index       | i          |      5659 | 280 kB
 pg_depend_depender_index        | i          |      5659 | 272 kB
 xi_auditlog                     | r          |      1956 | 264 kB
 xi_auditlog_ip_address          | i          |      1956 | 232 kB
 xi_auditlog_user                | i          |      1956 | 208 kB
 pg_proc_proname_args_nsp_index  | i          |      2220 | 208 kB
 pg_toast_2618                   | t          |       115 | 200 kB
 xi_usermeta                     | r          |      1234 | 176 kB
 pg_description                  | r          |      2403 | 168 kB
 xi_auditlog_source              | i          |      1956 | 136 kB
 pg_statistic                    | r          |       420 | 136 kB
 pg_description_o_c_o_index      | i          |      2403 | 128 kB
 xi_auditlog_type                | i          |      1956 | 128 kB
 xi_options_pkey                 | i          |      4114 | 128 kB
 pg_attribute_relid_attnam_index | i          |      2397 | 120 kB
 pg_toast_2619                   | t          |        56 | 104 kB
 xi_usermeta_user_id_key         | i          |      1234 | 104 kB
 pg_operator                     | r          |       704 | 104 kB
 pg_attribute_relid_attnum_index | i          |      2397 | 80 kB
 xi_usermeta_autoload_idx        | i          |      1234 | 72 kB
 pg_rewrite                      | r          |        86 | 72 kB
 xi_auditlog_pkey                | i          |      1956 | 72 kB
 xi_auditlog_log_time            | i          |      1956 | 72 kB
 pg_proc_oid_index               | i          |      2220 | 64 kB
 pg_type                         | r          |       328 | 64 kB
(30 rows)

Posted: **Wed Jul 11, 2018 8:15 am**

That is good news. Let us know if it is OK to close the post and lock it up.

Posted: **Wed Jul 11, 2018 8:49 pm**

tgriep wrote: That is good news. Let us know if it is OK to close the post and lock it up.

OK. Please lock it up.

Posted: **Thu Jul 12, 2018 7:08 am**

rajasegar wrote: OK. Please lock it up.

Great! Locking.

event_handler log and postgresql log filling up file system
