Nagios Support Forum

Posted: **Tue Jun 11, 2019 5:21 pm**

OK, here are the NOM log files. There were two.

nom-listing.PNG

When Greg said that NOM was acting up, he meant the Nagios XI Jobs service check. NOM was stale,
first at a warning level, then critical, before clearing after I ran nom.php from the command line.
This time none of the other Nagios XI Jobs reported any issues.

thanks,
Penny

Posted: **Wed Jun 12, 2019 9:28 am**

Thanks for the log files.
I see some connection issues with the Postgres database that the NOM script uses to determine whether it needs to run. Can you get the following file from the Nagios server and upload it to the post?

Code:

/var/lib/pgsql/data/pg_log/postgresql-Mon.log

This seems to cause the Apply Config to fail, and that could have caused the issue, so I would like to check a few more things.

Can you run the following commands as root and upload the /tmp/info.txt file to the post?

Code:

echo "SELECT relname AS objectname, relkind AS objecttype, reltuples, pg_size_pretty(relpages::bigint*8*1024) AS size FROM pg_class WHERE relpages >= 8 ORDER BY relpages DESC;" | psql nagiosxi nagiosxi >/tmp/info.txt
echo "select * from xi_meta;" | psql nagiosxi nagiosxi |grep last_nom_nagioscore_checkpoint >>/tmp/info.txt
ls -lR /usr/local/nagiosxi/nom/ >>/tmp/info.txt
ls -lR /usr/local/nagios/share/perfdata/ >>/tmp/info.txt

Thanks.

Posted: **Wed Jun 12, 2019 9:48 am**

Hi Tom,
Here is info.txt
I've included the postgresql log file from Monday as well.

thanks,
Penny

Posted: **Wed Jun 12, 2019 1:24 pm**

The Postgres log was full of these errors.
FATAL: connection limit exceeded for non-superusers

That probably caused the issue with the NOM script as it could not connect to the Postgres database to update the information when it was running.
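To gauge how often the limit was hit, the matching lines in a Postgres log can be counted. This is only a rough sketch: the function name `count_conn_limit_errors` is made up for illustration, and the log path in the usage note is the one mentioned earlier in this thread.

```shell
# Count "connection limit exceeded" errors in a PostgreSQL log file.
# count_conn_limit_errors is a hypothetical helper name, not part of Nagios XI.
count_conn_limit_errors() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function from reporting failure.
    grep -c 'FATAL:.*connection limit exceeded for non-superusers' "$log_file" || true
}
```

For example: `count_conn_limit_errors /var/lib/pgsql/data/pg_log/postgresql-Mon.log`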

Edit this file on the Nagios server

Code:

/var/lib/pgsql/data/postgresql.conf

Change this from

Code:

max_connections = 100

to

Code:

max_connections = 400
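If you would rather script the edit than make it by hand, something like the following might work. It is a sketch under assumptions: `set_max_connections` is a hypothetical helper name, it relies on GNU sed (`-i` and `-E`), and it assumes the setting appears on a single line, possibly commented out with `#`. It keeps a `.bak` copy of the file first.

```shell
# Set max_connections in a postgresql.conf-style file.
# set_max_connections is a hypothetical helper; assumes GNU sed.
set_max_connections() {
    conf_file="$1"
    new_value="$2"
    # Keep a backup next to the original before touching it.
    cp -p "$conf_file" "$conf_file.bak"
    # Rewrite the setting whether or not the line is commented out;
    # anything after the number (such as a trailing comment) is left alone.
    sed -i -E "s/^[[:space:]]*#?[[:space:]]*max_connections[[:space:]]*=[[:space:]]*[0-9]+/max_connections = ${new_value}/" "$conf_file"
    # Show the resulting line so the change can be checked by eye.
    grep -E '^max_connections' "$conf_file"
}
```

For example, as root: `set_max_connections /var/lib/pgsql/data/postgresql.conf 400`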

Save the file and restart the Nagios processes by running the following as root:

Code:

service npcd stop
service nagios stop
service ndo2db stop
service crond stop
service postgresql restart
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill -9 -u nagios
pkill python
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond start

Posted: **Fri Jun 14, 2019 11:21 am**

Since making those changes we haven't had the issue with NOM. However, the event log did record a time change on Monday.

2019-06-10 15:54:14 Warning: A system time change of -1 seconds (0d 0h 0m 1s backwards in time) has been detected. Compensating.

If you remember, when we first had an issue with the production server on May 16th, you initially thought it could be attributed to the time change message. We had actually received four of those messages during that weekend. Then on the following Friday you upped the parameters on the MySQL database and things seemed to run smoothly.

Seeing this again has me nervous. I understand that it is at the system level, but we have moved to an NTP server that is more reliable. I had seen a post that said it could be a bad battery on the motherboard. I wanted to mention it in case you had any thoughts or suggestions.

Thanks,
Greg

Posted: **Fri Jun 14, 2019 11:56 am**

If I remember right, the time changes from before were for many minutes. This change is only -1 second, which I would not worry about.
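To tell small corrections like this one apart from larger jumps, the time change warnings can be pulled out of a saved copy of the event log along with their size in seconds. This is a rough sketch: both function names are hypothetical, and it assumes each line looks like the one quoted above (timestamp, then "Warning: A system time change of N seconds").

```shell
# Print "<timestamp> <seconds>" for each time change warning in a log file.
# list_time_changes and large_time_changes are hypothetical helper names.
list_time_changes() {
    log_file="$1"
    sed -n -E 's/^(.*) Warning: A system time change of (-?[0-9]+) seconds.*$/\1 \2/p' "$log_file"
}

# Print only the changes whose absolute size is greater than a threshold (seconds).
large_time_changes() {
    log_file="$1"
    threshold="$2"
    list_time_changes "$log_file" |
        awk -v t="$threshold" '{ s = $NF; if (s < 0) s = -s; if (s > t) print }'
}
```

For example, `large_time_changes eventlog.txt 60` would list only changes of more than a minute, where `eventlog.txt` is a placeholder for wherever the event log was saved.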


Production server wproc errors returned
