Production server wproc errors returned

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
pkarr
Posts: 58
Joined: Fri Oct 05, 2012 1:01 pm

Re: Production server wproc errors returned

Post by pkarr »

OK, here are the NOM log files. There were two.
[Attachment: nom-listing.PNG]
When Greg said that NOM was acting up, he meant in the Nagios XI Jobs service check: NOM was stale, first at a warning level and then at critical,
before clearing after I ran nom.php from the command line.
This time, none of the other Nagios XI Jobs reported any issues.

thanks,
Penny
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Production server wproc errors returned

Post by tgriep »

Thanks for the log files.
I see some connection issues with the Postgres database that the NOM script uses to determine whether it needs to run. Can you get the following file from the Nagios server and upload it to the post?

Code: Select all

/var/lib/pgsql/data/pg_log/postgresql-Mon.log
This seems to cause Apply Config to fail, and that could have caused the issue, so I would like to check a few more things.

Can you run the following commands as root and upload the /tmp/info.txt file to the post?

Code: Select all

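# Sizes of the largest tables and indexes in the nagiosxi database
echo "SELECT relname AS objectname, relkind AS objecttype, reltuples, pg_size_pretty(relpages::bigint*8*1024) AS size FROM pg_class WHERE relpages >= 8 ORDER BY relpages DESC;" | psql nagiosxi nagiosxi >/tmp/info.txt
# The last_nom_nagioscore_checkpoint entry from the xi_meta table
echo "select * from xi_meta;" | psql nagiosxi nagiosxi | grep last_nom_nagioscore_checkpoint >>/tmp/info.txt
# Recursive listing of the NOM directory
ls -lR /usr/local/nagiosxi/nom/ >>/tmp/info.txt
# Recursive listing of the performance data directory
ls -lR /usr/local/nagios/share/perfdata/ >>/tmp/info.txt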
Thanks.
Be sure to check out our Knowledgebase for helpful articles and solutions!
pkarr
Posts: 58
Joined: Fri Oct 05, 2012 1:01 pm

Re: Production server wproc errors returned

Post by pkarr »

Hi Tom,
Here is info.txt.
I've included the postgresql log file from Monday as well.

thanks,
Penny
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Production server wproc errors returned

Post by tgriep »

The Postgres log was full of these errors.
FATAL: connection limit exceeded for non-superusers

That probably caused the issue with the NOM script: it could not connect to the Postgres database to update its information while it was running.
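To see how often the limit was hit on a given day, a minimal sketch like the following counts the matching lines in a Postgres log. It assumes the log path quoted earlier in the thread; the function name `count_conn_limit_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "connection limit exceeded" errors in a
# PostgreSQL log file. Defaults to the Monday log path from this thread.
count_conn_limit_errors() {
    log="${1:-/var/lib/pgsql/data/pg_log/postgresql-Mon.log}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'connection limit exceeded for non-superusers' "$log"
}
```

For another day, pass that day's file, for example `count_conn_limit_errors /var/lib/pgsql/data/pg_log/postgresql-Tue.log`.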

Edit this file on the Nagios server:

Code: Select all

/var/lib/pgsql/data/postgresql.conf
and change this line from

Code: Select all

max_connections = 100
to

Code: Select all

max_connections = 400
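As a scripted alternative to editing by hand, a sketch along these lines makes the same change with GNU `sed` and keeps a backup copy. It assumes the file already contains an uncommented `max_connections` line, as quoted above; the function name `set_max_connections` is hypothetical. The new value only takes effect after PostgreSQL is restarted.

```shell
#!/bin/sh
# Hypothetical helper: set max_connections in postgresql.conf.
# Assumes GNU sed (for -i) and an existing, uncommented max_connections line.
set_max_connections() {
    conf="${1:-/var/lib/pgsql/data/postgresql.conf}"
    new="${2:-400}"
    # Keep a copy of the original file, preserving ownership and mode.
    cp -p "$conf" "$conf.bak" || return 1
    # Replace only the number; a trailing "# (change requires restart)"
    # comment on the same line is left in place.
    sed -i "s/^max_connections[[:space:]]*=[[:space:]]*[0-9][0-9]*/max_connections = $new/" "$conf"
    # Print the resulting line so the change can be checked by eye.
    grep '^max_connections' "$conf"
}
```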
Save the file, then restart the Nagios processes by running the following as root:

Code: Select all

# Stop the services that use the databases
service npcd stop
service nagios stop
service ndo2db stop
service crond stop
# Restart PostgreSQL so the new max_connections value takes effect
service postgresql restart
# Remove the stale command pipe, socket and lock files
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
# Remove leftover message queues owned by nagios, then kill stray processes
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
pkill -9 -u nagios
pkill python
# Bring everything back up
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond start
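Once everything is back up, comparing the connections in use against the new limit shows whether there is headroom, and the `SHOW` query also confirms that the new value took effect. This is a sketch that assumes local `psql` access as the `nagiosxi` role, as in the earlier commands in this thread; the function names are hypothetical. PostgreSQL reserves `superuser_reserved_connections` slots (3 by default) for superusers, so a non-superuser such as `nagiosxi` hits the limit slightly before `max_connections`.

```shell
#!/bin/sh
# Hypothetical helpers: report PostgreSQL connection usage vs. max_connections.

# Formatting step: print used/max and the integer percentage.
report_usage() {
    used=$1
    max=$2
    pct=$(( used * 100 / max ))
    echo "connections: $used/$max (${pct}%)"
}

# Query the live server; -A (unaligned) and -t (tuples only) make psql
# print just the bare value.
check_pg_connections() {
    used=$(echo "SELECT count(*) FROM pg_stat_activity;" | psql -At nagiosxi nagiosxi) || return 1
    max=$(echo "SHOW max_connections;" | psql -At nagiosxi nagiosxi) || return 1
    report_usage "$used" "$max"
}
```

Running `check_pg_connections` as root every so often shows whether usage is creeping back toward the limit.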
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Re: Production server wproc errors returned

Post by gregwhite »

Since making those changes, we haven't had the issue with NOM. However, the event log did record a time change on Monday.

2019-06-10 15:54:14 Warning: A system time change of -1 seconds (0d 0h 0m 1s backwards in time) has been detected. Compensating.

As you may remember, when we first had an issue with the production server on May 16th, you initially thought it could be attributed to the time change message; we had actually received four of those messages that weekend. Then, on the following Friday, you raised the parameters on the MySQL database and things seemed to run smoothly.

Seeing this again makes me nervous. I understand that it is at the system level, but we have moved to a more reliable NTP server. I had seen a post saying it could be a bad battery on the motherboard, and wanted to mention it in case you had any thoughts or suggestions.

Thanks,
Greg
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Production server wproc errors returned

Post by tgriep »

If I remember right, the earlier time changes were for many minutes. This one is only -1 second, which I would not worry about.
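To tell a one-second adjustment apart from multi-minute jumps, a sketch like this pulls the offsets out of the Nagios log and prints only those above a threshold. It assumes the standard Nagios Core log at `/usr/local/nagios/var/nagios.log` and the message wording shown in the event log entry above; the function name and the 60-second default threshold are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: list system time changes larger than a threshold.
# Matches messages like
#   "A system time change of -1 seconds (0d 0h 0m 1s backwards in time)"
large_time_changes() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    threshold="${2:-60}"
    # grep -o keeps just the matched phrase; field 5 is the signed offset.
    grep -o 'system time change of -*[0-9][0-9]* seconds' "$log" |
        awk -v t="$threshold" '{ s = $5 + 0; if (s < 0) s = -s; if (s > t) print $5 }'
}
```

With the default threshold, a -1 second change like the one above prints nothing, while a change of several minutes is listed.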