
nagios server load rising

Posted: Tue Feb 02, 2021 4:04 am
by elinagios
Hello

Server:
CentOS 7.9
Nagios XI 5.7.5

For some reason the load on our Nagios server has gone up significantly over the last year, while the host and service counts have stayed almost the same. I have a feeling that someone made a configuration error somewhere, but I can't pinpoint what it might be.
A year ago the average load was a bit under 3, and 6 months ago it was around 4. It has kept rising since: over the last month the average load has been around 8, with peaks often reaching 15. As I said, the host and service counts don't seem to have grown (5100 hosts, 4606 services). I even changed many 1-minute checks to 2-5 minutes, but that had no impact on the load.

top -b -n1 |head -n 27
This command shows me the top load generators. Different checks are running, such as check_snmp_stor, check_snmp_win and so on, but the most frequent entry seems to be snmpget, which accounts for a lot of the load.
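To make a single top snapshot easier to compare over time, its output can be totalled per command name. The following is a minimal Python sketch, not part of Nagios XI: the function name cpu_by_command is hypothetical, and it assumes the usual procps column header row (PID ... %CPU ... COMMAND) in the batch output.

```python
from collections import defaultdict


def cpu_by_command(top_output):
    """Sum the %CPU column per COMMAND from `top -b -n1` batch output."""
    totals = defaultdict(float)
    columns = None
    for line in top_output.splitlines():
        fields = line.split()
        if columns is None:
            # Skip the summary lines until the row that names the columns.
            if fields and fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
                columns = fields
            continue
        if not fields:
            continue
        # COMMAND is the last column and may contain spaces, so cap the split.
        row = line.split(None, len(columns) - 1)
        if len(row) < len(columns):
            continue
        record = dict(zip(columns, row))
        try:
            # Some locales print a decimal comma instead of a point.
            cpu = float(record["%CPU"].replace(",", "."))
        except ValueError:
            continue
        totals[record["COMMAND"].strip()] += cpu
    return dict(totals)
```

Passing the captured text of the command above to cpu_by_command returns a dict that maps each command name to its summed %CPU, which can then be sorted to list the heaviest commands first. A single snapshot is noisy for short-lived processes such as snmpget, so several samples are likely needed before drawing conclusions.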

1) Are the snmpget queries logged somewhere? I looked through the log files and didn't see anything interesting.
2) The check_snmp_stor and check_snmp_win checks use SNMP, so why does snmpget show up separately in top?

When I filter my commands in Nagios XI for snmpget, I get 0 results.

So what else should i check?

Re: nagios server load rising

Posted: Tue Feb 02, 2021 4:56 pm
by benjaminsmith
Hi elinagios,

The /usr/local/nagios/var/nagios.log file is probably the best place to check. That said, SNMP checks do usually take more CPU than other types of checks.

Can you send us the system profile and we'll take a closer look at the logs for you?

To send us your system profile:
1) Log in to the Nagios XI GUI using a web browser.
2) Click the "Admin" > "System Profile" menu.
3) Click the "Download Profile" button.

Best Regards,
Benjamin

Re: nagios server load rising

Posted: Wed Feb 03, 2021 3:12 am
by elinagios
So I looked closely at the /usr/local/nagios/var/nagios.log log file. What I noticed is that some services are logging:
Warning: The results of service 'Slot 5 TS Memory Usage' on host 'hostname1' are stale by 0d 0h 0m 34s (threshold=0d 0h 5m 18s). I'm forcing an immediate check of the service.
If I look at the service and host definitions, check freshness is enabled. Both use active checks: check_ping for the host and check_snmp for the service. As I understand it, check freshness is not needed for active checks and can be disabled?
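To see how widespread these warnings are, the log can be tallied per host and service. This is a minimal Python sketch that matches the wording of the warning quoted above; the names count_stale_warnings and print_top_offenders are hypothetical helpers, not part of Nagios.

```python
import re
from collections import Counter

# Matches the freshness warning text as it appears in nagios.log.
STALE_RE = re.compile(
    r"Warning: The results of service '(?P<service>.+?)' "
    r"on host '(?P<host>.+?)' are stale by"
)


def count_stale_warnings(lines):
    """Count stale-result warnings per (host, service) pair."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[(match["host"], match["service"])] += 1
    return counts


def print_top_offenders(log_path, limit=20):
    """Print the services that triggered the most forced checks."""
    with open(log_path, errors="replace") as log:
        counts = count_stale_warnings(log)
    for (host, service), hits in counts.most_common(limit):
        print(f"{hits:6d}  {host}  {service}")
```

Calling print_top_offenders("/usr/local/nagios/var/nagios.log") lists the most frequently forced services first. Each warning corresponds to an extra immediate check, so services near the top of the list are the likeliest contributors to the added load.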

Re: nagios server load rising

Posted: Wed Feb 03, 2021 4:50 pm
by benjaminsmith
Hi,
elinagios wrote: As I understand it, check freshness is not needed for active checks and can be disabled?
That's correct, freshness checking is not required for active checks.

See: Host & Service Freshness Checks
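One way to find every service that combines active checks with freshness checking is to scan the object definitions. The sketch below is in Python and rests on assumptions: it expects fully resolved definitions in which each service block carries its own active_checks_enabled and check_freshness directives (as in the objects.cache file that Nagios Core writes, assumed here to be /usr/local/nagios/var/objects.cache). Template-based config files may inherit these values, in which case this scan would miss them. The function names are hypothetical.

```python
import re

SERVICE_START = re.compile(r"^define\s+service\s*\{$")


def parse_service_blocks(text):
    """Yield one dict of directives per `define service { ... }` block."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            # The regex excludes servicegroup, servicedependency, etc.
            if SERVICE_START.match(line):
                current = {}
        elif line == "}":
            yield current
            current = None
        else:
            # The directive name is the first token, the rest is its value.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""


def active_with_freshness(text):
    """Return (host, service) pairs with active checks and freshness both on."""
    hits = []
    for svc in parse_service_blocks(text):
        if svc.get("active_checks_enabled") == "1" and svc.get("check_freshness") == "1":
            hits.append((svc.get("host_name", "?"), svc.get("service_description", "?")))
    return hits
```

Reading the cache file into a string and passing it to active_with_freshness gives a list of candidates to review in the Core Config Manager. The script only reports; it does not change any configuration.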

Please PM the system profile and I'll review the logs and check for errors.

--Benjamin

Re: nagios server load rising

Posted: Thu Feb 04, 2021 2:49 am
by elinagios
I think you can close this thread. I saw a huge improvement in CPU load. Many of those stale services were check_snmp checks running at a 1-minute interval; removing check_freshness from them, and from some other active checks too, has brought the average load back down to 3-4.
Thank you!

Re: nagios server load rising

Posted: Thu Feb 04, 2021 9:57 am
by scottwilkerson
elinagios wrote: I think you can close this thread. I saw a huge improvement in CPU load. Many of those stale services were check_snmp checks running at a 1-minute interval; removing check_freshness from them, and from some other active checks too, has brought the average load back down to 3-4.
Thank you!
Locking thread