Nagios server load rising

Post by **elinagios** » Tue Feb 02, 2021 4:04 am

Hello

Server:
CentOS 7.9
Nagios XI 5.7.5

For some reason our Nagios server load has gone up significantly over the last year, but the host and service count has stayed almost the same. I have a feeling that someone made a configuration error somewhere, but I can't pinpoint what it might be.
A year ago the average load was a bit under 3, and six months ago it was around 4; it has kept rising since then. Last month the average load was around 8, with frequent peaks up to 15. Like I said, the host and service count doesn't seem to have grown (5100 hosts, 4606 services). I even changed many 1-minute checks to 2-5 minute intervals, and that didn't have any impact on the load.
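A rough way to sanity-check an interval change like this is to add up the scheduled check rate it implies: each object contributes one check per interval, so the rate is the sum of count / interval over all objects. The sketch below assumes a made-up split of the 4606 services between intervals (the real split is not given here), assumes the default interval_length of 60 seconds, and ignores host checks; `checks_per_second` is a hypothetical helper, not part of Nagios XI. If the scheduled rate falls noticeably but the load does not, the extra work is probably coming from checks that are not on the normal schedule.

```python
from __future__ import annotations


def checks_per_second(objects_by_interval: dict[float, int]) -> float:
    """Scheduled active checks per second for {interval in minutes: object count}."""
    # Assumes interval_length = 60, i.e. one interval unit is one minute.
    return sum(count / (minutes * 60) for minutes, count in objects_by_interval.items())


if __name__ == "__main__":
    # Hypothetical split of the 4606 services; the thread does not give the real one.
    before = {1: 2000, 5: 2606}
    after = {2: 1000, 5: 3606}  # some 1-minute checks moved to 2 or 5 minutes
    print(f"before: {checks_per_second(before):.1f} checks/s")
    print(f"after:  {checks_per_second(after):.1f} checks/s")
```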

top -b -n1 |head -n 27
This command shows me the top load generators. Various checks show up, like check_snmp_stor, check_snmp_win and so on, but the most frequent process, and one that takes a lot of load, seems to be snmpget.
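A single top snapshot is easy to misread, so it can help to total the process count and %CPU per command name. This Python sketch assumes the default `top -b -n1` layout, with `PID`, `%CPU` and `COMMAND` in the process-table header and no `-c` flag (so COMMAND is a single word); `cpu_by_command` is a hypothetical helper name.

```python
from __future__ import annotations

import subprocess
from collections import Counter, defaultdict


def cpu_by_command(top_output: str) -> tuple[Counter, dict[str, float]]:
    """Count processes and total %CPU per command name in `top -b -n1` output."""
    lines = top_output.splitlines()
    # Find the process-table header, i.e. the row naming the PID, %CPU and COMMAND columns.
    header_idx = next((i for i, line in enumerate(lines)
                       if {"PID", "%CPU", "COMMAND"} <= set(line.split())), None)
    counts: Counter = Counter()
    cpu: defaultdict = defaultdict(float)
    if header_idx is None:
        return counts, dict(cpu)  # not the layout this sketch expects

    header = lines[header_idx].split()
    cpu_col, cmd_col = header.index("%CPU"), header.index("COMMAND")
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) <= cmd_col:
            continue  # blank or truncated row
        try:
            usage = float(fields[cpu_col].rstrip("%").replace(",", "."))
        except ValueError:
            continue  # row did not line up with the header
        name = fields[cmd_col]
        counts[name] += 1
        cpu[name] += usage
    return counts, dict(cpu)


def main() -> None:
    try:
        out = subprocess.run(["top", "-b", "-n1"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run top: {exc}")
        return
    counts, cpu = cpu_by_command(out)
    # Show the ten command names using the most CPU in this snapshot.
    for name, total in sorted(cpu.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{name:<20} procs={counts[name]:<4} cpu={total:.1f}%")


if __name__ == "__main__":
    main()
```

Running it a few times during a load peak shows whether snmpget dominates consistently or only appears in occasional bursts.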

1) Are the snmpget queries logged somewhere? I looked through the log files and didn't see anything interesting.
2) The check_snmp_stor and check_snmp_win checks use SNMP, so why does snmpget show up as a separate process in top?

When I look at my commands in Nagios XI and filter for snmpget, there are 0 results.

So what else should i check?

Post by **benjaminsmith** » Tue Feb 02, 2021 4:56 pm

Hi elinagios,

The /usr/local/nagios/var/nagios.log file is probably the best place to check. That said, SNMP checks do usually take more CPU than other types of checks.

Can you send us the system profile and we'll take a closer look at the logs for you?

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.

Best Regards,
Benjamin

Post by **elinagios** » Wed Feb 03, 2021 3:12 am

So I looked closely at the /usr/local/nagios/var/nagios.log file. I noticed that some services are logging:
Warning: The results of service 'Slot 5 TS Memory Usage' on host 'hostname1' are stale by 0d 0h 0m 34s (threshold=0d 0h 5m 18s). I'm forcing an immediate check of the service.
In the service and host definitions, freshness checking (check_freshness) is enabled. Both use active checks: check_ping for the host and check_snmp for the service. As I understand it, freshness checking is not needed for active checks and can be disabled?
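To see which objects trigger these forced checks most often, the stale-result warnings in nagios.log can be tallied per host and service. This is a sketch that matches the message format quoted above; the host-check variant (`The results of host '...' are stale by ...`) is assumed to follow the same pattern, and names containing a single quote are not handled. `count_stale` is a hypothetical helper.

```python
from __future__ import annotations

import re
import sys
from collections import Counter
from typing import Iterable

# Matches the stale-result warning quoted above; the optional service part lets
# the same pattern match the (assumed) host-check form of the message.
STALE_RE = re.compile(
    r"The results of (?:service '(?P<service>[^']*)' on )?host '(?P<host>[^']*)' "
    r"are stale by \d+d \d+h \d+m \d+s \(threshold="
)


def count_stale(lines: Iterable[str]) -> Counter:
    """Count stale-result warnings per (host, service); service is None for host checks."""
    counts: Counter = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[(match["host"], match["service"])] += 1
    return counts


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/nagios.log"
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            counts = count_stale(log)
    except OSError as exc:
        print(f"could not read {path}: {exc}")
        return
    # Most frequently forced objects first.
    for (host, service), n in counts.most_common(20):
        print(f"{n:>6}  {host}  {service or '(host check)'}")


if __name__ == "__main__":
    main()
```

Saved under any name (for example count_stale.py) and run with the log path as its argument, it lists the objects whose check_freshness setting is most worth reviewing first.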

Post by **benjaminsmith** » Wed Feb 03, 2021 4:50 pm

Hi,

As i understand check freshness is not needed for active checks and can be disabled ?

That's correct: freshness checking is not required for active checks.

See: Host & Service Freshness Checks
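Freshness checking adds load here because, once a result is older than the freshness threshold, Nagios schedules an extra, immediate check, as the warning above states. When no freshness_threshold is set, Nagios Core appears to auto-calculate one as check_interval * interval_length + the check's latency + additional_freshness_latency (15 seconds by default in nagios.cfg), using retry_interval instead while a service is in a non-OK soft state. That formula is an assumption based on a reading of Nagios Core's source, not something stated in this thread, so verify it against your version; `to_seconds` and `auto_freshness_threshold` are hypothetical helpers.

```python
from __future__ import annotations


def to_seconds(duration: str) -> int:
    """Convert a Nagios duration such as '0d 0h 5m 18s' into seconds."""
    days, hours, minutes, seconds = (int(part[:-1]) for part in duration.split())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def auto_freshness_threshold(check_interval: float, latency: float,
                             interval_length: int = 60,
                             additional_freshness_latency: int = 15) -> float:
    """Assumed auto-calculated freshness threshold in seconds.

    Restates the assumed formula: check_interval (in units of interval_length
    seconds) plus the check's latency plus additional_freshness_latency.
    Pass retry_interval instead of check_interval for a non-OK soft state.
    """
    return check_interval * interval_length + latency + additional_freshness_latency


if __name__ == "__main__":
    logged = to_seconds("0d 0h 5m 18s")  # threshold from the warning in this thread
    # Hypothetical inputs: a 5-minute check_interval and about 3 s of latency.
    estimate = auto_freshness_threshold(check_interval=5, latency=3)
    print(f"logged threshold: {logged} s, estimated: {estimate:.0f} s")
```

With those hypothetical inputs the estimate equals the 5m 18s (318 second) threshold from the log line; the actual interval and latency of that service are not given in the thread.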

Please PM the system profile and I'll review the logs and check for errors.

--Benjamin

Post by **elinagios** » Thu Feb 04, 2021 2:49 am

I think you can close this thread. I saw a huge improvement in CPU load. Many of those stale services were check_snmp checks running at a 1-minute interval; disabling check_freshness on them, and on some other active checks too, has brought the average load back down to 3-4.
Thank you!

Post by **scottwilkerson** » Thu Feb 04, 2021 9:57 am

elinagios wrote:I think you can close this thread. I saw a huge improvement on the CPU load side. Many those stale services where check_snmp running interval with 1 minute, removing the check_freshness and from some other active checks too has taken down the average load back to 3-4.
Thank you!

Locking thread

Nagios Support Forum