Nagios logs filling up syslog

rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Nagios logs filling up syslog

Post by rajasegar »

I am using 5.7.3 XI Enterprise on RHEL 7.6.

The /var/log/messages is getting filled up with Nagios logs.
Please advise how to stop this.
Setting use_syslog=0 in nagios.cfg has no effect.

Code: Select all

Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.36.94 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.36.94 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [10.100.100.254]:49458->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (399075629) 46 days, 4:32:36.29#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.9.9.41.2.0.1#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.17747447 = STRING: "EARL-STBY"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.17747447 = INTEGER: 2#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.17747447 = STRING: "EXCESSIVE_PARITY_ERROR"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.17747447 = STRING: "EARL 0: Parity error detected in VRAM"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.17747447 = Timeticks: (399075628) 46 days, 4:32:36.28
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 10.14.37.240(via UDP: [10.14.37.240]:161->[10.104.18.193]:162) TRAP, SNMP v1, community cimbnetmy#012#011SNMPv2-SMI::enterprises.11.2.3.7.11.90 Link Up Trap (0) Uptime: 392 days, 10:35:35.15#012#011IF-MIB::ifIndex.13 = INTEGER: 13
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 10.14.37.240(via UDP: [10.14.37.240]:161->[10.104.18.193]:162) TRAP, SNMP v1, community cimbnetmy#012#011SNMPv2-SMI::enterprises.11.2.3.7.11.90 Enterprise Specific Trap (2) Uptime: 392 days, 10:35:35.15#012#011RMON-MIB::eventDescription.75 = STRING: I 09/30/20 08:21:27 ports: port 13 is now on-line
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.10.19 accepted a version 2 packet
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 10.110.1.217(via UDP: [10.110.1.217]:161->[10.104.18.193]:162) TRAP, SNMP v1, community cimbnetmy#012#011SNMPv2-SMI::enterprises.11.2.3.7.11.90 Link Up Trap (0) Uptime: 38 days, 19:44:00.55#012#011IF-MIB::ifIndex.19 = INTEGER: 19
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.36.94 accepted a version 2 packet
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 10.110.1.207(via UDP: [10.110.1.207]:161->[10.104.18.193]:162) TRAP, SNMP v1, community cimbnetmy#012#011SNMPv2-SMI::enterprises.11.2.14.12.1 Enterprise Specific Trap (5) Uptime: 80 days, 18:56:24.20#012#011SNMPv2-SMI::enterprises.11.2.14.11.1.7.2.1.4.1383744 = INTEGER: 20#011SNMPv2-SMI::enterprises.11.2.14.11.1.7.2.1.5.1383744 = INTEGER: 2#011SNMPv2-SMI::enterprises.11.2.14.11.1.7.2.1.6.1383744 = INTEGER: 2#011SNMPv2-SMI::enterprises.11.2.14.11.1.7.3.0.1383744 = STRING: "http://10.110.1.207/cgi/fDetail?index=1383744"
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [172.29.125.106]:50952->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (3135059013) 362 days, 20:29:50.13#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.9.9.41.2.0.1#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.19022766 = STRING: "LINEPROTO"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.19022766 = INTEGER: 6#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.19022766 = STRING: "UPDOWN"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.19022766 = STRING: "Line protocol on Interface GigabitEthernet1/0/18, changed state to up"#011SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.19022766 = Timeticks: (3135059013) 362 days, 20:29:50.13
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 10.110.1.240(via UDP: [10.110.1.240]:161->[10.104.18.193]:162) TRAP, SNMP v1, community cimbnetmy#012#011SNMPv2-SMI::enterprises.11.2.3.7.11.90 Enterprise Specific Trap (2) Uptime: 101 days, 20:11:35.75#012#011RMON-MIB::eventDescription.75 = STRING: I 09/30/20 08:21:26 ports: port 7 is now on-line
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [10.14.14.246]:55155->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (900977334) 104 days, 6:42:53.34#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure#011SNMPv2-SMI::enterprises.9.2.1.5.0 = IpAddress: 10.14.9.230#011SNMPv2-SMI::enterprises.9.9.412.1.1.1.0 = INTEGER: 1#011SNMPv2-SMI::enterprises.9.9.412.1.1.2.0 = STRING: "10.14.9.230"
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [172.26.17.245]:56691->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (913525356) 105 days, 17:34:13.56#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure#011SNMPv2-SMI::enterprises.9.2.1.5.0 = IpAddress: 172.26.17.230
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [10.100.100.245]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (16596827) 1 day, 22:06:08.27#011SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkUp#011IF-MIB::ifIndex.1 = INTEGER: 1#011IF-MIB::ifAdminStatus.1 = INTEGER: up(1)#011IF-MIB::ifOperStatus.1 = INTEGER: up(1)#011IF-MIB::ifDescr.1 = STRING: 1#011IF-MIB::ifAlias.1 = STRING: #011SNMPv2-MIB::snmpTrapEnterprise.0 = OID: SNMPv2-SMI::enterprises.11.2.3.7.11.139
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.10.18 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.10.18 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [10.14.14.246]:55155->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (900977335) 104 days, 6:42:53.35#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure#011SNMPv2-SMI::enterprises.9.2.1.5.0 = IpAddress: 10.14.9.230#011SNMPv2-SMI::enterprises.9.9.412.1.1.1.0 = INTEGER: 1#011SNMPv2-SMI::enterprises.9.9.412.1.1.2.0 = STRING: "10.14.9.230"
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [172.30.144.248]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (428851496) 49 days, 15:15:14.96#011SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkDown#011IF-MIB::ifIndex.5 = INTEGER: 5#011IF-MIB::ifAdminStatus.5 = INTEGER: down(2)#011IF-MIB::ifOperStatus.5 = INTEGER: down(2)#011IF-MIB::ifDescr.5 = STRING: 5#011IF-MIB::ifAlias.5 = STRING: #011SNMPv2-MIB::snmpTrapEnterprise.0 = OID: SNMPv2-SMI::enterprises.11.2.3.7.11.146
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.38.18 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.38.18 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [172.30.144.248]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (428854554) 49 days, 15:15:45.54#011SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkDown#011IF-MIB::ifIndex.5 = INTEGER: 5#011IF-MIB::ifAdminStatus.5 = INTEGER: down(2)#011IF-MIB::ifOperStatus.5 = INTEGER: down(2)#011IF-MIB::ifDescr.5 = STRING: 5#011IF-MIB::ifAlias.5 = STRING: #011SNMPv2-MIB::snmpTrapEnterprise.0 = OID: SNMPv2-SMI::enterprises.11.2.3.7.11.146
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.36.96 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.36.96 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [10.100.100.238]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (281588259) 32 days, 14:11:22.59#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.17.110.143 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.17.110.143 does not support version 3/4 packets
Sep 30 08:21:31 nagiosprodxi5 check_nrpe: Remote 10.104.10.18 accepted a version 2 packet
Sep 30 08:21:31 nagiosprodxi5 snmptrapd[12533]: 2020-09-30 08:21:31 <UNKNOWN> [UDP: [172.22.48.248]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (4065631428) 470 days, 13:25:14.28#011SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkUp#011IF-MIB::ifIndex.14 = INTEGER: 14#011IF-MIB::ifAdminStatus.14 = INTEGER: up(1)#011IF-MIB::ifOperStatus.14 = INTEGER: up(1)#011IF-MIB::ifDescr.14 = STRING: 14#011IF-MIB::ifAlias.14 = STRING: #011SNMPv2-MIB::snmpTrapEnterprise.0 = OID: SNMPv2-SMI::enterprises.11.2.3.7.11.146
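
To see which program accounts for most of the lines, a rough per-program tally can help. This is a minimal sketch, assuming the traditional syslog layout shown above (month, day, time, host, then program tag); `count_by_program` is a hypothetical helper name, not part of Nagios XI.

```shell
# Tally syslog lines per program tag (5th field).
# Reads the files given as arguments, or standard input if none are given.
count_by_program() {
  awk '{
    tag = $5
    sub(/\[[0-9]+\]/, "", tag)   # drop a "[pid]" suffix such as snmptrapd[12533]
    sub(/:$/, "", tag)           # drop the trailing colon
    counts[tag]++
  }
  END { for (t in counts) print counts[t], t }' "$@" | sort -rn
}
```

Run as root, for example `count_by_program /var/log/messages | head`. In the excerpt above only check_nrpe and snmptrapd appear, and neither is the Nagios Core daemon that use_syslog in nagios.cfg controls.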
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios logs filling up syslog

Post by benjaminsmith »

Hi,

Just want to verify that you restarted Nagios Core after making the changes to the configuration file. I just tested this on my system, and the setting use_syslog=0 works as expected.

To restart Nagios:

Code: Select all

systemctl restart nagios
Let me know. Thanks, Benjamin
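
Before restarting, it may be worth confirming that the setting is active and not commented out. A minimal sketch follows; `show_use_syslog` is a hypothetical helper, and the config path in the usage note is an assumption about a default XI install.

```shell
# Print every active (non-commented) use_syslog line in a Nagios config file.
# $1 = path to the config file
show_use_syslog() {
  grep -E '^[[:space:]]*use_syslog[[:space:]]*=' "$1"
}
```

For example: `show_use_syslog /usr/local/nagios/etc/nagios.cfg`. If more than one line is printed, the file sets the option more than once and is worth cleaning up.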
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios logs filling up syslog

Post by rajasegar »

Yes I did do that.

Code: Select all

[nagios@myucbrnagiapp05 etc]$ grep syslog nagios.cfg
use_syslog=0
[nagios@myucbrnagiapp05 etc]$ sudo tail /var/log/messages
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Remote 10.104.122.93 accepted a Version 2 Packet
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Error: (nerrs = 0)(!log_opts) Could not complete SSL handshake with 10.104.120.163: rc=-1 SSL-error=5
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Error: (nerrs = 0)(!log_opts) Could not complete SSL handshake with 10.104.120.163: rc=-1 SSL-error=5
Oct  1 09:14:22 nagiosprodxi5 snmptrapd[7211]: 2020-10-01 09:14:22 <UNKNOWN> [UDP: [172.29.31.247]:55912->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (223129845) 25 days, 19:48:18.45#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure#011SNMPv2-SMI::enterprises.9.2.1.5.0 = IpAddress: 172.29.31.230
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Error: (nerrs = 0)(!log_opts) Could not complete SSL handshake with 10.104.120.163: rc=-1 SSL-error=5
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Error: (nerrs = 0)(!log_opts) Could not complete SSL handshake with 10.104.120.163: rc=-1 SSL-error=5
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Remote 10.104.122.93 does not support Version 3 Packets
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Remote 10.104.122.93 does not support Version 3 Packets
Oct  1 09:14:22 nagiosprodxi5 check_nrpe: Remote 10.104.122.93 accepted a Version 2 Packet
Oct  1 09:14:22 nagiosprodxi5 snmptrapd[7211]: 2020-10-01 09:14:22 <UNKNOWN> [UDP: [10.100.100.122]:161->[10.104.18.193]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (645471885) 74 days, 16:58:38.85#011SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::authenticationFailure
[nagios@myucbrnagiapp05 etc]$

benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios logs filling up syslog

Post by benjaminsmith »

Hi,

Ok, thanks for trying. It turns out the check_nrpe plugin sends its own messages to syslog, but you can turn this off with the following option in your check command:
-D, --disable-syslog Disable logging to syslog facilities
--Benjamin
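
Whether an installed check_nrpe offers this option can be checked from its help output. The sketch below assumes the help text contains the `--disable-syslog` string quoted above; `supports_disable_syslog` is a hypothetical helper name.

```shell
# Return success if the given check_nrpe binary lists --disable-syslog in its help output.
# $1 = path to the check_nrpe binary
supports_disable_syslog() {
  "$1" -h 2>&1 | grep -q -- '--disable-syslog'
}
```

For example: `supports_disable_syslog /usr/local/nagios/libexec/check_nrpe && echo "-D is available"`. If nothing is printed, the plugin most likely predates the option.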
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios logs filling up syslog

Post by rajasegar »

What about XI version 5.6.14 with use_syslog=0?
There is no -D option in check_nrpe version 3.2.1.
The following still gets logged to syslog:

Code: Select all

Oct  2 10:00:59 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:00:59 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:00:59 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:00:59 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:00 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:01 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:01:01 nagiosprodxi1 nagios: job 3848 (pid=12808): read() returned error 11
Oct  2 10:01:01 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:01 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:02 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
Oct  2 10:01:02 nagiosprodxi1 nagios: job 3850 (pid=15449): read() returned error 11
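
On Linux, error 11 is EAGAIN ("Resource temporarily unavailable"). To see how many of these lines each job produces, a small tally like the following may help. It is a sketch that assumes the exact message format shown above; `count_read_errors` is a hypothetical helper name.

```shell
# Count "read() returned error N" lines per job/pid/errno combination.
# Reads the files given as arguments, or standard input if none are given.
count_read_errors() {
  sed -n 's/.* nagios: job \([0-9]*\) (pid=\([0-9]*\)): read() returned error \([0-9]*\).*/job=\1 pid=\2 errno=\3/p' "$@" |
    sort | uniq -c | sort -rn
}
```

For example, as root: `count_read_errors /var/log/messages | head`.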

ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios logs filling up syslog

Post by ssax »

EDIT: The only way to get rid of those NRPE messages is to upgrade check_nrpe to a version that supports the -D option; it is the plugin itself that writes them.

You can technically do this without upgrading XI by following the steps below:
- NOTE: Be aware that you may need to adjust your NSClient++ checks to use -2 if you do this

First, take a backup of your current one just in case you need to revert:

Code: Select all

cp -p /usr/local/nagios/libexec/check_nrpe /usr/local/nagios/libexec/check_nrpe3.2.1

Code: Select all

cd /tmp
wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-4.0.3/nrpe-4.0.3.tar.gz
tar zxf nrpe-4.0.3.tar.gz
cd nrpe-4.0.3
./configure
make all
make install-plugin
Then edit your nrpe commands in Configure > Core Config Manager > Commands and add the -D option.
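
If the new plugin misbehaves, the backup taken in the first step can be copied back. A minimal sketch, assuming the backup file name used above (check_nrpe3.2.1); `restore_check_nrpe` is a hypothetical helper name.

```shell
# Restore the check_nrpe backup taken before the upgrade.
# $1 = plugin directory (for example /usr/local/nagios/libexec)
restore_check_nrpe() {
  dir=$1
  if [ ! -f "$dir/check_nrpe3.2.1" ]; then
    echo "no backup found in $dir" >&2
    return 1
  fi
  # -p keeps the original mode and timestamps, as in the backup step
  cp -p "$dir/check_nrpe3.2.1" "$dir/check_nrpe"
}
```

For example: `restore_check_nrpe /usr/local/nagios/libexec`. Remember to remove -D from the commands again afterwards, since version 3.2.1 does not accept it.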

The "read() returned error 11" messages usually indicate an issue. Please PM me a copy of your profile; you can download it from Admin > System Profile by clicking the Download Profile button.

Setting use_syslog=0 should normally disable all Nagios Core messages to /var/log/messages. Either you have multiple nagios services running (I'll check in the profile output) or that log message does not honor the setting in the Nagios Core code. I'll take a look.
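
A quick way to look for more than one Nagios daemon is to count the top-level nagios processes. The sketch below rests on an assumption: that each independent daemon has PID 1 as its parent, while its workers are children of the daemon. `count_top_level_nagios` is a hypothetical helper name.

```shell
# Count processes named "nagios" whose parent is PID 1.
# Expects "pid ppid comm" lines on standard input; a header line is ignored.
count_top_level_nagios() {
  awk '$3 == "nagios" && $2 == 1 { n++ } END { print n + 0 }'
}
```

For example: `ps -eo pid,ppid,comm | count_top_level_nagios`. A result above 1 would be worth investigating.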

EDIT2: Nagios Core doesn't honor that setting for that notice according to the code:

Code: Select all

https://github.com/NagiosEnterprises/nagioscore/blob/73a6b070e46c63aa6e2be731133c1b078cbee35c/lib/worker.c#L464
The developers would need to implement the change to have it honor that, I have submitted the request here:

https://github.com/NagiosEnterprises/na ... issues/793
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios logs filling up syslog

Post by rajasegar »

I have sent you the profile via PM.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios logs filling up syslog

Post by ssax »

Please send the output of these commands as root:

Code: Select all

sysctl -p
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
When you are seeing that queue send error, please get this output (as root) before fixing it and send it to us:

Code: Select all

ipcs -q
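
To reduce that output to one line that is easy to compare between runs, something like the following could be used. It is a sketch that assumes the usual column order printed by ipcs -q (key, msqid, owner, perms, used-bytes, messages); `summarize_queues` is a hypothetical helper name.

```shell
# Summarize "ipcs -q" output read from standard input:
# number of queues, total bytes in use, and total queued messages.
summarize_queues() {
  awk '$1 ~ /^0x/ { q++; bytes += $5; msgs += $6 }
       END { printf "queues=%d used_bytes=%d messages=%d\n", q, bytes, msgs }'
}
```

For example: `ipcs -q | summarize_queues`. A message count that keeps growing while the error is occurring would suggest the queue is not being drained fast enough.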
You are running at about 30% for load. Given the size of your system, your next step is likely to be setting up mod_gearman so that checks run on external workers, which reduces the load on the XI server. Here is a prewritten response I send to customers when they start hitting kernel message queue errors:

Generally, at 10K total combined host/service checks we recommend that you set up a RAMDisk (you've already done this), and at around 20K we recommend that you start looking at adding an additional XI server, because a single server can only process so much. This point may come sooner or later than 20K depending on what type of checks you are running, how many resources they use, your hardware speed, and what you're doing to mitigate the impact.

You should run this check profiler script to find your long-running checks. They consume resources the whole time they are running, so reducing them helps a lot:

https://exchange.nagios.org/directory/P ... me/details

The next step would be to offload the checks using mod_gearman to reduce the impact on the XI server; this is what I recommend so that you can add more services and alleviate the system issues. There is so much going on with around 20K checks that you will need to do what you can to mitigate the impact, such as using mod_gearman. Please see here for more information:

https://assets.nagios.com/downloads/nag ... ios_XI.pdf
https://support.nagios.com/kb/article.php?id=484

NOTE: Make sure that you follow the "Remote Worker Considerations" and the "Host groups and Service groups" sections from the second link above, and then follow the "Disable Worker" section from the first link once you've set up your exclude groups.

Please read through this doc as well. With the number of checks you are running, though, I would leave the DB local at this point, because that many checks require a lot of throughput to the DB:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf

You can only do so much on a single server. Do what you can to mitigate the impact, but start looking at adding another XI server soon if you continue to experience load/performance issues after the mitigation.

Let me know if you have any questions or if I can clarify anything.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios logs filling up syslog

Post by rajasegar »

sysctl -p

Code: Select all

[nagios@nagiosprodxi1 ~]$ sudo sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 662144000
kernel.msgmax = 662144000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.msgmni = 768000
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.log_martians = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.log_martians = 1
fs.suid_dumpable = 0
ulimit -a

Code: Select all

[nagios@nagiosprodxi1 ~]$ sudo ulimit -a
sudo: ulimit: command not found
[nagios@nagiosprodxi1 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 96017
max locked memory       (kbytes, -l) 128
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 20480
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
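
The `sudo ulimit -a` failure above is expected: ulimit is a shell builtin, not an executable, so it has to run inside a shell (for example `sudo sh -c 'ulimit -a'`). On Linux the limits of an already running process can also be read from /proc/<pid>/limits, which may differ from what a login shell reports. A minimal sketch follows; `soft_limit_from` is a hypothetical helper name.

```shell
# Print the soft value of one limit from a /proc/<pid>/limits style file.
# $1 = path to the limits file, $2 = limit name, e.g. "Max open files"
soft_limit_from() {
  awk -v name="$2" 'index($0, name) == 1 {
    rest = substr($0, length(name) + 1)   # text after the limit name
    split(rest, f, " ")                   # f[1] = soft limit, f[2] = hard limit
    print f[1]
  }' "$1"
}
```

For example, as root: `soft_limit_from "/proc/$(pgrep -o -x nagios)/limits" "Max open files"`, which reads the limit of the oldest running process named nagios.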
su -s /bin/bash -c 'ulimit -a' nagios

Code: Select all

[nagios@nagiosprodxi1 ~]$ su -s /bin/bash -c 'ulimit -a' nagios
Password:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 96017
max locked memory       (kbytes, -l) 128
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 20480
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
su -s /bin/bash -c 'ulimit -a' mysql

Code: Select all

[root@nagiosprodxi1 nagios]#  su -s /bin/bash -c 'ulimit -a' mysql
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 96017
max locked memory       (kbytes, -l) 128
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 20480
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@nagiosprodxi1 nagios]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x64000002 27197440   nagios     600        0            0
We have tried mod_gearman before, unfortunately we found Nagios was better at it than using gearman.
This was 4 years ago. We have 7 instances of Nagios XI now.

The same applies to the offloaded DB. In theory it all sounds perfect, but in practice it slows Nagios down to a crawl, at least in a large environment.
Things went back to normal when we reverted to the local DB.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios logs filling up syslog

Post by ssax »

As I stated in the last response, given the current size of your system I recommend the DB stay local, otherwise the KMQ will not process fast enough.

All that output looks good.

Let's check your DB table sizes. Please send the output of this command:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and the -pnagiosxi in the command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
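
With many tables, the result is easier to read when the largest come first. The variant below is only a sketch of the same query with an added ORDER BY clause; `table_size_query` is a hypothetical helper name, and the schema names are the ones used in the command above.

```shell
# Print the table-size query, sorted so the largest tables come first.
table_size_query() {
  cat <<'SQL'
SELECT table_name AS 'Table',
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size in MB'
FROM information_schema.TABLES
WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi')
ORDER BY (data_length + index_length) DESC;
SQL
}
```

For example: `table_size_query | mysql -h 127.0.0.1 -uroot -p --table`. Using -p without a value makes mysql prompt for the password instead of taking it from the command line.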