Nagios Support Forum

Posted: **Fri Sep 03, 2021 9:57 am**

Hello @Dusan.Mandic

Sounds good; the right size increment(s) will depend on your environment.

Once it's set, you can watch the files grow with:

watch -n 5 ls -lahrt /usr/local/nagios/var/nagios.log /usr/local/nagios/var/nagios.debug
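To put a number on the size increments rather than eyeballing the ls output, here is a minimal sketch that reports how many bytes a file grew over an interval. It assumes GNU coreutils stat (the -c %s format), and log_growth is a hypothetical helper name, not part of Nagios.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios): print how many bytes FILE grew
# during SECONDS. Assumes GNU coreutils stat, where -c %s prints the size.
# Usage: log_growth FILE SECONDS
log_growth() {
  local file=$1 interval=$2 before after
  before=$(stat -c %s -- "$file") || return 1
  sleep "$interval"
  after=$(stat -c %s -- "$file") || return 1
  echo $(( after - before ))
}

# Example with the two files from this thread (runs one after the other,
# so this loop takes about two minutes):
# for f in /usr/local/nagios/var/nagios.log /usr/local/nagios/var/nagios.debug; do
#   printf '%s grew %s bytes in 60s\n' "$f" "$(log_growth "$f" 60)"
# done
```

Sampling over a fixed window gives a growth rate you can compare against whatever size increment you choose.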

Let us know how things look,
Perry

Posted: **Wed Sep 08, 2021 1:28 pm**

Here's the ethtool output. I just started the log today.

Posted: **Thu Sep 09, 2021 11:44 am**

Hello @Dusan.Mandic

Thanks for following up with the results. Let's go ahead and increase the host and service check timeouts (and the NCPA plugin timeout) to 120 seconds, then restart nagios.service.

In /usr/local/ncpa/etc/ncpa.cfg, change the setting to:

plugin_timeout = 120

In /usr/local/nagios/etc/nagios.cfg, change the settings to:

host_check_timeout=120
service_check_timeout=120

Then restart the nagios service:

systemctl restart nagios
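On a production server it can help to script the edits above and verify the configuration before restarting. A minimal sketch, assuming GNU grep and sed; set_directive is a hypothetical helper that only changes a directive that already exists (it never appends one, since ncpa.cfg is split into [sections] and an appended line could land in the wrong section).

```shell
#!/usr/bin/env bash
# Hypothetical helper: set an existing "key=value" or "key = value" directive
# to VALUE, keeping the file's own spacing style. Refuses to append a missing
# key. KEY and VALUE must not contain "/" or "&" (they go into a sed pattern).
# FILE.bak holds the version from just before the most recent call.
# Usage: set_directive FILE KEY VALUE
set_directive() {
  local file=$1 key=$2 value=$3
  # Only uncommented lines that begin with exactly KEY followed by "=" match,
  # so e.g. service_check_timeout_state is left alone.
  if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    echo "set_directive: ${key} not found in ${file}" >&2
    return 1
  fi
  cp -p -- "$file" "${file}.bak"
  # Keep everything up to and including "=" (plus spaces), replace the rest.
  sed -i -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*).*/\1${value}/" "$file"
}

# Example with the paths and values from the post above:
# set_directive /usr/local/ncpa/etc/ncpa.cfg plugin_timeout 120
# set_directive /usr/local/nagios/etc/nagios.cfg host_check_timeout 120
# set_directive /usr/local/nagios/etc/nagios.cfg service_check_timeout 120
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg && systemctl restart nagios
```

Chaining the restart behind the pre-flight check with && means a typo in nagios.cfg stops the restart instead of taking Core down.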

Please let it run for a while and let us know how things look,
Perry

Posted: **Thu Sep 09, 2021 4:10 pm**

Things did not go well.

This caused multiple alerts. I think there were about 90 criticals thrown in a couple minutes.

This is our production server. I was able to catch it before alerts went out, but we cannot have an event of that nature again.

Posted: **Thu Sep 09, 2021 4:18 pm**

results.txt

Posted: **Fri Sep 10, 2021 10:20 am**

Hello @Dusan.Mandic

Let's get an updated System Profile so we can see what is going on.

To send us your system profile:

1: Log in to the Nagios XI GUI using a web browser.
2: Click the "Admin" > "System Profile" menu.
3: Click the "Download Profile" button.
4: Save the profile.zip file and share it via PM.

Or, from the command line:

rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

Then send the resulting /usr/local/nagiosxi/var/components/profile.zip file via Private Message.

Thanks,
Perry

Posted: **Fri Sep 10, 2021 12:47 pm**

PM sent.

The load has gone way up. It was around 3/4/4 before and has now almost tripled.
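Load averages are easier to judge relative to the number of CPUs: as a rough rule, a sustained per-CPU value above 1 means there is more runnable work than CPUs, though on Linux the figure also counts tasks blocked in uninterruptible I/O. The server's CPU count is not given in this thread, so this minimal sketch (load_per_core is a hypothetical helper) reads it from nproc unless one is passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide the 1/5/15-minute load averages by the CPU count.
# Usage: load_per_core [LOADAVG_FILE] [CPUS]
#   LOADAVG_FILE defaults to /proc/loadavg, CPUS to the output of nproc.
load_per_core() {
  local file=${1:-/proc/loadavg} cpus=${2:-$(nproc)}
  # The first three fields of /proc/loadavg are the 1, 5 and 15-minute averages.
  awk -v n="$cpus" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }' "$file"
}

# Live usage on the Nagios server:
# load_per_core
```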

Posted: **Fri Sep 10, 2021 2:05 pm**

Hello @Dusan.Mandic

You are correct that there are resource issues. We see PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate....), and NPCD has reached MAX LOAD as well.
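The byte count in that PHP error is the memory_limit in effect: 268435456 = 256 × 1024 × 1024, i.e. memory_limit = 256M. A minimal sketch for converting PHP's shorthand values to bytes when comparing limits (php_limit_bytes is a hypothetical helper; PHP uses 1024-based K/M/G multipliers).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a PHP shorthand size such as 256M to bytes.
# Plain numbers (including -1, meaning "no limit") are printed unchanged.
# Usage: php_limit_bytes VALUE
php_limit_bytes() {
  local v=$1 num unit
  num=${v%[KkMmGg]}   # strip one trailing unit letter, if present
  unit=${v#"$num"}    # whatever was stripped is the unit
  case $unit in
    [Kk]) echo $(( num * 1024 )) ;;
    [Mm]) echo $(( num * 1024 * 1024 )) ;;
    [Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
    "")   echo "$num" ;;
  esac
}

# The CLI's limit can be read with:
#   php -r 'echo ini_get("memory_limit"), PHP_EOL;'
# Note that Apache may load a different php.ini than the CLI does.
```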

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22731 nagios    20   0   11.6g  40312  11688 S 112.5  0.1   0:00.20 java
10032 apache    20   0  684800  72184   6124 S 100.0  0.2   0:10.72 httpd
 1936 apache    20   0  643556  30884   6232 R  93.8  0.1   0:05.10 httpd

We see that /opt/SumoCollector/jre/bin/java is using a fair amount of CPU; it is at the top of the resource users, listed as 'java'. You may want to review that process to see how it is behaving.
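Several httpd workers can add up to more than one busy java process. A minimal sketch that sums %CPU per command from top's batch output, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and a "." decimal separator; cpu_by_command is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read top batch output on stdin and print the total
# %CPU per COMMAND, highest first. Only lines whose first field is a numeric
# PID are counted, so top's summary and header lines are skipped.
cpu_by_command() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 12 { cpu[$12] += $9 }
       END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' | sort -k2,2 -nr
}

# Live usage (one snapshot):
# top -b -n 1 | cpu_by_command | head
```

Applied to the three lines above, this gives httpd 193.8 and java 112.5: in that snapshot, the two httpd workers together used more CPU than the java process (top reports per-process %CPU relative to one core, so values above 100 are possible).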

It also appears that there are multiple definitions of the same service. Take a look at your configuration files to see where they occur. This seems to have happened recently, so I am not sure whether there were any bulk changes or imports. You may also want to run the database repair and a CCM reindex.

Warning: Duplicate definition found for service 'CPU Stats' on host 'XXXXXXXXX' (config file '/usr/local/nagios/etc/services/RHEL_Jira.cfg'....)

You can run this pre-flight check against the configuration to see the results:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg
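The pre-flight output can be long. A minimal sketch that condenses it to the two totals plus the distinct duplicate-definition warnings, assuming the "Total Warnings:" and "Total Errors:" summary lines Nagios Core prints at the end of the check; preflight_summary is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Nagios Core pre-flight output on stdin and print
# "warnings=N errors=N" followed by each distinct "Duplicate definition" line.
preflight_summary() {
  awk '
    /^Total Warnings:/ { w = $NF }
    /^Total Errors:/   { e = $NF }
    /Duplicate definition/ && !seen[$0]++ { dups[++n] = $0 }
    END {
      printf "warnings=%s errors=%s\n", w, e
      for (i = 1; i <= n; i++) print dups[i]
    }'
}

# Live usage:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_summary
```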

You can also search through all of the configuration files with: grep -Eir -A 5 -B 5 'CPU Stats' /usr/local/nagios/etc/*
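To narrow the grep results down to actual duplicates, here is a minimal sketch that scans define service { ... } blocks and reports any host_name/service_description pair defined more than once, with the files it came from. It is an assumption-laden helper (find_dup_services is hypothetical): it only sees literal host_name and service_description lines inside each block, treats a comma-separated host list as a single key, and will not catch services applied through hostgroup_name or templates, which is how these services were attached here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report host_name|service_description pairs defined in
# more than one "define service { ... }" block across the given files.
# Usage: find_dup_services FILE...
find_dup_services() {
  awk '
    /^[ \t]*define[ \t]+service[ \t]*[{]/ { inblk = 1; host = ""; desc = ""; next }
    inblk && /^[ \t]*[}]/ {
      if (host != "" && desc != "") {
        key = host "|" desc
        count[key]++
        where[key] = where[key] " " FILENAME   # remember every file it appears in
      }
      inblk = 0; next
    }
    # Drop the directive name and keep the rest of the line as the value.
    inblk && $1 == "host_name"           { $1 = ""; sub(/^[ \t]+/, ""); host = $0 }
    inblk && $1 == "service_description" { $1 = ""; sub(/^[ \t]+/, ""); desc = $0 }
    END { for (k in count) if (count[k] > 1) print k ":" where[k] }
  ' "$@"
}

# Example against the directory named in the warning:
# find_dup_services /usr/local/nagios/etc/services/*.cfg
```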

Here are the instructions for the database repair and the CCM re-index.

First, run the database repair:

/usr/local/nagiosxi/scripts/repair_databases.sh

Then reindex the Core Configuration Manager (CCM) configs:

rm -rf /usr/local/nagios/etc/import/*
1: List all running /bin/nagios processes from a terminal: ps aux | grep -E '[/]bin/nagios'
2: Stop them: killall -9 nagios (or pkill nagios)
3: Re-run the command from step 1 to confirm that the /bin/nagios processes have stopped.
4: Restart nagios.service: systemctl restart nagios
5: In the Nagios XI web console, go to Core Configuration Manager (CCM) ==> Config File Management ==> [Delete Files] ==> [Write Files] ==> [Verify Files]
6: In the Core Configuration Manager (CCM), under Quick Tools, click "Apply Configuration".
7: Restart nagios.service: systemctl restart nagios
```
systemctl restart nagios
```
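Because the rm -rf above permanently deletes everything in the import directory on a production server, a more cautious variant archives the contents first. A minimal sketch assuming GNU tar and GNU find; backup_and_clear is a hypothetical helper. Unlike rm -rf dir/*, find -mindepth 1 -delete also removes hidden (dot) files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive DIR's contents into ARCHIVE (a .tar.gz that must
# live outside DIR), then delete everything inside DIR. Nothing is deleted if
# the archive step fails.
# Usage: backup_and_clear DIR ARCHIVE
backup_and_clear() {
  local dir=$1 archive=$2
  [ -d "$dir" ] || { echo "backup_and_clear: $dir is not a directory" >&2; return 1; }
  tar -czf "$archive" -C "$dir" . || return 1
  find "$dir" -mindepth 1 -delete
}

# Example for the CCM import directory:
# backup_and_clear /usr/local/nagios/etc/import "/root/ccm-import-$(date +%F).tar.gz"
```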

Finally, verify that the hosts and services look good and that there are no errors in Core by running:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

Posted: **Tue Sep 14, 2021 11:31 am**

Regarding the warning: Duplicate definition found for service 'CPU Stats' on host 'XXXXXXXXX' (config file '/usr/local/nagios/etc/services/RHEL_Jira.cfg'....)

These services were applied to all hosts. I removed them from the two hostgroups they were associated with. I did not do any bulk imports. If these aren't in the first system profile I uploaded in this thread, that would lead me to believe there was some sort of error on reconfigure.

The load is still hovering around 9/9/9.

Posted: **Wed Sep 15, 2021 10:57 am**

Hello @Dusan.Mandic

We want to get a better picture of what is going on in real time, so we'd like to capture the following event timeline to a file and see where things are hanging up. Please replace <yourhostnameoripaddresshere> with a host that is logging timeouts.

tail -Fn0 /var/log/httpd/* /var/log/apache2/* /usr/local/nagios/var/* /usr/local/nagiosxi/tmp/* /usr/local/nagiosxi/var/* /var/log/syslog /var/log/messages /usr/local/nagios/var/spool/* /usr/local/nagiosxi/var/components/* | grep --line-buffered -Ei "warn|error|fail|timeout|<yourhostnameoripaddresshere>" >> /tmp/results.txt

Let the capture run until you see a message that indicates a timeout, then press Ctrl-C to break out, and send /tmp/results.txt via private message.
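Before sending the file, a quick tally shows which kinds of lines dominate the capture. A minimal sketch (summarize_results is a hypothetical helper) that counts lines matching each keyword from the grep filter, case-insensitively; a single line can count toward several keywords.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count captured lines matching each filter keyword.
# Usage: summarize_results FILE
summarize_results() {
  awk '{
         l = tolower($0)
         if (l ~ /warn/)    w++
         if (l ~ /error/)   e++
         if (l ~ /fail/)    f++
         if (l ~ /timeout/) t++
       }
       END { printf "warn %d\nerror %d\nfail %d\ntimeout %d\n", w, e, f, t }' "$1"
}

# Example:
# summarize_results /tmp/results.txt
```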

Thanks,
Perry

Nagios Host/Service Check Timeouts
