Nagios Not Functioning

CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Nagios Not Functioning

Post by CGraham »

Nagios has become unstable and is exhibiting a number of issues:

1. Notifications are failing at random with an error:
[1347476154] Warning: Contact 'cgraham' service notification command '/usr/bin/printf "%b" "[email warning]" cgraham@address.com' timed out after 30 seconds

2. Getting fork errors:
Warning: The check of service '[service check]' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.

3. Getting pipe errors:
[1347475829] HOST ALERT: [host name];DOWN;SOFT;1;Could not open pipe: /bin/ping -n -U -w 30 -c 5 [ip address]

4. /tmp is full of check entries (over 365K)

5. Seeing a block of retries in the logs:
[1347476391] Warning: The check of host '[hostname]' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...

Any ideas where to start on this? My search of the various issues turned up nothing promising.
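
For anyone wanting to put a number on item 4, here is a minimal sketch that counts the files in a directory matching a name pattern. The function name count_files and the 'check*' pattern are assumptions for illustration; the thread does not say what the entries in /tmp are actually called.

```shell
# count_files DIR PATTERN
# Prints how many regular files directly inside DIR match PATTERN.
# (Hypothetical helper; not part of Nagios.)
count_files() {
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
}

# Example (assumed pattern):
#   count_files /tmp 'check*'
```

Running it a few minutes apart shows whether the pile is still growing or is left over from an earlier failure.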
Last edited by CGraham on Thu Sep 13, 2012 8:14 am, edited 2 times in total.
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

Installed Version: 2011R3.2

Nagios XI Installation Profile
System:
[hostname].AirWatch.Local 2.6.32-220.13.1.el6.i686 i686
CentOS release 6.2 (Final)
Gnome Installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1
Server Name: [hostname]
Server Address: 10.1.3.151
Server Port: 80
Date/Time
PHP Timezone: America/New_York
PHP Time: Wed, 12 Sep 2012 16:10:59 -0400
System Time: Wed, 12 Sep 2012 16:10:59 -0400
Nagios XI Data
nagios (pid 29397) is running...
NPCD running (pid 1973).
ndo2db (pid 2020) is running...
CPU Load 15: 1.42
Total Hosts: 616
Total Services: 6098
Function 'get_base_uri' returns: http://[hostname]/nagiosxi/
Function 'get_base_url' returns: http://[hostname]s/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://[hostname]/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://[hostname]/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.041 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.026/0.031/0.041/0.009 ms
Test wget To localhost
WGET From URL: http://localhost/nagiosql/index.php
Running:

/usr/bin/wget http://localhost/nagiosql/index.php

--2012-09-12 16:11:01-- http://localhost/nagiosql/index.php
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5259 (5.1K) [text/html]
Saving to: `/tmp/nagiosql_index.tmp'

0K ..... 100% 431M=0s

2012-09-12 16:11:02 (431 MB/s) - `/tmp/nagiosql_index.tmp' saved [5259/5259]
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios Not Functioning

Post by mguthrie »

I think you're hitting a system ulimit max somewhere. Take a look at this wiki and see if it applies to your issue.
http://support.nagios.com/wiki/index.ph ... g_Orphaned
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

I made the changes to /etc/security/limits.conf, then rebooted. Here's the result:

[root@ATL02PRDMGMT03 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31297
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

It doesn't look like it took....
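
One thing worth keeping in mind: ulimit -a reports the limits of the shell it runs in (here, root's login shell), which may differ from what the nagios daemon is running under. On Linux the limits of a running process can be read from /proc/<pid>/limits. A rough sketch follows, assuming the daemon's process name is nagios; soft_limit is a made-up helper name.

```shell
# soft_limit FILE NAME
# Prints the soft value for the limit called NAME (for example
# "Max processes") from a file in /proc/<pid>/limits format.
# (Hypothetical helper, for illustration only.)
soft_limit() {
    awk -v name="$2" '
        index($0, name) == 1 {
            # Drop the limit name, then split the rest on whitespace:
            # f[1] is the soft limit, f[2] the hard limit.
            rest = substr($0, length(name) + 1)
            split(rest, f, " ")
            print f[1]
            exit
        }' "$1"
}

# Example (assumes a process named exactly "nagios" is running):
#   soft_limit "/proc/$(pgrep -o -x nagios)/limits" "Max processes"
#   soft_limit "/proc/$(pgrep -o -x nagios)/limits" "Max open files"
```

If those values still show 1024 after the reboot, the new settings are not reaching the daemon, whatever root's shell reports.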
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

Here is my /etc/security/limits.conf file:

[root@[hostname] ~]# cat /etc/security/limits.conf | grep -v "^#"
* hard nofile 200000
@nagios hard memlock 128 #locked memory
@nagios soft memlock 128
@nagios soft nofile 4096 #open files
@nagios hard nofile 4096
@nagios hard nproc 4096 #max user processes
@nagios soft nproc 4096
@nagios hard stack 20480 #stack size
@nagios soft stack 20480
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios Not Functioning

Post by mguthrie »

Hmm, I hate to have you reboot one more time, but can you try updating the settings once more, this time using the wildcard * instead of @nagios for the ulimit directives?

Also, check the /etc/security/limits.d/ directory and make sure there aren't files in there that are overriding the new settings.
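
A quick way to do that check is to print every active (non-comment, non-blank) line from the .conf files in that directory, prefixed with the file it came from. This is only a sketch; list_limit_entries is a hypothetical helper name.

```shell
# list_limit_entries DIR
# Prints "file:line" for every line in DIR/*.conf that is neither
# blank nor a comment. (Hypothetical helper, for illustration only.)
list_limit_entries() {
    for f in "$1"/*.conf; do
        # Skip the unexpanded glob when the directory has no .conf files.
        [ -e "$f" ] || continue
        grep -H -v -E '^[[:space:]]*(#|$)' "$f" || true
    done
}

# Example:
#   list_limit_entries /etc/security/limits.d
```

On a stock RHEL/CentOS 6 install, a file named 90-nproc.conf in that directory sets a soft nproc limit of 1024 for all users, which would match the "max user processes" value shown earlier. Whether that is the file involved on this server is not stated in the thread.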
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

Here is the new limits.conf. I also commented out another limits file under limits.d.

[root@ATL02PRDMGMT03 ~]# cat /etc/security/limits.conf | grep -v "^#"
* hard memlock 128 #locked memory
* soft memlock 128
* soft nofile 4096 #open files
* hard nofile 4096
* hard nproc 4096 #max user processes
* soft nproc 4096
* hard stack 20480 #stack size
* soft stack 20480

Rebooted:

[root@ATL02PRDMGMT03 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31297
max locked memory (kbytes, -l) 128
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

That looks better.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios Not Functioning

Post by mguthrie »

Good deal. Any time I've seen that issue, the ulimit increases fixed it, so I'm betting that will take care of it.
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

Now this error is back in the logs:

[1347486156] Warning: Contact '[contact]' service notification command '/usr/bin/printf "%b" "[service check]" | /bin/mail -s "[Subject]" cgraham@air-watch.com' timed out after 30 seconds

EDIT: I'm not seeing any of the other previous issues. Maybe this is a separate postfix issue. I attempted to make postfix changes and restart postfix, but when I ran "service postfix restart" it acted like postfix wasn't running. Am I confusing sendmail, /bin/mail and postfix?
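
To see whether the mail command itself is what hangs, it can be run by hand under the same 30-second limit. The sketch below uses the coreutils timeout command, which exits with status 124 when the limit is hit; run_with_limit is a hypothetical wrapper name.

```shell
# run_with_limit SECONDS COMMAND [ARGS...]
# Runs COMMAND under a time limit and reports whether it finished
# or was cut off. (Hypothetical helper, for illustration only.)
run_with_limit() {
    limit=$1
    shift
    start=$(date +%s)
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" -eq 124 ]; then
        # coreutils timeout uses exit status 124 for "limit reached".
        echo "timed out after ${limit}s"
    else
        echo "exit $status after ${elapsed}s"
    fi
}

# Example (the address is a placeholder):
#   run_with_limit 30 sh -c '/usr/bin/printf "%b" "test body" | /bin/mail -s "test" you@example.com'
```

On CentOS, /bin/mail is typically just the mailx client, which hands the message to whichever local MTA is installed (sendmail or postfix). A stalled or missing MTA could plausibly show up as this timeout, but that is a guess and nothing in this thread confirms it.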
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Nagios Not Functioning

Post by CGraham »

I'm going to move this to another thread since it's another issue.