Nagios Support Forum

Posted: **Mon Feb 02, 2015 12:55 am**

Hi -

After updating Nagios XI to 2014R2.5, the performance grapher stopped working; none of the graphs are being generated. I'm making this an urgent request because we have finally deployed Nagios, and performance graphing is one of the main reasons we went with it. Thanks in advance!

System profile:

Nagios XI Installation Profile
System:
Nagios XI Version : 2014R2.5
nwd2ng01.corp.analog.com 2.6.32-504.3.3.el6.x86_64 x86_64
CentOS release 6.6 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Server Name: nwd2ng01.corp.analog.com
Server Address: 10.64.52.120
Server Port: 80
Date/Time
PHP Timezone: America/New_York
PHP Time: Mon, 02 Feb 2015 00:51:30 -0500
System Time: Mon, 02 Feb 2015 00:51:30 -0500
Nagios XI Data
License ends in: MSTNQS

nagios (pid 6281) is running...
NPCD running (pid 3886).
ndo2db (pid 2079) is running...
CPU Load 15: 22.25
Total Hosts: 509
Total Services: 3349
Function 'get_base_uri' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_base_url' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nwd2ng01.corp.analog.com/nagiosx ... rofile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.013 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.012 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.013 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.012/0.012/0.013/0.004 ms
Test wget To localhost
WGET From URL: http://localhost/nagiosxi/includes/components/ccm/
Running:

/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/

--2015-02-02 00:51:32-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"

0K ........ 423M=0s

2015-02-02 00:51:32 (423 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [8385]

Network Settings

1: lo: mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9f:52:ef brd ff:ff:ff:ff:ff:ff
    inet 10.64.52.120/24 brd 10.64.52.255 scope global eth0
    inet6 fe80::250:56ff:fe9f:52ef/64 scope link
       valid_lft forever preferred_lft forever

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.64.52.0      *               255.255.255.0   U     0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.64.52.1      0.0.0.0         UG    0      0        0 eth0

Posted: **Mon Feb 02, 2015 1:20 am**

Can you please run these commands on your Nagios XI host and send the output:

Code:

ls -al /usr/local/nagios/var/spool/perfdata/ | wc -l

ls -al /usr/local/nagios/var/spool/xidpe/ | wc -l
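A side note on these counts: `ls -al` also prints a `total` line plus the `.` and `..` entries, so the number from `wc -l` is three higher than the number of files. If an exact count matters, something along these lines should work. This is only a sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD both do); the helper name `count_spool_files` is made up for illustration and is not part of Nagios XI.

```shell
#!/bin/sh
# count_spool_files DIR: print the number of regular files directly in DIR.
# (Hypothetical helper, not shipped with Nagios XI.)
count_spool_files() {
    # -maxdepth 1 keeps the count to DIR itself and skips subdirectories.
    # tr strips the padding some wc implementations add.
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example:
#   count_spool_files /usr/local/nagios/var/spool/perfdata
```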

Also, increase the logging verbosity and then take a deeper look into the logs. Follow the FAQ entry below to increase the log level of process_perfdata and npcd:

http://support.nagios.com/wiki/index.ph ... leshooting

Wait 15 to 20 minutes, then grab a tail of the logs:

Code:

tail -250 /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
tail -250 /usr/local/nagios/var/npcd.log > /tmp/npcd.txt

Send us a copy of /tmp/perfdata.txt and /tmp/npcd.txt.

Don't forget to turn down the log level as per the FAQ when you are done!
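To double-check that the log level really is back down afterwards, a small reader for `KEY = value` config lines can help. Treat this as a rough sketch: the key names (`LOG_LEVEL` for process_perfdata, `log_level` for npcd) and the file paths in the example comments are assumptions about a typical PNP setup on Nagios XI, and `cfg_get` is a made-up helper name.

```shell
#!/bin/sh
# cfg_get KEY FILE: print the value of the first "KEY = value" line in FILE.
# Comment lines (starting with #) are ignored. Values containing "=" are
# not handled; this is only meant for simple numeric settings.
cfg_get() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }' "$2"
}

# Examples (paths and key names are assumptions, adjust to your install):
#   cfg_get LOG_LEVEL /usr/local/nagios/etc/pnp/process_perfdata.cfg
#   cfg_get log_level /usr/local/nagios/etc/pnp/npcd.cfg
```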

Posted: **Mon Feb 02, 2015 9:19 am**

Hi -

Information below:

Code:

[root@nwd2ng01 perfdata]# ls -al /usr/local/nagios/var/spool/perfdata/ | wc -l
56155
[root@nwd2ng01 perfdata]# ls -al /usr/local/nagios/var/spool/xidpe/ | wc -ll
5

Files are attached.

Also, I know it is bad troubleshooting practice to make changes while someone is helping you, but I'm a little bit desperate here.

Anyway, here's what I changed BEFORE modifying the log verbosity (based on what I saw in the logs):

In Perfdata.cfg I saw timeouts, so I raised TIMEOUT from 5 to 20:

Code:

TIMEOUT = 20

In NPCD.cfg I saw MAX load errors, so I raised load_threshold from 10.0 to 25.0, but I'm worried about turning this setting too high and causing more issues:

Code:

load_threshold = 25.0
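For context, the system profile at the top of the thread reports "CPU Load 15: 22.25", which is well above the old threshold of 10.0 and just under the new 25.0. A quick way to compare the current load against a candidate threshold is sketched below. It assumes Linux (`/proc/loadavg`) and uses the 1-minute figure; which load figure npcd itself compares against is an assumption here, and `load_exceeds` is a made-up helper name.

```shell
#!/bin/sh
# load_exceeds LOAD THRESHOLD: succeed (exit 0) if LOAD > THRESHOLD.
# awk does the comparison because sh cannot compare decimals itself.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Compare the 1-minute load average (first field of /proc/loadavg).
if [ -r /proc/loadavg ]; then
    read -r load1 rest < /proc/loadavg
    if load_exceeds "$load1" 25.0; then
        echo "load $load1 is above 25.0"
    else
        echo "load $load1 is at or below 25.0"
    fi
fi
```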

The good news is that the perf queue is slowly but surely going down (it's down to 53K from 56K now).

Posted: **Mon Feb 02, 2015 9:32 am**

Looks like perfdata has stacked up. Run the following to clear the files:

Code:

cd /usr/local/nagios/var/spool/perfdata/
find . -type f -delete
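If wiping the whole spool feels too drastic, one alternative (not what was suggested above, just a sketch) is to remove only files older than a cutoff and leave recent ones for npcd to process. This assumes a `find` with `-mmin` and `-delete` (GNU and BSD both have them); `purge_old_spool` and the 60-minute cutoff in the example are made up for illustration.

```shell
#!/bin/sh
# purge_old_spool DIR MINUTES: delete regular files in DIR that were last
# modified more than MINUTES minutes ago, and print how many were removed.
purge_old_spool() {
    dir=$1
    minutes=$2
    # -print before -delete emits each file name as it is removed,
    # so wc -l gives the number of deleted files.
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes" -print -delete \
        | wc -l | tr -d ' '
}

# Example (destructive, removes spooled perfdata older than an hour):
#   purge_old_spool /usr/local/nagios/var/spool/perfdata 60
```

Any perfdata in the removed files is still lost, so this only narrows the gap in the graphs rather than avoiding it.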

Is npcd running?

Code:

service npcd status

Restart it for good measure:

Code:

service npcd restart

Afterwards, run the following watch to see if files start stacking or if they are getting reaped correctly:

Code:

watch "ls /usr/local/nagios/var/spool/perfdata/ | wc -l"

Stop the watch with Ctrl-C when you are done.
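If you would rather get a one-word answer than watch the number by eye, the same check can be done by sampling the file count twice. A minimal sketch, with `spool_trend` being a made-up helper name and the 60-second interval in the example chosen arbitrarily:

```shell
#!/bin/sh
# spool_trend DIR SECONDS: count regular files in DIR, wait SECONDS, count
# again, and print "draining", "growing", or "steady".
spool_trend() {
    before=$(find "$1" -maxdepth 1 -type f | wc -l | tr -d ' ')
    sleep "$2"
    after=$(find "$1" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$after" -lt "$before" ]; then
        echo draining
    elif [ "$after" -gt "$before" ]; then
        echo growing
    else
        echo steady
    fi
}

# Example:
#   spool_trend /usr/local/nagios/var/spool/perfdata 60
```

A single sample can mislead on a busy system, since Nagios keeps writing new files while npcd reaps old ones, so it is worth running it a few times.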

Posted: **Mon Feb 02, 2015 10:35 am**

Would this cause my spooled perf data to go missing? Most of the graphs stopped back on the 28th, and I'm hoping they will backfill now that the immediate issue looks resolved. Or would it even matter?

Posted: **Mon Feb 02, 2015 11:37 am**

Would this cause my perf data that was spooled to be missing? Most of the graphs stopped back on the 28th...

Yes, removing these files will wipe out the perfdata from the 28th until now. However, there is no guarantee that the data would ever have been processed even if you hadn't deleted these files. There are far too many of them.

Posted: **Mon Feb 02, 2015 11:56 am**

lmiltchev wrote:
Would this cause my perf data that was spooled to be missing? Most of the graphs stopped back on the 28th...
Yes, removing these files will wipe out the perfdata from the 28th until now. However, there is no guarantee that the data would ever have been processed even if you hadn't deleted these files. There are far too many of them.

OK - that's a good point. I flushed that directory and will have to live with that gap in the data. What is a sustainable number of spooled items? Does it vary on a per-system basis?

Posted: **Mon Feb 02, 2015 12:29 pm**

OK, the queue is down to single digits or 0, so it looks like the changes are keeping up. My 4 / 24 hour graphs are still empty, but it looks like letting it churn through earlier allowed me to recover Jan 28th to Feb 1 at least.

But I still don't see new data in the 4hr graph, and it's been a while since I flushed the perf data spool. Is this normal?

Posted: **Mon Feb 02, 2015 3:24 pm**

Make sure npcd and crond are running:

Code:

service npcd status
service crond status

If either of them is not running, restart it:

Code:

service npcd restart
service crond restart

Posted: **Mon Feb 02, 2015 3:39 pm**

OK I'm getting the graphs back so the combination of these fixes worked! Can we make a feature request for monitoring Nagios processing?

I found:

Performance Graphs - (monitoring /usr/local/nagios/var/spool/perfdata...)
Monitoring Kernel queues - ipcs -q
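In the meantime, the spool directory could be watched with a small Nagios-style check. The sketch below is just an illustration, not an official plugin: `check_spool` and the thresholds in the example are made up, and it relies only on the usual plugin conventions (exit codes 0/1/2/3 and perfdata after the `|`).

```shell
#!/bin/sh
# check_spool DIR WARN CRIT: Nagios-plugin-style check on the number of
# regular files in DIR.
# Returns 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).
check_spool() {
    dir=$1
    warn=$2
    crit=$3
    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    # Text after the "|" is performance data: label=value;warn;crit
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count spooled files | spooled=$count;$warn;$crit"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count spooled files | spooled=$count;$warn;$crit"
        return 1
    fi
    echo "OK - $count spooled files | spooled=$count;$warn;$crit"
    return 0
}

# Example (thresholds are placeholders, tune them for your system):
#   check_spool /usr/local/nagios/var/spool/perfdata 1000 5000
```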

Thanks again for the help!

No graphs after updating Nagios XI