Gauge dashlet broken in large environment

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

In small environments it works fine. In my large one I get hundreds upon hundreds of this:

[Tue Dec 02 15:41:50 2014] [error] [client 10.160.1.10] PHP Notice:  Array to string conversion in /usr/local/nagiosxi/html/includes/dashlets/gauges/gauges.inc.php on line 316, referer: https://monitoring.itciss.com/nagiosxi/dashboards/dashlets.php

and the host and service come up blank, so I can't add it.

There is a tracker item open, but this is fairly high priority for me now, so I wanted to make sure it was more widely known and to ask whether anyone else has a fix.

http://tracker.nagios.com/view.php?id=591
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Gauge dashlet broken in large environment

Post by lmiltchev »

You see this on the 2014r1.4 server, correct? How "large" is this instance? Is there an option to test this on the latest XI (2014R2.0)?
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Gauge dashlet broken in large environment

Post by BanditBBS »

lmiltchev wrote: You see this on the 2014r1.4 server, correct? How "large" is this instance? Is there an option to test this on the latest XI (2014R2.0)?
Yes, the 2014r1.5 server. 741 hosts, 11181 services.

I can't test on 2.0 unfortunately. My dev box is 2.0 but it only has 2 hosts on it :(

Re: Gauge dashlet broken in large environment

Post by lmiltchev »

We are also going to have a hard time testing this in house... :) I hope other users with large setups will give us some feedback.
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Gauge dashlet broken in large environment

Post by cmerchant »

I believe this is a related post, but it was not resolved.

http://support.nagios.com/forum/viewtop ... 10#p110010

Re: Gauge dashlet broken in large environment

Post by BanditBBS »

Adding 700 hosts to dev server now....will report back.

The odd thing is the line it is erroring out on: $perfdata_s = explode('=', $perfdata_datasource);

The only reason for that line to error would be if the format changed, and we all know it hasn't. It's perfdata; it has to be in that format, right?

EDIT: Added 7500 checks to my dev 2014r2.0 box. It still works. And FYI, it's as easy as ever to quickly add 7500 services to my environment for testing :) Not sure what to do now. Was anything changed between 1.5 and 2.0 that could have resolved this? Could I have some corrupted perfdata on my prod server causing this issue? If so, how could I begin to find it?
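
One way to begin looking, offered as a rough sketch and not as anything taken from the dashlet: the plugin guidelines describe perfdata as space-separated 'label'=value[UOM];[warn];[crit];[min];[max] entries, so every datasource should split on '=' into exactly a label and a value part. The Python below (not the dashlet's PHP; the function names and the regular expression are made up for illustration) mimics that split and reports the datasources that do not fit.

```python
import re
import shlex

# Loose check of "value[UOM];[warn];[crit];[min];[max]".
# ASSUMPTION: this pattern is an illustration, not the dashlet's own validation.
_VALUE_RE = re.compile(
    r"^(?:U|-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)"  # number, or U for unknown
    r"[A-Za-z%]*"                                          # optional unit of measure
    r"(?:;[^;\s]*){0,4}$"                                  # warn;crit;min;max, any may be empty
)


def check_datasource(datasource):
    """Return None if the datasource looks like label=value..., else a reason."""
    parts = datasource.split("=")  # same idea as PHP's explode('=', ...)
    if len(parts) != 2:
        return "expected exactly one '=', found %d" % (len(parts) - 1)
    label, value = parts
    if not label:
        return "empty label"
    if not _VALUE_RE.match(value):
        return "value part %r does not look like value[UOM];warn;crit;min;max" % value
    return None


def find_bad_datasources(perfdata):
    """Split one perfdata string and list (datasource, reason) for the odd ones."""
    try:
        # shlex keeps single-quoted labels such as 'disk /' together
        # (and strips the quotes from the resulting token).
        datasources = shlex.split(perfdata)
    except ValueError as exc:  # for example an unbalanced quote
        return [(perfdata, str(exc))]
    bad = []
    for datasource in datasources:
        reason = check_datasource(datasource)
        if reason is not None:
            bad.append((datasource, reason))
    return bad
```

Calling find_bad_datasources() on each service's perfdata string and printing only the non-empty results keeps the output short enough to read by eye.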
Last edited by BanditBBS on Tue Dec 02, 2014 10:46 pm, edited 2 times in total.

Re: Gauge dashlet broken in large environment

Post by BanditBBS »

Ok, screw it, I upgraded to 2014r2.0. It still isn't working, same issue. It almost has to be corrupted perfdata or something. Could someone at Nagios give me the CLI command to export all the perfdata the dashlet is looking at to a text file, so I can scan through it for something obvious?

Edit: Exported service status from the backend. Couldn't see anything staring me in the face saying "I'm bad, I'm bad!" Anyone have any ideas why it'd work on one server and not the other?
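
Scanning an export like that by eye is error-prone at 11181 services, so here is a hedged sketch of letting a script do the first pass. It assumes, purely for illustration, that the export has been reduced to one line per service in the form host<TAB>service<TAB>perfdata; that layout and the function names are hypothetical, not something the backend produces as-is.

```python
import re


def describe_problem(perfdata):
    """Return a short description of the first oddity found, or None."""
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in perfdata):
        return "control character in perfdata"
    if perfdata.count("'") % 2:
        return "unbalanced single quote"
    # Blank out quoted labels so spaces inside them do not split a datasource.
    unquoted = re.sub(r"'[^']*'", "L", perfdata)
    for token in unquoted.split():
        if token.count("=") != 1:
            return "datasource %r does not have exactly one '='" % token
    return None


def scan_export(lines):
    """Yield (line_no, host, service, problem) for rows whose perfdata looks off.

    ASSUMPTION: each row is host<TAB>service<TAB>perfdata. That layout is
    made up for this sketch; adapt the column handling to the real export.
    """
    for line_no, raw in enumerate(lines, start=1):
        row = raw.rstrip("\n").split("\t")
        if len(row) != 3:
            yield (line_no, "", "", "expected 3 tab-separated columns, got %d" % len(row))
            continue
        host, service, perfdata = row
        if not perfdata.strip():
            continue  # many checks legitimately return no perfdata
        problem = describe_problem(perfdata)
        if problem is not None:
            yield (line_no, host, service, problem)
```

Passing an open file handle to scan_export() and printing each tuple gives a list of host and service names to check first in the dashlet's dropdowns. A clean result would not prove the perfdata is fine, only that nothing breaks the label=value shape.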
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Gauge dashlet broken in large environment

Post by sreinhardt »

The gauges and graphs use the exact same commands, afaik, with rrdtool to export data. The only real difference should be how it splits the data beyond that for per track gauges. So when you had 2 hosts and a few services it worked perfectly fine, but as soon as you added 7.5k (was this supposed to be 750?) additional services, it decided not to play nice? Were those services added to a single host or to many hosts? I ask because, the way this thread reads, it should be an issue with accessing the RRDs themselves or finding the right directory, but then I would also expect regular graphs to break. At the same time, if you are adding 7.5k services to a single host, I would not be shocked if PHP didn't appreciate that too much.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Re: Gauge dashlet broken in large environment

Post by BanditBBS »

sreinhardt wrote: The gauges and graphs use the exact same commands, afaik, with rrdtool to export data. The only real difference should be how it splits the data beyond that for per track gauges. So when you had 2 hosts and a few services it worked perfectly fine, but as soon as you added 7.5k (was this supposed to be 750?) additional services, it decided not to play nice? Were those services added to a single host or to many hosts? I ask because, the way this thread reads, it should be an issue with accessing the RRDs themselves or finding the right directory, but then I would also expect regular graphs to break. At the same time, if you are adding 7.5k services to a single host, I would not be shocked if PHP didn't appreciate that too much.
I had multiple posts as I was troubleshooting (I literally edited the one post probably 10 times), so I'll clear this up...

1.) It still works on the dev server after adding 7500 (7.5k) services across 750 hosts. I can pick any one of those new services and the gauge dashlet works fine.
2.) It does not work on my prod server, which was on 1.5; I upgraded it to 2.0 and it still doesn't function. Everything is now the same version on both servers.
3.) The gauge dashlet does not use RRD. It makes a bunch of calls back through many, many files to eventually use the backend API.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gauge dashlet broken in large environment

Post by abrist »

BanditBBS wrote: 3.) The gauge dashlet does not use RRD. It makes a bunch of calls back through many, many files to eventually use the backend API.
Correct. It should just use the last value for the gauge (which is accessible through a number of APIs).
I know there were updates to the gauges recently (before 2.0). Maybe your upgrade did not complete?
I would be willing to dive into a remote session tomorrow if necessary . . .
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.