I've got an XI server here with about 900 hosts and 4,880 services. I'm having trouble running various reports, specifically when looking back more than a week or so. The Availability and Executive Summary reports are two examples. If I run either report on the host and service data for a single host over the last month, it takes a long time to complete (roughly 5 minutes). This also seems to delay host and service check result processing, which in turn becomes congested; since I have alerting set up for host/service check latency, I get alerts for that.
Looking deeper, the troublesome reports seem to call 'avail.cgi', which sits at 100% CPU for a few minutes at a time. If I try to go back or open other Nagios pages while it runs, I see my browser waiting for an available socket.
Doing an strace of avail.cgi, I see a LOT of calls such as these:
brk(NULL) = 0x7ec9f000
brk(0x7ecc0000) = 0x7ecc0000
brk(NULL) = 0x7ecc0000
brk(0x7ece1000) = 0x7ece1000
brk(NULL) = 0x7ece1000
brk(0x7ed02000) = 0x7ed02000
brk(NULL) = 0x7ed02000
brk(0x7ed23000) = 0x7ed23000
brk(NULL) = 0x7ed23000
Can you help me figure out what the problem here is? Is there perhaps any cleanup that I can or should perform?
I've already removed all perfdata files that have not been updated in the last 90 days. I do have a lot of 'disabled' hosts and services sitting in the CCM.
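In case it helps anyone read the strace excerpt, here is a rough Python sketch for totalling up the brk calls. It is only an illustration: the function name summarize_brk and the SAMPLE list are my own (SAMPLE is just the excerpt pasted above), and nothing here comes from Nagios itself.

```python
import re

# Matches strace lines such as "brk(NULL) = 0x7ec9f000" (query the current break)
# and "brk(0x7ecc0000) = 0x7ecc0000" (move the break, i.e. resize the heap).
BRK_CALL = re.compile(r"brk\((NULL|0x[0-9a-fA-F]+)\)\s*=\s*(0x[0-9a-fA-F]+)")

# The excerpt from this post, used as sample input.
SAMPLE = [
    "brk(NULL) = 0x7ec9f000",
    "brk(0x7ecc0000) = 0x7ecc0000",
    "brk(NULL) = 0x7ecc0000",
    "brk(0x7ece1000) = 0x7ece1000",
    "brk(NULL) = 0x7ece1000",
    "brk(0x7ed02000) = 0x7ed02000",
    "brk(NULL) = 0x7ed02000",
    "brk(0x7ed23000) = 0x7ed23000",
    "brk(NULL) = 0x7ed23000",
]


def summarize_brk(lines):
    """Return (resize_calls, net_bytes) for the brk calls found in strace lines.

    resize_calls counts brk calls made with an explicit address.
    net_bytes is the last reported break minus the first reported break.
    """
    breaks = []
    resize_calls = 0
    for line in lines:
        match = BRK_CALL.search(line)
        if match is None:
            continue  # not a brk line; ignore it
        breaks.append(int(match.group(2), 16))
        if match.group(1) != "NULL":
            resize_calls += 1
    if not breaks:
        return 0, 0
    return resize_calls, breaks[-1] - breaks[0]


if __name__ == "__main__":
    calls, net = summarize_brk(SAMPLE)
    print(f"{calls} heap resizes, net growth {net} bytes ({net // 1024} KiB)")
```

On the excerpt this reports 4 resizes and 540,672 bytes (528 KiB) of growth, so each call extends the heap by 0x21000 bytes (132 KiB). That looks consistent with the CGI steadily allocating memory in small steps for the whole run, though the excerpt alone doesn't say what it is allocating for.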
Thanks,
-marc