In one of the 4 environments I am building concurrently, some checks have stopped graphing. Oddly, it seems to be the same check on every machine of a given node type, e.g. Network Bandwidth for the X server node type and Disk IO for a different node type [even though both node types use the same generic nrpe.cfg, which is identical on all machines!]. I noticed this by checking for the error in the XML files: found extra data on update argument.
I am going to put it down to "weirdness" during the build phase, as this seems to have happened about 2 months ago. So here are my questions, along with what I hope is a solution:
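For anyone wanting to reproduce that check, here is a rough sketch of how the XML files can be scanned for the error text. It assumes the perfdata XML files live under /usr/local/nagios/share/perfdata (adjust for your install), and the function name find_xml_with_error is just something made up for illustration:

```python
from pathlib import Path

# Error text as it appears in the perfdata XML files.
ERROR_TEXT = "found extra data on update argument"


def find_xml_with_error(perfdata_dir, needle=ERROR_TEXT):
    """Return sorted paths of *.xml files under perfdata_dir containing needle."""
    hits = []
    for xml_path in Path(perfdata_dir).rglob("*.xml"):
        try:
            # Undecodable bytes are replaced rather than aborting the scan.
            text = xml_path.read_text(errors="replace")
        except OSError:
            # Skip files that cannot be read (permissions, removed mid-scan).
            continue
        if needle in text:
            hits.append(xml_path)
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Assumed default location; pass a different directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/share/perfdata"
    for path in find_xml_with_error(root):
        print(path)
```

Each path printed corresponds to one host/service whose last RRD update reported the error.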
#1 This looks like a solution: https://support.nagios.com/kb/article.php?id=149
Is it still valid? Or should I just delete the RRD files? TBH the data is 2 months old, so it is useless now anyway. I am mainly asking in case this happens again.
#2 Is there any way Nagios XI can tell us when it has stopped graphing something? [With 3500 or so nodes across these environments and 20-30K checks in each, it could happen a bit
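
To illustrate the kind of detection I mean for #2, here is a minimal sketch that flags RRD files whose modification time has stopped advancing. This is not a Nagios XI feature, just a stand-alone script idea. It assumes the RRDs sit under /usr/local/nagios/share/perfdata and that a file not touched for an hour means graphing has stalled; both the path and the threshold are assumptions, and find_stale_rrds is a hypothetical name:

```python
import time
from pathlib import Path


def find_stale_rrds(perfdata_dir, max_age_seconds=3600, now=None):
    """Return (path, age_in_seconds) for *.rrd files older than max_age_seconds.

    Results are sorted with the oldest file first. File mtime is only a proxy
    for "last successful update", so treat hits as candidates to investigate.
    """
    now = time.time() if now is None else now
    stale = []
    for rrd in Path(perfdata_dir).rglob("*.rrd"):
        try:
            mtime = rrd.stat().st_mtime
        except OSError:
            # File vanished or is unreadable; skip it.
            continue
        age = now - mtime
        if age > max_age_seconds:
            stale.append((rrd, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Assumed default location; pass a different directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/share/perfdata"
    for path, age in find_stale_rrds(root):
        print(f"{path} last modified {age / 3600:.1f} hours ago")
```

Something like this could be run from cron, or wrapped as a check of its own, so that a stalled graph raises an alert instead of being found by accident months later.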