NDO2DB Issue out of the blue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NDO2DB Issue out of the blue

Post by tmcdonald »

Code: Select all

kill -9 7859
kill -9 15856
kill -9 22989
Then run ps again to make sure nothing is running. Run "service nagios start" and ps once more; you should see just a sandwich of "nagios -d, workers, nagios -d".
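As a rough sketch of the "make sure nothing is running" step, the helper below reports which of a list of PIDs are still alive. The function name `still_running` is made up for illustration; `kill -0` is the standard way to probe a PID without sending a signal. It assumes you run it as root or as the nagios user.

```shell
#!/usr/bin/env bash
# Print every PID from the argument list that is still alive.
# kill -0 sends no signal; it only tests whether the PID can be signalled.
# Assumption: run as root (or the owning user), otherwise a permission
# error looks the same as "no such process" here.
still_running() {
    local pid
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
        fi
    done
}

# Usage, with the PIDs from the kill commands above:
#   still_running 7859 15856 22989
# No output means all three are gone.
```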
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

Yeah Trevor, that does what we want, but the question is, what causes it? I rebooted the server yesterday at 11am, so what between then and now caused 2 extra processes?
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github

Post by tmcdonald »

That's a lot harder to say. I suppose a large environment could cause a "service nagios restart" to fail, or the Apply Config to not properly kill off the old nagios process before starting a new one. Is there any pattern to the multiple processes? Do they happen after every Apply Config or randomly? Does it ever happen spontaneously, like without running Apply Config or otherwise resetting nagios?

Post by BanditBBS »

tmcdonald wrote: That's a lot harder to say. I suppose a large environment could cause a "service nagios restart" to fail, or the Apply Config to not properly kill off the old nagios process before starting a new one. Is there any pattern to the multiple processes? Do they happen after every Apply Config or randomly? Does it ever happen spontaneously, like without running Apply Config or otherwise resetting nagios?
I just did an Apply Config and everything is still fine. I'll keep an eye on the number of nagios processes running and try to figure out what's causing it. Hopefully it behaves from now until tomorrow morning at least :) Keep this open please (but feel free to get off your dashboard :) )

Post by tmcdonald »

Let us know. Realistically we could write a check that sees if multiples are running, and if so does a killall nagios. Overkill perhaps, and a band-aid to be sure, but just a thought.
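A minimal sketch of such a check, written as a Nagios-style plugin function. It assumes the daemon shows up in ps as `nagios -d ...` and that a healthy instance has exactly one top-level `nagios -d` process (its forked `nagios -d` child and the workers are ignored). The function names are made up for illustration, and the sketch only reports; it does not run killall.

```shell
#!/usr/bin/env bash
# Count top-level "nagios -d" daemons in `ps -eo pid,ppid,args` output on stdin.
# A "nagios -d" process whose parent is also "nagios -d" is treated as a child
# of the same instance and is not counted.
count_nagios_masters() {
    awk '
        # $1 = PID, $2 = PPID, $3 = command path, $4 = first argument
        $3 ~ /(^|\/)nagios$/ && $4 == "-d" { ppid[$1] = $2 }
        END {
            n = 0
            for (p in ppid) if (!(ppid[p] in ppid)) n++
            print n
        }
    '
}

# Plugin-style wrapper: exit status 0 = OK, 2 = CRITICAL.
check_nagios_masters() {
    local n
    n=$(count_nagios_masters)
    if [ "$n" -eq 1 ]; then
        echo "OK - 1 nagios daemon running"
        return 0
    fi
    echo "CRITICAL - $n nagios daemons running"
    return 2
}

# Usage:
#   ps -eo pid,ppid,args | check_nagios_masters
```

Reporting first and killing by hand keeps the band-aid from taking down the one healthy instance along with the stray ones.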

Post by BanditBBS »

tmcdonald wrote: Let us know. Realistically we could write a check that sees if multiples are running, and if so does a killall nagios. Overkill perhaps, and a band-aid to be sure, but just a thought.
Well...things aren't well. I checked it once after I got home and noticed the active check numbers had started to fall and reached 0, so I restarted NDO2DB and all was well for a few hours. I even set the WiiU remote up to my XI page so I could keep an eye on it in the living room. Around 9:30pm I noticed it was again showing no active checks in the past 1, 5, or 15 minutes. The easy fix for me was to do an Apply Config from the WiiU remote. That fixed it and all was well for just a few minutes (maybe 10-15). At that point I saw the first check mark was a red exclamation, indicating the monitoring engine was not running. So I told it to start (using the GUI) and then went to my home office and did a ps. Well, now there were two instances of nagios running. I killed the first one, leaving the new one that was just started, and the red exclamation turned into a green check after a few moments.

This is becoming nuts. I can't sleep tonight because I may be needed to troubleshoot this more if NDO stops or whatever. I can't for the life of me figure out why this started happening out of the blue, but I desperately need some more help now; we can't continue like this.

EDIT: Just did an apply and now two processes are running again
EDIT2: I'm really thinking about undoing the offload of ndo2db....hmmm
Last edited by BanditBBS on Wed Aug 19, 2015 10:13 pm, edited 1 time in total.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

We could try defining how many check workers are allowed to start and see if that helps; reducing the number may help.

https://assets.nagios.com/downloads/nag ... gmain.html
Check Workers

Format: check_workers=<#>
Example: check_workers=10

This setting specifies how many worker processes should be started when Nagios Core starts. Worker processes are used to perform host and service checks. If the number of workers is not specified, a default number of workers is determined based on the number of CPU cores on the system (1.5 workers per core). If not specified, there is always a minimum of 4 workers.
Maybe try 16.
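Before changing the value, it can help to confirm what is currently set. The sketch below prints any explicit check_workers value from a Nagios main config file. The function name is made up for illustration, and the path in the usage line is the usual source-install location, which is an assumption about this system.

```shell
#!/usr/bin/env bash
# Print the value of the last uncommented check_workers= line in the given
# file, or a note when no such line exists.
get_check_workers() {
    awk -F= '
        /^[[:space:]]*check_workers[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)   # strip stray whitespace around the value
            v = $2
        }
        END {
            if (v != "") print v
            else print "unset"
        }
    ' "$1"
}

# Usage (path is an assumption):
#   get_check_workers /usr/local/nagios/etc/nagios.cfg
```

To apply the suggestion, the line in that file would read `check_workers=16`, followed by a restart of nagios.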
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Post by BanditBBS »

I did knock it down to 18; I will try 16. I'm all for trying things, but I will keep mentioning that it was running fine for so long, with just threshold changes and other minor things, and no major changes to anything.

Post by Box293 »

Have you started monitoring any new services or started using a new plugin?

Post by BanditBBS »

No new plugins in the past couple weeks at least and this issue just started happening Friday.

Just saw this in the log on the offloaded DB/NDO server:

Code: Select all

Aug 19 22:40:07 iss-chi-nag09 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 32768 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
Aug 19 22:40:09 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:09 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:12 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:12 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:14 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:14 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:16 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:16 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:19 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:19 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:21 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:21 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:23 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:23 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:25 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:25 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:27 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:27 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:30 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:30 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:32 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:32 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:35 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:35 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:37 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:37 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:39 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:39 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Aug 19 22:40:42 iss-chi-nag09 ndo2db: Message sent to queue.
Aug 19 22:40:42 iss-chi-nag09 ndo2db: Warning: queue send error, retrying...
Restarted NDO and all is well again
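The warning above points at the kernel's System V message queue limits. As a rough sketch, the helper below pulls the byte usage out of that ndo2db warning so a full queue is easy to spot in a long log. The function name is made up for illustration. The commands in the trailing comments are standard Linux tools for inspecting the limits, and any new values should come from the README the message refers to.

```shell
#!/usr/bin/env bash
# Read syslog lines on stdin and, for each ndo2db queue warning, print
# "<used>/<limit> bytes (<percent>%)".
queue_byte_usage() {
    sed -n 's/.*using [0-9]* of [0-9]* messages and \([0-9]*\) of \([0-9]*\) bytes.*/\1 \2/p' |
    awk '$2 > 0 { printf "%d/%d bytes (%d%%)\n", $1, $2, ($1 * 100) / $2 }'
}

# Usage (log path is an assumption):
#   grep ndo2db /var/log/messages | queue_byte_usage
#
# To inspect the current kernel limits and live queues on the NDO server:
#   sysctl kernel.msgmnb kernel.msgmax kernel.msgmni
#   ipcs -q
```

Run against the first warning line above, this prints `131072000/131072000 bytes (100%)`, meaning the queue had hit its byte limit when the send errors started.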

This was in the XI messages file:

Code: Select all

Aug 19 22:40:50 iss-chi-nag05 nagios: ndomod: Error writing to data sink!  Some output may get lost...
Aug 19 22:40:50 iss-chi-nag05 nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
Aug 19 22:41:06 iss-chi-nag05 nagios: ndomod: Successfully reconnected to data sink!  2869 items lost, 5000 queued items to flush.
Aug 19 22:41:06 iss-chi-nag05 nagios: ndomod: Successfully flushed 5000 queued items to data sink.