Nagios 4.2.3 crashes every few hours

Mike-sbg
Post by Mike-sbg »

Since the end of November I have been using Nagios 4.2.3 with Livestatus 1.2.8p14.

It worked without any problems while it was our test system, to which our company's check results were transferred via ochp and ocsp.

Then I made it our primary Nagios system, and since then it has crashed every few hours, which is no fun for me.
The problem persisted when we upgraded from Nagios 4.2.1 to 4.2.3 and changed Livestatus from 1.2.8p11 to 1.2.8p14.

Does anybody recognize this problem and can help me? I also reported it on GitHub, and J.Frickson suggested that I post it in this forum.
(The GitHub thread is https://github.com/NagiosEnterprises/na ... issues/299)
dwhitfield
Former Nagios Staff
Post by dwhitfield »

Unfortunately, mk-livestatus is not something we support, but John is right to point you here. There are plenty of community contributors that can possibly help out.

Have you looked at alternatives such as mod_gearman? I'll also note that we have our own offloader planned for release in April: https://www.nagios.com/roadmaps/

More directly to your issue: what are the differences between the test system and production? Is the load the same? Are the check types the same? Is the network the same? The more specific you can be, the easier it will be to determine the point of failure.

For Techs: files emailed to John are shared.
Mike-sbg
Post by Mike-sbg »

I used Nagios 3 for a few years, and at the beginning of 2016 I installed Nagios 4.

The Nagios 3 system forwarded every check result via ocsp and ochp (sending the results via nsca-ng) to our new Nagios 4 system.
At the beginning I had problems with the stability of the new Nagios system, but from August until the end of November everything worked fine, so I decided to promote it to our primary Nagios system.

The problems started two days afterwards.

What are the differences between the old and the new installation?

There are two main differences:

- The version of Nagios
- The primary Nagios gets passive check results via both nsca-ng and nsca

The nsca-ng checks worked in the test environment without any problem. The only difference I know of is that some results are now transmitted via nsca (old style), while in test mode every check was transmitted via nsca-ng (from the former primary Nagios).

By the way, I installed a second Nagios 4 system with the same software components. At the moment it gets no passive check results and its active checks are disabled. In short, it doesn't do anything with checks, but it is stable.
Today I'll try sending check results to the new "test-nagios" as well, to see whether it stays stable when it receives them.

So to be precise, these are the software components I use:
  • Nagios 4.2.3
  • Livestatus 1.2.8p14
  • nsca (server) 2.9.1
  • nsca-ng-server 1.4.1
  • everything installed on CentOS 7
rkennedy
Post by rkennedy »

Livestatus can definitely have an effect on this, which makes it pretty difficult for us to help. If the new system has the same errors, I'd be interested to see the logs it produces, as they might help figure out where the true issue lies.
Former Nagios Employee
Mike-sbg
Post by Mike-sbg »

Which log would help you? I already sent (crash) logs to J.Frickson, but they didn't lead to a clear solution.
How can I help us get a step closer to a solution?

As I use Thruk for presentation, I'm forced to use a tool like Livestatus. Can you suggest another solution?
dwhitfield
Former Nagios Staff
Post by dwhitfield »

Mike-sbg wrote: As I use Thruk for presentation, I'm forced to use a tool like Livestatus. Can you suggest another solution?
I've never used Thruk. What features does it have that you find useful?
Updates to the Nagios Core GUI are coming in 4.3, so if you can wait, that may be a solution. Unfortunately, I don't have anything more precise than what's posted at https://www.nagios.com/roadmaps/
Mike-sbg
Post by Mike-sbg »

I reverted to Nagios 3 to buy some time for further investigation.

As a first step I'll disable Livestatus. I'm curious whether it becomes stable then.
dwhitfield
Former Nagios Staff
Post by dwhitfield »

Mike-sbg wrote: I reverted to Nagios 3
Just so we have an idea of what bugs/features you have, which version of Nagios Core 3.x are you now running?
Mike-sbg
Post by Mike-sbg »

I have three different Nagios systems running at the moment:

* Nagios 3.5.1 with Livestatus 1.1.12p6 on CentOS 6 <- this is now the live system, which works perfectly
* Nagios 4.2.4 with Livestatus 1.2.8p11 on CentOS 7 -> crashes every few hours (but please read below)
* Nagios 4.2.4 without Livestatus on CentOS 7 -> has worked without any problems for 24 hours now

I experimented with the settings of the Livestatus broker module, using these parameters:

Code:

/usr/lib64/check_mk/livestatus.o /var/log/pnp4nagios/mklivestatus debug=1 log_file=/tmp/livestatus.log max_response_size=204857600 thread_stack_size=67108864

It worked perfectly for about 6 hours, then it crashed (this is the log of my watchdog):

Code:

Tue Dec 27 12:05:00 2016 ;ATNAGIOSSR03;CORE; Livestatus V1.2.8p11 with special parameters
Tue Dec 27 18:50:01 2016 ;NAGIOS;CORE; Nagios Restart 
Tue Dec 27 23:08:02 2016 ;NAGIOS;CORE; Nagios Restart 
Wed Dec 28 01:05:01 2016 ;NAGIOS;CORE; Nagios Restart

Without the cache parameters it crashes 1-2 times an hour!
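
To compare crash frequency between configurations, the gaps between entries in a watchdog log like the one above can be computed with a short script. This is only a sketch: it assumes the log keeps the `timestamp ;host;category; message` layout shown in the post, and the names `parse_line` and `restart_intervals` are made up for illustration (they are not part of Nagios or Livestatus).

```python
from datetime import datetime, timedelta

# Watchdog lines copied from the post above.
WATCHDOG_LOG = """\
Tue Dec 27 12:05:00 2016 ;ATNAGIOSSR03;CORE; Livestatus V1.2.8p11 with special parameters
Tue Dec 27 18:50:01 2016 ;NAGIOS;CORE; Nagios Restart
Tue Dec 27 23:08:02 2016 ;NAGIOS;CORE; Nagios Restart
Wed Dec 28 01:05:01 2016 ;NAGIOS;CORE; Nagios Restart
"""

# Assumed timestamp layout, e.g. "Tue Dec 27 12:05:00 2016".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_line(line):
    """Split one watchdog line into (timestamp, message)."""
    fields = [field.strip() for field in line.split(";")]
    return datetime.strptime(fields[0], TIMESTAMP_FORMAT), fields[-1]


def restart_intervals(log_text):
    """Return the time elapsed between each pair of consecutive log entries."""
    entries = [parse_line(line) for line in log_text.splitlines() if line.strip()]
    stamps = [stamp for stamp, _ in entries]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    gaps = restart_intervals(WATCHDOG_LOG)
    for gap in gaps:
        print(gap)
    # Mean uptime between entries; timedelta() is the start value for sum().
    print("mean:", sum(gaps, timedelta()) / len(gaps))
```

For the four lines above, the gaps come out as 6:45:01, 4:18:01, and 1:56:59, so the uptime between restarts shrank over the run rather than staying near the initial six hours or so.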
dwhitfield
Former Nagios Staff
Post by dwhitfield »

Mike-sbg wrote: * Nagios 4.2.4 with Livestatus 1.2.8p11 on CentOS 7 -> crashes every few hours (but please read below)
We can certainly leave this open for community input.

If your working 4.2.4 on CentOS 7 starts having issues, you should open a new thread. 24 hours isn't a ton of data, but it is highly likely that would be a different issue. Of course, you could always reference this thread just in case.