Nagios 4.2.3 Crash every few hours
Since the end of November I have been using Nagios 4.2.3 with Livestatus 1.2.8p14.
It worked without any problems while it was our test system, to which the check results of our company were transferred via ochp and ocsp.
Then I made it our primary Nagios system, and since then it has crashed every few hours, which is not much fun for me.
The problem persisted when we upgraded from Nagios 4.2.1 to 4.2.3 and changed Livestatus from 1.2.8p11 to 1.2.8p14.
Does anybody know this problem and can help me? I also posted it on GitHub, and J.Frickson suggested that I post the problem in this forum.
(The GitHub thread is https://github.com/NagiosEnterprises/na ... issues/299)
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Nagios 4.2.3 Crash every few hours
Unfortunately, mk-livestatus is not something we support, but John is right to point you here. There are plenty of community contributors that can possibly help out.
Have you looked at alternatives such as mod_gearman? I'll also note that we have our own offloader planned for release in April: https://www.nagios.com/roadmaps/
More directly to your issue...what are the differences in the test system and production? Is the load the same? Are the check types the same? Is the network the same? The more specific you can be, the easier it will be to determine the point of failure.
For Techs: files emailed to John are shared.
Re: Nagios 4.2.3 Crash every few hours
I used Nagios 3 for a few years, and at the beginning of 2016 I installed a Nagios 4 system.
The Nagios 3 system forwarded every check result via ocsp and ochp (sending the results via nsca-ng) to our new Nagios (v4) system.
At the beginning I had problems with the stability of the new Nagios system, but from August until the end of November everything worked fine, so I decided to promote it to our primary Nagios system.
The problems started two days afterwards.
What are the differences between the old and the new installation?
There are two main differences:
- the version of Nagios
- the primary Nagios gets passive check results via nsca-ng and via nsca
The NSCA-NG checks worked in the test environment without any problem. The only difference I know of is that some results are now transmitted via nsca (old style), while in test mode every check result was transmitted via nsca-ng (from the former primary Nagios).
By the way, I installed a second Nagios 4 system with the same software components, but at the moment it gets no passive check results and has active checks disabled. In short, it doesn't do anything with checks, but it is stable.
Today I'll try to send check results to the new "test-nagios" as well, to see whether it stays stable when it receives them.
So, to be precise, these are the software components I use:
- nagios v 4.2.3
- Livestatus 1.2.8p14
- nsca (Server) 2.9.1
- nsca-ng-server 1.4.1
- everything installed on centos 7
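For the planned test of feeding passive results into the idle test system, it may help to see what a passive service result looks like once it reaches Nagios. The sketch below only builds the text line for the standard PROCESS_SERVICE_CHECK_RESULT external command. It does not talk to nsca or nsca-ng, and the host name, service name and timestamp are hypothetical values chosen for illustration.

```python
import time


def format_passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    if timestamp is None:
        timestamp = int(time.time())
    # Fields are separated by semicolons and the command ends at the newline,
    # so host and service names must not contain either character.
    for field in (host, service):
        if ";" in field or "\n" in field:
            raise ValueError("host and service must not contain ';' or newlines")
    # Keep the plugin output on a single line (a simplification for this sketch).
    output = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


if __name__ == "__main__":
    # Hypothetical host and service names.
    print(format_passive_service_result("web01", "HTTP", 0, "OK - test result"), end="")
```

A line like this is what ends up in the external command file configured in nagios.cfg, so generating a steady stream of them is one way to load the test system with passive results in a controlled manner.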
Re: Nagios 4.2.3 Crash every few hours
Livestatus can definitely have an effect on this, which is why it'll be pretty difficult for us to help. If the new system has the same errors, I'd be interested to see the logs produced, as this might help figure out where the true issue lies.
Former Nagios Employee
Re: Nagios 4.2.3 Crash every few hours
Which log would help you? I already sent (crash) logs to J.Frickson, but they didn't lead to a clear solution.
How can I help so that we get a step closer to a solution?
As I use Thruk for presentation, I'm forced to use a tool like Livestatus ... can you suggest another solution?
Re: Nagios 4.2.3 Crash every few hours
Mike-sbg wrote: As I use thruk for presentation, I'm forced to use a tool like livestatus ... can you suggest me another solution?
I've never used thruk. What features does it have that you find useful?
Updates to the Nagios Core GUI are coming in 4.3, so if you can wait, that may be a solution. Unfortunately, I don't have anything more precise than what's posted at https://www.nagios.com/roadmaps/
Re: Nagios 4.2.3 Crash every few hours
I reverted to Nagios v3 to gain some time for further investigation.
As a first step I'll disable Livestatus ... I'm curious whether it becomes stable then ...
Re: Nagios 4.2.3 Crash every few hours
Mike-sbg wrote: I reverted to Nagios V 3
Just so we have an idea of what bugs/features you have, which version of Nagios Core 3.x are you now running?
Re: Nagios 4.2.3 Crash every few hours
I have 3 different Nagios systems running at the moment:
* Nagios 3.5.1 with Livestatus 1.1.12p6 on CentOS 6 <- this is now the live system, which works perfectly
* Nagios 4.2.4 with Livestatus 1.2.8p11 on CentOS 7 -> crashes every few hours (but please read below)
* Nagios 4.2.4 without any Livestatus on CentOS 7 -> has worked without any problems for 24 hours now
I experimented with the Livestatus broker module, using these parameters:
Code:
/usr/lib64/check_mk/livestatus.o /var/log/pnp4nagios/mklivestatus debug=1 log_file=/tmp/livestatus.log max_response_size=204857600 thread_stack_size=67108864
It worked perfectly for about 6 hours, then it crashed (this is the log of my watchdog):
Code:
Tue Dec 27 12:05:00 2016 ;ATNAGIOSSR03;CORE; Livestatus V1.2.8p11 with special parameters
Tue Dec 27 18:50:01 2016 ;NAGIOS;CORE; Nagios Restart
Tue Dec 27 23:08:02 2016 ;NAGIOS;CORE; Nagios Restart
Wed Dec 28 01:05:01 2016 ;NAGIOS;CORE; Nagios Restart
Without the cache parameters it crashes 1-2 times an hour!
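To put numbers on "about 6 hours", the watchdog entries can be turned into uptime intervals. This is a minimal sketch that assumes the first entry marks the start with the special parameters and every later entry marks a restart after a crash. The function names are made up for this example, and the log lines are copied from the watchdog output in this post.

```python
from datetime import datetime

# Watchdog entries copied from the post.
WATCHDOG_LOG = """\
Tue Dec 27 12:05:00 2016 ;ATNAGIOSSR03;CORE; Livestatus V1.2.8p11 with special parameters
Tue Dec 27 18:50:01 2016 ;NAGIOS;CORE; Nagios Restart
Tue Dec 27 23:08:02 2016 ;NAGIOS;CORE; Nagios Restart
Wed Dec 28 01:05:01 2016 ;NAGIOS;CORE; Nagios Restart
"""


def parse_events(text):
    """Return a list of (timestamp, message) tuples, one per non-empty log line."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: "<timestamp> ;<host>;<component>; <message>"
        stamp, _host, _component, message = (part.strip() for part in line.split(";", 3))
        events.append((datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), message))
    return events


def uptimes(events):
    """Time elapsed between each pair of consecutive log entries."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    for interval in uptimes(parse_events(WATCHDOG_LOG)):
        print(interval)
```

For these four entries the intervals come out as 6:45:01, 4:18:01 and 1:56:59, so the time between restarts got shorter over the run instead of staying near six hours.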
Re: Nagios 4.2.3 Crash every few hours
Mike-sbg wrote: * Nagios 4.2.4 with Livestatus 1.2.8p11 on Centos 7 -> crashes every few hours (but please read below)
We can certainly leave this open for community input.
If your working 4.2.4 on CentOS 7 starts having issues, you should open a new thread. 24 hours isn't a ton of data, but it is highly likely that it would be a different issue. Of course, you could always reference this thread just in case.