NSCA 2.9 client problem


Re: NSCA 2.9 client problem

Post by tgriep » Tue Apr 16, 2019 9:09 am

The "Suppressed messages" entries mean the system is generating a lot of messages and journald is configured to drop some of them. This is called rate limiting, and it keeps the logging system from being overloaded.
To get all messages for troubleshooting, you need to raise these limits. You can do that by setting the variables RateLimitInterval and RateLimitBurst in the config file /etc/systemd/journald.conf.
To turn off rate limiting entirely, set either value to 0.

After changing those settings, restart systemd-journald so they take effect, then see if the messages are logged and post them here.
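
To double-check what the file actually sets before going back to the logs, a small script along these lines may help. It is only a sketch: the function names and the sample text are made up for illustration, and the extra key name RateLimitIntervalSec is an assumption based on newer systemd man pages, so compare it with journald.conf(5) on your system.

```python
import configparser

# Key names to look for. RateLimitIntervalSec is an assumption: newer systemd
# man pages use that spelling for the interval setting.
RATE_LIMIT_KEYS = ("RateLimitInterval", "RateLimitIntervalSec", "RateLimitBurst")


def rate_limit_settings(conf_text):
    """Return the rate-limit keys explicitly set in the [Journal] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the key case instead of lowercasing it
    parser.read_string(conf_text)
    if not parser.has_section("Journal"):
        return {}
    journal = parser["Journal"]
    return {key: journal[key].strip() for key in RATE_LIMIT_KEYS if key in journal}


def rate_limiting_disabled(settings):
    """True if any rate-limit key is set to 0 (the 'turn it off' case above)."""
    return any(value == "0" for value in settings.values())


# Hypothetical file content for illustration only. For the real file, pass
# open("/etc/systemd/journald.conf").read() instead.
SAMPLE = """\
[Journal]
#Storage=auto
RateLimitInterval=0
#RateLimitBurst=1000
"""

settings = rate_limit_settings(SAMPLE)
print(settings, rate_limiting_disabled(settings))
```

Commented-out lines (the defaults shipped in the file) are ignored, so an empty result means nothing has been overridden yet.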
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 7767
Joined: Thu Oct 30, 2014 9:02 am

Re: NSCA 2.9 client problem

Post by lhozzan » Wed Apr 17, 2019 8:03 am

Hello.

Thank you for the advice. I tried it, but nothing unusual was logged.
Code:
Apr 17 12:43:21 localhost nsca[26319]: Caught SIGTERM - shutting down...
Apr 17 12:43:21 localhost systemd[1]: Stopping NSCA for uk cluster...
Apr 17 12:43:21 localhost nsca[26319]: Cannot remove pidfile '/var/run/nsca_uk.pid' - check your privileges.
Apr 17 12:43:21 localhost nsca[26319]: Daemon shutdown
Apr 17 12:43:21 localhost systemd[1]: Stopped NSCA for uk cluster.
Apr 17 12:43:21 localhost systemd[1]: Starting NSCA for uk cluster...
Apr 17 12:43:21 localhost systemd[1]: Started NSCA for uk cluster.
Apr 17 12:43:21 localhost nsca[19077]: Starting up daemon
Apr 17 12:43:43 localhost nagios: job 6192 (pid=19268): read() returned error 11
Apr 17 12:43:54 localhost nagios: job 6192 (pid=19364): read() returned error 11
Apr 17 12:48:43 localhost nagios: job 6201 (pid=21905): read() returned error 11
Apr 17 12:48:53 localhost nagios: job 6201 (pid=21990): read() returned error 11
Apr 17 12:48:54 localhost nagios: job 6201 (pid=22004): read() returned error 11
Apr 17 12:48:57 localhost nagios: job 6201 (pid=22039): read() returned error 11
Apr 17 12:50:01 localhost systemd[1]: Started Session 383 of user root.
Apr 17 12:57:05 localhost nagios: job 6215 (pid=26319): read() returned error 11
Apr 17 12:57:55 localhost nsca[19077]: Caught SIGTERM - shutting down...
Apr 17 12:57:55 localhost systemd[1]: Stopping NSCA for uk cluster...
Apr 17 12:57:55 localhost nsca[19077]: Cannot remove pidfile '/var/run/nsca_uk.pid' - check your privileges.
Apr 17 12:57:55 localhost nsca[19077]: Daemon shutdown
Apr 17 12:57:55 localhost systemd[1]: Stopped NSCA for uk cluster.

Of course, this is the output after filtering with grep, as before.

Do you have any idea what to check next to get to a working solution?

Thank you for your effort.
lhozzan
 
Posts: 18
Joined: Wed Mar 20, 2019 10:43 am

Re: NSCA 2.9 client problem

Post by tgriep » Wed Apr 17, 2019 8:56 am

Check the permissions of where the NSCA PID file is created.
Apr 17 12:43:21 localhost nsca[26319]: Cannot remove pidfile '/var/run/nsca_uk.pid' - check your privileges.
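
One thing worth keeping in mind here: removing a file needs write and search permission on the directory that contains it, not on the file itself, so the mode of the pid file alone does not settle the question. A minimal sketch of that check follows. `can_unlink` is a hypothetical helper name, sticky-bit directories are ignored, and the result only means something when the script runs as the account the NSCA daemon runs as.

```python
import os
import tempfile


def can_unlink(path):
    """Report whether the current user may remove `path`.

    Unlinking is decided by the parent directory: the user needs write and
    search (execute) permission there. Sticky-bit rules are not considered.
    """
    directory = os.path.dirname(os.path.abspath(path))
    return os.access(directory, os.W_OK | os.X_OK)


# Demo in a throwaway directory: the file is made read-only, yet the check
# still looks only at the directory that holds it.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_pid = os.path.join(demo_dir, "demo.pid")
    with open(demo_pid, "w") as handle:
        handle.write("12345\n")
    os.chmod(demo_pid, 0o444)
    print(can_unlink(demo_pid))
```

For the case in the log, the call would be `can_unlink("/var/run/nsca_uk.pid")`, run as the daemon's user.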


Other than that, the logs don't show much other than the daemon starting and stopping.
Question: did you go back to running the NSCA server as a daemon, or did you leave it running out of xinetd?

Do this: when there are stuck connections on the Nagios server, note the IP addresses.
Go to the remote systems at those IP addresses and see if the send_nsca application is still running and holding open the connection.
If so, stop it from running and see if the connection is closed on the Nagios server.
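
To collect those IP addresses, something like the sketch below could be used to count CLOSE-WAIT connections per remote address from `ss -tan` output. The function names and the sample text are hypothetical, the column layout is assumed to be the usual State / Recv-Q / Send-Q / Local / Peer order, and port 5667 is only the stock NSCA default, so pass the port your instance actually listens on.

```python
import subprocess
from collections import Counter


def close_wait_peers(ss_output, port=5667):
    """Count CLOSE-WAIT connections on local `port`, grouped by peer IP."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "CLOSE-WAIT":
            continue
        local, peer = fields[3], fields[4]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[peer.rsplit(":", 1)[0]] += 1
    return counts


def live_ss_output():
    """Run `ss -tan` on this host and return its text output."""
    result = subprocess.run(["ss", "-tan"], capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample using documentation addresses, for illustration only.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
CLOSE-WAIT 1      0      192.0.2.10:5667     192.0.2.21:41234
CLOSE-WAIT 1      0      192.0.2.10:5667     192.0.2.21:41240
CLOSE-WAIT 1      0      192.0.2.10:5667     192.0.2.22:50012
ESTAB      0      0      192.0.2.10:5667     192.0.2.23:50100
CLOSE-WAIT 1      0      192.0.2.10:22       192.0.2.24:50200
"""

for peer_ip, count in close_wait_peers(SAMPLE).most_common():
    print(peer_ip, count)
```

On the Nagios server itself, `close_wait_peers(live_ss_output())` would give the list of clients to visit first.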

Re: NSCA 2.9 client problem

Post by lhozzan » Thu Apr 18, 2019 2:45 am

Hello.

tgriep wrote: Check the permissions of where the NSCA PID file is created.

Code:
-rw-r--r--  1 nagios nagios    5 apr 15 07:07 nsca_uk.pid

I think we can ignore this warning. It only concerns shutdown. After a shutdown the PID file persists, but when the process is started again, the correct PID is written to it. If you wish, I can change the unit file and put the PID file in another location.

tgriep wrote: Other than that, the logs don't show much other than the daemon starting and stopping.

Unfortunately, yes. I have no idea what is wrong, why so many connections end up in CLOSE_WAIT, or what to try next.

tgriep wrote: Question: did you go back to running the NSCA server as a daemon, or did you leave it running out of xinetd?

Yes, it runs as a daemon directly under systemd. Running it under xinetd cost a huge amount of CPU power.

tgriep wrote: Do this: when there are stuck connections on the Nagios server, note the IP addresses.
Go to the remote systems at those IP addresses and see if the send_nsca application is still running and holding open the connection.
If so, stop it from running and see if the connection is closed on the Nagios server.

So, you want me to let NSCA use up all available connections and then investigate whether something on the client side is holding connections open? Just a question to clarify.

Thank you for your effort.

Re: NSCA 2.9 client problem

Post by tgriep » Thu Apr 18, 2019 1:37 pm

To answer your question
"So, you want me to let NSCA use up all available connections and then investigate whether something on the client side is holding connections open?"
Yes. Set up a client with NSCA 2.9.2 and see if that machine causes the issue to happen; if so, check the client's log files to see if there are any errors there.