For some service we are only getting recovery mail

chintan1511
Posts: 5
Joined: Tue Jun 18, 2019 5:26 am

Post by chintan1511 »

Hi Team,

We have been using Nagios Core for the last 4 months, and it has been working very well. Currently, however, we are getting only recovery emails for some services on our VMs, and this happens continuously. When we checked the logs, we saw "CRITICAL - Socket timeout" for those services, but Nagios only sends the recovery email. We are not getting any problem notification.

Can anyone help us?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: For some service we are only getting recovery mail

Post by scottwilkerson »

What version of Nagios Core are you using?

There were a few bugs related to this that should be resolved in 4.4.3
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
chintan1511

Re: For some service we are only getting recovery mail

Post by chintan1511 »

Hi Team,

Thanks for the reply. Currently, we are running 4.4.2.

Here are my log details:
[1560848682] HOST ALERT: myProdVM;DOWN;SOFT;1;TCP CRITICAL - Invalid hostname, address or socket: myProdVM.url.com
[1560848727] SERVICE ALERT: myProdVM;C:\ Drive Space;CRITICAL;HARD;1;CRITICAL - Socket timeout
[1560848728] SERVICE ALERT: myProdVM;CPU Load;CRITICAL;HARD;1;CRITICAL - Socket timeout
[1560848744] HOST ALERT: myProdVM;UP;SOFT;1;TCP OK - 0.001 second response time on myProdVM.url.com port 12489
[1560848848] SERVICE NOTIFICATION: nagiosadmin;myProdVM;C:\ Drive Space;OK;notify-service-by-email;c:\ - total: 126.51 Gb - used: 55.41 Gb (44%) - free 71.10 Gb (56%)
[1560848848] SERVICE NOTIFICATION: nagiosadmin2;myProdVM;C:\ Drive Space;OK;notify-service-by-email;c:\ - total: 126.51 Gb - used: 55.41 Gb (44%) - free 71.10 Gb (56%)
[1560848848] SERVICE ALERT: myProdVM;C:\ Drive Space;OK;HARD;1;c:\ - total: 126.51 Gb - used: 55.41 Gb (44%) - free 71.10 Gb (56%)
[1560848848] SERVICE NOTIFICATION: nagiosadmin;myProdVM;CPU Load;OK;notify-service-by-email;CPU Load 2% (5 min average)
[1560848848] SERVICE NOTIFICATION: nagiosadmin2;myProdVM;CPU Load;OK;notify-service-by-email;CPU Load 2% (5 min average)
[1560848848] SERVICE ALERT: myProdVM;CPU Load;OK;HARD;1;CPU Load 2% (5 min average)
(Note: The host name and host URL have been changed.)
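
To follow the order of events in a log excerpt like the one above, it can help to convert the epoch timestamps and split each entry into its fields. The following is a minimal Python sketch, not part of Nagios; `parse_line` is a hypothetical helper, and it assumes the `[epoch] TYPE: field;field;...` layout seen in the excerpt.

```python
import re
from datetime import datetime, timezone

# Assumed layout: "[epoch] ENTRY TYPE: semicolon-separated fields"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_line(line):
    """Split one log line into (utc_time, entry_type, fields), or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    epoch, entry_type, rest = match.groups()
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # Naive split: plugin output containing ";" would be split as well.
    return when, entry_type, rest.split(";")


# Lines copied from the excerpt above (raw strings keep the backslash intact).
sample = [
    r"[1560848682] HOST ALERT: myProdVM;DOWN;SOFT;1;TCP CRITICAL - Invalid hostname, address or socket: myProdVM.url.com",
    r"[1560848727] SERVICE ALERT: myProdVM;C:\ Drive Space;CRITICAL;HARD;1;CRITICAL - Socket timeout",
    r"[1560848744] HOST ALERT: myProdVM;UP;SOFT;1;TCP OK - 0.001 second response time on myProdVM.url.com port 12489",
]

for line in sample:
    parsed = parse_line(line)
    if parsed is not None:
        when, entry_type, fields = parsed
        print(when.strftime("%Y-%m-%d %H:%M:%S"), entry_type, fields[:4])
```

Printing the host and service entries side by side this way makes it easier to see which state the host was in when each service changed state.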

The issue did not occur during the first 3 months. Now we are getting only recovery emails for some services.

Is this the same bug that is fixed in 4.4.3? Could you please elaborate on the issue?
scottwilkerson

Re: For some service we are only getting recovery mail

Post by scottwilkerson »

chintan1511 wrote: The issue did not occur during the first 3 months. Now we are getting only recovery emails for some services.

Is this the same bug that is fixed in 4.4.3? Could you please elaborate on the issue?
Yes, that looks like it could be it.

https://github.com/NagiosEnterprises/na ... /Changelog

Specifically, it could be caused by the bugs addressed in these changelog entries:

* Fixed services sending recovery emails when they recover if host in down state (#572) (Scott Wilkerson)
* Fixed services in soft states sometimes not switching into hard states (#576) (Jake Omann)
chintan1511

Re: For some service we are only getting recovery mail

Post by chintan1511 »

Hi Team,

I upgraded to version 4.4.3.

I am still getting only recovery emails. There is also another issue: for some services, I get the problem notification but no recovery email.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: For some service we are only getting recovery mail

Post by mcapra »

The \ characters in the service status could be tripping some things up. Are the notifications you're experiencing problems with all related to Windows disk checks?

Depending on the underlying notification command you have defined, this un-escaped character could be causing issues when the command fully evaluates and is executed.
Former Nagios employee
https://www.mcapra.com/
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: For some service we are only getting recovery mail

Post by cdienger »

Are you only seeing this with Windows disk checks as @mcapra asked about?

I'd be curious to see the configuration files for these hosts and services to see if we can reproduce it here.
chintan1511

Re: For some service we are only getting recovery mail

Post by chintan1511 »

Hi Team,

Sorry for the delay in replying.
No. We are also seeing this for memory usage, CPU Load, etc.
Can I share the log details instead? Is there any way to do this without sending the host and service configuration files? As per our policy, we can't share our data.

Thanks for helping.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: For some service we are only getting recovery mail

Post by ssax »

You can show us the nagios.log entries for the host AND this host's services (after the upgrade) so that we can see exactly what is occurring (we need to see all HARD/SOFT states for BOTH the HOST AND the SERVICES over this time period so that we can see what state they are all in when things occur).

What do the services show for the notification_options in your objects.cache? The contacts? The host?

- Check the HOST notification_options to make sure you have Down, Unreachable, and Recovery selected
- Check the SERVICE notification_options to make sure you have Warning, Critical, and Recovery selected
- Check the CONTACT definitions and make sure they have Warning, Critical, and Recovery selected for services (and Down and Recovery for hosts)
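
The checks above can be sketched as a small Python script that lists every `*notification_options` directive found in objects.cache. This is only an illustration: `parse_objects` and `notification_options` are hypothetical helpers, the sample text and its values are made up for the example, and the parser assumes the usual `define type { ... }` layout with one directive per line.

```python
import re

# Hypothetical excerpt in the objects.cache layout; the values are illustrative only.
SAMPLE = """
define host {
    host_name    myProdVM
    notification_options    d,r
    }

define service {
    host_name    myProdVM
    service_description    CPU Load
    notification_options    w,u,c,r
    }

define contact {
    contact_name    nagiosadmin
    service_notification_options    w,u,c,r,f,s
    host_notification_options    d,u,r,f,s
    }
"""

DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{")


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) pairs."""
    objects = []
    kind, directives = None, None
    for raw in text.splitlines():
        line = raw.strip()
        start = DEFINE_RE.match(line)
        if start:
            kind, directives = start.group(1), {}
        elif line == "}" and directives is not None:
            objects.append((kind, directives))
            kind, directives = None, None
        elif directives is not None and line and not line.startswith("#"):
            # First token is the directive name, the remainder is its value.
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def notification_options(objects):
    """Collect (type, name, directive, value) for every *notification_options directive."""
    rows = []
    for kind, directives in objects:
        name = (directives.get("service_description")
                or directives.get("contact_name")
                or directives.get("host_name", "?"))
        for key, value in directives.items():
            if key.endswith("notification_options"):
                rows.append((kind, name, key, value))
    return rows


for row in notification_options(parse_objects(SAMPLE)):
    print(*row)
```

To run it against a real installation, read the file named by the `object_cache_file` setting in nagios.cfg and pass its contents to `parse_objects` instead of `SAMPLE`.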
chintan1511

Re: For some service we are only getting recovery mail

Post by chintan1511 »

Here are the log details:
[1563423742] SERVICE ALERT: myProdVM;Nagios Client;CRITICAL;SOFT;1;CRITICAL - Socket timeout
[1563423753] HOST ALERT: myProdVM;DOWN;SOFT;1;CRITICAL - Socket timeout
[1563423805] HOST ALERT: myProdVM;UP;SOFT;1;TCP OK - 0.001 second response time on url.domain.com port 12489
[1563423892] SERVICE ALERT: myProdVM;Nagios Client;CRITICAL;SOFT;2;CRITICAL - Socket timeout
[1563423920] SERVICE ALERT: myProdVM;nssm;CRITICAL;SOFT;1;CRITICAL - Socket timeout
[1563424012] SERVICE NOTIFICATION: nagiosadmin;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running
[1563424012] SERVICE NOTIFICATION: nagiosadmin2;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running
[1563424012] SERVICE NOTIFICATION: nagiosadmin3;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running
[1563424012] SERVICE NOTIFICATION: nagiosadmin4;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running
[1563424012] SERVICE ALERT: myProdVM;Nagios Client;OK;HARD;3;nscp.exe: Running
[1563424040] SERVICE ALERT: myProdVM;nssm;OK;SOFT;2;nssm.exe: Running
(Note: The host name and host URL have been changed.)
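
One way to spot the pattern in this excerpt (a recovery notification with no earlier problem notification for the same service) is to scan the SERVICE NOTIFICATION entries in order. The Python sketch below is an illustration only; `orphan_recoveries` is a hypothetical helper, and it assumes the field order `contact;host;service;state;command;output` seen in the log lines above.

```python
import re

NOTIFY_RE = re.compile(r"^\[(\d+)\] SERVICE NOTIFICATION: (.*)$")
PROBLEM_STATES = {"WARNING", "CRITICAL", "UNKNOWN"}


def orphan_recoveries(lines):
    """Return (epoch, host, service) for OK notifications not preceded by a problem notification."""
    open_problems = set()   # services with a problem notification still awaiting recovery
    closed_at = {}          # (host, service) -> epoch of the recovery that closed a notified problem
    orphans = []
    for line in lines:
        match = NOTIFY_RE.match(line.strip())
        if match is None:
            continue
        epoch = int(match.group(1))
        fields = match.group(2).split(";", 5)
        if len(fields) < 4:
            continue
        _contact, host, service, state = fields[:4]
        key = (host, service)
        if state in PROBLEM_STATES:
            open_problems.add(key)
        elif state == "OK":
            if key in open_problems:
                open_problems.discard(key)
                closed_at[key] = epoch
            elif closed_at.get(key) != epoch:
                # Several contacts receive the same recovery; report it once.
                entry = (epoch, host, service)
                if entry not in orphans:
                    orphans.append(entry)
    return orphans


# Lines copied from the excerpt above.
log = [
    "[1563423742] SERVICE ALERT: myProdVM;Nagios Client;CRITICAL;SOFT;1;CRITICAL - Socket timeout",
    "[1563423892] SERVICE ALERT: myProdVM;Nagios Client;CRITICAL;SOFT;2;CRITICAL - Socket timeout",
    "[1563424012] SERVICE NOTIFICATION: nagiosadmin;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running",
    "[1563424012] SERVICE NOTIFICATION: nagiosadmin2;myProdVM;Nagios Client;OK;notify-service-by-email;nscp.exe: Running",
    "[1563424012] SERVICE ALERT: myProdVM;Nagios Client;OK;HARD;3;nscp.exe: Running",
]

print(orphan_recoveries(log))
```

For these lines the script reports the Nagios Client recovery at 1563424012: the service only reached CRITICAL in SOFT states, so no problem notification was logged before the recovery went out.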

Here are the notification_options details for the host, services, and contacts:
Service notification_options: w,u,c,r
Host notification_options: d,r
Contact definition: service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
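
As a rough cross-check of the values above, the notification types allowed by both the object and the contact can be computed as a set intersection. This Python sketch is a simplification: `effective_options` is a hypothetical helper, and Nagios applies further filters (time periods, host state, whether notifications are enabled, and so on) that are not modeled here.

```python
def effective_options(object_options, contact_options):
    """Return the option letters enabled on both the object and the contact."""
    obj = {opt.strip() for opt in object_options.split(",") if opt.strip()}
    contact = {opt.strip() for opt in contact_options.split(",") if opt.strip()}
    return sorted(obj & contact)


# Values as reported above.
service_effective = effective_options("w,u,c,r", "w,u,c,r,f,s")
host_effective = effective_options("d,r", "d,u,r,f,s")

print("service:", service_effective)
print("host:", host_effective)
```

With these values, critical (c) and recovery (r) are enabled on both sides for services, so the option lists alone do not appear to explain why only recovery emails arrive.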

Could you please tell us what we should change so that we no longer receive only recovery emails for some services?

Thanks for your support.