Nagios server /var disk space issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
lgaddam
Posts: 116
Joined: Wed Aug 28, 2019 1:01 am

Nagios server /var disk space issue

Post by lgaddam »

Hi Team,

We are facing issues with /var on our secondary server: disk usage keeps growing and is now above 80%.

Earlier, we extended /var by 1 GB when it went above 80%, but the data on the file system is still growing.

I had a look at this. There are thousands of files under /var/spool/clientmqueue, and this path is using more than 7 GB.

[root@02 spool]# du -sh /var/spool/* |sort -n|grep G
7.1G /var/spool/clientmqueue
[root@02 spool]#

[root@02 spool]# pwd
/var/spool/clientmqueue
[root@02 spool]# ls -l|wc -l
1669695

Please check whether you have come across this clientmqueue directory under /var/spool. Kindly help.
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios server /var disk space issue

Post by jdunitz »

/var/spool/clientmqueue is where outgoing mail goes when it can't be sent right away. Perhaps the remote mailserver wasn't reachable, or for some other reason, the mail got thrown into clientmqueue, where it will be retried until it is successfully sent, or exceeds the retry limit.

If you've got 7GB of queued-up mail, I'd say one of the following is probably true:

1) Sendmail isn't running, or isn't running properly. You should have a queue runner showing up in ps if you've got mail in the queue:

Code:

# ps -efa | grep sendmail
root     26049     1  0 Feb17 ?        00:00:19 sendmail: accepting connections
smmsp    26062     1  0 Feb17 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
2) Perhaps your mail server (sendmail or postfix, whichever you're running) isn't configured correctly for your environment

3) Perhaps you're just sending a massive amount of mail on a daily basis, and you're filling up your queue for that reason.

...and perhaps reason #3 is related to something in your XI setup. You might be sending mail to undeliverable addresses. You might have a cron job that's failing often and generating an email every time it does. There could be any number of other reasons, and you'll need to do a bit of investigating to figure out what is happening.

Some things that can help discover the reason for your enormous mail queue:

1) The "mailq" command will show you what messages are in the queue, as well as who they're from and who they're going to. Pay close attention to this, because it will give you some very good clues.
2) /var/log/maillog will show you loads of useful info about what's happening with your mail. There will be a lot of good clues in here as well.
3) Is the load on your XI server very high?
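
With more than a million files in the queue, mailq output can be slow and hard to read, so a few small helpers may be quicker for a first look. This is only a sketch: the function names are made up for illustration, and the `stat=Deferred` and `to=` patterns assume sendmail-style entries in the maillog, so adjust them to what your log actually contains.

```shell
#!/bin/sh
# Count the files directly inside a queue directory.
# find streams the names, so it copes with millions of entries.
count_queue_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Count log lines that report a deferred delivery
# (assumes sendmail-style "stat=Deferred" entries).
count_deferred() {
    grep -ci 'stat=deferred' "$1"
}

# Show the most frequent recipients in the log.
# $1 = log file, $2 = how many to show (default 5).
top_recipients() {
    grep -o 'to=[^,]*' "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

For example, `count_queue_files /var/spool/clientmqueue` followed by `top_recipients /var/log/maillog` should show fairly quickly whether one or two addresses account for most of the backlog.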

If you just want to flush your queue and start over (i.e., you don't care if your outgoing mail doesn't actually get sent, which you might not, because if they're notification mails from Nagios and they've been sitting in the queue for hours or days, they won't be relevant anymore...), you can do this (postsuper is a Postfix command, so it applies only if Postfix is your mail server):

Code:

# postsuper -d ALL
And keep a close watch on your mail queue for a while, to see if messages start piling up right away.
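
One caveat: postsuper is part of Postfix. If the server is actually running sendmail, as /var/spool/clientmqueue suggests, the usual way to discard the queue is to stop sendmail and remove the queue files. Here is a minimal sketch of that, assuming you really do want to throw away everything in the directory; the function name is made up for illustration.

```shell
#!/bin/sh
# Delete every regular file directly inside a queue directory.
# find -delete avoids the "Argument list too long" error that "rm *"
# can hit when a directory holds more than a million entries.
purge_queue_dir() {
    dir="$1"
    # Refuse an empty or missing directory so a typo can't wipe another path.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    find "$dir" -maxdepth 1 -type f -delete
}
```

With sendmail stopped, you would call it as `purge_queue_dir /var/spool/clientmqueue`, then start sendmail again.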

Hope that helps!

Edit:
Also, please have a look at these three documents that explain some details of setting up users, contacts, and emails.
Part of your problem may be related to how you have those set up.
https://assets.nagios.com/downloads/nag ... ntacts.pdf
https://assets.nagios.com/downloads/nag ... Mailer.pdf
https://assets.nagios.com/downloads/nag ... ations.pdf
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
lgaddam
Posts: 116
Joined: Wed Aug 28, 2019 1:01 am

Re: Nagios server /var disk space issue

Post by lgaddam »

Thanks for the useful information.

I have informed the Linux admins so they could check the /var issue on my secondary Nagios server, and they took a look at the machines.
They identified that the files piling up in clientmqueue are related to Nagios notifications.
When I sent a test email, the maillog showed the message as "deferred"; they said the email was not sent.
So they checked whether sendmail was running and found that it was stopped.

Once we started sendmail, the number of files in the queue stopped increasing.
About 12 hours later, file system usage dropped automatically from 85% to 44%.

Many thanks for your suggestion.

I have a few queries here.

1. Email notifications are already being sent from the primary. Is it necessary to send email notifications from the secondary as well? Users will get the same alert twice. (Each day the primary backup gets restored on the secondary.)
2. How can I stop sending email notifications from the secondary without queuing them in clientmqueue?
3. Please give me some suggestions on how we should handle these email notifications.
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios server /var disk space issue

Post by jdunitz »

If I understand correctly, you have two Nagios servers that are supposed to be set up in a failover setup, correct?
If your primary Nagios machine stops working, you want the secondary to take over, yes?

If you want to avoid having duplicate emails and text messages for every event, you can disable active checks and notifications on the secondary server, and arrange to have them enabled if the primary server becomes unreachable.

Here is a page of all the things you can automate from the command line to manipulate service checks, host checks, notifications, restart Nagios, and more:

https://assets.nagios.com/downloads/nag ... ernalcmds/

Exactly how you put it together is up to you, but the gist of it is that you'll probably have a script running from cron on the secondary to check the state of the primary. You might also want something running on the primary to have it disable itself if it determines that it has gone offline, etc.
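
A rough sketch of that cron-driven check on the secondary might look like the following. Treat it as a starting point under a few assumptions: the host name is a placeholder, the command file path is the usual default and may differ on your system, and a single ping (with the Linux `-W` timeout in seconds) is a fairly crude test of whether the primary is healthy.

```shell
#!/bin/sh
# Hypothetical values: adjust the host name and command file path to your setup.
PRIMARY_HOST="${PRIMARY_HOST:-nagios-primary.example.com}"
COMMAND_FILE="${COMMAND_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Succeeds if the primary answers one ping within 5 seconds (Linux ping syntax).
primary_is_up() {
    ping -c 1 -W 5 "$1" >/dev/null 2>&1
}

# Write one external command, stamped with the current epoch time.
send_nagios_command() {
    printf '[%s] %s\n' "$(date +%s)" "$1" > "$COMMAND_FILE"
}

# Disable notifications while the primary is reachable, enable them otherwise.
# $1 is the probe to run; it defaults to primary_is_up.
sync_notifications() {
    probe="${1:-primary_is_up}"
    if "$probe" "$PRIMARY_HOST"; then
        send_nagios_command DISABLE_NOTIFICATIONS
    else
        send_nagios_command ENABLE_NOTIFICATIONS
    fi
}
```

To use it, add a final line that calls `sync_notifications` and run the script from cron every minute or so. In practice you'd probably want to require several failed probes in a row before switching, so that one dropped ping doesn't turn notifications on.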

We have a document that explains much of this:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

There are more advanced possibilities for synchronization, if you're interested. Have a look at this:
https://allmybase.com/2010/10/04/settin ... s-servers/
That's not something we can support directly, but I'm happy to point you in the right direction.

I hope this helps solve your issue!

--Jeffrey
lgaddam
Posts: 116
Joined: Wed Aug 28, 2019 1:01 am

Re: Nagios server /var disk space issue

Post by lgaddam »

Hi Jeff,

Thanks for your valuable input. It looks interesting.

My environment has two Nagios instances on standalone physical servers.
If the primary goes down, users switch to the secondary Nagios instance.

It is not a failover setup; I manually ask all users to use the secondary when the primary is offline.
But both instances are up, running, and active all the time.

So at the moment both are active, and email notifications are being sent from both instances.

When I started sendmail, lots of old notifications from the secondary flooded users' mailboxes with the current timestamp.
Users started escalating, asking why they received so many node-down notifications when the servers are currently up in Nagios.

So I want to stop notifications on the secondary while the primary is up and running.
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios server /var disk space issue

Post by jdunitz »

On your secondary server, you probably want to disable notifications until something happens to the primary.

Then at the same time when you tell everyone to start using the other XI server, you'd run a script to turn notifications back on on the secondary. It will still have been monitoring all this time, but just won't send emails until you tell it to.

In addition to that, you could consider having something to monitor sendmail to make sure you don't queue up a bunch of old messages again.
Having sendmail die isn't usually a very common occurrence, but in your case it'd be a major annoyance, so you can take measures to avoid that.
You could do this with Nagios (i.e., check_procs), or just a plain shell script running from cron.
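
For the plain shell script option, something along these lines could run from cron. It's only a sketch: the function names are made up, the process name `sendmail` is an assumption (check it with `ps -e -o comm=`), and the restart command depends on your init system.

```shell
#!/bin/sh
# Succeeds if at least one process with exactly this name is running.
process_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Run the given restart command only when the named process is missing.
# $1 = process name, remaining arguments = command used to restart it.
ensure_running() {
    name="$1"
    shift
    if process_running "$name"; then
        return 0
    fi
    echo "$name is not running; attempting restart" >&2
    "$@"
}
```

On a systemd machine the last line of the script might be `ensure_running sendmail systemctl start sendmail`; on older systems `service sendmail start` would take the place of the systemctl call.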

For the primary-to-secondary notifications off/on script, you could do something as simple as this:

Code:

#!/bin/sh
# Turn all Nagios notifications off or on through the external command file.

commandfile='/usr/local/nagios/var/rw/nagios.cmd'

case "$1" in
    off)
        echo "Disabling all notifications."
        command='DISABLE_NOTIFICATIONS'
        ;;
    on)
        echo "Enabling all notifications."
        command='ENABLE_NOTIFICATIONS'
        ;;
    *)
        # Covers both a missing argument and an unrecognized one.
        echo "You didn't say off or on."
        exit 1
        ;;
esac

# External commands take the form "[epoch-seconds] COMMAND".
now=$(date +%s)
/bin/printf "[%lu] %s\n" "$now" "$command" > "$commandfile"

You could save that script in /root/notification-switcher.sh, and then run it as "/root/notification-switcher.sh off" or "/root/notification-switcher.sh on". You can call it whatever you like, and put it wherever makes sense to you.

You can see the status of the notifications by going to Admin->Monitoring Engine Status (see the attached screenshot, NotificationStatus.PNG).
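
If you'd rather confirm the state from the command line, one possible approach is to read the `enable_notifications` value from the `programstatus` block of the Nagios status file. This is a sketch that assumes the usual status.dat layout and the default path /usr/local/nagios/var/status.dat; verify both on your system. The function name is made up for illustration.

```shell
#!/bin/sh
# Print "on" or "off" according to enable_notifications in the
# programstatus block of a Nagios status file.
# Returns non-zero if the setting is not found.
notifications_state() {
    awk -F= '
        /^programstatus[ \t]*[{]/ { in_block = 1; next }
        in_block && /^[ \t]*[}]/  { exit }
        in_block {
            key = $1
            gsub(/[ \t]/, "", key)   # strip the leading indentation
            if (key == "enable_notifications") {
                print ($2 == "1" ? "on" : "off")
                found = 1
                exit
            }
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

Running `notifications_state /usr/local/nagios/var/status.dat` shortly after the switcher script gives a quick check that the command was picked up. The status file is only rewritten periodically, so allow a few seconds.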
Hopefully this all makes sense to you. Let me know if it doesn't, of course.

Enjoy!

--Jeffrey
lgaddam
Posts: 116
Joined: Wed Aug 28, 2019 1:01 am

Re: Nagios server /var disk space issue

Post by lgaddam »

Yup, this solution looks perfect.
When no one is using the secondary, it's better to turn off notifications; we can turn them on only when the secondary replaces the primary.

In this case, even if sendmail is not running, no messages will get queued up, right?
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios server /var disk space issue

Post by jdunitz »

Messages will queue up if they're generated locally. But those should be pretty minimal if you've disabled notifications.
You could always just flush the queue as part of the process of going primary, if you don't care about mail that might have been generated while your secondary was still in secondary mode.

Shall we lock this thread?
lgaddam
Posts: 116
Joined: Wed Aug 28, 2019 1:01 am

Re: Nagios server /var disk space issue

Post by lgaddam »

Yes, thanks much!!
Please lock the thread.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios server /var disk space issue

Post by scottwilkerson »

lgaddam wrote: Yes, thanks much!!
Please lock the thread.
Great!

Locking
Former Nagios employee