
Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:13 pm
by tmcdonald
I took a look at the MT iSMS component, and unless I'm reading the code wrong it appears to be adding another event handler to Nagios that makes an HTTP call to send the SMS. So this would mean it's either queued in the device itself, or (more likely) on the server that sends the SMS.

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:18 pm
by BanditBBS
abrist wrote:SMS can be sent one of two ways. If you use the XI mailer, they are sent out immediately, no queue, and may be queued on the carrier's servers. If you send SMS with sendmail, then you can check the queue with "mailq".
I'm using the MultiTech component with a MultiTech network-attached SMS modem. It literally took all day for all of the SMS messages to get through to the phones.
abrist wrote:My suspicion is load/io wait on the mysql server, io wait on the nagios server during the config write out process, or network latency.
I think Spencer is suggesting creating a ramdisk for /usr/local/nagios/etc/ and then rsyncing it to somewhere for a backup.
My MySQL server is virtual and has plenty of horsepower behind it, so I'm not sure what I could do about latency there. I'd like to do the ramdisk thing, but I wouldn't mind help from him to make sure I don't miss anything. I can always get my Linux admin to help, but I figure why not ask the experts :P

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:20 pm
by BanditBBS
tmcdonald wrote:I took a look at the MT iSMS component, and unless I'm reading the code wrong it appears to be adding another event handler to Nagios that makes an HTTP call to send the SMS. So this would mean it's either queued in the device itself, or (more likely) on the server that sends the SMS.
Yes, it does it via HTTP. I could see the outbox filling up all day, as if Nagios was still making those HTTP POSTs all day long.

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:22 pm
by sreinhardt
Andy is correct. I would suggest ramdisking /usr/local/nagios/etc and rsyncing either after apply config or at a set interval. Something I use on my personal systems, and which is likely a little controversial for production, is the Anything-Sync-Daemon from Arch Linux land. You configure it as a system service and set up the fstab rules to mount that directory on tmpfs, and it handles the rsync for you. This way, in the event of a system crash or otherwise, you can start this daemon, allow it to copy your configs to memory, then start Nagios as though nothing happened. Additionally, most configs, with the notable exceptions being nagios.cfg, ndoutils.cfg and such, would be repopulated in memory by an apply config at the absolute very worst. Obviously I would suggest testing this on a non-prod server first.

The profile-sync-daemon wiki article provides a little more in-depth information on what is actually happening and what should be configured. Also realize that Arch uses systemd/systemctl and might have some other slight differences from CentOS, but by and large it should be entirely possible to change the systemd/systemctl stuff to an init.d script. I can work on testing some of this if this is the route you would like to go.
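
For readers following along, here is a rough sketch of the "rsync at a set interval" half of this idea. It is not how Anything-Sync-Daemon works internally. It assumes /usr/local/nagios/etc is already mounted on tmpfs; the backup path, the function name, and the fstab and cron lines in the comments are illustrative assumptions, not something from this thread.

```shell
#!/usr/bin/env bash
# Copy the ramdisk-backed config directory to persistent storage.
# RAM_DIR matches the thread; BACKUP_DIR is a hypothetical location.
RAM_DIR="${RAM_DIR:-/usr/local/nagios/etc}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/nagios-etc}"

sync_etc_to_disk() {
    # Refuse to sync an empty source: with --delete that would wipe the
    # backup (e.g. after a reboot, before the ramdisk is repopulated).
    if [ -z "$(ls -A "$RAM_DIR" 2>/dev/null)" ]; then
        echo "refusing to sync: $RAM_DIR is empty or missing" >&2
        return 1
    fi
    mkdir -p "$BACKUP_DIR" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # Trailing slashes: copy the contents of RAM_DIR into BACKUP_DIR.
        rsync -a --delete "$RAM_DIR/" "$BACKUP_DIR/"
    else
        # Fallback when rsync is not installed; stale files are not removed.
        cp -a "$RAM_DIR/." "$BACKUP_DIR/"
    fi
}

# Example /etc/fstab entry (assumed options) to mount the ramdisk at boot:
#   tmpfs  /usr/local/nagios/etc  tmpfs  size=50m,mode=0775  0 0
# Example /etc/cron.d entry (assumed schedule and script path), hourly:
#   0 * * * * root /usr/local/bin/nagios-etc-sync.sh
```

A script run from cron would define the function as above and end with a call to `sync_etc_to_disk`. The empty-source guard is a design choice for the reboot case, where the tmpfs mount comes up empty.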

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:22 pm
by abrist
As the SMS is posted through an HTTP request to the modem, the queue was probably on the modem or on the carrier's relay servers. Have you looked at the modem's web interface for a way to clear the queue?

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:25 pm
by BanditBBS
abrist wrote:As the SMS is posted through an HTTP request to the modem, the queue was probably on the modem or on the carrier's relay servers. Have you looked at the modem's web interface for a way to clear the queue?
Yeah, I just didn't look very deep. I'll look more later.

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:33 pm
by BanditBBS
sreinhardt wrote:Andy is correct. I would suggest ramdisking /usr/local/nagios/etc and rsyncing either after apply config or at a set interval. Something I use on my personal systems, and which is likely a little controversial for production, is the Anything-Sync-Daemon from Arch Linux land. You configure it as a system service and set up the fstab rules to mount that directory on tmpfs, and it handles the rsync for you. This way, in the event of a system crash or otherwise, you can start this daemon, allow it to copy your configs to memory, then start Nagios as though nothing happened. Additionally, most configs, with the notable exceptions being nagios.cfg, ndoutils.cfg and such, would be repopulated in memory by an apply config at the absolute very worst. Obviously I would suggest testing this on a non-prod server first.

The profile-sync-daemon wiki article provides a little more in-depth information on what is actually happening and what should be configured. Also realize that Arch uses systemd/systemctl and might have some other slight differences from CentOS, but by and large it should be entirely possible to change the systemd/systemctl stuff to an init.d script. I can work on testing some of this if this is the route you would like to go.
Thanks...meeting with my linux admin in 30 minutes to discuss and test on my dev server.

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 1:39 pm
by sreinhardt
You're welcome! Let us know what you decide. As I said, if you do go this route, I am happy to test here too. It is something I have been planning to implement anyway for higher-load and higher-end systems, as it should provide massive Nagios reload/restart improvements (ns vs. ms read latency).

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 3:17 pm
by BanditBBS
Spencer (my temporary new best friend),

Linux admin and I did these steps on my dev NagiosXI server:
  • mv /usr/local/nagios/etc to some temp location
  • recreated the etc folder
  • mount -t tmpfs none /usr/local/nagios/etc -o size=50m
  • chown apache:nagios to that folder and copied everything from the temp location to the new ramdisk
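
The four steps above could be scripted roughly as follows. This is only a sketch: the staging path, the function names, and the DRY_RUN switch are assumptions added for illustration. With DRY_RUN=1 (the default here) each command is printed instead of executed, so nothing is moved or mounted until you opt in.

```shell
#!/usr/bin/env bash
# ETC_DIR matches the thread; STAGING is a hypothetical temp location.
ETC_DIR="${ETC_DIR:-/usr/local/nagios/etc}"
STAGING="${STAGING:-/root/nagios-etc.staging}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_ramdisk_etc() {
    # 1. Move the on-disk config directory out of the way.
    run mv "$ETC_DIR" "$STAGING" || return 1
    # 2. Recreate an empty mount point.
    run mkdir "$ETC_DIR" || return 1
    # 3. Mount a 50 MB tmpfs there (same options as in the post).
    run mount -t tmpfs none "$ETC_DIR" -o size=50m || return 1
    # 4. Copy the configs back and set the ownership Nagios XI expects.
    run cp -a "$STAGING/." "$ETC_DIR/" || return 1
    run chown -R apache:nagios "$ETC_DIR"
}
```

Run `setup_ramdisk_etc` once to review the printed commands, then again as root with `DRY_RUN=0` to apply them.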
After that, I went into CCM and did an apply changes and this is the new contents of that folder:

Code:

-rw-rw-r-- 1 apache nagios   793 Dec 16 14:59 cgi.cfg
-rw-rw-r-- 1 apache nagios 25826 Dec 16 15:00 commands.cfg
-rw-rw-r-- 1 apache nagios  1073 Dec 16 15:00 contactgroups.cfg
-rw-rw-r-- 1 apache nagios  2682 Dec 16 15:00 contacts.cfg
-rw-rw-r-- 1 apache nagios  1500 Dec 16 15:00 contacttemplates.cfg
-rw-rw-r-- 1 apache nagios   642 Dec 16 15:00 hostdependencies.cfg
-rw-rw-r-- 1 apache nagios   644 Dec 16 15:00 hostescalations.cfg
-rw-rw-r-- 1 apache nagios   662 Dec 16 15:00 hostextinfo.cfg
-rw-rw-r-- 1 apache nagios   984 Dec 16 15:00 hostgroups.cfg
drwxrwxr-x 2 apache nagios   320 Dec 16 14:59 hosts
-rw-rw-r-- 1 apache nagios 13940 Dec 16 15:00 hosttemplates.cfg
drwxrwxr-x 2 apache nagios    40 Dec 16 14:59 import
-rwxrwxr-x 1 apache nagios  5764 Dec 16 14:59 nagios.cfg
-rw-rw-r-- 1 apache nagios  2229 Dec 16 14:59 ndo2db.cfg
-rw-rw-r-- 1 apache nagios  4827 Dec 16 14:59 ndomod.cfg
-rw-rw-r-- 1 apache nagios  7227 Dec 16 14:59 nrpe.cfg
-rw-rw-r-- 1 apache nagios  5374 Dec 16 14:59 nsca.cfg
drwxrwxr-x 4 apache nagios   260 Dec 16 14:59 pnp
-rwxrwxr-x 1 apache nagios   210 Dec 16 14:59 resource.cfg
-rw-rw-r-- 1 apache nagios  1627 Dec 16 14:59 send_nsca.cfg
-rw-rw-r-- 1 apache nagios   648 Dec 16 15:00 servicedependencies.cfg
-rw-rw-r-- 1 apache nagios   650 Dec 16 15:00 serviceescalations.cfg
-rw-rw-r-- 1 apache nagios   668 Dec 16 15:00 serviceextinfo.cfg
-rw-rw-r-- 1 apache nagios   638 Dec 16 15:00 servicegroups.cfg
drwxrwxr-x 2 apache nagios   260 Dec 16 14:59 services
-rw-rw-r-- 1 apache nagios 21181 Dec 16 15:00 servicetemplates.cfg
drwxrwxr-x 2 apache nagios   100 Dec 16 14:59 static
-rw-rw-r-- 1 apache nagios  5104 Dec 16 15:00 timeperiods.cfg
[root@rn000002 etc]#
As you can see, the files have been updated, except nagios.cfg, ndo2db.cfg and a few other files. So now I am trying to figure out the best way to actually implement and test this. How do I get those unchanging config files (nagios.cfg, nrpe.cfg, etc.) onto the ramdisk after a server restart? I have no issue adding a step to the reboot/shutdown procedures, for example:

Adding to the server shutdown procedure:
  • Copy everything from the ramdisk to a temp/backup location (or even create a cron job to do it hourly; it's only 10MB)
Adding to the startup procedure:
  • Copy everything from the temp/backup location to the ramdisk, then go to the GUI and hit apply changes
Do you have a better suggestion?
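
A possible shape for the startup half of that procedure, as a sketch only. The backup path and function name are assumptions. The list of static files is taken from the entries in the listing above that kept their 14:59 timestamp, i.e. the ones an apply changes did not rewrite.

```shell
#!/usr/bin/env bash
# Repopulate the ramdisk from the persistent copy before Nagios starts.
RAM_DIR="${RAM_DIR:-/usr/local/nagios/etc}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/nagios-etc}"   # hypothetical path
# Files that were not regenerated by apply changes in the listing above.
STATIC_FILES="nagios.cfg cgi.cfg resource.cfg ndo2db.cfg ndomod.cfg nrpe.cfg nsca.cfg send_nsca.cfg"

restore_etc_from_disk() {
    local f
    if [ ! -d "$BACKUP_DIR" ]; then
        echo "no backup found at $BACKUP_DIR" >&2
        return 1
    fi
    mkdir -p "$RAM_DIR" || return 1
    # -a preserves ownership (when run as root), modes, and timestamps.
    cp -a "$BACKUP_DIR/." "$RAM_DIR/" || return 1
    # Verify the files that will not be regenerated made it across.
    for f in $STATIC_FILES; do
        if [ ! -f "$RAM_DIR/$f" ]; then
            echo "missing after restore: $f" >&2
            return 1
        fi
    done
}
```

The check at the end is there because a missing nagios.cfg would only show up later as a failed Nagios start; failing early in the restore step makes the cause obvious.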

Re: NagiosXI had a seizure

Posted: Mon Dec 16, 2013 3:28 pm
by sreinhardt
The links I suggested before were specifically for rsyncing all files in that directory. Otherwise, yes, doing an on-boot and on-shutdown copy would likely suffice. I would suggest doing a regular cron or other copy/sync as well, just in case the server flips out again. Otherwise I think you got it just right! Have you noticed any improvements?

Edit: It is also probably necessary to keep Nagios and related services from loading until those files have been copied over. Maybe alter your boot script so that it loads the configs onto the ramdisk and then starts the proper services, and otherwise disable those services from starting on boot.
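
The ordering described in that edit might look something like the sketch below. The service list, the restore script path, and the DRY_RUN switch are all assumptions for illustration; it also assumes the listed services were taken out of normal autostart first (on CentOS, for example, with `chkconfig nagios off`).

```shell
#!/usr/bin/env bash
# Boot-time ordering: populate the ramdisk first, start services second.
SERVICES="${SERVICES:-ndo2db nagios}"                               # assumed list
RESTORE_CMD="${RESTORE_CMD:-/usr/local/bin/nagios-etc-restore.sh}"  # hypothetical
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_nagios_after_restore() {
    local svc
    # Abort without starting anything if the configs could not be restored.
    if ! run "$RESTORE_CMD"; then
        echo "restore failed; not starting services" >&2
        return 1
    fi
    for svc in $SERVICES; do
        run service "$svc" start || return 1
    done
}
```

Calling `start_nagios_after_restore` with `DRY_RUN=0` from a boot script such as /etc/rc.d/rc.local would be one way to wire it in; that placement is a suggestion, not something tested in this thread.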