
SIGSEGV: Nagios going down

Posted: Mon Aug 15, 2016 4:45 pm
by maytheforcebeprosper
Background:

Built a new machine:
24 GB RAM
4 cores, 2 GHz
Nagios 4.2
RHEL 6.8

Compiled from source.

After starting Nagios with "service nagios start", I get this in the logs:

Code:

[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - server.com (loss=100% )
[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - server.com (loss=100% )
[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - host is unreachable
[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - server.com (loss=100% )
[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - server.com (loss=100% )
[1471296780] HOST ALERT: server.com;DOWN;SOFT;1;FPING CRITICAL - server.com (loss=100% )
[1471296780] Caught SIGSEGV, shutting down...

This happens about 30 seconds after startup. Any ideas on how I can troubleshoot this?

Re: SIGSEGV: Nagios going down

Posted: Mon Aug 15, 2016 5:56 pm
by tmcdonald
I'd start with something like this:

Code:

cd /usr/local/nagios/bin/
strace ./nagios ../etc/nagios.cfg
Adjust the paths to match your installation; you may need to install strace first (on RHEL: yum install strace). This should show what Nagios is doing right before it crashes.
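A trace of the whole startup is long, so it can be easier to write it to a file and read only the end. A minimal sketch, assuming the default /usr/local/nagios source-install paths used above; the trace_nagios function name and the /tmp/nagios.strace output file are hypothetical choices for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run Nagios in the foreground under strace, write the
# trace to a file, then print the last 50 recorded lines.
# Defaults assume a /usr/local/nagios source install; override via arguments.
trace_nagios() (
    bin=${1:-/usr/local/nagios/bin/nagios}
    cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    out=${3:-/tmp/nagios.strace}

    # -tt: prefix each system call with a microsecond wall-clock timestamp
    # -o:  write the trace to $out instead of stderr
    strace -tt -o "$out" "$bin" "$cfg"
    status=$?

    # The context of the crash is at the very end of the trace.
    tail -n 50 "$out"

    # The ( ... ) body is a subshell, so exit only ends the function,
    # passing strace's exit status back to the caller.
    exit "$status"
)
```

Running trace_nagios with no arguments uses those defaults; pass the binary, config file and output file as arguments if your layout differs. The -tt timestamps also make it easy to see how long Nagios ran before the crash.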

Re: SIGSEGV: Nagios going down

Posted: Tue Aug 16, 2016 7:50 am
by maytheforcebeprosper

Code:

write(15, "-vlan1161-1.net.domain.com\n\tin"..., 4096) = 4096
write(15, "\n\ndefine hostdependency {\n\thost_"..., 4096) = 4096
write(15, "law-gw-1-vlan1683-1.net.domain"..., 4096) = 4096
write(15, "s\td,u\n\t}\n\ndefine hostdependency "..., 4096) = 4096
write(15, "aw-gw-1-vlan2866-1.net.domain."..., 4096) = 4096
write(15, "\td,u\n\t}\n\ndefine hostdependency {"..., 4096) = 4096
write(15, "t_name\tlaw-gw-1-vlan3254-1.net.c"..., 4096) = 4096
write(15, "\td,u\n\t}\n\ndefine hostdependency {"..., 4096) = 4096
write(15, "7-1.net.domain.com\n\tinherits_p"..., 4096) = 4096
write(15, "du\n\tinherits_parent\t0\n\tnotificat"..., 4096) = 4096
write(15, "\thost_name\tphi-gw-1.net.domain"..., 4096) = 4096
write(15, "a.edu\n\tinherits_parent\t0\n\texecut"..., 4096) = 4096
write(15, "rits_parent\t0\n\tnotification_fail"..., 4096) = 4096
write(15, "hostdependency {\n\thost_name\tsyra"..., 4096) = 4096
write(15, "\tdependent_host_name\tsyracuse-gw"..., 4096) = 4096
write(15, "umbia.edu\n\tdependent_host_name\tw"..., 4096) = 4096
write(15, "ent\t0\n\texecution_failure_options"..., 4096) = 4096
write(15, "t.domain.com\n\tdependent_host_n"..., 4096) = 4096
write(15, "arent\t0\n\tnotification_failure_op"..., 4096) = 4096
write(15, "-gw-1.net.domain.com\n\tdependen"..., 4096) = 4096
write(15, "its_parent\t0\n\texecution_failure_"..., 4096) = 4096
write(15, "e\twat-gw-1.net.domain.com\n\tdep"..., 4096) = 4096
write(15, "u\n\tinherits_parent\t0\n\tnotificati"..., 4096) = 4096
write(15, "-1.net.domain.com\n\tdependent_h"..., 4096) = 4096
write(15, "on_failure_options\td,u\n\t}\n\ndefin"..., 3164) = 3164
close(15)                               = 0
munmap(0x7fa64a5bd000, 4096)            = 0
unlink("/var/nagios/status.dat")        = -1 ENOENT (No such file or directory)
open("/var/nagios/retention.dat", O_RDONLY) = 15
fstat(15, {st_mode=S_IFREG|0600, st_size=55192, ...}) = 0
mmap(NULL, 55192, PROT_READ, MAP_PRIVATE, 15, 0) = 0x7fa64a5b0000
munmap(0x7fa64a5b0000, 55192)           = 0
close(15)                               = 0

That's where it dies.

I'll look into /var/nagios/status.dat.

Is there something I might be missing, or do you need more output?
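When strace sees a traced process receive a signal, it records a line starting with "--- SIGSEGV" (the rest of the line varies between strace versions). A hypothetical helper, before_segv (name and default of 20 lines are illustrative assumptions), can print the lines leading up to the first such report in a saved trace file, so nobody has to scroll through thousands of write() calls:

```shell
#!/bin/sh
# Hypothetical helper: print the N lines (default 20) that precede the first
# "--- SIGSEGV" report in a saved strace log, plus that line itself.
# Usage: before_segv FILE [N]
before_segv() (
    file=$1
    n=${2:-20}

    # Line number of the first SIGSEGV report; -e lets the pattern start with "-"
    hit=$(grep -n -e '--- SIGSEGV' "$file" | head -n 1 | cut -d: -f1)
    if [ -z "$hit" ]; then
        echo "no SIGSEGV recorded in $file" >&2
        exit 1
    fi

    # Clamp the start line so we never ask sed for line 0 or below
    start=$((hit - n))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},${hit}p" "$file"
)
```

If the helper reports that no SIGSEGV was recorded, the signal was not delivered to any process strace was following.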

Re: SIGSEGV: Nagios going down

Posted: Tue Aug 16, 2016 4:00 pm
by tmcdonald
Alright, so it doesn't appear to be the main process that is crashing. It must be a child process, then; -f makes strace follow forked children:

strace -f -s 256 ./nagios ../etc/nagios.cfg
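With -f alone, the traces of all processes are interleaved in one stream. Giving -f twice (-ff) together with -o PREFIX makes strace write one file per process, named PREFIX.<pid>, which makes it easier to tell which process actually received the signal. A minimal sketch; the /tmp/nagios.trace prefix and the segv_traces function name are hypothetical:

```shell
#!/bin/sh
# Capture (run once in the foreground; adjust paths to your install):
#   strace -ff -tt -s 256 -o /tmp/nagios.trace \
#       /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
# strace then writes one trace file per process: /tmp/nagios.trace.<pid>

# Hypothetical helper: list the per-process trace files that recorded a SIGSEGV.
# Usage: segv_traces PREFIX
segv_traces() {
    # -l prints only the names of matching files; -e guards the leading dashes.
    # Errors (e.g. no files match the glob) are discarded.
    grep -l -e '--- SIGSEGV' "$1".* 2>/dev/null
}
```

For example, segv_traces /tmp/nagios.trace prints the path of the crashing process's trace, whose tail shows its last system calls.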

Otherwise, you may need to enable core dumps for analysis:

http://antmeetspenguin.blogspot.com/201 ... -dump.html
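As a rough outline (not necessarily what the linked article describes), getting a core on Linux means raising the core-file size limit in the shell that starts Nagios, letting it crash, and opening the core in gdb. A minimal sketch; run_with_core is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run a command with the soft core-file size limit raised
# to the hard limit. The ( ... ) body runs in a subshell, so the raised limit
# does not leak into the calling shell. If the hard limit is 0, no core can be
# written; the hard limit must be raised first (e.g. as root).
run_with_core() (
    ulimit -S -c "$(ulimit -H -c)" || exit 1
    "$@"
)
```

For example, run_with_core /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg starts Nagios in the foreground; after the crash, gdb /usr/local/nagios/bin/nagios /path/to/core followed by bt at the (gdb) prompt prints a backtrace. Where the core file is written is controlled by /proc/sys/kernel/core_pattern. If Nagios runs as a daemon instead, nagios.cfg also has a daemon_dumps_core option that must be set to 1 before it is allowed to dump core.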

This sort of thing is not easy to track down over a forum, so apologies in advance.