Nagios log server stops collecting logs
Posted: Mon Feb 26, 2018 8:48 am
by bpizzutiWHI
I originally thought this was just Logstash dying, but it looks like it's happening right after a log rotation, and both the Elasticsearch and Logstash log files are zero size. I can provide the immediately prior logs, which were gzipped. I put in a daily cron job to restart Logstash, but it didn't help: this happened a couple of days ago and Logstash never recovered.
Re: Nagios log server stops collecting logs
Posted: Mon Feb 26, 2018 2:16 pm
by kyang
What version of NLS are you on now?
Can you post or PM the prior logs and some recent logs also?
Could you also post or PM your system profile?
NLS home page --> click "Admin" --> under System click "System Status" --> Click "Download System Profile"
Thanks!
Re: Nagios log server stops collecting logs
Posted: Mon Feb 26, 2018 2:33 pm
by bpizzutiWHI
I haven't actually restarted it yet, so the only logs present are the empty ones. Here are the prior log files and the system profile. We're running 2.0.2.
Re: Nagios log server stops collecting logs
Posted: Mon Feb 26, 2018 5:51 pm
by kyang
Could you try restarting Logstash?
Then could you post the output of ps -aux?
You have a lot of these errors.
Code:
A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::Tcp type=>\"netlog\", port=>10525
"UDP listener died"
"syslog listener died", :protocol=>:udp, :address=>"0.0.0.0:10526"
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 8:56 am
by bpizzutiWHI
I'm not sure why that listener keeps dying.
ps -aux:
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 194684 5704 ? Ss Feb20 16:24 /usr/lib/systemd/systemd --switche
root 2 0.0 0.0 0 0 ? S Feb20 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Feb20 0:04 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S Feb20 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S Feb20 0:00 [rcu_bh]
root 9 0.1 0.0 0 0 ? S Feb20 9:58 [rcu_sched]
root 10 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/0]
root 11 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/1]
root 12 0.0 0.0 0 0 ? S Feb20 0:00 [migration/1]
root 13 0.0 0.0 0 0 ? S Feb20 0:03 [ksoftirqd/1]
root 15 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/1:0H]
root 16 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/2]
root 17 0.0 0.0 0 0 ? S Feb20 0:00 [migration/2]
root 18 0.0 0.0 0 0 ? S Feb20 0:06 [ksoftirqd/2]
root 20 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/2:0H]
root 21 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/3]
root 22 0.0 0.0 0 0 ? S Feb20 0:00 [migration/3]
root 23 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/3]
root 25 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/3:0H]
root 26 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/4]
root 27 0.0 0.0 0 0 ? S Feb20 0:00 [migration/4]
root 28 0.0 0.0 0 0 ? S Feb20 0:13 [ksoftirqd/4]
root 30 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/4:0H]
root 31 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/5]
root 32 0.0 0.0 0 0 ? S Feb20 0:00 [migration/5]
root 33 0.0 0.0 0 0 ? S Feb20 0:14 [ksoftirqd/5]
root 35 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/5:0H]
root 36 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/6]
root 37 0.0 0.0 0 0 ? S Feb20 0:00 [migration/6]
root 38 0.0 0.0 0 0 ? S Feb20 0:01 [ksoftirqd/6]
root 40 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/6:0H]
root 41 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/7]
root 42 0.0 0.0 0 0 ? S Feb20 0:00 [migration/7]
root 43 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/7]
root 45 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/7:0H]
root 46 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/8]
root 47 0.0 0.0 0 0 ? S Feb20 0:00 [migration/8]
root 48 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/8]
root 50 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/8:0H]
root 51 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/9]
root 52 0.0 0.0 0 0 ? S Feb20 0:00 [migration/9]
root 53 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/9]
root 55 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/9:0H]
root 56 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/10]
root 57 0.0 0.0 0 0 ? S Feb20 0:00 [migration/10]
root 58 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/10]
root 60 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/10:0H]
root 61 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/11]
root 62 0.0 0.0 0 0 ? S Feb20 0:00 [migration/11]
root 63 0.0 0.0 0 0 ? S Feb20 0:02 [ksoftirqd/11]
root 65 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/11:0H]
root 66 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/12]
root 67 0.0 0.0 0 0 ? S Feb20 0:00 [migration/12]
root 68 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/12]
root 70 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/12:0H]
root 71 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/13]
root 72 0.0 0.0 0 0 ? S Feb20 0:00 [migration/13]
root 73 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/13]
root 75 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/13:0H]
root 76 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/14]
root 77 0.0 0.0 0 0 ? S Feb20 0:01 [migration/14]
root 78 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/14]
root 80 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/14:0H]
root 81 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/15]
root 82 0.0 0.0 0 0 ? S Feb20 0:01 [migration/15]
root 83 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/15]
root 85 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/15:0H]
root 86 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/16]
root 87 0.0 0.0 0 0 ? S Feb20 0:00 [migration/16]
root 88 0.0 0.0 0 0 ? S Feb20 0:04 [ksoftirqd/16]
root 90 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/16:0H]
root 91 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/17]
root 92 0.0 0.0 0 0 ? S Feb20 0:01 [migration/17]
root 93 0.0 0.0 0 0 ? S Feb20 0:01 [ksoftirqd/17]
root 95 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/17:0H]
root 96 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/18]
root 97 0.0 0.0 0 0 ? S Feb20 0:01 [migration/18]
root 98 0.0 0.0 0 0 ? S Feb20 0:06 [ksoftirqd/18]
root 100 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/18:0H]
root 101 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/19]
root 102 0.0 0.0 0 0 ? S Feb20 0:01 [migration/19]
root 103 0.0 0.0 0 0 ? S Feb20 0:06 [ksoftirqd/19]
root 105 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/19:0H]
root 106 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/20]
root 107 0.0 0.0 0 0 ? S Feb20 0:01 [migration/20]
root 108 0.0 0.0 0 0 ? S Feb20 0:10 [ksoftirqd/20]
root 110 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/20:0H]
root 111 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/21]
root 112 0.0 0.0 0 0 ? S Feb20 0:01 [migration/21]
root 113 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/21]
root 115 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/21:0H]
root 116 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/22]
root 117 0.0 0.0 0 0 ? S Feb20 0:01 [migration/22]
root 118 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/22]
root 121 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/23]
root 122 0.0 0.0 0 0 ? S Feb20 0:01 [migration/23]
root 123 0.0 0.0 0 0 ? S Feb20 0:01 [ksoftirqd/23]
root 125 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/23:0H]
root 126 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/24]
root 127 0.0 0.0 0 0 ? S Feb20 0:01 [migration/24]
root 128 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/24]
root 130 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/24:0H]
root 131 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/25]
root 132 0.0 0.0 0 0 ? S Feb20 0:01 [migration/25]
root 133 0.0 0.0 0 0 ? S Feb20 0:12 [ksoftirqd/25]
root 135 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/25:0H]
root 136 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/26]
root 137 0.0 0.0 0 0 ? S Feb20 0:01 [migration/26]
root 138 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/26]
root 140 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/26:0H]
root 141 0.0 0.0 0 0 ? S Feb20 0:01 [watchdog/27]
root 142 0.0 0.0 0 0 ? S Feb20 0:01 [migration/27]
root 143 0.0 0.0 0 0 ? S Feb20 0:00 [ksoftirqd/27]
root 145 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/27:0H]
root 147 0.0 0.0 0 0 ? S Feb20 0:00 [kdevtmpfs]
root 148 0.0 0.0 0 0 ? S< Feb20 0:00 [netns]
root 149 0.0 0.0 0 0 ? S Feb20 0:00 [khungtaskd]
root 150 0.0 0.0 0 0 ? S< Feb20 0:00 [writeback]
root 151 0.0 0.0 0 0 ? S< Feb20 0:00 [kintegrityd]
root 152 0.0 0.0 0 0 ? S< Feb20 0:00 [bioset]
root 153 0.0 0.0 0 0 ? S< Feb20 0:00 [kblockd]
root 155 0.0 0.0 0 0 ? S< Feb20 0:00 [md]
root 161 0.0 0.0 0 0 ? S Feb20 0:03 [kswapd0]
root 162 0.0 0.0 0 0 ? SN Feb20 0:00 [ksmd]
root 163 0.0 0.0 0 0 ? SN Feb20 0:01 [khugepaged]
root 164 0.0 0.0 0 0 ? S Feb20 0:00 [fsnotify_mark]
root 165 0.0 0.0 0 0 ? S< Feb20 0:00 [crypto]
root 173 0.0 0.0 0 0 ? S< Feb20 0:00 [kthrotld]
root 175 0.0 0.0 0 0 ? S< Feb20 0:00 [kmpath_rdacd]
root 180 0.0 0.0 0 0 ? S< Feb20 0:00 [kpsmoused]
root 181 0.0 0.0 0 0 ? S< Feb20 0:00 [ipv6_addrconf]
root 200 0.0 0.0 0 0 ? S< Feb20 0:00 [deferwq]
root 236 0.0 0.0 0 0 ? S Feb20 0:19 [kauditd]
root 360 0.0 0.0 0 0 ? S Feb24 0:07 [kworker/14:2]
root 445 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_0]
root 446 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_0]
root 447 0.0 0.0 0 0 ? S< Feb20 0:00 [ata_sff]
root 462 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_1]
root 463 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_1]
root 464 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_2]
root 465 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_2]
root 466 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_3]
root 467 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_3]
root 468 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_4]
root 469 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_4]
root 478 0.0 0.0 0 0 ? S< Feb20 0:00 [ttm_swap]
root 479 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_5]
root 480 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_5]
root 481 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_6]
root 482 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_6]
root 483 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_7]
root 484 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_7]
root 485 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_8]
root 486 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_8]
root 487 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_9]
root 488 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_9]
root 489 0.0 0.0 0 0 ? S Feb20 0:00 [scsi_eh_10]
root 490 0.0 0.0 0 0 ? S< Feb20 0:00 [scsi_tmf_10]
root 626 0.0 0.0 0 0 ? S< Feb20 0:00 [kdmflush]
root 627 0.0 0.0 0 0 ? S< Feb20 0:00 [bioset]
root 639 0.0 0.0 0 0 ? S< Feb20 0:00 [kdmflush]
root 640 0.0 0.0 0 0 ? S< Feb20 0:00 [bioset]
root 653 0.0 0.0 0 0 ? S< Feb20 0:00 [xfsalloc]
root 654 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs_mru_cache]
root 655 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-buf/dm-0]
root 656 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-data/dm-0]
root 657 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-conv/dm-0]
root 658 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-cil/dm-0]
root 659 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-reclaim/dm-]
root 660 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-log/dm-0]
root 661 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-eofblocks/d]
root 662 0.0 0.0 0 0 ? S Feb20 2:21 [xfsaild/dm-0]
root 731 0.0 0.0 141564 73872 ? Ss Feb20 2:41 /usr/lib/systemd/systemd-journald
root 744 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/3:1H]
root 758 0.0 0.0 268404 1484 ? Ss Feb20 0:00 /usr/sbin/lvmetad -f
root 765 0.0 0.0 45556 1824 ? Ss Feb20 0:00 /usr/lib/systemd/systemd-udevd
root 1331 0.0 0.0 0 0 ? S< Feb20 0:00 [edac-poller]
root 1744 0.0 0.0 0 0 ? S< Feb20 0:00 [kvm-irqfd-clean]
root 1765 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-buf/sda1]
root 1766 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-data/sda1]
root 1767 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-conv/sda1]
root 1768 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-cil/sda1]
root 1769 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-reclaim/sda]
root 1770 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-log/sda1]
root 1771 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-eofblocks/s]
root 1773 0.0 0.0 0 0 ? S Feb20 0:00 [xfsaild/sda1]
root 1779 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/5:1H]
root 1783 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/6:1H]
root 1790 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/7:1H]
root 1794 0.0 0.0 0 0 ? S< Feb20 0:00 [kdmflush]
root 1795 0.0 0.0 0 0 ? S< Feb20 0:00 [bioset]
root 1797 0.0 0.0 0 0 ? S< Feb20 0:00 [kdmflush]
root 1798 0.0 0.0 0 0 ? S< Feb20 0:00 [bioset]
root 1804 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-buf/dm-2]
root 1805 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-data/dm-2]
root 1806 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-conv/dm-2]
root 1807 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-cil/dm-2]
root 1808 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-reclaim/dm-]
root 1809 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-log/dm-2]
root 1810 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-eofblocks/d]
root 1811 0.0 0.0 0 0 ? S Feb20 0:00 [xfsaild/dm-2]
root 1815 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-buf/dm-3]
root 1816 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-data/dm-3]
root 1817 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-conv/dm-3]
root 1818 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-cil/dm-3]
root 1819 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-reclaim/dm-]
root 1820 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-log/dm-3]
root 1821 0.0 0.0 0 0 ? S< Feb20 0:00 [xfs-eofblocks/d]
root 1822 0.0 0.0 0 0 ? S Feb20 2:09 [xfsaild/dm-3]
root 1834 0.0 0.0 57464 1848 ? S<sl Feb20 0:35 /sbin/auditd -n
root 1856 0.0 0.0 26300 1756 ? Ss Feb20 0:41 /usr/lib/systemd/systemd-logind
root 1857 0.0 0.0 6412 592 ? Ss Feb20 6:39 /sbin/rngd -f
libstor+ 1864 0.0 0.0 10568 804 ? Ss Feb20 0:01 /usr/bin/lsmd -d
root 1866 0.0 0.0 21476 1384 ? Ss Feb20 1:56 /usr/sbin/irqbalance --foreground
root 1870 0.0 0.0 214832 5492 ? Ss Feb20 0:00 /usr/sbin/abrtd -d -s
root 1872 0.0 0.0 212332 4560 ? Ss Feb20 0:05 /usr/bin/abrt-watch-log -F BUG: WA
polkitd 1877 0.0 0.0 531776 12168 ? Ssl Feb20 0:41 /usr/lib/polkit-1/polkitd --no-deb
root 1880 0.0 0.0 130376 2840 ? Ss Feb20 0:00 /usr/sbin/smartd -n -q never
dbus 1882 0.0 0.0 28668 1924 ? Ss Feb20 1:29 /bin/dbus-daemon --system --addres
root 1883 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/2:1H]
chrony 1884 0.0 0.0 117892 1892 ? S Feb20 0:01 /usr/sbin/chronyd
root 1887 0.0 0.0 437884 10556 ? Ssl Feb20 0:23 /usr/sbin/NetworkManager --no-daem
root 1892 0.0 0.0 127880 1596 ? Ss Feb20 0:06 /usr/sbin/crond -n
root 1894 0.0 0.0 27892 940 ? Ss Feb20 0:00 /usr/sbin/atd -f
root 1943 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/20:1H]
root 1944 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/0:1H]
root 2004 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/23:1H]
root 2211 0.0 0.0 555704 16620 ? Ssl Feb20 0:49 /usr/bin/python -Es /usr/sbin/tune
root 2212 0.0 0.0 597480 48584 ? Ssl Feb20 1:52 /usr/sbin/rsyslogd -n
root 2214 0.0 0.0 226252 11636 ? Ss Feb20 6:39 /usr/sbin/snmpd -LS0-6d -f
root 2222 0.0 0.0 108004 4072 ? Ss Feb20 0:00 /usr/sbin/sshd -D
root 2223 0.0 0.0 319340 12960 ? Ss Feb20 0:18 /usr/sbin/httpd -DFOREGROUND
nagios 2553 34.6 52.1 167424240 137677780 ? SLl Feb20 3318:25 java -Xms128837m -Xmx128837m -Dja
root 3689 0.0 0.0 1063876 23808 ? Ssl Feb20 4:04 /opt/dell/srvadmin/sbin/dsm_sa_dat
root 3774 0.0 0.0 0 0 ? S 01:01 0:00 [kworker/0:2]
root 3846 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/1:1H]
root 3888 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/25:1H]
root 3889 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/9:1H]
root 3890 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/18:1H]
root 3891 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/16:1H]
root 3899 0.0 0.0 96704 2712 ? Ss Feb20 0:10 sendmail: accepting connections
root 3920 0.0 0.0 161708 2788 ? S Feb20 0:14 /usr/netvault/bin/nvpmgr startup
root 3921 0.0 0.0 134600 2300 ? S Feb20 0:06 nvcmgr 2
smmsp 3922 0.0 0.0 88152 2012 ? Ss Feb20 0:00 sendmail: Queue runner@01:00:00 fo
root 3931 0.3 0.0 135644 3544 ? S Feb20 32:46 nvnmgr 3
root 3932 0.0 0.0 142680 2608 ? S Feb20 0:06 nvstatsmngr 9
root 3934 0.0 0.0 134580 2128 ? S Feb20 0:00 nvconsolesvc 15
root 3974 0.0 0.0 303024 4096 ? Ssl Feb20 0:07 /opt/dell/srvadmin/sbin/dsm_sa_eve
root 4001 0.0 0.0 515504 10372 ? Ssl Feb20 1:38 /opt/dell/srvadmin/sbin/dsm_sa_snm
root 4066 0.0 0.0 727736 17260 ? Ss Feb20 0:00 /opt/dell/srvadmin/sbin/dsm_sa_dat
root 4097 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/14:1H]
root 4143 0.0 0.0 140428 892 ? Ss Feb20 0:00 /opt/dell/srvadmin/sbin/dsm_om_con
root 4144 0.1 0.1 6252736 351800 ? Sl Feb20 17:55 /opt/dell/srvadmin/sbin/dsm_om_con
root 4187 0.0 0.0 700464 4208 ? Ssl Feb20 0:00 /opt/dell/srvadmin/sbin/dsm_om_shr
root 4711 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/20:0]
root 5398 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/19:1H]
root 5421 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/8:1H]
root 5488 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/11:1H]
root 5645 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/17:1H]
root 5861 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/27:1H]
root 5888 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/15:1H]
root 5891 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/10:1H]
root 5958 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/4:1H]
root 5976 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/13:1H]
root 5996 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/21:1H]
root 6096 0.0 0.0 112080 852 tty1 Ss+ Feb20 0:00 /sbin/agetty --noclear tty1 linux
root 6133 0.0 0.0 0 0 ? S< Feb20 0:01 [kworker/24:1H]
root 6240 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/12:1H]
apache 9453 0.0 0.0 430772 16908 ? S Feb26 0:13 /usr/sbin/httpd -DFOREGROUND
root 9699 0.0 0.0 0 0 ? S< Feb20 0:00 [kworker/26:1H]
root 10329 0.0 0.0 0 0 ? S< Feb24 0:00 [kworker/22:0H]
root 11486 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/7:2]
root 14043 0.0 0.0 0 0 ? S 02:01 0:00 [kworker/24:0]
root 15471 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/21:2]
root 16914 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/9:0]
root 19287 0.0 0.0 0 0 ? S< 02:33 0:00 [kworker/22:2H]
root 22968 0.0 0.0 0 0 ? S 02:55 0:00 [kworker/25:0]
root 25464 0.0 0.0 0 0 ? S 03:10 0:00 [kworker/23:2]
apache 25735 0.0 0.0 430964 17120 ? S Feb26 0:14 /usr/sbin/httpd -DFOREGROUND
apache 25738 0.0 0.0 430824 16792 ? S Feb26 0:14 /usr/sbin/httpd -DFOREGROUND
root 26027 0.0 0.0 155136 5980 ? Ss Feb26 0:00 sshd: bpizzuti [priv]
bpizzuti 26029 0.0 0.0 155720 3220 ? S Feb26 0:00 sshd: bpizzuti@pts/0
bpizzuti 26030 0.0 0.0 55464 2196 ? Ss Feb26 0:00 /usr/libexec/openssh/sftp-server
bpizzuti 26045 0.0 0.0 117964 2660 pts/0 Ss Feb26 0:00 -bash
root 29248 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/13:2]
root 31276 0.0 0.0 0 0 ? S 03:46 0:00 [kworker/4:1]
root 33719 0.0 0.0 0 0 ? S 04:01 0:00 [kworker/8:1]
root 36833 0.0 0.0 0 0 ? S 04:19 0:00 [kworker/10:2]
root 38538 0.0 0.0 0 0 ? S 04:30 0:00 [kworker/26:0]
apache 41657 0.0 0.0 430308 16088 ? S Feb26 0:12 /usr/sbin/httpd -DFOREGROUND
apache 41718 0.0 0.0 430516 16312 ? S Feb26 0:09 /usr/sbin/httpd -DFOREGROUND
root 43588 0.0 0.0 0 0 ? S 05:01 0:00 [kworker/27:1]
root 47046 0.0 0.0 0 0 ? S 05:22 0:00 [kworker/12:0]
root 51085 0.0 0.0 0 0 ? S 05:47 0:00 [kworker/20:2]
root 51298 0.0 0.0 0 0 ? S 05:48 0:00 [kworker/6:0]
root 52597 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/18:2]
root 54587 0.0 0.0 0 0 ? S 06:08 0:00 [kworker/2:1]
apache 56484 0.0 0.0 326924 16204 ? S Feb26 0:07 /usr/sbin/httpd -DFOREGROUND
root 57682 0.0 0.0 0 0 ? S Feb25 0:00 [kworker/7:1]
root 59594 0.0 0.0 0 0 ? S 06:38 0:00 [kworker/u384:2]
root 62678 0.0 0.0 0 0 ? S Feb25 0:01 [kworker/2:2]
root 63253 0.0 0.0 0 0 ? S 07:01 0:00 [kworker/15:2]
root 63831 0.0 0.0 0 0 ? S Feb25 0:01 [kworker/19:0]
root 71300 0.0 0.0 0 0 ? S 07:50 0:00 [kworker/19:2]
root 73086 0.0 0.0 0 0 ? S 08:01 0:00 [kworker/3:0]
root 75087 0.0 0.0 0 0 ? S 08:13 0:00 [kworker/21:0]
root 75886 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/11:0]
apache 77627 0.0 0.0 430516 16628 ? S Feb25 0:47 /usr/sbin/httpd -DFOREGROUND
root 80083 0.0 0.0 0 0 ? S 08:43 0:00 [kworker/u384:0]
root 81412 0.0 0.0 181860 2520 ? S 08:52 0:00 /usr/sbin/CROND -n
nagios 81420 0.0 0.0 115164 1204 ? Ss 08:52 0:00 /bin/sh -c /usr/bin/php -q /var/ww
nagios 81422 0.1 0.0 244884 14500 ? S 08:52 0:00 /usr/bin/php -q /var/www/html/nagi
root 81522 0.0 0.0 148396 3476 ? Ssl 08:52 0:00 /usr/libexec/fprintd
root 81557 0.0 0.0 0 0 ? S 08:52 0:00 [kworker/20:1]
root 81564 0.0 0.0 148380 1588 ? SN 08:52 0:00 runuser -s /bin/sh -c exec /usr/lo
nagios 81566 247 0.1 3646232 460072 ? SNsl 08:52 0:32 java -XX:+UseParNewGC -XX:+UseConc
bpizzuti 81658 0.0 0.0 155188 1892 pts/0 R+ 08:52 0:00 ps -aux
apache 84760 0.0 0.0 326924 16220 ? S Feb26 0:10 /usr/sbin/httpd -DFOREGROUND
apache 84793 0.0 0.0 326864 16104 ? S Feb26 0:05 /usr/sbin/httpd -DFOREGROUND
root 109323 0.0 0.0 0 0 ? S Feb24 0:01 [kworker/9:2]
root 116812 0.0 0.0 0 0 ? S Feb23 0:03 [kworker/18:0]
root 125491 0.0 0.0 0 0 ? S Feb21 0:07 [kworker/4:0]
root 127752 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/5:1]
root 128867 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/3:2]
root 129338 0.0 0.0 0 0 ? S Feb23 0:04 [kworker/12:1]
apache 143153 0.0 0.0 326196 15396 ? S Feb26 0:06 /usr/sbin/httpd -DFOREGROUND
root 144782 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/23:0]
root 146549 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/14:0]
root 149172 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/10:1]
root 149395 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/22:1]
root 155110 0.0 0.0 0 0 ? S Feb25 0:01 [kworker/0:0]
root 159796 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/15:1]
root 161132 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/25:2]
root 161624 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/17:2]
root 161730 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/11:1]
root 161805 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/16:1]
root 162827 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/1:2]
root 169761 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/17:1]
root 170479 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/16:2]
root 173075 0.0 0.0 0 0 ? S Feb25 0:01 [kworker/22:2]
root 176313 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/27:2]
root 176693 0.0 0.0 0 0 ? S Feb24 0:01 [kworker/24:1]
root 177961 0.0 0.0 0 0 ? S Feb26 0:00 [kworker/5:0]
root 178403 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/6:1]
root 179781 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/8:2]
root 183563 0.0 0.0 0 0 ? S Feb24 0:03 [kworker/13:0]
root 189556 0.0 0.0 0 0 ? S Feb26 0:01 [kworker/1:1]
root 190562 0.0 0.0 0 0 ? S Feb24 0:02 [kworker/26:1]
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 9:25 am
by scottwilkerson
Actually, let's look at
Thanks.
At what frequency are you seeing Logstash dying like this?
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 9:44 am
by bpizzutiWHI
Once every 3 or 4 days.
Code:
ps -ef|grep java
nagios 2553 1 36 Feb20 ? 2-10:46:54 java -Xms128837m -Xmx128837m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=8232b1f4-257b-4280-b1e8-3aa7e3d04006 -Des.node.name=a986f886-0c32-4cd2-9b56-95654f734914 -Des.discovery.zen.ping.unicast.hosts=localhost,10.200.16.12 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.7.6.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
nagios 83244 83242 99 08:56 ? 05:53:59 java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/Array/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/Array/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/Array/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/Array/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
bpizzuti 94019 26045 0 09:44 pts/0 00:00:00 grep --color=auto java
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 10:20 am
by scottwilkerson
Looking over your Logstash logs, all of the errors are coming from these 3 inputs:
Code:
tcp {
type => 'netlog'
port => 10525
}
syslog {
type => 'netlog2'
port => 10526
}
syslog {
type => 'netlog2'
port => 10526
}
First and foremost, you have a problem there: this block appears in your config twice.
Code:
syslog {
type => 'netlog2'
port => 10526
}
You should fix this right away, then apply the configuration.
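For anyone checking their own inputs for the same mistake, here is a minimal Python sketch that scans Logstash input text for ports claimed by more than one block. It is not a Nagios Log Server or Logstash tool; the function name `find_duplicate_ports` and the regexes are made up for illustration, and it assumes flat blocks with no nested braces, like the snippets in this thread. It also treats any repeated port number as a conflict, which is stricter than necessary when one block is TCP-only and the other UDP-only.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of Logstash or Nagios Log Server.
# Matches a plugin name followed by a brace-delimited body, e.g. "syslog { ... }".
# Assumes the body itself contains no nested braces.
BLOCK_RE = re.compile(r"(\w+)\s*\{([^{}]*)\}")
PORT_RE = re.compile(r"\bport\s*=>\s*(\d+)")


def find_duplicate_ports(config_text):
    """Map each port used by more than one block to the plugins using it."""
    users = defaultdict(list)
    for plugin, body in BLOCK_RE.findall(config_text):
        match = PORT_RE.search(body)
        if match:
            users[int(match.group(1))].append(plugin)
    return {port: plugins for port, plugins in users.items() if len(plugins) > 1}


# The three inputs quoted earlier in this thread.
SAMPLE = """
tcp {
type => 'netlog'
port => 10525
}
syslog {
type => 'netlog2'
port => 10526
}
syslog {
type => 'netlog2'
port => 10526
}
"""

if __name__ == "__main__":
    print(find_duplicate_ports(SAMPLE))
```

Run against the inputs above, it reports port 10526 as claimed by two syslog blocks; once the extra block is removed or moved to its own port, the result is an empty dict.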
Next, you may want to investigate what is being sent to ports 10525 and 10526, to see whether those senders are leaving threads open.
But I would start by removing the extra netlog2 entry; you cannot have two things listening on the same port.
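The same-port rule is easy to reproduce outside Logstash. The sketch below is plain Python sockets and has nothing to do with how Logstash binds its listeners internally; the function name `second_bind_fails` is made up for illustration. It binds one socket to an OS-chosen free port, then tries to bind a second socket to that same port without any address-reuse options set.

```python
import socket


def second_bind_fails(sock_type=socket.SOCK_DGRAM):
    """Return True if a second socket cannot bind to a port already in use."""
    first = socket.socket(socket.AF_INET, sock_type)
    second = socket.socket(socket.AF_INET, sock_type)
    try:
        # Port 0 asks the OS for any free port, so this never collides
        # with a real service on the machine.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            # Typically "Address already in use" (EADDRINUSE).
            return True
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("UDP second bind fails:", second_bind_fails(socket.SOCK_DGRAM))
    print("TCP second bind fails:", second_bind_fails(socket.SOCK_STREAM))
```

With default socket options the second bind is refused for both UDP and TCP, which is consistent with one of the two duplicate listeners failing.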
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 10:34 am
by bpizzutiWHI
I could swear I had edited that to say netlog3 and use the next port in sequence. I'll make the change and keep an eye on it. We're using these three ports for syslogging from network devices, and Cisco doesn't follow the syslog standard so I had to roll my own filter for some of this stuff.
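One way to keep an eye on a listener after a change like this is to send it a test message and look for it in the dashboard. Below is a rough Python sketch, assuming a UDP syslog input on port 10526 as in the config quoted above; the server address, host name, tag, and the function names are placeholders made up for illustration, and the line format is only loosely BSD-syslog style (priority, timestamp, host, tag, message).

```python
import socket
import time


def build_syslog_line(message, facility=1, severity=6,
                      host="testhost", tag="nlstest"):
    """Build a BSD-syslog-style line; host and tag are placeholder values."""
    pri = facility * 8 + severity  # facility 1 (user), severity 6 (info) -> 14
    timestamp = time.strftime("%b %d %H:%M:%S")
    return "<{}>{} {} {}: {}".format(pri, timestamp, host, tag, message)


def send_udp_syslog(message, server="127.0.0.1", port=10526):
    """Send one test line over UDP and return the bytes that were sent."""
    payload = build_syslog_line(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload


if __name__ == "__main__":
    # Replace the server address with your log server before running.
    print(send_udp_syslog("listener check"))
```

UDP gives no delivery confirmation, so the send succeeding only means the packet left the sender; the real check is whether the message shows up on the log server.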
Re: Nagios log server stops collecting logs
Posted: Tue Feb 27, 2018 10:47 am
by scottwilkerson
bpizzutiWHI wrote:I could swear I had edited that to say netlog3 and use the next port in sequence. I'll make the change and keep an eye on it. We're using these three ports for syslogging from network devices, and Cisco doesn't follow the syslog standard so I had to roll my own filter for some of this stuff.
Yeah, that can be irritating with the Ciscos. Let us know how it turns out.