Nagios log server stops collecting logs

bpizzutiWHI
Posts: 64
Joined: Thu Mar 02, 2017 10:15 am

Nagios log server stops collecting logs

Post by bpizzutiWHI »

I originally thought this was just Logstash dying, but it looks like it's happening right after a log rotation, and both the Elasticsearch and Logstash log files are zero size. I can provide the immediately prior logs, which were gzipped. I put in a daily cron job to restart Logstash, but it didn't help: this happened a couple of days ago and Logstash never recovered.
kyang

Re: Nagios log server stops collecting logs

Post by kyang »

What version of NLS are you on now?

Can you post or PM the prior logs and some recent logs also?

Could you also post or PM your system profile?

NLS home page --> click "Admin" --> under System click "System Status" --> Click "Download System Profile"

Thanks!
bpizzutiWHI

Re: Nagios log server stops collecting logs

Post by bpizzutiWHI »

I haven't actually restarted it yet, so the only logs present are the empty ones. Here are the prior log files and the system profile. We're running 2.0.2.
kyang

Re: Nagios log server stops collecting logs

Post by kyang »

Could you try restarting logstash?

Code:

service logstash restart
Then post the output of ps -aux:

Code:

ps -aux
You have a lot of these errors.

Code:

A plugin had an unrecoverable error. Will restart this plugin.\n  Plugin: <LogStash::Inputs::Tcp type=>\"netlog\", port=>10525
"UDP listener died"
"syslog listener died", :protocol=>:udp, :address=>"0.0.0.0:10526"
bpizzutiWHI

Re: Nagios log server stops collecting logs

Post by bpizzutiWHI »

I'm not sure why that listener keeps dying.

ps -aux:

Code:

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.1  0.0 194684  5704 ?        Ss   Feb20  16:24 /usr/lib/systemd/systemd --switche
root          2  0.0  0.0      0     0 ?        S    Feb20   0:00 [kthreadd]
root          3  0.0  0.0      0     0 ?        S    Feb20   0:04 [ksoftirqd/0]
root          5  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/0:0H]
root          7  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/0]
root          8  0.0  0.0      0     0 ?        S    Feb20   0:00 [rcu_bh]
root          9  0.1  0.0      0     0 ?        S    Feb20   9:58 [rcu_sched]
root         10  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/0]
root         11  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/1]
root         12  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/1]
root         13  0.0  0.0      0     0 ?        S    Feb20   0:03 [ksoftirqd/1]
root         15  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/1:0H]
root         16  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/2]
root         17  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/2]
root         18  0.0  0.0      0     0 ?        S    Feb20   0:06 [ksoftirqd/2]
root         20  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/2:0H]
root         21  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/3]
root         22  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/3]
root         23  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/3]
root         25  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/3:0H]
root         26  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/4]
root         27  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/4]
root         28  0.0  0.0      0     0 ?        S    Feb20   0:13 [ksoftirqd/4]
root         30  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/4:0H]
root         31  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/5]
root         32  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/5]
root         33  0.0  0.0      0     0 ?        S    Feb20   0:14 [ksoftirqd/5]
root         35  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/5:0H]
root         36  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/6]
root         37  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/6]
root         38  0.0  0.0      0     0 ?        S    Feb20   0:01 [ksoftirqd/6]
root         40  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/6:0H]
root         41  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/7]
root         42  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/7]
root         43  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/7]
root         45  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/7:0H]
root         46  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/8]
root         47  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/8]
root         48  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/8]
root         50  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/8:0H]
root         51  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/9]
root         52  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/9]
root         53  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/9]
root         55  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/9:0H]
root         56  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/10]
root         57  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/10]
root         58  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/10]
root         60  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/10:0H]
root         61  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/11]
root         62  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/11]
root         63  0.0  0.0      0     0 ?        S    Feb20   0:02 [ksoftirqd/11]
root         65  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/11:0H]
root         66  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/12]
root         67  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/12]
root         68  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/12]
root         70  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/12:0H]
root         71  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/13]
root         72  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/13]
root         73  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/13]
root         75  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/13:0H]
root         76  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/14]
root         77  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/14]
root         78  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/14]
root         80  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/14:0H]
root         81  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/15]
root         82  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/15]
root         83  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/15]
root         85  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/15:0H]
root         86  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/16]
root         87  0.0  0.0      0     0 ?        S    Feb20   0:00 [migration/16]
root         88  0.0  0.0      0     0 ?        S    Feb20   0:04 [ksoftirqd/16]
root         90  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/16:0H]
root         91  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/17]
root         92  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/17]
root         93  0.0  0.0      0     0 ?        S    Feb20   0:01 [ksoftirqd/17]
root         95  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/17:0H]
root         96  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/18]
root         97  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/18]
root         98  0.0  0.0      0     0 ?        S    Feb20   0:06 [ksoftirqd/18]
root        100  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/18:0H]
root        101  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/19]
root        102  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/19]
root        103  0.0  0.0      0     0 ?        S    Feb20   0:06 [ksoftirqd/19]
root        105  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/19:0H]
root        106  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/20]
root        107  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/20]
root        108  0.0  0.0      0     0 ?        S    Feb20   0:10 [ksoftirqd/20]
root        110  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/20:0H]
root        111  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/21]
root        112  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/21]
root        113  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/21]
root        115  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/21:0H]
root        116  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/22]
root        117  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/22]
root        118  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/22]
root        121  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/23]
root        122  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/23]
root        123  0.0  0.0      0     0 ?        S    Feb20   0:01 [ksoftirqd/23]
root        125  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/23:0H]
root        126  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/24]
root        127  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/24]
root        128  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/24]
root        130  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/24:0H]
root        131  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/25]
root        132  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/25]
root        133  0.0  0.0      0     0 ?        S    Feb20   0:12 [ksoftirqd/25]
root        135  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/25:0H]
root        136  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/26]
root        137  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/26]
root        138  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/26]
root        140  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/26:0H]
root        141  0.0  0.0      0     0 ?        S    Feb20   0:01 [watchdog/27]
root        142  0.0  0.0      0     0 ?        S    Feb20   0:01 [migration/27]
root        143  0.0  0.0      0     0 ?        S    Feb20   0:00 [ksoftirqd/27]
root        145  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/27:0H]
root        147  0.0  0.0      0     0 ?        S    Feb20   0:00 [kdevtmpfs]
root        148  0.0  0.0      0     0 ?        S<   Feb20   0:00 [netns]
root        149  0.0  0.0      0     0 ?        S    Feb20   0:00 [khungtaskd]
root        150  0.0  0.0      0     0 ?        S<   Feb20   0:00 [writeback]
root        151  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kintegrityd]
root        152  0.0  0.0      0     0 ?        S<   Feb20   0:00 [bioset]
root        153  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kblockd]
root        155  0.0  0.0      0     0 ?        S<   Feb20   0:00 [md]
root        161  0.0  0.0      0     0 ?        S    Feb20   0:03 [kswapd0]
root        162  0.0  0.0      0     0 ?        SN   Feb20   0:00 [ksmd]
root        163  0.0  0.0      0     0 ?        SN   Feb20   0:01 [khugepaged]
root        164  0.0  0.0      0     0 ?        S    Feb20   0:00 [fsnotify_mark]
root        165  0.0  0.0      0     0 ?        S<   Feb20   0:00 [crypto]
root        173  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kthrotld]
root        175  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kmpath_rdacd]
root        180  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kpsmoused]
root        181  0.0  0.0      0     0 ?        S<   Feb20   0:00 [ipv6_addrconf]
root        200  0.0  0.0      0     0 ?        S<   Feb20   0:00 [deferwq]
root        236  0.0  0.0      0     0 ?        S    Feb20   0:19 [kauditd]
root        360  0.0  0.0      0     0 ?        S    Feb24   0:07 [kworker/14:2]
root        445  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_0]
root        446  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_0]
root        447  0.0  0.0      0     0 ?        S<   Feb20   0:00 [ata_sff]
root        462  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_1]
root        463  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_1]
root        464  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_2]
root        465  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_2]
root        466  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_3]
root        467  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_3]
root        468  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_4]
root        469  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_4]
root        478  0.0  0.0      0     0 ?        S<   Feb20   0:00 [ttm_swap]
root        479  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_5]
root        480  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_5]
root        481  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_6]
root        482  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_6]
root        483  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_7]
root        484  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_7]
root        485  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_8]
root        486  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_8]
root        487  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_9]
root        488  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_9]
root        489  0.0  0.0      0     0 ?        S    Feb20   0:00 [scsi_eh_10]
root        490  0.0  0.0      0     0 ?        S<   Feb20   0:00 [scsi_tmf_10]
root        626  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kdmflush]
root        627  0.0  0.0      0     0 ?        S<   Feb20   0:00 [bioset]
root        639  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kdmflush]
root        640  0.0  0.0      0     0 ?        S<   Feb20   0:00 [bioset]
root        653  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfsalloc]
root        654  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs_mru_cache]
root        655  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-buf/dm-0]
root        656  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-data/dm-0]
root        657  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-conv/dm-0]
root        658  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-cil/dm-0]
root        659  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-reclaim/dm-]
root        660  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-log/dm-0]
root        661  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-eofblocks/d]
root        662  0.0  0.0      0     0 ?        S    Feb20   2:21 [xfsaild/dm-0]
root        731  0.0  0.0 141564 73872 ?        Ss   Feb20   2:41 /usr/lib/systemd/systemd-journald
root        744  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/3:1H]
root        758  0.0  0.0 268404  1484 ?        Ss   Feb20   0:00 /usr/sbin/lvmetad -f
root        765  0.0  0.0  45556  1824 ?        Ss   Feb20   0:00 /usr/lib/systemd/systemd-udevd
root       1331  0.0  0.0      0     0 ?        S<   Feb20   0:00 [edac-poller]
root       1744  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kvm-irqfd-clean]
root       1765  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-buf/sda1]
root       1766  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-data/sda1]
root       1767  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-conv/sda1]
root       1768  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-cil/sda1]
root       1769  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-reclaim/sda]
root       1770  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-log/sda1]
root       1771  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-eofblocks/s]
root       1773  0.0  0.0      0     0 ?        S    Feb20   0:00 [xfsaild/sda1]
root       1779  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/5:1H]
root       1783  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/6:1H]
root       1790  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/7:1H]
root       1794  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kdmflush]
root       1795  0.0  0.0      0     0 ?        S<   Feb20   0:00 [bioset]
root       1797  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kdmflush]
root       1798  0.0  0.0      0     0 ?        S<   Feb20   0:00 [bioset]
root       1804  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-buf/dm-2]
root       1805  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-data/dm-2]
root       1806  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-conv/dm-2]
root       1807  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-cil/dm-2]
root       1808  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-reclaim/dm-]
root       1809  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-log/dm-2]
root       1810  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-eofblocks/d]
root       1811  0.0  0.0      0     0 ?        S    Feb20   0:00 [xfsaild/dm-2]
root       1815  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-buf/dm-3]
root       1816  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-data/dm-3]
root       1817  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-conv/dm-3]
root       1818  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-cil/dm-3]
root       1819  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-reclaim/dm-]
root       1820  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-log/dm-3]
root       1821  0.0  0.0      0     0 ?        S<   Feb20   0:00 [xfs-eofblocks/d]
root       1822  0.0  0.0      0     0 ?        S    Feb20   2:09 [xfsaild/dm-3]
root       1834  0.0  0.0  57464  1848 ?        S<sl Feb20   0:35 /sbin/auditd -n
root       1856  0.0  0.0  26300  1756 ?        Ss   Feb20   0:41 /usr/lib/systemd/systemd-logind
root       1857  0.0  0.0   6412   592 ?        Ss   Feb20   6:39 /sbin/rngd -f
libstor+   1864  0.0  0.0  10568   804 ?        Ss   Feb20   0:01 /usr/bin/lsmd -d
root       1866  0.0  0.0  21476  1384 ?        Ss   Feb20   1:56 /usr/sbin/irqbalance --foreground
root       1870  0.0  0.0 214832  5492 ?        Ss   Feb20   0:00 /usr/sbin/abrtd -d -s
root       1872  0.0  0.0 212332  4560 ?        Ss   Feb20   0:05 /usr/bin/abrt-watch-log -F BUG: WA
polkitd    1877  0.0  0.0 531776 12168 ?        Ssl  Feb20   0:41 /usr/lib/polkit-1/polkitd --no-deb
root       1880  0.0  0.0 130376  2840 ?        Ss   Feb20   0:00 /usr/sbin/smartd -n -q never
dbus       1882  0.0  0.0  28668  1924 ?        Ss   Feb20   1:29 /bin/dbus-daemon --system --addres
root       1883  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/2:1H]
chrony     1884  0.0  0.0 117892  1892 ?        S    Feb20   0:01 /usr/sbin/chronyd
root       1887  0.0  0.0 437884 10556 ?        Ssl  Feb20   0:23 /usr/sbin/NetworkManager --no-daem
root       1892  0.0  0.0 127880  1596 ?        Ss   Feb20   0:06 /usr/sbin/crond -n
root       1894  0.0  0.0  27892   940 ?        Ss   Feb20   0:00 /usr/sbin/atd -f
root       1943  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/20:1H]
root       1944  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/0:1H]
root       2004  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/23:1H]
root       2211  0.0  0.0 555704 16620 ?        Ssl  Feb20   0:49 /usr/bin/python -Es /usr/sbin/tune
root       2212  0.0  0.0 597480 48584 ?        Ssl  Feb20   1:52 /usr/sbin/rsyslogd -n
root       2214  0.0  0.0 226252 11636 ?        Ss   Feb20   6:39 /usr/sbin/snmpd -LS0-6d -f
root       2222  0.0  0.0 108004  4072 ?        Ss   Feb20   0:00 /usr/sbin/sshd -D
root       2223  0.0  0.0 319340 12960 ?        Ss   Feb20   0:18 /usr/sbin/httpd -DFOREGROUND
nagios     2553 34.6 52.1 167424240 137677780 ? SLl  Feb20 3318:25 java -Xms128837m -Xmx128837m -Dja
root       3689  0.0  0.0 1063876 23808 ?       Ssl  Feb20   4:04 /opt/dell/srvadmin/sbin/dsm_sa_dat
root       3774  0.0  0.0      0     0 ?        S    01:01   0:00 [kworker/0:2]
root       3846  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/1:1H]
root       3888  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/25:1H]
root       3889  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/9:1H]
root       3890  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/18:1H]
root       3891  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/16:1H]
root       3899  0.0  0.0  96704  2712 ?        Ss   Feb20   0:10 sendmail: accepting connections
root       3920  0.0  0.0 161708  2788 ?        S    Feb20   0:14 /usr/netvault/bin/nvpmgr startup
root       3921  0.0  0.0 134600  2300 ?        S    Feb20   0:06 nvcmgr 2
smmsp      3922  0.0  0.0  88152  2012 ?        Ss   Feb20   0:00 sendmail: Queue runner@01:00:00 fo
root       3931  0.3  0.0 135644  3544 ?        S    Feb20  32:46 nvnmgr 3
root       3932  0.0  0.0 142680  2608 ?        S    Feb20   0:06 nvstatsmngr 9
root       3934  0.0  0.0 134580  2128 ?        S    Feb20   0:00 nvconsolesvc 15
root       3974  0.0  0.0 303024  4096 ?        Ssl  Feb20   0:07 /opt/dell/srvadmin/sbin/dsm_sa_eve
root       4001  0.0  0.0 515504 10372 ?        Ssl  Feb20   1:38 /opt/dell/srvadmin/sbin/dsm_sa_snm
root       4066  0.0  0.0 727736 17260 ?        Ss   Feb20   0:00 /opt/dell/srvadmin/sbin/dsm_sa_dat
root       4097  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/14:1H]
root       4143  0.0  0.0 140428   892 ?        Ss   Feb20   0:00 /opt/dell/srvadmin/sbin/dsm_om_con
root       4144  0.1  0.1 6252736 351800 ?      Sl   Feb20  17:55 /opt/dell/srvadmin/sbin/dsm_om_con
root       4187  0.0  0.0 700464  4208 ?        Ssl  Feb20   0:00 /opt/dell/srvadmin/sbin/dsm_om_shr
root       4711  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/20:0]
root       5398  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/19:1H]
root       5421  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/8:1H]
root       5488  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/11:1H]
root       5645  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/17:1H]
root       5861  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/27:1H]
root       5888  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/15:1H]
root       5891  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/10:1H]
root       5958  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/4:1H]
root       5976  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/13:1H]
root       5996  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/21:1H]
root       6096  0.0  0.0 112080   852 tty1     Ss+  Feb20   0:00 /sbin/agetty --noclear tty1 linux
root       6133  0.0  0.0      0     0 ?        S<   Feb20   0:01 [kworker/24:1H]
root       6240  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/12:1H]
apache     9453  0.0  0.0 430772 16908 ?        S    Feb26   0:13 /usr/sbin/httpd -DFOREGROUND
root       9699  0.0  0.0      0     0 ?        S<   Feb20   0:00 [kworker/26:1H]
root      10329  0.0  0.0      0     0 ?        S<   Feb24   0:00 [kworker/22:0H]
root      11486  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/7:2]
root      14043  0.0  0.0      0     0 ?        S    02:01   0:00 [kworker/24:0]
root      15471  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/21:2]
root      16914  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/9:0]
root      19287  0.0  0.0      0     0 ?        S<   02:33   0:00 [kworker/22:2H]
root      22968  0.0  0.0      0     0 ?        S    02:55   0:00 [kworker/25:0]
root      25464  0.0  0.0      0     0 ?        S    03:10   0:00 [kworker/23:2]
apache    25735  0.0  0.0 430964 17120 ?        S    Feb26   0:14 /usr/sbin/httpd -DFOREGROUND
apache    25738  0.0  0.0 430824 16792 ?        S    Feb26   0:14 /usr/sbin/httpd -DFOREGROUND
root      26027  0.0  0.0 155136  5980 ?        Ss   Feb26   0:00 sshd: bpizzuti [priv]
bpizzuti  26029  0.0  0.0 155720  3220 ?        S    Feb26   0:00 sshd: bpizzuti@pts/0
bpizzuti  26030  0.0  0.0  55464  2196 ?        Ss   Feb26   0:00 /usr/libexec/openssh/sftp-server
bpizzuti  26045  0.0  0.0 117964  2660 pts/0    Ss   Feb26   0:00 -bash
root      29248  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/13:2]
root      31276  0.0  0.0      0     0 ?        S    03:46   0:00 [kworker/4:1]
root      33719  0.0  0.0      0     0 ?        S    04:01   0:00 [kworker/8:1]
root      36833  0.0  0.0      0     0 ?        S    04:19   0:00 [kworker/10:2]
root      38538  0.0  0.0      0     0 ?        S    04:30   0:00 [kworker/26:0]
apache    41657  0.0  0.0 430308 16088 ?        S    Feb26   0:12 /usr/sbin/httpd -DFOREGROUND
apache    41718  0.0  0.0 430516 16312 ?        S    Feb26   0:09 /usr/sbin/httpd -DFOREGROUND
root      43588  0.0  0.0      0     0 ?        S    05:01   0:00 [kworker/27:1]
root      47046  0.0  0.0      0     0 ?        S    05:22   0:00 [kworker/12:0]
root      51085  0.0  0.0      0     0 ?        S    05:47   0:00 [kworker/20:2]
root      51298  0.0  0.0      0     0 ?        S    05:48   0:00 [kworker/6:0]
root      52597  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/18:2]
root      54587  0.0  0.0      0     0 ?        S    06:08   0:00 [kworker/2:1]
apache    56484  0.0  0.0 326924 16204 ?        S    Feb26   0:07 /usr/sbin/httpd -DFOREGROUND
root      57682  0.0  0.0      0     0 ?        S    Feb25   0:00 [kworker/7:1]
root      59594  0.0  0.0      0     0 ?        S    06:38   0:00 [kworker/u384:2]
root      62678  0.0  0.0      0     0 ?        S    Feb25   0:01 [kworker/2:2]
root      63253  0.0  0.0      0     0 ?        S    07:01   0:00 [kworker/15:2]
root      63831  0.0  0.0      0     0 ?        S    Feb25   0:01 [kworker/19:0]
root      71300  0.0  0.0      0     0 ?        S    07:50   0:00 [kworker/19:2]
root      73086  0.0  0.0      0     0 ?        S    08:01   0:00 [kworker/3:0]
root      75087  0.0  0.0      0     0 ?        S    08:13   0:00 [kworker/21:0]
root      75886  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/11:0]
apache    77627  0.0  0.0 430516 16628 ?        S    Feb25   0:47 /usr/sbin/httpd -DFOREGROUND
root      80083  0.0  0.0      0     0 ?        S    08:43   0:00 [kworker/u384:0]
root      81412  0.0  0.0 181860  2520 ?        S    08:52   0:00 /usr/sbin/CROND -n
nagios    81420  0.0  0.0 115164  1204 ?        Ss   08:52   0:00 /bin/sh -c /usr/bin/php -q /var/ww
nagios    81422  0.1  0.0 244884 14500 ?        S    08:52   0:00 /usr/bin/php -q /var/www/html/nagi
root      81522  0.0  0.0 148396  3476 ?        Ssl  08:52   0:00 /usr/libexec/fprintd
root      81557  0.0  0.0      0     0 ?        S    08:52   0:00 [kworker/20:1]
root      81564  0.0  0.0 148380  1588 ?        SN   08:52   0:00 runuser -s /bin/sh -c exec /usr/lo
nagios    81566  247  0.1 3646232 460072 ?      SNsl 08:52   0:32 java -XX:+UseParNewGC -XX:+UseConc
bpizzuti  81658  0.0  0.0 155188  1892 pts/0    R+   08:52   0:00 ps -aux
apache    84760  0.0  0.0 326924 16220 ?        S    Feb26   0:10 /usr/sbin/httpd -DFOREGROUND
apache    84793  0.0  0.0 326864 16104 ?        S    Feb26   0:05 /usr/sbin/httpd -DFOREGROUND
root     109323  0.0  0.0      0     0 ?        S    Feb24   0:01 [kworker/9:2]
root     116812  0.0  0.0      0     0 ?        S    Feb23   0:03 [kworker/18:0]
root     125491  0.0  0.0      0     0 ?        S    Feb21   0:07 [kworker/4:0]
root     127752  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/5:1]
root     128867  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/3:2]
root     129338  0.0  0.0      0     0 ?        S    Feb23   0:04 [kworker/12:1]
apache   143153  0.0  0.0 326196 15396 ?        S    Feb26   0:06 /usr/sbin/httpd -DFOREGROUND
root     144782  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/23:0]
root     146549  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/14:0]
root     149172  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/10:1]
root     149395  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/22:1]
root     155110  0.0  0.0      0     0 ?        S    Feb25   0:01 [kworker/0:0]
root     159796  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/15:1]
root     161132  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/25:2]
root     161624  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/17:2]
root     161730  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/11:1]
root     161805  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/16:1]
root     162827  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/1:2]
root     169761  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/17:1]
root     170479  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/16:2]
root     173075  0.0  0.0      0     0 ?        S    Feb25   0:01 [kworker/22:2]
root     176313  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/27:2]
root     176693  0.0  0.0      0     0 ?        S    Feb24   0:01 [kworker/24:1]
root     177961  0.0  0.0      0     0 ?        S    Feb26   0:00 [kworker/5:0]
root     178403  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/6:1]
root     179781  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/8:2]
root     183563  0.0  0.0      0     0 ?        S    Feb24   0:03 [kworker/13:0]
root     189556  0.0  0.0      0     0 ?        S    Feb26   0:01 [kworker/1:1]
root     190562  0.0  0.0      0     0 ?        S    Feb24   0:02 [kworker/26:1]
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios log server stops collecting logs

Post by scottwilkerson »

Actually, let's look at the output of:

Code:

ps -ef|grep java
Thanks.

At what frequency are you seeing Logstash dying like this?
Former Nagios employee
bpizzutiWHI

Re: Nagios log server stops collecting logs

Post by bpizzutiWHI »

Once every 3 or 4 days.

Code:

 ps -ef|grep java
nagios     2553      1 36 Feb20 ?        2-10:46:54 java -Xms128837m -Xmx128837m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=8232b1f4-257b-4280-b1e8-3aa7e3d04006 -Des.node.name=a986f886-0c32-4cd2-9b56-95654f734914 -Des.discovery.zen.ping.unicast.hosts=localhost,10.200.16.12 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.7.6.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
nagios    83244  83242 99 08:56 ?        05:53:59 java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/Array/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/Array/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/Array/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/Array/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
bpizzuti  94019  26045  0 09:44 pts/0    00:00:00 grep --color=auto java
scottwilkerson

Re: Nagios log server stops collecting logs

Post by scottwilkerson »

Looking over your Logstash logs, all of the errors are coming from these three inputs:

Code:

tcp {
    type => 'netlog'
    port => 10525
}
syslog {
    type => 'netlog2'
    port => 10526
}
syslog {
    type => 'netlog2'
    port => 10526
}
First and foremost, you have a problem there: this input appears in your config twice:

Code:

syslog {
    type => 'netlog2'
    port => 10526
}
You should fix this right away and then apply the configuration.
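
For anyone checking their own inputs for the same mistake, here is a minimal Python sketch (not part of Nagios Log Server) that counts how often each port number is declared in a Logstash input snippet. The function name `find_duplicate_ports` and the regex-based matching are assumptions made for illustration. It only looks for literal `port => N` lines and does not understand the full Logstash config grammar, so it cannot tell a TCP input from a UDP one and treats any repeated number as suspicious.

```python
import re
from collections import Counter

# Sample text based on the three inputs quoted in this thread.
SAMPLE_CONFIG = """
tcp {
    type => 'netlog'
    port => 10525
}
syslog {
    type => 'netlog2'
    port => 10526
}
syslog {
    type => 'netlog2'
    port => 10526
}
"""

# Matches literal "port => 12345" settings on their own line.
# This is a deliberate simplification, not a real config parser.
PORT_PATTERN = re.compile(r"^[ \t]*port[ \t]*=>[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def find_duplicate_ports(config_text):
    """Return {port: count} for every port declared more than once."""
    counts = Counter(int(p) for p in PORT_PATTERN.findall(config_text))
    return {port: n for port, n in counts.items() if n > 1}


if __name__ == "__main__":
    for port, n in sorted(find_duplicate_ports(SAMPLE_CONFIG).items()):
        print(f"port {port} is declared {n} times")
```

Run against the sample above, this reports that port 10526 is declared 2 times. To check a real setup you would read the files under your own Logstash config directory into `config_text` instead of using the embedded sample.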

Next, you may want to investigate what is sending to ports 10525 and 10526, to see whether those senders are leaving threads open.

But I would start by removing the extra netlog2 entry; you cannot have two things listening on the same port.
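
To illustrate why a second listener on an already-used port cannot come up, here is a small Python sketch, independent of Logstash, in which two UDP sockets try to bind the same address and port. It uses an ephemeral port picked by the OS rather than 10526, so it does not interfere with a running listener. The helper name `second_bind_fails` is hypothetical, and the sketch assumes neither socket sets SO_REUSEADDR or SO_REUSEPORT; how Logstash itself configures its sockets is not shown in this thread.

```python
import errno
import socket


def second_bind_fails(host="127.0.0.1"):
    """Bind one UDP socket, then try to bind a second one to the same port.

    Returns the errno of the failure, or None if the second bind succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        port = first.getsockname()[1]  # the port the OS actually assigned
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_fails()
    if err == errno.EADDRINUSE:
        print("second listener rejected: address already in use")
    else:
        print(f"unexpected result: {err}")
```

The same kind of conflict is a plausible reason for one of two identical inputs to keep failing and restarting, which would fit the repeated "listener died" messages quoted earlier.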
bpizzutiWHI

Re: Nagios log server stops collecting logs

Post by bpizzutiWHI »

I could swear I had edited that to say netlog3 and use the next port in sequence. I'll make the change and keep an eye on it. We're using these three ports for syslogging from network devices, and Cisco doesn't follow the syslog standard so I had to roll my own filter for some of this stuff.
scottwilkerson

Re: Nagios log server stops collecting logs

Post by scottwilkerson »

bpizzutiWHI wrote: I could swear I had edited that to say netlog3 and use the next port in sequence. I'll make the change and keep an eye on it. We're using these three ports for syslogging from network devices, and Cisco doesn't follow the syslog standard so I had to roll my own filter for some of this stuff.
Yeah, that can be irritating with the Ciscos. Let us know how it turns out.