
Re: Nagios Log server upgrade failure

Posted: Mon Mar 04, 2024 3:46 pm
by jmichaelson
OK, that's the elasticsearch service that's taking up the RAM. What's the output of journalctl -xeu elasticsearch.service?

Re: Nagios Log server upgrade failure

Posted: Mon Mar 04, 2024 5:00 pm
by NMFSTeam
This is the output:

Code:

root@hqnaglogi1:~# journalctl -xeu elasticsearch.service
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 116.
Feb 26 21:29:18 hqnaglogi1 systemd[1]: Stopping LSB: Starts elasticsearch...
-- Subject: A stop job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:18 hqnaglogi1 elasticsearch[1226647]:  * Stopping Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226647]:    ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: elasticsearch.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit elasticsearch.service has successfully entered the 'dead' state.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Stopped LSB: Starts elasticsearch.
-- Subject: A stop job for unit elasticsearch.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has finished.
--
-- The job identifier is 184144 and the job result is done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Starting LSB: Starts elasticsearch...
-- Subject: A start job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]:  * Starting Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]:    ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Started LSB: Starts elasticsearch.
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 184144.

Re: Nagios Log server upgrade failure

Posted: Tue Mar 05, 2024 3:47 pm
by jmichaelson
Is there anything of note in /var/log/elasticsearch? Beyond that, and having you increase the RAM and CPU count on your system (assuming it's a VM and you can), the only thing I can suggest at this point is to open a support ticket: https://answerhub.nagios.com/support/s/

Re: Nagios Log server upgrade failure

Posted: Tue Mar 05, 2024 4:17 pm
by NMFSTeam
Hmm, I see this:

Code:

[2024-03-05 21:13:57,022][DEBUG][action.bulk              ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMd], source[{"message":"CSMEventServer[23088]: {\"device_uuid\":\"6ed016a2-fc39-11e9-b4a0-bf3606a33547\",\"syslog_time\":1709673236,\"syslog_msg_class\":\"vpn\",\"syslog_severity\":6,\"syslog_msg_id\":602304,\"syslog_msg_text\":\"IPSEC: An outbound LAN-to-LAN SA (SPI= 0x0A2E8F02) between 192.168.1.182 and 192.168.0.2 (user= 192.168.0.2) has been deleted.\",\"client_ip\":\"192.168.1.182\",\"user_name\":\"192.168.0.2\"}\n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar  5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
        at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
        at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [Mar  5 21:13:56], tried both date format [dateOptionalTime], and timestamp number with locale []
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
        at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "Mar  5 21:13:56"
        at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
        at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
        ... 16 more
[2024-03-05 21:13:57,022][DEBUG][action.bulk              ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMi], source[{"message":"CSMEventServer[23088]: *** Got a message: \n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar  5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]

Re: Nagios Log server upgrade failure

Posted: Wed Mar 06, 2024 4:45 pm
by jmichaelson
I'm going to go out on a limb and assume that there's more than one of these entries in the elasticsearch logs.

With that assumption, is there evidence in the log of the elasticsearch server restarting?
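
One way to check that assumption without opening the file in an editor is to tally the parse failures by field name. The following is a minimal sketch, assuming the exception lines look like the ones pasted above; the function names and the log path passed in are hypothetical, not part of Nagios Log Server.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the top-level exception line shown in the pasted log, e.g.
# "...MapperParsingException: failed to parse [timestamp]".
# The "Caused by: ... failed to parse date field [...]" line is deliberately
# not matched, so each failed document is counted once.
PARSE_FAILURE = re.compile(r"MapperParsingException: failed to parse \[([^\]]+)\]")


def count_parse_failures(lines: Iterable[str]) -> Counter:
    """Return how many times each field name failed to parse."""
    counts = Counter()
    for line in lines:
        match = PARSE_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_parse_failures_in_file(path: str) -> Counter:
    """Stream the file line by line so a very large log is never held in memory."""
    with open(path, errors="replace") as log:
        return count_parse_failures(log)
```

Calling `count_parse_failures_in_file()` on the log under /var/log/elasticsearch and printing `.most_common()` on the result would show whether `timestamp` is the only field affected and roughly how often it fails.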

Re: Nagios Log server upgrade failure

Posted: Fri Mar 15, 2024 1:09 pm
by NMFSTeam
I don't see anything like that, but then again this file is huge. It is 18GiB and growing. Anything I can search for in the file?

Re: Nagios Log server upgrade failure

Posted: Mon Mar 18, 2024 1:16 pm
by jmichaelson
I'm interested to see if there's anything that indicates it's restarting. Search for words like "start" and "started". Also check the system logs to see whether the out-of-memory (OOM) killer is terminating it (search for "OOM" in syslog).
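
For a file that size, a small script that streams it line by line may be easier than a pager. This is a rough sketch only: the two patterns are assumptions about how Elasticsearch 1.x logs node lifecycle events (`[node ] ... starting/started/stopping/stopped`) and how the Linux kernel words OOM-killer messages, so adjust them to whatever your logs actually contain. The names `NODE_LIFECYCLE`, `OOM_KILLER`, and `find_matches` are made up for this example.

```python
import re
from typing import Iterable, Iterator, Pattern, Tuple

# Assumed shape of Elasticsearch 1.x lifecycle lines:
# "[<date>][INFO ][node    ] [<node name>] started"
NODE_LIFECYCLE = re.compile(
    r"\]\[node\s*\]\s.*\b(starting|started|stopping|stopped|closing|closed)\b"
)

# Assumed kernel wording when the OOM killer fires:
# "... invoked oom-killer ..." and "Out of memory: Kill(ed) process ..."
OOM_KILLER = re.compile(r"oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_matches(lines: Iterable[str], pattern: Pattern[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")
```

Passing an open file handle (for example `open(path, errors="replace")`) as `lines` keeps memory use flat, because the file is read one line at a time. Run it once with `NODE_LIFECYCLE` against the Elasticsearch log and once with `OOM_KILLER` against the system log.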

Re: Nagios Log server upgrade failure

Posted: Wed Mar 27, 2024 10:27 am
by NMFSTeam
We noticed that AIDE was running on the machine when we ran top, so we looked into it. AIDE is configured to run every day around midnight, and it was still running well into the afternoon. We have updated the AIDE config to exclude Nagios Log Server files, so hopefully it'll finish much sooner, and won't be such a drain on system resources.

I did not see anything about start, restart, etc. or out of memory messages anywhere in the logs. If we continue to have issues, I will open a support ticket. Thank you very much for your time and attention to this.

- the NMFS team

Re: Nagios Log server upgrade failure

Posted: Wed Mar 27, 2024 2:39 pm
by jmichaelson
Hopefully that works for you! Happy to be of help.