Nagios Log server upgrade failure
- jmichaelson
- Posts: 241
- Joined: Wed Aug 23, 2023 1:02 pm
Re: Nagios Log server upgrade failure
OK, that's the Elasticsearch service that's taking up the RAM. What's the output of journalctl -xeu elasticsearch.service?
Please let us know if you have any other questions or concerns.
-Jason
Re: Nagios Log server upgrade failure
This is the output:
root@hqnaglogi1:~# journalctl -xeu elasticsearch.service
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 116.
Feb 26 21:29:18 hqnaglogi1 systemd[1]: Stopping LSB: Starts elasticsearch...
-- Subject: A stop job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:18 hqnaglogi1 elasticsearch[1226647]: * Stopping Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226647]: ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: elasticsearch.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit elasticsearch.service has successfully entered the 'dead' state.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Stopped LSB: Starts elasticsearch.
-- Subject: A stop job for unit elasticsearch.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has finished.
--
-- The job identifier is 184144 and the job result is done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Starting LSB: Starts elasticsearch...
-- Subject: A start job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]: * Starting Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]: ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Started LSB: Starts elasticsearch.
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 184144.
Re: Nagios Log server upgrade failure
Is there anything of note in /var/log/elasticsearch? Beyond that, and increasing the RAM and CPU count on the system (assuming it's a VM and you can), the only thing I can suggest at this point is to open a support ticket: https://answerhub.nagios.com/support/s/
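A minimal sketch for getting an overview of a large Elasticsearch log without opening it in an editor. It tallies log lines by level, assuming the Elasticsearch 1.x line layout "[timestamp][LEVEL][logger] message"; the helper name count_levels is hypothetical and not part of Nagios Log Server.

```shell
#!/usr/bin/env bash
# count_levels (hypothetical helper): tally the lines of an Elasticsearch log
# by level, most frequent first. Assumes the Elasticsearch 1.x layout
# "[timestamp][LEVEL][logger] message"; stack-trace continuation lines do not
# match and are skipped. The file is streamed, so multi-GB logs are fine.
count_levels() {
  LC_ALL=C sed -nE 's/^\[[^]]*\]\[([A-Z]+) *\].*/\1/p' "$1" \
    | sort | uniq -c | sort -rn
}
```

Run it against each file in /var/log/elasticsearch; a log dominated by DEBUG or WARN lines shows what is filling the disk.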
Please let us know if you have any other questions or concerns.
-Jason
Re: Nagios Log server upgrade failure
Hmm, I see this:
[2024-03-05 21:13:57,022][DEBUG][action.bulk ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMd], source[{"message":"CSMEventServer[23088]: {\"device_uuid\":\"6ed016a2-fc39-11e9-b4a0-bf3606a33547\",\"syslog_time\":1709673236,\"syslog_msg_class\":\"vpn\",\"syslog_severity\":6,\"syslog_msg_id\":602304,\"syslog_msg_text\":\"IPSEC: An outbound LAN-to-LAN SA (SPI= 0x0A2E8F02) between 192.168.1.182 and 192.168.0.2 (user= 192.168.0.2) has been deleted.\",\"client_ip\":\"192.168.1.182\",\"user_name\":\"192.168.0.2\"}\n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar 5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [Mar 5 21:13:56], tried both date format [dateOptionalTime], and timestamp number with locale []
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "Mar 5 21:13:56"
at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
... 16 more
[2024-03-05 21:13:57,022][DEBUG][action.bulk ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMi], source[{"message":"CSMEventServer[23088]: *** Got a message: \n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar 5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
Re: Nagios Log server upgrade failure
I'm going to go out on a limb and assume that there's more than one of these entries in the Elasticsearch logs.
With that assumption, is there evidence in the log of the elasticsearch server restarting?
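If the same error repeats, counting it is quicker than scrolling through the file. A minimal sketch, assuming every occurrence contains the exact text "failed to parse [timestamp]" seen in the excerpt posted above; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# count_timestamp_failures (hypothetical helper): count the log lines that
# report the "failed to parse [timestamp]" mapping error.
count_timestamp_failures() {
  # -F: literal match (the brackets are not treated as a regex)
  # -c: print the number of matching lines instead of the lines themselves
  # LC_ALL=C avoids locale-aware matching, which speeds up very large files
  LC_ALL=C grep -cF 'failed to parse [timestamp]' "$1"
}
```

Note that grep prints 0 and exits with status 1 when nothing matches.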
Please let us know if you have any other questions or concerns.
-Jason
Re: Nagios Log server upgrade failure
I don't see anything like that, but then again this file is huge: it is 18 GiB and growing. Is there anything I can search for in the file?
Re: Nagios Log server upgrade failure
I'm interested to see whether there's anything indicating that it's restarting: words like "start" or "started". Also check the system logs to see whether the out-of-memory (OOM) killer is killing it (search syslog for OOM).
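A minimal sketch of both searches. The Elasticsearch pattern assumes the 1.x log layout and that node lifecycle messages (starting, started, stopping, stopped) are logged at INFO level, which also keeps the noisy DEBUG bulk-index entries out of the results. The syslog pattern looks for the kernel's "invoked oom-killer" and "Out of memory" lines; on Ubuntu these usually appear in /var/log/syslog and /var/log/kern.log. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# find_restarts (hypothetical helper): print INFO-level Elasticsearch log
# lines that mention the node starting or stopping. Assumes the 1.x layout
# "[timestamp][LEVEL][logger] message"; DEBUG lines are ignored.
find_restarts() {
  LC_ALL=C grep -E '^\[[^]]*\]\[INFO *\].*(starting|started|stopping|stopped)' "$1"
}

# find_oom_kills (hypothetical helper): print kernel OOM-killer activity
# from a syslog-style file, matching case-insensitively.
find_oom_kills() {
  LC_ALL=C grep -iE 'oom-killer|out of memory' "$1"
}
```

If find_restarts prints nothing, a plain case-insensitive search for "started" is a broader fallback, at the cost of more noise.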
Please let us know if you have any other questions or concerns.
-Jason
Re: Nagios Log server upgrade failure
We noticed that AIDE was running on the machine when we ran top, so we looked into it. AIDE is configured to run every day around midnight, and it was still running well into the afternoon. We have updated the AIDE config to exclude Nagios Log Server files, so hopefully it'll finish much sooner, and won't be such a drain on system resources.
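For reference, AIDE skips paths named on negative selection lines, which start with "!". A minimal sketch that appends such a rule to a config file only if it is not already present. The default path /usr/local/nagioslogserver is an assumption about where Nagios Log Server is installed on this host, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# add_aide_exclusion (hypothetical helper): append a negative selection rule
# "!<path>" to an AIDE config file unless the identical line already exists.
# Assumption: /usr/local/nagioslogserver is the Nagios Log Server install
# directory; pass a different path as the second argument if yours differs.
add_aide_exclusion() {
  local conf="$1"
  local path="${2:-/usr/local/nagioslogserver}"
  # Single-quote the "!" so interactive bash cannot history-expand it.
  local rule='!'"$path"
  # -x: whole-line match; -F: literal string; -q: quiet, status only
  if ! grep -qxF -- "$rule" "$conf"; then
    printf '%s\n' "$rule" >> "$conf"
  fi
}
```

For example, run add_aide_exclusion against your AIDE config (its location varies by distribution), then run aide --config-check to make sure the file still parses.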
I did not see any start/restart or out-of-memory messages anywhere in the logs. If we continue to have issues, I will open a support ticket. Thank you very much for your time and attention to this.
- the NMFS team
Re: Nagios Log server upgrade failure
Hopefully that works for you! Happy to be of help.
Please let us know if you have any other questions or concerns.
-Jason