Nagios Log server upgrade failure

jmichaelson
Posts: 241
Joined: Wed Aug 23, 2023 1:02 pm

Re: Nagios Log server upgrade failure

Post by jmichaelson »

OK, that's the elasticsearch service that's taking up the RAM. What's the output of journalctl -xeu elasticsearch.service?
Please let us know if you have any other questions or concerns.

-Jason
NMFSTeam
Posts: 88
Joined: Thu Nov 12, 2015 9:01 am

Re: Nagios Log server upgrade failure

Post by NMFSTeam »

This is the output:

root@hqnaglogi1:~# journalctl -xeu elasticsearch.service
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 116.
Feb 26 21:29:18 hqnaglogi1 systemd[1]: Stopping LSB: Starts elasticsearch...
-- Subject: A stop job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:18 hqnaglogi1 elasticsearch[1226647]:  * Stopping Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226647]:    ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: elasticsearch.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit elasticsearch.service has successfully entered the 'dead' state.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Stopped LSB: Starts elasticsearch.
-- Subject: A stop job for unit elasticsearch.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit elasticsearch.service has finished.
--
-- The job identifier is 184144 and the job result is done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Starting LSB: Starts elasticsearch...
-- Subject: A start job for unit elasticsearch.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has begun execution.
--
-- The job identifier is 184144.
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]:  * Starting Elasticsearch Server
Feb 26 21:29:31 hqnaglogi1 elasticsearch[1226725]:    ...done.
Feb 26 21:29:31 hqnaglogi1 systemd[1]: Started LSB: Starts elasticsearch.
-- Subject: A start job for unit elasticsearch.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished successfully.
--
-- The job identifier is 184144.
jmichaelson
Posts: 241
Joined: Wed Aug 23, 2023 1:02 pm

Re: Nagios Log server upgrade failure

Post by jmichaelson »

Is there anything of note in /var/log/elasticsearch? Short of that, and having you increase the RAM and CPU count on your system (assuming it's a VM and you can), the only thing I can suggest at this point is to open a support ticket: https://answerhub.nagios.com/support/s/
Please let us know if you have any other questions or concerns.

-Jason
NMFSTeam
Posts: 88
Joined: Thu Nov 12, 2015 9:01 am

Re: Nagios Log server upgrade failure

Post by NMFSTeam »

Hmm, I see this:

[2024-03-05 21:13:57,022][DEBUG][action.bulk              ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMd], source[{"message":"CSMEventServer[23088]: {\"device_uuid\":\"6ed016a2-fc39-11e9-b4a0-bf3606a33547\",\"syslog_time\":1709673236,\"syslog_msg_class\":\"vpn\",\"syslog_severity\":6,\"syslog_msg_id\":602304,\"syslog_msg_text\":\"IPSEC: An outbound LAN-to-LAN SA (SPI= 0x0A2E8F02) between 192.168.1.182 and 192.168.0.2 (user= 192.168.0.2) has been deleted.\",\"client_ip\":\"192.168.1.182\",\"user_name\":\"192.168.0.2\"}\n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar  5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
        at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
        at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [Mar  5 21:13:56], tried both date format [dateOptionalTime], and timestamp number with locale []
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
        at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "Mar  5 21:13:56"
        at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
        at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
        ... 16 more
[2024-03-05 21:13:57,022][DEBUG][action.bulk              ] [c676baf2-bd36-4dc9-9b4f-76d81e2ae08e] [logstash-2024.03.05][2] failed to execute bulk item (index) index {[logstash-2024.03.05][syslog][AY4Qd0oFp5GOWTrlWAMi], source[{"message":"CSMEventServer[23088]: *** Got a message: \n","@version":"1","@timestamp":"2024-03-05T21:13:56.000Z","type":"syslog","host":"192.168.200.20","priority":158,"timestamp":"Mar  5 21:13:56","logsource":"HQ-FMC","program":"SF-IMS","pid":"11725","severity":6,"facility":19,"facility_label":"local3","severity_label":"Informational"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
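
For anyone reading the trace above: the "Caused by" lines say the timestamp field holds the syslog-style string "Mar  5 21:13:56", while the index expects the ISO-style dateOptionalTime format. Here is a minimal Python sketch of that mismatch. It is only an illustration, not a Nagios Log Server fix: Python's datetime.fromisoformat is a stand-in for the Elasticsearch date parser, and the helper names and the year argument are hypothetical (the raw value carries no year).

```python
from datetime import datetime

# Value of the "timestamp" field in the rejected document (copied from the log).
RAW = "Mar  5 21:13:56"


def is_iso8601(value: str) -> bool:
    """Return True if Python can read the value as an ISO 8601 date/time.

    Stand-in for the dateOptionalTime check; the real parser is Joda-Time
    inside Elasticsearch and is not identical to this one.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def syslog_to_iso(value: str, year: int) -> str:
    """Convert a syslog-style timestamp to ISO 8601.

    The syslog format has no year, so the caller must supply one
    (an assumption made for this sketch).
    """
    parsed = datetime.strptime(f"{year} {value}", "%Y %b %d %H:%M:%S")
    return parsed.isoformat()


if __name__ == "__main__":
    print(is_iso8601(RAW))           # the syslog-style string is not ISO 8601
    print(syslog_to_iso(RAW, 2024))  # same instant written in ISO form
```

Note that the same document already carries a valid ISO value in its "@timestamp" field; only the extra "timestamp" field is rejected.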
jmichaelson
Posts: 241
Joined: Wed Aug 23, 2023 1:02 pm

Re: Nagios Log server upgrade failure

Post by jmichaelson »

I'm going to go out on a limb and assume that there's more than one of these entries in the elasticsearch logs.

With that assumption, is there evidence in the log of the elasticsearch server restarting?
Please let us know if you have any other questions or concerns.

-Jason
NMFSTeam
Posts: 88
Joined: Thu Nov 12, 2015 9:01 am

Re: Nagios Log server upgrade failure

Post by NMFSTeam »

I don't see anything like that, but then again this file is huge. It is 18GiB and growing. Anything I can search for in the file?
jmichaelson
Posts: 241
Joined: Wed Aug 23, 2023 1:02 pm

Re: Nagios Log server upgrade failure

Post by jmichaelson »

I'm interested to see if there's anything that indicates it's restarting. Search for words like "start" and "started". Also check the system logs to see if the out-of-memory (OOM) killer is terminating it (search for "OOM" in syslog).
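
Since the file is too large to open in an editor, a streaming search is one way to do this. Below is a rough Python sketch that reads the log one line at a time and counts lines matching each keyword group. The patterns and the function names are assumptions for illustration, not anything shipped with Nagios Log Server, and the path is whatever file you pass on the command line.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Keyword groups to look for. These patterns are guesses at useful search
# terms, not an official list of Elasticsearch or kernel messages.
PATTERNS = {
    "start": re.compile(r"\b(?:re)?start(?:ed|ing)?\b", re.IGNORECASE),
    "oom": re.compile(r"out of memory|oom[-_ ]?kill", re.IGNORECASE),
    "date_parse": re.compile(r"failed to parse date field"),
}


def scan_lines(lines: Iterable[str]) -> Counter:
    """Count how many lines match each pattern (a line counts once per pattern)."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Stream a file line by line so a multi-gigabyte log never sits in memory."""
    with open(path, "r", errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    # Usage: python scan_log.py <path to a log file>
    for name, count in scan_file(sys.argv[1]).items():
        print(f"{name}: {count}")
```

Run it once against the file under /var/log/elasticsearch and once against the system log. A nonzero "start" or "oom" count would tell you which lines to look at more closely.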
Please let us know if you have any other questions or concerns.

-Jason
NMFSTeam
Posts: 88
Joined: Thu Nov 12, 2015 9:01 am

Re: Nagios Log server upgrade failure

Post by NMFSTeam »

We noticed that AIDE was running on the machine when we ran top, so we looked into it. AIDE is configured to run every day around midnight, and it was still running well into the afternoon. We have updated the AIDE config to exclude Nagios Log Server files, so hopefully it'll finish much sooner, and won't be such a drain on system resources.

I did not see anything about start, restart, etc. or out of memory messages anywhere in the logs. If we continue to have issues, I will open a support ticket. Thank you very much for your time and attention to this.

- the NMFS team
jmichaelson
Posts: 241
Joined: Wed Aug 23, 2023 1:02 pm

Re: Nagios Log server upgrade failure

Post by jmichaelson »

Hopefully that works for you! Happy to be of help.
Please let us know if you have any other questions or concerns.

-Jason