Incorrect timestamp in duration for trap
Hi,
We are experiencing quite a strange issue on the trap receiver of certain nodes. The common factor is that the traps originating from these nodes are all from the same vendor. However, I could not see anything wrong when looking at the timestamps of the traps, but maybe I'm missing something.
When the trap hits the agent (Nagios) we end up seeing a very strange duration as below:
When we click on the service (i.e the trap service) we see that the duration is marked as "N/A".
But if we click on the service history we can see that all is fine for the date/time received.
What can we provide you in order to understand and correct this issue?
Rgds,
Matthew
Re: Incorrect timestamp in duration for trap
The duration shows how long the device has been in a particular state. I don't know that it is a valid field for an SNMP trap service check: an SNMP trap service check waits for any trap to be received from a specific device, and the device may never send an all-clear after a critical trap has been thrown. In the service history screenshot, you are seeing the date/time that a trap came in.
All that said, I don't think you should be seeing 18,000 days under the duration field. Can you click on the Service Status Detail for the SNMP trap service, and click on the + icon to show the advanced details. Let me know if both Active and Passive checks are enabled.
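For context on where a figure like 18,000 days could come from: a duration is roughly "now minus last state change", and a last-state-change timestamp stored as 0 (the Unix epoch, Jan 1 1970) works out to exactly that order of magnitude. A quick sketch of the arithmetic:

```shell
# Nagios computes duration as roughly (now - last_state_change).
# If last_state_change were stored as 0 (the Unix epoch, Jan 1 1970),
# the duration comes out to well over 18,000 days.
now=$(date +%s)
last_state_change=0
days=$(( (now - last_state_change) / 86400 ))
echo "duration: ${days} days"
```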
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Incorrect timestamp in duration for trap
Please PM me a copy of your profile, you can download it from Admin > System Profile > Download Profile button.
Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and the -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

Do you see any traps in /var/log/snmptt or in /var/log/messages for these? Can you send us an example of what your traps are sending in, so we can see why the timestamp isn't being honored?
What does it show in Admin > SNMP Trap Interface for received traps?
Please run this command and PM me the resulting /tmp/SNMPFILES.zip file:
Code: Select all
zip -r /tmp/SNMPFILES.zip /etc/snmp

Thank you
Re: Incorrect timestamp in duration for trap
Hi
I think that is the way it should be, since this is a passive check.
Profile sent as a PM. I have also sent the /etc/snmp/ content as a PM.
For the size of the tables, please find attached "tables.txt".
As for "Admin > SNMP Trap Interface", I do not have anything configured, nor is there anything in the "Received Traps" tab.

Can you click on the Service Status Detail for the SNMP trap service, and click on the + icon to show the advanced details. Let me know if both Active and Passive checks are enabled.

Active checks are disabled, while passive checks are, of course, enabled.
Traps in the log /var/log/snmptt are clearly visible with the correct timestamp:
Code: Select all
grep am1-hss-master01-p snmptt.log-20200802 | grep -v heartbeat | grep "Jul 30"
Thu Jul 30 12:37:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Critical "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 25 17 09 2B 00 00 1 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:37:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Normal "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 25 17 09 2B 00 00 1 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:37:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Critical "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 25 17 09 2B 00 00 5 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:42:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Critical "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 2A 17 09 2B 00 00 1 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:42:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Normal "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 2A 17 09 2B 00 00 1 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:42:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Critical "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 2A 17 09 2B 00 00 4 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Thu Jul 30 12:42:24 2020 .1.3.6.1.4.1.17856.3.1.2.0.1 Warning "Status Events" am1-hss-master01-p - A titanAlarmNotificaton represents a potential 07 E4 07 1E 0A 2A 17 09 2B 00 00 4 management 170 element bru-sha-hss-hss01 Remote element [bru-sha-hss-hss01] has one or more raised alarms Navigate to the remote element and see its logs for more details
Rgds,
Matthew
Re: Incorrect timestamp in duration for trap
Please resend the tables.txt, I do not see it.
Please go to Reports > State History:
- Adjust the Period to something like this month
- Select the host from the Limit To dropdown
- Select the service
- For Type, select Both
- For State Type, select Both
- Click Run
Please send me the report, you can either download it as a PDF or CSV.
I'm wondering if it's just been in WARNING the entire time. If you click on the service in Home > Service Detail and click the + (advanced) tab, please send a screenshot of that page so we can see what the values show.
Re: Incorrect timestamp in duration for trap
Hi,
Please find the tables, state history report, and service detail screenshot attached.
The warning severity is intentional: I configured snmptt.conf to set Warning depending on the SNMP variable matches. See below.
Code: Select all
EVENT titanAlarmNotification .1.3.6.1.4.1.17856.3.1.2.0.1 "Status Events" Warning
FORMAT A titanAlarmNotificaton represents a potential $*
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps Management" "$s" " " "RepairAction: $8 NOTE: Actual time of node is in UTC!" "SNMPTrap: WARNING: ProbableCause: $6"
MATCH $2: 2-4
MATCH $3: (management)
MATCH MODE=and
SDESC
A titanAlarmNotificaton represents a potential
or actual service affecting condition that is
detected by the application. This trap signifies
the occurrence of an action and/or condition that
is significant to administrative users of the
system. When the condition is resolved, an
identical trap with a severity of CLEAR is sent.
Variables:
1: titanAlarmTimestamp
2: titanAlarmSeverity
3: titanAlarmSubsystem
4: titanAlarmId
5: titanAlarmResource
6: titanAlarmProbableCause
7: titanAlarmAdditionalText
8: titanAlarmRepairAction
EDESC
#
EVENT titanAlarmNotification .1.3.6.1.4.1.17856.3.1.2.0.1 "Status Events" Normal
FORMAT A titanAlarmNotificaton represents a potential $*
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps Management" "$s" " " "RepairAction: $8 NOTE: Actual time of node is in UTC!" "SNMPTrap: CLEARED: ProbableCause: $6"
MATCH $2: 1
MATCH $3: (management)
MATCH MODE=and
SDESC
A titanAlarmNotificaton represents a potential
or actual service affecting condition that is
detected by the application. This trap signifies
the occurrence of an action and/or condition that
is significant to administrative users of the
system. When the condition is resolved, an
identical trap with a severity of CLEAR is sent.
Variables:
1: titanAlarmTimestamp
2: titanAlarmSeverity
3: titanAlarmSubsystem
4: titanAlarmId
5: titanAlarmResource
6: titanAlarmProbableCause
7: titanAlarmAdditionalText
8: titanAlarmRepairAction
EDESC
#
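A minimal sketch (hypothetical example values, not taken from the thread) of how the MATCH lines in the two stanzas above gate the EXEC: severity ($2) in the range 2-4 with subsystem ($3) "management" fires the Warning stanza, while severity 1 with the same subsystem fires the Normal (clear) stanza:

```shell
# Emulates the MATCH MODE=and logic of the two EVENT stanzas:
#   MATCH $2: 2-4  AND  MATCH $3: (management)  -> Warning
#   MATCH $2: 1    AND  MATCH $3: (management)  -> Normal (clear)
severity=4              # example titanAlarmSeverity ($2)
subsystem="management"  # example titanAlarmSubsystem ($3)
result="no stanza matched"
if [ "$subsystem" = "management" ]; then
  if [ "$severity" -ge 2 ] && [ "$severity" -le 4 ]; then
    result="Warning stanza fires"
  elif [ "$severity" -eq 1 ]; then
    result="Normal (clear) stanza fires"
  fi
fi
echo "$result"
```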
Rgds,
Matthew
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Incorrect timestamp in duration for trap
Hi Matthew,
The database tables look ok and the state of the service did change so I want to confirm if this is a database/php issue or not. Let's log into the Nagios Core interface on this server and check the duration values to verify any discrepancy.
Log in to http://ipaddress/nagios, click Services on the left-hand side, click the trap service, and take a look at the Service State Information table. Does it match what you're seeing in the XI interface?
Also, can you send over a fresh system profile? I would like to view the current logs. Thanks, Benjamin
To send us your system profile:
Login to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and share in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
Re: Incorrect timestamp in duration for trap
Hi Benjamin,
Indeed from core, it's the 1970 timestamp.
Attached is also a fresh profile (PM).
Rgds,
Re: Incorrect timestamp in duration for trap
Hi,
Thanks for verifying that information in Nagios Core; it looks to be a configuration issue. Let's edit /etc/snmp/snmptt.conf.
Change this line from:

Code: Select all
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps Management" "$s" " " "RepairAction: $8 NOTE: Actual time of node is in UTC!" "SNMPTrap: WARNING: ProbableCause: $6"

To:

Code: Select all
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps Management" "$s" "$@" "$-*" "RepairAction: $8 NOTE: Actual time of node is in UTC! SNMPTrap: WARNING: ProbableCause: $6"

Save the change and restart snmptt:

Code: Select all
systemctl restart snmptt

Let me know if the issue is resolved. You may have to let it run for a few minutes.
$@ - Number of seconds since the epoch of when the trap was spooled (daemon mode) or the current time (standalone mode)
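The snmptraphandling.py script itself isn't shown in this thread, but Nagios passive results are generally submitted as PROCESS_SERVICE_CHECK_RESULT external commands, where the leading [timestamp] is seconds since the epoch. A minimal sketch (hypothetical host name; not the actual script) of why the literal " " fourth argument produced the 1970 duration, while "$@" fixes it:

```shell
# With "$@" the handler receives a real epoch timestamp; with the original
# literal " " the timestamp field is effectively empty and ends up treated
# as 0, i.e. Jan 1 1970, which matches what Nagios Core showed.
now=$(date +%s)
host="example-host"               # hypothetical host name
service="SNMP Traps Management"

good="[$now] PROCESS_SERVICE_CHECK_RESULT;$host;$service;1;SNMPTrap: WARNING: ProbableCause: ..."
bad="[ ] PROCESS_SERVICE_CHECK_RESULT;$host;$service;1;SNMPTrap: WARNING: ProbableCause: ..."

echo "$good"
echo "$bad"
```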
Re: Incorrect timestamp in duration for trap
Hi Benjamin,
That was indeed the issue!
After the change, I noticed the correct timestamp entry.
This ticket can be closed.
Thanks!
Matthew