no alert triggered on service down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
nky1986
Posts: 21
Joined: Fri Oct 19, 2012 6:18 am

Re: no alert triggered on service down

Post by nky1986 »

one more point

if you see the nagios logs or events for that day, the service status never went into soft 1-2-3... states, that also suggests that service check was missed and as a result, there was no alert

but surprisingly, the graphs were plotted properly
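
One way to double-check the observation above is to pull every state-change entry for the service out of the log. This is a minimal sketch, assuming the standard `SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;output` log line format; the function name, the host `web01`, the service `HTTP`, and the log path in the usage comment are assumptions for illustration.

```shell
# Print the SERVICE ALERT entries for one host/service pair from a Nagios log.
# Usage: service_alerts LOGFILE HOST SERVICE
service_alerts() {
    log_file=$1
    host=$2
    service=$3
    # -F treats the pattern as a fixed string, so dots in host names stay literal.
    grep -F "SERVICE ALERT: ${host};${service};" "$log_file"
}

# Example (path and names are assumptions, adjust to your install):
# service_alerts /usr/local/nagios/var/nagios.log web01 HTTP
```

If nothing is printed for the day of the outage, no SOFT or HARD state change was logged for that service.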
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: no alert triggered on service down

Post by tgriep »

I saw the same thing. I did not see any entries on the service being down that day.
One possible cause for what happened is a stuck nagios process running.
When two nagios daemons are running, they conflict with each other and can cause the issue you had. Without access to the server on that day, though, it is hard to determine whether that was the case.
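
A rough way to check for a second daemon on a live system is sketched below. It assumes the main daemon is started with `-d` and has been reparented to PID 1; worker and child processes have the main daemon as their parent and are ignored. The function name is hypothetical.

```shell
# Count top-level Nagios daemons in `ps -eo pid,ppid,args` output read from stdin.
# A process counts if its parent is PID 1, its executable is named "nagios",
# and it was started with the -d (daemon) option.
count_nagios_daemons() {
    awk '$2 == 1 && $3 ~ /\/nagios$/ && / -d / { n++ } END { print n + 0 }'
}

# Example:
# ps -eo pid,ppid,args | count_nagios_daemons
```

A result greater than 1 would suggest that a stale daemon is still running alongside the current one.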
nky1986
Posts: 21
Joined: Fri Oct 19, 2012 6:18 am

Re: no alert triggered on service down

Post by nky1986 »

what is the next step?

we need to find out RCA of this issue and fix it.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: no alert triggered on service down

Post by npolovenko »

@nky1986, let's try this. Can you open the Service Status Detail page for the service in question and go to the Advanced tab? After that, please submit a critical passive check result 3 times so that the service reaches a hard critical state. How long does it take until you receive an email alert?
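
The same kind of result can also be submitted from the command line through the Nagios external command file. This is a sketch, not the exact mechanism the web interface uses: the function name, the host `web01`, the service `HTTP`, and the command file path in the usage comment are assumptions. Return code 2 means CRITICAL (0 is OK, 1 is WARNING, 3 is UNKNOWN).

```shell
# Build one PROCESS_SERVICE_CHECK_RESULT external command line.
# Usage: passive_result_line EPOCH HOST SERVICE RETURN_CODE PLUGIN_OUTPUT
passive_result_line() {
    now=$1
    host=$2
    service=$3
    code=$4
    output=$5
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output"
}

# Example (run three times to exhaust max check attempts of 3; path is an assumption):
# passive_result_line "$(date +%s)" web01 HTTP 2 "manual test" >> /usr/local/nagios/var/rw/nagios.cmd
```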
nky1986
Posts: 21
Joined: Fri Oct 19, 2012 6:18 am

Re: no alert triggered on service down

Post by nky1986 »

when i try to do that, there are four options under check result i.e. OK, WARNING, UNKNOWN, CRITICAL. what shall i choose here?
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: no alert triggered on service down

Post by dwhitfield »

npolovenko wrote: submit a critical passive check result 3 times
Critical is what was suggested, but ultimately it doesn't matter exactly. You just need to select one that will send notifications. Thus, if it's OK now, sending another OK is not going to trigger a notification. Critical is certainly the most likely to cause a notification.
nky1986
Posts: 21
Joined: Fri Oct 19, 2012 6:18 am

Re: no alert triggered on service down

Post by nky1986 »

i have submitted 3 critical passive checks and below are the current settings:

Checks are configured as:
- Check interval: 140
- Retry interval: 70
- Max check attempts: 3

so when should I expect the alert notification?
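
For actively scheduled checks, the settings above imply a wait that can be estimated with simple arithmetic. This sketch assumes the default `interval_length` of 60 seconds, so the intervals are in minutes: after the first non-OK result the service is rechecked every retry interval until max check attempts is reached. The function name is hypothetical.

```shell
# Minutes between the first non-OK result and the hard state for active checks.
# Usage: minutes_to_hard_state RETRY_INTERVAL MAX_CHECK_ATTEMPTS
minutes_to_hard_state() {
    retry_interval=$1
    max_attempts=$2
    # The first failure is attempt 1, so only (max_attempts - 1) retries remain.
    echo $(( (max_attempts - 1) * retry_interval ))
}

# Example with the settings above:
# minutes_to_hard_state 70 3
```

Manually submitted passive results are not scheduled, so this wait should not apply to them; each submitted result counts as one attempt.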
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: no alert triggered on service down

Post by dwhitfield »

If you submit 3 (and they show up in the nagios.log), the notification should go out immediately unless you have a notification delay. I see now that you have postgres, so the repair I had you run earlier is less likely to have resolved the issue. Let's try the following.

The following commands differ depending on whether your PostgreSQL version is earlier than 9. To determine which version you have, execute the following command:

postgres -V

Based on that output, execute the commands specific to your version:
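
The version check can be scripted as sketched here, assuming `postgres -V` prints a line of the form `postgres (PostgreSQL) 8.4.20`. Both function names and the branch labels are hypothetical.

```shell
# Extract the major version number from `postgres -V` output read from stdin.
pg_major_version() {
    sed -n 's/^.*(PostgreSQL) \([0-9][0-9]*\).*$/\1/p'
}

# Report which set of instructions applies to a given major version.
vacuum_branch() {
    major=$1
    if [ "$major" -lt 9 ]; then
        echo "before-9"
    else
        echo "9-onwards"
    fi
}

# Example:
# vacuum_branch "$(postgres -V | pg_major_version)"
```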

Versions BEFORE 9


echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
service postgresql restart


Versions 9 onwards


echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
service postgresql restart




To log in to PostgreSQL manually, run:

psql nagiosxi nagiosxi


To view the tables, run:


\d


and to exit:


\q


Running the vacuum or logging in to the database manually may fail with the following error message:

psql: FATAL:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and use a standalone backend to vacuum database "postgres".


You may also notice either high CPU usage by the postmaster process or a repeated error message in the logs under /var/lib/pgsql/data/pg_log:

transaction ID wrap limit is 2147484146



You can try to fix the issue by running the following commands on the command line.

Important: Run the commands one by one (don't run them all in one go!)

Versions BEFORE PostgreSQL 9


service postgresql stop
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start


Note: The commands listed above may not work with some versions of PostgreSQL. If you see the following error:

postgres: invalid argument: "nagiosxi"


You will need to run the following commands instead:


service postgresql stop
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start


Versions 9 Onwards


service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start


Note: The commands listed above may not work with some versions of PostgreSQL. If you see the following error:

postgres: invalid argument: "nagiosxi"


You will need to run the following commands instead:


service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
nky1986
Posts: 21
Joined: Fri Oct 19, 2012 6:18 am

Re: no alert triggered on service down

Post by nky1986 »

we are running with postgres 8.4.20 and i performed below operations

[root@Spectraguard ~]# echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
VACUUM
VACUUM


[root@Spectraguard ~]# service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]

[root@Spectraguard ~]# psql nagiosxi nagiosxi
\d
\q


i didn't face any error during all that.

shall i submit critical passive checks now?
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: no alert triggered on service down

Post by dwhitfield »

nky1986 wrote:
shall i submit critical passive checks now?
Yes.

When you did this before, did they show up in the nagios.log? What happens if you change your notification command? Have your notifications ever worked on this server? Was there an upgrade between it working and not working?