no alert triggered on service down
Re: no alert triggered on service down
One more point:
If you look at the Nagios logs or events for that day, the service never went through the SOFT 1, 2, 3 states. That also suggests the service check was missed and, as a result, no alert was sent.
Surprisingly, though, the graphs were plotted properly.
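To check this from the log itself, something like the following could list the state changes recorded for one service. It is only a sketch: `service_alerts` is a hypothetical helper name, it assumes the usual `SERVICE ALERT: host;service;state;state type;attempt;output` line layout, and the log path is passed in as an argument because it differs between installs.

```shell
# Hypothetical helper: print "STATE;STATE_TYPE;ATTEMPT" for every
# SERVICE ALERT line that belongs to one host/service pair.
#   $1 = path to nagios.log, $2 = host name, $3 = service description
service_alerts() {
  awk -F';' -v host="$2" -v svc="$3" '
    index($0, "SERVICE ALERT: ") {
      # Field 1 looks like "[timestamp] SERVICE ALERT: hostname"
      split($1, head, "SERVICE ALERT: ")
      if (head[2] == host && $2 == svc) print $3 ";" $4 ";" $5
    }' "$1"
}
```

For example, `service_alerts /path/to/nagios.log myhost "My Service"` (both names are placeholders). If no SOFT lines come back for the day in question, that would be consistent with no failing check result ever reaching Nagios.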
Re: no alert triggered on service down
I saw the same thing. I did not see any entries on the service being down that day.
One possible cause for what happened is a stuck nagios process running.
When two nagios daemons are running, they conflict with each other and can cause the issue you had. Without access to the server on that day, though, it is hard to determine whether that was the case.
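If it happens again, a quick count of the top-level daemons could confirm or rule this out. The sketch below is a heuristic, not an official check: `count_nagios_masters` is a hypothetical name, and it assumes the daemon binary is called `nagios`, is started with a `nagios.cfg` argument, and has been re-parented to PID 1 (worker and child processes have the daemon as parent, so they are not counted).

```shell
# Hypothetical helper: read `ps -eo pid,ppid,args` output on stdin and
# count processes that look like a top-level nagios daemon.
count_nagios_masters() {
  awk '$2 == 1 && $3 ~ /\/nagios$/ && $0 ~ /nagios\.cfg/ { n++ }
       END { print n + 0 }'
}
```

Usage would be `ps -eo pid,ppid,args | count_nagios_masters`. A result greater than 1 would suggest a second, possibly stuck, daemon.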
Re: no alert triggered on service down
What is the next step?
We need to find the root cause (RCA) of this issue and fix it.
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: no alert triggered on service down
@nky1986, let's try this. Can you open the Service Status Detail page for the service in question and go to the Advanced tab? After that, please submit a critical passive check result 3 times so that the service gets into a hard critical state. How long does it take until you receive an email alert?
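The same test can also be done from the command line by writing to the Nagios external command file. The sketch below only builds the command line; `passive_result_line` is a hypothetical helper name, and the command file location in the usage note is an assumption about a typical install. Return codes follow the usual plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

```shell
# Hypothetical helper: format one PROCESS_SERVICE_CHECK_RESULT command.
#   $1 = host name, $2 = service description, $3 = return code,
#   $4 = plugin output, $5 = optional epoch timestamp (defaults to now)
passive_result_line() {
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "${5:-$(date +%s)}" "$1" "$2" "$3" "$4"
}
```

For example, `passive_result_line myhost "My Service" 2 "manual test" > /usr/local/nagios/var/rw/nagios.cmd`, run three times, where the host, service, and command file path are placeholders to adjust for your system.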
Re: no alert triggered on service down
When I try to do that, there are four options under Check Result: OK, WARNING, UNKNOWN, and CRITICAL. Which one should I choose?
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: no alert triggered on service down
npolovenko wrote: submit a critical passive check result 3 times
Critical is what was suggested, but ultimately it doesn't matter exactly. You just need to select one that will send notifications. Thus, if it's OK now, sending another OK is not going to trigger a notification. Critical is certainly the most likely to cause a notification.
Re: no alert triggered on service down
i have submitted 3 critical passive checks and below are the current settings:
Checks are configured as:
- Check interval: 140
- Retry interval: 70
- Max check attempts: 3
so when should I expect the alert notification?
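For actively scheduled checks (as opposed to manually submitted passive results), a rough estimate can be worked out from these settings: after the first failing result, Nagios rechecks every retry interval until max check attempts is reached, so the hard state arrives about retry interval x (max check attempts - 1) later. The sketch below assumes the intervals are in `interval_length` units with the default of 60 seconds, and `hard_state_delay` is a hypothetical helper name.

```shell
# Hypothetical helper: seconds from the first failing check to the hard state.
#   $1 = retry interval, $2 = max check attempts,
#   $3 = interval_length in seconds (assumed default: 60)
hard_state_delay() {
  echo $(( $1 * ($2 - 1) * ${3:-60} ))
}
```

With the values above, `hard_state_delay 70 3` gives 8400 seconds, that is 2 x 70 = 140 interval units. The time until the next scheduled check first notices the failure comes on top of that.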
-
dwhitfield
Re: no alert triggered on service down
If you submit 3 (and they show up in the nagios.log), it should come out immediately unless you have a notification delay. I see now that you have postgres so the repair I had you run earlier is less likely to have resolved the issue. Let's try the following.
The following commands are different if you are using a version of PostgreSQL before v9. To determine which version you have, execute the following command:
Code:
postgres -V
Based on that output, execute the commands specific to your version:
Versions BEFORE 9
Code:
echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
service postgresql restart
Versions 9 onwards
Code:
echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
service postgresql restart
To log in to postgres manually, run:
Code:
psql nagiosxi nagiosxi
To view the tables, run:
Code:
\d
and to exit:
Code:
\q
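The version-dependent choice above can be sketched as a small script. This is an illustration only: `pg_major_version` and `vacuum_sql_for` are hypothetical helper names, and the parsing assumes `postgres -V` prints something like `postgres (PostgreSQL) 8.4.20`.

```shell
# Hypothetical helper: extract the major version from `postgres -V` output.
#   $1 = version string, e.g. "postgres (PostgreSQL) 8.4.20"
pg_major_version() {
  printf '%s\n' "$1" | sed -n 's/^.*) \([0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: pick the vacuum statements from the post above.
#   $1 = major version number
vacuum_sql_for() {
  if [ "$1" -lt 9 ]; then
    echo "vacuum;vacuum analyze;"
  else
    echo "vacuum;vacuum analyze;vacuum full;"
  fi
}
```

A possible use would be `vacuum_sql_for "$(pg_major_version "$(postgres -V)")" | psql nagiosxi postgres`, followed by the service restart as described above.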
If you tried to run the vacuum on postgres, or you attempted to log in to the database manually, but you see the following error message:
Code:
psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to vacuum database "postgres".
You may notice either high CPU usage for the postmaster process, or a repeated error message in the /var/lib/pgsql/data/pg_log file:
transaction ID wrap limit is 2147484146
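To confirm these symptoms before attempting a repair, the PostgreSQL logs can be searched for both messages. This is a sketch with a hypothetical helper name, `wraparound_hits`; on many installs pg_log is a directory holding several log files, so the files are passed in as arguments rather than assuming one fixed path.

```shell
# Hypothetical helper: count log lines that mention either wraparound message.
#   $@ = one or more PostgreSQL log files
wraparound_hits() {
  # grep -c exits non-zero when the count is 0, so normalise the status
  cat -- "$@" | grep -c \
    -e 'transaction ID wrap limit' \
    -e 'not accepting commands to avoid wraparound' || true
}
```

For example, `wraparound_hits /var/lib/pgsql/data/pg_log/*.log`, assuming that is where your log files live. A count of 0 would suggest the wraparound repair below is not what you need.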
You can try to fix the issue by running the following commands on the command line:
Important: Run the commands one by one (don't run them all in one go!)
Versions BEFORE PostgreSQL 9
Code:
service postgresql stop
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
Note: The commands listed above may not work with some versions of PostgreSQL. If you see the following error:
Code:
postgres: invalid argument: "nagiosxi"
You will need to run the following commands instead:
Code:
service postgresql stop
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
Versions 9 Onwards
Code:
service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
Note: The commands listed above may not work with some versions of PostgreSQL. If you see the following error:
Code:
postgres: invalid argument: "nagiosxi"
You will need to run the following commands instead:
Code:
service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres --single -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
Re: no alert triggered on service down
We are running postgres 8.4.20, and I performed the operations below:
[root@Spectraguard ~]# echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
VACUUM
VACUUM
[root@Spectraguard ~]# service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
[root@Spectraguard ~]# psql nagiosxi nagiosxi
\d
\q
I didn't get any errors during all of that.
Shall I submit the critical passive checks now?
-
dwhitfield
Re: no alert triggered on service down
nky1986 wrote:
shall i submit critical passive checks now?
Yes.
When you did this before, did they show up in the nagios.log? What happens if you change your notification command? Have your notifications ever worked on this server? Was there an upgrade in between it working and not working?
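One way to answer the first question is to look for notification entries for the service in the log. The sketch below uses a hypothetical helper name, `service_notifications`, and assumes the usual `SERVICE NOTIFICATION: contact;host;service;state;command;output` line layout, with the log path passed in as an argument.

```shell
# Hypothetical helper: print "STATE;NOTIFICATION_COMMAND" for every
# SERVICE NOTIFICATION line that belongs to one host/service pair.
#   $1 = path to nagios.log, $2 = host name, $3 = service description
service_notifications() {
  awk -F';' -v host="$2" -v svc="$3" '
    index($0, "SERVICE NOTIFICATION: ") && $2 == host && $3 == svc {
      print $4 ";" $5
    }' "$1"
}
```

If the submitted results show up as alerts but this prints nothing, the problem is more likely in the notification settings than in the checks themselves. If lines are printed but no email arrives, the notification command itself would be the next thing to test.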