livestatus not picking up comments after nagios restart

joe123321
Posts: 2
Joined: Wed Jul 25, 2012 11:58 pm

Post by joe123321 »

Hi,

Not sure if anyone else has come across this, but when I reload the Nagios configuration files (or stop and start the Nagios process), a select few comments are no longer accessible through the mklivestatus backend. Reload again, and they return!
The Nagios Core front end can see these comments the entire time, but mklivestatus sometimes can't see them.
Removing the comments and re-adding them sometimes fixes the issue, but sometimes different comments (on the same host) disappear.

This affects both host and service comments, and the only non-standard thing I've noticed is that the host names contain hyphens.

Tried on mk-livestatus-1.2.0p2, mk-livestatus-1.2.1i2 and mk-livestatus-1.2.1i4.

Currently trying to replicate the issue in a test environment.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: livestatus not picking up comments after nagios restart

Post by abrist »

I have not heard of this bug yet. If you can reproduce it in your test environment with a limited number of variables, definitely let us know. I dug through mk's changelog and it does not look like there have been any recent issues with comments. You may want to ask around their community as well.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: livestatus not picking up comments after nagios restart

Post by joe123321 »

Thanks for the response.

After further investigation, the cause appears to be duplicated comment ids.

Some host, service, downtime or acknowledgement comments share the same comment id. mklivestatus can only show one of these, whereas the Nagios web interface shows all of them (we can also see them all using NDO as a backend).

I'm trying to work out which situations cause comment ids to be duplicated, to understand the issue further.

sudo grep -E '^[[:space:]]*comment_id=' /var/lib/nagios3/retention.dat | sort | uniq -c | sort -n
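
The same check can be sketched in Python so that it also reports which kind of block each duplicated id belongs to. This is only a rough sketch: it assumes retention.dat is made of blocks such as `hostcomment { ... }` with one `key=value` pair per line, and the sample data and the function name `find_duplicate_comment_ids` are made up for illustration.

```python
import re
from collections import defaultdict

# Made-up sample in the style of retention.dat; real files carry many more fields.
SAMPLE = """\
program {
next_comment_id=502
}
hostcomment {
host_name=web-01
comment_id=501
}
servicecomment {
host_name=web-01
service_description=HTTP
comment_id=501
}
servicecomment {
host_name=db-01
service_description=MySQL
comment_id=10953
}
"""

BLOCK_START = re.compile(r"^\s*(\w+)\s*\{\s*$")
# Anchored at the start of the line so that next_comment_id= is not counted.
COMMENT_ID = re.compile(r"^\s*comment_id=(\d+)\s*$")


def find_duplicate_comment_ids(text):
    """Return {comment_id: [block types]} for ids that appear in more than one block."""
    seen = defaultdict(list)
    block_type = None
    for line in text.splitlines():
        start = BLOCK_START.match(line)
        if start:
            block_type = start.group(1)
            continue
        if line.strip() == "}":
            block_type = None
            continue
        match = COMMENT_ID.match(line)
        if match and block_type is not None:
            seen[int(match.group(1))].append(block_type)
    return {cid: kinds for cid, kinds in seen.items() if len(kinds) > 1}


duplicates = find_duplicate_comment_ids(SAMPLE)
for cid, kinds in sorted(duplicates.items()):
    print(cid, kinds)
```

To run it against a live system, pass in the contents of the real file instead of `SAMPLE`, for example the text read from /var/lib/nagios3/retention.dat.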

Re: livestatus not picking up comments after nagios restart

Post by abrist »

Good to know, keep us updated. You may want to report your findings to the mk community as well.
GavinG
Posts: 10
Joined: Thu Feb 21, 2013 4:14 pm

Re: livestatus not picking up comments after nagios restart

Post by GavinG »

abrist wrote: Good to know, keep us updated. You may want to report your findings to the mk community as well.

This is what I posted there yesterday: http://lists.mathias-kettner.de/piperma ... 08628.html

The issue looks to be that Nagios can create a hostcomment and a servicecomment that have the same comment_id.

I trawled through the Livestatus code a bit overnight, and it certainly looks like it treats comment_id as unique, since that is the key it uses to add and delete comments (I think; my C/C++ isn't that good, I'm a Perl person). From Livestatus TableDownComm.cc:

Code:

void TableDownComm::addComment(nebstruct_comment_data *data) {
    unsigned long id = data->comment_id;
    if (data->type == NEBTYPE_COMMENT_ADD || data->type == NEBTYPE_COMMENT_LOAD) {
        add(new Comment(data));
    }
    else if (data->type == NEBTYPE_COMMENT_DELETE) {
        remove(id);
    }
}
...
void TableDownComm::add(DowntimeOrComment *data)
{
    _entries_t::iterator it = _entries.find(data->_id);
    // might be update -> delete previous data set
    if (it != _entries.end()) {
        delete it->second;
        _entries.erase(it);
    }
    _entries.insert(make_pair(data->_id, data));
}

When NDOUtils inserts data into the database, it appears to store "comment_id" as "internal_comment_id" and creates its own unique comment_id index, so we see all comments there (as per the post on the Livestatus mailing list).

So we're a bit stumped. We don't know if this is Nagios doing something weird and creating duplicate comment_ids (where it shouldn't) or Livestatus not handling duplicate comment_ids (where it should).
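
The difference between the two backends can be illustrated with a small Python sketch. It is a toy model, not the Livestatus or NDOUtils code: `IdKeyedTable` imitates the replace-on-collision behaviour of the quoted add(), while `SurrogateKeyTable` imitates storing the Nagios id as plain data under a separate unique key. Both class names and the sample comments are invented for the example.

```python
class IdKeyedTable:
    """Toy table keyed by comment_id alone, replacing any entry with the same id."""

    def __init__(self):
        self.entries = {}

    def add(self, comment):
        # Like the quoted add(): an existing entry with this id is dropped first.
        self.entries.pop(comment["comment_id"], None)
        self.entries[comment["comment_id"]] = comment


class SurrogateKeyTable:
    """Toy table that assigns its own row id and keeps the Nagios id as a column."""

    def __init__(self):
        self.rows = {}
        self._next_row_id = 1

    def add(self, comment):
        row = dict(comment)
        row["internal_comment_id"] = row.pop("comment_id")
        self.rows[self._next_row_id] = row
        self._next_row_id += 1


# Two comments that share an id, one on a host and one on a service.
host_comment = {"comment_id": 501, "kind": "host", "text": "disk swap planned"}
service_comment = {"comment_id": 501, "kind": "service", "text": "known flapping"}

id_keyed = IdKeyedTable()
surrogate = SurrogateKeyTable()
for comment in (host_comment, service_comment):
    id_keyed.add(comment)
    surrogate.add(comment)

print(len(id_keyed.entries), "comment visible when keyed by comment_id")
print(len(surrogate.rows), "comments visible with a separate unique key")
```

In this model, whichever comment is loaded last wins in the id-keyed table. That would fit the observation that a different subset of comments goes missing after a reload, though it is only a guess about the real load order.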
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: livestatus not picking up comments after nagios restart

Post by scottwilkerson »

It appears that Livestatus isn't taking the comment_type into account.

Code:

comment_type = Indicates whether this is a host or service comment
  1 = Host comment
  2 = Service comment

So it is quite possible to have two comment_ids that are the same but with different comment_types.
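
One way to picture this is a table keyed on the pair (comment_type, comment_id) rather than on comment_id alone. The Python below is a hypothetical sketch of that idea only. The constants mirror the values listed above, but the helpers `add_comment` and `remove_comment` are invented for the example and say nothing about how Livestatus would actually be changed.

```python
HOST_COMMENT = 1      # comment_type value for host comments
SERVICE_COMMENT = 2   # comment_type value for service comments

# Table keyed by (comment_type, comment_id) instead of comment_id alone.
comments = {}


def add_comment(comment_type, comment_id, text):
    """Store a comment under a composite key so equal ids of different types coexist."""
    comments[(comment_type, comment_id)] = text


def remove_comment(comment_type, comment_id):
    """Remove only the comment of the given type; the other type is left alone."""
    comments.pop((comment_type, comment_id), None)


add_comment(HOST_COMMENT, 501, "host: disk swap planned")
add_comment(SERVICE_COMMENT, 501, "service: known flapping")
both_present = len(comments)

remove_comment(HOST_COMMENT, 501)
remaining = list(comments)

print(both_present, remaining)
```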
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: livestatus not picking up comments after nagios restart

Post by GavinG »

scottwilkerson wrote: It appears that Livestatus isn't taking the comment_type into account.

Code:

comment_type = Indicates whether this is a host or service comment
  1 = Host comment
  2 = Service comment

So it is quite possible to have two comment_ids that are the same but with different comment_types.

Yep, I was wondering if it was something like that.

I've hunted around to see whether comment_ids are supposed to be globally unique or only unique within the host or service comment lists, and I haven't been able to find anything. I've tried looking through the Nagios source to work this out myself but, again, my C/C++ skills aren't that great (I think I did one class 20-odd years ago).

Cheers for the quick response. :)

Re: livestatus not picking up comments after nagios restart

Post by scottwilkerson »

No problem. If you could add that to your post on the mk list, that would be great. I am sure they will get the bug sorted soon...

Re: livestatus not picking up comments after nagios restart

Post by GavinG »

scottwilkerson wrote: No problem. If you could add that to your post on the mk list, that would be great. I am sure they will get the bug sorted soon...

Done.

What's confusing me is that there is only one next_comment_id stored in the status.dat and retention.dat files. When adding a comment, Nagios calls xcddefault_add_new_host_comment or xcddefault_add_new_service_comment, which in turn calls find_host_comment(next_comment_id) or find_service_comment(next_comment_id) to find the next available host or service comment_id. That part is fine, but is there any reason why next_comment_id would be reset to a low value while there are comments in the system?

Currently our highest is comment_id=10953, yet next_comment_id=502. We definitely haven't put anywhere near 4.3 billion comments through the system to roll the unsigned long int next_comment_id over, not yet anyway. :)
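
To make the suspected collision concrete, here is a rough Python model of that allocation step. It rests on an assumption taken from the description above, namely that the id search only consults the list matching the comment being added. The helper `next_free_id` and the sample ids are invented for illustration.

```python
def next_free_id(ids_in_same_list, next_comment_id):
    """Advance next_comment_id past ids already used in ONE comment list."""
    while next_comment_id in ids_in_same_list:
        next_comment_id += 1
    return next_comment_id


# Made-up state: high ids already exist, but the shared counter is low.
host_comment_ids = {500, 10953}
service_comment_ids = {502, 503}
next_comment_id = 502

# Adding a host comment only checks the host list, so 502 looks free
# even though a service comment already uses it.
new_host_id = next_free_id(host_comment_ids, next_comment_id)
host_comment_ids.add(new_host_id)
next_comment_id = new_host_id + 1

collisions = host_comment_ids & service_comment_ids
print(new_host_id, sorted(collisions))
```

If the counter never fell below the highest id in use, the per-list check would be harmless, which is why the low next_comment_id looks like the more interesting half of the puzzle.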

It's almost like this bit of code from xdata/xcddefault.c isn't always initializing the comments properly (or is being called before the retention.dat file is parsed) and is resetting next_comment_id back to 1:

Code:

/* initialize comment data */
int xcddefault_initialize_comment_data(char *main_config_file) {
        comment *temp_comment = NULL;

        /* find the new starting index for comment id if its missing*/
        if(next_comment_id == 0L) {
                for(temp_comment = comment_list; temp_comment != NULL; temp_comment = temp_comment->next) {
                        if(temp_comment->comment_id >= next_comment_id)
                                next_comment_id = temp_comment->comment_id + 1;
                        }
                }

        /* initialize next comment id if necessary */
        if(next_comment_id == 0L)
                next_comment_id = 1;

        return OK;
        }

Am I reading that right? A pointer to main_config_file is being passed to that subroutine but not actually being used?

Sorry if I'm asking stupid questions, I'm just trying to understand what's going on - partially so I understand but also so I can pass this on to problem managers who are wanting us to get our daily reporting (hosts and services with notifications disabled that have no comments) working.
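
For anyone following along without much C, the initialization logic quoted above can be paraphrased in Python roughly as follows. `initialize_next_comment_id` is a line-by-line port of the quoted function with the unused main_config_file argument left out. `initialize_defensively` is a hypothetical variant added purely for comparison and is not something Nagios does.

```python
def initialize_next_comment_id(next_comment_id, comment_ids):
    """Port of the quoted logic: existing comments are scanned only if the value is 0."""
    if next_comment_id == 0:
        for comment_id in comment_ids:
            if comment_id >= next_comment_id:
                next_comment_id = comment_id + 1
    if next_comment_id == 0:
        next_comment_id = 1
    return next_comment_id


def initialize_defensively(next_comment_id, comment_ids):
    """Hypothetical variant: never start at or below an id that is already in use."""
    return max([next_comment_id, 1] + [comment_id + 1 for comment_id in comment_ids])


existing_ids = [10953, 400]

as_quoted = initialize_next_comment_id(502, existing_ids)
from_zero = initialize_next_comment_id(0, existing_ids)
defensive = initialize_defensively(502, existing_ids)

print(as_quoted, from_zero, defensive)
```

Read this way, the quoted function only falls back to 1 when next_comment_id is 0 and there are no comments at all. A stale non-zero value such as 502 would simply be kept, so the low value presumably comes from somewhere else; the sketch does not show where.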

Re: livestatus not picking up comments after nagios restart

Post by scottwilkerson »

I'm going to have our Core developer take a look at this....