
livestatus not picking up comments after nagios restart

Posted: Tue Jan 29, 2013 4:20 pm
by joe123321
Hi,

Not sure if anyone else has come across this, but the issue I've been having is that after reloading the Nagios configuration files (or stopping and starting the Nagios process), a select few comments aren't accessible through the mklivestatus backend. Reload again, and they return!
The Nagios core front end can see these comments the entire time, but mklivestatus sometimes can't see them.
Removing the comments and re-adding them sometimes fixes the issue, but sometimes different comments (on the same host) disappear.

This affects both host and service comments, and the only non-standard thing that I've noticed is that the host name has hyphens.

Tried on mk-livestatus-1.2.0p2, mk-livestatus-1.2.1i2 and mk-livestatus-1.2.1i4.

Currently trying to replicate the issue in a test environment.

Re: livestatus not picking up comments after nagios restart

Posted: Tue Jan 29, 2013 5:06 pm
by abrist
I have not heard of this bug yet. If you can reproduce it in your test environment with a limited number of variables, definitely let us know. I dug through mk's changelog and it does not look like there have been any recent issues with comments. You may want to ask around their community as well.

Re: livestatus not picking up comments after nagios restart

Posted: Tue Feb 19, 2013 8:16 pm
by joe123321
Thanks for the response.

After further investigation, it appears to be caused by duplicated comment ids.

Some host, service, downtime or acknowledgement comments share the same comment id. mklivestatus can only show one of these, whereas the Nagios web interface shows them all (we can also see them all using NDO as a backend).

I'm trying to work out the situations that cause comment ids to be duplicated, to further understand the issue:

sudo grep 'comment_id=' /var/lib/nagios3/retention.dat | sort | uniq -c | sort -n

Re: livestatus not picking up comments after nagios restart

Posted: Wed Feb 20, 2013 1:34 pm
by abrist
Good to know, keep us updated. You may want to report your findings to the mk community as well.

Re: livestatus not picking up comments after nagios restart

Posted: Thu Feb 21, 2013 4:26 pm
by GavinG
abrist wrote: Good to know, keep us updated. You may want to report your findings to the mk community as well.

This is what I posted there yesterday: http://lists.mathias-kettner.de/piperma ... 08628.html

The issue looks to be that Nagios can create a hostcomment and a servicecomment that have the same comment_id.

I trawled through the Livestatus code a bit overnight and it certainly looks like they consider comment_id to be unique, as that's what comments are added and deleted by (I think; my C/C++ isn't that good, I'm a Perl person). From Livestatus TableDownComm.cc:

Code:

void TableDownComm::addComment(nebstruct_comment_data *data) {
    unsigned long id = data->comment_id;
    if (data->type == NEBTYPE_COMMENT_ADD || data->type == NEBTYPE_COMMENT_LOAD) {
        add(new Comment(data));
    }
    else if (data->type == NEBTYPE_COMMENT_DELETE) {
        remove(id);
    }
}
...
void TableDownComm::add(DowntimeOrComment *data)
{
    _entries_t::iterator it = _entries.find(data->_id);
    // might be update -> delete previous data set
    if (it != _entries.end()) {
        delete it->second;
        _entries.erase(it);
    }
    _entries.insert(make_pair(data->_id, data));
}

When NDOUtils inserts data into the database, it appears to insert "comment_id" as "internal_comment_id" and creates its own comment_id index, which is unique, so we see all comments there (as per the post on the Livestatus mailing list).

So we're a bit stumped. We don't know if this is Nagios doing something weird and creating duplicate comment_ids (where it shouldn't) or Livestatus not handling duplicate comment_ids (where it should).

Re: livestatus not picking up comments after nagios restart

Posted: Thu Feb 21, 2013 7:47 pm
by scottwilkerson
It appears that Livestatus isn't taking the comment_type into account.

Code:

comment_type = Indicates whether this is a host or service comment
  1 = Host comment
  2 = Service comment

So it is quite possible to have two comment_ids that are the same but with different comment_types.
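
To make the collision concrete, here is a minimal C++ sketch. It is not Livestatus code: the `Comment` struct and both function names are hypothetical, and the comment_type values 1 and 2 are taken from the listing above. The first function keys its table on comment_id alone and replaces any previous entry, like the add() routine quoted earlier in the thread; the second keys on the pair (comment_type, comment_id).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified comment record (not the Livestatus or Nagios struct).
struct Comment {
    int comment_type;          // 1 = host comment, 2 = service comment
    unsigned long comment_id;
    std::string text;
};

// Table keyed by comment_id only: a later comment with the same id
// replaces the earlier one, whatever its type.
std::size_t count_keyed_by_id(const std::vector<Comment>& comments) {
    std::map<unsigned long, Comment> table;
    for (const Comment& c : comments) {
        table.erase(c.comment_id);  // "might be update -> delete previous data set"
        table.insert(std::make_pair(c.comment_id, c));
    }
    return table.size();
}

// Table keyed by (comment_type, comment_id): a host comment and a service
// comment with the same id are stored as separate entries.
std::size_t count_keyed_by_type_and_id(const std::vector<Comment>& comments) {
    std::map<std::pair<int, unsigned long>, Comment> table;
    for (const Comment& c : comments) {
        const std::pair<int, unsigned long> key(c.comment_type, c.comment_id);
        table.erase(key);
        table.insert(std::make_pair(key, c));
    }
    return table.size();
}
```

Fed one host comment and one service comment that both carry id 502, the first function ends up with a single entry and the second with two. Whether a composite key is the right fix for Livestatus itself is a separate question for its maintainers.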

Re: livestatus not picking up comments after nagios restart

Posted: Thu Feb 21, 2013 7:51 pm
by GavinG
scottwilkerson wrote: It appears that Livestatus isn't taking the comment_type into account.

Code:

comment_type = Indicates whether this is a host or service comment
  1 = Host comment
  2 = Service comment

So it is quite possible to have two comment_ids that are the same but with different comment_types.

Yep, I was wondering if it was something like that.

I've hunted around to see if comment_ids are supposed to be globally unique or only unique within the host/service comment lists, and I haven't been able to find anything. I've tried looking through the Nagios source to find this out myself but again, my C/C++ skills aren't that great (I think I did one class 20-odd years ago).

Cheers for the quick response. :)

Re: livestatus not picking up comments after nagios restart

Posted: Fri Feb 22, 2013 7:50 am
by scottwilkerson
No problem. If you could add that to your post on the mk list, that would be great. I am sure they will get the bug sorted soon...

Re: livestatus not picking up comments after nagios restart

Posted: Fri Feb 22, 2013 2:56 pm
by GavinG
scottwilkerson wrote: No problem. If you could add that to your post on the mk list, that would be great. I am sure they will get the bug sorted soon...

Done.

What's confusing me is that only one next_comment_id is stored in the status.dat and retention.dat files. When a comment is added, xcddefault_add_new_host/service_comment is called, which in turn calls find_host/service_comment(next_comment_id) and checks for the next available host or service comment_id. That's fine, but is there any reason why next_comment_id would be reset to a low value while there are comments in the system?

Currently our highest is comment_id=10953, however next_comment_id=502. We definitely haven't got anywhere near 4.3 billion comments through the system to roll the unsigned long int next_comment_id figure over (assuming it is 32 bits wide), not yet anyway. :)
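
A rough C++ model of the allocation as I've described it above, to show why a low counter matters. This is not the Nagios source: the `CommentIds` struct and all function names are hypothetical, and incrementing the counter after each allocation is an assumption on my part. The key point is that each add only checks for a clash against comments of its own type.

```cpp
#include <set>

// Hypothetical model: one shared counter, separate id sets per comment type.
struct CommentIds {
    std::set<unsigned long> host_ids;
    std::set<unsigned long> service_ids;
    unsigned long next_comment_id = 1;
};

// Pick the next id that is free among comments of the SAME type only.
unsigned long allocate(std::set<unsigned long>& same_type_ids,
                       unsigned long& next_id) {
    while (same_type_ids.count(next_id) != 0)
        ++next_id;
    const unsigned long id = next_id++;  // assumption: counter advances after use
    same_type_ids.insert(id);
    return id;
}

unsigned long add_host_comment(CommentIds& s) {
    return allocate(s.host_ids, s.next_comment_id);
}

unsigned long add_service_comment(CommentIds& s) {
    return allocate(s.service_ids, s.next_comment_id);
}

// True if at least one id is used by both a host and a service comment.
bool has_cross_type_duplicate(const CommentIds& s) {
    for (unsigned long id : s.host_ids)
        if (s.service_ids.count(id) != 0)
            return true;
    return false;
}
```

In this model, as long as the shared counter only ever moves forward, host and service ids never overlap. If the counter is set back to 502 while a service comment with id 502 still exists, the next host comment is also given 502, which is the kind of duplicate we're seeing.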

It's almost like this bit of code from xdata/xcddefault.c isn't always initializing the comments properly (or is being called before the retention.dat file is parsed) and is resetting next_comment_id back to 1:

Code:

/* initialize comment data */
int xcddefault_initialize_comment_data(char *main_config_file) {
        comment *temp_comment = NULL;

        /* find the new starting index for comment id if its missing*/
        if(next_comment_id == 0L) {
                for(temp_comment = comment_list; temp_comment != NULL; temp_comment = temp_comment->next) {
                        if(temp_comment->comment_id >= next_comment_id)
                                next_comment_id = temp_comment->comment_id + 1;
                        }
                }

        /* initialize next comment id if necessary */
        if(next_comment_id == 0L)
                next_comment_id = 1;

        return OK;
        }

Am I reading that right? A pointer to main_config_file is being passed to that subroutine but not actually being used?
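
For what it's worth, here is a small C++ sketch of how I read that logic, using the numbers from our system. `init_as_quoted` mirrors the quoted function with the comment list replaced by a plain vector of ids; `init_defensive` is a purely hypothetical variant, not a proposed Nagios patch, that always moves the counter past the highest existing id.

```cpp
#include <algorithm>
#include <vector>

// Mirrors the quoted logic: existing ids are only scanned when the
// counter is still 0; any non-zero counter is left untouched.
unsigned long init_as_quoted(unsigned long next_comment_id,
                             const std::vector<unsigned long>& existing_ids) {
    if (next_comment_id == 0UL) {
        for (unsigned long id : existing_ids)
            if (id >= next_comment_id)
                next_comment_id = id + 1;
    }
    if (next_comment_id == 0UL)
        next_comment_id = 1;
    return next_comment_id;
}

// Hypothetical defensive variant: always raise the counter above the
// highest existing id, even when it was restored with a non-zero value.
unsigned long init_defensive(unsigned long next_comment_id,
                             const std::vector<unsigned long>& existing_ids) {
    for (unsigned long id : existing_ids)
        next_comment_id = std::max(next_comment_id, id + 1);
    return std::max(next_comment_id, 1UL);
}
```

With existing ids up to 10953 and a restored next_comment_id of 502, `init_as_quoted` leaves the counter at 502, while `init_defensive` would move it to 10954. So this function on its own would not repair a counter that was already too low when it ran; it still doesn't explain how the counter got set to 502 in the first place.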

Sorry if I'm asking stupid questions, I'm just trying to understand what's going on - partially so I understand but also so I can pass this on to problem managers who are wanting us to get our daily reporting (hosts and services with notifications disabled that have no comments) working.

Re: livestatus not picking up comments after nagios restart

Posted: Fri Feb 22, 2013 5:42 pm
by scottwilkerson
I'm going to have our Core developer take a look at this....