Who's checking in, the mcollective trick.

This keeps coming up, so I thought I'd share one trick we've used to find stale nodes out there: nodes that are failing to update for reasons that never show up in your reporting. A common cause is an expired or revoked certificate. The agent never gets far enough to report a failure.

In these cases, if mcollective was already running and configured on the node, the node still shows up in mcollective and you may think everything is fine. With a small enough installation you can probably track these hosts down one by one, but this is how we do it with a few thousand nodes. I'm assuming you configure mcollective from puppet. This won't work if you don't, because the trick depends on puppet pushing a configuration change that only healthy nodes will receive.

Go into your ActiveMQ configuration and add authorizationEntry lines for a new collective. Call it whatever you like; I'll use stalecollective:

<authorizationEntry queue="stalecollective.>" write="mcollective" read="mcollective" admin="mcollective" />
<authorizationEntry topic="stalecollective.>" write="mcollective" read="mcollective" admin="mcollective" />

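Before restarting ActiveMQ, you may want to confirm that the entries for the new collective actually landed. Below is a minimal Python sketch, not part of the original workflow. It assumes the common mcollective layout of one queue and one topic authorizationEntry per collective, each matching "<collective>.>". The script name and path in the usage comment are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def has_collective_entries(activemq_xml: str, collective: str) -> bool:
    """Return True if the config has both a queue and a topic
    authorizationEntry for "<collective>.>".

    Assumes the common mcollective layout (one queue entry and one topic
    entry per collective); other layouts need a different check.
    """
    root = ET.fromstring(activemq_xml)
    wanted = f"{collective}.>"
    found = set()
    for elem in root.iter():
        # activemq.xml usually declares a default namespace, so compare only
        # the local part of "{namespace}authorizationEntry".
        if elem.tag.rsplit("}", 1)[-1] != "authorizationEntry":
            continue
        for kind in ("queue", "topic"):
            if elem.get(kind) == wanted:
                found.add(kind)
    return found == {"queue", "topic"}


if __name__ == "__main__":
    # Hypothetical usage: python check_activemq.py /path/to/activemq.xml stalecollective
    with open(sys.argv[1]) as f:
        print(has_collective_entries(f.read(), sys.argv[2]))
```

Matching on the local tag name keeps the check working whether or not the broker element declares the ActiveMQ namespace.
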
Now go into your mcollective server configuration (the server.cfg that puppet manages) and change the main_collective and collectives settings:

main_collective = stalecollective
collectives = stalecollective,mcollective
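
To spot-check whether puppet has delivered the new settings to a particular node, you can read that node's server.cfg with something like the following. It's a hypothetical helper. It assumes the plain "key = value" format with # comments used by mcollective config files, and it takes the file path as an argument because the location varies between installs.

```python
import sys


def parse_cfg(text: str) -> dict:
    """Parse mcollective-style "key = value" lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def picked_up_new_collective(text: str, new: str = "stalecollective",
                             old: str = "mcollective") -> bool:
    """True if main_collective is the new collective and both are still listed."""
    settings = parse_cfg(text)
    collectives = [c.strip() for c in settings.get("collectives", "").split(",")]
    return (settings.get("main_collective") == new
            and new in collectives and old in collectives)


if __name__ == "__main__":
    # Pass the path to the node's server.cfg; its location varies by install.
    with open(sys.argv[1]) as f:
        print(picked_up_new_collective(f.read()))
```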

Sit back and wait. I use the default puppet check-in interval of 30 minutes, so waiting 60 minutes or so works well. Then edit your client.cfg or ~/.mcollective so the client knows about the new collective, and run mco against it:

mco find -T stalecollective -v

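You should now see only your active hosts. More interesting is the original collective. Healthy nodes belong to both collectives, so any host that shows up there but not in stalecollective never picked up the change, and that is a stale host: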

mco find -T mcollective -v
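
With a few thousand nodes, comparing two lists by eye gets old fast, so you may want to script the comparison. This Python sketch illustrates the idea; it is not the exact tooling described here. It assumes that `mco find` without `-v` prints one node identity per line, and that the client config already knows about both collectives.

```python
import subprocess


def stale_nodes(all_hosts, active_hosts):
    """Hosts that answer on the original collective but not on the new one."""
    return sorted(set(all_hosts) - set(active_hosts))


def mco_find(collective):
    """Run `mco find -T <collective>` and return the identities it prints."""
    out = subprocess.run(["mco", "find", "-T", collective],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for host in stale_nodes(mco_find("mcollective"), mco_find("stalecollective")):
        print(host)
```

The sketch takes a set difference instead of diffing the raw output, because the two lists aren't guaranteed to come back in the same order.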

If some of your hosts check in less frequently than that, you might get a few false positives, but this is still a good starting point for finding the nodes that aren't updating their configurations.

