mcollective and activemq: the 800 node limit

I've been running into the 800 node limit on mcollective and splitting up my nodes into subcollectives. I had a spot where I couldn't split up the nodes, so I started looking at why we were hitting this 800 node wall.

I'm using activemq with the ssl plugin. After turning on all the debugging I could find in activemq, it turned out to be a simple resource limit problem.

With activemq running, I waited for my nodes to connect and watched the number of threads on the active java process. (This is after increasing the memory limits for activemq as described on the Puppet Labs website.)

Getting the number of threads, two different ways:

$ pgrep java |xargs ps uH |wc -l
1023
$ pgrep java |xargs -I % ls -l /proc/%/task |wc -l
1023
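
Both commands above count one extra line (the ps header and the "total" line from ls -l), so the real thread count is one lower. As a cross-check, here is a minimal sketch that reads the count straight from the kernel. It assumes Linux with /proc mounted, and thread_count is just a name I made up for the helper.

```shell
#!/bin/sh
# thread_count PID: print the number of threads the kernel reports for PID.
# Hypothetical helper name; reads the "Threads:" field of /proc/PID/status.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
```

Assuming there is only one java process on the box, calling it as thread_count "$(pgrep -o java)" should print a number one lower than the pipelines above.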

Either way we are seeing around 1024 processes (threads), which looks suspiciously like a limit. I increased the limit in /etc/security/limits.d/activemq.conf:

activemq soft nofile 16384
activemq hard nofile 16384
activemq soft nproc 4096
activemq hard nproc 4096

I'm not really sure the nofile limit is required, but raising nproc seems to fix my issue.
After restarting activemq:

$ pgrep java |xargs ps uH |wc -l
1530
$ pgrep java |xargs -I % ls -l /proc/%/task |wc -l
1530
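
To confirm that the restarted process really picked up the new limits, rather than inferring it from the thread count, the effective limits can be read from /proc. This is a sketch, not something from my original debugging session. show_limits is a made-up helper name, and it assumes Linux with /proc mounted. Depending on how activemq gets started, files in limits.d may not be applied at all (they are read by pam_limits), so this check is worth doing.

```shell
#!/bin/sh
# show_limits PID: print the effective soft and hard limits for
# "Max processes" (nproc) and "Max open files" (nofile) of a running process.
# Hypothetical helper name; reads /proc/PID/limits.
show_limits() {
    grep -E '^Max (processes|open files) ' "/proc/$1/limits"
}
```

Assuming a single java process, show_limits "$(pgrep -o java)" prints two lines. The first two numeric columns are the soft and hard limits, and they should match the values set in activemq.conf.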

The number of nodes returned by mco find went from a random result in the 800-1000 range to the 1400 or so that I was expecting.

I'm going to have to update the section on mcollective in my book.
