Runit, Chpst and ulimit defaults

So I ran into this problem at work today with a runit-based service breaching its open files limit.

My first thought was to increase the system ulimit for nofile in /etc/security/limits.conf. I raised it from 30k to about 60k. But strangely, the service still kept dying.

Since the service uses chpst, my first suspect was chpst itself: maybe it was being overbearing and changing the system limits before starting the process. The service was Python based, so I added this small snippet at the beginning of the program to see what limits it was seeing.

import resource
logger.info("rlimit for nofile: %s", resource.getrlimit(resource.RLIMIT_NOFILE))

The logs showed:

[INFO] rlimit for nofile: (1024, 4096)

That was shocking. Neither of those values is configured anywhere on the system. lsof had earlier shown me that the process had about a thousand open files before it crashed, so that explained why it was crashing: it was breaching the soft limit.
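To watch for this kind of breach from inside the process, one option is to compare the number of open descriptors against the soft limit. What follows is only a minimal sketch, assuming Linux (it reads /proc/self/fd); the helper names open_fd_count and nofile_headroom are made up for illustration and were not part of the service.

```python
import os
import resource


def open_fd_count():
    """Count this process's open file descriptors (Linux only, via /proc)."""
    # Listing the directory briefly uses a descriptor itself,
    # so the count can be off by one.
    return len(os.listdir("/proc/self/fd"))


def nofile_headroom():
    """Return how many more descriptors fit under the soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft - open_fd_count()


if __name__ == "__main__":
    print("open fds:", open_fd_count(), "headroom:", nofile_headroom())
```

Logging the headroom periodically would have shown the count creeping toward 1024 well before the crash.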

I looked at the documentation for chpst and found an option, -o, which changes the open files limit. So I set that in the chpst invocation (exec chpst -o 60000 ...), and I got:

[INFO] rlimit for nofile: (4096, 4096)

It seems that -o only affects the soft limit, so the new value could not go past the existing hard limit of 4096. I took the win and the service recovered, but after the crisis passed, I kept digging. I was curious where all these limits were coming from.
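The behaviour seen in the logs can be reproduced from Python. This is a rough sketch of "change only the soft limit, capped at the hard limit"; it mirrors what the log output suggests, not chpst's actual code, and the function names clamp_soft and set_soft_nofile are hypothetical.

```python
import resource


def clamp_soft(requested, hard):
    """Return the soft limit that results from asking for `requested` under `hard`."""
    if hard == resource.RLIM_INFINITY:
        # No hard cap, so the request is taken as-is.
        return requested
    return min(requested, hard)


def set_soft_nofile(requested):
    """Set only the soft open-files limit, leaving the hard limit untouched."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = clamp_soft(requested, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # With a hard limit of 4096, asking for 60000 prints 4096.
    print(clamp_soft(60000, 4096))
```

An unprivileged process can always move its soft limit up to the hard limit, which is why the request for 60000 ended at (4096, 4096) rather than failing.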

The chpst source didn’t reveal any limits being imposed. I couldn’t find any other call to setrlimit in the rest of the runit sources either.

On a hunch, I tried printing the limits before chpst is called, using this run file:

#!/bin/bash
exec 2>&1

echo "Soft limits"
ulimit -S -a

echo "Hard limits"
ulimit -H -a

exec chpst ...

It got me:

  • open files (-n) 1024 # soft limit
  • open files (-n) 4096 # hard limit

The actual ulimit values for the root user were 60000 (soft) and 60000 (hard). That showed me that the limits were not being modified by chpst, but probably by runsv or some other part of the runit system. I could be wrong though because, as I said before, I couldn’t find a call to setrlimit in the sources.

Curiously, printing the ulimit values in the run file showed me another odd difference from the system limits: the max procs limit (ulimit -u). It seems that while the run file is executing, the soft limit for this setting is equal to the hard limit.

On the RHEL6 machine I was using, a root shell showed these limits as (1024, 514975). In the run file, however, the equivalents were (514975, 514975). This was so weird that I double-checked.
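For comparing the two environments, the same (soft, hard) pairs can be printed from Python, once from a root shell and once from inside the service. This is just a sketch, assuming Linux (where RLIMIT_NPROC is available); the names LIMITS, current_limits and format_limit are illustrative.

```python
import resource

# The two limits discussed in this post.
LIMITS = {
    "nofile": resource.RLIMIT_NOFILE,
    "nproc": resource.RLIMIT_NPROC,
}


def current_limits():
    """Return {name: (soft, hard)} as seen by the current process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


def format_limit(value):
    """Render a limit value, showing 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
```

Running it in both places gives output that can be diffed directly, which is easier than reading two full ulimit -a listings side by side.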

Why is the runit service being conservative with resource limits with regard to the max open files limit, but more generous with the max procs limit?

I have no idea.

In the end, yes, you can work around the default limits of runit by using ulimit in the shell script before invoking chpst. But when you don’t, I hope this serves as a reminder of the limits it places on the services that it manages.

 