After running flawlessly for several months, one of the Java server applications we use at SpamDrain started to lock up very frequently. A thread dump taken during one of these lockups revealed that all the handler threads were stuck inside a call to java.security.SecureRandom.nextInt(). By reading the source code of the SHA1PRNG SecureRandom implementation, I figured out that it uses /dev/random under Linux as its source of random numbers. This was the cause of the lockups.
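
To see which SecureRandom implementation and seed source a given JVM is set up to use, something like the following can help. It is only a sketch: the class name SecureRandomInfo and the describe helper are made up for illustration, and the values printed depend on the JDK version and its configuration.

```java
import java.security.SecureRandom;
import java.security.Security;

public class SecureRandomInfo {
    /** Hypothetical helper: names the algorithm and provider behind a SecureRandom. */
    static String describe(SecureRandom random) {
        return random.getAlgorithm() + " from provider " + random.getProvider().getName();
    }

    public static void main(String[] args) {
        // Constructing the object and asking for its algorithm does not request any random numbers.
        SecureRandom random = new SecureRandom();
        System.out.println("Default SecureRandom: " + describe(random));

        // Seed source from the JDK's java.security file; null if the property is not set.
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));

        // System property that can be set on the command line; null if not set.
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
    }
}
```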
The Linux kernel collects good random numbers from various sources (e.g. mouse movements on a desktop system) and stores them in a pool. When a process reads from /dev/random, these random numbers are removed from the pool and returned to the process. If the process reads from /dev/random faster than new random numbers are generated, the pool eventually runs dry and reads start to block until more are available.
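
On Linux, the kernel's own estimate of how much is left in the pool can be read from /proc/sys/kernel/random/entropy_avail, which makes it possible to watch for this situation. A minimal sketch, assuming a Linux system with procfs mounted (the class and method names are made up for illustration; newer kernels changed how /dev/random behaves, so the number is mainly meaningful on older systems):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EntropyCheck {
    // Linux-specific procfs file holding the kernel's entropy estimate in bits.
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    /** Parses the file content, which is a single decimal number followed by a newline. */
    static int parseEntropy(String content) {
        return Integer.parseInt(content.trim());
    }

    /** Returns the estimate in bits, or -1 if the file is missing or unreadable. */
    static int availableEntropyBits() {
        try {
            byte[] raw = Files.readAllBytes(ENTROPY_AVAIL);
            return parseEntropy(new String(raw, StandardCharsets.UTF_8));
        } catch (IOException | NumberFormatException e) {
            return -1; // not Linux, no procfs, or unexpected content
        }
    }

    public static void main(String[] args) {
        System.out.println("Entropy available (bits): " + availableEntropyBits());
    }
}
```

A value that stays close to zero while threads are stuck in SecureRandom would point to the kind of exhaustion described above.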
In our application, a new random number was requested for every new client connection. For the first couple of months it ran without any problems because there were plenty of random numbers in the pool. Then the pool was exhausted, and calls to SecureRandom.nextInt() would suddenly take a really long time to complete.
Thankfully, our use of SecureRandom was a classic case of overengineering. The calls could easily be replaced with other code, and the application has been running happily ever since.
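
As an illustration of what such a replacement might look like (this is not the code we actually used, and the names are hypothetical): if the per-connection number only needs to be reasonably unique and has no security role, java.util.concurrent.ThreadLocalRandom is enough, and by default it does not draw on the kernel's entropy pool.

```java
import java.util.concurrent.ThreadLocalRandom;

public class ConnectionIds {
    /**
     * Hypothetical helper: returns a non-negative, non-cryptographic number
     * for a new client connection. Not suitable for tokens, keys or anything
     * an attacker must not be able to guess.
     */
    static int nextConnectionId() {
        // nextInt(bound) returns a value in [0, bound).
        return ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Simulate a handful of incoming connections.
        for (int i = 0; i < 5; i++) {
            System.out.println("connection id: " + nextConnectionId());
        }
    }
}
```

Where unpredictability does matter, SecureRandom is still the right tool, and the seed source is what needs attention rather than the API.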