Over the past few days RethinkDB has been giving me handshake timeout errors caused by too many open connections. If you’ve seen similar ‘handshake timeout’ errors, you probably have the same problem.
Somewhere in the Tower Storm codebase, connections were being opened and never closed. Unfortunately the codebase is huge and database calls are made in many places. Worse, when RethinkDB errors out it doesn’t give a stack trace or any other indication of where connections are being tied up.
But I figured out a way to find the connections that were not being closed properly.
Here’s my original connection code. It is based on the example on the RethinkDB site.
### rethinkdb-client.coffee ###
r = require('rethinkdb')
netconfig = require('../config/netconfig')
db = r
db.onConnect = (callback) ->
  r.connect {host: netconfig.db.host, port: netconfig.db.port, db: 'towerstorm'}, (err, conn) ->
    if err then throw new Error(err)
    if !conn then throw new Error("No RethinkDB connection returned")
    callback(err, conn)
module.exports = db
And here’s how I modified the onConnect function to find the connections that were not being closed:
db.onConnect = (callback) ->
  stack = new Error().stack
  r.connect {host: netconfig.db.host, port: netconfig.db.port, db: 'towerstorm'}, (err, conn) ->
    if err then throw new Error(err)
    if !conn then throw new Error("No RethinkDB connection returned")
    setTimeout ->
      if conn && conn.open
        console.log("Connection created was not closed in 5 seconds. Stack: ", stack)
    , 5000
    callback(err, conn)
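First, the line:
stack = new Error().stack
captures a stack trace at the moment db.onConnect is called, so it records which code asked for the connection. It has to run before r.connect, because inside the asynchronous callback the caller is no longer on the stack.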
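To see why the capture has to happen at call time, here is a minimal sketch that needs no database. loadPlayer is a hypothetical call site, and setTimeout stands in for the asynchronous r.connect; neither comes from the Tower Storm code.

```coffeescript
# Hypothetical call site, standing in for any code that asks for a connection.
loadPlayer = (done) ->
  # Captured synchronously: loadPlayer and its callers are still on the stack.
  stackAtCall = new Error().stack
  setTimeout ->
    # Captured inside the async callback: the original caller has already returned.
    stackInCallback = new Error().stack
    done(stackAtCall, stackInCallback)
  , 10

loadPlayer (stackAtCall, stackInCallback) ->
  console.log("Captured at call time:\n", stackAtCall)
  console.log("Captured in callback:\n", stackInCallback)
```

Comparing the two printed traces should show the call site only in the first one, which is the trace worth keeping for the leak report.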
Then, just before handing the connection to the callback, I set up a timer that checks the connection after 5 seconds. If the connection is still open, it logs the stack trace showing exactly where it was opened, and I can add a conn.close() in the appropriate spot.
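Once a leaking call site is found, one way to make the fix hard to forget is to route it through a small wrapper that always closes the connection. This is only a sketch under assumptions: withConnection, fakeOnConnect and fetchScore are hypothetical names, and the fake connection only mimics the open flag and close(callback) so the example runs without a database.

```coffeescript
# Hypothetical helper: open a connection, pass it to `work`, and always close it.
# `onConnect` is any function shaped like db.onConnect: it calls back with (err, conn).
withConnection = (onConnect, work, done) ->
  onConnect (err, conn) ->
    return done(err) if err
    work conn, (workErr, result) ->
      # Close whether the work succeeded or failed, then report back.
      conn.close (closeErr) ->
        done(workErr or closeErr, result)

# Stand-in for db.onConnect so the sketch runs without a database.
fakeOnConnect = (callback) ->
  conn =
    open: true
    close: (cb) ->
      @open = false
      cb(null)
  callback(null, conn)

# Hypothetical unit of work; real code would run a query with `conn` here.
fetchScore = (conn, finished) ->
  finished(null, 42)

withConnection fakeOnConnect, fetchScore, (err, score) ->
  console.log("error:", err, "score:", score)
```

In real code db.onConnect would be passed in place of fakeOnConnect. Note that this sketch does not catch exceptions thrown synchronously inside the work function, so a throwing query would still leave the connection open.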
And just like that, you can find and kill all your stray RethinkDB connections.