Some technical writing by Justin Giancola. I work on the platform team at Precision Nutrition.
The other day at work I ran into an interesting problem. I was working on a branch that seemed to run just fine in development but mysteriously failed on staging servers.
Debugging this with my coworker in a Rails console, we started getting PG::ConnectionBad - FATAL: sorry, too many clients already
errors whenever we kicked off the process that was failing. It seemed like we were running out of Postgresql connections, but we couldn’t explain it based on Rails’ connection pool limit and process count. The staging server in question had max_connections = 50
in Postgresql and we were running 4 multithreaded processes, each with connection pool limits of 10. So we were expecting, at worst, to have Rails gobbling up 40 connections. However, had we been maxing out our connection pools, we would instead be seeing ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds))
errors. Somehow we were blowing past the connection pool limits without even filling the pools.
Was ActiveRecord leaking connections? Was there something else on the server mysteriously using a significant number of connections? After much head scratching, stepping through the debugger, and even puts
statements, we learned something interesting: each process was indeed respecting the connection pool limit specified. However, rather than a single ActiveRecord::ConnectionAdapters::ConnectionPool
per process, we were actually creating and using six. How could this be?
In production, our app uses three separate Postgresql databases: a primary database for most things, a versions database for recording PaperTrail versions, and an a database to log commands and events used by our CQRS system. In order to manage these connections, we specify versions_#{Rails.env}
and cqrs_#{Rails.env}
configuration stanzas within config/database.yml
and call establish_connection
on the classes that may require connecting to a database other than the primary.
I say “may” because we only use three separate databases in production (and some specialized staging servers). In development and on regular staging servers, we use a single Postgresql instance and all three stanzas in the config file contain the same parameters.
However, even though the three config stanzas are the same, calling establish_connection
doesn’t make any use of that information, and we end up with a new connection pool for each establish_connection
invocation. So while 4 x 10 < 50, 4 x 6 x 10 is not, and when we are doing work with a lot of threads, we easily blow through the 50 connection cap.
You might also be wondering why we had six connection pools grabbing connections when there were potentially only three different databases to connect to. The reason here is that we were also calling establish_connection
on multiple ActiveRecord::Base
classes that needed to connect to the same database1. That is, if you call establish_connection :foo
on ModelA
and ModelB
, ActiveRecord doesn’t share the pool created for ModelA
with ModelB
unless ModelB
is a subclass of ModelA
.
To solve this, we did two things:
establish_connection
only on two parent classes rather than on all model classes that might need to use non-primary databases. This brought the number of active connection pools down from six to three.establish_connection
based on whether the config provided is actually different from that of the database config stanza named "#{Rails.env}"
. This way, in production we still establish new connection pools pointing to the non-primary databases where needed, while in development and staging, we don’t make any new pools and the respective models connect to the primary database via the main pool.For instance, we have multiple versions tables for use by PaperTrail to capture version info for tables that have String
and UUID primary keys. ↩