
We made our backend handle more than 50x the number of successful concurrent requests on a single instance. If you care about a platform that stays fast as you grow - or you’re curious how we got there - this post walks through the technical story: what we found, how we fixed it, and how we keep it from coming back.
The symptom
Under intense load, our backend stopped accepting work. Requests piled up, and the service only recovered once stale database transactions hit their timeout. When we inspected the database connection pool, every connection was held by an open transaction - and each of those transactions was itself blocked, waiting to acquire another connection. The system was effectively deadlocked.
The cause
We were opening a new database transaction and, inside it, awaiting another query that opened its own connection. Under concurrency, every request held one connection (for its open transaction) and then tried to acquire a second one (for the nested query). With a finite pool, that second connection was usually held by another request that was itself waiting for its second connection. Result: the pool filled with transactions waiting on connections that would never become free until something timed out.
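To make the failure mode concrete, here is a minimal, self-contained simulation of it - the `Pool` class and request shape below are illustrative stand-ins, not our production pool or handlers. Two concurrent requests against a pool of two connections each grab one connection for their transaction, then wait forever for a second one:

```typescript
// Minimal in-memory connection pool, just to illustrate the deadlock.
class Pool {
  private free: number[];
  private waiters: Array<(conn: number) => void> = [];

  constructor(size: number) {
    this.free = Array.from({ length: size }, (_, i) => i);
  }

  acquire(timeoutMs: number): Promise<number> {
    if (this.free.length > 0) return Promise.resolve(this.free.pop()!);
    return new Promise((resolve, reject) => {
      const waiter = (conn: number) => {
        clearTimeout(timer);
        resolve(conn);
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error("pool timeout"));
      }, timeoutMs);
      this.waiters.push(waiter);
    });
  }

  release(conn: number): void {
    const next = this.waiters.shift();
    if (next) next(conn);
    else this.free.push(conn);
  }
}

// The buggy shape: each request opens a transaction (one connection) and
// then, inside it, runs a query that acquires a second connection.
async function buggyRequest(pool: Pool): Promise<string> {
  await pool.acquire(1_000); // BEGIN: the transaction holds a connection
  try {
    const conn = await pool.acquire(200); // nested query wants a second one
    pool.release(conn);
    return "ok";
  } catch {
    // The transaction's connection stays held until its own, much longer
    // timeout (omitted here) - exactly the stuck state we saw in production.
    return "deadlocked";
  }
}

async function main(): Promise<string> {
  const pool = new Pool(2);
  // Two concurrent requests, pool of two: each holds one connection and
  // waits for a second that nobody will ever release.
  const results = await Promise.all([buggyRequest(pool), buggyRequest(pool)]);
  return results.join(",");
}

main().then(console.log); // deadlocked,deadlocked
```

With a larger pool the outcome is the same, it just takes more simultaneous requests to reach it: once every connection belongs to a transaction waiting for a second connection, no request can ever make progress.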
The fix
We refactored so that all database work inside a transaction uses the same connection. We pass the transaction object through the call stack and use transaction-aware APIs (e.g. getInTransaction(trx, ...), decryptWebhookUrlInTransaction(trx, ...)) whenever code runs inside an existing transaction. No more nested getDb().transaction().execute(...) from code that is already running in a transaction. One logical request path now uses at most one connection at a time.
Making it stick
We didn’t want the same mistake to creep back. We run our test suite with a database pool of size one. Any code path that accidentally opens a nested transaction tries to acquire a second connection; with only one connection available, the test hangs and hits a timeout. That forced us to find and fix every similar spot. It also acts as a regression guard: if someone introduces a nested transaction again, the tests will catch it.
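As a sketch, assuming a node-postgres pool (our actual driver and test setup may differ), capping the test pool at one connection looks like:

```typescript
// test-db.ts - hypothetical test setup, assuming node-postgres.
import { Pool } from "pg";

export const testPool = new Pool({
  max: 1, // a single connection for the entire test suite
  // Fail fast: a nested acquisition can never be satisfied with max: 1,
  // so the offending test times out here instead of hanging the suite.
  connectionTimeoutMillis: 5_000,
});
```

Any test that drives a code path through a nested transaction now needs two connections at once, which is impossible by construction - so the bug cannot land silently.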
Key takeaways
- Connection-pool exhaustion under load can come from holding a transaction open while awaiting a query that acquires another connection (nested connection usage).
- Fix it by reusing one connection per request path: pass the transaction through the stack and use transaction-aware variants of every DB call that can run inside a transaction.
- Running tests with a single-connection pool surfaces every place where this can happen and prevents regressions.
Conclusion
On a single backend instance, we went from an effective concurrency cap of about 90% of our pool size (because the pool was exhausted by transactions waiting on connections) to at least 50x the number of successful concurrent requests per second. For you, that means a platform that stays responsive when many users sign or many API calls hit at once - and a team that takes performance seriously enough to find the root cause, fix it, and lock it in with tests.