{"UUID":"34b4a47b-a3d6-4bda-acc9-57b621f53468","URL":"https://incident.io/blog/database-performance","ArchiveURL":"","Title":"Incident.io intermittent database connection pool timeouts","StartTime":"0001-01-01T00:00:00Z","EndTime":"0001-01-01T00:00:00Z","Categories":["cloud","config-change"],"Keywords":["database","connection pool","timeouts","go","postgres","slack","transactions","performance"],"Company":"incident.io","Product":"database connection pool","SourcePublishedAt":"2023-04-20T00:00:00Z","SourceFetchedAt":"2026-05-04T19:51:23.441549Z","Summary":"Two weeks of intermittent app timeouts, with traces showing requests waiting up to 20s for a connection from Go's `database/sql` pool, but contention spread across many endpoints rather than a single slow query. After 24 deploys' worth of fixes (materialized views, indexes, lock timeouts, async Slack-webhook handling, and a custom `ngrok/sqlmw` middleware to attribute connection-pool hold time per operation), the root cause turned out to be an unnecessary transaction wrapping every Slack modal submission — many small fast transactions were in aggregate exhausting the pool.","Description":"Incident.io experienced intermittent application timeouts over a two-week period earlier this year, degrading the customer experience and surfacing \"context canceled\" errors in their error reporting. Despite initial investigations, no clear cause, such as a sudden code change or traffic spike, was immediately apparent.\n\nTraces revealed that HTTP requests were waiting up to 20 seconds to acquire an available connection from the Go `database/sql` connection pool. This contention was spread across many endpoints rather than isolated to a single slow query, making diagnosis difficult.\n\nInitial attempts to resolve the issue included optimizing neglected queries, adding database indices, rewriting inefficient queries, and setting a one-second lock timeout for transactions. They also began processing Slack events asynchronously to reduce immediate database load. However, these measures did not fully resolve the intermittent timeouts.\n\nTo better diagnose the problem, incident.io improved their observability by implementing custom `ngrok/sqlmw` middleware, which let them track the total time each operation spent holding a connection from the pool. The root cause was ultimately identified as an unnecessary transaction wrapping every Slack modal submission.\n\nRemoving these unnecessary transactions, and explicitly adding transactions only where transactional guarantees were required, resolved the issue. The problem was not a single slow operation but the aggregate of many small, fast transactions exhausting the connection pool. The company had been timeout-free for four months since the fix."}