Unless you've built a massive real-time notification system with thousands of distinct channels, it is easy to miss the quadratic performance bottleneck that Postgres used to have in its notification queue. A recent commit fixes that with a spectacular throughput improvement.
The commit in question is 282b1cde9d, which landed recently and targets a future release (likely Postgres 19, though as with all master branch commits, there's a standard caveat that it could be rolled back before release).
The Problem
Previously, when a backend sent a notification (via NOTIFY), Postgres had to determine which other backends were listening on that channel. The implementation essentially walked a list for every notification, which became costly as the number of unique channels grew.
If you had a scenario with thousands of unique channels (e.g., a "table change" notification system where each entity has its own channel), the cost of dispatching notifications would scale quadratically. For a transaction sending N notifications to N different channels, the effort was roughly O(N^2).
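To make that pattern concrete, here is a minimal sketch (using psycopg2; the connection string, channel names, and N are invented for illustration) of the kind of transaction that triggers the quadratic behavior:

```python
# Hypothetical reproduction of the problematic pattern: one transaction that
# sends a NOTIFY to each of N distinct channels. Connection details and
# channel names are made up for illustration.
import psycopg2

N = 5000
conn = psycopg2.connect("dbname=postgres")
cur = conn.cursor()

# pg_notify() is the SQL-callable equivalent of NOTIFY; each call here targets
# a different channel, so the transaction queues N notifications on N unique
# channels, all dispatched at COMMIT time.
for i in range(N):
    cur.execute("SELECT pg_notify(%s, %s)", (f"entity_{i}", "changed"))

conn.commit()  # the sender's COMMIT is where the dispatch cost shows up
conn.close()
```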
The Fix
The optimization introduces a shared hash table to track listeners. Instead of iterating through a list to find interested backends, the notifying process can now look up the channel in the hash table and immediately see who (if anyone) is listening. The hash table uses partitioned locking, so concurrent LISTEN/UNLISTEN commands don't contend on a single global lock.
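Conceptually (and glossing over the actual C implementation entirely), the listener registry now behaves something like the following sketch, where the lock partition is chosen by hashing the channel name so unrelated channels don't contend:

```python
# Conceptual sketch only -- not Postgres's actual code. Illustrates a shared
# hash keyed by channel name, with per-partition locks.
import threading
from collections import defaultdict

NUM_PARTITIONS = 16
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
listeners_by_channel = defaultdict(set)   # channel name -> set of backend ids

def partition_for(channel: str) -> int:
    return hash(channel) % NUM_PARTITIONS

def listen(channel: str, backend_id: int) -> None:
    with partition_locks[partition_for(channel)]:
        listeners_by_channel[channel].add(backend_id)

def unlisten(channel: str, backend_id: int) -> None:
    with partition_locks[partition_for(channel)]:
        listeners_by_channel[channel].discard(backend_id)

def backends_to_signal(channel: str) -> set:
    # O(1) lookup instead of walking every listening backend's channel list.
    with partition_locks[partition_for(channel)]:
        return set(listeners_by_channel.get(channel, ()))
```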
Additionally, the patch includes an optimization to "direct advance" the queue pointer for listeners that are not interested in the current batch of messages. This is coupled with a Wake-Only-Tail strategy that signals only the backend at the tail of the queue, avoiding "thundering herd" wake-ups and significantly reducing CPU context switches.
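Again purely as a conceptual illustration rather than the real code, the two wake-up optimizations can be pictured like this (how lagging backends ultimately catch up is elided):

```python
# Conceptual sketch of "direct advance" plus a wake-only-tail policy after a
# batch of notifications has been appended to the queue. Not the real C code.
def dispatch(read_pos: dict, interested: set, old_head: int, new_head: int) -> list:
    """Return the backend ids that actually get signalled."""
    for backend, pos in read_pos.items():
        # Direct advance: an uninterested listener that was already caught up
        # can skip the new batch entirely and never needs to wake.
        if backend not in interested and pos == old_head:
            read_pos[backend] = new_head

    lagging = [b for b, pos in read_pos.items() if pos < new_head]
    if not lagging:
        return []
    # Wake only the backend furthest behind (the tail of the queue), rather
    # than signalling every listener at once.
    tail = min(lagging, key=lambda b: read_pos[b])
    return [tail]
```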
Finally, the patch improves observability by adding dedicated wait events, making it easier to spot contention in pg_stat_activity.
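The specific wait event names added by the patch aren't reproduced here, but they surface in the usual place; a simple polling snippet like the following is enough to watch for them:

```python
# Poll pg_stat_activity for wait events. The event names introduced by the
# patch are not listed here -- this just shows where they would show up.
import psycopg2

conn = psycopg2.connect("dbname=postgres")
conn.autocommit = True
cur = conn.cursor()
cur.execute("""
    SELECT pid, wait_event_type, wait_event, state
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
""")
for pid, ev_type, ev, state in cur.fetchall():
    print(pid, ev_type, ev, state)
conn.close()
```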
Benchmarking Methodology
Tests were conducted on an Intel Core i7-4770 server with 16GB RAM.
Two Postgres instances were set up:
- Before: Commit 23b25586d (the parent of the fix), running on port 5432.
- After: Commit 282b1cde9 (the fix), running on port 5433.
The "Sender's Commit Time" (in milliseconds) was measured across three different scenarios to test scalability limits.
Results
Test 1: Channel Scalability
Metric: Time to send one NOTIFY to each of N unique channels in a single transaction.
The "Before" code exhibits clear O(N^2) or similar quadratic degradation as the number of channels increases. The "After" code remains effectively flat.
| Channels | Before (ms) | After (ms) | Speedup |
|---|---|---|---|
| 1 | 0.095 | 0.094 | 1x |
| 10 | 0.059 | 0.054 | 1.1x |
| 100 | 0.080 | 0.089 | 0.9x |
| 1,000 | 2.032 | 0.485 | 4.2x |
| 2,000 | 6.195 | 0.791 | 7.8x |
| 3,000 | 13.314 | 1.019 | 13x |
| 4,000 | 24.256 | 1.656 | 15x |
| 5,000 | 35.711 | 1.693 | 21x |
| 6,000 | 54.520 | 2.456 | 22x |
| 7,000 | 70.088 | 3.274 | 21x |
| 8,000 | 93.798 | 3.450 | 27x |
| 9,000 | 128.252 | 4.483 | 29x |
| 10,000 | 140.061 | 3.598 | 39x |
Test 2: Listener Scalability (at 1,000 channels)
Metric: Impact of multiple backends listening to the same channels.
| Listeners | Before (ms) | After (ms) | Speedup |
|---|---|---|---|
| 1 | 2.468 | 0.452 | 5.5x |
| 10 | 5.085 | 0.536 | 9.5x |
| 50 | 2.728 | 0.522 | 5.2x |
| 100 | 4.342 | 0.620 | 7.0x |
| 200 | 5.485 | 0.783 | 7.0x |
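For reference, a listener backend in a test like Test 2 can be as simple as the following sketch (channel names, counts, and the port are assumptions, not the actual test harness):

```python
# Hypothetical listener backend: LISTEN on the same 1,000 channels and drain
# notifications as they arrive.
import select
import psycopg2

conn = psycopg2.connect("dbname=postgres host=localhost port=5433")
conn.autocommit = True
cur = conn.cursor()
for i in range(1000):
    cur.execute(f"LISTEN chan_{i}")

while True:
    # Wait for the server to signal us, then drain whatever arrived.
    if select.select([conn], [], [], 5.0) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        print(note.channel, note.payload)
```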
Test 3: Idle User Overhead (at 1,000 channels)
Metric: Impact of connected but non-listening backends.
| Idle Users | Before (ms) | After (ms) | Speedup |
|---|---|---|---|
| 0 | 3.065 | 0.425 | 7.2x |
| 50 | 3.123 | 0.474 | 6.6x |
| 100 | 4.219 | 0.353 | 12x |
| 200 | 3.262 | 0.437 | 7.5x |
| 300 | 2.102 | 0.381 | 5.5x |
| 400 | 2.500 | 0.384 | 6.5x |
| 500 | 2.091 | 0.362 | 5.8x |
Conclusion
This optimization eliminates a significant scalability cliff in Postgres's messaging system. While "don't create 100,000 channels" is still good advice, it's great to see Postgres becoming more robust against these high-load scenarios.
For those interested in the technical nitty-gritty, reading the discussion thread is highly recommended to see how the solution evolved from a simple list switch to a shared hash map implementation over thirty-five (35!) iterations before finally landing in the codebase.
Credits: The commit was authored by Joel Jacobson and reviewed extensively by Tom Lane (who also committed it) as well as Chao Li.
