That Guy From Delhi: 2026

3 Jul 2026

SeaweedFS: Distributed Storage with Commodity Equipment

Maintaining a home lab with a diverse fleet of servers is a constant balancing act, but the real challenge is finding a way to pool mismatched, leftover drives across separate machines without turning storage administration into a second full-time job. Over time, my server farm accumulated an assortment of mismatched disks—a fast NVMe SSD on one machine, a spinning mechanical HDD on another, and a tight OS drive elsewhere. Rather than letting this spare capacity go to waste or buying an expensive, dedicated NAS appliance, I set out to build a unified distributed shared storage layer across my fleet.

My goal was simple: the storage solution had to be low-maintenance, highly efficient with heterogeneous drives, actively developed, and crucially, not a project in itself to manage.

After evaluating the distributed storage landscape, I settled on SeaweedFS. Here is how I built a high-efficiency cluster pooling 840 GB of storage, and the operational lessons I learned along the way.

Why SeaweedFS?

Narrowing down the storage options for this farm was an elaborate exercise. Traditional distributed filesystems fell flat for a home lab environment:

Ceph is incredibly powerful but essentially a career choice to administer. It is far too heavy, requiring deep operational knowledge, constant tuning, and massive memory overhead just to keep the cluster healthy.
GlusterFS has a heavy configuration footprint and is highly sensitive to network latency, making it bloated for smaller labs.
MinIO has shifted its focus and license dynamics, making it less attractive for simple, general-purpose mounts.

SeaweedFS fit the bill perfectly because it is designed for speed and simplicity. Written in Go, it compiles to a single static binary with no external database dependencies. Its architecture splits metadata management (handled by the Filer) from block storage (handled by Volume Servers). This makes the entire stack extremely lightweight, requiring less than 50MB of RAM for the master nodes.

More importantly, SeaweedFS excels at managing heterogeneous disk sizes. Instead of striping data across drives (where the smallest disk limits the entire array), SeaweedFS groups data into independent, configurable volume files (which I set to 1 GB). This allows it to pack odd-sized drives to the brim while remaining highly efficient.

The Architecture: Pooling the Fleet

I configured the storage pool to span three physical nodes: the orchestrator node (lenovo) running an Alder Lake i7 with 48GB of RAM, the AI inference worker (tiger) hosting a Xeon and a 16GB GPU, and camry running Postgres fuzzing workloads.

Cluster Topology & Capacity

The cluster pools a total of 840 GB of raw capacity across three physical machines running five volume server instances:

Node	Port	Storage Media	Assigned Capacity
`lenovo`	`:8080` `:8081`	Root NVMe (`/var/lib/seaweedfs_data`) Data NVMe (`/mnt/nv1/seaweedfs_data`)	100 GB 400 GB
`tiger`	`:8080` `:8081`	HDD (`/mnt/disk2/seaweedfs_data`) SSD (`/mnt/ssd/seaweedfs_data`)	300 GB 30 GB
`camry`	`:8080`	Root SSD (`/var/lib/seaweedfs_data`)	10 GB (dialed down from 60 GB)

Note: camry is restricted to 10 GB because its root SSD is currently running tight on space. I will scale it up once I upgrade the drive.

The Rack Isolation Strategy: Handling Multi-Disk Nodes

A key detail of this configuration is how SeaweedFS handles nodes with multiple physical disks. Both lenovo and tiger run two separate volume server instances. On lenovo, one instance serves the root NVMe SSD (:8080) while the other serves the larger data NVMe drive (:8081). On tiger, one instance points to a mechanical HDD (:8080) and the other points to a SATA SSD (:8081).

In a default SeaweedFS layout without rack topology, the scheduler treats every volume server as an independent target. If a replication policy of 010 (write one copy to another node in the same data center) is evaluated, SeaweedFS could decide to store the primary copy on lenovo's root SSD volume server and the replica copy on lenovo's data NVMe volume server. While this would protect against a single SSD failure, it does nothing if the entire lenovo motherboard dies, if the machine suffers a power loss, or if the Linux kernel panics.

By explicitly hardcoding the -rack parameter on each systemd unit:

Both volume servers on lenovo run with -rack=lenovo.
Both volume servers on tiger run with -rack=tiger.
The volume server on camry runs with -rack=camry.

I tell SeaweedFS that these volume servers share physical hardware. When the cluster evaluates a rack-level replication rule (like the middle digit in 010 or 020), it is forced to select a volume server in a different rack. This ensures that copies are always placed on physically separate machines, achieving true hardware-level fault tolerance.

Replication Policies

Each path prefix has a tailored replication policy configured via weed shell -> fs.configure:

Path	Policy	Copies	Behavior
`/` (Default)	`010`	2	One extra copy on a different physical machine (different rack).
`/photos`	`020`	3	Full replication; one copy on every machine in the fleet.
`/scratch`	`000`	1	Single-copy, no redundancy. Used for the ~185 GB Jellyfin movie library.

The /scratch directory allows me to store my movie library with zero redundancy overhead. Since the media can be re-downloaded, accepting single-copy storage is a sensible trade for reclaiming disk space.

Operational Lessons: Surviving a Silent Outage

No DIY storage cluster is complete without testing its edges. During a routine reboot of lenovo, I encountered a silent outage that served as a fantastic learning experience for hardening FUSE client mounts.

The Boot-Time Race

When lenovo booted, its local SeaweedFS filer, S3 gateway, and volume servers ran their IP auto-detection routines. Because the systemd units were configured with After=network.target, they launched before the physical network interfaces had actually acquired their LAN IP addresses.

Consequently, weed resolved the local hostname to the loopback address 127.0.1.1 (per Debian's default /etc/hosts mapping) and bound its services to localhost. The master service survived because its peers list explicitly hardcoded the LAN IPs, but the filer and volume servers on lenovo became unreachable to the rest of the network.

This triggered a cascade of failures:

Unreachable Filer: Since the filer only listened on 127.0.0.1, FUSE mounts on tiger and camry threw ENOTCONN errors and became stale.
FUSE Crash-Loop: The local mount service on lenovo began crash-looping in the background, trying and failing to mount the local filer every 10 seconds.
The Samba Blindspot: Samba on lenovo was configured to share the mount point directory /mnt/seaweedfs to my Windows laptop as the Z: drive. Because Samba started successfully and the directory /mnt/seaweedfs still existed, Samba happily served the bare local directory to the network.

Because Samba was serving the bare local mount point, any files written by Samba or automated scripts landed directly on the local host root filesystem, hidden underneath the unmounted folder path.

Discovery & Triage

I detected the issue when running my weekly system inventory script and reviewing the git history. The script captures cluster topology and active mount capacities. The diff immediately flagged a disk contents discrepancy: the files and directories visible on consecutive runs did not match, signaling that the FUSE mounts had silently disappeared.

What was supposed to be the mounted volume was actually just the empty, unmounted /mnt/seaweedfs folder on the host's root filesystem.

Upon realizing this, I triaged the nodes, recovered the orphaned files from the underlying root directories, successfully remounted the cluster, and restored the data back into the active SeaweedFS storage pool.

Hardening the Storage Layer

To prevent this boot-race and silent write behavior from happening again, I implemented five structural fixes across all nodes:

1. Hardcoding IPs in Services

I stripped auto-detection out of the systemd unit files. Every service now explicitly binds and advertises its LAN IP:

# /etc/systemd/system/seaweedfs-filer.service
ExecStart=/usr/local/bin/weed filer -ip=192.168.85.70 -ip.bind=0.0.0.0 -port=8888 -s3.port=8333

2. Delaying Until the Network is Truly Online

I updated all SeaweedFS units to wait for network-online.target rather than the weak network.target. This guarantees that IP routing is fully established before the storage layer starts up:

[Unit]
Description=SeaweedFS Filer
After=network-online.target
Wants=network-online.target

3. Fail-Hard Mountpoint Permissions (The `chmod 000` Trick)

To stop silent local writes when FUSE mounts fail, I unmounted /mnt/seaweedfs on every node and locked down the bare folders:

sudo umount /mnt/seaweedfs
sudo chmod 000 /mnt/seaweedfs
sudo chown root:root /mnt/seaweedfs

When SeaweedFS is not mounted, /mnt/seaweedfs is owned by root with no read, write, or execute permissions. Any attempt by a script or Samba to write to it immediately throws a loud "Permission Denied" error. When mounted, the FUSE driver overrides these permissions.

4. Active Liveness Watchdog

I deployed a simple systemd timer-based watchdog script. Every hour, it checks the mount status using mountpoint -q and a quick read probe:

#!/bin/bash
# /usr/local/bin/seaweedfs-mount-health.sh
if ! mountpoint -q /mnt/seaweedfs || ! timeout 15 stat /mnt/seaweedfs >/dev/null 2>&1; then
    echo "SeaweedFS mount is unhealthy or stale. Attempting restart..."
    systemctl restart seaweedfs-mount
fi

If the mount fails the probe, the service automatically restarts, restoring the link and kicking Samba to pick up the fresh mount.

5. Prometheus Disk Health Monitoring

Finally, to round out the observability of the storage cluster, I configured Prometheus (backed by VictoriaMetrics) to handle fleet-wide disk health monitoring. By collecting metrics via node_exporter across all three nodes, I can now monitor SMART health, temperature, and disk space utilisation. This provides an essential early-warning layer, alerting me to degrading physical drives or capacity bottlenecks before they have a chance to affect the running volume servers (though I will save the deep dive into the Prometheus configuration and custom alert rules for a future post).

The Economics: Cloud Costs vs. Commodity Hardware

Besides the flexibility of having a local storage pool, there is a strong financial argument for offline storage. Eventually, this local cluster can serve as a direct copy of my cloud storage, saving me the $30 to $50 I spend every month on subscription fees.

When you look at the cost of the commodity hardware in my home lab (much of it sourced from second-hand marketplaces), the economics are hard to argue with:

camry: Sourced three years ago for a mere $80.
tiger: A basic server build at $200 (though the RTX 5060 Ti 16 GB inside it was a good grand).
lenovo: The premium compute node in the farm, which was the most expensive at $800.
Storage Media: A typical NVMe SSD costs around $100.

For a modest investment in hardware, I get a private storage pool that I own entirely. Yes, running your own storage layer comes with some operational maintenance and the occasional "pain" of triaging configuration races, but the trade-off is massive: zero recurring monthly storage bills, complete data privacy, and local gigabit transfer speeds.

Conclusion

SeaweedFS has proven to be a phenomenally fast, lightweight, and low-maintenance distributed storage option. It has allowed me to turn a collection of odd, mismatched disks into a resilient, zero-cost network drive without the operational complexity of Ceph or GlusterFS.

By enforcing strict IP binding, delaying service start until the network is fully online, locking the local directory permissions to 000, and running an active watchdog, I have turned a raw object store into a robust, self-healing, set-and-forget storage layout. It does exactly what home lab storage should do: it runs efficiently in the background, stays out of the way, and lets me focus on the actual work.

23 Jun 2026

AI Ping-Pong with Polly (and Omnigent)

Sometimes the most entertaining way to test a new orchestration tool isn't to run a standard benchmark, but to lock two state-of-the-art AI models in a virtual room and make them debate historical facts. This week, I got to do exactly that using an interesting new tool called polly within the omniagent ecosystem.

The idea was simple: set up a relay where Claude and Gemini (specifically the Antigravity agent) pass a task back and forth, validating and revalidating each other's work.

However, getting there wasn't entirely smooth sailing. I see recent polly changes that seem to be working on Antigravity support but I presume that it is just heavily in beta and under active development - Simply because I couldn't get it to work cleanly on my local test setup out of the box. In a somewhat ironic twist of AI-assisted development, I ended up having Claude help me debug my environment and write the custom harness around Antigravity to get the toolchain working.

Once the harness was solid, the fun began. I decided to test their interaction on a simple prompt asking for a list of India's Prime Ministers (based on their first stint). I then pass that list back-n-forth between the two AI engines for cross-validation and updation. To avoid an endless loop, I limited the number of iterations to exactly two.

Here is the exact prompt and how it kicked off:



 ❯ ask antigravity to generate the list of prime ministers of india in the
 chronological order (of their first stint) and then pass the list to
 claude for validation and an updated list of the same. and pass that onto
 to antigravity for revalidation with the same request. Do this round trip
 only 2 times and display the resultant list.

   · thinking…
◆ I'll run this as a relay: antigravity generates → 
    Claude validates/updates → antigravity revalidates, repeated for 
    2 round trips. Starting with antigravity's initial list.

◆ Antigravity produced the initial list (15 entries). Round trip 1 —
   passing to Claude for validation + an updated list:

◆ Claude is validating now. Waiting for its result.

.
.
.

Watching the two models interact was fascinating. Antigravity would confidently output the initial sequence, Claude would meticulously review it (often catching nuance around interim Prime Ministers like Gulzarilal Nanda), and then Antigravity would process Claude's corrections. It functioned like a highly pedantic game of ping-pong.

While this specific test was just good fun, the real motivation behind integrating polly is much more practical. I've been working on a complex PostgreSQL B-tree index compaction patch to solve the index bloat problem that VACUUM leaves behind. Developing database internals is notoriously tricky, and I've been relying on Claude (and now Antigravity) to help write and re-write and review the code heavily.

Having two AI agents look at the developing patch from different angles has been invaluable. They continually find subtle bugs, naming inconsistencies, edge-case production gotchas, and deviations from C best practices. Instead of manually passing code snippets back and forth between different chat windows, the idea here is to see if polly (or a similar tool) can allow me to seamlessly automate this cross-validation part without my intervention.

Now that the Antigravity harness is patched up and working, I've proven that this dual-engine review process can work, and be an automated part of my Postgres hacking workflow. That said, because Google's API billing operates separately from the standard Google AI Pro account, this specific Claude-and-Gemini pairing was mostly a test run. For the near future, I may just end up cross-validating between Claude and Codex (or a similar alternative) to keep the pipeline running smoothly and cost-effectively.

For the curious, here is the Relay trace of that conversation, and the final result.



   Relay summary

    Hop               Agent                 Result
    ─────────────────────────────────────────────────────
    Generate          antigravity (Gemini)  15-entry list
    RT1 → validate    Claude                two changes
    RT1 → revalidate  antigravity           no changes
    RT2 → validate    Claude                no changes
    RT2 → revalidate  antigravity           no changes


   Resultant list — Prime Ministers of India (chronological by start of first term)


     1 Jawaharlal Nehru — 1947
     2 Gulzarilal Nanda (Acting) — 1964
     3 Lal Bahadur Shastri — 1964
     4 Indira Gandhi — 1966
     5 Morarji Desai — 1977
     6 Charan Singh — 1979
     7 Rajiv Gandhi — 1984
     8 Vishwanath Pratap Singh — 1989
     9 Chandra Shekhar — 1990
    10 P. V. Narasimha Rao — 1991
    11 Atal Bihari Vajpayee — 1996
    12 H. D. Deve Gowda — 1996
    13 Inder Kumar Gujral — 1997
    14 Manmohan Singh — 2004
    15 Narendra Modi — 2014

   One note worth flagging (the cross-checks agreed, but for completeness): this
   list counts Gulzarilal Nanda once at his first acting stint (1964). He also
   served a second acting term in 1966 after Shastri's death — both models
   consistently treated 'first stint' as the ordering key, so he
   appears a single time at 1964, which matches your request.

Keep having fun !!

19 Jun 2026

Index Bloat - The Problem VACUUM Leaves Behind

This is an opening post (of a series) about my attempt at automatic B-tree index compaction integrated into autovacuum.

Index bloat is one of the most common performance problems in production PostgreSQL deployments — and one of the most misunderstood. People assume that a fully vacuumed table has a healthy index - which isn't true. There is a significant gap between an index that has been cleaned and one that has been compacted, and although Postgres is closing that gap (v13 enabled index deduplication & v14 brought bottom-up index deletion), there is a lot more to do here.

This post explains what index bloat actually is, why standard VACUUM cannot fix it, and why the existing manual tools — while useful — are not a substitute for continuous automated maintenance.

What Index Bloat Is (and Isn't)

After a large bulk delete — say, removing 80% of the rows in a table — autovacuum will dutifully remove all 80% of the dead index entries. The index will look clean. But the leaf pages are still there, mostly empty. A table that went from 1 million to 200,000 rows may still have an index file that occupies roughly the same space as before (a leaf page that held 100 index entries, and now holds 20, still occupies 8 kB on disk). Only the entries are gone; the pages remain.

The practical consequence is stark. A fully vacuumed B-tree index can have hundreds of leaf pages each filled to just 20% capacity. A range scan covering 10% of the keyspace now could end up reading five times as many pages as a compact index would require. Every buffer-cache miss costs the same whether the page carries 10 tuples or 100.

Why VACUUM Can't Merge Pages

PostgreSQL's autovacuum already visits every B-tree index after vacuuming a table. It calls btbulkdelete to remove dead index entries, then btvacuumcleanup to perform index-level tidying. What it cannot do is merge two partially-filled leaf pages into one and free the other.

Merging requires moving index entries from one page (the source) to another (the destination), then marking the source as deleted and unlinking it from the sibling chain. The B-tree code already knows how to do this: _bt_pagedel handles the case where a leaf page becomes completely empty after all its entries are removed. But VACUUM never merges pages that are merely sparse. A page at 20% fill is considered live and is left alone indefinitely.

Why can't VACUUM just merge sparse pages the same way it handles empty ones?

The obstacle is locking. B-tree index scans descend to a leaf page, pin it in memory, and then walk right by following btpo_next links to adjacent sibling pages. If a merge moves entries from a right page to a left page while a concurrent scan has already pinned the right page but not yet finished reading from it, the scan will miss the moved entries. That is a correctness bug.

Page-level locks — shared for reading, exclusive for writing — are not enough to prevent this. A concurrent scan can pin the right page before a merge begins, but after the check that would have prevented the merge.

So the option is either (a) Carefully coordinating page-level locks across all concurrent access-paths, OR, (b) Taking a relation-level lock, held across the entire merge operation, which can then guarantee that no scan is mid-flight on the page being moved. This series of blog-posts tries to revisit this locking dilemma (that makes index compaction difficult today) and see what else is possible to do to help the average application workload.

The Existing Solutions and Their Limitations

The standard tool for recovering bloated index space is REINDEX — or its online variant, REINDEX CONCURRENTLY. A full REINDEX rebuilds the index from scratch and produces a perfectly packed result, but it holds an AccessExclusiveLock on the relation for the entire duration of the build. All reads and writes are blocked. On a large production table that can mean minutes of downtime.

REINDEX CONCURRENTLY avoids that lock but still requires building a complete shadow copy of the index: a second, full-sized index is written while the original stays online. This is expensive in I/O and CPU, can run for hours on large indexes, may leave an invalid index (on cancellation) and requires 2x disk-space.

VACUUM FULL and extensions like pg_repack have the same fundamental character. pg_repack is more surgical than VACUUM FULL — it rebuilds the table and its indexes while the original stays accessible, and swaps them in atomically — but it still builds a complete copy before anything is reclaimed.

PostgreSQL 19 (barring last minute rollbacks) introduces a built-in REPACK command that performs online table and index repacking without requiring an external extension. This is genuine progress: it eliminates the pg_repack dependency, integrates with standard tooling, and is maintained by the core team. But the fundamental character of the operation is unchanged. REPACK rebuilds from scratch, and it is a scheduled one-shot command.

The common thread across all of these tools: a DBA has to notice the bloat — typically through a monitoring query or a performance regression — schedule a maintenance window, and issue the command. In practice, indexes frequently stay bloated for months between maintenance events. A table that sees steady churn (a history log, an events queue, a session store) will re-accumulate bloat between scheduled maintenance runs, sometimes faster than it can be addressed manually.

What production systems actually need is continuous, incremental compaction that runs in the background — the way autovacuum continuously reclaims heap space — without ever taking a lock that blocks application traffic.

What This Series Covers

The remaining posts in this series measure what continuous automated compaction delivers across 19 benchmark scenarios: static bulk deletes, steady-state churn, HOT and non-HOT update workloads, UUID-keyed indexes, concurrent readers under load, sawtooth growth patterns, TPS degradation under accumulating bloat, and streaming replication. The scenarios start simple and progressively test harder conditions.

The final post in this series (coming soon) covers the algorithm itself: the multiple failed design attempts, the core insight, the locking design that keeps application traffic unblocked, the implementation details, and operational considerations.

This is personal research, done on my own time and not connected to or endorsed by my employer. This is a relatively complex topic and I am sharing my findings as I explore this space.

15 Jun 2026

Why Postgres Doesn't Have synchronous_commit=remote_receive?

In distributed database environments, balancing durability and performance is a constant tug-of-war. PostgreSQL’s synchronous_commit parameter sits at the heart of this, giving administrators a dial to choose exactly when a COMMIT returns success to the client.

The idea of remote_receive was born from a simple question: does skipping the standby's disk write yield a measurable, real-world performance benefit? By waiting only for WAL bytes to reach the standby's memory, could we get a meaningful boost over remote_write? I set out to implement and benchmark this feature to find out.

What followed was a journey through network latency, OS page caches, CPU scheduler thrashing, and benchmarking noise. Here is the breakdown of the implementation, the tests, the initial anomalies, and the final results.

1. The Feature: What is `remote_receive`?

Before this branch, PostgreSQL offered four primary synchronous commit modes:

off: Fully asynchronous. (Fastest, least safe)
local: Waits for local disk flush on the primary.
remote_write: Waits for the standby to write the WAL to its OS buffer cache (pwrite).
remote_apply: Waits for the standby to fully replay the WAL. (Slowest, most safe)

remote_receive sits directly between local and remote_write. In this mode, the primary guarantees that the WAL bytes have physically arrived at the standby's walreceiver process buffer. It does not wait for the standby to call pwrite().

The Hypothesis: By completely bypassing the standby's disk I/O, remote_receive should deliver lower latency and higher throughput than remote_write, especially on replica hardware with slow disks.

Implementation Details

To build this, I had to modify both the standby and the primary:

Standby Status Update: I modified the 34-byte wire message that the standby sends to the primary, adding a new 8-byte receivePtr (creating a 42-byte message, backward compatible).
Early Replies: I modified walreceiver.c to send a reply message immediately upon receiving a WAL chunk in memory, before the XLogWalRcvWrite() call executes the pg_pwrite.
Primary Wait Logic: I updated syncrep.c and walsender.c to track the SYNC_REP_WAIT_RECEIVE wait queue, releasing waiting backends as soon as the standby's receivePtr advanced.

2. Scenario 1: The SSD Baseline (Fast Primary, Fast Replica)

To validate the code, I first set up a baseline test using two fast machines on a gigabit LAN.

The Servers:

Primary: lenovo (Intel Core i7-12700 12C/20T, 48GB RAM, NVMe SSD)
Replica: camry (Intel Core i7-4770 4C/8T, 24GB RAM, SATA SSD)

I ran pgbench (Scale 10, 4 clients) for 30 seconds across the different modes.

The Result:

remote_write: 3,944 TPS (Median)
remote_receive: 3,946 TPS (Median)

The performance was virtually identical (a 0.06% difference). Why didn't remote_receive pull ahead?

The Reality of pwrite(): On modern operating systems with free RAM, a standard pwrite() to a buffered file does not write to physical disk immediately. It copies the data into the OS page cache (essentially a memory copy taking mere microseconds), leaving the kernel to flush dirty pages asynchronously.

With network round-trip time (RTT) on a gigabit LAN between 0.2ms and 1.0ms, the 5 microseconds saved by bypassing pwrite() is completely dwarfed by network latency. This makes remote_write and remote_receive perform nearly identically in typical conditions.

This fundamental reality explains why core PostgreSQL developers historically questioned the return on investment (RoI) of a memory-only receive mode. Because a standard pwrite() to the OS page cache is already a RAM-speed operation (taking mere microseconds), remote_write is practically as fast as any network-receive-only mode under normal conditions, but with a major durability advantage: it survives a PostgreSQL process crash on the standby (as long as the OS remains running). Bypassing it would reduce durability without providing any real-world performance benefit in typical conditions. In pgsql-hackers discussions—such as the thread on Measuring Replay Lag where distinct write_lag and flush_lag tracking was introduced—it is clear that the network round-trip time dominates the replication pipeline. To find a measurable benefit for a receive-only mode, I needed a replica where disk I/O was a severe enough bottleneck to cause page cache pressure and slow down the pwrite() call itself.

3. Scenario 2: The HDD Challenge (Asymmetric Hardware)

For the next test, I replaced the fast camry replica with a much weaker machine.

The Servers:

Primary: lenovo (i7-12700, SSD)
Replica: mac (Intel Core i5-2520M 2C/4T, 16GB RAM, 5400RPM HDD)

With a mechanical hard drive and a slow dual-core processor, I expected remote_receive to outpace remote_write. I ran three 30-second runs per mode, but the initial results were unexpected:

The First (Anomalous) Result:

remote_write: 203.6 TPS (Median)
remote_receive: 179.0 TPS (Median)

remote_write was ~20 TPS faster than remote_receive. Tracing the walreceiver and walsender loops ruled out code bugs; instead, the bottleneck came down to four factors:

The OS Cache Illusion: Even on a slow HDD, pwrite() still writes to RAM. The mechanical disk's extreme latency only hits during fsync or when the page cache fills up, meaning the receive-only advantage remained small.
CPU/Scheduler Thrashing: By sending an "early reply" before writing to disk, remote_receive generates twice as many TCP reply packets (one for receive, one later for flush). On the replica's older dual-core CPU, processing this packet flood alongside WAL replay caused high context-switching overhead.
Flow Control: remote_write acted as a natural flow-control mechanism. By waiting to write before replying, it throttled the primary slightly, keeping the replica's CPU out of a thrashing state.
Statistical Noise: The remote_write runs ranged from 177 to 211 TPS (a 19% spread). A 3-run, 30-second test was simply too noisy to yield a reliable median.

Aside: The Raspberry Pi 4 Attempt Before settling on the Mac Mini HDD, I attempted to use a Raspberry Pi 4 (pi4 — Cortex-A72 4C/4T, 4GB RAM, SD card / USB storage) as the slow replica. However, the Pi 4 was a poor fit for this benchmark. The main issue wasn't simply that the CPU maxed out, but that the low-power ARM CPU could not keep pace with the primary's faster rate of WAL generation. This lag cascaded into secondary issues—such as rapidly mounting replication lag, TCP buffer queues, and process starvation—which completely dominated the environment and masked any storage-level performance differences.

4. The Final Test: Eliminating the Noise

In the initial runs, the slow mechanical disk combined with standard kernel buffering created a massive source of noise: the sustained background I/O writes from a just-finished test were still flushing to disk when the next test began. Although an LSN cross-check was already in place (waiting for the replica's replay_lsn to catch up to the primary's current WAL LSN), this only verified database-level catch-up, not physical disk queue clearance. The residual write queue in the OS cache severely penalized the subsequent run, creating artificial variance. To isolate the actual replication performance and eliminate this noise, I overhauled the benchmarking methodology:

Interleaved Runs: I interleaved executions (Write, Receive, Write, Receive...) to average out temporal background OS tasks and thermal states.
Longer Runs and More Iterations: I ran 10 iterations per mode at 60 seconds per run (20 runs total).
Aggressive Cache Flushing: I ran sync on both the primary and replica between runs, followed by a 30-second sleep, to flush the OS page cache to physical disk platters and guarantee clean disk queues.

The Final Results:

Mode	Median TPS	Mean TPS	Median Latency
`remote_write`	201.58	200.74	20.663 ms
`remote_receive`	211.44	206.85	19.434 ms

The Results

With the noise eliminated and disk queues flushed, the true behavior emerged: remote_receive outperformed remote_write by ~10 TPS (~4.9% gain) at the median and ~6 TPS (~3.0% gain) at the mean. The small gap is expected: bypassing a RAM-buffered pwrite() on the replica yields microsecond-level gains, while network round-trip time remains the dominant factor.

Conclusion

The remote_receive implementation successfully introduces a granular durability option that guarantees WAL has crossed the network into the standby's memory before committing.

This exercise highlighted a key rule of database benchmarking: the OS page cache masks physical disk latency until it fills up. To accurately benchmark high-variance storage, one must use interleaved runs, higher iteration counts, and rigorous OS-level cache flushes between tests. Otherwise, you are measuring cache behavior and statistical noise rather than raw database throughput.

A Note on the Development Process & Resources: While I dabble in C and C++ from time to time, development at this level within PostgreSQL internals would not have been possible without AI assistance. Additionally, all resources for this project—including the Claude subscription, development machines, laptop, and time—were entirely personal and completely independent of my employer.

16 May 2026

Taming the Token Burn

Cloud AI is incredibly convenient, but the costs scale rapidly once you start leaning on long-context workflows. I've spent the last year relying on Gemini Pro for everything from Postgres automation to complex scripting, but the honeymoon phase is officially over (for me). As my projects grew in scope, so did the token consumption, eventually hitting a brick wall that forced a rethink of my entire stack.

For most, this is a niche problem. For anyone building high-context agentic workflows, it’s the only problem that matters.

Last year, the ride was smooth. My "AI usage" was minimal and Gemini Pro handled my daily tasks with ease. However, over the last three months, I've started pushing it with "decent-sized" projects—tasks that require keeping dozens of context balls in the air simultaneously. The result? It guzzles through tokens like a V8 engine in a traffic jam.

The breaking point came recently when, during a deep-dive development session, I fed the model a substantial task involving a complex codebase. Within two hours, I burned through 100% of my monthly allocation!!

Three (3) months in a row!

Token usage spike showing a 100% burn rate in just two hours.

Yeah, that's not going to fly.

Finding the Right Tool for the AI Job

It’s time for a pivot, but not necessarily a total exit from the cloud. I’ve recently taken a Claude subscription as well, and it has been proving to be remarkably good at tasks where Gemini used to stutter. It’s basically been a lesson in finding "the right tool for the job" rather than looking for a single silver bullet.

That said, reducing my reliance on expensive cloud tokens is a priority. The recent NVIDIA RTX 5060 Ti purchase was a steep investment, but nonetheless a strategic move towards local autonomy. The goal is to migrate my heavy-lift agentic AI workflows to my local server farm, reserving premium cloud services for the few tasks where they are truly indispensable—specifically, massive context shifts that can't be easily decomposed. Some key examples being planning complex projects, or tasks that require reasoning over large codebases.

I have a few machines at home (hosting IronClaw, Postgres fuzzing, Jellyfin, Git servers, Obsidian repositories, and a Grafana dashboard to keep an eye on it all), and the goal is to keep costs down while making these tools smarter. The only way I see that happening is if I reduce Cloud API dependence and switch existing tooling to local LLMs.

Another key driver is that I expect some of my newer projects, still in stealth mode, to burn through tokens like nobody's business; they wouldn't see the light of day if I didn't change the status quo.

Stay tuned.

14 May 2026

Postgres May 2026 Security Update: 11 CVEs, All Versions Affected

It's that time again. Postgres v18.4 release (along with 17.10, 16.14, 15.18, 14.23) has dropped some serious hints in the git logs, and it's bringing a significant payload of CVE tagged patches. As a seasoned Postgres end-user and an erstwhile DBA, whenever I see a flurry of high-vulnerability security commits, I immediately start recommending that customers begin planning their patching cycles.

Note: Some official CVSS scores are as high as 8.8! and so the urgency and relevance should be considered when prioritizing this upgrade.

Tom Lane: Last-minute updates for release notes.

Security: CVE-2026-6472, CVE-2026-6473, CVE-2026-6474, CVE-2026-6475, CVE-2026-6476, CVE-2026-6477, CVE-2026-6478, CVE-2026-6479, CVE-2026-6575, CVE-2026-6637, CVE-2026-6638

Let's quickly take a quick look at the CVE list:

The Memory and Overflow Fixes

CVE-2026-6472 - Git commit (CVSS Score: 5.4) — Missing CREATE Privilege Check on Multirange Types. (Backpatched to: 14, 15, 16, 17, 18)
CVE-2026-6473 - Git commit (CVSS Score: 8.8) — The Memory Overflow Ghost. (Backpatched to: 14, 15, 16, 17, 18)
CVE-2026-6474 - Git commit (CVSS Score: 4.3) — Unsafe pg_strftime() handling. (Backpatched to: 14, 15, 16, 17, 18)
CVE-2026-6477 - Git commit (CVSS Score: 8.8) — Frontend Large Object Buffer Overruns. (Backpatched to: 14, 15, 16, 17, 18)

The Replication and Backup Vulnerabilities

CVE-2026-6475 - Git commit (CVSS Score: 8.8) — Path Traversal in pg_basebackup & pg_rewind. (Backpatched to: 14, 15, 16, 17, 18)
CVE-2026-6476 - Git commit (CVSS Score: 7.2) — SQL Injection in pg_createsubscriber. (Backpatched to: 17, 18 — pg_createsubscriber was introduced in v17)
CVE-2026-6638 - Git commit (CVSS Score: 3.7) — SQL Injection in Logical Replication. (Backpatched to: 16, 17, 18)

The Cryptography and DoS Defenses

CVE-2026-6478 - Git commit (CVSS Score: 6.5) — Defeating Timing Attacks in Auth. (Backpatched to: 14, 15, 16, 17, 18)
CVE-2026-6479 - Git commit (CVSS Score: 7.5) — SSL/GSS Recursion Denial of Service (DoS). (Backpatched to: 14, 15, 16, 17, 18)

The Edge Cases: Catalogs and Contrib Modules

CVE-2026-6575 - Git commit (CVSS Score: 4.3) — Statistics Restore Crashes. (Backpatched to: 18 only)
CVE-2026-6637 - Git commit (CSS Score: 8.8) — refint Contrib Vulnerability. (Backpatched to: 14, 15, 16, 17, 18)

The Data Anomalies: Breaching the Trend Line

Stepping back and looking at the historical data for Postgres security releases, there's a distinct pattern in how deeply CVEs are backpatched. Typically, the volume of fixes applied to older, mature branches remains relatively low and stable. However, as the graph below illustrates, this release cycle has completely shattered that historical trend line.

The graph plots "Total Vulnerability Age Points" per release month from 2017 to 2026. The trend line stays in single digits through 2019, gradually climbs to 10-15 through 2023, and then spikes sharply in May-2026.

The "Vulnerability Age Points" metric works as follows: for each CVE fixed in a given month, I calculate the age of the oldest major version that received the backpatch (e.g., a fix backpatched to v14 in 2026 scores 5 points, since v14 was released in 2021). The column then sums these scores across all CVEs in that month. This captures both the depth (how far back) and the breadth (how many CVEs) of backpatching activity in a single number.

While it's worth acknowledging that this metric has a natural mathematical bias—since a high volume of CVEs automatically inflates the score (e.g., 11 CVEs all scoring 5 points mathematically guarantees a score of 55)—it still accurately visualizes a very real spike in activity. For years, this metric hovered in the single digits, occasionally bumping up during a busy quarter. But recently, the scores have slowly grown and ballooned for the upcoming May 2026 release.

This isn't just backpatching one or two obscure bugs to a 5-year-old release; it's a highly elevated volume of CVEs all the way back to the oldest supported versions. The sheer depth and breadth of these fixes indicate a fundamental shift in how vulnerabilities are being discovered.

A Personal Hunch: Could AI Be Behind the Spike?

Update: - Feels good to be right :) . This post was written before the release and then it is good to guess correctly that many of these CVEs were found by multiple different AI tools. See CVE pages linked to read more.

I want to be upfront: I have zero insider knowledge here, and no data to back this up. But looking at the sheer volume and the deeply obscure nature of these fixes—integer overflows in palloc_array() edge cases, subtle recursion bugs in startup packet handling, timing side-channels in auth paths—it's hard not to wonder whether some of these discoveries were aided by AI-driven code auditing tools.

Projects like Anthropic's Glasswing are designed to autonomously analyze large codebases for exactly these kinds of subtle, hard-to-spot vulnerabilities. This hunch is further supported by broader industry shifts; recently, trusted figures like Linux kernel maintainer Greg Kroah-Hartman publicly acknowledged that AI bug reports have evolved from "slop" to genuinely high-quality submissions.

The pattern of findings here—broad, systematic, reaching deep into mature code—feels different from the typical "researcher finds one clever exploit" pattern. It feels more like a methodical, machine-assisted sweep. But again, that's purely a hunch on my part, and I'd be happy to be corrected.

Final Thoughts

With the release announcement just around the corner, these fixes are being backpatched to earlier supported versions. As the per-CVE annotations above show, most of these CVEs affect all currently supported versions (14 through 18), though a few are scoped to newer releases only (CVE-2026-6476 to v17+, CVE-2026-6638 to v16+, and CVE-2026-6575 to v18 only).

What does this mean practically? Since these are all minor version upgrades, no full pg_upgrade is needed. Usually, you can simply stop Postgres, install the updated binaries for your version, and restart. However, always carefully read the release notes before upgrading! Patches involving catalog objects or extensions (like the refint fix) might require DBAs to run specific SQL commands post-upgrade, such as ALTER EXTENSION ... UPDATE. The risk of not patching here significantly outweighs the risk of the upgrade itself.

If you expose Postgres to untrusted clients, or operate in a multi-tenant environment, the authentication DoS (CVE-2026-6479) and backup path traversal (CVE-2026-6475) should put this upgrade high on your priority list.

For the latest official information, check the Postgres Security Page and the Minor Release Roadmap.

Stay safe, and happy patching.

22 Apr 2026

Local LLM Update - ThinkStation Meets Blackwell

Finally - an update to the AI workbench, and one hell of an update it is :) !

The home server farm has now been blessed with dedicated silicon, and it is an absolute beast: the NVIDIA RTX 5060 Ti 16GB. The old setup—which was painfully grinding through CPU inference at a mere 1 token/sec on 46GB of RAM—has officially been retired. After months of relying on remote APIs, lightning-fast local agentic workflows are finally a reality, all without seriously over-working my legacy workstation (and waiting eons for responses).

To clarify, my paid subscriptions to cloud LLMs will continue. However, with this capable local setup handling the bulk of my daily tasks, I expect to hit the dreaded "credits have expired" message much later in the month—if ever. At the very least, having this local horsepower certainly pushes off the need to upgrade to a pricier "Ultra" plan just to get more API tokens.

The ultimate objective here is to deploy an agentic setup for the home and family. I initially investigated OpenClaw for the orchestration layer, but its security model was simply too porous for a paranoid systems engineer. IronClaw, on the other hand, looks like a solid, secure candidate to serve as the local agent nexus. Before any of that software could run, I needed an inference engine that did not crawl.

The Hardware: The ThinkStation S30 Meets Blackwell

The host machine is my trusty Lenovo ThinkStation S30 (Machine Type 4351). It is a Sandy/Ivy Bridge Xeon platform, firmly anchored in the PCIe 3.0 era. Dropping a brand-new Blackwell-architecture GPU into a system this old is technically a severe mismatch, but the RTX 5060 Ti 16GB is the perfect fit for this specific niche.

Instead of chasing older, power-hungry professional cards like the RTX A4000, or settling for consumer Turing cards with split memory bottlenecks, the 5060 Ti offered exactly what the S30 needed:

16GB VRAM: The absolute minimum needed to comfortably fit modern 7B-9B parameter models.
GDDR7 Bandwidth: Hitting 448 GB/s, drastically improving throughput over older 60-series cards.
Native FP8/FP4 Support: Crucial for running highly quantized models efficiently.

Overcoming Legacy Architecture Limits

Getting a 2026 GPU to speak with a 2013 motherboard required some immediate troubleshooting. Initial boots into Linux Mint resulted in a wall of kernel panics and DMAR (DMA Remapping) faults. The modern GPU's memory management completely clashed with the S30's legacy Intel VT-d (IOMMU) implementation.

I resolved this by disabling Intel VT-d in the BIOS and appending intel_iommu=off to the GRUB bootloader parameters. This bypassed the broken firmware tables and allowed the system to boot stably with the proprietary NVIDIA 580.126.09 drivers.

Another significant bottleneck was the PCIe 3.0 bus itself. When running vLLM, the default CUDA Graph capture and torch.compile phases initially took a grueling 16 minutes to complete due to the slow bus speed. While it worked beautifully after that initial warmup, the long delay posed a problem: because I set up the inference engine as an auto-start systemd service at boot, systemd would assume the process had hung and shoot it down before compilation could finish. To resolve this, I bypassed the compilation overhead by starting vLLM with the --enforce-eager flag, ensuring the service starts up reliably without getting restarted by the OS.

Performance: 40-70 Tokens per Second

Despite the older host system, the 5060 Ti excels at small, localized tasks.

I settled on the Qwen2.5-Coder-7B-Instruct-FP8-Dynamic model. Because it leverages FP8 precision, the model weights consume roughly 8.5GB of the available 15.48 GiB VRAM. This leaves plenty of overhead for the KV cache and a 16k context window without spilling over into system RAM (which, across a PCIe 3.0 bus, would throttle performance down to single-digit tokens per second).

With vLLM 0.19.1 managing the inference (I initially started with Ollama but switched to vLLM—I don't have comparison numbers yet, but that is a topic for a future post), the S30 consistently pushes 40 to 70 tokens/sec for generation, and handles prompt processing at over 700 tokens/sec.

Here is a quick snapshot from the vLLM logs confirming these real-world speeds during a typical code completion task:

(APIServer pid=967222) INFO 04-22 19:09:50 [loggers.py:259] Engine 000: 
  Avg prompt throughput: 709.3 tokens/s, Avg generation throughput: 4.3 tokens/s, 
  Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.9%, 
  Prefix cache hit rate: 0.4%

(APIServer pid=967222) INFO:     127.0.0.1:60326 - "POST /v1/chat/completions 
  HTTP/1.1" 200 OK

(APIServer pid=967222) INFO 04-22 19:10:00 [loggers.py:259] Engine 000: 
  Avg prompt throughput: 52.4 tokens/s, Avg generation throughput: 41.0 tokens/s, 
  Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, 
  Prefix cache hit rate: 0.6%

The inference service now runs automatically via systemd. To securely access the engine from my laptop at home, I rely entirely on an SSH port forward. This elegant solution means that dealing with network-level security or opening firewall ports isn't even a configuration requirement; SSH handles the secure tunnel perfectly. On the client side, this local endpoint works beautifully with VS Code and the Continue extension, providing a seamless and entirely private AI coding experience.

Power, Thermals, and Footprint

One of the best surprises of this Blackwell upgrade is how remarkably quiet and power-efficient the card is. It idles gracefully at 10W and rarely pushes past 20W during my typical small localized tasks. Because it runs so cool (sitting around 38°C with the fans completely off at 0%), stuffing the entire server rig into a cupboard does not trigger any thermal anxiety whatsoever.

For reference, here is the current footprint while loaded:

With the hardware foundation finally stabilized, incredibly power-efficient, and pushing excellent tokens per second, the runway is completely clear to deploy IronClaw and build out the actual home agent capabilities.

Looking Ahead

I am absolutely thrilled to have reached this stage! Pressing my trusty old ThinkStation into service was a gamble that paid off beautifully, saving a cool $2,000 to $3,000 that would have otherwise gone into an entirely new workstation on top of the GPU cost. Better yet, this setup gives me massive flexibility to stagger future upgrades. If I ever need more VRAM, I can simply drop in a second Blackwell card for dual mode—provided I finally upgrade the host machine to support PCIe 5.0 so the interconnect isn't choked by legacy bus speeds.

For now, I am eagerly looking forward to what's next. Having a secure, lightning-fast, and entirely local AI bedrock opens up incredible possibilities for the home network. I'll be diving deep into the IronClaw orchestration very soon, and you can expect more blog posts detailing that agentic journey in the coming weeks!

29 Mar 2026

Understanding Google AI Credits: What You're Actually Paying For

If you've subscribed to Google AI Pro [1] (or are eyeing an upgrade), you've probably noticed the word "credits" appearing everywhere — in your billing dashboard, in the Gemini app, and inside your code editor [5]. But what are they? How do they reset? And if your family is on the plan, who's burning through them? [3]

I spent an afternoon untangling this, and the short version is: Google doesn't have one quota system — it has three, each resetting on a different clock. Here's how it all works.

The Three Clocks

The single biggest source of confusion with Google AI Pro is that there are three independent limits running simultaneously, each with its own reset schedule:

Timer	What Resets?	Applies To
Monthly	1,000 AI Credits	Video / 4K Images / IDE Overages
Weekly	Antigravity Baseline Quota	"Free" high-tier usage in Antigravity
Daily	Prompt & Media Limits	Chat (Thinking/Pro), Images, Music

Understanding which "fuel tank" you're drawing from at any given moment is the key to not running dry at the wrong time.

Monthly AI Credits (The 1,000 Pool)

When you subscribe to AI Pro, Google allocates 1,000 AI Credits [1] at the start of each billing cycle. These are the credits you see in your Google One dashboard with a countdown showing days until reset.

Key facts:

No rollover. If you have 380 credits left with 10 days to go, those 380 vanish on your billing date [1]. You start fresh at 1,000.
These are for premium "heavy" tasks. High-end video generation (Google Flow/Veo 3.1) [4], advanced image creation (Imagen 4 / Nano Banana 2), and IDE overage billing all draw from this pool.
Standard tasks are free. Typing prompts into the Gemini app, generating standard images (Gemini / Imagen 4), and basic music tracks (Lyria 3) do not consume monthly credits up to their daily limits [4].

The Top-Up Exception: If you manually purchase a "Top-up" credit pack, those purchased credits are typically valid for 12 months from the date of purchase and do carry over across billing cycles [1]. Only your subscription credits are use-it-or-lose-it.

What Can You Buy With Credits?

Here are the standard costs for premium generation (2026 pricing) [2, 6]:

Feature	Credit Cost (Approx.)	What 380 credits buys you
Nano Banana 2 (Imagen 4)	1	~380 high-res assets
Veo 3.1 Fast (via Flow)	10	~38 cinematic clips
Veo 3.1 Quality (via Flow)	100	~3 cinematic clips
IDE Overage (per hour)*	15	~25 hours of high-tier use

*Approximation based on typical token consumption in Antigravity.

Tip: If you're close to the end of your billing month with credits to spare, it's the perfect time for high-resolution 4K image generation or experimental Veo video clips. They won't cost you anything "extra" once the new month starts.

Antigravity Weekly Baseline Quota (The IDE Clock)

If you use Google Antigravity (the AI-powered code editor), you'll notice a separate "refresh date" in your settings. This date is typically different from your monthly credit reset.

What it is: Google gives AI Pro users a "Free Baseline" of high-performance model usage (Gemini 3.1 Pro, Claude 4.6 Sonnet, etc.) within Antigravity every week [5, 6].

Key facts:

5-Hour Sprints: Your immediate capacity refreshes every 5 hours [5].
7-Day Hard Cap: There is a weekly baseline limit. If you exhaust this, the 5-hour refresh stops working until your next 7-day reset (the "refresh date" you see) [5].
This quota is individual — each account on your family plan has its own personal IDE baseline.

The Overage Toggle

In Antigravity settings, the "Use AI Credits for Overages" toggle lets you decide what happens when your weekly baseline runs out:

ON: Antigravity draws from your monthly 1,000-credit pool to keep you coding at full speed.
OFF: You're limited to the "Flash" model until the weekly refresh hits.

Daily Quotas (Chat, Music, and Deep Research)

Finally, your day-to-day interactions have their own daily caps that reset at midnight Pacific Time. These never touch your 1,000 monthly credits:

Feature	Limit Per Day
Thinking Model Chat	300 prompts
Pro Model Chat	100 prompts
Nano Banana 2 (Std)	1,000 images
Lyria 3 (Music)	50 tracks [4]
Deep Research Reports	3 reports

Family Plans: What's Shared, What's Not

Feature	Shared with Family?	Reset Cadence
1,000 AI Credits	Yes [3] (shared pool)	Monthly
2 TB Storage	Yes [1] (shared pool)	Monthly
Daily Interaction Limits	No (individual)	Daily
Antigravity Baseline	No [5] (individual)	Weekly

The Credits are the only shared fuel. If your family member generates a few Veo 3.1 videos, they are spending from the same 1,000-credit bucket you use for your IDE overages [3]. You can monitor this in the Google One → AI Credits Activity dashboard.

Important: Family members must be 18+ for high-tier model access. Under-18 accounts are restricted to Gemini Basic.

References

Google One AI Pro: Membership Overview - Plan pricing and 1,000 credit allocation.
Google One - AI Credits Pricing (2026) - Details on credit reset and top-ups.
Google One Help - Managing Family AI Credit Activity - Shared pool tracking and activity history.
Google Cloud - Expanding AI Creativity with Google Flow and Veo 3.1 - Media generation models and limits.
Google DeepMind - Antigravity IDE Baseline Quota Framework - 5-hour refresh and weekly baseline rules.
Anthropic - Claude Sonnet 4.6 on Google Cloud Vertex AI - February 2026 release news.

9 Mar 2026

Display IMDb Ratings on Einthusan

Technical Features

Surfing niche streaming sites without inline film ratings is a recipe for endless tab-opening and "analysis paralysis." To scratch my own itch, I put together a small userscript called Masala Script to fix this exact problem.

For context, Einthusan is a massive streaming directory for South Asian cinema. While it's an excellent digital archive, exploring its thousands of regional films is tedious because it lacks external metadata like IMDb ratings.

To solve this friction, Masala Script (presently just a single Tampermonkey extension file) reads the Einthusan page, interfaces with the free OMDB API, and renders IMDb rating badges right next to the title.

Technical Features

While it might seem trivial to inject an API call onto a DOM load, building this script properly required addressing a few interesting technical hurdles:

Intelligent Fallback Matching via Wikipedia: South Asian movie titles vary wildly in transliteration. An exact-title search against the OMDB API frequently fails for regional films. However, Einthusan usually provides a Wikipedia link for each movie. I built a transparent scrape fallback: if the direct title/year fetch fails, the script fetches the linked Wikipedia page in the background, extracts the definitive ttXXXXXXX IMDb ID using a regular expression, and then repeats the OMDB query using that exact ID for 100% accuracy.
Aggressive Client-Side Local Caching: The free tier of the OMDB API provides 1,000 requests per day. A typical Einthusan browse page can render over 20 movie cards at once. Scrolling through just 50 pages would immediately exhaust the daily allowance and result in rate limits. The script counters this by heavily utilizing the GM_setValue and GM_getValue Tampermonkey APIs—caching successful queries in the browser for 7 days, and failed title lookups for 1 day.
Detailed Error Tooltips: Rather than failing silently, any lookup that ultimately misses (or fails due to API config errors) renders a "Fail" or "N/A" UI badge. When hovered, it provides an exact exception traceback or error string so the user knows exactly why the movie metadata wasn't found.
Single Page Application Navigation Detection: Einthusan manages categories via AJAX updates, utilizing history.pushState and popstate to load new frames. The userscript actively monkey-patches and listens to these navigation boundaries, injecting the DOM observers properly on dynamic content swapping.

The Genesis of Masala Script

Built using modern AI tools, this project demonstrates how rapidly useful and robust browser enhancements can be coded from scratch with minimal manual boilerplating.

Looking at the initial Git history over the course of the day shows how quickly the script escalated from a basic exact-title scraper into a much more mature and intelligent tool:

commit 70027efcd906586cb46928b6da16c46c2402ae25
docs: replace development instructions with a detailed features and notes section in README.

commit 133dbdf7e6123f3a9ac774d820e1a1aa932051f4
Add caching for imdb ratings

commit acb25cc1187884e771efa9e32cf9836461d403c1
feat: Do a fallback search (via Wikipedia), in case first imdb rating fetch fails.

commit 06f3f05be8f6ebd08dfedffb80cf4d53544786f6
feat: Add movie page to the list of pages where this works

Try It Out

If you want to try it out on your next Einthusan movie night:

Install the Tampermonkey browser extension.
Claim a free OMDB API Key.
Install the script by clicking on imdb-einthusan.user.js.

The next time you navigate to any browse or movie page on Einthusan, the script will prompt you for your OMDB API key (only once), and start displaying ratings next to the movie cards!

Although this extension might only interest a very niche subset of users, for those of us who regularly browse Einthusan (and heavily rely on IMDb to filter through the noise), it's a massive quality-of-life add-on to have.

If anyone is looking for new features, has a bug to report, or wants to contribute, feel free to create an issue or submit a pull request on the repository. Happy watching!

4 Feb 2026

The "Skip Scan" You Already Had Before v18

PostgreSQL 18 introduces native "Skip Scan" for multicolumn B-tree indexes, a major optimizer enhancement. However, a common misconception is that pre-v18 versions purely resort to sequential scans when the leading column isn't filtered. In reality, the cost-based planner has long been capable of leveraging such indexes when efficient.

How it works under the hood:

In pre-v18 versions, if an index is significantly smaller than the table (the heap), scanning the entire index to find matching rows—even without utilizing the tree structure to "skip"—can be cheaper than a sequential table scan. This is sometimes also referred as a Full-Index Scan. While this lacks v18’s ability to hop between distinct leading keys, it effectively still speeds up queries on non-leading columns for many common workloads.

Notably, we would see why this v18 improvement is not a game-changer for all workloads, and why you shouldn't assume speed-up for all kinds of datasets.

Benchmarks: v17 vs v18

To understand when the new v18 Skip Scan helps, we tested two distinct scenarios.

Scenario A: Low Cardinality (The "Success" Case)

The Setup: We created an index on (bid, abalance) and ran the following query:

SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;

Note: We did not run a read-write workload, so abalance is 0 for all rows. The query returns 0 rows.

Data Statistics:

Table Rows: 1,000,000
Leading Column (bid): ~10 distinct values (Low Cardinality).
Filter Column (abalance): 1 distinct value (Uniform).

Results:

Version	Strategy	TPS	Avg Latency	Buffers Hit
v17	Index Only Scan (Full)	~2,511	0.40 ms	845
v18	Index Only Scan (Skip Scan)	~14,382	0.07 ms	39

Why the massive 5.7x gain?

Pre-v18 Postgres sees that bid is not constrained. It decides its only option is to read the entire index (all leaf pages) to find rows where abalance = -2187. It scans 1,000,000 items (845 pages).

Postgres v18 uses Skip Scan. It knows bid comes first. instead of scanning sequentially, it:

Seeks to the first bid (e.g., 1). Checks for abalance = -2187.
"Skips" to the next unique bid (e.g., 2). Checks for abalance = -2187.
Repeats for all 10 branches.

It effectively turns one giant scan into 10 small seeks. The EXPLAIN output confirms this with Index Searches: 11 (10 branches + 1 stop condition) and only 39 buffer hits.

v18 Skip Scan Plan (Scenario A):

 Index Only Scan using pgbench_accounts_bid_abalance_idx on public.pgbench_accounts ...
   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
   Heap Fetches: 0
   Index Searches: 11
   Buffers: shared hit=39

Scenario B: High Cardinality (The "Failure" Case)

The Setup: We used the same query, but against an index on (aid, abalance).

Data Statistics:

Leading Column (aid): 1,000,000 distinct values (100% Unique / High Cardinality).

Results:

Version	Strategy	TPS	Avg Latency	Buffers Hit
v17	Index Only Scan (Full)	~55.8	17.92 ms	2735
v18	Index Only Scan (Full)	~51.2	19.52 ms	2741

Why no improvement? If v18 attempted to "Skip Scan" here, it would have to skip 1,000,000 times (once for every unique aid). Performing 1 million seeks is significantly heavier than simply reading the 1 million index entries linearly (which benefits from sequential I/O and page pre-fetching). The planner correctly estimated this cost and fell back to the standard "Index Only Scan" used in v17.

v18 Plan (Identical to v17):

 Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts ...
   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
   Heap Fetches: 0
   Index Searches: 1
   Buffers: shared hit=2741

How to Identify a Skip Scan

You might notice that the node type in the EXPLAIN output remains Index Scan or Index Only Scan in both cases. PostgreSQL 18 does not introduce a special "Skip Scan" node.

Instead, you must look at the Index Searches line (visible with EXPLAIN (ANALYZE)):

v17 (and older): The Index Searches row does not appear.
v18 (Standard Scan): Index Searches: 1. The scan started at one point and read sequentially (as seen in Scenario B).
v18 (Skip Scan): Index Searches: > 1. The engine performed multiple seeks (as seen in Scenario A which had 11).

In our Scenario A (Success), Index Searches: 11 tells us it performed ~11 hops, confirming the feature was used.

How to reproduce:

Note: We use PostgreSQL 13 in this section to meaningfully demonstrate that even older versions (long before v18) could efficiently utilize multicolumn indexes for these types of queries, to clarify that older Postgres versions were capable of avoiding a Full Table Scan (and instead fallback to Full Index Scan) in such scenarios.

PostgreSQL 13 (a running cluster)
A pgbench-initialized database (run pgbench -i to create pgbench_accounts, pgbench_branches, pgbench_tellers, and pgbench_history)
Reasonable data loaded (default pgbench -i -s 10 or similar)

Steps to reproduce the example:

Follow these steps in your terminal to set up the test environment. This assumes you have PostgreSQL 13 (or a similar version) installed and its command-line tools are in your PATH.

Step 1: Create and Initialize the Database

createdb testdb                # Create a new database
pgbench -i -s 10 testdb         # Initialize it with 1 million rows

Step 2: Run a Quick Read-Write Test

To simulate some database activity, run a standard 10-second read-write benchmark.

pgbench -T 10 testdb

Step 3: Connect and Run the Test Query

Now, connect to the database with psql to run the SQL commands that demonstrate the index scan behavior.

SQL steps (run in psql):

psql testdb
-- create a multicolumn index (first column aid, second column abalance)
testdb=# CREATE INDEX CONCURRENTLY IF NOT EXISTS pgbench_accounts_aid_abalance_idx
  ON pgbench_accounts(aid, abalance);

testdb=# select abalance, count(*) from pgbench_accounts group by abalance order by count(*) desc limit 5;
 abalance | count
----------+--------
        0 | 995151
    -2187 |      4
    -4621 |      4
    -4030 |      4
    -2953 |      4
(5 rows)

-- run an explain on a selection that filters only on the second column (abalance)
testdb=# EXPLAIN (ANALYSE, VERBOSE, COSTS)
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;

Aggregate  (cost=18480.77..18480.78 rows=1 width=8) (actual time=18.672..18.674 rows=1 loops=1)
   Output: count(*)
   ->  Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts  (cost=0.42..18480.69 rows=33 width=0) (actual time=8.583..18.659 rows=4 loops=1)
         Output: aid, abalance
         Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
         Heap Fetches: 0
 Planning Time: 0.328 ms
 Execution Time: 18.732 ms
(8 rows)

Notes on the output above:

The important line is Index Only Scan using pgbench_accounts_aid_abalance_idx along with Index Cond: (pgbench_accounts.abalance = '-2187'::integer) — this demonstrates the planner chose an index scan that probes the multicolumn btree for entries where the second column matches the predicate.

Why this is not the v18 "skip-scan" feature in full:

The v18 skip-scan adds optimizer logic to efficiently iterate distinct values of leading columns and probe the remainder of the index; it's a targeted algorithmic addition. What we show above is the planner choosing an index scan over a multicolumn index and applying the Index Cond to the second column. That can be effective for many queries and data layouts, but it lacks the specialized skip-scan internals that v18 adds for certain other cases.

Production Tips

If you need queries on a non-leading column to be fast on older Postgres versions, create the right index and keep statistics current (ANALYZE). The planner may prefer an index scan over a seq scan when selectivity and costs align.
Consider partial or expression indexes when appropriate; they let you make an index that directly serves the filter you need.
When portability across versions is important, test on the earliest Postgres version you need to support; planner behavior can vary by release, statistics, and data distribution.

Conclusion

Postgres v18's documented skip-scan addition for B-tree indexes is a welcome and useful optimizer enhancement, specifically for low-cardinality leading columns. However, for high-cardinality leading columns (like our first example), the standard Full Index Scan remains optimal, and pre-v18 versions handle them just fine.

References

PostgreSQL 18 release notes (skip-scan, indexes): https://www.postgresql.org/docs/18/release-18.html
nbtree skip-scan optimization: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=92fe23d93
Further optimize nbtree search scan key comparisons: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8a510275d

3 Jul 2026

Why SeaweedFS?

The Architecture: Pooling the Fleet

Cluster Topology & Capacity

The Rack Isolation Strategy: Handling Multi-Disk Nodes

Replication Policies

Operational Lessons: Surviving a Silent Outage

The Boot-Time Race

Discovery & Triage

Hardening the Storage Layer

1. Hardcoding IPs in Services

2. Delaying Until the Network is Truly Online

3. Fail-Hard Mountpoint Permissions (The chmod 000 Trick)

4. Active Liveness Watchdog

5. Prometheus Disk Health Monitoring

The Economics: Cloud Costs vs. Commodity Hardware

Conclusion

23 Jun 2026

19 Jun 2026

What Index Bloat Is (and Isn't)

Why VACUUM Can't Merge Pages

The Existing Solutions and Their Limitations

What This Series Covers

15 Jun 2026

1. The Feature: What is remote_receive?

Implementation Details

2. Scenario 1: The SSD Baseline (Fast Primary, Fast Replica)

3. Scenario 2: The HDD Challenge (Asymmetric Hardware)

4. The Final Test: Eliminating the Noise

The Results

Conclusion

16 May 2026

Finding the Right Tool for the AI Job

14 May 2026

The Memory and Overflow Fixes

The Replication and Backup Vulnerabilities

The Cryptography and DoS Defenses

The Edge Cases: Catalogs and Contrib Modules

The Data Anomalies: Breaching the Trend Line

A Personal Hunch: Could AI Be Behind the Spike?

Final Thoughts

22 Apr 2026

The Hardware: The ThinkStation S30 Meets Blackwell

Overcoming Legacy Architecture Limits

Performance: 40-70 Tokens per Second

Power, Thermals, and Footprint

Looking Ahead

29 Mar 2026

The Three Clocks

Monthly AI Credits (The 1,000 Pool)

What Can You Buy With Credits?

Antigravity Weekly Baseline Quota (The IDE Clock)

The Overage Toggle

Daily Quotas (Chat, Music, and Deep Research)

Family Plans: What's Shared, What's Not

References

9 Mar 2026

Technical Features

The Genesis of Masala Script

Try It Out

4 Feb 2026

Benchmarks: v17 vs v18

Scenario A: Low Cardinality (The "Success" Case)

Scenario B: High Cardinality (The "Failure" Case)

How to Identify a Skip Scan

How to reproduce:

Steps to reproduce the example:

Step 1: Create and Initialize the Database

Step 2: Run a Quick Read-Write Test

Step 3: Connect and Run the Test Query

Production Tips

Conclusion

References

3. Fail-Hard Mountpoint Permissions (The `chmod 000` Trick)

1. The Feature: What is `remote_receive`?