Moving Storage Around for Better Performance

Some people wind down after work with a good book. Others crack open a beer and watch TV. Me? I move virtual machines around for fun.

After a day dealing with SS7 upstream failures that no one at Verizon seems in a hurry to fix, troubleshooting entire cable modem nodes going dark, and spending an hour building a phone station in a DMS10 only to find out our provisioning software was mangling syntax on the backend, I need something that makes sense. Something that, when I fix it, stays fixed.

That brings me to tonight’s project—optimizing storage performance in my Proxmox environment.

For the last six months, I have been running CodexMCP, my fully automated ISP stack project, on a Proxmox-based infrastructure. Most of the virtual machines are running on a ZFS-backed storage pool using spinning disks. It works, but as the load has increased, the limits of spinning rust have become more noticeable. Disk I/O has started becoming the bottleneck, particularly for services like OpenSearch, MariaDB, and logging.

I do have an NVMe drive in this system, but up until now, it has really only been used as the boot drive and to store OS ISOs. The idea is simple—offload the most disk-intensive virtual machines onto the NVMe drive while keeping everything else on the existing ZFS storage. That should free up I/O bandwidth, reduce contention on the ZFS pool, and generally smooth things out.

The process itself is straightforward but has some quirks. Moving a running VM's disk from one storage pool to another is slow. Painfully slow. The VM keeps writing to the disk while Proxmox is copying it, so the copy has to keep catching up with fresh writes, and that kills performance. The solution? Just stop the VM, move it over, and start it back up. Nothing in this environment is mission-critical, so I have the luxury of downtime when I need it.
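For anyone who wants to script the stop, move, start routine, here is a rough sketch in Python. It only builds the `qm` commands and prints them unless told otherwise. The VM ID `101`, the disk name `scsi0`, and the storage name `nvme-store` are made-up placeholders, not values from my setup, and the helper names are just for illustration. I am assuming a recent Proxmox release where the subcommand is `qm disk move`; older releases spell it `qm move_disk`. I also use `qm shutdown` for a clean guest shutdown rather than a hard `qm stop`.

```python
import subprocess
from typing import List


def offline_move_commands(
    vmid: int, disk: str, target_storage: str, delete_source: bool = False
) -> List[List[str]]:
    """Build the qm commands for an offline disk move: shut down, move, start.

    All arguments are placeholders supplied by the caller; nothing here
    inspects the actual Proxmox configuration.
    """
    move = ["qm", "disk", "move", str(vmid), disk, target_storage]
    if delete_source:
        # Assumption: remove the old copy once the move succeeds.
        move += ["--delete", "1"]
    return [
        ["qm", "shutdown", str(vmid)],  # clean guest shutdown first
        move,                           # copy the disk while the VM is off
        ["qm", "start", str(vmid)],     # bring the VM back up
    ]


def run_plan(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the plan if any step fails, so the VM is
            # not started against a half-finished move.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    run_plan(offline_move_commands(101, "scsi0", "nvme-store"), dry_run=True)
```

Leaving `dry_run=True` is a good first pass: read the three printed commands, confirm the disk and storage names against `qm config` for that VM, and only then flip it to `False` on the Proxmox host.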

My primary workstation, the one where I write code, is part of this system, so that requires a bit more caution. But that’s why I have it rsyncing out to Linode every minute. If something goes sideways, I am not losing work.

Once everything is moved over, I expect some immediate improvements. OpenSearch queries should be snappier, database transactions will speed up, and log ingestion will put less strain on the system overall. It’s a simple change, but one that makes a meaningful impact—something I wish I could say for half the nonsense I deal with during the day.

This is what keeps me sane. Moving things around, optimizing, making systems work better. It is a reminder that, at least in my own infrastructure, I can fix problems properly. I am not waiting on Verizon. I am not dealing with legacy provisioning software that throws syntax errors for no reason. I am just solving problems the way they should be solved.

That is a good way to end the day.