All articles in the "Proxmox" series
- Commissioning Ex-DC Storage
- My ZFS Strategy Ground Work
Proxmox VE + ZFS = Practical Storage Strategy
My Proxmox VE server is more than just a hypervisor. It is also my NAS, backup target, container platform, and general-purpose infrastructure server. I cover the NAS side of this design in another article using a Samba container: <<insert article link here>>
At the centre of this design is OpenZFS.
ZFS is often described as a filesystem, but that description undersells what it actually is. ZFS combines:
- a filesystem
- a logical volume manager
- software RAID
- snapshotting
- data integrity verification
- replication
- caching layers
…into a single integrated software-defined storage platform.
This article is not intended to be a deep ZFS primer. The internet already has excellent documentation, benchmarks, arguments, and religious wars surrounding ZFS. Instead, this article documents the storage design strategy behind my Proxmox VE homelab, how I balance performance, reliability, and capacity, and why I made the design decisions that I did.
Start With the Physical Layer
Before designing pools, datasets, or caching strategies, I start with the hardware itself.
These are the block devices currently installed in my server (the SERIAL and ID-LINK columns are trimmed from the output shown here):
lsblk -d -o name,model,size,serial,id-link
NAME     MODEL            SIZE
sda      INTEL            745.2G
sdb      INTEL            745.2G
sdc      INTEL            745.2G
sdd      INTEL            745.2G
sde      SAMSUNG          894.3G
sdf      SAMSUNG          894.3G
sdg      INTEL            745.2G
sdh      INTEL            745.2G
sdi      SAMSUNG          894.3G
sdj      SAMSUNG          894.3G
sdk      HUS726040ALS210  3.6T
sdl      HUS726040ALS210  3.6T
sdm      HUS726040ALS210  3.6T
sdn      HUS724040ALS641  3.6T
sdo      HUS726040ALS214  3.6T
sdp      HUS726040ALS214  3.6T
sdq      HUS726040ALS214  3.6T
sdr      HUS726040ALS214  3.6T
nvme1n1  Optane           260.8G
nvme0n1  BIWIN            476.9G
This gives me:
- 8 × 4 TB SAS HGST HDDs
- 6 × Intel enterprise SATA SSDs (two of which form the boot mirror)
- 4 × Samsung SATA SSDs
- 1 × Intel Optane NVMe device
- 1 × consumer NVMe device
Apart from the NVMe devices, these are all ex-datacenter drives. I have another article covering how I validate and monitor used enterprise drives before trusting them with production data: <<insert article link here>>
Why I Rejected Hardware RAID
One of the first architectural decisions I made was to avoid traditional hardware RAID controllers entirely.
Historically, hardware RAID solved several important problems:
- redundancy
- caching
- drive abstraction
- recovery workflows
Modern ZFS already handles these functions internally and expects direct access to physical disks to maintain data integrity.
Introducing a hardware RAID layer between ZFS and the disks can:
- obscure SMART telemetry
- interfere with error handling
- complicate recovery
- undermine end-to-end checksumming and self-healing
- create controller dependency
In short, ZFS wants to manage the disks itself.
For this build, simplicity and recoverability matter more to me than maintaining legacy RAID workflows.
Persistent Device Naming Matters
Do not build ZFS pools using raw Linux device names like /dev/sda.
Linux device names can change:
- after reboots
- after HBA changes
- after moving disks between ports
- after motherboard replacement
This can make troubleshooting significantly harder and, in worst-case scenarios, complicate pool recovery.
Instead, use persistent device identifiers:
/dev/disk/by-id/
/dev/disk/by-path/
I prefer using WWN-based identifiers from /dev/disk/by-id/.
Example:
/dev/disk/by-id/wwn-0x5000cca2697d55f4
This has several advantages:
- identifiers remain stable across reboots
- disks remain identifiable after migration
- failed drives are easier to physically locate
- serial numbers map cleanly to ZFS fault reports
It also fits nicely with how ZFS reports degraded devices, because zpool status lists each disk under the identifier the pool was built with.
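To see which kernel name each persistent identifier currently points at, something like the following can help. It is a minimal sketch: the helper name `list_wwn_links` is my own invention, and it assumes the usual udev-managed symlinks under /dev/disk/by-id/.

```shell
#!/usr/bin/env bash
# list_wwn_links is a hypothetical helper, not part of ZFS or Proxmox VE.
# It prints "<wwn link name> -> <resolved device>" for every whole-disk
# WWN symlink in a directory (default: /dev/disk/by-id).
list_wwn_links() {
    local dir="${1:-/dev/disk/by-id}"
    local link
    for link in "$dir"/wwn-*; do
        # Skip an unexpanded glob (no matches) and anything that is not a symlink.
        [ -L "$link" ] || continue
        # Skip partition links such as wwn-...-part1.
        case "$link" in *-part[0-9]*) continue ;; esac
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_wwn_links
```

Running this before and after moving a disk to a different port shows the kernel name on the right changing while the WWN on the left stays the same, which is the reason to build pools from the left-hand column.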
Boot Pool Layout
Proxmox VE 8.x has mature and stable support for booting directly from mirrored ZFS pools.
My boot pool (rpool) consists of two mirrored Intel enterprise SSDs configured during installation:
zpool status
  pool: rpool
 state: ONLINE
config:
        NAME                                                  STATE
        rpool                                                 ONLINE
          mirror-0                                            ONLINE
            ata-INTEL_SSDSC2BB800G4_BTWL505202K2800RGN-part3  ONLINE
            ata-INTEL_SSDSC2BB800G4_BTWL411104HV800RGN-part3  ONLINE
These drives are dedicated to the operating system and Proxmox VE itself, so from this point onward I largely treat them as fixed infrastructure components.
I will cover:
- mirrored boot pools
- EFI partition replication
- bootloader recovery
- rpool dataset structure
…in a separate article.
A Minimal ZFS Primer
This article focuses on architecture rather than teaching ZFS fundamentals, but there are a few concepts worth defining before moving further.
Core Concepts
- VDEV — A group of disks forming a redundancy layout such as a mirror or RAIDZ group.
- Pool (zpool) — One or more VDEVs aggregated into a single storage layer.
- Dataset — A lightweight filesystem with independent properties like compression, quotas, and snapshots.
- ZVOL — A block device created inside ZFS, commonly used for VM disks.
- Special VDEV — A high-speed metadata device used to accelerate metadata and optionally small-file workloads.
Redundancy lives inside each VDEV, not across them: if any single VDEV is lost, the entire pool is lost, even if every other VDEV remains healthy.
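The relationship between these objects is easier to see as commands. The sketch below is illustrative only: the pool name `tank`, the dataset names, and the two disk paths are placeholders rather than my real layout, and the `run` wrapper is a hypothetical helper that prints each command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Illustrative only: "tank", the dataset names, and the DISK_* paths are
# placeholders. Commands are printed, not executed, unless DRY_RUN=0 is set.
DISK_A="/dev/disk/by-id/wwn-EXAMPLE-A"
DISK_B="/dev/disk/by-id/wwn-EXAMPLE-B"

# Hypothetical dry-run wrapper: echo the command by default.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

demo_layout() {
    # Pool "tank" built from a single mirror VDEV of two disks.
    run zpool create tank mirror "$DISK_A" "$DISK_B"
    # Dataset: a filesystem with its own properties (here, compression).
    run zfs create -o compression=lz4 tank/media
    # ZVOL: a 32G block device, the kind of object used for a VM disk.
    run zfs create -V 32G tank/vm-100-disk-0
}

demo_layout
```

In practice Proxmox VE creates the ZVOLs for VM disks itself, so the last command is there to show what the object is rather than something I would normally type by hand.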
Performance Layers
ZFS also includes several caching and logging layers:
- ARC — Primary RAM-based read cache
- L2ARC — Secondary cache on fast storage devices
- ZIL — Transaction log protecting synchronous writes
- SLOG — Dedicated low-latency device for accelerating synchronous writes
These layers are extremely powerful, but they also introduce complexity and tuning considerations.
That flexibility is one of the reasons ZFS is so capable in both homelab and enterprise environments.
RAM Matters More Than Many People Expect
ZFS has a reputation for consuming enormous amounts of memory. That reputation is both true and exaggerated.
ZFS aggressively uses available RAM for ARC caching, which improves performance significantly. However, modern OpenZFS is far more memory-efficient than older recommendations suggest.
My approach is simple:
- give ZFS enough RAM to cache effectively
- avoid starving virtual machines
- monitor ARC pressure over time
- tune only when necessary
In a mixed hypervisor and NAS environment like this one, memory balance matters just as much as storage design.
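The monitoring step can be sketched as follows. On Linux, OpenZFS exposes its ARC counters in /proc/spl/kstat/zfs/arcstats, where each data line has the form `name type value`; `size` is the current ARC size in bytes and `c_max` is its ceiling. The helper names below are my own, and the sketch assumes that file layout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for watching ARC usage; the names are my own.

# Print the value of one arcstats counter (for example "size" or "c_max").
# Each data line in the kstat file has the form: <name> <type> <value>.
arc_stat() {
    local name="$1"
    local file="${2:-/proc/spl/kstat/zfs/arcstats}"
    awk -v n="$name" '$1 == n { print $3 }' "$file"
}

# Convert a size in GiB to bytes, the unit the zfs_arc_max parameter uses.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Report the current ARC size as a whole-number percentage of its ceiling.
arc_fill_percent() {
    local file="${1:-/proc/spl/kstat/zfs/arcstats}"
    local size c_max
    size="$(arc_stat size "$file")"
    c_max="$(arc_stat c_max "$file")"
    echo $(( size * 100 / c_max ))
}
```

On the host, `arc_fill_percent` gives a quick reading to track over time. If a cap does turn out to be necessary, `gib_to_bytes 16` produces the byte value for an `options zfs zfs_arc_max=...` line in /etc/modprobe.d/zfs.conf; the 16 GiB figure is purely illustrative and not a recommendation.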
Architecting the Storage
Designing storage is always a balancing act between three competing priorities:
graph TD
A[Capacity]
B[Performance]
C[Reliability]
A <--> B
B <--> C
C <--> A
A gain in one area usually comes at the expense of another.
For example:
- more parity improves resilience but reduces usable capacity
- high-performance mirror layouts consume more disks
- aggressive caching improves speed but increases complexity
The correct answer depends entirely on the workload.
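To make the first two trade-offs concrete, here is a rough calculation for eight 4 TB disks, matching the HDDs listed earlier. It is back-of-envelope arithmetic only: it ignores ZFS metadata, padding, and reserved space, the function names are hypothetical, and the two layouts are there for comparison rather than as a statement of my final design.

```shell
#!/usr/bin/env bash
# Back-of-envelope usable capacity, ignoring ZFS overhead (metadata,
# padding, reserved space). Function names are hypothetical.

# One RAIDZ VDEV: (disks - parity) * disk size.
raidz_usable_tb() {
    local disks="$1" size_tb="$2" parity="$3"
    echo $(( (disks - parity) * size_tb ))
}

# Striped two-way mirrors: half the disks hold redundant copies.
mirror_usable_tb() {
    local disks="$1" size_tb="$2"
    echo $(( disks / 2 * size_tb ))
}

echo "RAIDZ2, 8 x 4 TB:           $(raidz_usable_tb 8 4 2) TB usable, survives any 2 disk failures"
echo "4 x 2-way mirror, 8 x 4 TB: $(mirror_usable_tb 8 4) TB usable, survives 1 failure per mirror"
```

RAIDZ2 comes out at 24 TB against 16 TB for the mirrors, but the mirror layout spreads I/O across four VDEVs instead of one, which generally makes it the faster option for random I/O. That is the capacity versus performance tension in miniature.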
Defining My Workloads
My Proxmox VE server runs a mixture of:
- virtual machines
- LXC containers
- network storage
- backup repositories
- media services
- databases
- container infrastructure
These workloads have very different storage requirements. Some care primarily about:
- latency
- IOPS
- synchronous write performance
Others care more about:
- raw capacity
- sequential throughput
- resilience
- long-term retention
Understanding those workload characteristics is what drives the storage design.
Strategy Direction
At a high level, my design strategy separates:
- bulk capacity storage
- latency-sensitive VM workloads
- metadata-heavy operations
- backup and archival data
Different storage tiers will serve different purposes.
The general direction looks like this:
flowchart TB
subgraph HOST["Proxmox VE Host"]
ARC["System RAM ARC / ZIL"]
end
subgraph BOOT["Boot Storage"]
RPOOL["SSD Mirrored rpool"]
end
subgraph FAST["Performance Tier"]
SSD["SSD Pool"]
end
subgraph CAPACITY["Capacity Tier"]
HDD["RAIDZ2 HDD Pool"]
end
subgraph CACHE["Acceleration Layer"]
OPTANE["Optane L2ARC / SLOG"]
SPECIAL["SSD Special VDEV"]
end
ARC <--> OPTANE
OPTANE <--> SSD
OPTANE <--> HDD
SPECIAL <--> HDD
SPECIAL <--> SSD
The next article will cover how I implement this architecture using:
- RAIDZ2 HDD layouts
- mirrored SSD VDEVs
- special metadata devices
- dataset tuning
- ZVOL configuration
- VM storage placement
- container storage strategies
- caching considerations
- snapshot and replication planning
ZFS in Proxmox VE Already Works Extremely Well
One of the strengths of ZFS in Proxmox VE is that even a basic deployment works remarkably well out of the box.
Even without extensive tuning:
- VM provisioning works cleanly
- snapshots are effortless
- replication is reliable
- compression is transparent
- zvol management is integrated directly into the Proxmox VE GUI
You can absolutely create a pool, deploy virtual machines, and stop there.
My goal is not to make ZFS work. It already does.
My goal is to understand the workload deeply enough to make the storage layout intentional rather than accidental. Many of the commonly cited disadvantages of ZFS come from running every workload on the defaults, and from the complexity of tuning the storage layout to suit each workload.
In future articles I will pull on the many levers that ZFS offers for tuning, and that is where ZFS becomes genuinely interesting.