Recovering Data from a Failing HDD Inside a Live Proxmox Homelab
Recovered critical data from a failing NTFS HDD inside a live Proxmox homelab without additional hardware. Focused on read-only access, controlled extraction, and minimal risk to avoid further disk damage.
HDD Data Recovery in a Proxmox Homelab
Context
I run a small Proxmox VE homelab for day-to-day experimentation — Linux and Windows VMs, LXC containers, networking tests, and storage experiments. It’s not a dedicated recovery setup, just a practical environment I actively use.
One of the attached HDDs (an older NTFS disk used for archival and creative assets) started behaving inconsistently — intermittent I/O errors, occasional mount failures, and increasing read latency.
At that point, the priority wasn’t fixing the disk, but extracting data safely without making things worse.
What follows is how the recovery was handled inside the existing Proxmox host, without additional hardware, while keeping the system stable.
Identifying the Failure Pattern
The disk was still detectable at the block level, but filesystem operations were unreliable.
Key Observations
- dmesg showed repeated read errors and command failures
- NTFS partitions were visible but unstable
- SMART data was accessible, but error counts were non-trivial
SMART didn’t immediately flag the drive as failed, but the error metrics were enough to treat it as degrading rather than reliable long-term storage.
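The checks behind these observations can be sketched roughly as follows. This is a minimal sketch, not the exact commands used: the function names are hypothetical, the grep patterns are only a guess at common kernel error phrases, and the device path passed in (for example /dev/sdX) is a placeholder.

```shell
kernel_disk_errors() {
    # Filter the kernel ring buffer for common disk error phrases.
    # "|| true" keeps the function from failing when nothing matches.
    dmesg | grep -iE 'i/o error|medium error|failed command' || true
}

smart_summary() {
    # $1 = whole-disk device (placeholder, e.g. /dev/sdX)
    # -H prints the overall health verdict, -A the vendor attribute table
    smartctl -H -A "$1"
}
```

Calling `kernel_disk_errors` and then `smart_summary /dev/sdX` gives a quick picture without touching the filesystem. Both only read kernel and drive state, so they add no filesystem I/O on the failing disk.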
Goal:
Read as little as possible, write nothing, and move important data off quickly.
Read-Only First, Always
The most important decision was to never mount the disk in read-write mode.
Each NTFS partition was mounted explicitly as read-only.
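A minimal sketch of such a mount, assuming the ntfs-3g driver is installed on the host. The helper name is hypothetical and the partition path (for example /dev/sdX1) is a placeholder. The mount point /mnt/hdd_creative matches the path used later in this post.

```shell
mount_ntfs_ro() {
    # $1 = NTFS partition (placeholder, e.g. /dev/sdX1)
    # $2 = mount point (e.g. /mnt/hdd_creative)
    mkdir -p "$2" &&
    # "ro" asks the driver for a read-only mount
    mount -t ntfs-3g -o ro "$1" "$2"
}
```

Usage would look like `mount_ntfs_ro /dev/sdX1 /mnt/hdd_creative`. Running `findmnt /mnt/hdd_creative` afterwards should list `ro` among the mount options.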
This avoided:
- Journal replays
- Metadata updates
- Accidental filesystem writes
Some directories were readable, while others triggered immediate I/O errors — indicating partial sector degradation rather than total failure.
Controlled Data Extraction
Instead of attempting a full copy, recovery focused on known-good, high-value directories first.
rsync -avh --progress "/mnt/hdd_creative/Creative Projects" /mnt/creative_data/
Benefits of this approach:
- Skipped unreadable files instead of hanging
- Provided visibility into transfer progress and failures
- Allowed safe interruption and resume
Anything that failed consistently was left untouched. Forcing reads on bad sectors only accelerates disk failure.
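The priority-first idea can be sketched as a small loop around the rsync command shown above. This is an illustration under assumptions, not the exact procedure used: the function name is hypothetical, and the `--partial` and `--timeout=60` flags are additions that were not part of the original command.

```shell
# Source and destination match the rsync command above.
SRC_ROOT="/mnt/hdd_creative"
DEST_ROOT="/mnt/creative_data"

copy_in_priority_order() {
    # Each argument is a directory name under SRC_ROOT, most valuable first.
    for dir in "$@"; do
        # --partial keeps partly copied files so a rerun can resume them;
        # --timeout aborts if no data moves for 60 seconds (assumed value).
        if rsync -avh --progress --partial --timeout=60 \
                "$SRC_ROOT/$dir" "$DEST_ROOT/"; then
            echo "OK: $dir"
        else
            # Non-zero exit (23 means a partial transfer due to errors):
            # record it and move on instead of retrying.
            echo "FAILED: $dir (rsync exit $?)"
        fi
    done
}
```

A call such as `copy_in_priority_order "Creative Projects"` copies one directory at a time and reports its status. Directories reported as FAILED are candidates to leave alone rather than retry.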
Storage Constraints and LVM Thin Provisioning
The Proxmox root filesystem was intentionally small, so dumping recovery data there wasn’t an option.
Solution: Allocate recovery storage using LVM thin provisioning
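# Create a 200G thin volume in the existing pool
lvcreate -V 200G -T vmstore/thinpool -n creative_data
mkfs.ext4 /dev/vmstore/creative_data
# Make sure the mount point exists before mounting
mkdir -p /mnt/creative_data
mount /dev/vmstore/creative_data /mnt/creative_data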
This provided:
- Isolated storage for recovered data
- No impact on Proxmox system stability
- Flexibility to grow, shrink, or discard later
Recovered files landed here directly (this volume is the rsync destination shown earlier), and all later validation ran against this copy.
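The 200G of a thin volume is a virtual size, so the pool can fill up before the filesystem does. A small check like the one below helps during a large copy. It is a sketch with hypothetical function names, and the threshold passed in is an arbitrary example value.

```shell
pool_data_percent() {
    # $1 = thin pool as VG/LV (e.g. vmstore/thinpool)
    # Prints how full the pool's data area is, as a bare number.
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}

pool_has_headroom() {
    # $1 = thin pool, $2 = maximum acceptable usage in percent
    # Succeeds only while usage is below the threshold.
    used=$(pool_data_percent "$1")
    awk -v u="$used" -v t="$2" 'BEGIN { exit !(u + 0 < t + 0) }'
}
```

For example, `pool_has_headroom vmstore/thinpool 90 || echo "thin pool nearly full"` could be run between rsync passes.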
Verification Without Risky Remounts
Rather than repeatedly accessing the failing disk, validation was performed from the SSD-backed recovery volume.
The recovery directory was exposed via Samba and accessed from a Windows VM inside Proxmox.
This enabled:
- GUI-based inspection
- Media playback validation
- Project file spot checks
All verification was done away from the failing disk — where it belongs.
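For reference, a read-only Samba share for the recovery volume might look like the stanza printed by this helper. The helper, the share name, and the user name are all hypothetical, and the actual share configuration used is not shown in this post.

```shell
print_recovery_share() {
    # $1 = share name, $2 = path to export, $3 = Samba user allowed to connect
    cat <<EOF
[$1]
    path = $2
    read only = yes
    browseable = yes
    valid users = $3
EOF
}
```

Something like `print_recovery_share recovered /mnt/creative_data someuser >> /etc/samba/smb.conf`, followed by `testparm -s` and a restart of the Samba service, would expose the volume. Keeping the share read-only means the Windows VM cannot modify the recovered copy during inspection.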
Disk Degradation Reality Check
Eventually, the HDD began returning I/O errors even on directory reads. That marked the stopping point.
Once critical data was recovered:
- The disk was unmounted
- No further read attempts were made
- No filesystem repair tools were used
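The unmount step can be sketched as below, with a check that nothing is still mounted afterwards. The function name is hypothetical. The mount point would be the one used for the failing disk, such as /mnt/hdd_creative.

```shell
retire_mount() {
    # $1 = mount point of the failing disk
    umount "$1" || return 1
    # mountpoint -q exits 0 while something is still mounted there
    if mountpoint -q "$1"; then
        echo "still mounted: $1"
        return 1
    fi
    echo "unmounted: $1"
}
```

Running `retire_mount /mnt/hdd_creative` once per partition confirms the disk is no longer in use before it is left alone.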
Important Note
Tools like ntfsfix and chkdsk are filesystem consistency tools, not recovery tools. On physically failing disks, they often make things worse.
Remote Access During Recovery
Twingate was already installed directly on the Proxmox host, which allowed:
- Monitoring transfers remotely
- Checking disk behavior without physical access
- Avoiding unnecessary power cycles
Remote access didn’t change the recovery process, but it made the workflow safer and more practical.
What Was Intentionally Avoided
- No sector-by-sector cloning (no spare target disk available)
- No filesystem repair attempts
- No stress testing after data was secured
- No repeated mounts once errors escalated
Recovery is about restraint, not heroics.
Takeaways
- Proxmox is fully capable of handling recovery tasks
- Read-only mounts significantly reduce risk
- LVM thin provisioning is extremely useful for temporary storage
- Verification should always happen away from the failing device
- Knowing when to stop is as important as knowing what to do