HDD Data Recovery in a Proxmox Homelab

Context

I run a small Proxmox VE homelab for day-to-day experimentation — Linux and Windows VMs, LXC containers, networking tests, and storage experiments. It’s not a dedicated recovery setup, just a practical environment I actively use.

One of the attached HDDs (an older NTFS disk used for archival and creative assets) started behaving inconsistently — intermittent I/O errors, occasional mount failures, and increasing read latency.

At that point, the priority wasn’t fixing the disk, but extracting data safely without making things worse.

What follows is how the recovery was handled inside the existing Proxmox host, without additional hardware, while keeping the system stable.

Identifying the Failure Pattern

The disk was still detectable at the block level, but filesystem operations were unreliable.

Key Observations

dmesg showed repeated read errors and command failures
NTFS partitions were visible but unstable
SMART data was accessible, but error counts were non-trivial

SMART didn’t immediately flag the drive as failed, but the error metrics were enough to treat it as degrading rather than reliable long-term storage.
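The checks behind these observations can be sketched as two small filters. This is a rough sketch rather than the exact commands used: the function names are hypothetical, the error patterns are an assumed and non-exhaustive list, and /dev/sdX is a placeholder for the failing disk.

```shell
# Keep only kernel log lines that look like disk read or command failures.
# Reads log text on stdin; the pattern list is an assumption, not exhaustive.
disk_error_lines() {
    grep -iE 'i/o error|medium error|failed command|unrecovered read error' || true
}

# Print NAME=RAW_VALUE for three wear-related SMART attributes.
# Reads `smartctl -A` table output on stdin (RAW_VALUE is the 10th column).
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2 "=" $10 }'
}

# Usage (as root; /dev/sdX is a placeholder):
#   dmesg | disk_error_lines
#   smartctl -A /dev/sdX | smart_key_attrs
```

Non-zero raw values for reallocated or pending sectors, combined with fresh read errors in the kernel log, are a reasonable signal to treat the disk as degrading even while the overall SMART health check still passes.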

Goal:
Read as little as possible, write nothing, and move important data off quickly.

Read-Only First, Always

The most important decision was to never mount the disk in read-write mode.

Each NTFS partition was mounted explicitly as read-only:
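The exact mount command is not reproduced here, so the following is only a sketch of what a read-only NTFS mount can look like. It assumes the ntfs-3g driver is installed; the device name /dev/sdX1 and both function names are hypothetical, and the mount point is inferred from the rsync command later in this post.

```shell
# Hypothetical device and mount point; substitute your own.
DEV=/dev/sdX1
MNT=/mnt/hdd_creative

mount_readonly() {
    # Mark the block device read-only at the kernel level, then mount ro.
    blockdev --setro "$DEV"
    mkdir -p "$MNT"
    mount -t ntfs-3g -o ro "$DEV" "$MNT"
}

# Succeeds only if the given mount point appears with the "ro" option in
# mount-table text (the /proc/mounts format) supplied on stdin.
is_mounted_ro() {
    awk -v mp="$1" '$2 == mp { n = split($4, o, ","); for (i = 1; i <= n; i++) if (o[i] == "ro") found = 1 }
                    END { exit !found }'
}

# Usage (as root):
#   mount_readonly && is_mounted_ro "$MNT" < /proc/mounts && echo "read-only OK"
```

Setting the block device read-only first is an extra guard: even if a later mount forgets the ro option, the kernel refuses writes to the device.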

This avoided:

Journal replays
Metadata updates
Accidental filesystem writes

Some directories were readable, while others triggered immediate I/O errors — indicating partial sector degradation rather than total failure.

Controlled Data Extraction

Instead of attempting a full copy, recovery focused on known-good, high-value directories first.

# No trailing slash on the source, so "Creative Projects" itself is created under the destination
rsync -avh --progress "/mnt/hdd_creative/Creative Projects" /mnt/creative_data/

Benefits of this approach:

Continued past files that returned read errors instead of aborting the whole copy
Provided visibility into transfer progress and failures
Allowed safe interruption and resume, since a rerun skips files that were already copied

Anything that failed consistently was left untouched. Forcing reads on bad sectors only accelerates disk failure.
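The priority-first idea can be sketched as a small loop that copies directories in order and records which ones did not complete, without retrying them. The function name and the log path are hypothetical; only the rsync invocation mirrors the command above.

```shell
# Copy each source directory in priority order; keep going if one fails.
# Arguments: DEST LOGFILE SRC_DIR...
copy_by_priority() {
    dest=$1; log=$2; shift 2
    failed=0
    for src in "$@"; do
        # -a preserves metadata, so a rerun skips files already copied intact.
        if ! rsync -ah --log-file="$log" "$src" "$dest"/; then
            # A non-zero exit (for example 23, partial transfer) is recorded, not retried.
            echo "incomplete: $src" >> "$log"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Usage (paths are examples):
#   copy_by_priority /mnt/creative_data /root/recovery.log \
#       "/mnt/hdd_creative/Creative Projects"
```

Listing the most valuable directories first means that if the disk deteriorates partway through, the data that matters most is already off it.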

Storage Constraints and LVM Thin Provisioning

The Proxmox root filesystem was intentionally small, so dumping recovery data there wasn’t an option.

Solution: Allocate recovery storage using LVM thin provisioning

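# Create a 200G thin volume in the existing pool, format it, and mount it
lvcreate -V 200G -T vmstore/thinpool -n creative_data
mkfs.ext4 /dev/vmstore/creative_data
mkdir -p /mnt/creative_data
mount /dev/vmstore/creative_data /mnt/creative_data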

This provided:

Isolated storage for recovered data
No impact on Proxmox system stability
Flexibility to grow, shrink, or discard later

Recovered files were copied straight to this volume and validated from there.
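One caveat with thin provisioning is that the 200G is a virtual size: the volume only consumes pool space as data is written, and the pool itself can fill up. A quick usage check, sketched below, helps avoid that during a large copy. The function name and the 80 percent threshold are assumptions.

```shell
# Print thin pools/volumes whose data usage is at or above LIMIT percent.
# Reads `lvs --noheadings -o lv_name,data_percent` output on stdin.
# Returns non-zero if anything is at or over the limit.
thin_usage_over() {
    awk -v limit="$1" 'NF == 2 && $2 + 0 >= limit + 0 { print $1 " " $2 "%"; over = 1 }
                       END { exit over }'
}

# Usage (as root):
#   lvs --noheadings -o lv_name,data_percent vmstore | thin_usage_over 80
```

Volumes that are not thin report an empty data_percent column, so they are skipped by the field-count check.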

Verification Without Risky Remounts

Rather than repeatedly accessing the failing disk, validation was performed from the SSD-backed recovery volume.

The recovery directory was exposed via Samba and accessed from a Windows VM inside Proxmox.

This enabled:

GUI-based inspection
Media playback validation
Project file spot checks

All verification was done away from the failing disk — where it belongs.
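Alongside the manual checks, a checksum manifest of the recovered tree is a cheap addition, and it reads only from the recovery volume. This is an optional sketch and was not part of the original workflow; the function name and paths are hypothetical, and it assumes GNU coreutils and findutils as shipped with Proxmox.

```shell
# Write a SHA-256 manifest for every file under DIR to OUT.
# Keep OUT outside DIR so the manifest does not list itself.
write_manifest() {
    dir=$1; out=$2
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$out"
}

# Usage (paths are examples):
#   write_manifest /mnt/creative_data /root/creative_data.sha256
# Later, to confirm a backup copy matches:
#   cd /path/to/backup && sha256sum -c --quiet /root/creative_data.sha256
```

The manifest cannot prove a file was read correctly from the failing disk, but it does make any later corruption or incomplete backup of the recovered copy easy to detect.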

Disk Degradation Reality Check

Eventually, the HDD began returning I/O errors even on directory reads. That marked the stopping point.

Once critical data was recovered:

The disk was unmounted
No further read attempts were made
No filesystem repair tools were used

Important Note

Tools like ntfsfix and chkdsk are filesystem consistency tools, not recovery tools. They write to the volume, so on a physically failing disk they often make things worse.

Remote Access During Recovery

Twingate was already installed directly on the Proxmox host, which allowed:

Monitoring transfers remotely
Checking disk behavior without physical access
Avoiding unnecessary power cycles

Remote access didn’t change the recovery process, but it made the workflow safer and more practical.

What Was Intentionally Avoided

No sector-by-sector cloning (no spare target disk available)
No filesystem repair attempts
No stress testing after data was secured
No repeated mounts once errors escalated

Recovery is about restraint, not heroics.

Takeaways

A stock Proxmox host can handle file-level recovery with standard Linux tools
Read-only mounts significantly reduce risk
LVM thin provisioning is extremely useful for temporary storage
Verification should always happen away from the failing device
Knowing when to stop is as important as knowing what to do
