How to Build a Backup & Recovery Strategy

Everyone has backups. Almost nobody tests restores. And the difference between those two things is the difference between recovering from a ransomware attack in hours and paying a seven-figure ransom because your “backups” were encrypted right alongside everything else. Backups are not a checkbox. They’re your last line of defense when everything else has failed — when the attacker has encrypted your servers, when the datacenter floods, when somebody drops the production database at 2 AM on a Friday. If your backups don’t work, nothing else matters.

The question isn’t whether you have backups. The question is: can you restore your critical systems to a known-good state, within a defined timeframe, right now? If you can’t answer that with certainty, keep reading.

DO / DON’T

DO:

Follow the 3-2-1 rule — Three copies of your data, on two different media types, with one copy offsite. This is the minimum.
Test restores regularly — Monthly for critical systems, quarterly for everything else. A backup you haven’t tested is an assumption, not a plan.
Keep at least one backup offline or immutable — If ransomware can reach your backups, your backups are just more data to encrypt.
Define RPO and RTO for every critical system — How much data can you afford to lose? How long can the system be down? These numbers drive everything else.
Encrypt your backups — Backup media gets lost, stolen, or improperly disposed of. Encryption protects the data regardless.
Document the restore process — Step-by-step, so someone other than the person who set it up can execute it under pressure.

DON’T:

Don’t store all backups on the same network as production — Ransomware operators specifically target backup infrastructure. If your backups are on the same domain, they’re on the same target list.
Don’t assume cloud services back up your data — SaaS providers protect their infrastructure, not your data. Read your provider’s shared responsibility model.
Don’t keep only one backup copy — One copy is one failure away from zero copies. Hardware fails, media degrades, tapes get lost.
Don’t skip testing — “We’ve never had a problem” is what everyone says right before the first problem.
Don’t forget about configuration and infrastructure — Data backups without system configuration backups mean rebuilding everything from scratch before you can restore.
Don’t ignore backup monitoring — A backup job that’s been silently failing for three months means three months of data loss you don’t know about yet.

Define RPO and RTO

Before you design anything, answer two questions for each critical system:

Recovery Point Objective (RPO)

How much data can you afford to lose?

RPO defines the maximum acceptable age of the data you restore, measured back from the moment of failure. If your RPO is 4 hours, you need backups at least every 4 hours. If your RPO is zero, you need synchronous, real-time replication.

RPO	Meaning	Backup Method
Near zero	No data loss acceptable	Synchronous replication, continuous data protection (CDP)
1-4 hours	Minimal data loss acceptable	Frequent snapshots, transaction log backups
24 hours	One business day of data loss acceptable	Daily backups
48-72 hours	Multiple days acceptable	Backups at least every 2-3 days (daily is common)
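To check whether a schedule actually meets an RPO, look at the worst case rather than the average. Data written just after one backup point stays unprotected until the next backup completes. The sketch below assumes that each backup captures data as of its start time and that the schedule repeats weekly. The function names are illustrative.

```python
WEEK_HOURS = 168  # assumption: the schedule repeats every week


def worst_case_loss_hours(start_hours: list[float], job_duration_hours: float) -> float:
    """Worst-case data loss, in hours, for a weekly backup schedule.

    start_hours: backup start times as hours since Monday 00:00 (0 <= h < 168).
    Assumes each backup captures data as of its start time and only becomes
    restorable when the job finishes, so a failure just before the next job
    completes loses the longest gap plus the job duration.
    """
    if not start_hours:
        raise ValueError("schedule has no backups")
    starts = sorted(start_hours)
    # Gaps between consecutive starts, plus the wrap-around into the next week.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    gaps.append(starts[0] + WEEK_HOURS - starts[-1])
    return max(gaps) + job_duration_hours


def meets_rpo(start_hours: list[float], job_duration_hours: float, rpo_hours: float) -> bool:
    return worst_case_loss_hours(start_hours, job_duration_hours) <= rpo_hours


# Daily at 01:00 with a one-hour job: worst case is 24 + 1 = 25 hours.
daily = [1 + 24 * day for day in range(7)]
# Twice weekly, Monday and Thursday at 01:00: the Thursday-to-Monday gap is 96 hours.
twice_weekly = [1, 73]
print(worst_case_loss_hours(daily, 1.0))  # 25.0
print(meets_rpo(twice_weekly, 1.0, 72))   # False
```

The daily case is worth noticing. A one-hour nightly job already exceeds a strict 24-hour RPO in the worst case, so the backup interval you schedule should sit comfortably inside the RPO rather than equal it.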

Recovery Time Objective (RTO)

How long can the system be down?

RTO defines the maximum acceptable downtime. If your RTO is measured in minutes, you need hot standby infrastructure with automatic failover. An RTO of a few hours calls for at least a warm standby. If your RTO is 48 hours, you can restore from cold backups.

RTO	Meaning	Recovery Method
Minutes	Near-zero downtime	Hot standby, automatic failover, load-balanced redundancy
1-4 hours	Minimal downtime	Warm standby, pre-staged recovery environment, VM snapshots
24 hours	One business day	Restore from recent backups to existing or cloud infrastructure
48-72 hours	Extended downtime acceptable	Restore from offsite/offline backups, hardware procurement
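A quick calculation shows why RTO drives the recovery method: restore time is bounded by how fast data can be moved back. The sketch assumes a single sustained throughput figure plus a fixed overhead for rebuild and validation. You would measure both in your own restore tests. The numbers in the example are hypothetical.

```python
def estimated_restore_hours(data_tb: float, throughput_mb_s: float,
                            fixed_overhead_hours: float = 0.0) -> float:
    """Rough restore-time estimate: bulk copy time plus fixed overhead.

    data_tb: data to restore, in decimal terabytes (1 TB = 1,000,000 MB).
    throughput_mb_s: sustained end-to-end restore throughput in MB/s.
    fixed_overhead_hours: rebuild, validation, cutover, and similar work.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    copy_seconds = data_tb * 1_000_000 / throughput_mb_s
    return copy_seconds / 3600 + fixed_overhead_hours


# Hypothetical numbers: 10 TB at 500 MB/s is 20,000 seconds (about 5.6 hours)
# of copying alone; two hours of rebuild and validation brings it to about 7.6.
print(round(estimated_restore_hours(10, 500, fixed_overhead_hours=2), 1))  # 7.6
```

With those figures, no restore from backup can meet a 4-hour RTO, however good the backups are. That target needs the warm or hot standby options in the table.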

RPO and RTO drive your backup architecture, your infrastructure investment, and your testing requirements. Define them with business stakeholders, not IT alone. The CFO and the operations lead need to agree on what “acceptable” means because they’ll live with the consequences. NIST SP 800-34 Rev. 1 provides a framework for contingency planning, including RPO and RTO definitions.

The 3-2-1 Rule (and Beyond)

3-2-1

The classic rule:

3 copies of your data (production + 2 backups)
2 different media types (disk + tape, disk + cloud, NAS + object storage)
1 copy offsite (different physical location)

This protects against hardware failure (multiple copies), media-specific failures (different types), and site-level disasters (offsite copy).

3-2-1-1-0

The modern extension for the ransomware era:

3 copies of your data
2 different media types
1 copy offsite
1 copy offline or immutable (air-gapped or write-once storage)
0 errors — verified restores with zero failures

The extra “1” is the ransomware answer. An offline copy that’s physically disconnected from the network, or an immutable copy on storage that prevents modification or deletion even by administrators, is the only backup that ransomware can’t reach.
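The 3-2-1-1-0 rule is mechanical enough to check against a backup inventory. The following minimal sketch makes two assumptions. First, you keep one simple record per copy; the DataCopy fields are hypothetical. Second, the production copy counts toward the copy and media totals, as the “production + 2 backups” wording above implies.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    name: str
    media: str                  # e.g. "disk", "tape", "object-storage"
    offsite: bool
    offline_or_immutable: bool
    is_production: bool = False


def check_3_2_1_1_0(copies: list[DataCopy], restore_errors: int) -> list[str]:
    """Return rule violations; an empty list means the inventory passes.

    Counts the production copy toward the copy and media totals.
    """
    problems = []
    if len(copies) < 3:
        problems.append(f"3: only {len(copies)} copies")
    if len({c.media for c in copies}) < 2:
        problems.append("2: fewer than two media types")
    if not any(c.offsite for c in copies):
        problems.append("1: no offsite copy")
    # Production itself never satisfies the offline/immutable requirement.
    if not any(c.offline_or_immutable and not c.is_production for c in copies):
        problems.append("1: no offline or immutable backup copy")
    if restore_errors != 0:
        problems.append(f"0: {restore_errors} errors in restore verification")
    return problems


inventory = [
    DataCopy("prod-db", "disk", offsite=False, offline_or_immutable=False, is_production=True),
    DataCopy("nas-nightly", "disk", offsite=False, offline_or_immutable=False),
    DataCopy("s3-object-lock", "object-storage", offsite=True, offline_or_immutable=True),
]
print(check_3_2_1_1_0(inventory, restore_errors=0))  # []
```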

Immutable Backups

Why Immutability Matters

Modern ransomware operators don’t just encrypt production data. They target backup infrastructure specifically. Groups like BlackCat, LockBit, and Cl0p actively search for and destroy backups before deploying encryption. If your backup server is domain-joined and your backup storage is accessible via the same credentials, the attacker can encrypt or delete your backups too.

How to Achieve Immutability

Object lock / WORM storage:

AWS S3 Object Lock — Governance mode (users granted the s3:BypassGovernanceRetention permission can override it) or Compliance mode (nobody can delete or overwrite a locked object version, not even the root account, until the retention period expires). Use Compliance mode for ransomware protection.
Azure Immutable Blob Storage — Time-based retention policies or legal hold. Prevents modification and deletion.
Backblaze B2 — Object Lock support with S3-compatible API.
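On-premises — Purpose-built backup appliances with immutable storage (e.g., ExaGrid with Retention Time-Lock), or hardened Linux-based storage using the immutable file attribute (chattr +i). Root can clear that attribute, so administrative access to the storage host must be tightly restricted and kept out of the domain.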
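On AWS, you can set Compliance mode as a bucket’s default retention, so every backup object written to the bucket is locked automatically. The sketch below uses boto3’s put_object_lock_configuration call and assumes two things: the bucket was created with Object Lock enabled (Object Lock requires versioning), and the caller passes in the client, so the configuration logic can run without AWS credentials. The retention period and bucket name are placeholders, not recommendations.

```python
def compliance_retention_config(days: int) -> dict:
    """Build an S3 Object Lock configuration with a default COMPLIANCE-mode retention.

    In COMPLIANCE mode no user, including the root account, can delete or
    overwrite a locked object version until its retention period expires.
    """
    if days <= 0:
        raise ValueError("retention must be at least one day")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }


def apply_default_retention(s3_client, bucket: str, days: int) -> None:
    """Apply the default retention to a bucket that already has Object Lock enabled.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=compliance_retention_config(days),
    )
```

In practice you would call it as apply_default_retention(boto3.client("s3"), "example-backup-bucket", 30). No one can shorten retention on objects already locked in Compliance mode, so try the configuration on a test bucket with a short period first.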

Air-gapped backups:

Tape backups stored offsite (physically disconnected from any network)
Removable disk arrays that are connected only during backup windows and physically disconnected afterward
Dedicated backup networks with no route to production — not just VLANs, but physically separate infrastructure

Backup account isolation:

Backup servers should not be domain-joined, and backup service accounts should not be Active Directory accounts
Backup administrative access should require separate credentials not stored in Active Directory
Apply the principle of least privilege — backup accounts need write access to backup storage and read access to production data, nothing more
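On AWS, you can express the write side of that least-privilege rule as an IAM policy. This sketch assumes S3 is the backup target; the bucket name is a placeholder. The job may add objects, and destructive actions are explicitly denied, so a stolen backup credential cannot delete existing backups or unlock them. Read access to production systems is outside this sketch, and real backup tools often also need list and read permissions on the bucket for verification.

```python
import json


def backup_writer_policy(bucket: str) -> dict:
    """IAM policy document for a backup job that may add objects but not destroy them."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Write new backup objects, nothing else.
                "Sid": "WriteBackupObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [objects_arn],
            },
            {
                # An explicit Deny overrides any Allow granted by another policy.
                "Sid": "DenyDestructiveActions",
                "Effect": "Deny",
                "Action": [
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:BypassGovernanceRetention",
                    "s3:PutBucketObjectLockConfiguration",
                    "s3:PutLifecycleConfiguration",
                    "s3:DeleteBucket",
                ],
                "Resource": [bucket_arn, objects_arn],
            },
        ],
    }


print(json.dumps(backup_writer_policy("example-backup-bucket"), indent=2))
```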

CISA’s Ransomware Guide explicitly recommends maintaining offline, encrypted backups and testing restoration regularly.

Backup Scope

What to Back Up

The obvious answer is “everything,” but that’s expensive and ignores priority. Back up in priority order:

Critical (daily or more frequently):

Databases — customer data, financial records, application state
Active Directory / identity infrastructure — losing AD means losing authentication for everything
Configuration management — firewall configs, switch configs, server configurations, infrastructure-as-code repos
Email — if your organization runs on email, treat it as critical data
File shares with active business data

Important (daily):

Application servers — or better yet, maintain infrastructure-as-code that can rebuild them
Source code repositories — Git is distributed, but verify remote copies exist
Certificate stores and key material — encrypted, and stored separately from the keys that decrypt them

Standard (weekly):

Workstation images — golden image plus user data
Development and test environments — reproducible from code/config, or backed up if not
Documentation and wiki systems

What Not to Back Up

Temporary files and caches (reproducible)
OS installation media (keep copies, don’t back up daily)
Data that can be regenerated from other backed-up data (derived datasets, compiled binaries)
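Most backup tools take exclusion rules as glob patterns. The sketch below shows the idea with the standard library’s fnmatchcase. The patterns are hypothetical examples of reproducible data, not a recommended list.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns for reproducible data; tune them per system.
# In fnmatch, "*" also matches "/", so "*/cache/*" excludes anything under a cache directory.
EXCLUDE_PATTERNS = (
    "*.tmp",
    "*.swp",
    "*/cache/*",
    "*/__pycache__/*",
    "*/build/*",  # compiled output that can be rebuilt from backed-up source
)


def should_back_up(path: str, patterns: tuple[str, ...] = EXCLUDE_PATTERNS) -> bool:
    """True unless the path matches one of the exclusion patterns (case-sensitive)."""
    return not any(fnmatchcase(path, pattern) for pattern in patterns)


for p in ["/srv/data/customers.db", "/home/ana/report.tmp", "/srv/app/cache/blob.bin"]:
    print(p, should_back_up(p))
```

Review exclusions as carefully as inclusions. An overbroad pattern such as */build/* silently drops everything under any directory with that name, and the backup job still reports success.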

Testing Restores

Why Testing Matters

The backup succeeded. The job completed. The log shows “success.” None of that means the restore will work. Without regular restore testing, a broken backup is discovered at the worst possible time: when you actually need the data.

How to Test

Automated verification:

Most backup tools support checksum verification during and after backup
Enable it. The performance overhead is worth the confidence.
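If a tool doesn’t verify for you, or if you want an independent cross-check, a SHA-256 manifest applies the same idea: record a digest for every file when the backup is written, then recompute and compare later. Below is a minimal sketch using Python’s standard library; the function names are illustrative. A matching manifest proves the files are intact. It does not prove that the system they belong to will restore.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large backup files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents no longer match."""
    failures = []
    for rel_path, expected in manifest.items():
        p = root / rel_path
        if not p.is_file() or sha256_file(p) != expected:
            failures.append(rel_path)
    return failures
```

Keep the manifest somewhere the process that writes backup files cannot modify. Otherwise, an attacker who alters the backups can also rewrite the expected digests.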

Restore tests:

Test Type	Frequency	What It Proves
File-level restore	Monthly	Individual files can be recovered from backup
System-level restore	Quarterly	A complete system (OS, applications, data, configuration) can be rebuilt from backup
Full DR exercise	Annually	The entire recovery procedure works end-to-end, including documentation, personnel, and infrastructure

For each test, measure:

Time to restore (does it meet your RTO?)
Data completeness (does it meet your RPO?)
Application functionality (does the restored system actually work?)
Documentation accuracy (could someone else follow the restore procedure?)

Document every test: date, scope, results, issues discovered, and corrective actions. This documentation serves as evidence for compliance audits and as input for improving the process.
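Capturing each test as a structured record makes the four measurements above checkable rather than anecdotal. The sketch assumes you record four times: the simulated failure time, when the restored service was confirmed working, the timestamp of the newest data present after the restore, and, for reference, when the restore began. It also assumes the RTO clock starts at the failure. The RestoreTest class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta


@dataclass
class RestoreTest:
    system: str
    test_date: date
    scope: str                        # "file-level", "system-level", or "full DR"
    rto: timedelta
    rpo: timedelta
    failure_time: datetime            # simulated point of failure; RTO clock starts here
    service_verified: datetime        # when the restored system was confirmed working
    newest_restored_data: datetime    # timestamp of the latest data present after restore
    app_works: bool
    procedure_followed_by_someone_else: bool
    issues: list[str] = field(default_factory=list)
    corrective_actions: list[str] = field(default_factory=list)

    def recovery_time(self) -> timedelta:
        return self.service_verified - self.failure_time

    def data_loss(self) -> timedelta:
        return self.failure_time - self.newest_restored_data

    def passed(self) -> bool:
        return (
            self.recovery_time() <= self.rto
            and self.data_loss() <= self.rpo
            and self.app_works
            and self.procedure_followed_by_someone_else
        )


test = RestoreTest(
    system="billing-db", test_date=date(2024, 3, 4), scope="system-level",
    rto=timedelta(hours=4), rpo=timedelta(hours=1),
    failure_time=datetime(2024, 3, 4, 9, 0),
    service_verified=datetime(2024, 3, 4, 14, 30),
    newest_restored_data=datetime(2024, 3, 4, 8, 15),
    app_works=True, procedure_followed_by_someone_else=True,
)
print(test.recovery_time(), test.data_loss(), test.passed())  # 5:30:00 0:45:00 False
```

In this made-up example the restore meets its RPO but misses its RTO. A plain “success” in the backup log would never surface that result.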

Ransomware-Resistant Architecture

Designing specifically for ransomware resilience:

Separate backup credentials — Backup infrastructure uses separate accounts, separate passwords, separate MFA. Not in Active Directory.
Network isolation — Backup storage on a network segment that production systems can’t directly access. Backup agents push data through a controlled channel.
Immutable retention — At least one backup copy on storage that cannot be modified or deleted for a defined retention period (see Immutable Backups section above).
Canary files — Place monitoring files on production systems. If they’re modified or encrypted, alert immediately — ransomware is active.
Recovery environment — Maintain a pre-staged recovery environment (clean network, clean credentials, recovery tools) that can be brought up independently of the compromised production environment.
Backup scanning — Scan backup data for malware before restoration. Restoring from a backup that contains the attacker’s persistence mechanism just restarts the incident.
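Canary files are the easiest item on that list to sketch. You fingerprint a few decoy files and compare their fingerprints on a schedule. This polling sketch assumes the canaries are hypothetical decoy files that nobody edits legitimately and that alerting is wired up elsewhere. Production setups more often rely on file-integrity monitoring or EDR tooling that reacts to file events rather than polling.

```python
import hashlib
from pathlib import Path
from typing import Optional


def fingerprint(paths: list[Path]) -> dict[Path, Optional[str]]:
    """SHA-256 of each canary file, or None if the file is missing."""
    result: dict[Path, Optional[str]] = {}
    for path in paths:
        try:
            result[path] = hashlib.sha256(path.read_bytes()).hexdigest()
        except FileNotFoundError:
            result[path] = None
    return result


def tripped_canaries(baseline: dict[Path, Optional[str]], paths: list[Path]) -> list[Path]:
    """Canaries that changed or disappeared since the baseline was taken.

    A missing canary counts as tripped: ransomware often renames files as it
    encrypts them, so the original path vanishes rather than changing in place.
    """
    current = fingerprint(paths)
    return [path for path in paths if current[path] != baseline.get(path)]
```

Take the baseline once, store it off the monitored host, and run the comparison frequently. Any non-empty result should page someone rather than land in a log.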

Monitoring Backups

Don’t wait for a disaster to discover your backups have been failing.

Alert on backup failures — Any failed backup job should generate an alert within hours, not sit in a log until someone checks manually.
Alert on missed backups — A job that didn’t run is worse than one that failed — at least the failure generates a log entry.
Monitor backup size trends — A backup that’s suddenly much smaller might indicate data loss. A backup that’s suddenly much larger might indicate data corruption or ransomware encryption (encrypted data doesn’t compress well).
Monitor retention compliance — Verify that backups exist for the full retention period. Storage cleanup jobs that are too aggressive can silently reduce your retention below your requirements.
Track RPO compliance — Measure the actual age of your most recent restorable backup. If your RPO is 4 hours but your last successful backup was 18 hours ago, you’re in violation of your own targets.
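Most of these checks reduce to a few comparisons over backup job history. This sketch makes three assumptions. You can export each run’s start time, outcome, and size from your backup tool; the BackupRun record is hypothetical. A run’s start time is the point in time its data reflects. Full backups are comparable in size; incremental jobs vary more and need wider tolerances. The 1.5x grace for missed runs and the 50% size tolerance are placeholder thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class BackupRun:
    started: datetime   # assumed to be the point in time the backup's data reflects
    succeeded: bool
    size_bytes: int


def backup_alerts(runs: list[BackupRun], now: datetime, rpo: timedelta,
                  interval: timedelta, size_tolerance: float = 0.5) -> list[str]:
    """Return alert messages for failed, missed, stale, or oddly sized backups."""
    alerts = []
    last_run = max(runs, key=lambda r: r.started, default=None)
    if last_run is not None and not last_run.succeeded:
        alerts.append("latest backup job failed")
    # A job that never started leaves no failure behind, so check the schedule too.
    if last_run is None or now - last_run.started > interval * 1.5:
        alerts.append("missed backup: no job started within 1.5x the scheduled interval")

    good = sorted((r for r in runs if r.succeeded), key=lambda r: r.started)
    if not good:
        alerts.append("no successful backup on record")
        return alerts
    latest = good[-1]
    age = now - latest.started
    if age > rpo:
        alerts.append(f"RPO violated: newest restorable backup is {age} old")

    previous = [r.size_bytes for r in good[-8:-1]]  # up to 7 earlier successful runs
    if previous:
        typical = median(previous)
        low, high = typical * (1 - size_tolerance), typical * (1 + size_tolerance)
        if not low <= latest.size_bytes <= high:
            alerts.append(f"size anomaly: {latest.size_bytes} bytes vs typical {typical:.0f}")
    return alerts


now = datetime(2024, 5, 10, 12, 0)
runs = [BackupRun(datetime(2024, 5, day, 1, 0), True, 100_000_000_000) for day in range(1, 10)]
runs.append(BackupRun(datetime(2024, 5, 10, 1, 0), True, 240_000_000_000))  # sudden growth
# Expect a single size-anomaly alert: the run is recent and succeeded, but more than doubled.
print(backup_alerts(runs, now, rpo=timedelta(hours=24), interval=timedelta(hours=24)))
```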

If It Already Happened

If you need to restore and your backups are inadequate, encrypted, or missing:

Don’t pay the ransom as your first option — CISA, the FBI, and other law enforcement agencies advise against paying. Payment doesn’t guarantee decryption, and it funds criminal operations.
Check for decryptors — No More Ransom maintains free decryption tools for many ransomware variants. Check before paying.
Engage incident response — Professional IR firms may identify recovery options you haven’t considered: shadow copies, cloud recycle bins, partial backups, replication lag that preserved clean data.
Report — File reports with the FBI’s Internet Crime Complaint Center (IC3) and with CISA. If personal data is affected, check your breach notification obligations.
Learn from it — After recovery, build the backup strategy you should have had. Immutable backups, tested restores, offline copies. The cost of proper backup infrastructure is a fraction of the cost of the incident you just survived.

Backups are insurance. Like insurance, they only matter when everything else has failed — and that’s exactly when you can’t afford to discover they don’t work. Define your RPO and RTO this week. Verify you have at least one immutable or offline copy. Test a restore. If the restore works, you’re ahead of most organizations. If it doesn’t, you just found the problem before an attacker did. Either way, you win.

Your Backup & Recovery Strategy