When Colonial Pipeline got hit with ransomware in May 2021, the technical recovery was only part of the problem. Gas stations ran dry across the southeastern United States. Panic buying cascaded. The pipeline was down for six days, and the ripple effects lasted weeks. The technical systems came back online, but the business — the supply chain, the public trust, the operational continuity — that took much longer. That’s the gap between disaster recovery and business continuity, and most organizations don’t understand the difference until they’re living it.
The TLDR
Business continuity planning (BCP) keeps critical business functions running during and after a disruption. Disaster recovery (DR) is a subset that deals specifically with restoring IT systems and data. BCP is broader — it covers people, processes, facilities, and communication, not just servers. The process starts with a Business Impact Analysis (BIA) to identify what matters most, then builds plans for maintaining those functions through any scenario: ransomware, natural disaster, pandemic, supply chain failure. NIST SP 800-34 and ISO 22301 are the primary standards. Most plans fail their first real test because they were never tested realistically.
The Reality
Folks treat BCP like insurance — something you pay for and hope you never use. The problem with that mindset is that insurance pays out after the damage. BCP is supposed to prevent the damage from becoming fatal to the business.
The COVID-19 pandemic was the largest unplanned BCP activation in history. Organizations that had a continuity plan for a pandemic scenario, and had actually tested it, pivoted to remote operations within days. Organizations that didn’t spent weeks scrambling, losing revenue, losing people, and in some cases closing permanently. The figure commonly attributed to FEMA is sobering: roughly 40% of small businesses that experience a disaster never reopen.
Ransomware has turned BCP from a “nice to have” into an existential requirement. When your systems are encrypted and your backups are compromised (because the attackers were in your network for weeks before detonating), the question isn’t “how do we restore from backup” — it’s “how does the business continue to function while we figure this out?”
How It Works
BCP vs DR — Related but Different
Business Continuity Planning addresses the entire organization. How do you keep serving customers, paying employees, communicating with stakeholders, and maintaining critical operations when something goes catastrophically wrong? BCP covers manual workarounds, alternate facilities, crisis communication, and supply chain continuity.
Disaster Recovery is the technical subset. How do you restore IT systems, recover data, and get infrastructure back online? DR is a component of BCP, and each needs the other: BCP without DR is an incomplete plan, and DR without BCP is a technical exercise that ignores the business.
The Business Impact Analysis (BIA)
The BIA is the foundation of everything. Without it, you’re guessing at what matters.
A BIA identifies critical business functions and determines the impact of losing them over time. For each function, you determine:
- Maximum Tolerable Period of Disruption (MTPD) — How long can this function be down before the business faces unacceptable consequences? For a payment processor, it might be minutes. For an annual report function, it might be weeks.
- Recovery Time Objective (RTO) — The target time to restore the function. Must be less than the MTPD.
- Recovery Point Objective (RPO) — How much data loss is acceptable. An RPO of zero means no data loss (real-time replication). An RPO of 24 hours means you can lose a day’s worth of data (daily backups).
- Dependencies — What systems, people, vendors, and facilities does this function depend on? Dependencies are where plans fall apart, because critical functions rarely exist in isolation.
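The relationships between these values can be checked mechanically. The sketch below is one possible way to do that, not a standard tool: the `BusinessFunction` record, its field names, and the `backup_interval` field are assumptions made for illustration. It flags an RTO that is not below the MTPD, a backup schedule that cannot meet the RPO, and an entry with no recorded dependencies.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class BusinessFunction:
    """One row of a BIA worksheet. Field names are illustrative."""
    name: str
    mtpd: timedelta             # Maximum Tolerable Period of Disruption
    rto: timedelta              # Recovery Time Objective
    rpo: timedelta              # Recovery Point Objective
    backup_interval: timedelta  # how often data is actually captured (assumed field)
    dependencies: list[str] = field(default_factory=list)


def bia_findings(fn: BusinessFunction) -> list[str]:
    """Return human-readable problems found in a single BIA entry."""
    findings = []
    if fn.rto >= fn.mtpd:
        findings.append(f"{fn.name}: RTO must be less than MTPD")
    # Worst case, the failure hits just before the next backup runs,
    # so the data lost can approach one full backup interval.
    if fn.backup_interval > fn.rpo:
        findings.append(f"{fn.name}: backup interval exceeds RPO")
    if not fn.dependencies:
        findings.append(f"{fn.name}: no dependencies recorded")
    return findings


# Hypothetical entry: daily backups cannot satisfy a 15-minute RPO.
payments = BusinessFunction(
    name="payments",
    mtpd=timedelta(hours=1),
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=15),
    backup_interval=timedelta(hours=24),
    dependencies=["card-gateway"],
)
for finding in bia_findings(payments):
    print(finding)
```

An RPO of zero passes the check only when `backup_interval` is also zero, which is how this sketch represents real-time replication.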
The BIA produces a prioritized list: when everything goes wrong simultaneously, this is the order in which you restore things. It’s a hard conversation because every business unit thinks their function is the most critical. The BIA forces the organization to make that decision in advance, not during a crisis.
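Once dependencies are written down, the restore order can be derived rather than argued. The following is a minimal sketch, assuming MTPD is expressed in hours and that every dependency is itself listed as a function. The function name and the sample data are hypothetical. It uses Python’s standard `graphlib.TopologicalSorter` so that nothing is restored before the things it depends on, and among the items that are ready it picks the one with the shortest MTPD.

```python
import heapq
from graphlib import TopologicalSorter


def restore_order(mtpd_hours: dict[str, float],
                  depends_on: dict[str, set[str]]) -> list[str]:
    """Order functions for recovery: dependencies first, then shortest MTPD.

    Assumes every name in depends_on also appears in mtpd_hours.
    """
    sorter = TopologicalSorter(
        {name: depends_on.get(name, set()) for name in mtpd_hours}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready: list[tuple[float, str]] = []
    order: list[str] = []
    while sorter.is_active():
        # Everything whose dependencies are restored becomes a candidate.
        for name in sorter.get_ready():
            heapq.heappush(ready, (mtpd_hours[name], name))
        _, name = heapq.heappop(ready)  # most urgent candidate
        order.append(name)
        sorter.done(name)
    return order


# Hypothetical BIA output.
mtpd = {"payments": 1, "identity": 24, "database": 8, "reporting": 336}
deps = {"payments": {"database", "identity"}, "reporting": {"database"}}
print(restore_order(mtpd, deps))
```

In this sample, payments has the shortest MTPD but still comes third, because it cannot run until the database and identity service are back. That is the practical meaning of “dependencies are where plans fall apart”: a supporting system inherits the urgency of whatever depends on it, even if its own MTPD looks generous.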
Crisis Communication
When the servers go down, the first question isn’t technical — it’s “who do we call?” Crisis communication plans define:
- Internal communication — How do you reach employees when email is down? Phone trees, personal cell numbers, out-of-band messaging platforms.
- External communication — Customers, partners, regulators, media. Who speaks for the organization? What’s the message? Breach notification laws (GDPR’s 72-hour requirement, HIPAA’s notification rules) impose legal deadlines that don’t pause because you’re busy fighting fires.
- Stakeholder management — Board notification, investor communication, regulatory reporting. Each has different timelines and messaging requirements.
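Because those legal clocks start when the organization becomes aware of a breach, it helps to compute the deadlines the moment an incident is declared. The sketch below is illustrative only. The 72-hour GDPR window comes from the text above; the 60-day outer limit for HIPAA notification of affected individuals is added here as an assumption to confirm with counsel, and the function and label names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Notification windows counted from the moment of awareness or discovery.
# Confirm which rules apply to your organization before relying on these.
NOTIFICATION_WINDOWS = {
    "GDPR supervisory authority": timedelta(hours=72),
    "HIPAA affected individuals": timedelta(days=60),
}


def notification_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Return the latest notification time for each obligation."""
    if aware_at.tzinfo is None:
        # Naive timestamps invite off-by-hours mistakes across regions.
        raise ValueError("aware_at must be timezone-aware")
    return {label: aware_at + window
            for label, window in NOTIFICATION_WINDOWS.items()}


aware = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
for label, deadline in notification_deadlines(aware).items():
    print(f"{label}: notify by {deadline.isoformat()}")
```

Putting the computed deadlines at the top of the incident log keeps the legal timeline visible to the people doing the technical work.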
The worst time to figure out your crisis communication plan is during a crisis. Pre-drafted templates, pre-approved messaging, and designated spokespersons — all of this gets decided in advance.
Alternate Processing Strategies
When your primary facility or infrastructure is unavailable, where does work happen?
- Hot site — Fully equipped duplicate facility, ready to go within hours. Most expensive, fastest recovery.
- Warm site — Infrastructure in place but requires configuration and data restoration. Middle ground. Hours to days.
- Cold site — Empty facility with power and connectivity. Everything else needs to be provisioned. Days to weeks. Cheapest.
- Cloud-based recovery — Increasingly common. Replicate critical workloads to a cloud provider. Spin up when needed, pay for what you use. Services like AWS Elastic Disaster Recovery and Azure Site Recovery have made this accessible to smaller organizations.
- Reciprocal agreements — Two organizations agree to host each other’s critical systems in a disaster. Sounds good on paper, rarely works in practice because both organizations tend to need the same resources at the same time (regional disasters).
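The choice between site types is mostly a trade-off between cost and the RTO from the BIA. Here is a rough sketch of that decision, assuming illustrative recovery times (4 hours, 2 days, 14 days) picked from within the ranges above and a simple cost ranking. Real numbers depend on the provider and the workload, and cloud-based recovery is left out because its cost and speed vary too much to rank.

```python
from datetime import timedelta
from typing import Optional

# (name, assumed typical recovery time, relative cost rank: higher costs more)
STRATEGIES = [
    ("cold site", timedelta(days=14), 1),
    ("warm site", timedelta(days=2), 2),
    ("hot site", timedelta(hours=4), 3),
]


def cheapest_strategy(rto: timedelta) -> Optional[str]:
    """Pick the least expensive strategy that can recover within the RTO."""
    candidates = [(cost, name)
                  for name, recovery, cost in STRATEGIES
                  if recovery <= rto]
    # None means even a hot site is too slow; the function likely needs
    # always-on redundancy rather than a recovery site.
    return min(candidates)[1] if candidates else None


print(cheapest_strategy(timedelta(hours=8)))     # hot site
print(cheapest_strategy(timedelta(days=3)))      # warm site
print(cheapest_strategy(timedelta(minutes=30)))  # None
```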
Supply Chain Continuity
Your BCP is only as strong as your vendors’ BCPs. If your payment processor goes down, your revenue stops regardless of how well your own systems are running. Supply chain continuity requires:
- Identifying critical vendors and single points of failure
- Reviewing vendors’ own BCP/DR capabilities
- Maintaining alternate vendor relationships for critical services
- Contractual requirements for vendor recovery timelines
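A vendor register makes these checks repeatable. The sketch below assumes a hypothetical `Vendor` record and a mapping from each service to the RTO the business needs for it. It reports critical vendors that have no alternate, no contractual recovery timeline, or a contractual timeline slower than the business requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vendor:
    """One entry in a vendor register. Field names are illustrative."""
    name: str
    service: str
    critical: bool
    contract_rto_hours: Optional[float] = None  # None: contract has no recovery timeline
    alternates: tuple[str, ...] = ()


def vendor_gaps(vendors: list[Vendor],
                needed_rto_hours: dict[str, float]) -> dict[str, list[str]]:
    """Map each critical vendor to the continuity gaps found for it."""
    gaps: dict[str, list[str]] = {}
    for v in vendors:
        if not v.critical:
            continue
        issues = []
        if not v.alternates:
            issues.append("single point of failure: no alternate vendor")
        needed = needed_rto_hours.get(v.service)
        if v.contract_rto_hours is None:
            issues.append("no contractual recovery timeline")
        elif needed is not None and v.contract_rto_hours > needed:
            issues.append(
                f"contract allows {v.contract_rto_hours}h, function needs {needed}h"
            )
        if issues:
            gaps[v.name] = issues
    return gaps


# Hypothetical register.
register = [
    Vendor("PayCo", "payments", critical=True, contract_rto_hours=24),
    Vendor("MailCo", "email", critical=True, contract_rto_hours=4,
           alternates=("BackupMail",)),
    Vendor("SnackCo", "catering", critical=False),
]
for name, issues in vendor_gaps(register, {"payments": 4, "email": 8}).items():
    print(name, issues)
```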
The SolarWinds supply chain attack demonstrated a different angle — not vendor downtime, but vendor compromise. Your continuity plan needs to account for scenarios where a trusted vendor becomes the threat vector.
Testing — The Part Everyone Skips
A plan that hasn’t been tested is a hypothesis. Testing levels, from least to most disruptive:
- Checklist review — Walk through the document, verify contact information, confirm procedures exist. Better than nothing. Barely.
- Tabletop exercise — Key stakeholders sit around a table and walk through a scenario. “It’s Tuesday morning, ransomware just encrypted the file servers. What do we do?” This reveals gaps in the plan without disrupting operations.
- Simulation / functional exercise — Actually perform recovery procedures in a test environment. Restore from backups. Fail over to the alternate site. Time it.
- Full-scale exercise — Simulate a real disaster with actual failover. Operations move to the alternate site. Production systems are recovered from backups. This is expensive and disruptive, but it’s the only way to know if the plan actually works under pressure.
NIST SP 800-34 recommends testing annually at minimum, with tabletop exercises more frequently. ISO 22301 requires testing as part of certification. Yet the most common finding in BCP audits is inadequate or nonexistent testing.
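The “time it” part of a functional exercise is easy to automate. As a minimal sketch, assuming each recovery step can be wrapped in a Python callable (the step names and the `run_drill` helper are hypothetical), the drill below records how long each step takes and whether the total fits inside the RTO.

```python
import time
from datetime import timedelta
from typing import Callable


def run_drill(steps: list[tuple[str, Callable[[], None]]],
              rto: timedelta) -> dict:
    """Run recovery steps in order and compare total elapsed time to the RTO."""
    timings: dict[str, float] = {}
    start = time.perf_counter()
    for label, step in steps:
        step_start = time.perf_counter()
        step()  # an exception here is itself a finding: the step failed
        timings[label] = time.perf_counter() - step_start
    total = timedelta(seconds=time.perf_counter() - start)
    return {"timings": timings, "total": total, "met_rto": total <= rto}


# Placeholder steps; in a real drill these would call restore and failover tooling.
result = run_drill(
    [("restore database", lambda: time.sleep(0.01)),
     ("fail over to alternate site", lambda: time.sleep(0.01))],
    rto=timedelta(hours=4),
)
print(result["met_rto"], result["total"])
```

Keeping the per-step timings from each exercise shows where the recovery time actually goes, which is usually more useful than the pass or fail result.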
How It Gets Exploited
Ransomware operators target backups first. They know that organizations with good backups don’t pay ransoms, so they compromise the backup infrastructure before detonating the ransomware. MITRE ATT&CK technique T1490 (Inhibit System Recovery) is specifically about destroying recovery capabilities. If your DR plan assumes clean backups exist, and the attackers spent three weeks ensuring they don’t, your plan is already dead.
Social engineering during crises. When an organization is in crisis mode, people are stressed, rushed, and operating outside normal procedures. Attackers exploit this — phishing emails posing as recovery vendors, phone calls impersonating IT staff, fake invoices for emergency services. The chaos of a disaster is a social engineering paradise.
Single points of failure in communication. If your crisis communication plan relies entirely on email, and email is down, you have no crisis communication plan. If the only person who knows the recovery procedures is on vacation in a dead zone, you have no recovery procedures. Redundancy in communication is as critical as redundancy in systems.
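Both failure modes can be found on paper before a crisis finds them for you. The sketch below assumes two hypothetical inputs: a contact roster that maps each person to the channels they can be reached on, and a mapping from each recovery procedure to the people who know it.

```python
def communication_gaps(roster: dict[str, set[str]],
                       unavailable: set[str]) -> list[str]:
    """People who cannot be reached when the given channels are down."""
    return sorted(person for person, channels in roster.items()
                  if not channels - unavailable)


def single_owner_procedures(owners: dict[str, set[str]]) -> list[str]:
    """Recovery procedures that fewer than two people know how to run."""
    return sorted(proc for proc, people in owners.items() if len(people) < 2)


# Hypothetical data.
roster = {
    "alice": {"email", "mobile"},
    "bob": {"email"},
    "carol": {"email", "chat"},
}
owners = {
    "restore payments database": {"alice"},
    "fail over DNS": {"alice", "carol"},
}

# Scenario: corporate email and chat share an identity provider that is down.
print(communication_gaps(roster, unavailable={"email", "chat"}))
print(single_owner_procedures(owners))
```

Running the first check against each plausible outage scenario, not just “email is down”, exposes channels that look independent but fail together.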
What You Can Do
For Organizations
Start with the BIA. Everything else flows from knowing what matters most. Involve business leadership, not just IT — they’re the ones who define “critical.”
Test your backups. That doesn’t mean “verify the backup job completed.” It means actually restoring from backup and confirming that the data is intact and the system works. Do this regularly. An untested backup is Schrödinger’s backup.
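One way to confirm that restored data is intact is to compare checksums. The sketch below is a minimal example under stated assumptions: it records a SHA-256 manifest of the source directory at backup time, then checks a restored copy against it. The function names are illustrative, and the restore step itself is left to whatever backup tool you use.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record relative path -> checksum for every file under root."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Compare a restored directory against the manifest taken at backup time."""
    problems = []
    for rel, expected in manifest.items():
        restored = restored_root / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(restored) != expected:
            problems.append(f"corrupt: {rel}")
    return problems
```

An empty result from `verify_restore` shows the files came back unchanged. It does not show that the application starts and serves requests, so a full test still needs someone to bring the restored system up and use it.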
Run a tabletop exercise. Quarterly if you can manage it. Pick a scenario, gather the right people, and walk through it. The first one will be humbling. That’s the point.
Keep your plan short enough that people will actually read it during a crisis. The 300-page BCP binder is a reference document. The actual response procedures should fit on a laminated card.
For Individuals
You have your own continuity to think about. Where are your critical files? Are they backed up somewhere other than your primary device? If your laptop died right now, how long would it take you to be operational again? Apply the same thinking: identify what matters, back it up, and test the restore.
Sources & Further Reading
- NIST SP 800-34 Rev. 1: Contingency Planning Guide — The primary federal guidance for IT contingency planning
- ISO 22301: Business Continuity Management Systems — The international standard for BCP
- FEMA Continuity Guidance — Federal continuity planning resources
- CISA Ransomware Guide — Ransomware response and recovery guidance
- MITRE ATT&CK — Attack techniques targeting recovery and continuity
- ISC2 Resources — BCP and DR professional guidance