EC2 SSH Disaster Recovery
A broken SSH configuration can make an EC2 instance unreachable while the workload itself is perfectly healthy. This writeup walks through recovering access by treating the root disk as data: detach it, attach it to a helper instance, repair the configuration, and reattach.
Technologies
EC2 · EBS · Linux · SSH · Incident Response
Problem
After a configuration change, SSH access to a running instance was lost. The instance could not be reached, but terminating and rebuilding it was not acceptable — the data and state on the root volume had to be preserved.
Architecture
The recovery uses the EBS volume-rescue pattern. Stop the affected instance, detach its root EBS volume, and attach that volume as a secondary disk on a temporary helper instance in the same availability zone. Mount it there and correct the offending configuration. Then unmount it, detach it from the helper, reattach it to the original instance as the root device, and start the instance again.
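The sequence above can be sketched in Python. This is a minimal sketch, not the exact procedure used in the incident: the function names are hypothetical, `ec2` is assumed to be a boto3 EC2 client, and the helper device name `/dev/sdf` is only an illustrative default. The repair itself (mounting the disk and editing the SSH configuration) happens on the helper between the two calls and is not shown.

```python
from typing import Any


def move_root_to_helper(ec2: Any, broken_id: str, helper_id: str,
                        volume_id: str, helper_device: str = "/dev/sdf") -> None:
    """Stop the broken instance and attach its root volume to the helper.

    `ec2` is assumed to be a boto3 EC2 client; `/dev/sdf` is an assumed name.
    """
    # Stop the instance first; a root volume cannot be detached while it runs.
    ec2.stop_instances(InstanceIds=[broken_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[broken_id])

    # Detach the root volume and wait until it is free.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=broken_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it to the helper as a secondary (non-root) disk.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=helper_id,
                      Device=helper_device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


def return_root_to_original(ec2: Any, broken_id: str, helper_id: str,
                            volume_id: str, root_device: str) -> None:
    """Move the repaired volume back as root and start the instance.

    Unmount the volume on the helper before calling this. `root_device`
    must be the instance's own root device name, so it has no default.
    """
    ec2.detach_volume(VolumeId=volume_id, InstanceId=helper_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=broken_id,
                      Device=root_device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    ec2.start_instances(InstanceIds=[broken_id])
```

Passing the client in as an argument, rather than creating it inside the functions, keeps the sketch testable with a stub and leaves region and credential handling to the caller.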
Security considerations
Access was restored without weakening the instance's security posture: password authentication was not enabled as a shortcut, and the helper instance was disposable.
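One way to confirm that password authentication was not switched on during the repair is to scan the repaired sshd_config before reattaching the volume. The function below is a rough sketch with a hypothetical name. It only reads the global section of a single file, does not follow Include directives, and treats anything other than an explicit "no" as possibly enabled, since OpenSSH defaults PasswordAuthentication to yes when the keyword is absent.

```python
def password_auth_possibly_enabled(config_text: str) -> bool:
    """Return False only if the global section explicitly sets
    PasswordAuthentication to "no".

    Simplifications (assumptions of this sketch): Include files are not
    read and Match blocks are not evaluated.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # sshd_config allows "Keyword value" and "Keyword=value".
        parts = line.replace("=", " ", 1).split()
        if not parts:
            continue
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # the global section ends at the first Match block
        if keyword == "passwordauthentication" and len(parts) > 1:
            # sshd uses the first value it reads for a keyword.
            return parts[1] != "no"
    return True  # keyword absent: the OpenSSH default applies
```

On the helper, the text would come from the config file under the mount point of the rescued volume, not from the helper's own /etc/ssh/sshd_config.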
Challenges
Two things caused most of the friction: an EBS volume can only be attached to an instance in the same availability zone, and device names and mount points are easy to get wrong under pressure.
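Both friction points can be checked before anything is detached. The sketch below uses a hypothetical function name and assumes `ec2` is a boto3 EC2 client. It compares the volume's availability zone with the helper's and returns the original instance's root device name, so the value is on record for the reattach step.

```python
from typing import Any


def check_rescue_preconditions(ec2: Any, broken_id: str, helper_id: str,
                               volume_id: str) -> str:
    """Verify the helper can take the volume; return the root device name.

    `ec2` is assumed to be a boto3 EC2 client.
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    helper = ec2.describe_instances(
        InstanceIds=[helper_id])["Reservations"][0]["Instances"][0]
    broken = ec2.describe_instances(
        InstanceIds=[broken_id])["Reservations"][0]["Instances"][0]

    volume_az = volume["AvailabilityZone"]
    helper_az = helper["Placement"]["AvailabilityZone"]
    if volume_az != helper_az:
        # A volume cannot be attached across availability zones.
        raise ValueError(
            f"volume is in {volume_az} but helper is in {helper_az}")

    # Record this now; it is the device name to use when reattaching as root.
    return broken["RootDeviceName"]
```

Note that the device name requested at attach time is not always the name the helper's kernel shows. For example, a volume attached as /dev/sdf can appear as /dev/xvdf or as an NVMe device, so it is worth confirming with lsblk on the helper before mounting.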
Lessons learned
Treating the root volume as recoverable data changes how you think about a locked instance. Writing the procedure down beforehand turns a stressful outage into a checklist.