Despite the new feature, AWS’s US East region in Northern Virginia will remain at risk, an analyst says.
Amazon Web Services (AWS) has introduced a new Domain Name System (DNS) resiliency feature designed to improve reliability and reduce service disruptions in its US East region (Northern Virginia).
In October, AWS’s US East region experienced widespread service disruption after a DNS failure caused the DynamoDB API to become unstable. The failure affected over 70 AWS services and a large section of the company’s customer base for hours, eventually forcing the hyperscaler to restore DNS manually.
The full recovery of the service took even longer due to delayed network configurations and backlog clearing.
The new DNS resiliency feature, which AWS calls Accelerated recovery for managing public DNS records, is designed to address DNS issues such as the one that brought the US East region down in October.
The new feature has been added to Route 53, AWS’s cloud-based web service that helps translate human-friendly domain names into numeric IP addresses for easier communication between systems. It is designed to provide a 60-minute recovery time objective (RTO) during future outages, AWS wrote in a blog post.
“This enhancement ensures that customers can continue making DNS changes and provisioning infrastructure even during regional outages, providing greater predictability and resilience for mission-critical applications,” AWS added.
Data and control plane differentiation
AWS’s DNS issues have typically affected the control plane, the management layer that decides how traffic should be directed, rather than the data plane. The data plane is the layer that carries out those instructions by actually answering DNS queries.
“In big AWS incidents, the DNS data plane usually stays up, i.e., you might still have a running infrastructure, but the control plane in US East can stall, which means you can’t update DNS fast enough to reroute traffic, and that’s the real failure point,” said HFS Research’s associate practice leader Akshat Tyagi.
“The new feature aims to fix that gap. It provides a hardened, multi-region control path that ensures key APIs like ‘ChangeResourceRecordSets’ stay available within a guaranteed 60-minute recovery window. This means enterprises can redirect users to backup regions, switch to standby endpoints, or cut over to a disaster-recovery setup without waiting for AWS to recover,” Tyagi added.
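For readers who want to see what such a cutover might look like in practice, here is a minimal Python sketch of a ChangeResourceRecordSets request that repoints a record at a standby endpoint. It is an illustration under stated assumptions, not AWS’s or the analyst’s recommended procedure: the helper names build_failover_change and apply_failover, the record name, the standby hostname, and the hosted zone ID are all hypothetical, and the example assumes a simple CNAME record and the boto3 SDK with credentials already configured.

```python
from typing import Any, Dict


def build_failover_change(record_name: str, standby_target: str, ttl: int = 60) -> Dict[str, Any]:
    """Build a ChangeBatch that repoints a CNAME at a standby endpoint.

    Hypothetical helper: the record type and TTL are illustrative choices.
    """
    return {
        "Comment": "Fail over to standby endpoint",
        "Changes": [
            {
                # UPSERT creates the record if missing, or overwrites it if present.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_target}],
                },
            }
        ],
    }


def apply_failover(hosted_zone_id: str, change_batch: Dict[str, Any]) -> str:
    """Submit the change through the Route 53 control plane; returns the change ID."""
    import boto3  # imported lazily so the batch can be built and inspected without the SDK

    client = boto3.client("route53")
    response = client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=change_batch,
    )
    return response["ChangeInfo"]["Id"]


if __name__ == "__main__":
    # Placeholder values; replace with a real record and standby hostname.
    batch = build_failover_change("app.example.com.", "standby.eu-west-1.example.net.")
    print(batch)
    # apply_failover("ZONE_ID_PLACEHOLDER", batch)  # requires AWS credentials
```

The request is split from the submission so that the change batch can be prepared and reviewed ahead of an incident; only the apply_failover step depends on the Route 53 control plane being reachable.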
US East region is a bottleneck for AWS
The US East (Northern Virginia) region has continued to be a major architectural chokepoint for AWS.
“The control plane for many global AWS services has historically depended on that region in Northern Virginia. When that region shakes, everyone feels the ripples,” Tyagi said.
The analyst also warned that the new feature might not be enough to stop the fallout of future outages, although it fixes one of several critical gaps.
“Until AWS decides to spread control-plane responsibilities across multiple independent regions, offering stronger cross-region failover guarantees for critical APIs, there’s still risk,” Tyagi said.
AWS can take additional measures, such as shipping more opinionated blueprints for multi‑region DNS and control‑plane isolation, so customers are not reinventing complex patterns after each incident or fallout, Tyagi said.
Leading rivals in DNS resiliency capabilities
The new DNS resiliency feature could put AWS ahead of other hyperscalers, which also continue to face network outages.
“Azure, GCP, and Cloudflare all operate strong, globally distributed DNS systems, but none of them commit to a defined recovery time for DNS control-plane updates during a regional outage. That’s the key difference,” Tyagi said. “They (rivals) guarantee that DNS queries will keep resolving, but they don’t specify how quickly you’ll be able to update DNS records if a control-plane incident occurs.”
The new DNS resiliency feature is not the first addition that AWS has made in its efforts to reduce downtime for its enterprise customers.
Soon after the October outage, AWS added an automated incident-generating capability within its CloudWatch service.




