How to build a multi-region active-active architecture on AWS
In this post, I will discuss the why and how of designing a multi-region active-active architecture in AWS. Learn all of this and more today.
Jun 08, 2023 • 13 Minute Read
Everything fails all the time — build for resiliency
This is the second part of our series on building multi-region, active-active architectures. In the previous post, we talked about the quest for availability, since it sits at the heart of such design. In this post, we'll discuss the why and the how of designing multi-region active-active architectures.
Bear in mind that building and successfully running multi-region active-active architecture is hard, so this post doesn't pretend to cover everything involved. Instead, it should be treated as an introduction to this art.
Why bother with multi-region architectures?
Good question and glad you asked! There are basically three reasons why you would want to have a multi-region architecture.
- Improve latency for end-users
- Disaster recovery
- Business requirements
1. Improve latency for end-users
The idea is very simple and is related to the speed of light, which no one has yet managed to crack. The closer your backend origin is to end-users, the better the experience.
Even when a content delivery network such as Amazon CloudFront solves the problem for much of your content, some more dynamic calls still have to reach the backend, which could be far away, adding precious milliseconds to each request.
For example, if you have users in Europe but your backend is in the US or Australia, the added latency is approximately 140 ms and 300 ms, respectively. Delays like these are unacceptable for many popular games, banking applications, or other interactive applications.
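To make the speed-of-light argument concrete, here is a minimal sketch that estimates a lower bound on round-trip time between two points. The coordinates for Frankfurt and Sydney and the assumption that light in optical fibre travels at roughly two thirds of its speed in a vacuum are illustrative choices, not figures from this post. Real latency is higher, because cables do not follow great circles and every hop adds processing delay.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_FACTOR = 2 / 3  # assumption: light in fibre travels at about 2/3 of c
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (latitude, longitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Theoretical minimum round-trip time over fibre, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000


# Illustrative coordinates (assumed): Frankfurt and Sydney.
frankfurt_to_sydney_km = great_circle_km(50.11, 8.68, -33.87, 151.21)
floor_ms = min_round_trip_ms(frankfurt_to_sydney_km)
print(f"{frankfurt_to_sydney_km:.0f} km, at least {floor_ms:.0f} ms round trip")
```

No amount of tuning gets a request below this floor, which is why moving the backend closer to the user is the only real fix.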
Indeed, latency plays a huge part in a customer’s perception of a high-quality experience and has been shown to noticeably affect user behaviour, with lower latency generating more user engagement.
This observation has also been confirmed several times by large companies:
- Amazon: 100 ms of extra load time caused a 1% drop in sales (Greg Linden, source here).
- Google: 500 ms of extra load time caused 20% fewer searches (Marissa Mayer, source here).
- Yahoo!: 400 ms of extra load time caused a 5–9% increase in the number of people who clicked “back” before the page even loaded (Nicole Sullivan, source here).
As technology improves, and especially with the advent of AR, VR and MR requiring even more immersive and lifelike experiences, developers need to produce software systems with stringent latency requirements. Therefore, having locally available applications and content is becoming more and more important.
2. Disaster recovery
What followed was some significant and inspiring engineering work by an AWS customer: achieving regional resiliency.
As Netflix explained on its blog, "Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two. We are working on ways of extending our resiliency to handle partial or complete regional outages."
Indeed, if your application is composed of multiple different services and one of these services, critical to your application, experiences issues, you might want to shift the traffic to a healthy region to prevent angry customers.
Netflix was not the only one to learn from this ELB failure. AWS learned from it too and has taken steps to reduce the blast radius of potential failures. Today’s implementation of the control plane is more robust.
Failures always happen, so it's important to work on both reducing how often problems occur and mitigating the severity of their impact.
3. Business requirements
Finally, some customers may have business requirements to store data in distinct regions, separated by several hundred kilometres. Those customers therefore have to store data in multiple regions. This is becoming more and more common since AWS now has 18 regions globally, spread between the Americas, Asia Pacific, Europe, Middle East and Africa.
How to build multi-region active-active architecture in AWS
Simply put, a multi-region active-active architecture gets all the services on the client request path deployed across multiple AWS Regions. In order to do so, several requirements have to be fulfilled.
- Data replication between regions must be fast and reliable.
- You need a global network infrastructure to connect your different regions.
- Services should not have local state — they must be stateless, and state should be shared between regions.
- Synchronous cross-regional calls should be avoided when possible. Applications should use regional resources.
- DNS routing should be used to support different scenarios.
Let’s take a closer look at these requirements.
1. Reliable data replication
Let’s talk a little bit about the CAP theorem. The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two out of the following three guarantees: Consistency, Availability and Partition Tolerance. In particular, in the presence of a network partition, one has to choose between consistency and availability.
This means that we have two choices: giving up consistency will allow the system to remain highly available, and prioritising consistency means that the system might not always be available.
Since we are building a multi-region architecture and are optimising for availability, we have to give up strong consistency. By design, this also means we need to embrace asynchronous systems and replication.
For distributed data stores, asynchronous replication decouples the primary node from its replicas at the expense of introducing replication lag or latency.
This means that changes performed on the primary node are not immediately reflected on its replicas — the result of this lag creates what is often referred to as eventual consistency. When a system achieves eventual consistency, it is said to have converged, or achieved replica convergence.
To achieve replica convergence, a system must reconcile differences between multiple copies of distributed data. It can do so in one of the following ways:
- Comparing versions of the data
- “Smart” comparison of the data itself
- Choosing an arbitrary final state
The most common approach to reconciliation, used in most systems including DynamoDB Global Tables, is called “last writer wins”.
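The idea behind “last writer wins” can be sketched in a few lines. This is an illustration of the general technique under simplifying assumptions, not how DynamoDB implements it internally: the `Version` type, its fields, and the tie-break on region name are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Version:
    """One replica's copy of an item (hypothetical structure)."""
    value: str
    timestamp_ms: int  # time of the write, as recorded by the writer
    region: str        # region that accepted the write


def last_writer_wins(a: Version, b: Version) -> Version:
    """Keep the most recent write; break timestamp ties deterministically
    on region name so every replica picks the same winner."""
    return max(a, b, key=lambda v: (v.timestamp_ms, v.region))


def converge(replicas: list[Version]) -> Version:
    """Reduce the copies held by all replicas to one final state.
    Expects at least one replica."""
    return reduce(last_writer_wins, replicas)


eu = Version("blue", timestamp_ms=1000, region="eu-west-1")
us = Version("green", timestamp_ms=1005, region="us-east-1")
print(converge([eu, us]).value)
```

Note the trade-off: the earlier write is silently discarded, and the outcome depends on timestamps that come from different clocks, so concurrent updates to the same item can be lost.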
The effect of asynchronous replication must be taken into consideration when designing applications, since besides having architectural consequences, it also has some implications for the client user-interface design and experience.
Such implications are that interfaces should be completely non-blocking. User interactions and actions should resolve instantly without the need to wait for any backend response — everything should resolve itself in the background, asynchronously, and transparently to the user.
No loading messages or spinners staying forever on the screen. Requests to the server should be entirely decoupled from the user interface. This “trick” will also make users believe the application is fast, even if in reality it isn’t — hiding network latency and even full-service failure.
This is often referred to as graceful degradation, and it is also used by Netflix to mitigate certain failures.
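One way to picture graceful degradation is a call that gives the backend a short time budget and falls back to a cached or generic answer instead of blocking the user. The sketch below uses Python's `asyncio`; the backend coroutines, the fallback list, and the 200 ms budget are hypothetical placeholders rather than anything taken from Netflix's implementation.

```python
import asyncio


async def fetch_with_fallback(fetch, fallback, timeout_s=0.2):
    """Await fetch() for at most timeout_s seconds; on timeout or
    connection failure, return the fallback instead of raising."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return fallback


# Hypothetical backends used only to exercise the helper.
async def healthy_backend():
    return ["personalised", "titles"]


async def slow_backend():
    await asyncio.sleep(1.0)  # simulates a distant or overloaded region
    return ["personalised", "titles"]


async def broken_backend():
    raise ConnectionError("region unreachable")


POPULAR_TITLES = ["popular", "titles"]  # generic, cacheable answer

if __name__ == "__main__":
    for backend in (healthy_backend, slow_backend, broken_backend):
        result = asyncio.run(fetch_with_fallback(backend, POPULAR_TITLES))
        print(backend.__name__, "->", result)
```

The user always gets a response within the budget; only the quality of that response degrades when the backend is slow or down.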
2. Global network infrastructure
A few years ago, when deploying multi-region architectures, it was standard practice to set up secured VPN connections between regions in order to replicate the data asynchronously.
While deploying and managing those connections became easier over time, the main problem remained that they went over the internet and were therefore subject to sudden changes in routing and especially latency, making it difficult to maintain consistently good replication.
To overcome that problem, James Hamilton, Vice President & Distinguished Engineer at AWS, announced that AWS would provide a high bandwidth, global network infrastructure powered by redundant 100GbE links circling the globe.
This means that AWS Regions are connected to a private global network backbone, which provides lower cost and more consistent cross-region network latency when compared with the public internet — and the benefits are clear:
- Improved latency, packet loss, and overall quality
- Avoids network interconnect capacity conflicts
- Greater operational control
3. Stateless applications
I previously wrote about local state being a cloud anti-pattern. This is even more true for multi-region architectures. When clients interact with an application, they do so in a series of interactions called a session.
In a stateless architecture, the server must treat all client requests independently of prior requests or sessions, and should not store any session information locally. So given the same input, a stateless application should provide the same response to any end-user.
Stateless applications can scale horizontally, since any request can be handled by any available computing resources (e.g. instances, containers or functions).
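As a rough sketch of what "no local state" means in code, the handler below keeps nothing in process memory between requests. Everything it needs comes from the request and from an external session store. `InMemorySessionStore` is a hypothetical stand-in so the example runs on its own; in a real deployment that role would be played by a shared, replicated data store.

```python
class InMemorySessionStore:
    """Stand-in for a shared, replicated session store (hypothetical,
    used here only so the example is self-contained)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


def handle_add_to_cart(store, session_id, item):
    """Stateless request handler: reads the session from the store,
    writes the updated session back, and keeps nothing locally."""
    session = store.get(session_id)
    cart = session.get("cart", []) + [item]
    store.put(session_id, {**session, "cart": cart})
    return {"session_id": session_id, "cart": cart}


shared_store = InMemorySessionStore()
# Two calls that could be served by two different instances or regions.
handle_add_to_cart(shared_store, "session-1", "book")
response = handle_add_to_cart(shared_store, "session-1", "pen")
print(response["cart"])
```

Because the handler is just a function of the request and the store, it does not matter which instance, container or function picks up the next request.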
4. Use local resources and avoid cross-regional calls
As mentioned previously, preventing increased latency is critical for applications. Therefore, it's important to avoid synchronous cross-region calls and always make sure resources are locally available for the application to use, thus optimising latency.
For example, objects stored in an Amazon S3 bucket should be replicated in multiple regions to allow for local access from any region. Luckily, Amazon S3 supports cross-region replication, a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions.
This local access to resources also applies to databases. To support this scenario, AWS launched Cross-Region Read Replicas for Amazon RDS for MySQL, followed by MariaDB, PostgreSQL and Amazon Aurora.
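For readers who prefer to see the configuration, the following sketch builds a replication configuration in the shape accepted by boto3's `put_bucket_replication`. The bucket names, role ARN and rule ID are made-up placeholders, and the rule replicates every object, which may not be what you want. Versioning must already be enabled on both the source and destination buckets, and the IAM role must allow S3 to replicate on your behalf.

```python
def build_replication_config(role_arn, destination_bucket_arn,
                             rule_id="replicate-everything"):
    """Build a ReplicationConfiguration that copies all objects to one
    destination bucket. All argument values are caller-supplied."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


def enable_replication(source_bucket, config):
    """Apply the configuration. Requires boto3 and AWS credentials, so it
    is defined here but not called."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket, ReplicationConfiguration=config
    )


# Placeholder identifiers, for illustration only.
config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/example-replication-role",
    destination_bucket_arn="arn:aws:s3:::example-bucket-eu-west-1",
)
```

For an active-active setup you would typically configure a rule in each direction, so that objects written in either region end up in both.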
Separating the writes from the reads across multiple regions will improve your disaster recovery capabilities, but it will also let you scale read operations into a region that is closer to your users — and make it easier to migrate from one region to another.
The main restriction with this pattern is that all critical write traffic must go to one single master, in the region of origin.
Please remember that in order to work with cross-region read replicas, you must embrace eventual consistency as discussed above, because the replication of data is asynchronous.
Note: Using RDS, you can monitor this replication lag by using Amazon CloudWatch and raise an alert if it reaches a level that is unacceptably high for your application.
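A minimal sketch of such an alert, assuming boto3 and the `ReplicaLag` metric that RDS publishes to CloudWatch (in seconds) for read replicas. The replica identifier, threshold, evaluation window and SNS topic below are hypothetical values that you would replace with your own.

```python
def build_replica_lag_alarm(replica_id, threshold_seconds, sns_topic_arn):
    """Build the keyword arguments for CloudWatch put_metric_alarm.
    The alarm fires when average lag stays above the threshold for
    five consecutive one-minute periods (an arbitrary choice)."""
    return {
        "AlarmName": f"{replica_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": replica_id}
        ],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(region, alarm_params):
    """Create the alarm in the replica's region. Requires boto3 and AWS
    credentials, so it is defined here but not called."""
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**alarm_params)


# Placeholder identifiers, for illustration only.
alarm = build_replica_lag_alarm(
    replica_id="example-replica-eu",
    threshold_seconds=30,
    sns_topic_arn="arn:aws:sns:eu-west-1:123456789012:example-alerts",
)
```

What counts as "unacceptably high" depends on how stale a read your application can tolerate, so the threshold is a product decision as much as a technical one.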
To prevent having cross-region writes on the database, you can work with Amazon Aurora multi-master clusters.
Multi-master clusters improve Aurora’s already high availability. If one of your master instances fails, the other instances in the cluster will take over immediately, maintaining read and write availability through instance failures or even complete AZ failures, with zero application downtime.
Amazon DynamoDB global tables also allow you to build globally distributed applications.
A DynamoDB global table consists of multiple replica tables, one per region of choice, that DynamoDB treats as a single unit. Every replica has the same table name and the same primary key schema.
Applications can write data to any of the replica tables. DynamoDB automatically propagates the write to the other replica tables in the other AWS regions.
DynamoDB is the same database that is used at Amazon.com during Prime Days. For example, in 2017, Amazon DynamoDB requests from Alexa, the Amazon.com sites, and Amazon fulfilment centers totalled 3.34 trillion, peaking at 12.9 million per second.
5. DNS routing
In order to route traffic between regions, we need a Domain Name System (DNS) service that supports configurable routing policies. Amazon Route 53 offers several:
- Geolocation routing policy: Use when you want to route traffic based on the location of your users.
- Geoproximity routing policy: Use when you want to route traffic based on the location of your resources and, optionally, shift traffic from resources in one location to resources in another.
- Latency routing policy: Use when you have resources in multiple locations and you want to route traffic to the resource that provides the best latency.
- Multi-value answer routing policy: Use when you want Route 53 to respond to DNS queries with up to eight healthy records selected at random.
- Weighted routing policy: Use to route traffic to multiple resources in proportions that you specify.
Routing policies (Geoproximity and latency) with Route53.
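As an illustration of the latency routing policy, the sketch below builds one latency-based record per region in the shape accepted by boto3's `change_resource_record_sets`. The domain name, IP addresses (taken from documentation ranges) and record layout are hypothetical. A real setup would more likely use alias records pointing at load balancers and attach a health check to each record so that an unhealthy region is taken out of rotation.

```python
def latency_record(name, region, ip_address, health_check_id=None, ttl=60):
    """One latency-based A record for the endpoint in a given region."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{region}",  # must be unique per record
        "Region": region,                     # enables latency-based routing
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def build_change_batch(name, endpoints):
    """endpoints maps an AWS region name to that region's IP address."""
    return {
        "Comment": "Latency-based routing across regions",
        "Changes": [
            latency_record(name, region, ip)
            for region, ip in endpoints.items()
        ],
    }


def apply_change_batch(hosted_zone_id, change_batch):
    """Submit the changes. Requires boto3 and AWS credentials, so it is
    defined here but not called."""
    import boto3

    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# Placeholder values, for illustration only.
batch = build_change_batch(
    "api.example.com.",
    {"eu-west-1": "192.0.2.10", "us-east-1": "198.51.100.20"},
)
```

With records like these in place, Route 53 answers each DNS query with the endpoint in the region that offers the lowest latency for that user.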
In this post, we learned that in order to build a multi-region active-active architecture, all the services on the client request path must be deployed across multiple AWS Regions, that we must embrace asynchronous designs and architectures, and that we must build applications that are fully stateless.
Of course, we should leverage services like Amazon S3 or DynamoDB that are highly available and that benefit from the global network built by Amazon around the globe to have reliable replication of data. Finally, we also discussed the use of traffic flow in order to support different routing policies between AWS Regions.
Please do not hesitate to give feedback, share your own opinion or simply clap your hands. In the next part, I will go on and build a multi-region, active-active backend — but will do that serverless style :) Stay tuned!