The challenges of blue-green deployment with AWS Lambda and CloudFormation


By A Cloud Guru News

Jun 08, 2023 • 7 Minute Read


I’ve been thinking a lot about deploying Lambda functions with AWS CloudFormation. The addition of weighted aliases for Lambda and canary releases for API Gateway means that phased rollouts of code that look “in-place,” using the same endpoint, are now possible. These are usually called blue-green deployments.

What is blue-green deployment?

Blue-green deployment is a continuous deployment process that reduces downtime and risk by having two identical production environments, called blue and green. Blue-green deployments allow you to test the new application version before sending production traffic to it. If there is an issue with the newly deployed application version, you can roll back to the previous version faster than with in-place deployments.

Blue-green deployment challenges

One of the consequences of this approach is that your application has more than one version of a function’s code active at one time.
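
To make the "more than one version active at one time" point concrete, here is a minimal simulation of a weighted alias splitting invocations between two versions. It is only a sketch and not how Lambda routes traffic internally; the `pick_version` helper, the version labels, and the 10% weight are all hypothetical.

```python
import random


def pick_version(primary, additional_weights, rng):
    """Choose a version for one invocation.

    additional_weights maps a version label to the fraction of traffic it
    should receive; whatever is left over goes to the primary version.
    """
    roll = rng.random()
    cumulative = 0.0
    for version, weight in additional_weights.items():
        cumulative += weight
        if roll < cumulative:
            return version
    return primary


# Simulate 10,000 invocations with 10% of traffic shifted to version "2".
rng = random.Random(0)
counts = {"1": 0, "2": 0}
for _ in range(10_000):
    counts[pick_version("1", {"2": 0.1}, rng)] += 1
```

With a weight of 0.1, roughly one invocation in ten lands on version 2, so for the whole rollout both versions have to work against the same downstream resources.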

I think the more ephemeral and small the compute is, the more it looks like just another node in the infrastructure graph. But current deployment paradigm is that you push code into a infrastructure node. I’d rather it be more of a pull
— Ben Kehoe (@ben11kehoe) December 17, 2017

Fundamentally, versioned Lambda code doesn’t feel like it belongs in CloudFormation. In the end, the set of versions of a Lambda function is probably going to look like a replica of your repository history.

You only want the in-use code (your active branches) to be present in the infrastructure that CloudFormation manages for an application. So I got to thinking about how functions could be published outside of CloudFormation while somehow still being linked to the resources inside the template.

Like, I want continuous deployment to deploy every commit into S3 artifacts (not an actually into Lambda). Continuous delivery is then CFN updates pulling those artifacts into the infra graph. Feels like a clean split. cc @chrismunns
— Ben Kehoe (@ben11kehoe) December 17, 2017

The ever-growing list of AWS Lambda versions

Except … function configuration changes slowly, which feels like it maybe belongs in CloudFormation. The configuration specifies the IAM role, which should be in CloudFormation, and the environment variables may reference other CloudFormation resources.

However, the configuration is versioned with the code, so deploying code outside of CloudFormation would have to pull in values from a stack. The stack (i.e., the weighted aliases) would then also need to reference the Lambda version created by that deployed code.

So it weaves in and out of AWS CloudFormation. Not great.

I created a diagram below that depicts the challenge. It’s actually annoying to diagram concisely because it’s all about how it evolves over time. 😒

The ever-growing list of Lambda versions doesn’t seem like it belongs in CloudFormation, because the template then just grows and grows. The function configuration is versioned along with the code and would therefore be deployed outside CloudFormation, but it changes slowly. The function also references resources that live in CloudFormation. Both of these are clear indicators that the configuration should live within the template.
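
As a rough illustration of the "grows and grows" problem, the sketch below builds a template as a plain Python dict with one AWS::Lambda::Version resource per commit. The `template_for` helper, the logical IDs, and the commit hashes are hypothetical, and the function's own properties are omitted; the sketch only shows how the resource count tracks repository history, not how versions are actually published.

```python
def template_for(commits):
    """Return a template dict that keeps one Version resource per commit."""
    resources = {
        "Function": {
            "Type": "AWS::Lambda::Function",
            "Properties": {},  # code, role, and handler omitted for brevity
        }
    }
    for commit in commits:
        # Logical IDs must be alphanumeric, so short hex hashes work here.
        resources[f"Version{commit}"] = {
            "Type": "AWS::Lambda::Version",
            "Properties": {
                "FunctionName": {"Ref": "Function"},
                "Description": f"commit {commit}",
            },
        }
    return {"Resources": resources}


small = template_for(["a1b2c3"])
large = template_for(["a1b2c3", "d4e5f6", "0a9b8c"])
```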

Once I figure out the desired model — whatever it is — I’ll create custom resources to accomplish it — but I hope it would be something that could work for SAM in the future.

My current thoughts on this challenge:

AWS::Lambda::Placeholder

There would be a resource that represents the Lambda function existence: it creates the name (and only by necessity, deploys non-functional code as a placeholder). Unlike the existing AWS::Lambda::Function resource, it would not take any properties other than (optionally) FunctionName. I’ll call this resource type AWS::Lambda::Placeholder.
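
A minimal sketch of what such a resource might look like in a template, written as a Python dict. AWS::Lambda::Placeholder is the hypothetical type proposed above, not an existing CloudFormation resource, and the `placeholder` helper and the example name are made up for illustration.

```python
def placeholder(function_name=None):
    """Build the proposed Placeholder resource.

    The only property it accepts is an optional FunctionName; when the name
    is omitted, the assumption is that one would be generated.
    """
    properties = {}
    if function_name is not None:
        properties["FunctionName"] = function_name
    return {"Type": "AWS::Lambda::Placeholder", "Properties": properties}


unnamed = placeholder()
named = placeholder("orders-handler")
```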

AWS::Lambda::Deployment

There would be a resource type for deploying a version. It would take all the properties of AWS::Lambda::Function, with two changes:

FunctionName would be required; this is where you would put a reference to the Placeholder resource.
CodeSha256 would be available but optional, to allow for the prevention of race conditions (like the existing AWS::Lambda::Version resource).

I’ll call this resource type AWS::Lambda::Deployment. It would return the version ARN when ref’d in the template — and probably the version number as an attribute.
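
Here is a sketch of how the proposed resource might appear in a template, again as a Python dict. AWS::Lambda::Deployment is hypothetical; the `deployment` helper, the bucket and key names, and the logical IDs `Fn` and `FnRole` are assumptions made for the example. The Code, Role, and MemorySize properties mirror those of AWS::Lambda::Function.

```python
def deployment(placeholder_id, code, role, memory_size=128, code_sha256=None):
    """Build the proposed Deployment resource.

    FunctionName is always set and points at the Placeholder; CodeSha256 is
    only included when the caller supplies it.
    """
    properties = {
        "FunctionName": {"Ref": placeholder_id},
        "Code": code,
        "Role": role,
        "MemorySize": memory_size,
    }
    if code_sha256 is not None:
        properties["CodeSha256"] = code_sha256
    return {"Type": "AWS::Lambda::Deployment", "Properties": properties}


resource = deployment(
    "Fn",
    code={"S3Bucket": "example-artifacts", "S3Key": "commit-1.zip"},
    role={"Fn::GetAtt": ["FnRole", "Arn"]},
)
```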

Unlike AWS::Lambda::Version, AWS::Lambda::Deployment would be updatable for any field. It would cause a new version to be published. There would cease to be any reference in the stack to the version that it previously deployed, but that version would not actually be removed.
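
The update behavior can be sketched with a small in-memory model. `FakeLambda` and `Deployment` are hypothetical stand-ins, not real AWS classes; the point is only that an update publishes a new version while the old one stays behind, unreferenced.

```python
class FakeLambda:
    """In-memory stand-in for the Lambda service; versions are never deleted."""

    def __init__(self):
        self.versions = []  # (code_key, config) tuples, oldest first

    def publish(self, code_key, config):
        self.versions.append((code_key, dict(config)))
        return len(self.versions)  # version numbers start at 1


class Deployment:
    """Models the proposed resource: every update publishes a new version."""

    def __init__(self, service, code_key, config):
        self.service = service
        self.version = service.publish(code_key, config)

    def update(self, code_key, config):
        # The resource forgets the old version number; the service keeps it.
        self.version = self.service.publish(code_key, config)


service = FakeLambda()
deploy = Deployment(service, "commit-1.zip", {"MemorySize": 128})
deploy.update("commit-1.zip", {"MemorySize": 256})
```

After the update, the resource points at version 2, while version 1 still exists in the service with its original configuration.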

Multiple Deployment Resources

You could have multiple Deployment resources, one for each place in the template where you reference a version (each of the versions that has a connection in the diagram above).

AWS::Lambda::ExistingDeployment

It’d be useful to have a way to reference an existing version of a Lambda function, so that if a previous version needs to be referenced again, it can just get dropped in. Maybe AWS::Lambda::ExistingDeployment.
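
A sketch of what that could look like, assuming the resource takes nothing more than the Placeholder reference and a version number. The type is hypothetical, and so are the `existing_deployment` helper, the Version property name, and the logical ID `Fn`.

```python
def existing_deployment(placeholder_id, version):
    """Reference an already-published version instead of publishing a new one."""
    return {
        "Type": "AWS::Lambda::ExistingDeployment",
        "Properties": {
            "FunctionName": {"Ref": placeholder_id},
            "Version": str(version),
        },
    }


# Drop version 1 back into the stack, for example to roll back to it.
rollback_target = existing_deployment("Fn", 1)
```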

Steps of Evolution

Using the original diagram and annotating the AWS::Lambda::Deployment resources in blue, the evolution would look something like this:

Step 1: We’ve just deployed the very first version:

Step 2: Now suppose we’ve just updated the configuration to give the Lambda function more memory. This wouldn’t require us to use the weighted alias to roll out — we’d just update the Deployment resource.

The weighted alias resource would get updated with the new version reference. Function version v1 would no longer appear anywhere in the stack, but it would still exist.

Step 3: Now that we’ve got a code update, we want to roll it out using the weighted alias. We’d add a new Deployment resource to deploy the new code specifying the function configuration as well — even if it hasn’t changed — and point the weighted alias at it. We’d keep both Deployment resources around while we roll out since both are referenced in the stack.

Step 4: Once the rollout is complete, we can update the stack to remove the older Deployment resource and update the weighted alias to remove its reference to that Deployment.
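
Steps 1 through 4 can be sketched as four successive states of the same template, built here as Python dicts. The Placeholder and Deployment types and the `Version` attribute are the hypothetical ones proposed above; the helper names, logical IDs, bucket name, memory sizes, and the 10% weight are assumptions for illustration. The RoutingConfig shape follows the existing AWS::Lambda::Alias resource.

```python
def deployment(code_key, memory_size):
    """Hypothetical AWS::Lambda::Deployment for one code artifact."""
    return {
        "Type": "AWS::Lambda::Deployment",
        "Properties": {
            "FunctionName": {"Ref": "Fn"},
            "Code": {"S3Bucket": "example-artifacts", "S3Key": code_key},
            "MemorySize": memory_size,
        },
    }


def alias(primary_id, canary_id=None, canary_weight=0.0):
    """Weighted alias pointing at one or two Deployment resources."""
    properties = {
        "FunctionName": {"Ref": "Fn"},
        "Name": "live",
        "FunctionVersion": {"Fn::GetAtt": [primary_id, "Version"]},
    }
    if canary_id is not None:
        properties["RoutingConfig"] = {
            "AdditionalVersionWeights": [
                {
                    "FunctionVersion": {"Fn::GetAtt": [canary_id, "Version"]},
                    "FunctionWeight": canary_weight,
                }
            ]
        }
    return {"Type": "AWS::Lambda::Alias", "Properties": properties}


def stack(deployments, live_alias):
    """Assemble a template from the Placeholder, Deployments, and alias."""
    resources = {"Fn": {"Type": "AWS::Lambda::Placeholder", "Properties": {}}}
    resources.update(deployments)
    resources["LiveAlias"] = live_alias
    return {"Resources": resources}


# Step 1: first version deployed.
step1 = stack({"DeployA": deployment("commit-1.zip", 128)}, alias("DeployA"))
# Step 2: configuration change only; the same Deployment resource is updated.
step2 = stack({"DeployA": deployment("commit-1.zip", 256)}, alias("DeployA"))
# Step 3: new code arrives; both Deployments stay while traffic shifts.
step3 = stack(
    {
        "DeployA": deployment("commit-1.zip", 256),
        "DeployB": deployment("commit-2.zip", 256),
    },
    alias("DeployA", "DeployB", 0.1),
)
# Step 4: rollout finished; the older Deployment leaves the stack.
step4 = stack({"DeployB": deployment("commit-2.zip", 256)}, alias("DeployB"))
```

The template never holds more than the Deployments currently referenced by the alias, which is the property the proposal is after.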

Step 5: Using this approach, here’s the updated diagram indicating the usage of Deployment resources.

For more concise templates, one option would be to provide a separate Configuration resource. It wouldn’t actually do anything itself, but it could be referenced by multiple Deployment resources.
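
A sketch of that idea, assuming a type name of AWS::Lambda::Configuration and a `Configuration` property on each Deployment. Both are made up for this example, as are the logical IDs and artifact names.

```python
# Hypothetical shared configuration; it holds settings but deploys nothing.
shared_config = {
    "Type": "AWS::Lambda::Configuration",
    "Properties": {
        "MemorySize": 256,
        "Timeout": 30,
        "Role": {"Fn::GetAtt": ["FnRole", "Arn"]},
    },
}


def deployment(code_key):
    """Deployment that carries only its code and a pointer to the settings."""
    return {
        "Type": "AWS::Lambda::Deployment",
        "Properties": {
            "FunctionName": {"Ref": "Fn"},
            "Code": {"S3Bucket": "example-artifacts", "S3Key": code_key},
            "Configuration": {"Ref": "SharedConfig"},
        },
    }


resources = {
    "SharedConfig": shared_config,
    "DeployA": deployment("commit-1.zip"),
    "DeployB": deployment("commit-2.zip"),
}
```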

What I am modeling is based on a notional system where your infrastructure graph (such as a CloudFormation template) doesn’t have functions as a first-class concept. Instead, imagine that you are telling your event sources “execute this code artifact (from S3), with this IAM role and this configuration.”

In your infrastructure graph, you can create an edge between two nodes: the event source and the function code. The edge, a first-class concept, contains the configuration for invoking the code.
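
One way to picture this model is as a small data structure in which the edge carries the configuration and there is no function node at all. Every name below (`EventSource`, `CodeArtifact`, `InvocationEdge`, and the example values) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSource:
    """A node that emits events, such as a queue or an API route."""
    name: str


@dataclass(frozen=True)
class CodeArtifact:
    """A node that is just code sitting in S3."""
    bucket: str
    key: str


@dataclass(frozen=True)
class InvocationEdge:
    """The edge: 'this event invokes this artifact, with this configuration'."""
    source: EventSource
    artifact: CodeArtifact
    role_arn: str
    memory_mb: int = 128


orders = EventSource("orders-queue")
old_edge = InvocationEdge(
    orders, CodeArtifact("example-artifacts", "commit-1.zip"), "example-role-arn"
)
# Shipping new code means pointing the same source at a new artifact.
new_edge = InvocationEdge(
    orders, CodeArtifact("example-artifacts", "commit-2.zip"), "example-role-arn"
)
```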

I think what I’m converging on is based on a theoretical system where you wouldn’t actually “deploy” Lambda functions. You would just say “this event invokes this code artifact”
— Ben Kehoe (@ben11kehoe) December 17, 2017

This isn’t necessarily the future I want, but I think it’s directionally useful to think about ephemeral compute as the place where your business logic fits into the system to glue it together.
