How Amazon CloudFront Saved My Azure Project

In September 2024, Edgio, the main CDN provider we used for one of my large enterprise projects, filed for bankruptcy. Edgio was natively integrated into Azure, allowing you to use it without leaving the Azure ecosystem. It also featured a powerful rules engine and didn't require any upfront or monthly fees, only charging for actual usage. If your project was on Azure and you didn't want to purchase a third-party tool like Cloudflare, it was the absolute best choice.

The sunset was scheduled for the last quarter of 2025, so there seemed to be plenty of time ahead. Microsoft's announcement also included a note hinting that they could only try to keep the lights on until then, with no guarantees.

As is often the case with technology, this didn't turn out as expected.

In this article, I'll explain how we managed to stay online despite the tight sunset schedule.

Brief Overview of the Situation

First things first: the project I'm discussing uses Azure as the main cloud provider. We hadn't used AWS before, and only a few parts of GCP were in use. This makes sense because managing multiple clouds adds extra costs. Moreover, Azure is a great cloud platform, and Microsoft does many things well.

One of these advantages was the seamless integration with Edgio as a CDN solution, which you could use right away without any upfront costs. It was much more powerful than Azure Frontdoor.

Edgio's main benefit was its powerful rules engine, which allowed for extensive customization. We made heavy use of this engine, as we run a lot of micro frontends (more precisely: single-page apps) and other technical helpers behind a single domain.

Edgio's Rules Engine with Azure Blob Storage behind.

This complicates matters, as we need to handle redirects and rewrites within our CDN.

All these rules rely heavily on regex, not just for the conditions but also for the redirect and rewrite actions. We match groups and use them in the following actions. Many of the rules even include negative lookaheads, which isn't common in a typical CDN rule set. Adding to the challenge, most of the rules were written by former team members. Since few changes were needed in the past, the current team was unfamiliar with the existing rules.
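
To give a feeling for what these rules looked like, here is a hypothetical example (not one of our actual rules) expressed in JavaScript rather than Edgio's XML: the condition uses a negative lookahead, and the captured group is reused in the redirect target.

    // Hypothetical rule: redirect /shop/... to /store/..., but leave /shop/api/... alone.
    // The negative lookahead (?!api/) excludes the API paths, and the captured group
    // is reused in the action - exactly the pattern most of our rules relied on.
    var rule = /^\/shop\/(?!api\/)(.+)$/;

    var match = '/shop/products/123'.match(rule);
    if (match) {
      var target = '/store/' + match[1]; // -> '/store/products/123'
    }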

All the rules had to be set up in a cumbersome web console with many dropdowns, and there was no way to test these rules beforehand. There was also no support for common Infrastructure-as-Code tools like Pulumi or Terraform, so changes had to be manually transferred to each stage. At the end of a new configuration, Edgio would generate an XML file that exactly matched the rules configured in the dropdowns. At least this was something that could be easily recorded for history.

In summary, even though the developer experience wasn't ideal, the service performed its job exceptionally well. In the end, that's what matters most.

The Crisis

After reading the initial announcement and seeing the reports about a potential acquisition by Akamai, we were quite confident that we had plenty of time to migrate.

The initial migration plan.

Right in the middle of my vacation and just a few days before the Christmas holidays, Microsoft announced that Edgio would not shut down in Q4 2025, but on January 15th, 2025. So our timeline changed from “more than 9 months” to “less than 4 weeks” at the worst possible time of the year.

The "less than 4 weeks" didn't consider that in Germany (where the project and most of its employees and suppliers are based), people typically don't work between Christmas and New Year. It's also common to take the first week of the year off.

The updated migration plan.

This means the realistic timeline is less than 2 weeks or less than 10 working days. For replacing a major tool in a large project with many unknowns and no evaluated migration strategy, this isn't just a tight schedule—it's a perfect storm.

Decision to Migrate

Microsoft sent an email saying that all Edgio-based deployments will automatically join a "best-effort migration" to Azure Frontdoor "between the 7th and 14th of January." Since we were already using Azure Frontdoor elsewhere, I was worried that this "best effort" approach wouldn't work well for us. Our rule set, with over 90 conditions and more than 70 rewrite rules, was quite complex.

I doubted it was feasible to implement this with Frontdoor.

Evaluating Options

When I returned from vacation on December 30th, I immediately began looking into this, even though my regular work was set to start on January 2nd. I knew we couldn't afford to waste any days.

Being a rational person, I considered our options:

  1. Wait for Microsoft to migrate our Edgio configuration to Frontdoor and hope for the best.

  2. Immediately search for a good alternative and rebuild everything there.

As mentioned earlier, I had almost no trust in option 1 because of Frontdoor's limited features and my extensive experience with Microsoft support, which is incredibly frustrating, even with an enterprise support plan.

Besides that, an improper in-place migration of existing DNS entries could essentially take us offline even before January 15th.

Missing Feature Parity between Edgio and Frontdoor

I'm not an expert in CDNs, and content delivery and caching are complex topics on their own. Combined with the unknowns in the current rules and the tight schedule, this meant we definitely couldn't come up with something nobody had ever used before.

This meant my first task was to see if I could rebuild the rules in Frontdoor myself. It didn't take long to realize that Frontdoor couldn't meet some essential requirements. The first rule I tried to rebuild immediately failed with an error message: "negative lookahead regexes are not supported."

I thought, "I could probably rebuild this without using a negative lookahead. It won't be perfect, but it's possible." However, the challenges didn't stop there. I couldn't easily extract matching groups from my conditions into the rewrite actions, nor could I use any regex matches in my actions. I couldn't find any documentation on whether this was supported, and there was nothing on Stack Overflow. GPT and Sonnet were giving me incorrect, halluzinated answers, that weren’t working, so I hit a dead end within the first few hours.

Why Amazon CloudFront Was Chosen

I have a long history with AWS services (who would have guessed?), so my first thought was: "Rebuilding these rules with simple JavaScript that runs on the edge with CloudFront will be very easy and fast."

I was confident AWS was already in use elsewhere in the company, making it a practical solution. Choosing a new third-party tool that hadn't been purchased yet wasn't an option, as the process would take more time than we had.

Migration Process

In my mind, the goal was clear from the start:

  1. All redirects could be easily managed with a CloudFront viewer request function. This function is triggered whenever a CloudFront URL is accessed. We can check the request URL and instantly send a redirect if it meets our conditions (a rough sketch of this and the next step follows after this list).

  2. URL rewrites will happen in a Lambda@Edge function at the origin request step of the process. This function is invoked when the origin is called and CloudFront doesn’t already have something cached.

  3. For debugging purposes, an origin response Lambda@Edge function can be used. This is great for tracking our original request URL and the URL that was actually called at the origin.

  4. In the viewer response CloudFront function, we can define our caching behavior and handle everything else we might need to do.
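
To make this more concrete, here is a minimal sketch of steps 1 and 2. The paths and rules below are made up purely for illustration; only the handler shapes follow the documented CloudFront Functions and Lambda@Edge event formats.

    // Step 1 - viewer request CloudFront function: answer redirects directly at the edge.
    // (Illustrative rule only; the real rule set was far larger.)
    function handler(event) {
      var request = event.request;

      if (request.uri.indexOf('/old-app/') === 0) {
        return {
          statusCode: 301,
          statusDescription: 'Moved Permanently',
          headers: {
            location: { value: '/new-app/' + request.uri.slice('/old-app/'.length) }
          }
        };
      }

      return request; // no redirect matched, continue towards cache and origin
    }

    // Step 2 - origin request Lambda@Edge function: rewrite the URI before it reaches the origin.
    // This only runs on cache misses, so it is triggered far less often than the viewer request function.
    exports.handler = async (event) => {
      const request = event.Records[0].cf.request;

      // Hypothetical rewrite: map a micro frontend path onto its folder at the origin.
      if (request.uri.startsWith('/checkout/')) {
        request.uri = '/frontends/checkout/index.html';
      }

      return request;
    };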

CloudFront Functions are faster and cheaper than Lambda@Edge, but they have a limited feature set and more restrictions. For example:

  • The maximum code file size is 10KB.

  • The maximum execution time is 1ms.

  • Only ES5.1 is supported.

Those restrictions would have been acceptable for all our functions, but unfortunately, AWS only permits the use of CloudFront functions at the viewer stages of a request. However, since most responses are served from the cache, the origin request functions won't be triggered very often.

The picture above illustrates what the final solution should look like when everything works together. By using CloudFront functions for viewer requests and responses, we can significantly save money and improve performance.

Step-by-Step Migration Process

The migration involved several steps:

  1. Reverse engineer all rules and convert them to JavaScript.

  2. Set up the necessary AWS infrastructure and configure everything correctly.

  3. Expand the existing tests to thoroughly cover every rule and scenario.

  4. Deploy the new solution without changing the DNS entry to ensure it works. By adding a single line to /etc/hosts, we can point an existing DNS name at a specific IP (one from our CloudFront distribution); see the example after this list. This allowed us to safely test the new solution in our heavily integrated staging environment without affecting other teams.

  5. Switch the DNS record to CloudFront in the staging environment.

  6. Run smoke, end-to-end, and acceptance tests.

  7. Go live by switching the production DNS records.
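
For step 4, the override is a single line in /etc/hosts on a test machine. Hostname and IP below are placeholders; the real IP was one we resolved from our distribution's *.cloudfront.net domain at that moment.

    203.0.113.42    www.staging.example.com

Since CloudFront IPs can change at any time, this only works for short-lived manual testing, which was exactly what we needed.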

With a proper timeline, this wouldn't be a big issue, but we had almost no time.

Faced Challenges

Unsurprisingly, we faced many challenges along the way.

Reverse Engineering

Even with AI helpers, reverse engineering existing solutions can be quite challenging. This is especially true for something that is entirely XML-based and has little documentation available online.

The first major challenge was to understand every rule and, if any were missing, add proper tests to ensure the new solution works as needed. Since the CDN serves multiple teams and has many custom rules, this was not an easy task.

With nearly 100 conditions and over 60 rewrite actions, it was difficult to convert everything into clear and functional JavaScript. I also aimed to make everything as simple as possible.

CloudFront Limitations

As mentioned earlier, CloudFront functions have limitations, such as the maximum code size and execution time. Initially, when I rewrote our redirect rules, they couldn't fit inside a CloudFront function because the code size was too large. Additionally, due to the heavy use of regex, the maximum execution time would also have been exceeded. On top of that, I had used quite a lot of ES6+ features.

Because of this, I initially started using Lambda@Edge for the viewer request as well. After a few calculations, I concluded that this would have a dramatic impact on our bill, as Lambda@Edge is much more expensive (a rough comparison follows after this list):

  • Instead of $0.10 per 1M invocations with CloudFront Functions, we'd be charged $0.60 per 1M requests with Lambda@Edge.

  • The free tier only covers 1M requests per month instead of the 2M included with CloudFront Functions.

  • Besides the per-request charges, we'd also pay for execution time based on GB-seconds.
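
As a rough, purely illustrative calculation (our real traffic numbers are different): at 50 million viewer requests per month, CloudFront Functions would cost about 50 × $0.10 = $5, while Lambda@Edge would cost about 50 × $0.60 = $30, plus the GB-second duration charges on top, and with only 1M free requests instead of 2M.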

This led me to realize that I needed to make our viewer request function compatible with CloudFront functions:

  1. I needed to transpile the function to ES5.1 with Babel in the build process.

  2. I needed to minify everything to get below the 10KB size limit.

  3. I needed to get rid of the slow lookahead regexes to make the function faster.

Except for the last task, this was straightforward. Rewriting the rules without too many (or too slow) regexes wasn't.
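
To illustrate the last point with a made-up rule (not one of ours): a negative lookahead can usually be replaced by a cheap prefix check plus plain string operations, which is both ES5.1-friendly and much faster.

    // Before: a single regex with a negative lookahead (slow in our measurements).
    var before = /^\/shop\/(?!api\/)(.+)$/;

    // After: a cheap prefix check first, then plain string slicing - no lookahead needed.
    function rewriteShopPath(uri) {
      if (uri.indexOf('/shop/') !== 0) return null;     // not a shop URL at all
      if (uri.indexOf('/shop/api/') === 0) return null; // the case the lookahead used to exclude
      return '/store/' + uri.slice('/shop/'.length);    // reuse the rest of the path
    }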

Fast computation results for our request viewer CloudFront function.

But in the end, with some support from my colleagues, we were able to find a solution that took our average processing time down from 13ms to well under 1ms.

💡
You can test CloudFront functions directly in the web console. It allows you to define the request object, including the URL, headers, and cookies (essentially anything you can think of), and submit it to your function. It also reports a compute utilization score from 0 to 100, where a lower score is better. If the function often scores over 70, CloudFront might throttle your functions. In our rewrite, our viewer request function improved from an initial score of 95-99 to a solid 3-5.
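
For reference, a viewer request test event looks roughly like this (simplified, with placeholder values):

    // Simplified example of a viewer-request test event for the console's test tab.
    var testEvent = {
      version: '1.0',
      context: { eventType: 'viewer-request' },
      viewer: { ip: '198.51.100.1' },
      request: {
        method: 'GET',
        uri: '/shop/products/123',
        querystring: {},
        headers: { host: { value: 'www.example.com' } },
        cookies: {}
      }
    };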

Another limitation: cache invalidations are not ideal for us, because we heavily rewrite the request URL in the origin request; it would be better if we could invalidate based on the origin's paths. However, this is not something we can change, and it's not a significant downside.

AWS Service Quotas on New Accounts

When you create a new AWS account, many service quotas start much lower than they used to. These low quotas can be increased via AWS Support, and some increases are even approved automatically.

One limit that is quite frustrating is the Concurrency Limit for AWS Lambda. For new accounts, it is set to 10 instead of 1,000. This means if there are 10 ongoing executions, the next one will be throttled. For Lambda@Edge, this will always result in errors for the client.

Since we're not an enterprise customer at AWS, we don't have a paid support plan. This made me worry that it might take a long time to increase these limits. Fortunately, I knew that AWS Support on platforms like X/Twitter is very quick and effective. I decided to explain our situation in hopes of getting our limits increased right away.

I received a response almost immediately, and after a private message exchange clearly describing the situation, our limits were increased in all regions.

Business Requirements

We strive for a verticalized setup for our applications and products, meaning a team owns its stack from top to bottom. Excluded from that are the DNS records, which are considered too business-critical, as well as the certificate infrastructure.

We couldn't use a managed Amazon-issued ACM certificate, so we had to create a CSR and import the issued certificate into ACM. In an enterprise company, this takes extra time because you need to find the right people and start the necessary processes.
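
In practice, this boiled down to something like the following. File names and the domain are placeholders; the issued certificate has to be imported in us-east-1, because that is where CloudFront expects ACM certificates.

    # Create a key and a CSR to hand over to the certificate team / corporate CA
    openssl req -new -newkey rsa:2048 -nodes \
      -keyout www.example.com.key -out www.example.com.csr

    # Once the certificate is issued, import it into ACM in us-east-1
    aws acm import-certificate --region us-east-1 \
      --certificate fileb://certificate.pem \
      --private-key fileb://www.example.com.key \
      --certificate-chain fileb://chain.pem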

We also needed to set up new AWS accounts integrated with our Azure-based SSO, allowing us to use them in a compliant way. Fortunately, this process went very quickly.

Time Constraints

Without time pressures, this migration would have been enjoyable, and this post probably wouldn't have been written.

However, with a tight schedule and the risk of significant business impact and potential damage to the company's reputation, it wasn't fun at all.

We encountered many major and minor issues in our staging environment where things didn't work as expected, so we needed to do a lot of fine-tuning and update some functions.

Benefits of The New Solution

I am pleased with our new CDN provider due to its many benefits:

  • We can now deploy using Infrastructure as Code and have everything managed as code.

  • It is very affordable for our traffic needs.

  • We can finally support multiple origins (any HTTP or native AWS).

  • We have full control at every step of the request.

  • Instead of hard-to-read XML, we have simple, readable JavaScript rules.

  • We can test our functions in advance.

  • We can have zero-cost testing stages.

  • Deployment time decreased from 20-40 minutes to 2-5 minutes without version flickering.

  • AWS support is much better than Azure's.

Everything has its downsides: cache invalidation isn't ideal for us because CloudFront caches based on the request URL, while Edgio could cache on the origin URL. Additionally, cache invalidation is very costly with CloudFront. We are charged $0.005 per invalidation path (though a wildcard counts as a single path, even if it invalidates thousands of files).
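
For example, a single wildcard invalidation like the following (with a placeholder distribution ID) counts as one path, no matter how many objects it actually invalidates:

    aws cloudfront create-invalidation \
      --distribution-id E2EXAMPLE12345 \
      --paths "/frontends/checkout/*"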

But that's the only downside at the moment.

A Few Words About The “Best Effort Migration”

Microsoft migrated our Edgio configuration to Frontdoor two days before the shutdown, and it went as expected.

  • About 30% of the rules were "migrated."

  • 70% of the rules didn't appear in the new Frontdoor Rule Set and were simply ignored.

  • The migrated rules were copied as-is, meaning they included Edgio syntax not supported in Frontdoor (e.g., $1 or $2 used in rewrite actions to refer to regex groups in the matcher).

In summary, nothing worked anymore. Since we switched to CloudFront just before this migration, we didn't feel the impact and stayed online without disruptions. However, it was a very close call.

Important: I can't blame Microsoft for this situation. Edgio was very popular among Azure customers because of its great pricing and features, so I can only imagine how many migrations they had to handle. Also, due to the lack of feature parity, it simply wasn't possible to migrate complex configurations.

Conclusion & Lessons Learned

I believe the lessons learned here are straightforward and easy to summarize.

  • If something is labeled "best effort," don't expect more than that.

  • Don't underestimate an upcoming shutdown, especially if it's already marked as "trying to keep the lights on."

  • Have viable alternatives ready in your mind.

In our recent situation, considering the circumstances, everything went better than I could have imagined. Still, I wouldn't want to go through this again in the near future, as it was really nerve-wracking.

Obviously, the company wasn't initially aiming for an AWS solution in an ecosystem that was almost entirely Azure-based. However, everyone worked together seamlessly to make it happen just in time.

Customers didn't notice anything during the migration process, and most employees didn't either.