How Ahrefs Gets Billion-Dollar Infrastructure With a 90% Discount

Efim Mirochnik
Published in Ahrefs
10 min read · May 8, 2024


The buzz around our article on saving $400 million over three years at our Singapore data center was hard to ignore. That article laid out how 850 identical servers stacked up against their AWS counterparts based on a single month’s spending. Now, let’s zoom out and consider what the total cost would have been if Ahrefs had shifted its entire on-premise infrastructure to the cloud from the start of its colocation journey.

Historically, Ahrefs started with colocation, then moved to using hosting providers like OVH, SoftLayer, Hetzner and others. In 2017, the company returned to colocation and has since expanded this approach. This review covers actual colocation-related expenses over a six-year period from July 1, 2017, to June 30, 2023. We have compiled all the expenditures from our data centers, grouped them by month, and created a graph of our historical monthly spending.

Historical Monthly Ahrefs Spending

The light blue bars on the graph represent colocation spending: regular monthly expenses, such as committed power, consumed electricity, and internet service fees, as well as one-time expenditures such as racks, power distribution systems, and biometric access. The dark blue bars represent purchases of network and server hardware, along with related licenses, support, and professional services for installation and cabling.

In early 2023, we made our largest single-month investment in infrastructure, totaling about $19 million. Below, you can see some of the servers ordered at that time.

Ahrefs servers

Now, let’s zoom out and look at our total on-premise spend since 2017. Naturally, the sharp peaks in the monthly expenditure bars align with noticeable increments on the cumulative spending graph.
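A minimal sketch of the bookkeeping behind these two graphs, assuming each invoice line is a (month, category, amount) record. The category labels "colocation" and "hardware" are hypothetical names for the light and dark blue bars, and the sample amounts are made up for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical invoice lines: (month "YYYY-MM", category, amount in USD).
# "colocation": power, electricity, internet, racks, PDUs, access control.
# "hardware":   servers, network gear, licenses, support, installation.
invoices = [
    ("2017-07", "colocation", 120_000),
    ("2017-07", "hardware", 900_000),
    ("2017-08", "colocation", 125_000),
    ("2017-09", "colocation", 130_000),
    ("2017-09", "hardware", 400_000),
]

def monthly_totals(lines):
    """Sum amounts per month and per category (the stacked monthly bars)."""
    totals = defaultdict(lambda: defaultdict(float))
    for month, category, amount in lines:
        totals[month][category] += amount
    # "YYYY-MM" strings sort chronologically.
    return {m: dict(c) for m, c in sorted(totals.items())}

def cumulative_spend(monthly):
    """Running total across months (the accumulated-spending curve)."""
    months = list(monthly)
    per_month = [sum(monthly[m].values()) for m in months]
    return dict(zip(months, accumulate(per_month)))

monthly = monthly_totals(invoices)
print(cumulative_spend(monthly))
```

A month with a large hardware purchase shows up as a tall dark blue bar and, at the same time, as a visible jump in the running total.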

Historical Monthly and Accumulated Ahrefs Spending

Overall, Ahrefs has spent $122 million to support its on-premise infrastructure since 2017. This represents a major cost center for the company.

Now, let’s compare this with the cost of AWS equivalents for the servers and storage Ahrefs used each month. For EC2, we considered two pricing options:

  • AWS On-Demand: This option offers the greatest flexibility with a basic, non-discounted hourly rate.
  • AWS 3-Year Reserved Instance Paid All-Upfront: This is the most cost-effective choice, requiring a three-year commitment and payment upfront.
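To make the difference between the two options concrete, here is a small sketch for a single instance. It assumes a flat hourly rate and 8,760 hours per year; the rate and upfront price are made-up numbers chosen to produce a 60% discount, not actual AWS quotes.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, ignoring leap years

def on_demand_cost(hourly_rate, years):
    """Pay as you go: hourly rate times hours run, no commitment."""
    return hourly_rate * HOURS_PER_YEAR * years

def effective_ri_discount(hourly_rate, upfront_3yr):
    """Discount of a 3-year all-upfront reservation compared with running
    the same instance on demand for the full three years."""
    return 1 - upfront_3yr / on_demand_cost(hourly_rate, 3)

# Hypothetical prices for one instance, in USD.
rate = 10.0          # on-demand price per hour
upfront = 105_120.0  # one-time payment covering three years
print(f"{effective_ri_discount(rate, upfront):.0%}")
```

The discount only materializes if the instance actually runs for the whole term; the upfront payment is owed either way.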

For more detailed information on how we selected the equivalent EC2 instances for our servers, the EBS for storage, and the underlying assumptions, please refer to the Appendix at the end of the article.

To show the full comparison, we need to zoom out by a factor of ten to fit the equivalent AWS spending.

Accumulated Ahrefs and Equivalent AWS spending, $M

Shockingly, neither AWS option comes in under $1 billion, while Ahrefs kept its costs about 10 times lower. Imagine the staggering possibilities with that spare $1 billion instead of drowning in cloud expenses.

  • Could a company replicate Facebook’s shrewd acquisition of Instagram for $1 billion?
  • Or perhaps venture into the aviation industry by purchasing Air New Zealand, valued at $1.07 billion?
  • Maybe even construct a record-breaking marvel like the Burj Khalifa for $1.5 billion?
  • Or, more practically, why not buy a 1GW nuclear-powered data center for just $650 million, as Amazon did, and turn a profit by leasing or reselling it to other cloud providers?

The financial disparity is not just alarming — it’s a wake-up call.

If Ahrefs had opted for a trendy cloud solution in 2017 instead of pursuing colocation, the company might not exist today, let alone be profitable. At best, we would be constantly stressed, struggling to cut cloud costs and spending considerable time and effort on cloud-specific optimizations and strategies. However, as the graph above shows, the two ends of the pricing range, from the priciest on-demand rates to “up to 60% discounted” three-year reserved instances paid upfront, don’t differ significantly in the long run, at least for Ahrefs’ infrastructure. Perhaps the reserved-instance discounts can’t show their full effect because we regularly add servers and use hundreds of petabytes of fast SSD storage?
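One way to reason about that question, as a back-of-the-envelope sketch under two simplifying assumptions: the reservation discount applies only to the EC2 part of the bill, while EBS is billed monthly at the regular rate; and each server’s reservation is bought all-upfront when it arrives. The 60% discount and the 40/60 EC2/EBS split are hypothetical inputs.

```python
def blended_discount(ri_discount, ec2_share):
    """Overall saving when only the EC2 share of the bill is discounted;
    EBS volumes keep their regular monthly price."""
    return ri_discount * ec2_share

def breakeven_month(ri_discount, term_months=36):
    """Month at which cumulative on-demand spend for one instance first
    equals the all-upfront payment for the same instance."""
    return term_months * (1 - ri_discount)

# Hypothetical mix: 60% reservation discount, EC2 is 40% of the bill.
print(f"{blended_discount(0.60, 0.40):.0%}")  # overall saving
print(breakeven_month(0.60))                  # months until upfront pays off
```

Under these assumptions, storage-heavy servers dilute the headline discount, and in a fleet that keeps growing, many servers are always inside their first break-even window, where the upfront payment still exceeds what on demand would have cost so far.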

But let’s find out. What if we keep our infrastructure as it is and look into the future? Let’s assume we don’t add or remove any components and keep paying our monthly expenses as they were on June 30, 2023. To see the full scope, we’ll need to zoom out further.

Past and Forecasted Ahrefs and AWS Spending

The dashed lines representing AWS on-demand and Ahrefs’ own infrastructure continue straight, as expected. However, the dashed yellow line for reserved instances has multiple steps. Even though we add nothing new, we still need to renew our EC2 reserved instances and pay the full upfront amount every three years, which creates those repeating steps.
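The staircase shape can be reproduced with a few lines. This is a simplified sketch for a fleet that never changes, using hypothetical prices ($100 per month on demand, 60% off when reserved); it models EC2 only and leaves out storage.

```python
def cumulative_costs(months, monthly_on_demand, upfront_3yr, term=36):
    """Cumulative spend for a frozen fleet under two pricing models.

    On demand accrues the same amount every month (a straight line).
    A 3-year all-upfront reservation is paid in full when it starts and
    again at every renewal, so its cumulative curve is a staircase.
    Index m holds the total paid through the end of month m (0-based).
    """
    on_demand, reserved = [], []
    for m in range(months):
        on_demand.append(monthly_on_demand * (m + 1))
        terms_started = m // term + 1  # initial purchase plus renewals so far
        reserved.append(upfront_3yr * terms_started)
    return on_demand, reserved

# Hypothetical fleet: $100/month on demand; 36 * 100 * 0.4 = $1,440 upfront.
od, ri = cumulative_costs(months=73, monthly_on_demand=100, upfront_3yr=1_440)
for m in (0, 35, 36, 72):
    print(m, od[m], ri[m])
```

Each renewal adds one full upfront payment at months 36, 72, and so on, while the on-demand line keeps rising at a constant slope.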

So, if we hypothetically freeze our infrastructure for the next three years, it would cost Ahrefs an additional $24 million — a significant sum for us. However, maintaining the same setup in AWS would cost $1.2 billion with the cheapest reserved instances option. Opting for the on-demand services would escalate costs by $2 billion over the same period. Just imagine the revenue that a company needs to cover these massive cloud bills and still turn a profit!

You just didn’t optimize it in a cloud-native way!

Some have advocated shifting to a serverless architecture, embracing the widely promoted “Pay For What You Use” principle. This approach can work well for resources that are used rarely or carry a light load. However, the Amazon Prime Video team showed that by moving a heavily loaded application from AWS serverless components to EC2 and ECS, they were able to cut its costs by 90%.

For Ahrefs, the idea of switching to serverless doesn’t hold water. The 850 servers we discussed previously utilize 86–92% of the total power available in their data hall each month, showcasing high utilization. Given that these servers are approximately 1/10th the cost of their AWS EC2+EBS counterparts, and drawing on Prime Video’s drastic cost savings, it’s clear that Ahrefs’ infrastructure costs could skyrocket 100 times or more in a serverless setup.
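The “100 times” estimate chains two ratios. A back-of-the-envelope sketch, assuming Prime Video’s 90% saving would transfer to our workloads, which is a strong assumption since it was measured on a different application:

```python
def multiplier_from_savings(savings):
    """If moving a workload from A to B cut its cost by `savings`
    (0.9 = 90%), then A cost 1 / (1 - savings) times as much as B."""
    return 1 / (1 - savings)

onprem_to_ec2 = 10                                 # EC2+EBS ~10x our own servers
ec2_to_serverless = multiplier_from_savings(0.90)  # Prime Video's 90% cut, reversed
print(f"~{onprem_to_ec2 * ec2_to_serverless:.0f}x")
```

A 90% saving in one direction means a 10x cost in the other, and the two 10x factors multiply.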

Another proposed optimization was spot instances. AWS can reclaim them at short notice, which makes this strategy unsuitable for us: our servers run under consistently high load and need stable, continuous instances, not intermittent ones.

The final suggestion was to negotiate steep volume discounts with cloud vendors. For these discounts to make financial sense, they would need to be at least 90%, bringing costs down to just 10% of the original expenditure. You may try… Hopefully, this article will help you make a stronger case.
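The 90% threshold follows from the ratio of the two bills. A minimal sketch using the six-year totals quoted in this article ($122 million on-prem versus about $1.1 billion for the cheapest AWS option):

```python
def break_even_discount(onprem_cost, cloud_list_cost):
    """Discount off the cloud list price at which cloud spend would
    equal on-prem spend for the same capacity."""
    return 1 - onprem_cost / cloud_list_cost

# Six-year totals from this article, in millions of USD.
print(f"{break_even_discount(122, 1100):.1%}")
```

The result, roughly 89%, is only the point where the cloud ties with colocation; any smaller discount still leaves the cloud more expensive.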

What about people?!

The previous article received many comments, which can be summarized as follows:

People costs are not included in your calculations! You need many more people with various skills to support your failing hardware! Without them, your math and conclusions are totally wrong!

We didn’t include personnel costs in our calculations because they don’t significantly impact the overall financial outcome. All of Ahrefs’ servers, spread across various colocation data centers, hosting providers, and AWS cloud, are effectively managed by a compact team of eleven people. This group comprises infrastructure and SRE engineers, data center technicians, and the author of this article. These skilled professionals have been managing, maintaining, upgrading, and troubleshooting our servers, networks, operating systems, and multiple software systems across different environments for many years.

In June 2023, we had the largest number of servers of the reviewed period. That month, our team addressed 94 hardware issues across 3300 servers, an average of just over four issues per working day (about three per calendar day), demonstrating their capability to manage substantial challenges.
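The per-day figure depends on the denominator. A small check with Python’s standard `calendar` module, using only the numbers quoted above (94 issues, 3300 servers, June 2023) and treating Monday to Friday as working days, which is our assumption:

```python
import calendar
from datetime import date

def weekdays_in_month(year, month):
    """Count Monday-Friday days in a month."""
    _, days = calendar.monthrange(year, month)
    return sum(1 for d in range(1, days + 1) if date(year, month, d).weekday() < 5)

issues, servers = 94, 3300
cal_days = calendar.monthrange(2023, 6)[1]   # number of days in June 2023
work_days = weekdays_in_month(2023, 6)
print(f"issues per calendar day: {issues / cal_days:.2f}")
print(f"issues per working day:  {issues / work_days:.2f}")
print(f"issues per server that month: {issues / servers:.1%}")
```

Either way, the monthly issue count stays below 3% of the fleet size.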

Hardware-related issues in June 2023. Each issue may involve more than one problem.

However, let’s explore the possibility that Ahrefs might be handling things incorrectly. Imagine we need to substantially increase the number of professionals to maintain our 3300 physical servers because they are not cloud-based and prone to issues. The hypothesis is that expanding our team to support colocation servers would make cloud solutions more advantageous than our current colocation setup.

Let’s consider a hypothetical scenario where we assemble a comprehensive team to support our server hardware. This team would include DevOps engineers, hardware engineers, data center technicians, and solution architects, as well as staff for our network and security operations centers (NOC and SOC). Additionally, we would recruit procurement, project, capacity, asset, delivery, sustainability, and diversity managers, along with team leads, vice presidents, senior vice presidents, recruiters, and HR business partners to oversee these operations. Let’s be generous, allow for critical roles I may have overlooked, and round the final hiring number up to 120 professionals focused exclusively on server support. Since our entire company had about 120 people as of June 2023, this hiring initiative would effectively double the size of the company.

In this generous flight of fancy, let’s imagine that these hypothetical employees earn an eye-watering average salary of $500,000 per person per year. The annual expense for this “dream team” would then reach $60 million, or $360 million over six years. Interestingly, that is nearly three times what we’ve spent on our entire infrastructure. But, hey, that’s the math.

Now, combining our infrastructure costs of $122 million with this extravagantly priced team brings our total to $482 million. Even with this outlandishly costly and overqualified imaginary team, equipped with every conceivable skill, owning our hardware still costs less than half as much as the most affordable cloud option, and that cloud figure doesn’t include any personnel costs at all. Cloud services also require human oversight, which we’ve omitted from our calculations.
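The arithmetic behind this scenario, using only figures from this article; the $1.1 billion total for the cheapest AWS option over the same six years is the figure given in the conclusion.

```python
def hypothetical_team_cost(headcount, salary_per_year, years):
    """Total payroll for the imaginary hardware-support team."""
    return headcount * salary_per_year * years

infra = 122e6                                   # actual on-prem spend, Jul 2017 - Jun 2023
team = hypothetical_team_cost(120, 500_000, 6)  # 120 people at $500k for six years
cheapest_aws = 1.1e9                            # cheapest AWS option, same period
total = infra + team
print(f"team: ${team / 1e6:.0f}M, total: ${total / 1e6:.0f}M, "
      f"share of cheapest AWS: {total / cheapest_aws:.0%}")
```

Even this deliberately inflated headcount leaves the on-prem total well under the cloud figure, which itself excludes staff.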

Our hypothesis that beefing up staff to support our colocation setup would tip the scales financially and render the cloud more advantageous turned out to be off base.

Our real team handling cloud, hosting, and colocations is much smaller, and by the way, we are hiring to extend it, so that we can handle more servers soon.

Conclusion

Choosing colocation for the infrastructure was the right decision for Ahrefs. Reflecting on the past six years, the data centers with our own servers and network cost us $122 million, a figure that would have ballooned to an astronomical $1.1 billion if we had opted for the cheapest AWS cloud offering. Our powerful on-prem servers enable our developers to focus on enhancing our products for clients, rather than getting caught in a never-ending cycle of trying to optimize cloud costs with little hope of success.

Don’t get me wrong: cloud technology, especially AWS, provides remarkable flexibility for quick deployment, making it ideal for urgent projects like prototyping, rapid testing, or rolling out edge infrastructure globally. Another significant advantage is its ability to minimize time and risk, offering instant server access and avoiding the bureaucracy and delays of physical setups. If waiting 2–9 months for new servers or data centers is impractical, the cloud can still be a viable option for you.

Cloud service providers heavily market the cloud as the only viable infrastructure solution, which complicates product comparisons and emphasizes its perceived benefits. However, long-term comparisons with colocation infrastructure show that sustained cloud use can become prohibitively expensive, which calls into question its suitability as a universal solution.

Perhaps clouds for companies should be considered like drugs for people? As a medication, the cloud can be extremely useful in the short term or for specific, narrow applications, boosting a company’s performance and bringing huge benefits. However, using the cloud extensively over the long term may drain too much money, potentially harming the business and leading to bankruptcy. Like any medicine, use cloud responsibly in reasonable doses.

Appendix. Assumptions and calculations

To begin with, we consolidated and analyzed the invoices from our colocation data centers. We then selected the closest possible AWS equivalents for comparison. For EC2 and EBS pricing, we used AWS rates from the end of the reviewed period, taking prices from the regions that most closely match the locations of Ahrefs servers, or from the nearest geographical alternatives. Then, based on the number of servers of each type in our data centers each month, we calculated the cost of the AWS alternatives.
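A simplified sketch of that monthly calculation, assuming each group of identical servers has a delivery month, an optional retirement month, and a single monthly AWS-equivalent price. The field names and numbers are hypothetical, and a flat monthly price glosses over how reserved instances front-load payments. Counting only from the delivery month mirrors the assumption, listed below, that AWS costs start when servers arrive.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ServerBatch:
    """A group of identical servers (all fields are hypothetical)."""
    count: int
    delivered: int                  # month index the servers arrived (0 = July 2017)
    aws_monthly: float              # matched EC2 + EBS cost per server per month, USD
    retired: Optional[int] = None   # month index the servers left service, if any

def aws_equivalent_by_month(batches: List[ServerBatch], months: int) -> List[float]:
    """Price every server in service in a given month at the cost of its
    closest AWS match; a batch only counts from its delivery month."""
    bill = [0.0] * months
    for b in batches:
        end = months if b.retired is None else min(b.retired, months)
        for m in range(b.delivered, end):
            bill[m] += b.count * b.aws_monthly
    return bill

batches = [
    ServerBatch(count=100, delivered=0, aws_monthly=20_000.0),
    ServerBatch(count=50, delivered=2, aws_monthly=25_000.0),
]
print(aws_equivalent_by_month(batches, months=4))
```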

Finding suitable EC2 alternatives for our servers wasn’t always easy. Most of our servers pack dual AMD 64-core/128-thread CPUs from three generations, each rigged up with 1.5TB or 2TB of RAM and multiple 15TB NVMe SSD drives. The closest option in AWS EC2 was the i3en.metal instance, chosen mainly for its 8x 7.5TB NVMe SSD drives. Even so, we needed 2 to 3.5 i3en.metal instances to match the CPU, RAM, and SSD resources of a single one of our servers.
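A sketch of how such a ratio can be derived, assuming one matches every resource at once by taking the largest per-resource ratio and leaving it fractional (as the “3.5x” figure suggests). The i3en.metal figures are as we understand AWS’s published specs and should be verified; the example server configuration is hypothetical, and 2TB of RAM is treated as 2048 GiB for simplicity.

```python
# i3en.metal resources as we understand AWS lists them (verify before use):
# 96 vCPUs, 768 GiB RAM, 8 x 7.5 TB local NVMe SSD.
I3EN_METAL = {"vcpu": 96, "ram_gib": 768, "nvme_tb": 8 * 7.5}

def instances_needed(server, instance=I3EN_METAL):
    """Instances required to match one server on every resource at once:
    the largest per-resource ratio (left fractional, not rounded up)."""
    return max(server[k] / instance[k] for k in instance)

# Hypothetical server: 2 x 64-core/128-thread CPUs (256 threads = vCPUs),
# 2 TB of RAM treated as 2048 GiB, and 12 x 15 TB NVMe drives.
server = {"vcpu": 256, "ram_gib": 2048, "nvme_tb": 12 * 15}
print(f"{instances_needed(server):.2f}")
```

Here storage is the binding resource; a server with fewer drives would be limited by CPU or RAM instead.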

For storage, we utilized internal SSDs and HDDs within EC2 instances wherever possible, and supplemented any additional storage needs with the most cost-effective AWS gp2 (SSD) and sc1 (HDD) storage options available. We matched the drives based on capacity rather than performance, disregarding IOPS costs and significantly slower performance of AWS EBS storage. For instance, the throughput of AWS gp2 volumes is limited to 250MB/s, which is considerably slower compared to the 3–7GB/s speeds typical for our PCIe Gen3 and Gen4 NVMe SSD drives.
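A minimal sketch of the capacity-only matching described above, assuming local instance storage is used first and only the remainder goes to EBS (gp2 for SSD, sc1 for HDD). The capacities are hypothetical; the 250MB/s gp2 limit and the 3–7GB/s NVMe range are the figures quoted in this paragraph.

```python
def ebs_top_up(need_ssd_tb, need_hdd_tb, local_ssd_tb=0.0, local_hdd_tb=0.0):
    """Capacity-only matching: fill local instance storage first and put
    the remainder on gp2 (SSD) and sc1 (HDD). Ignores IOPS and speed."""
    gp2_tb = max(0.0, need_ssd_tb - local_ssd_tb)
    sc1_tb = max(0.0, need_hdd_tb - local_hdd_tb)
    return gp2_tb, sc1_tb

def throughput_gap(nvme_gb_per_s, gp2_mb_per_s=250):
    """How many times faster a local NVMe drive is than one gp2 volume."""
    return nvme_gb_per_s * 1000 / gp2_mb_per_s

# Hypothetical server needing 200 TB of SSD, matched by instances with 180 TB local NVMe.
print(ebs_top_up(200, 0, local_ssd_tb=180))
print(throughput_gap(3), throughput_gap(7))
```

Matching on capacity alone flatters AWS: each gp2 volume provisioned this way would be roughly 12 to 28 times slower than the drive it replaces.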

The table below outlines some additional considerations and benefits we attributed to AWS in our calculations, because they couldn’t be quantified precisely. We estimated a minimal value for AWS consisting of EC2 and EBS only, although the actual figures are likely higher.

Assumptions and details

AWS server costs began when the servers were delivered, not when Ahrefs initially paid for them and awaited their arrival. For example, the servers we purchased in March and April of 2023 were not delivered until August 2023. Consequently, the equivalent AWS spending in our analysis starts from August 2023.

Accumulated Ahrefs and Equivalent AWS spending 2017–2020, $M

The latest feedback points out that the term “all-upfront” for a 3-year reserved EC2 instance might be misleading. Although these instances are paid for in full, they typically lack sufficient internal storage, necessitating additional monthly payments for EBS. This is why, in the graph above, the Beta angle often appears larger than Alpha when we include servers that require significant storage capacity.
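To see how much of an “all-upfront” deal is still paid month by month, here is a small sketch assuming a hypothetical storage-heavy server with a fixed upfront payment and a constant monthly EBS bill over the 36-month term. Those monthly charges are what keep the reserved-instance line rising between upfront payments.

```python
def monthly_share_of_all_upfront(upfront_3yr, ebs_per_month, term=36):
    """Fraction of a 3-year 'all-upfront' deal that is still paid month by
    month, because EBS volumes attached to the instances are billed separately."""
    ebs_total = ebs_per_month * term
    return ebs_total / (upfront_3yr + ebs_total)

# Hypothetical storage-heavy server: $100k upfront, $5k/month of EBS.
print(f"{monthly_share_of_all_upfront(100_000, 5_000):.0%}")
```

The more storage a server needs beyond its instance store, the less “all-upfront” the deal really is.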
