Smart Cloud Cost Management Strategies to Reduce Expenses

Moving applications and infrastructure from on-premises data centers to the cloud changes the economics of your IT spending.

Indeed, one of the principal benefits of migrating to the cloud is that almost everything becomes an operational expense (OpEx) rather than a capital expense (CapEx).

This article outlines key cloud cost considerations as you move applications and infrastructure from on-premises to the cloud. The OpEx savings and computing flexibility of cloud-based systems are well documented. However, as we’ve seen, there are also some important factors to consider to ensure your cloud costs remain predictable month to month and don’t rise unexpectedly.

Before we explore the cost “risks” of cloud-based applications and infrastructure, let’s quickly outline some of the cost benefits of moving to the cloud:

Cloud technology allows for rapid growth in computing resources without significant capital investments.
Engineers gain access to world-class infrastructure out of the box, along with practically unlimited computing scale.
Consumption-based pricing allows companies to cycle through ideas quickly, trying many different things without incurring significant infrastructure or architecture costs.
Elasticity – the cost curve can follow the actual demand curve very closely, reducing resources wasted on over-provisioning.
Business Unit independence – cloud environments can be set up so that different business units’ computing spend is kept separate. In this way, business units can make independent infrastructure decisions, and their computing consumption and costs can be managed and billed independently.
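To make the elasticity point concrete, here is a minimal Python sketch comparing capacity sized once for the peak hour with capacity resized every hour to match demand. The per-instance price, per-instance throughput, and hourly demand figures are hypothetical placeholders, and real autoscaling has warm-up delays and minimum sizes that this simplification ignores.

```python
import math

# Hypothetical assumptions for illustration only (not real prices or loads).
PRICE_PER_INSTANCE_HOUR = 0.10   # assumed on-demand price, in dollars
REQUESTS_PER_INSTANCE = 1_000    # assumed requests one instance serves per hour


def instances_needed(requests: int) -> int:
    """Smallest number of instances (at least one) that can serve `requests` in an hour."""
    return max(1, math.ceil(requests / REQUESTS_PER_INSTANCE))


def fixed_capacity_cost(hourly_demand: list[int]) -> float:
    """Cost when capacity is sized once for the peak hour and never changes."""
    peak = instances_needed(max(hourly_demand))
    return peak * len(hourly_demand) * PRICE_PER_INSTANCE_HOUR


def elastic_cost(hourly_demand: list[int]) -> float:
    """Cost when capacity is resized every hour to match that hour's demand."""
    instance_hours = sum(instances_needed(d) for d in hourly_demand)
    return instance_hours * PRICE_PER_INSTANCE_HOUR


demand = [2_000, 1_500, 9_500, 3_000]  # hypothetical requests per hour
print(f"fixed:   ${fixed_capacity_cost(demand):.2f}")
print(f"elastic: ${elastic_cost(demand):.2f}")
```

With this sample demand, the fixed approach pays for 10 instances every hour (about $4.00), while the elastic approach pays for 17 instance-hours in total (about $1.70).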

Other cost benefits include the fact that someone else patches and updates the underlying infrastructure, integrations between services are much easier, and infrastructure as code enables much more granular, on-demand resource scaling (up and down).

Cloud Cost Hazards

Along with substantial cost benefits, there are some important cloud cost “hazards” to be aware of as you consider cloud-based applications and infrastructure.

The cloud is a very powerful and flexible tool, and as with any powerful tool, it’s easy to hurt yourself with it. Care and planning are therefore essential for large deployments. If you’re coming from a traditional, CapEx-dominated cost model, the cloud cost model turns everything upside down.

Here are four categories of cloud cost hazards to address:

Unexpected Costs
Expensive Mistakes
Garbage Creep
Complexity

Let’s look at each of these in some detail and propose potential remedies you might explore.

Unexpected Costs

When everything is OpEx, costs can swing significantly from one period to another, and the budgeting process needs to account for that. We can distinguish two types of unpredictability:

Unexpected increases in legitimate usage. For example, your application might be mentioned in a very popular publication, or a successful marketing campaign might dramatically raise awareness of your app and bring in many new customers and users.
Useless or malicious load. A classic example is a DoS/DDoS attack.

The first type of unpredictability is usually a good one. You just want to make sure the revenue curve stays ahead of the cost curve, so that a spike in usage also brings a spike in revenue to pay for it.
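One way to check that the revenue curve stays ahead of the cost curve is a simple unit-economics model: a usage spike pays for itself only if each additional user brings in more revenue than the cloud spend they generate. The sketch below assumes costs split cleanly into a fixed baseline and a per-user marginal cost, which real bills rarely do exactly; the class name and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    # All figures are hypothetical monthly values for illustration.
    revenue_per_user: float      # what an average active user brings in
    cloud_cost_per_user: float   # marginal cloud spend per active user
    fixed_cloud_cost: float      # baseline spend that does not scale with users

    def margin(self, users: int) -> float:
        """Revenue minus cloud cost at a given number of active users."""
        revenue = users * self.revenue_per_user
        cost = self.fixed_cloud_cost + users * self.cloud_cost_per_user
        return revenue - cost

    def spike_is_safe(self) -> bool:
        """A usage spike widens the margin only if each extra user earns more than it costs."""
        return self.revenue_per_user > self.cloud_cost_per_user


econ = UnitEconomics(revenue_per_user=2.0, cloud_cost_per_user=0.5, fixed_cloud_cost=1000.0)
print(econ.margin(1_000), econ.margin(10_000), econ.spike_is_safe())
```

With these placeholder numbers, a tenfold jump from 1,000 to 10,000 users moves the margin from $500 to $14,000, because each extra user contributes $1.50 after cloud costs.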

The second type of unpredictability is one you need to protect against. The remedy? All major cloud vendors offer DDoS protection services, as do plenty of third-party vendors. However, DDoS defense is not always simple, and costs can rise disproportionately with the volume of an attack, so fast reaction time is critical.

Expensive Mistakes

When all your infrastructure is defined in code, one small mistake can lead to hundreds of thousands of dollars lost in a matter of hours, or even minutes.

An example might be a script that creates resources in an infinite loop, or one that fails to clean up resources correctly after a job is done. The remedy is similar to that for wasteful unexpected costs: the DevOps team needs to establish robust, responsive alerts on cloud costs to catch runaway processes. Unfortunately, cost data usually lags behind actual API calls by a few hours, so alerts can only help so much. As with any mission-critical software, a good software development lifecycle (SDLC) is necessary for cloud deployments: software needs to be tested and monitored.
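A minimal sketch of such a cost alert, assuming you can pull billed spend per hour (oldest first) from your vendor’s billing export. It compares the latest available hour with a trailing baseline and fires only when spend is both several times the baseline and up by a meaningful dollar amount. The function name, window, multiplier, and dollar threshold are hypothetical defaults to tune for your environment, and because billing data lags, the “latest” hour is already a few hours old when this runs.

```python
from statistics import mean


def runaway_alert(hourly_spend: list[float], window: int = 24,
                  factor: float = 3.0, min_increase: float = 50.0) -> bool:
    """Return True if the latest billed hour looks like a runaway process.

    hourly_spend: billed spend per hour, oldest first (hypothetical input).
    window: number of preceding hours averaged into the baseline.
    factor: alert when the latest hour exceeds baseline * factor.
    min_increase: absolute dollar increase required, to ignore noise on tiny bills.
    """
    if len(hourly_spend) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(hourly_spend[-window - 1:-1])  # the `window` hours before the latest
    latest = hourly_spend[-1]
    return latest > baseline * factor and latest - baseline >= min_increase


steady = [20.0] * 30
spike = [20.0] * 30 + [400.0]
print(runaway_alert(steady), runaway_alert(spike))
```

Requiring both a relative and an absolute jump is a design choice: the ratio alone would page someone when a $1 hour becomes $4, while the dollar floor alone would miss proportionally large jumps on bigger bills.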

Garbage Creep

Over time, idle, unterminated resources can accumulate, and in a constantly changing production environment it can be hard to tell which resources are still critical and which are now just garbage.
Engineers naturally tend to err on the side of caution and keep resources in place rather than terminate them. This leads to an accumulation of idle resources and, consequently, of the costs associated with them.

Examples of such resources include unterminated instances, disk volumes, and data in object storage. The remedy? Establish a regular review and cleanup procedure that goes through all resources and identifies which ones can be disabled or deleted. Cloud vendors (AWS, Google Cloud, Microsoft Azure) and third-party vendors offer many useful tools for tagging resources and for reporting on and monitoring their usage. For the most part, though, cleanup still needs to be supervised by a human, especially if tagging is not thorough.
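As an illustration of the review step, the sketch below flags resources that have been idle for a while or are missing required tags, sorted so the most expensive candidates are reviewed first. The `Resource` record, the `owner`/`project` tagging policy, and the 30-day idle threshold are all hypothetical; the output is a list for a human to review rather than anything that deletes resources.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Resource:
    # Hypothetical inventory record; real data would come from your
    # vendor's inventory, tagging, or billing tools.
    resource_id: str
    kind: str                  # e.g. "instance", "disk", "bucket"
    monthly_cost: float
    last_used: datetime
    tags: dict[str, str] = field(default_factory=dict)


REQUIRED_TAGS = ("owner", "project")  # hypothetical tagging policy


def cleanup_candidates(resources, now, idle_days=30):
    """Return (resource, reasons) pairs for human review, costliest first."""
    flagged = []
    for r in resources:
        reasons = []
        if now - r.last_used > timedelta(days=idle_days):
            reasons.append(f"idle > {idle_days} days")
        missing = [t for t in REQUIRED_TAGS if t not in r.tags]
        if missing:
            reasons.append("missing tags: " + ", ".join(missing))
        if reasons:
            flagged.append((r, reasons))
    return sorted(flagged, key=lambda pair: pair[0].monthly_cost, reverse=True)


now = datetime(2024, 6, 1)
inventory = [
    Resource("i-1", "instance", 120.0, now - timedelta(days=2), {"owner": "a", "project": "x"}),
    Resource("d-1", "disk", 40.0, now - timedelta(days=90), {"owner": "b", "project": "y"}),
    Resource("b-1", "bucket", 300.0, now - timedelta(days=1), {"owner": "c"}),
]
for resource, reasons in cleanup_candidates(inventory, now):
    print(resource.resource_id, resource.monthly_cost, reasons)
```

Here the recently used, fully tagged instance is left alone, while the untagged bucket and the long-idle disk land on the review list.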

Complexity

Cloud costs collapse complex engineering and financial concepts into a single dollar figure, which makes them very hard to untangle. As a result, cloud costs can become a black box for customers, who can’t easily see what is driving them.

Different application and infrastructure architectures can lead to very different costs, with differences of up to 10x. So it’s vital to design and model several architectures, even if you have previous cloud experience, to determine which is the most cost-effective BEFORE you commit to any development.
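Modeling doesn’t need to be elaborate to be useful. Here is a minimal sketch comparing two hypothetical architectures, always-on instances with a redundancy floor versus pay-per-request serverless, across request volumes. Every price and capacity figure is a placeholder, not a real vendor rate; the point is that the cheaper option can flip as volume changes.

```python
import math

# Hypothetical, simplified cost models for two candidate architectures.
# Prices are placeholders, not real vendor rates; plug in your own quotes.


def vm_monthly_cost(requests_per_month: int,
                    instance_monthly_price: float = 70.0,
                    requests_per_instance_month: int = 50_000_000,
                    min_instances: int = 2) -> float:
    """Always-on instances sized for volume, never fewer than `min_instances`."""
    needed = math.ceil(requests_per_month / requests_per_instance_month)
    return max(min_instances, needed) * instance_monthly_price


def serverless_monthly_cost(requests_per_month: int,
                            price_per_million: float = 5.0) -> float:
    """Pay-per-request pricing with no idle cost."""
    return requests_per_month / 1_000_000 * price_per_million


def cheapest(requests_per_month: int) -> str:
    costs = {
        "vm": vm_monthly_cost(requests_per_month),
        "serverless": serverless_monthly_cost(requests_per_month),
    }
    return min(costs, key=costs.get)


for volume in (1_000_000, 100_000_000):
    print(volume, cheapest(volume))
```

With these placeholder numbers, serverless is cheaper at 1 million requests per month ($5 vs. $140), while the instance-based design wins at 100 million ($140 vs. $500); the break-even sits at 28 million requests.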

There are many tools that help with cloud cost management, but even these tools can struggle to keep up with the complexity and breadth of cloud offerings. There is no replacement for in-house human expertise dedicated to cloud cost management.

All major cloud vendors offer tools to analyze costs, and depending on the scale of your cloud environment, they might be sufficient. For large deployments, however, you will very likely need deeper insights and more ways to slice the data. Because cost data is structured, you can use many general-purpose tools to better understand your cloud costs (Excel, SQL databases, BI tools, etc.).
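For example, once billing line items are loaded into a SQL database, a single query can break spend down by team and service. The sketch uses Python’s built-in `sqlite3` module with an in-memory table; the table name, column names, and rows are hypothetical stand-ins for a real billing export, and untagged spend is grouped separately so it stays visible.

```python
import sqlite3

# Hypothetical billing line items: (day, service, team tag, cost in dollars).
# Real exports have many more columns; this only shows the analysis pattern.
LINE_ITEMS = [
    ("2024-05-01", "compute", "search", 410.0),
    ("2024-05-01", "storage", "search", 55.0),
    ("2024-05-01", "compute", "billing", 120.0),
    ("2024-05-02", "compute", "search", 430.0),
    ("2024-05-02", "egress", None, 75.0),  # untagged spend
]


def spend_by_team_and_service(rows):
    """Total cost per (team, service), largest first; NULL teams become 'untagged'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE costs (day TEXT, service TEXT, team TEXT, cost REAL)")
        conn.executemany("INSERT INTO costs VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT COALESCE(team, 'untagged') AS team, service, SUM(cost) AS total
            FROM costs
            GROUP BY COALESCE(team, 'untagged'), service
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for team, service, total in spend_by_team_and_service(LINE_ITEMS):
    print(f"{team:10} {service:8} ${total:,.2f}")
```

The same query runs largely unchanged against a persistent database or a BI tool’s SQL layer once real export data is loaded.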

Final Thoughts

Both planned and ad-hoc cost analysis are very helpful for finding the reasons behind increased spend and for catching spending anomalies, such as wasted cloud resources.
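A simple starting point for finding the reasons behind increased spend is to compare two periods service by service and rank the changes. This sketch assumes you already have per-service totals for each period (for instance, aggregated from your billing export); the function and service names are hypothetical. A service present in only one period is treated as zero in the other, so brand-new spend shows up too.

```python
def spend_drivers(previous: dict[str, float], current: dict[str, float], top: int = 3):
    """Rank services by absolute change in spend between two periods."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in services}
    ranked = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top]


previous = {"compute": 1000.0, "storage": 300.0, "egress": 50.0}
current = {"compute": 1100.0, "storage": 300.0, "egress": 650.0, "nat_gateway": 200.0}
for service, delta in spend_drivers(previous, current):
    print(f"{service:12} {delta:+,.2f}")
```

With these sample figures, the top three drivers are egress (+600), the new nat_gateway line (+200), and compute (+100), which tells the analyst where to start digging.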

The person or team responsible for analyzing cloud costs must be proficient in data analysis techniques, understand the cloud services in use and their cost structures, and understand the application and infrastructure themselves. This is a very important cross-functional role that can be challenging to fill, but the right people in this position can be a lifesaver for the company.