AKS Monitoring: Choose the Right Approach!

21.05

Azure Kubernetes Service (AKS) is an incredibly powerful platform, but keeping a container cluster running smoothly isn’t always straightforward. Monitoring, for example, is absolutely essential for ensuring stability and performance. Easier said than done, of course. 

Logs, metrics, traces … The stream of data is massive but there are many ways to keep an eye on it all. So, how do you choose the right monitoring approach without losing control or blowing the budget? In this blog, we compare Microsoft’s default option with a more flexible, self-hosted alternative. 

The default route: Azure Monitor and Log Analytics

When you set up an AKS cluster, chances are you’ll reach for Azure Monitor with a Log Analytics Workspace. Makes sense: it’s Microsoft’s standard option and it provides out-of-the-box logging and basic metrics. It’s also fully managed by Microsoft, so you can get up and running rather quickly. 

However, that simplicity comes with trade-offs. You’ll soon find the limits of flexibility. Want to visualise custom metrics from your own applications, or from a database running inside your cluster? Getting that neatly into a single dashboard becomes a lot trickier and the default visualisations often fall short. 

Then there’s the cost: data ingestion (processing all those logs and metrics) does add up fast. Even a standard cluster with default logging and no optimisations can easily set you back €200-€300 per month – and that’s without heavy usage. 

Obviously, you can optimise, but in larger environments, the costs remain high and unpredictable. Add the per-alert charges (€0-€3 per month) and it’s soon clear how fast your budget can spiral. Let’s not forget, finally, that you’re tied to the Azure ecosystem with this approach. 
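
To make that cost reasoning concrete, here is a minimal back-of-the-envelope sketch. The function name and all prices in it are illustrative assumptions, not Azure's actual rates (check the current Azure Monitor pricing page for your region), and the example volume was picked simply to land in the range mentioned above.

```python
def monthly_azure_monitor_cost(gb_per_day, price_per_gb,
                               alert_rules=0, price_per_alert=0.0, days=30):
    """Rough monthly estimate: ingestion charges plus per-alert charges.

    All inputs are placeholders you supply yourself; nothing here is
    looked up from Azure.
    """
    if gb_per_day < 0 or price_per_gb < 0 or alert_rules < 0 or price_per_alert < 0:
        raise ValueError("volumes and prices must be non-negative")
    ingestion = gb_per_day * days * price_per_gb   # pay-per-GB part
    alerts = alert_rules * price_per_alert         # flat fee per alert
    return ingestion + alerts


if __name__ == "__main__":
    # Hypothetical cluster: 3 GB of logs per day at an assumed 2.50 per GB,
    # plus 10 alerts at an assumed 1.50 each.
    print(monthly_azure_monitor_cost(3, 2.5, alert_rules=10, price_per_alert=1.5))
```

With these assumed numbers the estimate comes to 240 per month. The point is less the exact figure than the shape of the formula: the bill scales linearly with every extra GB you ingest.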

A more flexible approach: the self-hosted Grafana stack 

Fortunately, there’s a powerful open-source alternative you can install on your AKS cluster: the Grafana stack. It’s not just a single tool but a collection of specialised components that form a full-featured monitoring platform together: 

  • Grafana for dashboards 
  • Prometheus and Mimir for metrics 
  • Alertmanager for alerts 
  • Loki and Alloy for logs 
  • Tempo for tracing 

As a whole, they offer a comprehensive and customisable solution. Of course, you could use the Azure-hosted Grafana and Prometheus, but we’d recommend a self-hosted full stack.

Why consider this option? Well, the benefits simply speak for themselves: 

  1. First of all, you get maximum flexibility. You can monitor everything running on your cluster, exactly the way you want, and bring it all together in one or more dashboards. A true ‘single pane of glass’, in other words. 
  2. Secondly, it’s an open-source and cloud-agnostic option. So, no vendor lock-in. If you ever decide to switch to another cloud provider or even move on-prem, your monitoring setup can move right along with you. 
  3. A third major benefit is cost. Instead of paying per GB of data or per alert, you mainly pay for the compute resources (CPU and memory) that the stack uses on your cluster. That’s often more predictable (and cheaper in the long run) than Azure Monitor. 
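
The cost argument in the third point boils down to a break-even calculation, sketched below. It is a simplification under stated assumptions: the compute cost is treated as a flat monthly amount, storage and the time your team spends on maintenance are left out, and every number is a hypothetical placeholder rather than a real price.

```python
def break_even_gb_per_month(compute_cost_per_month, price_per_gb):
    """Monthly data volume at which a flat compute bill equals a per-GB bill."""
    if price_per_gb <= 0:
        raise ValueError("price_per_gb must be positive")
    return compute_cost_per_month / price_per_gb


def cheaper_option(gb_per_month, price_per_gb, compute_cost_per_month):
    """Return which pricing model costs less at the given data volume."""
    per_gb_cost = gb_per_month * price_per_gb
    return "self-hosted" if compute_cost_per_month < per_gb_cost else "per-GB"


if __name__ == "__main__":
    # Assumed: the stack uses 150 worth of CPU and memory per month,
    # and ingestion would otherwise cost 2.50 per GB.
    print(break_even_gb_per_month(150.0, 2.5))   # volume where both cost the same
    print(cheaper_option(90, 2.5, 150.0))        # 90 GB per month
```

Below the break-even volume the per-GB model is cheaper; above it, the flat compute cost wins, and it stays roughly the same as your data grows.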

Moreover, Alertmanager comes with dozens of useful Kubernetes alerts pre-configured, at no additional cost. To top it all off, the entire stack integrates seamlessly with OpenTelemetry, the open standard for application instrumentation. 
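
To give an idea of what monitoring your own application looks like at the lowest level: Prometheus scrapes metrics as plain text from an HTTP endpoint. The sketch below renders a counter in that text exposition format using only the standard library. The metric name, label, and helper functions are hypothetical; in a real service you would normally use a Prometheus client library or the OpenTelemetry SDK instead of formatting this by hand.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. The help text is
    assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical application metric: orders handled by a shop service.
    print(render_counter(
        "orders_processed_total",
        "Orders processed.",
        [({"status": "ok"}, 42), ({"status": "failed"}, 3)],
    ), end="")
```

Once an application exposes text like this, the same dashboards and alerts that cover the cluster can cover your own workloads as well.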

All that flexibility doesn’t come for free, of course. If you go the self-hosted route, you’re responsible for installing, maintaining, updating, and troubleshooting the Grafana stack yourself. It goes without saying that this takes time and expertise. 

Which option does CloudFuel use for clients? 

We’re strong believers in the power of the self-hosted Grafana stack. That is why it’s a standard part of our AKS Baseline: a collection of best practices and pre-configured components we deploy to our clients’ clusters. That being said, not everyone has the capacity to manage that stack themselves. 

Do you want the benefits of the self-hosted stack (the flexibility, control, and potential cost savings) but not the hassle of managing it all? Then our Care Journey is the perfect fit for you. We take full responsibility for maintaining the monitoring stack (and other baseline components), giving you the best of both worlds: maximum control without the operational burden. 

So, which AKS monitoring option should you choose? 

Azure Monitor is a sensible starting point for your AKS monitoring, and there’s nothing wrong with it, especially if you’re not scaling just yet. Just be aware of its limitations in terms of flexibility and the potentially high and unpredictable costs. 

The self-hosted Grafana stack is a more mature, open-source alternative that offers greater control, more certainty about the future, and potentially lower costs. The downside is that you’ll have to handle the management yourself – or outsource it. 

So, we’re not going to recommend one over the other outright. The best choice for you depends on your organisation’s needs and priorities. Analyse what matters most to you: convenience, cost, flexibility, or control? 

Do you have questions about the best monitoring strategy for your AKS environment? Want to learn more about how our AKS Baseline and managed services can support you? Get in touch: we’d love to help you on your cloud journey. 
