Agentic CLI for AKS: FAQs and how to use it

What Agentic CLI for AKS is, how you can use it to make Kubernetes (K8s) troubleshooting easier, and other common questions.

By Steve Buchanan

Apr 13, 2026 • 8 Minute Read

It’s not uncommon to find a tech professional who’s worked with containers and Kubernetes (K8s), from software developers to AI engineers to cloud administrators. Unfortunately, they also tend to share the same reaction whenever something in K8s breaks:

“Troubleshooting Kubernetes is hard!”

…Or at least, that’s the way it’s been for the last decade or so. Thankfully, when you’re using Azure Kubernetes Service, you can now use the agentic CLI for AKS to ease a lot of your troubleshooting issues.

In this article, I’ll cover what the agentic CLI for AKS is, how it can assist with troubleshooting, and walk through a step-by-step example of how to use it. I’ll also answer some other FAQs about this service.

Why troubleshooting in Kubernetes is hard (historically)

1. Kubernetes is complex with many moving parts

Kubernetes is not a single system. You’ve got all sorts of elements including, but not limited to:

Networking
APIs communicating with each other
Containers
DNS
Storage
Language frameworks

Because of this, troubleshooting often requires knowledge across multiple technologies.

2. You’ve often got to troubleshoot the cloud layer at the same time

The most commonly used Kubernetes platforms today are managed Kubernetes services from the hyperscalers such as AWS EKS, Google GKE, and Azure AKS. When something goes wrong, you’re not just troubleshooting Kubernetes, but also the infrastructure and services provided by the cloud platform.

3. It’s hard to observe and find the root cause of the issue

Signals are fragmented across logs, metrics, and traces. These signals often exist across multiple tools, frameworks, and infrastructure layers. This fragmentation makes correlating the right signals difficult and increases the time it takes to identify the root cause.

4. Troubleshooting can mean days and weeks of Google searching

Historically, when we run into something unfamiliar in tech, we turn to the internet. This means pasting the error message or a description of the problem into a search engine, then scouring forums, blog posts, documentation pages, and GitHub issues for the answer.

If you’re lucky, this process doesn’t take long. But all too often, this search takes hours, days, or even weeks diving through a sea of technical information and identifying the correct solution.

5. Generative AI is not great at answering specific K8s questions

Generative AI tools like ChatGPT, Claude, and Gemini have greatly reduced the slog of searching for answers when troubleshooting K8s. But while asking these tools for contextual advice is a lot easier than a Google search, there’s still a limitation: these models are trained on public data from the internet.

Why is this an issue? These tools know about Kubernetes and managed Kubernetes platforms like AKS, EKS, and GKE, but they do not have direct access to your specific environment, your workloads, your pods, or your cluster configuration. They get you close to the solution, but engineers still need to bridge the final gap using manual investigation.
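To give a sense of what that manual investigation involves, here is a minimal sketch of a typical first pass on one misbehaving pod. The `triage_pod` helper name is hypothetical, and it assumes `kubectl` is already pointed at the affected cluster; the `kubectl` subcommands themselves are standard.

```shell
# Hypothetical helper: the manual first pass for one misbehaving pod.
# Usage: triage_pod <namespace> <pod-name>
triage_pod() {
  ns="$1"
  pod="$2"

  # Scheduling decisions, container states, and recent events for the pod
  kubectl describe pod "$pod" --namespace "$ns"

  # Logs from the current container, then from the previous (crashed) one;
  # the second call fails harmlessly if the container never restarted
  kubectl logs "$pod" --namespace "$ns" --tail=50
  kubectl logs "$pod" --namespace "$ns" --previous --tail=50 || true

  # Namespace events in time order, to see what happened around the failure
  kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
}
```

Even this short pass produces four separate outputs that you have to read and correlate yourself, and it only covers a single pod.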

All of the above is what leads many tech professionals to cry out with frustration when something goes wrong with Kubernetes. It’s also where the agentic CLI for AKS enters the picture.

What is the agentic CLI for AKS?

The agentic CLI for AKS is an AI-powered command line experience designed to help users operate, optimize, and troubleshoot Azure Kubernetes Service (AKS) clusters. It is built on open-source foundations like:

· HolmesGPT (the CNCF SRE Agent)

· AKS Model Context Protocol (MCP) Server

· A user-configured LLM (OpenAI, Anthropic, or open-source models)

The agentic CLI for AKS allows users to ask questions about their AKS clusters using natural language. The tool collects relevant diagnostics, analyzes the data with the configured LLM, and returns explanations and troubleshooting guidance related to cluster health, configuration, and operational issues.

It was built to assist, not replace, Kubernetes admins. Most importantly, it adheres to three core security principles:

· Local Execution: Diagnostics run on your machine; data is never stored in AKS systems.

· Azure CLI Auth: It inherits your existing RBAC permissions—it only sees what you are allowed to see.

· Bring Your Own AI (BYOAI): You plug in your own approved provider (Azure OpenAI, Anthropic, etc.), keeping your organization in control of data retention.
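Because the tool inherits your existing permissions, it can help to see what those permissions are before you start. The sketch below is one way to check, assuming you are signed in with the Azure CLI and your `kubectl` context already points at the cluster. The `show_inherited_access` helper name and the choice of resources are illustrative, not part of the agentic CLI.

```shell
# Hypothetical helper: show the identity and read permissions a local
# diagnostic tool would inherit from your current session.
show_inherited_access() {
  # The Azure identity and subscription the Azure CLI is currently using
  az account show --query "{user:user.name, subscription:name}" --output table

  # Whether that identity may list a few resources a diagnosis commonly reads
  for resource in pods events deployments; do
    printf '%s: ' "$resource"
    kubectl auth can-i list "$resource" --all-namespaces
  done
}
```

Each `kubectl auth can-i` call prints `yes` or `no`, so a `no` here suggests the tool would not be able to read that resource type either.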

What are the benefits of the agentic CLI for AKS?

The goal of the agentic CLI for AKS is to simplify the operational complexity of Kubernetes by helping engineers quickly identify issues and understand what is happening inside their clusters.

By providing AI-assisted diagnostics and explanations, the tool helps improve reliability, streamline operations, and enhance the overall developer and operator experience. Ultimately, the mission is simple: to enable developers, SREs, DevOps engineers, and platform teams to operate AKS more effectively and resolve issues faster.

Is the agentic CLI for AKS just another AI Agent for AKS?

No. The agentic CLI is best understood as a troubleshooting and diagnostic assistant, not an operational AI agent. It does not perform automated actions inside your cluster. Instead, it gathers information, analyzes it, and provides insights and recommendations that an administrator can choose to act on.

You can think about the difference this way:

· The agentic CLI for AKS is an assistive diagnostic tool

· Kubernetes agents, like Kagent, are operational automation agents

Another key distinction is where the tool runs. The agentic CLI runs locally on an administrator’s machine or in a pod on the AKS cluster and interacts with the AKS cluster through familiar tools such as kubectl and the Azure CLI. It collects diagnostic information, sends that context to the configured LLM, and then provides explanations and troubleshooting guidance.

Kubernetes-native agents like Kagent typically run inside the cluster as pods and are designed to automate operational tasks or workflows.

Because of this, the Agentic CLI is particularly useful for interactive debugging and investigation, while Kubernetes-native agents focus on deeper automation.

Agentic CLI for AKS vs Azure Monitor AI Investigation

Azure Monitor has recently introduced AI-driven investigation capabilities within its observability platform. While both tools use AI to assist with troubleshooting, they solve slightly different problems.

Azure Monitor AI investigation

· Operates inside the Azure Monitor observability platform

· Analyzes metrics, logs, and traces already collected in Azure Monitor

· Helps correlate signals across resources and services

· Provides automated incident investigation summaries

Agentic CLI for AKS

· Runs locally as a CLI tool used by engineers

· Collects diagnostics directly from the Kubernetes cluster

· Allows interactive troubleshooting using natural language

· Focuses specifically on Kubernetes operational debugging

The bottom line: Use Azure Monitor for high-level correlation of logs and metrics across your entire fleet. Use the Agentic CLI when you are "hands-on-keyboard" and need a deep-dive assistant to interrogate the live state of a specific cluster.

How to deploy the Agentic CLI for AKS

The CLI supports two modes: Client Mode (for local investigation) and Cluster Mode (running as a pod via Workload Identity).

Client Mode

Client mode runs the Agentic CLI locally on your machine using Docker containers.

Run the following on your local machine in a terminal (e.g. the VS Code terminal):

# Select the subscription that contains your AKS cluster
az account set --subscription "your-subscription-id-or-name"

# Install the agentic CLI for AKS extension
az extension add --name aks-agent --debug

# Update the agentic CLI for AKS extension
az extension update --name aks-agent --debug

# Verify the agentic CLI for AKS extension is installed
az aks agent --help

# Initialize client mode (set RESOURCE_GROUP and CLUSTER_NAME to your cluster's values first):
az aks agent-init --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"

    

Next, you will be prompted to select a deployment mode, and then to configure your preferred LLM provider. You will need the model name, API key, API base (URL for the endpoint), and API version.
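Since the prompts ask for four values, it can save a restart to confirm you have them on hand first. The following is a small sketch under stated assumptions: the variable names (`LLM_MODEL_NAME`, `LLM_API_KEY`, `LLM_API_BASE`, `LLM_API_VERSION`) and the `check_llm_settings` helper are hypothetical placeholders for wherever you keep these values, not names the agentic CLI reads.

```shell
# Hypothetical pre-flight check: confirm the four provider values are set
# before starting the interactive prompts. Variable names are placeholders.
check_llm_settings() {
  missing=0
  for name in LLM_MODEL_NAME LLM_API_KEY LLM_API_BASE LLM_API_VERSION; do
    # Look up the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise
  return "$missing"
}
```

Running `check_llm_settings && az aks agent-init ...` would then only start initialization once all four values are available to copy into the prompts.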

Cluster Mode

Cluster mode deploys the agentic CLI components directly into the AKS cluster as a pod using Kubernetes service accounts and workload identity.
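Because cluster mode relies on workload identity, one check that may be worth running beforehand is whether the cluster already has the OIDC issuer and workload identity turned on. This is a sketch, not an official prerequisite check: the `check_workload_identity` helper name is hypothetical, and whether initialization configures these settings for you is not covered here.

```shell
# Hypothetical helper: report whether the cluster has the OIDC issuer and
# workload identity enabled. Usage: check_workload_identity <rg> <cluster>
check_workload_identity() {
  rg="$1"
  cluster="$2"

  # Booleans may be printed as "True" or "true" depending on the output
  # format, so normalize to lowercase before comparing
  oidc="$(az aks show --resource-group "$rg" --name "$cluster" \
    --query "oidcIssuerProfile.enabled" --output tsv | tr '[:upper:]' '[:lower:]')"
  wi="$(az aks show --resource-group "$rg" --name "$cluster" \
    --query "securityProfile.workloadIdentity.enabled" --output tsv | tr '[:upper:]' '[:lower:]')"

  echo "OIDC issuer enabled: ${oidc:-unknown}"
  echo "Workload identity enabled: ${wi:-unknown}"

  # Succeed only when both features are reported as enabled
  [ "$oidc" = "true" ] && [ "$wi" = "true" ]
}
```

For example, `check_workload_identity "$RESOURCE_GROUP" "$CLUSTER_NAME"` prints both settings and returns a non-zero status if either is off or cannot be read.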

Run through Azure Cloud Shell:

# Select the subscription that contains your AKS cluster
az account set --subscription "your-subscription-id-or-name"

# Install the agentic CLI for AKS extension
az extension add --name aks-agent --debug

# Update the agentic CLI for AKS extension
az extension update --name aks-agent --debug

# Verify the agentic CLI for AKS extension is installed
az aks agent --help

# Initialize cluster mode (set RESOURCE_GROUP and CLUSTER_NAME to your cluster's values first):
az aks agent-init --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"

    

After initialization, configure the LLM provider that the tool will use for analysis.

Using the agentic CLI for AKS

Once deployed, the agentic CLI can help diagnose a variety of Kubernetes issues.

Put it to Work: Real-World Use Cases

Rather than hunting for error codes, you can ask questions in plain language:

Pinpoint PDB violations, quota issues, and IP exhaustion

      az aks agent "my AKS cluster is in a failed state, what happened?"

Detect resource constraints, affinity mismatches, and zone limitations

      az aks agent "why is my pod stuck in Pending state?"

Diagnose kubelet crashes, CNI failures, and resource pressure

      az aks agent "why is one of my nodes in NotReady state?"

Identify DNS issues such as CoreDNS failures or network misconfigurations

      az aks agent "why are my pods failing DNS lookups?"

Instead of manually inspecting dozens of logs and cluster resources, the CLI gathers relevant signals and provides a guided explanation of what might be happening.
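If you find yourself asking the same handful of questions during an incident, a small wrapper can run them in sequence and keep the answers together. This is a minimal sketch: the `ask_cluster` helper and the report format are hypothetical, and it uses only the `az aks agent "<question>"` form shown above. Depending on your extension version, you may need an extra option to stop the tool waiting for follow-up input, so check `az aks agent --help` before relying on it unattended.

```shell
# Hypothetical wrapper: ask several questions one after another and append
# each answer to a single report file.
# Usage: ask_cluster <report-file> "<question>" ["<question>" ...]
ask_cluster() {
  report="$1"
  shift

  # Start with an empty report
  : > "$report"

  for question in "$@"; do
    printf '## %s\n' "$question" >> "$report"
    # tee keeps the answer visible in the terminal while saving a copy
    az aks agent "$question" 2>&1 | tee -a "$report"
    printf '\n' >> "$report"
  done
}
```

For example, `ask_cluster incident.md "why is my pod stuck in Pending state?" "why are my pods failing DNS lookups?"` would leave both answers in `incident.md` under their own headings.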

Conclusion

The journey of troubleshooting Kubernetes has moved from manual documentation searches to general AI chats, and now, finally, to context-aware agentic tools.

The agentic CLI for AKS bridges the "last mile" gap by combining the reasoning power of LLMs with the real-time context of your specific cluster. By keeping execution local and allowing you to bring your own AI, it provides a secure, powerful way to slash your Mean Time to Resolution (MTTR). It doesn't take your job; it takes the "hard" out of the Kubernetes troubleshooting experience.

Learning more about agentic CLI for AKS and other topics

If you want to learn more about agentic CLI for AKS and give it a try, you can check out Microsoft’s official page here. There are also the following learning paths on Pluralsight to help you improve your skills and knowledge in AKS, general Kubernetes, and agentic AI:

· General Kubernetes

· Azure Kubernetes Service (AKS)

· Certified Kubernetes Administrator (CKA)

· Integrating Agentic AI for Developers

Steve B.

Steve Buchanan is a Principal PM Manager with a leading global tech giant focused on improving the cloud. He is a Pluralsight author, the author of eight technical books, a member of Onalytica's Who’s Who in Cloud? top 50, and a former 10-time Microsoft MVP. He has presented at tech events including DevOps Days, Open Source North, Midwest Management Summit (MMS), Microsoft Ignite, BITCon, Experts Live Europe, OSCON, and Inside Azure management, delivered a keynote at Minnebar 18, and spoken at user groups. He has been a guest on over a dozen podcasts and has been featured in several publications including the Star Tribune (the 5th largest newspaper in the US). He stays active in the technical community and enjoys blogging about his adventures in the world of IT at www.buchatech.com.
