How to Deploy an Azure AKS Cluster
This tutorial walks you through deploying a managed Kubernetes cluster on Azure Kubernetes Service (AKS) through Planton. You will write a YAML manifest describing the cluster you want, deploy it with a single CLI command, and connect kubectl to the running cluster. By the end, you will have a production-ready AKS cluster with autoscaling node pools, Azure CNI Overlay networking, and Azure AD RBAC -- or a lightweight development cluster, depending on your needs.
Note: The Planton web console provides a guided creation wizard for AKS and other Cloud Resources. This tutorial uses the CLI/YAML approach for stability and reproducibility. The console UI evolves frequently — always check it for the latest experience.
What You Will Learn
- How AKS fits in Planton's Cloud Catalog as an Azure Cloud Resource
- How to write an `AzureAksCluster` manifest with system node pools, networking, and security configuration
- How to deploy with `planton apply` and monitor progress in real time
- How to connect `kubectl` to the new cluster
- How production and development configurations differ and when to use each
Prerequisites
- An Azure Provider Connection configured and set as the default for your target environment. This connection provides the `tenant_id`, `subscription_id`, and credentials Planton needs to provision resources in your Azure subscription. The setup follows the same pattern as the AWS and GCP provider connection tutorials -- create an `AzureProviderConnection` with your credentials and set it as the default.
- An Azure Resource Group where the AKS cluster will be created. You can use an existing one or create one through Planton (see Creating Prerequisites via Planton at the end of this tutorial).
- A Virtual Network (VNet) with a subnet for AKS nodes. The subnet's ARM resource ID is required in the manifest. You can use an existing VNet or create one through Planton (see the same section below).
- A Planton organization and at least one environment created
- The `planton` CLI installed and authenticated (`planton auth login`)
- The Azure CLI installed (for connecting `kubectl` after deployment)
What Is an AKS Cloud Resource?
AzureAksCluster is a Cloud Resource type in the Cloud Catalog that provisions a fully managed Azure Kubernetes Service cluster. You define the cluster configuration in a YAML manifest and apply it with planton apply -- Planton handles the Azure Resource Manager operations using your Azure provider connection. For more on Cloud Resources, see the Cloud Resources documentation.
Step 1: Write the AKS Manifest
Create a file named aks-cluster.yaml with the following content. This manifest describes a production-grade AKS cluster with multi-zone high availability, autoscaling system and application node pools, and Azure CNI Overlay networking.
```yaml
apiVersion: azure.openmcf.org/v1
kind: AzureAksCluster
metadata:
  name: app-aks-cluster
  org: your-org
  env: production
spec:
  region: eastus
  resourceGroup:
    value: "your-resource-group-name"
  vnetSubnetId:
    value: "/subscriptions/your-subscription-id/resourceGroups/your-rg/providers/Microsoft.Network/virtualNetworks/your-vnet/subnets/aks-nodes"
  kubernetesVersion: "1.30"
  controlPlaneSku: STANDARD
  networkPlugin: AZURE_CNI
  networkPluginMode: OVERLAY
  systemNodePool:
    vmSize: Standard_D4s_v5
    autoscaling:
      minCount: 3
      maxCount: 5
    availabilityZones:
      - "1"
      - "2"
      - "3"
  userNodePools:
    - name: general
      vmSize: Standard_D8s_v5
      autoscaling:
        minCount: 2
        maxCount: 10
      availabilityZones:
        - "1"
        - "2"
        - "3"
```
Replace these placeholder values with your own:
- `metadata.name`: A name for the AKS cluster. Planton generates a URL-safe slug from it.
- `metadata.org`: Your Planton organization slug.
- `metadata.env`: The environment this cluster belongs to (e.g., `production`, `staging`, `dev`).
- `spec.resourceGroup.value`: The name of an existing Azure Resource Group.
- `spec.vnetSubnetId.value`: The full ARM resource ID of the subnet for cluster nodes. You can find this in the Azure portal under your VNet's subnet properties, or by running `az network vnet subnet show --resource-group your-rg --vnet-name your-vnet --name aks-nodes --query id -o tsv`.
The key fields in this manifest:
- `region`: The Azure region for the cluster. Choose a region that supports AKS availability zones.
- `resourceGroup` / `vnetSubnetId`: These use a nested `value` key because they also support `valueFrom` references to other Cloud Resources. Literal values work for this tutorial; the Creating Prerequisites via Planton section shows the `valueFrom` approach.
- `kubernetesVersion`: Pins the cluster to a specific Kubernetes minor version. Azure supports the current version and two previous minor versions.
- `controlPlaneSku`: `STANDARD` provides an uptime SLA (99.95% with availability zones) for ~$73/month. `FREE` has no SLA -- suitable for development.
- `networkPlugin` / `networkPluginMode`: `AZURE_CNI` with `OVERLAY` is recommended. Pods get IPs from a private range (default `10.244.0.0/16`), separate from your VNet address space, which avoids subnet IP exhaustion at scale.
- `systemNodePool`: Required. Runs cluster components (CoreDNS, metrics-server). `Standard_D4s_v5` (4 vCPUs, 16 GB) is recommended for production; `Standard_D2s_v3` for development. Spreading across three availability zones enables the 99.95% SLA tier.
- `userNodePools`: Where application workloads run, separated from system components. Add multiple pools for different workload profiles (compute, memory, Spot instances).
Step 2: Deploy with planton apply
Run the following command to deploy the AKS cluster. The -t flag streams the deployment progress to your terminal in real time.
```shell
planton apply -f aks-cluster.yaml -t
```
Planton validates the manifest, creates a deployment job, and begins provisioning the AKS cluster on Azure. The terminal output shows four phases:
- init: Configures the Azure provider using your connection credentials (a few seconds)
- refresh: Checks for any existing state (a few seconds)
- preview: Plans the changes -- shows the Azure resources that will be created (several seconds)
- update: Creates the AKS cluster, system node pool, user node pools, and configures networking (typically 5-10 minutes)
AKS cluster creation takes longer than many other resource types because Azure needs to provision the control plane, set up networking, and boot the node pool VMs across availability zones. Expect the update phase to take 5-10 minutes for a production configuration.
If you prefer to deploy without streaming, omit the -t flag:
```shell
planton apply -f aks-cluster.yaml
```
The CLI prints the deployment job ID immediately. You can check on it later with:
```shell
planton follow <stack-job-id>
```
Step 3: Verify the Deployment
After the deployment completes, retrieve the Cloud Resource to see its status and outputs:
```shell
planton get AzureAksCluster app-aks-cluster -o yaml
```
The status.outputs section contains the key information about your cluster:
| Output | Description | Example |
|---|---|---|
| `api_server_endpoint` | The FQDN of the Kubernetes API server | `app-aks-cluster-dns-abc123.hcp.eastus.azmk8s.io` |
| `cluster_resource_id` | The Azure ARM resource ID of the AKS cluster | `/subscriptions/.../managedClusters/app-aks-cluster` |
| `cluster_kubeconfig` | Base64-encoded kubeconfig file contents | (base64 string) |
| `managed_identity_principal_id` | Azure AD principal ID of the cluster's managed identity | `a1b2c3d4-...` |
To list all deployment jobs for this resource:
```shell
planton stack-job list <cloud-resource-id>
```
The cloud resource ID is in the metadata.id field of the planton get output.
Step 4: Connect kubectl to the Cluster
The standard way to connect kubectl to an AKS cluster is through the Azure CLI. Run the following command to merge the cluster's credentials into your local kubeconfig:
```shell
az aks get-credentials \
  --resource-group your-resource-group-name \
  --name app-aks-cluster \
  --overwrite-existing
```
Replace your-resource-group-name with the name of the resource group you specified in the manifest.
Verify the connection by listing the cluster nodes:
```shell
kubectl get nodes
```
You should see your system pool nodes and user pool nodes listed with a Ready status. With the production manifest from Step 1, you will see at least 5 nodes: 3 system nodes (one per availability zone) and at least 2 user nodes.
```
NAME                              STATUS   ROLES    AGE   VERSION
aks-general-12345678-vmss000000   Ready    <none>   5m    v1.30.x
aks-general-12345678-vmss000001   Ready    <none>   5m    v1.30.x
aks-system-87654321-vmss000000    Ready    <none>   8m    v1.30.x
aks-system-87654321-vmss000001    Ready    <none>   8m    v1.30.x
aks-system-87654321-vmss000002    Ready    <none>   8m    v1.30.x
```
Alternatively, the cluster_kubeconfig output from Step 3 contains a base64-encoded kubeconfig that you can decode and use directly:
```shell
planton get AzureAksCluster app-aks-cluster -o yaml | \
  grep cluster_kubeconfig | awk '{print $2}' | base64 -d > kubeconfig-aks.yaml
export KUBECONFIG=kubeconfig-aks.yaml
kubectl get nodes
```
The az aks get-credentials approach is recommended because it integrates with Azure AD for authentication and handles token refresh automatically.
Development Configuration
For development and testing environments where cost and speed matter more than resilience, use a lighter configuration:
```yaml
apiVersion: azure.openmcf.org/v1
kind: AzureAksCluster
metadata:
  name: app-aks-dev
  org: your-org
  env: dev
spec:
  region: eastus
  resourceGroup:
    value: "your-dev-resource-group"
  vnetSubnetId:
    value: "/subscriptions/your-subscription-id/resourceGroups/your-dev-rg/providers/Microsoft.Network/virtualNetworks/your-dev-vnet/subnets/aks-nodes"
  kubernetesVersion: "1.30"
  controlPlaneSku: FREE
  networkPlugin: AZURE_CNI
  networkPluginMode: OVERLAY
  systemNodePool:
    vmSize: Standard_D2s_v3
    autoscaling:
      minCount: 1
      maxCount: 3
    availabilityZones:
      - "1"
```
Here is what changed from the production configuration and why:
- `controlPlaneSku: FREE`: No uptime SLA and no cost for the control plane tier. The Free tier is functionally identical to Standard for most development workflows -- the difference is in availability guarantees.
- Single availability zone: `availabilityZones: ["1"]` instead of three zones. This reduces the minimum node count (no need to spread across zones) and may improve scheduling density.
- Smaller VM size: `Standard_D2s_v3` (2 vCPUs, 8 GB RAM) is sufficient for development workloads and costs roughly half of `Standard_D4s_v5`.
- Lower autoscaling: `minCount: 1` and `maxCount: 3`. A single system node is adequate for development. The cluster autoscaler adds nodes only when pods cannot be scheduled.
- No user node pools: Applications run on the system node pool alongside cluster components. This is acceptable for development where workload isolation is not a concern.
Deploy the development configuration the same way:
```shell
planton apply -f aks-dev.yaml -t
```
Common Patterns and Tips
Enabling add-ons
AKS offers several Azure-managed add-ons that you can enable through the addons block. These are optional -- they are not included in the manifests above to keep the core path focused, but they are recommended for production clusters.
```yaml
addons:
  enableContainerInsights: true
  logAnalyticsWorkspaceId: "/subscriptions/your-sub-id/resourceGroups/your-rg/providers/Microsoft.OperationalInsights/workspaces/your-workspace"
  enableKeyVaultCsiDriver: true
  enableAzurePolicy: true
  enableWorkloadIdentity: true
```
- Container Insights (`enableContainerInsights`): Streams container logs, performance metrics, and Kubernetes events to Azure Monitor. Requires a Log Analytics Workspace -- provide its ARM resource ID in `logAnalyticsWorkspaceId`. Container Insights is only enabled when both the flag is `true` and the workspace ID is provided.
- Key Vault CSI Driver (`enableKeyVaultCsiDriver`): Allows pods to mount secrets from Azure Key Vault as volumes, eliminating the need to store secrets in Kubernetes Secrets.
- Azure Policy (`enableAzurePolicy`): Enforces governance policies on the cluster (pod security standards, resource quotas, allowed registries).
- Workload Identity (`enableWorkloadIdentity`): Allows pods to authenticate to Azure services using Kubernetes service accounts instead of stored credentials. This is the recommended approach for applications that access Azure resources like Key Vault, Storage, or SQL Database.
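With Workload Identity enabled, a pod authenticates as a user-assigned managed identity through an annotated Kubernetes service account. A minimal sketch of the Kubernetes side -- the namespace, names, image, and client ID below are placeholders, and the managed identity itself (with a federated credential for this service account) must be configured separately in Azure:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-identity
  namespace: default
  annotations:
    # Client ID of the user-assigned managed identity (placeholder)
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: default
  labels:
    # Opts this pod into workload identity token injection
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: app-identity
  containers:
    - name: app
      image: your-registry/your-app:latest
```

Azure SDK clients running in the pod can then obtain tokens for Azure services without any stored credentials.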
Restricting API server access
By default, the AKS cluster has a public API server endpoint accessible from any IP address. For production clusters, restrict access to known networks:
```yaml
authorizedIpRanges:
  - "203.0.113.0/24"
  - "198.51.100.0/24"
```
authorizedIpRanges accepts a list of CIDR blocks. Only traffic from these ranges can reach the Kubernetes API server. Add your office network, VPN exit points, and CI/CD agent networks.
For maximum security, deploy a fully private cluster:
```yaml
privateClusterEnabled: true
```
When privateClusterEnabled is true, the API server has no public endpoint. It is accessible only from within the VNet or through peered networks. This requires a VPN, ExpressRoute, or a bastion host for kubectl access. The authorizedIpRanges field is not applicable to private clusters.
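For occasional access to a private cluster without setting up network connectivity, the Azure CLI can proxy a command through the managed control plane. A sketch using the cluster from this tutorial (requires appropriate Azure RBAC permissions on the cluster):

```shell
az aks command invoke \
  --resource-group your-resource-group-name \
  --name app-aks-cluster \
  --command "kubectl get nodes"
```

This is convenient for break-glass troubleshooting, but routine access should still go through a VPN or bastion.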
Spot instance pools
Azure Spot VMs offer 30-90% cost savings over regular VMs, but Azure can evict them when it needs the capacity back. Use Spot pools for fault-tolerant, stateless workloads like batch jobs, background processing, or stateless API replicas that can tolerate interruptions.
```yaml
userNodePools:
  - name: spot
    vmSize: Standard_D4s_v5
    autoscaling:
      minCount: 0
      maxCount: 20
    availabilityZones:
      - "1"
      - "2"
      - "3"
    spotEnabled: true
```
Setting minCount: 0 allows the pool to scale to zero when there are no pods to schedule, which means you pay nothing when the pool is idle. When Spot nodes are evicted, the cluster autoscaler provisions replacement nodes automatically.
To direct specific workloads to Spot nodes, use Kubernetes tolerations and node affinity in your pod specs. The Spot pool nodes are created with a kubernetes.azure.com/scalesetpriority: spot label and a kubernetes.azure.com/scalesetpriority=spot:NoSchedule taint.
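For example, a Deployment that targets the Spot pool can tolerate that taint and select that label. A sketch -- the Deployment name, app label, and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Schedule only onto Spot nodes
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      # Tolerate the taint AKS places on Spot nodes
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      containers:
        - name: worker
          image: your-registry/batch-worker:latest
```

Without the toleration, the pods would never schedule onto Spot nodes; without the nodeSelector, they could also land on regular pools.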
Advanced networking
For most clusters, the default networking configuration is correct. Customize these settings only if your network architecture requires specific CIDR ranges to avoid conflicts:
```yaml
advancedNetworking:
  podCidr: "10.244.0.0/16"
  serviceCidr: "10.0.0.0/16"
  dnsServiceIp: "10.0.0.10"
```
- `podCidr`: The CIDR range for pod IPs in Overlay mode. The default `10.244.0.0/16` provides over 65,000 pod IPs. Change this only if it conflicts with your VNet or on-premises networks.
- `serviceCidr`: The CIDR range for Kubernetes Service cluster IPs. Must not overlap with the VNet address space, pod CIDR, or any peered networks. The default is `10.0.0.0/16`.
- `dnsServiceIp`: The IP address for the cluster DNS service (CoreDNS). Must be within the `serviceCidr` range. The default is `10.0.0.10`.
Disabling Azure AD RBAC
By default, AKS clusters are created with Azure Active Directory RBAC integration enabled. This means cluster access is managed through Azure AD users and groups, and you can assign Kubernetes RBAC roles using Azure role assignments.
If you need to disable this -- for example, in isolated development environments where Azure AD integration is not available -- set:
```yaml
disableAzureAdRbac: true
```
This is not recommended for production. Without Azure AD RBAC, cluster access is managed solely through Kubernetes-native RBAC with client certificates, which is harder to audit and does not integrate with your organization's identity provider.
Creating Prerequisites via Planton (Alternative)
If you prefer to manage your Azure Resource Group and Virtual Network as Planton Cloud Resources rather than creating them externally, you can deploy them through planton apply and reference their outputs in the AKS manifest.
Step A: Create a Resource Group
Create a file named resource-group.yaml:
```yaml
apiVersion: azure.openmcf.org/v1
kind: AzureResourceGroup
metadata:
  name: aks-infrastructure
  org: your-org
  env: production
spec:
  name: rg-aks-production
  region: eastus
```
Deploy it:
```shell
planton apply -f resource-group.yaml -t
```
Step B: Create a Virtual Network
Create a file named vnet.yaml. The resourceGroup field references the Resource Group you created in Step A:
```yaml
apiVersion: azure.openmcf.org/v1
kind: AzureVpc
metadata:
  name: aks-network
  org: your-org
  env: production
spec:
  region: eastus
  resourceGroup:
    valueFrom:
      kind: AzureResourceGroup
      name: aks-infrastructure
      env: production
      fieldPath: status.outputs.resource_group_name
  addressSpaceCidr: "10.1.0.0/16"
  nodesSubnetCidr: "10.1.0.0/18"
```
Deploy it:
```shell
planton apply -f vnet.yaml -t
```
The resourceGroup.valueFrom tells Planton to resolve the resource group name from the AzureResourceGroup Cloud Resource you created in Step A. The fieldPath specifies which output to use -- in this case, the resource_group_name output.
Step C: Deploy AKS with resource references
Now create the AKS manifest with valueFrom references instead of literal values:
```yaml
apiVersion: azure.openmcf.org/v1
kind: AzureAksCluster
metadata:
  name: app-aks-cluster
  org: your-org
  env: production
spec:
  region: eastus
  resourceGroup:
    valueFrom:
      kind: AzureResourceGroup
      name: aks-infrastructure
      env: production
      fieldPath: status.outputs.resource_group_name
  vnetSubnetId:
    valueFrom:
      kind: AzureVpc
      name: aks-network
      env: production
      fieldPath: status.outputs.nodes_subnet_id
  kubernetesVersion: "1.30"
  controlPlaneSku: STANDARD
  networkPlugin: AZURE_CNI
  networkPluginMode: OVERLAY
  systemNodePool:
    vmSize: Standard_D4s_v5
    autoscaling:
      minCount: 3
      maxCount: 5
    availabilityZones:
      - "1"
      - "2"
      - "3"
  userNodePools:
    - name: general
      vmSize: Standard_D8s_v5
      autoscaling:
        minCount: 2
        maxCount: 10
      availabilityZones:
        - "1"
        - "2"
        - "3"
```
The valueFrom references allow Planton to resolve the resource group name and subnet ID from the outputs of the resources you deployed in Steps A and B. This approach is particularly useful when deploying through Infra Charts, where multiple resources are orchestrated together and dependencies are resolved automatically through a DAG.
What to Do Next
Your AKS cluster is running on Azure. From here:
- Deploy a backend service to the cluster. See How to Deploy Your First Service with Zero-Config CI/CD to set up a push-to-deploy workflow for your applications.
- Deploy Redis or other workloads onto the cluster. See How to Deploy Redis on Kubernetes -- the Kubernetes Cloud Resource workflow deploys directly to any connected cluster, including the one you created here.
- Explore other Azure resources in the Cloud Catalog. The same `planton apply` workflow works for Azure SQL Database, Azure Key Vault, Azure Storage Accounts, Azure Container Registry, and other Azure resource types.