AKS Cluster w/ Terraform

Project URL: https://github.com/bgcodehub/profisee-code

1. Introduction

  • Briefly introduce the problem statement:
    • The objective was to set up an AKS cluster via Terraform code. Today, I’ll walk you through the architecture and components of my solution.

2. Explain the Configuration

  • Azure Provider:
    • Starting off, I configured the Azure provider for Terraform using the ‘azurerm’ provider block.
  • Resource Group:
    • For organization and logical segregation of resources, I created a resource group (‘rg1’ in the code) with a configurable name and location.
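
A minimal sketch of the provider and resource group setup described above. The variable names (‘rgname’, ‘location’), their defaults, and the provider version constraint are assumptions for illustration, not taken from the project.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed constraint; the project may pin differently
    }
  }
}

provider "azurerm" {
  features {} # the azurerm provider requires this block, even when empty
}

# Hypothetical variable names and defaults.
variable "rgname" {
  type        = string
  description = "Name of the resource group"
  default     = "bg-aks-rg"
}

variable "location" {
  type        = string
  description = "Azure region for all resources"
  default     = "eastus"
}

# The resource group that the other resources are placed in.
resource "azurerm_resource_group" "rg1" {
  name     = var.rgname
  location = var.location
}
```

Keeping the name and location in variables means the same code can be reused for another environment by overriding the two values.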

3. Dive into the Modules

  • Service Principal:
    • The ‘ServicePrincipal’ module is responsible for creating an Azure Service Principal. In this setup, AKS uses it as the identity for interacting with other Azure services.
    • Mention that the Service Principal creation depends on the resource group’s existence.
  • Role Assignment:
    • I’ve assigned the ‘Contributor’ role to the Service Principal at the subscription level. This allows the Service Principal to create and manage all types of Azure resources in the subscription, but not to grant access to others.
  • Key Vault & Secrets:
    • Security is paramount. So, instead of storing sensitive data in plaintext, I opted for Azure Key Vault.
    • The ‘keyvault’ module creates a Key Vault where we store the client secret of the Service Principal.
    • The ‘azurerm_key_vault_secret’ resource saves the Service Principal’s client secret in this vault.
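
The three pieces above could be sketched roughly as follows. This is not the project's module code: the resource labels, variable names, secret name, and access policy are assumptions, and the attribute names follow the 2.x line of the azuread provider and the 3.x line of azurerm (later major versions rename some of them).

```hcl
terraform {
  required_providers {
    azurerm = { source = "hashicorp/azurerm", version = "~> 3.0" }  # assumed
    azuread = { source = "hashicorp/azuread", version = "~> 2.47" } # assumed
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical inputs.
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "service_principal_name" { type = string }
variable "keyvault_name" { type = string } # must be globally unique

data "azuread_client_config" "current" {}
data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}

# Service Principal: an app registration, its principal, and a client secret.
resource "azuread_application" "main" {
  display_name = var.service_principal_name
  owners       = [data.azuread_client_config.current.object_id]
}

resource "azuread_service_principal" "main" {
  client_id = azuread_application.main.client_id
  owners    = [data.azuread_client_config.current.object_id]
}

resource "azuread_service_principal_password" "main" {
  service_principal_id = azuread_service_principal.main.object_id
}

# Role Assignment: 'Contributor' at subscription scope.
resource "azurerm_role_assignment" "sp_contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.main.object_id
}

# Key Vault, with an access policy so the deploying identity can write secrets.
resource "azurerm_key_vault" "kv" {
  name                = var.keyvault_name
  location            = var.location
  resource_group_name = var.resource_group_name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  access_policy {
    tenant_id          = data.azurerm_client_config.current.tenant_id
    object_id          = data.azurerm_client_config.current.object_id
    secret_permissions = ["Get", "List", "Set", "Delete", "Purge"]
  }
}

# Store the Service Principal's client secret in the vault.
resource "azurerm_key_vault_secret" "sp_secret" {
  name         = "sp-client-secret" # hypothetical secret name
  value        = azuread_service_principal_password.main.value
  key_vault_id = azurerm_key_vault.kv.id
}
```

Note that the secret value still ends up in the Terraform state, so the state itself has to be stored securely as well.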

4. Highlight the Main Component: AKS

  • Finally, using the ‘aks’ module, we provision an AKS cluster.
  • Highlight the dependencies:
    • It’s worth noting the dependencies throughout this code. For instance, the creation of AKS has a direct dependency on the ‘ServicePrincipal’ module, ensuring structured and sequential provisioning of resources.
  • Explain any additional configurations or design choices if they exist within the ‘aks’ module:
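
The dependency chain described above could be wired up at the root level roughly like this. It is a fragment rather than a complete configuration: the module paths, the input names, and the ‘client_id’ and ‘client_secret’ module outputs are hypothetical, and the resource group and variables are assumed to be declared elsewhere in the root module.

```hcl
module "ServicePrincipal" {
  source                 = "./modules/ServicePrincipal" # assumed path
  service_principal_name = var.service_principal_name    # hypothetical input

  # Wait for the resource group before creating the Service Principal.
  depends_on = [azurerm_resource_group.rg1]
}

module "aks" {
  source              = "./modules/aks" # assumed path
  location            = var.location
  resource_group_name = azurerm_resource_group.rg1.name       # hypothetical input name
  client_id           = module.ServicePrincipal.client_id     # hypothetical output
  client_secret       = module.ServicePrincipal.client_secret # hypothetical output
  ssh_public_key      = var.ssh_public_key

  # Explicit ordering: the cluster is created only after the Service Principal.
  depends_on = [module.ServicePrincipal]
}
```

Referencing another module's outputs already creates an implicit dependency; ‘depends_on’ simply makes the ordering explicit and also covers resources whose values are not referenced directly.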

4a. AKS Version Determination:

  • Datasource for AKS Version:
    • I’m using the ‘azurerm_kubernetes_service_versions’ data source to automatically fetch the latest stable AKS version for the specified location. Preview versions are excluded, so each deployment uses the newest generally available version, including its latest security fixes.
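
A minimal sketch of such a version lookup, assuming the azurerm provider is already configured. The data source label ‘current’, the default region, and the output name are assumptions for illustration.

```hcl
variable "location" {
  type    = string
  default = "eastus" # hypothetical default
}

# Look up the Kubernetes versions AKS offers in this region,
# leaving out preview releases.
data "azurerm_kubernetes_service_versions" "current" {
  location        = var.location
  include_preview = false
}

# latest_version holds the most recent version from the filtered list.
output "latest_aks_version" {
  value = data.azurerm_kubernetes_service_versions.current.latest_version
}
```

One trade-off of this design is that the resolved version can change between runs, so a later ‘terraform plan’ may propose a cluster upgrade.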

4b. AKS Cluster Configuration:

  • Name and Location:
    • The AKS cluster is named “bg-aks-cluster” and is deployed to the region given by ‘var.location’.
  • DNS Prefix:
    • The ‘dns_prefix’ is constructed by appending “-cluster” to the resource group’s name. Azure uses this prefix when forming the domain name of the AKS API server.
  • Kubernetes Version:
    • I’m using the latest stable version retrieved from the data source.
  • Node Resource Group:
    • I’ve defined a custom node resource group by appending “-nrg” to the main resource group name.
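
The two derived names above are plain string interpolation. A small sketch, assuming the resource group name arrives in a variable called ‘resource_group_name’ (a hypothetical name, with a hypothetical default):

```hcl
variable "resource_group_name" {
  type    = string
  default = "bg-aks-rg" # hypothetical value
}

locals {
  # Prefix Azure uses when building the API server's domain name.
  dns_prefix = "${var.resource_group_name}-cluster"

  # Separate resource group in which AKS places the node infrastructure.
  node_resource_group = "${var.resource_group_name}-nrg"
}

output "derived_names" {
  value = {
    dns_prefix          = local.dns_prefix
    node_resource_group = local.node_resource_group
  }
}
```

With the default shown here, the two values would come out as “bg-aks-rg-cluster” and “bg-aks-rg-nrg”.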

4c. Default Node Pool Configuration:

  • VM Configuration:
    • VM Size: I’ve chosen the ‘Standard_DS2_v2’ size, which is a good balance between performance and cost.
    • OS Disk Size: Set at 30 GB.
  • Auto-Scaling:
    • The node pool is set to automatically scale. It can scale out to a maximum of 3 nodes and scale in to a minimum of 1 node.
  • Availability Zones:
    • I’ve configured the node pool to spread across 3 availability zones, ensuring high availability and fault tolerance.
  • Labels and Tags:
    • Both node labels and tags are defined for better management and categorization. This can aid in Kubernetes scheduling decisions and Azure resource tracking.

4d. Service Principal Configuration:

  • I’ve used a service principal for AKS to interact with the other Azure services. This principal’s credentials are supplied via the ‘var.client_id’ and ‘var.client_secret’ variables.

4e. Linux Profile:

  • SSH Key:
    • The admin username on the AKS nodes is “ubuntu”. For SSH access, I’m injecting a public SSH key read from the file path given in ‘var.ssh_public_key’.

4f. Network Profile:

  • Network Plugin:
    • I’m using the “azure” network plugin (Azure CNI), which integrates AKS with Azure’s networking capabilities.
  • Load Balancer SKU:
    • The “standard” SKU has been chosen for the Load Balancer, providing better performance, additional features, and built-in zone redundancy compared to the basic SKU.
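
Putting sections 4a through 4f together, the cluster resource might look roughly like the sketch below. It assumes a configured azurerm provider in the 3.x line (argument names such as ‘enable_auto_scaling’ and ‘zones’ differ in other major versions). The resource labels, node pool name, label and tag values, and the ‘resource_group_name’ variable are assumptions, not taken from the project.

```hcl
variable "location" { type = string }
variable "resource_group_name" { type = string } # hypothetical name
variable "client_id" { type = string }
variable "ssh_public_key" { type = string } # path to the public key file

variable "client_secret" {
  type      = string
  sensitive = true # keeps the value out of plan and apply output
}

data "azurerm_kubernetes_service_versions" "current" {
  location        = var.location
  include_preview = false
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "bg-aks-cluster"
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "${var.resource_group_name}-cluster"
  kubernetes_version  = data.azurerm_kubernetes_service_versions.current.latest_version
  node_resource_group = "${var.resource_group_name}-nrg"

  default_node_pool {
    name                = "defaultpool" # hypothetical pool name
    vm_size             = "Standard_DS2_v2"
    os_disk_size_gb     = 30
    type                = "VirtualMachineScaleSets"
    zones               = ["1", "2", "3"]
    enable_auto_scaling = true
    min_count           = 1
    max_count           = 3

    node_labels = {
      "environment" = "dev" # hypothetical label
    }

    tags = {
      "environment" = "dev" # hypothetical tag
    }
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }

  linux_profile {
    admin_username = "ubuntu"

    ssh_key {
      key_data = file(var.ssh_public_key) # reads the key from the given path
    }
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }
}
```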

5. Discuss Implementation Choices

  • Modularity:
    • As you can see, the solution is modular. By structuring the code in this way, we gain better organization, readability, and reusability.
  • Security:
    • By using Key Vault, we’ve ensured sensitive credentials aren’t exposed. It’s an essential best practice for securing infrastructure as code implementations.

6. Potential Improvements

  • Monitoring and Logging:
    • While the infrastructure setup is the core component, in the next phase I’d consider integrating Azure monitoring and logging tools with the AKS cluster. This would provide valuable insight into the health and performance of the cluster.
  • Network Policies:
    • Currently, I’ve established the foundational network setup. Going forward, fine-grained network policies could tighten security and control over the communication between pods in the AKS cluster.
  • Scaling Strategy:
    • The AKS cluster is currently in a basic configuration, with autoscaling limited to the default node pool. Depending on the anticipated load, I’d explore further Kubernetes scaling options on Azure so the applications can handle the demand.
  • Backup and Recovery:
    • Introducing regular backup schedules for the AKS cluster and its associated databases would be crucial. It’s an area that could be improved upon to ensure business continuity.
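
As one possible starting point for the monitoring and network policy items, a Log Analytics workspace could be added and then referenced from the cluster. This is a sketch of a direction, not something the current code does: the workspace name, retention period, and variable names are assumptions, and the commented fragments use azurerm 3.x argument names.

```hcl
variable "location" { type = string }
variable "resource_group_name" { type = string } # hypothetical name

# Workspace that would receive container logs and metrics.
resource "azurerm_log_analytics_workspace" "aks" {
  name                = "bg-aks-logs" # hypothetical name
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Fragments to merge into the existing azurerm_kubernetes_cluster resource:
#
#   oms_agent {
#     log_analytics_workspace_id = azurerm_log_analytics_workspace.aks.id
#   }
#
#   network_profile {
#     network_plugin    = "azure"
#     load_balancer_sku = "standard"
#     network_policy    = "azure" # "calico" is another supported value
#   }
```

Setting ‘network_policy’ only enables enforcement; the actual rules would still be defined as Kubernetes NetworkPolicy objects inside the cluster.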

7. Key Takeaways

  • Dependency Management:
    • Understanding the dependencies between resources was crucial. Terraform’s ‘depends_on’ attribute proved invaluable, ensuring that resources were provisioned in the correct order.
  • Security Practice:
    • This project underscored the importance of securing sensitive information. Using Azure Key Vault to handle secrets reinforced best practices around cloud security.
  • Modularity:
    • The modular approach in Terraform not only made the codebase cleaner but also reinforced the importance of reusability and organization when it comes to Infrastructure as Code.

8. Q&A

  • How would I integrate this Terraform code into a CI/CD pipeline for automated infrastructure deployment?
    • I’d use a CI/CD tool like Jenkins or Azure DevOps. The pipeline would start with linting and validating the Terraform code, followed by a ‘terraform plan’ stage. Post-approval, the ‘terraform apply’ stage would be triggered. Storing the Terraform state securely and handling concurrent deployments would be critical considerations.
  • How would I ensure that this Terraform configuration stays up to date with the ever-evolving Azure services?
    • Firstly, I’d regularly update the ‘azurerm’ provider to benefit from the latest features and improvements. Additionally, I’d follow Azure’s update announcements and the provider’s changelog and documentation to stay informed about any changes.
  • How can I further enhance the security posture of this AKS cluster?
    • Beyond the Key Vault, I’d implement Azure’s built-in security features for AKS, such as workload identity (the successor to the deprecated Azure AD Pod Identity) and network policies. Regular security audits, vulnerability assessments, and keeping the nodes patched with the latest security updates would further strengthen the security posture.
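
For the CI/CD answer above, the point about storing state securely and handling concurrent deployments is commonly addressed with a remote backend. A minimal sketch using the ‘azurerm’ backend, where the resource group, storage account, container, and key names are all hypothetical and the storage account is assumed to exist already:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"            # hypothetical
    storage_account_name = "bgtfstatestorage"      # hypothetical
    container_name       = "tfstate"               # hypothetical
    key                  = "aks.terraform.tfstate" # state file name in the container
  }
}
```

This backend keeps the state in Azure Blob Storage and uses blob leases for state locking, so two pipeline runs cannot write the same state at the same time. Provider version constraints would typically sit in the same ‘terraform’ block, which makes provider upgrades a deliberate, reviewable change.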