Deploying SQL server Always on Availability Group on Azure Kubernetes Services(AKS).

In the previous blog (How to deploy SQL Server containers to a Kubernetes cluster for high availability? – Beyond the Horizon… (livenetwork.in) we learned how to deploy a SQL server container to the Kubernetes Cluster. In this blog, we will extend the same concept to deploy the SQL server Always on Availability group on Azure Kubernetes Services. I have developed a video for this and you can watch this video on the below link.

https://youtu.be/7JJQFTF-9dE

What is SQL server Always on Availability Group?

An availability group solution supports a set of primary user databases and one to eight sets of corresponding secondary user databases(for fault tolerance). The secondary databases are kept up to date with the primary databases by transferring transaction log blocks from each primary database to every secondary database either synchronously or asynchronously over the network. It provides Database Level Protection. Data streams between replicas are encrypted with certificates for the purpose of security and Failover can be implemented either automatically or manually.

SQL Server AlwaysOn Availability Groups provide us the capability to group two or more databases and perform a failover of both as a logical unit.
A single primary replica, which hosts the primary read/write databases.
One to eight secondary replicas, each of which hosts a set of secondary read-only databases and serves as a potential failover target for the availability group.
An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together as a unit. An availability group supports a set of primary databases and one to eight sets of corresponding secondary databases.
It can have synchronous and Asynchronous Replicas, but it can have up to three synchronous Replicas only.
An availability group fails over at the level of an availability replica. An availability replica provides redundancy only at the database level—for the set of databases in one availability group.
The primary replica makes the primary databases available for read-write connections from clients, and additionally, sends transaction log records of each primary database (data synchronization) to every secondary database.
Each secondary replica caches the transaction log records, (hardens the log) and then applies them to its corresponding secondary database. Data synchronization occurs between the primary database and each connected secondary database, independently of the other databases, therefore, a secondary database can be suspended or fail without affecting other secondary databases, and a primary database can be suspended or fail without affecting other primary databases.

Data Synchronization Options.

Synchronous data movement guarantees that if a commit confirmation has been received by the client, the log record has been hardened to the primary replica as well as the secondary replica. This means the secondary has 100% of the data that the primary has. If the secondary needed to take over, it has all the needed data. This gives us the capability of zero data loss and comes with little overhead of increased transaction latency. This also gives the option of automatic failover to the secondary replica because the data is already present
Asynchronous data movement doesn’t guarantee zero-data loss because log records for a transaction might arrive a bit later. In this case, the primary replica commits the transaction after hardening into the local log file, however, it does not wait for the hardening into the secondary replica. The secondary is hardened asynchronously. An advantage of this configuration is that the primary replica does not have to wait for confirmation from an asynchronous secondary, although the disadvantage is that the secondary replica can be lagging behind the primary.

How do Availability Groups Work?

Synchronous and asynchronous commit between Primary replicas and Secondaries are depicted in this diagram.

Kubernetes and SQL server Availability Group Architecture

SQL Server availability groups are implemented thru Kubernetes replicaSets. So in case of Primary / Secondary replica failures, Kubernetes can create the POD or Node again and the service can work without interruption.

Why it is important to use Kubernetes for SQL Always on Availability?

No need to set up windows failover clustering or Linux Pacemaker Cluster.
Kubernetes provides automatic failover capability.
We can create up to 17 distributed replicas.

Kubernetes configuration.

Here is the configuration of Primary Replica and Persistent volume claim.

Here is the CLI code to make it work.

   #Login and Create Resource Group
az login
az group create --name RG-SQLAG --location eastus

#Create AKS cluster
az aks create --resource-group RG-SQLAG --name SQLAG --node-count 3 \
 --generate-ssh-keys --node-vm-size=Standard_B2ms

 # Get credentials for the cluster so we can connect
az aks get-credentials --resource-group RG-SQLAG --name SQLAG

# List nodes
kubectl get nodes
# List Pods
kubectl get pods

# Create external storage with PV and PVC
kubectl apply -f sqlAGstorage.yaml

# Display the persistent volume and claim
kubectl get pv
kubectl get pvc

# Use K8 secrets to hold required sa password for SQL Server container
kubectl create secret generic mssql-secret \
 --from-literal=SA_PASSWORD="Welcome@0001234567"

# Deploy the SQL Server 2019 container

# Primary Replica Deployment
kubectl apply -f sqlAGPdeployment.yaml --record

# Secondary Replica1 Deployment
kubectl apply -f sqlAGS1deployment.yaml --record

# Secondary Replica2 Deployment
kubectl apply -f sqlAGS2deployment.yaml --record

# Get the status of the nodes, pods and service
kubectl get all

# Retrieve external IP address of the primary and secondary replicas
ip1=$(kubectl get services | grep mssql-primary | cut -c49-63)
ip2=$(kubectl get services | grep mssql-secondary1 | cut -c49-63)
ip3=$(kubectl get services | grep mssql-secondary2 | cut -c49-63)

echo IP addresses of the replicas  $ip1, $ip2, $ip3

#Create Primary & secondary replicas from SQL script and create certificates in
#Primary replica

# Now Copy the certificates from the  primary to the local
# First Retrieve pod name to variable
podagp=$(kubectl get pods -l app=mssql-primary -o custom-columns=:metadata.name)
podags1=$(kubectl get pods -l app=mssql-secondary1 -o custom-columns=:metadata.name)
podags2=$(kubectl get pods -l app=mssql-secondary2 -o custom-columns=:metadata.name)

#prepare variables
PathToCopyCert=${podagp}":var/opt/mssql/ag_certificate.cert"
PathToCopyCertKey=${podagp}":var/opt/mssql/ag_certificate.key"

# First copy to local
kubectl cp $podagp:var/opt/mssql/ag_certificate.cert ag_certificate.cert
kubectl cp $podagp:var/opt/mssql/ag_certificate.key ag_certificate.key

# Copy the certificate from local host to secondary1
echo Copying AG certificates from localhost to pod $podags1
kubectl cp ag_certificate.cert  $podags1:var/opt/mssql
kubectl cp ag_certificate.key  $podags1:var/opt/mssql

# Next copy to secondary2
kubectl cp ag_certificate.cert $podags2:var/opt/mssql
kubectl cp ag_certificate.key $podags2:var/opt/mssql

# Now start the testing if SQL Always on Availabilty  by causing failover
podname=$(kubectl get pods -l app=mssql-primary -o custom-columns=:metadata.name)
echo $podname
kubectl delete pod $podname
sleep 3
kubectl get pods

# Some Important troubleshooting commands
#Get the current status of the deployment
kubectl get all

#Get the current resource usage
kubectl top nodes
kubectl top pods

#Retrieve pod IP addresses
kubectl get services

#Retrieving YAML declarations
kubectl get <pod_name> -o yaml

# Dump POD info
kubectl describe pods  mssqlag-secondary1-deployment-c7559dd7d-w42lz

#Describe a deployment
kubectl describe deployments

#Describe a pod
kubectl describe pod -1 app=mssql-primary

#Describe the storage
kubectl describe pv
kubectl describe pvc

# Display the Kubernetes Logs
kubectl logs -l app=mssql
# Display the logs from a previous run# Use the -p flag to look at the logs
#from containers that  have exited

# Display the all events sorted
kubectl get events --sort-by=.metadata.creationTimestamp

# Display pod events
kubectl get events --field-selector involvedObject.kind=Pod

# Delete the deployment
kubectl delete -f sqlAGPdeployment.yaml

# Getting Kubernetes secrets
kubectl get secret mssql-secret -o jsonpath="{.data.SA_PASSWORD}" | base64 --decode

# Get pod name
podname=$(kubectl get pods -l app=mssql-primary -o custom-columns=:metadata.name)

# Running individual commands in the pod’s container
kubectl exec -it $podname--top

# Opening an interactive shell in the pod’s container
kubectl exec -it $podname--/bin/bash

#Run command on POD directly without login
kubectl exec -it $podname -- /opt/mssql-tools/bin/sqlcmd -S. -U sa -P SQL2019AGPassword -Q 'select @@version'

# Browse AKS dashboard
az aks browse --resource-group RG-SQLAG --name SQLAG

You can find the entire code in my Github repo: Live Network /SQL-Always-on-AKS (github.com).

Hope you enjoyed this blog!!