Four Steps to Achieve the Ultimate CI/CD Goals with Kubernetes

Zero downtime and zero impact on traffic performance and metrics are the ultimate goals of an enterprise-level CI/CD (Continuous Integration/Continuous Delivery) workflow in a cloud native environment.

This cannot be achieved with a single tool; multiple solutions must work together. Kubernetes' built-in software release rollout and rollback with canary release support is not enough by itself.

First: Use Kubernetes friendly CI toolset

The CI toolset should be well integrated with GitHub, the Docker registry and the Kubernetes orchestrator, and the toolset itself should be cloud native and easy to deploy in a public or private cloud. The tool carries out the daily Kubernetes development tasks: containerizing the application, building and tagging the Docker image, pushing it to the registry, detecting Kubernetes manifest changes, deploying those Kubernetes objects, testing and bug fixing, and pushing the source code (Dockerfile, application source code, Kubernetes manifest files) to GitHub automatically and smoothly.
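The daily loop above can be sketched as a handful of commands. This is a dry-run outline, not any particular CI tool's pipeline definition; the application name, registry host and `k8s/` manifest directory are all assumptions:

```shell
#!/bin/sh
# Dry-run sketch of the daily CI loop. RUN=echo prints each step instead
# of executing it; set RUN to empty to actually run the commands.
RUN=echo
GIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo dev)

$RUN docker build -t registry.example.com/my-app:$GIT_SHA .   # containerize and tag
$RUN docker push registry.example.com/my-app:$GIT_SHA         # push image to the registry
$RUN kubectl apply -f k8s/                                    # deploy changed manifests
$RUN kubectl rollout status deployment/my-app                 # wait for the rollout to finish
$RUN git push origin main                                     # Dockerfile, source and manifests live together
```

A real CI system (Jenkins X, Forge and the like) wraps these same steps with triggers, credentials and environment promotion.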

Jenkins is the original, de facto standard for CI/CD, and Jenkins X is its cloud native implementation. If Jenkins feels too complicated and heavyweight for your development cycle, the Forge project provides a lighter starting point for a Kubernetes-based CI workflow.

Second: Use Kubernetes Correctly

To make the best use of the Kubernetes features related to canary releases, the following key points must be observed:

  • Understand when a Kubernetes rolling update happens: a Deployment's rolling update is triggered automatically when its pod template (spec.template) changes, for example when you update the template's images, configuration, labels, annotations or resource limits. Note that scaling the Deployment alone does not change the template and therefore does not trigger a rollout.
  • Rolling updates are designed to update your workloads without downtime, but with the following caveats:
    • No matter how you tune the rolling update strategy parameters maxSurge and maxUnavailable, there can still be a gap in service availability. It is kube-proxy or the ingress controller that detects changes to the Kubernetes endpoints, updates each node's iptables rules and stops routing to terminating pods; this is an asynchronous process and may break the zero-downtime rule.
    • Your containerized application should be designed to shut down gracefully, and may need mechanisms to coordinate the bring-up of new pods with the termination of old ones. One approach is to check the readiness of the new pod before shutting down the old pod, and to implement a preStop shutdown hook so that kube-proxy and the ingress controller have enough time to update their traffic routing tables.
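The points above can be sketched in a single Deployment manifest. This is a minimal illustration; the name `my-app`, the `/healthz` endpoint, port and the 10-second sleep are assumptions to be tuned for your workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # never take an old pod down before a new one is ready
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v2   # changing the template triggers the rollout
        readinessProbe:                          # new pod receives traffic only once ready
          httpGet:
            path: /healthz
            port: 8080
        lifecycle:
          preStop:                               # grace period for kube-proxy/ingress to reroute
            exec:
              command: ["sleep", "10"]
```

The preStop sleep keeps the old pod serving in-flight requests while the endpoint change propagates; the readiness probe keeps traffic away from the new pod until it can actually handle it.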

Third: Use Service Mesh to control Traffic Flow

With the plain Kubernetes canary deployment method, traffic distribution is controlled by the replica ratio. A large number of replicas is needed to send, for example, 1% of traffic to the canary deployment (100 replicas would be required). Maintaining a fixed traffic ratio becomes even more awkward when the Horizontal Pod Autoscaler (HPA) is enabled. The approach also lacks features such as routing traffic from a specific source to the canary pods.
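As a sketch of the replica-ratio approach (all names and images are assumptions), two Deployments carry the same `app` label that a single Service selects on, so traffic splits roughly in proportion to pod count:

```yaml
# A Service selecting app=my-app spreads traffic across both Deployments,
# so 99 stable replicas : 1 canary replica gives roughly a 1% canary share.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 99               # ~99% of traffic
  selector:
    matchLabels: {app: my-app, track: stable}
  template:
    metadata:
      labels: {app: my-app, track: stable}
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                # ~1% of traffic
  selector:
    matchLabels: {app: my-app, track: canary}
  template:
    metadata:
      labels: {app: my-app, track: canary}
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v2
```

The cost is obvious: the ratio is only as fine-grained as the total pod count, and any autoscaling of either Deployment shifts it.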

Service mesh technology can precisely control the portion of traffic fed to newer application versions through the concept of traffic weight, and can implement routing rules for the scenarios discussed above. A service mesh treats traffic routing and replica deployment as completely independent concerns, so HPA can work independently of the traffic routing rules.
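Taking Istio as one concrete service mesh (the resource names and `version` labels are assumptions), the same 1% split needs only one pod per version; the route weights, not the replica counts, decide the ratio:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:                      # map pod labels to named subsets
  - name: stable
    labels: {version: v1}
  - name: canary
    labels: {version: v2}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts: [my-app]
  http:
  - route:
    - destination: {host: my-app, subset: stable}
      weight: 99                # 99% of requests
    - destination: {host: my-app, subset: canary}
      weight: 1                 # 1% of requests, regardless of replica counts
```

Because the weights live in the routing layer, HPA can scale either subset up or down without disturbing the 99/1 split, and match rules (by header, source, and so on) can steer specific traffic to the canary.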

Fourth: Coordinate with Metrics Measurement

During a canary deployment, you should not only make sure the new services are functional, but also guarantee there is no traffic performance impact during the whole process.

The CD tool needs to work with the cluster's metrics collection tool to monitor pod, node and cluster-wide traffic metrics and feed them back to the CD toolset. When there is any unexpected traffic performance impact, such as jitter, delay or loss, you can pause and roll back your deployment. For example, the Spinnaker toolset from Netflix is continuous-deployment centric; it integrates with the Prometheus and Datadog cluster metrics monitoring tools and can make deployment decisions based on metrics. If a canary deployment has caused any pertinent metric to degrade, you can roll it back. You can also integrate NGINX cloud application controllers, or even AI functions, to feed measured KPI results directly back to the CD toolset and make rollback or roll-forward decisions automatically.
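The core of such an automated decision is a comparison of canary metrics against the stable baseline. The sketch below is a hypothetical helper, not Spinnaker's actual canary-analysis API; the metric names, the lower-is-better convention and the 10% threshold are all assumptions:

```python
def should_roll_back(baseline, canary, max_degradation=0.10):
    """Return True if any canary metric degrades more than the allowed
    fraction relative to the stable baseline.

    `baseline` and `canary` map metric names (e.g. p99 latency in ms,
    error rate) to measured values, where lower is better.
    """
    for metric, base_value in baseline.items():
        canary_value = canary.get(metric)
        if canary_value is None:
            continue  # metric not collected for the canary yet
        if base_value == 0:
            if canary_value > 0:
                return True  # any nonzero value degrades a zero baseline
            continue
        if (canary_value - base_value) / base_value > max_degradation:
            return True
    return False


# Example: canary p99 latency jumped 50% while the error rate held steady.
baseline = {"p99_latency_ms": 120.0, "error_rate": 0.01}
canary = {"p99_latency_ms": 180.0, "error_rate": 0.01}
print(should_roll_back(baseline, canary))  # True: latency degraded beyond 10%
```

A real pipeline would feed this from Prometheus or Datadog queries over the analysis window and pause or roll back the rollout when it returns True.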

Achieving a successful CI/CD pipeline in a cloud native enterprise environment is a must. By adopting the right toolsets and methodology, you will reap the benefits it brings to your software development cycle.

Keyuan Zhang


Professional with extensive industry experience and knowledge of Cloud Computing, IoT and Embedded Systems.
