This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Compute, Storage, and Networking Extensions

This section covers extensions to your cluster that do not come as part as Kubernetes itself. You can use these extensions to enhance the nodes in your cluster, or to provide the network fabric that links Pods together.

  • CSI and FlexVolume storage plugins

    Container Storage Interface (CSI) plugins provide a way to extend Kubernetes with supports for new kinds of volumes. The volumes can be backed by durable external storage, or provide ephemeral storage, or they might offer a read-only interface to information using a filesystem paradigm.

    Kubernetes also includes support for FlexVolume plugins, which are deprecated since Kubernetes v1.23 (in favour of CSI).

    FlexVolume plugins allow users to mount volume types that aren't natively supported by Kubernetes. When you run a Pod that relies on FlexVolume storage, the kubelet calls a binary plugin to mount the volume. The archived FlexVolume design proposal has more detail on this approach.

    The Kubernetes Volume Plugin FAQ for Storage Vendors includes general information on storage plugins.

  • Device plugins

    Device plugins allow a node to discover new Node facilities (in addition to the built-in node resources such as cpu and memory), and provide these custom node-local facilities to Pods that request them.

  • Network plugins

    A network plugin allow Kubernetes to work with different networking topologies and technologies. Your Kubernetes cluster needs a network plugin in order to have a working Pod network and to support other aspects of the Kubernetes network model.

    Kubernetes 1.26 is compatible with CNI network plugins.

1 - Network Plugins

Kubernetes 1.26 supports Container Network Interface (CNI) plugins for cluster networking. You must use a CNI plugin that is compatible with your cluster and that suits your needs. Different plugins are available (both open- and closed- source) in the wider Kubernetes ecosystem.

A CNI plugin is required to implement the Kubernetes network model.

You must use a CNI plugin that is compatible with the v0.4.0 or later releases of the CNI specification. The Kubernetes project recommends using a plugin that is compatible with the v1.0.0 CNI specification (plugins can be compatible with multiple spec versions).

Installation

A Container Runtime, in the networking context, is a daemon on a node configured to provide CRI Services for kubelet. In particular, the Container Runtime must be configured to load the CNI plugins required to implement the Kubernetes network model.

For specific information about how a Container Runtime manages the CNI plugins, see the documentation for that Container Runtime, for example:

For specific information about how to install and manage a CNI plugin, see the documentation for that plugin or networking provider.

Network Plugin Requirements

For plugin developers and users who regularly build or deploy Kubernetes, the plugin may also need specific configuration to support kube-proxy. The iptables proxy depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables. For example, if the plugin connects containers to a Linux bridge, the plugin must set the net/bridge/bridge-nf-call-iptables sysctl to 1 to ensure that the iptables proxy functions correctly. If the plugin does not use a Linux bridge, but uses something like Open vSwitch or some other mechanism instead, it should ensure container traffic is appropriately routed for the proxy.

By default, if no kubelet network plugin is specified, the noop plugin is used, which sets net/bridge/bridge-nf-call-iptables=1 to ensure simple configurations (like Docker with a bridge) work correctly with the iptables proxy.

Loopback CNI

In addition to the CNI plugin installed on the nodes for implementing the Kubernetes network model, Kubernetes also requires the container runtimes to provide a loopback interface lo, which is used for each sandbox (pod sandboxes, vm sandboxes, ...). Implementing the loopback interface can be accomplished by re-using the CNI loopback plugin. or by developing your own code to achieve this (see this example from CRI-O).

Support hostPort

The CNI networking plugin supports hostPort. You can use the official portmap plugin offered by the CNI plugin team or use your own plugin with portMapping functionality.

If you want to enable hostPort support, you must specify portMappings capability in your cni-conf-dir. For example:

{
  "name": "k8s-pod-network",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "127.0.0.1",
      "ipam": {
        "type": "host-local",
        "subnet": "usePodCidr"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}

Support traffic shaping

Experimental Feature

The CNI networking plugin also supports pod ingress and egress traffic shaping. You can use the official bandwidth plugin offered by the CNI plugin team or use your own plugin with bandwidth control functionality.

If you want to enable traffic shaping support, you must add the bandwidth plugin to your CNI configuration file (default /etc/cni/net.d) and ensure that the binary is included in your CNI bin dir (default /opt/cni/bin).

{
  "name": "k8s-pod-network",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "127.0.0.1",
      "ipam": {
        "type": "host-local",
        "subnet": "usePodCidr"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}

Now you can add the kubernetes.io/ingress-bandwidth and kubernetes.io/egress-bandwidth annotations to your Pod. For example:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
...

What's next

2 - Device Plugins

Device plugins let you configure your cluster with support for devices or resources that require vendor-specific setup, such as GPUs, NICs, FPGAs, or non-volatile main memory.
FEATURE STATE: Kubernetes v1.26 [stable]

Kubernetes provides a device plugin framework that you can use to advertise system hardware resources to the Kubelet.

Instead of customizing the code for Kubernetes itself, vendors can implement a device plugin that you deploy either manually or as a DaemonSet. The targeted devices include GPUs, high-performance NICs, FPGAs, InfiniBand adapters, and other similar computing resources that may require vendor specific initialization and setup.

Device plugin registration

The kubelet exports a Registration gRPC service:

service Registration {
	rpc Register(RegisterRequest) returns (Empty) {}
}

A device plugin can register itself with the kubelet through this gRPC service. During the registration, the device plugin needs to send:

  • The name of its Unix socket.
  • The Device Plugin API version against which it was built.
  • The ResourceName it wants to advertise. Here ResourceName needs to follow the extended resource naming scheme as vendor-domain/resourcetype. (For example, an NVIDIA GPU is advertised as nvidia.com/gpu.)

Following a successful registration, the device plugin sends the kubelet the list of devices it manages, and the kubelet is then in charge of advertising those resources to the API server as part of the kubelet node status update. For example, after a device plugin registers hardware-vendor.example/foo with the kubelet and reports two healthy devices on a node, the node status is updated to advertise that the node has 2 "Foo" devices installed and available.

Then, users can request devices as part of a Pod specification (see container). Requesting extended resources is similar to how you manage requests and limits for other resources, with the following differences:

  • Extended resources are only supported as integer resources and cannot be overcommitted.
  • Devices cannot be shared between containers.

Example

Suppose a Kubernetes cluster is running a device plugin that advertises resource hardware-vendor.example/foo on certain nodes. Here is an example of a pod requesting this resource to run a demo workload:

---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container-1
      image: registry.k8s.io/pause:2.0
      resources:
        limits:
          hardware-vendor.example/foo: 2
#
# This Pod needs 2 of the hardware-vendor.example/foo devices
# and can only schedule onto a Node that's able to satisfy
# that need.
#
# If the Node has more than 2 of those devices available, the
# remainder would be available for other Pods to use.

Device plugin implementation

The general workflow of a device plugin includes the following steps:

  • Initialization. During this phase, the device plugin performs vendor specific initialization and setup to make sure the devices are in a ready state.

  • The plugin starts a gRPC service, with a Unix socket under host path /var/lib/kubelet/device-plugins/, that implements the following interfaces:

    service DevicePlugin {
          // GetDevicePluginOptions returns options to be communicated with Device Manager.
          rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
    
          // ListAndWatch returns a stream of List of Devices
          // Whenever a Device state change or a Device disappears, ListAndWatch
          // returns the new list
          rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
    
          // Allocate is called during container creation so that the Device
          // Plugin can run device specific operations and instruct Kubelet
          // of the steps to make the Device available in the container
          rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
    
          // GetPreferredAllocation returns a preferred set of devices to allocate
          // from a list of available ones. The resulting preferred allocation is not
          // guaranteed to be the allocation ultimately performed by the
          // devicemanager. It is only designed to help the devicemanager make a more
          // informed allocation decision when possible.
          rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
    
          // PreStartContainer is called, if indicated by Device Plugin during registeration phase,
          // before each container start. Device plugin can run device specific operations
          // such as resetting the device before making devices available to the container.
          rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
    }
    
  • The plugin registers itself with the kubelet through the Unix socket at host path /var/lib/kubelet/device-plugins/kubelet.sock.

  • After successfully registering itself, the device plugin runs in serving mode, during which it keeps monitoring device health and reports back to the kubelet upon any device state changes. It is also responsible for serving Allocate gRPC requests. During Allocate, the device plugin may do device-specific preparation; for example, GPU cleanup or QRNG initialization. If the operations succeed, the device plugin returns an AllocateResponse that contains container runtime configurations for accessing the allocated devices. The kubelet passes this information to the container runtime.

Handling kubelet restarts

A device plugin is expected to detect kubelet restarts and re-register itself with the new kubelet instance. A new kubelet instance deletes all the existing Unix sockets under /var/lib/kubelet/device-plugins when it starts. A device plugin can monitor the deletion of its Unix socket and re-register itself upon such an event.

Device plugin deployment

You can deploy a device plugin as a DaemonSet, as a package for your node's operating system, or manually.

The canonical directory /var/lib/kubelet/device-plugins requires privileged access, so a device plugin must run in a privileged security context. If you're deploying a device plugin as a DaemonSet, /var/lib/kubelet/device-plugins must be mounted as a Volume in the plugin's PodSpec.

If you choose the DaemonSet approach you can rely on Kubernetes to: place the device plugin's Pod onto Nodes, to restart the daemon Pod after failure, and to help automate upgrades.

API compatibility

Previously, the versioning scheme required the Device Plugin's API version to match exactly the Kubelet's version. Since the graduation of this feature to Beta in v1.12 this is no longer a hard requirement. The API is versioned and has been stable since Beta graduation of this feature. Because of this, kubelet upgrades should be seamless but there still may be changes in the API before stabilization making upgrades not guaranteed to be non-breaking.