Deploying a multi-node k8s cluster (i.e., one with multiple workers) by following the official Kubernetes instructions (i.e., not using options such as Minikube or frameworks like Rancher) can easily lead to networking errors. Furthermore, when using Vagrant for that task, there is an additional network-related consideration to bear in mind for multi-node setups.

Networking deserves special attention here: otherwise the Container Network Interface (CNI) may not be properly configured and, even independently of that, worker nodes will not be able to join the cluster.

To illustrate the environment at hand: a 3-node k8s cluster (one master/CP, two workers) is deployed in VirtualBox, with the Vagrantfile and scripts available on GitHub.

The networking part will look like this in the different nodes:

Node          Hostname   iface=enp0s3   iface=enp0s8
k8s-cp        cp         10.0.2.15/24   192.178.33.110/24
k8s-worker1   worker1    10.0.2.15/24   192.178.33.120/24
k8s-worker2   worker2    10.0.2.15/24   192.178.33.130/24

The major issue to address is that the k8s-cp IP used to advertise the cluster to the rest of the nodes must be explicitly indicated in any environment whose primary interface does not expose an IP reachable by all other nodes.

This is the case in a VirtualBox environment (here used through Vagrant). The primary interface is enp0s3, which uses NAT and gets assigned the IP 10.0.2.15/24. This means that all VirtualBox VMs will have this same IP assigned to enp0s3 (a quick check is shown right after the list below), which causes failures in two ways:

  • No other node is able to join the cluster; and
  • The CNI (here, Calico) runs into conflicts, because k8s-cp advertises the cluster on the primary, NATted IP, so future workers looking for services hosted on k8s-cp (such as kube-apiserver) will instead search for them on their own private network.
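
The duplicated NAT address can be confirmed quickly by inspecting the interfaces on each VM; a minimal check, assuming the Vagrant machine names match the hostnames in the table above:

# All VMs report the same NATted address on enp0s3
vagrant ssh worker1 -c "ip -4 addr show enp0s3"
vagrant ssh worker2 -c "ip -4 addr show enp0s3"
# The host-only interface enp0s8 carries the per-node 192.178.33.x address instead
vagrant ssh worker1 -c "ip -4 addr show enp0s8"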

An example of the first error is shown below: the joining node ends up looking for the kube-apiserver at its own NATted address:

sudo kubeadm join 10.0.2.15:6443 --token qualsk.0584iwavwhmmq0ox \
	--discovery-token-ca-cert-hash sha256:7cb2ec38493492631ebb3de5aac2823747191ae62cfbaa5426578ed8803bdcb8 
Joining cluster from host with IP=192.178.33.110
[preflight] Running pre-flight checks

error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher

An example of the second error is shown below; it results in the Calico pod on k8s-worker1 never reaching a stable, fully running state:

vagrant@cp:~$ kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-5f6cfd688c-fcfnj   1/1     Running            0          16m
kube-system   calico-node-5rhk6                          0/1     CrashLoopBackOff   7          12m
kube-system   calico-node-k2x9k                          1/1     Running            0          16m
kube-system   coredns-74ff55c5b-7f8jc                    1/1     Running            0          16m
kube-system   coredns-74ff55c5b-9ls7h                    1/1     Running            0          16m
kube-system   etcd-k8scp                                 1/1     Running            0          16m
kube-system   kube-apiserver-k8scp                       1/1     Running            0          16m
kube-system   kube-controller-manager-k8scp              1/1     Running            0          16m
kube-system   kube-proxy-8qkvn                           1/1     Running            0          16m
kube-system   kube-proxy-pjj8r                           1/1     Running            0          12m
kube-system   kube-scheduler-k8scp                       1/1     Running            0          16m

Note in the pod description below how the Felix liveness check fails against a service that is not listening on localhost, and how the BIRD readiness check fails because the nodename file it expects does not exist.

vagrant@cp:~$ kubectl describe pod calico-node-5rhk6 -n kube-system
(...)
Node:                 worker1/192.178.33.120
(...)
IP:                   192.178.33.120
(...)
Containers:
  calico-node:
    Container ID:   cri-o://0ddff8a64db5a95388d00d2aa5086a89e14359afc7030d3993018469ee963914
    (...)
    Ready:          False
    (...)
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    (...)
    Environment:
      (...)
      IP:                                 autodetect
      (...)
      CALICO_IPV4POOL_CIDR:               172.178.33.10/16
      IP_AUTODETECTION_METHOD:            can-reach=192.178.33.110
      (...)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
(...)
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned kube-system/calico-node-5rhk6 to worker1
  Normal   Pulling    12m                   kubelet            Pulling image "docker.io/calico/cni:v3.20.0"
  Normal   Pulled     12m                   kubelet            Successfully pulled image "docker.io/calico/cni:v3.20.0" in 14.711193713s
  Normal   Started    12m                   kubelet            Started container upgrade-ipam
  Normal   Created    12m                   kubelet            Created container upgrade-ipam
  Normal   Created    12m                   kubelet            Created container install-cni
  Normal   Pulled     12m                   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Started    12m                   kubelet            Started container install-cni
  Normal   Pulling    12m                   kubelet            Pulling image "docker.io/calico/pod2daemon-flexvol:v3.20.0"
  Normal   Created    12m                   kubelet            Created container flexvol-driver
  Normal   Pulled     12m                   kubelet            Successfully pulled image "docker.io/calico/pod2daemon-flexvol:v3.20.0" in 6.61434581s
  Normal   Started    12m                   kubelet            Started container flexvol-driver
  Normal   Pulling    12m                   kubelet            Pulling image "docker.io/calico/node:v3.20.0"
  Normal   Pulled     11m                   kubelet            Successfully pulled image "docker.io/calico/node:v3.20.0" in 9.621457222s
  Normal   Created    11m                   kubelet            Created container calico-node
  Normal   Started    11m                   kubelet            Started container calico-node
  Warning  Unhealthy  10m (x5 over 11m)     kubelet            Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
  Warning  Unhealthy  2m22s (x47 over 11m)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
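
Another way to look at the same failures is through the logs of the calico-node container in the affected pod, for instance:

# Inspect the calico-node container logs on the crashing pod
kubectl logs -n kube-system calico-node-5rhk6 -c calico-node
# If the container has already restarted (CrashLoopBackOff), check the previous instance too
kubectl logs -n kube-system calico-node-5rhk6 -c calico-node --previous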

Exposing a specific interface (IP) to init the cluster

The master/CP IP must be passed explicitly so that it is advertised to all nodes that will join the cluster later on.

Define some networking variables first to use later.

CP_IP="192.178.33.110"
POD_CIDR="172.178.33.0/16"

Then decide whether to bring up the cluster using the CLI parameters or a YAML file. This guide focuses on the simplest method (CLI parameters).

With CLI parameters:

sudo kubeadm init --pod-network-cidr ${POD_CIDR} --apiserver-advertise-address=${CP_IP}

For the YAML file, check this guide and also the command kubeadm config print init-defaults, which will likely help you translate CLI parameters into their YAML counterparts (e.g., --apiserver-advertise-address is defined in .localAPIEndpoint.advertiseAddress).
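
As a reference, a minimal sketch of the YAML alternative is shown below (the file name is illustrative; the API version shown is v1beta2 and should be adjusted to whatever kubeadm config print init-defaults prints for your kubeadm release):

# Sketch of a kubeadm configuration roughly equivalent to the CLI parameters above
cat <<EOF > kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "${CP_IP}"   # equivalent to --apiserver-advertise-address
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "${POD_CIDR}"       # equivalent to --pod-network-cidr
EOF
sudo kubeadm init --config kubeadm-init.yaml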

Define the explicitly advertised IP in the CNI configuration

In this particular case, the CNI uses the Calico plugin. First download the Calico manifest.

wget https://docs.projectcalico.org/manifests/calico.yaml
cp -p calico.yaml calico-cni.yaml

Then modify the file with the specific networking data for your cluster, including the IP that was previously advertised (as well as the pod CIDR, as usual).

Defining the explicit interface in Calico is important because, otherwise, the first interface found will be auto-detected (see autodetection methods in Calico).

# Note: it is very important to keep the correct indentation (spaces, not tabs)
sed -i "s|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|g" calico-cni.yaml
sed -i "s|#   value: \"192.168.0.0/16\"|  value: \"${POD_CIDR}\"|g" calico-cni.yaml
# Note: the following variant can be used instead of the previous sed (both match the same commented line)
# to make the advertised IP further explicit, although it did not seem necessary to bring up the cluster
sed -i "s|#   value: \"192.168.0.0/16\"|  value: \"${POD_CIDR}\"\n            # Extra: adding to avoid auto-detection issues with IPs in VB\n            - name: IP_AUTODETECTION_METHOD\n              value: \"can-reach=${CP_IP}\"|g" calico-cni.yaml

The resulting section will be as follows:

            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "172.178.33.0/16"
            # Extra: adding to avoid auto-detection issues with IPs in VB
            - name: IP_AUTODETECTION_METHOD
              value: "can-reach=192.178.33.110"

After applying this change and recreating the cluster, the calico node on k8s-worker1 should converge to the “Running” status.
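
The workers join using the command printed at the end of kubeadm init, which now points to 192.178.33.110 instead of the NATted IP; if that command was lost, it can be regenerated on the CP with a standard kubeadm call:

# Prints a ready-to-use "kubeadm join 192.178.33.110:6443 --token ... --discovery-token-ca-cert-hash ..." line
sudo kubeadm token create --print-join-command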

vagrant@cp:~$ kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-5f6cfd688c-sxtwd   1/1     Running   0          5m4s    172.178.74.129   cp    <none>           <none>
kube-system   calico-node-p949f                          1/1     Running   0          5m4s    192.178.33.110   cp    <none>           <none>
kube-system   calico-node-vwknl                          0/1     Running   0          43s     192.178.33.120   worker1   <none>           <none>
kube-system   coredns-74ff55c5b-49bsf                    1/1     Running   0          5m6s    172.178.74.130   cp    <none>           <none>
kube-system   coredns-74ff55c5b-s9msd                    1/1     Running   0          5m6s    172.178.74.131   cp    <none>           <none>
kube-system   etcd-k8scp                                 1/1     Running   0          5m14s   192.178.33.110   cp    <none>           <none>
kube-system   kube-apiserver-k8scp                       1/1     Running   0          5m14s   192.178.33.110   cp    <none>           <none>
kube-system   kube-controller-manager-k8scp              1/1     Running   0          5m14s   192.178.33.110   cp    <none>           <none>
kube-system   kube-proxy-s46ch                           1/1     Running   0          43s     192.178.33.120   worker1   <none>           <none>
kube-system   kube-proxy-x9wps                           1/1     Running   0          5m6s    192.178.33.110   cp    <none>           <none>
kube-system   kube-scheduler-k8scp                       1/1     Running   0          5m14s   192.178.33.110   cp    <none>           <none>

And the Calico pod on k8s-worker1 is now successfully initialised and eventually becomes ready.

However, the BIRD readiness probe still reports a failure at first: some of the nodes may be unreachable via BGP, and this is worth investigating.

vagrant@cp:~$ kubectl describe pod calico-node-vwknl -n kube-system
(...)
Node:                 worker1/192.178.33.120
(...)
IP:                   192.178.33.120
Controlled By:  DaemonSet/calico-node
(...)
Containers:
  calico-node:
    Container ID:   docker://bc0276580a076444cf3fb23dadd4d1902d7f31efd95d1c09517dd20bb28419e4
    (...)
    Ready:          True
    (...)
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    (...)
    Environment:
      (...)
      IP:                                 autodetect
      (...)
      CALICO_IPV4POOL_CIDR:               172.178.33.10/16
      IP_AUTODETECTION_METHOD:            can-reach=192.178.33.110
      (...)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
(...)

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  66s   default-scheduler  Successfully assigned kube-system/calico-node-vwknl to worker1
  Normal   Pulling    56s   kubelet            Pulling image "docker.io/calico/cni:v3.20.0"
  Normal   Pulled     45s   kubelet            Successfully pulled image "docker.io/calico/cni:v3.20.0" in 11.259383483s
  Normal   Created    44s   kubelet            Created container upgrade-ipam
  Normal   Started    44s   kubelet            Started container upgrade-ipam
  Normal   Pulled     44s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Created    44s   kubelet            Created container install-cni
  Normal   Started    44s   kubelet            Started container install-cni
  Normal   Pulling    43s   kubelet            Pulling image "docker.io/calico/pod2daemon-flexvol:v3.20.0"
  Normal   Pulled     38s   kubelet            Successfully pulled image "docker.io/calico/pod2daemon-flexvol:v3.20.0" in 4.55674576s
  Normal   Created    38s   kubelet            Created container flexvol-driver
  Normal   Started    38s   kubelet            Started container flexvol-driver
  Normal   Pulling    38s   kubelet            Pulling image "docker.io/calico/node:v3.20.0"
  Normal   Pulled     31s   kubelet            Successfully pulled image "docker.io/calico/node:v3.20.0" in 7.04211673s
  Normal   Created    30s   kubelet            Created container calico-node
  Normal   Started    30s   kubelet            Started container calico-node
  Warning  Unhealthy  28s   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
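
A way to dig into the BGP side of this readiness error is to check the peering status from Calico itself; a sketch, assuming calicoctl is installed on the affected node:

# Shows the state of the BGP (BIRD) sessions towards the other nodes
sudo calicoctl node status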

Even with the readiness error above, all pods and nodes are ready at this point (although some Calico pods may temporarily report an unhealthy status). As a quick test that networking works, the sample Nginx Deployment is applied and checked for a proper rollout.

vagrant@cp:~$ kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml
deployment.apps/nginx-deployment created

vagrant@cp:~$ kubectl get pod -n default
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-9456bbbf9-bsfhg   1/1     Running   0          24s
nginx-deployment-9456bbbf9-mw7t5   1/1     Running   0          24s
nginx-deployment-9456bbbf9-r5g5p   1/1     Running   0          24s

Success.
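
To go one step further, cross-node reachability through a Service can be checked as well; a minimal sketch, where the Service name and the test image are illustrative choices:

# Expose the sample deployment inside the cluster and query it from a throwaway pod
kubectl expose deployment nginx-deployment --port=80 --name=nginx-svc
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://nginx-svc
kubectl delete service nginx-svc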