Common Problems in a k8s Cluster
2020-03-13 11:10:51
gua_l

 

1. Node status is NotReady (Part 1)

Container Runtime Version:  docker://Unknown

 

Check the node status; the node shows NotReady:

kubectl get nodes
NAME   STATUS     ROLES    AGE   VERSION
vm1    Ready      master   13d   v1.12.5
vm2    NotReady   <none>   13d   v1.12.5

 

Describe the node; the Docker version on the node is reported as Unknown:

kubectl describe node vm2
 
System Info:
......
 Container Runtime Version:  docker://Unknown

 

Log in to the node: Docker has been removed. Reinstall Docker on the node and restart it.
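The reinstall itself isn't shown below; a minimal sketch assuming an Ubuntu node (use yum/dnf on CentOS/RHEL):

# install the distro's Docker package and enable the service (Ubuntu example)
apt-get update
apt-get install -y docker.io
systemctl enable docker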

systemctl restart docker
systemctl status docker

 

Back on the master, check the node again: the status is now Ready, and Container Runtime Version under System Info shows the installed Docker version:

$ kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
vm1    Ready    master   13d   v1.12.5
vm2    Ready    <none>   13d   v1.12.5
$ kubectl describe node vm2
Name:               vm2
.......
System Info:
.....
 Container Runtime Version:  docker://18.6.3


2. Node status is NotReady (Part 2)

runtime network not ready: NetworkReady=false


Describe the node; it is NotReady and the Ready condition shows why:

 Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 01 Apr 2020 13:35:29 +0800   Tue, 31 Mar 2020 22:22:18 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 01 Apr 2020 13:35:29 +0800   Tue, 31 Mar 2020 22:22:18 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 01 Apr 2020 13:35:29 +0800   Tue, 31 Mar 2020 22:22:18 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 01 Apr 2020 13:35:29 +0800   Tue, 31 Mar 2020 22:22:18 +0800   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
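On the node itself the same symptom shows up as an empty CNI config directory (assuming the default /etc/cni/net.d path):

ls /etc/cni/net.d    # empty when no pod network has been deployed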
 


The network is the problem. kubeadm init had already pointed this out at the end of its output:

 You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

The pod network was never deployed. Deploy one (Weave Net here):

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
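Assuming the manifest applied cleanly and uses its default name=weave-net label, the weave-net DaemonSet pods should appear in kube-system and go Running within a minute or two:

kubectl -n kube-system get pods -l name=weave-net -o wide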

Then restart kubelet:

systemctl restart kubelet

Check the node status again:

kubectl describe node vm2

The Events section at the end shows the node coming back up:

Events:
  Type    Reason                   Age    From             Message
  ----    ------                   ----   ----             -------
  Normal  Starting                 41m    kube-proxy, vm2  Starting kube-proxy.
  Normal  Starting                 11m    kubelet, vm2     Starting kubelet.
  Normal  NodeHasSufficientMemory  11m    kubelet, vm2     Node vm2 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    11m    kubelet, vm2     Node vm2 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     11m    kubelet, vm2     Node vm2 status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  11m    kubelet, vm2     Updated Node Allocatable limit across pods
  Normal  NodeReady                7m49s  kubelet, vm2     Node vm2 status is now: NodeReady

 

Finally, check the node status again; the node is Ready:

kubectl get nodes


3. Node status is NotReady (Part 3)

Kubelet stopped posting node status.


On node vm2, check the kubelet status:

systemctl status kubelet

Error: kubelet.service: Failed with result 'exit-code'.
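systemctl status only shows the exit code; the actual failure reason is in the kubelet journal:

journalctl -u kubelet -n 50 --no-pager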

After some digging, swap had been turned back on. Comment out the swap entry in /etc/fstab, disable swap for the running system, and restart kubelet:

sed -i 's|/swap.img|#/swap.img|g' /etc/fstab
swapoff -a
systemctl restart kubelet
systemctl status kubelet
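To confirm swap is really off (and stays off after the fstab edit):

swapon --show    # should print nothing
free -h          # the Swap line should show 0B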

 

Back on the master, check the node; it is Ready again.


4. Node status is NotReady (Part 4)

failed to find plugin "portmap" in path [/opt/cni/bin]

Check kubelet on the node: the cluster uses Weave for the pod network, but the portmap CNI plugin is missing from the node. It needs to be copied from the master to the node, then kubelet on the node restarted; the fix is shown after the log output below.


root@gua-vm3:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Mon 2020-11-30 08:01:36 UTC; 24h ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 4555 (kubelet)
      Tasks: 16 (limit: 9451)
     Memory: 42.3M
     CGroup: /system.slice/kubelet.service
             └─4555 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=registr>
Dec 01 08:47:34 gua-vm3 kubelet[4555]:         },
Dec 01 08:47:34 gua-vm3 kubelet[4555]:         {
Dec 01 08:47:34 gua-vm3 kubelet[4555]:             "type": "portmap",
Dec 01 08:47:34 gua-vm3 kubelet[4555]:             "capabilities": {"portMappings": true},
Dec 01 08:47:34 gua-vm3 kubelet[4555]:             "snat": true
Dec 01 08:47:34 gua-vm3 kubelet[4555]:         }
Dec 01 08:47:34 gua-vm3 kubelet[4555]:     ]
Dec 01 08:47:34 gua-vm3 kubelet[4555]: }
Dec 01 08:47:34 gua-vm3 kubelet[4555]: : [failed to find plugin "portmap" in path [/opt/cni/bin]]
Dec 01 08:47:34 gua-vm3 kubelet[4555]: W1201 08:47:34.898238    4555 cni.go:239] Unable to update cni config: no valid networks found in /etc/cni/net.d


Fix:

# on Master
cd /opt/cni/bin
scp portmap root@nodeIP:/opt/cni/bin

# on node
ls /opt/cni/bin/portmap 
systemctl restart kubelet
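If more than just portmap is missing, an alternative is to install the whole CNI plugins bundle on the node instead of copying binaries one by one; a sketch assuming the v0.8.7 release of containernetworking/plugins (pick a version matching your cluster):

CNI_VERSION=v0.8.7
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | tar -xz -C /opt/cni/bin
systemctl restart kubelet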



 

5. ImagePullBackOff caused by registry network issues

When deploying helloworld, the task cannot start; the pods sit in ImagePullBackOff because the default image registry cannot be reached.

Use an image registry mirror:

Re-run kubeadm init with a config file that points image pulls at a mirror registry:

kubeadm reset 
kubeadm init --config  config.yaml


root@vm1:~/k8ssetup# cat config.yaml

apiVersion: kubeadm.k8s.io/v1beta2
kubernetesVersion: v1.17.3
kind: ClusterConfiguration
networking:
  podSubnet: 192.168.0.0/16
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
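With the mirror configured, the images can also be pre-pulled before running init, which is a quick way to verify the registry is reachable:

kubeadm config images pull --config config.yaml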




 

6. Error when the master has not been set to allow scheduling pods

  1 node(s) had taints that the pod didn't tolerate.

root@vm1:~/k8ssetup# kubectl -n tekton-pipelines get pods
NAME                                           READY   STATUS    RESTARTS   AGE
tekton-dashboard-6d9f5b4fc5-vj7q2              0/1     Pending   0          2m33s
tekton-pipelines-controller-7f66b8bd95-c7kn8   0/1     Pending   0          3m5s
tekton-pipelines-webhook-7cddbc485f-m6t7f      0/1     Pending   0          3m3s


root@vm1:~/k8ssetup# kubectl -n tekton-pipelines describe pod tekton-dashboard-6d9f5b4fc5-vj7q2 
Name:           tekton-dashboard-6d9f5b4fc5-vj7q2
Namespace:      tekton-pipelines
............
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  11s (x4 over 2m47s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  

Remove the master taint:

root@vm1:~/k8ssetup#     kubectl taint nodes --all node-role.kubernetes.io/master-
node/vm1 untainted
root@vm1:~/k8ssetup# kubectl -n tekton-pipelines describe pod tekton-dashboard-6d9f5b4fc5-vj7q2 
Name:           tekton-dashboard-6d9f5b4fc5-vj7q2
........
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  41s (x5 over 4m48s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         18s                  default-scheduler  Successfully assigned tekton-pipelines/tekton-dashboard-6d9f5b4fc5-vj7q2 to vm1
root@vm1:~/k8ssetup# kubectl -n tekton-pipelines get pods
NAME                                           READY   STATUS              RESTARTS   AGE
tekton-dashboard-6d9f5b4fc5-vj7q2              0/1     ContainerCreating   0          4m57s
tekton-pipelines-controller-7f66b8bd95-c7kn8   0/1     ContainerCreating   0          5m29s
tekton-pipelines-webhook-7cddbc485f-m6t7f      0/1     ContainerCreating   0          5m27s
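To see which taint was blocking scheduling in the first place (on a kubeadm master this is normally node-role.kubernetes.io/master:NoSchedule):

kubectl describe node vm1 | grep -i taints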

 

 


7. Node ROLES shows <none>

root@gua-vm2:/etc/cni/net.d# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
gua-vm1   Ready    master   25h   v1.19.4
gua-vm2   Ready    worker   25h   v1.19.4
gua-vm3   Ready    <none>   25h   v1.19.4
root@gua-vm2:/etc/cni/net.d# kubectl label nodes gua-vm3 kubernetes.io/role=worker
node/gua-vm3 labeled
root@gua-vm2:/etc/cni/net.d# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
gua-vm1   Ready    master   25h   v1.19.4
gua-vm2   Ready    worker   25h   v1.19.4
gua-vm3   Ready    worker   25h   v1.19.4
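The kubernetes.io/role label used above is one convention; kubectl also derives the ROLES column from node-role.kubernetes.io/<role> labels, so this works as well (the empty value is intentional):

kubectl label nodes gua-vm3 node-role.kubernetes.io/worker=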






 
