Learning resources
Overview of installation methods
kubectl run testapp --image=ccr.ccs.tencentyun.com/k8s-tutorial/test-k8s:v1
Check the Docker version
docker --version
Uninstall old Docker packages
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
Install Docker version 5:20.10.18~3-0~ubuntu-focal
VERSION_STRING=5:20.10.18~3-0~ubuntu-focal
sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
Run docker without sudo
sudo groupadd docker
sudo gpasswd -a ${USER} docker
sudo service docker restart
newgrp - docker
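Whether the group change has taken effect in the current session can be checked from the output of id -nG. The following is a minimal sketch; group_listed is a hypothetical helper name, not part of Docker, and it only inspects the group list passed on stdin.

```shell
# Succeed if the group named in $1 appears in a space-separated
# group list read from stdin (for example the output of `id -nG`).
# group_listed is a hypothetical helper introduced for illustration.
group_listed() {
    tr ' ' '\n' | grep -qx -- "$1"
}
```

A possible use is: id -nG "$USER" | group_listed docker && echo "docker group active". If the check fails in an existing shell, logging out and back in (or newgrp, as above) is usually what is missing.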
Install Kubernetes by following this article
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip installing it
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.25/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.25/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-cache madison kubelet | awk '{ print $3 }'
1.25.16-1.1
1.25.15-1.1
1.25.14-1.1
1.25.13-1.1
1.25.12-1.1
1.25.11-1.1
1.25.10-1.1
1.25.9-1.1
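Rather than copying a version from the list by hand, the newest entry can be selected automatically. This is a sketch that assumes GNU sort (for the -V version sort); latest_version is a hypothetical helper name, and version sort may not match Debian's own ordering rules in every edge case (epochs, tildes).

```shell
# Read package versions from stdin, one per line, and print the highest.
# Uses GNU sort's -V (version sort), so 1.25.10 sorts after 1.25.9.
# latest_version is a hypothetical helper introduced for illustration.
latest_version() {
    sort -V | tail -n 1
}
```

It could be combined with the command above as: apt-cache madison kubelet | awk '{ print $3 }' | latest_version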
sudo apt-get update
sudo apt-get purge kubelet kubeadm kubectl
umount $(df -HT | grep '/var/lib/kubelet/pods' | awk '{print $7}')
sudo apt-get install -y kubelet=1.25.16-1.1 kubeadm=1.25.16-1.1 kubectl=1.25.16-1.1
sudo apt-mark hold kubelet kubeadm kubectl
On releases older than Debian 12 and Ubuntu 22.04, the /etc/apt/keyrings directory does not exist by default; you can create it by running sudo mkdir -m 755 /etc/apt/keyrings.
At this point kubelet.service has not been started. Starting it now will most likely fail, so some configuration has to be changed first.
(base) wy@node3:~$ systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead)
       Docs: https://kubernetes.io/docs/
Once started, kubelet.service ends up in the auto-restart state
(base) wy@node3:~$ systemctl start kubelet.service
(base) wy@node3:~$ sudo systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2023-11-26 15:10:26 CST; 6s ago
       Docs: https://kubernetes.io/docs/
    Process: 680537 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUB>
   Main PID: 680537 (code=exited, status=1/FAILURE)
Use sudo journalctl -xeu kubelet | grep failed to find the cause of the error
11月 26 15:15:47 node3 kubelet[682095]: E1126 15:15:47.887801 682095 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
-- Subject: Unit failed
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
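The log shows the kubelet exiting because /var/lib/kubelet/config.yaml is missing. To my understanding that file is written by kubeadm init or kubeadm join, so the failure is expected on a node that has not joined yet. A small sketch for checking this condition directly; kubelet_config_state is a hypothetical helper name, and the default path is taken from the log.

```shell
# Report whether the kubelet config file exists.
# The default path is the one named in the kubelet error log;
# kubelet_config_state is a hypothetical helper for illustration.
kubelet_config_state() {
    config_file="${1:-/var/lib/kubelet/config.yaml}"
    if [ -f "$config_file" ]; then
        echo "present: $config_file"
    else
        echo "missing: $config_file"
        return 1
    fi
}
```

Calling kubelet_config_state with no argument checks the default path; a non-zero exit status means the node most likely still needs kubeadm join.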
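This assumes the control-plane node has already been set up successfully. On the control-plane node, run
kubeadm token create --print-join-command
Now run
kubeadm join master:6443 --token 9a7fqs.jyil407qeh8i7prl --discovery-token-ca-cert-hash sha256:xxxxx
which fails with the following error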
[ERROR CRI]: container runtime is not running
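Kubernetes supports both cri-containerd and cri-docker; here we use cri-docker
Download and install it from here
Edit /lib/systemd/system/cri-docker.service to configure image pulling (the ExecStart line below points the pause image at the Aliyun registry)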
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.8
sudo systemctl daemon-reload
sudo systemctl restart cri-docker
https://www.bilibili.com/video/BV1Ed4y1u7L3?p=5&vd_source=b3d0a7d246fbac11e6e6396ffaf89d1b
/etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/ssd2/docker",
  "registry-mirrors": [
    "https://registry.hub.docker.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://registry.docker-cn.com"
  ]
}
kubeadm config images list
kubeadm config images list --image-repository=registry.aliyuncs.com/google_containers
sudo kubeadm init \
  --control-plane-endpoint="master" \
  --apiserver-advertise-address=192.168.3.60 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --cri-socket unix:///var/run/cri-dockerd.sock \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12
Note that --cri-socket unix:///var/run/cri-dockerd.sock must be specified
Join the worker node to the cluster (run this only on worker nodes)
sudo kubeadm join 192.168.3.60:6443 --token 5b238i.xx --discovery-token-ca-cert-hash sha256:xxx --v=5
This fails with the following error:
[ERROR CRI]: container runtime is not running: output: time="2023-11-26T15:17:37+08:00" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService" , error: exit status 1
The cause is that the containerd installed alongside Docker has the cri plugin disabled by default
cat /etc/containerd/config.toml
disabled_plugins = ["cri"]
We need to regenerate this configuration file
sudo containerd config default | sudo tee /etc/containerd/config.toml
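Before and after regenerating the file, it can be useful to confirm whether cri is still listed under disabled_plugins. A minimal sketch, assuming the setting sits on a single line as in the output shown above; cri_plugin_disabled is a hypothetical helper name.

```shell
# Succeed if the containerd config lists "cri" in disabled_plugins.
# Assumes the setting is written on one line, as in the output above.
# cri_plugin_disabled is a hypothetical helper for illustration.
cri_plugin_disabled() {
    config_file="${1:-/etc/containerd/config.toml}"
    grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$config_file"
}
```

For example: cri_plugin_disabled && echo "cri is still disabled, regenerate the config".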
For an introduction to cgroups, see here. There are two cgroup drivers: cgroupfs and systemd.
sudo sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sudo sed -i 's#sandbox_image = "registry.k8s.io/pause:3.6"#sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"#g' /etc/containerd/config.toml
sudo systemctl restart containerd
docker info | grep Cgroup
Cgroup Driver: systemd
sudo systemctl restart containerd
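The sed substitution only has an effect if the original line reads exactly SystemdCgroup = false, so it is worth confirming the result. A sketch under that assumption; containerd_uses_systemd_cgroup is a hypothetical helper name.

```shell
# Succeed if the containerd config enables the systemd cgroup driver.
# containerd_uses_systemd_cgroup is a hypothetical helper for illustration.
containerd_uses_systemd_cgroup() {
    config_file="${1:-/etc/containerd/config.toml}"
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config_file"
}
```

If this succeeds and docker info reports Cgroup Driver: systemd, both runtimes use the same driver, which is the state these notes aim for.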
Run
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
11月 26 15:34:59 node3 kubelet[686400]: E1126 15:34:59.029926 686400 run.go:74] "command failed" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\tUsed\tPriority /dev/sda2 partition\t9765884\t9765844\t-2]"
-- Subject: Unit failed
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
This shows that swap has not been disabled on this node
sudo swapoff -a
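The kubelet error quotes the contents of /proc/swaps, so the same file can be used to confirm that swap is really off. A minimal sketch; swap_is_active is a hypothetical helper name, and it assumes the usual layout of one header line followed by one line per swap device.

```shell
# Succeed if the swaps file lists at least one active swap device.
# The first line of /proc/swaps is a header, so only later lines count.
# swap_is_active is a hypothetical helper for illustration.
swap_is_active() {
    swaps_file="${1:-/proc/swaps}"
    [ "$(tail -n +2 "$swaps_file" | grep -c .)" -gt 0 ]
}
```

For example: swap_is_active && echo "swap still on". Note that swapoff -a only lasts until the next reboot; the swap entry in /etc/fstab would also need to be commented out to make the change permanent.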
After disabling swap, restart kubelet.service; it now starts normally
sudo systemctl restart kubelet.service
sudo systemctl status kubelet.service
Confirm that the br_netfilter and overlay modules are loaded by running:
lsmod | grep br_netfilter
lsmod | grep overlay
Confirm that the sysctl variables net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables and net.ipv4.ip_forward are set to 1 by running:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
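The sysctl command prints one "key = value" line per variable, which makes the check easy to automate. The sketch below lists every variable whose value is not 1; sysctl_not_enabled is a hypothetical helper name, and it assumes the default "key = value" output format.

```shell
# Read `sysctl` output ("key = value" lines) from stdin and print the
# keys whose value is not 1. Exits non-zero if any such key is found.
# sysctl_not_enabled is a hypothetical helper for illustration.
sysctl_not_enabled() {
    awk -F' = ' '$2 != "1" { print $1; bad = 1 } END { exit bad }'
}
```

It could be used as: sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward | sysctl_not_enabled. No output and a zero exit status mean all three variables are set as required.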
If the output of the commands above does not meet these requirements, see here for how to fix it
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/dockershim.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/dockershim.sock: connect: no such file or directory"
https://github.com/kubernetes-sigs/cri-tools/issues/1089
sudo vim /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
Error registering network: failed to acquire lease: node "k8s-node3" pod cidr not assigned
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.15.0
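In the manifest above, the two flags that appear relevant to the "pod cidr not assigned" error are --allocate-node-cidrs=true and --cluster-cidr=10.244.0.0/16. A quick sketch for checking whether a manifest carries both; has_node_cidr_flags is a hypothetical helper name, and it only looks for the flag text, not for a valid CIDR value.

```shell
# Succeed if the controller-manager manifest contains both
# --allocate-node-cidrs=true and a --cluster-cidr= flag.
# has_node_cidr_flags is a hypothetical helper for illustration.
has_node_cidr_flags() {
    manifest="${1:-/etc/kubernetes/manifests/kube-controller-manager.yaml}"
    grep -q -- '--allocate-node-cidrs=true' "$manifest" &&
        grep -q -- '--cluster-cidr=' "$manifest"
}
```

For example: has_node_cidr_flags || echo "pod CIDR flags missing from the manifest".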
sudo kubeadm reset
sudo kubeadm init --apiserver-advertise-address=192.168.3.60 --image-repository=registry.aliyuncs.com/google_containers
Aug 25 16:31:21 k8sharbor kubelet: E0825 16:31:21.756013 11035 certificate_manager.go:437] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Unauthorized
sudo mv /var/lib/kubelet/ /var/lib/kubelet.bak
sudo mv /etc/kubernetes /etc/kubernetes.bak3
sudo mkdir -p /etc/kubernetes/manifests
sudo systemctl restart kubelet.service
sudo kubeadm join 10.110.1.11:6443 --token xx.xx --discovery-token-ca-cert-hash sha256:xxx --v=5
https://github.com/flannel-io/flannel
If you run kubectl describe pod/pod-name and see the following error under Events
network: plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Create the file /run/flannel/subnet.env on every node with the following content. After a short wait, the error clears.
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
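Creating the file by hand on each node can be scripted. The sketch below writes the same four settings to a given path; write_flannel_subnet_env is a hypothetical helper name. Taking the subnet as a parameter is my assumption: as far as I know flannel normally leases a different /24 to each node, whereas the notes above use the same value everywhere.

```shell
# Write a flannel subnet.env file with the values used in these notes.
# $1: target path, $2: value for FLANNEL_SUBNET (per-node assumption).
# write_flannel_subnet_env is a hypothetical helper for illustration.
write_flannel_subnet_env() {
    target="$1"
    subnet="$2"
    cat > "$target" <<EOF
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=$subnet
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
}
```

On a node this might be run as root, after sudo mkdir -p /run/flannel, as: write_flannel_subnet_env /run/flannel/subnet.env 10.244.0.1/24. Since /run is typically a tmpfs, the file would not survive a reboot.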