Installing a Kubernetes Cluster
Learning resources
- https://k8s.easydoc.net/docs/dRiQjyTY/28366845/6GiNOzyZ/nd7yOvdY
- https://blog.csdn.net/ibless/article/details/107899009
Installation options
- minikube
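A Kubernetes cluster simulator: a single-node cluster intended only for testing, with the master and the worker on the same node.
- Bare metal installation
Requires at least two machines (one master node and one worker node). You install the Kubernetes components yourself, so configuration is slightly more involved.
You can rent servers from cloud providers by the hour; the cost is low and the machines can be destroyed after use.
Drawbacks: cumbersome configuration and no ecosystem support such as load balancers or cloud storage.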
kubectl run testapp --image=ccr.ccs.tencentyun.com/k8s-tutorial/test-k8s:v1
Installing Docker
Check the version
docker --version
Remove old Docker packages
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
Install Docker version 5:20.10.18~3-0~ubuntu-focal
VERSION_STRING=5:20.10.18~3-0~ubuntu-focal
sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
Run Docker without sudo
sudo groupadd docker
sudo gpasswd -a ${USER} docker
sudo service docker restart
newgrp - docker
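Group membership is picked up only by new login sessions, which is why newgrp is run after gpasswd. A minimal sketch for checking whether the current shell already has the group; the function name session_has_group is made up for illustration and is not part of Docker:

```shell
#!/usr/bin/env bash
# Succeed if the current shell session is a member of the group named in $1.
# `id -nG` without a user argument reports the groups of the running process,
# so a membership that has not reached this session yet is reported as missing.
session_has_group() {
  local wanted="$1" g
  for g in $(id -nG); do
    [ "$g" = "$wanted" ] && return 0
  done
  return 1
}
```

For example, `session_has_group docker || echo "log in again or run newgrp docker"` reports whether docker commands are expected to work without sudo in this shell.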
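Installing on physical machines
- kubeadm: the command used to bootstrap the cluster.
- kubelet: runs on every node in the cluster and starts Pods and containers.
- kubectl: the command-line tool for talking to the cluster.
Install Kubernetes by following the referenced article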
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip installing it
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.25/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.25/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-cache madison kubelet | awk '{ print $3 }'
1.25.16-1.1
1.25.15-1.1
1.25.14-1.1
1.25.13-1.1
1.25.12-1.1
1.25.11-1.1
1.25.10-1.1
1.25.9-1.1
sudo apt-get update
sudo apt-get purge kubelet kubeadm kubectl
umount $(df -HT | grep '/var/lib/kubelet/pods' | awk '{print $7}')
sudo apt-get install -y kubelet=1.25.16-1.1 kubeadm=1.25.16-1.1 kubectl=1.25.16-1.1
sudo apt-mark hold kubelet kubeadm kubectl
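Rather than copying a version from the apt-cache madison listing by hand, the newest patch release can be selected with a version-aware sort. This is only a sketch: latest_patch is a hypothetical helper name, and it assumes GNU sort (for the -V option) and the three-column madison output format shown above.

```shell
#!/usr/bin/env bash
# Read `apt-cache madison <package>` output on stdin and print the highest version.
# madison prints "package | version | source", so the version is awk field 3.
latest_patch() {
  awk '{ print $3 }' | sort -V | tail -n 1
}
```

A possible use is `VERSION=$(apt-cache madison kubelet | latest_patch)`, followed by installing `kubelet=$VERSION kubeadm=$VERSION kubectl=$VERSION`.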
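In releases older than Debian 12 and Ubuntu 22.04, the /etc/apt/keyrings directory does not exist by default; create it with sudo mkdir -m 755 /etc/apt/keyrings before downloading the key.
At this point kubelet.service has not been started. Starting it now will most likely fail, so some configuration has to be changed first.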
(base) wy@node3:~$ systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead)
Docs: https://kubernetes.io/docs/
After starting it, kubelet.service can be seen in the auto-restart state:
(base) wy@node3:~$ systemctl start kubelet.service
(base) wy@node3:~$ sudo systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Sun 2023-11-26 15:10:26 CST; 6s ago
Docs: https://kubernetes.io/docs/
Process: 680537 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUB>
Main PID: 680537 (code=exited, status=1/FAILURE)
Use sudo journalctl -xeu kubelet | grep failed to find the cause of the error:
11月 26 15:15:47 node3 kubelet[682095]: E1126 15:15:47.887801 682095 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
-- Subject: Unit failed
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Assuming the master node has already been set up successfully, run the following on the master node:
kubeadm token create --print-join-command
Joining a node
Now run
kubeadm join master:6443 --token 9a7fqs.jyil407qeh8i7prl --discovery-token-ca-cert-hash sha256:xxxxx
which fails with the following error:
[ERROR CRI]: container runtime is not running
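Kubernetes supports both cri-containerd and cri-dockerd; here we choose cri-dockerd.
Using cri-dockerd
Download and install cri-dockerd from its release page.
Edit the file /lib/systemd/system/cri-docker.service
and set the pod infra (pause) image so that it is pulled from a mirror registry: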
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.8
sudo systemctl daemon-reload
sudo systemctl restart cri-docker
https://www.bilibili.com/video/BV1Ed4y1u7L3?p=5&vd_source=b3d0a7d246fbac11e6e6396ffaf89d1b
/etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"data-root": "/ssd2/docker",
"registry-mirrors": ["https://registry.hub.docker.com","http://hub-mirror.c.163.com","https://docker.mirrors.ustc.edu.cn","https://registry.docker-cn.com"]
}
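The kubelet and the container runtime should agree on the cgroup driver, so it can help to confirm that daemon.json really selects systemd before restarting Docker. A rough sketch using a plain text match; uses_systemd_cgroup is an illustrative name, and the check does not validate the JSON itself:

```shell
#!/usr/bin/env bash
# Succeed if the Docker daemon config at path $1 sets the systemd cgroup driver.
# This is a simple text match on the exec-opts entry, not a JSON parser.
uses_systemd_cgroup() {
  grep -q '"native.cgroupdriver=systemd"' "$1"
}
```

Typical use would be `uses_systemd_cgroup /etc/docker/daemon.json && sudo systemctl restart docker`, since Docker only reads daemon.json when the daemon starts.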
Run on the master node
kubeadm config images list
kubeadm config images list --image-repository=registry.aliyuncs.com/google_containers
sudo kubeadm init \
--control-plane-endpoint="master" \
--apiserver-advertise-address=192.168.3.60 \
--image-repository=registry.aliyuncs.com/google_containers \
--cri-socket unix:///var/run/cri-dockerd.sock \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12
Note that --cri-socket unix:///var/run/cri-dockerd.sock must be specified.
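When both containerd and cri-dockerd are installed, it is easy to pass the wrong socket. The sketch below lists which candidate socket actually exists on the node; first_cri_socket is a hypothetical helper, and the candidate paths are the ones that appear in these notes.

```shell
#!/usr/bin/env bash
# Print the first argument that is an existing Unix socket, as a unix:// URL.
# Returns non-zero when none of the given paths is a socket.
first_cri_socket() {
  local sock
  for sock in "$@"; do
    if [ -S "$sock" ]; then
      printf 'unix://%s\n' "$sock"
      return 0
    fi
  done
  return 1
}
```

For example, `first_cri_socket /var/run/cri-dockerd.sock /run/containerd/containerd.sock` prints the value that could be passed to --cri-socket, preferring cri-dockerd when it is present.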
Using cri-containerd
Join the worker node to the cluster (run only on worker nodes)
sudo kubeadm join 192.168.3.60:6443 --token 5b238i.xx --discovery-token-ca-cert-hash sha256:xxx --v=5
This fails with the following error:
[ERROR CRI]: container runtime is not running: output: time="2023-11-26T15:17:37+08:00" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
The cause is that the containerd installed alongside Docker disables the cri plugin by default:
cat /etc/containerd/config.toml
disabled_plugins = ["cri"]
We need to regenerate this configuration file:
sudo containerd config default | sudo tee /etc/containerd/config.toml
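For an introduction to cgroups, see the referenced documentation.
There are two cgroup drivers: cgroupfs and systemd.
- cgroupfs is the default cgroup driver in the kubelet. With the cgroupfs driver, the kubelet and the container runtime interface directly with the cgroup filesystem to configure cgroups.
- When systemd is the init system, the cgroupfs driver is not recommended, because systemd expects a single cgroup manager on the system. In addition, if you use cgroup v2, use the systemd cgroup driver instead of cgroupfs.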
sudo sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sudo sed -i 's#sandbox_image = "registry.k8s.io/pause:3.6"#sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"#g' /etc/containerd/config.toml
sudo systemctl restart containerd
docker info | grep Cgroup
Cgroup Driver: systemd
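The two conditions discussed above (the cri plugin must not be disabled, and SystemdCgroup must be true) can be checked before retrying kubeadm join. A minimal sketch, assuming the config.toml layout produced by containerd config default; containerd_cri_ready is an illustrative name:

```shell
#!/usr/bin/env bash
# Check a containerd config.toml (path in $1) for the two settings kubeadm needs here.
containerd_cri_ready() {
  local cfg="$1"
  # An uncommented disabled_plugins line that still lists "cri" blocks the CRI API.
  if grep -Eq '^ *disabled_plugins *=.*"cri"' "$cfg"; then
    echo "cri plugin is disabled in $cfg" >&2
    return 1
  fi
  # The runc options must select the systemd cgroup driver.
  if ! grep -Eq '^ *SystemdCgroup *= *true' "$cfg"; then
    echo "SystemdCgroup is not set to true in $cfg" >&2
    return 1
  fi
}
```

Running `containerd_cri_ready /etc/containerd/config.toml` after the sed edits gives a quick confirmation that both substitutions actually matched.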
Disabling swap
sudo systemctl restart containerd
Run
sudo kubeadm join 192.168.3.60:6443 --token 5b238i.xx --discovery-token-ca-cert-hash sha256:xxx --v=5
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
Use sudo journalctl -xeu kubelet | grep failed to find the cause of the error:
11月 26 15:34:59 node3 kubelet[686400]: E1126 15:34:59.029926 686400 run.go:74] "command failed" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\tUsed\tPriority /dev/sda2 partition\t9765884\t9765844\t-2]"
-- Subject: Unit failed
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
The log shows that swap has not been disabled on this node:
sudo swapoff -a
After disabling swap, restart kubelet.service; it now starts normally.
sudo systemctl restart kubelet.service
sudo systemctl status kubelet.service
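Note that swapoff -a only lasts until the next reboot; swap entries in /etc/fstab are activated again at boot. The sketch below prints a copy of an fstab file with the active swap entries commented out, so the change can be reviewed before it is applied. comment_out_swap is a hypothetical helper, and it assumes whitespace-separated fstab fields.

```shell
#!/usr/bin/env bash
# Print the fstab file given in $1 with every active swap entry commented out.
# Lines that are already comments are left untouched; the file itself is not modified.
comment_out_swap() {
  sed -E '/^ *[^# ].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}
```

One cautious way to use it is `comment_out_swap /etc/fstab | diff /etc/fstab -` to review the change, and only then write the result back with sudo.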
Possible problems
Confirm that the br_netfilter and overlay modules are loaded by running:
lsmod | grep br_netfilter
lsmod | grep overlay
Confirm that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward system variables are set to 1 in your sysctl configuration by running:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
If the output of the commands above does not match, adjust the settings by following the referenced documentation.
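With three values to compare by eye, a small filter that prints only the offending keys can be convenient. A sketch, assuming the "key = value" output format of sysctl; the name sysctl_not_one is made up for illustration:

```shell
#!/usr/bin/env bash
# Read `sysctl key...` output ("key = value" lines) on stdin and
# print the name of every key whose value is not exactly 1.
sysctl_not_one() {
  awk -F' *= *' '$2 != "1" { print $1 }'
}
```

For example, `sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward | sysctl_not_one` prints nothing when all three settings are correct.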
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/dockershim.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/dockershim.sock: connect: no such file or directory"
https://github.com/kubernetes-sigs/cri-tools/issues/1089
sudo vim /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
Error registering network: failed to acquire lease: node "k8s-node3" pod cidr not assigned
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.15.0
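The "pod cidr not assigned" error is addressed by the last two flags in the command list, --allocate-node-cidrs=true and --cluster-cidr. A quick sketch for checking that a manifest carries both; has_pod_cidr_flags is an illustrative name and the check is a plain text match:

```shell
#!/usr/bin/env bash
# Succeed if the controller-manager manifest at path $1 enables node CIDR
# allocation and sets a cluster CIDR. The "--" stops grep from treating the
# pattern as an option.
has_pod_cidr_flags() {
  grep -q -- '--allocate-node-cidrs=true' "$1" &&
    grep -q -- '--cluster-cidr=' "$1"
}
```

For example, `sudo bash -c "$(declare -f has_pod_cidr_flags); has_pod_cidr_flags /etc/kubernetes/manifests/kube-controller-manager.yaml"` returns success once both flags are present.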
Resetting the master node
sudo kubeadm reset
sudo kubeadm init --apiserver-advertise-address=192.168.3.60 --image-repository=registry.aliyuncs.com/google_containers
Aug 25 16:31:21 k8sharbor kubelet: E0825 16:31:21.756013 11035 certificate_manager.go:437] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Unauthorized
sudo mv /var/lib/kubelet/ /var/lib/kubelet.bak
sudo mv /etc/kubernetes /etc/kubernetes.bak3
sudo mkdir -p /etc/kubernetes/manifests
sudo systemctl restart kubelet.service
sudo kubeadm join 10.110.1.11:6443 --token xx.xx --discovery-token-ca-cert-hash sha256:xxx --v=5
Installing the network
https://github.com/flannel-io/flannel
If kubectl describe pod/pod-name shows the following error under Events:
network: plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
create the file /run/flannel/subnet.env on every node with the following content, then wait a moment for the pods to recover:
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
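Writing the file by hand on every node is repetitive, so the step can be wrapped in a small function. This is a sketch of the workaround above, not flannel's own mechanism: write_subnet_env is a hypothetical helper, the MTU and IPMASQ values are copied from these notes, and the subnet may need to differ per node since flannel normally gives each node its own range.

```shell
#!/usr/bin/env bash
# Write a flannel subnet.env file.
#   $1: destination path, $2: cluster network CIDR, $3: this node's subnet
write_subnet_env() {
  local path="$1" network="$2" subnet="$3"
  cat > "$path" <<EOF
FLANNEL_NETWORK=$network
FLANNEL_SUBNET=$subnet
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
}
```

On a node this could be run as root with `mkdir -p /run/flannel` followed by `write_subnet_env /run/flannel/subnet.env 10.244.0.0/16 10.244.0.1/24`. Because /run is usually a tmpfs, the file does not survive a reboot.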