Cloud Native 03 - Setting Up a Kubernetes Cluster Environment
Deploying a Kubernetes cluster with kubeadm (production-style: 1 control-plane node, 2 worker nodes)
Preparation
- Prepare three machines: xuegod62, xuegod63, xuegod64 (you can install the prerequisites on one machine first and then clone it)
- Disable SELinux. SELinux (Security-Enhanced Linux) is a security module based on Mandatory Access Control (MAC) that provides mandatory access control on Linux. With SELinux, administrators can apply fine-grained security controls to objects such as files, processes, and network ports, improving system security. SELinux is disabled before installing Kubernetes because, by default, it blocks some Kubernetes operations, such as the kubelet accessing container files. To avoid Kubernetes misbehaving because of SELinux, disable it before the installation.
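The exact commands are not shown above; a minimal sketch for disabling SELinux both immediately and across reboots (assuming a CentOS/RHEL-style host where SELinux is configured in /etc/selinux/config):
# switch to permissive mode right away
$ setenforce 0
# disable SELinux permanently (takes full effect after a reboot)
$ sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# verify; shows Permissive until the next reboot, Disabled afterwards
$ getenforce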
- Install the required packages
$ yum install -y yum-utils device-mapper-persistent-data lvm2 wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2 openssl-devel curl curl-devel unzip sudo libaio-devel vim ncurses-devel autoconf automake zlib-devel epel-release openssh-server socat conntrack telnet ipvsadm
- Configure passwordless SSH login: set up key-based login from xuegod63 to the other machines
$ ssh-keygen   # press Enter at every prompt (no passphrase), then install the local SSH public key on the remote hosts
$ ssh-copy-id xuegod63
$ ssh-copy-id xuegod64
$ ssh-copy-id xuegod62
- Disable the host firewall
$ systemctl stop firewalld
$ systemctl disable firewalld
- Disable the swap partition
$ swapoff -a
Swap is a mechanism that moves part of memory to disk when physical memory runs low, so the system can keep running even when RAM is insufficient. Kubernetes requires swap to be disabled: it consumes a lot of memory and CPU, and once the system starts swapping, performance degrades badly and Kubernetes may not run correctly. To guarantee performance and stability, disable swap before installing Kubernetes.
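`swapoff -a` only disables swap until the next reboot; a small sketch for making the change permanent (assuming swap is declared in /etc/fstab):
# comment out any swap entry so it is not re-enabled after a reboot
$ sed -ri 's/^([^#].*\sswap\s.*)$/# \1/' /etc/fstab
# confirm: the Swap line should show 0 everywhere
$ free -m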
- Adjust kernel parameters
$ modprobe br_netfilter
modprobe is a Linux command that dynamically loads kernel modules. br_netfilter is the kernel module that connects Linux bridge devices to Netfilter, the kernel framework that can modify or filter packets as they pass through the network stack. In Kubernetes, br_netfilter is needed for the cluster network to work; loading it ensures that the iptables rules used by the cluster are applied correctly to bridged traffic.
$ cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
# apply the settings
$ sysctl -p /etc/sysctl.d/k8s.conf
1) net.bridge.bridge-nf-call-ip6tables: whether IPv6 packets crossing a bridge are passed to ip6tables for processing; 1 enables it. 2) net.bridge.bridge-nf-call-iptables: whether IPv4 packets crossing a bridge are passed to iptables for processing; 1 enables it. 3) net.ipv4.ip_forward: whether the host is allowed to forward packets; 1 enables it. These parameters let the Linux network stack support Kubernetes network plugins such as Flannel and Calico, so that Pods in the cluster can communicate with each other and reach external networks.
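Both the module load and these sysctl values are lost on reboot unless they are persisted; a minimal sketch using systemd's modules-load.d mechanism (an assumption, any equivalent works):
# load br_netfilter automatically at boot
$ cat > /etc/modules-load.d/br_netfilter.conf <<EOF
br_netfilter
EOF
# verify the parameters are active
$ sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward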
- Configure the Alibaba Cloud yum repository needed to install docker and containerd
$ yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# configure the Kubernetes yum repository
$ cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
- Time synchronization (see the earlier chapter)
- Install containerd. In a Kubernetes cluster, containerd is the container runtime: it manages the containers on each node, handling creation, destruction, running, pausing, and resuming. A Pod is the basic scheduling unit in Kubernetes and contains one or more tightly coupled containers; when a Pod is scheduled onto a node, Kubernetes uses containerd to run the Pod's containers.
$ yum install containerd.io-1.6.22 --allowerasing -y
Edit the configuration:
$ vim /etc/containerd/config.toml
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.7"
SystemdCgroup = true
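The surrounding steps (generating the default config and restarting the service) are not shown above; a hedged sketch, assuming the default config path /etc/containerd/config.toml:
# generate a complete default configuration to edit
$ containerd config default > /etc/containerd/config.toml
# point the sandbox (pause) image at the Alibaba Cloud mirror and switch to the systemd cgroup driver
$ sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.7"#' /etc/containerd/config.toml
$ sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# restart and enable the runtime
$ systemctl restart containerd && systemctl enable containerd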
- Install the components needed to initialize Kubernetes
$ yum install -y kubelet-1.26.0 kubeadm-1.26.0 kubectl-1.26.0
$ systemctl enable kubelet
kubelet: the core agent that runs on every node; it talks to the control plane and manages the Pods and containers on that node. Its main responsibilities are: monitoring Pod state and starting or stopping containers as needed; checking that containers are running properly; reporting node and Pod status to the control plane; cooperating with other components through plugins (such as volume plugins); managing the container lifecycle (start, stop, restart); and pulling images.
kubeadm: the command-line tool used to initialize the cluster.
kubectl: the command-line client for talking to the cluster; with kubectl you can deploy and manage applications, inspect resources, and create, delete, and update components.
Initialize the cluster
Note: when installing Kubernetes, the host network, Pod network, and Service network must not overlap.
$ kubeadm config print init-defaults > kubeadm.yaml
Edit the kubeadm.yaml file
localAPIEndpoint:
advertiseAddress: 192.168.40.63
nodeRegistration:
# run `find / -name containerd` to check the real socket path
criSocket: unix:///run/containerd/containerd.sock
name: xuegod63
# switch to the Alibaba Cloud mirror
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
# add the Pod subnet
podSubnet: 10.241.0.0/12
# append the following at the end of the file
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
- KubeProxyConfiguration section
This section configures the kube-proxy component of the cluster. mode: ipvs makes kube-proxy use IPVS mode, which performs and scales better than the traditional iptables mode and is a better fit for large clusters.
- KubeletConfiguration section
This section configures the kubelet on each node. cgroupDriver: systemd sets the cgroup driver to systemd, ensuring the kubelet uses the same cgroup driver as the container runtime (containerd in this setup); mismatched cgroup drivers cause problems.
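Once the cluster is up, both settings can be spot-checked; a small sketch (assuming ipvsadm was installed earlier and the default kubeadm file locations):
# IPVS mode: kube-proxy should have created virtual servers for the Service network
$ ipvsadm -Ln | head
# cgroup driver: containerd and the kubelet should both report systemd
$ grep SystemdCgroup /etc/containerd/config.toml
$ grep cgroupDriver /var/lib/kubelet/config.yaml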
- Import the pre-packaged images
# example of how multiple images were previously exported into a single archive:
# $ ctr -n=k8s.io images export k8s_1.26.0.tar.gz registry.aliyuncs.com/google_containers/pause:3.7 registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.9.3
# import the required images; if your network is good you can skip this, they will be pulled automatically later
$ ctr -n=k8s.io images import k8s_1.26.0.tar.gz
- Initialize the control plane
# pull the images first (if you skipped the import above)
$ kubeadm config images pull --config=kubeadm.yaml
# preflight may fail on the 'configs' kernel module check, so ignore SystemVerification errors
$ kubeadm init --config=kubeadm.yaml --ignore-preflight-errors=SystemVerification
# on success, the output ends with:
To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.59.63:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:ebf0118f85c8bc4897e36ce1cfaa7b48d9259b8c6ace421728b4fc4620ed2a05
- Use the cluster
# run the commands from the init output above
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# you can now list the nodes
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
xuegod63 NotReady control-plane 9m21s v1.26.0
# only the control-plane node exists so far and no network plugin is installed yet, so it shows NotReady
Possible problems
The kubelet log shows: container runtime endpoint "was not specified or empty, use --container-runtime-endpoint to set"
$ vim /usr/lib/systemd/system/kubelet.service
ExecStart=/usr/bin/kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock
$ systemctl daemon-reload
# tear down the failed install
$ kubeadm reset
# initialize again
$ kubeadm init --config=kubeadm.yaml --ignore-preflight-errors=SystemVerification
The kubelet logs show "container runtime status check may not have completed", "PLEG is not healthy: pleg has yet to be successful", or "checkpoint is not found": extend ExecStart with the kubeconfig and config flags:
ExecStart=/usr/bin/kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml
Then reset and re-run kubeadm init as above.
- If a worker node hits the same problems, the kubelet may not have been installed correctly; uninstall and reinstall it.
- Inspect the kubelet logs to diagnose problems:
$ journalctl -u kubelet -n 100 --no-pager
1) The logs show: failed to get sandbox image "registry.k8s.io/pause:3.6"
The containerd sandbox image was not switched to a domestic mirror, so it cannot be pulled. Set it to match the pause image shown by `ctr -n k8s.io images ls`:
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"
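The sandbox_image line lives in the containerd configuration; a minimal sketch for applying the change and restarting the runtime (assuming the default path /etc/containerd/config.toml):
# switch the sandbox image to the mirror and restart so the kubelet picks it up
$ sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
$ systemctl restart containerd && systemctl restart kubelet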
- Add a worker node
# this prints the same join command shown at the end of kubeadm init
$ kubeadm token create --print-join-command
kubeadm join xxx
# run the join command on the other machine
$ kubeadm join xxx --ignore-preflight-errors=SystemVerification
# successful output looks like:
kubeadm join 192.168.59.63:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:ebf0118f85c8bc4897e36ce1cfaa7b48d9259b8c6ace421728b4fc4620ed2a05 --ignore-preflight-errors=SystemVerification
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
- Verify the installation
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
xuegod63 NotReady control-plane 3h39m v1.26.0
xuegod64 NotReady <none> 2m55s v1.26.0
# coredns is Pending because the network plugin is not installed yet
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-567c556887-bnhtn 0/1 Pending 0 3h40m
coredns-567c556887-dtrc6 0/1 Pending 0 3h40m
etcd-xuegod63 1/1 Running 1 3h40m
kube-apiserver-xuegod63 1/1 Running 1 3h40m
kube-controller-manager-xuegod63 1/1 Running 1 3h40m
kube-proxy-qdc9p 1/1 Running 1 3m45s
kube-proxy-xz2lm 1/1 Running 0 3h40m
kube-scheduler-xuegod63 1/1 Running 1 3h40m
- Install the Calico network plugin (run on the control-plane node)
Calico is an open-source Kubernetes network plugin. It is built on standard Linux networking, using ordinary Linux routing and iptables rules to provide high-performance connectivity and security isolation. It supports dual-stack IPv4/IPv6 networking, integrates easily with existing data-center networks, and offers rich network-policy features: simple label selectors define isolation and communication rules between containers. Calico's architecture is distributed: every node runs a network agent called Felix, which watches the Kubernetes API server, collects node and network information, and maintains a local routing table and iptables rule set that control container network access. When containers change, Felix automatically updates the routes and rules to keep connectivity and isolation correct. Calico also supports BGP, so a Kubernetes cluster can be extended across data centers and cloud providers, giving a flexible and scalable container-networking solution.
# import the pre-packaged Calico images (optional if your network is good)
$ ctr -n=k8s.io images import calico.tar.gz
# download the manifest
$ curl https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/calico.yaml -O
# edit calico.yaml: in the calico-node container's env list, find CLUSTER_TYPE and add the two lines below it
- name: CLUSTER_TYPE
value: "k8s,bgp"
# add these two lines; check the NIC name with ifconfig and make sure it is not the loopback interface, otherwise cross-node traffic will fail
- name: IP_AUTODETECTION_METHOD
value: "interface=ens160"
$ kubectl apply -f calico.yaml
Problem: pods stuck in Init:ImagePullBackOff
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-fcb7b8c57-pbsgj 0/1 ContainerCreating 0 10m
calico-node-6d2rc 0/1 Init:ImagePullBackOff 0 10m
calico-node-dfgjd 0/1 Init:ImagePullBackOff 0 10m
$ kubectl get pods -n kube-system -o wide   # check which node the failing pod is on
$ kubectl describe pod calico-node-xxxx -n kube-system
# the events show:
Normal Pulling 6m47s (x4 over 9m23s) kubelet Pulling image "docker.io/calico/cni:v3.29.1"
# on the affected node, pull the image manually (use the k8s.io namespace, which is where the kubelet looks for images)
$ ctr -n k8s.io image pull docker.io/calico/cni:v3.29.1
# pull this one too, otherwise the next init container fails the same way
$ ctr -n k8s.io image pull docker.io/calico/node:v3.29.1
# delete the pod and wait for it to be recreated
$ kubectl delete pod calico-node-xxxx -n kube-system
# that node now succeeds; repeat the same steps on the other node
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-fcb7b8c57-pbsgj 0/1 ImagePullBackOff 0 43m
calico-node-lcbj2 0/1 Init:ErrImagePull 0 29s
calico-node-qfwtq 1/1 Running 0 2m21s
coredns-567c556887-bnhtn 1/1 Running 0 4h34m
coredns-567c556887-dtrc6 1/1 Running 0 4h34m
etcd-xuegod63 1/1 Running 1 4h34m
kube-apiserver-xuegod63 1/1 Running 1 4h34m
kube-controller-manager-xuegod63 1/1 Running 2 (8m7s ago) 4h34m
kube-proxy-qdc9p 1/1 Running 1 57m
kube-proxy-xz2lm 1/1 Running 0 4h34m
kube-scheduler-xuegod63 1/1 Running 2 (8m2s ago) 4h34m
# the same fix applies to calico-kube-controllers-fcb7b8c57-pbsgj
$ ctr -n k8s.io images pull docker.io/calico/kube-controllers:v3.29.1
# final state after the network plugin is fixed on the other node as well
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-fcb7b8c57-pbsgj 1/1 Running 0 50m
calico-node-m4prj 1/1 Running 0 2m37s
calico-node-qfwtq 1/1 Running 0 9m33s
coredns-567c556887-bnhtn 1/1 Running 0 4h41m
coredns-567c556887-dtrc6 1/1 Running 0 4h41m
etcd-xuegod63 1/1 Running 1 4h41m
kube-apiserver-xuegod63 1/1 Running 1 4h41m
kube-controller-manager-xuegod63 1/1 Running 2 (15m ago) 4h41m
kube-proxy-qdc9p 1/1 Running 1 64m
kube-proxy-xz2lm 1/1 Running 0 4h41m
kube-scheduler-xuegod63 1/1 Running 2 (15m ago) 4h41m
# all nodes are now Ready
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
xuegod63 Ready control-plane 4h43m v1.26.0
xuegod64 Ready <none> 66m v1.26.0
If `kubectl get pod -n kube-system` shows calico-kube-controllers in Pending, it is because at that point only the control-plane node exists: calico-kube-controllers cannot be scheduled onto the control-plane node, so it stays Pending; once a worker node is ready, the pod is scheduled onto it.
Initializing the other nodes
Why can't a worker node run `kubectl get pod`?
On the worker node
- Check
# the file /root/.kube/config does not exist on this node
$ kubectl get nodes
E0121 16:03:57.098347 7086 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
# you can also inspect the client configuration: the worker shows no cluster info while the control-plane node does
$ kubectl config view
apiVersion: v1
clusters: null
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null
- Create the directory
$ mkdir ~/.kube
# copy the kubeconfig from the control-plane node
$ scp xuegod63:~/.kube/config ~/.kube/
# now the config is present
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://192.168.59.63:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: DATA+OMITTED
client-key-data: DATA+OMITTED
# verify
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
xuegod63 Ready control-plane 5h17m v1.26.0
xuegod64 Ready <none> 100m v1.26.0
Next, add another worker node
- Note: the worker node does not need to start the kubelet beforehand; running kubeadm join starts it automatically.
Add xuegod62 to /etc/hosts on every machine:
192.168.59.62 xuegod62
# run on the control-plane node
$ kubeadm token create --print-join-command
kubeadm join 192.168.59.63:6443 --token 5z9abq.91kr5e0xasecx4gh --discovery-token-ca-cert-hash sha256:ebf0118f85c8bc4897e36ce1cfaa7b48d9259b8c6ace421728b4fc4620ed2a05
# run on the worker node
$ kubeadm join 192.168.59.63:6443 --token 5z9abq.91kr5e0xasecx4gh --discovery-token-ca-cert-hash sha256:ebf0118f85c8bc4897e36ce1cfaa7b48d9259b8c6ace421728b4fc4620ed2a05 --ignore-preflight-errors=SystemVerification
- Problem
$ kubeadm join 192.168.59.63:6443 --token 5z9abq.91kr5e0xasecx4gh --discovery-token-ca-cert-hash sha256:ebf0118f85c8bc4897e36ce1cfaa7b48d9259b8c6ace421728b4fc4620ed2a05 --ignore-preflight-errors=SystemVerification
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The kernel module is probably not loaded; load it again:
# check whether it is loaded; empty output means it is not
$ lsmod | grep br_netfilter
# load the kernel module
$ modprobe br_netfilter
# now it shows up
$ lsmod | grep br_netfilter
br_netfilter 28672 0
bridge 294912 1 br_netfilter
- Set the worker role label
# the ROLES column shows <none> by default; label the node as worker
$ kubectl label node xuegod62 node-role.kubernetes.io/worker=worker
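To confirm the label took effect (node name taken from this setup):
# the ROLES column should now show worker
$ kubectl get node xuegod62
# the label itself can be inspected as well
$ kubectl get node xuegod62 --show-labels | tr ',' '\n' | grep node-role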
Summary
Fixing certificate expiry in a kubeadm-initialized cluster
Check the validity period
The default validity is one year:
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
Not Before: Jan 21 02:50:49 2025 GMT
Not After : Jan 21 02:50:50 2026 GMT
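kubeadm can also report the expiry of every cluster certificate at once, which is handy before and after renewing:
# list expiry dates for all kubeadm-managed certificates
$ kubeadm certs check-expiration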
Extend the certificate validity
Run on the control-plane node:
$ git clone https://github.com/yuyicai/update-kube-cert.git
$ cd update-kube-cert
$ chmod 755 update-kubeadm-cert.sh
# edit the script: search for CERT_DAYS and change the value to 365000 (about 1000 years)
$ ./update-kubeadm-cert.sh all
# check again
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
Not Before: Jan 21 09:42:16 2025 GMT
Not After : May 24 09:42:16 3024 GMT
Test that DNS resolution and networking in the cluster work correctly
# pull the image (on every worker node)
$ ctr -n k8s.io images pull docker.io/library/busybox:latest
# start a pod and open a shell in it
$ kubectl run --image docker.io/library/busybox:latest --restart=Never --rm -it busybox -- sh
# test external connectivity (ping baidu)
$ ping www.baidu.com
PING www.baidu.com (103.235.46.96): 56 data bytes
64 bytes from 103.235.46.96: seq=0 ttl=127 time=64.728 ms
64 bytes from 103.235.46.96: seq=1 ttl=127 time=79.948 ms
64 bytes from 103.235.46.96: seq=2 ttl=127 time=64.765 ms
64 bytes from 103.235.46.96: seq=3 ttl=127 time=81.193 ms
^C
--- www.baidu.com ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 64.728/72.658/81.193 ms
# test DNS; the service IPs shown come from `kubectl get svc` and `kubectl get svc -n kube-system`
$ nslookup kubernetes.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default.svc.cluster.local
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
Building a highly available cluster with multiple master nodes
Control-plane nodes: xuegod63, xuegod62, xuegod64; worker node: xuegod66
Set up the control-plane nodes as described in the earlier sections.
High availability for the kube-apiserver with keepalived + nginx
Install nginx and keepalived
Install keepalived and nginx on xuegod63 and xuegod64 to provide load balancing and a reverse proxy for the apiserver. xuegod63 is the keepalived master and xuegod64 is the keepalived backup.
$ yum install epel-release nginx keepalived nginx-mod-stream -y
Edit the nginx configuration file
# layer-4 load balancing for the master apiserver components
stream {
log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
access_log /var/log/nginx/k8s-access.log main;
upstream k8s-apiserver {
server 192.168.59.63:6443 weight=5 max_fails=3 fail_timeout=30s;
server 192.168.59.64:6443 weight=5 max_fails=3 fail_timeout=30s;
server 192.168.59.62:6443 weight=5 max_fails=3 fail_timeout=30s;
}
server {
listen 16443; # nginx runs on the master nodes themselves, so this port must not be 6443 or it would conflict with the apiserver
proxy_pass k8s-apiserver;
}
}
Notes on the nginx parameters: 1) weight sets each backend server's weight and controls how requests are distributed; with all three backends at weight 5, each handles roughly one third of the requests. 2) max_fails is the maximum number of failures; if a backend fails max_fails times within fail_timeout, it is temporarily marked unavailable and receives no more requests. 3) fail_timeout is both the window in which failures are counted and the time the backend is considered unavailable.
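The stream block must sit at the top level of /etc/nginx/nginx.conf (outside the http block), and nginx-mod-stream must be installed as above; a quick way to validate and load the change:
# check the configuration syntax
$ nginx -t
# reload if nginx is already running
$ systemctl reload nginx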
Edit the keepalived configuration file (/etc/keepalived/keepalived.conf)
Master node
# health-check script
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh"
}
vrrp_instance VI_1 {
state MASTER
interface ens33 # change to the actual NIC name
virtual_router_id 51 # VRRP router ID; unique per instance
priority 100 # priority; set 90 on the backup server
advert_int 1 # VRRP advertisement interval, default 1 second
authentication {
auth_type PASS
auth_pass 1111
}
# virtual IP
virtual_ipaddress {
192.168.59.199/24
}
track_script {
check_nginx
}
}
Backup node
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh"
}
vrrp_instance VI_1 {
state BACKUP
interface ens160 # change to the actual NIC name
virtual_router_id 51 # VRRP router ID; unique per instance
priority 90 # priority; lower than the master's 100
advert_int 1 # VRRP advertisement interval, default 1 second
authentication {
auth_type PASS
auth_pass 1111
}
# virtual IP
virtual_ipaddress {
192.168.59.199/24
}
track_script {
check_nginx
}
}
The health-check script /etc/keepalived/check_nginx.sh:
#!/bin/bash
# 1. check whether nginx is alive
counter=$(ps -ef | grep nginx | grep sbin | egrep -cv "grep|$$")
if [ $counter -eq 0 ]; then
    # 2. if not, try to start nginx
    service nginx start
    sleep 2
    # 3. wait 2 seconds and check nginx again
    counter=$(ps -ef | grep nginx | grep sbin | egrep -cv "grep|$$")
    # 4. if nginx is still not alive, stop keepalived so the VIP fails over
    if [ $counter -eq 0 ]; then
        service keepalived stop
    fi
fi
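keepalived can only run the script if it is executable; on both keepalived nodes:
# make the health-check script executable
$ chmod +x /etc/keepalived/check_nginx.sh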
Reload the configuration and start the services
$ systemctl daemon-reload && systemctl start nginx
$ systemctl start keepalived && systemctl enable nginx keepalived
Also add the following to the global_defs section of keepalived.conf so that keepalived is allowed to run the check script as root:
global_defs {
    script_user root
    enable_script_security
}
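Once both nodes are up, the VIP should be bound on the keepalived master; a quick hedged check (use the interface name configured above):
# the 192.168.59.199 VIP should appear on the master's NIC
$ ip addr show ens160 | grep 192.168.59.199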
Edit kubeadm.yaml
# comment out; not needed for the HA cluster
#localAPIEndpoint:
# advertiseAddress: 192.168.59.63
# bindPort: 6443
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
# no longer needs to be specified
# name: xuegod63
taints: null
...
kind: ClusterConfiguration
# add the control-plane endpoint (the VIP and the nginx port)
controlPlaneEndpoint: 192.168.59.199:16443
Re-initialize
# tear down the previous install
$ kubeadm reset
# initialize again
$ kubeadm init --config=kubeadm.yaml --ignore-preflight-errors=SystemVerification
Add more control-plane nodes
Copy the certificates from xuegod63 to xuegod62 and xuegod64. First create the target directories on xuegod62 and xuegod64:
$ cd /root && mkdir -p /etc/kubernetes/pki/etcd && mkdir -p ~/.kube
Then run on xuegod63:
# copy to xuegod62 (repeat for xuegod64); this assumes /etc/kubernetes/pki/etcd already exists on both machines
$ scp /etc/kubernetes/pki/ca.crt xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/ca.key xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/sa.key xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/sa.pub xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/front-proxy-ca.crt xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/front-proxy-ca.key xuegod62:/etc/kubernetes/pki/
$ scp /etc/kubernetes/pki/etcd/ca.key xuegod62:/etc/kubernetes/pki/etcd/
$ scp /etc/kubernetes/pki/etcd/ca.crt xuegod62:/etc/kubernetes/pki/etcd/
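Repeating eight scp commands per node is tedious; a small sketch that copies the same files to both new control-plane nodes in one loop (assuming passwordless SSH from xuegod63 as configured earlier):
# copy the shared CA / service-account material to both nodes
for node in xuegod62 xuegod64; do
  scp /etc/kubernetes/pki/{ca.crt,ca.key,sa.key,sa.pub,front-proxy-ca.crt,front-proxy-ca.key} $node:/etc/kubernetes/pki/
  scp /etc/kubernetes/pki/etcd/{ca.crt,ca.key} $node:/etc/kubernetes/pki/etcd/
done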
Join the control-plane and worker nodes
# run on xuegod63
$ kubeadm token create --print-join-command
kubeadm join 192.168.59.199:16443 --token 3hp728.bcs1utglt4yys4nn --discovery-token-ca-cert-hash sha256:f9dbb5c3f62f6fc78c4231764e423e1b3c340aa32e4708d17d8655a4d623e5b4
# run on xuegod62 and xuegod64 (note the --control-plane flag)
$ kubeadm join 192.168.59.199:16443 --token 3hp728.bcs1utglt4yys4nn --discovery-token-ca-cert-hash sha256:f9dbb5c3f62f6fc78c4231764e423e1b3c340aa32e4708d17d8655a4d623e5b4 --control-plane --ignore-preflight-errors=SystemVerification
# run on the worker node xuegod66
$ kubeadm join 192.168.59.199:16443 --token 3hp728.bcs1utglt4yys4nn --discovery-token-ca-cert-hash sha256:f9dbb5c3f62f6fc78c4231764e423e1b3c340aa32e4708d17d8655a4d623e5b4 --ignore-preflight-errors=SystemVerification
- Set the worker role label
# the ROLES column shows <none> by default; label the node as worker
$ kubectl label node xuegod66 node-role.kubernetes.io/worker=worker
Install Calico
Installing it on xuegod63 is enough (same steps as in the single-master section).
Final state:
$ kubectl get pods -n kube-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-fcb7b8c57-hjsqk 0/1 Running 0 37s 10.248.89.128 xuegod66 <none> <none>
calico-node-2278g 0/1 Running 0 38s 192.168.59.62 xuegod62 <none> <none>
calico-node-drf85 0/1 Running 0 38s 192.168.59.63 xuegod63 <none> <none>
calico-node-drvkw 0/1 Running 0 38s 192.168.59.66 xuegod66 <none> <none>
calico-node-hlzmh 0/1 Running 0 38s 192.168.59.64 xuegod64 <none> <none>
coredns-567c556887-lvdh5 0/1 Running 0 37m 10.248.34.129 xuegod63 <none> <none>
coredns-567c556887-xrbsf 0/1 Running 0 37m 10.248.34.128 xuegod63 <none> <none>
etcd-xuegod62 1/1 Running 0 10m 192.168.59.62 xuegod62 <none> <none>
etcd-xuegod63 1/1 Running 6 37m 192.168.59.63 xuegod63 <none> <none>
etcd-xuegod64 1/1 Running 0 12m 192.168.59.64 xuegod64 <none> <none>
kube-apiserver-xuegod62 1/1 Running 0 10m 192.168.59.62 xuegod62 <none> <none>
kube-apiserver-xuegod63 1/1 Running 0 37m 192.168.59.63 xuegod63 <none> <none>
kube-apiserver-xuegod64 1/1 Running 1 (12m ago) 11m 192.168.59.64 xuegod64 <none> <none>
kube-controller-manager-xuegod62 1/1 Running 0 9m30s 192.168.59.62 xuegod62 <none> <none>
kube-controller-manager-xuegod63 1/1 Running 1 (11m ago) 37m 192.168.59.63 xuegod63 <none> <none>
kube-controller-manager-xuegod64 1/1 Running 0 10m 192.168.59.64 xuegod64 <none> <none>
kube-proxy-c8dbj 1/1 Running 0 10m 192.168.59.62 xuegod62 <none> <none>
kube-proxy-cbp7m 1/1 Running 0 37m 192.168.59.63 xuegod63 <none> <none>
kube-proxy-cqz5c 1/1 Running 0 12m 192.168.59.64 xuegod64 <none> <none>
kube-proxy-hxp64 1/1 Running 0 4m58s 192.168.59.66 xuegod66 <none> <none>
kube-scheduler-xuegod62 1/1 Running 0 10m 192.168.59.62 xuegod62 <none> <none>
kube-scheduler-xuegod63 1/1 Running 1 (11m ago) 37m 192.168.59.63 xuegod63 <none> <none>
kube-scheduler-xuegod64 1/1 Running 0 11m 192.168.59.64 xuegod64 <none> <none>
# traffic goes through the VIP 192.168.59.199
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://192.168.59.199:16443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: DATA+OMITTED
client-key-data: DATA+OMITTED
Test networking and DNS (same as the single-master setup)
Configure etcd high availability
The etcd configuration changes as control-plane nodes join; the last control-plane node to join has the most complete etcd configuration, so use it as the reference and apply the same member list to the other control-plane nodes.
$ cd /etc/kubernetes/manifests
$ vim etcd.yaml
Edit the value of the --initial-cluster flag
# xuegod62 joined last, so its list is the most complete; copy this value to the other control-plane nodes
--initial-cluster=xuegod62=https://192.168.59.62:2380,xuegod63=https://192.168.59.63:2380,xuegod64=https://192.168.59.64:2380
Restart xuegod63 and xuegod64
# after editing, restart the kubelet on xuegod63 and xuegod64
$ systemctl restart kubelet
- Verify that etcd is healthy
$ docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.4-0 etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt member list
# the output shows all three members are healthy
1cb3edee20e48682, started, xuegod62, https://192.168.59.62:2380, https://192.168.59.62:2379, false
353ed8718e280918, started, xuegod63, https://192.168.59.63:2380, https://192.168.59.63:2379, false
3fa85738870d0f34, started, xuegod64, https://192.168.59.64:2380, https://192.168.59.64:2379, false
- Check the leader/follower status
$ docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.4-0 etcdctl -w table --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints=https://192.168.59.63:2379,https://192.168.59.62:2379,https://192.168.59.64:2379 endpoint status --cluster
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.59.62:2379 | 1cb3edee20e48682 | 3.5.6 | 4.7 MB | false | false | 14 | 36358 | 36358 | |
| https://192.168.59.63:2379 | 353ed8718e280918 | 3.5.6 | 4.8 MB | false | false | 14 | 36358 | 36358 | |
| https://192.168.59.64:2379 | 3fa85738870d0f34 | 3.5.6 | 4.8 MB | true | false | 14 | 36358 | 36358 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
- Shut down xuegod64 and check the leader again
# run the same etcdctl endpoint status command as above
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.59.62:2379 | 1cb3edee20e48682 | 3.5.6 | 4.7 MB | true | false | 15 | 36594 | 36594 | |
| https://192.168.59.63:2379 | 353ed8718e280918 | 3.5.6 | 4.8 MB | false | false | 15 | 36594 | 36594 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
- After xuegod64 is powered back on, the cluster repairs itself automatically
# leadership was re-elected to xuegod62 while xuegod64 was down, so xuegod62 remains the leader
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.59.62:2379 | 1cb3edee20e48682 | 3.5.6 | 4.7 MB | true | false | 16 | 37437 | 37437 | |
| https://192.168.59.63:2379 | 353ed8718e280918 | 3.5.6 | 4.8 MB | false | false | 16 | 37437 | 37437 | |
| https://192.168.59.64:2379 | 3fa85738870d0f34 | 3.5.6 | 4.8 MB | false | false | 16 | 37437 | 37437 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+