Kubernetes Implementation

Compiled, organized and written by snow chuai---2020/2/9

Last updated: 2024-01-31


1. Building a k8s Cluster with Kubeadm---Version: 1.28.2
1.1 Prerequisites
1) Topology

--------+-------------------------+-------------------------+------------
        |                         |                         |
    eth0|192.168.1.11         eth0|192.168.1.12         eth0|192.168.1.13
+-------+-------------+   +-------+------------+   +--------+-----------+
|  [ Master Node ]    |   |   [ Master Node ]  |   |  [ Master Node ]   |
| [srv1.1000y.cloud]  |   | [srv2.1000y.cloud] |   | [srv3.1000y.cloud] |
+-------+-------------+   +-------+------------+   +--------+-----------+
        |                         |                         |
    eth0|192.168.1.14         eth0|192.168.1.15         eth0|192.168.1.16
+-------+-------------+   +-------+------------+   +--------+-----------+
|  [ Worker Node ]    |   |   [ Worker Node ]  |   |  [ Worker Node ]   |
| [srv4.1000y.cloud]  |   | [srv5.1000y.cloud] |   | [srv6.1000y.cloud] |
+-------+-------------+   +-------+------------+   +--------+-----------+
2) Hardware requirements
   1. Master nodes: 2 Core / 4G Mem
   2. Worker nodes: 4 Core / 8G Mem
3) Network ranges
   Physical network: 192.168.1.0/24
   Service network:  10.96.0.0/12
   Pod network:      172.16.0.0/12
4) System environment
   OS: CentOS Linux release 7.9.2009 (Core)
   Kernel: 6.5.5-1.el7.elrepo.x86_64
   Architecture: x86_64
   NTP: all nodes are time-synchronized
   SELinux: disabled
   Firewall: disabled
   Local FQDN resolution is in place
   Local host configuration is complete: IP address, netmask, default gateway [required], DNS server IP, NTP server IP
5) Software versions
   kubernetes: 1.28.2
   etcd: 3.5.9
   cfssl: 1.6.4
1.2 Preparation---run on all nodes---[steps that run on a single node are called out explicitly]
1) Upgrade the kernel on all nodes---[required version: 4.18+]
[root@srv1 ~]# yum install wget psmisc vim net-tools nfs-utils telnet yum-utils device-mapper-persistent-data lvm2 git tar curl -y
[root@srv1 ~]# yum install elrepo-release -y
[root@srv1 ~]# sed -i "s@mirrorlist@#mirrorlist@g" /etc/yum.repos.d/elrepo.repo [root@srv1 ~]# sed -i "s@http://elrepo.org/linux@https://mirrors.aliyun.com/elrepo/@g" /etc/yum.repos.d/elrepo.repo
# 安装最新的内核 # 稳定版为kernel-ml,如需更新长期维护版本kernel-lt # 如想查看ml/lt的各可用的内核版本,可按以下命令操作 [root@srv1 ~]# yum --enablerepo=elrepo-kernel search kernel-ml --showduplicates
[root@srv1 ~]# yum --enablerepo=elrepo-kernel search kernel-lt --showduplicates
# Install the latest ml kernel and make it the default
[root@srv1 ~]# yum --enablerepo=elrepo-kernel install kernel-ml -y
[root@srv1 ~]# grubby --set-default $(ls /boot/vmlinuz-* | grep elrepo) ; grubby --default-kernel
/boot/vmlinuz-6.5.5-1.el7.elrepo.x86_64
[root@srv1 ~]# reboot
[root@srv1 ~]# uname -r
6.5.5-1.el7.elrepo.x86_64
2) Adjust the network configuration
[root@srv1 ~]# cat > /etc/NetworkManager/conf.d/calico.conf << EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*
EOF
################################################## File notes ##################################################
# unmanaged-devices lists the devices that NetworkManager must NOT manage. It has two parts:
#
# interface-name:cali*
#   Interfaces whose names start with "cali" (e.g. cali0, cali1) are excluded from NetworkManager's control.
#
# interface-name:tunl*
#   Interfaces whose names start with "tunl" (e.g. tunl0, tunl1) are excluded from NetworkManager's control.
#
# Excluding these interfaces lets other tools or processes (Calico in this case) manage and configure them independently.
################################################## End of notes ##################################################
[root@srv1 ~]# systemctl restart NetworkManager
3) Disable swap on all nodes
[root@srv1 ~]# vim /etc/fstab
#
# /etc/fstab
# Created by anaconda on Sun Dec 5 14:41:17 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=eaca2437-8d59-47e4-bacb-4de06d26b7c8 /        ext4   defaults          1 1
UUID=95c7c42a-7569-4f80-aec4-961628270ce7 /boot    ext4   defaults          1 2
UUID=bf9e568d-ae5d-43d1-948c-90a45d731ec8 swap     swap   noauto,defaults   0 0
# sysctl -w vm.swappiness=0: with swappiness set to 0 the kernel strongly prefers physical memory over swap.
[root@srv1 ~]# swapoff -a && sysctl -w vm.swappiness=0
4) Configure ulimit
[root@srv1 ~]# ulimit -SHn 65535
[root@srv1 ~]# cat >> /etc/security/limits.conf <<EOF
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
[root@srv1 ~]# reboot
################################################## Parameter notes ##################################################
# soft nofile 655360
#   soft is the soft limit; nofile is the maximum number of files a process may open (default 1024).
#   The soft limit is raised to 655360 open files per process.
# hard nofile 655360
#   hard is the hard limit, i.e. the ceiling enforced by the system. It must be at least as large as the soft
#   limit, so it is likewise set to 655360 open files per process.
# soft nproc 655350
#   nproc is the maximum number of processes a user may create (default soft limit 30720).
#   The soft limit is raised to 655350 processes per user.
# hard nproc 655350
#   The hard limit for nproc is also raised to 655350 processes per user.
# soft memlock unlimited
#   memlock is the maximum amount of RAM a process may lock (default 64 KB). The soft limit is removed.
# hard memlock unlimited
#   The hard limit for locked memory is removed as well.
################################################## End of notes ##################################################
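After the reboot, a quick sanity check on a fresh login shell confirms the limits took effect (a minimal sketch; the expected values assume the limits.conf entries above were applied via PAM):

# Verify the per-process limits picked up from /etc/security/limits.conf
[root@srv1 ~]# ulimit -n     # max open files, expect 655360
[root@srv1 ~]# ulimit -u     # max user processes, expect 655350
[root@srv1 ~]# ulimit -l     # max locked memory, expect unlimited
[root@srv1 ~]# grep -E 'open files|processes|locked memory' /proc/self/limits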
5) Set up passwordless SSH logins---run on srv1
[root@srv1 ~]# ssh-keygen -q -N ''
[root@srv1 ~]# ssh-copy-id srv2.1000y.cloud
[root@srv1 ~]# ssh-copy-id srv3.1000y.cloud
[root@srv1 ~]# ssh-copy-id srv4.1000y.cloud
[root@srv1 ~]# ssh-copy-id srv5.1000y.cloud
[root@srv1 ~]# ssh-copy-id srv6.1000y.cloud
6) Install ipvsadm and related tools
[root@srv1 ~]# yum install ipvsadm ipset sysstat conntrack libseccomp -y
[root@srv1 ~]# cat >> /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF
[root@srv1 ~]# systemctl restart systemd-modules-load.service
[root@srv1 ~]# lsmod | grep -e ip_vs -e nf_conntrack ip_vs_sh 12288 0 ip_vs_wrr 12288 0 ip_vs_rr 12288 0 ip_vs 200704 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr nf_conntrack 188416 1 ip_vs nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs nf_defrag_ipv4 12288 1 nf_conntrack libcrc32c 12288 2 nf_conntrack,ip_vs
7) Tune kernel parameters
[root@srv1 ~]# cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384

net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF

[root@srv1 ~]# sysctl --system * Applying /usr/lib/sysctl.d/00-system.conf ... * Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ... kernel.yama.ptrace_scope = 0 ...... ...... ...... ...... ...... ...... net.ipv6.conf.lo.disable_ipv6 = 0 net.ipv6.conf.all.forwarding = 1 * Applying /etc/sysctl.conf ...
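Once sysctl --system has run, the settings that kube-proxy and the CNI rely on can be spot-checked (a small sketch; the net.bridge.* keys only become readable after the br_netfilter module is loaded, which happens in section 1.7):

# Spot-check the forwarding and conntrack settings applied from /etc/sysctl.d/k8s.conf
[root@srv1 ~]# sysctl net.ipv4.ip_forward net.netfilter.nf_conntrack_max   # expect 1 and 2310720
# After br_netfilter is loaded (section 1.7) this should also return 1:
[root@srv1 ~]# sysctl net.bridge.bridge-nf-call-iptables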
8) Edit the hosts file---required whether or not a DNS server exists
[root@srv1 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.11 srv1 srv1.1000y.cloud
192.168.1.12 srv2 srv2.1000y.cloud
192.168.1.13 srv3 srv3.1000y.cloud
192.168.1.14 srv4 srv4.1000y.cloud
192.168.1.15 srv5 srv5.1000y.cloud
192.168.1.16 srv6 srv6.1000y.cloud
[root@srv1 ~]# for node in srv2.1000y.cloud srv3.1000y.cloud srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud
do
    echo $node
    scp /etc/hosts $node:/etc/hosts
done
1.3 Download and install etcd---run on srv1 only
1) Download etcd
Download etcd from https://github.com/etcd-io/etcd/releases/
2) Unpack the etcd archive
[root@srv1 ~]# tar -xf etcd*.tar.gz && mv etcd-*/etcd /usr/local/bin/ && mv etcd-*/etcdctl /usr/local/bin/
3) List the installed binaries
[root@srv1 ~]# ls /usr/local/bin/
etcd  etcdctl
4) Confirm the version
[root@srv1 ~]# etcdctl version
etcdctl version: 3.5.9
API version: 3.5
5) Copy the binaries to the other Master nodes
[root@srv1 ~]# master='srv2.1000y.cloud srv3.1000y.cloud'
[root@srv1 ~]# worker='srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud'
[root@srv1 ~]# for NODE in $master
do
    echo $NODE
    scp /usr/local/bin/etcd* $NODE:/usr/local/bin/
done
1.4 Create and issue certificates---run on srv1 only
1) Download the certificate tooling
https://github.com/cloudflare/cfssl/releases/
2) Install the cfssl tools
[root@srv1 ~]# cp cfssl_1.6.4_linux_amd64 /usr/local/bin/cfssl
[root@srv1 ~]# cp cfssljson_1.6.4_linux_amd64 /usr/local/bin/cfssljson
[root@srv1 ~]# cp cfssl-certinfo_1.6.4_linux_amd64 /usr/local/bin/cfssl-certinfo
[root@srv1 ~]# chmod +x /usr/local/bin/cfssl*
3) Generate the etcd certificates
# Run on all Master nodes
[root@srv1 ~]# mkdir /etc/etcd/ssl -p
# Run on the srv1 node
[root@srv1 ~]# mkdir -p k8s/pki
[root@srv1 ~]# cd k8s/pki/
[root@srv1 pki]# cat > ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "876000h"
      }
    }
  }
}
EOF
################################################## Parameter notes ##################################################
# This file configures signing and certificate profiles.
#
# It has two parts: `signing` and `profiles`.
#
# `signing` holds the default signing configuration: the default `expiry` of `876000h` means certificates are
# valid for roughly 100 years.
#
# `profiles` defines named certificate profiles. The single `kubernetes` profile allows the following `usages`:
#   1. `signing`:          the certificate may sign other certificates
#   2. `key encipherment`: the key may encrypt/decrypt transported data
#   3. `server auth`:      server authentication
#   4. `client auth`:      client authentication
# The `kubernetes` profile uses the same `876000h` (about 100 years) expiry.
################################################## End of notes ##################################################
[root@srv1 ~]# cat > etcd-ca-csr.json << EOF
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
################################################## Parameter notes ##################################################
# This JSON file describes a Certificate Signing Request (CSR); it supplies the data needed to generate one.
#
# - "CN": "etcd" sets the Common Name, i.e. the subject that the certificate identifies.
# - "key": {} configures the key: "algo": "rsa" selects RSA and "size": 2048 a 2048-bit key.
# - "names": [] carries the subject details. In this example there is a single entry:
#   - "C":  country code, here CN (China).
#   - "ST": state/province, Beijing.
#   - "L":  locality/city, Beijing.
#   - "O":  organization, etcd.
#   - "OU": organizational unit, Etcd Security.
# - "ca": {} configures the CA itself; "expiry": "876000h" sets its validity to 876000 hours (about 100 years).
#
# cfssl takes this file as input, produces the CSR, and the CA then signs it to issue a valid certificate.
#
# It is used to create the etcd CA certificate and key (if you expect to scale out later, reserve a few extra IPs
# in the server certificate's hostname list below).
################################################## End of notes ##################################################
[root@srv1 pki]# cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare /etc/etcd/ssl/etcd-ca
2023/10/01 09:43:59 [INFO] generating a new CA key and certificate from CSR
2023/10/01 09:43:59 [INFO] generate received request
2023/10/01 09:43:59 [INFO] received CSR
2023/10/01 09:43:59 [INFO] generating key: rsa-2048
2023/10/01 09:44:00 [INFO] encoded CSR
2023/10/01 09:44:00 [INFO] signed certificate with serial number 212790503469786088557368097710578918172559758726
################################################## Parameter notes ##################################################
# cfssl is a toolkit for generating TLS/SSL certificates; it works with PKI, JSON configuration files, and
# integrates with many other tools.
#
# gencert generates certificates; -initca initializes a CA (the root certificate used to sign other certificates).
# etcd-ca-csr.json is the JSON file holding the CA details (key parameters, subject, validity) needed to create it.
#
# The | pipe feeds the output of the previous command into the next one.
#
# cfssljson is a cfssl companion command that formats the JSON produced by cfssl; -bare writes out only the bare
# certificate files, and /etc/etcd/ssl/etcd-ca is the path prefix for the generated files.
#
# In short: generate a CA from the CSR configuration and store the resulting files under /etc/etcd/ssl/etcd-ca*.
################################################## End of notes ##################################################
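To double-check what was just issued, the CA certificate can be inspected with the cfssl-certinfo binary installed above (a quick sketch; the fields shown are the ones set in etcd-ca-csr.json):

# Print the decoded CA certificate; expect CN=etcd, O=etcd, OU=Etcd Security and a ~100-year validity window
[root@srv1 pki]# cfssl-certinfo -cert /etc/etcd/ssl/etcd-ca.pem
# The same information is available via openssl:
[root@srv1 pki]# openssl x509 -in /etc/etcd/ssl/etcd-ca.pem -noout -subject -dates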
[root@srv1 pki]# cat > etcd-csr.json << EOF
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
EOF
################################################## Parameter notes ##################################################
# This JSON file is another certificate signing request (CSR), this time for the etcd server certificate.
#
# "CN" sets the Common Name to "etcd".
#
# "key" selects the algorithm ("algo", RSA here) and the key length ("size", 2048 bits).
#
# "names" is an array with one entry carrying the remaining subject fields:
# - "C":  country code, "CN"
# - "ST": state/province, "Beijing"
# - "L":  locality/city, "Beijing"
# - "O":  organization, "etcd"
# - "OU": organizational unit, "Etcd Security"
#
# These fields become part of the certificate and identify its scope and issuer.
################################################## End of notes ##################################################
[root@srv1 pki]# cfssl gencert \
  -ca=/etc/etcd/ssl/etcd-ca.pem \
  -ca-key=/etc/etcd/ssl/etcd-ca-key.pem \
  -config=ca-config.json \
  -hostname=127.0.0.1,srv1.1000y.cloud,srv2.1000y.cloud,srv3.1000y.cloud,192.168.1.11,192.168.1.12,192.168.1.13 \
  -profile=kubernetes \
  etcd-csr.json | cfssljson -bare /etc/etcd/ssl/etcd
2023/10/01 09:49:31 [INFO] generate received request
2023/10/01 09:49:31 [INFO] received CSR
2023/10/01 09:49:31 [INFO] generating key: rsa-2048
2023/10/01 09:49:33 [INFO] encoded CSR
2023/10/01 09:49:33 [INFO] signed certificate with serial number 201107719211860724407267705003893888676796259892
################################################## Parameter notes ##################################################
# This cfssl command issues the etcd server certificate. The parameters are:
#
# -ca=/etc/etcd/ssl/etcd-ca.pem:          path of the CA certificate that signs the etcd certificate.
# -ca-key=/etc/etcd/ssl/etcd-ca-key.pem:  path of the CA private key used for signing.
# -config=ca-config.json:                 path of the CA configuration (validity, usages, and so on).
# -hostname=xxxx:                         the host names and IP addresses the etcd certificate must cover.
# -profile=kubernetes:                    the certificate profile to use (usages and extensions).
# etcd-csr.json:                          path of the JSON CSR describing the certificate request.
# | cfssljson -bare /etc/etcd/ssl/etcd:   pipes the cfssl output into cfssljson, which writes the .pem and
#                                         -key.pem files using the /etc/etcd/ssl/etcd prefix.
#
# In short: sign the request in etcd-csr.json with the specified CA and configuration and write out the etcd
# certificate files.
################################################## End of notes ##################################################
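Before distributing the certificate it is worth confirming that every node name and IP ended up in the Subject Alternative Name list (a small sketch using openssl, which ships with CentOS):

# The SAN list should contain 127.0.0.1, the three srvX.1000y.cloud names and 192.168.1.11-13
[root@srv1 pki]# openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -text | grep -A 1 'Subject Alternative Name'
# Also confirm the validity window applied by the kubernetes profile
[root@srv1 pki]# openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -dates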
4) Copy the etcd certificates to the other Master nodes
[root@srv1 pki]# for NODE in $master
do
    ssh $NODE "mkdir -p /etc/etcd/ssl"
    for FILE in etcd-ca-key.pem etcd-ca.pem etcd-key.pem etcd.pem
    do
        scp /etc/etcd/ssl/${FILE} $NODE:/etc/etcd/ssl/${FILE}
    done
done
1.5 Configure the etcd cluster
1) Create the etcd configuration file---run on all three Masters
[root@srv1 pki]# cat > /etc/etcd/etcd.config.yml << EOF
name: 'srv1.1000y.cloud'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://192.168.1.11:2380'
listen-client-urls: 'https://192.168.1.11:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://192.168.1.11:2380'
advertise-client-urls: 'https://192.168.1.11:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/etc/etcd/ssl/etcd.pem'
  key-file: '/etc/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/etc/etcd/ssl/etcd.pem'
  key-file: '/etc/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
################################################## Parameter notes ##################################################
- `name`: the name of this node, used to tell cluster members apart.
- `data-dir`: directory where etcd stores its data.
- `wal-dir`: directory where etcd writes its write-ahead log.
- `snapshot-count`: number of committed transactions that triggers a snapshot.
- `heartbeat-interval`: heartbeat interval between etcd cluster members.
- `election-timeout`: leader election timeout.
- `quota-backend-bytes`: storage quota; 0 means no limit.
- `listen-peer-urls`: URLs used for peer traffic (HTTPS).
- `listen-client-urls`: URLs clients use to reach etcd, including the local loopback URL.
- `max-snapshots`: number of snapshots to retain.
- `max-wals`: number of WAL files to retain.
- `initial-advertise-peer-urls`: peer URLs advertised to the rest of the cluster at startup.
- `advertise-client-urls`: client URLs advertised at startup.
- `discovery`: options for cluster discovery.
- `initial-cluster`: initial cluster membership.
- `initial-cluster-token`: token identifying the cluster.
- `initial-cluster-state`: initial state of the cluster.
- `strict-reconfig-check`: strict reconfiguration checking.
- `enable-v2`: enable the v2 API.
- `enable-pprof`: enable profiling.
- `proxy`: proxy mode.
- `client-transport-security`: TLS settings for client connections.
- `peer-transport-security`: TLS settings for peer connections.
- `debug`: enable debug mode.
- `log-package-levels`: per-package log levels.
- `log-outputs`: where logs are written.
- `force-new-cluster`: force creation of a new cluster.

Adjust these parameters and options as needed for your environment.
################################################## End of notes ##################################################
[root@srv2 ~]# cat > /etc/etcd/etcd.config.yml << EOF name: 'srv2.1000y.cloud' data-dir: /var/lib/etcd wal-dir: /var/lib/etcd/wal snapshot-count: 5000 heartbeat-interval: 100 election-timeout: 1000 quota-backend-bytes: 0 listen-peer-urls: 'https://192.168.1.12:2380' listen-client-urls: 'https://192.168.1.12:2379,http://127.0.0.1:2379' max-snapshots: 3 max-wals: 5 cors: initial-advertise-peer-urls: 'https://192.168.1.12:2380' advertise-client-urls: 'https://192.168.1.12:2379' discovery: discovery-fallback: 'proxy' discovery-proxy: discovery-srv: initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380' initial-cluster-token: 'etcd-k8s-cluster' initial-cluster-state: 'new' strict-reconfig-check: false enable-v2: true enable-pprof: true proxy: 'off' proxy-failure-wait: 5000 proxy-refresh-interval: 30000 proxy-dial-timeout: 1000 proxy-write-timeout: 5000 proxy-read-timeout: 0 client-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' client-cert-auth: true trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem' auto-tls: true peer-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' peer-client-cert-auth: true trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem' auto-tls: true debug: false log-package-levels: log-outputs: [default] force-new-cluster: false EOF
[root@srv3 ~]# cat > /etc/etcd/etcd.config.yml << EOF name: 'srv3.1000y.cloud' data-dir: /var/lib/etcd wal-dir: /var/lib/etcd/wal snapshot-count: 5000 heartbeat-interval: 100 election-timeout: 1000 quota-backend-bytes: 0 listen-peer-urls: 'https://192.168.1.13:2380' listen-client-urls: 'https://192.168.1.13:2379,http://127.0.0.1:2379' max-snapshots: 3 max-wals: 5 cors: initial-advertise-peer-urls: 'https://192.168.1.13:2380' advertise-client-urls: 'https://192.168.1.13:2379' discovery: discovery-fallback: 'proxy' discovery-proxy: discovery-srv: initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380' initial-cluster-token: 'etcd-k8s-cluster' initial-cluster-state: 'new' strict-reconfig-check: false enable-v2: true enable-pprof: true proxy: 'off' proxy-failure-wait: 5000 proxy-refresh-interval: 30000 proxy-dial-timeout: 1000 proxy-write-timeout: 5000 proxy-read-timeout: 0 client-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' client-cert-auth: true trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem' auto-tls: true peer-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' peer-client-cert-auth: true trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem' auto-tls: true debug: false log-package-levels: log-outputs: [default] force-new-cluster: false EOF
2) Create the etcd service unit---run on all three Masters
[root@srv1 pki]# cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF

3) Enable and start the etcd service---run on all three Masters
[root@srv1 pki]# systemctl daemon-reload
[root@srv1 pki]# systemctl enable --now etcd.service
4) Verify the etcd cluster status
[root@srv1 pki]# export ETCDCTL_API=3
[root@srv1 pki]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  endpoint status --write-out=table
+-------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|     ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.11:2379 | ace8d5b0766b3d92 |   3.5.9 |   20 kB |      true |      false |         2 |          9 |                  9 |        |
| 192.168.1.12:2379 |  ac7e57d44f030e8 |   3.5.9 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
| 192.168.1.13:2379 | 40ba37809e1a423f |   3.5.9 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
+-------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

[root@srv1 pki]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  endpoint health --write-out=table
+-------------------+--------+-------------+-------+
|     ENDPOINT      | HEALTH |    TOOK     | ERROR |
+-------------------+--------+-------------+-------+
| 192.168.1.11:2379 |   true | 27.632698ms |       |
| 192.168.1.13:2379 |   true | 56.126148ms |       |
| 192.168.1.12:2379 |   true | 58.610262ms |       |
+-------------------+--------+-------------+-------+
5) Test the cluster
[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  put 1000y "Hello World"
OK

[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  get 1000y
1000y
Hello World

[root@srv1 ~]# systemctl stop etcd

[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  endpoint health --write-out=table
{"level":"warn","ts":"2023-10-01T13:31:32.676026+0800","logger":"client","caller":"v3@v3.5.9/retry_interceptor.go:62",
"msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003088c0/192.168.1.11:2379",
"attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error:
connection error: desc = \"transport: Error while dialing dial tcp 192.168.1.11:2379: connect: connection refused\""}
+-------------------+--------+--------------+---------------------------+
|     ENDPOINT      | HEALTH |     TOOK     |           ERROR           |
+-------------------+--------+--------------+---------------------------+
| 192.168.1.12:2379 |   true |  27.977355ms |                           |
| 192.168.1.13:2379 |   true |   29.57418ms |                           |
| 192.168.1.11:2379 |  false | 5.002483674s | context deadline exceeded |
+-------------------+--------+--------------+---------------------------+

[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  get 1000y
1000y
Hello World

[root@srv1 ~]# systemctl start etcd

[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  del 1000y
1
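With the cluster healthy, this is also a convenient point to take a first backup. A minimal sketch using etcdctl's snapshot facility (the /root/etcd-backup.db path is only an example; in etcd 3.5 `etcdctl snapshot status` may print a deprecation notice pointing at etcdutl):

# Snapshots are taken from a single endpoint, here the local member on srv1
[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379" \
  --cacert=/etc/etcd/ssl/etcd-ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  snapshot save /root/etcd-backup.db
# Show the size and revision recorded in the snapshot file
[root@srv1 ~]# etcdctl snapshot status /root/etcd-backup.db --write-out=table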
1.6 Kubernetes HA setup---HAProxy+KeepAlived---run on all three Masters
1) Install HAProxy and KeepAlived
[root@srv1 ~]# yum install haproxy keepalived -y
2) Move the original configuration files aside
[root@srv1 ~]# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
[root@srv1 ~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
3) Configure HAProxy---identical configuration on all 3 hosts
[root@srv1 ~]# cat >/etc/haproxy/haproxy.cfg << EOF
global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor

frontend k8s-master
  bind 0.0.0.0:16443
  bind 127.0.0.1:16443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server srv1.1000y.cloud 192.168.1.11:6443 check
  server srv2.1000y.cloud 192.168.1.12:6443 check
  server srv3.1000y.cloud 192.168.1.13:6443 check
EOF

################################################## Parameter notes ##################################################
What each part of the HAProxy load-balancer configuration does:
1. global:
   - maxconn 2000: at most 2000 connections per process.
   - ulimit-n 16384: at most 16384 file descriptors per process.
   - log 127.0.0.1 local0 err: send logs to the local host and record only error-level messages.
   - stats timeout 30s: 30-second timeout for the statistics interface.
2. defaults:
   - log global: inherit the logging settings from the global section.
   - mode http: work in HTTP mode by default.
   - option httplog: log HTTP-level details.
   - timeout connect 5000: 5-second timeout for establishing a connection to a backend server.
   - timeout client 50000: 50-second timeout for client connections.
   - timeout server 50000: 50-second timeout for backend server connections.
   - timeout http-request 15s: 15-second timeout for processing an HTTP request.
   - timeout http-keep-alive 15s: 15-second timeout for HTTP keep-alive connections.
3. frontend monitor-in:
   - bind *:33305: listen on port 33305 on all addresses.
   - mode http / option httplog: HTTP mode with HTTP logging.
   - monitor-uri /monitor: expose the health-check URI /monitor.
4. frontend k8s-master:
   - bind 0.0.0.0:16443 and bind 127.0.0.1:16443: listen on port 16443 on all addresses and on loopback.
   - mode tcp / option tcplog: TCP mode with TCP logging.
   - tcp-request inspect-delay 5s: wait up to 5 seconds to inspect incoming connections.
   - default_backend k8s-master: forward traffic to the k8s-master backend.
5. backend k8s-master:
   - mode tcp / option tcplog / option tcp-check: TCP mode, TCP logging, TCP health checks.
   - balance roundrobin: round-robin load balancing.
   - default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100: default server parameters.
   - server srv1.1000y.cloud 192.168.1.11:6443 check: backend srv1.1000y.cloud at 192.168.1.11:6443 with health checks.
   - server srv2.1000y.cloud 192.168.1.12:6443 check: backend srv2.1000y.cloud at 192.168.1.12:6443 with health checks.
   - server srv3.1000y.cloud 192.168.1.13:6443 check: backend srv3.1000y.cloud at 192.168.1.13:6443 with health checks.

Together these sections define the global settings, defaults, frontends and backend used to load-balance and monitor the API servers.
################################################## End of notes ##################################################
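Before moving on, the configuration file can be syntax-checked, and the monitor frontend probed once HAProxy is running (a small sketch; the 200 response comes from the monitor-uri defined above):

# Validate the configuration file
[root@srv1 ~]# haproxy -c -f /etc/haproxy/haproxy.cfg
# After the service is started in step 6), the monitor endpoint should answer with an HTTP 200 status line
[root@srv1 ~]# curl -si http://127.0.0.1:33305/monitor | head -n 1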
4) Configure KeepAlived---the configuration differs per Master host
[root@srv1 ~]# cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}

vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}

vrrp_instance VI_1 {
    state MASTER
    # mind the NIC name
    interface eth0
    mcast_src_ip 192.168.1.11
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.21
    }
    track_script {
        chk_apiserver
    }
}
EOF

[root@srv2 ~]# cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}

vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}

vrrp_instance VI_1 {
    state BACKUP
    # mind the NIC name
    interface eth0
    mcast_src_ip 192.168.1.12
    virtual_router_id 51
    priority 80
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.21
    }
    track_script {
        chk_apiserver
    }
}
EOF

[root@srv3 ~]# cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}

vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}

vrrp_instance VI_1 {
    state BACKUP
    # mind the NIC name
    interface eth0
    mcast_src_ip 192.168.1.13
    virtual_router_id 51
    priority 50
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.21
    }
    track_script {
        chk_apiserver
    }
}
EOF

################################################## Parameter notes ##################################################
- `global_defs` holds the global settings.
  - `router_id` identifies this router; here it is "LVS_DEVEL".
- `vrrp_script` defines a VRRP check script named `chk_apiserver`.
  - `script` is the path of the script; it runs every 5 seconds and returns 0 when the service is healthy, 1 otherwise.
  - `weight` adjusts the priority based on the script result; here -5.
  - `fall` is the failure threshold: the service is considered down after 2 consecutive failing results.
  - `rise` is the recovery threshold: the service is considered healthy again after 1 successful result.
- `vrrp_instance` defines a VRRP instance named `VI_1`.
  - `state` is the initial role; MASTER on srv1 marks it as the primary node, BACKUP on the others.
  - `interface` is the NIC to monitor and use, eth0 here.
  - `mcast_src_ip` is the source IP of the VRRP packets, i.e. the node's own address (192.168.1.11/12/13).
  - `virtual_router_id` is the virtual router ID, 51.
  - `priority` is the instance priority; the highest value is most likely to be elected master.
  - `nopreempt` prevents a recovered node from taking the master role (and VIP) back automatically.
  - `advert_int` is the advertisement interval, 2 seconds.
  - `authentication` sets the authentication parameters:
    - `auth_type` PASS selects password authentication.
    - `auth_pass` is the shared password, K8SHA_KA_AUTH.
  - `virtual_ipaddress` is the floating VIP, 192.168.1.21.
  - `track_script` lists the scripts to track, here chk_apiserver.
################################################## End of notes ##################################################
5) Health-check script---identical on all 3 Masters
[root@srv1 ~]# cat > /etc/keepalived/check_apiserver.sh << EOF
#!/bin/bash

err=0
for k in \$(seq 1 3)
do
    check_code=\$(pgrep haproxy)
    if [[ \$check_code == "" ]]; then
        err=\$(expr \$err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done

if [[ \$err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
EOF

################################################## Parameter notes ##################################################
# The script works as follows:
# 1. Set a counter err to 0 to record failures.
# 2. Loop up to three times:
#    a. Use pgrep to check whether a haproxy process is running. If it is not, increment err, sleep for
#       1 second and retry.
#    b. If haproxy is running, reset err to 0 and leave the loop.
# 3. If err is non-zero the check failed: print a message, run "systemctl stop keepalived" to stop the
#    keepalived process, and exit with status 1.
# 4. If err is 0 the check succeeded and the script exits with status 0.
#
# In short, the script verifies that a haproxy process exists; if it cannot find one it stops keepalived and
# returns an error, so the VIP fails over to a node whose haproxy is still working. It is used as the health
# check that keeps the VIP on a node with a functional haproxy before anything else proceeds.
################################################## End of notes ##################################################
[root@srv1 ~]# chmod +x /etc/keepalived/check_apiserver.sh
6) Start the services---same on all 3 Masters
[root@srv1 ~]# systemctl daemon-reload
[root@srv1 ~]# systemctl enable --now haproxy keepalived
7) Test the VIP
[root@srv1 ~]# ping -c 2 192.168.1.21
PING 192.168.1.21 (192.168.1.21) 56(84) bytes of data.
64 bytes from 192.168.1.21: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 192.168.1.21: icmp_seq=2 ttl=64 time=0.037 ms

--- 192.168.1.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1032ms
rtt min/avg/max/mdev = 0.037/0.052/0.067/0.015 ms

[root@srv1 ~]# telnet 192.168.1.21 16443
Trying 192.168.1.21...
Connected to 192.168.1.21.
Escape character is '^]'.
Connection closed by foreign host.
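A failover can also be rehearsed at this point (a sketch, assuming the VIP currently sits on srv1): stopping haproxy makes check_apiserver.sh stop keepalived, so the VIP should move to the next-highest-priority node.

# Confirm which node currently holds the VIP
[root@srv1 ~]# ip addr show eth0 | grep 192.168.1.21
# Simulate a failure; the check script stops keepalived within a few seconds
[root@srv1 ~]# systemctl stop haproxy
# The VIP should now appear on srv2 (priority 80)
[root@srv2 ~]# ip addr show eth0 | grep 192.168.1.21
# Restore srv1; keepalived must be started again because the script stopped it
[root@srv1 ~]# systemctl start haproxy keepalived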
1.7 Containerd as the Runtime---run on all nodes
1) Install Containerd
[root@srv1 ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@srv1 ~]# yum install containerd -y
2) Load the kernel modules Containerd needs
[root@srv1 ~]# cat <<EOF >> /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
[root@srv1 ~]# systemctl restart systemd-modules-load.service
3) Configure the kernel parameters containerd needs
[root@srv1 ~]# cat <<EOF >> /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
[root@srv1 ~]# sysctl --system
4) Create the containerd configuration file
[root@srv1 ~]# mkdir -p /etc/containerd
[root@srv1 ~]# containerd config default > /etc/containerd/config.toml
[root@srv1 ~]# sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml [root@srv1 ~]# cat /etc/containerd/config.toml | grep SystemdCgroup SystemdCgroup = true
[root@srv1 ~]# sed -i "s#registry.k8s.io#m.daocloud.io/registry.k8s.io#g" /etc/containerd/config.toml [root@srv1 ~]# cat /etc/containerd/config.toml | grep sandbox_image sandbox_image = "m.daocloud.io/registry.k8s.io/pause:3.6"
5) Add a registry mirror to the default containerd configuration
[root@srv1 ~]# vim /etc/containerd/config.toml
...... ...... ...... ...... ...... ......
# Add the following below line 153
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://3laho3y3.mirror.aliyuncs.com", "https://registry-1.docker.io"]
...... ...... ...... ...... ...... ......
6) Start containerd
[root@srv1 ~]# systemctl daemon-reload && systemctl enable --now containerd
7) Point the crictl client at the containerd runtime
[root@srv1 ~]# cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
[root@srv1 ~]# systemctl restart containerd
8) Test
[root@srv1 ~]# ctr images ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
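crictl can perform a similar check through the CRI socket configured above (a small sketch; the versions in the output will match whatever containerd release yum installed):

# Confirm crictl can talk to containerd over /run/containerd/containerd.sock
[root@srv1 ~]# crictl version
# List images known to the CRI runtime (empty at this point, like the ctr listing above)
[root@srv1 ~]# crictl images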
1.8 Install kubelet---run on all nodes
1) Configure the k8s repo on all nodes
[root@srv1 ~]# vim /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
2) Install the kube tools on all nodes and enable the kubelet service
2.1) List the available kubeadm/kubelet/kubectl versions
[root@srv1 ~]# yum list kubelet kubeadm kubectl --showduplicates | grep 1.28.2
2.2) Install the specified versions of the kube tools and enable the kubelet service
[root@srv1 ~]# yum install kubectl-1.28.2-0.x86_64 \
               kubeadm-1.28.2-0.x86_64 \
               kubelet-1.28.2-0.x86_64 -y
[root@srv1 ~]# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--container-runtime-endpoint=unix:///run/containerd/containerd.sock"
[root@srv1 ~]# systemctl daemon-reload && systemctl enable kubelet    # do not start it yet
1.9 Build the Kubernetes cluster---run on srv1
1) Generate kubeadm-config.yaml
[root@srv1 ~]# kubeadm config print init-defaults > kubeadm-config.yaml
2) Edit kubeadm-config.yaml
[root@srv1 ~]# vim kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  # address of this API server
  advertiseAddress: 192.168.1.11
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  # hostname of this node
  name: srv1.1000y.cloud
  taints: null
---
apiServer:
  # list every Master/LB/VIP IP, not one may be missing! A few spare IPs can be added for future scale-out
  certSANs:
  - 192.168.1.11
  - 192.168.1.12
  - 192.168.1.13
  - 192.168.1.21
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.1.21:16443
controllerManager: {}
dns: {}
etcd:
  external:                              # use the external etcd cluster
    endpoints:
    - https://192.168.1.11:2379          # the 3 etcd cluster members
    - https://192.168.1.12:2379
    - https://192.168.1.13:2379
    caFile: /etc/etcd/ssl/etcd-ca.pem    # certificates needed to reach etcd
    certFile: /etc/etcd/ssl/etcd.pem
    keyFile: /etc/etcd/ssl/etcd-key.pem
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
# must match the kubeadm version
kubernetesVersion: 1.28.2
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/12
  serviceSubnet: 10.96.0.0/12
scheduler: {}
3) Migrate the kubeadm config to the current schema
[root@srv1 ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
4) Copy the kubeadm config to the other Master nodes
[root@srv1 ~]# for node in $master
do
    echo $node
    scp new.yaml $node:/root/
done
5) Deploy the Kubernetes cluster
# The images can be pre-pulled with: kubeadm config images pull --config /root/new.yaml
# Initialize the srv1 node. Initialization generates the certificates and configuration files under
# /etc/kubernetes; afterwards the other Master nodes simply join srv1.
[root@srv1 ~]# kubeadm init --config /root/new.yaml --upload-certs
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
...... ...... ...... ...... ...... ......
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

# Run the following three commands as a regular user
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:
# Run the following as root
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
# To join the other Master nodes, run:
  kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6 \
        --control-plane --certificate-key 811b4eeae8bbca3316b65eadb8738da988586b9b0d0daa13a22bc15785718234
Please note that the certificate-key gives access to cluster sensitive data, keep it secret! As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
# To join the worker nodes, run:
  kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa
################################################## Notes ##################################################
1. If the kubeadm deployment fails, the cluster can be reset with 'kubeadm reset'.
2. If the etcd cluster breaks during 'kubeadm reset', stop the etcd service on the other nodes, delete the data
   files of every etcd member, then start etcd again to recover:
   [root@srv1 ~]# systemctl stop etcd
   [root@srv1 ~]# rm -rf /var/lib/etcd/*
   [root@srv1 ~]# systemctl start etcd
################################################## End of notes ##################################################
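If more nodes have to be joined after the bootstrap token (24h TTL) or the uploaded certificates (deleted after two hours) have expired, fresh values can be generated on srv1; a short sketch using standard kubeadm subcommands:

# Print a ready-to-use worker join command with a new token
[root@srv1 ~]# kubeadm token create --print-join-command
# Re-upload the control-plane certificates and print the new certificate key for --control-plane joins
[root@srv1 ~]# kubeadm init phase upload-certs --upload-certs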
6) Load the kubectl environment
[root@srv1 ~]# echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> .bashrc
[root@srv1 ~]# source .bashrc

[root@srv1 ~]# kubectl get nodes
NAME               STATUS     ROLES           AGE   VERSION
srv1.1000y.cloud   NotReady   control-plane   28s   v1.28.2
7) Join the other Master nodes to the k8s cluster---run on the other Master nodes
[root@srv2 ~]# kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6 \
        --control-plane --certificate-key 811b4eeae8bbca3316b65eadb8738da988586b9b0d0daa13a22bc15785718234
...... ...... ...... ...... ...... ......
Run 'kubectl get nodes' to see this node join the cluster.

[root@srv3 ~]# kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6 \
        --control-plane --certificate-key 811b4eeae8bbca3316b65eadb8738da988586b9b0d0daa13a22bc15785718234
...... ...... ...... ...... ...... ......
Run 'kubectl get nodes' to see this node join the cluster.
8) Join the Worker nodes to the k8s cluster---run on the worker nodes
[root@srv4 ~]# kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6
...... ...... ...... ...... ...... ......
Run 'kubectl get nodes' to see this node join the cluster.

[root@srv5 ~]# kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6
...... ...... ...... ...... ...... ......
Run 'kubectl get nodes' to see this node join the cluster.

[root@srv6 ~]# kubeadm join 192.168.1.21:16443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:0cd0e5d9ca8cda700fd2a14c39211a73775ac03d5b386cbe7c59098ffc8b0fa6
...... ...... ...... ...... ...... ......
Run 'kubectl get nodes' to see this node join the cluster.
9) Verify
[root@srv1 ~]# kubectl get nodes
NAME               STATUS     ROLES           AGE     VERSION
srv1.1000y.cloud   NotReady   control-plane   10m     v1.28.2
srv2.1000y.cloud   NotReady   control-plane   6m5s    v1.28.2
srv3.1000y.cloud   NotReady   control-plane   6m5s    v1.28.2
srv4.1000y.cloud   NotReady   <none>          2m31s   v1.28.2
srv5.1000y.cloud   NotReady   <none>          2m7s    v1.28.2
srv6.1000y.cloud   NotReady   <none>          102s    v1.28.2
1.10 Install the Calico components---run on srv1
1) Upgrade runc--run on all nodes
[root@srv1 ~]# wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64
[root@srv1 ~]# install -m 755 runc.amd64 /usr/local/sbin/runc
[root@srv1 ~]# cp -p /usr/local/sbin/runc /usr/local/bin/runc
[root@srv1 ~]# cp -p /usr/local/sbin/runc /usr/bin/runc
2) Upgrade libseccomp --run on all nodes
[root@srv1 ~]# yum install https://mirrors.tuna.tsinghua.edu.cn/centos/8-stream/BaseOS/x86_64/os/Packages/libseccomp-2.5.1-1.el8.x86_64.rpm -y
[root@srv1 ~]# rpm -qa | grep libseccomp
libseccomp-2.5.1-1.el8.x86_64
3) Download and edit calico-typha.yaml
[root@srv1 ~]# wget https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico-typha.yaml
[root@srv1 ~]# vim calico-typha.yaml
...... ...... ...... ...... ...... ......
# in the calico-config ConfigMap
# line 87
          "ipam": {
              "type": "calico-ipam",
          },
...... ...... ...... ...... ...... ......
# line 4878
            - name: IP
              value: "autodetect"
...... ...... ...... ...... ...... ......
# line 4910
            - name: CALICO_IPV4POOL_CIDR
              value: "172.16.0.0/12"
4) Switch the Calico images to a domestic mirror--run on srv1
# switch to the domestic mirror
[root@srv1 ~]# sed -i "s#docker.io/calico/#m.daocloud.io/docker.io/calico/#g" calico-typha.yaml
# to revert to the default registry (if needed)
[root@srv1 ~]# sed -i "s#m.daocloud.io/docker.io/calico/#docker.io/calico/#g" calico-typha.yaml
5) Apply Calico--run on srv1
[root@srv1 ~]# kubectl apply -f calico-typha.yaml
poddisruptionbudget.policy/calico-kube-controllers created
poddisruptionbudget.policy/calico-typha created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
serviceaccount/calico-cni-plugin created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrole.rbac.authorization.k8s.io/calico-cni-plugin created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-cni-plugin created
service/calico-typha created
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created
deployment.apps/calico-typha created
6) Check the Calico pod status--run on srv1
[root@srv1 ~]# kubectl get pod -A
kube-system   calico-kube-controllers-765c96cb9d-brhcg   1/1   Running   0             44m
kube-system   calico-node-52mdb                          1/1   Running   0             44m
kube-system   calico-node-8l92t                          1/1   Running   0             44m
kube-system   calico-node-pr8zs                          1/1   Running   7 (22m ago)   44m
kube-system   calico-node-r2trr                          1/1   Running   6 (21m ago)   44m
kube-system   calico-node-w9gnq                          1/1   Running   6 (21m ago)   44m
kube-system   calico-node-znjw8                          1/1   Running   6 (21m ago)   44m
kube-system   calico-typha-67c57cdf49-jtjwg              1/1   Running   0             44m
################################################## Notes ##################################################
1. If an image cannot be pulled, download it on a machine that can reach the registry and export it.
2. Then import the image on the k8s node where the pod is scheduled, for example:
   [root@srv1 ~]# ctr -n=k8s.io image import calico-kube-controllers-master.tar
################################################## End of notes ##################################################
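Once the calico-node DaemonSet finishes rolling out, the nodes should flip from NotReady to Ready; a quick check (resource names taken from the apply output above):

# Wait for the calico-node DaemonSet to finish rolling out on all six nodes
[root@srv1 ~]# kubectl -n kube-system rollout status daemonset/calico-node
# All nodes should now report Ready
[root@srv1 ~]# kubectl get nodes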
1.11 Install and configure the Metrics Server
1) Download the high-availability manifest---run on srv1
[root@srv1 ~]# curl -O \
https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability.yaml
2) Edit the manifest---run on srv1
[root@srv1 ~]# vim high-availability.yaml
...... ...... ...... ...... ...... ......
      containers:
      - args:
        # change lines 145-154 as follows
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        # change line 150 as follows
        - --kubelet-insecure-tls
...... ...... ...... ...... ...... ......
# change line 193 as follows
apiVersion: policy/v1
...... ...... ...... ...... ...... ......
# switch to the domestic mirror
[root@srv1 ~]# sed -i "s#registry.k8s.io/#m.daocloud.io/registry.k8s.io/#g" high-availability.yaml
3) Copy front-proxy-ca.crt from srv1 to the worker nodes
[root@srv1 ~]# for node in $worker
do
    echo $node
    scp /etc/kubernetes/pki/front-proxy-ca.crt $node:/etc/kubernetes/pki/front-proxy-ca.crt
done
4) Install---run on srv1
[root@srv1 ~]# kubectl apply -f high-availability.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
poddisruptionbudget.policy/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
5) Test---run on srv1
[root@srv1 ~]# kubectl get pods -A | grep metrics-server
kube-system   metrics-server-76bcdc46fd-dnncv   1/1   Running   0   58s
kube-system   metrics-server-76bcdc46fd-ql6nr   1/1   Running   0   58s
[root@srv1 ~]# kubectl top nodes -A NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% srv1.1000y.cloud 245m 12% 1461Mi 38% srv2.1000y.cloud 258m 12% 1332Mi 34% srv3.1000y.cloud 226m 11% 1255Mi 32% srv4.1000y.cloud 145m 3% 1049Mi 13% srv5.1000y.cloud 103m 2% 945Mi 12% srv6.1000y.cloud 104m 2% 957Mi 12%
[root@srv1 ~]# kubectl top pods -A NAMESPACE NAME CPU(cores) MEMORY(bytes) kube-system calico-kube-controllers-765c96cb9d-brhcg 5m 18Mi kube-system calico-node-52mdb 50m 153Mi kube-system calico-node-8l92t 40m 169Mi kube-system calico-node-pr8zs 47m 163Mi kube-system calico-node-r2trr 57m 160Mi kube-system calico-node-w9gnq 39m 156Mi kube-system calico-node-znjw8 38m 158Mi kube-system calico-typha-67c57cdf49-jtjwg 5m 32Mi kube-system coredns-66f779496c-8rn6t 4m 22Mi kube-system coredns-66f779496c-mphck 3m 21Mi kube-system kube-apiserver-srv1.1000y.cloud 66m 353Mi kube-system kube-apiserver-srv2.1000y.cloud 65m 342Mi kube-system kube-apiserver-srv3.1000y.cloud 59m 406Mi kube-system kube-controller-manager-srv1.1000y.cloud 4m 28Mi kube-system kube-controller-manager-srv2.1000y.cloud 21m 60Mi kube-system kube-controller-manager-srv3.1000y.cloud 3m 27Mi kube-system kube-proxy-bhshz 1m 23Mi kube-system kube-proxy-c28r9 1m 21Mi kube-system kube-proxy-knj4q 1m 19Mi kube-system kube-proxy-kw6qk 1m 19Mi kube-system kube-proxy-m46fj 1m 22Mi kube-system kube-proxy-nsxmk 1m 22Mi kube-system kube-scheduler-srv1.1000y.cloud 4m 23Mi kube-system kube-scheduler-srv2.1000y.cloud 5m 24Mi kube-system kube-scheduler-srv3.1000y.cloud 5m 28Mi kube-system metrics-server-76bcdc46fd-dnncv 6m 17Mi kube-system metrics-server-76bcdc46fd-ql6nr 7m 20Mi
1.12 Install the Dashboard
1) Download the Dashboard manifest---run on srv1
The latest DashBoard release can be found at: https://github.com/kubernetes/dashboard/releases
[root@srv1 ~]# curl -O \ https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# switch to a domestic mirror
[root@srv1 ~]# sed -i "s#kubernetesui/#registry.cn-beijing.aliyuncs.com/dotbalo/#g" ./recommended.yaml
2) Install the Dashboard---run on srv1
[root@srv1 ~]# kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
3) Confirm the Dashboard pods
[root@srv1 ~]# kubectl get pods -n kubernetes-dashboard
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-7b554c884f-snd99   1/1     Running   0          49s
kubernetes-dashboard-54b699784c-52twt        1/1     Running   0          50s
4) Modify the Dashboard service---run on srv1
[root@srv1 ~]# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
...... ...... ...... ...... ...... ......
# change line 33: switch the service type from ClusterIP to NodePort
  type: NodePort
...... ...... ...... ...... ...... ......
service/kubernetes-dashboard edited
3) Show the node port---run on srv1
[root@srv1 ~]# kubectl get svc -n kubernetes-dashboard
NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
dashboard-metrics-scraper   ClusterIP   10.97.157.202   <none>        8000/TCP        3m47s
kubernetes-dashboard        NodePort    10.100.224.77   <none>        443:31970/TCP   3m50s
4) Create a token---run on srv1
[root@srv1 ~]# kubectl create serviceaccount -n kubernetes-dashboard admin-user
serviceaccount/admin-user created
[root@srv1 ~]# cat > admin-user.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF

[root@srv1 ~]# kubectl apply -f admin-user.yaml
serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
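The next command applies create-secret.yaml, which is not shown earlier in this document. A minimal version, modeled on the upstream kubernetes/dashboard instructions for a long-lived service-account token, might look like this:

[root@srv1 ~]# cat > create-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
EOF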
[root@srv1 ~]# kubectl apply -f create-secret.yaml
secret/admin-user created
5) 查看Token [root@srv1 ~]# kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath={".data.token"} | base64 -d eyJhbGciOiJSUzI1NiIsImtpZCI6IlVZVlpGbTd4NzhYOVpoTUhpbGtiVWJ4ZlVPVWY0YWQ4UzZ6aThzLS1PeFEifQ.eyJpc3MiOiJr dWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGV zLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyIiwia3ViZXJuZX Rlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY 2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJmNTYyYjhlNy0yZDI4LTQwMWQtOWJlZi1hNzM1NTg4NDRiYWQiLCJzdWIiOiJz eXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.iDdhzAuMEguQBKPpsscPgdPNtqwSFn edP3u6D2ctQQyI-7iUwZfQKIltG4mmW7xyhG2G55lbCHYmR0Bi81-zCkJ51GKQPgExmJauWcJ2zRBUxn6TzVTG2t1wHeWcHbmXSoC6t 5HukbhNGT8c1atUTw-uloRygpqsoWKUzBC0_RwDunwy8yAfm_qknxqve9IUds5emDBQ-HAFTmYZz4xhpAz4tBcWSLL2wNYrB4Bovf4F r10vE7OkwL807m9HftTsVtVej8N4po8OBDHu83aWJCD_EnvfO-9DWcms-Yv16eI2Fuv8ig5HB5w3SxE5A_OD5Zqw6r_SsMgwKKe2KCv q9w
5) Log in to the Dashboard
[Browser] ---> https://srv1.1000y.cloud:31970

1.13 Switch kube-proxy on all nodes from iptables mode to ipvs
1) Modify the kube-proxy ConfigMap
[root@srv1 ~]# kubectl edit cm kube-proxy -n kube-system
......
......
......
......
......
......
    metricsBindAddress: ""
    # change line 54 as follows
    mode: "ipvs"
    nodePortAddresses: null
......
......
......
......
......
......
configmap/kube-proxy edited
2) Roll out the updated kube-proxy
[root@srv1 ~]# kubectl patch daemonset kube-proxy -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" \
  -n kube-system
daemonset.apps/kube-proxy patched
3) Verify the kube-proxy pods
# every pod will be recreated; note the pod ages
[root@srv1 ~]# kubectl get po -n kube-system | grep kube-proxy
kube-proxy-66vjq   1/1   Running             0   8s
kube-proxy-7jq8z   0/1   ContainerCreating   0   2s
kube-proxy-g2tcd   1/1   Running             0   19s
kube-proxy-j2jbw   1/1   Running             0   14s
kube-proxy-kw6qk   1/1   Running             0   150m
kube-proxy-m46fj   1/1   Running             0   142m

[root@srv1 ~]# kubectl get po -n kube-system | grep kube-proxy
kube-proxy-29vxr   1/1   Running   0   46s
kube-proxy-66vjq   1/1   Running   0   59s
kube-proxy-7jq8z   1/1   Running   0   53s
kube-proxy-g2tcd   1/1   Running   0   70s
kube-proxy-j2jbw   1/1   Running   0   65s
kube-proxy-mmj5x   1/1   Running   0   39s
4) Verify the kube-proxy mode
[root@srv1 ~]# curl 127.0.0.1:10249/proxyMode
ipvs
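The IPVS virtual servers that kube-proxy programs can also be listed with the ipvsadm tool installed in section 1.2 (a small sketch; the exact service list depends on what is running in the cluster):

# List the IPVS virtual servers and their real-server backends
[root@srv1 ~]# ipvsadm -Ln
# The kubernetes API service VIP (10.96.0.1:443) should appear, backed by the three API servers on port 6443
[root@srv1 ~]# ipvsadm -Ln | grep -A 3 '10.96.0.1:443'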
1.14 Install ingress
1) Deploy ingress-nginx
# Download URL: https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
[root@srv1 ~]# cat > deploy.yaml << EOF apiVersion: v1 kind: Namespace metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx name: ingress-nginx --- apiVersion: v1 automountServiceAccountToken: true kind: ServiceAccount metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx --- apiVersion: v1 kind: ServiceAccount metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx rules: - apiGroups: - "" resources: - namespaces verbs: - get - apiGroups: - "" resources: - configmaps - pods - secrets - endpoints verbs: - get - list - watch - apiGroups: - "" resources: - services verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses/status verbs: - update - apiGroups: - networking.k8s.io resources: - ingressclasses verbs: - get - list - watch - apiGroups: - coordination.k8s.io resourceNames: - ingress-nginx-leader resources: - leases verbs: - get - update - apiGroups: - coordination.k8s.io resources: - leases verbs: - create - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch - get --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx rules: - apiGroups: - "" resources: - secrets verbs: - get - create --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx rules: - apiGroups: - "" resources: - configmaps - endpoints - nodes - pods - secrets - namespaces verbs: - list - watch - apiGroups: - coordination.k8s.io resources: - leases verbs: - list - watch - apiGroups: - "" resources: - nodes verbs: - get - apiGroups: - "" resources: - services verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses verbs: - get - list - watch - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - networking.k8s.io resources: - ingresses/status verbs: - update - apiGroups: - networking.k8s.io resources: - ingressclasses verbs: - get - list - watch - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch - get --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx 
app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission rules: - apiGroups: - admissionregistration.k8s.io resources: - validatingwebhookconfigurations verbs: - get - update --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: ingress-nginx subjects: - kind: ServiceAccount name: ingress-nginx namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: ingress-nginx-admission subjects: - kind: ServiceAccount name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: ingress-nginx subjects: - kind: ServiceAccount name: ingress-nginx namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: ingress-nginx-admission subjects: - kind: ServiceAccount name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: v1 data: allow-snippet-annotations: "false" kind: ConfigMap metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx --- apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx spec: externalTrafficPolicy: Local ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - appProtocol: http name: http port: 80 protocol: TCP targetPort: http - appProtocol: https name: https port: 443 protocol: TCP targetPort: https selector: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx type: LoadBalancer --- apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller-admission namespace: ingress-nginx spec: ports: - appProtocol: https name: https-webhook port: 443 targetPort: webhook 
selector: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx type: ClusterIP --- apiVersion: apps/v1 kind: DaemonSet metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx spec: minReadySeconds: 0 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx template: metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 spec: hostNetwork: true containers: - args: - /nginx-ingress-controller - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller - --election-id=ingress-nginx-leader - --controller-class=k8s.io/ingress-nginx - --ingress-class=nginx - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller - --validating-webhook=:8443 - --validating-webhook-certificate=/usr/local/certificates/cert - --validating-webhook-key=/usr/local/certificates/key env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: LD_PRELOAD value: /usr/local/lib/libmimalloc.so image: m.daocloud.io/registry.k8s.io/ingress-nginx/controller:v1.9.0@sha256:c15d1a617858d90fb8f8a2dd60b0676f2bb85c54e3ed11511794b86ec30c8c60 imagePullPolicy: IfNotPresent lifecycle: preStop: exec: command: - /wait-shutdown livenessProbe: failureThreshold: 5 httpGet: path: /healthz port: 10254 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 name: controller ports: - containerPort: 3721 name: http protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /healthz port: 10254 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: requests: cpu: 100m memory: 90Mi securityContext: allowPrivilegeEscalation: true capabilities: add: - NET_BIND_SERVICE drop: - ALL runAsUser: 101 volumeMounts: - mountPath: /usr/local/certificates/ name: webhook-cert readOnly: true dnsPolicy: ClusterFirst nodeSelector: kubernetes.io/os: linux serviceAccountName: ingress-nginx terminationGracePeriodSeconds: 300 volumes: - name: webhook-cert secret: secretName: ingress-nginx-admission --- apiVersion: batch/v1 kind: Job metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-create namespace: ingress-nginx spec: template: metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-create spec: containers: - args: - create - --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc - --namespace=$(POD_NAMESPACE) - --secret-name=ingress-nginx-admission env: - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace image: 
m.daocloud.io/registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:543c40fd093964bc9ab509d3e791f9989963021f1e9e4c9c7b6700b02bfb227b imagePullPolicy: IfNotPresent name: create securityContext: allowPrivilegeEscalation: false nodeSelector: kubernetes.io/os: linux restartPolicy: OnFailure securityContext: fsGroup: 2000 runAsNonRoot: true runAsUser: 2000 serviceAccountName: ingress-nginx-admission --- apiVersion: batch/v1 kind: Job metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-patch namespace: ingress-nginx spec: template: metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-patch spec: containers: - args: - patch - --webhook-name=ingress-nginx-admission - --namespace=$(POD_NAMESPACE) - --patch-mutating=false - --secret-name=ingress-nginx-admission - --patch-failure-policy=Fail env: - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace image: m.daocloud.io/registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:543c40fd093964bc9ab509d3e791f9989963021f1e9e4c9c7b6700b02bfb227b imagePullPolicy: IfNotPresent name: patch securityContext: allowPrivilegeEscalation: false nodeSelector: kubernetes.io/os: linux restartPolicy: OnFailure securityContext: fsGroup: 2000 runAsNonRoot: true runAsUser: 2000 serviceAccountName: ingress-nginx-admission --- apiVersion: networking.k8s.io/v1 kind: IngressClass metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: nginx spec: controller: k8s.io/ingress-nginx --- apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingWebhookConfiguration metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission webhooks: - admissionReviewVersions: - v1 clientConfig: service: name: ingress-nginx-controller-admission namespace: ingress-nginx path: /networking/v1/ingresses failurePolicy: Fail matchPolicy: Equivalent name: validate.nginx.ingress.kubernetes.io rules: - apiGroups: - networking.k8s.io apiVersions: - v1 operations: - CREATE - UPDATE resources: - ingresses sideEffects: None EOF
[root@srv1 ~]# cat > backend.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: default-http-backend labels: app.kubernetes.io/name: default-http-backend namespace: kube-system spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: default-http-backend template: metadata: labels: app.kubernetes.io/name: default-http-backend spec: terminationGracePeriodSeconds: 60 containers: - name: default-http-backend image: registry.cn-hangzhou.aliyuncs.com/chenby/defaultbackend-amd64:1.5 livenessProbe: httpGet: path: /healthz port: 8080 scheme: HTTP initialDelaySeconds: 30 timeoutSeconds: 5 ports: - containerPort: 8080 resources: limits: cpu: 10m memory: 20Mi requests: cpu: 10m memory: 20Mi --- apiVersion: v1 kind: Service metadata: name: default-http-backend namespace: kube-system labels: app.kubernetes.io/name: default-http-backend spec: ports: - port: 80 targetPort: 8080 selector: app.kubernetes.io/name: default-http-backend EOF
[root@srv1 ~]# kubectl apply -f deploy.yaml namespace/ingress-nginx created serviceaccount/ingress-nginx created serviceaccount/ingress-nginx-admission created role.rbac.authorization.k8s.io/ingress-nginx created role.rbac.authorization.k8s.io/ingress-nginx-admission created clusterrole.rbac.authorization.k8s.io/ingress-nginx created clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created rolebinding.rbac.authorization.k8s.io/ingress-nginx created rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created configmap/ingress-nginx-controller created service/ingress-nginx-controller created service/ingress-nginx-controller-admission created deployment.apps/ingress-nginx-controller created job.batch/ingress-nginx-admission-create created job.batch/ingress-nginx-admission-patch created ingressclass.networking.k8s.io/nginx created validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
[root@srv1 ~]# kubectl apply -f backend.yaml deployment.apps/default-http-backend created service/default-http-backend created
2) 确认ingress POD [root@srv1 ~]# kubectl get pods -n ingress-nginx NAME READY STATUS RESTARTS AGE ingress-nginx-admission-create-7spq6 0/1 Completed 0 39m ingress-nginx-admission-patch-qptvx 0/1 Completed 2 39m ingress-nginx-controller-6r8t9 1/1 Running 0 39m ingress-nginx-controller-9rhfl 1/1 Running 0 39m ingress-nginx-controller-d6jlr 1/1 Running 0 39m ingress-nginx-controller-dx22k 1/1 Running 0 39m ingress-nginx-controller-vqhjj 1/1 Running 0 39m ingress-nginx-controller-vt26r 1/1 Running 0 39m
[root@srv1 ~]# kubectl get pods -n kube-system | grep default-http-backend default-http-backend-7b44966d95-gz52t 1/1 Running 0 4m8s
3) 测试ingress [root@srv1 ~]# cat > ingress-demo-app.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: hello-server spec: replicas: 2 selector: matchLabels: app: hello-server template: metadata: labels: app: hello-server spec: containers: - name: hello-server image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/hello-server ports: - containerPort: 9000 --- apiVersion: apps/v1 kind: Deployment metadata: labels: app: nginx-demo name: nginx-demo spec: replicas: 2 selector: matchLabels: app: nginx-demo template: metadata: labels: app: nginx-demo spec: containers: - image: nginx name: nginx --- apiVersion: v1 kind: Service metadata: labels: app: nginx-demo name: nginx-demo spec: selector: app: nginx-demo ports: - port: 8000 protocol: TCP targetPort: 80 --- apiVersion: v1 kind: Service metadata: labels: app: hello-server name: hello-server spec: selector: app: hello-server ports: - port: 8000 protocol: TCP targetPort: 9000 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: ingress-host-bar spec: ingressClassName: nginx rules: - host: "srv1.1000y.cloud" http: paths: - pathType: Prefix path: "/" backend: service: name: hello-server port: number: 8000 - host: "srv1.1000y.cloud" http: paths: - pathType: Prefix path: "/nginx" backend: service: name: nginx-demo port: number: 8000 EOF
[root@srv1 ~]# kubectl apply -f ingress-demo-app.yaml deployment.apps/hello-server created deployment.apps/nginx-demo created service/nginx-demo created service/hello-server created ingress.networking.k8s.io/ingress-host-bar created
################################################## 错误汇总 ##################################################
1. The following error appears: Error from server (InternalError): error when creating "ingress-demo-app.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "ingress-nginx-controller-admission"
2. 问题: 刚开始使用yaml的方式创建nginx-ingress,之后删除了它创建的命名空间以及 clusterrole and clusterrolebinding ,但是没有删除ValidatingWebhookConfiguration ingress-nginx-admission,这个ingress-nginx-admission是在yaml文件中安装的。当再次安装nginx-ingress之后,创建自定义的ingress就会报这个错误。
3. 解决方法: [root@srv1 ingress]# kubectl get validatingwebhookconfigurations NAME WEBHOOKS AGE ingress-nginx-admission 1 7m47s
[root@srv1 ingress]# kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission validatingwebhookconfiguration.admissionregistration.k8s.io "ingress-nginx-admission" deleted
################################################## 汇总结束 ##################################################
[root@srv1 ~]# kubectl get ingress NAME CLASS HOSTS ADDRESS PORTS AGE ingress-host-bar nginx srv1.1000y.cloud,srv1.1000y.cloud 80 4m16s
# 测试 [root@srv1 ingress]# curl srv1.1000y.cloud Hello World!
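# Optional extra check (a sketch, not from the original run): the /nginx rule of the same Ingress can be exercised as well.
# It assumes the controller pods run with hostNetwork on every node, so port 80 on any node IP (e.g. 192.168.1.11) answers;
# a 404 served by nginx here still shows the request was routed to the nginx-demo backend.
[root@srv1 ~]# curl http://srv1.1000y.cloud/nginx
[root@srv1 ~]# curl -H "Host: srv1.1000y.cloud" http://192.168.1.11/nginx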
4) 查看ingress端口 [root@srv1 ~]# kubectl get svc -n ingress-nginx NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE ingress-nginx-controller LoadBalancer 10.110.11.11 <pending> 80:31350/TCP,443:30829/TCP 26m ingress-nginx-controller-admission ClusterIP 10.101.19.249 <none> 443/TCP 26m
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE hello-server-569d7866bd-64rcd 1/1 Running 0 26m hello-server-569d7866bd-t4dnq 1/1 Running 0 26m nginx-demo-554db85f85-crss7 1/1 Running 0 26m nginx-demo-554db85f85-x9dc8 1/1 Running 0 26m
5) Test [root@srv1 ~]# cat > ingress-test.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: snow spec: replicas: 3 selector: matchLabels: app: snow template: metadata: labels: app: snow spec: containers: - name: snow image: docker.io/library/nginx resources: limits: memory: "128Mi" cpu: "500m" ports: - containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: snow spec: ipFamilyPolicy: PreferDualStack ipFamilies: - IPv4 type: NodePort selector: app: snow ports: - port: 80 targetPort: 80 EOF
[root@srv1 ~]# kubectl apply -f ingress-test.yaml deployment.apps/snow created service/snow created
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE hello-server-569d7866bd-79292 1/1 Running 0 4m56s hello-server-569d7866bd-q895v 1/1 Running 0 4m56s nginx-demo-554db85f85-r8qhb 1/1 Running 0 4m56s nginx-demo-554db85f85-z6q2x 1/1 Running 0 4m55s snow-7c6bdf498f-fsqxm 1/1 Running 0 49s snow-7c6bdf498f-lbhhg 1/1 Running 0 49s snow-7c6bdf498f-lqf2l 1/1 Running 0 49s
[root@srv1 ~]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE hello-server ClusterIP 10.108.142.145 <none> 8000/TCP 5m15s kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20h nginx-demo ClusterIP 10.100.50.36 <none> 8000/TCP 5m17s snow NodePort 10.99.57.229 <none> 80:32173/TCP 40s
[root@srv1 ~]# curl -I http://srv1.1000y.cloud:32173 HTTP/1.1 200 OK Server: nginx/1.21.5 Date: Mon, 02 Oct 2023 05:29:48 GMT Content-Type: text/html Content-Length: 615 Last-Modified: Tue, 28 Dec 2021 15:28:38 GMT Connection: keep-alive ETag: "61cb2d26-267" Accept-Ranges: bytes
1.15 验证集群
1) 创建Pod资源---srv1操作
[root@srv1 ~]# cat<<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: docker.io/library/busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
pod/busybox created
[root@srv1 ~]# kubectl get pod NAME READY STATUS RESTARTS AGE busybox 1/1 Running 0 41s hello-server-569d7866bd-79292 1/1 Running 0 8m55s hello-server-569d7866bd-q895v 1/1 Running 0 8m55s nginx-demo-554db85f85-r8qhb 1/1 Running 0 8m55s nginx-demo-554db85f85-z6q2x 1/1 Running 0 8m54s snow-7c6bdf498f-fsqxm 1/1 Running 0 4m48s snow-7c6bdf498f-lbhhg 1/1 Running 0 4m48s snow-7c6bdf498f-lqf2l 1/1 Running 0 4m48s
2) 用pod解析默认命名空间中的kubernetes---srv1操作 [root@srv1 ~]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE hello-server ClusterIP 10.98.48.211 <none> 8000/TCP 9m27s kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 171m nginx-demo ClusterIP 10.97.58.196 <none> 8000/TCP 9m29s snow NodePort 10.102.185.237 <none> 80:30775/TCP 5m22s
3) 解析测试---srv1操作 [root@srv1 ~]# kubectl exec busybox -n default -- nslookup kubernetes Server: 10.96.0.10 Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
4) 跨命名空间解析测试---srv1操作 [root@srv1 ~]# kubectl get svc -n kube-system NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE calico-typha ClusterIP 10.100.196.247 <none> 5473/TCP 148m default-http-backend ClusterIP 10.103.7.44 <none> 80/TCP 14m kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 173m metrics-server ClusterIP 10.98.111.221 <none> 443/TCP 83m
[root@srv1 ~]# kubectl exec busybox -n default -- nslookup kube-dns.kube-system Server: 10.96.0.10 Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kube-dns.kube-system Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
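# Optional sketch: resolving the fully qualified service name and inspecting /etc/resolv.conf confirms the search
# domains injected by kubelet (assumes the default cluster domain cluster.local).
[root@srv1 ~]# kubectl exec busybox -n default -- nslookup kube-dns.kube-system.svc.cluster.local
[root@srv1 ~]# kubectl exec busybox -n default -- cat /etc/resolv.conf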
5) Verify that all nodes can reach the kubernetes svc on port 443 and the kube-dns svc on port 53---run on all nodes [root@srv1 ~]# telnet 10.96.0.1 443 Trying 10.96.0.1... Connected to 10.96.0.1. Escape character is '^]'.
[root@srv1 ~]# telnet 10.96.0.10 53 Trying 10.96.0.10... Connected to 10.96.0.10. Escape character is '^]'.
[root@srv1 ~]# curl 10.96.0.10:53 curl: (52) Empty reply from server
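# Non-interactive alternative to telnet (a sketch using bash's built-in /dev/tcp; useful on nodes where telnet is not installed):
[root@srv1 ~]# timeout 2 bash -c 'cat < /dev/null > /dev/tcp/10.96.0.1/443' && echo "kubernetes svc 443 reachable"
[root@srv1 ~]# timeout 2 bash -c 'cat < /dev/null > /dev/tcp/10.96.0.10/53' && echo "kube-dns svc 53 reachable"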
6) 测试Pod之间的通信---srv1节点操作 [root@srv1 ~]# kubectl get pod -o wide | grep busybox busybox 1/1 Running 0 6m7s 172.30.172.139 srv6.1000y.cloud <none> <none>
# 可以连通证明这个pod是可跨命名空间和跨主机通信 [root@srv1 ~]# ping -c 2 172.30.172.139 PING 172.30.172.139 (172.30.172.139) 56(84) bytes of data. 64 bytes from 172.30.172.139: icmp_seq=1 ttl=63 time=0.900 ms 64 bytes from 172.30.172.139: icmp_seq=2 ttl=63 time=0.610 ms
--- 172.30.172.139 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.610/0.755/0.900/0.145 ms
[root@srv1 ~]# kubectl get pod -n kube-system -o wide | grep -e calico -e metrics-server calico-kube-controllers-765c96cb9d-brhcg 1/1 Running 0 153m 172.19.158.65 srv4.1000y.cloud <none> <none> calico-node-52mdb 1/1 Running 0 153m 192.168.1.13 srv3.1000y.cloud <none> <none> calico-node-8l92t 1/1 Running 0 153m 192.168.1.15 srv5.1000y.cloud <none> <none> calico-node-pr8zs 1/1 Running 7 (131m ago) 153m 192.168.1.16 srv6.1000y.cloud <none> <none> calico-node-r2trr 1/1 Running 6 (131m ago) 153m 192.168.1.14 srv4.1000y.cloud <none> <none> calico-node-w9gnq 1/1 Running 6 (131m ago) 153m 192.168.1.12 srv2.1000y.cloud <none> <none> calico-node-znjw8 1/1 Running 6 (131m ago) 153m 192.168.1.11 srv1.1000y.cloud <none> <none> calico-typha-67c57cdf49-jtjwg 1/1 Running 0 153m 192.168.1.14 srv4.1000y.cloud <none> <none> metrics-server-76bcdc46fd-dnncv 1/1 Running 0 87m 172.16.22.3 srv5.1000y.cloud <none> <none> metrics-server-76bcdc46fd-ql6nr 1/1 Running 0 87m 172.30.172.132 srv6.1000y.cloud <none> <none>
# 可以连通证明这个pod是可跨命名空间和跨主机通信 [root@srv1 ~]# kubectl exec -ti busybox -- sh / # ping -c 2 192.168.1.14 PING 192.168.1.14 (192.168.1.14): 56 data bytes 64 bytes from 192.168.1.14: seq=0 ttl=63 time=0.647 ms 64 bytes from 192.168.1.14: seq=1 ttl=63 time=0.752 ms
--- 192.168.1.14 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.647/0.699/0.752 ms
/ # ping -c 2 172.30.172.132 PING 172.30.172.132 (172.30.172.132): 56 data bytes 64 bytes from 172.30.172.132: seq=0 ttl=62 time=1.052 ms 64 bytes from 172.30.172.132: seq=1 ttl=62 time=0.767 ms
--- 172.30.172.132 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.767/0.909/1.052 ms
/ # exit
7) 创建3副本Pod测试---srv1节点操作 [root@srv1 ~]# cat > deployments.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - name: nginx image: nginx ports: - containerPort: 80 EOF
[root@srv1 ~]# kubectl apply -f deployments.yaml deployment.apps/nginx-deployment created
[root@srv1 ~]# kubectl get pods | grep nginx-deployment nginx-deployment-7c5ddbdf54-frcwx 1/1 Running 0 64s nginx-deployment-7c5ddbdf54-pz926 1/1 Running 0 65s nginx-deployment-7c5ddbdf54-wsjw7 1/1 Running 0 64s
[root@srv1 ~]# kubectl delete -f deployments.yaml deployment.apps "nginx-deployment" deleted
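# Optional sketch: before deleting the Deployment (or after re-applying deployments.yaml), a quick scale test
# confirms that new replicas are scheduled across the worker nodes.
[root@srv1 ~]# kubectl scale deployment nginx-deployment --replicas=5
[root@srv1 ~]# kubectl get pods -l app=nginx -o wide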
1.16 ELK集群部署
1) 创建nfs服务---srv7操作
[root@srv7 ~]# yum install nfs-utils -y
[root@srv7 ~]# vim /etc/exports # no_subtree_check: 即使输出目录是一个子目录,nfs服务器也不检查其父目录的权限,可提高效率 /data/nfs-sc *(rw,no_root_squash,no_subtree_check)
[root@srv7 ~]# mkdir -p /data/nfs-sc
[root@srv7 ~]# systemctl enable --now rpcbind nfs-server
2) Install the NFS client tools---run on all k8s nodes [root@srv1 ~]# yum install nfs-utils -y [root@srv1 ~]# systemctl enable --now rpcbind
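# Optional sketch: verify from a k8s node that the export published by srv7 (192.168.1.17) is actually visible
# before deploying the provisioner; the test mount on /mnt is removed immediately.
[root@srv1 ~]# showmount -e 192.168.1.17
[root@srv1 ~]# mount -t nfs 192.168.1.17:/data/nfs-sc /mnt && umount /mnt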
3) Create a ServiceAccount for nfs-subdir-external-provisioner---run on srv1 [root@srv1 ~]# mkdir k8s-elk-yaml [root@srv1 ~]# cd k8s-elk-yaml
[root@srv1 k8s-elk-yaml]# vim 01-serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: nfs-client-provisioner namespace: kube-system --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: nfs-client-provisioner-runner rules: - apiGroups: [""] resources: ["persistentvolumes"] verbs: ["get", "list", "watch", "create", "delete"] - apiGroups: [""] resources: ["persistentvolumeclaims"] verbs: ["get", "list", "watch", "update"] - apiGroups: ["storage.k8s.io"] resources: ["storageclasses"] verbs: ["get", "list", "watch"] - apiGroups: [""] resources: ["events"] verbs: ["get", "list", "watch","create", "update", "patch"] --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: run-nfs-client-provisioner subjects: - kind: ServiceAccount name: nfs-client-provisioner namespace: kube-system roleRef: kind: ClusterRole name: nfs-client-provisioner-runner apiGroup: rbac.authorization.k8s.io --- kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: name: leader-locking-nfs-client-provisioner namespace: kube-system rules: - apiGroups: [""] resources: ["endpoints"] verbs: ["get", "list", "watch", "create", "update", "patch"] --- kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: leader-locking-nfs-client-provisioner namespace: kube-system subjects: - kind: ServiceAccount name: nfs-client-provisioner namespace: kube-system roleRef: kind: Role name: leader-locking-nfs-client-provisioner apiGroup: rbac.authorization.k8s.io
[root@srv1 k8s-elk-yaml]# kubectl apply -f 01-serviceaccount.yaml serviceaccount/nfs-client-provisioner created clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
4) Deploy nfs-subdir-external-provisioner---run on srv1 [root@srv1 k8s-elk-yaml]# vim 02-deploy-nfs.yaml apiVersion: v1 kind: ServiceAccount metadata: name: nfs-client-provisioner --- kind: Deployment apiVersion: apps/v1 metadata: name: nfs-client-provisioner namespace: kube-system spec: replicas: 1 strategy: type: Recreate selector: matchLabels: app: nfs-client-provisioner template: metadata: labels: app: nfs-client-provisioner spec: serviceAccountName: nfs-client-provisioner containers: - name: nfs-client-provisioner image: registry.cn-beijing.aliyuncs.com/xngczl/nfs-subdir-external-provisione:v4.0.0 imagePullPolicy: IfNotPresent volumeMounts: - name: nfs-client-root mountPath: /persistentvolumes env: - name: PROVISIONER_NAME # may be changed to a custom name value: 1000y.cloud/snowchuai - name: NFS_SERVER # change to your own NFS server IP value: 192.168.1.17 - name: NFS_PATH # change to the path of your own NFS export value: /data/nfs-sc volumes: - name: nfs-client-root nfs: # change to your own NFS server IP server: 192.168.1.17 # change to the path of your own NFS export path: /data/nfs-sc
[root@srv1 k8s-elk-yaml]# kubectl apply -f 02-deploy-nfs.yaml serviceaccount/nfs-client-provisioner created deployment.apps/nfs-client-provisioner created
[root@srv1 k8s-elk-yaml]# kubectl get pods -A | grep nfs kube-system nfs-client-provisioner-6f46475989-qcb9x 1/1 Running 0 74s
5) 创建sc---srv1节点操作 [root@srv1 k8s-elk-yaml]# vim 03-sc-nfs.yaml apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: # sc名称,如修改名称,必须确保后续调用sc的所有yaml文件都是所修改的名称 name: managed-nfs-storage annotations: storageclass.beta.kubernetes.io/is-default-class: "true" # 此处名称必须与 02-deploy-nfs.yaml 文件中的名称一致 provisioner: 1000y.cloud/snowchuai reclaimPolicy: Delete allowVolumeExpansion: True
[root@srv1 k8s-elk-yaml]# kubectl apply -f 03-sc-nfs.yaml storageclass.storage.k8s.io/managed-nfs-storage created
[root@srv1 k8s-elk-yaml]# kubectl get sc NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE managed-nfs-storage (default) 1000y.cloud/snowchuai Delete Immediate true 2s
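# Optional sketch: a throwaway PVC confirms that dynamic provisioning through managed-nfs-storage works.
# The PVC name test-claim is illustrative; once it reaches Bound, a matching sub-directory appears under /data/nfs-sc on srv7.
[root@srv1 k8s-elk-yaml]# cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
[root@srv1 k8s-elk-yaml]# kubectl get pvc test-claim
[root@srv1 k8s-elk-yaml]# kubectl delete pvc test-claim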
6) 创建一个kube-logging命名空间给es---srv1操作 [root@srv1 k8s-elk-yaml]# vim 04-es-ns.yaml apiVersion: v1 kind: Namespace metadata: name: kube-logging
[root@srv1 k8s-elk-yaml]# kubectl apply -f 04-es-ns.yaml namespace/kube-logging created
[root@srv1 k8s-elk-yaml]# kubectl get ns NAME STATUS AGE default Active 39h ingress-nginx Active 36h kube-logging Active 14s kube-node-lease Active 39h kube-public Active 39h kube-system Active 39h kubernetes-dashboard Active 37h
7) 创建es service---srv1操作 [root@srv1 k8s-elk-yaml]# vim 05-es-svc.yaml kind: Service apiVersion: v1 metadata: name: elasticsearch namespace: kube-logging labels: app: elasticsearch spec: selector: app: elasticsearch clusterIP: None ports: - port: 9200 name: rest - port: 9300 name: inter-node
[root@srv1 k8s-elk-yaml]# kubectl apply -f 05-es-svc.yaml service/elasticsearch created
[root@srv1 k8s-elk-yaml]# kubectl get svc -n kube-logging NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 33s
8) 部署es cluster---srv1操作 [root@srv1 k8s-elk-yaml]# vim 06-es-statefulset-deploy.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: es-cluster namespace: kube-logging spec: serviceName: elasticsearch replicas: 3 selector: matchLabels: app: elasticsearch template: metadata: labels: app: elasticsearch spec: containers: - name: elasticsearch image: registry.cn-beijing.aliyuncs.com/dotbalo/elasticsearch:v7.10.2 imagePullPolicy: IfNotPresent resources: limits: cpu: 1000m requests: cpu: 100m ports: - containerPort: 9200 name: rest protocol: TCP - containerPort: 9300 name: inter-node protocol: TCP volumeMounts: - name: data mountPath: /usr/share/elasticsearch/data env: - name: cluster.name value: k8s-logs - name: node.name valueFrom: fieldRef: fieldPath: metadata.name - name: discovery.seed_hosts value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch" - name: cluster.initial_master_nodes value: "es-cluster-0,es-cluster-1,es-cluster-2" - name: ES_JAVA_OPTS value: "-Xms512m -Xmx512m" initContainers: - name: fix-permissions image: busybox imagePullPolicy: IfNotPresent command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"] securityContext: privileged: true volumeMounts: - name: data mountPath: /usr/share/elasticsearch/data - name: increase-vm-max-map image: busybox imagePullPolicy: IfNotPresent command: ["sysctl", "-w", "vm.max_map_count=262144"] securityContext: privileged: true - name: increase-fd-ulimit image: busybox imagePullPolicy: IfNotPresent command: ["sh", "-c", "ulimit -n 65536"] securityContext: privileged: true volumeClaimTemplates: - metadata: name: data labels: app: elasticsearch spec: accessModes: [ "ReadWriteOnce" ] # 如果在 03-sc-nfs.yaml 中重新定义了 sc名称,请在这里保持一致 storageClassName: managed-nfs-storage resources: requests: storage: 10Gi
[root@srv1 k8s-elk-yaml]# kubectl apply -f 06-es-statefulset-deploy.yaml statefulset.apps/es-cluster created
# 查看pod的状态 [root@srv1 k8s-elk-yaml]# kubectl get pod -n kube-logging NAME READY STATUS RESTARTS AGE es-cluster-0 1/1 Running 0 2m es-cluster-1 1/1 Running 0 5m es-cluster-2 1/1 Running 0 9m
# 查看pod的IP地址及所在主机地址 [root@srv1 k8s-elk-yaml]# kubectl get pod -l app=elasticsearch \ -o custom-columns=POD:metadata.name,Pod-IP:.status.podIP,Node-IP:.status.hostIP \ -n kube-logging POD Pod-IP Node-IP es-cluster-0 172.30.172.145 192.168.1.16 es-cluster-1 172.16.22.14 192.168.1.15 es-cluster-2 172.19.158.75 192.168.1.14
# 查看es-cluster集群状态 --- 多等一会, 大概5-15分钟 [root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cluster/health?pretty { "cluster_name" : "k8s-logs", "status" : "green", "timed_out" : false, "number_of_nodes" : 3, "number_of_data_nodes" : 3, "active_primary_shards" : 4, "active_shards" : 8, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
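# Optional sketch: the node list gives another view of cluster membership, and kubectl port-forward is an
# alternative when the pod IP (172.30.172.145 above) is not directly reachable from where curl is run.
[root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cat/nodes?v
[root@srv1 k8s-elk-yaml]# kubectl port-forward es-cluster-0 9200:9200 -n kube-logging &
[root@srv1 k8s-elk-yaml]# curl http://localhost:9200/_cluster/health?pretty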
9) Create the Kibana Service and Deployment---run on srv1 [root@srv1 k8s-elk-yaml]# vim 07-kibana-svc.yaml apiVersion: v1 kind: Service metadata: name: kibana namespace: kube-logging labels: app: kibana spec: type: NodePort ports: - port: 5601 selector: app: kibana
[root@srv1 k8s-elk-yaml]# vim 08-kibana-deploy.yaml apiVersion: apps/v1 kind: Deployment metadata: name: kibana namespace: kube-logging labels: app: kibana spec: replicas: 1 selector: matchLabels: app: kibana template: metadata: labels: app: kibana spec: containers: - name: kibana image: registry.cn-beijing.aliyuncs.com/dotbalo/kibana-oss:7.10.2 imagePullPolicy: IfNotPresent resources: limits: cpu: 1000m requests: cpu: 100m env: - name: ELASTICSEARCH_URL value: http://elasticsearch:9200 ports: - containerPort: 5601
[root@srv1 k8s-elk-yaml]# kubectl apply -f 07-kibana-svc.yaml -f 08-kibana-deploy.yaml service/kibana created deployment.apps/kibana created
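# Optional sketch: wait for the Kibana Deployment to finish rolling out before opening the web UI.
[root@srv1 k8s-elk-yaml]# kubectl rollout status deployment/kibana -n kube-logging
[root@srv1 k8s-elk-yaml]# kubectl get pod -l app=kibana -n kube-logging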
10) 部署fluentd---srv1操作 [root@srv1 k8s-elk-yaml]# vim 10-fluentd-es-configmap.yaml kind: ConfigMap apiVersion: v1 metadata: name: fluentd-es-config-v0.2.1 namespace: kube-logging labels: addonmanager.kubernetes.io/mode: Reconcile data: system.conf: |- <system> root_dir /tmp/fluentd-buffers/ </system> containers.input.conf: |- <source> @id fluentd-containers.log @type tail path /var/log/containers/*.log pos_file /var/log/es-containers.log.pos tag raw.kubernetes.* read_from_head true <parse> @type multi_format <pattern> format json time_key time time_format %Y-%m-%dT%H:%M:%S.%NZ </pattern> <pattern> format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/ time_format %Y-%m-%dT%H:%M:%S.%N%:z </pattern> </parse> </source> # Detect exceptions in the log output and forward them as one log entry. <match raw.kubernetes.**> @id raw.kubernetes @type detect_exceptions remove_tag_prefix raw message log stream stream multiline_flush_interval 5 max_bytes 500000 max_lines 1000 </match> # Concatenate multi-line logs <filter **> @id filter_concat @type concat key message multiline_end_regexp /\n$/ separator "" </filter> # Enriches records with Kubernetes metadata <filter kubernetes.**> @id filter_kubernetes_metadata @type kubernetes_metadata </filter> # Fixes json fields in Elasticsearch <filter kubernetes.**> @id filter_parser @type parser key_name log reserve_data true remove_key_name_field true <parse> @type multi_format <pattern> format json </pattern> <pattern> format none </pattern> </parse> </filter> system.input.conf: |- # Example: # 2015-12-21 23:17:22,066 [salt.state ][INFO ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081 <source> @id minion @type tail format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/ time_format %Y-%m-%d %H:%M:%S path /var/log/salt/minion pos_file /var/log/salt.pos tag salt </source> # Example: # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script <source> @id startupscript.log @type tail format syslog path /var/log/startupscript.log pos_file /var/log/es-startupscript.log.pos tag startupscript </source> # Examples: # time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json" # time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404 # TODO(random-liu): Remove this after cri container runtime rolls out. <source> @id docker.log @type tail format /^time="(?<time>[^"]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/ path /var/log/docker.log pos_file /var/log/es-docker.log.pos tag docker </source> # Example: # 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal <source> @id etcd.log @type tail # Not parsing this, because it doesn't have anything particularly useful to # parse out of it (like severities). format none path /var/log/etcd.log pos_file /var/log/es-etcd.log.pos tag etcd </source> # Multi-line parsing is required for all the kube logs because very large log # statements, such as those that include entire object bodies, get split into # multiple lines by glog. 
# Example: # I0204 07:32:30.020537 3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537] <source> @id kubelet.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kubelet.log pos_file /var/log/es-kubelet.log.pos tag kubelet </source> # Example: # I1118 21:26:53.975789 6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed <source> @id kube-proxy.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-proxy.log pos_file /var/log/es-kube-proxy.log.pos tag kube-proxy </source> # Example: # I0204 07:00:19.604280 5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266] <source> @id kube-apiserver.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-apiserver.log pos_file /var/log/es-kube-apiserver.log.pos tag kube-apiserver </source> # Example: # I0204 06:55:31.872680 5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui <source> @id kube-controller-manager.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-controller-manager.log pos_file /var/log/es-kube-controller-manager.log.pos tag kube-controller-manager </source> # Example: # W0204 06:49:18.239674 7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312] <source> @id kube-scheduler.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-scheduler.log pos_file /var/log/es-kube-scheduler.log.pos tag kube-scheduler </source> # Example: # I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf <source> @id glbc.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/glbc.log pos_file /var/log/es-glbc.log.pos tag glbc </source> # Example: # I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf <source> @id cluster-autoscaler.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/cluster-autoscaler.log pos_file /var/log/es-cluster-autoscaler.log.pos tag cluster-autoscaler </source> # Logs from systemd-journal for interesting services. 
# TODO(random-liu): Remove this after cri container runtime rolls out. <source> @id journald-docker @type systemd matches [{ "_SYSTEMD_UNIT": "docker.service" }] <storage> @type local persistent true path /var/log/journald-docker.pos </storage> read_from_head true tag docker </source> <source> @id journald-container-runtime @type systemd matches [{ "_SYSTEMD_UNIT": "{{ fluentd_container_runtime_service }}.service" }] <storage> @type local persistent true path /var/log/journald-container-runtime.pos </storage> read_from_head true tag container-runtime </source> <source> @id journald-kubelet @type systemd matches [{ "_SYSTEMD_UNIT": "kubelet.service" }] <storage> @type local persistent true path /var/log/journald-kubelet.pos </storage> read_from_head true tag kubelet </source> <source> @id journald-node-problem-detector @type systemd matches [{ "_SYSTEMD_UNIT": "node-problem-detector.service" }] <storage> @type local persistent true path /var/log/journald-node-problem-detector.pos </storage> read_from_head true tag node-problem-detector </source> <source> @id kernel @type systemd matches [{ "_TRANSPORT": "kernel" }] <storage> @type local persistent true path /var/log/kernel.pos </storage> <entry> fields_strip_underscores true fields_lowercase true </entry> read_from_head true tag kernel </source> forward.input.conf: |- # Takes the messages sent over TCP <source> @id forward @type forward </source> monitoring.conf: |- # Prometheus Exporter Plugin # input plugin that exports metrics <source> @id prometheus @type prometheus </source> <source> @id monitor_agent @type monitor_agent </source> # input plugin that collects metrics from MonitorAgent <source> @id prometheus_monitor @type prometheus_monitor <labels> host ${hostname} </labels> </source> # input plugin that collects metrics for output plugin <source> @id prometheus_output_monitor @type prometheus_output_monitor <labels> host ${hostname} </labels> </source> # input plugin that collects metrics for in_tail plugin <source> @id prometheus_tail_monitor @type prometheus_tail_monitor <labels> host ${hostname} </labels> </source> output.conf: |- <match **> @id elasticsearch @type elasticsearch @log_level info type_name _doc include_tag_key true host elasticsearch port 9200 logstash_format true <buffer> @type file path /var/log/fluentd-buffers/kubernetes.system.buffer flush_mode interval retry_type exponential_backoff flush_thread_count 2 flush_interval 5s retry_forever retry_max_interval 30 chunk_limit_size 2M total_limit_size 500M overflow_action block </buffer> </match>
[root@srv1 k8s-elk-yaml]# vim 11-fluentd-es-ds.yaml apiVersion: v1 kind: ServiceAccount metadata: name: fluentd-es namespace: kube-logging labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: fluentd-es labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile rules: - apiGroups: - "" resources: - "namespaces" - "pods" verbs: - "get" - "watch" - "list" --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: fluentd-es labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile subjects: - kind: ServiceAccount name: fluentd-es namespace: kube-logging apiGroup: "" roleRef: kind: ClusterRole name: fluentd-es apiGroup: "" --- apiVersion: apps/v1 kind: DaemonSet metadata: name: fluentd namespace: kube-logging labels: k8s-app: fluentd-es version: v3.1.1 addonmanager.kubernetes.io/mode: Reconcile spec: selector: matchLabels: k8s-app: fluentd-es version: v3.1.1 template: metadata: labels: k8s-app: fluentd-es version: v3.1.1 spec: securityContext: seccompProfile: type: RuntimeDefault priorityClassName: system-node-critical serviceAccountName: fluentd-es containers: - name: fluentd-es image: registry.cn-beijing.aliyuncs.com/dotbalo/fluentd:v3.1.0 env: - name: FLUENTD_ARGS value: --no-supervisor -q resources: limits: memory: 500Mi requests: cpu: 100m memory: 200Mi volumeMounts: - name: varlog mountPath: /var/log - name: varlibdockercontainers mountPath: /var/lib/docker/containers readOnly: true - name: config-volume mountPath: /etc/fluent/config.d ports: - containerPort: 24231 name: prometheus protocol: TCP livenessProbe: tcpSocket: port: prometheus initialDelaySeconds: 5 timeoutSeconds: 10 readinessProbe: tcpSocket: port: prometheus initialDelaySeconds: 5 timeoutSeconds: 10 terminationGracePeriodSeconds: 30 # 如果不需要采集所有机器的日志, 可将下面两行注释取消 # 并对所要采集日志的机器进行标签设定: kubectl label node srv1.1000y.cloud fluentd=true # 查看机器的标签: kubectl get node -l fluentd=true --show-labels #nodeSelector: # fluentd: "true" volumes: - name: varlog hostPath: path: /var/log - name: varlibdockercontainers hostPath: path: /var/lib/docker/containers - name: config-volume configMap: name: fluentd-es-config-v0.2.1
[root@srv1 k8s-elk-yaml]# kubectl apply -f 10-fluentd-es-configmap.yaml -f 11-fluentd-es-ds.yaml configmap/fluentd-es-config-v0.2.1 created serviceaccount/fluentd-es created clusterrole.rbac.authorization.k8s.io/fluentd-es created clusterrolebinding.rbac.authorization.k8s.io/fluentd-es created daemonset.apps/fluentd created
11) 确认pod及svc---srv1操作 [root@srv1 k8s-elk-yaml]# kubectl get pod,svc -n kube-logging NAME READY STATUS RESTARTS AGE pod/es-cluster-0 1/1 Running 0 18m pod/es-cluster-1 1/1 Running 0 14m pod/es-cluster-2 1/1 Running 0 11m pod/fluentd-494js 1/1 Running 0 2m13s pod/fluentd-mg44r 1/1 Running 0 2m13s pod/fluentd-s7t8m 1/1 Running 0 2m13s pod/kibana-757b69d4b9-6k8r5 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 21m service/kibana NodePort 10.96.153.90 <none> 5601:30087/TCP 12m
12) 检查es集群的index生成情况---srv1操作 [root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open logstash-2023.10.03 BUztejSySw-iCHdPiTN0kQ 1 1 28 0 615.5kb 397.2kb green open .kibana_1 JI9BvAaQREqWR4aIwufBrA 1 1 0 0 416b 208b green open logstash-2023.10.04 u3BxkolnSnmCkzDapWwrVA 1 1 258336 0 61.5mb 42.3mb green open logstash-1970.01.01 BP8eFmdaROOs_SbY68RJsw 1 1 512 0 134kb 76.5kb
13) 再次确认kibana service端口---srv1操作 [root@srv1 k8s-elk-yaml]# kubectl get svc -n kube-logging NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 30m kibana NodePort 10.96.153.90 <none> 5601:30087/TCP 21m
14) 进入Kibana [浏览器]===> http://srv1.1000y.cloud:30087
# If the message "Kibana server is not ready yet" appears, wait a little longer; Kibana is still starting up











1.17 Prometheus集群部署
1) 下载并安装helm
# ---如无特殊说明,以下操作均在srv1上执行
[root@srv1 ~]# wget https://get.helm.sh/helm-v3.12.3-linux-amd64.tar.gz
[root@srv1 ~]# tar xfz helm-v3.12.3-linux-amd64.tar.gz [root@srv1 ~]# mv linux-amd64/helm /usr/local/bin/ [root@srv1 ~]# rm -rf linux-amd64
[root@srv1 ~]# helm version version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}
2) 添加helm源 [root@srv1 ~]# helm repo add appstore https://charts.grapps.cn "appstore" has been added to your repositories
[root@srv1 ~]# helm repo list NAME URL appstore https://charts.grapps.cn
[root@srv1 ~]# helm repo update appstore Hang tight while we grab the latest from your chart repositories... ...Successfully got an update from the "appstore" chart repository Update Complete. ⎈Happy Helming!⎈
################################################## 信息汇总 ################################################## 一: 互联网下载方式 1. 下载指定版本或默认版本的配置文件[默认版本不需要增加 --version 参数] # helm pull appstore/kube-prometheus-stack --version 48.2.3
2. 解开tgz包 # tar xfz kube-prometheus-stack-48.2.3.tgz
3. 修改所需要的values[如替换image信息] # vim kube-prometheus-stack/values.yaml
4. 安装[ helm install pod_name /path/app_name_dir] # helm install kube-prometheus-stack ./kube-prometheus-stack
二: Harbor中下载 1. Harbor中已经更改完成,可按以下步骤进行操作 [root@srv1 ~]# helm registry login srv7.1000y.cloud --insecure Username: admin Password: # 输入Harbor管理员密码 Login Succeeded
[root@srv1 ~]# helm pull oci://srv7.1000y.cloud/k8s/chart/kube-prometheus-stack \ --version=48.2.3 --insecure-skip-tls-verify Pulled: srv7.1000y.cloud/k8s/chart/kube-prometheus-stack:48.2.3 Digest: sha256:9b6c629781dd518e2ccbce25878e449d4ecf16cdb6c30922c40dda74dfa4b6a4
[root@srv1 ~]# tar xfz kube-prometheus-stack-48.2.3.tgz [root@srv1 ~]# helm install kube-prometheus-stack ./kube-prometheus-stack
################################################## 汇总结束 ##################################################
3) 安装Prometheus [root@srv1 ~]# helm search repo kube-prometheus-stack NAME CHART VERSION APP VERSION DESCRIPTION appstore/kube-prometheus-stack 51.2.0 v0.68.0 kube-prometheus-stack... appstore/prometheus-operator 8.0.1 0.67.1 Stripped down version...
[root@srv1 ~]# helm install kube-prometheus-stack \ appstore/kube-prometheus-stack --version 48.2.3 NAME: kube-prometheus-stack LAST DEPLOYED: Wed Oct 4 19:19:08 2023 NAMESPACE: default STATUS: deployed REVISION: 1 NOTES: kube-prometheus-stack has been installed. Check its status by running: kubectl --namespace default get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
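# Optional sketch: the release can also be inspected with helm itself; the label selector comes from the NOTES above.
[root@srv1 ~]# helm list -n default
[root@srv1 ~]# helm status kube-prometheus-stack -n default
[root@srv1 ~]# kubectl --namespace default get pods -l "release=kube-prometheus-stack"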
################################################## 问题汇总 ################################################## 1. 出现如下错误: Error: INSTALLATION FAILED: failed pre-install: 1 error occurred: * timed out waiting for the condition
2. 执行以下操作 [root@srv1 ~]# helm uninstall kube-prometheus-stack release "kube-prometheus-stack" uninstalled
[root@srv1 ~]# helm install kube-prometheus-stack \ appstore/kube-prometheus-stack --version 48.2.3
################################################## 汇总结束 ##################################################
4) 确认Prometheus相关的Pod运行正常 [root@srv1 ~]# kubectl get pod NAME READY STATUS RESTARTS AGE alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 13m kube-prometheus-stack-grafana-6b49775fcc-2q2zp 3/3 Running 3 (10m ago) 28m kube-prometheus-stack-kube-state-metrics-66769fc5f5-cksh4 1/1 Running 0 28m kube-prometheus-stack-operator-79c4b88765-mlzt6 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-2d6sl 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-6nlh4 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-7vvt5 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-g47gs 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-p48cz 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-pqjj8 1/1 Running 0 28m prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 13m
5) 更改Grafana的端口类型为NodePort [root@srv1 ~]# kubectl edit svc kube-prometheus-stack-grafana ...... ...... ...... ...... ...... ...... ports: - name: http-web # 于33行追加如下内容 nodePort: 32000 port: 80 protocol: TCP targetPort: 3000 selector: app.kubernetes.io/instance: kube-prometheus-stack app.kubernetes.io/name: grafana sessionAffinity: None # 将40行改为以下内容 type: NodePort status: loadBalancer: {}
[root@srv1 ~]# kubectl get svc kube-prometheus-stack-grafana NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-prometheus-stack-grafana NodePort 10.99.251.178 <none> 80:32000/TCP 40m
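# Optional sketch: the same change can be made non-interactively with kubectl patch instead of kubectl edit;
# without an explicit nodePort in the patch, Kubernetes assigns a random port rather than 32000.
[root@srv1 ~]# kubectl patch svc kube-prometheus-stack-grafana -p '{"spec":{"type":"NodePort"}}'
[root@srv1 ~]# kubectl get svc kube-prometheus-stack-grafana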
6) Get the Grafana admin account and password [root@srv1 ~]# kubectl get secrets kube-prometheus-stack-grafana -o jsonpath='{.data.admin-user}' | base64 -d admin
[root@srv1 ~]# kubectl get secrets kube-prometheus-stack-grafana -o jsonpath='{.data.admin-password}' | base64 -d prom-operator
7) Log in to the Grafana dashboard and add the data source [Browser]===>http://srv1.1000y.cloud:32000








1.18 结合Harbor仓库
1) 生成证书---srv7操作
[root@srv7 ~]# vim /etc/pki/tls/openssl.cnf
......
......
......
......
......
......
# 172行,将本机改为CA认证中心 basicConstraints=CA:TRUE
...... ...... ...... ...... ...... ......
# 创建root CA [root@srv7 ~]# /etc/pki/tls/misc/CA -newca CA certificate filename (or enter to create) # 回车 Making CA certificate ... Generating a 2048 bit RSA private key .................+++ .........................................+++ writing new private key to '/etc/pki/CA/private/./cakey.pem' Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase: # 确认密码 ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [XX]:CN State or Province Name (full name) []:BeiJing Locality Name (eg, city) [Default City]:BeiJing Organization Name (eg, company) [Default Company Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:tech Common Name (eg, your name or your server's hostname) []:1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []: Using configuration from /etc/pki/tls/openssl.cnf Enter pass phrase for /etc/pki/CA/private/./cakey.pem: # 输入密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: ee:ed:72:48:b5:f5:a5:8c Validity Not Before: Oct 4 13:22:32 2023 GMT Not After : Oct 3 13:22:32 2026 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = 1000y.cloud X509v3 extensions: X509v3 Subject Key Identifier: C1:58:87:3A:57:99:08:9E:47:14:3B:5D:71:B4:D9:96:E9:FE:E2:3E X509v3 Authority Key Identifier: keyid:C1:58:87:3A:57:99:08:9E:47:14:3B:5D:71:B4:D9:96:E9:FE:E2:3E
X509v3 Basic Constraints: CA:TRUE Certificate is to be certified until Oct 3 13:22:32 2026 GMT (1095 days)
Write out database with 1 new entries Data Base Updated
[root@srv7 ~]# mkdir /opt/docker/registry/certs/ [root@srv7 ~]# cd /opt/docker/registry/certs/
[root@srv7 certs]# openssl genrsa -aes128 2048 > domain.key Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase:
[root@srv7 certs]# openssl rsa -in domain.key -out domain.key Enter pass phrase for domain.key: # enter the passphrase writing RSA key
[root@srv7 certs]# openssl req -utf8 -new -key domain.key -out domain.csr You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:CN State or Province Name (full name) [Some-State]:BeiJing Locality Name (eg, city) []:BeiJing Organization Name (eg, company) [Internet Widgits Pty Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:Tech Common Name (e.g. server FQDN or YOUR name) []:srv7.1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []:
[root@srv7 certs]# vim /etc/pki/tls/openssl.cnf ...... ...... ...... ...... ...... ......
# 于文件最后追加如下内容 [ 1000y.cloud ] subjectAltName = DNS:srv7.1000y.cloud, IP:192.168.1.17
[root@srv7 certs]# openssl ca \ -keyfile /etc/pki/CA/private/cakey.pem \ -cert /etc/pki/CA/cacert.pem \ -in ./domain.csr -out ./domain.crt \ -extfile /etc/pki/tls/openssl.cnf \ -extensions 1000y.cloud Enter pass phrase for /etc/pki/CA/private/cakey.pem # 输入CA密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: ee:ed:72:48:b5:f5:a5:8d Validity Not Before: Oct 4 13:26:08 2023 GMT Not After : Oct 3 13:26:08 2024 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = srv7.1000y.cloud X509v3 extensions: X509v3 Subject Alternative Name: DNS:srv7.1000y.cloud, IP Address:192.168.1.17 Certificate is to be certified until Oct 3 13:26:08 2024 GMT (365 days) Sign the certificate? [y/n]:y
1 out of 1 certificate requests certified, commit? [y/n]y Write out database with 1 new entries Data Base Updated Using configuration from /etc/pki/tls/openssl.cnf
[root@srv7 certs]# cd
[root@srv7 ~]# cat /etc/pki/CA/cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
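# Optional sketch: sanity-check the issued certificate before handing it to Harbor: the SAN should contain
# srv7.1000y.cloud / 192.168.1.17 and the chain should verify against the local CA.
[root@srv7 ~]# openssl x509 -in /opt/docker/registry/certs/domain.crt -noout -subject -dates
[root@srv7 ~]# openssl x509 -in /opt/docker/registry/certs/domain.crt -noout -text | grep -A1 "Subject Alternative Name"
[root@srv7 ~]# openssl verify -CAfile /etc/pki/CA/cacert.pem /opt/docker/registry/certs/domain.crt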
2) 部署Harbor---srv7操作 (1) 安装好docker及docker-compose工具
(2) Configure and deploy Harbor [root@srv7 ~]# curl -LO \ https://github.com/goharbor/harbor/releases/download/v2.9.0/harbor-offline-installer-v2.9.0.tgz
[root@srv7 ~]# tar xfz harbor-offline-installer-v2.9.0.tgz [root@srv7 ~]# cd harbor/ [root@srv7 harbor]# cp harbor.yml.tmpl harbor.yml
[root@srv7 harbor]# vim harbor.yml ...... ...... ...... ...... ...... ...... # 修改第5行,更改HarBor主机名称 hostname: srv7.1000y.cloud
...... ...... ...... ...... ...... ...... # 修改第17-18行,更改证书所在路径及文件名 certificate: /opt/docker/registry/certs/domain.crt private_key: /opt/docker/registry/certs/domain.key
...... ...... ...... ...... ...... ...... # Note line 36: harbor_admin_password sets the initial admin password harbor_admin_password: 123456
...... ...... ...... ...... ...... ......
[root@srv7 harbor]# ./prepare prepare base dir is set to /root/harbor Unable to find image 'goharbor/prepare:v2.9.0' locally Trying to pull repository docker.io/goharbor/prepare ... v2.9.0: Pulling from docker.io/goharbor/prepare ...... ...... ...... ...... ...... ...... Successfully called func: create_root_cert Generated configuration file: /compose_location/docker-compose.yml Clean up the input dir
[root@srv7 harbor]# ./install.sh
[Step 0]: checking if docker is installed ...
...... ...... ...... ...... ...... ...... ✔ ----Harbor has been installed and started successfully.----
[root@srv7 harbor]# docker-compose ps
################################################## 错误汇总 ################################################## 1. 如果重启docker服务后,可能会导致harbor有些进程无法启动,导致无法访问harbor.可按以下操作 [root@srv7 harbor]# docker-compose ps [root@srv7 harbor]# cd harbor/ [root@srv7 harbor]# docker-compose up -d
2. 停止所有由docker-compose启动的服务 [root@srv7 harbor]# docker-compose stop
################################################## 汇总结束 ##################################################
(3) 访问Harbor并创建项目 [浏览器]---> http://harbor_srv_fqdn/

# 进入后创建一个名为 k8s 的项目
(4) 配置Docker客户端信任私有仓库 [root@srv7 harbor]# vim /etc/docker/daemon.json { "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"], "insecure-registries": ["https://srv7.1000y.cloud"] }
[root@srv7 harbor]# systemctl restart docker
(5) 测试 [root@srv7 harbor]# docker images REPOSITORY TAG IMAGE ID CREATED SIZE nginx latest 605c77e624dd 21 months ago 141 MB
[root@srv7 harbor]# docker tag nginx:latest srv7.1000y.cloud/k8s/nginx
[root@srv7 harbor]# docker login srv7.1000y.cloud Username: admin Password: Login Succeeded
[root@srv7 harbor]# docker push srv7.1000y.cloud/k8s/nginx The push refers to a repository [srv7.1000y.cloud/k8s/nginx] d874fd2bc83b: Pushed 32ce5f6a5106: Pushed f1db227348d0: Pushed b8d6e692a25e: Pushed e379e8aedd4d: Pushed 2edcec3590a4: Pushed latest: digest: sha256:ee89b00528ff4f02f2405e4ee221743ebc3f8e8dd0bfd5c4c20a2fa2aaa7ede3 size: 1570
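# 可选示例: 删除本地tag后重新从Harbor拉取,验证仓库可正常提供镜像
[root@srv7 harbor]# docker rmi srv7.1000y.cloud/k8s/nginx
[root@srv7 harbor]# docker pull srv7.1000y.cloud/k8s/nginx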
3) Containerd配置 (1) 将srv7上的cacert.pem复制到k8s所有节点 [root@srv1 ~]# scp srv7.1000y.cloud:/etc/pki/CA/cacert.pem . [root@srv1 ~]# cat cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
[root@srv1 ~]# for node in srv2.1000y.cloud srv3.1000y.cloud srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud do scp ./cacert.pem $node:~ ssh $node "cat cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt" done
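# 可选示例: 在任一节点用curl访问Harbor,若不再出现证书校验错误,说明CA已被系统信任
[root@srv1 ~]# curl -I https://srv7.1000y.cloud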
(2) 所有节点配置Containerd配置文件 [root@srv1 ~]# vim /etc/containerd/config.toml ...... ...... ...... ...... ...... ...... # 修改144-155行为以下内容 [plugins."io.containerd.grpc.v1.cri".registry] [plugins."io.containerd.grpc.v1.cri".registry."srv7.1000y.cloud"] config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths] [plugins."io.containerd.grpc.v1.cri".registry.configs."srv7.1000y.cloud".auth] username = "admin" password = "123456"
[plugins."io.containerd.grpc.v1.cri".registry.configs] [plugins."io.containerd.grpc.v1.cri".registry.configs."srv7.1000y.cloud".tls] insecure_skip_verify = true
...... ...... ...... ...... ...... ......
[root@srv1 ~]# systemctl restart containerd
(3) Containerd镜像拉取测试---srv1上操作 [root@srv1 ~]# crictl pull srv7.1000y.cloud/k8s/nginx Image is up to date for sha256:605c77e624ddb75e6110f997c58876baa13f8754486b461117934b24a9dc3a85
[root@srv1 ~]# crictl images | grep srv7 srv7.1000y.cloud/k8s/nginx latest 605c77e624ddb 56.7MB
[root@srv1 ~]# crictl rmi srv7.1000y.cloud/k8s/nginx Deleted: srv7.1000y.cloud/k8s/nginx:latest
4) 与kubernetes结合---srv1上操作 (1) 创建并确认regcred [root@srv1 ~]# kubectl create secret docker-registry regcred \ --docker-server=srv7.1000y.cloud \ --docker-username=admin \ --docker-password=123456 secret/regcred created
[root@srv1 ~]# kubectl get secrets | grep regcred regcred kubernetes.io/dockerconfigjson 1 22s
(2) 查看regcred详细信息 [root@srv1 ~]# kubectl get secret regcred --output=yaml apiVersion: v1 data: .dockerconfigjson: eyJhdXRocyI6eyJzcnY3LjEwMDB5LmNsb3VkIjp7InVzZXJuYW1lIjoiYWRtaW4iLCJwYXNzd29yZCI6IjE yMzQ1NiIsImF1dGgiOiJZV1J0YVc0Nk1USXpORFUyIn19fQ== kind: Secret metadata: creationTimestamp: "2023-10-04T15:07:54Z" name: regcred namespace: default resourceVersion: "142119" uid: 3a27d14f-01c5-4288-8204-2f8e3a9ddb34 type: kubernetes.io/dockerconfigjson
(3) 用base64查看dockerconfigjson中所包含 的用户名和密码等信息 [root@srv1 ~]# kubectl get secret regcred \ --output="jsonpath={.data.\.dockerconfigjson}" | base64 -d {"auths":{"srv7.1000y.cloud":{"username":"admin","password":"123456","auth":"YWRtaW46MTIzNDU2"}}}
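# 补充示例: auth字段即"用户名:密码"的base64编码,可自行验证
[root@srv1 ~]# echo -n 'admin:123456' | base64
YWRtaW46MTIzNDU2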
5) 集成测试---srv1上操作 (1) 创建一个 Pod 测试 [root@srv1 ~]# vim private-nginx.yml # 于新文件内添加如下内容 apiVersion: v1 kind: Pod metadata: name: private-nginx spec: containers: - name: private-nginx # 设定私有仓库及镜像 image: srv7.1000y.cloud/k8s/nginx imagePullSecrets: # 添加认证名称 - name: regcred
[root@srv1 ~]# kubectl create -f private-nginx.yml pod/private-nginx created
(2) 确认镜像来源 [root@srv1 ~]# kubectl get pods | grep private-nginx private-nginx 1/1 Running 0 2m7s
[root@srv1 ~]# kubectl describe pods private-nginx | grep Image: Image: srv7.1000y.cloud/k8s/nginx
(3) 删除pods [root@srv1 ~]# kubectl delete -f private-nginx.yml pod "private-nginx" deleted
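(4) 可选方案(示例): 也可将regcred补充到命名空间的default ServiceAccount中,这样该命名空间内的Pod无需逐个声明imagePullSecrets
[root@srv1 ~]# kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
serviceaccount/default patched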
1.19 安装自动补全功能
 
[root@srv1 ~]# yum install bash-completion -y
[root@srv1 ~]# source /usr/share/bash-completion/bash_completion
[root@srv1 ~]# source <(kubectl completion bash)
[root@srv1 ~]# echo "source <(kubectl completion bash)" >> ~/.bashrc  
1.20 国内镜像仓库
cr.l5d.io/  ===> m.daocloud.io/cr.l5d.io/
docker.elastic.co/  ===> m.daocloud.io/docker.elastic.co/
docker.io/  ===> m.daocloud.io/docker.io/
gcr.io/  ===> m.daocloud.io/gcr.io/
ghcr.io/  ===> m.daocloud.io/ghcr.io/
k8s.gcr.io/  ===> m.daocloud.io/k8s.gcr.io/
mcr.microsoft.com/  ===> m.daocloud.io/mcr.microsoft.com/
nvcr.io/  ===> m.daocloud.io/nvcr.io/
quay.io/  ===> m.daocloud.io/quay.io/
registry.jujucharms.com/  ===> m.daocloud.io/registry.jujucharms.com/
registry.k8s.io/  ===> m.daocloud.io/registry.k8s.io/
registry.opensource.zalan.do/  ===> m.daocloud.io/registry.opensource.zalan.do/
rocks.canonical.com/  ===> m.daocloud.io/rocks.canonical.com/
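# 使用示例(假设节点可访问m.daocloud.io): 在原镜像名前加上 m.daocloud.io/ 前缀即可,例如
[root@srv1 ~]# crictl pull m.daocloud.io/registry.k8s.io/pause:3.6
# 等价于直接拉取 registry.k8s.io/pause:3.6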
1.21 安装时etcd服务翻滚(节点反复重启、不健康)的处理
1) 确认翻滚(不健康)的etcd节点
[root@srv1 ~]# etcdctl \
--endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
--cacert=/etc/etcd/ssl/etcd-ca.pem --cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem  endpoint health --write-out=table
{"level":"warn","ts":"2023-10-05T15:59:52.615519+0800","logger":"client","caller":
"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":
"etcd-endpoints://0xc0003b4a80/192.168.1.11:2379","attempt":0,"error":"rpc error: code = 
DeadlineExceeded desc = latest balancer error: last connection error: connection error: 
desc = \"transport: Error while dialing dial tcp 192.168.1.11:2379: connect: connection refused\""}
+-------------------+--------+--------------+---------------------------+
|     ENDPOINT      | HEALTH |     TOOK     |           ERROR           |
+-------------------+--------+--------------+---------------------------+
| 192.168.1.12:2379 |   true | 1.350706779s |                           |
| 192.168.1.13:2379 |   true | 1.204151786s |                           |
| 192.168.1.11:2379 |  false | 5.001363217s | context deadline exceeded |
+-------------------+--------+--------------+---------------------------+
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member list --write-out=table +------------------+---------+------------------+---------------------------+---------------------------+------------+ | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | +------------------+---------+------------------+---------------------------+---------------------------+------------+ | ac7e57d44f030e8 | started | srv2.1000y.cloud | https://192.168.1.12:2380 | https://192.168.1.12:2379 | false | | 40ba37809e1a423f | started | srv3.1000y.cloud | https://192.168.1.13:2380 | https://192.168.1.13:2379 | false | | 486c1127759f2e55 | started | srv1.1000y.cloud | https://192.168.1.11:2380 | https://192.168.1.11:2379 | false | +------------------+---------+------------------+---------------------------+---------------------------+------------+
2) 移除etcd故障节点---[故障节点的member ID请以上一步member list的输出为准] [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member remove ce146ffbf5fd1c12 Member ce146ffbf5fd1c12 removed from cluster a3af6d9a178535cc
3) 清除etcd故障节点的数据 [root@srv1 ~]# cd /var/lib/etcd [root@srv1 etcd]# rm -rf * [root@srv1 ~]# cd
4) 重写etcd配置文件 [root@srv1 ~]# vim /etc/etcd/etcd.config.yml ...... ...... ...... ...... ...... ...... initial-cluster-token: 'etcd-k8s-cluster' # 修改21行,其内容如下 initial-cluster-state: 'existing' strict-reconfig-check: false ...... ...... ...... ...... ...... ......
5) 重新将etcd节点加入集群 [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member add srv1.1000y.cloud \ --peer-urls=https://192.168.1.11:2380 Member 486c1127759f2e55 added to cluster a3af6d9a178535cc
ETCD_NAME="srv1.1000y.cloud" ETCD_INITIAL_CLUSTER="srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380,srv1.1000y.cloud=https://192.168.1.11:2380" ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.11:2380" ETCD_INITIAL_CLUSTER_STATE="existing"
6) 重启etcd节点 [root@srv1 ~]# systemctl restart etcd
7) 确认 [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem endpoint health \ --write-out=table +-------------------+--------+--------------+-------+ | ENDPOINT | HEALTH | TOOK | ERROR | +-------------------+--------+--------------+-------+ | 192.168.1.12:2379 | true | 358.51544ms | | | 192.168.1.13:2379 | true | 358.494477ms | | | 192.168.1.11:2379 | true | 378.163923ms | | +-------------------+--------+--------------+-------+
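# 可选示例: 也可用endpoint status进一步确认各节点的leader归属与raft任期
[root@srv1 ~]# etcdctl \
--endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
--cacert=/etc/etcd/ssl/etcd-ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem endpoint status --write-out=table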
1.22 calico-kube-controllers Pod无法正常running
1) 查找 calico-kube-controllers-xxxxxxxxxx-xxxxx 所在节点
2) 重启 kubelet及kube-proxy POD/服务---[服务可全节点全部重启]
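# 操作示例(假设calico-kube-controllers部署在kube-system命名空间,Pod名称以实际为准):
[root@srv1 ~]# kubectl -n kube-system get pods -o wide | grep calico-kube-controllers
[root@srv1 ~]# kubectl -n kube-system delete pod calico-kube-controllers-xxxxxxxxxx-xxxxx
# kubeadm环境中kube-proxy以DaemonSet Pod方式运行,删除对应Pod即可使其重建;二进制环境中则重启服务
[root@srv1 ~]# systemctl restart kubelet kube-proxy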
2. 二进制方式实现Kubernetes集群---Version:1.28.2
2.1 准备须知
1) 拓扑

--------+-------------------------+-------------------------+------------
        |                         |                         |
    eth0|192.168.1.11         eth0|192.168.1.12         eth0|192.168.1.13
+-------+-------------+   +-------+------------+   +--------+-----------+
|  [ Manager Node ]   |   |   [ Master Node ]  |   |  [ Master Node ]   |
| [srv1.1000y.cloud]  |   | [srv2.1000y.cloud] |   | [srv3.1000y.cloud] |
+-------+-------------+   +-------+------------+   +--------+-----------+
        |                         |                         |
    eth0|192.168.1.14         eth0|192.168.1.15         eth0|192.168.1.16
+-------+-------------+   +-------+------------+   +--------+-----------+
|  [ Worker Node ]    |   |   [ Worker Node ]  |   |  [ Worker Node ]   |
| [srv4.1000y.cloud]  |   | [srv5.1000y.cloud] |   | [srv6.1000y.cloud] |
+-------+-------------+   +-------+------------+   +--------+-----------+
2) 配置要求 1. Master节点: 2Core/4G Mem 2. Worker节点: 4Core/8G Mem
3) 网段 物理网段: 192.168.1.0/24 Service网段: 10.96.0.0/12 Pods网段: 172.16.0.0/12
4) 系统环境 OS: CentOS Linux release 7.9.2009 (Core) Kernel: 6.5.5-1.el7.elrepo.x86_64 硬件架构: x86_64 NTP: 所有节点均同步完成 SELinux: 已关闭 Firewall: 已关闭 本地FQDN解析已完成 本地主机配置完成: IP地址、子网掩码、默认网关[必须]、DNS服务器IP地址、NTP服务器IP地址
5) 软件版本 cni-plugins: 1.3.0 cri_containerd-cni: 1.6.24 crictl: v1.28.0 kubernetes: 1.28.2 etcd: 3.5.9 cfssl: 1.6.4
2.2 前期准备---所有节点操作---[单独节点操作将会特殊说明]
1) 更新所有节点的内核---[版本要求: 4.18+]
[root@srv1 ~]# yum install wget psmisc vim net-tools nfs-utils telnet yum-utils device-mapper-persistent-data lvm2 git tar curl -y
[root@srv1 ~]# yum install elrepo-release -y
[root@srv1 ~]# sed -i "s@mirrorlist@#mirrorlist@g" /etc/yum.repos.d/elrepo.repo [root@srv1 ~]# sed -i "s@http://elrepo.org/linux@https://mirrors.aliyun.com/elrepo/@g" /etc/yum.repos.d/elrepo.repo
# 安装最新的内核 # 稳定版为kernel-ml,如需更新长期维护版本kernel-lt # 如想查看ml/lt的各可用的内核版本,可按以下命令操作 [root@srv1 ~]# yum --enablerepo=elrepo-kernel search kernel-ml --showduplicates
[root@srv1 ~]# yum --enablerepo=elrepo-kernel search kernel-lt --showduplicates
################################################## 问题汇总 ##################################################
1. 目前elrepo中的kernel无法下载,请移步至https://mirrors.coreix.net/elrepo-archive-archive/kernel/el7/x86_64/RPMS/下载
2. 下载为kernel-lt/ml-$version, kernel-tools-$version, kernel-tools-libs-$version
################################################## 汇总结束 ##################################################
# 安装最新的ml内核并使用 [root@srv1 ~]# yum --enablerepo=elrepo-kernel install kernel-ml -y
[root@srv1 ~]# grubby --set-default $(ls /boot/vmlinuz-* | grep elrepo) ; grubby --default-kernel /boot/vmlinuz-6.5.5-1.el7.elrepo.x86_64 [root@srv1 ~]# reboot
[root@srv1 ~]# uname -r 6.5.5-1.el7.elrepo.x86_64
2) 修改网络配置 [root@srv1 ~]# cat > /etc/NetworkManager/conf.d/calico.conf << EOF [keyfile] unmanaged-devices=interface-name:cali*;interface-name:tunl* EOF ################################################## 文件说明 ##################################################
# 这个参数用于指定不由 NetworkManager 管理的设备。它由以下两个部分组成 # # interface-name:cali* # 表示以 "cali" 开头的接口名称被排除在 NetworkManager 管理之外。例如,"cali0", "cali1" 等接口不受 NetworkManager 管理。 # # interface-name:tunl* # 表示以 "tunl" 开头的接口名称被排除在 NetworkManager 管理之外。例如,"tunl0", "tunl1" 等接口不受 NetworkManager 管理。 # # 通过使用这个参数,可以将特定的接口排除在 NetworkManager 的管理范围之外,以便其他工具或进程可以独立地管理和配置这些接口。
################################################## 说明结束 ##################################################
[root@srv1 ~]# systemctl restart NetworkManager
3) 所有节点关闭swap [root@srv1 ~]# vim /etc/fstab
# # /etc/fstab # Created by anaconda on Sun Dec 5 14:41:17 2021 # # Accessible filesystems, by reference, are maintained under '/dev/disk' # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info # UUID=eaca2437-8d59-47e4-bacb-4de06d26b7c8 / ext4 defaults 1 1 UUID=95c7c42a-7569-4f80-aec4-961628270ce7 /boot ext4 defaults 1 2 UUID=bf9e568d-ae5d-43d1-948c-90a45d731ec8 swap swap noauto,defaults 0 0
# sysctl -w vm.swappiness=0: 这个命令用于修改vm.swappiness参数的值为0,表示系统在物理内存充足时更倾向于使用物理内存而非交换分区。 [root@srv1 ~]# swapoff -a && sysctl -w vm.swappiness=0
4) 配置ulimit [root@srv1 ~]# ulimit -SHn 65535 [root@srv1 ~]# cat >> /etc/security/limits.conf <<EOF * soft nofile 655360 * hard nofile 131072 * soft nproc 655350 * hard nproc 655350 * soft memlock unlimited * hard memlock unlimited EOF
[root@srv1 ~]# reboot
################################################## 参数说明 ##################################################
# soft nofile 655360 # soft表示软限制,nofile表示一个进程可打开的最大文件数,默认值为1024。这里的软限制设置为655360,即一个进程可打开的最大文件数为655360。
# hard nofile 131072 # hard表示硬限制,即系统设置的最大值。nofile表示一个进程可打开的最大文件数,默认值为4096。这里的硬限制设置为131072,即系统设置的最大文件数为131072。
# soft nproc 655350 # soft表示软限制,nproc表示一个用户可创建的最大进程数,默认值为30720。这里的软限制设置为655350,即一个用户可创建的最大进程数为655350。
# hard nproc 655350 # hard表示硬限制,即系统设置的最大值。nproc表示一个用户可创建的最大进程数,默认值为4096。这里的硬限制设置为655350,即系统设置的最大进程数为655350。
# soft memlock unlimited # soft表示软限制,memlock表示一个进程可锁定在RAM中的最大内存,默认值为64 KB。这里的软限制设置为unlimited,即一个进程可锁定的最大内存为无限制。
# hard memlock unlimited # hard表示硬限制,即系统设置的最大值。memlock表示一个进程可锁定在RAM中的最大内存,默认值为64 KB。这里的硬限制设置为unlimited,即系统设置的最大内存锁定为无限制。
################################################## 说明结束 ##################################################
5) 配置ssh免密登录---于srv1上操作 [root@srv1 ~]# ssh-keygen -q -N '' [root@srv1 ~]# ssh-copy-id srv2.1000y.cloud [root@srv1 ~]# ssh-copy-id srv3.1000y.cloud [root@srv1 ~]# ssh-copy-id srv4.1000y.cloud [root@srv1 ~]# ssh-copy-id srv5.1000y.cloud [root@srv1 ~]# ssh-copy-id srv6.1000y.cloud
6) 安装ipvsadm等工具 [root@srv1 ~]# yum install ipvsadm ipset sysstat conntrack libseccomp -y
[root@srv1 ~]# cat >> /etc/modules-load.d/ipvs.conf <<EOF ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack ip_tables ip_set xt_set ipt_set ipt_rpfilter ipt_REJECT ipip EOF
[root@srv1 ~]# systemctl restart systemd-modules-load.service
[root@srv1 ~]# lsmod | grep -e ip_vs -e nf_conntrack ip_vs_sh 12288 0 ip_vs_wrr 12288 0 ip_vs_rr 12288 0 ip_vs 200704 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr nf_conntrack 188416 1 ip_vs nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs nf_defrag_ipv4 12288 1 nf_conntrack libcrc32c 12288 2 nf_conntrack,ip_vs
7) 修改内核参数 [root@srv1 ~]# cat <<EOF > /etc/sysctl.d/k8s.conf net.ipv4.ip_forward = 1 net.bridge.bridge-nf-call-iptables = 1 fs.may_detach_mounts = 1 vm.overcommit_memory=1 vm.panic_on_oom=0 fs.inotify.max_user_watches=89100 fs.file-max=52706963 fs.nr_open=52706963 net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600 net.ipv4.tcp_keepalive_probes = 3 net.ipv4.tcp_keepalive_intvl =15 net.ipv4.tcp_max_tw_buckets = 36000 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_max_orphans = 327680 net.ipv4.tcp_orphan_retries = 3 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_max_syn_backlog = 16384 net.ipv4.ip_conntrack_max = 65536 net.ipv4.tcp_max_syn_backlog = 16384 net.ipv4.tcp_timestamps = 0 net.core.somaxconn = 16384
net.ipv6.conf.all.disable_ipv6 = 0 net.ipv6.conf.default.disable_ipv6 = 0 net.ipv6.conf.lo.disable_ipv6 = 0 net.ipv6.conf.all.forwarding = 1 EOF

[root@srv1 ~]# sysctl --system * Applying /usr/lib/sysctl.d/00-system.conf ... * Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ... kernel.yama.ptrace_scope = 0 ...... ...... ...... ...... ...... ...... net.ipv6.conf.lo.disable_ipv6 = 0 net.ipv6.conf.all.forwarding = 1 * Applying /etc/sysctl.conf ...
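# 可选示例: 确认关键内核参数已生效
[root@srv1 ~]# sysctl net.ipv4.ip_forward vm.overcommit_memory
net.ipv4.ip_forward = 1
vm.overcommit_memory = 1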
2.3 使用Containerd作为Runtime---所有节点操作
1) 安装及配置cni插件---[下载地址: https://github.com/containernetworking/plugins/releases/]
(1) 创建cni插件所需要的目录
[root@srv1 ~]# mkdir -p /etc/cni/net.d /opt/cni/bin
(2) 解压cni插件 [root@srv1 ~]# tar xf cni-plugins-linux-amd64-v1.3.0.tgz -C /opt/cni/bin/
2) 下载并配置cri-containerd-cni---[https://github.com/containerd/containerd/releases/] (1) 解压cri-containerd-cni [root@srv1 ~]# tar -xzf cri-containerd-cni-1.6.24-linux-amd64.tar.gz -C /
(2) 创建containerd.service [root@srv1 ~]# cat > /etc/systemd/system/containerd.service <<EOF [Unit] Description=containerd container runtime Documentation=https://containerd.io After=network.target local-fs.target
[Service] ExecStartPre=-/sbin/modprobe overlay ExecStart=/usr/local/bin/containerd Type=notify Delegate=yes KillMode=process Restart=always RestartSec=5 LimitNPROC=infinity LimitCORE=infinity LimitNOFILE=infinity TasksMax=infinity OOMScoreAdjust=-999
[Install] WantedBy=multi-user.target EOF

(3) 配置containerd所需要的模块 [root@srv1 ~]# cat <<EOF >> /etc/modules-load.d/containerd.conf overlay br_netfilter EOF
[root@srv1 ~]# systemctl restart systemd-modules-load.service
(4) 配置containerd所需要的内核模块 [root@srv1 ~]# cat <<EOF >> /etc/sysctl.d/99-kubernetes-cri.conf net.bridge.bridge-nf-call-iptables = 1 net.ipv4.ip_forward = 1 net.bridge.bridge-nf-call-ip6tables = 1 EOF [root@srv1 ~]# sysctl --system
(5) 创建containerd配置文件 [root@srv1 ~]# mkdir -p /etc/containerd
[root@srv1 ~]# containerd config default > /etc/containerd/config.toml
[root@srv1 ~]# sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml [root@srv1 ~]# cat /etc/containerd/config.toml | grep SystemdCgroup SystemdCgroup = true
[root@srv1 ~]# sed -i "s#registry.k8s.io#m.daocloud.io/registry.k8s.io#g" /etc/containerd/config.toml [root@srv1 ~]# cat /etc/containerd/config.toml | grep sandbox_image sandbox_image = "m.daocloud.io/registry.k8s.io/pause:3.6"
[root@srv1 ~]# vim /etc/containerd/config.toml ...... ...... ...... ...... ...... ...... [plugins."io.containerd.grpc.v1.cri".registry.mirrors] # 于154-155行添加如下内容 [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://3laho3y3.mirror.aliyuncs.com"]
...... ...... ...... ...... ...... ......
(6) 启动containerd [root@srv1 ~]# systemctl daemon-reload && systemctl enable --now containerd
3) 配置crictl客户端连接的运行时位置---[下载地址: https://github.com/kubernetes-sigs/cri-tools/releases/] [root@srv1 ~]# tar xf crictl-v1.28.0-linux-amd64.tar.gz -C /usr/bin/
[root@srv1 ~]# cat > /etc/crictl.yaml <<EOF runtime-endpoint: unix:///run/containerd/containerd.sock image-endpoint: unix:///run/containerd/containerd.sock timeout: 10 debug: false EOF
[root@srv1 ~]# systemctl restart containerd
# 测试 [root@srv1 ~]# crictl info
2.4 k8s与etcd下载及安装---仅在Srv1上操作
1) 解压k8s安装包
(1) https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG下载1.28.2版本
(2) 解压并安装k8s [root@srv1 ~]# tar -xf kubernetes-server-linux-amd64.tar.gz \ --strip-components=3 \ -C /usr/local/bin kubernetes/server/bin/kube{let,ctl,-apiserver,-controller-manager,-scheduler,-proxy}
2) 解压etcd安装包 (1) https://github.com/etcd-io/etcd/releases/ 下载etcd
(2) 解压并安装etcd [root@srv1 ~]# tar -xf etcd*.tar.gz && mv etcd-*/etcd /usr/local/bin/ && mv etcd-*/etcdctl /usr/local/bin/
3) 统计总工具数 [root@srv1 ~]# ls /usr/local/bin/ containerd crictl etcdctl kube-proxy containerd-shim critest kube-apiserver kube-scheduler containerd-shim-runc-v1 ctd-decoder kube-controller-manager containerd-shim-runc-v2 ctr kubectl containerd-stress etcd kubelet
[root@srv1 ~]# ls /usr/local/bin/ | wc -l 17
4) 确认版本号 [root@srv1 ~]# kubelet --version Kubernetes v1.28.2
[root@srv1 ~]# etcdctl version etcdctl version: 3.5.9 API version: 3.5
5) 将组件发送给其他的Master节点 [root@srv1 ~]# master='srv2.1000y.cloud srv3.1000y.cloud' [root@srv1 ~]# worker='srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud'
[root@srv1 ~]# for NODE in $master do echo $NODE scp /usr/local/bin/kube{let,ctl,-apiserver,-controller-manager,-scheduler,-proxy} $NODE:/usr/local/bin/ scp /usr/local/bin/etcd* $NODE:/usr/local/bin/ done
[root@srv1 ~]# for NODE in $worker do echo $NODE scp /usr/local/bin/kube{let,-proxy} $NODE:/usr/local/bin/ done
# 所有节点执行以下命令 [root@srv1 ~]# mkdir -p /opt/cni/bin
2.5 创建并产生证书---仅在Srv1上操作
1) 下载证书生成工具
https://github.com/cloudflare/cfssl/releases/
2) 安装证书工具 [root@srv1 ~]# cp cfssl_1.6.4_linux_amd64 /usr/local/bin/cfssl [root@srv1 ~]# cp cfssljson_1.6.4_linux_amd64 /usr/local/bin/cfssljson [root@srv1 ~]# cp cfssl-certinfo_1.6.4_linux_amd64 /usr/local/bin/cfssl-certinfo [root@srv1 ~]# chmod +x /usr/local/bin/cfssl*
3) 生成etcd证书 # 在所有Master节点操作 [root@srv1 ~]# mkdir /etc/etcd/ssl -p
[root@srv1 ~]# mkdir -p k8s/pki # 在Srv1节点操作 [root@srv1 ~]# cd k8s/pki/
[root@srv1 pki]# cat > ca-config.json << EOF { "signing": { "default": { "expiry": "876000h" }, "profiles": { "kubernetes": { "usages": [ "signing", "key encipherment", "server auth", "client auth" ], "expiry": "876000h" } } } } EOF
################################################## 参数说明 ##################################################
# 这段配置文件是用于配置加密和认证签名的一些参数。 # # 在这里,有两个部分:`signing`和`profiles`。 # # `signing`包含了默认签名配置和配置文件。 # 默认签名配置`default`指定了证书的过期时间为`876000h`。`876000h`表示证书有效期为100年。 # # `profiles`部分定义了不同的证书配置文件。 # 在这里,只有一个配置文件`kubernetes`。它包含了以下`usages`和过期时间`expiry`: # # 1. `signing`:用于对其他证书进行签名 # 2. `key encipherment`:用于加密和解密传输数据 # 3. `server auth`:用于服务器身份验证 # 4. `client auth`:用于客户端身份验证 # # 对于`kubernetes`配置文件,证书的过期时间也是`876000h`,即100年。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cat > etcd-ca-csr.json << EOF { "CN": "etcd", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "etcd", "OU": "Etcd Security" } ], "ca": { "expiry": "876000h" } } EOF
################################################## 参数说明 ##################################################
# 这是一个用于生成证书签名请求(Certificate Signing Request,CSR)的JSON配置文件。JSON配置文件指定了生成证书签名请求所需的数据。 # # - "CN": "etcd" 指定了希望生成的证书的CN字段(Common Name),即证书的主题,通常是该证书标识的实体的名称。 # - "key": {} 指定了生成证书所使用的密钥的配置信息。"algo": "rsa" 指定了密钥的算法为RSA,"size": 2048 指定了密钥的长度为2048位。 # - "names": [] 包含了生成证书时所需的实体信息。在这个例子中,只包含了一个实体,其相关信息如下: # - "C": "CN" 指定了实体的国家/地区代码,这里是中国。 # - "ST": "Beijing" 指定了实体所在的省/州。 # - "L": "Beijing" 指定了实体所在的城市。 # - "O": "etcd" 指定了实体的组织名称。 # - "OU": "Etcd Security" 指定了实体所属的组织单位。 # - "ca": {} 指定了生成证书时所需的CA(Certificate Authority)配置信息。 # - "expiry": "876000h" 指定了证书的有效期,这里是876000小时。 # # 生成证书签名请求时,可以使用这个JSON配置文件作为输入,根据配置文件中的信息生成相应的CSR文件。然后,可以将CSR文件发送给CA进行签名,以获得有效的证书。 # # 生成etcd证书和etcd证书的key(如果你觉得以后可能会扩容,可以在ip那多写几个预留出来)
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare /etc/etcd/ssl/etcd-ca 2023/10/01 09:43:59 [INFO] generating a new CA key and certificate from CSR 2023/10/01 09:43:59 [INFO] generate received request 2023/10/01 09:43:59 [INFO] received CSR 2023/10/01 09:43:59 [INFO] generating key: rsa-2048 2023/10/01 09:44:00 [INFO] encoded CSR 2023/10/01 09:44:00 [INFO] signed certificate with serial number 212790503469786088557368097710578918172559758726
################################################## 参数说明 ##################################################
# cfssl是一个用于生成TLS/SSL证书的工具,它支持PKI、JSON格式配置文件以及与许多其他集成工具的配合使用。 # # gencert参数表示生成证书的操作。-initca参数表示初始化一个CA(证书颁发机构)。CA是用于签发其他证书的根证书。etcd-ca-csr.json # 是一个JSON格式的配置文件,其中包含了CA的详细信息,如私钥、公钥、有效期等。这个文件提供了生成CA证书所需的信息。 # # | 符号表示将上一个命令的输出作为下一个命令的输入。 # # cfssljson是cfssl工具的一个子命令,用于格式化cfssl生成的JSON数据。 -bare参数表示直接输出裸证书,即只生成证书文件,不包含其他 # 格式的文件。/etc/etcd/ssl/etcd-ca是指定生成的证书文件的路径和名称。 # # 所以,这条命令的含义是使用cfssl工具根据配置文件ca-csr.json生成一个CA证书,并将证书文件保存在/etc/etcd/ssl/etcd-ca路径下。
################################################## 说明结束 ##################################################
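# 可选示例: 用cfssl-certinfo或openssl查看生成的etcd CA证书的主题与有效期
[root@srv1 pki]# cfssl-certinfo -cert /etc/etcd/ssl/etcd-ca.pem
[root@srv1 pki]# openssl x509 -noout -subject -dates -in /etc/etcd/ssl/etcd-ca.pem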
[root@srv1 pki]# cat > etcd-csr.json << EOF { "CN": "etcd", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "etcd", "OU": "Etcd Security" } ] } EOF
################################################## 参数说明 ##################################################
# 这段代码是一个JSON格式的配置文件,用于生成一个证书签名请求(Certificate Signing Request,CSR)。 # # 首先,"CN"字段指定了该证书的通用名称(Common Name),这里设为"etcd"。 # # 接下来,"key"字段指定了密钥的算法("algo"字段)和长度("size"字段),此处使用的是RSA算法,密钥长度为2048位。 # # 最后,"names"字段是一个数组,其中包含了一个名字对象,用于指定证书中的一些其他信息。这个名字对象包含了以下字段: # - "C"字段指定了国家代码(Country),这里设置为"CN"。 # - "ST"字段指定了省份(State)或地区,这里设置为"Beijing"。 # - "L"字段指定了城市(Locality),这里设置为"Beijing"。 # - "O"字段指定了组织(Organization),这里设置为"etcd"。 # - "OU"字段指定了组织单元(Organizational Unit),这里设置为"Etcd Security"。 # # 这些字段将作为证书的一部分,用于标识和验证证书的使用范围和颁发者等信息。 ################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/etcd/ssl/etcd-ca.pem \ -ca-key=/etc/etcd/ssl/etcd-ca-key.pem \ -config=ca-config.json \ -hostname=127.0.0.1,srv1.1000y.cloud,srv2.1000y.cloud,srv3.1000y.cloud,192.168.1.11,192.168.1.12,192.168.1.13 \ -profile=kubernetes \ etcd-csr.json | cfssljson -bare /etc/etcd/ssl/etcd 2023/10/01 09:49:31 [INFO] generate received request 2023/10/01 09:49:31 [INFO] received CSR 2023/10/01 09:49:31 [INFO] generating key: rsa-2048 2023/10/01 09:49:33 [INFO] encoded CSR 2023/10/01 09:49:33 [INFO] signed certificate with serial number 201107719211860724407267705003893888676796259892
################################################## 参数说明 ##################################################
# 这是一条使用cfssl生成etcd证书的命令,下面是各个参数的解释: # # -ca=/etc/etcd/ssl/etcd-ca.pem:指定用于签名etcd证书的CA文件的路径。 # -ca-key=/etc/etcd/ssl/etcd-ca-key.pem:指定用于签名etcd证书的CA私钥文件的路径。 # -config=ca-config.json:指定CA配置文件的路径,该文件定义了证书的有效期、加密算法等设置。 # -hostname=xxxx:指定要为etcd生成证书的主机名和IP地址列表。 # -profile=kubernetes:指定使用的证书配置文件,该文件定义了证书的用途和扩展属性。 # etcd-csr.json:指定etcd证书请求的JSON文件的路径,该文件包含了证书请求的详细信息。 # | cfssljson -bare /etc/etcd/ssl/etcd:通过管道将cfssl命令的输出传递给cfssljson命令,并使用-bare参数指定输出文件的 # 前缀路径,这里将生成etcd证书的.pem和-key.pem文件。 # # 这条命令的作用是使用指定的CA证书和私钥,根据证书请求的JSON文件和配置文件生成etcd的证书文件。
################################################## 说明结束 ##################################################
4) 将etcd证书复制到其他的Master节点 [root@srv1 pki]# for NODE in $master do ssh $NODE "mkdir -p /etc/etcd/ssl" for FILE in etcd-ca-key.pem etcd-ca.pem etcd-key.pem etcd.pem do scp /etc/etcd/ssl/${FILE} $NODE:/etc/etcd/ssl/${FILE} done done
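# 可选示例: 确认证书已分发到各Master节点,并检查etcd.pem的SAN是否包含全部etcd节点
[root@srv1 pki]# for NODE in $master; do echo $NODE; ssh $NODE "ls /etc/etcd/ssl | wc -l"; done
[root@srv1 pki]# openssl x509 -noout -text -in /etc/etcd/ssl/etcd.pem | grep -A1 "Subject Alternative Name"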
5) 生成k8s相关证书---仅在srv1上操作 (1) 在所有Master节点创建目录 [root@srv1 pki]# mkdir -p /etc/kubernetes/pki
(2) 生成证书所需的配置文件 [root@srv1 pki]# cat > ca-csr.json << EOF { "CN": "kubernetes", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "Kubernetes", "OU": "Kubernetes-manual" } ], "ca": { "expiry": "876000h" } } EOF
################################################## 参数说明 ##################################################
# 这是一个用于生成 Kubernetes 相关证书的配置文件。该配置文件中包含以下信息: # # - CN:CommonName,即用于标识证书的通用名称。在此配置中,CN 设置为 "kubernetes",表示该证书是用于 Kubernetes。 # - key:用于生成证书的算法和大小。在此配置中,使用的算法是 RSA,大小是 2048 位。 # - names:用于证书中的名称字段的详细信息。在此配置中,有以下字段信息: # - C:Country,即国家。在此配置中,设置为 "CN"。 # - ST:State,即省/州。在此配置中,设置为 "Beijing"。 # - L:Locality,即城市。在此配置中,设置为 "Beijing"。 # - O:Organization,即组织。在此配置中,设置为 "Kubernetes"。 # - OU:Organization Unit,即组织单位。在此配置中,设置为 "Kubernetes-manual"。 # - ca:用于证书签名的证书颁发机构(CA)的配置信息。在此配置中,设置了证书的有效期为 876000 小时。 # # 这个配置文件可以用于生成 Kubernetes 相关的证书,以确保集群中的通信安全性。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert -initca ca-csr.json | cfssljson -bare /etc/kubernetes/pki/ca 2023/10/01 10:02:56 [INFO] generating a new CA key and certificate from CSR 2023/10/01 10:02:56 [INFO] generate received request 2023/10/01 10:02:56 [INFO] received CSR 2023/10/01 10:02:56 [INFO] generating key: rsa-2048 2023/10/01 10:02:56 [INFO] encoded CSR 2023/10/01 10:02:56 [INFO] signed certificate with serial number 315198981720649652046586466540872385474822878035
################################################## 参数说明 ##################################################
# 具体的解释如下: # # cfssl是一个用于生成TLS/SSL证书的工具,它支持PKI、JSON格式配置文件以及与许多其他集成工具的配合使用。 # # gencert参数表示生成证书的操作。-initca参数表示初始化一个CA(证书颁发机构)。CA是用于签发其他证书的根证书。ca-csr. # json是一个JSON格式的配置文件,其中包含了CA的详细信息,如私钥、公钥、有效期等。这个文件提供了生成CA证书所需的信息。 # # | 符号表示将上一个命令的输出作为下一个命令的输入。 # # cfssljson是cfssl工具的一个子命令,用于格式化cfssl生成的JSON数据。 -bare参数表示直接输出裸证书,即只生成证书文件, # 不包含其他格式的文件。/etc/kubernetes/pki/ca是指定生成的证书文件的路径和名称。 # # 所以,这条命令的含义是使用cfssl工具根据配置文件ca-csr.json生成一个CA证书,并将证书文件保存在/etc/kubernetes/pki/ca # 路径下。 ################################################## 说明结束 ##################################################
[root@srv1 pki]# cat > apiserver-csr.json << EOF { "CN": "kube-apiserver", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "Kubernetes", "OU": "Kubernetes-manual" } ] } EOF
################################################## 参数说明 ##################################################
# 这是一个用于生成 Kubernetes 相关证书的配置文件。该配置文件中包含以下信息: # # - `CN` 字段指定了证书的通用名称 (Common Name),这里设置为 "kube-apiserver",表示该证书用于 Kubernetes API Server。 # - `key` 字段指定了生成证书时所选用的加密算法和密钥长度。这里选用了 RSA 算法,密钥长度为 2048 位。 # - `names` 字段包含了一组有关证书持有者信息的项。这里使用了以下信息: # - `C` 表示国家代码 (Country),这里设置为 "CN" 表示中国。 # - `ST` 表示州或省份 (State),这里设置为 "Beijing" 表示北京市。 # - `L` 表示城市或地区 (Location),这里设置为 "Beijing" 表示北京市。 # - `O` 表示组织名称 (Organization),这里设置为 "Kubernetes" 表示 Kubernetes。 # - `OU` 表示组织单位 (Organizational Unit),这里设置为 "Kubernetes-manual" 表示手动管理的 Kubernetes 集群。 # # 这个配置文件可以用于生成 Kubernetes 相关的证书,以确保集群中的通信安全性。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/ca.pem \ -ca-key=/etc/kubernetes/pki/ca-key.pem \ -config=ca-config.json \ # 此行太长,有回车,请勿直接复制 -hostname=10.96.0.1,192.168.1.21,127.0.0.1,kubernetes,kubernetes.default,kubernetes.default.svc, kubernetes.default.svc.cluster,kubernetes.default.svc.cluster.local,srv1.1000y.cloud,srv2.1000y.cloud, srv3.1000y.cloud,srv4.1000y.cloud,srv5.1000y.cloud,srv6.1000y.cloud,192.168.1.11,192.168.1.12,192.168.1.13, 192.168.1.14,192.168.1.15,192.168.1.16 \ -profile=kubernetes \ apiserver-csr.json | cfssljson -bare /etc/kubernetes/pki/apiserver 2023/10/01 10:08:53 [INFO] generate received request 2023/10/01 10:08:53 [INFO] received CSR 2023/10/01 10:08:53 [INFO] generating key: rsa-2048 2023/10/01 10:08:53 [INFO] encoded CSR 2023/10/01 10:08:53 [INFO] signed certificate with serial number 488400570471420854343650941206806954966716167053
################################################## 参数说明 ##################################################
# 生成一个根证书 ,可多写了一些IP作为预留IP,为将来添加node做准备 # 10.96.0.1是service网段的第一个地址,需要计算,192.168.1.21为高可用vip地址
# 命令的参数解释如下: # - `-ca=/etc/kubernetes/pki/ca.pem`:指定证书的颁发机构(CA)文件路径。 # - `-ca-key=/etc/kubernetes/pki/ca-key.pem`:指定证书的颁发机构(CA)私钥文件路径。 # - `-config=ca-config.json`:指定证书生成的配置文件路径,配置文件中包含了证书的有效期、加密算法等信息。 # - `-hostname=10.96.0.1,192.168.1.21,127.0.0.1,...`:指定证书的主机名或IP地址列表,此处包含Service网段首地址10.96.0.1、高可用VIP 192.168.1.21以及各节点的FQDN与IP。 # - `-profile=kubernetes`:指定证书生成的配置文件中的配置文件名。 # - `apiserver-csr.json`:API Server的证书签名请求配置文件路径。 # - `| cfssljson -bare /etc/kubernetes/pki/apiserver`:通过管道将生成的证书输出到cfssljson工具,将其转换为PEM编码 # 格式,并保存到 `/etc/kubernetes/pki/apiserver.pem` 和 `/etc/kubernetes/pki/apiserver-key.pem` 文件中。 # # 最终,这个命令将会生成API Server的证书和私钥,并保存到指定的文件中
################################################## 说明结束 ##################################################
(3) 生成apiserver聚合证书 [root@srv1 pki]# cat > front-proxy-ca-csr.json << EOF { "CN": "kubernetes", "key": { "algo": "rsa", "size": 2048 }, "ca": { "expiry": "876000h" } } EOF
################################################## 参数说明 ##################################################
# 这个JSON文件表示了生成一个名为"kubernetes"的证书的配置信息。这个证书是用来进行Kubernetes集群的身份验证和安全通信。 # # 配置信息包括以下几个部分: # # 1. "CN": "kubernetes":这表示了证书的通用名称(Common Name),也就是证书所代表的实体的名称。在这里,证书的通用名称 # 被设置为"kubernetes",表示这个证书是用来代表Kubernetes集群。 # # 2. "key":这是用来生成证书的密钥相关的配置。在这里,配置使用了RSA算法,并且设置了密钥的大小为2048位。 # # 3. "ca":这个字段指定了证书的颁发机构(Certificate Authority)相关的配置。在这里,配置指定了证书的有效期为876000小时, # 即100年。这意味着该证书在100年内将被视为有效,过期后需要重新生成。 # # 总之,这个JSON文件中的配置信息描述了如何生成一个用于Kubernetes集群的证书,包括证书的通用名称、密钥算法和大小以及证书的有效期。 ################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -initca front-proxy-ca-csr.json | cfssljson -bare /etc/kubernetes/pki/front-proxy-ca 2023/10/01 10:13:06 [INFO] generating a new CA key and certificate from CSR 2023/10/01 10:13:06 [INFO] generate received request 2023/10/01 10:13:06 [INFO] received CSR 2023/10/01 10:13:06 [INFO] generating key: rsa-2048 2023/10/01 10:13:07 [INFO] encoded CSR 2023/10/01 10:13:07 [INFO] signed certificate with serial number 213355231329657341963920104480364945349912115878
################################################## 参数说明 ##################################################
# cfssl是一个用于生成TLS/SSL证书的工具,它支持PKI、JSON格式配置文件以及与许多其他集成工具的配合使用。 # # gencert参数表示生成证书的操作。-initca参数表示初始化一个CA(证书颁发机构)。CA是用于签发其他证书的根证书。 # front-proxy-ca-csr.json是一个JSON格式的配置文件,其中包含了CA的详细信息,如私钥、公钥、有效期等。这个文件提供了生成 # CA证书所需的信息。 # # | 符号表示将上一个命令的输出作为下一个命令的输入。 # # cfssljson是cfssl工具的一个子命令,用于格式化cfssl生成的JSON数据。 -bare参数表示直接输出裸证书,即只生成证书文件,不包含 # 其他格式的文件。/etc/kubernetes/pki/front-proxy-ca是指定生成的证书文件的路径和名称。 # # 所以,这条命令的含义是使用cfssl工具根据配置文件ca-csr.json生成一个CA证书,并将证书文件保存在/etc/kubernetes/pki/front-proxy-ca路径下。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cat > front-proxy-client-csr.json << EOF { "CN": "front-proxy-client", "key": { "algo": "rsa", "size": 2048 } } EOF
################################################## 参数说明 ##################################################
# 这是一个JSON格式的配置文件,用于描述一个名为"front-proxy-client"的配置。配置包括两个字段:CN和key。 # # - CN(Common Name)字段表示证书的通用名称,这里为"front-proxy-client"。 # - key字段描述了密钥的算法和大小。"algo"表示使用RSA算法,"size"表示密钥大小为2048位。 # # 该配置文件用于生成一个SSL证书,用于在前端代理客户端进行认证和数据传输的加密。这个证书中的通用名称是"front-proxy-client", # 使用RSA算法生成,密钥大小为2048位。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/front-proxy-ca.pem \ -ca-key=/etc/kubernetes/pki/front-proxy-ca-key.pem \ -config=ca-config.json \ -profile=kubernetes \ front-proxy-client-csr.json | cfssljson -bare /etc/kubernetes/pki/front-proxy-client 2023/10/01 10:16:46 [INFO] generate received request 2023/10/01 10:16:46 [INFO] received CSR 2023/10/01 10:16:46 [INFO] generating key: rsa-2048 2023/10/01 10:16:47 [INFO] encoded CSR 2023/10/01 10:16:47 [INFO] signed certificate with serial number 441865382837329977157846346782030973793266800184 2023/10/01 10:16:47 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
################################################## 参数说明 ##################################################
# - `-ca=/etc/kubernetes/pki/front-proxy-ca.pem`: 指定用于签署证书的根证书文件路径。 # - `-ca-key=/etc/kubernetes/pki/front-proxy-ca-key.pem`: 指定用于签署证书的根证书的私钥文件路径。 # - `-config=ca-config.json`: 指定用于配置证书签署的配置文件路径。该配置文件描述了证书生成的一些规则,如加密算法和有效期等。 # - `-profile=kubernetes`: 指定生成证书时使用的配置文件中定义的profile,其中包含了一些默认的参数。 # - `front-proxy-client-csr.json`: 指定用于生成证书的CSR文件路径,该文件包含了证书请求的相关信息。 # - `| cfssljson -bare /etc/kubernetes/pki/front-proxy-client`: 通过管道将生成的证书输出到cfssljson工具进行解析, # 并通过`-bare`参数将证书和私钥分别保存到指定路径。 # # 这个命令的作用是根据提供的CSR文件和配置信息,使用指定的根证书和私钥生成一个前端代理客户端的证书,并将证书和私钥分别 # 保存到`/etc/kubernetes/pki/front-proxy-client.pem`和`/etc/kubernetes/pki/front-proxy-client-key.pem`文件中。 ################################################## 说明结束 ##################################################
6) 生成controller-manager的证书---仅在srv1上操作 [root@srv1 pki]# cat > manager-csr.json << EOF { "CN": "system:kube-controller-manager", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "system:kube-controller-manager", "OU": "Kubernetes-manual" } ] } EOF
################################################## 参数说明 ##################################################
# 这是一个用于生成密钥对(公钥和私钥)的JSON配置文件。下面是针对该文件中每个字段的详细解释: # # - "CN": 值为"system:kube-controller-manager",代表通用名称(Common Name),是此密钥对的主题(subject)。 # - "key": 这个字段用来定义密钥算法和大小。 # - "algo": 值为"rsa",表示使用RSA算法。 # - "size": 值为2048,表示生成的密钥大小为2048位。 # - "names": 这个字段用来定义密钥对的各个名称字段。 # - "C": 值为"CN",表示国家(Country)名称是"CN"(中国)。 # - "ST": 值为"Beijing",表示省/州(State/Province)名称是"Beijing"(北京)。 # - "L": 值为"Beijing",表示城市(Locality)名称是"Beijing"(北京)。 # - "O": 值为"system:kube-controller-manager",表示组织(Organization)名称是"system:kube-controller-manager"。 # - "OU": 值为"Kubernetes-manual",表示组织单位(Organizational Unit)名称是"Kubernetes-manual"。 # # 这个JSON配置文件基本上是告诉生成密钥对的工具,生成一个带有特定名称和属性的密钥对。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/ca.pem \ -ca-key=/etc/kubernetes/pki/ca-key.pem \ -config=ca-config.json \ -profile=kubernetes \ manager-csr.json | cfssljson -bare /etc/kubernetes/pki/controller-manager 2023/10/01 10:21:24 [INFO] generate received request 2023/10/01 10:21:24 [INFO] received CSR 2023/10/01 10:21:24 [INFO] generating key: rsa-2048 2023/10/01 10:21:26 [INFO] encoded CSR 2023/10/01 10:21:26 [INFO] signed certificate with serial number 117808291191094199146737417506094326400779876505 2023/10/01 10:21:26 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
################################################## 参数说明 ##################################################
# 1. `cfssl gencert` 是cfssl工具的命令,用于生成证书。 # 2. `-ca` 指定根证书的路径和文件名,这里是`/etc/kubernetes/pki/ca.pem`。 # 3. `-ca-key` 指定根证书的私钥的路径和文件名,这里是`/etc/kubernetes/pki/ca-key.pem`。 # 4. `-config` 指定配置文件的路径和文件名,这里是`ca-config.json`。 # 5. `-profile` 指定证书使用的配置文件中的配置模板,这里是`kubernetes`。 # 6. `manager-csr.json` 是证书签发请求的配置文件,用于生成证书签发请求。 # 7. `|` 管道操作符,将前一条命令的输出作为后一条命令的输入。 # 8. `cfssljson -bare` 是 cfssl 工具的命令,作用是将证书签发请求的输出转换为PKCS#1、PKCS#8和x509 PEM文件。 # 9. `/etc/kubernetes/pki/controller-manager` 是转换后的 PEM 文件的存储位置和文件名。 # # 这个命令的作用是根据根证书和私钥、配置文件以及证书签发请求的配置文件,生成经过签发的控制器管理器证书和私钥,并将转换后 # 的 PEM 文件保存到指定的位置。 ################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.1.21:16443 \ --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig Cluster "kubernetes" set.
################################################## 参数说明 ##################################################
# 若使用 haproxy、keepalived 那么为 `--server=https://192.168.1.21:16443` # 若使用 nginx方案,那么为 `--server=https://127.0.0.1:16443`
# kubectl config set-cluster命令用于配置集群信息。 # --certificate-authority选项指定了集群的证书颁发机构(CA)的路径,这个CA会验证kube-apiserver提供的证书是否合法。 # --embed-certs选项用于将证书嵌入到生成的kubeconfig文件中,这样就不需要在kubeconfig文件中单独指定证书文件路径。 # --server选项指定了kube-apiserver的地址,这里使用的是https://192.168.1.21:16443,即高可用VIP地址与负载均衡端口16443。 # --kubeconfig选项指定了生成的kubeconfig文件的路径和名称,这里指定为/etc/kubernetes/controller-manager.kubeconfig。 # 综上所述,kubectl config set-cluster命令的作用是在kubeconfig文件中设置集群信息,包括证书颁发机构、证书、 # kube-apiserver地址等。
# 设置一个环境项,一个上下文 [root@srv1 pki]# kubectl config set-context system:kube-controller-manager@kubernetes \ --cluster=kubernetes \ --user=system:kube-controller-manager \ --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig Context "system:kube-controller-manager@kubernetes" created.
################################################## 参数说明 ##################################################
# 这个命令用于配置 Kubernetes 控制器管理器的上下文信息。下面是各个参数的详细解释: # 1. `kubectl config set-context system:kube-controller-manager@kubernetes`: # 设置上下文的名称为 `system:kube-controller-manager@kubernetes`,这是一个标识符,用于唯一标识该上下文。 # 2. `--cluster=kubernetes`: 指定集群的名称为 `kubernetes`,这是一个现有集群的标识符,表示要管理的 Kubernetes 集群。 # 3. `--user=system:kube-controller-manager`: 指定使用的用户身份为 `system:kube-controller-manager`。 # 这是一个特殊的用户身份,具有控制 Kubernetes 控制器管理器的权限。 # 4. `--kubeconfig=/etc/kubernetes/controller-manager.kubeconfig`: 指定 kubeconfig 文件的路径为 # `/etc/kubernetes/controller-manager.kubeconfig`。kubeconfig 文件是一个用于管理 Kubernetes 配置的文件, # 包含了集群、用户和上下文的相关信息。 # 通过运行这个命令,可以将这些配置信息保存到 `/etc/kubernetes/controller-manager.kubeconfig` 文件中, # 以便在后续的操作中使用。
################################################## 说明结束 ##################################################
# 设置一个用户 [root@srv1 pki]# kubectl config set-credentials system:kube-controller-manager \ --client-certificate=/etc/kubernetes/pki/controller-manager.pem \ --client-key=/etc/kubernetes/pki/controller-manager-key.pem \ --embed-certs=true \ --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig User "system:kube-controller-manager" set.
################################################## 参数说明 ##################################################
# 上述命令是用于设置 Kubernetes 的 controller-manager 组件的客户端凭据。下面是每个参数的详细解释: # # - `kubectl config`: 是使用 kubectl 命令行工具的配置子命令。 # - `set-credentials`: 是定义一个新的用户凭据配置的子命令。 # - `system:kube-controller-manager`: 是设置用户凭据的名称,`system:` 是 Kubernetes API Server # 内置的身份验证器使用的用户标识符前缀,它表示是一个系统用户,在本例中是 kube-controller-manager 组件使用的身份。 # - `--client-certificate=/etc/kubernetes/pki/controller-manager.pem`: 指定 controller-manager.pem # 客户端证书的路径。 # - `--client-key=/etc/kubernetes/pki/controller-manager-key.pem`: 指定 controller-manager-key.pem # 客户端私钥的路径。 # - `--embed-certs=true`: 表示将证书和私钥直接嵌入到生成的 kubeconfig 文件中,而不是通过引用外部文件。 # - `--kubeconfig=/etc/kubernetes/controller-manager.kubeconfig`: 指定生成的 kubeconfig 文件的路径和文件名, # 即 controller-manager.kubeconfig。 # # 通过运行上述命令,将根据提供的证书和私钥信息,为 kube-controller-manager 创建一个 kubeconfig 文件,以便后续使用该文件进行身份验证和访问 Kubernetes API。
################################################## 说明结束 ##################################################
# 设置默认环境 [root@srv1 pki]# kubectl config use-context system:kube-controller-manager@kubernetes \ --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig Switched to context "system:kube-controller-manager@kubernetes".
################################################## 参数说明 ##################################################
# 这个命令是用来指定kubectl使用指定的上下文环境来执行操作。上下文环境是kubectl用来确定要连接到哪个Kubernetes集群以及 # 使用哪个身份验证信息的配置。 # # 在这个命令中,`kubectl config use-context`是用来设置当前上下文环境的命令。 # # system:kube-controller-manager@kubernetes`是指定的上下文名称,它告诉kubectl要使用的Kubernetes集群和身份验证信息。 # # `--kubeconfig=/etc/kubernetes/controller-manager.kubeconfig`是用来指定使用的kubeconfig文件的路径。 # kubeconfig文件是存储集群连接和身份验证信息的配置文件。 # # 通过执行这个命令,kubectl将使用指定的上下文来执行后续的操作,包括部署和管理Kubernetes资源。
################################################## 说明结束 ##################################################
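# 可选示例: 检查生成的kubeconfig内容与当前上下文是否正确(之后的scheduler与admin的kubeconfig也可用同样方式检查)
[root@srv1 pki]# kubectl config get-contexts --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig
[root@srv1 pki]# kubectl config view --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig --minify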
7) 生成kube-scheduler的证书---仅在srv1上操作 [root@srv1 pki]# cat > scheduler-csr.json << EOF { "CN": "system:kube-scheduler", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "system:kube-scheduler", "OU": "Kubernetes-manual" } ] } EOF
################################################## 参数说明 ##################################################
# 这个命令是用来创建一个叫做scheduler-csr.json的文件,并将其中的内容赋值给该文件。 # # 文件内容是一个JSON格式的文本,包含了一个描述证书请求的结构。 # # 具体内容如下: # # - "CN": "system:kube-scheduler":Common Name字段,表示该证书的名称为system:kube-scheduler。 # - "key": {"algo": "rsa", "size": 2048}:key字段指定生成证书时使用的加密算法是RSA,并且密钥的长度为2048位。 # - "names": [...]:names字段定义了证书中的另外一些标识信息。 # - "C": "CN":Country字段,表示国家/地区为中国。 # - "ST": "Beijing":State字段,表示省/市为北京。 # - "L": "Beijing":Locality字段,表示所在城市为北京。 # - "O": "system:kube-scheduler":Organization字段,表示组织为system:kube-scheduler。 # - "OU": "Kubernetes-manual":Organizational Unit字段,表示组织单元为Kubernetes-manual。 # # 而EOF是一个占位符,用于标记开始和结束的位置。在开始的EOF之后到结束的EOF之间的内容将会被写入到scheduler-csr.json文件中。 # # 总体来说,这个命令用于生成一个描述kube-scheduler证书请求的JSON文件。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/ca.pem \ -ca-key=/etc/kubernetes/pki/ca-key.pem \ -config=ca-config.json \ -profile=kubernetes \ scheduler-csr.json | cfssljson -bare /etc/kubernetes/pki/scheduler 2023/10/01 10:34:27 [INFO] generate received request 2023/10/01 10:34:27 [INFO] received CSR 2023/10/01 10:34:27 [INFO] generating key: rsa-2048 2023/10/01 10:34:27 [INFO] encoded CSR 2023/10/01 10:34:27 [INFO] signed certificate with serial number 730081275072501872348691828045941635294752784571 2023/10/01 10:34:27 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
################################################## 参数说明 ##################################################
# 1. `cfssl gencert`:使用cfssl工具生成证书。 # 2. `-ca=/etc/kubernetes/pki/ca.pem`:指定根证书文件的路径。在这里,是指定根证书的路径为`/etc/kubernetes/pki/ca.pem`。 # 3. `-ca-key=/etc/kubernetes/pki/ca-key.pem`:指定根证书私钥文件的路径。在这里,是指定根证书私钥的 # 路径为`/etc/kubernetes/pki/ca-key.pem`。 # 4. `-config=ca-config.json`:指定证书配置文件的路径。在这里,是指定证书配置文件的路径为`ca-config.json`。 # 5. `-profile=kubernetes`:指定证书的配置文件中的一个配置文件模板。在这里,是指定配置文件中的`kubernetes`配置模板。 # 6. `scheduler-csr.json`:指定Scheduler的证书签名请求文件(CSR)的路径。在这里,是指定请求文件的路径为 # `scheduler-csr.json`。 # 7. `|`(管道符号):将前一个命令的输出作为下一个命令的输入。 # 8. `cfssljson`:将cfssl工具生成的证书签名请求(CSR)进行解析。 # 9. `-bare /etc/kubernetes/pki/scheduler`:指定输出路径和前缀。在这里,是将解析的证书签名请求生成以下文件: # `/etc/kubernetes/pki/scheduler.pem`(包含了证书)、`/etc/kubernetes/pki/scheduler-key.pem`(包含了私钥)。 # # 总结来说,这个命令的目的是根据根证书、根证书私钥、证书配置文件、CSR文件等生成Kubernetes Scheduler的证书和私钥文件。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.1.21:16443 \ --kubeconfig=/etc/kubernetes/scheduler.kubeconfig Cluster "kubernetes" set.
################################################## 参数说明 ##################################################
# 若使用 haproxy、keepalived 那么为 `--server=https://192.168.1.21:16443` # 若使用 nginx方案,那么为 `--server=https://127.0.0.1:16443`
# - `kubectl config set-cluster kubernetes`: 设置一个集群并命名为"kubernetes"。 # - `--certificate-authority=/etc/kubernetes/pki/ca.pem`: 指定集群使用的证书授权机构的路径。 # - `--embed-certs=true`: 该标志指示将证书嵌入到生成的kubeconfig文件中。 # - `--server=https://192.168.1.21:16443`: 指定集群的 API server 位置,此处为高可用VIP地址及端口。 # - `--kubeconfig=/etc/kubernetes/scheduler.kubeconfig`: 指定要保存 kubeconfig 文件的路径和名称。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-credentials system:kube-scheduler \ --client-certificate=/etc/kubernetes/pki/scheduler.pem \ --client-key=/etc/kubernetes/pki/scheduler-key.pem \ --embed-certs=true \ --kubeconfig=/etc/kubernetes/scheduler.kubeconfig User "system:kube-scheduler" set.
################################################## 参数说明 ##################################################
# 这段命令是用于设置 kube-scheduler 组件的身份验证凭据,并生成相应的 kubeconfig 文件。
# - `kubectl config set-credentials system:kube-scheduler`:设置 `system:kube-scheduler` 用户的身份验证凭据。 # # - `--client-certificate=/etc/kubernetes/pki/scheduler.pem`:指定一个客户端证书文件,用于基于证书的身份验证。 # 在这种情况下,指定了 kube-scheduler 组件的证书文件路径。 # # - `--client-key=/etc/kubernetes/pki/scheduler-key.pem`:指定与客户端证书相对应的客户端私钥文件。 # # - `--embed-certs=true`:将客户端证书和私钥嵌入到生成的 kubeconfig 文件中。 # # - `--kubeconfig=/etc/kubernetes/scheduler.kubeconfig`:指定生成的 kubeconfig 文件的路径和名称。 # # 该命令的目的是为 kube-scheduler 组件生成一个 kubeconfig 文件,以便进行身份验证和访问集群资源。kubeconfig 文件是 # 一个包含了连接到 Kubernetes 集群所需的所有配置信息的文件,包括服务器地址、证书和秘钥等。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-context system:kube-scheduler@kubernetes \ --cluster=kubernetes \ --user=system:kube-scheduler \ --kubeconfig=/etc/kubernetes/scheduler.kubeconfig Context "system:kube-scheduler@kubernetes" created.
################################################## 参数说明 ##################################################
# 该命令用于设置一个名为"system:kube-scheduler@kubernetes"的上下文,具体配置如下: # # 1. --cluster=kubernetes: 指定集群的名称为"kubernetes",这个集群是在当前的kubeconfig文件中已经定义好的。 # # 2. --user=system:kube-scheduler: 指定用户的名称为"system:kube-scheduler",这个用户也是在当前的kubeconfig文件中 # 已经定义好的。这个用户用于认证和授权kube-scheduler组件访问Kubernetes集群的权限。 # # 3. --kubeconfig=/etc/kubernetes/scheduler.kubeconfig: 指定kubeconfig文件的路径为 # "/etc/kubernetes/scheduler.kubeconfig",这个文件将被用来保存上下文的配置信息。 # # 这个命令的作用是将上述的配置信息保存到指定的kubeconfig文件中,以便后续使用该文件进行认证和授权访问Kubernetes集群。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config use-context system:kube-scheduler@kubernetes \ --kubeconfig=/etc/kubernetes/scheduler.kubeconfig Switched to context "system:kube-scheduler@kubernetes".
################################################## 参数说明 ##################################################
# 上述命令是使用`kubectl`命令来配置Kubernetes集群中的调度器组件。 # # `kubectl config use-context`命令用于切换`kubectl`当前使用的上下文。上下文是Kubernetes集群、用户和命名空间的组合, # 用于确定`kubectl`的连接目标。下面解释这个命令的不同部分: # # - `system:kube-scheduler@kubernetes`是一个上下文名称。它指定了使用`kube-scheduler`用户和`kubernetes`命名空间的 # 系统级别上下文。系统级别上下文用于操作Kubernetes核心组件。 # # - `--kubeconfig=/etc/kubernetes/scheduler.kubeconfig`用于指定Kubernetes配置文件的路径。Kubernetes配置文件包含 # 连接到Kubernetes集群所需的身份验证和连接信息。 # # 通过运行以上命令,`kubectl`将使用指定的上下文和配置文件,以便在以后的命令中能正确地与Kubernetes集群中的调度器组件进行交互。
################################################## 说明结束 ##################################################
8) 生成admin的证书配置---仅在srv1上操作 [root@srv1 pki]# cat > admin-csr.json << EOF { "CN": "admin", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "system:masters", "OU": "Kubernetes-manual" } ] } EOF
################################################## 参数说明 ##################################################
# 这段代码是一个JSON格式的配置文件,用于创建和配置一个名为"admin"的Kubernetes凭证。 # # 这个凭证包含以下字段: # # - "CN": "admin": 这是凭证的通用名称,表示这是一个管理员凭证。 # - "key": 这是一个包含证书密钥相关信息的对象。 # - "algo": "rsa":这是使用的加密算法类型,这里是RSA加密算法。 # - "size": 2048:这是密钥的大小,这里是2048位。 # - "names": 这是一个包含证书名称信息的数组。 # - "C": "CN":这是证书的国家/地区字段,这里是中国。 # - "ST": "Beijing":这是证书的省/州字段,这里是北京。 # - "L": "Beijing":这是证书的城市字段,这里是北京。 # - "O": "system:masters":这是证书的组织字段,这里是system:masters,表示系统的管理员组。 # - "OU": "Kubernetes-manual":这是证书的部门字段,这里是Kubernetes-manual。 # # 通过这个配置文件创建的凭证将具有管理员权限,并且可以用于管理Kubernetes集群。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/ca.pem \ -ca-key=/etc/kubernetes/pki/ca-key.pem \ -config=ca-config.json \ -profile=kubernetes \ admin-csr.json | cfssljson -bare /etc/kubernetes/pki/admin 2023/10/01 10:47:36 [INFO] generate received request 2023/10/01 10:47:36 [INFO] received CSR 2023/10/01 10:47:36 [INFO] generating key: rsa-2048 2023/10/01 10:47:38 [INFO] encoded CSR 2023/10/01 10:47:38 [INFO] signed certificate with serial number 630394483751146065645360514719405740813841655498 2023/10/01 10:47:38 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
################################################## 参数说明 ##################################################
# 上述命令是使用cfssl工具生成Kubernetes admin的证书。 # # 具体解释如下: # # 1. `cfssl gencert`:使用cfssl工具生成证书。 # # 2. `-ca=/etc/kubernetes/pki/ca.pem`:指定根证书文件的路径。在这里,是指定根证书的路径为 # `/etc/kubernetes/pki/ca.pem`。 # # 3. `-ca-key=/etc/kubernetes/pki/ca-key.pem`:指定根证书私钥文件的路径。在这里,是指定根证书私钥的路径为 # `/etc/kubernetes/pki/ca-key.pem`。 # # 4. `-config=ca-config.json`:指定证书配置文件的路径。在这里,是指定证书配置文件的路径为`ca-config.json`。 # # 5. `-profile=kubernetes`:指定证书的配置文件中的一个配置文件模板。在这里,是指定配置文件中的`kubernetes`配置模板。 # # 6. `admin-csr.json`:指定admin的证书签名请求文件(CSR)的路径。在这里,是指定请求文件的路径为`admin-csr.json`。 # # 7. `|`(管道符号):将前一个命令的输出作为下一个命令的输入。 # # 8. `cfssljson`:将cfssl工具生成的证书签名请求(CSR)进行解析。 # # 9. `-bare /etc/kubernetes/pki/admin`:指定输出路径和前缀。在这里,是将解析的证书签名请求生成以下文件: # `/etc/kubernetes/pki/admin.pem`(包含了证书)、`/etc/kubernetes/pki/admin-key.pem`(包含了私钥)。 # # 总结来说,这个命令的目的是根据根证书、根证书私钥、证书配置文件、CSR文件等生成Kubernetes Scheduler的证书和私钥文件。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.1.21:16443 \ --kubeconfig=/etc/kubernetes/admin.kubeconfig Cluster "kubernetes" set.
################################################## 参数说明 ##################################################
# 若使用 haproxy、keepalived 那么为 `--server=https://192.168.1.21:16443` # 若使用 nginx方案,那么为 `--server=https://127.0.0.1:16443` # # 该命令用于配置一个名为"kubernetes"的集群,并将其应用到/etc/kubernetes/admin.kubeconfig文件中。 # # 该命令的解释如下: # - `kubectl config set-cluster kubernetes`: 设置一个集群并命名为"kubernetes"。 # - `--certificate-authority=/etc/kubernetes/pki/ca.pem`: 指定集群使用的证书授权机构的路径。 # - `--embed-certs=true`: 该标志指示将证书嵌入到生成的kubeconfig文件中。 # - `--server=https://192.168.1.21:16443`: 指定集群的 API server 位置,此处为高可用VIP地址及端口。 # - `--kubeconfig=/etc/kubernetes/admin.kubeconfig`: 指定要保存 kubeconfig 文件的路径和名称。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-credentials kubernetes-admin \ --client-certificate=/etc/kubernetes/pki/admin.pem \ --client-key=/etc/kubernetes/pki/admin-key.pem \ --embed-certs=true \ --kubeconfig=/etc/kubernetes/admin.kubeconfig User "kubernetes-admin" set.
################################################## 参数说明 ##################################################
# 这段命令是用于设置 kubernetes-admin 组件的身份验证凭据,并生成相应的 kubeconfig 文件。 # # 解释每个选项的含义如下: # - `kubectl config set-credentials kubernetes-admin`:设置 `kubernetes-admin` 用户的身份验证凭据。 # # - `--client-certificate=/etc/kubernetes/pki/admin.pem`:指定一个客户端证书文件,用于基于证书的身份验证。 # 在这种情况下,指定了 admin 组件的证书文件路径。 # - `--client-key=/etc/kubernetes/pki/admin-key.pem`:指定与客户端证书相对应的客户端私钥文件。 # # - `--embed-certs=true`:将客户端证书和私钥嵌入到生成的 kubeconfig 文件中。 # # - `--kubeconfig=/etc/kubernetes/admin.kubeconfig`:指定生成的 kubeconfig 文件的路径和名称。 # # 该命令的目的是为 admin 组件生成一个 kubeconfig 文件,以便进行身份验证和访问集群资源。kubeconfig 文件是一个包含了 # 连接到 Kubernetes 集群所需的所有配置信息的文件,包括服务器地址、证书和秘钥等。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-context kubernetes-admin@kubernetes \ --cluster=kubernetes \ --user=kubernetes-admin \ --kubeconfig=/etc/kubernetes/admin.kubeconfig Context "kubernetes-admin@kubernetes" created.
################################################## 参数说明 ##################################################
# 该命令用于设置一个名为"kubernetes-admin@kubernetes"的上下文,具体配置如下: # # 1. --cluster=kubernetes: 指定集群的名称为"kubernetes",这个集群是在当前的kubeconfig文件中已经定义好的。 # # 2. --user=kubernetes-admin: 指定用户的名称为"kubernetes-admin",这个用户也是在当前的kubeconfig文件中已经定义好的。 # 这个用户用于认证和授权admin组件访问Kubernetes集群的权限。 # # 3. --kubeconfig=/etc/kubernetes/admin.kubeconfig: 指定kubeconfig文件的路径 # 为"/etc/kubernetes/admin.kubeconfig",这个文件将被用来保存上下文的配置信息。 # # 这个命令的作用是将上述的配置信息保存到指定的kubeconfig文件中,以便后续使用该文件进行认证和授权访问Kubernetes集群。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config use-context kubernetes-admin@kubernetes \ --kubeconfig=/etc/kubernetes/admin.kubeconfig Switched to context "kubernetes-admin@kubernetes".
################################################## 参数说明 ##################################################
# 上述命令用于切换admin kubeconfig文件的当前上下文。
#
# `kubectl config use-context`命令用于切换`kubectl`当前使用的上下文。上下文是集群、用户(和可选的命名空间)的组合,
# 用于确定`kubectl`的连接目标。下面解释这个命令的不同部分:
#
# - `kubernetes-admin@kubernetes`是一个上下文名称。它表示使用`kubernetes-admin`用户访问名为`kubernetes`的集群。
#
# - `--kubeconfig=/etc/kubernetes/admin.kubeconfig`用于指定Kubernetes配置文件的路径。Kubernetes配置文件包含
#   连接到Kubernetes集群所需的身份验证和连接信息。
#
# 通过运行以上命令,后续基于该kubeconfig文件执行的命令都将以kubernetes-admin用户的身份与Kubernetes集群的API server进行交互。
################################################## 说明结束 ##################################################
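(可选)如需确认admin.kubeconfig已正确生成,可参考以下示例命令查看其内容,非必需步骤:
[root@srv1 pki]# kubectl config view --kubeconfig=/etc/kubernetes/admin.kubeconfig
# 正常情况下应能看到名为kubernetes的cluster、kubernetes-admin的user及kubernetes-admin@kubernetes上下文,
# 嵌入的证书内容会被kubectl以省略形式隐藏显示。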
9) 创建kube-proxy证书---仅在srv1上操作 [root@srv1 pki]# cat > kube-proxy-csr.json << EOF { "CN": "system:kube-proxy", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Beijing", "L": "Beijing", "O": "system:kube-proxy", "OU": "Kubernetes-manual" } ] } EOF
################################################## 参数说明 ##################################################
# 这段代码是一个JSON格式的证书签名请求(CSR)配置文件,用于为kube-proxy组件签发客户端证书。
#
# 这个请求包含以下字段:
#
# - "CN": "system:kube-proxy": 证书的通用名称,kube-proxy将以该身份向API server进行认证。
# - "key": 这是一个包含证书密钥相关信息的对象。
#   - "algo": "rsa":这是使用的加密算法类型,这里是RSA加密算法。
#   - "size": 2048:这是密钥的大小,这里是2048位。
# - "names": 这是一个包含证书名称信息的数组。
#   - "C": "CN":这是证书的国家/地区字段,这里是中国。
#   - "ST": "Beijing":这是证书的省/州字段,这里是北京。
#   - "L": "Beijing":这是证书的城市字段,这里是北京。
#   - "O": "system:kube-proxy":这是证书的组织字段,这里是system:kube-proxy。
#   - "OU": "Kubernetes-manual":这是证书的部门字段,这里是Kubernetes-manual。
#
# 通过这个配置文件签发出的证书仅用于kube-proxy与API server之间的认证,不具有管理员权限。
################################################## 说明结束 ##################################################
[root@srv1 pki]# cfssl gencert \ -ca=/etc/kubernetes/pki/ca.pem \ -ca-key=/etc/kubernetes/pki/ca-key.pem \ -config=ca-config.json \ -profile=kubernetes \ kube-proxy-csr.json | cfssljson -bare /etc/kubernetes/pki/kube-proxy 2023/10/01 10:57:07 [INFO] generate received request 2023/10/01 10:57:07 [INFO] received CSR 2023/10/01 10:57:07 [INFO] generating key: rsa-2048 2023/10/01 10:57:08 [INFO] encoded CSR 2023/10/01 10:57:08 [INFO] signed certificate with serial number 517300751372677093868373203239628896395277861636 2023/10/01 10:57:08 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
################################################## 参数说明 ##################################################
# 上述命令是使用cfssl工具生成kube-proxy的证书。
#
# 具体解释如下:
#
# 1. `cfssl gencert`:使用cfssl工具生成证书。
#
# 2. `-ca=/etc/kubernetes/pki/ca.pem`:指定根证书文件的路径为`/etc/kubernetes/pki/ca.pem`。
#
# 3. `-ca-key=/etc/kubernetes/pki/ca-key.pem`:指定根证书私钥文件的路径为`/etc/kubernetes/pki/ca-key.pem`。
#
# 4. `-config=ca-config.json`:指定证书配置文件的路径为`ca-config.json`。
#
# 5. `-profile=kubernetes`:指定使用证书配置文件中的`kubernetes`配置模板。
#
# 6. `kube-proxy-csr.json`:指定kube-proxy的证书签名请求文件(CSR)的路径为`kube-proxy-csr.json`。
#
# 7. `|`(管道符号):将前一个命令的输出作为下一个命令的输入。
#
# 8. `cfssljson`:解析cfssl输出的JSON结果并写入证书文件。
#
# 9. `-bare /etc/kubernetes/pki/kube-proxy`:指定输出路径和前缀,生成以下文件:
#    `/etc/kubernetes/pki/kube-proxy.pem`(证书)、`/etc/kubernetes/pki/kube-proxy-key.pem`(私钥)。
#
# 总结来说,这个命令的目的是根据根证书、根证书私钥、证书配置文件、CSR文件等生成kube-proxy的证书和私钥文件。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.1.21:16443 \ --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig Cluster "kubernetes" set.
################################################## 参数说明 ##################################################
# 若使用 haproxy、keepalived 那么为 `--server=https://192.168.1.21:16443`
# 若使用 nginx方案,那么为 `--server=https://127.0.0.1:16443`
#
# 该命令用于配置一个名为"kubernetes"的集群,并将其写入/etc/kubernetes/kube-proxy.kubeconfig文件中。
#
# 该命令的解释如下:
# - `kubectl config set-cluster kubernetes`: 设置一个集群并命名为"kubernetes"。
# - `--certificate-authority=/etc/kubernetes/pki/ca.pem`: 指定集群使用的证书授权机构(CA)证书的路径。
# - `--embed-certs=true`: 该标志指示将证书内容嵌入到生成的kubeconfig文件中。
# - `--server=https://192.168.1.21:16443`: 指定集群的 API server 位置(这里为VIP地址加负载均衡端口)。
# - `--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig`: 指定要保存 kubeconfig 文件的路径和名称。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-credentials kube-proxy \ --client-certificate=/etc/kubernetes/pki/kube-proxy.pem \ --client-key=/etc/kubernetes/pki/kube-proxy-key.pem \ --embed-certs=true \ --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig User "kube-proxy" set.
################################################## 参数说明 ##################################################
# 这段命令是用于设置 kube-proxy 组件的身份验证凭据,并生成相应的 kubeconfig 文件。 # # 解释每个选项的含义如下: # - `kubectl config set-credentials kube-proxy`:设置 `kube-proxy` 用户的身份验证凭据。 # # - `--client-certificate=/etc/kubernetes/pki/kube-proxy.pem`:指定一个客户端证书文件,用于基于证书的身份验证。 # 在这种情况下,指定了 kube-proxy 组件的证书文件路径。 # # - `--client-key=/etc/kubernetes/pki/kube-proxy-key.pem`:指定与客户端证书相对应的客户端私钥文件。 # # - `--embed-certs=true`:将客户端证书和私钥嵌入到生成的 kubeconfig 文件中。 # # - `--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig`:指定生成的 kubeconfig 文件的路径和名称。 # # 该命令的目的是为 kube-proxy 组件生成一个 kubeconfig 文件,以便进行身份验证和访问集群资源。kubeconfig 文件是一个 # 包含了连接到 Kubernetes 集群所需的所有配置信息的文件,包括服务器地址、证书和秘钥等。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config set-context kube-proxy@kubernetes \ --cluster=kubernetes \ --user=kube-proxy \ --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig Context "kube-proxy@kubernetes" created.
################################################## 参数说明 ##################################################
# 该命令用于设置一个名为"kube-proxy@kubernetes"的上下文,具体配置如下: # # 1. --cluster=kubernetes: 指定集群的名称为"kubernetes",这个集群是在当前的kubeconfig文件中已经定义好的。 # # 2. --user=kube-proxy: 指定用户的名称为"kube-proxy",这个用户也是在当前的kubeconfig文件中已经定义好的。 # 这个用户用于认证和授权kube-proxy组件访问Kubernetes集群的权限。 # # 3. --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig: 指定kubeconfig文件的路径为 # "/etc/kubernetes/kube-proxy.kubeconfig",这个文件将被用来保存上下文的配置信息。 # # 这个命令的作用是将上述的配置信息保存到指定的kubeconfig文件中,以便后续使用该文件进行认证和授权访问Kubernetes集群。
################################################## 说明结束 ##################################################
[root@srv1 pki]# kubectl config use-context kube-proxy@kubernetes \ --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig Switched to context "kube-proxy@kubernetes".
################################################## 参数说明 ##################################################
# 上述命令用于切换kube-proxy kubeconfig文件的当前上下文。
#
# `kubectl config use-context`命令用于切换`kubectl`当前使用的上下文。上下文是集群、用户(和可选的命名空间)的组合,
# 用于确定`kubectl`的连接目标。下面解释这个命令的不同部分:
#
# - `kube-proxy@kubernetes`是一个上下文名称。它表示使用`kube-proxy`用户访问名为`kubernetes`的集群。
#
# - `--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig`用于指定Kubernetes配置文件的路径。Kubernetes配置文件
#   包含连接到Kubernetes集群所需的身份验证和连接信息。
#
# 通过运行以上命令,后续基于该kubeconfig文件发起的请求都将以kube-proxy用户的身份与Kubernetes集群的API server进行交互。
################################################## 说明结束 ##################################################
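(可选)可用openssl检查刚生成的kube-proxy证书的主题与有效期,确认CN为system:kube-proxy。以下为示例性检查命令,非必需步骤:
[root@srv1 pki]# openssl x509 -in /etc/kubernetes/pki/kube-proxy.pem -noout -subject -dates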
10) 创建ServiceAccount Key ——secret---仅在srv1上操作 [root@srv1 pki]# openssl genrsa -out /etc/kubernetes/pki/sa.key 2048 Generating RSA private key, 2048 bit long modulus ...........................+++ ..............................+++ e is 65537 (0x10001)
[root@srv1 pki]# openssl rsa -in /etc/kubernetes/pki/sa.key -pubout -out /etc/kubernetes/pki/sa.pub writing RSA key
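(可选)如需确认sa.key与sa.pub为同一密钥对,可将由私钥导出的公钥与sa.pub做比对。以下为示例命令,非必需步骤:
[root@srv1 pki]# openssl rsa -in /etc/kubernetes/pki/sa.key -pubout 2>/dev/null | diff - /etc/kubernetes/pki/sa.pub && echo "sa key pair OK"
# diff无差异且输出"sa key pair OK"即表示公私钥匹配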
11) 将证书发送到其他master节点 [root@srv1 pki]# for NODE in $master do for FILE in $(ls /etc/kubernetes/pki | grep -v etcd) do scp /etc/kubernetes/pki/${FILE} $NODE:/etc/kubernetes/pki/${FILE} done for FILE in admin.kubeconfig controller-manager.kubeconfig scheduler.kubeconfig do scp /etc/kubernetes/${FILE} $NODE:/etc/kubernetes/${FILE} done done
12) 确认证书数量 [root@srv1 pki]# ls /etc/kubernetes/pki/ admin.csr controller-manager.csr kube-proxy.csr admin-key.pem controller-manager-key.pem kube-proxy-key.pem admin.pem controller-manager.pem kube-proxy.pem apiserver.csr front-proxy-ca.csr sa.key apiserver-key.pem front-proxy-ca-key.pem sa.pub apiserver.pem front-proxy-ca.pem scheduler.csr ca.csr front-proxy-client.csr scheduler-key.pem ca-key.pem front-proxy-client-key.pem scheduler.pem ca.pem front-proxy-client.pem
[root@srv1 pki]# ls /etc/kubernetes/pki/ | wc -l 26
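(可选)可批量查看各证书的到期时间,便于后续规划证书轮换。以下为示例命令(排除*-key.pem私钥文件),非必需步骤:
[root@srv1 pki]# for CERT in $(ls /etc/kubernetes/pki/*.pem | grep -v -- -key.pem)
do
  echo -n "$CERT : "
  openssl x509 -in $CERT -noout -enddate
done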
2.6 etcd集群配置
1) 创建etcd配置文件---三台Master均操作
[root@srv1 pki]# cat > /etc/etcd/etcd.config.yml << EOF
name: 'srv1.1000y.cloud'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://192.168.1.11:2380'
listen-client-urls: 'https://192.168.1.11:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://192.168.1.11:2380'
advertise-client-urls: 'https://192.168.1.11:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/etc/etcd/ssl/etcd.pem'
  key-file: '/etc/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
  key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
################################################## 参数说明 ##################################################
- `name`:指定了当前节点的名称,用于集群中区分不同的节点。 - `data-dir`:指定了 etcd 数据的存储目录。 - `wal-dir`:指定了 etcd 数据写入磁盘的目录。 - `snapshot-count`:指定了触发快照的事务数量。 - `heartbeat-interval`:指定了 etcd 集群中节点之间的心跳间隔。 - `election-timeout`:指定了选举超时时间。 - `quota-backend-bytes`:指定了存储的限额,0 表示无限制。 - `listen-peer-urls`:指定了节点之间通信的 URL,使用 HTTPS 协议。 - `listen-client-urls`:指定了客户端访问 etcd 集群的 URL,同时提供了本地访问的 URL。 - `max-snapshots`:指定了快照保留的数量。 - `max-wals`:指定了日志保留的数量。 - `initial-advertise-peer-urls`:指定了节点之间通信的初始 URL。 - `advertise-client-urls`:指定了客户端访问 etcd 集群的初始 URL。 - `discovery`:定义了 etcd 集群发现相关的选项。 - `initial-cluster`:指定了 etcd 集群的初始成员。 - `initial-cluster-token`:指定了集群的 token。 - `initial-cluster-state`:指定了集群的初始状态。 - `strict-reconfig-check`:指定了严格的重新配置检查选项。 - `enable-v2`:启用了 v2 API。 - `enable-pprof`:启用了性能分析。 - `proxy`:设置了代理模式。 - `client-transport-security`:客户端的传输安全配置。 - `peer-transport-security`:节点之间的传输安全配置。 - `debug`:是否启用调试模式。 - `log-package-levels`:日志的输出级别。 - `log-outputs`:指定了日志的输出类型。 - `force-new-cluster`:是否强制创建一个新的集群。
这些参数和选项可以根据实际需求进行调整和配置。 ################################################## 说明结束 ##################################################
[root@srv2 ~]# cat > /etc/etcd/etcd.config.yml << EOF name: 'srv2.1000y.cloud' data-dir: /var/lib/etcd wal-dir: /var/lib/etcd/wal snapshot-count: 5000 heartbeat-interval: 100 election-timeout: 1000 quota-backend-bytes: 0 listen-peer-urls: 'https://192.168.1.12:2380' listen-client-urls: 'https://192.168.1.12:2379,http://127.0.0.1:2379' max-snapshots: 3 max-wals: 5 cors: initial-advertise-peer-urls: 'https://192.168.1.12:2380' advertise-client-urls: 'https://192.168.1.12:2379' discovery: discovery-fallback: 'proxy' discovery-proxy: discovery-srv: initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380' initial-cluster-token: 'etcd-k8s-cluster' initial-cluster-state: 'new' strict-reconfig-check: false enable-v2: true enable-pprof: true proxy: 'off' proxy-failure-wait: 5000 proxy-refresh-interval: 30000 proxy-dial-timeout: 1000 proxy-write-timeout: 5000 proxy-read-timeout: 0 client-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' client-cert-auth: true trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem' auto-tls: true peer-transport-security: cert-file: '/etc/kubernetes/pki/etcd/etcd.pem' key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem' peer-client-cert-auth: true trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem' auto-tls: true debug: false log-package-levels: log-outputs: [default] force-new-cluster: false EOF
[root@srv3 ~]# cat > /etc/etcd/etcd.config.yml << EOF name: 'srv3.1000y.cloud' data-dir: /var/lib/etcd wal-dir: /var/lib/etcd/wal snapshot-count: 5000 heartbeat-interval: 100 election-timeout: 1000 quota-backend-bytes: 0 listen-peer-urls: 'https://192.168.1.13:2380' listen-client-urls: 'https://192.168.1.13:2379,http://127.0.0.1:2379' max-snapshots: 3 max-wals: 5 cors: initial-advertise-peer-urls: 'https://192.168.1.13:2380' advertise-client-urls: 'https://192.168.1.13:2379' discovery: discovery-fallback: 'proxy' discovery-proxy: discovery-srv: initial-cluster: 'srv1.1000y.cloud=https://192.168.1.11:2380,srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380' initial-cluster-token: 'etcd-k8s-cluster' initial-cluster-state: 'new' strict-reconfig-check: false enable-v2: true enable-pprof: true proxy: 'off' proxy-failure-wait: 5000 proxy-refresh-interval: 30000 proxy-dial-timeout: 1000 proxy-write-timeout: 5000 proxy-read-timeout: 0 client-transport-security: cert-file: '/etc/etcd/ssl/etcd.pem' key-file: '/etc/etcd/ssl/etcd-key.pem' client-cert-auth: true trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem' auto-tls: true peer-transport-security: cert-file: '/etc/kubernetes/pki/etcd/etcd.pem' key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem' peer-client-cert-auth: true trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem' auto-tls: true debug: false log-package-levels: log-outputs: [default] force-new-cluster: false EOF
2) 创建etcd service文件---三台Master均操作 [root@srv1 pki]# cat > /usr/lib/systemd/system/etcd.service << EOF [Unit] Description=Etcd Service Documentation=https://coreos.com/etcd/docs/latest/ After=network.target
[Service] Type=notify ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml Restart=on-failure RestartSec=10 LimitNOFILE=65536
[Install] WantedBy=multi-user.target Alias=etcd3.service EOF

3) 创建etcd证书目录---三台Master均操作 [root@srv1 pki]# mkdir /etc/kubernetes/pki/etcd [root@srv1 pki]# ln -s /etc/etcd/ssl/* /etc/kubernetes/pki/etcd/ [root@srv1 pki]# systemctl daemon-reload [root@srv1 pki]# systemctl enable --now etcd.service
4) 确认etcd集群状态 [root@srv1 pki]# export ETCDCTL_API=3 [root@srv1 pki]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ endpoint status --write-out=table +-------------------+------------------+---------+---------+-----------+------------+-----------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | +-------------------+------------------+---------+---------+-----------+------------+-----------+ | 192.168.1.11:2379 | ace8d5b0766b3d92 | 3.5.9 | 20 kB | true | false | 2 | | 192.168.1.12:2379 | ac7e57d44f030e8 | 3.5.9 | 20 kB | false | false | 2 | | 192.168.1.13:2379 | 40ba37809e1a423f | 3.5.9 | 20 kB | false | false | 2 | +-------------------+------------------+---------+---------+-----------+------------+-----------+ ------------+--------------------+--------+ RAFT INDEX | RAFT APPLIED INDEX | ERRORS | ------------+--------------------+--------+ 9 | 9 | | 9 | 9 | | 9 | 9 | | ------------+--------------------+--------+
[root@srv1 pki]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ endpoint health --write-out=table +-------------------+--------+-------------+-------+ | ENDPOINT | HEALTH | TOOK | ERROR | +-------------------+--------+-------------+-------+ | 192.168.1.11:2379 | true | 27.632698ms | | | 192.168.1.13:2379 | true | 56.126148ms | | | 192.168.1.12:2379 | true | 58.610262ms | | +-------------------+--------+-------------+-------+
5) 测试
[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
  --cert=/etc/kubernetes/pki/etcd/etcd.pem \
  --key=/etc/kubernetes/pki/etcd/etcd-key.pem \
  put 1000y "Hello World"
OK
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ get 1000y 1000y Hello World
[root@srv1 ~]# systemctl stop etcd
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ endpoint health --write-out=table {"level":"warn","ts":"2023-10-01T13:31:32.676026+0800","logger":"client","caller":"v3@v3.5.9/retry_interceptor.go:62", "msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003088c0/192.168.1.11:2379", "attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.1.11:2379: connect: connection refused\""} +-------------------+--------+--------------+---------------------------+ | ENDPOINT | HEALTH | TOOK | ERROR | +-------------------+--------+--------------+---------------------------+ | 192.168.1.12:2379 | true | 27.977355ms | | | 192.168.1.13:2379 | true | 29.57418ms | | | 192.168.1.11:2379 | false | 5.002483674s | context deadline exceeded | +-------------------+--------+--------------+---------------------------+
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ get 1000y 1000y Hello World
[root@srv1 ~]# systemctl start etcd
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \ --cert=/etc/kubernetes/pki/etcd/etcd.pem \ --key=/etc/kubernetes/pki/etcd/etcd-key.pem \ del 1000y 1
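(可选)还可通过member list确认三个etcd成员均已加入集群。以下为示例命令,非必需步骤:
[root@srv1 ~]# etcdctl \
  --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
  --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
  --cert=/etc/kubernetes/pki/etcd/etcd.pem \
  --key=/etc/kubernetes/pki/etcd/etcd-key.pem \
  member list --write-out=table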
2.7 Kubernetes HA配置---HAProxy+KeepAlived与Nginx二选一
1) HAProxy+KeepAlived---三台Master均操作
(1) 安装HAProxy及KeepAlived
[root@srv1 ~]# yum install haproxy keepalived -y
(2) 移除原配置文件 [root@srv1 ~]# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak [root@srv1 ~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
(3) 配置HAProxy---3台主机配置文件一致 [root@srv1 ~]# cat >/etc/haproxy/haproxy.cfg << EOF global maxconn 2000 ulimit-n 16384 log 127.0.0.1 local0 err stats timeout 30s
defaults log global mode http option httplog timeout connect 5000 timeout client 50000 timeout server 50000 timeout http-request 15s timeout http-keep-alive 15s
frontend monitor-in bind *:33305 mode http option httplog monitor-uri /monitor
frontend k8s-master bind 0.0.0.0:16443 bind 127.0.0.1:16443 mode tcp option tcplog tcp-request inspect-delay 5s default_backend k8s-master
backend k8s-master mode tcp option tcplog option tcp-check balance roundrobin default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100 server srv1.1000y.cloud 192.168.1.11:6443 check server srv2.1000y.cloud 192.168.1.12:6443 check server srv3.1000y.cloud 192.168.1.13:6443 check EOF

################################################## 参数说明 ##################################################
HAProxy负载均衡器配置各部分的解释: 1. global: - maxconn 2000: 设置每个进程的最大连接数为2000。 - ulimit-n 16384: 设置每个进程的最大文件描述符数为16384。 - log 127.0.0.1 local0 err: 指定日志的输出地址为本地主机的127.0.0.1,并且只记录错误级别的日志。 - stats timeout 30s: 设置查看负载均衡器统计信息的超时时间为30秒。
2. defaults: - log global: 使默认日志与global部分相同。 - mode http: 设定负载均衡器的工作模式为HTTP模式。 - option httplog: 使负载均衡器记录HTTP协议的日志。 - timeout connect 5000: 设置与后端服务器建立连接的超时时间为5秒。 - timeout client 50000: 设置与客户端的连接超时时间为50秒。 - timeout server 50000: 设置与后端服务器连接的超时时间为50秒。 - timeout http-request 15s: 设置处理HTTP请求的超时时间为15秒。 - timeout http-keep-alive 15s: 设置保持HTTP连接的超时时间为15秒。
3. frontend monitor-in: - bind *:33305: 监听所有IP地址的33305端口。 - mode http: 设定frontend的工作模式为HTTP模式。 - option httplog: 记录HTTP协议的日志。 - monitor-uri /monitor: 设置监控URI为/monitor。
4. frontend k8s-master: - bind 0.0.0.0:16443: 监听所有IP地址的16443端口。 - bind 127.0.0.1:16443: 监听本地主机的16443端口。 - mode tcp: 设定frontend的工作模式为TCP模式。 - option tcplog: 记录TCP协议的日志。 - tcp-request inspect-delay 5s: 设置在接收到请求后延迟5秒进行检查。 - default_backend k8s-master: 设置默认的后端服务器组为k8s-master。
5. backend k8s-master:
   - mode tcp: 设定backend的工作模式为TCP模式。
   - option tcplog: 记录TCP协议的日志。
   - option tcp-check: 启用TCP检查功能。
   - balance roundrobin: 使用轮询算法进行负载均衡。
   - default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100: 设置默认的服务器参数。
   - server srv1.1000y.cloud 192.168.1.11:6443 check: 增加一个名为srv1.1000y.cloud的后端服务器,IP地址为192.168.1.11,端口号为6443,并对其进行健康检查。
   - server srv2.1000y.cloud 192.168.1.12:6443 check: 增加一个名为srv2.1000y.cloud的后端服务器,IP地址为192.168.1.12,端口号为6443,并对其进行健康检查。
   - server srv3.1000y.cloud 192.168.1.13:6443 check: 增加一个名为srv3.1000y.cloud的后端服务器,IP地址为192.168.1.13,端口号为6443,并对其进行健康检查。
以上就是这段配置代码的详细解释。它主要定义了全局配置、默认配置、前端监听和后端服务器组的相关参数和设置。通过这些配置,可以实现负载均衡和监控功能。 ################################################## 说明结束 ##################################################
(4) 配置KeepAlived---不同Master主机配置文件不一样 [root@srv1 ~]# cat > /etc/keepalived/keepalived.conf << EOF ! Configuration File for keepalived
global_defs { router_id LVS_DEVEL script_user root enable_script_security }
vrrp_script chk_apiserver { script "/etc/keepalived/check_apiserver.sh" interval 5 weight -5 fall 2 rise 1 }
vrrp_instance VI_1 { state MASTER # 注意网卡名 interface eth0 mcast_src_ip 192.168.1.11 virtual_router_id 51 priority 100 nopreempt advert_int 2 authentication { auth_type PASS auth_pass K8SHA_KA_AUTH } virtual_ipaddress { 192.168.1.21 } track_script { chk_apiserver } } EOF

[root@srv2 ~]# cat > /etc/keepalived/keepalived.conf << EOF ! Configuration File for keepalived
global_defs { router_id LVS_DEVEL script_user root enable_script_security }
vrrp_script chk_apiserver { script "/etc/keepalived/check_apiserver.sh" interval 5 weight -5 fall 2 rise 1 }
vrrp_instance VI_1 { state BACKUP # 注意网卡名 interface eth0 mcast_src_ip 192.168.1.12 virtual_router_id 51 priority 80 nopreempt advert_int 2 authentication { auth_type PASS auth_pass K8SHA_KA_AUTH } virtual_ipaddress { 192.168.1.21 } track_script { chk_apiserver } } EOF

[root@srv3 ~]# cat > /etc/keepalived/keepalived.conf << EOF ! Configuration File for keepalived
global_defs { router_id LVS_DEVEL script_user root enable_script_security }
vrrp_script chk_apiserver { script "/etc/keepalived/check_apiserver.sh" interval 5 weight -5 fall 2 rise 1 }
vrrp_instance VI_1 { state BACKUP # 注意网卡名 interface eth0 mcast_src_ip 192.168.1.13 virtual_router_id 51 priority 50 nopreempt advert_int 2 authentication { auth_type PASS auth_pass K8SHA_KA_AUTH } virtual_ipaddress { 192.168.1.21 } track_script { chk_apiserver } } EOF

################################################## 参数说明 ##################################################
- `global_defs`部分定义了全局参数。 - `router_id`参数指定了当前路由器的标识,这里设置为"LVS_DEVEL"。
- `vrrp_script`部分定义了一个VRRP脚本。`chk_apiserver`是脚本的名称, - `script`参数指定了脚本的路径。该脚本每5秒执行一次,返回值为0表示服务正常,返回值为1表示服务异常。 - `weight`参数指定了根据脚本返回的值来调整优先级,这里设置为-5。 - `fall`参数指定了失败阈值,当连续2次脚本返回值为1时认为服务异常。 - `rise`参数指定了恢复阈值,当连续1次脚本返回值为0时认为服务恢复正常。
- `vrrp_instance`部分定义了一个VRRP实例。`VI_1`是实例的名称。
  - `state`参数指定了当前实例的初始状态,srv1设置为MASTER表示主节点,srv2、srv3设置为BACKUP。
  - `interface`参数指定了要监听的网卡,这里设置为eth0。
  - `mcast_src_ip`参数指定了VRRP报文的源IP地址,即各节点自身的IP(srv1为192.168.1.11,srv2为192.168.1.12,srv3为192.168.1.13)。
  - `virtual_router_id`参数指定了虚拟路由器的ID,这里设置为51。
  - `priority`参数指定了实例的优先级,优先级越高(数值越大)越有可能被选为主节点。
  - `nopreempt`参数指定了当原主节点恢复后不抢占VIP,即不自动切换回原主节点。
  - `advert_int`参数指定了发送VRRP通告的间隔时间,这里设置为2秒。
  - `authentication`部分指定了认证参数
    - `auth_type`参数指定了认证类型,这里设置为PASS表示使用密码认证,
    - `auth_pass`参数指定了认证密码,这里设置为K8SHA_KA_AUTH。
  - `virtual_ipaddress`部分指定了虚拟IP地址(VIP),这里设置为192.168.1.21。
  - `track_script`部分指定了要跟踪的脚本,这里跟踪了chk_apiserver脚本。
################################################## 说明结束 ##################################################
(5) 健康检查脚本---3台Master配置一致 [root@srv1 ~]# cat > /etc/keepalived/check_apiserver.sh << EOF #!/bin/bash
err=0 for k in \$(seq 1 3) do check_code=\$(pgrep haproxy) if [[ \$check_code == "" ]]; then err=\$(expr \$err + 1) sleep 1 continue else err=0 break fi done
if [[ \$err != "0" ]]; then echo "systemctl stop keepalived" /usr/bin/systemctl stop keepalived exit 1 else exit 0 fi EOF

################################################## 参数说明 ##################################################
# 脚本的主要逻辑如下: # 1. 首先设置一个变量err为0,用来记录错误次数。 # 2. 使用一个循环,在循环内部执行以下操作: # a. 使用pgrep命令检查是否有名为haproxy的进程在运行。如果不存在该进程,将err加1,并暂停1秒钟,然后继续下一次循环。 # b. 如果存在haproxy进程,将err重置为0,并跳出循环。 # 3. 检查err的值,如果不为0,表示检查失败,输出一条错误信息并执行“systemctl stop keepalived”命令停止keepalived进程, # 并退出脚本返回1。 # # 4. 如果err的值为0,表示检查成功,退出脚本返回0。 # # 该脚本的主要作用是检查是否存在运行中的haproxy进程,如果无法检测到haproxy进程,将停止keepalived进程并返回错误状态。 # 如果haproxy进程存在,则返回成功状态。这个脚本可能是作为一个健康检查脚本的一部分,在确保haproxy服务可用的情况下, # 才继续运行其他操作。
################################################## 说明结束 ##################################################
[root@srv1 ~]# chmod +x /etc/keepalived/check_apiserver.sh
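(可选)可先手动执行一次健康检查脚本验证其逻辑,退出码为0表示检测到haproxy进程。注意:若本机尚未启动haproxy,脚本会尝试停止keepalived并返回1,属预期行为。以下为示例命令,非必需步骤:
[root@srv1 ~]# bash /etc/keepalived/check_apiserver.sh; echo $?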
(6) 启动服务---3台Master操作一致 [root@srv1 ~]# systemctl daemon-reload [root@srv1 ~]# systemctl enable --now haproxy keepalived
(7) 启动服务---测试 [root@srv1 ~]# ping -c 2 192.168.1.21 PING 192.168.1.21 (192.168.1.21) 56(84) bytes of data. 64 bytes from 192.168.1.21: icmp_seq=1 ttl=64 time=0.067 ms 64 bytes from 192.168.1.21: icmp_seq=2 ttl=64 time=0.037 ms
--- 192.168.1.21 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1032ms rtt min/avg/max/mdev = 0.037/0.052/0.067/0.015 ms
[root@srv1 ~]# telnet 192.168.1.21 16443 Trying 192.168.1.21... Connected to 192.168.1.21. Escape character is '^]'. Connection closed by foreign host.
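(可选)可确认VIP当前落在哪台Master上,正常情况下应位于priority最高且haproxy存活的节点(本环境为srv1)。以下为示例命令,非必需步骤:
[root@srv1 ~]# ip addr show eth0 | grep 192.168.1.21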
2) Nginx高可用方案---仅在srv1上编译,之后分发至所有节点
[root@srv1 ~]# yum install gcc -y
[root@srv1 ~]# wget http://nginx.org/download/nginx-1.25.1.tar.gz [root@srv1 ~]# tar xvf nginx-*.tar.gz [root@srv1 ~]# cd nginx-*
[root@srv1 ~]# ./configure --with-stream --without-http --without-http_uwsgi_module \ --without-http_scgi_module --without-http_fastcgi_module
[root@srv1 ~]# make && make install
[root@srv1 ~]# node='srv1.1000y.cloud srv2.1000y.cloud srv3.1000y.cloud srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud'
[root@srv1 ~]# for NODE in $node do scp -r /usr/local/nginx/ $NODE:/usr/local/nginx/ done
# 所有主机全部执行以下操作
# 注意:heredoc未加引号,$remote_addr需转义为\$remote_addr,否则会被shell展开为空
[root@srv1 ~]# cat > /usr/local/nginx/conf/kube-nginx.conf << EOF
worker_processes 1;
events {
    worker_connections  1024;
}
stream {
    upstream backend {
        least_conn;
        hash \$remote_addr consistent;
        server 192.168.1.11:6443 max_fails=3 fail_timeout=30s;
        server 192.168.1.12:6443 max_fails=3 fail_timeout=30s;
        server 192.168.1.13:6443 max_fails=3 fail_timeout=30s;
    }
    server {
        listen 127.0.0.1:16443;
        proxy_connect_timeout 1s;
        proxy_pass backend;
    }
}
EOF
[root@srv1 ~]# cat > /etc/systemd/system/kube-nginx.service << EOF [Unit] Description=kube-apiserver nginx proxy After=network.target After=network-online.target Wants=network-online.target
[Service] Type=forking ExecStartPre=/usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/kube-nginx.conf -p /usr/local/nginx -t ExecStart=/usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/kube-nginx.conf -p /usr/local/nginx ExecReload=/usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/kube-nginx.conf -p /usr/local/nginx -s reload PrivateTmp=true Restart=always RestartSec=5 StartLimitInterval=0 LimitNOFILE=65536
[Install] WantedBy=multi-user.target EOF

[root@srv1 ~]# systemctl daemon-reload && systemctl enable --now kube-nginx.service
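(可选)确认kube-nginx已在本地16443端口监听。此时apiserver尚未部署,代理后端暂不可用属正常现象。以下为示例命令,非必需步骤:
[root@srv1 ~]# ss -lntp | grep 16443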
2.8 创建apiserver
1) 编写kube-apiserver.service
[root@srv1 ~]# cat > /usr/lib/systemd/system/kube-apiserver.service << EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target
[Service] ExecStart=/usr/local/bin/kube-apiserver \\ --v=2 \\ --allow-privileged=true \\ --bind-address=0.0.0.0 \\ --secure-port=6443 \\ --advertise-address=192.168.1.11 \\ --service-cluster-ip-range=10.96.0.0/12 \\ --service-node-port-range=30000-32767 \\ --etcd-servers=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379 \\ --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem \\ --etcd-certfile=/etc/etcd/ssl/etcd.pem \\ --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \\ --client-ca-file=/etc/kubernetes/pki/ca.pem \\ --tls-cert-file=/etc/kubernetes/pki/apiserver.pem \\ --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem \\ --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem \\ --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem \\ --service-account-key-file=/etc/kubernetes/pki/sa.pub \\ --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \\ --service-account-issuer=https://kubernetes.default.svc.cluster.local \\ --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\ --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \\ --authorization-mode=Node,RBAC \\ --enable-bootstrap-token-auth=true \\ --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \\ --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \\ --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \\ --requestheader-allowed-names=aggregator \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-extra-headers-prefix=X-Remote-Extra- \\ --requestheader-username-headers=X-Remote-User \\ --enable-aggregator-routing=true Restart=on-failure RestartSec=10s LimitNOFILE=65535
[Install] WantedBy=multi-user.target EOF

[root@srv2 ~]# cat > /usr/lib/systemd/system/kube-apiserver.service << EOF [Unit] Description=Kubernetes API Server Documentation=https://github.com/kubernetes/kubernetes After=network.target
[Service] ExecStart=/usr/local/bin/kube-apiserver \\ --v=2 \\ --allow-privileged=true \\ --bind-address=0.0.0.0 \\ --secure-port=6443 \\ --advertise-address=192.168.1.12 \\ --service-cluster-ip-range=10.96.0.0/12 \\ --service-node-port-range=30000-32767 \\ --etcd-servers=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379 \\ --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem \\ --etcd-certfile=/etc/etcd/ssl/etcd.pem \\ --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \\ --client-ca-file=/etc/kubernetes/pki/ca.pem \\ --tls-cert-file=/etc/kubernetes/pki/apiserver.pem \\ --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem \\ --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem \\ --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem \\ --service-account-key-file=/etc/kubernetes/pki/sa.pub \\ --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \\ --service-account-issuer=https://kubernetes.default.svc.cluster.local \\ --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\ --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \\ --authorization-mode=Node,RBAC \\ --enable-bootstrap-token-auth=true \\ --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \\ --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \\ --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \\ --requestheader-allowed-names=aggregator \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-extra-headers-prefix=X-Remote-Extra- \\ --requestheader-username-headers=X-Remote-User \\ --enable-aggregator-routing=true Restart=on-failure RestartSec=10s LimitNOFILE=65535
[Install] WantedBy=multi-user.target EOF

[root@srv3 ~]# cat > /usr/lib/systemd/system/kube-apiserver.service << EOF [Unit] Description=Kubernetes API Server Documentation=https://github.com/kubernetes/kubernetes After=network.target
[Service] ExecStart=/usr/local/bin/kube-apiserver \\ --v=2 \\ --allow-privileged=true \\ --bind-address=0.0.0.0 \\ --secure-port=6443 \\ --advertise-address=192.168.1.13 \\ --service-cluster-ip-range=10.96.0.0/12 \\ --service-node-port-range=30000-32767 \\ --etcd-servers=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379 \\ --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem \\ --etcd-certfile=/etc/etcd/ssl/etcd.pem \\ --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \\ --client-ca-file=/etc/kubernetes/pki/ca.pem \\ --tls-cert-file=/etc/kubernetes/pki/apiserver.pem \\ --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem \\ --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem \\ --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem \\ --service-account-key-file=/etc/kubernetes/pki/sa.pub \\ --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \\ --service-account-issuer=https://kubernetes.default.svc.cluster.local \\ --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\ --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \\ --authorization-mode=Node,RBAC \\ --enable-bootstrap-token-auth=true \\ --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \\ --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \\ --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \\ --requestheader-allowed-names=aggregator \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-extra-headers-prefix=X-Remote-Extra- \\ --requestheader-username-headers=X-Remote-User \\ --enable-aggregator-routing=true Restart=on-failure RestartSec=10s LimitNOFILE=65535
[Install] WantedBy=multi-user.target EOF

################################################## 参数说明 ##################################################
- `--v=2` 指定日志级别为2,打印详细的API Server日志。
- `--allow-privileged=true` 允许特权容器运行。
- `--bind-address=0.0.0.0` 绑定API Server监听的IP地址。
- `--secure-port=6443` 指定API Server监听的安全端口。
- `--advertise-address=192.168.1.11` 对外通告的API Server地址(srv2、srv3分别为192.168.1.12、192.168.1.13)。
- `--service-cluster-ip-range=10.96.0.0/12` 指定Service的CIDR范围。
- `--service-node-port-range=30000-32767` 指定NodePort的范围。
- `--etcd-servers=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379` 指定etcd服务器的地址。
- `--etcd-cafile` 指定etcd服务器的CA证书。
- `--etcd-certfile` 指定etcd服务器的证书。
- `--etcd-keyfile` 指定etcd服务器的私钥。
- `--client-ca-file` 指定客户端CA证书。
- `--tls-cert-file` 指定服务的证书。
- `--tls-private-key-file` 指定服务的私钥。
- `--kubelet-client-certificate` 和 `--kubelet-client-key` 指定与kubelet通信的客户端证书和私钥。
- `--service-account-key-file` 指定服务账户公钥文件。
- `--service-account-signing-key-file` 指定服务账户签名密钥文件。
- `--service-account-issuer` 指定服务账户的发布者。
- `--kubelet-preferred-address-types` 指定kubelet通信时的首选地址类型。
- `--enable-admission-plugins` 启用一系列准入插件。
- `--authorization-mode` 指定授权模式。
- `--enable-bootstrap-token-auth` 启用引导令牌认证。
- `--requestheader-client-ca-file` 指定请求头中的客户端CA证书。
- `--proxy-client-cert-file` 和 `--proxy-client-key-file` 指定代理客户端的证书和私钥。
- `--requestheader-allowed-names` 指定请求头中允许的名字。
- `--requestheader-group-headers` 指定请求头中的组头。
- `--requestheader-extra-headers-prefix` 指定请求头中的额外头前缀。
- `--requestheader-username-headers` 指定请求头中的用户名头。
- `--enable-aggregator-routing` 启用聚合路由。
################################################## 说明结束 ##################################################
2) 启动所有Master节点的kube-apiserver.service [root@srv1 ~]# systemctl daemon-reload ; systemctl enable --now kube-apiserver.service
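(可选)确认kube-apiserver已正常运行并通过健康检查。默认的RBAC规则(system:public-info-viewer)允许匿名访问/healthz、/livez、/readyz等路径,返回ok即表示该节点apiserver正常;若环境禁用了匿名访问,可改用--cacert/--cert/--key指定admin证书访问。以下为示例命令,非必需步骤:
[root@srv1 ~]# systemctl status kube-apiserver.service --no-pager | grep Active
[root@srv1 ~]# curl -k https://127.0.0.1:6443/healthz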
2.9 配置kube-controller-manager service
1) 编写kube-controller-manager.service---所有Master节点操作且配置一致
[root@srv1 ~]# cat > /usr/lib/systemd/system/kube-controller-manager.service << EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes
After=network.target
[Service] ExecStart=/usr/local/bin/kube-controller-manager \\ --v=2 \\ --bind-address=0.0.0.0 \\ --root-ca-file=/etc/kubernetes/pki/ca.pem \\ --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem \\ --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem \\ --service-account-private-key-file=/etc/kubernetes/pki/sa.key \\ --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig \\ --leader-elect=true \\ --use-service-account-credentials=true \\ --node-monitor-grace-period=40s \\ --node-monitor-period=5s \\ --controllers=*,bootstrapsigner,tokencleaner \\ --allocate-node-cidrs=true \\ --service-cluster-ip-range=10.96.0.0/12 \\ --cluster-cidr=172.16.0.0/12 \\ --node-cidr-mask-size-ipv4=24 \\ --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem Restart=always RestartSec=10s
[Install] WantedBy=multi-user.target EOF

################################################## 参数说明 ##################################################
--v=2:设置日志的详细级别为 2。 --bind-address=0.0.0.0:绑定的IP地址,用于监听Kubernetes控制平面的请求,这里设置为 0.0.0.0,表示监听所有网络接口上的请求。 --root-ca-file:根证书文件的路径,用于验证其他组件的证书。 --cluster-signing-cert-file:用于签名集群证书的证书文件路径。 --cluster-signing-key-file:用于签名集群证书的私钥文件路径。 --service-account-private-key-file:用于签名服务账户令牌的私钥文件路径。 --kubeconfig:kubeconfig 文件的路径,包含了与 Kubernetes API 服务器通信所需的配置信息。 --leader-elect=true:启用 Leader 选举机制,确保只有一个控制器管理器作为 leader 在运行。 --use-service-account-credentials=true:使用服务账户的凭据进行认证和授权。 --node-monitor-grace-period=40s:节点监控的优雅退出时间,节点长时间不响应时会触发节点驱逐。 --node-monitor-period=5s:节点监控的检测周期,用于检测节点是否正常运行。
--controllers:指定要运行的控制器类型,在这里使用了通配符 *,表示运行所有的控制器,同时还包括了bootstrapsigner和tokencleaner控制器。
--allocate-node-cidrs=true:为节点分配 CIDR 子网,用于分配 Pod 网络地址。 --service-cluster-ip-range:定义 Service 的 IP 范围,这里设置为 10.96.0.0/12。 --cluster-cidr:定义集群的 CIDR 范围,这里设置为 172.16.0.0/12。 --node-cidr-mask-size-ipv4:分配给每个节点的 IPv4 子网掩码大小,默认是 24。 --requestheader-client-ca-file:设置请求头中客户端 CA 的证书文件路径,用于认证请求头中的 CA 证书。
################################################## 说明结束 ##################################################
2) 启动所有Master节点的kube-controller-manager.service [root@srv1 ~]# systemctl daemon-reload ; systemctl enable --now kube-controller-manager.service
2.10 配置kube-scheduler service
1) 编写kube-scheduler service---所有Master节点操作且配置一致
[root@srv1 ~]# cat > /usr/lib/systemd/system/kube-scheduler.service << EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes
After=network.target
[Service] ExecStart=/usr/local/bin/kube-scheduler \\ --v=2 \\ --bind-address=0.0.0.0 \\ --leader-elect=true \\ --kubeconfig=/etc/kubernetes/scheduler.kubeconfig Restart=always RestartSec=10s
[Install] WantedBy=multi-user.target EOF

################################################## 参数说明 ##################################################
--v=2:设置日志的详细级别为 2。 --bind-address=0.0.0.0:绑定的 IP 地址,用于监听 Kubernetes 控制平面的请求,这里设置为 0.0.0.0,表示监听所有网络接口上的请求。 --leader-elect=true:启用 Leader 选举机制,确保只有一个调度器作为 leader 在运行。 --kubeconfig=/etc/kubernetes/scheduler.kubeconfig:kubeconfig 文件的路径,包含了与 Kubernetes API 服务器通信所需的配置信息。
################################################## 说明结束 ##################################################
2) 启动所有Master节点的kube-scheduler.service [root@srv1 ~]# systemctl daemon-reload ; systemctl enable --now kube-scheduler.service
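(可选)kube-controller-manager与kube-scheduler默认分别在10257与10259端口提供https健康检查接口,/healthz路径在默认配置(authorization-always-allow-paths)下允许匿名访问,返回ok即表示组件正常;端口与匿名访问策略请以实际版本配置为准。以下为示例命令,非必需步骤:
[root@srv1 ~]# curl -sk https://127.0.0.1:10257/healthz
[root@srv1 ~]# curl -sk https://127.0.0.1:10259/healthz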
2.11 TLS Bootstrapping配置
1) 配置TLS Bootstrapping---Master1节点操作
[root@srv1 ~]# mkdir bootstrap
[root@srv1 ~]# cd bootstrap
[root@srv1 bootstrap]# cat >> bootstrap.secret.yaml << EOF apiVersion: v1 kind: Secret metadata: name: bootstrap-token-c8ad9c namespace: kube-system type: bootstrap.kubernetes.io/token stringData: description: "The default bootstrap token generated by 'kubelet '." token-id: c8ad9c token-secret: 2e4d610cf3e7426e usage-bootstrap-authentication: "true" usage-bootstrap-signing: "true" auth-extra-groups: system:bootstrappers:default-node-token,system:bootstrappers:worker,system:bootstrappers:ingress
--- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: kubelet-bootstrap roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:node-bootstrapper subjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: system:bootstrappers:default-node-token --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: node-autoapprove-bootstrap roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:certificates.k8s.io:certificatesigningrequests:nodeclient subjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: system:bootstrappers:default-node-token --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: node-autoapprove-certificate-rotation roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient subjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: system:nodes --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" labels: kubernetes.io/bootstrapping: rbac-defaults name: system:kube-apiserver-to-kubelet rules: - apiGroups: - "" resources: - nodes/proxy - nodes/stats - nodes/log - nodes/spec - nodes/metrics verbs: - "*" --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: system:kube-apiserver namespace: "" roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:kube-apiserver-to-kubelet subjects: - apiGroup: rbac.authorization.k8s.io kind: User name: kube-apiserver EOF

[root@srv1 bootstrap]# kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.1.21:16443 \ --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig Cluster "kubernetes" set.
################################################## 参数说明 ##################################################
# 这是一个使用 kubectl 命令设置 Kubernetes 集群配置的命令示例。下面是对每个选项的详细解释:
#
# config set-cluster kubernetes:指定要设置的集群名称为 "kubernetes",表示要修改名为 "kubernetes" 的集群配置。
#
# --certificate-authority=/etc/kubernetes/pki/ca.pem:指定证书颁发机构(CA)的证书文件路径,用于验证服务器证书的有效性。
#
# --embed-certs=true:将证书文件嵌入到生成的 kubeconfig 文件中。这样可以避免在 kubeconfig 文件中引用外部证书文件。
#
# --server=https://192.168.1.21:16443:指定 Kubernetes API 服务器的地址和端口,这里使用的是VIP地址(192.168.1.21)
# 和负载均衡端口16443;若使用nginx方案则为 https://127.0.0.1:16443。你可以根据实际环境修改该参数。
#
# --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig:指定 kubeconfig 文件的路径和名称,
# 这里是 /etc/kubernetes/bootstrap-kubelet.kubeconfig。
#
# 通过执行此命令,你可以设置名为 "kubernetes" 的集群配置,并提供 CA 证书、API 服务器地址和端口,并将这些配置信息嵌入到
# bootstrap-kubelet.kubeconfig 文件中。这个 kubeconfig 文件可以用于认证和授权 kubelet 组件与 Kubernetes API
# 服务器之间的通信。请确保路径和文件名与实际环境中的配置相匹配。
################################################## 说明结束 ##################################################
[root@srv1 bootstrap]# kubectl config set-credentials tls-bootstrap-token-user \ --token=c8ad9c.2e4d610cf3e7426e \ --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig User "tls-bootstrap-token-user" set.
################################################## 参数说明 ##################################################
# 这是一个使用 kubectl 命令设置凭证信息的命令示例。下面是对每个选项的详细解释:
#
# config set-credentials tls-bootstrap-token-user:指定要设置的凭证名称为 "tls-bootstrap-token-user",
# 表示要修改名为 "tls-bootstrap-token-user" 的用户凭证配置。
#
# --token=c8ad9c.2e4d610cf3e7426e:指定用户的身份验证令牌(token)。在这个示例中,令牌是
# c8ad9c.2e4d610cf3e7426e(token-id为c8ad9c,token-secret为2e4d610cf3e7426e)。你可以根据实际情况修改该令牌,
# 注意需与bootstrap.secret.yaml中的token-id、token-secret保持一致。
# 16位的token-secret可用命令: head -c 8 /dev/urandom | od -An -t x | tr -d ' ' 生成。
#
# --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig:指定 kubeconfig 文件的路径和名称,
# 这里是 /etc/kubernetes/bootstrap-kubelet.kubeconfig。
#
# 通过执行此命令,你可以设置名为 "tls-bootstrap-token-user" 的用户凭证,并将令牌信息加入到
# bootstrap-kubelet.kubeconfig 文件中。这个 kubeconfig 文件可以用于认证和授权 kubelet 组件与 Kubernetes API
# 服务器之间的通信。请确保路径和文件名与实际环境中的配置相匹配。
################################################## 说明结束 ##################################################
[root@srv1 bootstrap]# kubectl config set-context tls-bootstrap-token-user@kubernetes \ --cluster=kubernetes \ --user=tls-bootstrap-token-user \ --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig Context "tls-bootstrap-token-user@kubernetes" created.
################################################## 参数说明 ##################################################
# 这是一个使用 kubectl 命令设置上下文信息的命令示例。下面是对每个选项的详细解释: # # config set-context tls-bootstrap-token-user@kubernetes:指定要设置的上下文名称为 # "tls-bootstrap-token-user@kubernetes",表示要修改名为 "tls-bootstrap-token-user@kubernetes" 的上下文配置。 # # --cluster=kubernetes:指定上下文关联的集群名称为 "kubernetes",表示使用名为 "kubernetes" 的集群配置。 # # --user=tls-bootstrap-token-user:指定上下文关联的用户凭证名称为 "tls-bootstrap-token-user",表示使用名为 # "tls-bootstrap-token-user" 的用户凭证配置。 # # --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig:指定 kubeconfig 文件的路径和名称, # 这里是 /etc/kubernetes/bootstrap-kubelet.kubeconfig。 # 通过执行此命令,你可以设置名为 "tls-bootstrap-token-user@kubernetes" 的上下文,并将其关联到名为 "kubernetes" # 的集群配置和名为 "tls-bootstrap-token-user" 的用户凭证配置。这样,bootstrap-kubelet.kubeconfig 文件就包含了 # 完整的上下文信息,可以用于指定与 Kubernetes 集群建立连接时要使用的集群和凭证。请确保路径和文件名与实际环境中的配置相匹配。
################################################## 说明结束 ##################################################
[root@srv1 bootstrap]# kubectl config use-context tls-bootstrap-token-user@kubernetes \ --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig Switched to context "tls-bootstrap-token-user@kubernetes".
################################################## 参数说明 ##################################################
# 这是一个使用 kubectl 命令设置当前上下文的命令示例。下面是对每个选项的详细解释: # # config use-context tls-bootstrap-token-user@kubernetes:指定要使用的上下文名称为 # "tls-bootstrap-token-user@kubernetes",表示要将当前上下文切换为名为 "tls-bootstrap-token-user@kubernetes" # 的上下文。 # # --kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig:指定 kubeconfig 文件的路径和名称, # 这里是 /etc/kubernetes/bootstrap-kubelet.kubeconfig。 # # 通过执行此命令,你可以将当前上下文设置为名为 "tls-bootstrap-token-user@kubernetes" 的上下文。这样,当你执行其他 # kubectl 命令时,它们将使用该上下文与 Kubernetes 集群进行交互。请确保路径和文件名与实际环境中的配置相匹配。
################################################## 说明结束 ##################################################
[root@srv1 bootstrap]# cd
[root@srv1 ~]# mkdir -p /root/.kube [root@srv1 ~]# cp /etc/kubernetes/admin.kubeconfig /root/.kube/config [root@srv1 ~]# kubectl create -f bootstrap/bootstrap.secret.yaml secret/bootstrap-token-c8ad9c created clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created clusterrolebinding.rbac.authorization.k8s.io/node-autoapprove-bootstrap created clusterrolebinding.rbac.authorization.k8s.io/node-autoapprove-certificate-rotation created clusterrole.rbac.authorization.k8s.io/system:kube-apiserver-to-kubelet created clusterrolebinding.rbac.authorization.k8s.io/system:kube-apiserver created
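(可选)admin kubeconfig就位后,可快速查看scheduler、controller-manager及etcd的健康状况。componentstatuses在新版本中已标记为弃用,但仍可用于快速排查。以下为示例命令,非必需步骤:
[root@srv1 ~]# kubectl get cs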
2.12 kubelet配置
1) 将srv1上的证书复制其他节点上
[root@srv1 ~]# cd /etc/kubernetes/
[root@srv1 kubernetes]# for NODE in $master $worker
do 
  ssh $NODE mkdir -p /etc/kubernetes/pki
  for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem bootstrap-kubelet.kubeconfig kube-proxy.kubeconfig
  do 
    scp /etc/kubernetes/$FILE $NODE:/etc/kubernetes/${FILE}
  done
done
2) 配置kubelet.service---所有节点操作 [root@srv1 ~]# mkdir -p /var/lib/kubelet \ /var/log/kubernetes \ /etc/systemd/system/kubelet.service.d \ /etc/kubernetes/manifests/
[root@srv1 ~]# cat > /usr/lib/systemd/system/kubelet.service << EOF [Unit] Description=Kubernetes Kubelet Documentation=https://github.com/kubernetes/kubernetes After=containerd.service Requires=containerd.service
[Service] ExecStart=/usr/local/bin/kubelet \\ --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig \\ --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\ --config=/etc/kubernetes/kubelet-conf.yml \\ --container-runtime-endpoint=unix:///run/containerd/containerd.sock \\ --node-labels=node.kubernetes.io/node=
[Install] WantedBy=multi-user.target EOF

3) 配置kubelet配置文件---所有节点操作 [root@srv1 ~]# cat > /etc/kubernetes/kubelet-conf.yml << EOF apiVersion: kubelet.config.k8s.io/v1beta1 kind: KubeletConfiguration address: 0.0.0.0 port: 10250 readOnlyPort: 10255 authentication: anonymous: enabled: false webhook: cacheTTL: 2m0s enabled: true x509: clientCAFile: /etc/kubernetes/pki/ca.pem authorization: mode: Webhook webhook: cacheAuthorizedTTL: 5m0s cacheUnauthorizedTTL: 30s cgroupDriver: systemd cgroupsPerQOS: true clusterDNS: - 10.96.0.10 clusterDomain: cluster.local containerLogMaxFiles: 5 containerLogMaxSize: 10Mi contentType: application/vnd.kubernetes.protobuf cpuCFSQuota: true cpuManagerPolicy: none cpuManagerReconcilePeriod: 10s enableControllerAttachDetach: true enableDebuggingHandlers: true enforceNodeAllocatable: - pods eventBurst: 10 eventRecordQPS: 5 evictionHard: imagefs.available: 15% memory.available: 100Mi nodefs.available: 10% nodefs.inodesFree: 5% evictionPressureTransitionPeriod: 5m0s failSwapOn: true fileCheckFrequency: 20s hairpinMode: promiscuous-bridge healthzBindAddress: 127.0.0.1 healthzPort: 10248 httpCheckFrequency: 20s imageGCHighThresholdPercent: 85 imageGCLowThresholdPercent: 80 imageMinimumGCAge: 2m0s iptablesDropBit: 15 iptablesMasqueradeBit: 14 kubeAPIBurst: 10 kubeAPIQPS: 5 makeIPTablesUtilChains: true maxOpenFiles: 1000000 maxPods: 110 nodeStatusUpdateFrequency: 10s oomScoreAdj: -999 podPidsLimit: -1 registryBurst: 10 registryPullQPS: 5 resolvConf: /etc/resolv.conf rotateCertificates: true runtimeRequestTimeout: 2m0s serializeImagePulls: true staticPodPath: /etc/kubernetes/manifests streamingConnectionIdleTimeout: 4h0m0s syncFrequency: 1m0s volumeStatsAggPeriod: 1m0s EOF
################################################## 参数说明 ##################################################
# - apiVersion: kubelet.config.k8s.io/v1beta1:指定了配置文件的API版本为kubelet.config.k8s.io/v1beta1。 # - kind: KubeletConfiguration:指定了配置类别为KubeletConfiguration。 # - address: 0.0.0.0:指定了Kubelet监听的地址为0.0.0.0。 # - port: 10250:指定了Kubelet监听的端口为10250。 # - readOnlyPort: 10255:指定了只读端口为10255,用于提供只读的状态信息。 # - authentication:指定了认证相关的配置信息。 # - anonymous.enabled: false:禁用了匿名认证。 # - webhook.enabled: true:启用了Webhook认证。 # - x509.clientCAFile: /etc/kubernetes/pki/ca.pem:指定了X509证书的客户端CA文件路径。 # - authorization:指定了授权相关的配置信息。 # - mode: Webhook:指定了授权模式为Webhook。 # - webhook.cacheAuthorizedTTL: 5m0s:指定了授权缓存时间段为5分钟。 # - webhook.cacheUnauthorizedTTL: 30s:指定了未授权缓存时间段为30秒。 # - cgroupDriver: systemd:指定了Cgroup驱动为systemd。 # - cgroupsPerQOS: true:启用了每个QoS类别一个Cgroup的设置。 # - clusterDNS: 指定了集群的DNS服务器地址列表。 # - 10.96.0.10:指定了DNS服务器地址为10.96.0.10。 # - clusterDomain: cluster.local:指定了集群的域名后缀为cluster.local。 # - containerLogMaxFiles: 5:指定了容器日志文件保留的最大数量为5个。 # - containerLogMaxSize: 10Mi:指定了容器日志文件的最大大小为10Mi。 # - contentType: application/vnd.kubernetes.protobuf:指定了内容类型为protobuf。 # - cpuCFSQuota: true:启用了CPU CFS Quota。 # - cpuManagerPolicy: none:禁用了CPU Manager。 # - cpuManagerReconcilePeriod: 10s:指定了CPU管理器的调整周期为10秒。 # - enableControllerAttachDetach: true:启用了控制器的挂载和拆卸。 # - enableDebuggingHandlers: true:启用了调试处理程序。 # - enforceNodeAllocatable: 指定了强制节点可分配资源的列表。 # - pods:强制节点可分配pods资源。 # - eventBurst: 10:指定了事件突发的最大数量为10。 # - eventRecordQPS: 5:指定了事件记录的最大请求量为5。 # - evictionHard: 指定了驱逐硬性限制参数的配置信息。 # - imagefs.available: 15%:指定了镜像文件系统可用空间的限制为15%。 # - memory.available: 100Mi:指定了可用内存的限制为100Mi。 # - nodefs.available: 10%:指定了节点文件系统可用空间的限制为10%。 # - nodefs.inodesFree: 5%:指定了节点文件系统可用inode的限制为5%。 # - evictionPressureTransitionPeriod: 5m0s:指定了驱逐压力转换的时间段为5分钟。 # - failSwapOn: true:指定了在发生OOM时禁用交换分区。 # - fileCheckFrequency: 20s:指定了文件检查频率为20秒。 # - hairpinMode: promiscuous-bridge:设置了Hairpin Mode为"promiscuous-bridge"。 # - healthzBindAddress: 127.0.0.1:指定了健康检查的绑定地址为127.0.0.1。 # - healthzPort: 10248:指定了健康检查的端口为10248。 # - httpCheckFrequency: 20s:指定了HTTP检查的频率为20秒。 # - imageGCHighThresholdPercent: 85:指定了镜像垃圾回收的上阈值为85%。 # - imageGCLowThresholdPercent: 80:指定了镜像垃圾回收的下阈值为80%。 # - imageMinimumGCAge: 2m0s:指定了镜像垃圾回收的最小时间为2分钟。 # - iptablesDropBit: 15:指定了iptables的Drop Bit为15。 # - iptablesMasqueradeBit: 14:指定了iptables的Masquerade Bit为14。 # - kubeAPIBurst: 10:指定了KubeAPI的突发请求数量为10个。 # - kubeAPIQPS: 5:指定了KubeAPI的每秒请求频率为5个。 # - makeIPTablesUtilChains: true:指定了是否使用iptables工具链。 # - maxOpenFiles: 1000000:指定了最大打开文件数为1000000。 # - maxPods: 110:指定了最大的Pod数量为110。 # - nodeStatusUpdateFrequency: 10s:指定了节点状态更新的频率为10秒。 # - oomScoreAdj: -999:指定了OOM Score Adjustment为-999。 # - podPidsLimit: -1:指定了Pod的PID限制为-1,表示无限制。 # - registryBurst: 10:指定了Registry的突发请求数量为10个。 # - registryPullQPS: 5:指定了Registry的每秒拉取请求数量为5个。 # - resolvConf: /etc/resolv.conf:指定了resolv.conf的文件路径。 # - rotateCertificates: true:指定了是否轮转证书。 # - runtimeRequestTimeout: 2m0s:指定了运行时请求的超时时间为2分钟。 # - serializeImagePulls: true:指定了是否序列化镜像拉取。 # - staticPodPath: /etc/kubernetes/manifests:指定了静态Pod的路径。 # - streamingConnectionIdleTimeout: 4h0m0s:指定了流式连接的空闲超时时间为4小时。 # - syncFrequency: 1m0s:指定了同步频率为1分钟。 # - volumeStatsAggPeriod: 1m0s:指定了卷统计聚合周期为1分钟。
################################################## 说明结束 ##################################################
[root@srv1 ~]# systemctl daemon-reload && systemctl enable --now kubelet.service
4) 测试---Srv1操作 (1) 查看各节点状态 [root@srv1 ~]# kubectl get node NAME STATUS ROLES AGE VERSION srv1.1000y.cloud Ready <none> 99s v1.28.2 srv2.1000y.cloud Ready <none> 98s v1.28.2 srv3.1000y.cloud Ready <none> 95s v1.28.2 srv4.1000y.cloud Ready <none> 98s v1.28.2 srv5.1000y.cloud Ready <none> 97s v1.28.2 srv6.1000y.cloud Ready <none> 97s v1.28.2
(2) 查看容器运行时 [root@srv1 ~]# kubectl describe node | grep Runtime Container Runtime Version: containerd://1.6.24 Container Runtime Version: containerd://1.6.24 Container Runtime Version: containerd://1.6.24 Container Runtime Version: containerd://1.6.24 Container Runtime Version: containerd://1.6.24 Container Runtime Version: containerd://1.6.24
2.13 kube-proxy配置
1) 将kubeconfig发送至其他节点---srv1执行
[root@srv1 ~]# for NODE in $master $worker
do
  echo $NODE
  scp /etc/kubernetes/kube-proxy.kubeconfig $NODE:/etc/kubernetes/kube-proxy.kubeconfig
done
2) 所有节点添加kube-proxy.service文件 [root@srv1 ~]# cat > /usr/lib/systemd/system/kube-proxy.service << EOF [Unit] Description=Kubernetes Kube Proxy Documentation=https://github.com/kubernetes/kubernetes After=network.target
[Service] ExecStart=/usr/local/bin/kube-proxy \\ --config=/etc/kubernetes/kube-proxy.yaml \\ --v=2 Restart=always RestartSec=10s
[Install] WantedBy=multi-user.target EOF

3) 所有节点添加kube-proxy配置文件 [root@srv1 ~]# cat > /etc/kubernetes/kube-proxy.yaml << EOF apiVersion: kubeproxy.config.k8s.io/v1alpha1 bindAddress: 0.0.0.0 clientConnection: acceptContentTypes: "" burst: 10 contentType: application/vnd.kubernetes.protobuf kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig qps: 5 clusterCIDR: 172.16.0.0/12 configSyncPeriod: 15m0s conntrack: max: null maxPerCore: 32768 min: 131072 tcpCloseWaitTimeout: 1h0m0s tcpEstablishedTimeout: 24h0m0s enableProfiling: false healthzBindAddress: 0.0.0.0:10256 hostnameOverride: "" iptables: masqueradeAll: false masqueradeBit: 14 minSyncPeriod: 0s syncPeriod: 30s ipvs: masqueradeAll: true minSyncPeriod: 5s scheduler: "rr" syncPeriod: 30s kind: KubeProxyConfiguration metricsBindAddress: 127.0.0.1:10249 mode: "ipvs" nodePortAddresses: null oomScoreAdj: -999 portRange: "" udpIdleTimeout: 250ms EOF
################################################## 参数说明 ##################################################
# 1. apiVersion: kubeproxy.config.k8s.io/v1alpha1 # - 指定该配置文件的API版本。 # # 2. bindAddress: 0.0.0.0 # - 指定kube-proxy使用的监听地址。0.0.0.0表示监听所有网络接口。 # # 3. clientConnection: # - 客户端连接配置项。 # # a. acceptContentTypes: "" # - 指定接受的内容类型。 # # b. burst: 10 # - 客户端请求超出qps设置时的最大突发请求数。 # # c. contentType: application/vnd.kubernetes.protobuf # - 指定客户端请求的内容类型。 # # d. kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig # - kube-proxy使用的kubeconfig文件路径。 # # e. qps: 5 # - 每秒向API服务器发送的请求数量。 # # 4. clusterCIDR: 172.16.0.0/12,fc00:2222::/112 # - 指定集群使用的CIDR范围,用于自动分配Pod IP。 # # 5. configSyncPeriod: 15m0s # - 指定kube-proxy配置同步到节点的频率。 # # 6. conntrack: # - 连接跟踪设置。 # # a. max: null # - 指定连接跟踪的最大值。 # # b. maxPerCore: 32768 # - 指定每个核心的最大连接跟踪数。 # # c. min: 131072 # - 指定最小的连接跟踪数。 # # d. tcpCloseWaitTimeout: 1h0m0s # - 指定处于CLOSE_WAIT状态的TCP连接的超时时间。 # # e. tcpEstablishedTimeout: 24h0m0s # - 指定已建立的TCP连接的超时时间。 # # 7. enableProfiling: false # - 是否启用性能分析。 # # 8. healthzBindAddress: 0.0.0.0:10256 # - 指定健康检查监听地址和端口。 # # 9. hostnameOverride: "" # - 指定覆盖默认主机名的值。 # # 10. iptables: # - iptables设置。 # # a. masqueradeAll: false # - 是否对所有流量使用IP伪装。 # # b. masqueradeBit: 14 # - 指定伪装的Bit标记。 # # c. minSyncPeriod: 0s # - 指定同步iptables规则的最小间隔。 # # d. syncPeriod: 30s # - 指定同步iptables规则的时间间隔。 # # 11. ipvs: # - ipvs设置。 # # a. masqueradeAll: true # - 是否对所有流量使用IP伪装。 # # b. minSyncPeriod: 5s # - 指定同步ipvs规则的最小间隔。 # # c. scheduler: "rr" # - 指定ipvs默认使用的调度算法。 # # d. syncPeriod: 30s # - 指定同步ipvs规则的时间间隔。 # # 12. kind: KubeProxyConfiguration # - 指定该配置文件的类型。 # # 13. metricsBindAddress: 127.0.0.1:10249 # - 指定指标绑定的地址和端口。 # # 14. mode: "ipvs" # - 指定kube-proxy的模式。这里指定为ipvs,使用IPVS代理模式。 # # 15. nodePortAddresses: null # - 指定可用于NodePort的网络地址。 # # 16. oomScoreAdj: -999 # - 指定kube-proxy的OOM优先级。 # # 17. portRange: "" # - 指定可用于服务端口范围。 # # 18. udpIdleTimeout: 250ms # - 指定UDP连接的空闲超时时间。
################################################## 说明结束 ##################################################
4) 启动kube-proxy配置文件 [root@srv1 ~]# systemctl daemon-reload && systemctl enable --now kube-proxy.service
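(可选) 以下给出一个验证示例,用于确认kube-proxy确实以ipvs模式运行;其中10249对应上文metricsBindAddress中定义的端口,ipvsadm为可选排查工具(若未安装可yum install ipvsadm -y):
# 查询kube-proxy当前的代理模式,正常应返回ipvs
[root@srv1 ~]# curl -s 127.0.0.1:10249/proxyMode
# 若已安装ipvsadm,可进一步查看生成的IPVS转发规则
[root@srv1 ~]# ipvsadm -Ln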
2.14 Calico配置
1) 升级runc---所有节点操作
[root@srv1 ~]# wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64
[root@srv1 ~]# install -m 755 runc.amd64 /usr/local/sbin/runc [root@srv1 ~]# cp -p /usr/local/sbin/runc /usr/local/bin/runc [root@srv1 ~]# cp -p /usr/local/sbin/runc /usr/bin/runc
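(可选) 可按以下方式确认runc已替换为刚安装的版本(版本号应与所下载的v1.1.9一致):
[root@srv1 ~]# runc --version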
2) 升级libseccomp---所有节点操作 [root@srv1 ~]# yum install https://mirrors.tuna.tsinghua.edu.cn/centos/8-stream/BaseOS/x86_64/os/Packages/libseccomp-2.5.1-1.el8.x86_64.rpm -y
[root@srv1 ~]# rpm -qa | grep libseccomp libseccomp-2.5.1-1.el8.x86_64
3) 下载calico-typha.yaml并修改 [root@srv1 ~]# wget https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico-typha.yaml
[root@srv1 ~]# vim calico-typha.yaml ...... ...... ...... ...... ...... ...... # calico-config ConfigMap处 # 87行 "ipam": { "type": "calico-ipam", }, ...... ...... ...... ...... ...... ...... # 4878行 - name: IP value: "autodetect" ...... ...... ...... ...... ...... ...... # 4910行 - name: CALICO_IPV4POOL_CIDR value: "172.16.0.0/12"
4) 修改Calico镜像为国内源--srv1操作 # 换为国内源 [root@srv1 ~]# sed -i "s#docker.io/calico/#m.daocloud.io/docker.io/calico/#g" calico-typha.yaml
# 恢复为默认源 [root@srv1 ~]# sed -i "s#m.daocloud.io/docker.io/calico/#docker.io/calico/#g" calico-typha.yaml
5) 应用Calico--srv1操作 [root@srv1 ~]# kubectl apply -f calico-typha.yaml poddisruptionbudget.policy/calico-kube-controllers created poddisruptionbudget.policy/calico-typha created serviceaccount/calico-kube-controllers created serviceaccount/calico-node created serviceaccount/calico-cni-plugin created configmap/calico-config created customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created clusterrole.rbac.authorization.k8s.io/calico-node created clusterrole.rbac.authorization.k8s.io/calico-cni-plugin created clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created clusterrolebinding.rbac.authorization.k8s.io/calico-node created clusterrolebinding.rbac.authorization.k8s.io/calico-cni-plugin created service/calico-typha created daemonset.apps/calico-node created deployment.apps/calico-kube-controllers created deployment.apps/calico-typha created
6) 查看Calico容器状态--srv1操作 [root@srv1 ~]# kubectl get pod -A NAMESPACE NAME READY STATUS RESTARTS AGE kube-system calico-kube-controllers-765c96cb9d-fwspl 1/1 Running 0 11h kube-system calico-node-btq6j 1/1 Running 0 11h kube-system calico-node-btzkj 1/1 Running 0 11h kube-system calico-node-d2vbs 1/1 Running 0 11h kube-system calico-node-kb4xn 1/1 Running 0 11h kube-system calico-node-pzqfl 1/1 Running 0 11h kube-system calico-node-x794v 1/1 Running 0 11h kube-system calico-typha-67c57cdf49-c2jfg 1/1 Running 0 11h
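(可选) CNI部署完成后,若此前节点因缺少网络插件处于NotReady状态,现在应全部变为Ready,可顺带确认:
[root@srv1 ~]# kubectl get nodes -o wide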
################################################## 说明汇总 ##################################################
1. 如果无法下载镜像,可先在能正常拉取镜像的主机上下载并导出镜像。 2. 再将镜像导入至k8s集群中pod所在的节点,例如: [root@srv1 ~]# ctr -n=k8s.io image import calico-kube-controllers-master.tar
################################################## 汇总结束 ##################################################
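下面给出一个导出/导入镜像的参考示例(镜像名与tar包文件名仅为示意,请按实际缺失的镜像替换):
# 在可正常拉取镜像的主机上拉取并导出为tar包
[root@srv1 ~]# ctr -n=k8s.io image pull docker.io/calico/kube-controllers:master
[root@srv1 ~]# ctr -n=k8s.io image export calico-kube-controllers-master.tar docker.io/calico/kube-controllers:master
# 将tar包拷贝到目标节点后导入(即上文的import步骤)
[root@srv1 ~]# ctr -n=k8s.io image import calico-kube-controllers-master.tar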
2.15 CoreDNS安装及配置
1) 安装helm---srv1操作
[root@srv1 ~]# wget https://mirrors.huaweicloud.com/helm/v3.12.3/helm-v3.12.3-linux-amd64.tar.gz
[root@srv1 ~]# tar xfz helm-v3.12.3-linux-amd64.tar.gz
[root@srv1 ~]# cp linux-amd64/helm /usr/local/bin/ [root@srv1 ~]# helm version version.BuildInfo{Version:"v3.12.3", GitCommit:"......
2) 安装并修改CoreDNS配置文件---srv1操作 [root@srv1 ~]# helm repo add coredns https://coredns.github.io/helm "coredns" has been added to your repositories
[root@srv1 ~]# helm pull coredns/coredns
[root@srv1 ~]# tar xfz coredns-1.26.0.tgz [root@srv1 ~]# vim coredns/values.yaml ...... ...... ...... ...... ...... ......
service: # 修改52行,定义CoreDNS的IP地址 clusterIP: "10.96.0.10" # clusterIPs: [] # loadBalancerIP: "" # externalIPs: [] # externalTrafficPolicy: "" # ipFamilyPolicy: "" # The name of the Service # If not set, a name is generated using the fullname template name: "" annotations: {}
...... ...... ...... ...... ...... ......
# 修改为国内源 [root@srv1 ~]# sed -i "s#coredns/#m.daocloud.io/docker.io/coredns/#g" coredns/values.yaml [root@srv1 ~]# sed -i "s#registry.k8s.io/#m.daocloud.io/registry.k8s.io/#g" coredns/values.yaml
################################################## 信息汇总 ################################################## 1. Harbor中已经更改完成,可按以下步骤进行操作 [root@srv1 ~]# helm registry login srv7.1000y.cloud --insecure Username: admin Password: # 输入Harbor管理员密码 Login Succeeded
[root@srv1 ~]# helm pull oci://srv7.1000y.cloud/k8s/chart/coredns --version=1.26.0 \ --insecure-skip-tls-verify Pulled: srv7.1000y.cloud/k8s/chart/coredns:1.26.0 Digest: sha256:9e824163c1530296d3b9a151f40bbba9e9d5367cb0a92d0287b1912f1b84e8ca
[root@srv1 ~]# tar xfz coredns-1.26.0.tgz [root@srv1 ~]# helm install coredns ./coredns/ -n kube-system
################################################## 汇总结束 ##################################################
# 安装CoreDNS [root@srv1 ~]# helm install coredns ./coredns/ -n kube-system NAME: coredns LAST DEPLOYED: Mon Oct 2 08:54:59 2023 NAMESPACE: kube-system STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: CoreDNS is now running in the cluster as a cluster-service.
It can be tested with the following:
1. Launch a Pod with DNS tools:
kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
2. Query the DNS server:
/ # host kubernetes
3) 验证---srv1操作 [root@srv1 ~]# kubectl get pod -A NAMESPACE NAME READY STATUS RESTARTS AGE kube-system calico-kube-controllers-765c96cb9d-fwspl 1/1 Running 0 12h kube-system calico-node-btq6j 1/1 Running 0 12h kube-system calico-node-btzkj 1/1 Running 0 12h kube-system calico-node-d2vbs 1/1 Running 0 12h kube-system calico-node-kb4xn 1/1 Running 0 12h kube-system calico-node-pzqfl 1/1 Running 0 12h kube-system calico-node-x794v 1/1 Running 0 12h kube-system calico-typha-67c57cdf49-c2jfg 1/1 Running 0 12h kube-system coredns-coredns-6c9554fc94-f9qhc 1/1 Running 0 94s
2.16 Metrics Server安装及配置
1) 下载高可用版本yaml---srv1操作
[root@srv1 ~]# curl -LO \
https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability.yaml
2) 修改yaml---srv1操作 [root@srv1 ~]# vim high-availability.yaml ...... ...... ...... ...... ...... ...... containers: - args: # 修改145行-154行内容如下 - --cert-dir=/tmp - --secure-port=4443 - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname - --kubelet-use-node-status-port - --metric-resolution=15s - --kubelet-insecure-tls - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem - --requestheader-username-headers=X-Remote-User - --requestheader-group-headers=X-Remote-Group - --requestheader-extra-headers-prefix=X-Remote-Extra- ...... ...... ...... ...... ...... ...... volumeMounts: - mountPath: /tmp name: tmp-dir # 于189行-190行添加内容如下 - name: ca-ssl mountPath: /etc/kubernetes/pki nodeSelector: kubernetes.io/os: linux priorityClassName: system-cluster-critical serviceAccountName: metrics-server volumes: - emptyDir: {} name: tmp-dir # 于198行-200行添加内容如下 - name: ca-ssl hostPath: path: /etc/kubernetes/pki --- # 修改202行内容如下 apiVersion: policy/v1 ...... ...... ...... ...... ...... ......
# 修改为国内源 [root@srv1 ~]# sed -i "s#registry.k8s.io/#m.daocloud.io/registry.k8s.io/#g" high-availability.yaml
3) 安装---srv1操作 [root@srv1 ~]# kubectl apply -f high-availability.yaml serviceaccount/metrics-server created clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created clusterrole.rbac.authorization.k8s.io/system:metrics-server created rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created service/metrics-server created deployment.apps/metrics-server created poddisruptionbudget.policy/metrics-server created apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
4) 测试---srv1操作 [root@srv1 ~]# kubectl get pods -A NAMESPACE NAME READY STATUS RESTARTS AGE kube-system calico-kube-controllers-765c96cb9d-fwspl 1/1 Running 0 12h kube-system calico-node-btq6j 1/1 Running 0 12h kube-system calico-node-btzkj 1/1 Running 0 12h kube-system calico-node-d2vbs 1/1 Running 0 12h kube-system calico-node-kb4xn 1/1 Running 0 12h kube-system calico-node-pzqfl 1/1 Running 0 12h kube-system calico-node-x794v 1/1 Running 0 12h kube-system calico-typha-67c57cdf49-c2jfg 1/1 Running 0 12h kube-system coredns-coredns-6c9554fc94-f9qhc 1/1 Running 0 37m kube-system metrics-server-5fd97bdcd-tcnmg 1/1 Running 0 104s kube-system metrics-server-5fd97bdcd-v99d5 1/1 Running 0 104s
[root@srv1 ~]# kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% srv1.1000y.cloud 253m 12% 1368Mi 35% srv2.1000y.cloud 246m 12% 1175Mi 30% srv3.1000y.cloud 262m 13% 1239Mi 32% srv4.1000y.cloud 160m 4% 788Mi 10% srv5.1000y.cloud 167m 4% 786Mi 10% srv6.1000y.cloud 158m 3% 842Mi 10%
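(可选) Metrics Server工作正常后,也可以查看Pod级别的资源用量,例如:
[root@srv1 ~]# kubectl top pods -n kube-system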
2.17 验证集群
1) 创建Pod资源---srv1操作
[root@srv1 ~]# cat<<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: docker.io/library/busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
pod/busybox created
[root@srv1 ~]# kubectl get pod NAME READY STATUS RESTARTS AGE busybox 1/1 Running 0 35s
2) 用pod解析默认命名空间中的kubernetes---srv1操作 [root@srv1 ~]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16h
3) 解析测试---srv1操作 [root@srv1 ~]# kubectl exec busybox -n default -- nslookup kubernetes Server: 10.96.0.10 Address 1: 10.96.0.10 coredns-coredns.kube-system.svc.cluster.local
Name: kubernetes Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
4) 跨命名空间解析测试---srv1操作 [root@srv1 ~]# kubectl get svc -A NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16h kube-system calico-typha ClusterIP 10.104.7.132 <none> 5473/TCP 12h kube-system coredns-coredns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 44m kube-system metrics-server ClusterIP 10.99.90.2 <none> 443/TCP 9m22s
[root@srv1 ~]# kubectl exec busybox -n default -- nslookup coredns-coredns.kube-system Server: 10.96.0.10 Address 1: 10.96.0.10 coredns-coredns.kube-system.svc.cluster.local
Name: coredns-coredns.kube-system Address 1: 10.96.0.10 coredns-coredns.kube-system.svc.cluster.local
5) 测试所有节点均可访问kubernetes svc 443及kube-dns的svc 53端口---所有节点操作 [root@srv1 ~]# telnet 10.96.0.1 443 Trying 10.96.0.1... Connected to 10.96.0.1. Escape character is '^]'.
[root@srv1 ~]# telnet 10.96.0.10 53 Trying 10.96.0.10... Connected to 10.96.0.10. Escape character is '^]'.
[root@srv1 ~]# curl 10.96.0.10:53 curl: (52) Empty reply from server
6) 测试Pod之间的通信---srv1节点操作 [root@srv1 ~]# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES busybox 1/1 Running 0 10m 172.16.22.3 srv5.1000y.cloud <none> <none>
[root@srv1 ~]# kubectl get pod -n kube-system -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES calico-kube-controllers-765c96cb9d-fwspl 1/1 Running 0 12h 10.88.0.2 srv5.1000y.cloud <none> <none> calico-node-btq6j 1/1 Running 0 12h 192.168.1.14 srv4.1000y.cloud <none> <none> calico-node-btzkj 1/1 Running 0 12h 192.168.1.15 srv5.1000y.cloud <none> <none> calico-node-d2vbs 1/1 Running 0 12h 192.168.1.12 srv2.1000y.cloud <none> <none> calico-node-kb4xn 1/1 Running 0 12h 192.168.1.16 srv6.1000y.cloud <none> <none> calico-node-pzqfl 1/1 Running 0 12h 192.168.1.11 srv1.1000y.cloud <none> <none> calico-node-x794v 1/1 Running 0 12h 192.168.1.13 srv3.1000y.cloud <none> <none> calico-typha-67c57cdf49-c2jfg 1/1 Running 0 12h 192.168.1.16 srv6.1000y.cloud <none> <none> coredns-coredns-6c9554fc94-f9qhc 1/1 Running 0 52m 172.19.158.65 srv4.1000y.cloud <none> <none> metrics-server-5fd97bdcd-tcnmg 1/1 Running 0 17m 172.19.158.66 srv4.1000y.cloud <none> <none> metrics-server-5fd97bdcd-v99d5 1/1 Running 0 17m 172.30.172.131 srv6.1000y.cloud <none> <none>
# 可以连通,证明该Pod可以跨命名空间和跨主机通信 [root@srv1 ~]# kubectl exec -ti busybox -- sh / # ping -c 2 192.168.1.14 PING 192.168.1.14 (192.168.1.14): 56 data bytes 64 bytes from 192.168.1.14: seq=0 ttl=63 time=0.647 ms 64 bytes from 192.168.1.14: seq=1 ttl=63 time=0.752 ms
--- 192.168.1.14 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.647/0.699/0.752 ms
/ # ping -c 2 172.30.172.131 PING 172.30.172.131 (172.30.172.131): 56 data bytes 64 bytes from 172.30.172.131: seq=0 ttl=62 time=1.052 ms 64 bytes from 172.30.172.131: seq=1 ttl=62 time=0.767 ms
--- 172.30.172.131 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.767/0.909/1.052 ms
/ # exit
7) 创建3副本Pod测试---srv1节点操作 [root@srv1 ~]# cat > deployments.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment labels: app: nginx spec: replicas: 3 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - name: nginx image: nginx ports: - containerPort: 80 EOF
[root@srv1 ~]# kubectl apply -f deployments.yaml deployment.apps/nginx-deployment created
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE busybox 1/1 Running 0 19m nginx-deployment-7c5ddbdf54-cljvr 1/1 Running 0 81s nginx-deployment-7c5ddbdf54-nj2pg 1/1 Running 0 81s nginx-deployment-7c5ddbdf54-rng7b 1/1 Running 0 81s
[root@srv1 ~]# kubectl delete -f deployments.yaml deployment.apps "nginx-deployment" deleted
2.18 安装DashBoard
1) 安装DashBoard---srv1节点操作
[root@srv1 ~]# helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
"kubernetes-dashboard" has been added to your repositories
[root@srv1 ~]# helm install kubernetes-dashboard \ kubernetes-dashboard/kubernetes-dashboard \ --namespace kube-system NAME: kubernetes-dashboard LAST DEPLOYED: Mon Oct 2 10:00:01 2023 NAMESPACE: kube-system STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: ********************************************************************************* *** PLEASE BE PATIENT: kubernetes-dashboard may take a few minutes to install *** *********************************************************************************
Get the Kubernetes Dashboard URL by running: export POD_NAME=$(kubectl get pods -n kube-system -l "app.kubernetes.io/name=kubernetes-dashboard,app.kubernetes.io/instance=kubernetes-dashboard" -o jsonpath="{.items[0].metadata.name}") echo https://127.0.0.1:8443/ kubectl -n kube-system port-forward $POD_NAME 8443:8443
################################################## 信息汇总 ################################################## 一: 互联网下载方式 1. 下载指定版本或默认版本的配置文件[默认版本不需要增加 --version 参数] [root@srv1 ~]# # helm pull kubernetes-dashboard/kubernetes-dashboard
2. 解开tgz包 [root@srv1 ~]# # tar xfz kubernetes-dashboard-6.0.8.tgz
3. 修改所需要的values[如替换image信息] [root@srv1 ~]# # vim kubernetes-dashboard/values.yaml
4. 安装[ helm install pod_name /path/app_name_dir] [root@srv1 ~]# # helm install kubernetes-dashboard ./kubernetes-dashboard
二: Harbor中下载 1. Harbor中已经更改完成,可按以下步骤进行操作 [root@srv1 ~]# helm registry login srv7.1000y.cloud --insecure Username: admin Password: # 输入Harbor管理员密码 Login Succeeded
[root@srv1 ~]# helm pull oci://srv7.1000y.cloud/k8s/chart/kubernetes-dashboard \ --version=6.0.8 --insecure-skip-tls-verify Pulled: srv7.1000y.cloud/k8s/chart/kubernetes-dashboard:6.0.8 Digest: sha256:4e0e54488eae1159efe0423c057882e221d8b3b8c93104d906669fd6110a2ed4
[root@srv1 ~]# tar xfz kubernetes-dashboard-6.0.8.tgz [root@srv1 ~]# helm install kubernetes-dashboard \ ./kubernetes-dashboard --namespace kube-system
################################################## 汇总结束 ##################################################
2) 修改DashBoard---srv1节点操作 [root@srv1 ~]# kubectl edit svc kubernetes-dashboard -n kube-system ...... ...... ...... ...... ...... ...... # 修改42行,将Service的type改为NodePort type: NodePort ...... ...... ...... ...... ...... ...... service/kubernetes-dashboard edited
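(可选) 如不想交互式编辑,也可以用kubectl patch直接把Service类型改为NodePort,效果与上面的edit相同,二选一即可:
[root@srv1 ~]# kubectl patch svc kubernetes-dashboard -n kube-system -p '{"spec":{"type":"NodePort"}}'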
3) 显示主机端口---srv1节点操作 [root@srv1 ~]# kubectl get pods -n kube-system -o wide | grep kubernetes-dashboard kubernetes-dashboard-65cd84fc57-wg2ck 1/1 Running 0 112s 172.19.158.70 srv4.1000y.cloud <none> <none>
################################################## 说明汇总 ##################################################
1. 如果长时间无法下载镜像,可按以下内容操作: # srv1上删除kubernetes-dashboard [root@srv1 ~]# helm uninstall kubernetes-dashboard --namespace kube-system
# srv4上删除kubernetes-dashboard残留的镜像信息 [root@srv4 ~]# ctr -n=k8s.io image rm docker.io/kubernetesui/dashboard:v2.7.0 docker.io/kubernetesui/dashboard:v2.7.0
# srv4上导入kubernetes-dashboard镜像 [root@srv4 ~]# ctr -n=k8s.io image import k8s-dashboard.tar unpacking docker.io/kubernetesui/dashboard:v2.7.0 (sha256:6e567ca1130381351cffd4a6351cfdb9aa39773cdba661638082fa227cdfb498)...done
# srv1上重新安装kubernetes-dashboard [root@srv1 ~]# helm install kubernetes-dashboard \ kubernetes-dashboard/kubernetes-dashboard --namespace kube-system
################################################## 说明结束 ##################################################
[root@srv1 ~]# kubectl get svc kubernetes-dashboard -n kube-system NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes-dashboard NodePort 10.106.233.87 <none> 443:31539/TCP 3m52s
4) 创建token---srv1节点操作 [root@srv1 ~]# kubectl create serviceaccount -n kube-system admin-user serviceaccount/admin-user created
[root@srv1 ~]# cat > admin-user.yaml << EOF apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: admin-user annotations: rbac.authorization.kubernetes.io/autoupdate: "true" roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-admin subjects: - kind: ServiceAccount name: admin-user namespace: kube-system EOF
[root@srv1 ~]# kubectl apply -f admin-user.yaml clusterrolebinding.rbac.authorization.k8s.io/admin-user created
[root@srv1 ~]# cat > create-secret.yaml << EOF apiVersion: v1 kind: Secret metadata: name: admin-user namespace: kube-system annotations: kubernetes.io/service-account.name: "admin-user" type: kubernetes.io/service-account-token EOF
[root@srv1 ~]# kubectl apply -f create-secret.yaml secret/admin-user created
5) 查看Token [root@srv1 ~]# kubectl get secret admin-user -n kube-system -o jsonpath={".data.token"} | base64 -d eyJhbGciOiJSUzI1NiIsImtpZCI6IlVZVlpGbTd4NzhYOVpoTUhpbGtiVWJ4ZlVPVWY0YWQ4UzZ6aThzLS1PeFEifQ.eyJpc3MiOiJr dWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGV zLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyIiwia3ViZXJuZX Rlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY 2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJmNTYyYjhlNy0yZDI4LTQwMWQtOWJlZi1hNzM1NTg4NDRiYWQiLCJzdWIiOiJz eXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.iDdhzAuMEguQBKPpsscPgdPNtqwSFn edP3u6D2ctQQyI-7iUwZfQKIltG4mmW7xyhG2G55lbCHYmR0Bi81-zCkJ51GKQPgExmJauWcJ2zRBUxn6TzVTG2t1wHeWcHbmXSoC6t 5HukbhNGT8c1atUTw-uloRygpqsoWKUzBC0_RwDunwy8yAfm_qknxqve9IUds5emDBQ-HAFTmYZz4xhpAz4tBcWSLL2wNYrB4Bovf4F r10vE7OkwL807m9HftTsVtVej8N4po8OBDHu83aWJCD_EnvfO-9DWcms-Yv16eI2Fuv8ig5HB5w3SxE5A_OD5Zqw6r_SsMgwKKe2KCv q9w
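(可选) Kubernetes 1.24及以上版本也可以直接为ServiceAccount签发临时token,无需创建Secret;该token有默认有效期,过期后重新执行即可:
[root@srv1 ~]# kubectl create token admin-user -n kube-system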
6) 登录DashBoard [浏览器]--->https://srv1.1000y.cloud:31539


2.19 安装ingress
1) 部署ingress
下载地址: https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
[root@srv1 ~]# cat > deploy.yaml << EOF apiVersion: v1 kind: Namespace metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx name: ingress-nginx --- apiVersion: v1 automountServiceAccountToken: true kind: ServiceAccount metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx --- apiVersion: v1 kind: ServiceAccount metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx rules: - apiGroups: - "" resources: - namespaces verbs: - get - apiGroups: - "" resources: - configmaps - pods - secrets - endpoints verbs: - get - list - watch - apiGroups: - "" resources: - services verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses/status verbs: - update - apiGroups: - networking.k8s.io resources: - ingressclasses verbs: - get - list - watch - apiGroups: - coordination.k8s.io resourceNames: - ingress-nginx-leader resources: - leases verbs: - get - update - apiGroups: - coordination.k8s.io resources: - leases verbs: - create - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch - get --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx rules: - apiGroups: - "" resources: - secrets verbs: - get - create --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx rules: - apiGroups: - "" resources: - configmaps - endpoints - nodes - pods - secrets - namespaces verbs: - list - watch - apiGroups: - coordination.k8s.io resources: - leases verbs: - list - watch - apiGroups: - "" resources: - nodes verbs: - get - apiGroups: - "" resources: - services verbs: - get - list - watch - apiGroups: - networking.k8s.io resources: - ingresses verbs: - get - list - watch - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - networking.k8s.io resources: - ingresses/status verbs: - update - apiGroups: - networking.k8s.io resources: - ingressclasses verbs: - get - list - watch - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch - get --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx 
app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission rules: - apiGroups: - admissionregistration.k8s.io resources: - validatingwebhookconfigurations verbs: - get - update --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx namespace: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: ingress-nginx subjects: - kind: ServiceAccount name: ingress-nginx namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission namespace: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: ingress-nginx-admission subjects: - kind: ServiceAccount name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: ingress-nginx subjects: - kind: ServiceAccount name: ingress-nginx namespace: ingress-nginx --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: ingress-nginx-admission subjects: - kind: ServiceAccount name: ingress-nginx-admission namespace: ingress-nginx --- apiVersion: v1 data: allow-snippet-annotations: "false" kind: ConfigMap metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx --- apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx spec: externalTrafficPolicy: Local ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - appProtocol: http name: http port: 80 protocol: TCP targetPort: http - appProtocol: https name: https port: 443 protocol: TCP targetPort: https selector: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx type: LoadBalancer --- apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller-admission namespace: ingress-nginx spec: ports: - appProtocol: https name: https-webhook port: 443 targetPort: webhook 
selector: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx type: ClusterIP --- apiVersion: apps/v1 kind: DaemonSet metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-controller namespace: ingress-nginx spec: minReadySeconds: 0 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx template: metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 spec: hostNetwork: true containers: - args: - /nginx-ingress-controller - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller - --election-id=ingress-nginx-leader - --controller-class=k8s.io/ingress-nginx - --ingress-class=nginx - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller - --validating-webhook=:8443 - --validating-webhook-certificate=/usr/local/certificates/cert - --validating-webhook-key=/usr/local/certificates/key env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: LD_PRELOAD value: /usr/local/lib/libmimalloc.so image: m.daocloud.io/registry.k8s.io/ingress-nginx/controller:v1.9.0@sha256:c15d1a617858d90fb8f8a2dd60b0676f2bb85c54e3ed11511794b86ec30c8c60 imagePullPolicy: IfNotPresent lifecycle: preStop: exec: command: - /wait-shutdown livenessProbe: failureThreshold: 5 httpGet: path: /healthz port: 10254 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 name: controller ports: - containerPort: 3721 name: http protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /healthz port: 10254 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: requests: cpu: 100m memory: 90Mi securityContext: allowPrivilegeEscalation: true capabilities: add: - NET_BIND_SERVICE drop: - ALL runAsUser: 101 volumeMounts: - mountPath: /usr/local/certificates/ name: webhook-cert readOnly: true dnsPolicy: ClusterFirst nodeSelector: kubernetes.io/os: linux serviceAccountName: ingress-nginx terminationGracePeriodSeconds: 300 volumes: - name: webhook-cert secret: secretName: ingress-nginx-admission --- apiVersion: batch/v1 kind: Job metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-create namespace: ingress-nginx spec: template: metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-create spec: containers: - args: - create - --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc - --namespace=$(POD_NAMESPACE) - --secret-name=ingress-nginx-admission env: - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace image: 
m.daocloud.io/registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:543c40fd093964bc9ab509d3e791f9989963021f1e9e4c9c7b6700b02bfb227b imagePullPolicy: IfNotPresent name: create securityContext: allowPrivilegeEscalation: false nodeSelector: kubernetes.io/os: linux restartPolicy: OnFailure securityContext: fsGroup: 2000 runAsNonRoot: true runAsUser: 2000 serviceAccountName: ingress-nginx-admission --- apiVersion: batch/v1 kind: Job metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-patch namespace: ingress-nginx spec: template: metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission-patch spec: containers: - args: - patch - --webhook-name=ingress-nginx-admission - --namespace=$(POD_NAMESPACE) - --patch-mutating=false - --secret-name=ingress-nginx-admission - --patch-failure-policy=Fail env: - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace image: m.daocloud.io/registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:543c40fd093964bc9ab509d3e791f9989963021f1e9e4c9c7b6700b02bfb227b imagePullPolicy: IfNotPresent name: patch securityContext: allowPrivilegeEscalation: false nodeSelector: kubernetes.io/os: linux restartPolicy: OnFailure securityContext: fsGroup: 2000 runAsNonRoot: true runAsUser: 2000 serviceAccountName: ingress-nginx-admission --- apiVersion: networking.k8s.io/v1 kind: IngressClass metadata: labels: app.kubernetes.io/component: controller app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: nginx spec: controller: k8s.io/ingress-nginx --- apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingWebhookConfiguration metadata: labels: app.kubernetes.io/component: admission-webhook app.kubernetes.io/instance: ingress-nginx app.kubernetes.io/name: ingress-nginx app.kubernetes.io/part-of: ingress-nginx app.kubernetes.io/version: 1.9.0 name: ingress-nginx-admission webhooks: - admissionReviewVersions: - v1 clientConfig: service: name: ingress-nginx-controller-admission namespace: ingress-nginx path: /networking/v1/ingresses failurePolicy: Fail matchPolicy: Equivalent name: validate.nginx.ingress.kubernetes.io rules: - apiGroups: - networking.k8s.io apiVersions: - v1 operations: - CREATE - UPDATE resources: - ingresses sideEffects: None EOF
[root@srv1 ~]# cat > backend.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: default-http-backend labels: app.kubernetes.io/name: default-http-backend namespace: kube-system spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: default-http-backend template: metadata: labels: app.kubernetes.io/name: default-http-backend spec: terminationGracePeriodSeconds: 60 containers: - name: default-http-backend image: registry.cn-hangzhou.aliyuncs.com/chenby/defaultbackend-amd64:1.5 livenessProbe: httpGet: path: /healthz port: 8080 scheme: HTTP initialDelaySeconds: 30 timeoutSeconds: 5 ports: - containerPort: 8080 resources: limits: cpu: 10m memory: 20Mi requests: cpu: 10m memory: 20Mi --- apiVersion: v1 kind: Service metadata: name: default-http-backend namespace: kube-system labels: app.kubernetes.io/name: default-http-backend spec: ports: - port: 80 targetPort: 8080 selector: app.kubernetes.io/name: default-http-backend EOF
[root@srv1 ~]# kubectl apply -f deploy.yaml namespace/ingress-nginx created serviceaccount/ingress-nginx created serviceaccount/ingress-nginx-admission created role.rbac.authorization.k8s.io/ingress-nginx created role.rbac.authorization.k8s.io/ingress-nginx-admission created clusterrole.rbac.authorization.k8s.io/ingress-nginx created clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created rolebinding.rbac.authorization.k8s.io/ingress-nginx created rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created configmap/ingress-nginx-controller created service/ingress-nginx-controller created service/ingress-nginx-controller-admission created deployment.apps/ingress-nginx-controller created job.batch/ingress-nginx-admission-create created job.batch/ingress-nginx-admission-patch created ingressclass.networking.k8s.io/nginx created validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
[root@srv1 ~]# kubectl apply -f backend.yaml deployment.apps/default-http-backend created service/default-http-backend created
2) 确认ingress POD [root@srv1 ~]# kubectl get pods -n ingress-nginx NAME READY STATUS RESTARTS AGE ingress-nginx-admission-create-7spq6 0/1 Completed 0 39m ingress-nginx-admission-patch-qptvx 0/1 Completed 2 39m ingress-nginx-controller-6r8t9 1/1 Running 0 39m ingress-nginx-controller-9rhfl 1/1 Running 0 39m ingress-nginx-controller-d6jlr 1/1 Running 0 39m ingress-nginx-controller-dx22k 1/1 Running 0 39m ingress-nginx-controller-vqhjj 1/1 Running 0 39m ingress-nginx-controller-vt26r 1/1 Running 0 39m
[root@srv1 ~]# kubectl get pods -n kube-system | grep default-http-backend default-http-backend-7b44966d95-gz52t 1/1 Running 0 4m8s
3) 测试ingress [root@srv1 ~]# cat > ingress-demo-app.yaml << EOF apiVersion: apps/v1 kind: Deployment metadata: name: hello-server spec: replicas: 2 selector: matchLabels: app: hello-server template: metadata: labels: app: hello-server spec: containers: - name: hello-server image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/hello-server ports: - containerPort: 9000 --- apiVersion: apps/v1 kind: Deployment metadata: labels: app: nginx-demo name: nginx-demo spec: replicas: 2 selector: matchLabels: app: nginx-demo template: metadata: labels: app: nginx-demo spec: containers: - image: nginx name: nginx --- apiVersion: v1 kind: Service metadata: labels: app: nginx-demo name: nginx-demo spec: selector: app: nginx-demo ports: - port: 8000 protocol: TCP targetPort: 80 --- apiVersion: v1 kind: Service metadata: labels: app: hello-server name: hello-server spec: selector: app: hello-server ports: - port: 8000 protocol: TCP targetPort: 9000 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: ingress-host-bar spec: ingressClassName: nginx rules: - host: "srv1.1000y.cloud" http: paths: - pathType: Prefix path: "/" backend: service: name: hello-server port: number: 8000 - host: "srv1.1000y.cloud" http: paths: - pathType: Prefix path: "/nginx" backend: service: name: nginx-demo port: number: 8000 EOF
[root@srv1 ~]# kubectl apply -f ingress-demo-app.yaml deployment.apps/hello-server created deployment.apps/nginx-demo created service/nginx-demo created service/hello-server created ingress.networking.k8s.io/ingress-host-bar created
################################################## 错误汇总 ##################################################
1. 出现如下错误: Error from server (InternalError): error when creating "ingress-demo-app.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "ingress-nginx-controller-admission"
2. 问题: 最初使用yaml方式创建nginx-ingress,之后删除了它创建的命名空间以及clusterrole和clusterrolebinding,但没有删除同样由该yaml创建的ValidatingWebhookConfiguration ingress-nginx-admission。当再次安装nginx-ingress之后,创建自定义的ingress就会报这个错误。
3. 解决方法: [root@srv1 ingress]# kubectl get validatingwebhookconfigurations NAME WEBHOOKS AGE ingress-nginx-admission 1 7m47s
[root@srv1 ingress]# kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission validatingwebhookconfiguration.admissionregistration.k8s.io "ingress-nginx-admission" deleted
################################################## 汇总结束 ##################################################
[root@srv1 ~]# kubectl get ingress NAME CLASS HOSTS ADDRESS PORTS AGE ingress-host-bar nginx srv1.1000y.cloud,srv1.1000y.cloud 80 4m16s
# 测试 [root@srv1 ingress]# curl srv1.1000y.cloud Hello World!
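(可选) 第二条规则(/nginx前缀)也可以顺带验证。注意该路径会被原样转发给nginx-demo,而nginx容器内默认并没有/nginx这个路径,因此返回404属于预期;如需去掉路径前缀,可为Ingress添加rewrite相关注解:
[root@srv1 ingress]# curl -I srv1.1000y.cloud/nginx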
4) 查看ingress端口 [root@srv1 ~]# kubectl get svc -n ingress-nginx NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE ingress-nginx-controller LoadBalancer 10.110.11.11 <pending> 80:31350/TCP,443:30829/TCP 26m ingress-nginx-controller-admission ClusterIP 10.101.19.249 <none> 443/TCP 26m
5) 测试 [root@srv1 ~]# cat << EOF | kubectl apply -f - apiVersion: apps/v1 kind: Deployment metadata: name: snow spec: replicas: 3 selector: matchLabels: app: snow template: metadata: labels: app: snow spec: containers: - name: snow image: docker.io/library/nginx resources: limits: memory: "128Mi" cpu: "500m" ports: - containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: snow spec: ipFamilyPolicy: PreferDualStack ipFamilies: - IPv4 type: NodePort selector: app: snow ports: - port: 80 targetPort: 80 EOF deployment.apps/snow created service/snow created
[root@srv1 ~]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE hello-server ClusterIP 10.108.142.145 <none> 8000/TCP 5m15s kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20h nginx-demo ClusterIP 10.100.50.36 <none> 8000/TCP 5m17s snow NodePort 10.99.57.229 <none> 80:32173/TCP 40s
[root@srv1 ~]# kubectl get pod NAME READY STATUS RESTARTS AGE hello-server-569d7866bd-64rcd 1/1 Running 0 26m hello-server-569d7866bd-t4dnq 1/1 Running 0 26m nginx-demo-554db85f85-crss7 1/1 Running 0 26m nginx-demo-554db85f85-x9dc8 1/1 Running 0 26m snow-7c6bdf498f-4tplm 1/1 Running 0 15m snow-7c6bdf498f-trxrd 1/1 Running 0 15m snow-7c6bdf498f-vcttf 1/1 Running 0 15m
[root@srv1 ~]# curl -I http://srv1.1000y.cloud:32173 HTTP/1.1 200 OK Server: nginx/1.21.5 Date: Mon, 02 Oct 2023 05:29:48 GMT Content-Type: text/html Content-Length: 615 Last-Modified: Tue, 28 Dec 2021 15:28:38 GMT Connection: keep-alive ETag: "61cb2d26-267" Accept-Ranges: bytes
2.20 ELK集群部署
1) 创建nfs服务---srv7操作
[root@srv7 ~]# yum install nfs-utils -y
[root@srv7 ~]# vim /etc/exports # no_subtree_check: 即使输出目录是一个子目录,nfs服务器也不检查其父目录的权限,可提高效率 /data/nfs-sc *(rw,no_root_squash,no_subtree_check)
[root@srv7 ~]# mkdir -p /data/nfs-sc
[root@srv7 ~]# systemctl enable --now rpcbind nfs-server
2) 安装nfs客户端工具---k8s所有节点操作 [root@srv1 ~]# yum install nfs-utils -y [root@srv1 ~]# systemctl enable --now rpcbind
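(可选) 可在任一k8s节点确认NFS导出目录可见(192.168.1.17为本文NFS服务器srv7的地址,请按实际环境替换):
[root@srv1 ~]# showmount -e 192.168.1.17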
3) 为nfs-subdir-external-provisioner创建ServiceAccount---srv1节点操作 [root@srv1 ~]# mkdir k8s-elk-yaml [root@srv1 ~]# cd k8s-elk-yaml
[root@srv1 k8s-elk-yaml]# vim 01-serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: nfs-client-provisioner namespace: kube-system --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: nfs-client-provisioner-runner rules: - apiGroups: [""] resources: ["persistentvolumes"] verbs: ["get", "list", "watch", "create", "delete"] - apiGroups: [""] resources: ["persistentvolumeclaims"] verbs: ["get", "list", "watch", "update"] - apiGroups: ["storage.k8s.io"] resources: ["storageclasses"] verbs: ["get", "list", "watch"] - apiGroups: [""] resources: ["events"] verbs: ["get", "list", "watch","create", "update", "patch"] --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: run-nfs-client-provisioner subjects: - kind: ServiceAccount name: nfs-client-provisioner namespace: kube-system roleRef: kind: ClusterRole name: nfs-client-provisioner-runner apiGroup: rbac.authorization.k8s.io --- kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: name: leader-locking-nfs-client-provisioner namespace: kube-system rules: - apiGroups: [""] resources: ["endpoints"] verbs: ["get", "list", "watch", "create", "update", "patch"] --- kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: leader-locking-nfs-client-provisioner namespace: kube-system subjects: - kind: ServiceAccount name: nfs-client-provisioner namespace: kube-system roleRef: kind: Role name: leader-locking-nfs-client-provisioner apiGroup: rbac.authorization.k8s.io
[root@srv1 k8s-elk-yaml]# kubectl apply -f 01-serviceaccount.yaml serviceaccount/nfs-client-provisioner created clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
4) 部署nfs-subdir-external-provisioner---srv1节点操作 [root@srv1 k8s-elk-yaml]# vim 02-deploy-nfs.yaml apiVersion: v1 kind: ServiceAccount metadata: name: nfs-client-provisioner --- kind: Deployment apiVersion: apps/v1 metadata: name: nfs-client-provisioner namespace: kube-system spec: replicas: 1 strategy: type: Recreate selector: matchLabels: app: nfs-client-provisioner template: metadata: labels: app: nfs-client-provisioner spec: serviceAccountName: nfs-client-provisioner containers: - name: nfs-client-provisioner image: registry.cn-beijing.aliyuncs.com/xngczl/nfs-subdir-external-provisione:v4.0.0 imagePullPolicy: IfNotPresent volumeMounts: - name: nfs-client-root mountPath: /persistentvolumes env: - name: PROVISIONER_NAME # 可改为自定义名称 value: 1000y.cloud/snowchuai - name: NFS_SERVER # 改为你自己的NFS Server IP value: 192.168.1.17 - name: NFS_PATH # 改为你自己的NFS共享目录的路径 value: /data/nfs-sc volumes: - name: nfs-client-root nfs: # 改为你自己的NFS Server IP server: 192.168.1.17 # 改为你自己的NFS共享目录的路径 path: /data/nfs-sc
[root@srv1 k8s-elk-yaml]# kubectl apply -f 02-deploy-nfs.yaml serviceaccount/nfs-client-provisioner created deployment.apps/nfs-client-provisioner created
[root@srv1 k8s-elk-yaml]# kubectl get pods -A | grep nfs kube-system nfs-client-provisioner-6f46475989-qcb9x 1/1 Running 0 74s
5) 创建sc---srv1节点操作 [root@srv1 k8s-elk-yaml]# vim 03-sc-nfs.yaml apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: # sc名称,如修改名称,必须确保后续调用sc的所有yaml文件都是所修改的名称 name: managed-nfs-storage annotations: storageclass.beta.kubernetes.io/is-default-class: "true" # 此处名称必须与 02-deploy-nfs.yaml 文件中的名称一致 provisioner: 1000y.cloud/snowchuai reclaimPolicy: Delete allowVolumeExpansion: True
[root@srv1 k8s-elk-yaml]# kubectl apply -f 03-sc-nfs.yaml storageclass.storage.k8s.io/managed-nfs-storage created
[root@srv1 k8s-elk-yaml]# kubectl get sc NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE managed-nfs-storage (default) 1000y.cloud/snowchuai Delete Immediate true 2s
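(可选) 部署ES之前,建议先用一个临时PVC验证动态供给是否正常;以下PVC名称test-claim仅为示意,验证完成后删除即可:
[root@srv1 k8s-elk-yaml]# cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF

# STATUS变为Bound说明nfs-client-provisioner与sc工作正常,NFS共享目录下通常会生成对应的子目录
[root@srv1 k8s-elk-yaml]# kubectl get pvc test-claim
[root@srv1 k8s-elk-yaml]# kubectl delete pvc test-claim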
6) 创建一个kube-logging命名空间给es---srv1操作 [root@srv1 k8s-elk-yaml]# vim 04-es-ns.yaml apiVersion: v1 kind: Namespace metadata: name: kube-logging
[root@srv1 k8s-elk-yaml]# kubectl apply -f 04-es-ns.yaml namespace/kube-logging created
[root@srv1 k8s-elk-yaml]# kubectl get ns NAME STATUS AGE default Active 39h ingress-nginx Active 36h kube-logging Active 14s kube-node-lease Active 39h kube-public Active 39h kube-system Active 39h kubernetes-dashboard Active 37h
7) 创建es service---srv1操作 [root@srv1 k8s-elk-yaml]# vim 05-es-svc.yaml kind: Service apiVersion: v1 metadata: name: elasticsearch namespace: kube-logging labels: app: elasticsearch spec: selector: app: elasticsearch clusterIP: None ports: - port: 9200 name: rest - port: 9300 name: inter-node
[root@srv1 k8s-elk-yaml]# kubectl apply -f 05-es-svc.yaml service/elasticsearch created
[root@srv1 k8s-elk-yaml]# kubectl get svc -n kube-logging NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 33s
8) 部署es cluster---srv1操作 [root@srv1 k8s-elk-yaml]# vim 06-es-statefulset-deploy.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: es-cluster namespace: kube-logging spec: serviceName: elasticsearch replicas: 3 selector: matchLabels: app: elasticsearch template: metadata: labels: app: elasticsearch spec: containers: - name: elasticsearch image: registry.cn-beijing.aliyuncs.com/dotbalo/elasticsearch:v7.10.2 imagePullPolicy: IfNotPresent resources: limits: cpu: 1000m requests: cpu: 100m ports: - containerPort: 9200 name: rest protocol: TCP - containerPort: 9300 name: inter-node protocol: TCP volumeMounts: - name: data mountPath: /usr/share/elasticsearch/data env: - name: cluster.name value: k8s-logs - name: node.name valueFrom: fieldRef: fieldPath: metadata.name - name: discovery.seed_hosts value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch" - name: cluster.initial_master_nodes value: "es-cluster-0,es-cluster-1,es-cluster-2" - name: ES_JAVA_OPTS value: "-Xms512m -Xmx512m" initContainers: - name: fix-permissions image: busybox imagePullPolicy: IfNotPresent command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"] securityContext: privileged: true volumeMounts: - name: data mountPath: /usr/share/elasticsearch/data - name: increase-vm-max-map image: busybox imagePullPolicy: IfNotPresent command: ["sysctl", "-w", "vm.max_map_count=262144"] securityContext: privileged: true - name: increase-fd-ulimit image: busybox imagePullPolicy: IfNotPresent command: ["sh", "-c", "ulimit -n 65536"] securityContext: privileged: true volumeClaimTemplates: - metadata: name: data labels: app: elasticsearch spec: accessModes: [ "ReadWriteOnce" ] # 如果在 03-sc-nfs.yaml 中重新定义了 sc名称,请在这里保持一致 storageClassName: managed-nfs-storage resources: requests: storage: 10Gi
[root@srv1 k8s-elk-yaml]# kubectl apply -f 06-es-statefulset-deploy.yaml statefulset.apps/es-cluster created
# 查看pod的状态 [root@srv1 k8s-elk-yaml]# kubectl get pod -n kube-logging NAME READY STATUS RESTARTS AGE es-cluster-0 1/1 Running 0 2m es-cluster-1 1/1 Running 0 5m es-cluster-2 1/1 Running 0 9m
# 查看pod的IP地址及所在主机地址 [root@srv1 k8s-elk-yaml]# kubectl get pod -l app=elasticsearch \ -o custom-columns=POD:metadata.name,Pod-IP:.status.podIP,Node-IP:.status.hostIP \ -n kube-logging POD Pod-IP Node-IP es-cluster-0 172.30.172.145 192.168.1.16 es-cluster-1 172.16.22.14 192.168.1.15 es-cluster-2 172.19.158.75 192.168.1.14
# 查看es-cluster集群状态 --- 多等一会,大概5-15分钟 [root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cluster/health?pretty { "cluster_name" : "k8s-logs", "status" : "green", "timed_out" : false, "number_of_nodes" : 3, "number_of_data_nodes" : 3, "active_primary_shards" : 4, "active_shards" : 8, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
9) 创建Kibana service及部署---srv1操作 [root@srv1 k8s-elk-yaml]# vim 07-kibana-svc.yaml apiVersion: v1 kind: Service metadata: name: kibana namespace: kube-logging labels: app: kibana spec: type: NodePort ports: - port: 5601 selector: app: kibana
[root@srv1 k8s-elk-yaml]# vim 08-kibana-deploy.yaml apiVersion: apps/v1 kind: Deployment metadata: name: kibana namespace: kube-logging labels: app: kibana spec: replicas: 1 selector: matchLabels: app: kibana template: metadata: labels: app: kibana spec: containers: - name: kibana image: registry.cn-beijing.aliyuncs.com/dotbalo/kibana-oss:7.10.2 imagePullPolicy: IfNotPresent resources: limits: cpu: 1000m requests: cpu: 100m env: - name: ELASTICSEARCH_URL value: http://elasticsearch:9200 ports: - containerPort: 5601
[root@srv1 k8s-elk-yaml]# kubectl apply -f 07-kibana-svc.yaml -f 08-kibana-deploy.yaml service/kibana created deployment.apps/kibana created
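(可选) Kibana的Service为NodePort类型,可先确认分配到的端口与Pod状态,之后通过浏览器访问 http://任一节点IP:NodePort 即可打开Kibana页面:
[root@srv1 k8s-elk-yaml]# kubectl get svc kibana -n kube-logging
[root@srv1 k8s-elk-yaml]# kubectl get pod -l app=kibana -n kube-logging -o wide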
10) 部署fluentd---srv1操作 [root@srv1 k8s-elk-yaml]# vim 10-fluentd-es-configmap.yaml kind: ConfigMap apiVersion: v1 metadata: name: fluentd-es-config-v0.2.1 namespace: kube-logging labels: addonmanager.kubernetes.io/mode: Reconcile data: system.conf: |- <system> root_dir /tmp/fluentd-buffers/ </system> containers.input.conf: |- <source> @id fluentd-containers.log @type tail path /var/log/containers/*.log pos_file /var/log/es-containers.log.pos tag raw.kubernetes.* read_from_head true <parse> @type multi_format <pattern> format json time_key time time_format %Y-%m-%dT%H:%M:%S.%NZ </pattern> <pattern> format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/ time_format %Y-%m-%dT%H:%M:%S.%N%:z </pattern> </parse> </source> # Detect exceptions in the log output and forward them as one log entry. <match raw.kubernetes.**> @id raw.kubernetes @type detect_exceptions remove_tag_prefix raw message log stream stream multiline_flush_interval 5 max_bytes 500000 max_lines 1000 </match> # Concatenate multi-line logs <filter **> @id filter_concat @type concat key message multiline_end_regexp /\n$/ separator "" </filter> # Enriches records with Kubernetes metadata <filter kubernetes.**> @id filter_kubernetes_metadata @type kubernetes_metadata </filter> # Fixes json fields in Elasticsearch <filter kubernetes.**> @id filter_parser @type parser key_name log reserve_data true remove_key_name_field true <parse> @type multi_format <pattern> format json </pattern> <pattern> format none </pattern> </parse> </filter> system.input.conf: |- # Example: # 2015-12-21 23:17:22,066 [salt.state ][INFO ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081 <source> @id minion @type tail format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/ time_format %Y-%m-%d %H:%M:%S path /var/log/salt/minion pos_file /var/log/salt.pos tag salt </source> # Example: # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script <source> @id startupscript.log @type tail format syslog path /var/log/startupscript.log pos_file /var/log/es-startupscript.log.pos tag startupscript </source> # Examples: # time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json" # time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404 # TODO(random-liu): Remove this after cri container runtime rolls out. <source> @id docker.log @type tail format /^time="(?<time>[^"]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/ path /var/log/docker.log pos_file /var/log/es-docker.log.pos tag docker </source> # Example: # 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal <source> @id etcd.log @type tail # Not parsing this, because it doesn't have anything particularly useful to # parse out of it (like severities). format none path /var/log/etcd.log pos_file /var/log/es-etcd.log.pos tag etcd </source> # Multi-line parsing is required for all the kube logs because very large log # statements, such as those that include entire object bodies, get split into # multiple lines by glog. 
# Example: # I0204 07:32:30.020537 3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537] <source> @id kubelet.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kubelet.log pos_file /var/log/es-kubelet.log.pos tag kubelet </source> # Example: # I1118 21:26:53.975789 6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed <source> @id kube-proxy.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-proxy.log pos_file /var/log/es-kube-proxy.log.pos tag kube-proxy </source> # Example: # I0204 07:00:19.604280 5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266] <source> @id kube-apiserver.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-apiserver.log pos_file /var/log/es-kube-apiserver.log.pos tag kube-apiserver </source> # Example: # I0204 06:55:31.872680 5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui <source> @id kube-controller-manager.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-controller-manager.log pos_file /var/log/es-kube-controller-manager.log.pos tag kube-controller-manager </source> # Example: # W0204 06:49:18.239674 7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312] <source> @id kube-scheduler.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/kube-scheduler.log pos_file /var/log/es-kube-scheduler.log.pos tag kube-scheduler </source> # Example: # I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf <source> @id glbc.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/glbc.log pos_file /var/log/es-glbc.log.pos tag glbc </source> # Example: # I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf <source> @id cluster-autoscaler.log @type tail format multiline multiline_flush_interval 5s format_firstline /^\w\d{4}/ format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/ time_format %m%d %H:%M:%S.%N path /var/log/cluster-autoscaler.log pos_file /var/log/es-cluster-autoscaler.log.pos tag cluster-autoscaler </source> # Logs from systemd-journal for interesting services. 
# TODO(random-liu): Remove this after cri container runtime rolls out. <source> @id journald-docker @type systemd matches [{ "_SYSTEMD_UNIT": "docker.service" }] <storage> @type local persistent true path /var/log/journald-docker.pos </storage> read_from_head true tag docker </source> <source> @id journald-container-runtime @type systemd matches [{ "_SYSTEMD_UNIT": "{{ fluentd_container_runtime_service }}.service" }] <storage> @type local persistent true path /var/log/journald-container-runtime.pos </storage> read_from_head true tag container-runtime </source> <source> @id journald-kubelet @type systemd matches [{ "_SYSTEMD_UNIT": "kubelet.service" }] <storage> @type local persistent true path /var/log/journald-kubelet.pos </storage> read_from_head true tag kubelet </source> <source> @id journald-node-problem-detector @type systemd matches [{ "_SYSTEMD_UNIT": "node-problem-detector.service" }] <storage> @type local persistent true path /var/log/journald-node-problem-detector.pos </storage> read_from_head true tag node-problem-detector </source> <source> @id kernel @type systemd matches [{ "_TRANSPORT": "kernel" }] <storage> @type local persistent true path /var/log/kernel.pos </storage> <entry> fields_strip_underscores true fields_lowercase true </entry> read_from_head true tag kernel </source> forward.input.conf: |- # Takes the messages sent over TCP <source> @id forward @type forward </source> monitoring.conf: |- # Prometheus Exporter Plugin # input plugin that exports metrics <source> @id prometheus @type prometheus </source> <source> @id monitor_agent @type monitor_agent </source> # input plugin that collects metrics from MonitorAgent <source> @id prometheus_monitor @type prometheus_monitor <labels> host ${hostname} </labels> </source> # input plugin that collects metrics for output plugin <source> @id prometheus_output_monitor @type prometheus_output_monitor <labels> host ${hostname} </labels> </source> # input plugin that collects metrics for in_tail plugin <source> @id prometheus_tail_monitor @type prometheus_tail_monitor <labels> host ${hostname} </labels> </source> output.conf: |- <match **> @id elasticsearch @type elasticsearch @log_level info type_name _doc include_tag_key true host elasticsearch port 9200 logstash_format true <buffer> @type file path /var/log/fluentd-buffers/kubernetes.system.buffer flush_mode interval retry_type exponential_backoff flush_thread_count 2 flush_interval 5s retry_forever retry_max_interval 30 chunk_limit_size 2M total_limit_size 500M overflow_action block </buffer> </match>
[root@srv1 k8s-elk-yaml]# vim 11-fluentd-es-ds.yaml apiVersion: v1 kind: ServiceAccount metadata: name: fluentd-es namespace: kube-logging labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: fluentd-es labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile rules: - apiGroups: - "" resources: - "namespaces" - "pods" verbs: - "get" - "watch" - "list" --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: fluentd-es labels: k8s-app: fluentd-es addonmanager.kubernetes.io/mode: Reconcile subjects: - kind: ServiceAccount name: fluentd-es namespace: kube-logging apiGroup: "" roleRef: kind: ClusterRole name: fluentd-es apiGroup: "" --- apiVersion: apps/v1 kind: DaemonSet metadata: name: fluentd namespace: kube-logging labels: k8s-app: fluentd-es version: v3.1.1 addonmanager.kubernetes.io/mode: Reconcile spec: selector: matchLabels: k8s-app: fluentd-es version: v3.1.1 template: metadata: labels: k8s-app: fluentd-es version: v3.1.1 spec: securityContext: seccompProfile: type: RuntimeDefault priorityClassName: system-node-critical serviceAccountName: fluentd-es containers: - name: fluentd-es image: registry.cn-beijing.aliyuncs.com/dotbalo/fluentd:v3.1.0 env: - name: FLUENTD_ARGS value: --no-supervisor -q resources: limits: memory: 500Mi requests: cpu: 100m memory: 200Mi volumeMounts: - name: varlog mountPath: /var/log - name: varlibdockercontainers mountPath: /var/lib/docker/containers readOnly: true - name: config-volume mountPath: /etc/fluent/config.d ports: - containerPort: 24231 name: prometheus protocol: TCP livenessProbe: tcpSocket: port: prometheus initialDelaySeconds: 5 timeoutSeconds: 10 readinessProbe: tcpSocket: port: prometheus initialDelaySeconds: 5 timeoutSeconds: 10 terminationGracePeriodSeconds: 30 # 如果不需要采集所有机器的日志, 可将下面两行注释取消 # 并对所要采集日志的机器进行标签设定: kubectl label node srv1.1000y.cloud fluentd=true # 查看机器的标签: kubectl get node -l fluentd=true --show-labels #nodeSelector: # fluentd: "true" volumes: - name: varlog hostPath: path: /var/log - name: varlibdockercontainers hostPath: path: /var/lib/docker/containers - name: config-volume configMap: name: fluentd-es-config-v0.2.1
[root@srv1 k8s-elk-yaml]# kubectl apply -f 10-fluentd-es-configmap.yaml -f 11-fluentd-es-ds.yaml configmap/fluentd-es-config-v0.2.1 created serviceaccount/fluentd-es created clusterrole.rbac.authorization.k8s.io/fluentd-es created clusterrolebinding.rbac.authorization.k8s.io/fluentd-es created daemonset.apps/fluentd created
11) 确认pod及svc---srv1操作 [root@srv1 k8s-elk-yaml]# kubectl get pod,svc -n kube-logging NAME READY STATUS RESTARTS AGE pod/es-cluster-0 1/1 Running 0 18m pod/es-cluster-1 1/1 Running 0 14m pod/es-cluster-2 1/1 Running 0 11m pod/fluentd-494js 1/1 Running 0 2m13s pod/fluentd-mg44r 1/1 Running 0 2m13s pod/fluentd-s7t8m 1/1 Running 0 2m13s pod/kibana-757b69d4b9-6k8r5 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 21m service/kibana NodePort 10.96.153.90 <none> 5601:30087/TCP 12m
12) 检查es集群的index生成情况---srv1操作 [root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open logstash-2023.10.03 BUztejSySw-iCHdPiTN0kQ 1 1 28 0 615.5kb 397.2kb green open .kibana_1 JI9BvAaQREqWR4aIwufBrA 1 1 0 0 416b 208b green open logstash-2023.10.04 u3BxkolnSnmCkzDapWwrVA 1 1 258336 0 61.5mb 42.3mb green open logstash-1970.01.01 BP8eFmdaROOs_SbY68RJsw 1 1 512 0 134kb 76.5kb
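# 参考示例(可选): 也可查看es集群整体健康状态(此处沿用上文的Pod IP,实际IP以 kubectl get pod -o wide 的输出为准)
[root@srv1 k8s-elk-yaml]# curl http://172.30.172.145:9200/_cluster/health?pretty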
13) 再次确认kibana service端口---srv1操作 [root@srv1 k8s-elk-yaml]# kubectl get svc -n kube-logging NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 30m kibana NodePort 10.96.153.90 <none> 5601:30087/TCP 21m
14) 进入Kibana [浏览器]===> http://srv1.1000y.cloud:30087
# 如果出现: Kibana server is not ready yet 提示,多等一会,kibana仍在准备中
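# 参考示例(可选): 如长时间仍未就绪,可查看kibana的Pod日志以定位原因
[root@srv1 k8s-elk-yaml]# kubectl -n kube-logging logs deployment/kibana --tail=50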











2.21 Prometheus集群部署
1) 下载并安装helm
# ---如无特殊说明,以下操作均在srv1上执行
[root@srv1 ~]# wget https://get.helm.sh/helm-v3.12.3-linux-amd64.tar.gz
[root@srv1 ~]# tar xfz helm-v3.12.3-linux-amd64.tar.gz [root@srv1 ~]# mv linux-amd64/helm /usr/local/bin/ [root@srv1 ~]# rm -rf linux-amd64
[root@srv1 ~]# helm version version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}
2) 添加helm源 [root@srv1 ~]# helm repo add appstore https://charts.grapps.cn "appstore" has been added to your repositories
[root@srv1 ~]# helm repo list NAME URL appstore https://charts.grapps.cn
[root@srv1 ~]# helm repo update appstore Hang tight while we grab the latest from your chart repositories... ...Successfully got an update from the "appstore" chart repository Update Complete. ⎈Happy Helming!⎈
################################################## 信息汇总 ################################################## 一: 互联网下载方式 1. 下载指定版本或默认版本的配置文件[默认版本不需要增加 --version 参数] # helm pull appstore/kube-prometheus-stack --version 48.2.3
2. 解开tgz包 # tar xfz kube-prometheus-stack-48.2.3.tgz
3. 修改所需要的values[如替换image信息] # vim kube-prometheus-stack/values.yaml
4. 安装[ helm install pod_name /path/app_name_dir] # helm install kube-prometheus-stack ./kube-prometheus-stack
二: Harbor中下载 1. Harbor中已经更改完成,可按以下步骤进行操作 [root@srv1 ~]# helm registry login srv7.1000y.cloud --insecure Username: admin Password: # 输入Harbor管理员密码 Login Succeeded
[root@srv1 ~]# helm pull oci://srv7.1000y.cloud/k8s/chart/kube-prometheus-stack \ --version=48.2.3 --insecure-skip-tls-verify Pulled: srv7.1000y.cloud/k8s/chart/kube-prometheus-stack:48.2.3 Digest: sha256:9b6c629781dd518e2ccbce25878e449d4ecf16cdb6c30922c40dda74dfa4b6a4
[root@srv1 ~]# tar xfz kube-prometheus-stack-48.2.3.tgz [root@srv1 ~]# helm install kube-prometheus-stack ./kube-prometheus-stack
################################################## 汇总结束 ##################################################
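# 参考做法(可选): 若只想在安装前查看或覆盖默认values,也可直接导出values后用 -f 指定,无需手工解包chart(以互联网源appstore为例)
[root@srv1 ~]# helm show values appstore/kube-prometheus-stack --version 48.2.3 > values.yaml
# 按需修改values.yaml后,安装时以 -f 覆盖默认值
[root@srv1 ~]# helm install kube-prometheus-stack appstore/kube-prometheus-stack \
  --version 48.2.3 -f values.yaml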
3) 安装Prometheus [root@srv1 ~]# helm search repo kube-prometheus-stack NAME CHART VERSION APP VERSION DESCRIPTION appstore/kube-prometheus-stack 51.2.0 v0.68.0 kube-prometheus-stack... appstore/prometheus-operator 8.0.1 0.67.1 Stripped down version...
[root@srv1 ~]# helm install kube-prometheus-stack \ appstore/kube-prometheus-stack --version 48.2.3 NAME: kube-prometheus-stack LAST DEPLOYED: Wed Oct 4 19:19:08 2023 NAMESPACE: default STATUS: deployed REVISION: 1 NOTES: kube-prometheus-stack has been installed. Check its status by running: kubectl --namespace default get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
################################################## 问题汇总 ################################################## 1. 出现如下错误: Error: INSTALLATION FAILED: failed pre-install: 1 error occurred: * timed out waiting for the condition
2. 执行以下操作 [root@srv1 ~]# helm uninstall kube-prometheus-stack release "kube-prometheus-stack" uninstalled
[root@srv1 ~]# helm install kube-prometheus-stack \ appstore/kube-prometheus-stack --version 48.2.3
################################################## 汇总结束 ##################################################
4) 确认Prometheus相关的Pod运行正常 [root@srv1 ~]# kubectl get pod NAME READY STATUS RESTARTS AGE alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 13m kube-prometheus-stack-grafana-6b49775fcc-2q2zp 3/3 Running 3 (10m ago) 28m kube-prometheus-stack-kube-state-metrics-66769fc5f5-cksh4 1/1 Running 0 28m kube-prometheus-stack-operator-79c4b88765-mlzt6 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-2d6sl 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-6nlh4 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-7vvt5 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-g47gs 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-p48cz 1/1 Running 0 28m kube-prometheus-stack-prometheus-node-exporter-pqjj8 1/1 Running 0 28m prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 13m
5) 更改Grafana的端口类型为NodePort [root@srv1 ~]# kubectl edit svc kube-prometheus-stack-grafana ...... ...... ...... ...... ...... ...... ports: - name: http-web # 于33行追加如下内容 nodePort: 32000 port: 80 protocol: TCP targetPort: 3000 selector: app.kubernetes.io/instance: kube-prometheus-stack app.kubernetes.io/name: grafana sessionAffinity: None # 将40行改为以下内容 type: NodePort status: loadBalancer: {}
[root@srv1 ~]# kubectl get svc kube-prometheus-stack-grafana NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-prometheus-stack-grafana NodePort 10.99.251.178 <none> 80:32000/TCP 40m
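# 参考示例(可选): 如不想交互式编辑,也可用 kubectl patch 一次性完成上述修改,端口与上文保持一致
[root@srv1 ~]# kubectl patch svc kube-prometheus-stack-grafana -p \
  '{"spec":{"type":"NodePort","ports":[{"name":"http-web","port":80,"nodePort":32000}]}}'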
6) 获取Grafana的管理员账户及密码 [root@srv1 ~]# kubectl get secrets kube-prometheus-stack-grafana -o jsonpath='{.data.admin-user}' | base64 -d admin
[root@srv1 ~]# kubectl get secrets kube-prometheus-stack-grafana -o jsonpath='{.data.admin-password}' | base64 -d prom-operator
7) 登录Grafana面板并添加数据源 [浏览器]===>http://srv1.1000y.cloud:32000








2.22 结合Harbor仓库
1) 生成证书---srv7操作
[root@srv7 ~]# vim /etc/pki/tls/openssl.cnf
......
......
......
......
......
......
# 172行,将本机改为CA认证中心 basicConstraints=CA:TRUE
...... ...... ...... ...... ...... ......
# 创建root CA [root@srv7 ~]# /etc/pki/tls/misc/CA -newca CA certificate filename (or enter to create) # 回车 Making CA certificate ... Generating a 2048 bit RSA private key .................+++ .........................................+++ writing new private key to '/etc/pki/CA/private/./cakey.pem' Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase: # 确认密码 ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [XX]:CN State or Province Name (full name) []:BeiJing Locality Name (eg, city) [Default City]:BeiJing Organization Name (eg, company) [Default Company Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:tech Common Name (eg, your name or your server's hostname) []:1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []: Using configuration from /etc/pki/tls/openssl.cnf Enter pass phrase for /etc/pki/CA/private/./cakey.pem: # 输入密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: ee:ed:72:48:b5:f5:a5:8c Validity Not Before: Oct 4 13:22:32 2023 GMT Not After : Oct 3 13:22:32 2026 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = 1000y.cloud X509v3 extensions: X509v3 Subject Key Identifier: C1:58:87:3A:57:99:08:9E:47:14:3B:5D:71:B4:D9:96:E9:FE:E2:3E X509v3 Authority Key Identifier: keyid:C1:58:87:3A:57:99:08:9E:47:14:3B:5D:71:B4:D9:96:E9:FE:E2:3E
X509v3 Basic Constraints: CA:TRUE Certificate is to be certified until Oct 3 13:22:32 2026 GMT (1095 days)
Write out database with 1 new entries Data Base Updated
[root@srv7 ~]# mkdir /opt/docker/registry/certs/ [root@srv7 ~]# cd /opt/docker/registry/certs/
[root@srv7 certs]# openssl genrsa -aes128 2048 > domain.key Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase:
[root@srv7 certs]# openssl rsa -in domain.key -out domain.key Enter pass phrase for domain.key: # 输入密码 writing RSA key
[root@srv7 certs]# openssl req -utf8 -new -key domain.key -out domain.csr You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:CN State or Province Name (full name) [Some-State]:BeiJing Locality Name (eg, city) []:BeiJing Organization Name (eg, company) [Internet Widgits Pty Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:Tech Common Name (e.g. server FQDN or YOUR name) []:srv7.1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []:
[root@srv7 certs]# vim /etc/pki/tls/openssl.cnf ...... ...... ...... ...... ...... ......
# 于文件最后追加如下内容 [ 1000y.cloud ] subjectAltName = DNS:srv7.1000y.cloud, IP:192.168.1.17
[root@srv7 certs]# openssl ca \ -keyfile /etc/pki/CA/private/cakey.pem \ -cert /etc/pki/CA/cacert.pem -in ./domain.csr \ -out ./domain.crt \ -extfile /etc/pki/tls/openssl.cnf \ -extensions 1000y.cloud Enter pass phrase for /etc/pki/CA/private/cakey.pem # 输入CA密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: ee:ed:72:48:b5:f5:a5:8d Validity Not Before: Oct 4 13:26:08 2023 GMT Not After : Oct 3 13:26:08 2024 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = srv7.1000y.cloud X509v3 extensions: X509v3 Subject Alternative Name: DNS:srv7.1000y.cloud, IP Address:192.168.1.17 Certificate is to be certified until Oct 3 13:26:08 2024 GMT (365 days) Sign the certificate? [y/n]:y
1 out of 1 certificate requests certified, commit? [y/n]y Write out database with 1 new entries Data Base Updated Using configuration from /etc/pki/tls/openssl.cnf
[root@srv7 certs]# cd
[root@srv7 ~]# cat /etc/pki/CA/cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
2) 部署Harbor---srv7操作 (1) 安装好docker及docker-compose工具
(2) 配置并部署Harbor [root@srv7 ~]# curl -O \ https://github.com/goharbor/harbor/releases/download/v2.9.0/harbor-offline-installer-v2.9.0.tgz
[root@srv7 ~]# tar xfz harbor-offline-installer-v2.9.0.tgz [root@srv7 ~]# cd harbor/ [root@srv7 harbor]# cp harbor.yml.tmpl harbor.yml
[root@srv7 harbor]# vim harbor.yml ...... ...... ...... ...... ...... ...... # 修改第5行,更改HarBor主机名称 hostname: srv7.1000y.cloud
...... ...... ...... ...... ...... ...... # 修改第17-18行,更改证书所在路径及文件名 certificate: /opt/docker/registry/certs/domain.crt private_key: /opt/docker/registry/certs/domain.key
...... ...... ...... ...... ...... ...... # 注意36行,harbor_admin_password的密码 harbor_admin_password: 123456
...... ...... ...... ...... ...... ......
[root@srv7 harbor]# ./prepare prepare base dir is set to /root/harbor Unable to find image 'goharbor/prepare:v2.9.0' locally Trying to pull repository docker.io/goharbor/prepare ... v2.9.0: Pulling from docker.io/goharbor/prepare ...... ...... ...... ...... ...... ...... Successfully called func: create_root_cert Generated configuration file: /compose_location/docker-compose.yml Clean up the input dir
[root@srv7 harbor]# ./install.sh
[Step 0]: checking if docker is installed ...
...... ...... ...... ...... ...... ...... ✔ ----Harbor has been installed and started successfully.----
[root@srv7 harbor]# docker-compose ps
################################################## 错误汇总 ################################################## 1. 如果重启docker服务后,可能会导致harbor有些进程无法启动,导致无法访问harbor.可按以下操作 [root@srv7 harbor]# docker-compose ps [root@srv7 harbor]# cd harbor/ [root@srv7 harbor]# docker-compose up -d
2. 停止所有由docker-compose启动的服务 [root@srv7 harbor]# docker-compose stop
################################################## 汇总结束 ##################################################
(3) 登录Harbor Web界面---srv7操作 [浏览器]---> http://harbor_srv_fqdn/

# 进入后创建一个名为 k8s 的项目
(4) 配置Docker信任Harbor仓库 [root@srv7 harbor]# vim /etc/docker/daemon.json { "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"], "insecure-registries": ["https://srv7.1000y.cloud"] }
[root@srv7 harbor]# systemctl restart docker
(5) 测试 [root@srv7 harbor]# docker images REPOSITORY TAG IMAGE ID CREATED SIZE nginx latest 605c77e624dd 21 months ago 141 MB
[root@srv7 harbor]# docker tag nginx:latest srv7.1000y.cloud/k8s/nginx
[root@srv7 harbor]# docker login srv7.1000y.cloud Username: admin Password: Login Succeeded
[root@srv7 harbor]# docker push srv7.1000y.cloud/k8s/nginx The push refers to a repository [srv7.1000y.cloud/k8s/nginx] d874fd2bc83b: Pushed 32ce5f6a5106: Pushed f1db227348d0: Pushed b8d6e692a25e: Pushed e379e8aedd4d: Pushed 2edcec3590a4: Pushed latest: digest: sha256:ee89b00528ff4f02f2405e4ee221743ebc3f8e8dd0bfd5c4c20a2fa2aaa7ede3 size: 1570
3) Containerd配置 (1) 将srv7上的cacert.pem复制到k8s所有节点 [root@srv1 ~]# scp srv7.1000y.cloud:/etc/pki/CA/cacert.pem . [root@srv1 ~]# cat cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
[root@srv1 ~]# for node in srv2.1000y.cloud srv3.1000y.cloud srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud do scp ./cacert.pem $node:~ ssh $node "cat cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt" done
(2) 所有节点配置Containerd配置文件 [root@srv1 ~]# vim /etc/containerd/config.toml ...... ...... ...... ...... ...... ...... # 修改144-155行为以下内容 [plugins."io.containerd.grpc.v1.cri".registry] [plugins."io.containerd.grpc.v1.cri".registry."srv7.1000y.cloud"] config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths] [plugins."io.containerd.grpc.v1.cri".registry.configs."srv7.1000y.cloud".auth] username = "admin" password = "123456"
[plugins."io.containerd.grpc.v1.cri".registry.configs] [plugins."io.containerd.grpc.v1.cri".registry.configs."srv7.1000y.cloud".tls] insecure_skip_verify = true
...... ...... ...... ...... ...... ......
[root@srv1 ~]# systemctl restart containerd
(3) Containerd镜像拉取测试---srv1上操作 [root@srv1 ~]# crictl pull srv7.1000y.cloud/k8s/nginx Image is up to date for sha256:605c77e624ddb75e6110f997c58876baa13f8754486b461117934b24a9dc3a85
[root@srv1 ~]# crictl images | grep srv7 srv7.1000y.cloud/k8s/nginx latest 605c77e624ddb 56.7MB
[root@srv1 ~]# crictl rmi srv7.1000y.cloud/k8s/nginx Deleted: srv7.1000y.cloud/k8s/nginx:latest
4) 与kubernetes结合---srv1上操作 (1) 创建并确认regcred [root@srv1 ~]# kubectl create secret docker-registry regcred \ --docker-server=srv7.1000y.cloud \ --docker-username=admin \ --docker-password=123456 secret/regcred created
[root@srv1 ~]# kubectl get secrets | grep regcred regcred kubernetes.io/dockerconfigjson 1 22s
(2) 查看regcred详细信息 [root@srv1 ~]# kubectl get secret regcred --output=yaml apiVersion: v1 data: .dockerconfigjson: eyJhdXRocyI6eyJzcnY3LjEwMDB5LmNsb3VkIjp7InVzZXJuYW1lIjoiYWRtaW4iLCJwYXNzd29yZCI6Ij EyMzQ1NiIsImF1dGgiOiJZV1J0YVc0Nk1USXpORFUyIn19fQ== kind: Secret metadata: creationTimestamp: "2023-10-04T15:07:54Z" name: regcred namespace: default resourceVersion: "142119" uid: 3a27d14f-01c5-4288-8204-2f8e3a9ddb34 type: kubernetes.io/dockerconfigjson
(3) 用base64查看dockerconfigjson中所包含 的用户名和密码等信息 [root@srv1 ~]# kubectl get secret regcred \ --output="jsonpath={.data.\.dockerconfigjson}" | base64 -d {"auths":{"srv7.1000y.cloud":{"username":"admin","password":"123456","auth":"YWRtaW46MTIzNDU2"}}}
5) 集成测试---srv1上操作 (1) 创建一个 Pod 测试 [root@srv1 ~]# vim private-nginx.yml # 于新文件内添加如下内容 apiVersion: v1 kind: Pod metadata: name: private-nginx spec: containers: - name: private-nginx # 设定私有仓库及镜像 image: srv7.1000y.cloud/k8s/nginx imagePullSecrets: # 添加认证名称 - name: regcred
[root@srv1 ~]# kubectl create -f private-nginx.yml pod/private-nginx created
(2) 确认镜像来源 [root@srv1 ~]# kubectl get pods | grep private-nginx private-nginx 1/1 Running 0 2m7s
[root@srv1 ~]# kubectl describe pods private-nginx | grep Image: Image: srv7.1000y.cloud/k8s/nginx
(3) 删除pods [root@srv1 ~]# kubectl delete -f private-nginx.yml pod "private-nginx" deleted
2.23 安装自动补全功能
 
[root@srv1 ~]# yum install bash-completion -y
[root@srv1 ~]# source /usr/share/bash-completion/bash_completion
[root@srv1 ~]# source <(kubectl completion bash)
[root@srv1 ~]# echo "source <(kubectl completion bash)" >> ~/.bashrc  
2.24 国内镜像仓库
cr.l5d.io/  ===> m.daocloud.io/cr.l5d.io/
docker.elastic.co/  ===> m.daocloud.io/docker.elastic.co/
docker.io/  ===> m.daocloud.io/docker.io/
gcr.io/  ===> m.daocloud.io/gcr.io/
ghcr.io/  ===> m.daocloud.io/ghcr.io/
k8s.gcr.io/  ===> m.daocloud.io/k8s.gcr.io/
mcr.microsoft.com/  ===> m.daocloud.io/mcr.microsoft.com/
nvcr.io/  ===> m.daocloud.io/nvcr.io/
quay.io/  ===> m.daocloud.io/quay.io/
registry.jujucharms.com/  ===> m.daocloud.io/registry.jujucharms.com/
registry.k8s.io/  ===> m.daocloud.io/registry.k8s.io/
registry.opensource.zalan.do/  ===> m.daocloud.io/registry.opensource.zalan.do/
rocks.canonical.com/  ===> m.daocloud.io/rocks.canonical.com/
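# 使用方法: 在原始镜像地址前加上 m.daocloud.io/ 前缀即可,以下仅为参考示例(镜像及tag请按实际需要替换)
[root@srv1 ~]# crictl pull m.daocloud.io/registry.k8s.io/pause:3.9
[root@srv1 ~]# crictl pull m.daocloud.io/docker.io/library/nginx:latest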
2.25 安装时etcd服务翻滚
1) etcd节点翻滚 
[root@srv1 ~]# etcdctl \
--endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
--cacert=/etc/etcd/ssl/etcd-ca.pem --cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem  endpoint health --write-out=table
{"level":"warn","ts":"2023-10-05T15:59:52.615519+0800","logger":"client","caller":
"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":
"etcd-endpoints://0xc0003b4a80/192.168.1.11:2379","attempt":0,"error":"rpc error: code = 
DeadlineExceeded desc = latest balancer error: last connection error: connection error: 
desc = \"transport: Error while dialing dial tcp 192.168.1.11:2379: connect: connection refused\""}
+-------------------+--------+--------------+---------------------------+
|     ENDPOINT      | HEALTH |     TOOK     |           ERROR           |
+-------------------+--------+--------------+---------------------------+
| 192.168.1.12:2379 |   true | 1.350706779s |                           |
| 192.168.1.13:2379 |   true | 1.204151786s |                           |
| 192.168.1.11:2379 |  false | 5.001363217s | context deadline exceeded |
+-------------------+--------+--------------+---------------------------+
[root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member list --write-out=table +------------------+---------+------------------+---------------------------+---------------------------+------------+ | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | +------------------+---------+------------------+---------------------------+---------------------------+------------+ | ac7e57d44f030e8 | started | srv2.1000y.cloud | https://192.168.1.12:2380 | https://192.168.1.12:2379 | false | | 40ba37809e1a423f | started | srv3.1000y.cloud | https://192.168.1.13:2380 | https://192.168.1.13:2379 | false | | 486c1127759f2e55 | started | srv1.1000y.cloud | https://192.168.1.11:2380 | https://192.168.1.11:2379 | false | +------------------+---------+------------------+---------------------------+---------------------------+------------+
2) 移除etcd故障节点 [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member remove ce146ffbf5fd1c12 Member ce146ffbf5fd1c12 removed from cluster a3af6d9a178535cc
3) 清除etcd故障节点的数据 [root@srv1 ~]# cd /var/lib/etcd [root@srv1 etcd]# rm -rf * [root@srv1 ~]# cd
4) 重写etcd配置文件 [root@srv1 ~]# vim /etc/etcd/etcd.config.yml ...... ...... ...... ...... ...... ...... initial-cluster-token: 'etcd-k8s-cluster' # 修改21行,其内容如下 initial-cluster-state: 'existing' strict-reconfig-check: false ...... ...... ...... ...... ...... ......
5) 重新将etcd节点加入集群 [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem member add srv1.1000y.cloud \ --peer-urls=https://192.168.1.11:2380 Member 486c1127759f2e55 added to cluster a3af6d9a178535cc
ETCD_NAME="srv1.1000y.cloud" ETCD_INITIAL_CLUSTER="srv2.1000y.cloud=https://192.168.1.12:2380,srv3.1000y.cloud=https://192.168.1.13:2380,srv1.1000y.cloud=https://192.168.1.11:2380" ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.11:2380" ETCD_INITIAL_CLUSTER_STATE="existing"
6) 重启etcd节点 [root@srv1 ~]# systemctl restart etcd
7) 确认 [root@srv1 ~]# etcdctl \ --endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \ --cacert=/etc/etcd/ssl/etcd-ca.pem \ --cert=/etc/etcd/ssl/etcd.pem \ --key=/etc/etcd/ssl/etcd-key.pem endpoint health \ --write-out=table +-------------------+--------+--------------+-------+ | ENDPOINT | HEALTH | TOOK | ERROR | +-------------------+--------+--------------+-------+ | 192.168.1.12:2379 | true | 358.51544ms | | | 192.168.1.13:2379 | true | 358.494477ms | | | 192.168.1.11:2379 | true | 378.163923ms | | +-------------------+--------+--------------+-------+
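# 参考示例(可选): 如需进一步确认各节点的数据版本及leader归属,可查看endpoint status
[root@srv1 ~]# etcdctl \
--endpoints="192.168.1.11:2379,192.168.1.12:2379,192.168.1.13:2379" \
--cacert=/etc/etcd/ssl/etcd-ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem endpoint status \
--write-out=table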
2.26 calico-kube-controllers Pod无法正常running
1) 查找 calico-kube-controllers-xxxxxxxxxx-xxxxx 所在节点
2) 重启 kubelet及kube-proxy服务---[可全节点全部重启]
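# 参考操作示例(节点及标签仅为示例,以实际Pod所在节点和calico部署清单中的标签为准)
# 在calico-kube-controllers所在节点上重启服务(此处以srv2为例)
[root@srv2 ~]# systemctl restart kubelet kube-proxy
# 在管理节点上删除该Pod,由Deployment自动重建
[root@srv1 ~]# kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers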
3. 使用Prometheus+Grafana监控---非helm安装
3.1 安装Git并clone kube-prometheus
1) git工具的安装
[root@srv1 ~]# yum install git -y
2) Clone kube-prometheus [root@srv1 ~]# git clone https://github.com/prometheus-operator/kube-prometheus.git Cloning into 'kube-prometheus'... remote: Enumerating objects: 16411, done. remote: Counting objects: 100% (332/332), done. remote: Compressing objects: 100% (130/130), done. remote: Total 16411 (delta 235), reused 252 (delta 190), pack-reused 16079 Receiving objects: 100% (16411/16411), 8.20 MiB | 950.00 KiB/s, done. Resolving deltas: 100% (10565/10565), done.
3) 确认kube-prometheus文件 [root@srv1 ~]# cd kube-prometheus/manifests
4) 修改源 [root@srv1 ~/kube-prometheus/manifests]# sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' `grep -ril quay.io ./`
[root@srv1 ~/kube-prometheus/manifests]# grep -ri registry.k8s.io ./ ./prometheusAdapter-deployment.yaml: image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1 ./kubeStateMetrics-deployment.yaml: image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
[root@srv1 ~/kube-prometheus/manifests]# sed -i 's#registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1#xuxiaoweicomcn/prometheus-adapter:v0.11.1#g' ./prometheusAdapter-deployment.yaml
root@srv1:~/kube-prometheus/manifests# sed -i 's#registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2#bitnami/kube-state-metrics:2.9.2#g' ./kubeStateMetrics-deployment.yaml
5) 修改prometheus,alertmanager,grafana的service类型为NodePort类型 root@srv1:~/kube-prometheus/manifests# vim prometheus-service.yaml apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: prometheus app.kubernetes.io/instance: k8s app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus app.kubernetes.io/version: 2.36.2 name: prometheus-k8s namespace: monitoring spec: # 新增 type: NodePort ports: - name: web port: 9090 targetPort: web # 新增 nodePort: 30090 - name: reloader-web port: 8080 targetPort: reloader-web selector: app.kubernetes.io/component: prometheus app.kubernetes.io/instance: k8s app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus sessionAffinity: ClientIP
root@srv1:~/kube-prometheus/manifests# vim alertmanager-service.yaml apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: alert-router app.kubernetes.io/instance: main app.kubernetes.io/name: alertmanager app.kubernetes.io/part-of: kube-prometheus app.kubernetes.io/version: 0.24.0 name: alertmanager-main namespace: monitoring spec: # 新增 type: NodePort ports: - name: web port: 9093 targetPort: web # 新增 nodePort: 30093 - name: reloader-web port: 8080 targetPort: reloader-web selector: app.kubernetes.io/component: alert-router app.kubernetes.io/instance: main app.kubernetes.io/name: alertmanager app.kubernetes.io/part-of: kube-prometheus sessionAffinity: ClientIP
root@srv1:~/kube-prometheus/manifests# vim grafana-service.yaml apiVersion: v1 kind: Service metadata: labels: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus app.kubernetes.io/version: 9.0.1 name: grafana namespace: monitoring spec: # 新增 type: NodePort ports: - name: http port: 3000 targetPort: http # 新增 nodePort: 32000 selector: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus
3.2 安装kube-prometheus并确认状态
1)生成DashBoard
[root@srv1 ~/kube-prometheus/manifests]# kubectl apply --server-side -f setup/
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
namespace/monitoring created
[root@srv1 manifests]# kubectl wait --for condition=Established \ --all CustomResourceDefinition --namespace=monitoring customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org ... customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org condition met customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com condition met customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com condition met
2)安装prometheus, alertmanager, grafana, kube-state-metrics, node-exporter等资源 [root@srv1 ~/kube-prometheus/manifests]# kubectl create -f . alertmanager.monitoring.coreos.com/main created networkpolicy.networking.k8s.io/alertmanager-main created poddisruptionbudget.policy/alertmanager-main created prometheusrule.monitoring.coreos.com/alertmanager-main-rules created secret/alertmanager-main created service/alertmanager-main created serviceaccount/alertmanager-main created servicemonitor.monitoring.coreos.com/alertmanager-main created clusterrole.rbac.authorization.k8s.io/blackbox-exporter created clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created configmap/blackbox-exporter-configuration created deployment.apps/blackbox-exporter created networkpolicy.networking.k8s.io/blackbox-exporter created service/blackbox-exporter created serviceaccount/blackbox-exporter created servicemonitor.monitoring.coreos.com/blackbox-exporter created secret/grafana-config created secret/grafana-datasources created configmap/grafana-dashboard-alertmanager-overview created configmap/grafana-dashboard-apiserver created configmap/grafana-dashboard-cluster-total created configmap/grafana-dashboard-controller-manager created configmap/grafana-dashboard-grafana-overview created configmap/grafana-dashboard-k8s-resources-cluster created configmap/grafana-dashboard-k8s-resources-multicluster created configmap/grafana-dashboard-k8s-resources-namespace created configmap/grafana-dashboard-k8s-resources-node created configmap/grafana-dashboard-k8s-resources-pod created configmap/grafana-dashboard-k8s-resources-workload created configmap/grafana-dashboard-k8s-resources-workloads-namespace created configmap/grafana-dashboard-kubelet created configmap/grafana-dashboard-namespace-by-pod created configmap/grafana-dashboard-namespace-by-workload created configmap/grafana-dashboard-node-cluster-rsrc-use created configmap/grafana-dashboard-node-rsrc-use created configmap/grafana-dashboard-nodes-darwin created configmap/grafana-dashboard-nodes created configmap/grafana-dashboard-persistentvolumesusage created configmap/grafana-dashboard-pod-total created configmap/grafana-dashboard-prometheus-remote-write created configmap/grafana-dashboard-prometheus created configmap/grafana-dashboard-proxy created configmap/grafana-dashboard-scheduler created configmap/grafana-dashboard-workload-total created configmap/grafana-dashboards created deployment.apps/grafana created networkpolicy.networking.k8s.io/grafana created prometheusrule.monitoring.coreos.com/grafana-rules created service/grafana created serviceaccount/grafana created servicemonitor.monitoring.coreos.com/grafana created prometheusrule.monitoring.coreos.com/kube-prometheus-rules created clusterrole.rbac.authorization.k8s.io/kube-state-metrics created clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created deployment.apps/kube-state-metrics created networkpolicy.networking.k8s.io/kube-state-metrics created prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created service/kube-state-metrics created serviceaccount/kube-state-metrics created servicemonitor.monitoring.coreos.com/kube-state-metrics created prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created servicemonitor.monitoring.coreos.com/kube-apiserver created servicemonitor.monitoring.coreos.com/coredns created servicemonitor.monitoring.coreos.com/kube-controller-manager created servicemonitor.monitoring.coreos.com/kube-scheduler created 
servicemonitor.monitoring.coreos.com/kubelet created clusterrole.rbac.authorization.k8s.io/node-exporter created clusterrolebinding.rbac.authorization.k8s.io/node-exporter created daemonset.apps/node-exporter created networkpolicy.networking.k8s.io/node-exporter created prometheusrule.monitoring.coreos.com/node-exporter-rules created service/node-exporter created serviceaccount/node-exporter created servicemonitor.monitoring.coreos.com/node-exporter created clusterrole.rbac.authorization.k8s.io/prometheus-k8s created clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created networkpolicy.networking.k8s.io/prometheus-k8s created poddisruptionbudget.policy/prometheus-k8s created prometheus.monitoring.coreos.com/k8s created prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created rolebinding.rbac.authorization.k8s.io/prometheus-k8s created rolebinding.rbac.authorization.k8s.io/prometheus-k8s created rolebinding.rbac.authorization.k8s.io/prometheus-k8s created role.rbac.authorization.k8s.io/prometheus-k8s-config created role.rbac.authorization.k8s.io/prometheus-k8s created role.rbac.authorization.k8s.io/prometheus-k8s created role.rbac.authorization.k8s.io/prometheus-k8s created service/prometheus-k8s created serviceaccount/prometheus-k8s created servicemonitor.monitoring.coreos.com/prometheus-k8s created clusterrole.rbac.authorization.k8s.io/prometheus-adapter created clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created configmap/adapter-config created deployment.apps/prometheus-adapter created networkpolicy.networking.k8s.io/prometheus-adapter created poddisruptionbudget.policy/prometheus-adapter created rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created service/prometheus-adapter created serviceaccount/prometheus-adapter created servicemonitor.monitoring.coreos.com/prometheus-adapter created clusterrole.rbac.authorization.k8s.io/prometheus-operator created clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created deployment.apps/prometheus-operator created networkpolicy.networking.k8s.io/prometheus-operator created prometheusrule.monitoring.coreos.com/prometheus-operator-rules created service/prometheus-operator created serviceaccount/prometheus-operator created servicemonitor.monitoring.coreos.com/prometheus-operator created # 有如下错误提醒,请直接忽略。[此问题与Metrics有关] Error from server (AlreadyExists): error when creating "prometheusAdapter-apiService.yaml": apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io" already exists Error from server (AlreadyExists): error when creating "prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml": clusterroles.rbac.authorization.k8s.io "system:aggregated-metrics-reader" already exists
3)确认状态 [root@srv1 ~/kube-prometheus/manifests]# kubectl get pod -n monitoring NAME READY STATUS RESTARTS AGE alertmanager-main-0 2/2 Running 0 2m48s alertmanager-main-1 2/2 Running 0 2m48s alertmanager-main-2 2/2 Running 0 2m48s blackbox-exporter-7b59fc88df-dp4v6 3/3 Running 0 5m6s grafana-748964b847-cpr9j 1/1 Running 0 5m4s kube-state-metrics-84ff45bfbc-jld4h 3/3 Running 0 5m4s node-exporter-4l4v6 2/2 Running 0 5m3s node-exporter-84f6z 2/2 Running 0 5m3s node-exporter-84kg5 2/2 Running 0 5m3s node-exporter-d276j 2/2 Running 0 5m3s node-exporter-mtntk 2/2 Running 0 5m3s node-exporter-xszwd 2/2 Running 0 5m3s prometheus-adapter-7cd669d-2nptz 1/1 Running 0 5m2s prometheus-adapter-7cd669d-pg6cq 1/1 Running 0 5m2s prometheus-k8s-0 2/2 Running 0 2m46s prometheus-k8s-1 2/2 Running 0 2m46s prometheus-operator-665dd7db75-624ts 2/2 Running 0 5m1s
4)删除 NetworkPolicy --- 避免无法访问服务 [root@srv1 ~/kube-prometheus/manifests]# kubectl -n monitoring delete networkpolicies.networking.k8s.io --all networkpolicy.networking.k8s.io "alertmanager-main" deleted networkpolicy.networking.k8s.io "blackbox-exporter" deleted networkpolicy.networking.k8s.io "grafana" deleted networkpolicy.networking.k8s.io "kube-state-metrics" deleted networkpolicy.networking.k8s.io "node-exporter" deleted networkpolicy.networking.k8s.io "prometheus-adapter" deleted networkpolicy.networking.k8s.io "prometheus-k8s" deleted networkpolicy.networking.k8s.io "prometheus-operator" deleted
3.3 访问测试
1) 访问Prometheus
[浏览器]==>[http://srv1.1000y.cloud:30090]

2) 访问Alert-manager [浏览器]==>[http://srv1.1000y.cloud:30093]
3) 访问Grafana---[用户名admin,密码admin] [浏览器]==>[http://srv1.1000y.cloud:32000]
3.4 配置Grafana
1) 更改admin账户的密码

2) 测试Prometheus数据源是否正常






4. 提供持久化存储-NFS
4.1 配置持久化存储
1) 所有节点安装nfs-utils工具
[root@srv1 ~]# yum install nfs-utils -y
[root@srv2 ~]# yum install nfs-utils -y
[root@srv3 ~]# yum install nfs-utils -y
2) 于Master Node配置NFS [root@srv1 ~]# vim /etc/exports /var/lib/nfs-share/ *(rw)
[root@srv1 ~]# mkdir /var/lib/nfs-share
3) 启动Master Node的NFS [root@srv1 ~]# systemctl enable --now rpcbind nfs-server
4) 启动Worker Node的rpcbind服务 [root@srv2 ~]# systemctl enable --now rpcbind [root@srv3 ~]# systemctl enable --now rpcbind
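# 参考示例(可选): 可在任一Worker节点上用showmount确认NFS共享已正确导出
[root@srv2 ~]# showmount -e 192.168.1.11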
4.2 在Master节点上定义并创建PV对象
[root@srv1 ~]# vim nfs-pv.yml 
apiVersion: v1
kind: PersistentVolume
metadata:
  # 指定pv名为nfs-pv
  name: nfs-pv
spec:
  capacity:
    # 定义存储大小
    storage: 5Gi
  accessModes:
      # ReadWriteMany(允许多节点读写)
      # ReadWriteOnce(允许单节点读写)
      # ReadOnlyMany(允许多节点读)
    - ReadWriteMany
  persistentVolumeReclaimPolicy:
    # 即使容器终止,也保留此PV
    Retain
  nfs:
    # 设定NFS管理节点信息
    path: /var/lib/nfs-share
    server: 192.168.1.11
    readOnly: false

[root@srv1 ~]# kubectl create -f nfs-pv.yml persistentvolume/nfs-pv created
[root@srv1 ~]# kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE nfs-pv 5Gi RWX Retain Available 13s
4.3 在Master节点上定义并创建PVC对象
[root@srv1 ~]# vim nfs-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # 设定PVC名字
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
     requests:
       # 设置PVC存储大小
       storage: 1Gi
[root@srv1 ~]# kubectl create -f nfs-pvc.yml persistentvolumeclaim/nfs-pvc created
[root@srv1 ~]# kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE nfs-pvc Bound nfs-pv 5Gi RWX 5s
4.4 在Master节点上创建pods并使用PVC
[root@srv1 ~]# vim nginx-nfs.yml
apiVersion: v1
kind: Pod
metadata:
  # 设定POD名称
  name: nginx-nfs
  labels:
    name: nginx-nfs
spec:
  containers:
    - name: nginx-nfs
      image: nginx
      ports:
        - name: web
          containerPort: 80
      volumeMounts:
        - name: nfs-share
          # 设定nginx的挂载路径
          mountPath: /usr/share/nginx/html
  volumes:
    - name: nfs-share
      persistentVolumeClaim:
        # 设定pvc名称
        claimName: nfs-pvc
[root@srv1 ~]# kubectl create -f nginx-nfs.yml pod/nginx-nfs created
[root@srv1 ~]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-nfs 1/1 Running 0 8s 10.244.1.3 srv3.1000y.cloud <none> <none>
4.5 测试
[root@srv1 ~]# echo 'NFS USE Test for 1000y.cloud' > /var/lib/nfs-share/index.html
[root@srv1 ~]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-nfs 1/1 Running 0 83s 10.244.1.3 srv3.1000y.cloud <none> <none>
[root@srv1 ~]# curl 10.244.1.3 NFS USE Test for 1000y.cloud
4.6 删除PODS及PV、PVC
1) 删除Pod
[root@srv1 ~]# kubectl delete -f nginx-nfs.yml 
pod "nginx-nfs" deleted
[root@srv1 ~]# kubectl get pods No resources found in default namespace.
2) 删除PVC [root@srv1 ~]# kubectl delete -f ./nfs-pvc.yml persistentvolumeclaim "nfs-pvc" deleted
[root@srv1 ~]# kubectl get pvc No resources found in default namespace.
3) 删除PV [root@srv1 ~]# kubectl delete -f ./nfs-pv.yml persistentvolume "nfs-pv" deleted [root@srv1 ~]# kubectl get pv No resources found
5. 配合私有仓库的使用---Docker Registry
5.1 在srv4节点上配置带有SSL的私有仓库
1) 生成证书
[root@srv4 ~]# vim /etc/pki/tls/openssl.cnf
......
......
......
......
......
......
# 172行,将本机改为CA认证中心 basicConstraints=CA:TRUE
...... ...... ...... ...... ...... ......
# 创建root CA [root@srv4 ~]# /etc/pki/tls/misc/CA -newca CA certificate filename (or enter to create) # 回车 Making CA certificate ... Generating a 2048 bit RSA private key .................+++ .........................................+++ writing new private key to '/etc/pki/CA/private/./cakey.pem' Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase: # 确认密码 ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [XX]:CN State or Province Name (full name) []:BeiJing Locality Name (eg, city) [Default City]:BeiJing Organization Name (eg, company) [Default Company Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:tech Common Name (eg, your name or your server's hostname) []:1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []: Using configuration from /etc/pki/tls/openssl.cnf Enter pass phrase for /etc/pki/CA/private/./cakey.pem: # 输入密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: b6:ee:5e:56:c3:34:44:71 Validity Not Before: Nov 21 12:15:09 2022 GMT Not After : Nov 20 12:15:09 2025 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = 1000y.cloud X509v3 extensions: X509v3 Subject Key Identifier: AB:6B:A4:BA:52:29:A3:BA:E1:E3:BF:8E:23:B6:B6:9A:E9:EC:69:7E X509v3 Authority Key Identifier: keyid:AB:6B:A4:BA:52:29:A3:BA:E1:E3:BF:8E:23:B6:B6:9A:E9:EC:69:7E
X509v3 Basic Constraints: CA:TRUE Certificate is to be certified until Nov 20 12:15:09 2025 GMT (1095 days)
Write out database with 1 new entries Data Base Updated
[root@srv4 ~]# cd /etc/pki/tls/certs [root@srv4 /etc/pki/tls/certs]# openssl genrsa -aes128 2048 > server.key Enter PEM pass phrase: # 输入密码 Verifying - Enter PEM pass phrase:
[root@srv4 /etc/pki/tls/certs]# openssl rsa -in server.key -out server.key Enter pass phrase for server.key: # 输入密码 writing RSA key
[root@srv4 /etc/pki/tls/certs]# openssl req -utf8 -new -key server.key -out server.csr You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:CN State or Province Name (full name) [Some-State]:BeiJing Locality Name (eg, city) []:BeiJing Organization Name (eg, company) [Internet Widgits Pty Ltd]:1000y.cloud Organizational Unit Name (eg, section) []:Tech Common Name (e.g. server FQDN or YOUR name) []:srv4.1000y.cloud Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []:
[root@srv4 /etc/pki/tls/certs]# vim /etc/pki/tls/openssl.cnf ...... ...... ...... ...... ...... ......
# 于文件最后追加如下内容 [ 1000y.cloud ] subjectAltName = DNS:srv4.1000y.cloud, IP:192.168.1.14
[root@srv4 /etc/pki/tls/certs]# openssl ca -keyfile /etc/pki/CA/private/cakey.pem -cert /etc/pki/CA/cacert.pem -in ./server.csr -out ./server.crt -extfile /etc/pki/tls/openssl.cnf -extensions 1000y.cloud Enter pass phrase for /etc/pki/CA/private/cakey.pem # 输入CA密码 Check that the request matches the signature Signature ok Certificate Details: Serial Number: cf:dd:e9:5d:86:07:41:65 Validity Not Before: Nov 21 12:29:38 2022 GMT Not After : Nov 21 12:29:38 2023 GMT Subject: countryName = CN stateOrProvinceName = BeiJing organizationName = 1000y.cloud organizationalUnitName = tech commonName = srv4.1000y.cloud X509v3 extensions: X509v3 Subject Alternative Name: DNS:srv4.1000y.cloud, IP Address:192.168.1.14 Certificate is to be certified until Nov 21 12:29:38 2023 GMT (365 days) Sign the certificate? [y/n]:y
1 out of 1 certificate requests certified, commit? [y/n]y Write out database with 1 new entries Data Base Updated Using configuration from /etc/pki/tls/openssl.cnf
[root@srv4 /etc/pki/tls/certs]# cd
[root@srv4 ~]# cat /etc/pki/CA/cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
2) 创建访问帐户 [root@srv4 ~]# yum install httpd-tools podman -y
[root@srv4 ~]# htpasswd -Bc /etc/containers/.htpasswd snow New password: # 设定一个密码 Re-type new password: Adding password for user snow
5.2 在srv4节点创建一个仓库容器
1) 下载镜像
[root@srv4 ~]# podman pull docker.io/library/registry:2
Trying to pull docker.io/library/registry:2...
Getting image source signatures
Copying blob ea60b727a1ce done
Copying blob 2408cc74d12b done
Copying blob c87369050336 done
Copying blob fc30d7061437 done
Copying blob e69d20d3dd20 done
Copying config 773dbf02e4 done
Writing manifest to image destination
Storing signatures
773dbf02e42e2691c752b74e9b7745623c4279e4eeefe734804a32695e46e2f3
[root@srv4 ~]# podman images REPOSITORY TAG IMAGE ID CREATED SIZE docker.io/library/registry 2 773dbf02e42e 5 weeks ago 24.6 MB
2) 建立镜像仓库 [root@srv4 ~]# mkdir /var/lib/containers/registry
[root@srv4 ~]# podman run --privileged -d --name registry -p 5000:5000 \ -v /etc/containers:/auth \ -e REGISTRY_AUTH=htpasswd \ -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/.htpasswd \ -e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" \ -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/server.crt \ -e REGISTRY_HTTP_TLS_KEY=/certs/server.key \ -v /etc/pki/tls/certs:/certs \ -v /var/lib/containers/registry:/var/lib/registry \ registry:2 93edb98b8e79cdb84224eb324779e0686a7a83aae757de50a4c46b295fa0db05
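# 参考示例(可选): 确认registry容器已在5000端口运行,并用/v2/接口做一次带认证的连通性测试(密码为htpasswd中设定的密码,此处以123456为例)
[root@srv4 ~]# podman ps --filter name=registry
[root@srv4 ~]# curl --user snow:123456 https://srv4.1000y.cloud:5000/v2/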
3) 设定随机启动策略 # 创建systemd元文件的参数说明 --restart-policy=always 自动重启 -t 1 停止超时时间为1秒 –name 在创建的systemd元文件中,用容器name启动、停止容器 -f 在当前目录创建{container,pod}-{ID,name}.service格式的元文件,不加此参数,创建内容只在控制台显示 [root@srv4 ~]# podman generate systemd --restart-policy=always -t 1 --name -f registry /root/container-registry.service
[root@srv4 ~]# ls -l /root/container-registry.service -rw-r--r-- 1 root root 507 Jul 1 17:58 container-registry.service
[root@srv4 ~]# cp container-registry.service /etc/systemd/system/ [root@srv4 ~]# systemctl enable container-registry.service
5.3 测试
1) Registry端测试
[root@srv4 ~]# mv /etc/containers/registries.conf /etc/containers/registries.conf.bak
[root@srv4 ~]# vim /etc/containers/registries.conf # 于新文件内追加如下内容 unqualified-search-registries = ["docker.io"]
[[registry]] prefix = "docker.io" location = "uyah70su.mirror.aliyuncs.com"
[[registry.mirror]] location = "docker.mirrors.ustc.edu.cn"
[[registry.mirror]] location = "registry.docker-cn.com"
# 定义私有镜像仓库信息 [[registry]] location = "srv4.1000y.cloud:5000" insecure = true blocked = false

[root@srv4 ~]# podman pull nginx Trying to pull docker.io/library/nginx... Getting image source signatures Copying blob f4407ba1f103 done Copying blob fe0ef4c895f5 done Copying blob 935cecace2a0 done Copying blob 8f46223e4234 done Copying blob 4a7307612456 done Copying blob b85a868b505f done Copying config 55f4b40fe4 done Writing manifest to image destination Storing signatures 55f4b40fe486a5b734b46bb7bf28f52fa31426bf23be068c8e7b19e58d9b8deb
[root@srv4 ~]# podman images REPOSITORY TAG IMAGE ID CREATED SIZE docker.io/library/nginx latest 55f4b40fe486 8 days ago 146 MB docker.io/library/registry 2 773dbf02e42e 5 weeks ago 24.6 MB
[root@srv4 ~]# podman tag docker.io/library/nginx:latest srv4.1000y.cloud:5000/nginx
[root@srv4 ~]# podman login srv4.1000y.cloud:5000 Username: snow Password: # 输入密码 Login Succeeded!
[root@srv4 ~]# podman push srv4.1000y.cloud:5000/nginx:latest Getting image source signatures Copying blob 44193d3f4ea2 done Copying blob 08249ce7456a done Copying blob d5b40e80384b done Copying blob e7344f8a29a3 done Copying blob 41451f050aa8 done Copying blob b2f82de68e0d done Copying config 55f4b40fe4 done Writing manifest to image destination Storing signatures
[root@srv4 ~]# curl --user snow:123456 -XGET https://srv4.1000y.cloud:5000/v2/_catalog {"repositories":["nginx"]}
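# 参考示例(可选): 还可通过Registry v2 API查看某镜像的tag列表
[root@srv4 ~]# curl --user snow:123456 -XGET https://srv4.1000y.cloud:5000/v2/nginx/tags/list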
2) 客户端测试 (1) 修改相关的配置文件 [root@srv1 ~]# scp srv4.1000y.cloud:/etc/pki/CA/cacert.pem ./ [root@srv1 ~]# cat cacert.pem >> /etc/pki/tls/certs/ca-bundle.crt
[root@srv1 ~]# curl --user snow:123456 -XGET https://srv4.1000y.cloud:5000/v2/_catalog {"repositories":["nginx"]}
[root@srv1 ~]# vim /etc/containerd/config.toml ...... ...... ...... ...... ...... ......
[plugins."io.containerd.grpc.v1.cri".registry] # 于145行,添加如下内容 [plugins."io.containerd.grpc.v1.cri".registry."srv4.1000y.cloud:5000"] config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths] # 于149行,添加认证帐户的信息 [plugins."io.containerd.grpc.v1.cri".registry.configs."srv4.1000y.cloud:5000".auth] username = "snow" password = "123456"
...... ...... ...... ...... ...... ......
[root@srv1 ~]# systemctl restart containerd.service
(2) 测试 [root@srv1 ~]# crictl pull srv4.1000y.cloud:5000/nginx Image is up to date for sha256:55f4b40fe486a5b734b46bb7bf28f52fa31426bf23be068c8e7b19e58d9b8deb
[root@srv1 ~]# crictl images | grep nginx srv4.1000y.cloud:5000/nginx latest 55f4b40fe486a 59.1MB
3) 其他 如果测试成功,请将其他的k8s节点的containerd服务的私有仓库配置完成
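# 参考示例(可选): 如各节点的containerd配置一致,可从srv1批量分发配置并重启服务(节点列表请按实际环境调整)
[root@srv1 ~]# for node in srv2.1000y.cloud srv3.1000y.cloud srv4.1000y.cloud srv5.1000y.cloud srv6.1000y.cloud
do
  scp /etc/containerd/config.toml $node:/etc/containerd/config.toml
  ssh $node "systemctl restart containerd"
done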
5.4 将验证信息加入至Docker credentials中-Master Node
1) 创建并确认regcred
[root@srv1 ~]# kubectl create secret docker-registry regcred \
--docker-server=srv4.1000y.cloud:5000 \
--docker-username=snow \
--docker-password=123456
secret/regcred created
[root@srv1 ~]# kubectl get secrets NAME TYPE DATA AGE regcred kubernetes.io/dockerconfigjson 1 7s
2) 查看regcred详细信息 [root@srv1 ~]# kubectl get secret regcred --output=yaml apiVersion: v1 data: .dockerconfigjson: eyJhdXRocyI6eyJzcnY0LjEwMDB5LmNsb3VkOjUwMDAiOnsidXNlcm5hbWUiOiJzbm93IiwicGFzc3dvcmQiOiIxMjM0NTYiLCJhdXRoIjoiYzI1dmR6b3hNak0wTlRZPSJ9fX0= kind: Secret metadata: creationTimestamp: "2022-07-01T09:49:50Z" name: regcred namespace: default resourceVersion: "8509" uid: f6467bbc-0ff6-43af-8238-e2dec7eef7f4 type: kubernetes.io/dockerconfigjson
3) 用base64查看dockerconfigjson中所包含 的用户名和密码等信息 [root@srv1 ~]# kubectl get secret regcred --output="jsonpath={.data.\.dockerconfigjson}" | base64 -d {"auths":{"srv4.1000y.cloud:5000":{"username":"snow","password":"123456","auth":"c25vdzoxMjM0NTY="}}}
5.5 测试k8s与私有仓库的互动
1) 确认私有仓库是否存在nginx镜像
[root@srv1 ~]# curl --user snow:123456 -XGET https://srv4.1000y.cloud:5000/v2/_catalog
{"repositories":["nginx"]}
2) 于私有仓库pull一个镜像并启动一个pods [root@srv1 ~]# crictl rmi srv4.1000y.cloud:5000/nginx:latest Deleted: srv4.1000y.cloud:5000/nginx:latest
[root@srv1 ~]# vim private-nginx.yml # 于新文件内添加如下内容 apiVersion: v1 kind: Pod metadata: name: private-nginx spec: containers: - name: private-nginx # 设定私有仓库及镜像 image: srv4.1000y.cloud:5000/nginx imagePullSecrets: # 添加认证名称 - name: regcred
[root@srv1 ~]# kubectl create -f private-nginx.yml pod/private-nginx created
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE private-nginx 1/1 Running 0 5s
[root@srv1 ~]# kubectl describe pods private-nginx Name: private-nginx Namespace: default Priority: 0 Node: srv2.1000y.cloud/192.168.1.12 Start Time: Fri, 01 Jul 2022 17:51:45 +0800 Labels: <none> Annotations: <none> Status: Running IP: 10.244.2.4 IPs: IP: 10.244.2.4 Containers: private-nginx: Container ID: containerd://47d2b70e4049629e97f0507895f7e541bb48150b5a53e56c447743738494f620 Image: srv4.1000y.cloud:5000/nginx Image ID: docker.io/library/nginx@sha256:10f14ffa93f8dedf1057897b745e5ac72ac5655c299dade0aa434c71557697ea Port: <none> Host Port: <none> State: Running Started: Fri, 01 Jul 2022 17:51:47 +0800 Ready: True Restart Count: 0 Environment: <none> Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8jwnq (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: kube-api-access-8jwnq: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 16s default-scheduler Successfully assigned default/private-nginx to srv2.1000y.cloud Normal Pulling 15s kubelet Pulling image "srv4.1000y.cloud:5000/nginx" Normal Pulled 15s kubelet Successfully pulled image "srv4.1000y.cloud:5000/nginx" in 725.338554ms Normal Created 14s kubelet Created container private-nginx Normal Started 14s kubelet Started container private-nginx
3) 删除pods [root@srv1 ~]# kubectl delete -f private-nginx.yml pod "private-nginx" deleted
[root@srv1 ~]# kubectl get pods No resources found in default namespace.
6. 添加或移除节点
6.1 添加新节点
1) 为新节点[srv5]进行准备工作
[root@srv5 ~]# swapoff -a
[root@srv5 ~]# vim /etc/fstab
# # /etc/fstab # Created by anaconda on Sun Dec 5 14:41:17 2021 # # Accessible filesystems, by reference, are maintained under '/dev/disk' # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info # UUID=eaca2437-8d59-47e4-bacb-4de06d26b7c8 / ext4 defaults 1 1 UUID=95c7c42a-7569-4f80-aec4-961628270ce7 /boot ext4 defaults 1 2 UUID=bf9e568d-ae5d-43d1-948c-90a45d731ec8 swap swap noauto,defaults 0 0
2) 新节点[srv5]开启bridge-nf-call-ip6tables(允许bridge的Netfilter复用IP层的Netfilter代码) [root@srv5 ~]# modprobe overlay; modprobe br_netfilter [root@srv5 ~]# echo -e overlay\\nbr_netfilter > /etc/modules-load.d/br_netfilter.conf
[root@srv5 ~]# vim /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-ip6tables = 1 net.bridge.bridge-nf-call-iptables = 1 net.ipv4.ip_forward = 1
[root@srv5 ~]# sysctl --system * Applying /usr/lib/sysctl.d/00-system.conf ... net.bridge.bridge-nf-call-ip6tables = 0 net.bridge.bridge-nf-call-iptables = 0 net.bridge.bridge-nf-call-arptables = 0 * Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ... kernel.yama.ptrace_scope = 0 * Applying /usr/lib/sysctl.d/50-default.conf ... kernel.sysrq = 16 kernel.core_uses_pid = 1 kernel.kptr_restrict = 1 net.ipv4.conf.default.rp_filter = 1 net.ipv4.conf.all.rp_filter = 1 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.conf.all.accept_source_route = 0 net.ipv4.conf.default.promote_secondaries = 1 net.ipv4.conf.all.promote_secondaries = 1 fs.protected_hardlinks = 1 fs.protected_symlinks = 1 * Applying /etc/sysctl.d/99-sysctl.conf ... * Applying /etc/sysctl.d/k8s.conf ... net.bridge.bridge-nf-call-ip6tables = 1 net.bridge.bridge-nf-call-iptables = 1 net.ipv4.ip_forward = 1 * Applying /etc/sysctl.conf ...
3) 新节点[srv5]配置k8s.repo [root@srv5 ~]# vim /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 enabled=1 gpgcheck=1 repo_gpgcheck=1 gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
4) 为新节点[srv5]安装kube tools并enable kubelet服务 4.1) 为新节点安装最新版本的kube tools(与4.2二选一) [root@srv5 ~]# yum install kubeadm kubelet kubectl -y [root@srv5 ~]# systemctl enable kubelet # 仅设置开机自启,暂不启动
[root@srv5 ~]# vim /etc/sysconfig/kubelet KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --container-runtime-endpoint=unix:///run/containerd/containerd.sock
4.2) 或为新节点安装指定版本的kube tools(以1.17.2为例) [root@srv5 ~]# yum list kubelet kubeadm kubectl --showduplicates | grep 1.17.2 [root@srv5 ~]# yum install kubectl-1.17.2-0.x86_64 \ kubeadm-1.17.2-0.x86_64 \ kubelet-1.17.2-0.x86_64 -y [root@srv5 ~]# systemctl enable kubelet # 仅设置开机自启,暂不启动
5) 为新节点[srv5]安装Containerd [root@srv5 ~]# yum install yum-utils device-mapper-persistent-data lvm2 -y
[root@srv5 ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo [root@srv5 ~]# yum install containerd -y [root@srv5 ~]# systemctl enable --now containerd
6) 为新节点[srv5]配置Containerd (1) 生成containerd默认配置文件 [root@srv5 ~]# containerd config default > /etc/containerd/config.toml
(2) 修改containerd默认配置文件 [root@srv5 ~]# sed -i 's#sandbox_image = "registry.k8s.io/pause#sandbox_image = "registry.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
(3) 为containerd默认配置文件增加镜像源 [root@srv5 ~]# vim /etc/containerd/config.toml ...... ...... ...... ...... ...... ......
# 153行之下添加如下内容 [plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://3laho3y3.mirror.aliyuncs.com", "https://registry-1.docker.io"]
...... ...... ...... ...... ...... ......
[root@srv5 ~]# systemctl daemon-reload && systemctl restart containerd.service
7) 于Master节点确认Token [root@srv1 ~]# kubeadm token create --print-join-command kubeadm join 192.168.1.11:6443 --token 5t7djb.jbhfzh95nqsk3ajn \ --discovery-token-ca-cert-hash sha256:2e72a06bac5405732db819225524f050df94a5f50a04b92805b5ccfdb65f301d
8) 使新节点[srv5]加入k8s Cluster [root@srv5 ~]# kubeadm join 192.168.1.11:6443 --token 5t7djb.jbhfzh95nqsk3ajn \ --discovery-token-ca-cert-hash sha256:2e72a06bac5405732db819225524f050df94a5f50a04b92805b5ccfdb65f301d [preflight] Running pre-flight checks [preflight] Reading configuration from the cluster... [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml' [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Starting the kubelet [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster: * Certificate signing request was sent to apiserver and a response was received. * The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
9) Master节点确认 [root@srv1 ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION srv1.1000y.cloud Ready control-plane 22h v1.24.2 srv2.1000y.cloud Ready <none> 22h v1.24.2 srv3.1000y.cloud Ready <none> 22h v1.24.2 srv5.1000y.cloud Ready <none> 16m v1.24.2
[root@srv1 ~]# kubectl get pods -A -o wide | grep srv5 kube-system kube-flannel-ds-7nfhb 1/1 Running 0 17m 192.168.1.15 srv5.1000y.cloud <none> <none> kube-system kube-proxy-jlzsd 1/1 Running 0 17m 192.168.1.15 srv5.1000y.cloud <none> <none>
6.2 移除节点
1) 在Master节点移除指定节点
[root@srv1 ~]# kubectl drain srv5.1000y.cloud --ignore-daemonsets --delete-local-data --force
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/srv5.1000y.cloud cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-qvxv4, kube-system/kube-proxy-2lcrc
node/srv5.1000y.cloud drained
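# 提示: --delete-local-data已被弃用, 新版本可改用--delete-emptydir-data, 效果等同(参考写法):
[root@srv1 ~]# kubectl drain srv5.1000y.cloud --ignore-daemonsets --delete-emptydir-data --force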
[root@srv1 ~]# kubectl get nodes srv5.1000y.cloud NAME STATUS ROLES AGE VERSION srv5.1000y.cloud Ready,SchedulingDisabled <none> 107m v1.24.2
[root@srv1 ~]# kubectl delete node srv5.1000y.cloud node "srv5.1000y.cloud" deleted
[root@srv1 ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION srv1.1000y.cloud Ready control-plane 24h v1.24.2 srv2.1000y.cloud Ready <none> 23h v1.24.2 srv3.1000y.cloud Ready <none> 23h v1.24.2
2) 在被移除的节点上reset kubeadmin的设置 [root@srv5 ~]# kubeadm reset [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted. [reset] Are you sure you want to proceed? [y/N]: y [preflight] Running pre-flight checks [reset] No kubeadm config, using etcd pod spec to get data directory [reset] No etcd config found. Assuming external etcd [reset] Please, manually reset etcd to prevent further issues [reset] Stopping the kubelet service [reset] Unmounting mounted directories in "/var/lib/kubelet" [reset] Deleting contents of directories: [/etc/kubernetes/manifests /etc/kubernetes/pki] [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf] [reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually. Please, check the contents of the $HOME/.kube/config file.
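# 按上述提示手动清理的参考示例(假设: kubeconfig位于默认路径; iptables/IPVS相关命令请按实际使用的代理模式取舍):
[root@srv5 ~]# rm -rf /etc/cni/net.d
[root@srv5 ~]# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
[root@srv5 ~]# ipvsadm --clear          # 仅在kube-proxy使用IPVS模式时执行
[root@srv5 ~]# rm -rf $HOME/.kube/config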
7. Pods横向扩展(Horizontal Pod Autoscaler---自动弹性伸缩)
1) 需要部署完成Metrics Server
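# 可先用如下命令确认Metrics Server是否工作正常(参考示例, 假设其以metrics-server为名部署在kube-system命名空间):
[root@srv1 ~]# kubectl -n kube-system get deployment metrics-server
[root@srv1 ~]# kubectl top nodes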
2) 撰写一个Horizontal Pod Autoscaler示例 [root@srv1 ~]# vim snow-nginx.yml apiVersion: apps/v1 kind: Deployment metadata: labels: run: snow-nginx name: snow-nginx spec: replicas: 1 selector: matchLabels: run: snow-nginx template: metadata: labels: run: snow-nginx spec: containers: - image: nginx name: snow-nginx resources: # 设置创建容器所需要的最小的资源 requests: # 250m为0.25 CPU cpu: 250m memory: 64Mi # 所需要的最大的资源 limits: cpu: 500m memory: 128Mi
[root@srv1 ~]# vim hpa.yml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: snow-nginx-hpa namespace: default spec: scaleTargetRef: kind: Deployment apiVersion: apps/v1 # 部署所定义的名称 name: snow-nginx # 定义最大副本数 maxReplicas: 4 minReplicas: 1 metrics: - type: Resource resource: # 如果目标CPU使用率超过20%,则进行伸缩 name: cpu target: type: Utilization averageUtilization: 20
3) 应用示例 [root@srv1 ~]# kubectl apply -f snow-nginx.yml -f hpa.yml deployment.apps/snow-nginx created horizontalpodautoscaler.autoscaling/snow-nginx-hpa created
4) 验证 [root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE snow-nginx-56ccc94d85-bvtxg 1/1 Running 0 51s
[root@srv1 ~]# kubectl top pod NAME CPU(cores) MEMORY(bytes) snow-nginx-56ccc94d85-bvtxg 0m 3Mi
[root@srv1 ~]# kubectl get hpa NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE snow-nginx-hpa Deployment/snow-nginx 0%/20% 1 4 1 86s
5) 测试横向伸缩 (1) 让snow-nginx的CPU使用超过20% [root@srv1 ~]# kubectl exec -it snow-nginx-56ccc94d85-bvtxg -- bash root@snow-nginx-56ccc94d85-bvtxg:/# cat /dev/urandom | gzip -9 > /dev/null & [1] 42 root@snow-nginx-56ccc94d85-bvtxg:/# exit exit
[root@srv1 ~]# kubectl get hpa NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE snow-nginx-hpa Deployment/snow-nginx 200%/20% 1 4 4 2m47s
(2) 查看自动伸缩是否实现----应有4个Pod [root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE snow-nginx-56ccc94d85-768xm 1/1 Running 0 28s snow-nginx-56ccc94d85-bvtxg 1/1 Running 0 2m58s snow-nginx-56ccc94d85-qdf84 1/1 Running 0 28s snow-nginx-56ccc94d85-wx9jn 1/1 Running 0 28s
[root@srv1 ~]# kubectl top pod NAME CPU(cores) MEMORY(bytes) snow-nginx-56ccc94d85-768xm 8m 3Mi snow-nginx-56ccc94d85-bvtxg 500m 5Mi snow-nginx-56ccc94d85-qdf84 4m 5Mi snow-nginx-56ccc94d85-wx9jn 4m 3Mi
6) 删除 [root@srv1 ~]# kubectl delete -f snow-nginx.yml -f hpa.yml deployment.apps "snow-nginx" deleted horizontalpodautoscaler.autoscaling "snow-nginx-hpa" deleted
[root@srv1 ~]# kubectl get pods No resources found in default namespace.
8. 安装并使用Helm
8.1 下载Helm
1) 说明
Helm 类似于Linux系统下的包管理器,如yum/apt等,可以方便快捷的将之前打包好的yaml文件快速部署进kubernetes内,方便管理维护。
Helm的作用:像ubuntu中的apt命令一样管理软件包,只不过Helm管理的是部署在k8s上的各类应用(Chart)。 Tiller是Helm v2的服务端组件,在Helm v3中已被移除;Helm仓库(repo)的作用类似于centos7中/etc/yum.repos.d目录下的xxx.repo。
2) 安装Helm 请先将get-helm-3脚本下载好;如网络受限,可将脚本中的下载源修改为本地地址 [root@srv1 ~]# curl -O https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
[root@srv1 ~]# bash ./get-helm-3 Downloading https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz Verifying checksum... Done. Preparing to install helm into /usr/local/bin helm installed into /usr/local/bin/helm
[root@srv1 ~]# helm version version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.17.5"}
8.2 基本使用
1)使用Helm查找
[root@srv1 ~]# helm search hub wordpress
URL                                                     CHART VERSION   APP VERSION    DESCRIPTION
https://artifacthub.io/packages/helm/kube-wordp...      0.1.0           1.1            this is my wordpress package...
https://artifacthub.io/packages/helm/bitnami/wo...      15.0.4          6.0.0          WordPress is the world's most...
https://artifacthub.io/packages/helm/bitnami-ak...      15.0.4          6.0.0          WordPress is the world's most...
https://artifacthub.io/packages/helm/sikalabs/w...      0.2.0                          Simple Wordpress
2)添加国内repository [root@srv1 ~]# helm repo add stable https://apphub.aliyuncs.com/ "stable" has been added to your repositories
################################################## 国内其他repo的添加 ##################################################
[root@srv1 ~]# helm repo add azure https://mirror.azure.cn/kubernetes/charts/ [root@srv1 ~]# helm repo add bitnami https://charts.bitnami.com/bitnami/ [root@srv1 ~]# helm repo add aliyuncs https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
################################################## 说明结束 ##################################################
[root@srv1 ~]# helm repo list NAME URL stable https://apphub.aliyuncs.com/
[root@srv1 ~]# helm show chart stable/docker-registry apiVersion: v1 appVersion: 2.7.1 description: A Helm chart for Docker Registry home: https://hub.docker.com/_/registry/ icon: https://hub.docker.com/public/images/logos/mini-logo.svg maintainers: - email: jpds@protonmail.com name: jpds - email: pete.brown@powerhrg.com name: rendhalver name: docker-registry sources: - https://github.com/docker/distribution-library-image version: 1.9.1
[root@srv1 ~]# helm repo update Hang tight while we grab the latest from your chart repositories... ...Successfully got an update from the "stable" chart repository Update Complete. ⎈Happy Helming!⎈
# 删除repo: [root@srv1 ~]# helm repo remove stable
3)按关键字搜索repo [root@srv1 ~]# helm search repo stable NAME CHART VERSION APP VERSION DESCRIPTION stable/admin-mongo 0.1.0 1 MongoDB管理工具(web gui) stable/aerospike 0.3.2 v4.5.0.5 A Helm chart for Aerospike in Kubernetes stable/airflow 4.3.3 1.10.9 Apache Airflow is a platform to programmaticall... stable/ambassador 5.3.0 0.86.1 A Helm chart for Datawire Ambassador stable/apache 7.3.5 2.4.41 Chart for Apache HTTP Server ...... ...... ...... ...... ...... ...... stable/wordpress 8.1.3 5.3.2 Web publishing platform for building blogs and ... stable/zeppelin 1.1.0 0.7.2 Web-based notebook that enables data-driven, in... stable/zookeeper 5.4.2 3.5.7 A centralized service for maintaining configura... Kubernetes
4)显示详细信息 # 指令格式为: helm show [all|chart|readme|values] [chart name] [root@srv1 ~]# helm show chart stable/docker-registry apiVersion: v1 appVersion: 2.7.1 description: A Helm chart for Docker Registry home: https://hub.docker.com/_/registry/ icon: https://hub.docker.com/public/images/logos/mini-logo.svg maintainers: - email: jpds@protonmail.com name: jpds - email: pete.brown@powerhrg.com name: rendhalver name: docker-registry sources: - https://github.com/docker/distribution-library-image version: 1.9.1
5)部署指定的应用 [root@srv1 ~]# helm install registry stable/docker-registry NAME: registry LAST DEPLOYED: Sat Jul 2 17:29:10 2022 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: 1. Get the application URL by running these commands: export POD_NAME=$(kubectl get pods --namespace default -l "app=docker-registry,release=registry" -o jsonpath="{.items[0].metadata.name}") echo "Visit http://127.0.0.1:8080 to use your application" kubectl -n default port-forward $POD_NAME 8080:5000
6)确认 [root@srv1 ~]# helm list NAME NAMESPACE REVISION UPDATED STATUS CHART registry default 1 2022-07-02 17:29:10.082601457 +0800 CST deployed docker-registry-1.9.6 APP VERSION 2.7.1
[root@srv1 ~]# helm status registry NAME: registry LAST DEPLOYED: Sat Jul 2 17:29:10 2022 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: 1. Get the application URL by running these commands: export POD_NAME=$(kubectl get pods --namespace default -l "app=docker-registry,release=registry" -o jsonpath="{.items[0].metadata.name}") echo "Visit http://127.0.0.1:8080 to use your application" kubectl -n default port-forward $POD_NAME 8080:5000
7)卸载应用 [root@srv1 ~]# helm uninstall registry release "registry" uninstalled
[root@srv1 ~]# helm list NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
9. 动态卷实现
9.1 下载nfs-subdir-external-provisioner
1) 说明
要在使用持久性存储时使用动态卷配置功能,可以在用户创建PVC(持久性卷声明)时动态创建PV(持久性卷),而无需群集管理员手动创建PV。
2) 前提条件 (1) 于srv5节点安装及配置NFS Server(nfs-utils) [root@srv5 ~]# yum install nfs-utils -y [root@srv5 ~]# mkdir /var/lib/nfs-share/ [root@srv5 ~]# chmod 777 /var/lib/nfs-share/
[root@srv5 ~]# vim /etc/exports /var/lib/nfs-share *(rw,no_root_squash,no_all_squash,sync)
[root@srv5 ~]# systemctl enable --now rpcbind nfs-server
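# 可用如下命令确认NFS共享已正常导出(参考示例):
[root@srv5 ~]# exportfs -v
[root@srv5 ~]# showmount -e localhost
Export list for localhost:
/var/lib/nfs-share *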
(2) Master及Worker节点安装NFS客户端组件(nfs-utils) [root@srv1 ~]# yum install nfs-utils -y [root@srv2 ~]# yum install nfs-utils -y [root@srv3 ~]# yum install nfs-utils -y
3) 安装NFS Provisioner [root@srv1 ~]# kubectl get pvc No resources found in default namespace.
[root@srv1 ~]# kubectl get pv No resources found
[root@srv1 ~]# helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/ "nfs-subdir-external-provisioner" has been added to your repositories
[root@srv1 ~]# helm repo list NAME URL stable https://apphub.aliyuncs.com/ nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
################################################## 错误汇总 ################################################## # 1. 因无法直接下载nfs-subdir-external-provisioner镜像,所以需要单独先下载 # 2. 在k8s的Master节点确认镜像的版本 [root@srv1 ~]# helm pull nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
[root@srv1 ~]# ls -l nfs-subdir-external-provisioner-* -rw-r--r-- 1 root root 5684 Jul 2 17:34 nfs-subdir-external-provisioner-4.0.16.tgz
[root@srv1 ~]# tar xfz nfs-subdir-external-provisioner-4.0.16.tgz
[root@srv1 ~]# cat nfs-subdir-external-provisioner/values.yaml | grep -A 3 ^image: image: repository: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner tag: v4.0.2 pullPolicy: IfNotPresent
# 3. 对所有k8s节点操作,下载镜像---[以srv1举例] [root@srv1 ~]# crictl pull eipwork/nfs-subdir-external-provisioner:v4.0.2 Image is up to date for sha256:932b0bface75b80e713245d7c2ce8c44b7e127c075bd2d27281a16677c8efef3
[root@srv1 ~]# ctr --namespace=k8s.io image tag \ eipwork/nfs-subdir-external-provisioner:v4.0.2 \ registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2 registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
################################################## 汇总结束 ##################################################
[root@srv1 ~]# helm install nfs-client-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \ --set nfs.server=192.168.1.15 \ --set nfs.path=/var/lib/nfs-share \ --set storageClass.defaultClass=true NAME: nfs-client-provisioner LAST DEPLOYED: Sat Jul 2 17:40:19 2022 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None
[root@srv1 ~]# helm list NAME NAMESPACE REVISION UPDATED STATUS nfs-client-provisioner default 1 2022-07-02 17:40:19.198318025 +0800 CST deployed CHART APP VERSION nfs-subdir-external-provisioner-4.0.16 4.0.2
4) 确认 [root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE nfs-client-provisioner-nfs-subdir-external-provisioner-785btdgl 1/1 Running 0 35s
9.2 基本使用
1)创建PVC及自动创建PV
[root@srv1 ~]# vim chuai-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # 指定pvc名称
  name: snow-provisioner
spec:
  accessModes:
    - ReadWriteMany
  # 指定StorageClass name
  storageClassName: nfs-client
  resources:
    requests:
      # 指定卷大小
      storage: 1Gi
[root@srv1 ~]# kubectl apply -f chuai-pvc.yml persistentvolumeclaim/snow-provisioner created
2)确认创建情况 [root@srv1 ~]# kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE snow-provisioner Bound pvc-9bc4fe77-055d-470a-a128-7d59251a4e7b 1Gi RWX nfs-client 5s
[root@srv1 ~]# kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM pvc-9bc4fe77-055d-470a-a128-7d59251a4e7b 1Gi RWX Delete Bound default/snow-provisioner STORAGECLASS REASON AGE nfs-client 15s
3)测试 [root@srv1 ~]# vim chuai-pod.yml apiVersion: v1 kind: Pod metadata: name: snow-nginx spec: containers: - name: snow-nginx image: nginx ports: - containerPort: 80 name: web volumeMounts: - mountPath: /usr/share/nginx/html name: nginx-pvc volumes: - name: nginx-pvc persistentVolumeClaim: claimName: snow-provisioner
[root@srv1 ~]# kubectl apply -f chuai-pod.yml pod/snow-nginx created
[root@srv1 ~]# kubectl get pod snow-nginx -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES snow-nginx 1/1 Running 0 8s 10.244.2.9 srv2.1000y.cloud <none> <none>
4)测试确认 [root@srv1 ~]# kubectl exec snow-nginx -- df /usr/share/nginx/html Filesystem 1K-blocks Used Available 192.168.1.15:/var/lib/nfs-share/default-snow-provisioner-pvc-9bc4fe77-055d-470a-a128-7d59251a4e7b 59871232 2566144 54241280 Use% Mounted on 5% /usr/share/nginx/html
[root@srv1 ~]# echo "Nginx Index for 1000y.cloud" > index.html [root@srv1 ~]# kubectl cp index.html snow-nginx:/usr/share/nginx/html/index.html [root@srv1 ~]# curl 10.244.2.9 Nginx Index for 1000y.cloud
# NFS Server端确认 [root@srv5 ~]# tree /var/lib/nfs-share/ /var/lib/nfs-share/ └── default-snow-provisioner-pvc-9bc4fe77-055d-470a-a128-7d59251a4e7b └── index.html
1 directory, 1 file
5)删除 [root@srv1 ~]# kubectl delete -f chuai-pod.yml pod "snow-nginx" deleted
[root@srv1 ~]# kubectl delete -f chuai-pvc.yml persistentvolumeclaim "snow-provisioner" deleted
[root@srv1 ~]# kubectl get pv No resources found
[root@srv1 ~]# kubectl get pvc No resources found in default namespace.
6)使用StatefulSet---volumeClaimTemplates方式 # StatefulSet可直接在Kubernetes使用volumeClaimTemplates来申请存储空间,从而减少pvc的创建步骤 [root@srv1 ~]# kubectl get pv No resources found
[root@srv1 ~]# kubectl get pvc No resources found in default namespace.
[root@srv1 ~]# kubectl get storageclass NAME PROVISIONER RECLAIMPOLICY nfs-client (default) cluster.local/nfs-client-provisioner-nfs-subdir-external-provisioner Delete VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE Immediate true 5m53s
[root@srv1 ~]# vim statefulset.yml apiVersion: apps/v1 kind: StatefulSet metadata: name: snow-nginx spec: serviceName: snow-nginx replicas: 1 selector: matchLabels: app: snow-nginx template: metadata: labels: app: snow-nginx spec: containers: - name: snow-nginx image: nginx volumeMounts: - name: data mountPath: /usr/share/nginx/html volumeClaimTemplates: - metadata: name: data spec: storageClassName: nfs-client accessModes: [ "ReadWriteOnce" ] resources: requests: storage: 1Gi
[root@srv1 ~]# kubectl apply -f statefulset.yml statefulset.apps/snow-nginx created
[root@srv1 ~]# kubectl get statefulset NAME READY AGE snow-nginx 1/1 11s
[root@srv1 ~]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP nfs-client-provisioner-nfs-subdir-external-provisioner-785btdgl 1/1 Running 0 6m41s 10.244.1.11 snow-nginx-0 1/1 Running 0 17s 10.244.2.10 NODE NOMINATED NODE READINESS GATES srv3.1000y.cloud <none> <none> srv2.1000y.cloud <none> <none>
[root@srv1 ~]# kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE data-snow-nginx-0 Bound pvc-ab3016a0-e63f-43ad-b104-68e8cb04dfc4 1Gi RWO nfs-client 86s
[root@srv1 ~]# kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM pvc-ab3016a0-e63f-43ad-b104-68e8cb04dfc4 1Gi RWO Delete Bound default/data-snow-nginx-0 STORAGECLASS REASON AGE nfs-client 97s
7)删除 [root@srv1 ~]# kubectl delete -f statefulset.yml statefulset.apps "snow-nginx" deleted
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE nfs-client-provisioner-nfs-subdir-external-provisioner-785btdgl 1/1 Running 0 9m12s
[root@srv1 ~]# kubectl delete pv pvc-ab3016a0-e63f-43ad-b104-68e8cb04dfc4 persistentvolume "pvc-ab3016a0-e63f-43ad-b104-68e8cb04dfc4" deleted
[root@srv1 ~]# kubectl delete pvc data-snow-nginx-0 persistentvolumeclaim "data-snow-nginx-0" deleted
8)删除nfs-client-provisioner [root@srv1 ~]# helm uninstall nfs-client-provisioner release "nfs-client-provisioner" uninstalled
[root@srv1 ~]# helm list NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
[root@srv1 ~]# kubectl get pods No resources found in default namespace.
10. Ceph作为后端存储[ceph-fs]
10.1 配置ceph---K8S节点
1) 本节各节点的作用
1. K8S: srv1为Manager Node,srv2及srv3为Worker node
2. Ceph: srv4为admin Node,srv5、srv6、srv7为Ceph Node
2) 请先完成《Ceph-Nautilus配置手册》之"1. Ceph Nautilus配置与实现"单元
3) 修改ceph管理节点上的ceph.conf模板,确认public_network及mon节点信息[后续将分发给srv1/srv2/srv3等客户端节点] [snow@srv4 ceph]$ pwd /home/snow/ceph
[snow@srv4 ceph]$ vim ceph.conf [global] fsid = 499a065f-652a-433c-a7d9-bbbd3f63204f public_network = 192.168.1.0/24 mon_initial_members = srv5 mon_host = 192.168.1.15 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx
4) 将修改好的ceph.conf模板复制到所有ceph节点上 [root@srv4 ~]# pscp.pssh -h host-list.txt /home/snow/ceph/ceph.conf /etc/ceph/ [1] 19:51:24 [SUCCESS] root@srv7.1000y.cloud [2] 19:51:24 [SUCCESS] root@srv6.1000y.cloud [3] 19:51:24 [SUCCESS] root@srv4.1000y.cloud [4] 19:51:24 [SUCCESS] root@srv5.1000y.cloud
5) 重启所有ceph节点及控制节点
6) 检查ceph集群 [root@srv5 ~]# ceph -s | grep -A 1 health health: HEALTH_OK
7) 部署ceph客户端前期准备 [root@srv1 ~]# yum install python2-pip http://192.168.1.254/repos/ceph/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm -y [root@srv2 ~]# yum install python2-pip http://192.168.1.254/repos/ceph/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm -y [root@srv3 ~]# yum install python2-pip http://192.168.1.254/repos/ceph/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm -y
8) 开启所有k8s节点的epel源,如有优先级设定请取消(参考下面的示例)
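# 参考示例(假设各节点可直接访问EPEL源; 如repo文件中设有priority, 可将其注释或删除):
[root@srv1 ~]# yum install epel-release -y
[root@srv1 ~]# grep -rn ^priority= /etc/yum.repos.d/          # 确认是否存在优先级设定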
9) 部署ceph客户端--所有K8S节点 [root@srv1 ~]# yum install ceph-common ceph-fuse -y [root@srv2 ~]# yum install ceph-common ceph-fuse -y [root@srv3 ~]# yum install ceph-common ceph-fuse -y
10) 复制ceph.conf及ceph.client.admin.keyring [root@srv4 ~]# scp /etc/ceph/ceph.conf srv1.1000y.cloud:/etc/ceph/ [root@srv4 ~]# scp /etc/ceph/ceph.conf srv2.1000y.cloud:/etc/ceph/ [root@srv4 ~]# scp /etc/ceph/ceph.conf srv3.1000y.cloud:/etc/ceph/
[root@srv4 ~]# scp /etc/ceph/ceph.client.admin.keyring srv1.1000y.cloud:/etc/ceph/ [root@srv4 ~]# scp /etc/ceph/ceph.client.admin.keyring srv2.1000y.cloud:/etc/ceph/ [root@srv4 ~]# scp /etc/ceph/ceph.client.admin.keyring srv3.1000y.cloud:/etc/ceph/
11) 部署MDS并创建2个RADOS Pools [root@srv4 ~]# su - snow [snow@srv4 ~]$ cd ceph [snow@srv4 ceph]$ ceph-deploy mds create srv5 [snow@srv4 ceph]$ exit
[root@srv4 ~]# ceph osd pool create cephfs_data 64 pool 'cephfs_data' created
[root@srv4 ~]# ceph osd pool create cephfs_metadata 64 pool 'cephfs_metadata' created
[root@srv4 ~]# ceph fs new cephfs cephfs_metadata cephfs_data new fs with metadata pool 3 and data pool 2
[root@srv4 ~]# ceph fs ls name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
[root@srv4 ~]# ceph mds stat cephfs:1 {0=srv5=up:active}
[root@srv4 ~]# ceph fs status cephfs cephfs - 0 clients ====== +------+--------+------+---------------+-------+-------+ | Rank | State | MDS | Activity | dns | inos | +------+--------+------+---------------+-------+-------+ | 0 | active | srv7 | Reqs: 0 /s | 10 | 13 | +------+--------+------+---------------+-------+-------+ +-----------------+----------+-------+-------+ | Pool | type | used | avail | +-----------------+----------+-------+-------+ | cephfs_metadata | metadata | 1536k | 93.9G | | cephfs_data | data | 0 | 93.9G | +-----------------+----------+-------+-------+ +-------------+ | Standby MDS | +-------------+ +-------------+ MDS version: ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@srv4 ~]# ceph -s | grep -A 1 health: health: HEALTH_OK
10.2 K8s与Ceph的结合实现
1) 获取ceph账户admin的key(base64编码)
[root@srv4 ~]# ceph auth get-key client.admin | base64
QVFEc2xzSmltNXRmRWhBQTB0Q2tFa0dWTjZYU0FHU1ZXNnFjQ3c9PQ==
2) 于k8s管理节点生成能够访问ceph的secret [root@srv1 ~]# vim secrets.yaml apiVersion: v1 kind: Secret metadata: name: ceph-secret data: key: QVFEc2xzSmltNXRmRWhBQTB0Q2tFa0dWTjZYU0FHU1ZXNnFjQ3c9PQ==
[root@srv1 ~]# kubectl create -f secrets.yaml secret/ceph-secret created
[root@srv1 ~]# kubectl get secret NAME TYPE DATA AGE ceph-secret Opaque 1 20s regcred kubernetes.io/dockerconfigjson 1 3d1h
3) 创建cephfs的pv [root@srv1 ~]# vim pv-cephfs.yaml apiVersion: v1 kind: PersistentVolume metadata: name: pv-cephfs spec: capacity: storage: 1Gi accessModes: - ReadWriteMany cephfs: monitors: # 指定srv5节点的IP及ceph所监听的端口 - 192.168.1.15:6789 user: admin secretRef: name: ceph-secret readOnly: false persistentVolumeReclaimPolicy: Recycle
# persistentVolumeReclaimPolicy策略常见有三种 1. Delete (动态创建的PV默认) 删除 pvc 同时会删除后端存储中对应的卷 2. Retain (手动创建的PV默认) 删除 pvc 时并不会删除后端存储中对应的卷, 需要手动执行删除操作 3. Recycle (新版本已经弃用, 推荐使用动态供给) 删除 pvc 时会自动清空卷中的内容, 之后可以再次被其他 pvc 绑定(需要插件支持) # 如需修改已存在PV的回收策略, 可参考下面的示例
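# 修改回收策略的参考示例(以将本例的pv-cephfs改为Retain为例):
[root@srv1 ~]# kubectl patch pv pv-cephfs -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'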
[root@srv1 ~]# kubectl create -f pv-cephfs.yaml persistentvolume/pv-cephfs created
[root@srv1 ~]# kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE pv-cephfs 1Gi RWX Recycle Available 5s
4) 创建pvc [root@srv1 ~]# vim pvc-cephfs.yaml apiVersion: v1 kind: PersistentVolumeClaim metadata: name: pvc-cephfs spec: accessModes: - ReadWriteMany resources: requests: storage: 40Mi
[root@srv1 ~]# kubectl create -f pvc-cephfs.yaml persistentvolumeclaim/pvc-cephfs created
[root@srv1 ~]# kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE pvc-cephfs Bound pv-cephfs 1Gi RWX 4s
# pvc状态说明: 1. Available – 资源尚未被 claim 使用 2. Bound – 卷已经被绑定到 claim 3. Released – claim 被删除,卷处于释放状态,但未被集群回收 4. Failed – 卷自动回收失败
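# 可用如下命令直接查看某个pvc当前所处的状态(参考示例):
[root@srv1 ~]# kubectl get pvc pvc-cephfs -o jsonpath='{.status.phase}'
Bound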
10.3 K8s与Ceph的结合测试
1) 创建测试的pod
[root@srv1 ~]# vim nginx-ceph.yaml
apiVersion: v1
kind: Pod
metadata:
 name: httpd-cephfs
spec:
 containers:
 - name: httpd-cephfs
   image: httpd
   env:
   ports:
   - containerPort: 80
     hostPort: 100
   volumeMounts:
   - name: cephfs-vol1
     # 指定挂载点
     mountPath: /usr/local/apache2/htdocs
     readOnly: false
 volumes:
 - name: cephfs-vol1
   persistentVolumeClaim:
     claimName: pvc-cephfs
[root@srv1 ~]# kubectl create -f nginx-ceph.yaml pod/httpd-cephfs created
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE httpd-cephfs 0/1 ContainerCreating 0 33s
[root@srv1 ~]# kubectl describe pods httpd-cephfs Name: httpd-cephfs Namespace: default Priority: 0 Node: srv3.1000y.cloud/192.168.1.13 Start Time: Mon, 04 Jul 2022 19:15:03 +0800 Labels: <none> Annotations: <none> Status: Running IP: 10.244.1.13 IPs: IP: 10.244.1.13 Containers: httpd-cephfs: Container ID: containerd://50d4c4c40622652721de7e7621679d9429a0777ad602f0270bd557e1059dc5c2 Image: httpd Image ID: docker.io/library/httpd@sha256:886f273536ebef2239ef7dc42e6486544fbace3e36e5a42735cfdc410e36d33c Port: 80/TCP Host Port: 100/TCP State: Running Started: Mon, 04 Jul 2022 19:15:49 +0800 Ready: True Restart Count: 0 Environment: <none> Mounts: /usr/local/apache2/htdocs from cephfs-vol1 (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gvhrp (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: cephfs-vol1: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: pvc-cephfs ReadOnly: false kube-api-access-gvhrp: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 54s default-scheduler Successfully assigned default/httpd-cephfs to srv3.1000y.cloud Normal Pulling 51s kubelet Pulling image "httpd" Normal Pulled 9s kubelet Successfully pulled image "httpd" in 42.015956864s Normal Created 9s kubelet Created container httpd-cephfs Normal Started 8s kubelet Started container httpd-cephfs
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE httpd-cephfs 1/1 Running 0 117s
# 挂载说明 本例使用的是cephfs(而非rbd块设备): kubelet会通过内核ceph客户端或ceph-fuse将cephfs挂载到宿主机, 再绑定挂载到Pod中指定的mountPath(本例为/usr/local/apache2/htdocs), Pod可直接读写该目录, 无需管理 /etc/fstab 文件
2) 测试 [root@srv1 ~]# ceph-authtool -p /etc/ceph/ceph.client.admin.keyring > admin.key
[root@srv1 ~]# chmod 600 admin.key
[root@srv1 ~]# mount -t ceph srv5.1000y.cloud:6789:/ /mnt -o name=admin,secretfile=admin.key
[root@srv1 ~]# df -Th /mnt Filesystem Type Size Used Avail Use% Mounted on 192.168.1.15:6789:/ ceph 94G 0 94G 0% /mnt
[root@srv1 ~]# echo "1000y.cloud" > /mnt/index.html
[root@srv1 ~]# umount /mnt
[root@srv1 ~]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES httpd-cephfs 1/1 Running 0 6m1s 10.244.1.13 srv3.1000y.cloud <none> <none>
[root@srv1 ~]# curl 10.244.1.13 1000y.cloud
3) 在ceph服务器上,检查ceph_metadata及ceph_data的信息---于ceph管理节点操作 [root@srv4 ~]# rados ls -p cephfs_metadata 601.00000000 602.00000000 600.00000000 603.00000000 1.00000000.inode 200.00000000 200.00000001 606.00000000 607.00000000 mds0_openfiles.0 608.00000000 604.00000000 500.00000000 mds_snaptable 605.00000000 mds0_inotable 100.00000000 mds0_sessionmap 609.00000000 400.00000000 100.00000000.inode 1.00000000
[root@srv4 ~]# rados ls -p cephfs_data 10000000000.00000000
4) 检查cephfs在Pod内挂载的情况---于k8s管理节点操作 [root@srv1 ~]# kubectl exec httpd-cephfs -- mount | grep ceph-fuse ceph-fuse on /usr/local/apache2/htdocs type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
10.4 其他说明
1) 删除Pod
1. pod 删除后, cephfs 池中文件 仍然存在
2) 更多信息 https://jimmysong.io/kubernetes-handbook/practice/rbd-provisioner.html
11. Ceph作为后端存储[RBD]
11.1 配置CEPH-CSI---K8S节点
1) 本节各节点的作用
1. Ceph CSI插件: 实现了支持CSI的Container Orchestrator (CO)和Ceph集群之间的接口。它允许动态供应Ceph卷并将它们附加到工作负载
2. K8S: srv1为Manager Node,srv2及srv3为Worker node
3. Ceph: srv4为admin Node,srv5、srv6、srv7为Ceph Node
2) 请先完成《Ceph-Nautilus配置手册》之"1. Ceph Nautilus配置与实现"单元
3) 创建一个pool并初始化 [root@srv4 ~]# ceph osd pool create kubernetes 64 64 pool 'kubernetes' created [root@srv4 ~]# rbd pool init kubernetes
4) 配置CEPH-CSI (1) 创建一个新的ceph客户端凭证[可以但不建议使用ceph admin帐户] [root@srv4 ~]# ceph auth get-or-create client.kubernetes mon 'profile rbd' \ osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes' [client.kubernetes] key = AQA5+8Niybw4OhAAyUQ6L6iXDRQUuqjAo6tnaA==
[root@srv4 ~]# ceph auth list | grep -A 4 client.kubernetes installed auth entries:
client.kubernetes key: AQA5+8Niybw4OhAAyUQ6L6iXDRQUuqjAo6tnaA== caps: [mgr] profile rbd pool=kubernetes caps: [mon] profile rbd caps: [osd] profile rbd pool=kubernetes
(2) 确认ceph集群mon的地址及集群ID [root@srv4 ~]# ceph mon dump epoch 1 fsid ed08aec4-ebc2-45f6-a556-11605a9d9328 last_changed 2022-07-04 15:29:45.548689 created 2022-07-04 15:29:45.548689 min_mon_release 14 (nautilus) 0: [v2:192.168.1.15:3300/0,v1:192.168.1.15:6789/0] mon.srv5 dumped monmap epoch 1
[root@srv4 ~]# ceph -s | grep id: | awk '{print $2}' ed08aec4-ebc2-45f6-a556-11605a9d9328
5) 将CEPH集群的信息加入至K8S的ConfigMap (1) 创建csi configmap [root@srv1 ~]# vim csi-config-map.yaml --- apiVersion: v1 kind: ConfigMap data: config.json: |- [ { "clusterID": "ed08aec4-ebc2-45f6-a556-11605a9d9328", "monitors": [ "192.168.1.15:6789" ] } ] metadata: name: ceph-csi-config
[root@srv1 ~]# kubectl create -f csi-config-map.yaml configmap/ceph-csi-config created
[root@srv1 ~]# kubectl get cm ceph-csi-config NAME DATA AGE ceph-csi-config 1 34s
(2) 创建一个空的KMS # ceph-csi还需要一个额外的ConfigMap对象来定义密钥管理服务 (KMS) 提供者的详细信息。如果未设置 KMS,可以将空配置放入csi-kms-config-map.yaml [root@srv1 ~]# vim csi-kms-config-map.yaml --- apiVersion: v1 kind: ConfigMap data: config.json: |- {} metadata: name: ceph-csi-encryption-kms-config
[root@srv1 ~]# kubectl create -f csi-kms-config-map.yaml configmap/ceph-csi-encryption-kms-config created
[root@srv1 ~]# kubectl get configmap ceph-csi-encryption-kms-config NAME DATA AGE ceph-csi-encryption-kms-config 1 36s
[root@srv1 ~]# kubectl get configmap ceph-csi-encryption-kms-config -o json { "apiVersion": "v1", "data": { "config.json": "{}" }, "kind": "ConfigMap", "metadata": { "creationTimestamp": "2022-07-05T09:05:51Z", "name": "ceph-csi-encryption-kms-config", "namespace": "default", "resourceVersion": "3782", "uid": "0b79b3ef-37e7-4e3e-ab40-0f2bad0a6846" } }
(3) 创建一个能够允许读取ceph.conf配置文件中信息的ConfigMap [root@srv1 ~]# vim ceph-config-map.yaml --- apiVersion: v1 kind: ConfigMap data: ceph.conf: | [global] auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx keyring: | metadata: name: ceph-config
[root@srv1 ~]# kubectl create -f ceph-config-map.yaml configmap/ceph-config created
[root@srv1 ~]# kubectl get cm ceph-config NAME DATA AGE ceph-config 2 10s
6) 将ceph的信息加入至Kubernetes # ceph管理节点操作 [root@srv4 ~]# ceph auth list | grep -A 4 client.admin installed auth entries:
client.admin key: AQDslsJim5tfEhAA0tCkEkGVN6XSAGSVW6qcCw== caps: [mds] allow * caps: [mgr] allow * caps: [mon] allow *
# k8s master节点操作 [root@srv1 ~]# vim csi-rbd-secret.yaml --- apiVersion: v1 kind: Secret metadata: name: csi-rbd-secret namespace: default stringData: userID: admin userKey: AQDslsJim5tfEhAA0tCkEkGVN6XSAGSVW6qcCw==
[root@srv1 ~]# kubectl create -f csi-rbd-secret.yaml secret/csi-rbd-secret created
[root@srv1 ~]# kubectl get secret NAME TYPE DATA AGE csi-rbd-secret Opaque 2 29s
11.2 创建RBD
1) 创建ServiceAccount和RBAC用以访问k8s信息
[root@srv1 ~]# kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml
serviceaccount/rbd-csi-provisioner created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
[root@srv1 ~]# kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-nodeplugin-rbac.yaml serviceaccount/rbd-csi-nodeplugin created clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
2) 部署CEPH-CSI Provisioner (1) 下载配置文件,并寻找相关镜像信息 [root@srv1 ~]# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin.yaml [root@srv1 ~]# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
[root@srv1 ~]# grep -ri image: ./csi-rbdplugin-provisioner.yaml ./csi-rbdplugin.yaml | awk '{print $3}' | sort -ru registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1 registry.k8s.io/sig-storage/csi-resizer:v1.5.0 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1 registry.k8s.io/sig-storage/csi-attacher:v3.5.0 quay.io/cephcsi/cephcsi:canary gcr.io/k8s-staging-sig-storage/csi-provisioner:canary
(2) 因网络问题,请单独准备好这些镜像并导入至k8s所有节点 [root@srv1 ~]# ctr -n=k8s.io image import ceph-images.tar [root@srv2 ~]# ctr -n=k8s.io image import ceph-images.tar [root@srv3 ~]# ctr -n=k8s.io image import ceph-images.tar
(3) 部署CEPH-CSI Provisioner [root@srv1 ~]# kubectl apply -f csi-rbdplugin.yaml daemonset.apps/csi-rbdplugin created service/csi-metrics-rbdplugin created
[root@srv1 ~]# kubectl apply -f csi-rbdplugin-provisioner.yaml service/csi-rbdplugin-provisioner created deployment.apps/csi-rbdplugin-provisioner created
[root@srv1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE csi-rbdplugin-fxzcw 3/3 Running 0 6m10s csi-rbdplugin-j5j5m 3/3 Running 0 6m10s csi-rbdplugin-ph9gc 3/3 Running 0 13s csi-rbdplugin-provisioner-5d969665c5-5j8xj 7/7 Running 0 5m57s csi-rbdplugin-provisioner-5d969665c5-hq7zf 7/7 Running 0 5m57s csi-rbdplugin-provisioner-5d969665c5-t4lfc 7/7 Running 0 5m57s
################################################## 问题汇总 ##################################################
1. 如看到某个 csi-rbdplugin-provisioner-xxxxxxxxxx-xxxxx 一直为 Pending 状态,可查看其状态: [root@srv1 ~]# kubectl describe pod csi-rbdplugin-provisioner-xxxxxxxxxx-xxxxx ...... ...... ...... ...... ...... ...... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 2m56s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node- role.kubernetes.io/master: }, 2 node(s) didn't match pod anti-affinity rules. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod.
2. 原因: Worker节点数量不足, 而Master节点默认带有污点(taint), Pod无法调度到Master节点上
3. 解决办法: (1) 允许master部署pods [root@srv1 ~]# kubectl taint nodes --all node-role.kubernetes.io/master- node/srv1.1000y.cloud untainted taint "node-role.kubernetes.io/master" not found taint "node-role.kubernetes.io/master" not found
(2) 删除NoSchedule [root@srv1 ~]# kubectl taint node srv1.1000y.cloud node-role.kubernetes.io/control-plane:NoSchedule- node/srv1.1000y.cloud untainted
[root@srv1 ~]# kubectl get pods
################################################## 汇总结束 ##################################################
11.3 K8s与Ceph的结合实现
1) 定义存储类型---StorageClass
[root@srv1 ~]# vim csi-rbd-sc.yaml 
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
   # 设定CEPH集群的ID
   clusterID: ed08aec4-ebc2-45f6-a556-11605a9d9328
   # 设定CEPH Pool的名字
   pool: kubernetes
   imageFeatures: layering
   csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
   csi.storage.k8s.io/provisioner-secret-namespace: default
   csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
   csi.storage.k8s.io/controller-expand-secret-namespace: default
   csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
   csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
   - discard
################################################## 说明 ##################################################
sc的策略常见有三种(reclaimPolicy) 1. Delete (默认) 删除 pvc 同时会删除 rbd 池中对应的 rbd 文件 2. Retain 删除 pvc 时并不会删除 rbd 池中对应 rbd 文件, 需要手动执行删除操作 3. Recycle (新版本已经弃用, 推荐使用 动态支持)删除 pod 时候,会自动清空 pvc 中的内容, 可以再次被其他 pod 调用(需要插件支持)
################################################## 说明 ##################################################
[root@srv1 ~]# kubectl create -f csi-rbd-sc.yaml storageclass.storage.k8s.io/csi-rbd-sc created
[root@srv1 ~]# kubectl get sc NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE csi-rbd-sc rbd.csi.ceph.com Delete Immediate true 16s
2) 创建pvc(pv将由StorageClass自动创建)---k8s管理节点操作 [root@srv1 ~]# vim create-volume-ceph.yaml kind: PersistentVolumeClaim apiVersion: v1 metadata: name: webdata spec: accessModes: - ReadWriteOnce storageClassName: csi-rbd-sc resources: requests: storage: 1Gi
[root@srv1 ~]# kubectl create -f create-volume-ceph.yaml persistentvolumeclaim/webdata created
[root@srv1 ~]# kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE pvc-... 1Gi RWO Delete Bound default/webdata csi-rbd-sc 16s
[root@srv1 ~]# kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE webdata Bound pvc-2f3ddda0-b411-4f94-...... 1Gi RWO csi-rbd-sc 37s
# pvc状态说明: Available – 资源尚未被 claim 使用 Bound – 卷已经被绑定到 claim Released – claim 被删除,卷处于释放状态,但未被集群回收 Failed – 卷自动回收失败
11.4 K8s与Ceph的结合测试
1) 创建测试的pod
[root@srv1 ~]# vim nginx-ceph.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
  - name: nginx
    image: nginx:latest
    volumeMounts:
      - name: webdata
        # 指定挂载点
        mountPath: /data/
        readOnly: false
  volumes:
  - name: webdata
    persistentVolumeClaim:
      claimName: webdata
[root@srv1 ~]# kubectl create -f nginx-ceph.yaml pod/nginx-test created
[root@srv1 ~]# kubectl get pods nginx-test NAME READY STATUS RESTARTS AGE nginx-test 1/1 Running 0 92s
2) 在pods及ceph服务器上,检查rbd镜像创建情况和镜像的信息 # k8s master节点操作 [root@srv1 ~]# kubectl exec -it nginx-test -- mount | grep rbd /dev/rbd0 on /data type ext4 (rw,relatime,discard,stripe=1024,data=ordered)
# ceph管理节点节点操作 [root@srv4 ~]# rbd ls --pool kubernetes csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71
[root@srv4 ~]# rbd info kubernetes/csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71 rbd image 'csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71': size 1 GiB in 256 objects order 22 (4 MiB objects) snapshot_count: 0 id: 3771493fde88 block_name_prefix: rbd_data.3771493fde88 format: 2 features: layering op_features: flags: create_timestamp: Tue Jul 5 18:06:03 2022 access_timestamp: Tue Jul 5 18:06:03 2022 modify_timestamp: Tue Jul 5 18:06:03 2022
# 挂载说明 PVC 对应到 rbd 池中卷 kubernetes/csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71; rbd 卷会被 map 到运行该 Pod 的物理机, 并命名为 /dev/rbd0 (第一个, 第二个则为 rbd1, 如此类推); kubelet 会将 /dev/rbd0 格式化并挂载后绑定到 Pod 中指定的 mountPath(本例为 /data), Pod 可直接对该目录进行访问, 无需管理 /etc/fstab 文件; 可参考下面的命令进一步核对
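# 核对映射关系的参考示例(假设nginx-test被调度到srv2; rbd镜像名以实际输出为准):
[root@srv2 ~]# lsblk | grep rbd
[root@srv4 ~]# rbd status kubernetes/csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71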
11.5 实现StatefulSet
1) 说明
使用StatefulSet可直接在Kubernetes使用volumeClaimTemplates来申请存储空间,从而减少pvc的创建步骤。
2) 实现StatefulSet [root@srv1 ~]# vim nginx-statefulset.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: web spec: # 定义副本数为3 replicas: 3 revisionHistoryLimit: 10 serviceName: nginx selector: matchLabels: app: nginx template: metadata: # name字段不用写,系统会根据上面的name字段自动生成 labels: app: nginx spec: containers: - name: nginx image: nginx imagePullPolicy: IfNotPresent ports: - containerPort: 80 volumeMounts: # 填写pvc的名字 - name: ceph mountPath: /data/ volumeClaimTemplates: - metadata: name: ceph spec: accessModes: ["ReadWriteOnce"] storageClassName: csi-rbd-sc volumeMode: Filesystem resources: requests: storage: 512M
[root@srv1 ~]# kubectl create -f nginx-statefulset.yaml statefulset.apps/web created
3) 测试 [root@srv1 ~]# kubectl get pods | grep web- web-0 1/1 Running 0 3m44s web-1 1/1 Running 0 2m12s web-2 1/1 Running 0 116s
[root@srv1 ~]# kubectl get pvc | grep ceph ceph-web-0 Bound pvc-6a0d4eb6-92b8-453a-96fd-da28cc407170 489Mi RWO csi-rbd-sc 4m27s ceph-web-1 Bound pvc-2f2524ac-57b7-4a0d-a0cc-20ea56e08e0e 489Mi RWO csi-rbd-sc 2m55s ceph-web-2 Bound pvc-54be2f59-632f-47aa-a4dc-7c8c25daba57 489Mi RWO csi-rbd-sc 2m39s
4) 挂载确认 (1) kubernetes master端确认 [root@srv1 ~]# kubectl exec -it web-0 -- mount | grep rbd /dev/rbd0 on /data type ext4 (rw,relatime,discard,stripe=4096,data=ordered) [root@srv1 ~]# kubectl exec -it web-1 -- mount | grep rbd /dev/rbd1 on /data type ext4 (rw,relatime,discard,stripe=4096,data=ordered) [root@srv1 ~]# kubectl exec -it web-2 -- mount | grep rbd /dev/rbd0 on /data type ext4 (rw,relatime,discard,stripe=4096,data=ordered)
(2) ceph admin端确认 [root@srv4 ~]# rbd ls --pool kubernetes csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71 csi-vol-1fa99cde-fc4d-11ec-a9d0-76656aa38e71 csi-vol-290b9aa7-fc4d-11ec-a9d0-76656aa38e71 csi-vol-e89d6a48-fc4c-11ec-a9d0-76656aa38e71
[root@srv4 ~]# rbd info kubernetes/csi-vol-1fa99cde-fc4d-11ec-a9d0-76656aa38e71 rbd image 'csi-vol-1fa99cde-fc4d-11ec-a9d0-76656aa38e71': size 489 MiB in 123 objects order 22 (4 MiB objects) snapshot_count: 0 id: 37714d1bf2e3 block_name_prefix: rbd_data.37714d1bf2e3 format: 2 features: layering op_features: flags: create_timestamp: Tue Jul 5 18:27:51 2022 access_timestamp: Tue Jul 5 18:27:51 2022 modify_timestamp: Tue Jul 5 18:27:51 2022
[root@srv4 ~]# rbd info kubernetes/csi-vol-290b9aa7-fc4d-11ec-a9d0-76656aa38e71 rbd image 'csi-vol-290b9aa7-fc4d-11ec-a9d0-76656aa38e71': size 489 MiB in 123 objects order 22 (4 MiB objects) snapshot_count: 0 id: 3771c353a120 block_name_prefix: rbd_data.3771c353a120 format: 2 features: layering op_features: flags: create_timestamp: Tue Jul 5 18:28:08 2022 access_timestamp: Tue Jul 5 18:28:08 2022 modify_timestamp: Tue Jul 5 18:28:08 2022
[root@srv4 ~]# rbd info kubernetes/csi-vol-e89d6a48-fc4c-11ec-a9d0-76656aa38e71 rbd image 'csi-vol-e89d6a48-fc4c-11ec-a9d0-76656aa38e71': size 489 MiB in 123 objects order 22 (4 MiB objects) snapshot_count: 0 id: 3771b09da1ba block_name_prefix: rbd_data.3771b09da1ba format: 2 features: layering op_features: flags: create_timestamp: Tue Jul 5 18:26:19 2022 access_timestamp: Tue Jul 5 18:26:19 2022 modify_timestamp: Tue Jul 5 18:26:19 2022
11.6 其他说明
1) 删除Pod
1. pod 删除后, 物理机 /dev/rbd0 自动 unmap
2. ceph rbd 池中文件 仍然存在
2) 删除pvc 当执行 kubectl delete -f create-volume-ceph.yaml 后 1. 针对 Delete 策略: kubectl get pvc 发现 pvc 已经被删除, ceph rbd 池中对应的 rbd 卷也会自动被删除
2. 针对 Retain 策略: kubectl get pvc 发现 pvc 仍然存在, 需要手动执行 kubectl delete pvc xxxx 进行删除; ceph rbd 池中对应的 rbd 卷依旧存在, 即使执行 kubectl delete pvc xxx 操作也不会自动删除, 必须手动通过 rbd -p kubernetes remove xxxx 执行删除操作(参考下面的示例)
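# Retain策略下手动清理的参考示例(rbd镜像名请以 rbd ls --pool kubernetes 的实际输出为准):
[root@srv1 ~]# kubectl delete pvc webdata
[root@srv4 ~]# rbd ls --pool kubernetes
[root@srv4 ~]# rbd rm kubernetes/csi-vol-137f652d-fc4a-11ec-a9d0-76656aa38e71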
3) 更多信息 https://jimmysong.io/kubernetes-handbook/practice/rbd-provisioner.html
12. 多集群管理--contexts模式
1) 请先准备2个及以上的k8s集群
################################################## 提示 ##################################################
1. 如打算备份出所有images,可用下列指令 [root@srv1 ~]# ctr -n=k8s.io image export srv1.tar $(crictl images 2>/dev/null | awk '{print $1":"$2}' | sed 1d)
2. 导入镜像 [root@srv6 ~]# ctr -n=k8s.io image import srv1.tar
################################################## 结束 ##################################################
2) 查看k8s集群的contexts---以srv1作为k8s-cluster的Master节点举例 [root@srv1 ~]# kubectl config get-contexts CURRENT NAME CLUSTER AUTHINFO NAMESPACE * cluster1 cluster1 kubernetes-admin1
3) 备份好各集群Master节点的admin.conf文件[root帐户环境]或.kube/config文件[普通帐户环境]---以srv1作为k8s-cluster的Master节点举例 [root@srv1 ~]# cp -p /etc/kubernetes/admin.conf /etc/kubernetes/admin.conf.srv1.bak
4) 备份并打开各集群Master节点的.kube/config文件---以srv1作为k8s-cluster的Master节点举例 [root@srv1 ~]# cp -p .kube/config .kube/config.bak [root@srv1 ~]# vim .kube/config
5) 清除config文件中的所有配置,并写入以下内容---于所有k8s cluster的master节点上操作 apiVersion: v1 clusters: - cluster: # 填写第一个K8S Cluster Master节点certificate-authority-data信息。可在config.bak中查到 certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FUR...... # 指定第一个集群的IP地址及监听端口 server: https://192.168.1.11:6443 # 设定第一个集群的名字 name: cluster1 - cluster: # 填写第二个K8S Cluster Master节点certificate-authority-data信息。可在config.bak中查到 certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FU...... # 指定第二个集群的IP地址及监听端口 server: https://192.168.1.15:6443 # 设定第二个集群的名字 name: cluster2 contexts: - context: # 设定第1个集群用户 cluster: cluster1 # 设定集群识别用户信息名 user: kubernetes-admin1 # 设定用户归属哪个集群 name: cluster1 - context: cluster: cluster2 user: kubernetes-admin2 name: cluster2 # 设定默认使用集群 current-context: cluster1 kind: Config preferences: {} users: # 集群用户对应的登录信息 - name: kubernetes-admin1 user: # 填写第一个K8S Cluster Master节点client-certificate-data信息。可在config.bak中查到 client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JS...... # 填写第一个K8S Cluster Master节点client-key-data信息。可在config.bak中查到 client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpN...... - name: kubernetes-admin2 user: # 填写第二个K8S Cluster Master节点client-certificate-data信息。可在config.bak中查到 client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tC...... # 填写第二个K8S Cluster Master节点client-key-data信息。可在config.bak中查到 client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQ......
[root@srv1 ~]#
6) 测试 (1) 获取两个Cluster的contexts [root@srv1 ~]# kubectl config get-contexts CURRENT NAME CLUSTER AUTHINFO NAMESPACE * cluster1 cluster1 kubernetes-admin1 cluster2 cluster2 kubernetes-admin2
(2) 查看每个Cluster的nodes # 查看第一个k8s cluster的nodes信息 [root@srv1 ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION srv1.1000y.cloud Ready control-plane 3d4h v1.24.2 srv2.1000y.cloud Ready <none> 3d3h v1.24.2 srv3.1000y.cloud Ready <none> 3d3h v1.24.2
# 查看第二个k8s cluster的nodes信息 [root@srv1 ~]# kubectl get nodes --context cluster2 NAME STATUS ROLES AGE VERSION srv5.1000y.cloud Ready control-plane 21m v1.24.2 srv6.1000y.cloud Ready <none> 17m v1.24.2 srv7.1000y.cloud Ready <none> 17m v1.24.2
# 多集群管理---极速模式 1. 把每个k8s集群的Master节点的config配置文件放到/root/.kube/目录下并改为不同名字,或存放在/etc/kubernetes/目录中并改为不同的名字 2. 通过--kubeconfig实现不同集群操作,如: [root@srv1 ~]# kubectl --kubeconfig=/root/.kube/k8s2-config get pods
# 或
[root@srv1 ~]# kubectl --kubeconfig=/etc/kubernetes/srv4.conf get nodes NAME STATUS ROLES AGE VERSION srv4.1000y.cloud Ready control-plane 22m v1.24.2 srv5.1000y.cloud Ready <none> 18m v1.24.2 srv6.1000y.cloud Ready <none> 18m v1.24.2
如对您有帮助,请随缘打个赏。^-^