
Environment
Three control-plane (master) nodes and three worker nodes.
# kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready control-plane 2d19h v1.29.8 192.168.248.101 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.11.1.el9_2.x86_64 containerd://1.7.13
master2 Ready control-plane 2d19h v1.29.8 192.168.248.102 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.11.1.el9_2.x86_64 containerd://1.7.13
master3 Ready control-plane 2d19h v1.29.8 192.168.248.103 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.11.1.el9_2.x86_64 containerd://1.7.13
worker1 Ready worker 2d19h v1.29.8 192.168.248.111 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.30.1.el9_2.x86_64 containerd://1.7.13
worker2 Ready worker 2d19h v1.29.8 192.168.248.112 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.30.1.el9_2.x86_64 containerd://1.7.13
worker3 Ready worker 2d19h v1.29.8 192.168.248.113 <none> Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.30.1.el9_2.x86_64 containerd://1.7.13

1. Configure a registry mirror, substitute images from a domestic mirror, or otherwise ensure access to the upstream registries.
2. Deploy MetalLB and Traefik.
3. Deploy an NFS server and a StorageClass (nfs-client) for data persistence, and set it as the default.
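The "default" in step 3 is just an annotation on the StorageClass object. A sketch of what the nfs-client class looks like with it set (the provisioner name matches the nfs-subdir-external-provisioner output shown below; adjust to your install):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    # This annotation is what marks the class as the cluster-wide default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: cluster.local/nfs-client-nfs-subdir-external-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

On an existing class, the same annotation can be toggled with `kubectl patch storageclass nfs-client`.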
# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client (default) cluster.local/nfs-client-nfs-subdir-external-provisioner Delete Immediate true 45d

Add the prometheus-community Helm repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Pull the chart locally
helm pull prometheus-community/kube-prometheus-stack
tar xzvf kube-prometheus-stack-65.5.0.tgz
cd kube-prometheus-stack

Configure DingTalk alerting in the alertmanager section of values.yaml (set up the robot on the DingTalk side yourself):
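The receiver URL in the config below points at a relay service named webhook-dingtalk that translates Alertmanager webhooks into DingTalk messages. A minimal sketch of deploying one with timonwong/prometheus-webhook-dingtalk (namespace, image tag, and the access token are assumptions; substitute your robot's real token):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |
    targets:
      webhook:   # target name must match /dingtalk/webhook/send in the receiver URL
        url: https://oapi.dingtalk.com/robot/send?access_token=<YOUR_ROBOT_TOKEN>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webhook-dingtalk
  template:
    metadata:
      labels:
        app: webhook-dingtalk
    spec:
      containers:
      - name: webhook-dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v2.1.0
        args: ["--config.file=/etc/prometheus-webhook-dingtalk/config.yml"]
        ports:
        - containerPort: 8060   # default listen port of the relay
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
      - name: config
        configMap:
          name: webhook-dingtalk-config
---
apiVersion: v1
kind: Service
metadata:
  name: webhook-dingtalk   # matches the hostname used in the receiver URL
  namespace: monitoring
spec:
  selector:
    app: webhook-dingtalk
  ports:
  - port: 8060
    targetPort: 8060
```

Apply it with kubectl apply -f in the same namespace before wiring up the receiver.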
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'webhook'
    receivers:
      - name: 'webhook'
        webhook_configs:
          - url: 'http://webhook-dingtalk:8060/dingtalk/webhook/send'
            send_resolved: true
    templates:
      - '/etc/alertmanager/config/*.tmpl'

Configure persistent storage for Alertmanager and Prometheus
Edit values.yaml (the storage sections, around lines 762-769 and 3836-3842):
vim values.yaml
……
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
……
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
……

Configure persistent storage for Grafana
vim charts/grafana/values.yaml
persistence:
  type: pvc
  enabled: true
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  size: 10Gi

Install:
helm install kube-prometheus-stack -n monitoring --create-namespace \
--set prometheus.ingress.enabled=true \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.ingress.hosts='{prometheus.haiwang.local}' \
--set prometheus.ingress.paths='{/}' \
--set prometheus.ingress.pathType=Prefix \
--set alertmanager.ingress.enabled=true \
--set alertmanager.ingress.hosts='{alertmanager.haiwang.local}' \
--set alertmanager.ingress.paths='{/}' \
--set alertmanager.ingress.pathType=Prefix \
--set grafana.ingress.enabled=true \
--set grafana.adminPassword=baidu.com \
--set grafana.ingress.hosts='{grafana.haiwang.local}' \
--set grafana.ingress.paths='{/}' \
--set grafana.ingress.pathType=Prefix . -f values.yaml

If more than one ingress controller is installed, pin a specific one with these additional flags:
--set grafana.ingress.ingressClassName=traefik \
--set alertmanager.ingress.ingressClassName=traefik \
--set prometheus.ingress.ingressClassName=traefik \
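For repeatable installs, the same settings can live in values.yaml instead of a long string of --set flags. A sketch mirroring the command above (hostnames are the example domains used throughout this walkthrough):

```yaml
prometheus:
  prometheusSpec:
    retention: 7d
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts: [prometheus.haiwang.local]
    paths: ["/"]
    pathType: Prefix
alertmanager:
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts: [alertmanager.haiwang.local]
    paths: ["/"]
    pathType: Prefix
grafana:
  adminPassword: baidu.com
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts: [grafana.haiwang.local]
    paths: ["/"]
    pathType: Prefix
```

With these merged into values.yaml, the install reduces to `helm install kube-prometheus-stack -n monitoring --create-namespace . -f values.yaml`.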
Check pod status:
# kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 7m37s
grafana-c6475587c-57j2z 1/1 Running 0 23h
kube-prometheus-stack-grafana-65c97dc47-qpsq5 3/3 Running 0 8m14s
kube-prometheus-stack-kube-state-metrics-86b64f5db8-lzbzn 1/1 Running 0 8m14s
kube-prometheus-stack-operator-78fb9f4668-ztf6m 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-65jwk 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-6m7r5 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-dwx77 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-fffmh 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-klx4l 1/1 Running 0 8m14s
kube-prometheus-stack-prometheus-node-exporter-wqntg 1/1 Running 0 8m14s
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 7m37s

Check the Services:
# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7m58s
grafana ClusterIP 10.102.67.115 <none> 3000/TCP 23h
kube-prometheus-stack-alertmanager ClusterIP 10.100.129.125 <none> 9093/TCP,8080/TCP 8m35s
kube-prometheus-stack-grafana ClusterIP 10.105.104.4 <none> 80/TCP 8m35s
kube-prometheus-stack-kube-state-metrics ClusterIP 10.102.107.229 <none> 8080/TCP 8m35s
kube-prometheus-stack-operator ClusterIP 10.103.188.188 <none> 443/TCP 8m35s
kube-prometheus-stack-prometheus ClusterIP 10.101.98.65 <none> 9090/TCP,8080/TCP 8m35s
kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.109.11.221 <none> 9100/TCP 8m35s
prometheus-operated ClusterIP None <none> 9090/TCP 7m58s

Check the external IP of the ingress controller's Service:
# kubectl get svc -n traefik-ingress traefik
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
traefik LoadBalancer 10.110.60.189 172.20.248.20 9000:31465/TCP,80:32348/TCP,443:30948/TCP 40h

If no password was set at install time, or you have forgotten it, retrieve the Grafana admin password:
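The value stored in the chart's Secret is base64-encoded, which is why the jsonpath output is piped through `base64 --decode`. A standalone demonstration of that decode step (no cluster needed; uses the password from this install):

```shell
# Encode the password the way Kubernetes stores Secret data,
# then decode it back, as the kubectl pipeline below does.
encoded=$(printf '%s' 'baidu.com' | base64)
echo "$encoded"                                   # YmFpZHUuY29t, the stored form
printf '%s' "$encoded" | base64 --decode; echo    # prints: baidu.com
```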
# kubectl get secrets -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
baidu.com

Check the Ingress resources:
# kubectl get ingress -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana traefik csgrafana.haiwang.com 80 23h
kube-prometheus-stack-alertmanager traefik alertmanager.haiwang.local 80 10m
kube-prometheus-stack-grafana traefik grafana.haiwang.local 80 10m
kube-prometheus-stack-prometheus traefik prometheus.haiwang.local 80 10m

# kubectl get pv,pvc -n monitoring
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
persistentvolume/pvc-03736a1c-8523-4234-ac18-d05ffc777367 10Gi RWO Delete Bound monitoring/kube-prometheus-stack-grafana nfs-client <unset> 10m
persistentvolume/pvc-248949ce-c72d-469f-9e49-4367d4483788 50Gi RWO Delete Bound monitoring/alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0 nfs-client <unset> 10m
persistentvolume/pvc-71a674f5-4ca5-4a67-a587-9100776ebaee 50Gi RWO Delete Bound monitoring/prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 nfs-client <unset> 10m
persistentvolume/pvc-7b281702-dcb0-4cda-bd07-9c4cc0dd60fe 10Gi RWO Delete Bound monitoring/grafana nfs-client <unset> 23h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0 Bound pvc-248949ce-c72d-469f-9e49-4367d4483788 50Gi RWO nfs-client <unset> 10m
persistentvolumeclaim/grafana Bound pvc-7b281702-dcb0-4cda-bd07-9c4cc0dd60fe 10Gi RWO nfs-client <unset> 23h
persistentvolumeclaim/kube-prometheus-stack-grafana Bound pvc-03736a1c-8523-4234-ac18-d05ffc777367 10Gi RWO nfs-client <unset> 10m
persistentvolumeclaim/prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 Bound pvc-71a674f5-4ca5-4a67-a587-9100776ebaee 50Gi RWO nfs-client <unset> 10m

Add the hostnames to your local hosts file:
172.20.248.20 prometheus.haiwang.local alertmanager.haiwang.local grafana.haiwang.local

Open a browser; the stack ships with many dashboards out of the box:
http://grafana.haiwang.local/dashboards
Log in with the username and password.

Then import an extra dashboard by ID: 8919

Here is what it looks like:
