Deploying a Prometheus + Grafana Monitoring System

December 18, 2019  Development, 英大


Base environment:

OS: CentOS 7.6 / SUSE 11

Docker: 18.09.6

Rancher: 2.3.3

Prometheus: prometheus:v2.11.1

Grafana: grafana:6.3.4

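Installing Prometheus + Grafana

Rancher → App catalog → Prometheus → configure ports, enable Grafana → Launch

Wait for the images to finish downloading.

Basic configuration:

Set the application name, template version, and namespace, and keep the default images.

Prometheus-server configuration:

Choose the service type and access port, and configure persistent storage.

Alertmanager configuration:

Choose the service type and access port, and configure persistent storage.

Grafana configuration:

Enable the Grafana dashboard and set the username and password, persistent storage, and access port.

Pushgateway configuration:

Enable Pushgateway and configure persistent storage.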

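Monitored hosts

Download the node_exporter metrics collector and run it in the background (node_exporter listens on port 9100 by default):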

wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

tar zxf node_exporter-0.18.1.linux-amd64.tar.gz

nohup ./node_exporter-0.18.1.linux-amd64/node_exporter &
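To confirm that the exporter is actually serving data, you can fetch its /metrics page and pick out a single value. The sketch below assumes node_exporter is on its default port 9100; `metric_value` is a hypothetical helper name, and it only matches metrics that carry no labels.

```shell
#!/usr/bin/env bash
# metric_value (hypothetical helper): print the value of an unlabelled metric
# read from Prometheus text exposition format on stdin.
# "# HELP" / "# TYPE" lines never match because their first field is "#".
# Exit status is 0 if the metric was found, 1 otherwise.
metric_value() {
  local name="$1"
  awk -v name="$name" '
    $1 == name { print $2; found = 1; exit }
    END { exit !found }
  '
}

# Example (assumes node_exporter on this host at the default port 9100):
#   curl -s http://localhost:9100/metrics | metric_value node_memory_MemTotal_bytes
```

An empty result with a non-zero exit status usually means the exporter is not running or the metric name is misspelled.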

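Configuring Prometheus

Add the monitored nodes

Select “Cluster” → “Project” → “Resources” → “Config Maps” → “prometheus-server” → “Upgrade”

Upgrading “prometheus-server”

Configure the node IPs

“prometheus.yml” → “Value” → “scrape_configs” → “node:port”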

- job_name: 'weblogic'
  static_configs:
  - targets:
    - 192.168.200.7:9256

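Insert the monitored node's entry under scrape_configs. The target port must be the port the exporter on that host listens on: 9100 for node_exporter by default, while this example scrapes port 9256 under the job name weblogic.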
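When several hosts need to be added, the scrape_configs entry can be generated instead of typed by hand. A minimal sketch; `gen_scrape_job` is a hypothetical helper, and the job name and addresses in the example are placeholders. Its output is meant to be pasted under scrape_configs in the prometheus-server config map; if `promtool` is available, the complete prometheus.yml can be checked with `promtool check config prometheus.yml` before clicking “Upgrade”.

```shell
#!/usr/bin/env bash
# gen_scrape_job (hypothetical helper): print one scrape_configs job entry
# with a static list of host:port targets.
# Usage: gen_scrape_job <job_name> <host:port> [<host:port> ...]
gen_scrape_job() {
  local job="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: gen_scrape_job <job_name> <host:port>..." >&2
    return 1
  fi
  # "%s\n" is reused for every argument, so each string becomes one line.
  printf '%s\n' "- job_name: '${job}'" "  static_configs:" "  - targets:"
  local t
  for t in "$@"; do
    printf "    - '%s'\n" "$t"
  done
}

# Example: gen_scrape_job node 192.168.200.7:9100 192.168.200.8:9100
```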

Check the nodes

View the monitored nodes (Status → Targets in the Prometheus web UI)

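Configuring Grafana

Add a data source

Log in to Grafana → “Add data source” → “Prometheus” → enter the Prometheus connection details, then save and test.

“Add data source”
Select the data source type “Prometheus”
Configure the Prometheus URL
Save & Test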
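The same data source can also be registered through Grafana's HTTP API instead of the UI, which is handy when Grafana is redeployed. A sketch, assuming Grafana's `POST /api/datasources` endpoint with basic authentication; `build_datasource_json` and `add_datasource` are hypothetical helpers, and `GRAFANA_URL` and `GRAFANA_AUTH` are placeholder environment variables you must set yourself.

```shell
#!/usr/bin/env bash
# build_datasource_json (hypothetical helper): JSON body that registers a
# Prometheus data source. Arguments are inserted verbatim, so they must not
# contain double quotes.
build_datasource_json() {
  local name="$1" url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$name" "$url"
}

# add_datasource (hypothetical helper): send the body to Grafana.
# GRAFANA_URL (e.g. http://<grafana-host>:<port>) and GRAFANA_AUTH
# (admin:<password>) are placeholders that must be exported first.
add_datasource() {
  build_datasource_json "$1" "$2" |
    curl -sS -u "${GRAFANA_AUTH:?set GRAFANA_AUTH}" -X POST \
      -H 'Content-Type: application/json' --data-binary @- \
      "${GRAFANA_URL:?set GRAFANA_URL}/api/datasources"
}

# Example: add_datasource Prometheus http://<prometheus-server-address>
```

With `"access":"proxy"` Grafana's backend queries Prometheus, so the URL only has to be reachable from the Grafana pod, not from the browser.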

Importing a dashboard template

“+” → “Import”

Import a new template

Import template 249

Enter the template ID

Prometheus alerting configuration

Configure the alert rules:

“Resources” → “Config Maps” → “prometheus-server” → “Upgrade” → enter the “Value”

groups:
- name: memalert          ## alert rule group name
  rules:
  - alert: NodeMemoryUsage    ## alert name
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 50    ## firing condition
    for: 1m                   ## how long the condition must hold before the alert fires
    annotations:              ## notification text
      summary: "{{ $labels.instance }}: High Memory usage detected"
      description: "{{ $labels.instance }}: Memory usage is above 50% (current value is: {{ $value }})"

Add a new config map entry to hold the alert rule file
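The rule's expression counts memory as used when it is neither free, nor in buffers, nor in the page cache. node_exporter takes these fields from /proc/meminfo, so the same percentage can be computed directly on a monitored Linux host to sanity-check the threshold. A minimal sketch; `mem_usage_pct` is a hypothetical helper, and its result will only roughly match the Prometheus value because the two readings are taken at different moments.

```shell
#!/usr/bin/env bash
# mem_usage_pct (hypothetical helper): used-memory percentage computed the same
# way as the NodeMemoryUsage expression:
#   (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
# node_exporter exposes these /proc/meminfo fields as node_memory_*_bytes;
# the kB-to-bytes factor cancels out in the ratio.
mem_usage_pct() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/ { total = $2 }
    /^MemFree:/  { free = $2 }
    /^Buffers:/  { buffers = $2 }
    /^Cached:/   { cached = $2 }   # anchored, so SwapCached is not matched
    END {
      if (total > 0) printf "%.1f\n", (total - (free + buffers + cached)) / total * 100
      else exit 1
    }
  ' "$meminfo"
}

# Example: mem_usage_pct    # reads /proc/meminfo on the local host
```

If `promtool` is installed, the rule file itself can be validated with `promtool check rules memory_over.yml` before it is added to the config map.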

Specify the alert rule file

rule_files:
- /etc/config/memory_over.yml

Configure the rule file in “prometheus.yml”

DingTalk alerts:

“Resources” → “Config Maps” → “prometheus-alertmanager” → “Upgrade” → enter the “Value”

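global:                    ## global settings
  resolve_timeout: 5m      ## an alert not updated within this time is marked resolved
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m       ## wait before notifying about new alerts added to an existing group
  repeat_interval: 1h      ## interval for re-sending an alert that is still firing
  receiver: webhook
receivers:                 ## receivers
- name: 'webhook'          ## receiver name
  webhook_configs:
  - url: 'http://webhook-dingtalk.prometheus.svc.cluster.local:8060/dingtalk/webhook1/send'   ## address of the webhook Service
    send_resolved: true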

Install the DingTalk webhook

[root@prometheus_master ~]# cat dingtalk-webhook.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-dingtalk
  namespace: prometheus
  labels:
    run: dingtalk
spec:
  replicas: 1
  selector:
    matchLabels:
      run: dingtalk
  template:
    metadata:
      labels:
        run: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v0.3.0
        imagePullPolicy: IfNotPresent
        args:
          - --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=<ACCESS_TOKEN>     ## replace with your DingTalk robot access token
        ports:
        - containerPort: 8060
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: prometheus
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    run: dingtalk
  sessionAffinity: None

[root@prometheus_master ~]# kubectl apply -f dingtalk-webhook.yaml
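After applying the manifest, it is worth checking that the webhook pod is ready and that the URL in the Alertmanager config matches the Service name, namespace, port, and `--ding.profile` name. A sketch; `dingtalk_webhook_url` and `verify_dingtalk_webhook` are hypothetical helpers, the URL pattern is the `/dingtalk/<profile>/send` form already used in the Alertmanager config, and the second function needs kubectl access to the cluster.

```shell
#!/usr/bin/env bash
# dingtalk_webhook_url (hypothetical helper): compose the in-cluster URL that
# Alertmanager calls, from the Service name, namespace, port, and the profile
# name given in --ding.profile=<profile>=<robot URL>.
dingtalk_webhook_url() {
  local svc="$1" ns="$2" port="$3" profile="$4"
  printf 'http://%s.%s.svc.cluster.local:%s/dingtalk/%s/send\n' \
    "$svc" "$ns" "$port" "$profile"
}

# verify_dingtalk_webhook (hypothetical helper): wait for the Deployment to
# become ready, then list the Service endpoints (empty means no ready pod).
verify_dingtalk_webhook() {
  local ns="${1:-prometheus}"
  kubectl -n "$ns" rollout status deployment/webhook-dingtalk --timeout=120s &&
    kubectl -n "$ns" get endpoints webhook-dingtalk
}

# Example: dingtalk_webhook_url webhook-dingtalk prometheus 8060 webhook1
```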

Email alerts:

“Resources” → “Config Maps” → “prometheus-alertmanager” → “Upgrade” → enter the “Value”

global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'shij_yu@163.com'
  smtp_auth_username: 'shij_yu@163.com'
  smtp_auth_password: '<SMTP_PASSWORD>'    ## replace with your SMTP password
  smtp_require_tls: false
route:
  group_by: ['alertname']
  repeat_interval: 1m
  receiver: live-monitoring
receivers:
- name: 'live-monitoring'
  email_configs:
  - to: '704156648@qq.com'

WeChat Work (企业微信) alerts:

“Resources” → “Config Maps” → “prometheus-alertmanager” → “Upgrade” → enter the “Value”

global:
  resolve_timeout: 5m
  wechat_api_corp_id: 'wwf4589b498bb9c8fb'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: '<WECHAT_API_SECRET>'    ## replace with your application secret
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: 1000002
    corp_id: 'wwf4589b498bb9c8fb'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: '<WECHAT_API_SECRET>'

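Alert testing

Test plan

Configure a memory alert rule with WeChat Work, DingTalk, and email as receivers, lower the rule's threshold to 1%, and check whether the alert messages are received.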
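Besides lowering the rule threshold, the receivers can be exercised by posting a synthetic alert straight to Alertmanager, which bypasses Prometheus entirely. A sketch, assuming Alertmanager's v2 HTTP API (`POST /api/v2/alerts`, available in Alertmanager 0.16 and later) is reachable at the placeholder `ALERTMANAGER_URL`; `build_test_alert` and `send_test_alert` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# build_test_alert (hypothetical helper): a one-element JSON array in the
# format accepted by Alertmanager's POST /api/v2/alerts. Arguments are inserted
# verbatim, so they must not contain double quotes.
build_test_alert() {
  local name="$1" instance="$2" summary="$3"
  printf '[{"labels":{"alertname":"%s","instance":"%s","job":"test"},"annotations":{"summary":"%s"}}]\n' \
    "$name" "$instance" "$summary"
}

# send_test_alert (hypothetical helper): post the alert. ALERTMANAGER_URL is a
# placeholder such as http://<alertmanager-host>:9093.
send_test_alert() {
  build_test_alert "$@" |
    curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- \
      "${ALERTMANAGER_URL:?set ALERTMANAGER_URL}/api/v2/alerts"
}

# Example: send_test_alert TestAlert 192.168.200.7:9100 "receiver test"
```

Because no `endsAt` is sent, Alertmanager treats the alert as firing until `resolve_timeout` expires; receivers configured with `send_resolved: true` should then also get a resolved notification.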

Test results

DingTalk alerts:

WeChat Work:

Email alerts:

Author: shijy
