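In day-to-day operations, a business application will often hang: the process is still running, but it can no longer serve requests. Monitoring the process tells you nothing in that situation, so you need to monitor the endpoint itself.
There are many ways to monitor an endpoint, for example distributed tracing or a custom script.
Because our monitoring stack is built on Prometheus, this article uses blackbox_exporter for endpoint monitoring.
blackbox_exporter can probe a target over HTTP, HTTPS, DNS, TCP, and ICMP. DNS, TCP, and ICMP probes are relatively simple. Here the goal is to monitor a login endpoint, so the probe runs over HTTPS. The configuration steps are as follows.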
Deploy blackbox_exporter
- # Create the directory for blackbox_exporter and change into it
- mkdir -p /usr/local/blackbox_exporter
- cd /usr/local/blackbox_exporter
- # Download the blackbox_exporter package
- wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.19.0/blackbox_exporter-0.19.0.linux-amd64.tar.gz
- # Extract it
- tar -zxvf blackbox_exporter-0.19.0.linux-amd64.tar.gz
- # Rename the extracted directory (not the tarball)
- mv blackbox_exporter-0.19.0.linux-amd64 blackbox_exporter
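The download step above hard-codes the version in several places. A minimal sketch of parameterizing it, assuming other releases follow the same naming pattern as the v0.19.0 URL above. The helper name `blackbox_release_url` is hypothetical and not part of any official tooling.

```shell
#!/bin/sh
# Build the GitHub release URL for a blackbox_exporter tarball.
# Assumption: other releases use the same naming pattern as v0.19.0.
# blackbox_release_url is an illustrative helper, not an official tool.
blackbox_release_url() {
    version="$1"   # e.g. 0.19.0
    platform="$2"  # e.g. linux-amd64
    printf 'https://github.com/prometheus/blackbox_exporter/releases/download/v%s/blackbox_exporter-%s.%s.tar.gz\n' \
        "$version" "$version" "$platform"
}

# Usage:
#   wget "$(blackbox_release_url 0.19.0 linux-amd64)"
```

Keeping the version in one argument makes an upgrade a one-line change, at the cost of relying on the naming pattern staying stable.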
Manage blackbox_exporter with systemd
- cat > /etc/systemd/system/blackbox_exporter.service << "EOF"
- [Unit]
- Description=Blackbox Exporter
- Wants=network-online.target
- After=network-online.target
- [Service]
- User=root
- ExecStart=/usr/local/blackbox_exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox_exporter/blackbox.yml
- Restart=on-failure
- [Install]
- WantedBy=default.target
- EOF
There are other ways to run blackbox_exporter: you can start it in the background with nohup, or run it under supervisor. I prefer to manage it with systemd.
Next, enable it at boot and start it
- systemctl daemon-reload
- systemctl enable blackbox_exporter
- systemctl start blackbox_exporter
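Because no log destination is specified, the output goes to the system messages log.
The bundled blackbox.yml is only a minimal configuration and does not cover this use case, so add a module to blackbox.yml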
- modules:
-   xhj_login:                          # module name; the Prometheus config must reference it
-     prober: http                      # prober type
-     timeout: 30s                      # probe timeout
-     http:                             # settings for the http prober
-       method: POST                    # HTTP request method
-       preferred_ip_protocol: "ip4"    # use IPv4
-       headers:                        # headers for the POST request
-         Content-Type: application/json
-       body: '{"mobile": "13572801829", "password": "ZWB123wyl"}'    # POST request body
After editing, save blackbox.yml. The service was already started earlier, so restart blackbox_exporter to load the new module
- systemctl restart blackbox_exporter
- systemctl status blackbox_exporter
- ● blackbox_exporter.service - Blackbox Exporter
- Loaded: loaded (/etc/systemd/system/blackbox_exporter.service; enabled; vendor preset: disabled)
- Active: active (running) since Tue 2022-01-04 21:33:28 CST; 6s ago
- Main PID: 24679 (blackbox_export)
- Tasks: 7
- Memory: 1.9M
- CGroup: /system.slice/blackbox_exporter.service
- └─24679 /data/prometheus/blackbox_exporter/blackbox_exporter/blackbox_exporter --config.file=/data/prometheus/blackbox_exporter/blackbox_exporter/blackbox....
- Jan 04 21:33:28 systemd[1]: Started Blackbox Exporter.
- Jan 04 21:33:28 blackbox_exporter[24679]: level=info ts=2022-01-04T13:33:28.173Z caller=main.go:224 msg="Starting blackbox_exporter" version="(v...33d1ed0)"
- Jan 04 21:33:28 blackbox_exporter[24679]: level=info ts=2022-01-04T13:33:28.173Z caller=main.go:225 build_context="(go=go1.16.4, user=root@2b025...2:56:44)"
- Jan 04 21:33:28 blackbox_exporter[24679]: level=info ts=2022-01-04T13:33:28.173Z caller=main.go:237 msg="Loaded config file"
- Jan 04 21:33:28 blackbox_exporter[24679]: level=info ts=2022-01-04T13:33:28.174Z caller=main.go:385 msg="Listening on address" address=:9115
- Jan 04 21:33:28 blackbox_exporter[24679]: level=info ts=2022-01-04T13:33:28.174Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
- Hint: Some lines were ellipsized, use -l to show in full.
Configure Prometheus
-   # blackbox
-   - job_name: "blackbox"
-     metrics_path: /probe
-     params:
-       module: [xhj_login]
-     static_configs:
-       - targets:
-           - https://xxx.aaa.com/api/pc/user/login/password
-     relabel_configs:
-       - source_labels: [__address__]
-         target_label: __param_target
-       - source_labels: [__param_target]
-         target_label: instance
-       - target_label: __address__
-         replacement: 172.17.0.1:9115
- # ... (omitted)
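The relabel rules in this job are easy to misread, so here is a rough sketch of the request Prometheus ends up sending once they are applied. It is a simplification: Prometheus percent-encodes query values, which this sketch does not. The helper name `effective_probe_url` is hypothetical.

```shell
#!/bin/sh
# Show the request Prometheus makes after the three relabel rules:
#   1. __address__ (the target URL) is copied into the "target" query parameter
#   2. the same value becomes the "instance" label
#   3. __address__ is overwritten with the exporter's own host:port
# Simplification: Prometheus percent-encodes query values; this sketch does not.
# effective_probe_url is a hypothetical helper for illustration only.
effective_probe_url() {
    exporter="$1"  # host:port of blackbox_exporter
    module="$2"    # module name from blackbox.yml
    target="$3"    # original entry under static_configs.targets
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

effective_probe_url 172.17.0.1:9115 xhj_login https://xxx.aaa.com/api/pc/user/login/password
```

In other words, Prometheus never contacts the login endpoint directly. It scrapes the exporter, and the exporter performs the probe.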
After editing, save the file and hot-reload the Prometheus configuration
- curl -X POST http://localhost:9090/-/reload
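Once the reload finishes, check the targets page in the Prometheus UI.
At this point the requests sent by blackbox_exporter are already visible in the logs of the probed service. You can also query blackbox_exporter directly with curl to inspect a probe
- # Note: the & between query parameters must be escaped so the shell does not treat it as the background operator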
- curl http://172.17.0.1:9115/probe?target=https://xxx.aaa.com/api/pc/user/login/password\&module=xhj_login\&debug=true
- # Inspect the response
- Logs for the probe:
- ts=2022-01-04T14:10:32.979231489Z caller=main.go:320 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Beginning probe" probe=http timeout_seconds=30
- ts=2022-01-04T14:10:32.979411891Z caller=http.go:335 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Resolving target address" ip_protocol=ip4
- ts=2022-01-04T14:10:32.986112778Z caller=http.go:335 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Resolved target address" ip=1.1.1.1
- ts=2022-01-04T14:10:32.986225541Z caller=client.go:251 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Making HTTP request" url=https://1.1.1.1/api/pc/user/login/password host=xxx.aaa.com
- ts=2022-01-04T14:10:33.05701057Z caller=main.go:130 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Received HTTP response" status_code=200
- ts=2022-01-04T14:10:33.057095911Z caller=main.go:130 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Response timings for roundtrip" roundtrip=0 start=2022-01-04T22:10:32.986352765+08:00 dnsDone=2022-01-04T22:10:32.986352765+08:00 connectDone=2022-01-04T22:10:32.992658409+08:00 gotConn=2022-01-04T22:10:33.038333687+08:00 responseStart=2022-01-04T22:10:33.056951457+08:00 tlsStart=2022-01-04T22:10:32.992701614+08:00 tlsDone=2022-01-04T22:10:33.038235019+08:00 end=2022-01-04T22:10:33.057074505+08:00
- ts=2022-01-04T14:10:33.057187486Z caller=main.go:320 module=xhj_login target=https://xxx.aaa.com/api/pc/user/login/password level=info msg="Probe succeeded" duration_seconds=0.077883376
The output shows an HTTP status code of 200 and the probe succeeded (probe_success is 1), so the endpoint is responding normally.
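The same check can be scripted. A minimal sketch, assuming the /probe endpoint is called without debug=true, in which case it returns metrics in the Prometheus text format. The helper name `probe_success_value` is hypothetical.

```shell
#!/bin/sh
# Print the value of the probe_success metric from Prometheus text-format
# output read on stdin. Comment lines (# HELP / # TYPE) are skipped because
# their first field is "#", not the metric name.
# probe_success_value is an illustrative helper, not part of blackbox_exporter.
probe_success_value() {
    awk '$1 == "probe_success" { print $2; exit }'
}

# Usage (exporter address and target as used in this article):
#   curl -s 'http://172.17.0.1:9115/probe?target=https://xxx.aaa.com/api/pc/user/login/password&module=xhj_login' \
#       | probe_success_value
```

Quoting the whole URL, as in the usage line, is an alternative to backslash-escaping each & character.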
Add alerting
Create the rules
- groups:
-   - name: blackbox_networks_stats
-     rules:
-       - alert: PCLoginEndpointDown
-         expr: probe_success == 0
-         for: 1m
-         labels:
-           severity: critical
-         annotations:
-           summary: "The PC login endpoint is failing and users cannot log in. Please investigate promptly!"
-           description: "The PC login endpoint is failing and users cannot log in. Please investigate promptly!"
Once the rule file is created, hot-reload the Prometheus configuration and check the rules page in Prometheus.
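To get a feel for what `for: 1m` means in the rule above, here is a rough simulation. It assumes rule evaluations are evenly spaced and takes the spacing as a parameter. The 15-second value in the usage example is an assumption, not a setting from this setup, and the helper name `first_firing_eval` is hypothetical.

```shell
#!/bin/sh
# Simulate `expr: probe_success == 0` combined with `for: 1m`.
# Input (stdin): one probe_success value (0 or 1) per line, one line per evaluation.
# Output: the 1-based evaluation at which the alert starts firing, or "never".
# Assumption: evaluations are evenly spaced by $1 seconds.
# first_firing_eval is a hypothetical helper for illustration only.
first_firing_eval() {
    awk -v interval="$1" -v hold=60 '
        $1 == 0 { if (zeros == 0) start = NR; zeros++ }   # condition true: pending since "start"
        $1 != 0 { zeros = 0 }                             # condition false: the alert resets
        zeros > 0 && (NR - start) * interval >= hold { print NR; found = 1; exit }
        END { if (!found) print "never" }
    '
}

# Usage: one healthy sample followed by five failures, evaluated every 15 seconds
#   printf '1\n0\n0\n0\n0\n0\n' | first_firing_eval 15
```

A single failed probe therefore only puts the alert into the pending state. It fires only if the failure persists for the full minute, which filters out brief blips.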
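Add a monitoring dashboard in Grafana
With monitoring in place, a dashboard is a must. Import dashboard template 5345 directly.
The dashboard shows the endpoint status, whether SSL is in use, the SSL certificate expiry time, the HTTP status code, the request duration, and the DNS lookup time.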
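For TLS probes, blackbox_exporter exposes the certificate expiry as a Unix timestamp in the `probe_ssl_earliest_cert_expiry` metric. A small sketch of turning that timestamp into days remaining, assuming the value has already been converted to a plain integer (the text output may use scientific notation). The helper name `days_until_expiry` is hypothetical, and this is not necessarily how the dashboard computes it.

```shell
#!/bin/sh
# Convert a probe_ssl_earliest_cert_expiry value (Unix timestamp in seconds)
# into whole days remaining. A negative result means the certificate has expired.
# Assumption: the value is a plain integer, not scientific notation.
# days_until_expiry is a hypothetical helper for illustration only.
days_until_expiry() {
    expiry="$1"                 # metric value, integer seconds since the epoch
    now="${2:-$(date +%s)}"     # defaults to the current time
    echo $(( (expiry - now) / 86400 ))
}

# Usage:
#   days_until_expiry 1700086400 1700000000
```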
Original article: https://mp.weixin.qq.com/s/0AR5ZgG57gWDHFczkC6LzQ