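Spring Boot in Practice: Integrating Prometheus Monitoring

December 13, 2019
Notes

Preface

Before explaining how to integrate Spring Boot with Prometheus for monitoring, let's first introduce the tools the integration will use.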

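Prometheus

1. What is Prometheus

Prometheus is an open-source monitoring and alerting system and time-series database (TSDB), originally developed at SoundCloud. It is written in Go and was inspired by Google's Borgmon monitoring system.

2. Features of Prometheus

A multi-dimensional data model
A flexible query language
No reliance on distributed storage; single server nodes are autonomous
Time-series data is collected through a pull model over HTTP
Time series can also be pushed through an intermediary gateway
Targets are discovered through service discovery or static configuration
Support for a variety of graphing and dashboarding tools, such as Grafana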
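To make the pull model above more concrete, here is a minimal sketch that uses only the JDK's built-in com.sun.net.httpserver package. The application just serves its current metric values as plain text, and Prometheus decides when to fetch them. The metric name demo_requests_total, the /metrics path, and the class name are illustrative assumptions and are not part of this article's setup. In the Spring Boot integration below, Actuator and Micrometer serve this kind of endpoint at /actuator/prometheus instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the pull model: the application only exposes current values over HTTP,
// and Prometheus decides when to scrape them. All names here are illustrative.
class PullEndpointSketch {
    // Hypothetical metric that the application would increment per request.
    static final AtomicLong REQUESTS = new AtomicLong();

    // Renders the current value in the Prometheus text exposition format.
    static String render() {
        return "# HELP demo_requests_total Total requests handled.\n"
                + "# TYPE demo_requests_total counter\n"
                + "demo_requests_total " + REQUESTS.get() + "\n";
    }

    // Starts a server whose GET /metrics returns render(); port 0 picks a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

If a scrape_configs target pointed at this process with metrics_path left at its default of /metrics, Prometheus would read the value once per scrape interval. The application never has to know where Prometheus runs.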

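3. Components of the Prometheus ecosystem

Prometheus server: scrapes and stores time-series data, and also provides querying and alerting-rule management
Client libraries: instrument application code and expose its metrics for the Prometheus server to collect
Pushgateway: an aggregation point for metrics from short-lived and batch jobs, mainly used for reporting business data and the like
Exporters: expose metrics from third-party systems, e.g. node_exporter, mysqld_exporter, mongodb_exporter
Alertmanager: manages alert notifications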

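4. Prometheus architecture diagram

5. When to use Prometheus

Prometheus works very well for recording purely numeric time series. It suits both machine-centric monitoring of servers and other hardware and the monitoring of highly dynamic service-oriented architectures. For today's popular microservices, its multi-dimensional data collection and its query language are particularly powerful. Prometheus is designed for reliability: when a service fails, it helps you quickly locate and diagnose the problem. Setting it up has no strong dependencies on particular hardware or services.

6. When not to use Prometheus

Prometheus values reliability: even during a failure, you can always view the statistics that are available about your system. If you need 100% accuracy, for example for per-request billing, Prometheus is not a good choice, because the data it collects will likely not be detailed and complete enough. In that case, use a different system to collect and analyze the billing data, and use Prometheus for the rest of your monitoring.

7. Installing Prometheus

See my earlier article "Operations Monitoring: Getting Started with Installing Prometheus" (運維監控之Prometheus入門安裝篇).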

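8. Prometheus alerting

Alerting in Prometheus is split into two parts. Alerting rules in the Prometheus server send alerts to Alertmanager. Alertmanager then manages those alerts, including silencing, inhibiting, and aggregating them, and sends out notifications through channels such as email, on-call notification systems, and chat platforms.

The main steps to set up alerting and notifications are:

Set up and configure Alertmanager
Configure Prometheus to talk to Alertmanager
Create alerting rules in Prometheus

9. Integrating Alertmanager with Prometheus

For reasons of space, see the following link for how to install Alertmanager and integrate it with Prometheus: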

https://www.cnblogs.com/xiangsikai/p/11289757.html

Grafana

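1. What is Grafana

Grafana is an open-source data visualization tool written in Go. It can be used for data monitoring and statistics, and it includes alerting. Grafana lets you query, visualize, alert on, and understand your metrics no matter where they are stored.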

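2. Features of Grafana

Visualization: fast and flexible client-side graphs with many options. Panel plugins offer many different ways to visualize metrics and logs.
Alerting: visually define alert rules for your most important metrics. Grafana continuously evaluates them and sends notifications.
Notifications: when an alert changes state, Grafana sends a notification, for example by email.
Dynamic dashboards: create dynamic, reusable dashboards with template variables, which appear as drop-down menus at the top of the dashboard.
Mixed data sources: mix different data sources in the same graph! The data source can be chosen per query, and this works even for custom data sources.
Annotations: annotate graphs with events from different data sources. Hovering over an event shows its full metadata and tags.
Ad-hoc filters: create new key/value filters on the fly; they are automatically applied to all queries that use that data source.

3. Installing Grafana

The Grafana website provides detailed installation guides, so this article won't cover installation; see the following link: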

https://grafana.com/docs/installation/rpm/

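Micrometer

1. What is Micrometer

Micrometer bills itself as the SLF4J of monitoring: it gives Java programs metrics instrumentation with very low overhead. Micrometer provides a common API for collecting performance data on the Java platform; applications only need to record metrics through this API, and Micrometer takes care of adapting them to different monitoring systems. This makes switching monitoring systems easy. Micrometer can also publish data to several different monitoring systems at the same time.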
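The SLF4J comparison is about the facade pattern: application code depends on one neutral API, and a separate backend decides how the metrics are exported. The plain-Java sketch below illustrates only that idea. CounterFacade, the two backend classes, and OrderService are hypothetical names invented for illustration; they are not Micrometer's API, whose real entry point is MeterRegistry.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical facade (not Micrometer's API): application code only sees this interface.
interface CounterFacade {
    void increment(String name);
    String export();
}

// Shared in-memory storage; subclasses only decide the export format.
abstract class InMemoryBackend implements CounterFacade {
    protected final Map<String, Long> counts = new TreeMap<>();

    @Override
    public void increment(String name) {
        counts.merge(name, 1L, Long::sum);
    }
}

// Backend that exports "name value" lines, similar in spirit to Prometheus text output.
class PrometheusLikeBackend extends InMemoryBackend {
    @Override
    public String export() {
        StringBuilder sb = new StringBuilder();
        counts.forEach((name, value) -> sb.append(name).append(' ').append(value).append('\n'));
        return sb.toString();
    }
}

// Backend that exports JSON-like text, standing in for some other monitoring system.
class JsonLikeBackend extends InMemoryBackend {
    @Override
    public String export() {
        StringBuilder sb = new StringBuilder("{");
        counts.forEach((name, value) -> {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(name).append("\":").append(value);
        });
        return sb.append('}').toString();
    }
}

// Application code: unchanged no matter which backend is plugged in.
class OrderService {
    private final CounterFacade metrics;

    OrderService(CounterFacade metrics) {
        this.metrics = metrics;
    }

    void placeOrder() {
        metrics.increment("orders_placed");
    }
}
```

Switching monitoring systems here means changing only which backend is passed to OrderService. Micrometer offers a similar kind of swap when you replace micrometer-registry-prometheus with another registry module.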

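2. Core modules of Micrometer

micrometer-core: the core module, containing the data-collection SPI and an in-memory implementation
Registry modules for specific monitoring systems, such as micrometer-registry-prometheus for Prometheus
micrometer-test: testing support

Main Content

Integrating Spring Boot with Prometheus

1. Prerequisites

Prometheus, Grafana, and (optionally) Alertmanager are already installed on the server. This article builds these services with docker-compose; for the detailed installation process, see the following article:

https://blog.51cto.com/msiyuetian/2369130

If you'd rather not read it, you can also take the scripts I've prepared and run them on your server. The scripts are available at: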

http://1t.click/beth

2、pom.xml

<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>
  <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
  </dependency>
</dependencies>

3、application.yml

server:
  port: 8081    # the port used by the scrape target and URLs later in this article
management:
  endpoints:
    web:
      exposure:
        include: '*'
spring:
  application:
    name: springboot_prometheus

4. Configure the service to scrape in prometheus.yml

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
    - targets: ['localhost:8081']

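5. Write the alerting rule

This article uses the example of raising an alert when a service has been down for more than one minute.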

groups:
- name: node_down
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
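The for: 1m clause is what turns "up == 0 at one evaluation" into "down for more than a minute". The plain-Java sketch below models the states an alert moves through (inactive, pending, firing) under the simplifying assumption that each call represents one rule evaluation. The class and method names are hypothetical, and this is not Prometheus's own code.

```java
import java.time.Duration;
import java.time.Instant;

// States a Prometheus alert can be in.
enum AlertState { INACTIVE, PENDING, FIRING }

// Hypothetical model of one alert instance governed by a "for:" duration.
class ForClauseSketch {
    private final Duration forDuration;
    private Instant activeSince; // when the expression first became true; null otherwise

    ForClauseSketch(Duration forDuration) {
        this.forDuration = forDuration;
    }

    // Called once per rule evaluation with the current result of the expression (e.g. up == 0).
    AlertState evaluate(boolean expressionTrue, Instant now) {
        if (!expressionTrue) {
            activeSince = null; // condition cleared: back to inactive
            return AlertState.INACTIVE;
        }
        if (activeSince == null) {
            activeSince = now;  // first true evaluation starts the pending period
        }
        // Firing once the condition has held for at least the "for" duration.
        return Duration.between(activeSince, now).compareTo(forDuration) >= 0
                ? AlertState.FIRING
                : AlertState.PENDING;
    }
}
```

Prometheus sends only firing (and later resolved) alerts to Alertmanager, not pending ones. A brief outage that clears within the minute therefore never produces an email. Because rules are evaluated periodically (every evaluation_interval), the real delay before firing is at least one minute rather than exactly one minute.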

6. Reference the rule file node_down.yml in prometheus.yml

Note: node_down.yml is in the same directory as prometheus.yml.

rule_files:
  - "node_down.yml"

7. Configure alert notifications in alertmanager.yml

Note: this article uses email notifications as the example.

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '<sender-address>'
  smtp_auth_username: '<sender-address>'
  smtp_auth_password: '<QQ authorization code, not the mailbox password>'
  smtp_require_tls: false
  smtp_hello: 'qq.com'

route:
  group_interval: 1m
  repeat_interval: 1m
  receiver: 'mail-receiver'

receivers:
- name: 'mail-receiver'
  email_configs:
  - to: '<recipient-address>'

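Note: to get a QQ Mail authorization code:

Open QQ Mail > Settings > Account > Account Security > enable the POP3/SMTP service > generate an authorization code (this requires sending a text message).

8. Configure Alertmanager in prometheus.yml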

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

9. Add Prometheus as a data source in Grafana

Open Grafana at http://localhost:3000/login; the default username and password are both admin.
Add the Prometheus data source through the Grafana UI.

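10. Configure the dashboard to display in Grafana

a. Browse https://grafana.com/grafana/dashboards and pick a dashboard, for example the one in the figure below.

b. Click Import in Grafana and enter the dashboard ID you found on https://grafana.com/grafana/dashboards, for example:

c. Select the Prometheus data source configured earlier and click Import.

d. The dashboard is displayed: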

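11. Alerts

First stop the service, then open http://localhost:9090/alerts. You will see the following:

This shows that the alerting rule we configured is working: an instance is currently down.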

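12. Alert notifications

Open http://localhost:9093/#/alerts to see the following information:

PS: the pixelated part of the screenshot is my service's IP and port.

Open the mailbox that receives the alerts and you will find the default alert email, as shown below: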

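Custom instrumentation

Sometimes we want to monitor custom metrics, such as the number of users currently logged in. For this we can use the metric types that Prometheus provides. Prometheus has the following metric types: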

Counter

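A Counter represents a metric whose sampled value only increases monotonically and never decreases, unless it is reset, for example when the process exposing it restarts.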
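Because a counter only goes up, queries usually look at how fast it grows (for example with rate() or increase() in PromQL) rather than at its raw value, and a drop in value is interpreted as a reset. The sketch below shows that reset handling on a plain array of samples. It is a simplification: PromQL's increase() also extrapolates to the edges of the time range, which is left out here. The class and method names are hypothetical.

```java
// Hypothetical counter plus a simplified increase() over raw samples.
class CounterSketch {
    private double value;

    // Counters may only go up, so negative increments are rejected.
    void increment(double amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("counters cannot decrease");
        }
        value += amount;
    }

    double value() {
        return value;
    }

    // Sum of increases between consecutive samples. A drop is treated as a reset to zero,
    // so the new sample itself counts as the increase since the reset.
    static double increase(double[] samples) {
        double total = 0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            total += delta >= 0 ? delta : samples[i];
        }
        return total;
    }

    // Average per-second rate over a window of the given length in seconds.
    static double ratePerSecond(double[] samples, double windowSeconds) {
        return increase(samples) / windowSeconds;
    }
}
```

For the samples 5, 8, 2, 4 the counter was reset between 8 and 2, so the increase is 3 + 2 + 2 = 7 rather than 4 - 5 = -1.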

Gauge

A Gauge represents a metric whose sampled value can change arbitrarily: it can go up as well as down.
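The "users currently online" example mentioned above is a typical gauge. The minimal plain-Java sketch below uses hypothetical class and method names. Unlike a counter, the value moves in both directions, and a scrape only cares about the current value. With Micrometer, an AtomicInteger like this one would usually be passed to MeterRegistry.gauge(name, number) so that the registry reads it at scrape time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gauge for "users currently online": the value can rise and fall.
class OnlineUsersGauge {
    private final AtomicInteger online = new AtomicInteger();

    void login() {
        online.incrementAndGet();
    }

    void logout() {
        online.decrementAndGet();
    }

    // A scrape only sees the value at that moment; earlier ups and downs are not kept.
    int value() {
        return online.get();
    }
}
```

Because a gauge's drops are ordinary changes rather than resets, rate() is not meant for gauges; PromQL provides delta() for gauges, or you can simply graph the raw value.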

Histogram

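A Histogram consists of <basename>_bucket{le="<upper bound>"}, <basename>_bucket{le="+Inf"}, <basename>_sum, and <basename>_count. It samples observations over a period of time (typically request durations or response sizes) and counts them both per configurable bucket and in total. The collected data is typically displayed as a histogram.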
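A detail that often trips people up is that Prometheus histogram buckets are cumulative. The bucket with le="1.0" counts every observation less than or equal to 1.0, including those already counted in smaller buckets, and le="+Inf" always equals _count. The hypothetical sketch below records observations into fixed buckets and renders them in that form. The bucket bounds and the metric name used with it are illustrative.

```java
import java.util.Arrays;

// Hypothetical histogram: per-bucket counts plus a running sum and count.
class HistogramSketch {
    private final double[] upperBounds; // sorted finite bounds; +Inf is implicit
    private final long[] bucketCounts;  // non-cumulative counts; last slot is the +Inf bucket
    private double sum;
    private long count;

    HistogramSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.bucketCounts = new long[this.upperBounds.length + 1];
    }

    // Puts v into the first bucket whose upper bound is >= v ("le" = less than or equal).
    void observe(double v) {
        int i = 0;
        while (i < upperBounds.length && v > upperBounds[i]) {
            i++;
        }
        bucketCounts[i]++;
        sum += v;
        count++;
    }

    // Renders cumulative buckets, as in the Prometheus text format.
    String render(String name) {
        StringBuilder sb = new StringBuilder();
        long cumulative = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            cumulative += bucketCounts[i];
            sb.append(name).append("_bucket{le=\"").append(upperBounds[i]).append("\"} ")
              .append(cumulative).append('\n');
        }
        cumulative += bucketCounts[upperBounds.length];
        sb.append(name).append("_bucket{le=\"+Inf\"} ").append(cumulative).append('\n');
        sb.append(name).append("_sum ").append(sum).append('\n');
        sb.append(name).append("_count ").append(count).append('\n');
        return sb.toString();
    }
}
```

With bounds 0.5 and 1.0 and observations 0.25, 0.5, and 2.0, the value 0.5 lands in the le="0.5" bucket because bounds are inclusive. The le="1.0" line still shows 2 because buckets are cumulative.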

Summary

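A Summary is similar to a Histogram and consists of <basename>{quantile="<φ>"}, <basename>_sum, and <basename>_count. It also samples observations over a period of time (typically request durations or response sizes), but it stores the quantiles directly, computed on the client side, instead of deriving them from bucket counts.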
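A histogram ships only bucket counts and leaves quantile estimation to the server (histogram_quantile() in PromQL). A summary, by contrast, computes quantiles inside the application. To make the idea visible, the sketch below keeps every observation and uses the nearest-rank method. That is a simplifying assumption: real client libraries use streaming estimators over a sliding time window rather than storing all samples. The class name is hypothetical. One practical consequence is that quantiles precomputed per instance cannot be meaningfully averaged across instances, whereas histogram buckets can be summed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical summary: stores all observations and computes quantiles on demand.
class SummarySketch {
    private final List<Double> observations = new ArrayList<>();
    private double sum;

    void observe(double v) {
        observations.add(v);
        sum += v;
    }

    // Nearest-rank quantile: the smallest stored value with at least q of the data <= it.
    double quantile(double q) {
        if (observations.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(observations);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size()); // 1-based rank
        return sorted.get(Math.max(rank, 1) - 1);
    }

    double sum() {
        return sum;
    }

    long count() {
        return observations.size();
    }
}
```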

This article uses a counter to monitor the number of requests to the API endpoints.

1. Write a custom counter

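import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class HttpConterHelper {

    // Registered once; the Prometheus registry exposes it at /actuator/prometheus
    private final Counter counter;

    public HttpConterHelper(MeterRegistry registry) {
        this.counter = registry.counter("custom_api_http_requests_total");
    }

    // Adds one for each handled request
    public void count() {
        this.counter.increment();
    }
}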

2. Write an interceptor that counts API requests

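import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.servlet.HandlerInterceptor;

@Slf4j
public class PrometheusInterceptor implements HandlerInterceptor {

    @Autowired
    private HttpConterHelper httpConterHelper;

    // Called after request processing completes; each completed request is counted once
    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
            Object handler, Exception ex) throws Exception {
        httpConterHelper.count();
    }
}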

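import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class IntercepterConfig implements WebMvcConfigurer {

    // Declared as a bean so that its @Autowired field gets injected
    @Bean
    public PrometheusInterceptor prometheusInterceptor() {
        return new PrometheusInterceptor();
    }

    // Applies the interceptor to every request path
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(prometheusInterceptor()).addPathPatterns("/**");
    }
}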
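The counter above has no labels, so every endpoint adds to the same number. For per-endpoint counts, Micrometer accepts tags as key/value pairs, as in registry.counter(name, "uri", uri). The plain-Java sketch below only illustrates what such multi-dimensional data looks like in the exposition format. The class name and the uri label are hypothetical. The sketch assumes label values contain no quotes or backslashes, which real exporters must escape. Keep the set of label values small, for example by using the route pattern rather than the raw path, because each distinct value creates a new time series.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter with one "uri" label: one independent count per label value.
class LabeledCounterSketch {
    private final String name;
    private final Map<String, LongAdder> byUri = new ConcurrentHashMap<>();

    LabeledCounterSketch(String name) {
        this.name = name;
    }

    // Thread-safe increment for the series identified by this uri.
    void increment(String uri) {
        byUri.computeIfAbsent(uri, k -> new LongAdder()).increment();
    }

    long value(String uri) {
        LongAdder adder = byUri.get(uri);
        return adder == null ? 0 : adder.sum();
    }

    // One exposition line per label value, sorted by uri for stable output.
    String render() {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(byUri).forEach((uri, n) ->
                sb.append(name).append("{uri=\"").append(uri).append("\"} ").append(n.sum()).append('\n'));
        return sb.toString();
    }
}
```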

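3. Visit http://localhost:8081/actuator/prometheus to check that the custom counter has been registered.

PS: the 7.0 is the number of times the API has been called.

4. Display the custom metric in Grafana

a. Click Add panel on the dashboard

b. Choose Add Query

c. Enter the corresponding PromQL in the Metrics field

PS: if you're not familiar with PromQL, see the following links (the first has examples from the official documentation, the second is a collection of examples compiled by another user):

https://prometheus.io/docs/prometheus/latest/querying/examples/ or https://www.jianshu.com/p/3bdc4cfa08da

d. The dashboard is displayed: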

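Conclusion

That's all on integrating Spring Boot with Prometheus for now. This article uses Spring Boot 2, which already integrates Micrometer by default, so it is essentially ready to use out of the box. For Spring Boot 1.x, see the following link for how to integrate it:

https://micrometer.io/docs/ref/spring/1.5

Besides the interceptor approach shown above, custom metrics can also be implemented in other ways, such as a custom annotation combined with AOP. The alerting rules and alert configuration in this article are entry-level; if you are interested in this topic, see the following link: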

https://prometheus.io/docs/alerting/notification_examples/

References

https://prometheus.io/
https://grafana.com
https://micrometer.io/

Quickly building a Prometheus + Grafana monitoring system with docker-compose

https://blog.51cto.com/msiyuetian/2369130

Introduction to and use of Grafana

https://www.jianshu.com/p/0d82c7ccc85a

Recording Java application performance metrics with Micrometer

https://www.ibm.com/developerworks/cn/java/j-using-micrometer-to-record-java-metric/index.html

Demo code

https://github.com/lyb-geek/springboot-learning/tree/master/springboot-prometheus