Prometheus Installation and Setup

This document covers Prometheus deployment and federation setup using the stable/prometheus Helm chart, together with Grafana and Alertmanager.

Single-Cluster Prometheus deployment

Single-cluster deployment is straightforward and works out of the box with the default values.
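For example, a minimal single-cluster install can look like this (assuming Helm 2 syntax, which matches the stable/ repository era; the release name and namespace are just examples):

```shell
# Install the stable/prometheus chart with its default values.
# Release name and namespace below are placeholders -- pick your own.
helm install stable/prometheus \
  --name prometheus \
  --namespace monitoring
```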

Federation deployment

We picked federation as our multi-cluster monitoring strategy.

In a federation setup, one Prometheus server (the master) scrapes metrics that another Prometheus server (the slave) has already scraped and exposes via its /federate endpoint.

For every slave you want to federate, add a scrape job to your master server's configuration:

    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 15s

        honor_labels: true
        scheme: https
        metrics_path: '/federate'

        params:
          'match[]':
            - '{app="prometheus"}'
            - '{__name__=~"^job:.*"}'

        static_configs:
          - targets:
            - 'prometheus.slave.address.com'

match[] is a label-based selector. The following selector matches every metric exposed by your slave:

        params:
          'match[]':
            - '{job!=""}'

External Labels

By default, federated (slave) metrics are indistinguishable from the master server's own metrics. So when you query some metric, for example

some_important_metrics

you'll get the slave's and the master's series mixed together.

So it is essential to add an extra label to all metrics; that allows us to filter metrics by cluster in PromQL.

The easiest way to add a label to all of a slave's metrics is to configure the external_labels field in the slave's values file:

server:
  global:
    external_labels:
      cluster_name: billy

Now, on our master cluster, we can get metrics related ONLY to the billy cluster by filtering on cluster_name (try to keep label values unique):

some_important_metrics{cluster_name="billy"}

Adding labels

External labels are only added to federated targets. So if you set an external label on your master cluster

    external_labels:
      cluster_name: masterJohnson

and then query

some_important_metrics{cluster_name="masterJohnson"}

it will not work, because the master's own Prometheus jobs will not carry the cluster_name label at all.

So, if you need to filter the master's own metrics, you have to add the label to every master job you want to filter on.

With static_configs:

- job_name: prometheus
  static_configs:
    - targets:
        - localhost:9090
      labels:
        cluster_name: masterJohnson

With relabel_configs:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    # ... your existing relabel rules ...
    - target_label: cluster_name
      replacement: masterJohnson

Here you replace the value of a label that does not exist yet with "masterJohnson", which creates the label :)

Keep in mind that adding new labels affects only newly scraped samples; previously ingested series keep their old labels.

Be careful, altering labels can cause a lot of problems in your dashboards.

Grafana deployment

Using federation and labels, you can deploy Grafana to your master cluster and build dashboards over all your clusters from one place.

The only things you actually need to set up are the ingress, and a password if you do not want a randomly generated one.
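As a sketch, assuming the stable/grafana chart, the corresponding values could look like this (hostname and password are placeholders):

```yaml
# Hypothetical values fragment for the stable/grafana chart
adminPassword: my-strong-password   # omit to get a randomly generated one
ingress:
  enabled: true
  hosts:
    - grafana.your.domain.com
```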

Saving dashboards

You can save a dashboard as raw JSON via the Grafana UI: Share dashboard -> Export -> Save to file.

Importing dashboards

There are several ways to import Grafana dashboards:

  • Import by dashboard id via the Grafana UI from Grafana Labs
  • Import raw JSON via the Grafana UI
  • Import raw JSON as a provisioned dashboard while deploying Grafana. Note that provisioned dashboards are not editable.
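A provisioned dashboard can be sketched like this, assuming the stable/grafana chart's dashboardProviders/dashboards values (the provider name and dashboard key here are made up):

```yaml
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ''
        type: file
        options:
          path: /var/lib/grafana/dashboards/default

dashboards:
  default:
    cluster-overview:            # arbitrary dashboard key
      json: |
        { "title": "Cluster overview", "panels": [] }
```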

AlertManager setup with Slack

To set up notifications, you need to configure three things: alerting rules, receivers, and Alertmanager routes.

To apply new rules, redeploy Prometheus with the updated values.yml and send a POST request to the /-/reload endpoint (http://prometheus.ams.pointlogic.nl/-/reload).
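For example (hostname as above; adjust to your own setup):

```shell
# Redeploy Prometheus with the updated values, then ask it to reload its config
helm upgrade prometheus stable/prometheus -f values.yml
curl -X POST http://prometheus.ams.pointlogic.nl/-/reload
```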

Rules

Rules are defined in serverFiles.alerts.groups:

serverFiles:
  alerts:
    groups:
      - name: node_down
        rules:
        - alert: NodeDown
          expr: up{job="kubernetes-nodes"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "*Node Is down!*"
            description: "{{ $labels.kubernetes_io_hostname }} is down"

Labels help us differentiate alerts, and annotations are useful when composing alert messages.

Routes

Routes define where, and at what intervals, alerts will be sent.

alertmanagerFiles:
  alertmanager.yml:
    route:
      group_by: [alertname, cluster]
      receiver: slack_default
      routes:
        - group_wait: 10s
          group_interval: 10m
          match:
            severity: critical
          repeat_interval: 30m
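The sub-route above inherits the slack_default receiver. If you wanted critical alerts to go to a dedicated receiver instead, a sketch could look like this (slack_critical is a hypothetical receiver name that you would have to define under receivers):

```yaml
alertmanagerFiles:
  alertmanager.yml:
    route:
      group_by: [alertname, cluster]
      receiver: slack_default
      routes:
        - match:
            severity: critical
          receiver: slack_critical   # hypothetical, must exist under receivers
          group_wait: 10s
          group_interval: 10m
          repeat_interval: 30m
```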

Receivers

Routes deliver alerts to receivers, so you need to set up at least one receiver:

alertmanagerFiles:
  alertmanager.yml:
    global:
      slack_api_url: 'https://hooks.slack.com/services/<your-webhook-path>'

    receivers:
      - name: slack_default
        slack_configs:
          - channel: '#monitoring-alerts'
            text: "<!channel>\n*Summary:* {{ .CommonAnnotations.summary }}\n*Description:*\n{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
            send_resolved: true

Alerts to several channels

To send alerts to several channels in one workspace, simply add several slack_configs entries with your channel names.

If you want to send alerts to another workspace, you can do that with the api_url field, which overrides the global slack_api_url per entry:

          - channel: '#monitoring-alerts'
            text: "some text"
            api_url: 'api url for the first workspace'

          - channel: '#monitoring-alerts'
            text: "some text"
            api_url: 'api url for the second workspace'

Securing ingress with basic_auth (login and password)

To secure your ingress, you can create a secret. First, create a file with your login/password pair:

htpasswd -c file_name login

Create a secret:

kubectl create secret generic prometheus-auth --from-file=file_name --namespace monitoring

And add annotations to your ingress:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: prometheus-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"

And to federate secured Prometheus:

      - job_name: 'federate-secured'

        honor_labels: true
        scheme: https
        metrics_path: '/federate'

        params:
          'match[]':
            - '{job!=""}'

        static_configs:
          - targets:
            - 'prometheus.k8s-ap-southeast-1-production.pointlogic.co'

        basic_auth:
          username: ams
          password: <Password in plain text>
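If you would rather not keep the password in plain text in your values file, Prometheus' basic_auth also supports password_file; a sketch, assuming you mount a Kubernetes secret containing the password into the server pod:

```yaml
        basic_auth:
          username: ams
          password_file: /etc/secrets/federate-password   # mount path of the secret (assumed)
```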