Using Prometheus

Prometheus metrics can be ingested into the MONIT infrastructure for different use cases:

  • long term data storage for small Prometheus deployments
  • data federation for multiple Prometheus clusters
  • central Grafana alarms for Prometheus alerts

Before starting

The first step is to open a SNOW request providing the needed details.

Important

Please respect the agreed data volume and rate. All backends used by MONIT have a limited quota. Usage is monitored, so if you expect a significant change, please contact us in advance.

Pushing to MONIT (Pilot)

Prometheus metrics can be pushed to MONIT for long-term storage. Grafana Mimir is currently used as the storage backend. This service relies heavily on S3-compatible APIs, in this case provided by CERN's Ceph. Once stored, the data can be accessed using PromQL-compliant queries. The default retention period is 40 days, but a custom retention per tenant can be arranged on request.

Sending the data

After creating the SNOW request as explained above, you will be provided with the basic authentication credentials needed to access the service. Once your account is created, you can start pushing metrics. The simplest way to do so is to use Prometheus' remote_write feature to send data to the endpoint shown below. Don't forget to define the basic_auth.username field and either basic_auth.password or basic_auth.password_file, and populate them with the provided credentials (tbag service secret).

- url: http://monit-prom-lts.cern.ch/api/v1/push
  basic_auth:
    username: "{{ tenant }}"
    password_file: "{{ path_to_file }}"
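
For context, the fragment above belongs under the top-level `remote_write` key of `prometheus.yml`. The sketch below shows how that might look, together with an optional `write_relabel_configs` rule that restricts which metrics are forwarded. Such a rule can help you stay within the agreed data volume. The regular expression and the metric name prefixes are hypothetical examples, not values required by MONIT.

```yaml
# prometheus.yml (fragment): a sketch, not a complete configuration
remote_write:
  - url: http://monit-prom-lts.cern.ch/api/v1/push
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
    # Optional: forward only selected metrics to MONIT.
    # The prefixes below are hypothetical; adapt them to your own metrics.
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "(node_|myapp_).*"
        action: keep
```

With `action: keep`, samples whose metric name does not match the regular expression are dropped before being sent, while local scraping and storage are unaffected.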

Using Prometheus to write data is not mandatory. The same can be achieved with POST requests to the URL above, using the appropriate basic authentication and a document format compliant with the OpenMetrics specification v1.0.0.

Accessing the data

Data that has already been sent can be accessed from your organization in the central MONIT Grafana. To access it, create a Prometheus datasource with http://monit-prom-lts.cern.ch/prometheus as its URL and the same Basic Authentication credentials used when sending the metrics (see Data access).

There is no direct Prometheus interface to your tenant's metrics, but Grafana offers the "Explore" option, which provides similar functionality, such as running quick queries with autocompletion.
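
To make the datasource settings described in this section explicit, here they are written out in the style of a Grafana datasource provisioning file. This is only an illustration and assumes file-based provisioning. In the central MONIT Grafana you would most likely enter the same values in the datasource form instead. The datasource name and the `{{ password }}` placeholder are hypothetical.

```yaml
# Grafana datasource provisioning sketch (assumes file-based provisioning)
apiVersion: 1
datasources:
  - name: MONIT Prometheus LTS   # hypothetical datasource name
    type: prometheus
    access: proxy
    url: http://monit-prom-lts.cern.ch/prometheus
    basicAuth: true
    basicAuthUser: "{{ tenant }}"
    secureJsonData:
      basicAuthPassword: "{{ password }}"   # same secret used when sending metrics
```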

Remote read from Prometheus

If you send data from a Prometheus instance using remote write, one recommended setup is to also configure Mimir as its remote read endpoint. This lets you keep a shorter retention in your Prometheus and still query longer periods through it transparently, with Mimir serving the data that falls outside the local Prometheus TSDB. It also means that during a central Mimir downtime you still have access to your recent data in your own Prometheus, so you can keep operating until the central service is restored.

To configure remote read, use the following as an example:

- url: http://monit-prom-lts.cern.ch/prometheus/api/v1/read
  basic_auth:
    username: "{{ tenant }}"
    password_file: "{{ path_to_file }}"
  filter_external_labels: false # Needed only if you inject external labels when remote writing
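
As a sketch of where this fragment goes, the example below places it under the top-level `remote_read` key of `prometheus.yml` and spells out the `read_recent` option, which is what makes the split between local and remote data transparent. The local retention itself is not set in this file. It is controlled by the `--storage.tsdb.retention.time` command-line flag of Prometheus, and the value to choose depends on your own setup.

```yaml
# prometheus.yml (fragment): a sketch, not a complete configuration
remote_read:
  - url: http://monit-prom-lts.cern.ch/prometheus/api/v1/read
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
    # false is the Prometheus default: time ranges that local storage should
    # fully cover are answered locally, without querying the remote endpoint.
    read_recent: false
    # Needed only if you inject external labels when remote writing.
    filter_external_labels: false
```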

High availability setup

Thanks to Mimir, it's possible to run Prometheus in a high availability setup. This is achieved by having multiple instances push metrics for the same targets, creating parallel streams of data. Mimir then deduplicates these streams by selecting a leading instance within each cluster and keeping only its samples. If this instance stops sending samples for a certain amount of time, a new leader is selected.

To take advantage of the HA setup, two labels must be added to each sample sent:

  • __cluster__: ID of cluster of Prometheus instances
  • __replica__: ID of replica within said cluster
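
One common way to attach such labels to every sample sent is through `external_labels` in the `global` section of `prometheus.yml`. The sketch below assumes that approach. The cluster and replica names are hypothetical.

```yaml
# prometheus.yml (fragment) for the first replica of a hypothetical cluster
global:
  external_labels:
    __cluster__: "my-prometheus-cluster"   # same value on every replica of the cluster
    __replica__: "replica-1"               # unique per replica, e.g. "replica-2" on the second
```

Each replica uses the same `__cluster__` value and its own `__replica__` value, so that the parallel streams can be recognized as duplicates of each other.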

Tenant operation dashboards

A set of dashboards is available to help you understand your tenant's utilization (time series, number of metrics, label distribution, and so on). These dashboards need to be set up by the MONIT admins in your infrastructure, so ask for them if you are interested.

Alertmanager

As part of the Mimir tenant you will also receive an Alertmanager instance. This instance can be configured through the Grafana 10 interface. Until that is available, have a look at the Mimir tools (mimirtool), which allow you to configure the Alertmanager from a CLI with your user/password.

The Alertmanager only has access to the rules set for your tenant, so rules are isolated and cannot operate on other tenants' metrics.
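
As a rough illustration of what a tenant Alertmanager configuration might contain, here is a minimal sketch with a single catch-all receiver. The receiver name and the `{{ webhook_url }}` placeholder are hypothetical. A file like this can typically be uploaded with mimirtool's `alertmanager load` subcommand. The address and credential options to pass for this service are not covered here, so check with the MONIT team.

```yaml
# alertmanager.yaml: minimal sketch with a single catch-all receiver
route:
  receiver: default
  group_by: [alertname]
receivers:
  - name: default
    webhook_configs:
      - url: "{{ webhook_url }}"   # hypothetical placeholder for your notification endpoint
```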