Using Prometheus
Prometheus metrics can be ingested into the MONIT infrastructure for different use cases:
- long-term data storage for small Prometheus deployments
- data federation for multiple Prometheus clusters
- central Grafana alarms for Prometheus alerts
Before starting
The first step is to open a SNOW request with the required details.
Important
Please respect the agreed data volume/rate. Quotas are limited in all backends used by MONIT. Usage is monitored; if you need a significant change, please contact us in advance.
Pushing to MONIT (Pilot)
Prometheus metrics can be pushed to MONIT for long-term storage. The service is currently backed by Grafana Mimir, which relies heavily on S3-compatible object storage, in this case provided by CERN's Ceph. Stored data can then be queried with PromQL. The default retention period is 40 days, but a custom per-tenant retention can be arranged on request.
Sending the data
After creating the SNOW request as explained above, you will be given the basic authentication credentials needed to access the service. Once your account is created, you can start pushing metrics. The simplest way to do so is Prometheus' remote_write feature, sending data to the endpoint below. Don't forget to define the basic_auth.username field and either basic_auth.password or basic_auth.password_file, and populate them with the provided credentials (tbag service secret).
remote_write:
  - url: http://monit-prom-lts.cern.ch/api/v1/push
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
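Because quotas are limited (see Important above), it can help to drop series you do not need before they leave Prometheus. A minimal sketch using Prometheus' write_relabel_configs, assuming hypothetical metric-name prefixes go_ and process_ that you do not want to store centrally; adapt the regex to your own metrics:

```yaml
remote_write:
  - url: http://monit-prom-lts.cern.ch/api/v1/push
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
    write_relabel_configs:
      # Hypothetical filter: drop Go runtime and process metrics before sending.
      # These rules only affect what is remote-written; local storage keeps everything.
      - source_labels: [__name__]
        regex: "(go|process)_.*"
        action: drop
```

Since the relabelling runs only on the remote-write path, your local Prometheus can still keep and query the dropped series for its own retention period.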
Using Prometheus to write data is not mandatory: you can also send POST requests to the push URL above, with the appropriate basic authentication and a payload compliant with the OpenMetrics specification v1.0.0.
Accessing the data
Data that has already been sent can be accessed from your organization in the central MONIT Grafana. To do so, create a Prometheus datasource with http://monit-prom-lts.cern.ch/prometheus as its URL and the same Basic Authentication credentials used to send the metrics (see Data access).
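For reference, the same settings expressed as a Grafana datasource provisioning file. This is only a sketch for teams that manage datasources as code; in the central MONIT Grafana you would normally enter these fields in the datasource form instead. The datasource name and the MONIT_MIMIR_PASSWORD environment variable are hypothetical:

```yaml
apiVersion: 1
datasources:
  - name: MONIT Mimir                 # hypothetical display name
    type: prometheus
    access: proxy
    url: http://monit-prom-lts.cern.ch/prometheus
    basicAuth: true
    basicAuthUser: "{{ tenant }}"     # same user as for remote_write
    secureJsonData:
      # Grafana expands environment variables in provisioning files;
      # MONIT_MIMIR_PASSWORD is a hypothetical variable holding the tenant password.
      basicAuthPassword: ${MONIT_MIMIR_PASSWORD}
```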
There is no direct Prometheus web interface for your tenant's metrics, but Grafana's Explore view offers similar functionality, such as quick ad hoc queries and autocompletion.
Remote read from Prometheus
If you send data from a Prometheus instance via remote write, we recommend also configuring Mimir as its remote read endpoint. This lets you keep a shorter local retention in Prometheus while still querying longer periods transparently through it, with Mimir serving the data that is no longer in the local TSDB. It also improves resilience: during a central Mimir outage you still have access to your recent data in your own Prometheus, so you can keep operating until the central service is restored.
To configure remote read, use the following as an example:
remote_read:
  - url: http://monit-prom-lts.cern.ch/prometheus/api/v1/read
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
    filter_external_labels: false  # Needed only if you inject external labels when remote writing
High availability setup
Mimir makes it possible to run Prometheus in a high availability (HA) setup. Multiple instances push metrics for the same targets, creating parallel streams of data. Mimir then deduplicates them by electing a leader instance within each cluster and accepting samples only from that leader. If the leader stops sending samples for a certain amount of time, a new leader is elected.
To take advantage of the HA setup, two labels must be added to every sample sent:
- __cluster__: ID of the cluster of Prometheus instances
- __replica__: ID of the replica within that cluster
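A minimal sketch of how the two labels could be set on a pair of replicas, assuming they are added as Prometheus external labels (Prometheus attaches external labels to every remote-written sample). The cluster and replica values are hypothetical; all replicas share the same __cluster__ value and differ only in __replica__:

```yaml
# prometheus.yml on the first replica; the second replica is identical
# except for __replica__ (e.g. "replica-b").
global:
  external_labels:
    __cluster__: my-prom-cluster   # hypothetical; identical on every replica
    __replica__: replica-a         # hypothetical; unique per instance

remote_write:
  - url: http://monit-prom-lts.cern.ch/api/v1/push
    basic_auth:
      username: "{{ tenant }}"
      password_file: "{{ path_to_file }}"
```

If these instances also use remote read, this is the case where filter_external_labels: false is needed, as noted in the remote read example.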
Tenant operation dashboards
A set of dashboards is available to help you understand your tenant's utilization (number of time series and metrics, label distribution, etc.). These dashboards have to be set up for you by the MONIT admins, so ask for them if you are interested.
Alertmanager
Your Mimir tenant also comes with its own Alertmanager instance. It will be configurable from the Grafana 10 interface; until that is available, have a look at mimirtool, a CLI that lets you configure the Alertmanager with your tenant username and password.
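As an illustration, a minimal Alertmanager configuration that routes every alert to a single webhook; the receiver name and URL are hypothetical. Assuming mimirtool is pointed at your tenant (address, tenant ID and credentials passed as flags or MIMIR_* environment variables; check `mimirtool --help` and confirm the address with MONIT), it could be uploaded with `mimirtool alertmanager load alertmanager.yaml`:

```yaml
# alertmanager.yaml (standard Alertmanager configuration format)
route:
  receiver: team-webhook          # default receiver for every alert
  group_by: ['alertname']         # one notification group per alert name
receivers:
  - name: team-webhook            # hypothetical receiver name
    webhook_configs:
      - url: https://example.cern.ch/alert-hook   # hypothetical endpoint
```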
The Alertmanager has access to all rules defined for your tenant. Rules are isolated per tenant, so they cannot act on other tenants' metrics.
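Rules can be managed per tenant with mimirtool as well. A sketch of a rules file in the format accepted by `mimirtool rules load` (a namespace plus standard Prometheus rule groups); the namespace, group name and threshold are hypothetical:

```yaml
namespace: my-team                 # hypothetical namespace
groups:
  - name: availability             # hypothetical group name
    rules:
      # Fires when any `up` series sent to this tenant has been 0 for 5 minutes.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: A scrape target has been down for 5 minutes
```

Because the rule only sees this tenant's series, it can only fire for targets scraped by your own Prometheus instances.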