Using Remote Probes

This section explains how to set up and execute probes on remote targets. The workflow is based on Prometheus and the Blackbox exporter, which probes endpoints over a variety of protocols: HTTP, HTTPS, TCP, DNS, and ICMP.

If you want to read more about the Blackbox exporter, please refer to the Prometheus documentation and to the GitHub project.

How does it work?

The Blackbox exporter executes the configured probes against the remote targets, then exposes the results as metrics to be scraped by Prometheus. Probes are configured through modules: templates that let you tweak probe settings such as the protocol used, the expected response, the timeout and more.

You can either pick a module from the fixed set that we support (you can check it here) or create a custom module, which can include extra settings such as authentication.

How do I add a new probe?

All probes are centrally managed in our monit-remote-probes repository. The first thing to do is to create a branch for your changes.

If you've never configured a remote probe before, open the whitelist.yaml file: it's used to prevent accidental changes to other people's probes.

Whitelist yourself by adding your group's name (if it's missing) and your username under it, following the pattern of the existing entries.
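For illustration, a whitelist entry might look like the following sketch; the exact schema is an assumption, and the group and usernames are placeholders, so mirror the entries already present in the file:

```yaml
# whitelist.yaml (hypothetical entry; follow the format already in the file)
monitoring:        # your group's name
  - jdoe           # your CERN username
  - asmith         # other members of your group
```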

Now create a folder using your group's name: that's where you will operate.

Remember that you are only allowed to make changes in your group's folder and to the whitelist.yaml file. Head into your folder and create a probes.yaml file. The general structure of this file is the following:

<module name>:
  targets:
    - <target definition>
    - ...
  query_interval: <xx>m #(optional)
<module name>:
  targets:
    - ...

The <module name> must be one of the supported modules or the name of a custom module you created; more information on which modules are available and on how to create a custom one can be found in the following chapter.
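As an illustration, a probes.yaml can mix a supported module and a custom one; in this sketch, mygroup/http_ipv4_10s is a hypothetical custom module name and the endpoint is a placeholder:

```yaml
http_ipv4:                 # supported module
  targets:
    - http://example.cern.ch
mygroup/http_ipv4_10s:     # custom module defined in your custom_modules.yaml
  targets:
    - http://example.cern.ch
```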

The <target definition> depends on what you want to target:

  • for direct targets (i.e. targeting a set of endpoints), just list the endpoint(s) that you want to probe:

    targets:
      - http://monit-grafana.cern.ch
      - http://monit-grafana-open.cern.ch
    

  • for puppet targets (i.e. targeting all the nodes behind a puppet hostgroup), the target should have this structure:

      - hostgroup: "<target hostgroup>"
        labels:
          <label>: "<label value>"
    
    Here, the <target hostgroup> is of course the hostgroup you want to probe (e.g. monitoring/something/example); you can also add labels if needed.

  • query_interval is an optional setting you can use to query your targets at a different frequency than the default of 1m. Allowed values range from 1m to 99m.

Labels can be set for two use cases; here they are, along with their related labels:

  • to trigger the creation of NoContact alarms:

    • snow_functional_element
    • snow_troubleshooting
    • snow_assignment_level
    • snow_hostgroup_grouping
    • snow_watchlist
  • to override the internal labels automatically set by MONIT:

    • job: defaults to <producer>-<hostgroup>-<module>
    • module: defaults to the module name
    • monit_metric_name: ?
    • remote_producer: defaults to the folder name
    • submitter_environment: default is retrieved from Puppet
    • submitter_hostgroup: defaults to the targeted hostgroup
    • toplevel_hostgroup: defaults to a value derived from the targeted hostgroup
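Putting the above together, a puppet target that enables NoContact alarms and overrides one of the internal labels might look like the following sketch; the hostgroup, the label values and the query interval are all placeholders:

```yaml
icmp_ipv4:
  query_interval: 5m      # optional, overrides the default of 1m
  targets:
    - hostgroup: "monitoring/something/example"
      labels:
        snow_functional_element: "Some Functional Element"  # triggers NoContact alarms
        job: "my-custom-job"                                # overrides the MONIT default
```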

If your puppet probe is configured with a module that uses the http prober (i.e. an HTTP probe), you can also decide to probe a custom endpoint on the hosts belonging to the <target hostgroup>. To do so, add target_path in the <target definition>, following this example:

  - hostgroup: "<target hostgroup>"
    target_path: "/some/custom/endpoint"
    labels:
      <label>: "<label value>"

When you are done, commit your changes and open a merge request. A CI pipeline will validate it, merge it automatically, and deploy your new probe configuration to our Blackbox exporter.

Tips and tricks

  • When you specify the target list for a certain module, you can mix several targets of different types under the same module. This means your configuration can be similar to the following:

    targets:
      - http://monit-grafana.cern.ch
      - hostgroup: monitoring/something/example
        labels:
          snow_functional_element: "Some Functional Element"
    

  • Make sure to take advantage of extends when defining custom modules, as explained here.

  • Everything that you can specify for direct targets and puppet targets can also be specified at the module level, to be used as a "default" for all the targets. For example, if you want to probe many endpoints externally except one, instead of specifying mode: external for each target, you could write:

    http_ipv4:
      mode: external
      targets:
        - url: http://monit-grafana.cern.ch
          mode: internal
        - http://monit.cern.ch
        - http://monit-docs.cern.ch
        - ...
    
    Please note this also applies to labels, enabled alerts, and everything that can be specified for both target types.

External probes

You can configure remote probes to be executed from outside CERN as well; specifically, they will be executed from our external cluster. Please note that the default is to run probes internally, unless you explicitly specify otherwise. To do this, you'll need to add a mode: <internal|external|both> flag to the target, where the values are self-explanatory. In practice:

targets:
  - http://monit-grafana.cern.ch
  - url: http://monit-grafana-open.cern.ch
    mode: external
  - url: http://monit.cern.ch
    mode: both
  - hostgroup: "<target hostgroup>"
    mode: both
    labels:
      <label>: "<label value>"

The mode flag is optional, so feel free to mix and match target definitions as in the example above.

Warning

IPv6 probes are currently not supported in mode: external.

Probes alerts

Besides the alerts on probes that you can configure in Grafana, you can make your probe alert you on other conditions. To do so, add enabled_alerts: <list of alerts> to the target and specify a snow_functional_element label so that you can receive tickets for it. In practice:

targets:
  - url: http://monit-grafana.cern.ch
    enabled_alerts:
      - certificate_expiration
    labels:
      snow_functional_element: "Some Functional Element"

The list of alerts you can enable is the following:

  • certificate_expiration: alerts you one month before the SSL certificate expires (only for http probes)

The list of snow labels you can override is:

  • (mandatory) snow_functional_element: the SNOW FE of the incident; this is the minimum required to enable alerting functionality
  • (optional) snow_service_element: the SNOW SE of the incident
  • (optional) snow_assignment_level: the SNOW assignment level of the incident (as an int)
  • (optional) snow_hostgroup_grouping: if the incident should be grouped by hostgroup
  • (optional) snow_auto_closing: if grouping is disabled, OK alerts automatically close tickets
  • (optional) snow_notification_type: the alarm type, possible values: app (default), hw, os
  • (optional) snow_watchlist: list of emails that will form the watchlist of the SNOW incident
  • (optional) snow_troubleshooting: historically used by NoContact alarms; will be overridden by "troubleshooting" if specified
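As a sketch, a target combining an enabled alert with several of the SNOW labels above could look like this; all values are placeholders, and the comma-separated watchlist format is an assumption:

```yaml
targets:
  - url: https://example.cern.ch
    enabled_alerts:
      - certificate_expiration
    labels:
      snow_functional_element: "Some Functional Element"
      snow_assignment_level: "3"
      snow_watchlist: "someone@cern.ch,someone.else@cern.ch"
```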

What modules can I use?

You can either use one of our supported modules or create a custom module yourself. Let's explore the options one by one.

Supported modules

We currently support the following modules:

  • http_ipv4
  • http_ipv6
  • icmp_ipv4
  • icmp_ipv6
  • ssh_banner

As their names suggest, these modules use either the HTTP, ICMP or TCP protocol, over either IPv4 or IPv6. If you would like to probe over both IPv4 and IPv6, and be notified only if both fail, you'll need to use both modules separately, and then create a custom alarm to achieve the behavior you want.
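For example, to probe the same endpoint over both IP versions, you would list it under both modules (the endpoint is a placeholder):

```yaml
http_ipv4:
  targets:
    - https://example.cern.ch
http_ipv6:
  targets:
    - https://example.cern.ch
```

Each module then produces its own metrics, and the "alert only if both fail" logic lives in the custom alarm you define on top of them.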

If you want to know the details of these probes, have a look at the related chapter.

Custom modules

To define a custom module, you need to create a file in your group's folder named custom_modules.yaml. Inside this file, you can define as many custom modules as you like, following the Blackbox exporter documentation. However, there are a few important things to note:

  • each module has to be named using the format <group>/<module_name> (e.g. for monitoring group, a valid name would be monitoring/myprobename);
  • except for the prober field, no further validation checks are executed on your custom modules.

An example of a custom module for the monitoring group, for a probe using ICMP over IPv4 with a timeout of 10 seconds, would be:

monitoring/icmp_ipv4_10s:
  prober: icmp
  timeout: 10s
  icmp:
    preferred_ip_protocol: "ip4"
    ip_protocol_fallback: false

You can also define a custom module starting from one of the supported modules. For example, let's say you want to define a custom module which is exactly the same as the http_ipv4 module, but with a timeout of 10 seconds. You can extend that base module by defining your custom module in this way:

monitoring/http_ipv4_10s:
  extends: http_ipv4
  timeout: 10s

Your custom module will be built by taking the base module and overriding it with the other values you specified.
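Based on the http_ipv4 configuration shown further below, the extends example above should produce an effective module roughly like the following sketch; the exact merge behavior for nested fields is an assumption:

```yaml
# Illustrative effective result of extending http_ipv4 with timeout: 10s
monitoring/http_ipv4_10s:
  prober: http
  timeout: 10s     # overridden value; everything else comes from the base module
  http:
    valid_status_codes: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 401]
    valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
    follow_redirects: true
    preferred_ip_protocol: "ip4"
    ip_protocol_fallback: false
    tls_config:
      insecure_skip_verify: false
      ca_file: /etc/pki/tls/certs/ca-bundle.crt
```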

If you want more examples of custom modules, you can find the configuration details for the supported ones just down below, or have a look into the repository to see how other people structured their files.

If you need to have sensitive information in your custom module, e.g. some authentication details, keep reading to learn how to properly manage that. If not, feel free to skip the next section.

Custom modules with encryption

Sometimes you need to define a custom module with an authentication mechanism, such as basic_auth, or a bearer_token, or any other kind of sensitive information. To safely upload your module to the repository, you will need to encrypt it by following the instructions in this section. As the end result, your file will be encrypted using two secrets, one belonging to you and the other belonging to us; in this way, both of us can decrypt the file without exchanging secrets, and you can commit your files to the repository. At the moment we only support basic_auth with username and password fields, so only the password can be encrypted.

If you or your group has already done this once, and you already have sops, you will already have a public-private key pair. Put your private key in a file, skip step 5 and set the environment variables accordingly.

  1. First off, you need to get sops, the tool you will use to encrypt the sensitive data in your yaml file. Go to the releases page and pick the version for your OS, not older than v3.7.0.

  2. Now you need to generate the secret you will use to encrypt your file. For that you need to get age, a tool which provides a very simple way to generate public-private key pairs. Again, pick your file from the releases page and extract it.

  3. Generate your age key pair by following their documentation, or simply by doing:

    $ age-keygen -o privatekey.txt
    Public key: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
    
    Your public key will be displayed on screen, while the private one will be saved to file; please store your private key safely following the usual best practices. This private key will be used by your whole group (as defined in the whitelist.yaml file) to decrypt the files if you need to edit them.

  4. Create your custom_modules.yaml file following the documentation, using whatever sensitive information you need (e.g. password for basic_auth). Take note of which fields of your yaml file you want to encrypt: in the basic_auth example, you want to encrypt the password field. Do NOT commit the file yet, as you need to encrypt it first.

  5. Now you need to configure sops by specifying which fields to encrypt and to use both your newly created public key and our public key to encrypt the sensitive data. To do so, you need to set a couple environment variables; let's analyze them individually:

    • SOPS_AGE_RECIPIENTS specifies the list of public keys used for encryption; you need to use both your public key and MONIT's public key separated by a comma, otherwise the deployment will fail. To summarize: export SOPS_AGE_RECIPIENTS='<your_public_key>,<MONIT_public_key>'.
    • SOPS_AGE_KEY_FILE is the path pointing to your private key, used by sops to decrypt your file if you need to make edits; thus what you need is export SOPS_AGE_KEY_FILE=<path_to_private_key>.

      Note

      MONIT's public key is

      age158dwqkakd04p64c3j2mu2hxx5dsy8hxrwnrln9a3frw2ugndtpqs05e8v4
      

  6. Now you can use sops to interact with your custom_modules.yaml file. Specifically:

    • sops --encrypt --in-place --encrypted-regex 'password' custom_modules.yaml will encrypt your file; as you can see, you need to specify a regular expression matching the fields you want to encrypt. At the moment we only support basic_auth, so the regular expression must be 'password' as in the example.
    • sops --decrypt --in-place custom_modules.yaml will decrypt your file. Whenever you need to make any edit (even to parts of the file which are not encrypted) you need to decrypt the file first, make your edits and then encrypt it again; this is because SOPS includes a file checksum at the end, meaning it will think the file is corrupted if you don't follow this procedure.
  7. Double check that your custom_modules.yaml file has no visible sensitive information left. You will notice that sops appended some encrypted information at the end of your file, which is used to decrypt the file when needed. If you are satisfied with your results, you are good to go! Name your module properly and follow the rest of the documentation.

If you want a concrete example, a custom module for an HTTP probe using basic_auth can be found below.

How do I check the probe results?

The metrics generated from running the configured remote probes are available in a dedicated Remote Probes dashboard, hosted in the MONIT Grafana organization.

How do I set alarms?

As mentioned above, NoContact alarms can be automatically created for puppet_probes when the special snow_functional_element label is set.

In case you wish to create your own alerts in your Grafana organization please open a SNOW ticket asking to have the remote probes data source added to your organization. More details on how to create alarms can be found here.

Further details and examples

Here is a configuration file which you can use as an example:

  http_ipv4: # Module to use
    targets: # List of targets
      - http://monit-grafana.cern.ch # This is a direct target
      - hostgroup: "monitoring/sub1" # This is a puppet target
        labels: # Dict of labels to add to the metric
          snow_functional_element: "Monitoring"
  http_ipv6: # Another module to use
    targets: # List of targets
      - http://monit-grafana.cern.ch
  monitoring/icmp_ipv4_10s:
    targets: # List of targets
      - http://monit-grafana.cern.ch
  icmp_ipv4: # Module to use
    targets: # List of targets
      - hostgroup: "monitoring/sub1"
        labels: # Dict of labels to add to the metric
          snow_functional_element: "Monitoring"
      - hostgroup: "monitoring/sub2"
        labels:
          snow_functional_element: "Other FE"
  ssh_banner:
    targets:
      - hostgroup: monitoring/spare
        target_path: :22 # Explicitly indicate the port where the hosts of the hostgroup listen for ssh connections.
        mode: internal

As an example for the custom module definition, let's look at how the supported modules would be defined as custom ones. This also lets you see the configuration details of each of them, such as the valid status codes for the HTTP probes and more.

HTTP Probes

These probes send a request using the http prober with a timeout of 5 seconds; only HTTP 1.1 and 2.0 are supported, and any redirects will be followed. Certificates are validated, but only CAs trusted by CERN are recognized. The status codes treated as successful are listed in the configuration:

http_ipv4:
  prober: http
  timeout: 5s
  http:
    valid_status_codes: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 401]
    valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
    follow_redirects: true
    preferred_ip_protocol: "ip4"
    ip_protocol_fallback: false
    tls_config:
      insecure_skip_verify: false
      ca_file: /etc/pki/tls/certs/ca-bundle.crt
http_ipv6:
  prober: http
  timeout: 5s
  http:
    valid_status_codes: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 401]
    valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
    follow_redirects: true
    preferred_ip_protocol: "ip6"
    ip_protocol_fallback: false
    tls_config:
      insecure_skip_verify: false
      ca_file: /etc/pki/tls/certs/ca-bundle.crt

ICMP Probes

Similarly, ICMP probes also have a timeout of 5 seconds. The detailed configuration is:

icmp_ipv4:
  prober: icmp
  timeout: 5s
  icmp:
    preferred_ip_protocol: "ip4"
    ip_protocol_fallback: false
icmp_ipv6:
  prober: icmp
  timeout: 5s
  icmp:
    preferred_ip_protocol: "ip6"
    ip_protocol_fallback: false

SSH Probes

This probe responds only if an SSH connection with the host can be established. No authentication is required, as it only tests whether the host accepts an SSH connection. If the SSH service is unavailable for any reason, or the host is not reachable at all, the prober reports as not healthy.

The default probing timeout is 5 seconds and the module is configured to use IPv4 to process the probes. Here is the detailed configuration:

ssh_banner:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      ip_protocol_fallback: false
      query_response:
      - expect: "^SSH-2.0-"

Custom module with basic_auth

Make sure you've properly configured sops as mentioned in this documentation. An example custom module using basic_auth looks like this:

--- custom_modules.yaml (before encryption)
monitoring/http_ipv4_authn:
  prober: http
  timeout: 5s
  http:
    valid_status_codes: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 401]
    valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
    follow_redirects: true
    preferred_ip_protocol: "ip4"
    ip_protocol_fallback: false
    tls_config:
      insecure_skip_verify: false
      ca_file: /etc/pki/tls/certs/ca-bundle.crt
    basic_auth:
      username: myusername
      password: mypassword

Now you can run sops --encrypt --in-place --encrypted-regex 'password' custom_modules.yaml to obtain an encrypted file, which will look like the following:

monitoring/http_ipv4_authn:
    prober: http
    timeout: 5s
    http:
        valid_status_codes:
            - 200
            - 201
            - 202
            - 203
            - 204
            - 205
            - 206
            - 207
            - 208
            - 226
            - 401
        valid_http_versions:
            - HTTP/1.1
            - HTTP/2.0
        follow_redirects: true
        preferred_ip_protocol: ip4
        ip_protocol_fallback: false
        tls_config:
            insecure_skip_verify: false
            ca_file: /etc/pki/tls/certs/ca-bundle.crt
        basic_auth:
            username: myusername
            password: ENC[AES256_GCM,data:YNzqqwhqEdEVIQ==,iv:Ve5/8VQu3Vj5BVbETRpMd3hPcmXy/rNmQr6xIiQ2yiM=,tag:vO64gf1ymMzO4vbL9pQWwQ==,type:str]
sops:
    kms: []
    gcp_kms: []
    azure_kv: []
    hc_vault: []
    age: []
    lastmodified: "2022-01-28T09:21:50Z"
    mac: ENC[AES256_GCM,data:HQKN6Ii+pENqCBTY2CZl5dzqIEvLDT3Nb1kmLjiv7esgbJzZVt1fEmoSLJfZsR6imAj2/A4o5tJ8rxX2rQfXNevxiBJm49F4tifkAg4Oveh976s2kdZ/YT7tTzLUE/+7yF4xMIiVnIQAteJL3LdMf7IcSh//Fbd/Y4zpZnxc1BI=,iv:/Z5YJeyUHIQKOYpUvYPlF+K0tYgTHVq0/uLkM7Q1xAU=,tag:uZ3bsUJI7Sp4jBTnB9e7bw==,type:str]
    pgp: []
    encrypted_regex: password
    version: 3.7.1

This file can now safely be uploaded to the repository!