MONIT Agent

The term "MONIT agent" is new, but what it describes has existed since the beginning of MONIT: all the daemons and tools that the MONIT team installs on the Data Centre hosts to collect base monitoring information.

Current Agent

At the time of writing, the currently supported MONIT agent is based on Collectd (for metric collection and alarm evaluation) and Apache Flume (for forwarding metrics, alarms and syslog).

All of this is configured through the cerncollectd Puppet module; please check the module documentation for more information on how to use it.

New Agent

As required by the department, we started evaluating alternative tools in order to improve the monitoring offering and to converge on a single system for Puppet, non-Puppet and Kubernetes nodes. We chose Prometheus as the ecosystem to focus on.

The new agent is currently based on exporters such as node exporter (to expose metrics) and on fluent-bit (to collect and forward metrics and logs). This allows us to keep using collectd as a metric collection tool (since it comes with a Prometheus exporter), thus preserving the plugins developed by service managers while integrating them into the new ecosystem.

All of this is configured with a new Puppet module named monitoring.

Phasing in the new agent

The move to the new agent will be done in three phases. The first keeps things as they are in terms of data but uses fluent-bit instead of Flume for data forwarding (2024 Q1). The second enables the Prometheus integration in parallel with the current flow (2024). The third and last switches dashboards and data storage to be Prometheus based only (2025).

Disable Flume replacement

Since the first phase should already be deployed for everyone as part of the base Puppet definition, here is how to disable it in case it is causing serious issues in your hostgroup. Please note that this should only be done temporarily, while the underlying issue is being fixed, as the old stack will not be maintained further. Please contact us if you need any help.

To disable the new MONIT agent, service managers need to set the following hiera variables:

monitoring::monit_agent: false
monitoring::monit_agent::flume_replacement: false
cerncollectd::enable_flume_replacement: false

Remove Flume completely

If you are absolutely sure that you are not using Flume in your instances and want to remove any Flume leftovers, you can use the following cerncollectd flag to clean them up:

cerncollectd::ensure_flume_is_removed: true

This will remove any log, configuration and service files created by Flume. It will also stop and remove the service and uninstall the package.

Special cases

Puppet hostgroup tests start failing after enabling the new agent

Note: Some people have reported issues when loading the fixtures after adding the monitoring module. This is a known issue, and the current workaround is to add a pre-condition to the spec file to make sure cerncollectd is included before monitoring.

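describe 'hg_myhostgroup' do
  # Make sure cerncollectd is included before monitoring.
  let(:pre_condition) do
    ['
        include cerncollectd
    ']
  end

  # ... your existing examples go here ...
end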

Running fluent-bit before MONIT installs the new one

Enabling the new agent also enables the new MONIT repositories on the machine, which contain the package "monit-fluent-bit".

Although it has a different name, it has been built as a virtual package for fluent-bit, so if you were already using fluent-bit, the new package will be identified as an upgrade.

This by itself should not be a big issue (unless there is a breaking change between versions), but problems might come from the way the service in the new package works. The new package ships a systemd template unit, "fluent-bit@", designed to run multiple instances of fluent-bit on a single machine:

[Unit]
Description=Fluent Bit daemon
Documentation=https://docs.fluentbit.io/manual/
Requires=network.target
After=network.target

[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/fluent-bit.d/%i
ExecStart=/usr/sbin/fluent-bit $FLUENT_BIT_OPTIONS
Restart=always

SyslogIdentifier=fluent-bit@%i

[Install]
WantedBy=multi-user.target
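
To illustrate how the template resolves for one instance, here is a minimal Puppet sketch for a hypothetical instance named legacy. The instance name, the configuration path /etc/fluent-bit/fluent-bit.conf and the choice to manage the file by hand are assumptions made for illustration. It also assumes that the package already provides the /etc/sysconfig/fluent-bit.d directory. Only the environment file location and the FLUENT_BIT_OPTIONS variable come from the unit above.

```puppet
# Hypothetical instance name; systemd substitutes it for %i in the unit.
$instance = 'legacy'

# Environment file read by the unit: /etc/sysconfig/fluent-bit.d/%i
file { "/etc/sysconfig/fluent-bit.d/${instance}":
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  # Assumed path to an already existing fluent-bit configuration file.
  content => "FLUENT_BIT_OPTIONS=\"-c /etc/fluent-bit/fluent-bit.conf\"\n",
}

# Run the instance; a change to the environment file restarts it.
service { "fluent-bit@${instance}":
  ensure    => running,
  enable    => true,
  subscribe => File["/etc/sysconfig/fluent-bit.d/${instance}"],
}
```

The leading "-" in EnvironmentFile means systemd does not fail when the file is missing; in that case $FLUENT_BIT_OPTIONS expands to nothing, so each instance needs its own environment file to receive its options.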

So you will need to adapt your current configuration to work with this. There are a few ways of managing it, but we recommend using this wrapper provided by us:

    # Fluent-bit configuration constants.
    $fluentbit_agent_name = 'my-agent'
    $fluentbit_service_name = "fluent-bit@${fluentbit_agent_name}.service"
    $fluentbit_agent_config_base_dir = "/etc/fluent-bit/${fluentbit_agent_name}"

    # Instantiate a fluent-bit service as monit-agent
    monitoring::monit_agent::forwarders::fluentbit::agent { $fluentbit_agent_name: }

This makes sure that the environment configuration for the "fluent-bit@my-agent" service exists, along with all the folders where you need to place your configuration.
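
As a sketch of what could follow the wrapper declaration, the snippet below places a configuration file in the agent directory, reusing the constants defined with the wrapper. The file name and its contents are hypothetical, and whether the wrapper picks up files from this exact location is an assumption; check the monitoring module documentation for the layout it actually expects.

```puppet
# Hypothetical file name inside the agent's configuration directory.
file { "${fluentbit_agent_config_base_dir}/my-input.conf":
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  # Illustrative fluent-bit input: tail a made-up application log.
  content => "[INPUT]\n    Name tail\n    Path /var/log/myapp/*.log\n    Tag  myapp.*\n",
  # Create the file only after the wrapper has set up the folders.
  require => Monitoring::Monit_agent::Forwarders::Fluentbit::Agent[$fluentbit_agent_name],
}
```

Ordering the file after the wrapper instance avoids a failure on a first run, when the agent folders do not exist yet.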