
Data Storage

MONIT currently uses the following storage backends: HDFS, OpenSearch and InfluxDB.

HDFS

The monitoring data in HDFS is stored under `/project/monitoring/archive`.

OpenSearch

InfluxDB

Collectd data is stored in InfluxDB. Depending on the type of data (base monitoring or service monitoring), it ends up in a different InfluxDB instance and database.

Base monitoring

For base monitoring there are currently 15 InfluxDB instances, one per base plugin. Each instance is reached through its own host and port, but all of them share the same read-only account and the same database name (monit_production_collectd).

Base metrics instance  Port
---------------------  ----
dbod-m-c-cpu           8080
dbod-m-c-df            8081
dbod-m-c-disk          8082
dbod-m-c-inte          8085
dbod-m-c-irq           8086
dbod-m-c-load          8087
dbod-m-c-memo          8088
dbod-m-c-moni          8080
dbod-m-c-pupp          8083
dbod-m-c-proc          8089
dbod-m-c-swap          8090
dbod-m-c-tcpc          8091
dbod-m-c-upti          8092
dbod-m-c-user          8093
dbod-m-c-vmem          8094
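As an illustration, the table can be captured in a lookup that builds a query URL for the InfluxDB 1.x HTTP `/query` endpoint, which takes the database in the `db` parameter and the InfluxQL statement in `q`. This is a sketch under assumptions not confirmed by this page: the instance names are used directly as hostnames, the scheme defaults to `https`, and the read-only credentials are left out. `base_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base monitoring instances and their ports, copied from the table above.
BASE_INSTANCES = {
    "dbod-m-c-cpu": 8080,
    "dbod-m-c-df": 8081,
    "dbod-m-c-disk": 8082,
    "dbod-m-c-inte": 8085,
    "dbod-m-c-irq": 8086,
    "dbod-m-c-load": 8087,
    "dbod-m-c-memo": 8088,
    "dbod-m-c-moni": 8080,
    "dbod-m-c-pupp": 8083,
    "dbod-m-c-proc": 8089,
    "dbod-m-c-swap": 8090,
    "dbod-m-c-tcpc": 8091,
    "dbod-m-c-upti": 8092,
    "dbod-m-c-user": 8093,
    "dbod-m-c-vmem": 8094,
}

# All base instances share the same database name.
BASE_DATABASE = "monit_production_collectd"


def base_query_url(instance: str, query: str, scheme: str = "https") -> str:
    """Build an InfluxDB 1.x /query URL for one base monitoring instance.

    Hypothetical helper. Assumes the instance name resolves as a hostname
    and that the API is served over `scheme`; no credentials are added.
    """
    port = BASE_INSTANCES[instance]  # raises KeyError for unknown instances
    params = urlencode({"db": BASE_DATABASE, "q": query})
    return f"{scheme}://{instance}:{port}/query?{params}"


print(base_query_url("dbod-m-c-cpu", "SHOW MEASUREMENTS"))
# https://dbod-m-c-cpu:8080/query?db=monit_production_collectd&q=SHOW+MEASUREMENTS
```

Note that two instances (dbod-m-c-cpu and dbod-m-c-moni) use the same port, so the lookup is keyed by instance name rather than by port.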

Service monitoring

By default, service-specific metrics are stored in a common InfluxDB instance (dbod-m-ctd), split into different databases based on the service and the top-level hostgroup of the data. As an exception, some metrics are stored in dedicated instances instead.

Service metrics instance  Port
------------------------  ----
dbod-m-ctd                8084
dbod-m-batch              8080
dbod-m-cld                8091
dbod-m-mig                8080
dbod-m-moni               8080
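A minimal sketch of the default/exception rule above, using hypothetical names: `service_instance` returns the shared instance unless the caller names a dedicated one. This page does not say which services use which dedicated instance, nor how the per-service databases are named, so the sketch leaves both decisions to the caller.

```python
from typing import Optional, Tuple

# Shared instance used by default for service metrics (from the table above).
DEFAULT_SERVICE_INSTANCE: Tuple[str, int] = ("dbod-m-ctd", 8084)

# Dedicated instances for the exceptions, with their ports.
DEDICATED_SERVICE_INSTANCES = {
    "dbod-m-batch": 8080,
    "dbod-m-cld": 8091,
    "dbod-m-mig": 8080,
    "dbod-m-moni": 8080,
}


def service_instance(dedicated: Optional[str] = None) -> Tuple[str, int]:
    """Return (instance, port) for service metrics.

    Hypothetical helper: pass the name of a dedicated instance for services
    that are exceptions; otherwise the shared instance is returned.
    """
    if dedicated is None:
        return DEFAULT_SERVICE_INSTANCE
    return dedicated, DEDICATED_SERVICE_INSTANCES[dedicated]


print(service_instance())              # ('dbod-m-ctd', 8084)
print(service_instance("dbod-m-cld"))  # ('dbod-m-cld', 8091)
```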

Measurements

In InfluxDB, a measurement is similar to an SQL table. For Collectd data, measurements are named after the plugin and the type joined by an underscore (<plugin>_<type>). For example, the following document:

time                plugin type
----                ------ ----
1515486329000000000 cpu    percent

Will end up in a measurement called "cpu_percent".
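The naming rule is a plain string concatenation; a sketch (the helper name `measurement_name` is hypothetical):

```python
def measurement_name(plugin: str, type_: str) -> str:
    """Return the InfluxDB measurement for a Collectd document: <plugin>_<type>."""
    return f"{plugin}_{type_}"


# The example document above.
doc = {"time": 1515486329000000000, "plugin": "cpu", "type": "percent"}
print(measurement_name(doc["plugin"], doc["type"]))  # cpu_percent
```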

Data schema

Due to InfluxDB restrictions, the InfluxDB data schema differs from the one used in the rest of the infrastructure; however, all the Collectd databases share the same schema.

The data contains three main kinds of fields:

  • Monitoring metadata, such as host, submitter_environment, submitter_hostgroup and toplevel_hostgroup, which describes the host environment.
  • Collectd metadata, such as plugin, plugin_instance, type and type_instance, which describes the Collectd namespace.
  • Collectd data, the *_value fields, which hold the different aggregations provided for Collectd data.

Note: Collectd has the concept of multi-metrics, meaning that a single document can carry multiple values; e.g. the network plugin reports both rate_in and rate_out. In the monitoring infrastructure we split these documents into single-value ones and promote the value name to a new metadata-like field called "value_instance".
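The split can be sketched as follows. The input shape is an assumption made for illustration only: metadata keys plus a `values` dict mapping each value name to its aggregation fields (such as `mean_value`). `split_multi_metric` is a hypothetical helper, not the infrastructure's actual code, and `node1` is a placeholder host.

```python
from typing import Any, Dict, List


def split_multi_metric(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a multi-value Collectd document into single-value documents.

    Assumed (hypothetical) input shape: metadata keys plus a "values" dict
    mapping each value name to its aggregation fields, e.g. mean_value.
    """
    metadata = {k: v for k, v in doc.items() if k != "values"}
    result = []
    for value_name, aggregations in doc["values"].items():
        single = dict(metadata)                 # copy shared metadata
        single["value_instance"] = value_name   # promote the value name
        single.update(aggregations)             # keep only this value's data
        result.append(single)
    return result


example = {
    "host": "node1",
    "plugin": "network",
    "values": {
        "rate_in": {"mean_value": 10.0},
        "rate_out": {"mean_value": 4.0},
    },
}
for single in split_multi_metric(example):
    print(single["value_instance"], single["mean_value"])
# rate_in 10.0
# rate_out 4.0
```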

Each document has the following fields and types:

- time: long:milliseconds
- host: string
- max_value: double|long
- mean_value: double|long
- min_value: double|long
- plugin: string
- plugin_instance: string|UNKNOWN
- producer: string
- submitter_environment: string
- submitter_hostgroup: string
- sum_value: double|long
- toplevel_hostgroup: string
- type: string
- type_instance: string|UNKNOWN
- type_prefix: string
- value_instance: string|UNKNOWN
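With this schema, a typical read selects one of the *_value aggregations from a measurement and filters on metadata such as host. The helper below builds such an InfluxQL statement. It is a sketch that assumes simple measurement names (no embedded double quotes); the helper names and the `node1` host are hypothetical.

```python
def influxql_string(value: str) -> str:
    """Quote a string literal for InfluxQL, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def mean_value_query(measurement: str, host: str, since: str = "1h") -> str:
    """SELECT the mean_value aggregation of one host over a recent time window."""
    return (
        f'SELECT "mean_value" FROM "{measurement}" '
        f'WHERE "host" = {influxql_string(host)} AND time > now() - {since}'
    )


print(mean_value_query("cpu_percent", "node1"))
# SELECT "mean_value" FROM "cpu_percent" WHERE "host" = 'node1' AND time > now() - 1h
```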

Data binning

To reduce the resource usage of the InfluxDB databases and extend how long the data can be kept, we have set up three retention policies for Collectd data:

  • one_week: one-minute resolution data kept for one week (the default).
  • one_month: five-minute resolution data kept for one month.
  • five_years: one-hour resolution data kept for five years.
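When querying, the retention policy determines which resolution you get. The sketch below picks the finest-resolution policy that still covers a given lookback window and qualifies the measurement in InfluxQL's "database"."retention_policy"."measurement" form. Approximating one month as 30 days and five years as 5 * 365 days is an assumption, and the helper names are hypothetical.

```python
from datetime import timedelta

# (name, resolution, retention) for the Collectd retention policies above.
# One month = 30 days and five years = 5 * 365 days are approximations.
RETENTION_POLICIES = [
    ("one_week", timedelta(minutes=1), timedelta(weeks=1)),
    ("one_month", timedelta(minutes=5), timedelta(days=30)),
    ("five_years", timedelta(hours=1), timedelta(days=5 * 365)),
]


def pick_retention_policy(lookback: timedelta) -> str:
    """Return the finest-resolution policy whose retention covers `lookback`."""
    for name, _resolution, retention in RETENTION_POLICIES:
        if lookback <= retention:
            return name
    raise ValueError("lookback exceeds the longest retention policy")


def qualified_measurement(database: str, lookback: timedelta, measurement: str) -> str:
    """Fully qualified InfluxQL measurement: "db"."rp"."measurement"."""
    rp = pick_retention_policy(lookback)
    return f'"{database}"."{rp}"."{measurement}"'


print(qualified_measurement("monit_production_collectd", timedelta(days=10), "cpu_percent"))
# "monit_production_collectd"."one_month"."cpu_percent"
```

Policies are checked from finest to coarsest resolution, so short windows get one-minute data and only long windows fall back to hourly data. A query that names no retention policy reads from the default one, one_week.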