Service metrics

The following options are provided, each described in its own section below.

Please note the considerations listed in each section.

Data sent via Collectd

Collectd is the recommended method for ingesting data into the infrastructure.
Before deciding to use any other method, check first whether Collectd covers your use case.

Data sent via AMQ/HTTP

The AMQ source is recommended for producers outside CERN.
The following fields are "reserved": _id, availability_zone, environment, event_timestamp, host, hostgroup, hostname, json, monit_hdfs_path, producer, submitter_envrionment, submitter_hostgroup, toplevel_hostgroup, timestamp, type, type_prefix, version.
You can send either a flat JSON document or a document split into "data" and "metadata".
Anything that you send inside "data" will be kept there.
Any field outside "data" that is one of the "reserved" fields will be promoted to "metadata".
Any field outside "data" that is not one of the "reserved" fields will be promoted to "data".
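
The promotion rules above can be illustrated with a minimal Python sketch. It is not the actual ingestion code: the function name split_document is hypothetical, and the handling of a pre-existing "metadata" block (kept as is) is an assumption, since the rules above only describe "data" explicitly.

```python
# Reserved field names, copied verbatim from the list above.
RESERVED_FIELDS = frozenset({
    "_id", "availability_zone", "environment", "event_timestamp", "host",
    "hostgroup", "hostname", "json", "monit_hdfs_path", "producer",
    "submitter_envrionment", "submitter_hostgroup", "toplevel_hostgroup",
    "timestamp", "type", "type_prefix", "version",
})


def split_document(doc):
    """Hypothetical helper: split a document into "data" and "metadata"."""
    data = {}
    metadata = {}
    for key, value in doc.items():
        if key == "data" and isinstance(value, dict):
            # Anything sent inside "data" is kept there, reserved or not.
            data.update(value)
        elif key == "metadata" and isinstance(value, dict):
            # Assumption: an explicit "metadata" block is kept as is.
            metadata.update(value)
        elif key in RESERVED_FIELDS:
            # Reserved fields outside "data" are promoted to "metadata".
            metadata[key] = value
        else:
            # Everything else is promoted to "data".
            data[key] = value
    return {"data": data, "metadata": metadata}


# Example: a flat document with two reserved fields and one custom field.
# "myservice", "latency" and "response_ms" are made-up example values.
flat = {"producer": "myservice", "type": "latency", "response_ms": 12}
result = split_document(flat)
```

In this example "producer" and "type" end up under "metadata", while "response_ms" ends up under "data".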

Data stored in OpenSearch

When documents are inserted into OpenSearch, a mapping is generated for their fields. The mapping can be driven by a predefined template or by OpenSearch inferring the type to use. There are several types to choose from (e.g. boolean, keyword, ip), so it is worth having a look at the mapping types.

In addition, there is a set of mapping parameters that might come in handy depending on your data (in particular the "enabled" parameter).

As a general rule, we map all "string" fields as "keyword" with an ignore_above value of 256 characters, which means that strings longer than that are kept in the document but not indexed. The other option is to use a type from the "text" family.
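
As an illustration of the default described above, the following Python sketch builds the corresponding mapping fragment and mimics the length check. The field name "message" and the helper is_indexed are hypothetical; the real check is done by OpenSearch itself, not by client code.

```python
import json

# Default mapping applied to string fields, as described above.
KEYWORD_MAPPING = {"type": "keyword", "ignore_above": 256}

# Mapping fragment for a hypothetical string field called "message".
MAPPING_FRAGMENT = {"mappings": {"properties": {"message": KEYWORD_MAPPING}}}


def is_indexed(value, mapping=KEYWORD_MAPPING):
    """Hypothetical helper: True if a string is short enough to be indexed.

    ignore_above is expressed in characters; longer strings are skipped
    by the index but remain in the original document.
    """
    return len(value) <= mapping["ignore_above"]


# Serialise the fragment as it would appear in an index template body.
fragment_json = json.dumps(MAPPING_FRAGMENT, indent=2)
```

A value of exactly 256 characters is still indexed, while one of 257 characters is not, so such a value cannot be used in searches or aggregations on that field.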

Users can request a change to these types or mapping parameters for their index patterns through a SNOW ticket. Depending on the rate and size of the documents, the request might not be approved by default and may require further discussion, as it can have a big impact on the stored size of the documents.