Service metrics
The following options are provided:
Please note the considerations in the sections below.
Data sent via Collectd
- Collectd is the recommended method for ingesting data into the infrastructure
- Before choosing any other method, first check whether Collectd covers your use case
Data sent via AMQ/HTTP
- The usage of the AMQ source is recommended for producers outside CERN
- The following fields are "reserved":
_id, availability_zone, environment, event_timestamp, host, hostgroup, hostname, json, monit_hdfs_path, producer, submitter_envrionment, submitter_hostgroup, toplevel_hostgroup, timestamp, type, type_prefix, version
- You can send a flat JSON document or a document split into "data" and "metadata"
- Anything that you send inside "data" will be kept there
- Any other field that is one of the "reserved" fields will be promoted to "metadata"
- Any other field that is not one of the "reserved" fields will be promoted to "data"
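The promotion rules above can be modelled with a short Python sketch. This is only an illustration of the rules as described here, not the code the infrastructure actually runs. The function name split_document, the sample field values, and the handling of an explicit "metadata" block are assumptions.

```python
import json

# Reserved field names, copied verbatim from the list above.
RESERVED_FIELDS = {
    "_id", "availability_zone", "environment", "event_timestamp", "host",
    "hostgroup", "hostname", "json", "monit_hdfs_path", "producer",
    "submitter_envrionment", "submitter_hostgroup", "toplevel_hostgroup",
    "timestamp", "type", "type_prefix", "version",
}


def split_document(doc):
    """Illustrative model of the promotion rules (hypothetical helper)."""
    # Anything already inside "data" stays there.
    data = dict(doc.get("data", {}))
    # Assumption: an explicit "metadata" block is kept as sent.
    metadata = dict(doc.get("metadata", {}))
    for key, value in doc.items():
        if key in ("data", "metadata"):
            continue
        if key in RESERVED_FIELDS:
            metadata[key] = value  # reserved fields go to "metadata"
        else:
            data[key] = value  # everything else goes to "data"
    return {"data": data, "metadata": metadata}


# Hypothetical flat document: three reserved fields and one custom field.
flat_doc = {
    "producer": "myservice",
    "type": "latency",
    "timestamp": 1700000000000,
    "response_ms": 42,
}
print(json.dumps(split_document(flat_doc), indent=2))
```

In this sketch, "producer", "type" and "timestamp" end up under "metadata", while "response_ms" ends up under "data".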
Data stored in OpenSearch
When documents are inserted into OpenSearch, a mapping for their fields is generated. The mapping can be driven by a predefined template or by OpenSearch inferring the type to use. Several types are available (e.g. boolean, keyword, ip), so it is worth having a look at the mapping types.
In addition, there is a set of mapping parameters that might come in handy depending on your data (the enabled parameter is particularly worth noting).
As a general rule, all "string" fields are mapped as "keyword" with an ignore_above value of 256 characters (which means that longer strings are stored but not indexed). The other option is to use a type from the "text" family.
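As a rough illustration, the sketch below shows what such a mapping could look like when written as a Python dictionary, together with a small helper that mimics the effect of ignore_above. The field names (service_name, error_message, raw_payload) and the helper are hypothetical. Only the "keyword" type with ignore_above set to 256 is taken from the rule above. The "text" field and the enabled parameter are included merely as examples of the alternatives mentioned.

```python
import json

# Limit applied to "keyword" fields, as described above.
IGNORE_ABOVE = 256

# Hypothetical index mapping; field names are made up for illustration.
mapping = {
    "mappings": {
        "properties": {
            # Default treatment of string fields: exact-match keyword.
            "service_name": {"type": "keyword", "ignore_above": IGNORE_ABOVE},
            # Alternative from the "text" family: analysed for full-text search.
            "error_message": {"type": "text"},
            # enabled=False keeps the object in the document without indexing it.
            "raw_payload": {"type": "object", "enabled": False},
        }
    }
}


def is_indexed_as_keyword(value, limit=IGNORE_ABOVE):
    """Return True if the string is short enough to be indexed (hypothetical helper)."""
    return len(value) <= limit


print(json.dumps(mapping, indent=2))
print(is_indexed_as_keyword("a" * 256))
print(is_indexed_as_keyword("a" * 257))
```

A string of exactly 256 characters is still indexed; one of 257 characters is kept in the document but cannot be searched or aggregated on through that field.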
Users can request changes to these types or mapping parameters for their index patterns (through a SNOW ticket). Depending on the document rate and size, such a request might not be approved by default and may require further discussion, as it can have a large impact on the stored size of the documents.