Using HTTP

Before starting

Before deciding to use this service, first have a look at the recommended method using Collectd.

The first step is to open a SNOW Request providing these details:

  • which service metrics you wish to send and if possible an example
  • the expected daily data volume and data rate
  • how you plan to access your data: hdfs files, kafka stream, opensearch/grafana dashboard, etc.

Important

Please respect the agreed data volume/rate. We have limited quota in all backends used by MONIT. Usage is monitored; if a significant change is required, please contact us in advance.

Your data must be represented as a valid JSON object with the following fields (as strings):

  • (mandatory) producer: used to name your data set, only one value allowed
  • (mandatory) type: used to classify your data set, you can define multiple values
  • (optional) type_prefix: used to categorise your metrics, possible values are raw|agg|enr
  • (optional) timestamp: used to indicate the event submission time
  • (optional) _id: if you wish to set your own ID, we assign one random ID by default
  • (optional) host: used to add extra information about the node submitting your data
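
As a sketch, the mandatory fields listed above can be checked on the client side before submission (`missing_fields` is a hypothetical helper for illustration, not part of MONIT):

```python
# Hypothetical client-side check of the mandatory fields described above.
MANDATORY = {"producer", "type"}

def missing_fields(document):
    """Return the mandatory fields absent from a document."""
    return sorted(MANDATORY - document.keys())

doc = {"producer": "myproducer", "mymetricfield1": "value1"}
print(missing_fields(doc))  # 'type' is missing
```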

To understand how to access your data, please refer to the Data Access section.

Sending data

Metrics can be sent to the HTTP endpoint listening on monit-metrics.cern.ch:10012 (please note this endpoint is being deprecated).

For better isolation and security we provide endpoints per producer, so you will need to send your metrics to https://monit-metrics.cern.ch:10014/&lt;producer&gt; instead. You will be provided with a password to do so.

Please make sure you send the required mandatory/optional fields.

{
  "producer": "myproducer",
  "type": "mytype",
  ...
  "mymetricfield1": "value1"
}

Please pay attention to the following:

  • all timestamps must be in UTC milliseconds or seconds, without any fractional part
  • use double quotes, not single quotes (single quotes are not valid in JSON)
  • send multiple documents in the same batch by grouping them in a JSON array
  • make sure your document fits on one line, as we don't support multi-line JSON
  • anything considered metadata by the infrastructure will be promoted to the metadata field in the produced JSON; the rest will be put inside data
  • only the UTF-8 charset is accepted, and it must be explicitly specified in the Content-Type header of the HTTP request
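
Putting these rules together, a batch can be serialized as a single-line JSON array with an explicit UTF-8 Content-Type (a sketch; the producer and field names are placeholders):

```python
import json

# Two documents for the same producer, batched in one JSON array.
batch = [
    {"producer": "myproducer", "type": "mytype", "mymetricfield1": "value1"},
    {"producer": "myproducer", "type": "mytype", "mymetricfield1": "value2"},
]

# json.dumps produces a single line by default (no indent), as required.
payload = json.dumps(batch)
assert "\n" not in payload

# The charset must be stated explicitly in the Content-Type header.
headers = {"Content-Type": "application/json; charset=UTF-8"}
```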

Here is a Python example on how to send data.

import requests
import json
from requests.auth import HTTPBasicAuth

user = 'itdb'        # your producer name
password = 'secret'  # the password you were provided with

def send(document):
    return requests.post('https://monit-metrics.cern.ch:10014/itdb',
                         auth=HTTPBasicAuth(user, password),
                         data=json.dumps(document),
                         headers={"Content-Type": "application/json"})

def send_and_check(document, should_fail=False):
    response = send(document)
    assert (response.status_code == 200) != should_fail, \
        'With document: {0}. Status code: {1}. Message: {2}'.format(
            document, response.status_code, response.text)

basic_document = [{
   "producer": "itdb",
   "type_prefix": "raw",
   "type": "dbmetric",
   "hostname": "hostnameA",
   "timestamp": 1483696735836,
   "data": { # Your metric fields
     "foo": "bar"
   },
   "field2": "value" # Another metric
}]

send_and_check(basic_document)

Let us know when you start sending data so we can enable your producer in the infrastructure.

Writing to InfluxDB

If you wish to write your data to InfluxDB you should specify which entities should be treated as tags and which as fields. To do this, define two arrays in every message: idb_tags and idb_fields. Without them, all entities are treated as fields. Note that with InfluxDB the type entity is mandatory and is used to create the measurement.

Remarks:

  • General rule of thumb: tags are used for filtering data, while fields are the values shown in plots.
  • Specifying only idb_tags indicates that all remaining entities should be used as fields.
  • Specifying idb_fields is used either to treat the same entity as both a field and a tag, or to exclude some entities from being sent to InfluxDB (when both idb_tags and idb_fields are set, the remaining entities are not used).
  • You should avoid writing any sort of IDs (as tags) to InfluxDB, as they cause high series cardinality.
  • Strings like descriptions, logs, etc. should not be written to InfluxDB.
  • If you send a nested structure, you may access its entities with dot notation (e.g. struct.field).
  • Fields mentioned at the beginning of the document (producer, type, etc.) are not written to InfluxDB by default, but they may be used in idb_tags and idb_fields.
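
The selection rules above can be illustrated with a small sketch that partitions a document's entities into tags and fields (this logic is only an illustration of the rules, not MONIT's actual implementation):

```python
# Reserved infrastructure fields that are not written to InfluxDB by default.
RESERVED = {"producer", "type", "type_prefix", "timestamp", "_id", "host",
            "idb_tags", "idb_fields"}

def partition(doc):
    """Split a document's entities into InfluxDB tags and fields,
    following the selection rules described above (illustrative only)."""
    entities = {k for k in doc if k not in RESERVED}
    tags = set(doc.get("idb_tags", []))
    if "idb_fields" in doc:
        fields = set(doc["idb_fields"])  # explicit list: remaining entities are dropped
    elif tags:
        fields = entities - tags         # only idb_tags: the remainder become fields
    else:
        fields = entities                # neither array: everything is a field
    return tags, fields

doc = {"producer": "myproducer", "type": "mytype",
       "field1": 42, "mytag1": "CERN_PROD", "both": "available",
       "idb_tags": ["mytag1", "both"], "idb_fields": ["field1", "both"]}
tags, fields = partition(doc)
# "both" is used as both a tag and a field, as in the example below.
```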

Example:

{
  "producer": "myproducer",
  "type": "mytype",
  "field1": 42,
  "mytag1": "CERN_PROD",
  "both": "available",
  "somelog": "05 Mar 2018 11:37:08,812 INFO  [Log-BackgroundWorker-cmetric] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:255)  - Updating checkpoint metadata: logWriteOrderID: 1519993887298, queueSize: 0, queueHead: 40053
",
  "idb_tags": ["mytag1", "both"],
  "idb_fields": ["field1", "both"]
}

Warning

Even though we present this JSON in multiple lines please make sure yours fits in one line, as we don't support multi-line JSON.

Service Availability

The Service Availability (SA) metric is mandatory for all CERN IT Services and represents the general status of a service. The Availability Dashboard provides the historical view of all services' availability.

Service managers can send their service availability data via JSON/HTTP, as described in the section above. Service Availability metrics have to comply with a specific format, described below, and have to be reported at least once per hour for each service. If no metric is received for more than one hour, the service will be flagged as 'unknown'.

Availability format:

  • (mandatory) producer: "myproducer"
  • (mandatory) type: "availability"
  • (mandatory) serviceid: your service id as registered in the SNOW service catalogue
  • (mandatory) service_status: the current service status [available|degraded|unavailable]
  • (optional) timestamp: the event submission time
  • (optional) availabilityinfo: extra information about the service status
  • (optional) availabilitydesc: detailed description of the service
  • (optional) contact: contact email information for the sender of the availability
  • (optional) webpage: url pointing to the service website

If you are not sure which serviceid corresponds to your service, you can refer to this link for the mapping between SE and serviceid.

In summary the document will look something like this:

{
  "producer": "myproducer",
  "type": "availability",
  "serviceid": "arealserviceid",
  "service_status": "available",
  "availabilitydesc": "Indicates availability of this service based on X, Y and Z",
  "availabilityinfo": "100 out of 100 machines happily running",
}
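
Since availability has to be reported at least once per hour, a producer could build the document with a fresh timestamp before each submission. A minimal sketch (the helper name and serviceid are placeholders):

```python
import json
import time

def build_availability_doc(serviceid, status, info=None):
    """Assemble an availability document in the format described above."""
    doc = {
        "producer": "myproducer",
        "type": "availability",
        "serviceid": serviceid,
        "service_status": status,              # available | degraded | unavailable
        "timestamp": int(time.time() * 1000),  # UTC milliseconds, no fractional part
    }
    if info:
        doc["availabilityinfo"] = info
    return doc

doc = build_availability_doc("arealserviceid", "available",
                             "100 out of 100 machines happily running")
payload = json.dumps(doc)  # single line, ready to POST to the HTTP endpoint
```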

Warning

Even though we present this JSON in multiple lines please make sure yours fits in one line, as we don't support multi-line JSON.

Warning

Please note that additional numerical metrics (previously known as 'numerical values') are no longer part of the availability document; they have to be sent as separate JSON documents, as generic JSON/HTTP metrics, following the recipe above.

Notifications

If you wish to receive notifications when your service leaves the "available" status, there are two possible ways to achieve it.

  • Rely on SNOW to handle the notification: please check this KB. SNOW will send an email to the registered "Groups receiving notifications" when your service is created. If your service is already registered, a ticket to the SNOW admins will be needed to change the "Groups receiving notifications".

  • Configure an alarm in Grafana. This gives you better control over the notification flow at the cost of some Grafana work; please check the docs. If you need the datasource configured in your organisation, contact us through SNOW.

Service SLIs/KPIs

Please refer to the dedicated documentation.