Notification Targets
ServiceNow target
It is possible to create ServiceNow incidents from alarms integrated in the MONIT infrastructure.
To enable it, the following parameters can be set (shown with their default values):
- cerncollectd::config::snow_alarms_enabled: true
- cerncollectd::config::snow_fe: populated from local FE fact
- cerncollectd::config::snow_se: the default SE associated to the FE
- cerncollectd::config::snow_assignment_level: 3
- cerncollectd::config::snow_grouping: true (deprecated in favour of hostgroup_grouping)
- cerncollectd::config::snow_hostgroup_grouping: true
- cerncollectd::config::snow_auto_closing: false (only when hostgroup_grouping is false)
- cerncollectd::config::snow_fe_category: undef
- cerncollectd::config::snow_watchlist: undef
These defaults can be overridden per hostgroup or environment using Hiera (see the "Overriding defaults" section below).
Grouping
All alarms are grouped by default. The infrastructure lets you choose whether alarms are grouped by "entity" or by "hostgroup"; use the snow_hostgroup_grouping parameter to control this:
cerncollectd::config::snow_hostgroup_grouping: true
The grouping of alarms follows these rules:
- If there is no "hostgroup_grouping" field, or it is set to false, the events are grouped into an incident by "alarm_name" and "entity".
- If "hostgroup_grouping" is set to true and there is no "submitter_hostgroup", the events are grouped by "alarm_name" and "entity".
- If "hostgroup_grouping" is set to true and there is a "submitter_hostgroup", the events are grouped by "alarm_name" and "submitter_hostgroup".
Auto closing
Incidents created in SNOW can be closed automatically when an OK event is received that matches an open ticket by "alarm_name" and "entity".
cerncollectd::config::snow_auto_closing: true
These events are only sent to SNOW when the "hostgroup_grouping"/"grouping" options are disabled (set to false), since it is not possible to know when a ticket containing multiple entities should be closed.
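Because auto closing depends on the grouping setting, the two parameters are typically set together. A minimal Hiera sketch, using only the parameters listed above; it assumes the deprecated snow_grouping option is not relied on elsewhere in your configuration:

```yaml
# Group by "alarm_name" and "entity", so each incident covers a single entity.
cerncollectd::config::snow_hostgroup_grouping: false
# Close the matching open incident when an OK event is received.
cerncollectd::config::snow_auto_closing: true
```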
Email target
It is possible to generate emails from alarms integrated in the MONIT infrastructure.
To enable it, the following parameters can be set (shown with their default values):
- cerncollectd::config::email_alarms_enabled: false
- cerncollectd::config::email_to: [] - List of email recipients
These defaults can be overridden per hostgroup or environment using Hiera (see the "Overriding defaults" section below).
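As a minimal sketch, Hiera data along these lines would enable the email target and set its recipients. The addresses are placeholders:

```yaml
# Enable email notifications for alarms (the default is false).
cerncollectd::config::email_alarms_enabled: true
# Placeholder recipients; replace with the addresses that should be notified.
cerncollectd::config::email_to:
  - someone@cern.ch
  - someone.else@cern.ch
```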
Disable targets (Roger)
All alarms are shipped with the flag "roger_alarmed", based on the roger state of the machine for the specific alarm type. If an alarm is shipped with "roger_alarmed" set to false, all notification endpoints are ignored and no action is taken (for example, no ticket is created in SNOW).
The flag is determined by the following steps, moving on to the next step only if the previous one failed:
- Check the roger cache file on the host
- Ask roger for the current information
- Check the parameter "alarmed_default" in the cerncollectd::alarms::config class (default: false)
To change the value of the alarmed_default parameter use Hiera and write:
cerncollectd::alarms::config::alarmed_default: true
Overriding defaults
All parameters described in the sections above can be overridden using Hiera.
All alarms
cerncollectd::config::snow_se: 'My SNOW SE'
This configuration will be applied to all the alarms by default.
Single alarm
The best way is to use the custom_targets parameter of the alarm definition. An example for the boot_full alarm would be:
cerncollectd_contrib::alarm::boot_full::custom_targets:
  snow:
    functional_element: "My FE"
  email:
    to:
      - someone@cern.ch
      - someone.else@cern.ch
    send_ok: true
Inside the snow hash, you can set custom values for the following parameters. If omitted, they will be set to the global defaults:
- disabled: true/false
- functional_element
- service_element
- assignment_level
- grouping (deprecated, use hostgroup_grouping)
- hostgroup_grouping
- watchlist
- auto_closing
Inside the email hash, you can set the following parameters:
- disabled: true/false
- to
- send_ok: true/false
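For instance, a sketch that turns off SNOW tickets for a single alarm while still sending emails could look like the following. It reuses the boot_full alarm from the example above; the address is a placeholder:

```yaml
cerncollectd_contrib::alarm::boot_full::custom_targets:
  snow:
    # Skip the SNOW target for this alarm only.
    disabled: true
  email:
    disabled: false
    to:
      - someone@cern.ch
    # Also send an email for OK alarms.
    send_ok: true
```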
Usually that is all you need to do, but there may be cases where this parameter cannot be used, or where you need finer granularity. In these cases, you can use the cerncollectd::alarms::extra resource directly: define a new resource in your Puppet manifest, with the collectd namespace and the target fields to override. The same configuration that is available for the snow integration is available for the email target as well.
::cerncollectd::alarms::extra {'df':
  ctd_namespace => 'df',
  targets => {
    snow => {
      disabled => false,
      watchlist => ['someone@cern.ch','someone.else@cern.ch'],
    },
    email => {
      disabled => false,
      to => ['someone@cern.ch'],
    },
  }
}
::cerncollectd::alarms::extra {'df_root':
  ctd_namespace => 'df_root',
  targets => {
    snow => { functional_element => 'newfe' },
    email => {
      disabled => false,
      to => ['someone@cern.ch','someone.else@cern.ch'],
      send_ok => true,
    },
  }
}
This will generate several configuration files inside "/etc/collectd.d/alarms/":
- df.yaml: contains the specific SNOW configuration for the df plugin.
- df_root.yaml: contains the specific SNOW configuration for the df plugin and root instance.
The priorities are always driven by the granularity of the definition, so more specific definitions will take over more generic ones (in this case: df_root > df > default).
So the result of the example above will be:
- SNOW alarms will be enabled for all the notifications coming from plugin df, and they will be sent to the default FE.
- In the case of df_root, the notifications will be sent to the FE newfe.
This example also shows how to set a list of recipients for the email target, and how to send OK alarms.