.. _delivery_guarantees:

Delivery guarantees
===================

Mechanism
---------

The Notificaties API standard (and by extension Open Notificaties) operates on a simple
yet powerful message delivery mechanism: webhooks_.

A webhook is, in essence, nothing more than an HTTP endpoint exposed by a server, to
which HTTP requests/messages can be sent. Upon receiving such a request, the webhook
receiver is responsible for processing its content appropriately.

Webhooks are registered by parties interested in receiving notifications. The webhook
registration is recorded and saved in Open Notificaties. Whenever (another) party
publishes a notification, it does so by making an ``HTTP POST`` call to the Open
Notificaties API. Open Notificaties, in turn, checks which parties should receive this
notification and forwards the message to the registered webhooks.

.. _webhooks: https://en.wikipedia.org/wiki/Webhook

.. _notifications_flow:

Flow
~~~~

After a notification (or cloudevent) is received via the API, all subscriptions that
need to receive it are fetched and a ``ScheduledNotification`` is created in the
database for each subscription. A background task that runs every
``NOTIFICATION_SEC_INTERVAL`` seconds picks up to ``NOTIFICATION_LIMIT`` scheduled
notifications (see :ref:`installation_env_config` > Celery) and creates tasks that send
each notification to the ``callback_url`` of its subscription. Successful
``ScheduledNotification`` records are removed. Failed ones get updated with an
``execute_after`` timestamp so that they are retried (according to exponential backoff)
until they succeed or the retry limit is reached.

Failure modes
-------------

Even though the mechanism is simple, the underlying infrastructure is not. There is
always a chance that a message does not get properly delivered - a problem that all
*message broker* systems have.


The Notificaties API standard defines that recipients of a message/notification have to
reply with an HTTP 204 status code to confirm that the message was received. However,
to complicate things further, this confirmation response may also be lost.

To summarize, the following scenarios are possible:

* Open Notificaties delivers the message and receives a confirmation (happy flow)
* Open Notificaties delivers the message but does not receive a confirmation (failure mode)
* Open Notificaties fails to deliver the message successfully (failure mode)

There are essentially two mitigation modes available:

* at-most-once delivery
* at-least-once delivery

Delivering a message exactly once is not possible since the underlying infrastructure
("the internet") may fail for whatever reason.

Mitigations
-----------

Open Notificaties operates in "at-least-once" delivery mode. This means that once a
delivery attempt succeeds, no more delivery attempts are made, and whenever no
confirmation is received, another delivery attempt is made.

A failure can take the form of an HTTP status code that does **not** indicate success
(for example, anything other than ``HTTP 200``, ``HTTP 201``, ``HTTP 202`` or
``HTTP 204``), or of a network error such as a connection or timeout error (or anything
else going wrong).

In practice, Open Notificaties will consider any ``2xx`` response status code as
"message is delivered, no further attempts must be made".

.. note::

    Webhook subscribers must be able to handle multiple deliveries of the same message!
    If they received a message correctly but failed to reply with a success response,
    Open Notificaties will deliver the same message again.

Retry mechanism
~~~~~~~~~~~~~~~

By default, sending notifications to subscribers has automatic retry behaviour: if the
notification publishing task fails, it is automatically rescheduled and tried again
until the maximum retry limit has been reached.

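
Because failed or unconfirmed deliveries are retried, a subscriber may receive the same
notification more than once. The following is a minimal sketch of receiver-side
deduplication, not part of Open Notificaties itself. The function names, the in-memory
``seen`` set and the choice to hash the request body as a deduplication key are all
assumptions made for illustration. Note that hashing the body also collapses two
genuinely distinct notifications that happen to have identical content.

.. code-block:: python

    import hashlib
    import json


    def delivery_key(raw_body: bytes) -> str:
        """Derive a deduplication key from the notification body (an assumption)."""
        payload = json.loads(raw_body)
        # Serialize with sorted keys so that key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    def handle_delivery(raw_body: bytes, seen: set, process) -> int:
        """Process a notification at most once and return the HTTP status to send."""
        key = delivery_key(raw_body)
        if key not in seen:
            process(json.loads(raw_body))
            # Record the key only after processing succeeded, so that a failure
            # in ``process`` leads to a retry instead of a lost notification.
            seen.add(key)
        # A repeated delivery is confirmed again without being reprocessed.
        return 204


    # Illustrative payload; the field values are made up for this example.
    processed = []
    seen = set()
    body = b'{"kanaal": "zaken", "actie": "create"}'
    first_status = handle_delivery(body, seen, processed.append)
    second_status = handle_delivery(body, seen, processed.append)  # redelivery

A real receiver would keep the seen keys in durable storage rather than in memory, so
that deduplication survives a restart.
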

Autoretry explanation and configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Retry behaviour is implemented using binary exponential backoff with a delay factor.
The time to wait until the next retry is calculated as follows:

.. math::

    t = \text{backoffFactor} \cdot \text{baseFactor}^c

where :math:`t` is the time in seconds and :math:`c` is the number of retries that have
been performed already. The resulting delay is capped at the configured maximum delay.

This behaviour can be configured using :ref:`setup_configuration` and also via the
admin interface at **Configuratie > Notificatiescomponentconfiguratie**:

* **Notification delivery max retries**: the maximum number of retries the task queue
  will do if sending a notification has failed. Default is ``7``.
* **Notification delivery retry backoff**: a boolean or a number. If this option is set
  to ``True``, autoretries will be delayed following the rules of binary exponential
  backoff. If this option is set to a number, it is used as a delay factor. Default is
  ``25``.
* **Notification delivery retry backoff max**: an integer, specifying a number of
  seconds. If ``Notification delivery retry backoff`` is enabled, this option sets the
  maximum delay in seconds between task autoretries. Default is ``52000`` seconds.
* **Notification delivery base factor**: the base factor used for exponential backoff.
  This can be increased or decreased to spread retries over a longer or shorter time
  period. Default is ``4``.

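
As an illustration, the formula and the maximum delay can be sketched in a few lines of
Python. This is not the Open Notificaties implementation: the function name
``retry_delays`` and its parameter names are hypothetical, and the default values are
simply the defaults listed above.

.. code-block:: python

    from itertools import accumulate


    def retry_delays(max_retries=7, backoff_factor=25, base_factor=4, backoff_max=52000):
        """Return the delay in seconds before each retry, capped at backoff_max."""
        return [
            # c is the number of retries that have been performed already.
            min(backoff_factor * base_factor**c, backoff_max)
            for c in range(max_retries)
        ]


    delays = retry_delays()
    elapsed = list(accumulate(delays))  # running total of the waiting time
    for number, (delay, total) in enumerate(zip(delays, elapsed), start=1):
        print(f"retry {number}: wait {delay}s, total elapsed {total}s")

With the defaults, the uncapped delay for the seventh retry would be
``25 * 4**6 = 102400`` seconds, so the maximum delay of ``52000`` seconds applies there.
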

When using the default configuration, the following delay is added before each retry of
a failing request, where attempt 0 is the initial delivery attempt:

+-------+--------------+---------------+
| #     | Delay added  | Total elapsed |
+=======+==============+===============+
| 0     | 0s           | 0s            |
+-------+--------------+---------------+
| 1     | 25s          | 25s           |
+-------+--------------+---------------+
| 2     | 100s         | 2m 5s         |
+-------+--------------+---------------+
| 3     | 400s         | 8m 45s        |
+-------+--------------+---------------+
| 4     | 1600s        | 35m 25s       |
+-------+--------------+---------------+
| 5     | 6400s        | 2h 22m 5s     |
+-------+--------------+---------------+
| 6     | 25600s       | 9h 28m 45s    |
+-------+--------------+---------------+
| 7     | 52000s       | 23h 55m 25s   |
+-------+--------------+---------------+

So if a subscribed webhook comes back up within just under 24 hours of downtime, the
default configuration can handle it automatically.

.. note::

    Because scheduled notifications are started in batches every X seconds (based on
    ``NOTIFICATION_SEC_INTERVAL``, 20s by default), the notifications will not be
    executed at the exact delay. Scheduled notifications are ordered by their
    ``execute_after`` timestamp and ``attempt`` so that new notifications are
    prioritized. Under high load, depending on the number of queued scheduled
    notifications and on ``NOTIFICATION_LIMIT``, notifications may be sent a few
    minutes later.

Open Notificaties message broker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Under the hood, notifications are distributed by background workers to ensure API
endpoint availability. The *results* and metadata of the background tasks are stored in
Redis_, which is an in-memory key-value store. Redis is also used as a message broker.

Task metadata is important for keeping track of automatic delivery retries, so it is
recommended to set up Redis as highly available and/or persistent storage.

.. _Redis: https://redis.io/