Hazem Azzam

Building a Scalable Multi-Channel Notification Service

A practical architecture for a notification service that delivers across email, SMS, push, and in-app channels. It covers user preferences, rate-limiting, synchronous and batch delivery, queueing with retries, high availability, and the trade-offs between latency, cost, and reliability.

6 min read
Tags: architecture, notifications, distributed-systems, queueing, reliability

Why notifications are harder than they look

A "send a notification" feature starts as one line of code and ends as a distributed system. The moment you support more than one channel, more than a handful of users, and any expectation of reliability, you are building a notification service — with its own data model, queues, retries, and failure modes.

This post walks through the architecture of a scalable notification service that delivers across email, SMS, push, and in-app channels. We cover user preferences, rate-limiting, synchronous vs. batch delivery, queueing with retries, high availability, and the trade-offs between latency, cost, and reliability that shape every decision.

A channel-agnostic core

The first design rule: callers should not know or care how a notification is delivered. A service emitting order.shipped should not contain SMS API keys or email templates. Instead, producers publish a logical notification intent:

{
  "event": "order.shipped",
  "user_id": "u_12345",
  "data": { "order_id": "A-1001", "eta": "2026-06-04" },
  "priority": "normal"
}

The notification service then resolves which channels to use, which template to render, and whether the user even wants it. This separation lets you add a new channel (say, WhatsApp) without touching a single producer.

Internally, the service is built around a few clean abstractions:

  • Notification — the logical intent above.
  • Channel adapter — a uniform interface (send(rendered, recipient)) implemented per provider.
  • Template — channel-specific rendering of the same event.
  • Recipient resolver — maps user_id to concrete addresses (email, phone, device tokens).

End to end, a notification flows through these stages:

producer → intent → [preferences] → [rate-limit] → [template] → queue → channel adapter → provider
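
As a rough illustration of the intent and the recipient resolver, here is a minimal sketch. The class names, the in-memory `_DIRECTORY`, and the `addressable_channels` helper are assumptions made for this example, not part of any particular library; a real resolver would query a user store.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class NotificationIntent:
    """Logical intent published by a producer (mirrors the JSON above)."""
    event: str
    user_id: str
    data: dict[str, Any] = field(default_factory=dict)
    priority: str = "normal"


@dataclass(frozen=True)
class Recipient:
    """Concrete addresses for one user; any of them may be missing."""
    user_id: str
    email: Optional[str] = None
    phone: Optional[str] = None
    device_tokens: tuple[str, ...] = ()

    def addressable_channels(self) -> set[str]:
        # In-app only needs a user_id, so it is always addressable here.
        channels = {"in_app"}
        if self.email:
            channels.add("email")
        if self.phone:
            channels.add("sms")
        if self.device_tokens:
            channels.add("push")
        return channels


# Hypothetical stand-in for a user directory or database lookup.
_DIRECTORY = {
    "u_12345": Recipient("u_12345", email="user@example.com", device_tokens=("tok_a",)),
}


def resolve_recipient(user_id: str) -> Recipient:
    """Map a user_id to known addresses; unknown users get in-app only."""
    return _DIRECTORY.get(user_id, Recipient(user_id))
```

Keeping addressability on the recipient means later stages can intersect it with preferences without knowing anything about providers.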

Modeling the four channels

Each channel has different constraints, and the architecture must respect them rather than flatten them:

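| Channel | Latency expectation | Cost      | Failure mode                      |
|---------|---------------------|-----------|-----------------------------------|
| In-app  | Instant             | Near-zero | User offline (store + show later) |
| Push    | Seconds             | Cheap     | Stale device token, silent drop   |
| Email   | Seconds–minutes     | Cheap     | Bounce, spam folder, deferral     |
| SMS     | Seconds             | Expensive | Carrier rejection, invalid number |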

Each channel adapter implements the same interface but encapsulates provider quirks — APNs/FCM for push, an SMTP or transactional-email API for email, a carrier gateway for SMS, and a database write plus websocket push for in-app.

class ChannelAdapter(Protocol):
    async def send(self, message: RenderedMessage, recipient: Recipient) -> DeliveryResult:
        ...

A DeliveryResult is normalized: delivered, retryable_failure, or permanent_failure. The orchestration layer never needs provider-specific error handling — the adapter translates carrier-speak into those three outcomes.
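
One way to picture that translation is the sketch below. It assumes a hypothetical HTTP-based provider and an illustrative status-code mapping; `classify_http_status` and `FakeSmsAdapter` are names invented for this example, and real providers often need finer-grained rules based on error bodies.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class DeliveryStatus(str, Enum):
    # str-valued so comparisons like status == "delivered" still work.
    DELIVERED = "delivered"
    RETRYABLE_FAILURE = "retryable_failure"
    PERMANENT_FAILURE = "permanent_failure"


@dataclass(frozen=True)
class DeliveryResult:
    status: DeliveryStatus
    reason: str = ""


def classify_http_status(code: int) -> DeliveryResult:
    """Map a provider's HTTP status code to one of the three outcomes."""
    if 200 <= code < 300:
        return DeliveryResult(DeliveryStatus.DELIVERED)
    # Timeouts, throttling, and server errors are worth retrying.
    if code in (408, 429) or code >= 500:
        return DeliveryResult(DeliveryStatus.RETRYABLE_FAILURE, f"http_{code}")
    # Remaining 4xx responses usually mean the request itself is bad.
    return DeliveryResult(DeliveryStatus.PERMANENT_FAILURE, f"http_{code}")


class FakeSmsAdapter:
    """Test double that always answers with a fixed status code."""

    def __init__(self, status_code: int) -> None:
        self._status_code = status_code

    async def send(self, message: str, recipient: str) -> DeliveryResult:
        return classify_http_status(self._status_code)


result = asyncio.run(FakeSmsAdapter(503).send("Your order shipped", "+15550100"))
```

The orchestration layer only ever branches on `result.status`, so swapping the provider changes the mapping function and nothing else.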

User preferences come first

Before anything is rendered or queued, the service consults preferences. This is both a product requirement (users hate spam) and a cost lever (you don't pay for SMS nobody wants).

A preference model usually has three layers:

  1. Category opt-in — does the user want marketing, transactional, or security notifications?
  2. Channel preference — for this category, which channels are allowed?
  3. Quiet hours / frequency caps — time windows and digest preferences.

A stored preference record might look like this:

{
  "user_id": "u_12345",
  "categories": {
    "transactional": { "email": true, "push": true, "sms": false },
    "marketing":     { "email": true, "push": false, "sms": false }
  },
  "quiet_hours": { "tz": "Africa/Cairo", "from": "22:00", "to": "08:00" }
}

A critical rule: transactional and security notifications usually bypass marketing opt-outs but still respect channel-level addressability (you can't SMS a user with no phone number). Encode this as policy, not scattered if statements.
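
A minimal sketch of such a policy follows. The `MANDATORY_CATEGORIES` set, the fallback behaviour when a mandatory category has no enabled channel, and the function names are assumptions for illustration; the exact rules are a product decision.

```python
from datetime import time

# Assumption: these categories cannot be fully switched off by the user.
MANDATORY_CATEGORIES = {"transactional", "security"}


def allowed_channels(prefs: dict, category: str, addressable: set[str]) -> set[str]:
    """Return the channels to use for one category, as a single policy."""
    channel_prefs = prefs.get("categories", {}).get(category)
    if channel_prefs is None:
        # No stored preference: mandatory categories default to everything
        # addressable, optional categories default to nothing.
        wanted = set(addressable) if category in MANDATORY_CATEGORIES else set()
    else:
        wanted = {ch for ch, enabled in channel_prefs.items() if enabled}
        if category in MANDATORY_CATEGORIES and not wanted:
            wanted = set(addressable)
    # Addressability always wins: no phone number means no SMS.
    return wanted & set(addressable)


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` (user-local time) falls inside the quiet window."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 08:00.
    return now >= start or now < end


prefs = {
    "user_id": "u_12345",
    "categories": {
        "transactional": {"email": True, "push": True, "sms": False},
        "marketing": {"email": True, "push": False, "sms": False},
    },
}
```

Because the rule lives in one function, changing what "mandatory" means is a one-line edit rather than a hunt through producers.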

Rate-limiting and deduplication

Rate-limiting protects three things: the user's attention, your provider quotas, and your bill. There are several limits worth enforcing at distinct scopes:

  • Per-user, per-category — e.g. at most 5 marketing pushes/day.
  • Per-event dedup — collapse duplicate order.shipped events fired twice within a window.
  • Global per-provider — stay under the carrier's throughput ceiling.

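Per-user limits are cheap to enforce in Redis. One option is the CL.THROTTLE command from the redis-cell module, which implements GCRA, a close relative of the token bucket. It returns an array whose first element is 0 when the action is allowed and 1 when it is limited:

key    = ratelimit:{user_id}:{category}
result = redis.call('CL.THROTTLE', key, max_burst, count, period)
allow  = (result[1] == 0)   -- Lua arrays are 1-indexed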
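
To make the idea concrete without a Redis dependency, here is a small in-memory token bucket. It is a sketch of the general technique, not of what Redis does internally; the class name and the explicit `now` argument (used instead of a wall clock so behaviour is deterministic) are choices made for this example.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per (user_id, category), e.g. a burst of 2 and 1 token per second.
bucket = TokenBucket(capacity=2, refill_per_sec=1.0, now=0.0)
```

In a multi-worker deployment the bucket state has to live in shared storage, which is why the check is usually pushed into Redis.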

Deduplication uses an idempotency key derived from (event, user_id, business_id) stored with a TTL. If the key already exists, the notification is dropped before it ever reaches a queue — saving compute and money.
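
The dedup check can be sketched as follows. The hashing scheme and the in-memory `DedupStore` are assumptions for illustration; with Redis, the same check-and-set is commonly done atomically with `SET key value NX EX ttl`.

```python
import hashlib


def idempotency_key(event: str, user_id: str, business_id: str) -> str:
    """Derive a stable key from the fields that define 'the same' notification."""
    raw = f"{event}|{user_id}|{business_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DedupStore:
    """In-memory stand-in for a key-value store with per-key TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._expires_at: dict[str, float] = {}

    def first_seen(self, key: str, now: float) -> bool:
        """Record the key and return True only if it was not already live."""
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: drop it
        self._expires_at[key] = now + self.ttl_seconds
        return True


store = DedupStore(ttl_seconds=60)
key = idempotency_key("order.shipped", "u_12345", "A-1001")
```

A caller would enqueue the notification only when `store.first_seen(key, now)` returns True.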

Synchronous vs. batch delivery

Not every notification has the same urgency, and treating them uniformly wastes either latency or money.

Synchronous (real-time) path. Security codes, password resets, and OTPs must go out now. These skip batching, take a high-priority queue, and accept higher per-message cost for low latency. The producer may even await a delivery acknowledgment.

Batch path. Digests, newsletters, and non-urgent updates are accumulated and flushed on a schedule. Batching can cut cost in two ways: some email providers offer lower per-message rates at volume, and a daily digest of 20 events becomes one email instead of twenty.

priority = high  → low-latency queue → immediate worker
priority = bulk  → aggregation window → digest builder → batch send

The decision is driven by the notification's priority and category, resolved by policy — not hard-coded per producer.
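
A routing policy of this kind might look like the sketch below. The path names (`realtime`, `standard`, `batch`), the category sets, and the digest grouping are illustrative assumptions rather than a fixed scheme.

```python
from collections import defaultdict

# Assumption: security notifications are always urgent, whatever the producer says.
REALTIME_CATEGORIES = {"security"}
BATCH_CATEGORIES = {"marketing"}


def choose_path(category: str, priority: str) -> str:
    """Pick a delivery path from category and priority, in one place."""
    if category in REALTIME_CATEGORIES or priority == "high":
        return "realtime"
    if category in BATCH_CATEGORIES or priority == "bulk":
        return "batch"
    return "standard"


def build_digests(pending: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (user_id, summary) pairs so each user gets one digest."""
    by_user: dict[str, list[str]] = defaultdict(list)
    for user_id, summary in pending:
        by_user[user_id].append(summary)
    return dict(by_user)


digests = build_digests([("u1", "Order shipped"), ("u2", "New follower"), ("u1", "Review posted")])
```

Checking the urgent rule first means a mislabelled `bulk` priority cannot delay a security message.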

Queueing with retries

Queues are the backbone of reliability. Putting a durable queue (Kafka, SQS, RabbitMQ, or Redis Streams) between the API and the channel workers decouples accepting a notification from delivering it. The API can return 202 Accepted in milliseconds; delivery happens asynchronously.
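
The accept-then-deliver split can be illustrated with a toy example. Note the loud assumption: `asyncio.Queue` is an in-process, non-durable stand-in used only to show the shape of the flow; a production system would put one of the durable brokers named above in its place. The function names are hypothetical.

```python
import asyncio


async def accept(queue: asyncio.Queue, intent: dict) -> int:
    """API side: enqueue and return immediately with 202 Accepted."""
    await queue.put(intent)
    return 202


async def worker(queue: asyncio.Queue, handled: list) -> None:
    """Worker side: deliver asynchronously, independent of the API call."""
    while True:
        intent = await queue.get()
        try:
            handled.append(intent["event"])  # placeholder for real delivery
        finally:
            queue.task_done()


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue(maxsize=1000)
    handled: list = []
    task = asyncio.create_task(worker(queue, handled))
    status = await accept(queue, {"event": "order.shipped", "user_id": "u_12345"})
    await queue.join()  # wait until the worker has drained the queue
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return status, handled
```

The bounded `maxsize` is a simple form of backpressure: when workers fall behind, `accept` waits instead of growing memory without limit.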

Retries are where most naive implementations fail. The rules that matter:

  • Exponential backoff with jitter — retry at 1s, 2s, 4s, 8s… plus randomness to avoid thundering herds.
  • Bounded attempts — give up after N tries and route to a dead-letter queue (DLQ) for inspection.
  • Distinguish retryable vs. permanent — a 500 from the SMS gateway is retryable; an "invalid phone number" is not. Retrying permanent failures just burns money.
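  • Idempotent sends — pass an idempotency key to the provider so a retry after a timeout doesn't double-send.

import asyncio

async def deliver(msg):
    # Try up to MAX_ATTEMPTS times, backing off between attempts.
    for attempt in range(MAX_ATTEMPTS):
        result = await adapter.send(msg.rendered, msg.recipient)
        if result.status == "delivered":
            return record_success(msg)
        if result.status == "permanent_failure":
            # Retrying cannot help; park the message for inspection.
            return dead_letter(msg, result.reason)
        if attempt < MAX_ATTEMPTS - 1:
            # Retryable failure: wait, but not after the final attempt.
            await asyncio.sleep(backoff(attempt) + jitter())
    return dead_letter(msg, "max_attempts_exhausted")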
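
The `backoff` and `jitter` helpers used in the loop are not defined in the snippet. A plausible sketch, assuming a 1-second base, a 60-second cap, and up to half a second of random jitter (all three constants are illustrative):

```python
import random

BASE_DELAY = 1.0   # seconds before the first retry
MAX_DELAY = 60.0   # cap so late attempts do not wait unboundedly
MAX_JITTER = 0.5   # upper bound of the random component


def backoff(attempt: int) -> float:
    """Exponential delay: 1s, 2s, 4s, 8s, ... capped at MAX_DELAY."""
    return min(MAX_DELAY, BASE_DELAY * (2 ** attempt))


def jitter() -> float:
    """Small random offset so many workers do not retry in lockstep."""
    return random.uniform(0.0, MAX_JITTER)


delays = [backoff(a) for a in range(4)]
```

Some systems randomize the whole delay rather than adding a small offset, which spreads retries more widely at the cost of less predictable timing.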

The DLQ is not a graveyard — it's an operational signal. A spike in dead-lettered SMS often means a misconfigured number format or a carrier outage, and it should page someone.

High availability

A notification service is frequently on the critical path for security flows, so availability matters. Several patterns keep it up:

  • Stateless workers — scale horizontally; any worker can process any queue message.
  • Multi-region queues and DBs — survive a regional outage; replicate preference data.
  • Provider failover — keep a secondary email/SMS provider and fail over when the primary's error rate crosses a threshold. This is where a normalized DeliveryResult pays off again.
  • Circuit breakers — when a provider is clearly down, stop hammering it; shed to the secondary or pause that channel and let the queue absorb the backlog.
  • Backpressure — if workers can't keep up, the durable queue buffers rather than dropping. Monitor queue depth as a leading indicator.

Crucially, degrade gracefully: if push is down, an in-app notification still lands, and the user sees it next time they open the app. No single channel failure should lose the message.
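
Provider failover and circuit breaking can be combined in a few lines. The sketch below assumes a consecutive-failure threshold and a fixed cooldown; the class, its parameters, and `pick_provider` are hypothetical names, and the clock is passed in explicitly to keep the behaviour deterministic.

```python
from typing import Optional


class CircuitBreaker:
    """Opens after N consecutive failures, allows a trial after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown passes, then let a trial through.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # (re)open and restart the cooldown


def pick_provider(providers: list, breakers: dict, now: float) -> Optional[str]:
    """Return the first provider, in preference order, whose breaker allows traffic."""
    for name in providers:
        if breakers[name].allow_request(now):
            return name
    return None  # all providers down: leave the message queued
```

Returning `None` rather than raising lets the caller pause the channel and rely on the queue to hold the backlog.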

The core trade-off: latency vs. cost vs. reliability

Almost every decision in this system is a point on a triangle:

  • Latency. Synchronous delivery, dedicated high-priority queues, and provider redundancy reduce latency — at higher cost.
  • Cost. Batching, digesting, deduplication, and aggressive preference filtering cut cost — at the price of latency or completeness.
  • Reliability. Retries, DLQs, multi-provider failover, and replication raise reliability — at the cost of complexity and infrastructure spend.

You don't pick one globally; you pick per notification class:

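| Class            | Latency | Cost   | Reliability |
|------------------|---------|--------|-------------|
| OTP / security   | Lowest  | Accept | Highest     |
| Transactional    | Low     | Medium | High        |
| Marketing/digest | Relaxed | Lowest | Best-effort |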

Encoding this as policy keyed on category and priority is what keeps the service both cheap and trustworthy — the OTP path spends what it must, and the newsletter path spends as little as possible.
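
As one possible encoding, the table can become a small lookup of per-class settings. The field names and every number below are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPolicy:
    queue: str            # which queue the message is routed to
    max_attempts: int     # retry budget before dead-lettering
    allow_failover: bool  # may use the secondary provider
    batchable: bool       # may be folded into a digest


POLICIES = {
    "security": DeliveryPolicy(queue="high", max_attempts=8, allow_failover=True, batchable=False),
    "transactional": DeliveryPolicy(queue="normal", max_attempts=5, allow_failover=True, batchable=False),
    "marketing": DeliveryPolicy(queue="bulk", max_attempts=2, allow_failover=False, batchable=True),
}


def policy_for(category: str) -> DeliveryPolicy:
    """Look up the policy; unknown categories fall back to transactional."""
    return POLICIES.get(category, POLICIES["transactional"])
```

Tightening a budget then means editing one row, for example lowering `max_attempts` for marketing, without touching producers or workers.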

Putting it together

A scalable notification service is, at its heart, a small pipeline with strong contracts:

  1. Accept a logical intent and return fast.
  2. Resolve preferences, addressability, and rate limits.
  3. Render per channel and route by priority.
  4. Deliver through durable queues with bounded, idempotent retries.
  5. Observe everything — delivery rates, DLQ depth, provider error rates, cost per channel.

Start with the channel-agnostic core and the preference model; those two decisions determine how cleanly everything else composes. The queues, retries, and failover are well-understood patterns — the hard part is keeping the boundaries clean so that adding a channel, swapping a provider, or tightening a budget is a localized change, not a rewrite.

