Building Real-Time Messaging Infrastructure: How CloudTxt Handles 10 Million Messages

The Scale Challenge

CloudTxt is a messaging infrastructure platform that powers communication for businesses at scale. When I joined the project, it was handling about 100,000 messages per day. By the time I was done, it was processing over 10 million. The journey from 100K to 10M was not just about adding more servers - it required fundamental rethinking of the architecture at multiple levels.

Message Flow Architecture

At its core, a messaging system needs to do three things reliably: accept messages, process them (routing, transformation, delivery), and confirm delivery. The challenge is doing all three with low latency and high throughput simultaneously.

bash

// High-level message flow

Client -> API Gateway (rate limiting, auth)
  -> Message Queue (SQS FIFO for ordering guarantees)
    -> Processing Workers (Lambda fleet)
      -> Routing Engine (determine delivery channel)
        -> Channel Adapters (SMS via Twilio, Email via SES, Push via FCM)
          -> Delivery Confirmation
            -> Webhook Notification to Client

// Parallel flows:
Message -> Analytics Pipeline (Kinesis -> S3 -> Athena)
Message -> Compliance Logger (encrypted, immutable audit trail)

Strategy 1: Queue-Based Load Leveling

The most important architectural pattern for messaging at scale is queue-based load leveling. Instead of processing messages synchronously through the API, we immediately acknowledge receipt, push the message onto a queue, and process it asynchronously.

This decouples ingestion throughput from processing throughput. During peak hours, when message volume spikes 5-10x above baseline, the queue absorbs the burst while workers process at a steady, sustainable rate. The client sees consistent sub-100ms API response times regardless of backend load.

Strategy 2: Partitioned Processing

Not all messages are equal. A transactional message like a password reset needs to be delivered within seconds. A marketing message can tolerate minutes of delay. We partition messages into priority queues:

Critical Queue: OTPs, password resets, security alerts. Dedicated worker pool with auto-scaling based on queue depth. Target latency: under 5 seconds.

Standard Queue: Notifications, updates, confirmations. Shared worker pool with batch processing. Target latency: under 60 seconds.

Bulk Queue: Marketing messages, newsletters, announcements. Throttled processing with configurable send rates. Target latency: under 30 minutes.

Strategy 3: Idempotent Message Processing

In distributed systems, messages can be delivered more than once. A worker might process a message, crash before acknowledging it, and the queue delivers it again. Without idempotency, the user receives duplicate messages.

python

class IdempotentMessageProcessor:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def process(self, message):
        # Generate idempotency key from message content
        idempotency_key = f"msg:{message.id}:{message.attempt}"

        # Try to set the key with NX (only if not exists) and TTL
        acquired = await self.redis.set(
            idempotency_key, "processing",
            nx=True,  # Only set if not exists
            ex=3600,  # Expire after 1 hour
        )

        if not acquired:
            # Another worker already processed this message
            logger.info(f"Duplicate message detected: {message.id}")
            return

        try:
            result = await self._deliver(message)
            await self.redis.set(idempotency_key, "delivered", ex=86400)
            return result
        except Exception as e:
            await self.redis.delete(idempotency_key)
            raise

Strategy 4: Connection Pooling and Keep-Alive

When sending millions of messages through third-party APIs like Twilio and AWS SES, connection management becomes critical. Each new HTTPS connection requires a TCP handshake and TLS negotiation - roughly 100ms of overhead. At 10 million messages, that is 277 hours of wasted time just on connection setup.

We implemented aggressive connection pooling with HTTP keep-alive, maintaining warm connections to each provider. This alone reduced our P95 delivery latency by 40%.

Strategy 5: Real-Time Monitoring Dashboard

When you process 10 million messages per day, you need visibility into the system in real-time. We built a custom monitoring dashboard using DataDog that tracks:

Messages per second by channel (SMS, email, push) with anomaly detection

Delivery success rate per provider with automatic failover alerts

Queue depth trends with predictive scaling triggers

Cost per message by channel for financial forecasting

End-to-end latency percentiles (P50, P95, P99) with SLA breach alerts

The Numbers That Matter

Peak throughput: 2,400 messages per second sustained

Average end-to-end latency: 2.3 seconds (from API call to provider delivery)

Delivery success rate: 99.7% (remaining 0.3% are invalid phone numbers or emails)

Monthly infrastructure cost: Approximately $0.001 per message at scale

Zero message loss in 18 months of production operation

Building messaging infrastructure taught me that distributed systems are all about handling failure gracefully. The happy path is easy - it is the edge cases, the retries, the deduplication, and the monitoring that make the difference between a system that works in demos and one that works in production.

Building Real-Time Messaging Infrastructure: How CloudTxt Handles 10 Million Messages

Table of Contents

The Scale Challenge

Message Flow Architecture

Strategy 1: Queue-Based Load Leveling

Strategy 2: Partitioned Processing

Strategy 3: Idempotent Message Processing

Strategy 4: Connection Pooling and Keep-Alive

Strategy 5: Real-Time Monitoring Dashboard

The Numbers That Matter

Stay Updated

About the Author

More Articles

Astro vs Next.js in 2026: Choosing the Right Framework for Content-Driven Sites

Building Mobile Apps for Education: UX Lessons from Marathon Kids and LivingTree