Cost-Aware Hybrid Data Layer on AWS

Balancing reliability, analytics, and infra spend

Problem

The client's support and business teams needed SQL reporting on webhook logs, but the same database was also handling live webhook delivery, which could tolerate no downtime. One bad analytics query or a reporting spike could slow down or break the webhook forwarding that customers actually depended on. Increasing RDS resiliency would have meant running it in more Availability Zones, and placing the server alongside it in the VPC would have required an expensive NAT Gateway in each AZ.

Constraints

  • Delivery path had stricter uptime requirements than analytics queries.
  • Support and business teams still needed SQL-friendly reporting in Metabase.
  • Infra choices had to stay cost-aware at projected webhook volume.

Solution

  • Kept operational routing data in DynamoDB for high-availability webhook processing.
  • Added an SQS batch queue and Lambda writer to push delivery logs into PostgreSQL (RDS) in batches.
  • Treated RDS as analytics storage rather than a hard dependency in the forwarding hot path.
  • Accepted a slightly more complex topology to isolate webhook delivery from analytics workload patterns while also saving costs on NAT Gateways.
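The batch-write path above can be sketched as a small Lambda handler. This is a minimal sketch under assumptions: the table name, column names, and the `ANALYTICS_DSN` environment variable are hypothetical, not the client's actual schema or configuration.

```python
import json
import os

# Hypothetical table and columns; the real schema lives in the client's RDS instance.
INSERT_SQL = (
    "INSERT INTO webhook_delivery_logs "
    "(delivery_id, endpoint, status_code, delivered_at) "
    "VALUES (%s, %s, %s, %s)"
)

def rows_from_sqs_event(event):
    """Flatten an SQS batch event into tuples ready for executemany."""
    rows = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        rows.append((
            body["delivery_id"],
            body["endpoint"],
            body["status_code"],
            body["delivered_at"],
        ))
    return rows

def handler(event, context):
    """Lambda entry point: one batched INSERT per SQS poll, not per log line."""
    rows = rows_from_sqs_event(event)
    if not rows:
        return {"written": 0}
    import psycopg2  # deferred import so the parsing logic stays unit-testable
    conn = psycopg2.connect(os.environ["ANALYTICS_DSN"])
    try:
        with conn, conn.cursor() as cur:
            cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
    return {"written": len(rows)}
```

The key property is that the delivery hot path only enqueues to SQS; if RDS is down, messages simply accumulate in the queue instead of failing webhook forwarding.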

Outcome

  • Preserved webhook forwarding reliability even when analytics storage was degraded.
  • Gave business/support teams near-real-time reporting (about one-minute batch latency).
  • Produced explicit cost visibility with monthly service estimates for ops planning.
  • Created clear boundaries so delivery and reporting can be tuned independently.
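The roughly one-minute reporting latency falls out of the SQS event source mapping's batching window. A hypothetical SAM-style fragment (resource names are illustrative, not from the actual deployment) might look like:

```yaml
# Hypothetical AWS SAM resource; names are illustrative.
LogWriterFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: writer.handler
    Events:
      DeliveryLogs:
        Type: SQS
        Properties:
          Queue: !GetAtt DeliveryLogQueue.Arn
          BatchSize: 100                       # up to 100 messages per invocation
          MaximumBatchingWindowInSeconds: 60   # flush at least once a minute
```

Tuning `BatchSize` and `MaximumBatchingWindowInSeconds` trades reporting freshness against Lambda invocation and RDS write costs, independently of the delivery path.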

Trade-off

In hindsight, an early-stage version could have started with a single database and evolved later. But the single-database approach also costs more here: the server would need to sit inside the VPC with RDS (losing direct internet access) and would therefore require a NAT Gateway in multiple AZs. Given this project's reliability requirements and cost constraints, I chose separation sooner.

Stack

  • AWS DynamoDB
  • AWS SQS
  • AWS Lambda
  • AWS RDS (PostgreSQL)
  • Metabase