February 10, 2025
infrastructure · docker · hetzner · devops · full-stack

One Server, One Repo, Multiple Full-Stack Apps

A practical, opinionated guide to building a multi-site Docker infrastructure on Hetzner (with Traefik, PostgreSQL, Cloudflare R2, and zero-downtime deploys).

This morning I saw @rtwlz's post about Jmail hitting 450 million pageviews and their Vercel bill exploding to over $46,000, even after heavy cache mitigation.

Screenshot of Riley Walz's X post about Jmail's Vercel bill and Guillermo Rauch's reply

I didn't have a clear plan in mind, but I was in an exploratory phase. I tested many small things quickly, rebuilding from scratch each time and trying different services and stacks. Eventually I found a setup that works for me for a bunch of reasons.

Given the tools we have as software engineers, managing everything through a single repo makes a lot of sense, so let's try it. I want to build momentum and a setup that wastes less time testing ideas, doesn't hold me back with hypothetical costs, and lets me learn new things along the way.

Here's an initial pass at what I've landed on.

The Problem

Let's start with what we're actually solving. I have five independent web applications. They share the same tech stack and a lot of UI components, but they serve different purposes and live on different domains. Each one needs:

  • A Next.js frontend (SSR, so it's not just static files)
  • A FastAPI backend (Python, async, REST API)
  • A PostgreSQL database
  • Authentication (via Clerk)
  • SSL certificates (automatic, no manual renewal)

On top of the per-site stuff, I need shared infrastructure:

  • Object storage for file uploads and media (S3-compatible)
  • Redis as a cache layer
  • Monitoring to know when something breaks at 3 AM
  • Centralized logging to debug issues without SSH-ing into the server

And it all needs to fit in a €50/month budget.


Why Hetzner

I evaluated AWS, DigitalOcean, and Hetzner, and the choice was immediate: for price-to-quality today, Hetzner is the best VPS option.

Hetzner gives you significantly more compute per euro, and their cloud platform is simple without being simplistic.

For context, here's what €29/month gets you on Hetzner (CPX41):

Resource    Spec
vCPU        8 (AMD EPYC)
RAM         16 GB
Storage     240 GB NVMe SSD
Traffic     20 TB/month
Location    EU (Falkenstein, Nuremberg, or Helsinki)

Try getting 8 vCPU and 16 GB RAM on AWS for €29/month. I'll wait.

Add a Hetzner Storage Box (100 GB for €3.81/month) for off-server backups, and the total infrastructure cost is roughly €33/month. That leaves comfortable headroom within the €50 budget for growth.


Architecture Overview

Here's the full picture:

[Diagram: architecture overview]

Everything inside the server boundary is a Docker container. The entire stack is defined in a single docker-compose.yml, orchestrated by one tool, and deployed from one repository. R2 and the Storage Box are external managed services.

Let me walk through each decision.


Docker Compose

Docker Compose gives you:

  • Declarative configuration: your entire stack is a YAML file
  • Single-command deploys: docker compose up -d
  • Service discovery: containers talk to each other by name
  • Health checks and restart policies: containers recover from crashes
  • Resource limits: you can cap CPU and memory per container

What it doesn't give you is auto-scaling, multi-node clustering, and self-healing across servers. You don't need any of that. If you outgrow a single server, you can migrate to Docker Swarm (which uses the same compose file format) or Kubernetes later. But you probably won't need to for a long time.
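
To make those points concrete, here's a trimmed sketch of what one slice of the compose file can look like (image names, credentials, and limits are placeholders, not the post's actual file):

# docker-compose.yml (trimmed): one site's backend plus the shared database
services:
  site-1-backend:
    image: ghcr.io/your-org/site-1-backend:latest     # placeholder image
    restart: unless-stopped                           # recover from crashes
    environment:
      # "postgres" resolves via Docker's built-in service discovery
      DATABASE_URL: postgresql+asyncpg://site1:${SITE1_DB_PASSWORD}@postgres:5432/site1
    mem_limit: 512m                                   # cap memory per container
    cpus: 0.5                                         # cap CPU per container
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata: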


Traefik: The Traffic Router

My first real discovery from this exercise. Traefik has been around since 2016, yet everyone still defaults to Nginx or Caddy. It deserves more attention.

Traefik GitHub star history showing steady growth since 2016

This is the first piece that might be unfamiliar. You have one server with one public IP address, but five domains pointing to it. Something needs to inspect incoming requests and route them to the correct container. That's what a reverse proxy does.

I evaluated three options:

Feature          Traefik                          Caddy                Nginx
Docker-native    Yes, auto-discovers containers   No, manual config    No, manual config
Auto SSL         Yes (Let's Encrypt)              Yes                  No (needs Certbot)
Add a new site   Add labels to container          Edit Caddyfile       Edit nginx.conf + restart
Hot reload       Yes                              Yes                  Requires restart
Learning curve   Medium                           Easy                 Easy but more manual

I chose Traefik because of Docker-native auto-discovery. When I add a sixth site, I don't edit any proxy configuration. I just add Docker labels to the new container:

labels:
  - "traefik.http.routers.site6.rule=Host(`site6.com`)"
  - "traefik.http.routers.site6.tls.certresolver=letsencrypt"

Traefik sees the new container, generates an SSL certificate, and starts routing traffic. No config files to edit, no service to restart. This is especially valuable when you're managing multiple sites: the proxy configuration scales automatically with your stack.
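
For reference, the Traefik side is a short static config. Here's a sketch of infra/traefik/traefik.yml (email and paths are placeholders; with exposedByDefault set to false, each routed container also needs a traefik.enable=true label):

# infra/traefik/traefik.yml (static configuration)
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

providers:
  docker:
    exposedByDefault: false    # only containers that opt in via labels get routed

certificatesResolvers:
  letsencrypt:
    acme:
      email: you@example.com          # placeholder
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web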

Here's how the routing works:

[Diagram: request routing through Traefik]

Traefik also handles zero-downtime deploys. When you update a container, Traefik waits for the new one to pass its health check before routing traffic to it. The old container keeps serving requests until the handoff is complete.
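
For that handoff to work, each service needs a Docker health check to report against. A minimal sketch (the /health endpoint and curl being present in the image are assumptions):

# fragment of a service definition: Docker reports health, Traefik respects it
site-1-backend:
  healthcheck:
    test: ["CMD", "curl", "-fsS", "http://localhost:8000/health"]   # assumes a /health route and curl in the image
    interval: 10s
    timeout: 3s
    retries: 3
    start_period: 15s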

Caddy is a perfectly valid alternative. It has a simpler configuration syntax and is easier to learn. If you prefer explicit configuration files over Docker magic, go with Caddy. I just find that auto-discovery reduces operational friction when managing multiple sites.

Nginx is the most battle-tested option, but it requires more manual work: separate Certbot setup for SSL, manual config editing for each site, and a restart/reload for changes. For high-traffic, static-heavy workloads, Nginx is unbeatable. For a multi-site Docker stack, it's unnecessary friction.

PostgreSQL: One Instance, Multiple Databases

Instead of running five separate database containers, a single PostgreSQL instance with five separate databases uses about 1–2 GB of RAM total and is easier to back up, monitor, and maintain.

Each site has its own database, its own credentials, and its own connection string. There's zero data leakage between sites. From the application's perspective, it's indistinguishable from having dedicated instances.
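
Provisioning those is a one-time script run against the instance (for example dropped into the postgres image's /docker-entrypoint-initdb.d/ so it runs on first start). Names and passwords here are placeholders:

-- one role + one database per site, owned by that role
CREATE ROLE site1 LOGIN PASSWORD 'change-me';
CREATE DATABASE site1 OWNER site1;

CREATE ROLE site2 LOGIN PASSWORD 'change-me';
CREATE DATABASE site2 OWNER site2;

-- each app connects only with its own credentials, e.g.
-- postgresql://site1:change-me@postgres:5432/site1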

The tradeoff: if PostgreSQL crashes, all five sites go down. This is acceptable for low-traffic applications with a single operator. PostgreSQL is extremely stable: it's more likely that your application code crashes than Postgres does. And if you outgrow this model, splitting into separate instances later is a straightforward migration.

Migrations with Alembic

Since the backend is FastAPI with SQLAlchemy, Alembic is the natural choice for database migrations. Each site manages its own migrations independently:

apps/site-1/backend/
├── alembic/
│   ├── versions/
│   │   ├── 001_create_users.py
│   │   └── 002_add_orders.py
│   └── env.py
└── alembic.ini

The workflow is simple:

  1. Change your SQLAlchemy models
  2. Run alembic revision --autogenerate -m "add email column"
  3. Alembic diffs your models against the database and generates the migration
  4. On deploy, alembic upgrade head applies pending migrations

I considered language-agnostic tools like dbmate or golang-migrate (which use raw SQL files instead of auto-generation). These make sense if you're mixing backend languages. Since I'm committing to FastAPI, Alembic's auto-generation from SQLAlchemy models saves significant time and catches schema drift automatically.

Migrations run as part of the deployment pipeline, right before the new container starts serving traffic. A pg_dump backup runs before every migration, uploaded to the Storage Box. If a migration goes wrong, rollback is a restore away.
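
A sketch of that deploy step, assuming the compose service names used elsewhere in this post and an Alembic-equipped backend image:

# pre-migration backup for the affected site (paths and names are illustrative)
docker compose exec -T postgres pg_dump -U site1 -d site1 | gzip > backups/site1-pre-migrate-$(date +%F-%H%M).sql.gz

# apply pending migrations, then roll the new container in
docker compose run --rm site-1-backend alembic upgrade head
docker compose up -d --wait site-1-backend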

Redis: Cache Layer

Redis serves as an application cache: session data, rate limiting, frequently accessed queries. Not needed on day one, but when you need caching, Redis is ready.

Instead of spinning up multiple Redis containers, use a single instance with database numbers to logically separate concerns:

redis://redis:6379/0  →  Application cache (site-1)
redis://redis:6379/1  →  Application cache (site-2)
redis://redis:6379/2  →  Application cache (site-3)
...

Redis supports 16 databases by default (0–15). They share the same instance but are completely isolated from each other. Memory footprint: roughly 50–100 MB. Negligible.
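
In FastAPI, each site just points its client at its own database number. A sketch with redis-py (the key scheme and load_profile_from_db helper are hypothetical):

import redis.asyncio as redis

# site-1 uses database 0; other sites only change the db number
cache = redis.Redis(host="redis", port=6379, db=0, decode_responses=True)

async def get_profile(user_id: str) -> str:
    key = f"profile:{user_id}"                     # illustrative key scheme
    cached = await cache.get(key)
    if cached is not None:
        return cached
    profile = await load_profile_from_db(user_id)  # hypothetical DB helper
    await cache.set(key, profile, ex=300)          # 5-minute TTL
    return profile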


Object Storage: Cloudflare R2

If your applications handle file uploads (images, documents, media), you need object storage. You could use local disk, but that breaks the moment you want to scale, migrate, or share files between services.

I evaluated MinIO (self-hosted) against managed alternatives. For this setup, Cloudflare R2 is the pragmatic choice.

The case against MinIO on your server:

  • It consumes ~500 MB RAM on a server where every gigabyte matters
  • You add operational complexity: another container to maintain, monitor, and back up
  • Files live on the same disk as everything else; if the server dies, files die too

Why Cloudflare R2:

Feature                 Cloudflare R2                            MinIO (self-hosted)
Cost                    Free up to 10 GB, then $0.015/GB/month   Free but uses server RAM
Egress fees             $0 (zero)                                N/A (your bandwidth)
Geographic redundancy   Yes (built-in)                           No (single server)
Server RAM              0                                        ~500 MB

Egress fees = you pay every time someone downloads a file from your storage. You upload an image, a user views it on your site; that's egress. Most providers charge for this: AWS S3 ~$0.09/GB, Google Cloud ~$0.12/GB. R2 charges $0. If your sites serve lots of images or files, egress costs can silently dwarf storage costs. R2 eliminates that entirely.

R2 is S3-compatible. Your FastAPI code stays the same: boto3 or any S3 SDK works with R2 identically. Change the endpoint URL and credentials; the code doesn't change. If you ever want to switch to MinIO or AWS S3, you change a config value.
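
A sketch of the client setup with boto3 (the account ID, bucket, and environment variable names are placeholders from your R2 dashboard):

import os
import boto3

# R2 exposes an S3-compatible endpoint per Cloudflare account
s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
    region_name="auto",
)

s3.upload_file("local.png", "site-1-uploads", "images/local.png")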

Hetzner Object Storage (€5.99/month for 1 TB) is a valid alternative: same provider, S3-compatible, very low latency to your server. If your files are mostly used server-side (backends reading/writing between services, batch processing), Hetzner Object Storage wins. If your files are mostly served to users (images on websites, downloads), R2 wins: free egress plus built-in global CDN means faster delivery worldwide. For five web apps with typical file uploads, R2 is the better default.

Each site gets its own R2 bucket:

site-1-uploads/
site-2-uploads/
site-3-uploads/
shared-assets/

The only reason to keep MinIO is if you must keep data on-premise (compliance, data sovereignty). That doesn't apply to five web apps with ordinary file uploads.


Authentication: Clerk

Building authentication from scratch is a trap. It seems simple: hash passwords, manage sessions, done. Then you need email verification, password reset flows, OAuth providers, multi-factor authentication, rate limiting on login attempts, session invalidation across devices, CSRF protection, and suddenly you've spent three months building infrastructure instead of your product.

Clerk handles all of this. The integration has two sides:

Frontend (Next.js)

The @clerk/nextjs package provides pre-built, customizable UI components for sign-in, sign-up, and user management. Drop them into your app, configure your Clerk dashboard, and authentication is handled.

Backend (FastAPI)

This is the part people sometimes skip, and it's a critical mistake. Clerk authenticates users on the frontend and issues a JWT (JSON Web Token). Your FastAPI backend must verify that JWT on every request. Without this, your API is completely unprotected.

Here's why: your frontend talks to your backend via HTTP requests. Nothing stops someone from calling those same API endpoints directly, with curl, Postman, or a script. If the backend doesn't verify the token, anyone can access your data without ever touching Clerk.
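
Here's a sketch of what that check can look like with PyJWT against Clerk's JWKS endpoint (the env var name and route are placeholders; Clerk also ships backend SDKs that wrap this, so treat it as one way, not the way):

import os
import jwt                                     # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()
# Clerk publishes your instance's signing keys at a JWKS URL (shown in the dashboard);
# PyJWKClient fetches and caches them, picking the right key per token.
jwks = jwt.PyJWKClient(os.environ["CLERK_JWKS_URL"])

def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    token = creds.credentials
    try:
        key = jwks.get_signing_key_from_jwt(token)
        claims = jwt.decode(token, key.key, algorithms=["RS256"])
        # in production, also verify the issuer/authorized parties per Clerk's docs
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")
    return claims                               # claims["sub"] is the Clerk user ID

@app.get("/api/orders")
async def list_orders(user: dict = Depends(current_user)):
    return {"user_id": user["sub"], "orders": []}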

Each of the five sites gets its own Clerk application (separate user base, separate configuration, separate API keys). The keys live in each site's .env file and are injected into the containers at runtime.

Cost: Clerk's free tier includes 10,000 Monthly Active Users per application. For low-traffic sites, this is extremely generous. If you outgrow it, the Pro plan is $25/month with $0.02 per additional MAU.


Monitoring: Uptime Kuma

Another great find. Uptime Kuma is a self-hosted monitoring tool that runs as a single Docker container (~100 MB RAM). It pings each of your sites at a configurable interval (I use 60 seconds) and alerts you via Telegram, Discord, email, or Slack when something goes down.

It also provides a clean dashboard showing uptime history, response times, and certificate expiration dates, which is useful for catching performance degradation before it becomes an outage.
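
It's one more entry in the compose file; a sketch (the status domain and volume name are placeholders):

# under services: in docker-compose.yml
uptime-kuma:
  image: louislam/uptime-kuma:1
  restart: unless-stopped
  volumes:
    - uptime-kuma-data:/app/data            # persists monitors and history
  labels:
    - "traefik.http.routers.uptime.rule=Host(`status.example.com`)"   # placeholder domain
    - "traefik.http.routers.uptime.tls.certresolver=letsencrypt"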

I considered Prometheus + Grafana, which is the industry standard for infrastructure monitoring. It's also designed for teams operating at scale with hundreds of metrics across dozens of services. For a solo developer with five sites, it's like using a fire truck to water a garden. Uptime Kuma covers 90% of what you need with 10% of the complexity.


Logging: Dozzle

One more great find. When something goes wrong, logs are the first thing you check. Docker's built-in logging works fine from the command line (docker compose logs -f site-1-backend), but scrolling through terminal output across 15+ containers gets tedious fast.

Dozzle is a lightweight, read-only Docker log viewer. It runs as a single container, auto-discovers all other containers, and provides a web-based UI for searching and tailing logs in real time. It requires zero configuration: no log shippers, no databases, no dashboards to build. It reads directly from Docker's log streams.

To prevent logs from consuming disk space, configure Docker's JSON log driver with rotation:

x-logging: &default-logging
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

This caps each container's logs at 30 MB (3 files × 10 MB), and Docker automatically rotates them. Apply this globally across your compose file to every service.
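
Applying the anchor is one line per service, and Dozzle itself is one more small container. A sketch (expose it behind Traefik or an SSH tunnel rather than publicly):

# under services: in docker-compose.yml
site-1-backend:
  logging: *default-logging                  # reuse the anchor defined above

dozzle:
  image: amir20/dozzle:latest
  restart: unless-stopped
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro   # read-only access to Docker's log streams
  logging: *default-logging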

I considered the ELK stack (Elasticsearch + Logstash + Kibana) and Grafana Loki. Both are excellent for large-scale log aggregation. Both also consume more RAM than some of my actual application containers. For this scale, Dozzle is the right tool.


Monorepo Structure

All five sites, shared packages, and infrastructure configuration live in a single repository. Here's why:

The five sites share a lot of code: UI components, API utilities, authentication patterns, type definitions. In a multi-repo setup, sharing code means publishing private npm packages, managing versions across repositories, and dealing with the inevitable "which version of the shared library is site-3 using?" confusion. In a monorepo, shared code is imported directly. Change a component, and every site that uses it gets the update.

platform/
├── apps/
│   ├── site-1/
│   │   ├── frontend/          # Next.js
│   │   │   ├── Dockerfile
│   │   │   ├── package.json
│   │   │   └── src/
│   │   └── backend/           # FastAPI
│   │       ├── Dockerfile
│   │       ├── pyproject.toml
│   │       ├── alembic/
│   │       └── app/
│   ├── site-2/
│   │   ├── frontend/
│   │   └── backend/
│   └── ...
├── packages/
│   ├── ui/                    # Shared React components
│   ├── api-client/            # Shared API utilities
│   └── common/                # Shared types, constants
├── infra/
│   ├── docker-compose.yml
│   ├── docker-compose.dev.yml
│   ├── traefik/
│   │   └── traefik.yml
│   └── .env.example
├── .github/
│   └── workflows/
│       └── deploy.yml
├── pnpm-workspace.yaml
└── turbo.json

Frontend sharing is managed by pnpm workspaces. The packages/ directory contains shared libraries that any apps/*/frontend/ can import. Turborepo handles build caching and task orchestration: if site-1's frontend hasn't changed, it doesn't rebuild.
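
The workspace wiring matching the tree above is tiny; a sketch of pnpm-workspace.yaml:

packages:
  - "apps/*/frontend"
  - "packages/*"

Each frontend then depends on shared code with the workspace: protocol (for example "ui": "workspace:*" in its package.json), so nothing is ever published to a registry.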

Backend sharing is simpler. Python's import system handles it natively. Shared utilities go into a common package that each backend installs as a local dependency.

The infra/ directory contains everything related to deployment: the production Docker Compose file, Traefik configuration, and environment variable templates.


CI/CD: GitHub Actions

The deployment pipeline is intentionally simple: push to main, GitHub Actions builds and deploys.

[Diagram: deployment pipeline]

The key detail is path-filtered deploys. When you push changes that only affect apps/site-1/, the pipeline only builds and deploys site-1's containers. This keeps deploys fast and reduces risk.

Built images are pushed to GitHub Container Registry (GHCR), which is free for public repos and included with GitHub plans for private repos. The server pulls updated images and restarts only the changed services.
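
Here's a sketch of the build-and-push side for one site (the workflow name, paths, and image names are placeholders; the repo's single deploy.yml can achieve the same thing with a changed-paths check):

name: deploy-site-1            # illustrative; the repo uses a single deploy.yml
on:
  push:
    branches: [main]
    paths:
      - "apps/site-1/**"
      - "packages/**"

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # needed to push to GHCR
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build and push the backend image
        run: |
          docker build -t ghcr.io/${{ github.repository }}/site-1-backend:latest apps/site-1/backend
          docker push ghcr.io/${{ github.repository }}/site-1-backend:latest
      # a final step SSHes into the server and runs the compose commands shown below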

Zero-downtime deploys work because:

  1. The new container starts alongside the old one
  2. Traefik waits for the new container's health check to pass
  3. Traefik shifts traffic to the new container
  4. The old container is stopped and removed

No load balancer configuration, no blue-green setup, no manual intervention. Docker Compose --wait and Traefik's health-check awareness handle it natively.
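
On the server side, the whole deploy reduces to a few commands; a sketch (the path and service names are illustrative):

cd /opt/platform/infra
docker compose pull site-1-frontend site-1-backend
docker compose up -d --wait site-1-frontend site-1-backend
docker image prune -f          # drop superseded images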


Backups

The strategy is straightforward:

[Diagram: backup flow]
  • PostgreSQL: Daily pg_dump per database, compressed and encrypted, uploaded to the Storage Box via rsync over SSH (sketched below)
  • Object storage (R2): Files are already off-server with geographic redundancy. R2 has built-in durability; no separate backup needed for typical use. If you need an extra copy, use R2's lifecycle rules or a sync job to another bucket
  • Retention: Keep 7 daily, 4 weekly, and 6 monthly backups. Automated rotation via a simple shell script
  • Pre-migration backups: Every deployment that includes Alembic migrations triggers an additional backup before applying changes
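
A sketch of the nightly job behind the first bullet (database names, paths, and the Storage Box user/host are placeholders):

#!/usr/bin/env bash
# nightly backup sketch: per-database dumps, then sync to the Storage Box
set -euo pipefail
stamp=$(date +%F)

for db in site1 site2 site3 site4 site5; do
  docker compose exec -T postgres pg_dump -U "$db" -d "$db" \
    | gzip > "/opt/backups/${db}-${stamp}.sql.gz"     # add gpg here for the encryption step
done

# Hetzner Storage Boxes accept rsync over SSH on port 23
rsync -az -e "ssh -p 23" /opt/backups/ uXXXXXX@uXXXXXX.your-storagebox.de:backups/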

The Hetzner Storage Box is physically separate from the server. If the server burns down, your backups survive.

Test your restores. A backup that you've never restored from is just a hope. Schedule a quarterly restore test: spin up a temporary database, load the latest backup, verify data integrity.


Security Essentials

Running everything on one server means security is critical. If someone gets in, they have access to everything.

SSH: Key-only authentication. Disable password login. Change the default port (security through obscurity isn't a strategy, but it reduces noise from automated scanners).

Firewall (UFW): Allow only ports 80, 443, and your SSH port. Everything else is closed.

ufw default deny incoming
ufw default allow outgoing
ufw allow 80/tcp
ufw allow 443/tcp
ufw allow <your-ssh-port>/tcp
ufw enable

Fail2ban: Automatically bans IPs that repeatedly fail SSH authentication. Stops brute-force attacks cold.

Docker network isolation: Each site's frontend and backend communicate on a private Docker network. Sites cannot talk to each other directly. Only Traefik and the shared services (PostgreSQL, Redis) bridge across networks.
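
In compose terms, that isolation is just named networks; a trimmed sketch:

networks:
  site1:        # private to site-1's frontend + backend
  site2:
  shared:       # PostgreSQL, Redis, and whatever must reach them

services:
  traefik:
    networks: [site1, site2]        # joins every site network to route traffic
  site-1-frontend:
    networks: [site1]
  site-1-backend:
    networks: [site1, shared]       # reaches its own frontend and the shared DB/cache
  postgres:
    networks: [shared]
  redis:
    networks: [shared]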

Cloudflare (DNS + edge protection): If you use Cloudflare for DNS (recommended in the architecture), every request passes through Cloudflare's network. The free tier includes: DDoS protection (absorbs volumetric attacks before they reach your server), basic WAF rules (blocks common exploits like SQL injection, XSS in URLs), bot mitigation, and rate limiting (1 rule free, enough for login endpoint protection). This is a different layer than UFW and Fail2ban: Cloudflare stops bad traffic before it reaches Hetzner; your server never wastes CPU on it. Enable Full (Strict) SSL mode in Cloudflare so traffic is encrypted end-to-end (Cloudflare to Traefik uses a real certificate).

Layer                                    Tool                     Protects against
Edge (before traffic hits your server)   Cloudflare free          DDoS, bots, common web attacks
Server firewall                          UFW                      Unauthorized port access
SSH protection                           Fail2ban                 Brute-force SSH attempts
App level                                Clerk + JWT validation   Unauthorized API access

Automatic updates: Enable unattended security updates for the host OS. Containers run isolated, but the host kernel is shared; keep it patched.

Secrets management: Environment variables in .env files, not committed to Git. The .env.example file in the repo documents required variables without exposing values. Secrets are deployed to the server via a secure mechanism (SSH, encrypted secrets in GitHub Actions, or a secrets manager).


Local Development

You don't want to run the full production stack on your laptop. You don't need Traefik, SSL certificates, or Uptime Kuma when you're debugging a CSS issue.

The solution is a separate docker-compose.dev.yml that spins up only what you need:

# Work on site-1 only
docker compose -f docker-compose.dev.yml up site-1-frontend site-1-backend postgres redis

Key differences from production:

Aspect           Production             Local Dev
Proxy            Traefik with SSL       None (direct port access)
Frontend         Built image            Volume mount + hot reload
Backend          Built image            Volume mount + --reload
Object storage   R2 (external)          R2 or local MinIO for dev
Monitoring       Uptime Kuma + Dozzle   docker compose logs

Volume mounts map your local code into the containers, so changes reflect immediately without rebuilding images. Next.js hot reload and FastAPI's --reload flag handle the rest.
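
A sketch of one backend service in docker-compose.dev.yml (paths, ports, and the app module name are assumptions):

services:
  site-1-backend:
    build: ../apps/site-1/backend
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    volumes:
      - ../apps/site-1/backend:/app     # local code mounted into the container
    ports:
      - "8000:8000"                     # direct access, no Traefik in dev
    environment:
      DATABASE_URL: postgresql+asyncpg://site1:devpassword@postgres:5432/site1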


RAM Budget

With 16 GB available, here's the estimated allocation:

[Diagram: RAM allocation]

The estimated total is roughly 7 GB (no MinIO on server; R2 is external), leaving about 9 GB of headroom for traffic spikes, build processes, and future growth. This is comfortable. If growth demands it, upgrading to Hetzner's CPX51 (32 GB RAM, €54/month) is a one-click operation with minimal downtime.


What's Next

The next step is to start implementing.
