# Frostlabs Docker Swarm Infrastructure

A production-ready Docker Swarm cluster running self-hosted productivity tools and services with enterprise-grade security, automated SSL/TLS, and distributed storage.

## Table of Contents

- [Overview](#overview)
- [Infrastructure Architecture](#infrastructure-architecture)
  - [Cluster Topology](#cluster-topology)
  - [Networking](#networking)
  - [Storage](#storage)
- [Core Services](#core-services)
- [Application Services](#application-services)
- [External Services](#external-services)
- [Security](#security)
- [Deployment Workflow](#deployment-workflow)
- [Monitoring & Maintenance](#monitoring--maintenance)
- [Backup Strategy](#backup-strategy)
- [Future Improvements](#future-improvements)
- [Quick Start](#quick-start)

## Overview

Frostlabs is a 6-node Docker Swarm cluster designed for self-hosted productivity tools and experimental learning. The infrastructure emphasizes:

- **High Availability**: Multi-node swarm with replicated managers
- **Security First**: SSO authentication, intrusion detection, and network isolation
- **Automated SSL**: Cloudflare DNS challenge for automatic HTTPS certificates
- **Distributed Storage**: GlusterFS for persistent data across nodes
- **GitOps Ready**: Infrastructure-as-code with webhook-based deployments

**Primary Use Cases:**

- Self-hosted productivity applications (document management, automation)
- Learning platform for experimenting with new technologies
- Production-ready personal services

## Infrastructure Architecture

### Cluster Topology

The swarm consists of 6 nodes organized by role:

| Node | Role | Availability | Labels |
|------|------|--------------|---------|
| p1-control | Manager | Active | `task=control` |
| p2-control | Manager | Active | `task=control` |
| p3-control | Manager | Active | `task=control` |
| p0-compute | Manager | Active | `task=compute` |
| p4-compute | Manager | Active | `task=compute` |
| p5-compute | Manager | Active | `task=compute` |

---

**Mixed
Architecture**

| Node | Machine | Model | RAM | CPU | Cores / Threads |
|------|---------|-------|-----|-----|-----------------|
| p1-control | ARM64 | Raspberry Pi 5 | 8 GB | Arm Cortex-A76 | 4C |
| p2-control | ARM64 | Raspberry Pi 5 | 16 GB | Arm Cortex-A76 | 4C |
| p3-control | ARM64 | Raspberry Pi 5 | 16 GB | Arm Cortex-A76 | 4C |
| p0-compute | AMD64 | Beelink SER 5 MAX (2023) | 32 GB | Ryzen 7 5800H | 8C / 16T |
| p4-compute | AMD64 | Unraid VM | 25 GB | 12th Gen Intel® Core™ i5-12600K | 5C / 11T |
| p5-compute | AMD64 | Unraid VM | 25 GB | 12th Gen Intel® Core™ i5-12600K | 5C / 11T |
| **Totals:** | | | `122 GB` | | `30C / 38T` |

---

**Node Label Strategy:**

- `task=control`: Infrastructure services (Traefik, Portainer, CrowdSec)
- `task=compute`: Application workloads (Authentik, Paperless, n8n, etc.)

This separation keeps critical infrastructure services on the `task=control` nodes while compute-intensive applications run on dedicated "worker" nodes.

**NOTE**: A "worker" node here means a node labeled `task=compute`. All six nodes are swarm managers, ensuring maximum uptime.

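The label strategy above could be applied with a small helper like the one below. This is a minimal sketch, not the repository's actual tooling: the node names and `task=` values come from the topology table, while the `label_nodes` function and the `DRY_RUN` switch are hypothetical names introduced for illustration. By default it only prints the commands it would run.

```bash
#!/bin/sh
# Sketch: apply the task=<role> labels listed in the Cluster Topology table.
# DRY_RUN and label_nodes are illustrative names, not part of this repository.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them for real
# and must be executed on a swarm manager node.
DRY_RUN="${DRY_RUN:-1}"

label_nodes() {
  role="$1"
  shift
  for node in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "docker node update --label-add task=$role $node"
    else
      docker node update --label-add "task=$role" "$node"
    fi
  done
}

label_nodes control p1-control p2-control p3-control
label_nodes compute p0-compute p4-compute p5-compute
```

After a real run, `docker node ls --filter node.label=task=control` should list the three control nodes. Services are then typically pinned with a placement constraint such as `node.labels.task == control` under `deploy.placement.constraints`; that is an assumption about how the stack files consume these labels, since the stack files are not shown here.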
### Networking

**Overlay Network: `frostlabs`**

- Driver: Overlay (swarm management traffic is encrypted by default; application traffic is encrypted only if the network is created with `--opt encrypted`)
- Scope: Swarm-wide
- Purpose: Inter-service communication across all nodes

**Unraid Host: `frostlabs`**

- **Postgres**
- **Cloudflare Tunnel**
- **NFS Volumes**
- **2 VMs: p4 and p5**

**Exposed Ports:**

- `80/tcp` - HTTP (redirects to HTTPS)
- `443/tcp` - HTTPS (Traefik entrypoint)
- `****/tcp` - Traefik dashboard
- `9000/tcp` - Portainer UI
- `5678/tcp` - n8n webhook endpoint

### Storage

**GlusterFS Distributed Filesystem**

Persistent data is stored on GlusterFS volumes mounted at `/home/doc/projects/swarm-data/` with the following structure:

```
/home/doc/projects/swarm-data/
├── traefik/
│   ├── certificates/       # ACME certificates
│   └── logs/               # Access logs for CrowdSec
├── crowdsec/
│   ├── config/             # CrowdSec configuration
│   └── data/               # Decision database
├── portainer/              # Portainer data
├── authentik/
│   ├── media/
│   └── templates/
├── paperless/
│   ├── data/
│   ├── media/
│   ├── export/
│   └── consume/
├── n8n/                    # n8n workflows
├── peertube/
│   ├── data/
│   ├── redis/
│   └── postgres/
└── webservers/production/  # Static site files
```

**Benefits:**

- Data replication across nodes
- High availability for stateful services
- Transparent failover

## Core Services

### Traefik (v3.6.1)

Modern reverse proxy and load balancer handling all ingress traffic.

**Features:**

- Automatic HTTPS via Cloudflare DNS challenge
- HTTP to HTTPS redirection
- Docker Swarm service discovery
- CrowdSec bouncer plugin for threat blocking
- Access logging for security monitoring

**Stack Location:** `core/stack.yml`

**Configuration Files:**

- `core/static.yml` - Static configuration (entrypoints, providers, ACME)
- `core/dynamic.yml` - Dynamic routing for external services

**Exposed Routes:**

All services use `*.frostlabs.me` or `*.bitfrost.me` domains with automatic SSL.

### CrowdSec

Collaborative intrusion detection and prevention system.

**Features:**

- Parses Traefik access logs for threat detection
- Crowdsourced IP reputation database
- Automatic banning via Traefik middleware
- Collections: `crowdsecurity/traefik`, `crowdsecurity/http-cve`

**Stack Location:** `core/stack.yml`

**Integration:**

- Reads Traefik logs from GlusterFS volume
- Bouncer plugin in Traefik blocks malicious IPs
- Metrics available on port 6060

### Portainer CE

Web-based Docker management interface for the entire swarm.

**Features:**

- Multi-node swarm visualization
- Stack deployment via UI
- Container/service management
- Webhook support for automated deployments

**Stack Location:** `core/stack.yml`

**Access:** `https://portainer.frostlabs.me` or `http://10.0.4.10:9000`

**Agent Deployment:**

- Global mode (runs on every node)
- Provides node-level metrics and control

## Application Services

### Authentik (v2025.10.0)

Enterprise SSO and identity provider.

**Components:**

- `authentik_server` - Main application server
- `authentik_worker` - Background task processor
- `redis` - Session cache

**Features:**

- Forward authentication for Traefik
- OIDC/SAML provider
- User/group management
- Protects sensitive services (e.g., Unraid dashboard)

**Stack Location:** `authentik/stack.yml`

**Access:** `https://auth.frostlabs.me`

**Database:** PostgreSQL on Unraid (`10.0.4.10:5432`)

### Paperless-ngx

Document management system with OCR and full-text search.

**Features:**

- Automatic document ingestion from consume folder
- OCR with English language support
- Duplicate detection
- Tagging and classification
- Export functionality

**Stack Location:** `paperless/stack.yml`

**Access:** `https://docs.frostlabs.me`

**Configuration:**

- Time Zone: `America/New_York`
- Database: PostgreSQL on Unraid
- Polling interval: 5 seconds
- Recursive consumption enabled

### n8n

Self-hosted workflow automation platform.

**Features:**

- Visual workflow builder
- 400+ integrations
- Webhook support
- Runner mode enabled

**Stack Location:** `n8n/stack.yml`

**Access:** `https://n8n.bitfrost.me`

**Resources:**

- Memory: 512MB reserved, 2GB limit
- Persistent workflows stored in GlusterFS

### PeerTube

Decentralized video hosting platform.

**Components:**

- `peertube` - Main application
- `postgres` - Database (v17-alpine)
- `redis` - Cache (v7-alpine)

**Stack Location:** `peertube/stack.yml`

**Access:** `https://videos.frostlabs.me`

**Configuration:**

- SMTP: Gmail integration for notifications
- Database: Dedicated PostgreSQL instance
- Admin email: frostlabs25@example.com

### Adminer

Lightweight database management interface.

**Stack Location:** `adminer/stack.yml`

**Purpose:** Web-based management for PostgreSQL/MySQL databases across the infrastructure.

### Tracker (Static Site)

Nginx-based static website hosting.

**Stack Location:** `tracker/stack.yml`

**Purpose:** Serves static HTML/CSS/JS from `/home/doc/projects/swarm-data/webfiles/production/taylors-development`

**Port:** `8180`

## External Services

Services running outside the swarm but routed through Traefik:

| Service | Host | Internal URL | Public Domain | Middleware |
|---------|------|--------------|---------------|------------|
| Unraid Dashboard | 10.0.4.10 | http://10.0.4.10:80 | unraid.frostlabs.me | Authentik, CrowdSec |
| Emby | 10.0.4.10 | http://10.0.4.10:8096 | movies.frostlabs.me | CrowdSec |
| Media Manager | 10.0.4.10 | http://10.0.4.10:8000 | media.frostlabs.me | CrowdSec |

**Configuration:** `core/dynamic.yml`

## Security

Multi-layered security approach:

### 1. SSO Authentication (Authentik)

- Forward authentication middleware in Traefik
- Protects administrative interfaces (Unraid, etc.)
- Centralized user management
- Session management via Redis

### 2. Intrusion Detection (CrowdSec)

- Real-time log analysis
- Automatic IP banning
- Community-driven threat intelligence
- Integrated with Traefik via bouncer plugin

### 3. Network Isolation

- Internal overlay network (`frostlabs`)
- Services not exposed unless explicitly configured
- Firewall rules limiting external access
- Trusted IP ranges for administrative access

### 4. SSL/TLS Encryption

- Automatic certificate issuance via Let's Encrypt
- Cloudflare DNS challenge (no port 80/443 exposure required)
- HTTPS enforcement (HTTP redirects)
- Certificate storage on GlusterFS for HA

### 5. Secrets Management

Docker secrets for sensitive data:

- `cloudflare_api_token` - DNS challenge authentication
- `auth-key` - Authentik secret key
- `postgres-master` - Database password
- `paperless-secret-key` - Django secret key
- `paperless-admin-pass` - Admin password

### 6. Resource Limits

All services have defined memory/CPU limits to prevent resource exhaustion attacks.

## Deployment Workflow

### Standard Deployment Process

1. **Local Testing**
   - Test stack configuration locally or in development environment
   - Validate service connectivity and configuration
   - Ensure no syntax errors in YAML files

2. **Git Commit**
   - Commit working stack files to Git repository
   - Push to remote (GitHub/Gitea)

3. **Portainer Deployment**
   - Navigate to Portainer UI (`https://portainer.frostlabs.me`)
   - Pull stack from Git repository
   - Deploy or update stack via Portainer interface

4.
   **Webhook Configuration**
   - Create webhook in Portainer for the stack
   - Future updates trigger automatic redeployment on Git push

### Manual Deployment

For quick updates or testing:

```bash
# SSH to any manager node (p1, p2, or p3)
ssh p1-control

# Deploy a stack
docker stack deploy -c /path/to/stack.yml <stack-name>

# Example: Deploy core infrastructure
docker stack deploy -c ~/projects/homelab/frostlabs/core/stack.yml core

# Update a service
docker service update --image <image>:<tag> <service-name>

# Check service status
docker service ls
docker service ps <service-name>
```

### Stack Management Commands

```bash
# List all stacks
docker stack ls

# View services in a stack
docker stack services <stack-name>

# View tasks in a stack
docker stack ps <stack-name>

# Remove a stack
docker stack rm <stack-name>
```

## Monitoring & Maintenance

### Current Monitoring

**Portainer Dashboard:**

- Service health status
- Resource utilization per node
- Container logs
- Service scaling controls

**Manual Monitoring:**

```bash
# Node status
docker node ls

# Service health
docker service ls

# Check service logs
docker service logs -f <service-name>

# View CrowdSec decisions (banned IPs)
docker exec $(docker ps -q -f name=crowdsec_crowdsec) cscli decisions list

# Check Traefik metrics
curl http://<host>:8082/metrics
```

### Health Checks

All services include health checks:

- Traefik: Ping endpoint
- CrowdSec: `cscli version`
- Redis: `redis-cli ping`
- Authentik: `ak healthcheck`
- Paperless: HTTP endpoint test
- n8n: `/healthz` endpoint

### Future Monitoring Plans

- **Prometheus**: Metrics collection from all services
- **Grafana**: Visualization dashboards for cluster health
- **Alerting**: Notification system for service failures

## Backup Strategy

### Current Approach

**Configuration Backups:**

- All stack files version-controlled in Git
- Infrastructure-as-code approach ensures reproducibility

**Data Backups:**

- Manual/periodic backups of critical GlusterFS volumes
- Performed before major infrastructure changes

**Critical Data to Back Up:**

- Traefik certificates
  (`/swarm-data/traefik/certificates`)
- Authentik database and media
- Paperless documents (`/swarm-data/paperless/media`)
- n8n workflows (`/swarm-data/n8n`)
- Portainer configuration

### Backup Commands

```bash
# Backup a GlusterFS volume
tar -czf backup-$(date +%Y%m%d).tar.gz /home/doc/projects/swarm-data/

# Backup to remote location
rsync -avz /home/doc/projects/swarm-data/ user@backup-server:/backups/
```

### Future Backup Plans

- Automated scheduled backups via cron or dedicated backup service
- Off-site backup replication
- Snapshot-based backups for point-in-time recovery
- Automated testing of backup restoration

## Future Improvements

Planned enhancements to the infrastructure:

1. **Monitoring Stack**
   - Deploy Prometheus for metrics collection
   - Grafana dashboards for visualization
   - Alertmanager for notifications

2. **Automated Backups**
   - Scheduled backup jobs
   - Retention policies
   - Automated restore testing

3. **CI/CD Pipeline**
   - Automated testing of stack deployments
   - Canary deployments for zero-downtime updates
   - Automated rollback on failure

4. **Enhanced Security**
   - Regular vulnerability scanning
   - Automated certificate rotation monitoring
   - Security audit logging

5. **Performance Optimization**
   - Caching layers (Redis, Varnish)
   - CDN integration for static assets
   - Database query optimization

## Quick Start

### Prerequisites

- Docker Swarm initialized across all nodes
- GlusterFS volumes mounted on all nodes
- DNS records pointing to your swarm ingress
- Cloudflare API token for DNS challenge

### Initial Deployment

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd frostlabs
   ```

2. **Create Docker secrets:**
   ```bash
   echo "your_cloudflare_token" | docker secret create cloudflare_api_token -
   echo "your_auth_key" | docker secret create auth-key -
   echo "your_db_password" | docker secret create postgres-master -
   # Add other secrets as needed
   ```

3.
   **Create the overlay network:**
   ```bash
   docker network create --driver overlay --attachable frostlabs
   ```

4. **Deploy core infrastructure:**
   ```bash
   docker stack deploy -c core/stack.yml core
   ```

5. **Wait for Traefik and Portainer to be healthy:**
   ```bash
   docker service ls
   watch docker service ps core_traefik
   ```

6. **Deploy application stacks:**
   ```bash
   # Via Portainer UI (recommended)
   # or manually:
   docker stack deploy -c authentik/stack.yml authentik
   docker stack deploy -c paperless/stack.yml paperless
   docker stack deploy -c n8n/stack.yml n8n
   # etc.
   ```

### Accessing Services

Once deployed, access your services at:

- Portainer: `https://portainer.frostlabs.me`
- Authentik: `https://auth.frostlabs.me`
- Paperless: `https://docs.frostlabs.me`
- n8n: `https://n8n.bitfrost.me`
- PeerTube: `https://videos.frostlabs.me`
- Traefik Dashboard: `local access only`

---

**Maintained by:** Frostlabs Admin: `Johnathan Allison`

**Last Updated:** `2025-11-16`

**License:** MIT