Rebalance swarm: promote all nodes to managers and remove hostname constraints

- Promoted p1, p2, p3 from worker to manager nodes (4 managers total, quorum of 3)
- Removed unnecessary hostname constraints from service configs
- Only traefik and portainer remain pinned to p0
- Services now auto-balance across all nodes via GlusterFS shared storage
- Updated README with cluster overview and distribution strategy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-30 08:52:38 +00:00
parent 8eb3106777
commit dde99083fb
7 changed files with 51 additions and 27 deletions

@@ -1,3 +1,50 @@
# swarm-production
Production Docker Swarm Infrastructure
## Cluster Overview
### Nodes
- **p0** (Manager/Leader) - Infrastructure services
- **p1** (Manager) - Application services
- **p2** (Manager) - Application services
- **p3** (Manager) - Application services
All four nodes are managers. Raft quorum for 4 managers is 3, so the cluster tolerates 1 manager failure while maintaining quorum (the same tolerance as a 3-manager cluster).
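
Manager fault tolerance follows from Raft majority arithmetic: a set of N managers needs floor(N/2) + 1 of them reachable, so it survives floor((N-1)/2) failures. The sketch below works through that arithmetic for a few cluster sizes. `quorum_info` is a hypothetical helper name chosen for illustration; it is not part of the Docker CLI.

```shell
#!/usr/bin/env bash
# Raft majority arithmetic for a Swarm manager set.
# quorum_info is a hypothetical helper, not a Docker command.
quorum_info() {
  local managers=$1
  local quorum=$(( managers / 2 + 1 ))        # managers needed for a majority
  local tolerates=$(( (managers - 1) / 2 ))   # managers that may fail
  echo "managers=${managers} quorum=${quorum} tolerates=${tolerates}"
}

# Compare the current 4-manager layout with the neighbouring odd sizes.
for n in 3 4 5; do
  quorum_info "$n"
done
```

For this cluster the relevant line is `managers=4 quorum=3 tolerates=1`: a fourth manager adds no failure tolerance over three, which is why odd manager counts are usually preferred.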
### Storage
- **GlusterFS** mounted at `/home/doc/swarm-data/` on all nodes
- Shared storage enables services to run on any node without storage constraints
## Service Distribution Strategy
### Pinned Services
Services that must run on specific nodes:
- **traefik** (p0) - Published ports 80/443, needs stable IP for DNS
- **portainer** (p0) - Management UI, stays with leader for convenience
- **rsync** (manager-role constraint) - Backup service, needs manager access; with every node now a manager, this constraint no longer restricts placement
### Floating Services
Services that can run on any node (swarm auto-balances):
- adminer
- authentik (server, worker, redis)
- n8n
- paperless (webserver, redis)
- tracker-nginx
- uptime-kuma
## Recent Changes (2025-10-30)
### Swarm Rebalancing
- Promoted p1, p2, p3 from workers to managers
- Removed unnecessary hostname constraints from service configs
- Force-redeployed services to redistribute across all nodes
- Verified GlusterFS accessibility on all nodes
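
The promotion and redeploy steps above could be scripted roughly as follows. This is a sketch under assumptions, not the commands actually run for this commit: the `run` and `rebalance` helpers and the `example_*` service names are hypothetical, and the real service names depend on the stack name used at deploy time. By default the script only prints the commands (`DRY_RUN=1`); `docker node promote`, `docker service update --force`, and `docker node ls` are standard Docker CLI commands.

```shell
#!/usr/bin/env bash
# Sketch of the rebalancing steps. With DRY_RUN=1 (the default here) each
# command is only printed; set DRY_RUN=0 on a manager node to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rebalance() {
  # Promote the former workers to managers.
  run docker node promote p1 p2 p3
  # Force a redeploy so the scheduler re-places each service's tasks.
  # Service names are passed as arguments (hypothetical names below).
  for service in "$@"; do
    run docker service update --force "$service"
  done
  # List nodes to confirm every node now reports a manager status.
  run docker node ls
}

rebalance example_adminer example_n8n
```

Passing the service names as arguments keeps the sketch independent of any particular stack prefix.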
### Results
- Achieved balanced workload distribution across all 4 nodes
- Improved high availability with 4-node manager quorum
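- Swarm reschedules tasks onto healthy nodes when a node fails; tasks do not move back automatically when the node recovers, so a forced redeploy is still needed to rebalance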
- Fixed Portainer agent connectivity by restarting agents after manager promotion

@@ -10,9 +10,6 @@ services:
       - ADMINER_DESIGN=nette
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
 networks:
   homelab:
     external: true

@@ -10,9 +10,6 @@ services:
       - homelab
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
   authentik_server:
     image: ghcr.io/goauthentik/server:2025.10.0
@@ -38,9 +35,6 @@ services:
       - homelab
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
       labels:
         - "traefik.enable=true"
         - "traefik.http.routers.authentik.rule=Host(`auth.frostlabs.me`)"
@@ -75,9 +69,6 @@ services:
       - homelab
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
     depends_on:
       - redis

@@ -17,9 +17,6 @@ services:
       - /var/run/docker.sock:/var/run/docker.sock:ro
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
       restart_policy:
         condition: on-failure
         delay: 5s

@@ -5,9 +5,6 @@ services:
       - homelab
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
   paperless_webserver:
     image: ghcr.io/paperless-ngx/paperless-ngx:latest
@@ -48,9 +45,6 @@ services:
       - homelab
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.hostname == p0
     depends_on: # Fixed: removed postgres dependency
       - paperless_redis

@@ -14,9 +14,10 @@ services:
       retries: 3
       start_period: 60s
     deploy:
-      placement:
-        constraints: [node.hostname == p0]
       replicas: 1
+      placement:
+        preferences:
+          - spread: node.hostname
       restart_policy:
         condition: on-failure
         delay: 10s

@@ -11,9 +11,6 @@ services:
       - /home/doc/swarm-data/appdata/webfiles/production/taylors-development:/usr/share/nginx/html:ro
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.role == worker
 networks:
   homelab:
     external: true