Cluster Architecture
How knot’s leaderless cluster works.
Leaderless Design
Traditional clusters have a leader node that coordinates operations. If the leader fails, a new leader must be elected, and the cluster can be unavailable until the election completes.
Knot uses a leaderless architecture where all servers are equal.
Benefits:
- No single point of failure
- No leader election delays
- Servers can be added/removed dynamically
- Cluster continues operating when nodes are disconnected
- Load is spread across all servers
How it works:
- Each server has its own database
- Changes are synchronized via a gossip protocol
- Each server can handle any request
- No leader coordination is needed to serve a request
- Eventual consistency model
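To illustrate how gossip-style synchronization leads to eventual consistency, here is a toy model in Python. It is a sketch under stated assumptions, not knot's actual protocol: the `Server` and `Record` classes, the integer `version`, and the last-write-wins merge rule are all hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    value: str
    version: int   # logical timestamp of the write (assumed for this sketch)
    origin: str    # server that accepted the write; breaks version ties


@dataclass
class Server:
    name: str
    store: dict = field(default_factory=dict)  # this server's own database

    def write(self, key, value, version):
        # Any server can accept a write locally, with no leader involved.
        self.store[key] = Record(value, version, self.name)

    def merge(self, peer):
        # One half of a gossip exchange: copy every record from the peer
        # that is newer than the local copy (last-write-wins).
        for key, theirs in peer.store.items():
            mine = self.store.get(key)
            if mine is None or (theirs.version, theirs.origin) > (mine.version, mine.origin):
                self.store[key] = theirs


def gossip_round(servers):
    # Deterministic ring exchange for the example; real gossip protocols
    # usually pick peers at random.
    for i, server in enumerate(servers):
        peer = servers[(i + 1) % len(servers)]
        server.merge(peer)
        peer.merge(server)


a, b, c = Server("a"), Server("b"), Server("c")
a.write("space-1", "running", version=1)   # older write accepted by server a
c.write("space-1", "stopped", version=2)   # newer write accepted by server c

gossip_round([a, b, c])
converged = a.store == b.store == c.store
```

Between the two writes and the gossip round, servers `a` and `c` disagree about `space-1`. That temporary disagreement is what "eventual consistency" allows; after the exchange all three stores hold the newer record.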
Data Flow
Space Creation
- User creates space via web interface
- Server stores space metadata in database
- Server provisions volumes (if needed)
- Server starts container with agent
- Agent connects to server
- Space becomes available
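The creation steps above can be read as an ordered sequence of states. The sketch below models that sequence in Python; the `SpaceState` names and the `creation_steps` helper are hypothetical labels for this illustration and are not taken from knot's code.

```python
from enum import Enum


class SpaceState(Enum):
    REQUESTED = "requested"
    METADATA_STORED = "metadata_stored"
    VOLUMES_READY = "volumes_ready"
    CONTAINER_STARTED = "container_started"
    AGENT_CONNECTED = "agent_connected"
    AVAILABLE = "available"


def creation_steps(needs_volumes):
    """Return the ordered states a space passes through during creation."""
    steps = [SpaceState.REQUESTED, SpaceState.METADATA_STORED]
    if needs_volumes:
        # Volume provisioning only happens when the space needs volumes.
        steps.append(SpaceState.VOLUMES_READY)
    steps += [
        SpaceState.CONTAINER_STARTED,
        SpaceState.AGENT_CONNECTED,
        SpaceState.AVAILABLE,
    ]
    return steps


for state in creation_steps(needs_volumes=False):
    print(state.value)
```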
Space Access
- User accesses space via web terminal or SSH
- Request goes to any server in cluster
- Server looks up space in database
- Server connects to agent in space
- Connection established
Template Updates
- Admin updates template
- Change saved to database
- Update propagates to all servers in the cluster
- Running spaces marked for update
- Update applied on next space restart
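The key point in this flow is that a template change does not touch running spaces; it only flags them, and the new template takes effect on restart. A minimal Python sketch of that behavior follows. The `Space` fields, the `template_hash` comparison, and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Space:
    name: str
    template_hash: str           # template version the space was started from
    running: bool = True
    update_pending: bool = False


def mark_for_update(spaces, new_hash):
    # Flag running spaces whose template is out of date; leave them running.
    for space in spaces:
        if space.running and space.template_hash != new_hash:
            space.update_pending = True


def restart(space, current_hash):
    # The new template only takes effect when the space restarts.
    space.template_hash = current_hash
    space.update_pending = False
    space.running = True


spaces = [Space("dev", "v1"), Space("ci", "v2")]
mark_for_update(spaces, "v2")
pending_before = [s.name for s in spaces if s.update_pending]

restart(spaces[0], "v2")
pending_after = [s.name for s in spaces if s.update_pending]
```

Here only `dev` is flagged, because `ci` already runs the `v2` template, and the flag clears once `dev` restarts.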
High Availability
Server Redundancy
Run multiple servers in the cluster:
- Minimum 3 servers recommended
- Distribute across availability zones
- Load balancer in front of servers
- Health checks and failover
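As a rough picture of what the load balancer does with health checks, the Python sketch below routes requests round-robin across only the servers that pass a check. The server names and the `is_healthy` callback are hypothetical; a real load balancer would typically probe each server over the network rather than consult a set.

```python
from itertools import cycle


def healthy_servers(servers, is_healthy):
    # Keep only the servers that pass the health check.
    return [s for s in servers if is_healthy(s)]


def route(requests, servers, is_healthy):
    """Assign each request to a healthy server in round-robin order."""
    pool = healthy_servers(servers, is_healthy)
    if not pool:
        raise RuntimeError("no healthy servers available")
    rotation = cycle(pool)
    return {request: next(rotation) for request in requests}


servers = ["knot-1", "knot-2", "knot-3"]   # hypothetical server names
down = {"knot-2"}                           # simulate one failed server
assignments = route(["r1", "r2", "r3"], servers, lambda s: s not in down)
```

Because every server can handle any request, dropping `knot-2` from the pool needs no other change: the remaining two servers absorb its share of the traffic.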
Database Redundancy
Use database HA features:
- MySQL replication or clustering
- Redis Sentinel or Cluster
- Regular backups
- Automated failover
Storage Redundancy
For Nomad deployments:
- Use replicated CSI storage
- Regular volume backups
- Disaster recovery plan
Disaster Recovery
Backup Strategy
- Regular database backups
- Encrypted backup storage
- Offsite backup copies
- Tested restore procedures
Recovery Plan
- Identify failure scope
- Restore database from backup
- Restart servers
- Verify system functionality
- Notify users
Business Continuity
- Document recovery procedures
- Define RTO (Recovery Time Objective)
- Define RPO (Recovery Point Objective)
- Regular DR testing
- Maintain spare capacity
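RTO and RPO become useful once they are checked against actual numbers. The Python sketch below shows the arithmetic: worst-case data loss is roughly one backup interval, and recovery time is the sum of the sequential recovery steps. All figures and function names are hypothetical and for illustration only.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval):
    # A failure just before the next backup loses almost one full interval.
    return backup_interval


def meets_rpo(backup_interval, rpo):
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(step_durations, rto):
    # Recovery time is the sum of the sequential recovery steps.
    return sum(step_durations, timedelta()) <= rto


# Backups every 6 hours against a 4-hour RPO: the target is not met.
rpo_ok = meets_rpo(timedelta(hours=6), rpo=timedelta(hours=4))

# Restore database, restart servers, verify: 70 minutes against a 2-hour RTO.
rto_ok = meets_rto(
    [timedelta(minutes=45), timedelta(minutes=10), timedelta(minutes=15)],
    rto=timedelta(hours=2),
)
```

In this example the backup schedule would need to tighten to every 4 hours or less to satisfy the RPO, while the recovery plan fits within the RTO with room to spare.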