knot

Cluster Architecture

How knot’s leaderless cluster works.

Leaderless Design

Traditional clusters rely on a leader node that coordinates operations. If the leader fails, the cluster must elect a new one, and requests that depend on the leader can stall until the election completes.

Knot uses a leaderless architecture where all servers are equal.

Benefits:

No single point of failure
No leader election delays
Servers can be added/removed dynamically
Continues operating if nodes are disconnected
Load is spread across all servers instead of concentrating on a leader

How it works:

Each server has its own database
Changes are synchronized via a gossip protocol
Each server can handle any request
No leader coordination on the request path
Eventual consistency model
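
The combination of per-server databases, gossip, and eventual consistency can be illustrated with a small sketch. This is not knot's implementation: the Record fields, the last-write-wins rule, and the tie-break on value are all assumptions made for illustration, since the merge rule knot actually uses is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One replicated row. Field names are illustrative, not knot's schema."""
    key: str
    value: str
    updated_at: int  # logical or wall-clock timestamp (assumption)


def merge(local: dict[str, Record], incoming: list[Record]) -> list[Record]:
    """Apply gossiped records to a local store using last-write-wins.

    Returns the records that changed local state, which a server could
    forward to other peers in its next gossip round.
    """
    applied = []
    for rec in incoming:
        current = local.get(rec.key)
        # Newer timestamp wins; equal timestamps fall back to comparing the
        # value so every server picks the same winner regardless of order.
        if current is None or (rec.updated_at, rec.value) > (current.updated_at, current.value):
            local[rec.key] = rec
            applied.append(rec)
    return applied


# Two servers accept writes independently, then exchange state.
server_a = {"space-1": Record("space-1", "running", updated_at=10)}
server_b = {
    "space-1": Record("space-1", "stopped", updated_at=12),
    "tmpl-1": Record("tmpl-1", "v2", updated_at=5),
}

merge(server_a, list(server_b.values()))
merge(server_b, list(server_a.values()))
converged = server_a == server_b
```

Because the merge is idempotent and order-independent, a message that is delivered twice or arrives late does no harm, which is the property that lets servers stay available while disconnected and reconcile afterwards.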

Data Flow

Space Creation

User creates space via web interface
Server stores space metadata in database
Server provisions volumes (if needed)
Server starts container with agent
Agent connects to server
Space becomes available
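
The creation steps above can be read as an ordered sequence of states. The sketch below models that sequence only. The state names are hypothetical labels for the listed steps, not identifiers from knot.

```python
from enum import Enum


class SpaceState(Enum):
    """Hypothetical labels for the creation steps, in order."""
    REQUESTED = 1            # user creates space via web interface
    METADATA_STORED = 2      # server stores space metadata
    VOLUMES_PROVISIONED = 3  # only when the space needs volumes
    CONTAINER_STARTED = 4    # container started with agent
    AGENT_CONNECTED = 5      # agent connects back to the server
    AVAILABLE = 6            # space is ready for use


def creation_path(needs_volumes: bool) -> list[SpaceState]:
    """Return the ordered states a new space passes through."""
    path = list(SpaceState)  # Enum iteration follows definition order
    if not needs_volumes:
        path.remove(SpaceState.VOLUMES_PROVISIONED)
    return path
```

Calling creation_path(False) yields the same path without the volume step, matching the "if needed" condition in the list.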

Space Access

User accesses space via web terminal or SSH
Request goes to any server in cluster
Server looks up space in database
Server connects to agent in space
Connection established

Template Updates

Admin updates template
Change saved to database
Update propagates to all servers via gossip
Running spaces marked for update
Update applied on next space restart
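
One way to decide which running spaces to mark is to compare a hash of the template each space was started from with the hash of the current template. The sketch below assumes that approach. The function names and the use of SHA-256 are illustrative choices, not a description of how knot tracks template versions.

```python
import hashlib


def template_hash(definition: str) -> str:
    """Hash a template definition so versions can be compared cheaply."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


def spaces_needing_update(started_from: dict[str, str], current_hash: str) -> list[str]:
    """Return names of spaces whose recorded template hash is out of date.

    started_from maps a space name to the hash of the template version
    that space was last started with (a hypothetical bookkeeping field).
    """
    return sorted(name for name, h in started_from.items() if h != current_hash)


old = template_hash("image: ubuntu:22.04")
new = template_hash("image: ubuntu:24.04")
running = {"dev-alice": old, "dev-bob": new}
stale = spaces_needing_update(running, new)
```

Here only dev-alice would be marked, and it would pick up the new template the next time it restarts.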

High Availability

Server Redundancy

Run multiple servers in cluster:

Minimum 3 servers recommended
Distribute across availability zones
Load balancer in front of servers
Health checks and failover
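
Since any server can handle any request, the load balancer only needs to skip servers that fail their health check. A minimal sketch of round-robin selection with failover follows. The Balancer class is hypothetical, and in practice this logic lives in the load balancer product you deploy rather than in custom code.

```python
from itertools import cycle


class Balancer:
    """Round-robin over servers, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark(self, server, ok):
        """Record the result of a health check for one server."""
        if ok:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def pick(self):
        """Return the next healthy server, or raise if none are healthy."""
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers")


lb = Balancer(["a", "b", "c"])
lb.mark("b", False)  # health check for server b failed
picks = [lb.pick() for _ in range(4)]
```

With three servers and one marked unhealthy, traffic alternates between the remaining two, which is why running at least three servers leaves redundancy even during a single failure.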

Database Redundancy

Use database HA features:

MySQL replication or clustering
Redis Sentinel or Cluster
Regular backups
Automated failover

Storage Redundancy

For Nomad deployments:

Use replicated CSI storage
Regular volume backups
Disaster recovery plan

Disaster Recovery

Backup Strategy

Regular database backups
Encrypted backup storage
Offsite backup copies
Tested restore procedures
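
Part of testing a restore is confirming that the backup file you are about to restore is the one that was written. The sketch below checks a backup against a SHA-256 checksum recorded at backup time. It assumes backups are plain files and that you store the checksum alongside them, and it does not replace a full restore test.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Compare a backup file against the checksum recorded when it was made."""
    return sha256_of(path) == expected_sha256
```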

Recovery Plan

Identify failure scope
Restore database from backup
Restart servers
Verify system functionality
Notify users

Business Continuity

Document recovery procedures
Define RTO (Recovery Time Objective)
Define RPO (Recovery Point Objective)
Regular DR testing
Maintain spare capacity
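
RTO and RPO become useful once they are checked against real numbers. The sketch below shows the arithmetic under simple assumptions: worst-case data loss is one backup interval plus the time a backup takes to complete, and recovery time is the sum of the recovery steps. All figures are made-up examples, not recommendations.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Data written just after a backup starts is unprotected until the
    next backup finishes, so the exposure is interval plus duration."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


def meets_rto(step_durations: list[timedelta], rto: timedelta) -> bool:
    """Recovery steps run one after another, so their durations add up."""
    return sum(step_durations, timedelta()) <= rto


rpo = timedelta(hours=4)
six_hourly_ok = meets_rpo(timedelta(hours=6), timedelta(minutes=30), rpo)
hourly_ok = meets_rpo(timedelta(hours=1), timedelta(minutes=30), rpo)

# Example step timings: restore database, restart servers, verify.
steps = [timedelta(minutes=45), timedelta(minutes=10), timedelta(minutes=20)]
rto_ok = meets_rto(steps, timedelta(hours=2))
```

Regular DR testing is what supplies the step durations. Without measured timings, the RTO check is only an estimate.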