Load Balancing in Computing Systems
1. What is Load Balancing?
- Process of distributing workloads across multiple computing resources (servers, CPUs, network links, etc.) to optimize resource use, maximize throughput, minimize response time, and avoid overload.
- Used in cloud computing, web servers, high-performance computing, networking (routers, switches), and embedded real-time systems.
2. Why is Load Balancing Needed?
- Prevents any single resource from being a bottleneck or point of failure.
- Increases system reliability and scalability.
- Enhances fault tolerance—if one node fails, others pick up the load.
- Reduces latency and improves user experience.
3. Where is Load Balancing Applied?
- Data centers: distribute incoming traffic to clusters of web/app servers.
- Embedded systems: schedule tasks on multiple cores/processors.
- Networks: balance packets/flows over links/routers (LAG, ECMP).
- Cloud platforms: autoscale and balance between virtual machines.
4. Load Balancing Algorithms
4.1 Static vs Dynamic
- Static: Allocation decided in advance, based on predefined rules (e.g., Round Robin).
- Dynamic: Allocations adapt to current loads or performance feedback (e.g., Least Connections, Adaptive).
4.2 Layer Perspective
- Layer 4 (Transport): Balances on TCP/UDP port and IP address information; fast and content-agnostic (e.g., LVS/IPVS, many hardware load balancers).
- Layer 7 (Application): Considers content (HTTP headers, URLs), allows smarter routing.
5. Common Load Balancing Algorithms
5.1 Round Robin
- Requests are distributed to each server in sequence.
- Simple, does not consider server health or load.
int next_server = 0;
void handle_request(Request r) {
    assign_to_server(servers[next_server], r);
    next_server = (next_server + 1) % NUM_SERVERS; /* advance cyclically */
}
5.2 Weighted Round Robin
- Like round robin, but assigns more requests to servers with higher capacity/weight.
int weights[NUM_SERVERS] = {3, 1, 2}; // e.g., server 0 gets 3x as many requests; all weights must be >= 1
int counters[NUM_SERVERS] = {0, 0, 0};
void handle_request(Request r) {
    static int server = 0;
    /* Skip to the next server whose quota for this round is not yet used up. */
    while (counters[server] >= weights[server]) {
        counters[server] = 0;
        server = (server + 1) % NUM_SERVERS;
    }
    assign_to_server(servers[server], r);
    counters[server]++;
}
5.3 Least Connections
- Assigns new request to the server with the fewest active connections.
- Adapts dynamically to uneven load.
void handle_request(Request r) {
    int min = active_conns[0], idx = 0;
    for (int i = 1; i < NUM_SERVERS; ++i) {
        if (active_conns[i] < min) { min = active_conns[i]; idx = i; }
    }
    assign_to_server(servers[idx], r);
    active_conns[idx]++; /* decrement active_conns[idx] when the connection closes */
}
5.4 IP Hashing
- Hashes the client IP to pick a server, so the same client always reaches the same server (sticky sessions). Note that plain hash-modulo is not consistent hashing: changing NUM_SERVERS remaps most clients.
unsigned hash_ip(const char *ip) {
    unsigned hash = 0; /* unsigned avoids a negative result from the modulo below */
    for (; *ip; ++ip) hash = (hash * 31) + (unsigned char)*ip;
    return hash;
}
void handle_request(Request r) {
    int idx = hash_ip(r.client_ip) % NUM_SERVERS;
    assign_to_server(servers[idx], r);
}
5.5 Random Selection
- Assigns request to a randomly chosen server.
#include <stdlib.h>
void handle_request(Request r) {
    int idx = rand() % NUM_SERVERS; /* seed once with srand() at startup */
    assign_to_server(servers[idx], r);
}
5.6 Adaptive/Feedback-Based (Advanced)
- Monitors real server response time, CPU/mem usage, or queue length.
- Directs new traffic to least-loaded (measured) server.
- Can use external metrics (e.g., Prometheus, SNMP) in modern clusters.
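A minimal sketch of the idea, assuming made-up helper names (real balancers use richer metrics and external monitoring): keep an exponentially weighted moving average (EWMA) of each server's observed response time and send new traffic to the lowest one.

```c
#define NUM_SERVERS 3

/* EWMA of observed response time per server, in milliseconds. */
static double ewma_ms[NUM_SERVERS];
static const double ALPHA = 0.2; /* smoothing factor: weight of the newest sample */

/* Feed back one measured response time for a server. */
void report_response_time(int server, double ms) {
    ewma_ms[server] = ALPHA * ms + (1.0 - ALPHA) * ewma_ms[server];
}

/* Route new traffic to the server with the lowest smoothed response time. */
int pick_server(void) {
    int best = 0;
    for (int i = 1; i < NUM_SERVERS; ++i)
        if (ewma_ms[i] < ewma_ms[best])
            best = i;
    return best;
}
```

The EWMA smooths out single slow responses, so one outlier does not immediately divert all traffic away from an otherwise healthy server.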
6. Software Implementation & Libraries
a. Linux/Unix
- IPVS (IP Virtual Server), NGINX, HAProxy, LVS (Linux Virtual Server)
- Configuration: specify backend servers, choose algorithm (rr, wrr, lc, etc.)
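For reference, an IPVS service might be set up with `ipvsadm` roughly as follows (the VIP and backend addresses are placeholders; `-s rr` selects the round-robin scheduler, `-m` selects NAT forwarding):

```shell
# Create a virtual TCP service on the VIP, scheduled round-robin
ipvsadm -A -t 192.168.1.100:80 -s rr
# Register two real servers behind it (NAT/masquerade mode)
ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.101:80 -m
ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.102:80 -m
```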
b. Cloud/Distributed Systems
- Kubernetes: uses kube-proxy, load balancer services
- AWS ELB, Azure Load Balancer, GCP Load Balancer—managed, auto-scale
- Modern APIs: Traefik, Envoy Proxy, Istio (Service Meshes)
c. Embedded/RTOS
- Real-time schedulers (Rate Monotonic, Earliest Deadline First)
- Multi-core RTOS: partition tasks among cores for optimal utilization
7. Hardware Load Balancers
- Specialized network appliances (e.g., F5, Citrix ADC) for ultra-low latency, L4/L7 balancing.
- Use hardware ASICs for flow tracking, SSL offload, deep packet inspection.
- Used by ISPs, enterprises, financial institutions.
8. Example Scenario: Web Server Load Balancing
- A web application with 4 backend servers, HAProxy in front.
- Clients connect to the load balancer (VIP), requests distributed per chosen algorithm.
- If a server fails, balancer detects via health check and stops sending requests to it.
- Logs, metrics, and alerts for monitoring.
Example HAProxy Config (for reference)
frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 192.168.1.101:80 check
    server web2 192.168.1.102:80 check
    server web3 192.168.1.103:80 check
    server web4 192.168.1.104:80 check
9. Advanced Topics
9.1 Global Server Load Balancing (GSLB)
- What: Balances traffic not just within a data center, but across geographically distributed sites (multiple continents/countries).
- How: Uses DNS or application-layer redirects to send clients to the nearest or healthiest region.
- Why: Minimizes latency for global users, improves redundancy and disaster recovery.
- Example: A company with servers in the US, Europe, and Asia configures GSLB to route each user to the closest data center; if the Europe region goes down, users are redirected to the next best site.
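The failover decision in the example above can be sketched as follows (site names, ordering, and the health flags are made-up placeholders for this sketch):

```c
#include <string.h>

#define NUM_SITES 3
static const char *sites[NUM_SITES] = {"us", "eu", "asia"};
static int site_up[NUM_SITES] = {1, 1, 1}; /* health state per region */

/* Prefer the site matching the client's region; if it is down,
 * fail over to the next site in the list. */
const char *pick_site(const char *client_region) {
    int pref = 0;
    for (int i = 0; i < NUM_SITES; ++i)
        if (strcmp(sites[i], client_region) == 0)
            pref = i;
    for (int i = 0; i < NUM_SITES; ++i) {
        int idx = (pref + i) % NUM_SITES;
        if (site_up[idx])
            return sites[idx];
    }
    return NULL; /* every site is down */
}
```

In practice the "nearest" choice comes from GeoDNS or latency measurements rather than a fixed list, but the prefer-then-fail-over shape is the same.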
9.2 Session Persistence/Sticky Sessions
- What: Ensures a user’s requests are routed to the same backend server during their session.
- Why: Needed for applications storing user state (shopping cart, login sessions) in server memory instead of shared DB/cache.
- How: Implemented via:
- Cookie-based (load balancer inserts a session cookie)
- IP-hash (same client IP always hashes to the same server)
- Application token/session affinity
- Example: HAProxy `stick-table`, NGINX `ip_hash`, AWS ELB session stickiness.
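A cookie-based variant can be sketched like this (the `SRV=` cookie name and the `route` helper are hypothetical, not from any particular balancer): a request with a valid cookie sticks to its server; a new client gets a round-robin assignment and a cookie to set on the response.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SERVERS 4

static int next_rr = 0; /* fallback round-robin position */

/* Route one request. If it carries a "SRV=<idx>" session cookie,
 * honor it; otherwise assign round-robin and write the cookie value
 * the response should set into set_cookie_out. Returns the server index. */
int route(const char *cookie, char *set_cookie_out, size_t out_len) {
    if (cookie && strncmp(cookie, "SRV=", 4) == 0) {
        int idx = atoi(cookie + 4);
        if (idx >= 0 && idx < NUM_SERVERS)
            return idx; /* sticky: same server as before */
    }
    int idx = next_rr;
    next_rr = (next_rr + 1) % NUM_SERVERS;
    snprintf(set_cookie_out, out_len, "SRV=%d", idx);
    return idx;
}
```

Real balancers encrypt or sign the cookie so clients cannot steer themselves to arbitrary backends.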
9.3 Health Checking
- What: Actively or passively monitors backend servers for availability and responsiveness.
- Why: Prevents sending requests to failed or overloaded servers, improving reliability.
- Types:
- Active: Load balancer pings or performs HTTP/TCP checks on servers periodically.
- Passive: Monitors error rates or connection failures.
- How: On repeated failures, server is removed from rotation until healthy.
- Example: HAProxy/NGINX `check` directive, cloud provider built-in health checks.
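The remove-after-repeated-failures logic can be sketched as follows (the threshold and helper names are illustrative; a real check would also require several successes before readmitting a server):

```c
#include <stdbool.h>

#define NUM_SERVERS 4
#define FAIL_THRESHOLD 3 /* consecutive failures before removal */

static int fail_count[NUM_SERVERS];
static bool healthy[NUM_SERVERS] = {true, true, true, true};

/* Record the result of one (active or passive) health check. */
void record_check(int server, bool ok) {
    if (ok) {
        fail_count[server] = 0;
        healthy[server] = true;  /* back in rotation */
    } else if (++fail_count[server] >= FAIL_THRESHOLD) {
        healthy[server] = false; /* out of rotation */
    }
}

/* Next healthy server, round-robin; -1 if none are up. */
int next_healthy(void) {
    static int rr = 0;
    for (int i = 0; i < NUM_SERVERS; ++i) {
        int idx = (rr + i) % NUM_SERVERS;
        if (healthy[idx]) {
            rr = (idx + 1) % NUM_SERVERS;
            return idx;
        }
    }
    return -1;
}
```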
9.4 SSL Termination
- What: Load balancer handles all SSL/TLS decryption and encryption, then forwards unencrypted HTTP to backend servers.
- Why: Reduces CPU load on backend servers, simplifies certificate management, enables L7 inspection/filtering.
- How: Load balancer stores certificates and private keys, manages secure connections from clients.
- Example: `ssl`/`https` options in HAProxy, AWS/GCP/Azure load balancers, F5/Citrix ADC SSL offload.
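In HAProxy, for example, termination amounts to binding the frontend with a certificate bundle; the certificate path below is a placeholder, and the backend continues to receive plain HTTP on port 80:

```
frontend https_front
    bind *:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend web_servers
```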
9.5 Auto-scaling
- What: Automatically adds or removes backend servers based on real-time metrics (CPU, response time, queue length).
- Why: Ensures efficient resource usage and cost, adapts to traffic spikes, avoids under/over-provisioning.
- How: Integrated with orchestration (e.g., Kubernetes HPA, AWS EC2 Auto Scaling, Google Cloud Managed Instance Groups).
- Example: Web service running in Kubernetes scales pods up/down based on HTTP request rate; load balancer automatically adds new pods to its rotation.
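The scaling decision itself is often a simple target-tracking rule. Kubernetes' HPA, for instance, computes desired replicas as ceil(current * observed / target), which can be sketched as:

```c
/* Target-tracking scale rule, the same shape as the Kubernetes HPA
 * formula: desired = ceil(current * observed / target). */
int desired_replicas(int current, double observed_util, double target_util) {
    double raw = current * (observed_util / target_util);
    int desired = (int)raw;
    if (raw > (double)desired)
        desired++; /* round up */
    if (desired < 1)
        desired = 1; /* keep at least one replica */
    return desired;
}
```

Production autoscalers add hysteresis (cooldown windows, min/max bounds) on top of this rule so brief spikes do not cause replica churn.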
10. Comparison Table
| Algorithm | Static/Dynamic | Complexity | State Tracking | Use Cases | Weakness |
|---|---|---|---|---|---|
| Round Robin | Static | Low | No | Uniform servers, stateless | Ignores load/server size |
| Weighted Round Robin | Static | Low | Yes | Mixed server capacities | Still ignores live load |
| Least Connections | Dynamic | Medium | Yes | Web, DB servers, sticky | Needs connection tracking |
| IP Hash | Static | Low | No | Sticky session apps | Unbalanced if hot IPs |
| Random | Static | Low | No | Quick demo/testing | Can overload a server |
| Adaptive/Custom | Dynamic | High | Yes | Large-scale, cloud-native | Complexity, monitoring |