Load Balancing in Computing Systems

1. What is Load Balancing?

  • Process of distributing workloads across multiple computing resources (servers, CPUs, network links, etc.) to optimize resource use, maximize throughput, minimize response time, and avoid overload.
  • Used in cloud computing, web servers, high-performance computing, networking (routers, switches), and embedded real-time systems.

2. Why is Load Balancing Needed?

  • Prevents any single resource from being a bottleneck or point of failure.
  • Increases system reliability and scalability.
  • Enhances fault tolerance—if one node fails, others pick up the load.
  • Reduces latency and improves user experience.

3. Where is Load Balancing Applied?

  • Data centers: distribute incoming traffic to clusters of web/app servers.
  • Embedded systems: schedule tasks on multiple cores/processors.
  • Networks: balance packets/flows over links/routers (LAG, ECMP).
  • Cloud platforms: autoscale and balance between virtual machines.

4. Load Balancing Algorithms

4.1 Static vs Dynamic

  • Static: Allocation decided in advance, based on predefined rules (e.g., Round Robin).
  • Dynamic: Allocations adapt to current loads or performance feedback (e.g., Least Connections, Adaptive).

4.2 Layer Perspective

  • Layer 4 (Transport): Balances based on TCP/UDP/IP information (e.g., hardware load balancers).
  • Layer 7 (Application): Considers content (HTTP headers, URLs), allows smarter routing.
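To make the Layer 4 vs Layer 7 distinction concrete, here is a minimal Layer-7 routing sketch: the decision looks at the HTTP request path (content), not just the TCP/IP tuple. The pool layout and the name pick_server_l7 are illustrative assumptions, not taken from any particular balancer.

```c
#include <assert.h>
#include <string.h>

#define NUM_SERVERS 4

/* Hypothetical pools: servers 0-1 serve static assets, 2-3 serve the API. */
static const int STATIC_POOL[] = {0, 1};
static const int API_POOL[]    = {2, 3};

/* Layer-7 decision: inspect the request path and route to a specialized
 * pool; round-robin within each pool. A Layer-4 balancer could not do
 * this, since it never sees the HTTP payload. */
int pick_server_l7(const char *path) {
    static int static_rr = 0, api_rr = 0;
    if (strncmp(path, "/static/", 8) == 0)
        return STATIC_POOL[static_rr++ % 2];
    if (strncmp(path, "/api/", 5) == 0)
        return API_POOL[api_rr++ % 2];
    return 0; /* default pool */
}
```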

5. Common Load Balancing Algorithms

5.1 Round Robin

  • Requests are distributed to each server in sequence.
  • Simple, does not consider server health or load.
int next_server = 0;  /* index of the next server in rotation */

/* Assumes servers[], NUM_SERVERS, and assign_to_server() exist elsewhere. */
void handle_request(Request r) {
    assign_to_server(servers[next_server], r);
    next_server = (next_server + 1) % NUM_SERVERS;
}

5.2 Weighted Round Robin

  • Like round robin, but assigns more requests to servers with higher capacity/weight.
int weights[NUM_SERVERS]  = {3, 1, 2}; /* per cycle: server 0 gets 3 requests, server 1 gets 1, server 2 gets 2 */
int counters[NUM_SERVERS] = {0, 0, 0};

void handle_request(Request r) {
    static int server = 0;
    /* Advance once the current server has used up its weight this cycle. */
    while (counters[server] >= weights[server]) {
        counters[server] = 0;
        server = (server + 1) % NUM_SERVERS;
    }
    assign_to_server(servers[server], r);
    counters[server]++;
}

5.3 Least Connections

  • Assigns new request to the server with the fewest active connections.
  • Adapts dynamically to uneven load.
void handle_request(Request r) {
    int min = active_conns[0], idx = 0;
    for (int i = 1; i < NUM_SERVERS; ++i) {
        if (active_conns[i] < min) { min = active_conns[i]; idx = i; }
    }
    assign_to_server(servers[idx], r);
    active_conns[idx]++;  /* must be decremented again when the connection closes */
}

5.4 IP Hashing

  • Hashes the client IP to pick a server, so the same client always reaches the same server (sticky sessions). Note: simple hash-mod is not true consistent hashing; adding or removing a server remaps most clients.
unsigned hash_ip(const char* ip) {
    unsigned hash = 0;  /* unsigned avoids a negative result from overflow */
    for (; *ip; ++ip) hash = (hash * 31) + (unsigned char)*ip;
    return hash;
}
void handle_request(Request r) {
    int idx = hash_ip(r.client_ip) % NUM_SERVERS;
    assign_to_server(servers[idx], r);
}

5.5 Random Selection

  • Assigns request to a randomly chosen server.
#include <stdlib.h>
/* Seed the generator once at startup, e.g., srand(time(NULL)). */
void handle_request(Request r) {
    int idx = rand() % NUM_SERVERS;
    assign_to_server(servers[idx], r);
}

5.6 Adaptive/Feedback-Based (Advanced)

  • Monitors real server response time, CPU/mem usage, or queue length.
  • Directs new traffic to least-loaded (measured) server.
  • Can use external metrics (e.g., Prometheus, SNMP) in modern clusters.
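The bullets above can be sketched roughly as follows. report_load and load_score are hypothetical names; in a real cluster the scores would arrive via a metrics pipeline (e.g., Prometheus scrapes or SNMP polls) rather than a direct function call.

```c
#include <assert.h>

#define NUM_SERVERS 3

static double load_score[NUM_SERVERS];  /* updated by a monitoring loop */

/* Called by the monitoring side with a fresh measurement,
 * e.g., CPU utilization, mean response time, or queue length. */
void report_load(int server, double score) {
    load_score[server] = score;
}

/* Adaptive decision: direct the next request to the server with the
 * lowest currently reported load. */
int pick_least_loaded(void) {
    int best = 0;
    for (int i = 1; i < NUM_SERVERS; ++i)
        if (load_score[i] < load_score[best]) best = i;
    return best;
}
```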

6. Software Implementation & Libraries

a. Linux/Unix

  • IPVS (IP Virtual Server), NGINX, HAProxy, LVS (Linux Virtual Server)
  • Configuration: specify backend servers, choose algorithm (rr, wrr, lc, etc.)
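As a configuration sketch (the VIP and backend addresses are example values), an IPVS round-robin service might be set up with ipvsadm like this:

```shell
# Create a virtual TCP service on VIP 192.0.2.10:80 with the round-robin scheduler
ipvsadm -A -t 192.0.2.10:80 -s rr

# Add two real servers behind it, in NAT/masquerading mode
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.1:80 -m
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.2:80 -m
```

Swapping `-s rr` for `-s wrr` or `-s lc` selects weighted round robin or least connections, matching the algorithms in section 5.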

b. Cloud/Distributed Systems

  • Kubernetes: kube-proxy distributes Service traffic across pods; Services of type LoadBalancer provision external cloud load balancers
  • AWS ELB, Azure Load Balancer, GCP Load Balancer—managed, auto-scale
  • Modern proxies and service meshes: Traefik, Envoy, Istio

c. Embedded/RTOS

  • Real-time schedulers (Rate Monotonic, Earliest Deadline First)
  • Multi-core RTOS: partition tasks among cores for optimal utilization
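A minimal sketch of an Earliest Deadline First scheduling decision, as an RTOS kernel might make it at each scheduling point. The Task struct and edf_pick are illustrative, not the API of any specific RTOS.

```c
#include <assert.h>
#include <limits.h>

typedef struct {
    long deadline;  /* absolute deadline, e.g., in timer ticks */
    int  ready;     /* 1 if the task is runnable */
} Task;

/* EDF: run the ready task with the nearest absolute deadline.
 * Returns its index, or -1 if no task is ready. */
int edf_pick(const Task tasks[], int n) {
    int best = -1;
    long best_dl = LONG_MAX;
    for (int i = 0; i < n; ++i)
        if (tasks[i].ready && tasks[i].deadline < best_dl) {
            best_dl = tasks[i].deadline;
            best = i;
        }
    return best;
}
```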

7. Hardware Load Balancers

  • Specialized network appliances (e.g., F5, Citrix ADC) for ultra-low latency, L4/L7 balancing.
  • Use hardware ASICs for flow tracking, SSL offload, deep packet inspection.
  • Used by ISPs, enterprises, financial institutions.

8. Example Scenario: Web Server Load Balancing

  • A web application with 4 backend servers, HAProxy in front.
  • Clients connect to the load balancer (VIP), requests distributed per chosen algorithm.
  • If a server fails, balancer detects via health check and stops sending requests to it.
  • Logs, metrics, and alerts for monitoring.

Example HAProxy Config (for reference)

frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 192.168.1.101:80 check
    server web2 192.168.1.102:80 check
    server web3 192.168.1.103:80 check
    server web4 192.168.1.104:80 check

9. Advanced Topics

9.1 Global Server Load Balancing (GSLB)

  • What: Balances traffic not just within a data center, but across geographically distributed sites (multiple continents/countries).
  • How: Uses DNS or application-layer redirects to send clients to the nearest or healthiest region.
  • Why: Minimizes latency for global users, improves redundancy and disaster recovery.
  • Example: A company with servers in the US, Europe, and Asia configures GSLB to route each user to the closest data center; if the Europe region goes down, users are redirected to the next best site.
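A rough sketch of the GSLB decision described above: answer each client with the healthy site that has the lowest measured latency for the client's region, skipping failed sites. The site table, names, and RTT numbers are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

#define NUM_SITES 3

typedef struct {
    const char *name;
    int healthy;     /* set by health checks */
    int rtt_ms[3];   /* measured RTT from client regions: 0=US, 1=EU, 2=ASIA */
} Site;

static Site sites[NUM_SITES] = {
    {"us-east", 1, { 20, 110, 180}},
    {"eu-west", 1, {110,  15, 160}},
    {"ap-east", 1, {180, 160,  25}},
};

/* GSLB decision (e.g., behind a DNS responder): pick the healthy site
 * with the lowest RTT for this client region, or NULL if all are down. */
const char *gslb_pick(int region) {
    int best = -1;
    for (int i = 0; i < NUM_SITES; ++i) {
        if (!sites[i].healthy) continue;
        if (best < 0 || sites[i].rtt_ms[region] < sites[best].rtt_ms[region])
            best = i;
    }
    return best >= 0 ? sites[best].name : NULL;
}
```

Note how the example from the text falls out: if eu-west is marked unhealthy, European clients are transparently redirected to the next-best site.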

9.2 Session Persistence/Sticky Sessions

  • What: Ensures a user’s requests are routed to the same backend server during their session.
  • Why: Needed for applications storing user state (shopping cart, login sessions) in server memory instead of shared DB/cache.
  • How: Implemented via:
    • Cookie-based (load balancer inserts a session cookie)
    • IP-hash (same client IP always hashes to the same server)
    • Application token/session affinity
  • Example: HAProxy `stick-table`, NGINX `ip_hash`, AWS ELB session stickiness.

9.3 Health Checking

  • What: Actively or passively monitors backend servers for availability and responsiveness.
  • Why: Prevents sending requests to failed or overloaded servers, improving reliability.
  • Types:
    • Active: Load balancer pings or performs HTTP/TCP checks on servers periodically.
    • Passive: Monitors error rates or connection failures.
  • How: On repeated failures, server is removed from rotation until healthy.
  • Example: HAProxy/NGINX `check` directive, cloud provider built-in health checks.
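The active-check logic can be sketched as below. FAIL_THRESHOLD, record_probe, and the single-success recovery rule are illustrative assumptions; real balancers typically also require several consecutive successes before restoring a server.

```c
#include <assert.h>

#define NUM_SERVERS    4
#define FAIL_THRESHOLD 3   /* consecutive failures before removal */

static int fail_count[NUM_SERVERS];
static int healthy[NUM_SERVERS] = {1, 1, 1, 1};

/* Called once per probe cycle with the result of one server's check
 * (e.g., a periodic TCP connect or HTTP GET). */
void record_probe(int server, int probe_ok) {
    if (probe_ok) {
        fail_count[server] = 0;
        healthy[server] = 1;   /* restore to rotation */
    } else if (++fail_count[server] >= FAIL_THRESHOLD) {
        healthy[server] = 0;   /* remove from rotation */
    }
}

int is_in_rotation(int server) { return healthy[server]; }
```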

9.4 SSL Termination

  • What: Load balancer handles all SSL/TLS decryption and encryption, then forwards unencrypted HTTP to backend servers.
  • Why: Reduces CPU load on backend servers, simplifies certificate management, enables L7 inspection/filtering.
  • How: Load balancer stores certificates and private keys, manages secure connections from clients.
  • Example: `ssl`/`https` options in HAProxy, AWS/GCP/Azure load balancers, F5/Citrix ADC SSL offload.

9.5 Auto-scaling

  • What: Automatically adds or removes backend servers based on real-time metrics (CPU, response time, queue length).
  • Why: Ensures efficient resource usage and cost, adapts to traffic spikes, avoids under/over-provisioning.
  • How: Integrated with orchestration (e.g., Kubernetes HPA, AWS EC2 Auto Scaling, Google Cloud Managed Instance Groups).
  • Example: Web service running in Kubernetes scales pods up/down based on HTTP request rate; load balancer automatically adds new pods to its rotation.

10. Comparison Table

| Algorithm            | Static/Dynamic | Complexity | State Tracking | Use Cases                  | Weakness                  |
|----------------------|----------------|------------|----------------|----------------------------|---------------------------|
| Round Robin          | Static         | Low        | No             | Uniform servers, stateless | Ignores load/server size  |
| Weighted Round Robin | Static         | Low        | Yes            | Mixed server capacities    | Still ignores live load   |
| Least Connections    | Dynamic        | Medium     | Yes            | Web, DB servers, sticky    | Needs connection tracking |
| IP Hash              | Static         | Low        | No             | Sticky session apps        | Unbalanced if hot IPs     |
| Random               | Static         | Low        | No             | Quick demo/testing         | Can overload a server     |
| Adaptive/Custom      | Dynamic        | High       | Yes            | Large-scale, cloud-native  | Complexity, monitoring    |