Load Balancing in Computing Systems

1. What is Load Balancing?

  • Process of distributing workloads across multiple computing resources (servers, CPUs, network links, etc.) to optimize resource use, maximize throughput, minimize response time, and avoid overload.
  • Used in cloud computing, web servers, high-performance computing, networking (routers, switches), and embedded real-time systems.

2. Why is Load Balancing Needed?

  • Prevents any single resource from being a bottleneck or point of failure.
  • Increases system reliability and scalability.
  • Enhances fault tolerance—if one node fails, others pick up the load.
  • Reduces latency and improves user experience.

3. Where is Load Balancing Applied?

  • Data centers: distribute incoming traffic to clusters of web/app servers.
  • Embedded systems: schedule tasks on multiple cores/processors.
  • Networks: balance packets/flows over links/routers (LAG, ECMP).
  • Cloud platforms: autoscale and balance between virtual machines.

4. Load Balancing Algorithms

4.1 Static vs Dynamic

  • Static: Allocation decided in advance, based on predefined rules (e.g., Round Robin).
  • Dynamic: Allocations adapt to current loads or performance feedback (e.g., Least Connections, Adaptive).

4.2 Layer Perspective

  • Layer 4 (Transport): Balances based on TCP/UDP/IP information (e.g., hardware load balancers).
  • Layer 7 (Application): Considers content (HTTP headers, URLs), allows smarter routing.
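To make the Layer 4 vs Layer 7 distinction concrete, here is a minimal Layer-7 routing sketch: the decision looks at the HTTP request path (content), not just the TCP/IP tuple. The pool layout and the name pick_server_l7 are illustrative assumptions, not taken from any particular balancer.

```c
#include <assert.h>
#include <string.h>

#define NUM_SERVERS 4

/* Hypothetical pools: servers 0-1 serve static assets, 2-3 serve the API. */
static const int STATIC_POOL[] = {0, 1};
static const int API_POOL[]    = {2, 3};

/* Layer-7 decision: inspect the request path and route to a specialized
 * pool; round-robin within each pool. A Layer-4 balancer could not do
 * this, since it never sees the HTTP payload. */
int pick_server_l7(const char *path) {
    static int static_rr = 0, api_rr = 0;
    if (strncmp(path, "/static/", 8) == 0)
        return STATIC_POOL[static_rr++ % 2];
    if (strncmp(path, "/api/", 5) == 0)
        return API_POOL[api_rr++ % 2];
    return 0; /* default pool */
}
```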

5. Common Load Balancing Algorithms

5.1 Round Robin

  • Requests are distributed to each server in sequence.
  • Simple, does not consider server health or load.
int next_server = 0;  /* index of the next server in rotation */

/* Assumes servers[], NUM_SERVERS, and assign_to_server() exist elsewhere. */
void handle_request(Request r) {
    assign_to_server(servers[next_server], r);
    next_server = (next_server + 1) % NUM_SERVERS;
}

5.2 Weighted Round Robin

  • Like round robin, but assigns more requests to servers with higher capacity/weight.
int weights[NUM_SERVERS]  = {3, 1, 2}; /* per cycle: server 0 gets 3 requests, server 1 gets 1, server 2 gets 2 */
int counters[NUM_SERVERS] = {0, 0, 0};

void handle_request(Request r) {
    static int server = 0;
    /* Advance once the current server has used up its weight this cycle. */
    while (counters[server] >= weights[server]) {
        counters[server] = 0;
        server = (server + 1) % NUM_SERVERS;
    }
    assign_to_server(servers[server], r);
    counters[server]++;
}

5.3 Least Connections

  • Assigns new request to the server with the fewest active connections.
  • Adapts dynamically to uneven load.
void handle_request(Request r) {
    int min = active_conns[0], idx = 0;
    for (int i = 1; i < NUM_SERVERS; ++i) {
        if (active_conns[i] < min) { min = active_conns[i]; idx = i; }
    }
    assign_to_server(servers[idx], r);
    active_conns[idx]++;  /* must be decremented again when the connection closes */
}

5.4 IP Hashing

  • Hashes the client IP to pick a server, so the same client always reaches the same server (sticky sessions). Note: simple hash-mod is not true consistent hashing; adding or removing a server remaps most clients.
unsigned hash_ip(const char* ip) {
    unsigned hash = 0;  /* unsigned avoids a negative result from overflow */
    for (; *ip; ++ip) hash = (hash * 31) + (unsigned char)*ip;
    return hash;
}
void handle_request(Request r) {
    int idx = hash_ip(r.client_ip) % NUM_SERVERS;
    assign_to_server(servers[idx], r);
}

5.5 Random Selection

  • Assigns request to a randomly chosen server.
#include <stdlib.h>
/* Seed the generator once at startup, e.g., srand(time(NULL)). */
void handle_request(Request r) {
    int idx = rand() % NUM_SERVERS;
    assign_to_server(servers[idx], r);
}

5.6 Adaptive/Feedback-Based (Advanced)

  • Monitors real server response time, CPU/mem usage, or queue length.
  • Directs new traffic to least-loaded (measured) server.
  • Can use external metrics (e.g., Prometheus, SNMP) in modern clusters.
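The bullets above can be sketched roughly as follows. report_load and load_score are hypothetical names; in a real cluster the scores would arrive via a metrics pipeline (e.g., Prometheus scrapes or SNMP polls) rather than a direct function call.

```c
#include <assert.h>

#define NUM_SERVERS 3

static double load_score[NUM_SERVERS];  /* updated by a monitoring loop */

/* Called by the monitoring side with a fresh measurement,
 * e.g., CPU utilization, mean response time, or queue length. */
void report_load(int server, double score) {
    load_score[server] = score;
}

/* Adaptive decision: direct the next request to the server with the
 * lowest currently reported load. */
int pick_least_loaded(void) {
    int best = 0;
    for (int i = 1; i < NUM_SERVERS; ++i)
        if (load_score[i] < load_score[best]) best = i;
    return best;
}
```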

6. Software Implementation & Libraries

a. Linux/Unix

  • IPVS (IP Virtual Server), NGINX, HAProxy, LVS (Linux Virtual Server)
  • Configuration: specify backend servers, choose algorithm (rr, wrr, lc, etc.)
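As a configuration sketch (the VIP and backend addresses are example values), an IPVS round-robin service might be set up with ipvsadm like this:

```shell
# Create a virtual TCP service on VIP 192.0.2.10:80 with the round-robin scheduler
ipvsadm -A -t 192.0.2.10:80 -s rr

# Add two real servers behind it, in NAT/masquerading mode
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.1:80 -m
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.2:80 -m
```

Swapping `-s rr` for `-s wrr` or `-s lc` selects weighted round robin or least connections, matching the algorithms in section 5.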

b. Cloud/Distributed Systems

  • Kubernetes: kube-proxy distributes Service traffic across pods; Services of type LoadBalancer provision external cloud load balancers
  • AWS ELB, Azure Load Balancer, GCP Load Balancer—managed, auto-scale
  • Modern proxies and service meshes: Traefik, Envoy, Istio

c. Embedded/RTOS

  • Real-time schedulers (Rate Monotonic, Earliest Deadline First)
  • Multi-core RTOS: partition tasks among cores for optimal utilization
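A minimal sketch of an Earliest Deadline First scheduling decision, as an RTOS kernel might make it at each scheduling point. The Task struct and edf_pick are illustrative, not the API of any specific RTOS.

```c
#include <assert.h>
#include <limits.h>

typedef struct {
    long deadline;  /* absolute deadline, e.g., in timer ticks */
    int  ready;     /* 1 if the task is runnable */
} Task;

/* EDF: run the ready task with the nearest absolute deadline.
 * Returns its index, or -1 if no task is ready. */
int edf_pick(const Task tasks[], int n) {
    int best = -1;
    long best_dl = LONG_MAX;
    for (int i = 0; i < n; ++i)
        if (tasks[i].ready && tasks[i].deadline < best_dl) {
            best_dl = tasks[i].deadline;
            best = i;
        }
    return best;
}
```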

7. Hardware Load Balancers

  • Specialized network appliances (e.g., F5, Citrix ADC) for ultra-low latency, L4/L7 balancing.
  • Use hardware ASICs for flow tracking, SSL offload, deep packet inspection.
  • Used by ISPs, enterprises, financial institutions.

8. Example Scenario: Web Server Load Balancing

  • A web application with 4 backend servers, HAProxy in front.
  • Clients connect to the load balancer (VIP), requests distributed per chosen algorithm.
  • If a server fails, balancer detects via health check and stops sending requests to it.
  • Logs, metrics, and alerts for monitoring.

Example HAProxy Config (for reference)

frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 192.168.1.101:80 check
    server web2 192.168.1.102:80 check
    server web3 192.168.1.103:80 check
    server web4 192.168.1.104:80 check

9. Advanced Topics

9.1 Global Server Load Balancing (GSLB)

  • What: Balances traffic not just within a data center, but across geographically distributed sites (multiple continents/countries).
  • How: Uses DNS or application-layer redirects to send clients to the nearest or healthiest region.
  • Why: Minimizes latency for global users, improves redundancy and disaster recovery.
  • Example: A company with servers in the US, Europe, and Asia configures GSLB to route each user to the closest data center; if the Europe region goes down, users are redirected to the next best site.
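A rough sketch of the GSLB decision described above: answer each client with the healthy site that has the lowest measured latency for the client's region, skipping failed sites. The site table, names, and RTT numbers are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

#define NUM_SITES 3

typedef struct {
    const char *name;
    int healthy;     /* set by health checks */
    int rtt_ms[3];   /* measured RTT from client regions: 0=US, 1=EU, 2=ASIA */
} Site;

static Site sites[NUM_SITES] = {
    {"us-east", 1, { 20, 110, 180}},
    {"eu-west", 1, {110,  15, 160}},
    {"ap-east", 1, {180, 160,  25}},
};

/* GSLB decision (e.g., behind a DNS responder): pick the healthy site
 * with the lowest RTT for this client region, or NULL if all are down. */
const char *gslb_pick(int region) {
    int best = -1;
    for (int i = 0; i < NUM_SITES; ++i) {
        if (!sites[i].healthy) continue;
        if (best < 0 || sites[i].rtt_ms[region] < sites[best].rtt_ms[region])
            best = i;
    }
    return best >= 0 ? sites[best].name : NULL;
}
```

Note how the example from the text falls out: if eu-west is marked unhealthy, European clients are transparently redirected to the next-best site.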

9.2 Session Persistence/Sticky Sessions

  • What: Ensures a user’s requests are routed to the same backend server during their session.
  • Why: Needed for applications storing user state (shopping cart, login sessions) in server memory instead of shared DB/cache.
  • How: Implemented via:
    • Cookie-based (load balancer inserts a session cookie)
    • IP-hash (same client IP always hashes to the same server)
    • Application token/session affinity
  • Example: HAProxy `stick-table`, NGINX `ip_hash`, AWS ELB session stickiness.

9.3 Health Checking

  • What: Actively or passively monitors backend servers for availability and responsiveness.
  • Why: Prevents sending requests to failed or overloaded servers, improving reliability.
  • Types:
    • Active: Load balancer pings or performs HTTP/TCP checks on servers periodically.
    • Passive: Monitors error rates or connection failures.
  • How: On repeated failures, server is removed from rotation until healthy.
  • Example: HAProxy/NGINX `check` directive, cloud provider built-in health checks.
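The active-check logic can be sketched as below. FAIL_THRESHOLD, record_probe, and the single-success recovery rule are illustrative assumptions; real balancers typically also require several consecutive successes before restoring a server.

```c
#include <assert.h>

#define NUM_SERVERS    4
#define FAIL_THRESHOLD 3   /* consecutive failures before removal */

static int fail_count[NUM_SERVERS];
static int healthy[NUM_SERVERS] = {1, 1, 1, 1};

/* Called once per probe cycle with the result of one server's check
 * (e.g., a periodic TCP connect or HTTP GET). */
void record_probe(int server, int probe_ok) {
    if (probe_ok) {
        fail_count[server] = 0;
        healthy[server] = 1;   /* restore to rotation */
    } else if (++fail_count[server] >= FAIL_THRESHOLD) {
        healthy[server] = 0;   /* remove from rotation */
    }
}

int is_in_rotation(int server) { return healthy[server]; }
```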

9.4 SSL Termination

  • What: Load balancer handles all SSL/TLS decryption and encryption, then forwards unencrypted HTTP to backend servers.
  • Why: Reduces CPU load on backend servers, simplifies certificate management, enables L7 inspection/filtering.
  • How: Load balancer stores certificates and private keys, manages secure connections from clients.
  • Example: `ssl`/`https` options in HAProxy, AWS/GCP/Azure load balancers, F5/Citrix ADC SSL offload.

9.5 Auto-scaling

  • What: Automatically adds or removes backend servers based on real-time metrics (CPU, response time, queue length).
  • Why: Ensures efficient resource usage and cost, adapts to traffic spikes, avoids under/over-provisioning.
  • How: Integrated with orchestration (e.g., Kubernetes HPA, AWS EC2 Auto Scaling, Google Cloud Managed Instance Groups).
  • Example: Web service running in Kubernetes scales pods up/down based on HTTP request rate; load balancer automatically adds new pods to its rotation.

10. Comparison Table

| Algorithm            | Static/Dynamic | Complexity | State Tracking | Use Cases                  | Weakness                  |
|----------------------|----------------|------------|----------------|----------------------------|---------------------------|
| Round Robin          | Static         | Low        | No             | Uniform servers, stateless | Ignores load/server size  |
| Weighted Round Robin | Static         | Low        | Yes            | Mixed server capacities    | Still ignores live load   |
| Least Connections    | Dynamic        | Medium     | Yes            | Web, DB servers, sticky    | Needs connection tracking |
| IP Hash              | Static         | Low        | No             | Sticky session apps        | Unbalanced if hot IPs     |
| Random               | Static         | Low        | No             | Quick demo/testing         | Can overload a server     |
| Adaptive/Custom      | Dynamic        | High       | Yes            | Large-scale, cloud-native  | Complexity, monitoring    |