Deadlock Handling in Operating Systems

Deadlocks in operating systems occur when processes are waiting on each other in a way that no process can proceed. It’s like a traffic gridlock where each car (process) holds a piece of the road (resource) and is waiting for another piece that’s held by another car. The result is a standstill. To understand how to handle deadlocks, we first outline the four conditions necessary for a deadlock to occur (Coffman conditions):

  1. Mutual Exclusion: At least one resource is non-shareable (only one process can use it at a time).
  2. Hold and Wait: Processes already holding resources can request new resources without releasing their current ones.
  3. No Preemption: Resources cannot be forcibly taken away from a process; the process must release resources voluntarily.
  4. Circular Wait: Two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds.

If all four conditions hold simultaneously, a deadlock can occur. Handling deadlocks therefore means ensuring that at least one condition can never hold (prevention), steering allocation so the system never reaches a risky state (avoidance), or detecting and breaking a deadlock after it happens (detection and recovery). Below, we discuss these methods, from prevention and avoidance to detection, recovery, and even ignoring the problem in certain systems.

Note: These notes use Org-mode format with headings (*, **) and bullet lists. Pseudocode is provided in source code blocks, and tables illustrate resource allocation examples.

  • Real-world example of a deadlock: Consider two processes, P1 and P2, and two resources, R1 and R2. P1 holds R1 and needs R2, while P2 holds R2 and needs R1. Neither will release what it holds until it gets the other resource. All four conditions are satisfied (they each exclusively hold one resource, are waiting for another, won’t be preempted, and form a circular wait). Both processes are stuck forever. Handling strategies below aim to prevent or resolve such situations.

Deadlock Prevention

Deadlock prevention is a proactive strategy: design the system in a way that it’s impossible for all four Coffman conditions to hold, thereby preventing deadlocks outright. In practice, we eliminate or limit one or more of the conditions:

Eliminating Mutual Exclusion

If no resource had to be exclusively held, deadlocks could not form. However, some resources are inherently non-shareable (e.g., a printer, or a file being written). We cannot eliminate mutual exclusion for those. For others, we can simulate sharing. A classic technique is spooling for printers: instead of giving the printer directly to processes, they write their print jobs to a shared spool (disk file). The spooler (OS) then feeds jobs to the printer one by one. This way, processes don’t hold the printer; they write to a shared file and move on. The printer resource is effectively shared via the spool file. Spooling eliminates the need for mutual exclusion on the printer resource from the processes’ perspective.

  • Edge case: Mutual exclusion is often unavoidable for certain resources (you can’t truly share memory cells or a tape drive simultaneously). In those cases, other prevention strategies must be used. Takeaway: Complete elimination of mutual exclusion is rarely possible beyond special cases like spooling; it’s more common to focus on the other conditions for prevention.

Eliminating Hold and Wait

We prevent scenarios where a process holds one resource while waiting for another. Two main approaches:

  1. Require all-at-once allocation: A process must request all resources it will need at once, before it begins execution. If all are available, they are allocated and the process starts. If any resource is not available, the process waits and holds no partial allocation. A process therefore never holds one resource while waiting for another, eliminating the hold-and-wait condition.

     Example: If a process needs a printer and a scanner, it must request both upfront. It starts only if it gets both; it won’t start with the printer and later ask for the scanner (which could lead to hold-and-wait).

  2. Require release before new request: If a process is holding resources and needs another that’s not immediately available, it must first release all the resources it currently holds, then request everything it needs again (current plus new). This ensures a process is never simultaneously holding some resources and waiting for more.

     Example: A process holding Resource A wants Resource B. If B is unavailable, the process must drop A, then later try again for both A and B together.

Trade-offs and issues: Eliminating hold-and-wait can lead to low resource utilization and potential starvation. If processes must request all resources at once, they might over-request “just in case,” tying up resources they won’t use immediately; or a process may never get started if it needs many resources that are rarely all free together. Forcing processes to release resources before acquiring new ones adds overhead, and when they retry the situation may repeat (starvation if they always lose the contention). Despite these downsides, eliminating hold-and-wait prevents deadlock by design.

Eliminating No Preemption

Allow resources to be preempted (forcibly taken away) if needed. Normally, “no preemption” means once a process has a resource, it holds it until done. To break this condition:

  1. If a process is holding some resources and requests another that cannot be immediately allocated (because it’s held by others or not free), the OS can preempt (take away) the resources the process is currently holding and release them for others. The process is then paused. Later, before it resumes, it will have to re-acquire those resources.

  2. Alternatively, if another process needs a resource currently held by a process that is waiting for something else, the OS may take the resource from the waiting process and give it to the one that can proceed. This requires rolling back the state of the resource-holding process or restarting it, since its operation was interrupted.

Example: Process P1 holds a memory buffer and requests access to a file on disk. The file is locked by P2. Instead of letting P1 wait (holding the buffer), the OS could suspend P1 and take back the memory buffer, allowing P2 or others to use it. P1 is resumed only when it can get both the buffer and the file.

Implementing preemption may be complicated: some resources (like CPU) are easy to preempt (just context-switch the process out), and memory can be preempted by swapping out a process. But resources like an open file or a lock protecting critical data aren’t easily taken away without causing inconsistency. Often, preemption in deadlock prevention is applied to resources where it’s safe – for example, locks might be preempted if we can roll back the transaction or computation. This overlaps with recovery techniques (taking resources and rolling back state) but as a prevention strategy, it means the OS won’t let processes hold resources while waiting indefinitely for others.

Eliminating Circular Wait

To prevent the circular wait condition, impose a strict ordering on resource acquisition. Assign each resource type a unique global index (a number or rank). Require that each process can request resources only in increasing order of the indices. By rule, no process can request a resource with a lower index than one it currently holds. This global ordering prevents cycles.

  • Example: Suppose we number resources: R1=1, R2=2, R3=3, etc. If a process already holds R2, it cannot request R1 (since 1 < 2). It could only request R3 or higher next. If every process follows this rule, you can’t have a circular chain because any chain of waiting must go up in resource order, and therefore it can’t loop back to a lower-numbered resource.
  • In practice, the system designer or OS must determine an ordering. For instance, if two locks protect related data structures, give them an order and always lock in that order (if need both, always lock the lower-numbered one first). Many software systems use this technique to avoid deadlocks in code.
  • Edge cases: Ordering may constrain programming logic. If a process needs resources in a naturally opposite order from another process, one of them will have to request out of its ideal order or release and reorder requests, which can be complex. Also, the set of resources might be dynamic (created at runtime), making strict global ordering challenging. Still, in practice, designing a consistent locking order is one of the most effective ways to prevent deadlocks in complex software (e.g., databases and kernel code often use lock hierarchies).

Summary of Deadlock Prevention: By ensuring at least one of the four conditions never holds, deadlocks are prevented. However, these approaches can limit concurrency and resource utilization. They are usually implemented via design constraints (e.g., programming guidelines to lock in order, or OS policies requiring upfront requests). Not all prevention techniques are practical for every resource type, so there’s a trade-off between safety (no deadlocks) and efficiency/flexibility.

Deadlock Avoidance

Deadlock avoidance is a more flexible strategy than strict prevention. Rather than restraining how processes request resources at all times, the system carefully examines each resource request dynamically and decides whether to grant it or make the process wait, based on whether granting it could lead the system into a deadlock down the line. In other words, the OS avoids unsafe states. It requires some advance knowledge of what processes might request (their maximum needs). If a request could potentially trap the system in a future deadlock, it is not granted.

The most famous deadlock avoidance algorithm is Dijkstra’s Banker’s Algorithm (named by analogy to a banker who ensures he can satisfy all customers’ maximum loan demands without running out of money). This algorithm is used for systems with multiple instances of resources and works by ensuring that the system never enters an “unsafe” state.

  • Safe vs. Unsafe State:

    A safe state is one where there exists at least one sequence of process executions (an order of running the processes to completion) such that each process can obtain all the resources it needs in proper order and complete successfully. If such a sequence exists, the system can avoid deadlock by scheduling processes in that order (or a safe order). An unsafe state is a state that is not necessarily deadlocked yet, but if the system enters that state, it’s possible that it will end up in a deadlock because no safe execution sequence exists. Deadlock avoidance algorithms ensure the system never leaves the safe state set.

  • Banker’s Algorithm – Concept:

    Each process must declare the maximum number of instances of each resource type it might need to complete. The OS then only allocates resources in a way that it can always service all processes to completion (perhaps in some order). When a process requests resources, the OS pretends to allocate them and checks if the resulting state would be safe. If it’s safe (still at least one way for everyone to finish), the request is granted; if it’s unsafe (could lead to deadlock), the request is deferred (the process waits).

    This requires:

    • Knowing each process’s Max demand for each resource type in advance.
    • Tracking how many resources of each type are currently Allocated to each process.
    • Calculating how many more each process Needs to finish (Need = Max – Allocated).
    • Knowing how many resources are currently Available (free).
  • Banker’s Algorithm Example Scenario

    We have:

    • 4 Resource types: A, B, C, D
    • 3 Processes: P1, P2, P3
    • Total System Resources
      | Resource  | A | B | C | D |
      |-----------+---+---+---+---|
      | Total     | 6 | 5 | 7 | 6 |
      | Available | 3 | 1 | 1 | 2 |
    • Allocation and Maximum Demand Per Process
      | Process | Alloc A | Alloc B | Alloc C | Alloc D | Max A | Max B | Max C | Max D | Need A | Need B | Need C | Need D |
      |---------+---------+---------+---------+---------+-------+-------+-------+-------+--------+--------+--------+--------|
      | P1      |       1 |       2 |       2 |       1 |     3 |     3 |     2 |     2 |      2 |      1 |      0 |      1 |
      | P2      |       1 |       0 |       3 |       3 |     1 |     2 |     3 |     4 |      0 |      2 |      0 |      1 |
      | P3      |       1 |       2 |       1 |       0 |     1 |     3 |     5 |     0 |      0 |      1 |      4 |      0 |
    • Step-by-Step Safety Check

      Step 1: P1

      • Need = (2 A, 1 B, 0 C, 1 D)
      • Available = (3 A, 1 B, 1 C, 2 D)
      • All needs are ≤ Available → P1 can finish.

      After P1 finishes, it releases its allocation (1 A, 2 B, 2 C, 1 D):

      • New Available: (4 A, 3 B, 3 C, 3 D)

      Step 2: P2

      • Need = (0 A, 2 B, 0 C, 1 D)
      • Available = (4 A, 3 B, 3 C, 3 D)
      • All needs are ≤ Available → P2 can finish.

      After P2 finishes, it releases its allocation (1 A, 0 B, 3 C, 3 D):

      • New Available: (5 A, 3 B, 6 C, 6 D)

      Step 3: P3

      • Need = (0 A, 1 B, 4 C, 0 D)
      • Available = (5 A, 3 B, 6 C, 6 D)
      • All needs are ≤ Available → P3 can finish.

      After P3 finishes, it releases its allocation (1 A, 2 B, 1 C, 0 D):

      • Final Available: (6 A, 5 B, 7 C, 6 D), equal to the system totals.

      Safe sequence found: P1 → P2 → P3

      Since there is an order (P1 → P2 → P3) in which every process can finish, the state is safe. The Banker’s algorithm would allow the system to be in this state. When a new request arrives, the system performs the same check to decide whether granting it keeps the state safe.

  • Pseudocode: Banker’s Algorithm (Resource Request):

    Below is simplified pseudocode for handling a resource request with the Banker’s Algorithm. It assumes the following:

    • N: number of processes; M: number of resource types
    • Available[M]: available instances of each resource type
    • Max[N][M]: maximum demand matrix
    • Allocation[N][M]: resources currently allocated to each process
    • Need[N][M]: remaining needs matrix (Need = Max - Allocation)
Pseudocode: RequestResources Function

// Function to request resources for process P
int RequestResources(int P, int Request[M]) {
    // Step 1: Check if request is greater than Need
    for (int r = 0; r < M; r++) {
        if (Request[r] > Need[P][r]) {
            printf("Error: Process has exceeded its maximum claim.\n");
            return -1; // FAIL
        }
    }

    // Step 2: Check if resources are available
    for (int r = 0; r < M; r++) {
        if (Request[r] > Available[r]) {
            return 0; // WAIT
        }
    }

    // Step 3: Pretend to allocate requested resources
    for (int r = 0; r < M; r++) {
        Available[r]     -= Request[r];
        Allocation[P][r] += Request[r];
        Need[P][r]       -= Request[r];
    }

    // Step 4: Safety check
    if (IsSafeState()) {
        return 1; // SUCCESS
    } else {
        // Rollback allocation
        for (int r = 0; r < M; r++) {
            Available[r]     += Request[r];
            Allocation[P][r] -= Request[r];
            Need[P][r]       += Request[r];
        }
        return 0; // WAIT
    }
}

Pseudocode: IsSafeState Function

// Returns true (1) if the state is safe
int IsSafeState() {
    int Work[M];
    int Finish[N] = {0}; // All processes initially not finished

    // Step 1: Copy Available to Work
    for (int r = 0; r < M; r++) {
        Work[r] = Available[r];
    }

    // Step 2: Try to find a process that can finish
    while (1) {
        int found = 0;
        for (int p = 0; p < N; p++) {
            if (!Finish[p]) {
                int canFinish = 1;
                for (int r = 0; r < M; r++) {
                    if (Need[p][r] > Work[r]) {
                        canFinish = 0;
                        break;
                    }
                }
                if (canFinish) {
                    // Simulate finishing
                    for (int r = 0; r < M; r++) {
                        Work[r] += Allocation[p][r];
                    }
                    Finish[p] = 1;
                    found = 1;
                    break;
                }
            }
        }
        if (!found)
            break;
    }

    // Step 3: Check if all processes could finish
    for (int i = 0; i < N; i++) {
        if (!Finish[i])
            return 0; // Not safe
    }
    return 1; // Safe
}
  • Avoidance in practice:

    Banker’s algorithm guarantees no deadlocks if processes follow the rules. However, it has limitations:

    • The number of processes is assumed fixed, and each process’s maximum needs must be declared upfront. In general-purpose OS, users/programs may not know this in advance.
    • It can under-utilize resources because it is conservative; it may make a process wait even if, in reality, everything would have been fine, just because it could lead to a deadlock in theory.
    • The safety check adds overhead on every resource request; its worst-case cost is O(N² × M) for N processes and M resource types (each pass scans all processes, and up to N passes may be needed). This is manageable for moderate sizes but not free.

    Because of these issues, Banker’s algorithm is mostly of academic interest and used in environments where the maximum claims are known (or can be reasonably estimated), such as certain real-time systems or resource scheduling in specific applications. General-purpose operating systems typically do not use Banker’s algorithm globally; they opt for either prevention (via design) or detection & recovery or simply ignore deadlocks.

Deadlock Detection and Recovery

If deadlock prevention or avoidance is not used, a system may deliberately allow resource requests to proceed freely, optimistically assuming deadlocks won’t happen often. When they do occur, the system must detect and resolve them. This is a two-step approach:

  • Deadlock Detection: The OS monitors the state of resource allocation to detect whether a deadlock has occurred (i.e., a set of processes is stuck waiting indefinitely).
  • Deadlock Recovery: Once a deadlock is detected, the OS takes action to break it and recover, e.g., by terminating one or more processes or preempting resources.

This approach lets the system have maximum resource utilization and concurrency in normal operation, paying the cost of detection/recovery only when needed. It’s essential in systems where slight inefficiency from prevention is unacceptable, and deadlocks are rare but must be dealt with when they occur.

Deadlock Detection:

The method of detection depends on the resource allocation characteristics:

  1. Single Instance per Resource Type: If each resource type has only one instance (e.g., one printer, one tape drive), deadlock detection can be done via a Resource Allocation Graph (RAG) cycle check. The RAG is a directed graph with processes and resources as nodes; an edge from process P to resource R means P is waiting for R, and an edge from resource R to process P means R is held by P. With single-instance resources, a cycle in this graph is a necessary and sufficient condition for deadlock.

     Example: P1 -> R2 -> P2 -> R1 -> P1 forms a cycle (P1 waits for R2 held by P2, and P2 waits for R1 held by P1), so P1 and P2 are deadlocked. A graph algorithm (such as depth-first search) can detect cycles. The OS could run such a check whenever a resource request is denied, or periodically.

  2. Multiple Instances per Resource Type: If resource types can have many units (e.g., memory pages, identical tape drives), a cycle in the RAG is not a definitive deadlock indicator: a waiting process might still obtain a needed instance from another holder. Detection then typically takes one of two forms:

     • Wait-For Graph: a simplified graph with only processes as nodes. Draw a directed edge P_i -> P_j if P_i is waiting for some resource that P_j currently holds (the intermediate resource nodes are abstracted out). A cycle in the wait-for graph indicates a deadlock, provided the graph is constructed at a state where each wait is on a specific holder.

     • Matrix-based detection: the OS maintains Allocation, Available, and request data, and periodically runs an algorithm akin to the Banker’s safety check: try to find a sequence in which all processes can finish from the current state. Any processes that remain marked unfinished are deadlocked.

  3. Efficiency: A detection algorithm for multiple instances might be run on a schedule (say, every X seconds) or when the CPU is idle, because running it on every request could be expensive. Each run costs about as much as the safety check: O(N² × M) in the worst case for N processes and M resource types.

Deadlock Recovery:

Once a deadlock is detected, the system must break it. The OS has a few recovery options:

  • Process Termination (Killing processes):

    The simplest way to break a deadlock is to kill one or more of the involved processes to free their resources. Two sub-strategies exist:

    • Kill all deadlocked processes: This is crude but effective — terminate every process involved in the cycle. This guarantees the deadlock is resolved at the cost of losing a lot of work (all those processes’ execution is lost, and they may need to be restarted from scratch if needed).
    • Kill one at a time: Terminate processes one by one until the cycle breaks. After each termination, check if the deadlock is resolved (the cycle is gone). This is more flexible: you can choose the order of termination to minimize damage. For example, terminate the process that is least costly to rerun, or the one with lowest priority, or the one that has used the least CPU time (so losing it wastes less), etc. This requires repeated detection checks if more than one needs to be killed.

    When deciding which process to kill (called the “victim selection”), consider:

    • Process priority (don’t kill high-priority tasks if possible).
    • How long the process has run or how much longer it may need (to avoid wasting long computations that are nearly done).
    • Resources the process has used (maybe kill the one holding the least or the most, depending on strategy).
    • Whether the process can be rolled back or easily restarted.

    Terminating processes can cause side effects (data inconsistency, if the process was in the middle of updating a file, etc.), but if deadlock recovery is needed, it might be acceptable with additional measures (like having transactions that can be safely aborted).

  • Resource Preemption (forcibly taking resources):

    Instead of killing processes outright, the OS can try to preempt resources from some deadlocked processes and give them to others, to break the wait cycle. This is essentially rolling back the state of a process as if it hadn’t owned that resource. Steps or considerations for resource preemption:

    • Selecting a victim resource (and process): Determine which resource, if freed, would help resolve the deadlock. This usually means picking a process and resource that can be taken with minimal cost. Similar factors as above apply (whose resource to take such that it hurts least).
    • Rollback: If we take a resource (like memory or a file lock) from a process, how does that process continue? Often, it cannot; we might need to roll the process back to a previous safe state (checkpoint) before it acquired that resource. This implies the system or process must have the ability to save state (checkpointing periodically). In OS, general processes typically do not checkpoint, but database transactions do, and some long-running processes might.
    • Starvation concern: If the same process is always chosen as the victim to release resources, it may never make progress (starvation). So the OS should ensure that a process isn’t indefinitely picked on — for instance, by lowering its priority or making it immune to victim selection after a certain number of preemptions.

    Example (resource preemption): Suppose P1 and P2 are deadlocked as before (P1 needs a resource P2 has and vice versa). Instead of killing either, the OS could decide to take resource R1 away from P1 (preempt it) and give it to P2. To do this safely, P1 might need to be rolled back to a point before it acquired R1 (so it doesn’t think it has it). If no such checkpoint exists, preemption might not be viable without termination. However, if P1 had been periodically saving state, the OS could rollback P1 and restart it from an earlier point, freeing R1 for P2. P2 can then proceed, and later P1 will retry.

  • Combined Approach (Process Rollback):

    This is a variant of resource preemption where instead of just grabbing one resource, you roll back entire processes to a checkpoint and restart them, thereby freeing all resources they had (like a partial termination, but with an option to rerun them from a saved state). For example, in database systems, if deadlock is detected, one transaction is rolled back (aborted) so others can continue, and the aborted one is restarted later.

  • Recovery in practice:

    In most operating systems, automated deadlock recovery is rare because it’s tricky to implement correctly. Terminating processes can be done by the OS (or an operator/admin), but deciding which and ensuring system consistency is complex. Preemption and rollback require infrastructure (like checkpointing) that general OS processes don’t usually have.

    Many OS will simply log the issue or alert an operator. In some cases (like a print spooler deadlock), a specific subsystem might handle it by resetting hardware or restarting a service. Database management systems (which run on OS but have their own resource management for locks) do have deadlock detection and will abort transactions to recover.

    Real-time systems, which can’t afford indefinite waits, may incorporate simpler deadlock recovery strategies or, more often, they prevent deadlocks by careful design (because recovery in real-time could violate timing guarantees).

    Example (Wait-for graph detection and recovery): Imagine a system periodically builds a wait-for graph of processes:

    • P1 waits for P2 (edge P1->P2)
    • P2 waits for P3 (P2->P3)
    • P3 waits for P1 (P3->P1)

    This cycle indicates deadlock among P1, P2, P3. The OS might choose to terminate P3 (breaking the cycle P3->P1, now P1 and P2 can resume). Alternatively, if P3 held some resources that P1 needed, the OS could temporarily take them from P3 (suspend P3) and give to P1. These are design choices.

    In summary, detection and recovery allow maximum concurrency but rely on being able to clean up when things go wrong. It’s used when deadlocks are expected to be infrequent or the cost of strict prevention is too high. The cost is the complexity of detection and potential disruption of recovery.

Ostrich Algorithm (Deadlock Ignorance)

    An often-used “strategy” for deadlocks in general-purpose operating systems is humorously called the Ostrich Algorithm – essentially, ignore the problem and hope it never becomes serious. The name comes from the idea of an ostrich burying its head in the sand to ignore danger. In deadlock terms, the OS does nothing about deadlocks: it makes no attempt to prevent, avoid, or detect them. If a deadlock occurs, it’s up to the user or administrator to notice that the system or application is hung and to intervene (e.g., by killing a process or rebooting the machine).

    Rationale: Deadlock handling (prevention/avoidance/detection) can be expensive in terms of system overhead and complexity, and in many systems, true deadlocks happen very rarely. For example, desktop operating systems (Windows, Linux, macOS) run thousands of processes and threads, and deadlocks are possible in theory, but are infrequent relative to other issues. The ostrich approach assumes the cost of dealing with deadlocks automatically is higher than the cost of the rare deadlock that might occur.

    Most general operating systems (like UNIX, Linux, Windows) adopt this approach for deadlocks involving general resources. They simply don’t implement a global deadlock-handling mechanism. If a deadlock happens, the effects might be:

    • Some programs or system components freeze (waiting forever).
    • Eventually, a user might kill the stuck application or reboot the system if the deadlock affects critical parts. In some cases, the system might appear to slow down or hang until intervention.

    Advantages:

    • Simplicity: The OS is simpler. No intricate logic to avoid or detect deadlocks, which means less overhead and fewer bugs in that area.
    • Performance: No CPU cycles or memory spent on deadlock detection algorithms or on restricting resource allocation for safety. The system can allocate resources greedily for efficiency.

    Disadvantages:

    • Unpredictability: If a deadlock does occur, the system doesn’t handle it. This can lead to hangs or reduced performance (resources tied up) that are hard to diagnose for the user.
    • Potential crashes: In worst cases, a deadlock in a crucial system component could freeze the entire system, forcing a hard reboot, possibly with data loss (e.g., if a disk driver and another system service deadlock, disk writes might stop).
    • Poor robustness: Relying on “it rarely happens” is not acceptable for mission-critical systems. Ignoring deadlocks means you’re hoping the software never has a bug or scenario that triggers a deadlock, which might be optimistic.

    Use cases: The ostrich algorithm is acceptable in environments where deadlocks are either extremely unlikely or not catastrophic. For instance, a personal computer OS can ignore deadlocks because the user can restart an application or reboot. Also, many deadlocks involve only a particular application’s processes (like two threads in the same program locking each other) – the OS doesn’t attempt to fix that; instead developers use synchronization techniques to avoid it, or the user kills the app if it hangs.

    Operating systems often focus more on recovering from crashes or memory errors (which happen more often) than on deadlocks. In practice, many deadlocks are resolved by external intervention or simply killing the programs (which is effectively manual recovery via process termination).

Implementation Considerations in Different Systems

Not all deadlock handling strategies are used equally in all systems. The choice depends on the system’s goals (throughput vs. real-time guarantees vs. simplicity) and the nature of its applications.
  • General-Purpose OS (Linux, Windows, Unix):

    General OS for desktops/servers typically value throughput and user responsiveness, and deadlocks are rare or primarily the concern of application developers. Thus:

    • They often implement the Ostrich algorithm at the OS level for process/resource deadlocks. The OS doesn’t run a global Banker’s algorithm or global deadlock detector for arbitrary resources.
    • Prevention by design: In kernel development, however, known potential deadlocks (especially involving kernel locks) are prevented by careful design. For example, Linux kernel developers impose strict locking orders (eliminating circular wait in kernel locks) and use lock hierarchies to avoid deadlocks in the kernel. Tools like lockdep (Lock Debugger) exist in debug builds to detect potential deadlock patterns in locking. This is essentially using prevention (eliminating circular wait) in the kernel.
    • There is usually no Banker’s Algorithm for CPU or memory or device locks in general OS; it’s impractical due to unknown demands and performance cost.
    • Detection utilities: While not in the core OS scheduling, there are user-level tools or subsystems: e.g., databases implement their own deadlock detection (within the DB engine), or the OS might detect a “hung” process (Windows has “Not Responding” status for apps that haven’t processed events, which could be due to deadlock) and allow the user to kill it.
    • Recovery: The OS typically doesn’t automatically kill processes to resolve a deadlock (the Out-Of-Memory killer is an exception of sorts, but it targets memory exhaustion, not deadlock). Recovery is manual (kill the process, reboot the system). One exception is in specific subsystems: e.g., some OSes will reset a driver that appears hung (which might be due to an internal deadlock).
    • Ignoring with resets: Even Unix and Windows effectively take an ostrich approach but recover by restarting: if a deadlock inside an OS service causes it to crash or hang, the system may restart that service or reboot.
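The lock-ordering discipline that kernel developers use (and that tools like lockdep check for) can be sketched with ordinary threads. This is an illustrative Python sketch, not kernel code: each lock gets a fixed rank, and every code path acquires locks in ascending rank order, which makes circular wait impossible.

```python
import threading

# Each lock gets a fixed rank; always acquiring in rank order
# eliminates the circular-wait condition.
lock_a = threading.Lock()
lock_b = threading.Lock()
LOCK_RANK = {id(lock_a): 1, id(lock_b): 2}

def acquire_in_order(*locks):
    """Acquire the given locks in ascending rank order, whatever
    order the caller listed them in."""
    ordered = sorted(locks, key=lambda l: LOCK_RANK[id(l)])
    for l in ordered:
        l.acquire()
    return ordered

def release_all(held):
    for l in reversed(held):
        l.release()

results = []

def worker(name, first, second):
    held = acquire_in_order(first, second)  # order is normalized here
    results.append(name)
    release_all(held)

# The two threads name the locks in opposite order, but both actually
# take lock_a before lock_b, so they cannot deadlock.
t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start(); t2.start(); t1.join(); t2.join()
print(sorted(results))  # ['t1', 't2']
```

Without the rank normalization, the opposite-order acquisitions above are exactly the P1/P2 pattern from the introduction and can deadlock; the fixed global order is what removes that possibility.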
  • Real-Time Operating Systems (RTOS):

    RTOS and safety-critical systems cannot afford to ignore deadlocks or wait arbitrarily, because timely task completion is crucial. However, they also cannot afford heavy runtime overhead. Typical approaches:

    • Prevention via design is key: Many RTOS applications are designed so that deadlocks cannot occur (tasks are carefully coded with resource ordering, or a task locks only one resource at a time). Protocols like Priority Inheritance and Priority Ceiling for mutexes not only address priority inversion but also shorten waiting chains, because a task holding a resource runs at an elevated priority and finishes its critical section quickly. The Priority Ceiling Protocol goes further: a task may lock a resource only if its priority is strictly higher than the ceilings of all resources currently locked by other tasks, which makes circular wait (and hence deadlock) impossible.
    • Avoidance: Classic Banker’s algorithm is usually not used in RTOS because tasks don’t typically declare maximum resources in that way and the overhead is high. Instead, avoidance might be achieved by careful scheduling analysis offline. For instance, before runtime, the system designers analyze worst-case resource needs and ensure a feasible schedule (this is more common in static systems).
    • Detection: Some RTOS might include a watchdog or deadline monitoring. If a high-priority task misses its deadline or seems stuck waiting, the system might assume a deadlock or livelock and trigger recovery (like resetting parts of the system).
    • Recovery: Because tasks often represent critical functions (like controlling a motor or sensor), simply killing a task might not be acceptable unless there is redundancy. Some RTOS will perform a system reset or fail-safe shutdown if a deadlock is detected, because continuing in a deadlocked state could be dangerous. Others might try more granular recovery if possible (restart just the deadlocked tasks or subsystems if they can be isolated).
    • Ostrich? Ignoring deadlock is generally not an option in RTOS when human lives or expensive machinery are at stake. Still, some simple embedded RTOS systems rely on the assumption that their limited, well-tested code won’t deadlock, effectively ignoring it but with high confidence due to testing and simplicity.
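One concrete way a task can avoid blocking indefinitely, whatever the RTOS, is to acquire locks with a timeout and back off (releasing what it holds) on failure, which breaks the hold-and-wait condition. This is a concept sketch using Python's lock timeout, not any particular RTOS API:

```python
import threading

res1, res2 = threading.Lock(), threading.Lock()

def try_both(first, second, timeout=0.05):
    """Try to take both locks within a bounded time. On timeout,
    release everything held and report failure instead of waiting
    forever (breaking the hold-and-wait condition)."""
    if not first.acquire(timeout=timeout):
        return False
    if not second.acquire(timeout=timeout):
        first.release()          # back off: never hold while stuck waiting
        return False
    # ... critical section would go here ...
    second.release(); first.release()
    return True

res2.acquire()                   # simulate another task holding res2
ok = try_both(res1, res2)        # times out on res2 and backs off cleanly
res2.release()
ok2 = try_both(res1, res2)       # succeeds once res2 is free
print(ok, ok2)                   # False True
```

The failed attempt leaves the system consistent, so the task can retry, raise an error to a supervisor, or trigger the fail-safe behavior described above, instead of contributing to a deadlock.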
  • Embedded Systems:

    Embedded systems range from simple microcontroller loops to complex devices running an OS. Deadlock handling in embedded systems depends on the complexity:

    1. Simple embedded (no OS or simple loop): Usually single-threaded or cooperatively multitasked, so classic deadlocks (which require multiple concurrent processes) are avoided by not having true concurrency. In this case, deadlock prevention is by architecture (only one task or cooperative tasks that yield explicitly).
    2. Embedded OS (like FreeRTOS or embedded Linux on devices): These might face deadlocks if using multiple threads and resources. Commonly, developers use prevention strategies (designing lock orders, disabling interrupts around critical sections to avoid needing multiple locks, etc.). Memory and resources are limited, so dynamic deadlock avoidance algorithms are uncommon.
    3. Watchdogs: Many embedded systems include a hardware or software watchdog timer that resets the system if it hangs (no timely heartbeat). A deadlock produces exactly such a hang, and the watchdog reboots the system to restore operation. This is brute-force recovery layered on the ostrich approach: deadlocks are not managed actively, but if the system freezes, it is automatically reset.
    4. Static allocation: Embedded systems often allocate resources at startup and run tasks in a fixed sequence, which reduces the chance of deadlock. If tasks run in loops and each loop they acquire and release resources in the same order, that predictability prevents deadlock.
    5. Specialized recovery: If an embedded device has a non-critical task deadlock, sometimes the system can kill or restart that component (if it’s modular). But often, a deadlock in an embedded device means something went wrong in design, and the simplest recovery is a full system reboot (which might happen quickly and automatically, so the user barely notices beyond a transient downtime).
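The watchdog pattern from point 3 can be sketched in software. In a real device the watchdog is usually a hardware timer that resets the chip when it is not "fed"; in this illustrative Python sketch (all names and timings are invented for the example) a monitor thread plays that role:

```python
import threading, time

last_heartbeat = time.monotonic()
hb_lock = threading.Lock()
timed_out = threading.Event()
TIMEOUT = 0.2      # seconds without a heartbeat before declaring a hang

def feed():
    """Called by the main loop while it is healthy ("feeding the dog")."""
    global last_heartbeat
    with hb_lock:
        last_heartbeat = time.monotonic()

def watchdog():
    """Monitor thread: if the heartbeat goes stale, trigger recovery
    (a real system would pull the reset line or reboot here)."""
    while not timed_out.is_set():
        with hb_lock:
            stale = time.monotonic() - last_heartbeat > TIMEOUT
        if stale:
            timed_out.set()
            return
        time.sleep(0.02)

monitor = threading.Thread(target=watchdog)
monitor.start()
feed()              # the main loop feeds the watchdog while healthy...
time.sleep(0.35)    # ...then "hangs" (e.g. a deadlock) and stops feeding
monitor.join(timeout=2)
print(timed_out.is_set())  # True: the watchdog noticed the missed heartbeats
```

Note that the watchdog knows nothing about locks or wait-for graphs; it only observes the symptom (no progress), which is exactly why it works against deadlocks, livelocks, and crashes alike.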

Conclusion

Deadlock handling requires balancing safety against performance. For teaching and understanding, we categorize strategies as prevention, avoidance, detection/recovery, and ignorance. Real-world systems combine elements of these strategies based on practical needs. For example, an OS might prevent deadlocks in its core components (prevention), let user processes deadlock themselves without intervention (ostrich/ignore), and rely on user or admin action (manual recovery) when needed. High-stakes systems lean on prevention or controlled avoidance, because they cannot risk a deadlock.