From: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon, 13 Nov 2017 12:56:53 -0500
Subject: [PATCH] locking/rt-mutex: fix deadlock in device mapper / block-IO
Origin: https://www.kernel.org/pub/linux/kernel/projects/rt/5.2/older/patches-5.2.17-rt9.tar.xz

When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
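
The queuing described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the kernel code:
struct model_bio, struct model_task, model_make_request and model_submit
are hypothetical stand-ins for struct bio, the current->bio_list field and
the block layer's submission path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct bio or task_struct. */
struct model_bio {
	int levels_left;		/* stacked drivers still to traverse */
	struct model_bio *next;
};

struct model_task {
	struct model_bio *head, *tail;	/* models current->bio_list */
	int active;			/* nonzero while a submit loop runs */
	int depth, max_depth;		/* nesting of driver callbacks */
	int completed;			/* bios that reached the bottom driver */
};

static void model_submit(struct model_task *t, struct model_bio *bio);

/* A stacking driver: passes the bio on to the driver below it. */
static void model_make_request(struct model_task *t, struct model_bio *bio)
{
	if (++t->depth > t->max_depth)
		t->max_depth = t->depth;
	if (bio->levels_left > 0) {
		bio->levels_left--;
		model_submit(t, bio);	/* nested submission */
	} else {
		t->completed++;
	}
	t->depth--;
}

static void model_submit(struct model_task *t, struct model_bio *bio)
{
	assert(bio != NULL);
	bio->next = NULL;
	if (t->active) {
		/* Already inside a submit loop: queue instead of recursing. */
		if (t->tail)
			t->tail->next = bio;
		else
			t->head = bio;
		t->tail = bio;
		return;
	}
	t->active = 1;
	while (bio) {
		model_make_request(t, bio);
		/* Pop the next queued bio, if any. */
		bio = t->head;
		if (bio) {
			t->head = bio->next;
			if (!t->head)
				t->tail = NULL;
		}
	}
	t->active = 0;
}
```

In this model a bio that passes through several stacked drivers is handled
one level at a time from the loop in model_submit, so the callback nesting
stays at one level. The cost is that queued bios make no progress until the
task returns to that loop, which is what makes blocking in a driver risky.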

However, this queuing of bios can cause deadlocks. To avoid them, device
mapper registers a function, flush_current_bio_list, which is called when
the device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.

The problem with CONFIG_PREEMPT_RT_FULL is that when the device mapper
driver blocks on an rt_mutex, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
the block device stack can happen.

Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.

So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock
and rt_mutex_timed_fastlock, after the fast acquire has failed and before
the task blocks in the slow path.
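
The ordering problem and the fix can be illustrated with a userspace model.
It is a sketch under assumptions, not the kernel implementation: struct
model_rt_task and every model_* function are hypothetical, and the counters
merely stand in for bios held back on current->bio_list or the plug.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the task state involved; not struct task_struct. */
struct model_rt_task {
	bool pi_blocked;	/* models tsk_is_pi_blocked() */
	int plugged_bios;	/* bios still held back by this task */
	int flushed_bios;	/* bios handed on so they can proceed */
};

/* Models blk_needs_flush_plug(). */
static bool model_needs_flush(const struct model_rt_task *t)
{
	return t->plugged_bios > 0;
}

/* Models blk_schedule_flush_plug(): hand every held-back bio on. */
static void model_flush(struct model_rt_task *t)
{
	assert(t->plugged_bios >= 0);
	t->flushed_bios += t->plugged_bios;
	t->plugged_bios = 0;
}

/* Models sched_submit_work(): no flush for a PI-blocked task. */
static void model_sched_submit_work(struct model_rt_task *t)
{
	if (t->pi_blocked)
		return;
	if (model_needs_flush(t))
		model_flush(t);
}

/* Models the slow path: the task is marked PI-blocked, then schedules. */
static void model_slowlock(struct model_rt_task *t)
{
	t->pi_blocked = true;
	model_sched_submit_work(t);
	/* The real task would sleep here until the lock is released. */
}

/* Models the lock path after the fast acquire has already failed. */
static void model_contended_lock(struct model_rt_task *t, bool patched)
{
	/* The fix: flush while the task is not yet PI-blocked. */
	if (patched && model_needs_flush(t))
		model_flush(t);
	model_slowlock(t);
}
```

With patched set to false the model goes to sleep with its bios still held
back, which corresponds to the deadlock scenario. With patched set to true
the bios are handed on before the task is marked as PI-blocked, so the flush
never runs in the state that would hit the BUG_ON mentioned above.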

CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
          in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/locking/rtmutex.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -25,6 +25,7 @@
 #include <linux/sched/debug.h>
 #include <linux/timer.h>
 #include <linux/ww_mutex.h>
+#include <linux/blkdev.h>
 
 #include "rtmutex_common.h"
 
@@ -1895,6 +1896,15 @@ rt_mutex_fastlock(struct rt_mutex *lock,
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	/*
+	 * If rt_mutex blocks, the function sched_submit_work will not call
+	 * blk_schedule_flush_plug (because tsk_is_pi_blocked would be true).
+	 * We must therefore call blk_schedule_flush_plug here; otherwise a
+	 * deadlock in I/O may happen.
+	 */
+	if (unlikely(blk_needs_flush_plug(current)))
+		blk_schedule_flush_plug(current);
+
 	return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK, ww_ctx);
 }
 
@@ -1912,6 +1922,9 @@ rt_mutex_timed_fastlock(struct rt_mutex
 	    likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	if (unlikely(blk_needs_flush_plug(current)))
+		blk_schedule_flush_plug(current);
+
 	return slowfn(lock, state, timeout, chwalk, ww_ctx);
 }