Details
Bug
Resolution: Fixed
Major
Description
After applying the patch for LU-7927 to our code, another deadlock was exposed. It does not look like this was caused by LU-7927; rather, the timing change introduced by LU-7927 seems to have made this bug observable. (Or possibly this code was deadlocking there first. It's hard to say precisely.)
The lli_trunc_sem is taken in read mode in both ll_page_mkwrite and vvp_io_fault_start, on the same code path. This can lead to a deadlock if another thread requests the semaphore in write mode between the two read acquisitions.
—
The issue is a double down_read on lli_trunc_sem:
PID: 35117 TASK: ffff8807c26e9680 CPU: 6 COMMAND: "fsx-linux-aio"
#0 [ffff8807c29f7ac0] schedule at ffffffff8149cf35
#1 [ffff8807c29f7b40] rwsem_down_read_failed at ffffffff8149ed25
#2 [ffff8807c29f7b90] call_rwsem_down_read_failed at ffffffff81271f64
#3 [ffff8807c29f7be8] vvp_io_fault_start at ffffffffa08f2526 [lustre]
#4 [ffff8807c29f7c58] cl_io_start at ffffffffa0522115 [obdclass]
#5 [ffff8807c29f7c80] cl_io_loop at ffffffffa0525705 [obdclass]
#6 [ffff8807c29f7cb0] ll_page_mkwrite at ffffffffa08d2a2a [lustre]
#7 [ffff8807c29f7d30] __do_fault at ffffffff81148c70
#8 [ffff8807c29f7db8] handle_mm_fault at ffffffff8114c2cf
#9 [ffff8807c29f7e40] __do_page_fault at ffffffff814a3420
#10 [ffff8807c29f7f40] do_page_fault at ffffffff814a37de
#11 [ffff8807c29f7f50] page_fault at ffffffff8149ff62
RIP: 000000002002551b RSP: 00007fffffff64c8 RFLAGS: 00010212
The first down_read is done in ll_page_mkwrite, and the second in vvp_io_fault_start (the frame where the thread above is blocked).
This is a problem because a waiting writer takes priority over any future readers. Here's an example of one:
PID: 35131 TASK: ffff8807c4ecf1c0 CPU: 13 COMMAND: "fsx-linux-aio"
#0 [ffff8807c3555b58] schedule at ffffffff8149cf35
#1 [ffff8807c3555bd8] rwsem_down_write_failed at ffffffff8149ef45
#2 [ffff8807c3555c50] call_rwsem_down_write_failed at ffffffff81271f93
#3 [ffff8807c3555ca0] vvp_io_setattr_start at ffffffffa08f0cea [lustre]
#4 [ffff8807c3555ce0] cl_io_start at ffffffffa0522115 [obdclass]
#5 [ffff8807c3555d08] cl_io_loop at ffffffffa0525705 [obdclass]
#6 [ffff8807c3555d38] cl_setattr_ost at ffffffffa08eb250 [lustre]
#7 [ffff8807c3555d80] ll_setattr_raw at ffffffffa08be009 [lustre]
#8 [ffff8807c3555e68] ll_setattr at ffffffffa08be313 [lustre]
#9 [ffff8807c3555e78] notify_change at ffffffff8119d401
#10 [ffff8807c3555eb8] do_truncate at ffffffff8118066d
#11 [ffff8807c3555f28] do_sys_ftruncate.constprop.20 at ffffffff811809bb
#12 [ffff8807c3555f70] sys_ftruncate at ffffffff81180a4e
#13 [ffff8807c3555f80] system_call_fastpath at ffffffff814a7db2
RIP: 0000000020152867 RSP: 00007fffffff6678 RFLAGS: 00010246
RAX: 000000000000004d RBX: ffffffff814a7db2 RCX: 0000010000081000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007fffffff6670 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff81180a4e
R13: ffff8807c3555f78 R14: 0000000000000000 R15: 00000000201028b0
ORIG_RAX: 000000000000004d CS: 0033 SS: 002b
To make it clear, here's the sequence of events:
Thread 1 (pid 35117 above): down_read() <-- SUCCEEDS
Thread 2 (pid 35131 above): down_write() <-- FAILS, starts waiting
Thread 1: down_read() [again] <-- FAILS, stuck behind thread 2 (which is stuck behind thread 1)
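
The sequence above can be replayed with a small toy model. This is only a sketch in Python, not kernel code: the FairRWSem class, its method names, and the is_deadlocked check are hypothetical and exist only for illustration. The one assumption carried over from the report is that a queued writer blocks any later reader, even a reader that already holds the semaphore.

```python
from collections import deque


class FairRWSem:
    """Toy model of a read/write semaphore where waiters are served in order.

    Hypothetical illustration only; it does not mirror the kernel's rwsem code.
    """

    def __init__(self):
        self.read_holders = []   # one entry per granted read hold
        self.writer = None       # thread holding the semaphore for write
        self.waiters = deque()   # (thread, mode) requests that had to block

    def down_read(self, thread):
        # A reader is admitted only if no writer holds the semaphore and
        # nobody is already queued ahead of it (assumed fairness rule).
        if self.writer is None and not self.waiters:
            self.read_holders.append(thread)
            return True
        self.waiters.append((thread, "read"))
        return False

    def down_write(self, thread):
        # A writer needs exclusive access: no holders and no earlier waiters.
        if self.writer is None and not self.read_holders and not self.waiters:
            self.writer = thread
            return True
        self.waiters.append((thread, "write"))
        return False

    def is_deadlocked(self):
        # Deadlock in this model: every thread holding the semaphore is
        # itself blocked in the wait queue, so no hold can ever be released.
        holders = set(self.read_holders)
        if self.writer is not None:
            holders.add(self.writer)
        blocked = {thread for thread, _ in self.waiters}
        return bool(holders) and holders <= blocked


def replay():
    """Replay the three steps from the report and record each outcome."""
    sem = FairRWSem()
    events = [
        ("thread 1", "down_read", sem.down_read("thread 1")),
        ("thread 2", "down_write", sem.down_write("thread 2")),
        ("thread 1", "down_read [again]", sem.down_read("thread 1")),
    ]
    return sem, events


if __name__ == "__main__":
    sem, events = replay()
    for thread, op, granted in events:
        print(f"{thread}: {op} -> {'granted' if granted else 'blocked'}")
    print("deadlocked:", sem.is_deadlocked())
```

In this model, a single down_read followed by a competing down_write is not a deadlock, because the reader can still run and release its hold. Only the nested second down_read on the same path closes the cycle.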