Details
Bug
Resolution: Fixed
Medium
Description
Found the root cause of the deadlock in sanity-pcc/test_99b:
A client holds a PW extent lock L1.
T1d - T8d: direct I/O reads
  Client (OSC): use up all available RPC slots:
    00000008:00000008:1.0:1758255488.304183:0:5556:0:(osc_request.c:3049:osc_build_rpc()) [0x200000405:0x9:0x0]: 1024 read pages, start 8192, end 9216, now 9r/0w/9d in flight
  OST server side: waiting on server-side locking; a lock callback is sent to revoke the conflicting lock L1 granted to the client. <=== Depends on the completion of T2.
T2: Client-side lock blocking AST for L1 (OSC):
  osc_dlm_blocking_ast0() -> osc_lock_flush() -> osc_lock_discard_pages(): needs the mapping invalidate_lock in exclusive mode and is waiting for it. <=== Waiting for T3 or T4.
T3: Fast read, generic_file_read_iter():
  f99b.sanity-pcc: read ppos: 12582912, count: 1048576
  - Acquires the invalidate_lock in shared mode
  - Waits for the page (folio) lock: fast read pgno: 3072 <=== Waiting for T4.
T4: Generic buffered read, ll_file_io_generic():
  f99b.sanity-pcc: read ppos: 12582912, count: 1048576 - buffered read
  - Acquires the invalidate_lock in shared mode
  - Acquires the page (folio) lock: pgno=3072
  - lov_io_submit() -> osc_io_submit(): HANG, holding the page lock for pgno=3072 while waiting for RPC slots. <=== Waiting for T1d-T8d
The deadlock: T1d-T8d => T2 => T3 | T4 => T1d-T8d
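
The dependency chain above can be checked mechanically by treating it as a wait-for graph. The following is only an illustrative sketch in Python, not Lustre code: the WAITS_FOR table and the find_cycle() helper are hypothetical names, and the edges are copied from the "Waiting for" annotations in this description.

```python
# Wait-for graph from the description: an edge A -> B means
# "A cannot make progress until B does".
WAITS_FOR = {
    "T1d-T8d": ["T2"],   # DIO reads wait for L1 to be revoked, which needs T2
    "T2": ["T3", "T4"],  # blocking AST waits for the shared invalidate_lock holders
    "T3": ["T4"],        # fast read waits for the folio lock held by T4
    "T4": ["T1d-T8d"],   # buffered read waits for an RPC slot held by the DIO reads
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: that is a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(" => ".join(find_cycle(WAITS_FOR)))
# prints: T1d-T8d => T2 => T3 => T4 => T1d-T8d
```

In this model, removing any single edge on the reported cycle (for example, clearing the list for "T4") makes find_cycle() return None. That only shows the cycle needs every one of these dependencies to hold; it says nothing about which one the actual fix removes.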