Details
-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
Lustre 2.16.0, Lustre 2.17.0
-
None
-
VMs are enough for a reproducer
-
3
Description
The situation in the vmcore looks as follows.
The client recovery thread is trying to cancel local locks and is waiting in vvp_io_init() -> filemap_invalidate_lock(inode->i_mapping). Another thread, doing a read, holds this lock and is waiting for its IO to complete, but IO is paused while recovery is in progress. So this is a deadlock between the recovery thread and the IO thread.
PID: 2092611 TASK: ffff911a97f90000 CPU: 19 COMMAND: "ldlm_lock_repla"
 #0 [ffffb6d520cbbad0] __schedule at ffffffffb9b32879
 #1 [ffffb6d520cbbb38] schedule at ffffffffb9b32b2e
 #2 [ffffb6d520cbbb50] schedule_preempt_disabled at ffffffffb9b32dc1
 #3 [ffffb6d520cbbb58] rwsem_down_write_slowpath at ffffffffb9b35fcd
 #4 [ffffb6d520cbbbf8] down_write at ffffffffb9b362f8
 #5 [ffffb6d520cbbc08] vvp_io_init at ffffffffc2a762a1 [lustre]
 #6 [ffffb6d520cbbc70] __cl_io_init at ffffffffc21a8598 [obdclass]
 #7 [ffffb6d520cbbca0] osc_lock_discard_pages at ffffffffc2808940 [osc]
 #8 [ffffb6d520cbbce8] osc_lock_flush at ffffffffc27ec043 [osc]
 #9 [ffffb6d520cbbd38] osc_dlm_blocking_ast0.constprop.0 at ffffffffc27ec312 [osc]
#10 [ffffb6d520cbbd70] osc_ldlm_blocking_ast at ffffffffc27ec5bf [osc]
#11 [ffffb6d520cbbda0] ldlm_cancel_callback at ffffffffc254d5fd [ptlrpc]
#12 [ffffb6d520cbbdf8] ldlm_lock_cancel at ffffffffc254d875 [ptlrpc]
#13 [ffffb6d520cbbe10] ldlm_cli_cancel_list_local at ffffffffc256c520 [ptlrpc]
#14 [ffffb6d520cbbe80] __ldlm_replay_locks at ffffffffc256d0bc [ptlrpc]
#15 [ffffb6d520cbbf08] ldlm_lock_replay_thread at ffffffffc256d286 [ptlrpc]
#16 [ffffb6d520cbbf18] kthread at ffffffffb8f3eb30
#17 [ffffb6d520cbbf50] ret_from_fork at ffffffffb8e0854c

PID: 2086457 TASK: ffff911d62623980 CPU: 22 COMMAND: "fio"
 #0 [ffffb6d534f0f360] __schedule at ffffffffb9b32879
 #1 [ffffb6d534f0f3c8] schedule at ffffffffb9b32b2e
 #2 [ffffb6d534f0f3e0] cl_sync_io_wait at ffffffffc21aa2ba [obdclass]
 #3 [ffffb6d534f0f438] ll_io_read_page at ffffffffc2a4023e [lustre]
 #4 [ffffb6d534f0f498] ll_readpage at ffffffffc2a40bac [lustre]
 #5 [ffffb6d534f0f528] filemap_read_folio at ffffffffb915af44
 #6 [ffffb6d534f0f5c8] filemap_update_page at ffffffffb915be39
 #7 [ffffb6d534f0f680] filemap_get_pages at ffffffffb915c2f7
 #8 [ffffb6d534f0f748] filemap_read at ffffffffb915c515
 #9 [ffffb6d534f0f848] vvp_io_read_start at ffffffffc2a73bf7 [lustre]
#10 [ffffb6d534f0f8e0] cl_io_start at ffffffffc21a81e1 [obdclass]
#11 [ffffb6d534f0f910] cl_io_loop at ffffffffc21ac999 [obdclass]
#12 [ffffb6d534f0f948] ll_file_io_generic at ffffffffc2a11b01 [lustre]
#13 [ffffb6d534f0fa58] do_file_read_iter at ffffffffc2a1337d [lustre]
#14 [ffffb6d534f0fab8] aio_read at ffffffffb92d22d1
#15 [ffffb6d534f0fbd8] io_submit_one at ffffffffb92d5cde
#16 [ffffb6d534f0fc50] __x64_sys_io_submit at ffffffffb92d6330
#17 [ffffb6d534f0fcc0] do_syscall_64 at ffffffffb9b228df
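The two traces can be read as a wait-for cycle. The following is only an illustrative model in Python, not Lustre code: the thread names are taken from the traces, while the `wait_for` dictionary and the `find_cycle` helper are hypothetical names introduced here. It assumes each thread waits on at most one other thread.

```python
def find_cycle(wait_for):
    """Return the threads forming a cycle in a wait-for graph, or None.

    wait_for maps a thread name to the thread it is blocked on.
    Assumption for this sketch: each thread waits on at most one other.
    """
    for start in wait_for:
        path = [start]
        seen = {start}
        node = wait_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we started
            if node in seen:
                break                # a cycle that does not include start
            path.append(node)
            seen.add(node)
            node = wait_for.get(node)
    return None


# Model of the vmcore (names from the traces, edges from the description):
# - ldlm_lock_replay wants invalidate_lock exclusively; fio holds it.
# - fio waits for its read IO, which is paused until recovery finishes.
wait_for = {
    "ldlm_lock_replay": "fio",
    "fio": "ldlm_lock_replay",
}

cycle = find_cycle(wait_for)
print("deadlock cycle:", cycle)
```

Neither thread can make progress unless one of the two edges is removed, for example by not taking the lock in the cancel path during replay or by not cancelling locks before replay at all.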
The invalidate_lock handling that leads to this deadlock was added to support newer kernel versions and has been present since Lustre 2.16:
commit bba59b1287c9cd8c30a85fafb4fd5788452bd05c
Author: Qian Yingjin <qian@ddn.com>
Date:   Tue Mar 21 04:53:00 2023 -0400

    LU-16651 llite: hold invalidate_lock when invalidate cache pages

    The newer kernel (such as Ubuntu 2204) introduces a new member:
    invalidate_lock in the structure @address_space.
    The filesystem must exclusively acquire invalidate_lock before
    invalidating page cache in truncate / hole punch (and thus calling
    into ->invalidatepage) to block races between page cache
    invalidation and page cache filling functions (fault, read, ...)

    However, current Lustre client does not hold this lock when
    remove pages from page cache caused by the revocation of the
    extent DLM lock protecting them.

    If a client has two overlapped PR DLM extent locks, i.e:
    - L1 = <PR, [1M, 4M - 1]
    - L2 = <PR, [3M, 5M - 1]
    A reader process holds L1 and reads data in range [3M, 4M - 1].
    L2 is being revoken due to the conflict access.
    Then the page read-in by the reader may be invalidated and deleted
    from page cache by the revocation of L2 (in lock blocking AST).

    The older kernel will check each page after read whether it was
    invalidated and deleted from page cache. If so, it will retry the
    page read.
    In the newer kernel, it removes this check and retry. Instead, it
    introduces a new rw_semaphore in the address_space -
    invalidate_lock - that holding the shared lock to protect adding
    of pages to page cache for page faults / reads / readahead, and
    the exclusive lock to protect invalidating pages, removing them
    from page cache for truncate / hole punch.

    Thus, in this patch it holds exclusive invalidate_lock in newer
    kernels when remove pages from page cache caused by the revocation
    of a extent DLM lock protecting them.
    Otherwsie, it will result in -EIO error or partial reads in the
    new added test case sanity/833.

    Test-parameters: clientdistro=ubuntu2204 testlist=sanity env=ONLY=833,ONLY_REPEAT=10
    Change-Id: If3a27002b89636b9fd4d7b5ea50afa9aeac5d121
    Signed-off-by: Qian Yingjin <qian@ddn.com>
    Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50371
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
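The overlapping-extent example from the commit message can be worked through numerically. This is a small sketch only: the L1 and L2 ranges come from the commit message, while the `overlap` helper and the 4 KiB page size are assumptions made here for illustration.

```python
MiB = 1 << 20
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the ticket

# Inclusive byte extents of the two PR DLM locks from the commit message.
L1 = (1 * MiB, 4 * MiB - 1)
L2 = (3 * MiB, 5 * MiB - 1)


def overlap(a, b):
    """Return the inclusive intersection of two extents, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


shared = overlap(L1, L2)
pages = (shared[1] - shared[0] + 1) // PAGE_SIZE

print("shared extent:", shared)
print("pages covered by both locks:", pages)
```

The shared extent is exactly the [3M, 4M - 1] range the reader is working on under L1, so cancelling L2 discards pages that the reader still considers covered by a valid lock. That is the race the exclusive invalidate_lock is meant to close.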
A possible quick workaround is to disable the cancellation of unused locks before replay during recovery:
lctl set_param ldlm.cancel_unused_locks_before_replay=0
But for thick clients with thousands of cached locks, replaying all of them during recovery would be a disaster for the server.