[LU-5362] (mdc_locks.c:235:mdc_clear_replay_flag()) LBUG Created: 16/Jul/14 Updated: 16/Oct/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.12.0, Lustre 2.13.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | fault, ldlm, mdc | ||
| Severity: | 3 |
| Rank (Obsolete): | 14951 |
| Description |
|
In mdc_enqueue() if ldlm_cli_enqueue() fails but the request has a transno then I see the folllowing: [ 257.072572] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) @@@ transno returned on error rc -12 req@ffff8801f6191330 x1473823878678260/t4294967302(4294967302) o101->lustre-MDT0000-mdc-ffff8801f03e8ee0@0@lo:12/10 lens 632/544 e 0 to 0 dl 1405548101 ref 2 fl Complete:R/4/0 rc 0/0 [ 257.076802] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) LBUG [ 257.077920] Pid: 19779, comm: sys_open [ 257.078613] [ 257.078614] Call Trace: [ 257.079369] [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 257.080581] [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs] [ 257.081681] [<ffffffffa090ec36>] mdc_enqueue+0x1a26/0x1ab0 [mdc] [ 257.082731] [<ffffffffa090eea2>] mdc_intent_lock+0x1e2/0x52f [mdc] [ 257.083853] [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre] [ 257.085084] [<ffffffffa0661b70>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc] [ 257.086282] [<ffffffffa08c8d1b>] lmv_intent_open+0x31b/0x9f0 [lmv] [ 257.087380] [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre] [ 257.088552] [<ffffffffa08c96d0>] lmv_intent_lock+0x2e0/0x1180 [lmv] [ 257.089691] [<ffffffffa041c5e6>] ? obd_alloc_fail+0x56/0x4e0 [obdclass] [ 257.090846] [<ffffffff81186c34>] ? kmem_cache_alloc_trace+0x254/0x2b0 [ 257.092020] [<ffffffffa0e76f79>] ? ll_i2suppgid+0x19/0x30 [lustre] [ 257.093145] [<ffffffffa0e76fbe>] ? ll_i2gids+0x2e/0xd0 [lustre] [ 257.094185] [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre] [ 257.095381] [<ffffffffa0e7ad1a>] ll_lookup_it+0x25a/0xad0 [lustre] [ 257.096489] [<ffffffffa0e7b61c>] ll_lookup_nd+0x8c/0x4a0 [lustre] [ 257.097564] [<ffffffff811b29b5>] do_lookup+0x1a5/0x230 [ 257.098477] [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840 [ 257.099492] [<ffffffff811b398a>] path_walk+0x6a/0xe0 [ 257.100401] [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0 [ 257.101374] [<ffffffff811b5074>] do_filp_open+0x104/0xd30 [ 257.102348] [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160 [ 257.103249] [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160 [ 257.104156] [<ffffffff815547cb>] ? _spin_unlock+0x2b/0x40 [ 257.105111] [<ffffffff811c25cb>] ? alloc_fd+0xab/0x160 [ 257.105978] [<ffffffff8119f809>] do_sys_open+0x69/0x140 [ 257.106901] [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 257.108009] [<ffffffff8119f920>] sys_open+0x20/0x30 [ 257.108903] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b [ 257.109934] Note that there are several ways that ldlm_cli_enqueue() may fail even if the request succeeded. Failure from ldlm_cli_enqueue_fini() is one example. This was found via memory allocation fault injection, specifically failing in ldlm_resource_new(). |
| Comments |
| Comment by Oleg Drokin [ 04/Sep/18 ] |
|
Still happening with 2.12 too. [ 491.544077] LustreError: 25274:0:(mdc_locks.c:202:mdc_clear_replay_flag()) @@@ transno returned on error rc -5 req@ffff8800912e6c80 x1610652424649696/t4295299960(4295299960) o101->lustre-MDT0000-mdc-ffff88005ad40800@0@lo:12/10 lens 376/312 e 0 to 0 dl 1536038359 ref 1 fl Complete:R/0/0 rc 0/0 [ 491.544079] LustreError: 25274:0:(mdc_locks.c:203:mdc_clear_replay_flag()) LBUG [ 491.544080] Pid: 25274, comm: dd 3.10.0-7.5-debug #2 SMP Sat Jun 2 18:41:57 EDT 2018 [ 491.544081] Call Trace: [ 491.544097] [<ffffffffa01a67dc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [ 491.544102] [<ffffffffa01a688c>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 491.544108] [<ffffffffa0853049>] osc_env_info.part.16+0x0/0x39 [mdc] [ 491.544112] [<ffffffffa084ac11>] mdc_enqueue_base+0x1771/0x19d0 [mdc] [ 491.544117] [<ffffffffa084b5bf>] mdc_intent_lock+0x12f/0x550 [mdc] [ 491.544122] [<ffffffffa0514586>] lmv_intent_lookup+0x226/0xb40 [lmv] [ 491.544125] [<ffffffffa0515209>] lmv_intent_lock+0x369/0xba0 [lmv] [ 491.544137] [<ffffffffa0e238cf>] ll_layout_intent+0x20f/0x770 [lustre] [ 491.544147] [<ffffffffa0e34d98>] ll_layout_write_intent+0x58/0x110 [lustre] [ 491.544159] [<ffffffffa0e72d7d>] vvp_io_fini+0x20d/0x6b0 [lustre] [ 491.544169] [<ffffffffa0e732a5>] vvp_io_setattr_fini+0x45/0x80 [lustre] [ 491.544204] [<ffffffffa036ab85>] cl_io_fini+0x75/0x250 [obdclass] [ 491.544213] [<ffffffffa0e6bda4>] cl_setattr_ost+0x154/0x3c0 [lustre] [ 491.544221] [<ffffffffa0e48408>] ll_setattr_raw+0xe78/0xed0 [lustre] [ 491.544228] [<ffffffffa0e484cc>] ll_setattr+0x6c/0xd0 [lustre] [ 491.544231] [<ffffffff8122cd41>] notify_change+0x2c1/0x420 [ 491.544233] [<ffffffff8120b7f5>] do_truncate+0x75/0xc0 [ 491.544235] [<ffffffff8120bbf0>] do_sys_ftruncate.constprop.19+0x120/0x180 [ 491.544236] [<ffffffff8120bc8e>] SyS_ftruncate+0xe/0x10 [ 491.544239] [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b [ 491.544242] [<ffffffffffffffff>] 0xffffffffffffffff |
| Comment by Oleg Drokin [ 24/Jun/19 ] |
|
This hits on master/master-next still and the frequency somewhat increases with LU-11623 patches so we really need to do something here I guess. |
| Comment by Andriy Skulysh [ 16/Oct/21 ] |
|
It looks like a duplicate of LU-15120 |