[LU-17066] lod_xattr_set()) ASSERTION( (!!(!strcmp(name, "lustre.""lov") || !strcmp(name, "trusted.lov")) == !!(!lod_dt_obj(dt)->ldo_comp_cached)) ) failed Created: 31/Aug/23 Updated: 13/Nov/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Adding an additional racer test (https://review.whamcloud.com/c/fs/lustre-release/+/41368) surfaced the following crash:

[16104.408588] LustreError: 4168:0:(lod_object.c:5136:lod_xattr_set()) ASSERTION( (!!(!strcmp(name, "lustre.""lov") || !strcmp(name, "trusted.lov")) == !!(!lod_dt_obj(dt)->ldo_comp_cached)) ) failed:
[16104.411561] LustreError: 4168:0:(lod_object.c:5136:lod_xattr_set()) LBUG
[16104.412125] Pid: 4168, comm: mdt_rdpg01_003 3.10.0-7.9-debug #2 SMP Tue Feb 1 18:17:58 EST 2022
[16104.413146] Call Trace:
[16104.413633] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[16104.414162] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[16104.414692] [<0>] lod_xattr_set+0x1b13/0x1c90 [lod]
[16104.415221] [<0>] mdo_xattr_set+0xc0/0x4c0 [mdd]
[16104.415746] [<0>] mdd_xattr_set+0xf85/0x1200 [mdd]
[16104.416279] [<0>] mo_xattr_set+0x43/0x45 [mdt]
[16104.416812] [<0>] mdt_close_handle_layouts+0x9a4/0xee0 [mdt]
[16104.417358] [<0>] mdt_mfd_close+0x5b2/0xbb0 [mdt]
[16104.417887] [<0>] mdt_close_internal+0xb4/0x240 [mdt]
[16104.418532] [<0>] mdt_close+0x28c/0x970 [mdt]
[16104.419217] [<0>] tgt_request_handle+0x88e/0x19b0 [ptlrpc]
[16104.419815] [<0>] ptlrpc_server_handle_request+0x251/0xc00 [ptlrpc]
[16104.420555] [<0>] ptlrpc_main+0xc66/0x1670 [ptlrpc]
[16104.421185] [<0>] kthread+0xe4/0xf0
[16104.422327] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[16104.422934] [<0>] 0xfffffffffffffffe
[16104.423456] Kernel panic - not syncing: LBUG

Crashdump: http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-65-2023-08-25-15:53:21/

Interestingly, the same assertion in the same place was previously seen in mid-2021, during testing of https://review.whamcloud.com/44178 in the sanity-flr and hot-pool tests: https://testing.whamcloud.com/test_sets/de728c9c-b9dd-4c65-b898-507df9215560

This tracker lists all matching failures for this crash: https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=6580 |
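The assertion is easier to read once the `!!` normalization is spelled out. The sketch below is a hypothetical, self-contained model, not Lustre code: `struct sketch_lod_object` stands in for the real LOD object and keeps only `ldo_comp_cached` (modeled here as an int), and the `sketch_*` helper names are made up for illustration. Under those assumptions, the condition printed in the log passes only when "the xattr being set is `lustre.lov` or `trusted.lov`" and "no component layout is cached" are both true or both false. Note that `"lustre.""lov"` is ordinary C adjacent string-literal concatenation, i.e. `"lustre.lov"`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the LOD object: only the field named in the
 * assertion is modeled. */
struct sketch_lod_object {
	int ldo_comp_cached;	/* non-zero while the component layout is cached */
};

/* True when name is one of the two LOV xattr names tested by the assertion.
 * "lustre." "lov" concatenates to "lustre.lov" at compile time. */
static bool sketch_is_lov_name(const char *name)
{
	return !strcmp(name, "lustre." "lov") || !strcmp(name, "trusted.lov");
}

/* Mirrors the logged condition: both sides are normalized to 0/1 by "!!",
 * so it holds only when "name is a LOV name" and "no layout is cached"
 * agree. */
static bool sketch_lod_xattr_set_invariant(const struct sketch_lod_object *obj,
					   const char *name)
{
	assert(obj != NULL && name != NULL);
	return !!sketch_is_lov_name(name) == !!(!obj->ldo_comp_cached);
}
```

For example, setting `trusted.lov` on an object whose layout cache has been dropped satisfies the check, while setting it with `ldo_comp_cached` still non-zero fails it; if the later migrate/unlink analysis is right, the crash above corresponds to that second case.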
| Comments |
| Comment by Alex Zhuravlev [ 16/Oct/23 ] |
|
I'm hitting this quite often locally. |
| Comment by Alex Zhuravlev [ 13/Nov/23 ] |
|
It's a race between migrate (which takes the LAYOUT bit) and unlink (which does not). Migrate drops the cached layout, but a racing unlink reloads it. |
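The interleaving described above can be replayed deterministically. This is a hypothetical single-threaded model, not Lustre code: `struct sketch_object`, the `sketch_*` functions, and the `honor_layout_lock` switch are all made up for illustration, and the "lock" is just a flag. It assumes migrate holds the LAYOUT bit from the moment it drops the cached layout until it sets `trusted.lov`, and that unlink reloads the layout without checking that bit. Replaying migrate-drop, unlink-reload, migrate-set in that order leaves the cache populated when the LOV xattr is set, which is the state the `lod_xattr_set()` assertion rejects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the layout-cache flag and whether the LAYOUT
 * bit is held are tracked; no real locking is involved. */
struct sketch_object {
	bool comp_cached;	/* stands in for ldo_comp_cached */
	bool layout_locked;	/* true while migrate holds the LAYOUT bit */
};

/* Migrate: takes the LAYOUT bit and drops the cached layout. */
static void sketch_migrate_begin(struct sketch_object *o)
{
	o->layout_locked = true;
	o->comp_cached = false;
}

/* Unlink: reloads the layout into the cache. With honor_layout_lock false
 * (the behavior described in the comment) the LAYOUT bit is ignored; with
 * true the reload is skipped while the bit is held, standing in for
 * "wait until migrate is done". Returns true if it reloaded. */
static bool sketch_unlink_load_layout(struct sketch_object *o,
				      bool honor_layout_lock)
{
	if (honor_layout_lock && o->layout_locked)
		return false;
	o->comp_cached = true;
	return true;
}

/* Migrate finishes by setting trusted.lov and releasing the bit. Returns
 * whether the invariant (no cached layout) held at that point. */
static bool sketch_migrate_set_lov(struct sketch_object *o)
{
	bool ok;

	assert(o->layout_locked);	/* migrate must still hold the bit */
	ok = !o->comp_cached;
	o->layout_locked = false;
	return ok;
}

/* Replays migrate-drop, unlink-reload, migrate-set in that order. */
static bool sketch_replay(bool honor_layout_lock)
{
	struct sketch_object o = { .comp_cached = true, .layout_locked = false };

	sketch_migrate_begin(&o);
	sketch_unlink_load_layout(&o, honor_layout_lock);
	return sketch_migrate_set_lov(&o);
}
```

`sketch_replay(false)` returns false (the invariant is violated), while `sketch_replay(true)` returns true. The `honor_layout_lock` variant only illustrates one way to close the window; it is an assumption, not necessarily what patch 53113 does.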
| Comment by Alex Zhuravlev [ 13/Nov/23 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53113 |