[LU-14862] Zero fid assignment post eviction followed by ASSERTION( name->name[0] != 0 ) failed Created: 19/Jul/21 Updated: 02/Sep/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
It looks like lmv_revalidate_slaves() does not validate the fid it receives in the loop (unlike many other places that check with fid_is_sane) and calls mdc_intent_lock on a zero fid causing the assertion to hit. Now it look the problem is deeper somewhere since we even get said zero fid assigned somewhere, though I am not sure where. The trace below is from racer on latest master [ 2289.873955] LustreError: 27285:0:(file.c:5018:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000402:0x1:0x0] error: rc = -5 [ 2289.879247] LustreError: 27285:0:(file.c:5018:ll_inode_revalidate_fini()) Skipped 407 previous similar messages [ 2290.097986] LustreError: 27109:0:(vvp_io.c:1793:vvp_io_init()) lustre: refresh file layout [0x200000406:0xd650:0x0] error -108. [ 2293.106030] LustreError: 28772:0:(llite_nfs.c:342:ll_dir_get_parent_fid()) lustre: failure inode [0x240000409:0x4bab:0x0] get parent: rc = -116 [ 2293.108749] LustreError: 28772:0:(llite_nfs.c:342:ll_dir_get_parent_fid()) Skipped 29 previous similar messages [ 2303.467231] Lustre: 14085:0:(client.c:2285:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1626631396/real 1626631396] req@ffff8800be874280 x1705644635655488/t0(0) o36->lustre-MDT0000-mdc-ffff8800d6af9800@192.168.201.101@tcp:12/10 lens 496/440 e 0 to 1 dl 1626631525 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'setfattr.0' [ 2303.473349] Lustre: 14085:0:(client.c:2285:ptlrpc_expire_one_request()) Skipped 14 previous similar messages [ 2377.306136] LustreError: 13927:0:(llite_lib.c:1674:ll_update_lsm_md()) lustre: [0x200000404:0x1ea58:0x0] dir layout mismatch: [ 2377.308485] LustreError: 13927:0:(lustre_lmv.h:134:lsm_md_dump()) magic 0xcd20cd0 stripe count 1 master mdt 0 hash type crush:0x2000003 max inherit 0 version 1 migrate offset 0 migrate hash 0x0 pool [ 2377.311901] LustreError: 13927:0:(lustre_lmv.h:141:lsm_md_dump()) stripe[0] [0x200000400:0x529:0x0] [ 2377.313845] LustreError: 13927:0:(lustre_lmv.h:134:lsm_md_dump()) magic 0xcd20cd0 stripe count 1 master mdt 0 hash type crush:0x2000003 max inherit 0 version 1 migrate offset 0 migrate hash 0x0 pool [ 2377.317112] LustreError: 13927:0:(lustre_lmv.h:141:lsm_md_dump()) stripe[0] [0x0:0x0:0x0] [ 2377.318945] LustreError: 13927:0:(llite_lib.c:2955:ll_prep_inode()) new_inode -fatal: rc -22 [ 2377.320628] LustreError: 13927:0:(llite_lib.c:2955:ll_prep_inode()) Skipped 19 previous similar messages [ 2377.320805] LustreError: 13967:0:(ldlm_resource.c:1498:ldlm_resource_get()) ASSERTION( name->name[0] != 0 ) failed: [ 2377.320815] LustreError: 13967:0:(ldlm_resource.c:1498:ldlm_resource_get()) LBUG [ 2377.320820] Pid: 13967, comm: ls 3.10.0-7.9-debug #1 SMP Mon Feb 1 17:33:41 EST 2021 [ 2377.320820] Call Trace: [ 2377.320914] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs] [ 2377.320924] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 2377.321103] [<0>] ldlm_resource_get+0x7d9/0x940 [ptlrpc] [ 2377.321137] [<0>] ldlm_lock_create+0x55/0x9e0 [ptlrpc] [ 2377.321177] [<0>] ldlm_cli_enqueue+0xc8/0x9e0 [ptlrpc] [ 2377.321186] [<0>] mdc_enqueue_base+0x323/0x14a0 [mdc] [ 2377.321191] [<0>] mdc_intent_lock+0x135/0x570 [mdc] [ 2377.321213] [<0>] lmv_revalidate_slaves+0x452/0xbd0 [lmv] [ 2377.321216] [<0>] lmv_merge_attr+0x45/0x1b0 [lmv] [ 2377.321285] [<0>] ll_getattr_dentry+0x6c5/0x920 [lustre] [ 2377.321296] [<0>] ll_getattr+0x1e/0x20 [lustre] [ 2377.321300] [<0>] vfs_getattr+0x46/0x80 [ 2377.321301] [<0>] vfs_fstat+0x45/0x80 [ 2377.321303] [<0>] SYSC_newfstat+0x24/0x60 [ 2377.321305] [<0>] SyS_newfstat+0xe/0x10 [ 2377.321308] [<0>] system_call_fastpath+0x1f/0x24 [ 2377.321345] [<0>] 0xfffffffffffffffe [ 2377.321347] Kernel panic - not syncing: LBUG But when I was searching for similar crashes, I also arrived at this https://testing.whamcloud.com/test_sets/4fe20054-a07f-11e9-9e3d-52540065bddc which is essentially the same crash only we arrived there in a bit of a different way - not because so "dir layout mismatch" seems to be a bit of a red herring and a fallout from earlier incorrect fid assignment too. |
| Comments |
| Comment by Alex Zhuravlev [ 02/Sep/22 ] |
|
I'm hitting this quite often in racer:
PID: 504341 TASK: ffff891905fb92c0 CPU: 0 COMMAND: "ls"
#0 [ffff891897e4f790] panic at ffffffff820af786
/tmp/kernel/kernel/panic.c: 299
#1 [ffff891897e4f820] ldlm_resource_get at ffffffffc0589652 [ptlrpc]
/home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 156
#2 [ffff891897e4f868] ldlm_lock_create at ffffffffc0581e76 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lock.c: 1671
#3 [ffff891897e4f8a8] ldlm_cli_enqueue at ffffffffc05a1cbd [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_request.c: 1008
#4 [ffff891897e4f928] mdc_enqueue_base at ffffffffc09061e7 [mdc]
/home/lustre/master-mine/lustre/mdc/mdc_locks.c: 1006
#5 [ffff891897e4fa28] mdc_intent_lock at ffffffffc090856a [mdc]
/home/lustre/master-mine/lustre/mdc/mdc_locks.c: 1360
#6 [ffff891897e4fac8] lmv_revalidate_slaves at ffffffffc0815ebc [lmv]
/home/lustre/master-mine/lustre/lmv/lmv_intent.c: 225
#7 [ffff891897e4fb88] lmv_merge_attr at ffffffffc080925d [lmv]
/home/lustre/master-mine/lustre/lmv/lmv_obd.c: 3827
#8 [ffff891897e4fbc0] ll_update_inode at ffffffffc0f7189a [lustre]
/home/lustre/master-mine/lustre/llite/llite_lib.c: 1804
#9 [ffff891897e4fc38] ll_prep_inode at ffffffffc0f74b7d [lustre]
/home/lustre/master-mine/lustre/llite/llite_lib.c: 3245
#10 [ffff891897e4fd00] ll_revalidate_it_finish at ffffffffc0f3a10b [lustre]
/home/lustre/master-mine/lustre/llite/dcache.c: 271
#11 [ffff891897e4fd48] ll_inode_revalidate at ffffffffc0f49b6e [lustre]
/home/lustre/master-mine/lustre/llite/file.c: 5170
#12 [ffff891897e4fdc0] ll_getattr_dentry at ffffffffc0f58a28 [lustre]
/home/lustre/master-mine/lustre/llite/file.c: 5247
#13 [ffff891897e4fdf8] __set_pages_np at ffffffff820a3310
/tmp/kernel/arch/x86/mm/pageattr.c: 2303
#14 [ffff891897e4fe68] vfs_statx_fd at ffffffff821cab4f
/tmp/kernel/./include/linux/file.h: 43
#15 [ffff891897e4fea0] __se_sys_newfstat at ffffffff821cae4a
|