[LU-16954] mount failed: File exists(cannot create duplicate filename '/devices/virtual/bdi/lustre-ffffxxx') Created: 11/Jul/23  Updated: 02/Oct/23  Resolved: 28/Sep/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16697 Lustre should set appropriate BDI_CAP... Resolved
is related to LU-16970 conf-sanity: test_32f failed with 1 Open
is related to LU-17012 conf-sanity test_113: crashed during ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for emoly <emoly@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6fe16807-4d7e-4087-9ad5-7a68e355230f

test_64f failed with the following error:

[ 3230.932454] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-53vm4@tcp:/lustre /mnt/lustre
[ 3230.946766] sysfs: cannot create duplicate filename '/devices/virtual/bdi/lustre-ffff8dd549f3d000'
[ 3230.949612] CPU: 0 PID: 251880 Comm: mount.lustre Kdump: loaded Tainted: G           OE     5.15.0-52-generic #58~20.04.1-Ubuntu
[ 3230.952966] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3230.954694] Call Trace:
[ 3230.955792]  <TASK>
[ 3230.956840]  dump_stack_lvl+0x4a/0x63
[ 3230.958180]  dump_stack+0x10/0x16
[ 3230.959417]  sysfs_warn_dup.cold+0x17/0x2b
[ 3230.960781]  sysfs_create_dir_ns+0xbc/0xd0
[ 3230.962169]  kobject_add_internal+0xba/0x2c0
[ 3230.963546]  kobject_add+0x7e/0xb0
[ 3230.964737]  ? mutex_lock+0x13/0x40
[ 3230.965939]  device_add+0x125/0x8f0
[ 3230.967123]  device_create_groups_vargs+0xd8/0x100
[ 3230.968520]  device_create+0x49/0x70
[ 3230.969687]  bdi_register_va.part.0+0x3f/0x210
[ 3230.971062]  bdi_register_va+0x1f/0x30
[ 3230.972250]  super_setup_bdi_name+0x79/0xe0
[ 3230.973549]  ll_fill_super+0xc85/0x1b30 [lustre]
[ 3230.975069]  ? lustre_start_mgc+0xf56/0x1a10 [obdclass]
[ 3230.976781]  ? lustre_start_mgc+0xf56/0x1a10 [obdclass]
[ 3230.978273]  lustre_fill_super+0xeb/0x490 [lustre]
[ 3230.979672]  ? ll_alloc_inode+0x180/0x180 [lustre]
[ 3230.981068]  mount_nodev+0x49/0xa0
[ 3230.982172]  lustre_mount+0x18/0x20 [lustre]
[ 3230.983474]  legacy_get_tree+0x2b/0x50
[ 3230.984653]  vfs_get_tree+0x2a/0xc0
[ 3230.985759]  ? capable+0x19/0x20
[ 3230.986837]  path_mount+0x461/0xa70
[ 3230.987916]  ? putname+0x57/0x70
[ 3230.988920]  do_mount+0x80/0xa0
[ 3230.989889]  __x64_sys_mount+0x8b/0xe0
[ 3230.990958]  do_syscall_64+0x5c/0xc0
[ 3230.991983]  ? syscall_exit_to_user_mode+0x27/0x50
[ 3230.993258]  ? __x64_sys_read+0x1a/0x20
[ 3230.994348]  ? do_syscall_64+0x69/0xc0
[ 3230.995445]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 3230.996774] RIP: 0033:0x7fd233293eae
[ 3230.997849] Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48
[ 3231.002521] RSP: 002b:00007fffe15baf68 EFLAGS: 00000286 ORIG_RAX: 00000000000000a5
[ 3231.004799] RAX: ffffffffffffffda RBX: 00007fffe15be348 RCX: 00007fd233293eae
[ 3231.006635] RDX: 000056144e7a10b2 RSI: 00007fffe15bafc8 RDI: 00005614504bf500
[ 3231.008459] RBP: 00005614504c0ad0 R08: 00005614504c0ad0 R09: 0000000000000000
[ 3231.010300] R10: 0000000001000000 R11: 0000000000000286 R12: 00000000fffffff5
[ 3231.012165] R13: 00007fffe15bafc8 R14: 0000000000000000 R15: 000056144e7a10b2
[ 3231.014018]  </TASK>
[ 3231.014949] kobject_add_internal failed for lustre-ffff8dd549f3d000 with -EEXIST, don't try to register things with the same name in the same directory.
[ 3231.020139] LustreError: 251880:0:(super25.c:187:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -17
[ 3231.353241] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_64f: @@@@@@ FAIL: mount failed 
[ 3231.627375] Lustre: DEBUG MARKER: sanity test_64f: @@@@@@ FAIL: mount failed
[ 3231.947200] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2023-07-10/e2fsprogs-reviews_review-e2fsprogs-part-2_1303_8_8e1e4e70-b5a1-4584-bdbe-d637b9866068//sanity.test_64f.debug_log.$(hostname -s).1688990136.log;

Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4442 - 5.15.0-52-generic
servers: https://build.whamcloud.com/job/lustre-master/4442 - 4.18.0-425.10.1.el8_lustre.x86_64

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_64f - mount failed



 Comments   
Comment by Emoly Liu [ 11/Jul/23 ]

+1 https://testing.whamcloud.com/test_logs/2a6125b0-ef39-4420-9986-68406193286e/show_text

Comment by Gerrit Updater [ 11/Jul/23 ]

"Emoly Liu <emoly@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/tools/e2fsprogs/+/51627
Subject: LU-16954 test: fortestonly
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 2ccd59ad60c8dc4e1c037201dca25a4d6b10d9c8

Comment by Emoly Liu [ 12/Jul/23 ]

dongyang, could you please have a look at this issue? My test above showed that this issue was caused by the (latest) e2fsprogs master-lustre branch.

Comment by Gerrit Updater [ 17/Jul/23 ]

"Emoly Liu <emoly@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51693
Subject: LU-16954 test: fortestonly to verify commit f5a75ea
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9998b59ba73195d3b6fc7146807216c9a38b77bc

Comment by Gerrit Updater [ 18/Jul/23 ]

"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51701
Subject: LU-16954 Revert "LU-16697 llite: Set BDI_CAP_* flags for lustre"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 77d98384ac0bb5d9b5d1076c62c13069c648ecfd

Comment by Emoly Liu [ 19/Jul/23 ]

The Maloo testing results above showed that this issue was caused by the patch of LU-16697 at https://review.whamcloud.com/#/c/fs/lustre-release/+/50497/ (commit f5a75ea44db32ac27ada327b4752c3bc611cf9df).

Comment by Qian Yingjin [ 24/Jul/23 ]

I cannot reproduce this bug locally...

Comment by Gerrit Updater [ 24/Jul/23 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51752
Subject: LU-16954 llite: debug patch for duplicate filename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 360264582ae7c06aad7ac8e19fcffbf6c698c68c

Comment by Gerrit Updater [ 25/Jul/23 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51755
Subject: LU-16954 test: fortestonly for duplicate filename failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4280e036f9edadfbbe411b3903e6449d609ef59c

Comment by Qian Yingjin [ 28/Jul/23 ]

It seems to be related to the kernel patch "fs: explicitly unregister per-superblock BDIs".
See https://lore.kernel.org/all/20211021124441.668816-4-hch@lst.de/T/#u and https://lkml.kernel.org/lkml/20220601111059.v4.1.I0e579520b03aa244906b8fe2ef1ec63f2ab7eecf@changeid/ for more details.

Could the Maloo testing system be changed to use a newer kernel, such as linux-image-unsigned-5.17.0-1003-oem/jammy 5.17.0-1003.3 amd64?
This kernel contains the patch. We had better check whether it fixes this failure.

Regards,
Qian

Comment by Li Xi [ 01/Aug/23 ]

mdiep, is there any possibility that we can update the Ubuntu 2204 kernel to > v5.17? Or can we specify any Test-Parameters to do that?

Comment by Li Xi [ 01/Aug/23 ]

A ticket has already been created: DCO-9417

Comment by Qian Yingjin [ 02/Aug/23 ]

According to the testing output (dmesg) in https://review.whamcloud.com/c/fs/lustre-release/+/51755,
Maloo testing on Ubuntu2204: ll_put_super():
https://testing.whamcloud.com/test_logs/0c171cbd-c4d4-4721-ba61-3d79b542b0a7/show_text

PUT SB - force: 0 unstable_nr: 0 ref: 2 wb_list: 0

local test on CentOS 8: ll_put_super():

PUT SB - force: 0 unstable_nr: 0 ref: 1 wb_list: 0 sb@00000000c38268c1 cfg_instance lustre-client-ffff935c2b64d800

On Ubuntu 2204, the bdi refcnt is 2, which is 1 larger than on CentOS 8.

As a result, the sysfs entry is not unregistered.
So I think the kernel patch "fs: explicitly unregister per-superblock BDIs" should help to solve this problem.

However, I still have no idea why the bdi refcnt on Ubuntu 2204 is 2 during Lustre umount in ll_put_super...

Comment by Gerrit Updater [ 03/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51701/
Subject: LU-16954 llite: do not set SB_I_CGROUPWB on super block
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d604d68c3f8ab8fbb52f5dd19651a11ac3dc0374

Comment by Qian Yingjin [ 16/Aug/23 ]

The Maloo testing results verify that this bug is fixed in the newer Ubuntu 2204 kernel >= v5.17:
https://testing.whamcloud.com/test_sessions/related?jobs=lustre-reviews&builds=96950#redirect

The kernel patch "fs: explicitly unregister per-superblock BDIs" (see https://lore.kernel.org/all/20211021124441.668816-4-hch@lst.de/T/#u for more details) fixed the duplicate filename failure during Lustre mount.

It fixes the failure by calling @bdi_unregister() directly to unregister the sysfs BDI interface.
In the newer kernel >= v5.17, @bdi_unregister is an exported kernel symbol.
However, for Ubuntu 2204 v5.15, the function @bdi_unregister is not exported...

Thus, there is no easy way to fix this problem in kernels < v5.17.

void bdi_unregister(struct backing_dev_info *bdi)
{
	...
	if (bdi->dev) {
		bdi_debug_unregister(bdi);
		device_unregister(bdi->dev);
		bdi->dev = NULL;
	}
	...
}

The solution may be to call @bdi_debug_unregister(bdi) and device_unregister(bdi->dev) explicitly in ->put_super().

I will make a patch to see whether doing it this way causes any problems.
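
To illustrate the reasoning above, here is a minimal userspace sketch in C. It is only a model, not kernel or Lustre code: every model_* name is hypothetical, the array stands in for the /devices/virtual/bdi/ sysfs directory, and the real bdi_unregister() does more than this. It assumes that the next mount registers the same name, as in the dmesg output in the description. It shows that when the name is only removed on the final put, one extra reference at umount makes the next registration fail with -EEXIST, while an explicit unregister in the umount path avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only: none of these names are kernel or Lustre APIs. */
#define MODEL_MAX_NAMES 8
#define MODEL_NAME_LEN  64

/* Stand-in for the /devices/virtual/bdi/ sysfs directory. */
static char model_dir[MODEL_MAX_NAMES][MODEL_NAME_LEN];

struct model_bdi {
	char name[MODEL_NAME_LEN];
	int  refcount;   /* models the bdi reference count */
	int  registered; /* 1 while the name is present in model_dir */
};

/* Return the slot holding `name`, or -1 if it is not present. */
static int model_find(const char *name)
{
	for (int i = 0; i < MODEL_MAX_NAMES; i++)
		if (model_dir[i][0] != '\0' && strcmp(model_dir[i], name) == 0)
			return i;
	return -1;
}

/* Register a name; fails with -EEXIST if it is already present. */
static int model_bdi_register(struct model_bdi *bdi, const char *name)
{
	if (model_find(name) >= 0)
		return -EEXIST;
	for (int i = 0; i < MODEL_MAX_NAMES; i++) {
		if (model_dir[i][0] == '\0') {
			snprintf(model_dir[i], MODEL_NAME_LEN, "%s", name);
			snprintf(bdi->name, MODEL_NAME_LEN, "%s", name);
			bdi->refcount = 1;
			bdi->registered = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Remove the name right away, regardless of the reference count. */
static void model_bdi_unregister(struct model_bdi *bdi)
{
	int slot;

	if (!bdi->registered)
		return;
	slot = model_find(bdi->name);
	if (slot >= 0)
		model_dir[slot][0] = '\0';
	bdi->registered = 0;
}

/* Drop one reference; the name is removed only on the final put. */
static void model_bdi_put(struct model_bdi *bdi)
{
	assert(bdi->refcount > 0);
	if (--bdi->refcount == 0)
		model_bdi_unregister(bdi);
}

/*
 * Mount once with `extra_refs` additional references held, unmount, then
 * mount again under the same name. Returns the result of the second mount.
 */
static int model_remount(int extra_refs, int explicit_unregister)
{
	struct model_bdi first = {0}, second = {0};
	const char *name = "lustre-model";
	int rc;

	memset(model_dir, 0, sizeof(model_dir));
	rc = model_bdi_register(&first, name);
	if (rc)
		return rc;
	first.refcount += extra_refs;

	/* Umount path: optionally unregister explicitly before the put. */
	if (explicit_unregister)
		model_bdi_unregister(&first);
	model_bdi_put(&first);

	return model_bdi_register(&second, name);
}
```

In this model, model_remount(0, 0) corresponds to the CentOS 8 case (ref: 1), model_remount(1, 0) to the Ubuntu 2204 case (ref: 2), and model_remount(1, 1) to the explicit unregistration proposed above.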

Comment by Gerrit Updater [ 16/Aug/23 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51955
Subject: LU-16954 llite: add SB_I_CGROUPWB on super block
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 24263dc9ae44ad02639178f5b167dcd0f1ccc68f

Comment by Gerrit Updater [ 28/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51955/
Subject: LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dcc1dd39a67f15de9174e7acdda599e3c54c1421

Comment by Peter Jones [ 28/Sep/23 ]

It looks like this work is now complete for 2.16. The only remaining patches are fortestonly (so presumably will be abandoned)

Generated at Sat Feb 10 03:31:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.