[LU-16954] mount failed: File exists (cannot create duplicate filename '/devices/virtual/bdi/lustre-ffffxxx') Created: 11/Jul/23 Updated: 02/Oct/23 Resolved: 28/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Qian Yingjin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for emoly <emoly@whamcloud.com>.

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6fe16807-4d7e-4087-9ad5-7a68e355230f

test_64f failed with the following error:

[ 3230.932454] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-53vm4@tcp:/lustre /mnt/lustre
[ 3230.946766] sysfs: cannot create duplicate filename '/devices/virtual/bdi/lustre-ffff8dd549f3d000'
[ 3230.949612] CPU: 0 PID: 251880 Comm: mount.lustre Kdump: loaded Tainted: G OE 5.15.0-52-generic #58~20.04.1-Ubuntu
[ 3230.952966] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3230.954694] Call Trace:
[ 3230.955792] <TASK>
[ 3230.956840] dump_stack_lvl+0x4a/0x63
[ 3230.958180] dump_stack+0x10/0x16
[ 3230.959417] sysfs_warn_dup.cold+0x17/0x2b
[ 3230.960781] sysfs_create_dir_ns+0xbc/0xd0
[ 3230.962169] kobject_add_internal+0xba/0x2c0
[ 3230.963546] kobject_add+0x7e/0xb0
[ 3230.964737] ? mutex_lock+0x13/0x40
[ 3230.965939] device_add+0x125/0x8f0
[ 3230.967123] device_create_groups_vargs+0xd8/0x100
[ 3230.968520] device_create+0x49/0x70
[ 3230.969687] bdi_register_va.part.0+0x3f/0x210
[ 3230.971062] bdi_register_va+0x1f/0x30
[ 3230.972250] super_setup_bdi_name+0x79/0xe0
[ 3230.973549] ll_fill_super+0xc85/0x1b30 [lustre]
[ 3230.975069] ? lustre_start_mgc+0xf56/0x1a10 [obdclass]
[ 3230.976781] ? lustre_start_mgc+0xf56/0x1a10 [obdclass]
[ 3230.978273] lustre_fill_super+0xeb/0x490 [lustre]
[ 3230.979672] ? ll_alloc_inode+0x180/0x180 [lustre]
[ 3230.981068] mount_nodev+0x49/0xa0
[ 3230.982172] lustre_mount+0x18/0x20 [lustre]
[ 3230.983474] legacy_get_tree+0x2b/0x50
[ 3230.984653] vfs_get_tree+0x2a/0xc0
[ 3230.985759] ? capable+0x19/0x20
[ 3230.986837] path_mount+0x461/0xa70
[ 3230.987916] ? putname+0x57/0x70
[ 3230.988920] do_mount+0x80/0xa0
[ 3230.989889] __x64_sys_mount+0x8b/0xe0
[ 3230.990958] do_syscall_64+0x5c/0xc0
[ 3230.991983] ? syscall_exit_to_user_mode+0x27/0x50
[ 3230.993258] ? __x64_sys_read+0x1a/0x20
[ 3230.994348] ? do_syscall_64+0x69/0xc0
[ 3230.995445] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 3230.996774] RIP: 0033:0x7fd233293eae
[ 3230.997849] Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48
[ 3231.002521] RSP: 002b:00007fffe15baf68 EFLAGS: 00000286 ORIG_RAX: 00000000000000a5
[ 3231.004799] RAX: ffffffffffffffda RBX: 00007fffe15be348 RCX: 00007fd233293eae
[ 3231.006635] RDX: 000056144e7a10b2 RSI: 00007fffe15bafc8 RDI: 00005614504bf500
[ 3231.008459] RBP: 00005614504c0ad0 R08: 00005614504c0ad0 R09: 0000000000000000
[ 3231.010300] R10: 0000000001000000 R11: 0000000000000286 R12: 00000000fffffff5
[ 3231.012165] R13: 00007fffe15bafc8 R14: 0000000000000000 R15: 000056144e7a10b2
[ 3231.014018] </TASK>
[ 3231.014949] kobject_add_internal failed for lustre-ffff8dd549f3d000 with -EEXIST, don't try to register things with the same name in the same directory.
[ 3231.020139] LustreError: 251880:0:(super25.c:187:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -17
[ 3231.353241] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_64f: @@@@@@ FAIL: mount failed
[ 3231.627375] Lustre: DEBUG MARKER: sanity test_64f: @@@@@@ FAIL: mount failed
[ 3231.947200] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2023-07-10/e2fsprogs-reviews_review-e2fsprogs-part-2_1303_8_8e1e4e70-b5a1-4584-bdbe-d637b9866068//sanity.test_64f.debug_log.$(hostname -s).1688990136.log;
|
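To illustrate why the mount fails with rc = -17, here is a minimal userspace sketch (not Lustre or kernel code) of a single sysfs directory such as /devices/virtual/bdi. It assumes, as the lustre-ffff... suffix suggests, that the BDI name embeds a kernel pointer value, so if the old entry is never removed and the same address is reused, the second registration collides. The names make_bdi_name() and sysfs_add() are hypothetical; sysfs_add() only mimics the -EEXIST check reported by kobject_add_internal in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one sysfs directory, e.g. /devices/virtual/bdi. */
#define MAX_ENTRIES 16
#define NAME_LEN 64

static char entries[MAX_ENTRIES][NAME_LEN];
static int nr_entries;

/* Build a BDI-style name from a pointer value (assumed naming scheme). */
static void make_bdi_name(char *buf, size_t len, unsigned long ptr)
{
	assert(len > 0);
	snprintf(buf, len, "lustre-%lx", ptr);
}

/* Mimic the duplicate check: a name already present yields -EEXIST. */
static int sysfs_add(const char *name)
{
	for (int i = 0; i < nr_entries; i++)
		if (strcmp(entries[i], name) == 0)
			return -EEXIST;
	if (nr_entries == MAX_ENTRIES)
		return -ENOSPC;
	snprintf(entries[nr_entries++], NAME_LEN, "%s", name);
	return 0;
}
```

Registering "lustre-ffff8dd549f3d000" once succeeds; registering it again without removing the first entry returns -EEXIST, which is -17 on Linux and matches the "rc = -17" in the LustreError line.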
| Comments |
| Comment by Emoly Liu [ 11/Jul/23 ] |
|
+1 https://testing.whamcloud.com/test_logs/2a6125b0-ef39-4420-9986-68406193286e/show_text |
| Comment by Gerrit Updater [ 11/Jul/23 ] |
|
|
| Comment by Emoly Liu [ 12/Jul/23 ] |
|
dongyang, could you please have a look at this issue? My test above showed that this issue was caused by the latest e2fsprogs master-lustre branch. |
| Comment by Gerrit Updater [ 17/Jul/23 ] |
|
|
| Comment by Gerrit Updater [ 18/Jul/23 ] |
|
"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51701 |
| Comment by Emoly Liu [ 19/Jul/23 ] |
|
The Maloo testing results above showed that this issue was caused by the patch of |
| Comment by Qian Yingjin [ 24/Jul/23 ] |
|
I cannot reproduce this bug locally... |
| Comment by Gerrit Updater [ 24/Jul/23 ] |
|
|
| Comment by Gerrit Updater [ 25/Jul/23 ] |
|
|
| Comment by Qian Yingjin [ 28/Jul/23 ] |
|
It seems to be related to the kernel patch "fs: explicitly unregister per-superblock BDIs". Could the Maloo testing system be changed to use a newer kernel, such as linux-image-unsigned-5.17.0-1003-oem/jammy 5.17.0-1003.3 amd64? Regards, |
| Comment by Li Xi [ 01/Aug/23 ] |
|
mdiep, is there any possibility we can update the Ubuntu 22.04 kernel to > v5.17? Or can we specify Test-Parameters to do that? |
| Comment by Li Xi [ 01/Aug/23 ] |
|
A ticket has already been created: DCO-9417 |
| Comment by Qian Yingjin [ 02/Aug/23 ] |
|
According to the test output (dmesg) in https://review.whamcloud.com/c/fs/lustre-release/+/51755, on Ubuntu 22.04:
PUT SB - force: 0 unstable_nr: 0 ref: 2 wb_list: 0
Local test on CentOS 8:
ll_put_super(): PUT SB - force: 0 unstable_nr: 0 ref: 1 wb_list: 0 sb@00000000c38268c1 cfg_instance lustre-client-ffff935c2b64d800
On Ubuntu 22.04 the bdi refcount is 2, one more than on CentOS 8. As a result, the sysfs entry is not unregistered. However, I still have no idea why the bdi refcount on Ubuntu 22.04 is 2 during Lustre umount in ll_put_super()... |
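The effect of the extra reference can be modeled with a plain refcount. This is a hedged userspace sketch, assuming (as inferred in the comment above) that on the affected kernel the BDI's sysfs entry is torn down only when the last reference is dropped. With ref 1 (CentOS 8) the single put at umount removes the entry; with ref 2 (Ubuntu 22.04) the entry survives umount, so the next mount hits the duplicate name. struct fake_bdi and its helpers are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted BDI whose sysfs entry is removed
 * only on the final put (assumed behaviour, see lead-in). */
struct fake_bdi {
	int refcnt;
	bool sysfs_registered;
};

/* Take an extra reference (the unexplained one seen on Ubuntu 22.04). */
static void fake_bdi_get(struct fake_bdi *bdi)
{
	bdi->refcnt++;
}

/* Drop one reference; only the final put tears down the sysfs entry. */
static void fake_bdi_put(struct fake_bdi *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0)
		bdi->sysfs_registered = false;
}
```

In this model one put at umount is enough only when the count is exactly 1; any leftover reference keeps the sysfs name alive.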
| Comment by Gerrit Updater [ 03/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51701/ |
| Comment by Qian Yingjin [ 16/Aug/23 ] |
|
The Maloo test results verify that this bug is fixed in the newer Ubuntu 22.04 kernel (>= v5.17). The kernel patch "fs: explicitly unregister per-superblock BDIs" (see https://lore.kernel.org/all/20211021124441.668816-4-hch@lst.de/T/#u for more details) fixed the duplicate filename failure during Lustre mount by calling bdi_unregister() directly to unregister the sysfs BDI interface. Thus, there is no easy way to fix this problem on kernels older than v5.17.
void bdi_unregister(struct backing_dev_info *bdi)
{
	...
	if (bdi->dev) {
		bdi_debug_unregister(bdi);
		device_unregister(bdi->dev);
		bdi->dev = NULL;
	}
	...
}
One possible solution is to call bdi_debug_unregister(bdi) and device_unregister(bdi->dev) explicitly in ->put_super(). I will make a patch to see whether this approach causes any problems. |
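The proposed workaround can be modeled in userspace as below. This is a hedged sketch of the idea only; it does not reproduce the contents of patch 51955, and struct model_bdi and the model_* functions are hypothetical. It borrows one detail from the quoted bdi_unregister() excerpt: the device pointer is cleared after unregistering, so when the last reference is finally dropped later, the release path sees NULL and does not remove the sysfs entry a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 'dev' stands in for bdi->dev (the sysfs device). */
struct model_bdi {
	int refcnt;
	void *dev;		/* non-NULL while the sysfs entry exists */
};

static int sysfs_removals;	/* counts device_unregister()-like calls */

/* Mirrors the quoted fragment: unregister once, then clear dev. */
static void model_bdi_unregister(struct model_bdi *bdi)
{
	if (bdi->dev) {
		sysfs_removals++;
		bdi->dev = NULL;
	}
}

/* Drop a reference; the final put also unregisters (a no-op if already done). */
static void model_bdi_put(struct model_bdi *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0)
		model_bdi_unregister(bdi);
}

/* Proposed ->put_super() behaviour: remove the sysfs entry regardless of refcount. */
static void model_put_super(struct model_bdi *bdi)
{
	model_bdi_unregister(bdi);
	model_bdi_put(bdi);
}
```

With a starting refcount of 2, model_put_super() removes the sysfs entry immediately even though one reference remains, and the later final put does not unregister again because dev is already NULL.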
| Comment by Gerrit Updater [ 16/Aug/23 ] |
|
"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51955 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51955/ |
| Comment by Peter Jones [ 28/Sep/23 ] |
|
It looks like this work is now complete for 2.16. The only remaining patches are fortestonly ones (so they will presumably be abandoned). |