[LU-11806] sanity-hsm, sanity-dom, racer fails to run testing due to client mount fails with ‘File exists’ Created: 18/Dec/18  Updated: 21/Dec/18  Resolved: 21/Dec/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: James A Simmons
Resolution: Duplicate Votes: 0
Labels: ubuntu
Environment:

Ubuntu 18.04 clients


Issue Links:
Duplicate
duplicates LU-11803 sanity test 255c fails with 'Ladvise ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

No sanity-hsm tests are run because the clients fail to mount the Lustre file system. The same problem affects sanity-dom and racer. So far, we have seen this only in Ubuntu 18.04 client testing.

Looking at the client test_log for https://testing.whamcloud.com/test_sets/d791f818-fdd4-11e8-a97c-52540065bddc , we see that the two clients aren’t able to mount the file system:

-----============= acceptance-small: sanity-hsm ============----- Tue Dec 11 00:37:27 UTC 2018
Running: bash /usr/lib64/lustre/tests/sanity-hsm.sh
CMD: trevis-19vm4 /usr/sbin/lctl get_param -n version 2>/dev/null ||
				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
Starting client trevis-19vm1.trevis.whamcloud.com,trevis-19vm2:  -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2
CMD: trevis-19vm1.trevis.whamcloud.com,trevis-19vm2 
running=\$(mount | grep -c /mnt/lustre2' ');
rc=0;
if [ \$running -eq 0 ] ; then
	mkdir -p /mnt/lustre2;
	mount -t lustre  -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2;
	rc=\$?;
fi;
exit \$rc
trevis-19vm1: mount.lustre: mount trevis-19vm4@tcp:/lustre at /mnt/lustre2 failed: File exists
trevis-19vm2: mount.lustre: mount trevis-19vm4@tcp:/lustre at /mnt/lustre2 failed: File exists

Both clients have the following stack traces in their console logs:

[  362.502293] sysfs: cannot create duplicate filename '/devices/virtual/bdi/lustre-        (ptrval)'
[  362.503296] WARNING: CPU: 1 PID: 2060 at /build/linux-wuhukg/linux-4.15.0/fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70
[  362.504411] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd joydev input_leds mac_hid serio_raw sch_fq_codel sunrpc ip_tables x_tables autofs4 psmouse virtio_blk floppy i2c_piix4 8139too 8139cp pata_acpi mii
[  362.508527] CPU: 1 PID: 2060 Comm: mount.lustre Tainted: G           OE    4.15.0-32-generic #35-Ubuntu
[  362.509518] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  362.510149] RIP: 0010:sysfs_warn_dup+0x56/0x70
[  362.510657] RSP: 0018:ffffafd10081f970 EFLAGS: 00010286
[  362.511235] RAX: 0000000000000000 RBX: ffff97c0f4a54000 RCX: 0000000000000006
[  362.512039] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff97c13fd16490
[  362.512830] RBP: ffffafd10081f988 R08: 0000000000000000 R09: 000000000000020e
[  362.513597] R10: 0000000000000001 R11: ffffffff8471ca60 R12: ffff97c138186880
[  362.514358] R13: ffff97c13298fdd0 R14: ffff97c135b31000 R15: 0000000000000000
[  362.515125] FS:  00007fcb13183740(0000) GS:ffff97c13fd00000(0000) knlGS:0000000000000000
[  362.515988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  362.516616] CR2: 00005616fd546338 CR3: 000000007861a003 CR4: 00000000000606e0
[  362.517389] Call Trace:
[  362.517716]  sysfs_create_dir_ns+0x77/0x90
[  362.518202]  kobject_add_internal+0xac/0x2b0
[  362.518697]  kobject_add+0x71/0xd0
[  362.519104]  ? _cond_resched+0x19/0x40
[  362.519556]  device_add+0x12c/0x680
[  362.519973]  device_create_groups_vargs+0xe4/0xf0
[  362.520515]  device_create_vargs+0x16/0x20
[  362.520996]  bdi_register_va.part.11+0x28/0x190
[  362.521532]  bdi_register_va+0x1b/0x20
[  362.521978]  super_setup_bdi_name+0x87/0xe0
[  362.522485]  ll_fill_super+0x1ce/0x1230 [lustre]
[  362.523035]  ? lustre_start_mgc+0x30e/0x2710 [obdclass]
[  362.523664]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
[  362.524209]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
[  362.524769]  lustre_fill_super+0x98d/0x2a10 [obdclass]
[  362.525345]  ? sget_userns+0x419/0x490
[  362.525788]  ? sget+0x7d/0xa0
[  362.526171]  ? lustre_common_put_super+0xbe0/0xbe0 [obdclass]
[  362.526818]  mount_nodev+0x4f/0xa0
[  362.527237]  lustre_mount+0x38/0x50 [obdclass]
[  362.527752]  mount_fs+0x37/0x150
[  362.528140]  vfs_kern_mount.part.23+0x5d/0x110
[  362.528652]  do_mount+0x5ed/0xce0
[  362.529047]  ? copy_mount_options+0x2c/0x220
[  362.529538]  SyS_mount+0x98/0xe0
[  362.529963]  do_syscall_64+0x73/0x130
[  362.530405]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  362.530976] RIP: 0033:0x7fcb122823ca
[  362.531402] RSP: 002b:00007ffeaea5d1c8 EFLAGS: 00000286 ORIG_RAX: 00000000000000a5
[  362.532216] RAX: ffffffffffffffda RBX: 00005616fd543f10 RCX: 00007fcb122823ca
[  362.532983] RDX: 00005616fb57d005 RSI: 00007ffeaea5d238 RDI: 00005616fd542260
[  362.533753] RBP: 00007ffeaea5d238 R08: 00005616fd543f10 R09: 0000000000000001
[  362.534519] R10: 0000000001000000 R11: 0000000000000286 R12: 0000000000000000
[  362.535281] R13: 00000000fffffff5 R14: 0000000000000000 R15: 00005616fb7895e0
[  362.536054] Code: 85 c0 48 89 c3 74 12 b9 00 10 00 00 48 89 c2 31 f6 4c 89 ef e8 ac c6 ff ff 4c 89 e2 48 89 de 48 c7 c7 00 5c 2f 85 e8 ea 76 d8 ff <0f> 0b 48 89 df e8 80 09 f4 ff 5b 41 5c 41 5d 5d c3 66 0f 1f 84 
[  362.537994] ---[ end trace 39c378564360064a ]---
[  362.538553] ------------[ cut here ]------------
[  362.539325] kobject_add_internal failed for lustre-        (ptrval) with -EEXIST, don't try to register things with the same name in the same directory.
[  362.540764] WARNING: CPU: 1 PID: 2060 at /build/linux-wuhukg/linux-4.15.0/lib/kobject.c:240 kobject_add_internal+0x26e/0x2b0
[  362.541940] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd joydev input_leds mac_hid serio_raw sch_fq_codel sunrpc ip_tables x_tables autofs4 psmouse virtio_blk floppy i2c_piix4 8139too 8139cp pata_acpi mii
[  362.546100] CPU: 1 PID: 2060 Comm: mount.lustre Tainted: G        W  OE    4.15.0-32-generic #35-Ubuntu
[  362.547086] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  362.547719] RIP: 0010:kobject_add_internal+0x26e/0x2b0
[  362.548286] RSP: 0018:ffffafd10081f9c0 EFLAGS: 00010282
[  362.548867] RAX: 0000000000000000 RBX: ffff97c135b31010 RCX: 0000000000000006
[  362.549677] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff97c13fd16490
[  362.550438] RBP: ffffafd10081f9f0 R08: 0000000000000000 R09: 0000000000000243
[  362.551197] R10: ffffdbedc0d29400 R11: ffffffff8471ca60 R12: ffff97c13290c2a0
[  362.551964] R13: 00000000ffffffef R14: ffff97c135b31000 R15: 0000000000000000
[  362.552736] FS:  00007fcb13183740(0000) GS:ffff97c13fd00000(0000) knlGS:0000000000000000
[  362.553598] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  362.554240] CR2: 00005616fd546338 CR3: 000000007861a003 CR4: 00000000000606e0
[  362.555015] Call Trace:
[  362.555327]  kobject_add+0x71/0xd0
[  362.555736]  ? _cond_resched+0x19/0x40
[  362.556172]  device_add+0x12c/0x680
[  362.556590]  device_create_groups_vargs+0xe4/0xf0
[  362.557120]  device_create_vargs+0x16/0x20
[  362.557594]  bdi_register_va.part.11+0x28/0x190
[  362.558104]  bdi_register_va+0x1b/0x20
[  362.558549]  super_setup_bdi_name+0x87/0xe0
[  362.559041]  ll_fill_super+0x1ce/0x1230 [lustre]
[  362.559588]  ? lustre_start_mgc+0x30e/0x2710 [obdclass]
[  362.560174]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
[  362.560720]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
[  362.561276]  lustre_fill_super+0x98d/0x2a10 [obdclass]
[  362.561886]  ? sget_userns+0x419/0x490
[  362.562325]  ? sget+0x7d/0xa0
[  362.562711]  ? lustre_common_put_super+0xbe0/0xbe0 [obdclass]
[  362.563346]  mount_nodev+0x4f/0xa0
[  362.563770]  lustre_mount+0x38/0x50 [obdclass]
[  362.564276]  mount_fs+0x37/0x150
[  362.564668]  vfs_kern_mount.part.23+0x5d/0x110
[  362.565170]  do_mount+0x5ed/0xce0
[  362.565575]  ? copy_mount_options+0x2c/0x220
[  362.566061]  SyS_mount+0x98/0xe0
[  362.566452]  do_syscall_64+0x73/0x130
[  362.566883]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  362.567451] RIP: 0033:0x7fcb122823ca
[  362.567868] RSP: 002b:00007ffeaea5d1c8 EFLAGS: 00000286 ORIG_RAX: 00000000000000a5
[  362.568686] RAX: ffffffffffffffda RBX: 00005616fd543f10 RCX: 00007fcb122823ca
[  362.569463] RDX: 00005616fb57d005 RSI: 00007ffeaea5d238 RDI: 00005616fd542260
[  362.570224] RBP: 00007ffeaea5d238 R08: 00005616fd543f10 R09: 0000000000000001
[  362.570990] R10: 0000000001000000 R11: 0000000000000286 R12: 0000000000000000
[  362.571762] R13: 00000000fffffff5 R14: 0000000000000000 R15: 00005616fb7895e0
[  362.572530] Code: 49 89 c4 48 85 ff 0f 84 41 fe ff ff 48 83 c7 18 e9 fc fd ff ff 48 8b 13 48 c7 c6 f0 e3 11 85 48 c7 c7 f0 09 3a 85 e8 72 3f 71 ff <0f> 0b e9 8f fe ff ff 0f 0b eb a5 0f 0b eb 98 48 89 de 48 c7 c7 
[  362.574470] ---[ end trace 39c378564360064b ]---
[  362.576724] Lustre: Unmounted lustre-client
[  362.577624] LustreError: 2060:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
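
The 'File exists' message from mount.lustre, the -EEXIST in the kobject warning, and the (-17) in the LustreError line all refer to the same errno value. A minimal Python sketch, assuming a Linux client, that checks the mapping (the variable names are illustrative only):

```python
import errno
import os

# Return code reported by lustre_fill_super() in the console log above.
kernel_rc = -17

# Kernel functions return negative errno values; userspace sees the positive code.
code = -kernel_rc

print(errno.errorcode[code])  # symbolic name for this errno value
print(os.strerror(code))      # message text that mount.lustre reports
```

On Linux this should print EEXIST and the same 'File exists' text that appears in the client test_log.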

All of these failures take place after a node-reset/lustre-initialization.

There are several failures like this at
https://testing.whamcloud.com/test_sets/04303b9a-fa11-11e8-8a18-52540065bddc
https://testing.whamcloud.com/test_sets/64026ecc-fdd5-11e8-93ea-52540065bddc



 Comments   
Comment by Peter Jones [ 18/Dec/18 ]

James

Could you please comment on this one?

Thanks

Peter

Comment by Andreas Dilger [ 18/Dec/18 ]

This also has a similar parameter name problem:

sysfs: cannot create duplicate filename '/devices/virtual/bdi/lustre-        (ptrval)'

Comment by James A Simmons [ 18/Dec/18 ]

I see the problem. It's the UUID issue. Newer kernels no longer allow you to expose a pointer address to userland. We need to create a new UUID method that does not use the pointer of an internal kernel object.
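
To illustrate the collision described here, the following is a toy Python model, not Lustre or kernel code. It assumes that the pointer is rendered as the fixed, padded "(ptrval)" placeholder seen in the console log, so that every mount ends up with the same bdi name, and that a second mount on the same client then hits the duplicate. The class `FakeSysfsDir`, the helper functions, and the counter-based naming are all hypothetical; the actual fix is whatever the LU-11803 patch does.

```python
import itertools


class FakeSysfsDir:
    """Toy stand-in for a sysfs directory: entry names must be unique."""

    def __init__(self):
        self.entries = set()

    def create(self, name):
        if name in self.entries:
            # Mirrors the -EEXIST failure reported by kobject_add_internal.
            raise FileExistsError(f"cannot create duplicate filename '{name}'")
        self.entries.add(name)
        return name


def name_from_masked_pointer(_mount):
    # Assumption: the address is replaced by a fixed 16-character field,
    # so the name no longer depends on which object is being registered.
    return "lustre-" + "(ptrval)".rjust(16)


_next_id = itertools.count()


def name_from_counter(_mount):
    # Hypothetical alternative: a per-mount counter instead of an address.
    return f"lustre-{next(_next_id)}"


def mount_all(namer, mounts):
    """Register one name per mount and record the name or the failure."""
    bdi_dir = FakeSysfsDir()
    results = []
    for m in mounts:
        try:
            results.append(bdi_dir.create(namer(m)))
        except FileExistsError:
            results.append("EEXIST")
    return results


# Illustrative mount points; the second one matches the failing mount in the log.
mounts = ["/mnt/lustre", "/mnt/lustre2"]
pointer_results = mount_all(name_from_masked_pointer, mounts)
counter_results = mount_all(name_from_counter, mounts)
print(pointer_results)
print(counter_results)
```

In this model the second registration fails with the masked-pointer name and succeeds with the counter-based name, which is the behaviour a non-pointer identifier is meant to restore.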

Comment by Andreas Dilger [ 21/Dec/18 ]

Close as a duplicate of LU-11803, which has a patch.
